Chapter 8.2 Dyna: Integrated Planning, Acting, and Learning

To demonstrate the flexibility of ReinforcementLearning.jl, a DynaAgent is included in the package, and in this notebook we explore its performance on the maze tasks from the book.

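The notebook starts by loading the required packages. A minimal setup (the exact package list is an assumption; `Plots` is only needed for the figures and `Flux` only for the `Descent` optimiser used later) looks like:

```julia
using ReinforcementLearning   # environments, agents, hooks
using Flux: Descent           # constant step-size optimiser for the tabular Q-approximator
using Plots                   # plotting the learning curves
using Statistics              # averaging over independent runs
```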

The Maze Environment

In this chapter, the authors introduce a specific maze environment, so let's define it by implementing the interfaces provided by ReinforcementLearning.jl.

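Here is a minimal sketch of how such a maze can be implemented against the `RLBase` interface. The wall, start, and goal coordinates follow the diagram in Figure 8.2 of the book; the field names and the column-major state encoding are assumptions (chosen so that the start cell maps to state 3, matching the summary below).

```julia
# A 6×9 grid with walls; coordinates are (row, column) from the top-left corner.
Base.@kwdef mutable struct MazeEnv <: AbstractEnv
    NX::Int = 6   # number of rows
    NY::Int = 9   # number of columns
    walls::Set{CartesianIndex{2}} = Set(vcat(
        [CartesianIndex(i, 3) for i in 2:4],   # vertical wall in column 3
        [CartesianIndex(5, 6)],                # single wall cell in column 6
        [CartesianIndex(i, 8) for i in 1:3],   # vertical wall in column 8
    ))
    start::CartesianIndex{2} = CartesianIndex(3, 1)
    goal::CartesianIndex{2} = CartesianIndex(1, 9)
    position::CartesianIndex{2} = start
end

const MAZE_ACTIONS = (
    CartesianIndex(-1, 0),  # up
    CartesianIndex(1, 0),   # down
    CartesianIndex(0, -1),  # left
    CartesianIndex(0, 1),   # right
)

RLBase.action_space(env::MazeEnv) = Base.OneTo(length(MAZE_ACTIONS))
RLBase.state_space(env::MazeEnv) = Base.OneTo(env.NX * env.NY)
# Column-major linear index of the current cell (the start cell is state 3).
RLBase.state(env::MazeEnv) = (env.position[2] - 1) * env.NX + env.position[1]
RLBase.reward(env::MazeEnv) = env.position == env.goal ? 1.0 : 0.0
RLBase.is_terminated(env::MazeEnv) = env.position == env.goal
RLBase.reset!(env::MazeEnv) = (env.position = env.start; nothing)

# Acting moves the agent one cell, unless the move hits a wall or leaves the grid.
function (env::MazeEnv)(a::Int)
    p = env.position + MAZE_ACTIONS[a]
    if 1 <= p[1] <= env.NX && 1 <= p[2] <= env.NY && p ∉ env.walls
        env.position = p
    end
    nothing
end
```

Constructing `MazeEnv()` and letting it display itself produces a summary of the environment's traits and spaces: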
# MazeEnv

## Traits

| Trait Type        |                                            Value |
|:----------------- | ------------------------------------------------:|
| NumAgentStyle     |          ReinforcementLearningBase.SingleAgent() |
| DynamicStyle      |           ReinforcementLearningBase.Sequential() |
| InformationStyle  | ReinforcementLearningBase.ImperfectInformation() |
| ChanceStyle       |           ReinforcementLearningBase.Stochastic() |
| RewardStyle       |           ReinforcementLearningBase.StepReward() |
| UtilityStyle      |           ReinforcementLearningBase.GeneralSum() |
| ActionStyle       |     ReinforcementLearningBase.MinimalActionSet() |
| StateStyle        |     ReinforcementLearningBase.Observation{Any}() |
| DefaultStateStyle |     ReinforcementLearningBase.Observation{Any}() |

## Is Environment Terminated?

No

## State Space

`Base.OneTo(54)`

## Action Space

`Base.OneTo(4)`

## Current State

```
3
```

Figure 8.2

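Figure 8.2 compares Dyna-Q agents that perform 0, 5, and 50 planning steps per real step, measured as steps per episode over the first 50 episodes. The notebook wraps one run in a helper `plan_step`; the version below is a reconstruction, assuming the v0.10-era API of ReinforcementLearning.jl (keyword names such as `n_state`, `n_action`, and the `plan_step` field of `DynaAgent` may differ in other releases).

```julia
# Run tabular Dyna-Q on the maze with `n` planning steps per real step and
# return the steps-per-episode curve recorded by the hook.
function plan_step(n::Int; n_episodes = 50)
    env = MazeEnv()
    ns, na = length(state_space(env)), length(action_space(env))
    agent = DynaAgent(
        policy = QBasedPolicy(
            learner = TDLearner(
                approximator = TabularQApproximator(n_state = ns, n_action = na, opt = Descent(0.1)),
                γ = 0.95,
                method = :SARS,          # one-step Q-learning backup
            ),
            explorer = EpsilonGreedyExplorer(0.1),
        ),
        model = ExperienceBasedSamplingModel(),  # tabular model built from observed transitions
        trajectory = VectorSARTTrajectory(),
        plan_step = n,                           # simulated updates per real step
    )
    hook = StepsPerEpisode()
    run(agent, env, StopAfterEpisode(n_episodes), hook)
    hook.steps
end
```

Averaging `plan_step(n)` over independent runs for `n ∈ (0, 5, 50)` and plotting the three curves reproduces the figure.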

Figure 8.4

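Figure 8.4 is the blocking task: for the first 1000 steps the barrier's only gap is on the right; afterwards that gap closes and a new one opens on the left, so the learned path stops working. The notebook's helpers `walls`, `change_walls`, and `cumulative_dyna_reward` manage the changing layout and record the cumulative reward at every step. The versions below are sketches (the hook type and its fields are assumptions); Dyna-Q+ is then obtained by swapping the agent's model for a `TimeBasedSamplingModel`, which adds the κ√τ exploration bonus, provided that model type is available in your installed version.

```julia
# Wall layouts of the blocking maze: the barrier sits on row 4; its gap is on
# the right for the first 1000 steps and on the left afterwards.
# (The blocking task also starts from the bottom row, column 4; adjust env.start accordingly.)
walls(t) = t <= 1000 ?
    Set(CartesianIndex(4, j) for j in 1:8) :   # gap at column 9
    Set(CartesianIndex(4, j) for j in 2:9)     # gap at column 1

# Mutate an existing environment so that its walls match time step `t`.
change_walls(env::MazeEnv, t) = (env.walls = walls(t); env)

# Minimal hook recording the cumulative reward after every step, which is all
# that a `cumulative_dyna_reward`-style helper needs to draw the curves of Figure 8.4.
Base.@kwdef mutable struct CumulativeReward <: AbstractHook
    rewards::Vector{Float64} = Float64[]
end

function (h::CumulativeReward)(::PostActStage, agent, env)
    prev = isempty(h.rewards) ? 0.0 : h.rewards[end]
    push!(h.rewards, prev + reward(env))
end
```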

Figure 8.5

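Figure 8.5 is the shortcut task: for the first 3000 steps the only gap in the barrier is on the left; then a second gap opens on the right, creating a shorter path that plain Dyna-Q essentially never finds while Dyna-Q+ does. The helpers `new_walls` and `new_change_walls` mirror the previous pair; a sketch under the same assumptions:

```julia
# Wall layouts of the shortcut maze: the left gap is always open; a second gap
# on the right appears after 3000 steps.
new_walls(t) = t <= 3000 ?
    Set(CartesianIndex(4, j) for j in 2:9) :   # gap at column 1 only
    Set(CartesianIndex(4, j) for j in 2:8)     # gaps at columns 1 and 9

new_change_walls(env::MazeEnv, t) = (env.walls = new_walls(t); env)
```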

Example 8.4

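Example 8.4 compares prioritized sweeping with unprioritized Dyna-Q on mazes of increasing size, counting how many updates are needed before an optimal path is found. In ReinforcementLearning.jl the only change to the agent is the sampling model used for planning. Below is a sketch of a `run_once`-style helper; the default parameters of `PrioritizedSweepingSamplingModel` and the argument list of the original helper are assumptions.

```julia
# Run one agent on the maze; pass ExperienceBasedSamplingModel() for plain
# Dyna-Q, or keep the default PrioritizedSweepingSamplingModel() for prioritized sweeping.
function run_once(model = PrioritizedSweepingSamplingModel(); n_plan_steps = 5, n_episodes = 50)
    env = MazeEnv()
    ns, na = length(state_space(env)), length(action_space(env))
    agent = DynaAgent(
        policy = QBasedPolicy(
            learner = TDLearner(
                approximator = TabularQApproximator(n_state = ns, n_action = na, opt = Descent(0.5)),
                γ = 0.95,
                method = :SARS,
            ),
            explorer = EpsilonGreedyExplorer(0.1),
        ),
        model = model,                 # sampling model used for the planning updates
        trajectory = VectorSARTTrajectory(),
        plan_step = n_plan_steps,
    )
    hook = StepsPerEpisode()
    run(agent, env, StopAfterEpisode(n_episodes), hook)
    hook.steps
end
```

Comparing `run_once()` against `run_once(ExperienceBasedSamplingModel())` on progressively larger mazes should reproduce the effect described in the book: prioritized sweeping reaches the optimal solution with far fewer updates.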