# Chapter 6.2 Random Walk


In this section, we'll use the `RandomWalk1D` environment provided in `ReinforcementLearning.jl`.

Inspecting `env` prints the following summary:
# RandomWalk1D

## Traits

| Trait Type        |                                          Value |
|:----------------- | ----------------------------------------------:|
| NumAgentStyle     |        ReinforcementLearningBase.SingleAgent() |
| DynamicStyle      |         ReinforcementLearningBase.Sequential() |
| InformationStyle  | ReinforcementLearningBase.PerfectInformation() |
| ChanceStyle       |      ReinforcementLearningBase.Deterministic() |
| RewardStyle       |     ReinforcementLearningBase.TerminalReward() |
| UtilityStyle      |         ReinforcementLearningBase.GeneralSum() |
| ActionStyle       |   ReinforcementLearningBase.MinimalActionSet() |
| StateStyle        | ReinforcementLearningBase.Observation{Int64}() |
| DefaultStateStyle | ReinforcementLearningBase.Observation{Int64}() |

## Is Environment Terminated?

No

## State Space

`Base.OneTo(7)`

## Action Space

`Base.OneTo(2)`

## Current State

```
4
```
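For reference, here is a minimal sketch of how the environment above can be created and inspected (assuming the v0.10-style API of `ReinforcementLearning.jl`):

```julia
using ReinforcementLearning

# Create the 7-state random walk; the agent starts in the middle (state 4)
# and an episode terminates when either end is reached.
env = RandomWalk1D()

ns = length(state_space(env))   # 7 states, Base.OneTo(7)
na = length(action_space(env))  # 2 actions (left, right), Base.OneTo(2)

state(env)  # the initial state is 4
```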

As is explained in the book, the true values of states A through E are 1/6, 2/6, 3/6, 4/6, and 5/6.
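Following Example 6.2 of the book, these can be written down directly:

```julia
# True values of the five non-terminal states A, B, C, D, E
true_values = [i / 6 for i in 1:5]
# each state's value is k/6 for k = 1..5; the middle state C has value 0.5
```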


To estimate the state values, we'll use a `VBasedPolicy` with a random action generator.
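A sketch of what `create_TD_agent` might look like. The exact constructors (`TabularVApproximator`, `VectorSARTTrajectory`) and the `Descent` optimiser from Flux are assumptions based on the v0.10-style API:

```julia
using ReinforcementLearning
using Flux: Descent

create_TD_agent(α) = Agent(
    policy = VBasedPolicy(
        # one-step TD state-value learner with learning rate α
        learner = TDLearner(
            approximator = TabularVApproximator(n_state = 7, opt = Descent(α)),
            γ = 1.0,
            method = :SRS,  # learn state values from (s, r, s′) transitions
            n = 0,          # one-step TD
        ),
        # the behaviour policy acts uniformly at random: left (1) or right (2)
        mapping = (env, V) -> rand(1:2),
    ),
    trajectory = VectorSARTTrajectory(),
)
```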


To calculate the RMS error of the estimated state values, we first need to define a custom hook.
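One way to write such a hook (a sketch: the field path into the agent's approximator follows the v0.10 API, and `true_values` is the vector defined earlier):

```julia
using ReinforcementLearning
using Statistics: mean

# Record the RMS error between estimated and true state values
# at the end of every episode.
struct RecordRMS <: AbstractHook
    rms::Vector{Float64}
    RecordRMS() = new(Float64[])
end

function (h::RecordRMS)(::PostEpisodeStage, agent, env)
    # estimated values of the 5 non-terminal states (drop the two terminal ones)
    V̂ = agent.policy.learner.approximator.table[2:end-1]
    push!(h.rms, sqrt(mean((V̂ .- true_values) .^ 2)))
end
```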


Now let's take a look at the performance of the `TDLearner` under different values of α.
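The experiment can be sketched like this, averaging the per-episode RMS error over 100 independent runs for each α (the α values and the use of `Plots.jl` are assumptions for illustration):

```julia
using ReinforcementLearning
using Plots
using Statistics: mean

fig = plot(xlabel = "Episodes", ylabel = "RMS error", legend = :topright)
for α in [0.05, 0.1, 0.15]
    # one RMS curve per run; average them element-wise across the 100 runs
    errors = [begin
        hook = RecordRMS()
        run(create_TD_agent(α), RandomWalk1D(), StopAfterEpisode(100), hook)
        hook.rms
    end for _ in 1:100]
    plot!(fig, mean(errors), label = "TD α=$α")
end
fig
</imports>
```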


Then we can compare the differences between the `TDLearner` and the `MonteCarloLearner`.
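`create_MC_agent` can be sketched analogously, swapping in a `MonteCarloLearner` (again assuming the v0.10-style API; `EVERY_VISIT` selects the every-visit Monte Carlo variant):

```julia
using ReinforcementLearning
using Flux: Descent

create_MC_agent(α) = Agent(
    policy = VBasedPolicy(
        # constant-α every-visit Monte Carlo state-value learner
        learner = MonteCarloLearner(
            approximator = TabularVApproximator(n_state = 7, opt = Descent(α)),
            kind = EVERY_VISIT,
        ),
        # same uniformly random behaviour policy as the TD agent
        mapping = (env, V) -> rand(1:2),
    ),
    trajectory = VectorSARTTrajectory(),
)
```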


**Warning**

Some of you might have noticed that the figure above is not the same as Figure 6.2 in the book. That's because we are not doing **BATCH TRAINING** here: the `trajectory` is emptied at the end of each episode. We leave it as an exercise for readers to practice developing new customized algorithms with `ReinforcementLearning.jl`. 😉