# Example 6.5: Windy Gridworld

First, let's define this environment by implementing the interfaces defined in RLBase.

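Below is a minimal sketch of such an implementation. The grid layout (7 rows by 10 columns), the per-column wind strengths, the start and goal cells, and the reward of -1 per step all follow Example 6.5 in Sutton and Barto; the constant names and internal details are assumptions, and the notebook's actual code may differ.

```julia
using ReinforcementLearning  # re-exports the RLBase interface

const NX = 7                              # grid height (rows)
const NY = 10                             # grid width (columns)
const StartPosition = CartesianIndex(4, 1)
const Goal = CartesianIndex(4, 8)
# upward wind strength of each column (Sutton & Barto, Example 6.5);
# "up" is a negative row offset with rows numbered from the top
const Wind = [CartesianIndex(-w, 0) for w in (0, 0, 0, 1, 1, 1, 2, 2, 1, 0)]
# the four moves: up, down, left, right
const Actions = [
    CartesianIndex(-1, 0),
    CartesianIndex(1, 0),
    CartesianIndex(0, -1),
    CartesianIndex(0, 1),
]
const LinearInds = LinearIndices((NX, NY))

Base.@kwdef mutable struct WindyGridWorldEnv <: AbstractEnv
    position::CartesianIndex{2} = StartPosition
end

RLBase.state_space(env::WindyGridWorldEnv) = Base.OneTo(length(LinearInds))
RLBase.action_space(env::WindyGridWorldEnv) = Base.OneTo(length(Actions))
RLBase.state(env::WindyGridWorldEnv) = LinearInds[env.position]
RLBase.is_terminated(env::WindyGridWorldEnv) = env.position == Goal
RLBase.reward(env::WindyGridWorldEnv) = env.position == Goal ? 0.0 : -1.0
RLBase.reset!(env::WindyGridWorldEnv) = env.position = StartPosition

# taking an action: apply the move plus the column's wind, then clamp to the grid
function (env::WindyGridWorldEnv)(a::Int)
    p = env.position + Actions[a] + Wind[env.position[2]]
    env.position = CartesianIndex(clamp(p[1], 1, NX), clamp(p[2], 1, NY))
    nothing
end

world = WindyGridWorldEnv()
```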
Displaying the resulting `world` summarizes the environment's traits, state and action spaces, and current state:
# WindyGridWorldEnv

## Traits

| Trait Type        |                                            Value |
|:----------------- | ------------------------------------------------:|
| NumAgentStyle     |          ReinforcementLearningBase.SingleAgent() |
| DynamicStyle      |           ReinforcementLearningBase.Sequential() |
| InformationStyle  | ReinforcementLearningBase.ImperfectInformation() |
| ChanceStyle       |           ReinforcementLearningBase.Stochastic() |
| RewardStyle       |           ReinforcementLearningBase.StepReward() |
| UtilityStyle      |           ReinforcementLearningBase.GeneralSum() |
| ActionStyle       |     ReinforcementLearningBase.MinimalActionSet() |
| StateStyle        |     ReinforcementLearningBase.Observation{Any}() |
| DefaultStateStyle |     ReinforcementLearningBase.Observation{Any}() |

## Is Environment Terminated?

No

## State Space

`Base.OneTo(70)`

## Action Space

`Base.OneTo(4)`

## Current State

```
4
```
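The same information can be queried interactively. The snippet below is only illustrative: the expected values in the comments assume the sketch implementation above (in particular its action ordering and column-major linear state indexing).

```julia
state_space(world)    # Base.OneTo(70)
action_space(world)   # Base.OneTo(4)
state(world)          # 4, the start cell (4, 1) in linear indexing
is_terminated(world)  # false

world(4)              # move right; the wind in column 1 is zero
state(world)          # 11, i.e. cell (4, 2)
```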
Next, the `agent`: a tabular SARSA learner wrapped in a `QBasedPolicy` with an epsilon-greedy explorer, plus a trajectory that collects state, action, reward, and terminal traces:
```
Agent
├─ policy => QBasedPolicy
│  ├─ learner => TDLearner
│  │  ├─ approximator => TabularApproximator
│  │  │  ├─ table => 4×70 Array{Float64,2}
│  │  │  └─ optimizer => Descent
│  │  │     └─ eta => 0.5
│  │  ├─ γ => 1.0
│  │  ├─ method => SARSA
│  │  └─ n => 0
│  └─ explorer => EpsilonGreedyExplorer
│     ├─ ϵ_stable => 0.1
│     ├─ ϵ_init => 1.0
│     ├─ warmup_steps => 0
│     ├─ decay_steps => 0
│     ├─ step => 1
│     ├─ rng => Random._GLOBAL_RNG
│     └─ is_training => true
└─ trajectory => Trajectory
   └─ traces => NamedTuple
      ├─ state => 0-element Array{Int64,1}
      ├─ action => 0-element Array{Int64,1}
      ├─ reward => 0-element Array{Float32,1}
      └─ terminal => 0-element Array{Bool,1}
```
Finally, a `hook` records statistics while the experiment runs, and the results are plotted.
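A sketch of that last step, assuming a `StepsPerEpisode` hook, an 8000-step run as in the book's figure, and Plots.jl for plotting; the notebook's actual stop condition and plotting code are not shown above.

```julia
using Plots

# record how many steps each episode takes
hook = StepsPerEpisode()

# interact for 8000 environment steps, as in Sutton & Barto's Example 6.5
run(agent, world, StopAfterStep(8000), hook)

# episodes completed versus elapsed time steps, as in the book's figure
plot(cumsum(hook.steps), 1:length(hook.steps),
     xlabel = "time steps", ylabel = "episodes", legend = false)
```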