# Chapter 13 Short Corridor

# ShortCorridorEnv

## Traits

| Trait Type        |                                            Value |
|:----------------- | ------------------------------------------------:|
| NumAgentStyle     |          ReinforcementLearningBase.SingleAgent() |
| DynamicStyle      |           ReinforcementLearningBase.Sequential() |
| InformationStyle  | ReinforcementLearningBase.ImperfectInformation() |
| ChanceStyle       |           ReinforcementLearningBase.Stochastic() |
| RewardStyle       |           ReinforcementLearningBase.StepReward() |
| UtilityStyle      |           ReinforcementLearningBase.GeneralSum() |
| ActionStyle       |     ReinforcementLearningBase.MinimalActionSet() |
| StateStyle        |     ReinforcementLearningBase.Observation{Any}() |
| DefaultStateStyle |     ReinforcementLearningBase.Observation{Any}() |

## Is Environment Terminated?

No

## State Space

`Base.OneTo(4)`

## Action Space

`Base.OneTo(2)`

## Current State

```
1
```
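The dynamics behind these traits are simple. As a minimal sketch (hypothetical function, not the actual `ShortCorridorEnv` implementation), the transition rule from Example 13.1 of Sutton & Barto can be written as:

```julia
# Sketch of the short-corridor dynamics (Sutton & Barto, Example 13.1).
# States 1..4; state 4 is terminal. Actions: 1 = left, 2 = right.
# In state 2 the effect of the actions is reversed; every step yields -1.
function step(state::Int, action::Int)
    move = action == 2 ? 1 : -1       # right moves +1, left moves -1
    state == 2 && (move = -move)      # state 2 reverses the actions
    next = clamp(state + move, 1, 4)  # can't walk off the left end
    (next, -1, next == 4)             # (next state, reward, terminated?)
end
```

Note how a deterministic policy fails here: always choosing `right` bounces forever between states 1 and 2, which is why a stochastic policy is needed.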
A helper function `run_once` estimates the average return of a fixed stochastic policy on the corridor. Sweeping the probability of selecting `right` over `0.05:0.05:0.95` traces the performance curve of Figure 13.1 in the book.
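As a hedged sketch of what such a `run_once` helper plausibly computes (hypothetical names and parameters, not the notebook's actual code), here is a Monte Carlo estimate of the return of a policy that picks `right` with probability `p` in every state:

```julia
using Random

# Estimate the average return of a fixed stochastic policy on the
# short corridor: pick `right` with probability p in every state.
function avg_return(p; episodes = 2_000, rng = Xoshiro(1), cap = 1_000)
    total = 0.0
    for _ in 1:episodes
        s, G = 1, 0
        for _ in 1:cap                    # cap episode length for safety
            move = rand(rng) < p ? 1 : -1 # right with probability p
            s == 2 && (move = -move)      # state 2 reverses the actions
            s = clamp(s + move, 1, 4)
            G -= 1                        # reward is -1 per step
            s == 4 && break
        end
        total += G
    end
    total / episodes
end
```

Per the book, the best probability of going right is roughly 0.59, with a return of about -11.6; values near 0 or 1 do far worse.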

## REINFORCE Policy

Based on the description in Section 13.1 of the book, we need to define a customized policy approximator.
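One way to sketch such an approximator (hypothetical names; the notebook's actual struct may differ) is a linear softmax over action preferences, using the state-independent features from the short-corridor example: `x(s, right) = [1, 0]`, `x(s, left) = [0, 1]`.

```julia
# Features are state-independent in this example.
feature(action) = action == 2 ? [1.0, 0.0] : [0.0, 1.0]

struct SoftmaxPolicy
    θ::Vector{Float64}  # two preference weights
end

# π(a | s, θ) ∝ exp(θ' x(s, a))
function probs(p::SoftmaxPolicy)
    h = [p.θ' * feature(a) for a in 1:2]
    e = exp.(h .- maximum(h))  # subtract the max for numerical stability
    e ./ sum(e)
end

# Eligibility vector used by REINFORCE:
# ∇ ln π(a | s, θ) = x(s, a) - Σ_b π(b | s, θ) x(s, b)
function grad_logpi(p::SoftmaxPolicy, a)
    pr = probs(p)
    feature(a) .- sum(pr[b] .* feature(b) for b in 1:2)
end
```

With zero weights the policy is uniform, and the eligibility vector pushes preferences toward whichever action is taken.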

A second helper, `run_once_RL`, runs the same experiment with a REINFORCE agent in place of the fixed policy.
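For reference, the episodic REINFORCE update from Section 13.3 of the book, applied once per step of each completed episode, is:

```math
\theta_{t+1} \doteq \theta_t + \alpha \, \gamma^t G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta_t)
```

where $G_t$ is the return from time $t$ and $\alpha$ is the step size.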

Interested in how to reproduce Figure 13.2? A PR is warmly welcomed! See you there!
