
Left Right Environment

# LeftRightEnv

## Traits

| Trait Type        |                                            Value |
|:----------------- | ------------------------------------------------:|
| NumAgentStyle     |          ReinforcementLearningBase.SingleAgent() |
| DynamicStyle      |           ReinforcementLearningBase.Sequential() |
| InformationStyle  | ReinforcementLearningBase.ImperfectInformation() |
| ChanceStyle       |           ReinforcementLearningBase.Stochastic() |
| RewardStyle       |           ReinforcementLearningBase.StepReward() |
| UtilityStyle      |           ReinforcementLearningBase.GeneralSum() |
| ActionStyle       |     ReinforcementLearningBase.MinimalActionSet() |
| StateStyle        |     ReinforcementLearningBase.Observation{Any}() |
| DefaultStateStyle |     ReinforcementLearningBase.Observation{Any}() |

## Is Environment Terminated?

No

## State Space

`Base.OneTo(2)`

## Action Space

`Base.OneTo(2)`

## Current State

```
1
```
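The display above lists the state space, action space, start state and traits, but not the transition rules. The Python sketch below is a hypothetical stand-in, not the ReinforcementLearning.jl type. It assumes that `LeftRightEnv` follows the one-state MDP of Sutton & Barto's Example 5.5. Under that assumption, action 1 ("left") returns to state 1 with probability 0.9 and reward 0, or reaches the terminal state 2 with probability 0.1 and reward +1. Action 2 ("right") terminates immediately with reward 0. The action encoding and the class and function names are also assumptions.

```python
import random
from typing import List, Optional, Tuple


class LeftRightSketch:
    """Hypothetical stand-in for LeftRightEnv (assumed Example 5.5 dynamics)."""

    LEFT, RIGHT = 1, 2       # assumed action encoding for Base.OneTo(2)
    START, TERMINAL = 1, 2   # state space {1, 2}; episodes start in state 1

    def __init__(self, rng: Optional[random.Random] = None):
        self.rng = rng or random.Random()
        self.state = self.START

    def reset(self) -> int:
        self.state = self.START
        return self.state

    @property
    def is_terminated(self) -> bool:
        return self.state == self.TERMINAL

    def step(self, action: int) -> Tuple[int, float]:
        """Apply an action and return (next_state, reward)."""
        assert not self.is_terminated, "episode already finished"
        assert action in (self.LEFT, self.RIGHT)
        if action == self.RIGHT:
            # Right: deterministic transition to termination, reward 0.
            self.state = self.TERMINAL
            return self.state, 0.0
        if self.rng.random() < 0.1:
            # Left, 10% of the time: terminate with reward +1 (stochastic chance).
            self.state = self.TERMINAL
            return self.state, 1.0
        # Left, 90% of the time: stay in state 1 with reward 0.
        return self.state, 0.0


def rollout(env: LeftRightSketch, rng: random.Random) -> List[Tuple[int, int, float]]:
    """Run one episode under a uniform random behavior policy."""
    env.reset()
    episode = []
    while not env.is_terminated:
        s = env.state
        a = rng.choice((env.LEFT, env.RIGHT))
        _, r = env.step(a)
        episode.append((s, a, r))  # (state, action, reward after the action)
    return episode


if __name__ == "__main__":
    print(rollout(LeftRightSketch(random.Random(0)), random.Random(1)))
```

Each step emits a reward, which matches the `StepReward` trait. The coin flip on "left" is the source of the `Stochastic` chance style.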
2
π_t

```
VBasedPolicy
├─ learner => MonteCarloLearner
│  ├─ approximator
│  │  ├─ 1
│  │  │  └─ TabularApproximator
│  │  │     ├─ table => 2-element Array{Float64,1}
│  │  │     └─ optimizer => Descent
│  │  │        └─ eta => 1.0
│  │  └─ 2
│  │     └─ TabularApproximator
│  │        ├─ table => 2-element Array{Float64,1}
│  │        └─ optimizer => InvDecay
│  │           ├─ gamma => 1.0
│  │           └─ state => IdDict
│  ├─ γ => 1.0
│  ├─ kind => ReinforcementLearningZoo.FirstVisit
│  └─ sampling => ReinforcementLearningZoo.OrdinaryImportanceSampling
└─ mapping => Main.var"#1#2"
```
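The policy π_t above uses a `MonteCarloLearner` with first-visit updates, ordinary importance sampling and γ = 1.0. The Python sketch below shows the estimate that such a learner targets. For each state s, it averages ρ·G over the first visit to s in every episode. Here ρ is the product of target-to-behavior probability ratios from that visit to the end of the episode. This is an illustration, not ReinforcementLearningZoo's implementation, and the function names are hypothetical. It assumes the target policy always picks action 1 ("left") and the behavior policy is uniform, as in Sutton & Barto's Example 5.5. One plausible reading of the `InvDecay` optimizer with `gamma => 1.0` is that it gives step sizes 1, 1/2, 1/3, …, so repeated updates form exactly such a running average.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[int, int, float]  # (state, action, reward received after the action)


def first_visit_ordinary_is(
    episodes: List[List[Step]],
    target_prob: Callable[[int, int], float],
    behavior_prob: Callable[[int, int], float],
    gamma: float = 1.0,
) -> Dict[int, float]:
    """First-visit Monte Carlo prediction with ordinary importance sampling."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for episode in episodes:
        # Time index of the first occurrence of each state in this episode.
        first_visit: Dict[int, int] = {}
        for t, (s, _, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        g, rho = 0.0, 1.0
        # Walk backward so g = G_t and rho = prod_{k=t}^{T-1} pi(a_k|s_k) / b(a_k|s_k).
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            g = gamma * g + r
            rho *= target_prob(s, a) / behavior_prob(s, a)
            if first_visit[s] == t:
                totals[s] = totals.get(s, 0.0) + rho * g
                counts[s] = counts.get(s, 0) + 1
    # Ordinary IS divides by the number of first visits (not by the sum of ratios).
    return {s: totals[s] / counts[s] for s in totals}


def target(s: int, a: int) -> float:
    return 1.0 if a == 1 else 0.0  # assumed target: always "left" (action 1)


def behavior(s: int, a: int) -> float:
    return 0.5  # assumed behavior: uniform over the two actions


if __name__ == "__main__":
    episodes = [
        [(1, 1, 0.0), (1, 1, 1.0)],  # left, back to 1; left, terminate with +1
        [(1, 2, 0.0)],               # right, terminate with 0
        [(1, 1, 1.0)],               # left, terminate with +1
    ]
    print(first_visit_ordinary_is(episodes, target, behavior))  # → {1: 2.0}
```

With γ = 1 and the assumed dynamics, the true value of state 1 under the always-left policy is 1. Each term ρ·G can still be 0, 2, 4, …, so individual ordinary importance-sampling estimates can land well away from it. This is why the method is usually contrasted with weighted importance sampling.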