# ReinforcementLearningBase.jl

ReinforcementLearningBase.STOCHASTIC - Constant

The environment has no chance player, yet the game is stochastic. To improve reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. Some third-party environments expose at least a seed in the constructor.
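
The RNG keyword convention described above can be sketched as follows. This is a minimal illustration, not code from the package: `DiceEnv` and `roll!` are hypothetical names, and only the `Random` standard library is used.

```julia
using Random

# Hypothetical toy environment (not part of ReinforcementLearningBase):
# its only source of randomness is an injectable RNG.
struct DiceEnv{R<:AbstractRNG}
    rng::R
end

# Accept the RNG as a keyword argument so callers can fix the seed.
DiceEnv(; rng::AbstractRNG = Random.default_rng()) = DiceEnv(rng)

# One stochastic transition: roll a six-sided die using the env's own RNG.
roll!(env::DiceEnv) = rand(env.rng, 1:6)

# Two environments built from identically seeded RNGs.
env_a = DiceEnv(rng = MersenneTwister(123))
env_b = DiceEnv(rng = MersenneTwister(123))

rolls_a = [roll!(env_a) for _ in 1:5]
rolls_b = [roll!(env_b) for _ in 1:5]
```

Because both environments draw from identically seeded generators, `rolls_a` and `rolls_b` contain the same sequence, which is what makes an experiment repeatable.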

ReinforcementLearningBase.AbstractPolicy - Type
(π::AbstractPolicy)(env) -> action

Policy is the most basic concept in reinforcement learning. Unlike the definition in some other packages, here a policy is defined as a functional object which takes in an environment and returns an action.

Note

See discussions here if you are wondering why we define the input as AbstractEnv instead of state.

Warning

The policy π may change its internal state but it shouldn't change env. When it's really necessary, remember to make a copy of env to keep the original env untouched.
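
A policy as a callable object, together with the copy-before-simulating advice in the warning, might look like the sketch below. It is self-contained and does not load the package: `ToyEnv`, `ToyPolicy`, `CounterEnv`, `GreedyPolicy`, `legal_actions`, and `apply!` are all stand-ins invented for illustration.

```julia
# Stand-in types for illustration only; the real AbstractEnv and
# AbstractPolicy are defined by ReinforcementLearningBase.
abstract type ToyEnv end
abstract type ToyPolicy end

mutable struct CounterEnv <: ToyEnv
    position::Int
end

# Hypothetical helpers for this toy environment.
legal_actions(::CounterEnv) = (-1, 1)
apply!(env::CounterEnv, action::Int) = (env.position += action; env)

# A one-step lookahead policy that also counts how often it is called.
mutable struct GreedyPolicy <: ToyPolicy
    calls::Int
end

# Make the policy a functional object: policy(env) returns an action.
function (π::GreedyPolicy)(env::CounterEnv)
    π.calls += 1                      # mutating the policy's own state is fine
    best_action, best_value = 0, typemin(Int)
    for a in legal_actions(env)
        lookahead = deepcopy(env)     # simulate on a copy, never on env itself
        apply!(lookahead, a)
        if lookahead.position > best_value
            best_action, best_value = a, lookahead.position
        end
    end
    return best_action
end

env = CounterEnv(0)
policy = GreedyPolicy(0)
action = policy(env)
```

After the call, `policy.calls` has changed but `env.position` is still `0`, because every simulated step ran on a copy.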

ReinforcementLearningBase.RewardStyle - Method

Specify whether we can get a reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default one) or TERMINAL_REWARD.

Note

Environments of TERMINAL_REWARD style can be viewed as a subset of environments of STEP_REWARD style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of TERMINAL_REWARD style.
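
One way an algorithm could take advantage of the reward style is trait-based dispatch, sketched below. The names `ToyRewardStyle`, `StepReward`, `TerminalReward`, `reward_style`, `episode_return`, `GridWorld`, and `BoardGame` are assumptions made for this example and only mirror the idea; they are not the package's definitions.

```julia
# Stand-in trait types mirroring the two styles named in the text.
abstract type ToyRewardStyle end
struct StepReward <: ToyRewardStyle end
struct TerminalReward <: ToyRewardStyle end

struct GridWorld end   # hypothetical env that rewards every step
struct BoardGame end   # hypothetical env that rewards only at game end

# Step rewards are the default; the board game overrides it.
reward_style(::Any) = StepReward()
reward_style(::BoardGame) = TerminalReward()

# Dispatch on the trait to pick the implementation.
episode_return(env, rewards) = episode_return(reward_style(env), rewards)

# General case: every step may carry a reward, so sum them all.
episode_return(::StepReward, rewards) = sum(rewards)

# Special case: only the final step carries a reward, so read just that one.
episode_return(::TerminalReward, rewards) = last(rewards)

grid_return = episode_return(GridWorld(), [1, 0, 2])
board_return = episode_return(BoardGame(), [0, 0, 1])
```

The general method would also give the right answer for `BoardGame`, which reflects the subset relationship; the specialised method simply skips work that is known to be unnecessary.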

ReinforcementLearningBase.state - Method
state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

The state can be of any type. However, most neural-network-based algorithms assume an AbstractArray is returned. For environments that provide several kinds of state (inner state, information state, etc.), users need to pass style to declare which kind of state they want.
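
The defaulting of `style` and `player` in the signature above can be pictured with the following self-contained sketch. It defines its own `state` function and does not use the package: `CardGame`, `ToyStateStyle`, `InternalState`, `Observation`, and `default_state_style` are hypothetical stand-ins.

```julia
# Stand-in style types; the real state styles live in ReinforcementLearningBase.
abstract type ToyStateStyle end
struct InternalState <: ToyStateStyle end
struct Observation <: ToyStateStyle end

# Hypothetical two-player card game.
struct CardGame
    hands::Vector{Vector{Int}}   # one hand per player
    current::Int                 # index of the player to move
end

default_state_style(::CardGame) = Observation()
current_player(env::CardGame) = env.current

# Fill in the defaults step by step: first the style, then the player.
state(env::CardGame) = state(env, default_state_style(env))
state(env::CardGame, style::ToyStateStyle) = state(env, style, current_player(env))

# The inner state can be any type; here it is the raw nested hands.
state(env::CardGame, ::InternalState, player) = env.hands

# The observation is an AbstractArray, as most neural-network code expects.
state(env::CardGame, ::Observation, player) = Float32.(env.hands[player])

game = CardGame([[1, 2], [3, 4]], 2)
default_view = state(game)                       # current player's observation
inner_view = state(game, InternalState())        # full inner state
player1_view = state(game, Observation(), 1)     # explicit style and player
```

Calling `state(game)` resolves to the default style and the current player, while passing a style or a player explicitly selects a different view of the same environment.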

