ReinforcementLearningBase.jl

ReinforcementLearningBase.SAMPLED_STOCHASTIC - Constant

The environment contains a chance player, and the probabilities of its outcomes are unknown. Usually only a dummy action is allowed in this case.

Note

The chance player (chance_player(env)) must appear in the result of players(env). The result of action_space(env, chance_player) should contain only one dummy action.

ReinforcementLearningBase.STOCHASTIC - Constant

The environment has no chance player, but the game is stochastic. To improve reproducibility, such environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
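
As a minimal sketch of the keyword-argument convention described above: the environment type CoinFlipEnv and the function flip! below are hypothetical and are not part of ReinforcementLearningBase. The example relies only on the Random standard library.

```julia
using Random

# Hypothetical toy environment, used only to illustrate the convention.
mutable struct CoinFlipEnv{R<:AbstractRNG}
    rng::R
    last_outcome::Symbol
end

# The RNG is accepted as a keyword argument so the caller controls the randomness.
CoinFlipEnv(; rng::AbstractRNG = Random.default_rng()) = CoinFlipEnv(rng, :none)

# One stochastic transition: the outcome is drawn from the environment's own RNG.
function flip!(env::CoinFlipEnv)
    env.last_outcome = rand(env.rng, (:heads, :tails))
    return env.last_outcome
end

# Two environments built from identically seeded RNGs yield the same outcomes.
env_a = CoinFlipEnv(rng = MersenneTwister(123))
env_b = CoinFlipEnv(rng = MersenneTwister(123))
outcomes_a = [flip!(env_a) for _ in 1:10]
outcomes_b = [flip!(env_b) for _ in 1:10]
```

Keeping the RNG inside the environment, rather than drawing from the global one, is what makes a run repeatable from the constructor call alone.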

ReinforcementLearningBase.AbstractPolicy - Type
(π::AbstractPolicy)(env) -> action

A policy is the most basic concept in reinforcement learning. Unlike the definition in some other packages, a policy here is defined as a function object that takes an environment and returns an action.

Note

See discussions here if you are wondering why we define the input as AbstractEnv instead of state.

Warning

The policy π may change its own internal state, but it shouldn't change env. When changing env is really necessary, remember to work on a copy so that the original env stays untouched.
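
The following sketch illustrates a policy as a callable struct that updates its own internal state while leaving the environment untouched. Everything in it (CounterEnv, legal_actions, step!, GreedyPolicy) is a hypothetical stand-in written for this example, not the package's API, and deepcopy is used here only because the toy type defines no copy method.

```julia
# Hypothetical stand-in for an environment: a position on the integer line.
mutable struct CounterEnv
    position::Int
end

# Hypothetical helpers for the toy environment.
legal_actions(env::CounterEnv) = (-1, 1)

function step!(env::CounterEnv, action::Int)
    env.position += action
    return env
end

# The policy carries internal state (a call counter) that it is free to mutate.
mutable struct GreedyPolicy
    target::Int
    n_calls::Int
end

# Calling the policy on an environment returns an action.
function (π::GreedyPolicy)(env::CounterEnv)
    π.n_calls += 1
    best_action, best_dist = first(legal_actions(env)), typemax(Int)
    for a in legal_actions(env)
        scratch = deepcopy(env)   # simulate on a copy, never on env itself
        step!(scratch, a)
        dist = abs(scratch.position - π.target)
        if dist < best_dist
            best_action, best_dist = a, dist
        end
    end
    return best_action
end

env = CounterEnv(0)
policy = GreedyPolicy(3, 0)
action = policy(env)
```

After the call, the policy's counter has advanced, but env.position is still 0 because every simulated step ran on a copy.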

ReinforcementLearningBase.Observation - Type

Sometimes people from different fields refer to the same thing by different names. This package uses Observation{Any}() as the default state style.

See discussions here

Base.:== - Method
Base.:(==)(env1::T, env2::T) where T<:AbstractEnv

Warning

Only the state of each player in the env is compared.

Base.copy - Method

Make an independent copy of env.

Note

The rng (if env has one) is also copied!
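
To illustrate what copying the rng implies, here is a small sketch with a hypothetical DiceEnv type and roll! function (neither is part of the package). It assumes the copy method duplicates the RNG field with copy, which is one plausible way to obtain the behavior described in the note.

```julia
using Random

# Hypothetical toy environment that accumulates dice rolls.
mutable struct DiceEnv{R<:AbstractRNG}
    rng::R
    total::Int
end

# Independent copy: the RNG is copied as well, including its current position.
Base.copy(env::DiceEnv) = DiceEnv(copy(env.rng), env.total)

# Hypothetical step function: roll one die and add it to the running total.
function roll!(env::DiceEnv)
    env.total += rand(env.rng, 1:6)
    return env.total
end

original = DiceEnv(MersenneTwister(42), 0)
roll!(original)                      # advance the original before copying
duplicate = copy(original)

# Both environments now continue along the same random stream.
totals_original = [roll!(original) for _ in 1:5]
totals_duplicate = [roll!(duplicate) for _ in 1:5]
```

In this sketch the copy replays exactly the same random outcomes as the original, which is useful for lookahead or debugging. If independent randomness is wanted instead, the copy's rng would have to be reseeded explicitly.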

ReinforcementLearningBase.state - Method
state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

The state can be of any type. However, most neural-network-based algorithms assume that an AbstractArray is returned. For environments that provide several kinds of state (inner state, information state, etc.), users need to pass style to declare which kind of state they want.

Warning

The returned state may be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
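
The aliasing hazard can be sketched as follows. GridEnv, current_state, and advance! are hypothetical names invented for this example. They mimic an environment that hands out its internal buffer rather than a fresh array.

```julia
# Hypothetical toy environment whose state lives in a reusable buffer.
mutable struct GridEnv
    buffer::Vector{Float64}
end

# Returns the internal buffer itself, without copying.
current_state(env::GridEnv) = env.buffer

# Hypothetical step function that mutates the buffer in place.
function advance!(env::GridEnv)
    env.buffer .+= 1.0
    return env
end

env = GridEnv(zeros(3))
aliased = current_state(env)          # shares memory with the environment
snapshot = copy(current_state(env))   # independent copy of the current values

advance!(env)
# `aliased` now reflects the new step, while `snapshot` keeps the old values.
```

Storing aliased in a replay buffer would silently overwrite earlier transitions, so a copy is the safer choice whenever states are kept across steps.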
