ReinforcementLearningBase.jl
ReinforcementLearningBase.CHANCE_PLAYER — Constant
Basic player type for a random step in the game.
ReinforcementLearningBase.CONSTANT_SUM — Constant
The rewards of all players sum to a constant.
ReinforcementLearningBase.DETERMINISTIC — Constant
No ChancePlayer in the environment; the game is fully deterministic.
ReinforcementLearningBase.EXPLICIT_STOCHASTIC — Constant
Usually used to describe an extensive-form game. The environment contains a chance player and the corresponding probabilities are known, so prob(env, player=chance_player(env)) must be defined.
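A sketch of what that contract can look like. The KuhnPokerLikeEnv type, its dealt_cards field, the uniform card distribution, and the plain probability vector returned below are illustrative assumptions; prob, chance_player, ChanceStyle, CHANCE_PLAYER, and EXPLICIT_STOCHASTIC are the RLBase names documented on this page.

    using ReinforcementLearningBase

    # Hypothetical environment whose chance node deals one of three cards.
    struct KuhnPokerLikeEnv <: AbstractEnv
        dealt_cards::Vector{Int}
    end

    RLBase.ChanceStyle(::KuhnPokerLikeEnv) = EXPLICIT_STOCHASTIC

    # Action distribution of the chance player: uniform over the undealt cards.
    # Dispatching on typeof(CHANCE_PLAYER) assumes chance_player(env) returns
    # that constant; wrap the result in a Distributions.jl object if the
    # consumer expects one.
    function RLBase.prob(env::KuhnPokerLikeEnv, ::typeof(CHANCE_PLAYER))
        remaining = setdiff(1:3, env.dealt_cards)
        fill(1 / length(remaining), length(remaining))
    end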
ReinforcementLearningBase.FULL_ACTION_SET — Constant
Alias for FullActionSet().
ReinforcementLearningBase.GENERAL_SUM — Constant
The total reward of all players may differ at each step.
ReinforcementLearningBase.IDENTICAL_UTILITY — Constant
Every player gets the same reward.
ReinforcementLearningBase.IMPERFECT_INFORMATION — Constant
The inner state of some players' observations may be different.
ReinforcementLearningBase.MINIMAL_ACTION_SET — Constant
Alias for MinimalActionSet().
ReinforcementLearningBase.PERFECT_INFORMATION — Constant
All players observe the same state.
ReinforcementLearningBase.SAMPLED_STOCHASTIC — Constant
The environment contains a chance player whose probabilities are unknown; usually only a dummy action is allowed in this case. The chance player (chance_player(env)) must appear in the result of RLBase.players(env), and the result of action_space(env, chance_player) should contain only one dummy action.
ReinforcementLearningBase.SEQUENTIAL — Constant
An environment with the DynamicStyle of SEQUENTIAL must take actions from different players one by one.
ReinforcementLearningBase.SIMULTANEOUS — Constant
An environment with the DynamicStyle of SIMULTANEOUS must take in actions from some (or all) players at the same time.
ReinforcementLearningBase.SPECTATOR — Constant
A spectator is a special player who doesn't take any action.
ReinforcementLearningBase.STEP_REWARD — Constant
Alias for StepReward().
ReinforcementLearningBase.STOCHASTIC — Constant
No chance player in the environment, but the game is stochastic. To help with reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
ReinforcementLearningBase.TERMINAL_REWARD — Constant
The reward is only received at the end of the environment.
ReinforcementLearningBase.ZERO_SUM — Constant
The rewards of all players sum to 0. A special case of CONSTANT_SUM.
ReinforcementLearningBase.AbstractEnv — Type
act!(env::AbstractEnv, action, player=current_player(env))
Super type of all reinforcement learning environments.
ReinforcementLearningBase.AbstractEnvironmentModel — Type
TODO: Describe how to model a reinforcement learning environment. TODO: needs more investigation.
Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/
- Analytic gradient computation
- Sampling-based planning
- Model-based data generation
- Value-equivalence prediction
See also: Model-based Reinforcement Learning: A Survey; Tutorial on Model-Based Methods in Reinforcement Learning.
ReinforcementLearningBase.AbstractPolicy — Type
plan!(π::AbstractPolicy, env) -> action
The policy is the most basic concept in reinforcement learning. Here an agent's action is determined by plan!, which takes an environment and a policy and returns an action.
See the discussions here if you are wondering why we define the input as an AbstractEnv instead of a state.
The policy π may change its internal state, but it shouldn't change env. When that is really necessary, remember to make a copy of env to keep the original env untouched.
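A minimal sketch of a custom policy: the RandomLegalPolicy type and its rng field are illustrative, while AbstractPolicy, plan!, and legal_action_space come from RLBase. It assumes legal_action_space(env) returns a collection that rand can sample from.

    using ReinforcementLearningBase
    using Random

    # Illustrative policy: pick a legal action uniformly at random.
    struct RandomLegalPolicy{R<:AbstractRNG} <: AbstractPolicy
        rng::R
    end

    # `plan!` may mutate the policy (here only its rng) but must leave `env` untouched.
    RLBase.plan!(π::RandomLegalPolicy, env::AbstractEnv) = rand(π.rng, legal_action_space(env))

    # Usage: action = RLBase.plan!(RandomLegalPolicy(Random.default_rng()), env)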
ReinforcementLearningBase.ConstantSum — Type
AbstractUtilityStyle for environments where the sum of all players' rewards is constant.
ReinforcementLearningBase.Deterministic — Type
AbstractChanceStyle for fully deterministic games without a ChancePlayer.
ReinforcementLearningBase.Episodic — Type
The environment will terminate in a finite number of steps.
ReinforcementLearningBase.FullActionSet — Type
The action space of the environment may contain illegal actions. For environments of FULL_ACTION_SET, legal_action_space and legal_action_space_mask must also be defined.
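A sketch of those two methods for a hypothetical TicTacToeLikeEnv with nine positional actions. The type, its board field, and the single-argument method forms (player handling is omitted for brevity) are assumptions; ActionStyle, FULL_ACTION_SET, action_space, legal_action_space, and legal_action_space_mask are the RLBase names.

    using ReinforcementLearningBase

    # Hypothetical environment: 9 cells, 0 = empty, nonzero = already occupied.
    struct TicTacToeLikeEnv <: AbstractEnv
        board::Vector{Int}
    end

    RLBase.ActionStyle(::TicTacToeLikeEnv) = FULL_ACTION_SET
    RLBase.action_space(::TicTacToeLikeEnv) = Base.OneTo(9)
    RLBase.legal_action_space_mask(env::TicTacToeLikeEnv) = env.board .== 0
    RLBase.legal_action_space(env::TicTacToeLikeEnv) = findall(RLBase.legal_action_space_mask(env))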
ReinforcementLearningBase.GeneralSum — Type
AbstractUtilityStyle for environments where the sum of all players' rewards is not constant.
ReinforcementLearningBase.GoalState — Type
Use it to represent the goal state.
ReinforcementLearningBase.IdenticalUtility — Type
AbstractUtilityStyle for environments where all players get the same reward.
ReinforcementLearningBase.ImperfectInformation — Type
A Player's actions are not known by the other Players.
ReinforcementLearningBase.InformationSet — Type
See the definition of information set.
ReinforcementLearningBase.InternalState — Type
Use it to represent the internal state.
ReinforcementLearningBase.MinimalActionSet — Type
All actions in the action space of the environment are legal.
ReinforcementLearningBase.MultiAgent — Method
MultiAgent(n::Integer) -> MultiAgent{n}()
n must be ≥ 2.
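For example, the constructor simply builds the corresponding value type:

    MultiAgent(2)    # -> MultiAgent{2}()
    # n must be ≥ 2; use SingleAgent() when there is only one agent.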
ReinforcementLearningBase.NeverEnding — Type
The environment can run forever.
ReinforcementLearningBase.Observation — Type
Sometimes people from different fields talk about the same thing under different names. Here we set Observation{Any}() as the default state style in this package.
See the discussions here.
ReinforcementLearningBase.PerfectInformation — Type
All Players' actions are visible to the other Players.
ReinforcementLearningBase.Sequential — Type
Players act one after the other.
ReinforcementLearningBase.Simultaneous — Type
Players act at the same time.
ReinforcementLearningBase.SingleAgent — Type
AbstractNumAgentStyle for environments with a single agent.
ReinforcementLearningBase.StepReward — Type
A reward is received after each step.
ReinforcementLearningBase.Stochastic — Type
Stochastic()
The default ChanceStyle.
ReinforcementLearningBase.TerminalReward — Type
The reward is only received at the end of the environment.
ReinforcementLearningBase.ZeroSum — Type
AbstractUtilityStyle for environments where the sum of all players' rewards is equal to zero.
Base.:== — Method
Base.:(==)(env1::T, env2::T) where T<:AbstractEnv
Only checks the state of all players in the env.
Base.copy — Method
Make an independent copy of env. The rng (if env has one) is also copied!
Random.seed! — Method
Set the seed of the internal rng.
ReinforcementLearningBase.ActionStyle — Method
ActionStyle(env::AbstractEnv)
For environments of discrete actions, specify whether the current state of env contains a full action set or a minimal action set. By default MINIMAL_ACTION_SET is returned.
ReinforcementLearningBase.ChanceStyle — Method
ChanceStyle(env) = STOCHASTIC
Specify which role the chance plays in the env. Possible returns are:
- STOCHASTIC (the default return)
- DETERMINISTIC
- EXPLICIT_STOCHASTIC
- SAMPLED_STOCHASTIC
ReinforcementLearningBase.DefaultStateStyle — Method
Specify the default state style when calling state(env).
ReinforcementLearningBase.DynamicStyle — Method
DynamicStyle(env::AbstractEnv) = SEQUENTIAL
Only valid in environments with a NumAgentStyle of MultiAgent. Determine whether the players can play simultaneously or not. Possible returns are:
- SEQUENTIAL (the default return)
- SIMULTANEOUS
ReinforcementLearningBase.InformationStyle — Method
InformationStyle(env) = IMPERFECT_INFORMATION
Distinguish environments between PERFECT_INFORMATION and IMPERFECT_INFORMATION. IMPERFECT_INFORMATION is returned by default.
ReinforcementLearningBase.NumAgentStyle — Method
NumAgentStyle(env)
Number of agents involved in the env. Possible returns are:
- SingleAgent (the default return)
- MultiAgent
ReinforcementLearningBase.RewardStyle — Method
Specify whether we can get a reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default) or TERMINAL_REWARD.
Environments of TERMINAL_REWARD style can be viewed as a subset of environments of STEP_REWARD style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of TERMINAL_REWARD style.
ReinforcementLearningBase.StateStyle — Method
StateStyle(env::AbstractEnv)
Define the possible styles of state(env). Possible values are:
- Observation{T} (the default return)
- InternalState{T}
- InformationSet{T}
You can also define your own customized state style when necessary, or return a tuple containing several of the above. The tuple form is useful for environments that provide more than one kind of state.
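As a sketch, an environment exposing both an observation and its internal state could declare the tuple form; the CardGameLikeEnv type and the chosen element types are illustrative assumptions.

    using ReinforcementLearningBase

    struct CardGameLikeEnv <: AbstractEnv end   # illustrative

    # Two supported styles; `state(env, style)` then selects one of them.
    RLBase.StateStyle(::CardGameLikeEnv) =
        (Observation{Vector{Float64}}(), InternalState{Vector{Int}}())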
ReinforcementLearningBase.UtilityStyle — Method
UtilityStyle(env::AbstractEnv)
Specify the utility style in multi-agent environments. Possible values are:
- GENERAL_SUM (the default return)
- ZERO_SUM
- CONSTANT_SUM
- IDENTICAL_UTILITY
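Putting the trait methods above together, declaring an environment's traits is a handful of one-line definitions; the MatchingPenniesLikeEnv type and the particular trait values chosen here are illustrative.

    using ReinforcementLearningBase

    struct MatchingPenniesLikeEnv <: AbstractEnv end   # illustrative

    RLBase.NumAgentStyle(::MatchingPenniesLikeEnv)    = MultiAgent(2)
    RLBase.DynamicStyle(::MatchingPenniesLikeEnv)     = SIMULTANEOUS
    RLBase.InformationStyle(::MatchingPenniesLikeEnv) = IMPERFECT_INFORMATION
    RLBase.ChanceStyle(::MatchingPenniesLikeEnv)      = DETERMINISTIC
    RLBase.RewardStyle(::MatchingPenniesLikeEnv)      = STEP_REWARD
    RLBase.UtilityStyle(::MatchingPenniesLikeEnv)     = ZERO_SUM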
ReinforcementLearningBase.action_space — Function
action_space(env, player=current_player(env))
Get all available actions from the environment. See also: legal_action_space.
ReinforcementLearningBase.chance_player — Method
chance_player(env)
Only valid for environments with a chance player.
ReinforcementLearningBase.child — Method
child(env::AbstractEnv, action)
Treat the env as a game tree. Create an independent child after applying action.
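A usage sketch: the expand_children helper is hypothetical; child and legal_action_space are the RLBase functions, and child(env, a) behaves roughly like copying the env and then applying act! to the copy, leaving the original untouched.

    using ReinforcementLearningBase

    # Hypothetical helper: enumerate the independent children of the current node.
    expand_children(env::AbstractEnv) =
        [RLBase.child(env, a) for a in legal_action_space(env)]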
ReinforcementLearningBase.current_player — Method
current_player(env)
Return the next player to take an action. For extensive-form games, a chance player may be returned (see also chance_player). For SIMULTANEOUS environments, a simultaneous player is always returned (see also simultaneous_player).
ReinforcementLearningBase.is_terminated — Method
is_terminated(env, player=current_player(env))
ReinforcementLearningBase.legal_action_space — Function
legal_action_space(env, player=current_player(env))
For environments of MINIMAL_ACTION_SET, the result is the same as action_space.
ReinforcementLearningBase.legal_action_space_mask — Function
legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}
Required for environments of FULL_ACTION_SET. As a default implementation, legal_action_space_mask creates a mask over action_space marking the subset legal_action_space.
ReinforcementLearningBase.next_player! — Method
next_player!(env::E) where {E<:AbstractEnv}
Advance to the next player. This is a no-op for single-player and simultaneous games. Sequential MultiAgent games should implement this method.
ReinforcementLearningBase.optimise! — Method
RLBase.optimise!(π::AbstractPolicy, experience)
Optimise the policy π with online/offline experience or parameters.
ReinforcementLearningBase.players — Method
players(env::RLBaseEnv)
The players in the game. This is a no-op for single-player games. MultiAgent games should implement this method.
ReinforcementLearningBase.priority — Method
priority(π::AbstractPolicy, experience)
Usually used in offline policies to evaluate the priorities of the experience.
ReinforcementLearningBase.prob — Function
Get the action distribution of the chance player.
Only valid for environments of the EXPLICIT_STOCHASTIC style. The current player of env must be the chance player.
ReinforcementLearningBase.prob — Method
prob(π::AbstractPolicy, env, action)
Only valid for environments with discrete actions.
ReinforcementLearningBase.prob — Method
prob(π::AbstractPolicy, env) -> Distribution
Get the probability distribution of actions based on policy π given an env.
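A sketch for the illustrative RandomLegalPolicy from the AbstractPolicy entry above, assuming the environment is of FULL_ACTION_SET so that legal_action_space_mask is available; using a Distributions.jl Categorical as the returned distribution is also an assumption.

    using ReinforcementLearningBase
    using Distributions: Categorical, probs

    # Uniform distribution over the legal actions (illustrative).
    function RLBase.prob(π::RandomLegalPolicy, env::AbstractEnv)
        mask = legal_action_space_mask(env)
        Categorical(mask ./ sum(mask))
    end

    # Probability of a single discrete action under that distribution.
    RLBase.prob(π::RandomLegalPolicy, env::AbstractEnv, action) =
        probs(RLBase.prob(π, env))[action]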
ReinforcementLearningBase.reset! — Method
Reset the internal state of an environment.
ReinforcementLearningBase.reward — Function
reward(env, player=current_player(env))
ReinforcementLearningBase.simultaneous_player — Method
simultaneous_player(env)
Only valid for environments of SIMULTANEOUS style.
ReinforcementLearningBase.spectator_player — Method
spectator_player(env)
Used in imperfect multi-agent environments.
ReinforcementLearningBase.state — Method
state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
The state can be of any type. However, most neural network based algorithms assume an AbstractArray is returned. For environments that provide many different kinds of state (inner state, information state, etc.), users need to provide style to declare which kind of state they want.
The state may be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
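Because of that, a common pattern when collecting trajectories is to copy the state immediately; env and the states vector below are assumed to exist already.

    s = state(env)            # may alias env's internal buffers
    push!(states, copy(s))    # store an independent snapshot

    # Selecting a specific style when the environment provides several:
    obs = state(env, Observation{Any}())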
ReinforcementLearningBase.state_space — Method
state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
Describe all possible states.
ReinforcementLearningBase.test_interfaces! — Method
Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.
ReinforcementLearningBase.walk — Method
walk(f, env::AbstractEnv)
Call f with env and its descendants. Only use it with small games.
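For example, counting the nodes of a small game tree; env is assumed to be such an environment, and the counter is the only thing added here, walk itself performs the traversal.

    n_nodes = Ref(0)
    RLBase.walk(env) do _
        n_nodes[] += 1
    end
    n_nodes[]    # total number of visited nodes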