# ReinforcementLearningBase.jl

`ReinforcementLearningBase.RLBase` — Module

ReinforcementLearningBase.jl (**RLBase**) provides common constants, traits, abstractions, and interfaces for developing reinforcement learning algorithms in Julia.

Basically, two main concepts are defined here: `AbstractPolicy` and `AbstractEnv`. The constants, traits, and interface functions documented below are built around them.

`ReinforcementLearningBase.CONSTANT_SUM` — Constant

Rewards of all players sum to a constant.

`ReinforcementLearningBase.DETERMINISTIC` — Constant

There is no chance player in the environment, and the game is fully deterministic.

`ReinforcementLearningBase.EXPLICIT_STOCHASTIC` — Constant

Usually used to describe an extensive-form game. The environment contains a chance player and the corresponding probability is known. Therefore, `prob(env, player=chance_player(env))` must be defined.
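For example, a hypothetical extensive-form card game might declare this style and expose the chance distribution as follows (the environment name, the `ChancePlayer` dispatch, and the uniform distribution are all illustrative assumptions, not part of any shipped environment):

```julia
using ReinforcementLearningBase

struct MyCardGame <: AbstractEnv end  # hypothetical environment

# The chance player's action distribution is known in closed form.
RLBase.ChanceStyle(::MyCardGame) = EXPLICIT_STOCHASTIC

# Probability of each chance outcome; here, a uniform draw over 52 cards.
RLBase.prob(::MyCardGame, ::ChancePlayer) = fill(1 / 52, 52)
```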

`ReinforcementLearningBase.FULL_ACTION_SET` — Constant

The action space of the environment may contain illegal actions.

`ReinforcementLearningBase.GENERAL_SUM` — Constant

The total reward of all players may vary from step to step.

`ReinforcementLearningBase.IDENTICAL_UTILITY` — Constant

Every player gets the same reward.

`ReinforcementLearningBase.IMPERFECT_INFORMATION` — Constant

Players may observe different things; some players' observations may not reveal the full inner state of the environment.

`ReinforcementLearningBase.MINIMAL_ACTION_SET` — Constant

All actions in the action space of the environment are legal.

`ReinforcementLearningBase.PERFECT_INFORMATION` — Constant

All players observe the same state.

`ReinforcementLearningBase.SAMPLED_STOCHASTIC` — Constant

The environment contains a chance player and the probability is unknown. Usually only a dummy action is allowed in this case. The chance player (`chance_player(env)`) must appear in the result of `players(env)`, and the result of `action_space(env, chance_player)` should contain only one dummy action.

`ReinforcementLearningBase.SEQUENTIAL` — Constant

An environment with a `DynamicStyle` of `SEQUENTIAL` must take actions from different players one by one.

`ReinforcementLearningBase.SIMULTANEOUS` — Constant

An environment with a `DynamicStyle` of `SIMULTANEOUS` must take in actions from some (or all) players at the same time.

`ReinforcementLearningBase.STEP_REWARD` — Constant

A reward can be obtained after each step.

`ReinforcementLearningBase.STOCHASTIC` — Constant

There is no chance player in the environment, but the game is stochastic. To help with reproducibility, such environments should generally accept an `AbstractRNG` as a keyword argument. For some third-party environments, at least a `seed` is exposed in the constructor.

`ReinforcementLearningBase.TERMINAL_REWARD` — Constant

A reward is obtained only at the end of an episode.

`ReinforcementLearningBase.ZERO_SUM` — Constant

Rewards of all players sum to 0. A special case of [`CONSTANT_SUM`](@ref).

`ReinforcementLearningBase.AbstractEnv` — Type

`(env::AbstractEnv)(action, player=current_player(env))`

Super type of all reinforcement learning environments.
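As a rough sketch of how these pieces fit together (assuming a concrete `env` and a policy `π` are already constructed), the functor form above lets you drive one episode like this:

```julia
# Minimal episode loop; `env::AbstractEnv` and `π::AbstractPolicy` are assumed to exist.
reset!(env)
while !is_terminated(env)
    action = π(env)  # the policy picks an action for the current player
    env(action)      # apply it; the player defaults to current_player(env)
end
@show reward(env)    # e.g. the final reward of a TERMINAL_REWARD-style environment
```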

`ReinforcementLearningBase.AbstractEnvironmentModel` — Type

TODO: Describe how to model a reinforcement learning environment. Needs more investigation. Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/

- Analytic gradient computation
- Sampling-based planning
- Model-based data generation
- Value-equivalence prediction

See also: *Model-based Reinforcement Learning: A Survey* and the *Tutorial on Model-Based Methods in Reinforcement Learning*.

`ReinforcementLearningBase.AbstractPolicy` — Type

`(π::AbstractPolicy)(env) -> action`

A policy is the most basic concept in reinforcement learning. Unlike the definition in some other packages, here a policy is defined as a functional object which takes in an environment and returns an action.

See discussions here if you are wondering why we define the input as `AbstractEnv` instead of a state.

The policy `π` may change its internal state, but it shouldn't change `env`. When that is really necessary, remember to make a copy of `env` to keep the original `env` untouched.
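A minimal sketch of a conforming policy (the name `MyRandomPolicy` is made up for illustration): it keeps no internal state and samples uniformly from the legal actions.

```julia
using ReinforcementLearningBase

# Hypothetical policy: pick a random legal action.
struct MyRandomPolicy <: AbstractPolicy end

(π::MyRandomPolicy)(env::AbstractEnv) = rand(legal_action_space(env))
```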

`ReinforcementLearningBase.GoalState` — Type

Use it to represent the goal state.

`ReinforcementLearningBase.InformationSet` — Type

See the definition of information set.

`ReinforcementLearningBase.InternalState` — Type

Use it to represent the internal state.

`ReinforcementLearningBase.MultiAgent` — Method

`MultiAgent(n::Integer) -> MultiAgent{n}()`

`n` must be ≥ 2.
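For a custom two-player environment (a hypothetical `MyTwoPlayerEnv`), the trait is typically declared like this:

```julia
# Declare that the hypothetical MyTwoPlayerEnv involves exactly two agents.
RLBase.NumAgentStyle(::MyTwoPlayerEnv) = MultiAgent(2)
```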

`ReinforcementLearningBase.Observation` — Type

Sometimes people from different fields talk about the same thing under different names. Here we set `Observation{Any}()` as the default state style in this package.

See discussions here.

`ReinforcementLearningBase.Space` — Type

A wrapper to treat each element as a sub-space which supports:

- `Base.in`
- `Random.rand`
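A short sketch of how the wrapper might be used, assuming the constructor simply wraps a collection of sub-spaces (as the two supported operations suggest):

```julia
using ReinforcementLearningBase

s = Space([1:3, 1:2])  # two discrete sub-spaces
x = rand(s)            # one sample per sub-space, e.g. [2, 1]
@assert x in s         # element-wise membership check
```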

`ReinforcementLearningBase.WorldSpace` — Type

In some cases, we may not be interested in the action/state space. One can return `WorldSpace()` to keep the interface consistent.

`Base.:==` — Method

`Base.:(==)(env1::T, env2::T) where T<:AbstractEnv`

Only checks the state of all players in the env.

`Base.copy` — Method

Make an independent copy of `env`. The internal rng (if `env` has one) is also copied!

`Random.seed!` — Method

Set the seed of the internal rng.

`ReinforcementLearningBase.ActionStyle` — Method

`ActionStyle(env::AbstractEnv)`

For environments with discrete actions, specify whether the current state of `env` contains a full action set or a minimal action set. By default, `MINIMAL_ACTION_SET` is returned.
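For instance, a hypothetical environment whose legal actions vary with its state would declare the trait and the corresponding mask (all names below are illustrative):

```julia
# Hypothetical environment with state-dependent legal actions.
RLBase.ActionStyle(::MyEnv) = FULL_ACTION_SET

RLBase.action_space(::MyEnv) = Base.OneTo(4)

# Required for FULL_ACTION_SET: which of the 4 actions are currently legal.
RLBase.legal_action_space_mask(env::MyEnv) = [true, false, true, true]
```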

`ReinforcementLearningBase.ChanceStyle` — Method

`ChanceStyle(env) = STOCHASTIC`

Specify which role chance plays in the `env`. Possible returns are:

- `STOCHASTIC`. This is the default return.
- `DETERMINISTIC`
- `EXPLICIT_STOCHASTIC`
- `SAMPLED_STOCHASTIC`

`ReinforcementLearningBase.DefaultStateStyle` — Method

Specify the default state style when calling `state(env)`.

`ReinforcementLearningBase.DynamicStyle` — Method

`DynamicStyle(env::AbstractEnv) = SEQUENTIAL`

Only valid in environments with a `NumAgentStyle` of `MultiAgent`. Determine whether the players can play simultaneously or not. Possible returns are:

- `SEQUENTIAL`. This is the default return.
- `SIMULTANEOUS`

`ReinforcementLearningBase.InformationStyle` — Method

`InformationStyle(env) = IMPERFECT_INFORMATION`

Distinguish environments between `PERFECT_INFORMATION` and `IMPERFECT_INFORMATION`. `IMPERFECT_INFORMATION` is returned by default.

`ReinforcementLearningBase.NumAgentStyle` — Method

`NumAgentStyle(env)`

Number of agents involved in the `env`. Possible returns are:

- `SINGLE_AGENT`. This is the default return.
- [`MultiAgent`](@ref)

`ReinforcementLearningBase.RewardStyle` — Method

Specify whether we can get a reward after each step or only at the end of a game. Possible values are `STEP_REWARD` (the default) or `TERMINAL_REWARD`.

Environments of `TERMINAL_REWARD` style can be viewed as a subset of environments of `STEP_REWARD` style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of `TERMINAL_REWARD` style.

`ReinforcementLearningBase.StateStyle` — Method

`StateStyle(env::AbstractEnv)`

Define the possible styles of `state(env)`. Possible values are:

- `Observation{T}`. This is the default return.
- `InternalState{T}`
- `Information{T}`
- Your own customized state style, when necessary.

Or a tuple containing several of the above. This is useful for environments which provide more than one kind of state.

`ReinforcementLearningBase.UtilityStyle` — Method

`UtilityStyle(env::AbstractEnv)`

Specify the utility style in multi-agent environments. Possible values are:

- `GENERAL_SUM`. The default return.
- `ZERO_SUM`
- `CONSTANT_SUM`
- `IDENTICAL_UTILITY`
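Putting the traits together: a hypothetical two-player, zero-sum, deterministic board game (all names below are illustrative) would typically declare

```julia
struct MyBoardGame <: AbstractEnv end  # hypothetical environment

RLBase.NumAgentStyle(::MyBoardGame)    = MultiAgent(2)
RLBase.DynamicStyle(::MyBoardGame)     = SEQUENTIAL
RLBase.InformationStyle(::MyBoardGame) = PERFECT_INFORMATION
RLBase.ChanceStyle(::MyBoardGame)      = DETERMINISTIC
RLBase.RewardStyle(::MyBoardGame)      = TERMINAL_REWARD
RLBase.UtilityStyle(::MyBoardGame)     = ZERO_SUM
```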

`ReinforcementLearningBase.action_space` — Function

`action_space(env, player=current_player(env))`

Get all available actions from the environment. See also: `legal_action_space`.

`ReinforcementLearningBase.chance_player` — Method

`chance_player(env)`

Only valid for environments with a chance player.

`ReinforcementLearningBase.child` — Method

`child(env::AbstractEnv, action)`

Treat the `env` as a game tree. Create an independent child after applying `action`.

`ReinforcementLearningBase.current_player` — Method

`current_player(env)`

Return the next player to take an action. For extensive-form games, a *chance player* may be returned (see also `chance_player`). For `SIMULTANEOUS` environments, a *simultaneous player* is always returned (see also `simultaneous_player`).

`ReinforcementLearningBase.is_terminated` — Method

`is_terminated(env, player=current_player(env))`

`ReinforcementLearningBase.legal_action_space` — Function

`legal_action_space(env, player=current_player(env))`

For environments of `MINIMAL_ACTION_SET`, the result is the same as `action_space`.

`ReinforcementLearningBase.legal_action_space_mask` — Function

`legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}`

Required for environments of `FULL_ACTION_SET`.

`ReinforcementLearningBase.priority` — Method

`priority(π::AbstractPolicy, experience)`

Usually used in offline policies.

`ReinforcementLearningBase.prob` — Function

Get the action distribution of the chance player.

Only valid for environments of `EXPLICIT_STOCHASTIC` style. The current player of `env` must be the chance player.

`ReinforcementLearningBase.prob` — Method

`prob(π::AbstractPolicy, env, action)`

Only valid for environments with discrete actions.

`ReinforcementLearningBase.prob` — Method

`prob(π::AbstractPolicy, env) -> Distribution`

Get the probability distribution of actions based on policy `π` given an `env`.

`ReinforcementLearningBase.reset!` — Method

Reset the internal state of an environment.

`ReinforcementLearningBase.reward` — Function

`reward(env, player=current_player(env))`

`ReinforcementLearningBase.simultaneous_player` — Method

`simultaneous_player(env)`

Only valid for environments of `SIMULTANEOUS` style.

`ReinforcementLearningBase.spectator_player` — Method

`spectator_player(env)`

Used in imperfect-information multi-agent environments.

`ReinforcementLearningBase.state` — Method

`state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])`

The state can be of any type. However, most neural-network-based algorithms assume an `AbstractArray` is returned. For environments which provide many different kinds of state (inner state, information state, etc.), users need to provide `style` to declare which kind of state they want.

The state **may** be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
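Since the returned value may alias the environment's internal buffer, a defensive copy is the safe pattern whenever you store a state for later (a one-line sketch):

```julia
s = copy(state(env))  # snapshot; state(env) itself may be mutated by the next step
```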

`ReinforcementLearningBase.state_space` — Method

`state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])`

Describe all possible states.

`ReinforcementLearningBase.test_interfaces!` — Method

Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.

`ReinforcementLearningBase.update!` — Method

`update!(π::AbstractPolicy, experience)`

Update the policy `π` with online/offline experience or parameters.

`ReinforcementLearningBase.walk` — Method

`walk(f, env::AbstractEnv)`

Call `f` with `env` and its descendants. Only use it with small games.