# ReinforcementLearningBase.jl

`ReinforcementLearningBase.RLBase` — Module

ReinforcementLearningBase.jl (**RLBase**) provides common constants, traits, abstractions, and interfaces for developing reinforcement learning algorithms in Julia.

Basically, two main concepts are defined here: `AbstractPolicy` and `AbstractEnv`. The constants, traits, and interface functions documented below are built around them.

`ReinforcementLearningBase.CONSTANT_SUM` — Constant

Rewards of all players sum to a constant.

`ReinforcementLearningBase.DETERMINISTIC` — Constant

There is no chance player in the environment, and the game is fully deterministic.

`ReinforcementLearningBase.EXPLICIT_STOCHASTIC` — Constant

Usually used to describe an extensive-form game. The environment contains a chance player and the corresponding probability is known. Therefore, `prob(env, player=chance_player(env))` must be defined.
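For example, a hypothetical extensive-form card game might declare this style and expose the chance distribution as follows (the environment name, the `ChancePlayer` dispatch, and the uniform distribution are all illustrative assumptions, not part of any shipped environment):

```julia
using ReinforcementLearningBase

struct MyCardGame <: AbstractEnv end  # hypothetical environment

# The chance player's action distribution is known in closed form.
RLBase.ChanceStyle(::MyCardGame) = EXPLICIT_STOCHASTIC

# Probability of each chance outcome; here, a uniform draw over 52 cards.
RLBase.prob(::MyCardGame, ::ChancePlayer) = fill(1 / 52, 52)
```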

`ReinforcementLearningBase.FULL_ACTION_SET` — Constant

The action space of the environment may contain illegal actions.

`ReinforcementLearningBase.GENERAL_SUM` — Constant

The total reward of all players may vary from step to step.

`ReinforcementLearningBase.IDENTICAL_UTILITY` — Constant

Every player gets the same reward.

`ReinforcementLearningBase.IMPERFECT_INFORMATION` — Constant

Players may observe different things; some players' observations may not reveal the full inner state of the environment.

`ReinforcementLearningBase.MINIMAL_ACTION_SET` — Constant

All actions in the action space of the environment are legal.

`ReinforcementLearningBase.PERFECT_INFORMATION` — Constant

All players observe the same state.

`ReinforcementLearningBase.SAMPLED_STOCHASTIC` — Constant

The environment contains a chance player and the probability is unknown. Usually only a dummy action is allowed in this case. The chance player (`chance_player(env)`) must appear in the result of `players(env)`, and the result of `action_space(env, chance_player)` should contain only one dummy action.

`ReinforcementLearningBase.SEQUENTIAL` — Constant

An environment with a `DynamicStyle` of `SEQUENTIAL` must take actions from different players one by one.

`ReinforcementLearningBase.SIMULTANEOUS` — Constant

An environment with a `DynamicStyle` of `SIMULTANEOUS` must take in actions from some (or all) players at the same time.

`ReinforcementLearningBase.STEP_REWARD` — Constant

A reward can be obtained after each step.

`ReinforcementLearningBase.STOCHASTIC` — Constant

There is no chance player in the environment, but the game is stochastic. To help with reproducibility, such environments should generally accept an `AbstractRNG` as a keyword argument. For some third-party environments, at least a `seed` is exposed in the constructor.

`ReinforcementLearningBase.TERMINAL_REWARD` — Constant

A reward is obtained only at the end of an episode.

`ReinforcementLearningBase.ZERO_SUM` — Constant

Rewards of all players sum to 0. A special case of [`CONSTANT_SUM`](@ref).

`ReinforcementLearningBase.AbstractEnv` — Type

`(env::AbstractEnv)(action, player=current_player(env))`

Super type of all reinforcement learning environments.
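As a rough sketch of how these pieces fit together (assuming a concrete `env` and a policy `π` are already constructed), the functor form above lets you drive one episode like this:

```julia
# Minimal episode loop; `env::AbstractEnv` and `π::AbstractPolicy` are assumed to exist.
reset!(env)
while !is_terminated(env)
    action = π(env)  # the policy picks an action for the current player
    env(action)      # apply it; the player defaults to current_player(env)
end
@show reward(env)    # e.g. the final reward of a TERMINAL_REWARD-style environment
```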

`ReinforcementLearningBase.AbstractEnvironmentModel` — Type

TODO: Describe how to model a reinforcement learning environment. Needs more investigation. Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/

- Analytic gradient computation
- Sampling-based planning
- Model-based data generation
- Value-equivalence prediction

See also: *Model-based Reinforcement Learning: A Survey* and the *Tutorial on Model-Based Methods in Reinforcement Learning*.

`ReinforcementLearningBase.AbstractPolicy` — Type

`(π::AbstractPolicy)(env) -> action`

A policy is the most basic concept in reinforcement learning. Unlike the definition in some other packages, here a policy is defined as a functional object which takes in an environment and returns an action.

See discussions here if you are wondering why we define the input as `AbstractEnv` instead of a state.

The policy `π` may change its internal state, but it shouldn't change `env`. When that is really necessary, remember to make a copy of `env` to keep the original `env` untouched.
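A minimal sketch of a conforming policy (the name `MyRandomPolicy` is made up for illustration): it keeps no internal state and samples uniformly from the legal actions.

```julia
using ReinforcementLearningBase

# Hypothetical policy: pick a random legal action.
struct MyRandomPolicy <: AbstractPolicy end

(π::MyRandomPolicy)(env::AbstractEnv) = rand(legal_action_space(env))
```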

`ReinforcementLearningBase.GoalState` — Type

Use it to represent the goal state.

`ReinforcementLearningBase.InformationSet` — Type

See the definition of information set.

`ReinforcementLearningBase.InternalState` — Type

Use it to represent the internal state.

`ReinforcementLearningBase.MultiAgent` — Method

`MultiAgent(n::Integer) -> MultiAgent{n}()`

`n` must be ≥ 2.
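For a custom two-player environment (a hypothetical `MyTwoPlayerEnv`), the trait is typically declared like this:

```julia
# Declare that the hypothetical MyTwoPlayerEnv involves exactly two agents.
RLBase.NumAgentStyle(::MyTwoPlayerEnv) = MultiAgent(2)
```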

`ReinforcementLearningBase.Observation` — Type

Sometimes people from different fields talk about the same thing under different names. Here we set `Observation{Any}()` as the default state style in this package.

See discussions here.

`ReinforcementLearningBase.Space` — Type

A wrapper to treat each element as a sub-space which supports:

- `Base.in`
- `Random.rand`
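A short sketch of how the wrapper might be used, assuming the constructor simply wraps a collection of sub-spaces (as the two supported operations suggest):

```julia
using ReinforcementLearningBase

s = Space([1:3, 1:2])  # two discrete sub-spaces
x = rand(s)            # one sample per sub-space, e.g. [2, 1]
@assert x in s         # element-wise membership check
```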

`ReinforcementLearningBase.WorldSpace` — Type

In some cases, we may not be interested in the action/state space. One can return `WorldSpace()` to keep the interface consistent.

`Base.:==` — Method

`Base.:(==)(env1::T, env2::T) where T<:AbstractEnv`

Only checks the state of all players in the env.

`Base.copy` — Method

Make an independent copy of `env`. The internal rng (if `env` has one) is also copied!

`Random.seed!` — Method

Set the seed of the internal rng.

`ReinforcementLearningBase.ActionStyle` — Method

`ActionStyle(env::AbstractEnv)`

For environments with discrete actions, specify whether the current state of `env` contains a full action set or a minimal action set. By default, `MINIMAL_ACTION_SET` is returned.
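For instance, a hypothetical environment whose legal actions vary with its state would declare the trait and the corresponding mask (all names below are illustrative):

```julia
# Hypothetical environment with state-dependent legal actions.
RLBase.ActionStyle(::MyEnv) = FULL_ACTION_SET

RLBase.action_space(::MyEnv) = Base.OneTo(4)

# Required for FULL_ACTION_SET: which of the 4 actions are currently legal.
RLBase.legal_action_space_mask(env::MyEnv) = [true, false, true, true]
```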

`ReinforcementLearningBase.ChanceStyle` — Method

`ChanceStyle(env) = STOCHASTIC`

Specify which role chance plays in the `env`. Possible returns are:

- `STOCHASTIC`. This is the default return.
- `DETERMINISTIC`
- `EXPLICIT_STOCHASTIC`
- `SAMPLED_STOCHASTIC`

`ReinforcementLearningBase.DefaultStateStyle` — Method

Specify the default state style when calling `state(env)`.

`ReinforcementLearningBase.DynamicStyle` — Method

`DynamicStyle(env::AbstractEnv) = SEQUENTIAL`

Only valid in environments with a `NumAgentStyle` of `MultiAgent`. Determine whether the players can play simultaneously or not. Possible returns are:

- `SEQUENTIAL`. This is the default return.
- `SIMULTANEOUS`

`ReinforcementLearningBase.InformationStyle` — Method

`InformationStyle(env) = IMPERFECT_INFORMATION`

Distinguish environments between `PERFECT_INFORMATION` and `IMPERFECT_INFORMATION`. `IMPERFECT_INFORMATION` is returned by default.

`ReinforcementLearningBase.NumAgentStyle` — Method

`NumAgentStyle(env)`

Number of agents involved in the `env`. Possible returns are:

- `SINGLE_AGENT`. This is the default return.
- [`MultiAgent`](@ref)

`ReinforcementLearningBase.RewardStyle` — Method

Specify whether we can get a reward after each step or only at the end of a game. Possible values are `STEP_REWARD` (the default) or `TERMINAL_REWARD`.

Environments of `TERMINAL_REWARD` style can be viewed as a subset of environments of `STEP_REWARD` style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of `TERMINAL_REWARD` style.

`ReinforcementLearningBase.StateStyle` — Method

`StateStyle(env::AbstractEnv)`

Define the possible styles of `state(env)`. Possible values are:

- `Observation{T}`. This is the default return.
- `InternalState{T}`
- `Information{T}`
- Your own customized state style, when necessary.

Or a tuple containing several of the above. This is useful for environments which provide more than one kind of state.

`ReinforcementLearningBase.UtilityStyle` — Method

`UtilityStyle(env::AbstractEnv)`

Specify the utility style in multi-agent environments. Possible values are:

- `GENERAL_SUM`. The default return.
- `ZERO_SUM`
- `CONSTANT_SUM`
- `IDENTICAL_UTILITY`
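Putting the traits together: a hypothetical two-player, zero-sum, deterministic board game (all names below are illustrative) would typically declare

```julia
struct MyBoardGame <: AbstractEnv end  # hypothetical environment

RLBase.NumAgentStyle(::MyBoardGame)    = MultiAgent(2)
RLBase.DynamicStyle(::MyBoardGame)     = SEQUENTIAL
RLBase.InformationStyle(::MyBoardGame) = PERFECT_INFORMATION
RLBase.ChanceStyle(::MyBoardGame)      = DETERMINISTIC
RLBase.RewardStyle(::MyBoardGame)      = TERMINAL_REWARD
RLBase.UtilityStyle(::MyBoardGame)     = ZERO_SUM
```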

`ReinforcementLearningBase.action_space` — Function

`action_space(env, player=current_player(env))`

Get all available actions from the environment. See also: `legal_action_space`.

`ReinforcementLearningBase.chance_player` — Method

`chance_player(env)`

Only valid for environments with a chance player.

`ReinforcementLearningBase.child` — Method

`child(env::AbstractEnv, action)`

Treat the `env` as a game tree. Create an independent child after applying `action`.

`ReinforcementLearningBase.current_player` — Method

`current_player(env)`

Return the next player to take an action. For extensive-form games, a *chance player* may be returned (see also `chance_player`). For `SIMULTANEOUS` environments, a *simultaneous player* is always returned (see also `simultaneous_player`).

`ReinforcementLearningBase.is_terminated` — Method

`is_terminated(env, player=current_player(env))`

`ReinforcementLearningBase.legal_action_space` — Function

`legal_action_space(env, player=current_player(env))`

For environments of `MINIMAL_ACTION_SET`, the result is the same as `action_space`.

`ReinforcementLearningBase.legal_action_space_mask` — Function

`legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}`

Required for environments of `FULL_ACTION_SET`.

`ReinforcementLearningBase.priority` — Method

`priority(π::AbstractPolicy, experience)`

Usually used in offline policies.

`ReinforcementLearningBase.prob` — Function

Get the action distribution of the chance player.

Only valid for environments of `EXPLICIT_STOCHASTIC` style. The current player of `env` must be the chance player.

`ReinforcementLearningBase.prob` — Method

`prob(π::AbstractPolicy, env, action)`

Only valid for environments with discrete actions.

`ReinforcementLearningBase.prob` — Method

`prob(π::AbstractPolicy, env) -> Distribution`

Get the probability distribution of actions based on policy `π` given an `env`.

`ReinforcementLearningBase.reset!` — Method

Reset the internal state of an environment.

`ReinforcementLearningBase.reward` — Function

`reward(env, player=current_player(env))`

`ReinforcementLearningBase.simultaneous_player` — Method

`simultaneous_player(env)`

Only valid for environments of `SIMULTANEOUS` style.

`ReinforcementLearningBase.spectator_player` — Method

`spectator_player(env)`

Used in imperfect-information multi-agent environments.

`ReinforcementLearningBase.state` — Method

`state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])`

The state can be of any type. However, most neural-network-based algorithms assume an `AbstractArray` is returned. For environments which provide many different kinds of state (inner state, information state, etc.), users need to provide `style` to declare which kind of state they want.

The state **may** be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
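Since the returned value may alias the environment's internal buffer, a defensive copy is the safe pattern whenever you store a state for later (a one-line sketch):

```julia
s = copy(state(env))  # snapshot; state(env) itself may be mutated by the next step
```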

`ReinforcementLearningBase.state_space` — Method

`state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])`

Describe all possible states.

`ReinforcementLearningBase.test_interfaces!` — Method

Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.

`ReinforcementLearningBase.update!` — Method

`update!(π::AbstractPolicy, experience)`

Update the policy `π` with online/offline experience or parameters.

`ReinforcementLearningBase.walk` — Method

`walk(f, env::AbstractEnv)`

Call `f` with `env` and its descendants. Only use it with small games.