ReinforcementLearningBase.jl

ReinforcementLearningBase.CHANCE_PLAYER — Constant

Basic player type representing a random step in a game.

source

ReinforcementLearningBase.CONSTANT_SUM — Constant

Rewards of all players sum to a constant

source

ReinforcementLearningBase.DETERMINISTIC — Constant

There is no ChancePlayer in the environment, and the game is fully deterministic.

source

ReinforcementLearningBase.EXPLICIT_STOCHASTIC — Constant

Usually used to describe extensive-form games. The environment contains a chance player whose action probabilities are known. Therefore, prob(env, player=chance_player(env)) must be defined.

source

ReinforcementLearningBase.FULL_ACTION_SET — Constant

Alias for FullActionSet()

source

ReinforcementLearningBase.GENERAL_SUM — Constant

The total reward of all players may differ from step to step.

source

ReinforcementLearningBase.IDENTICAL_UTILITY — Constant

Every player gets the same reward

source

ReinforcementLearningBase.IMPERFECT_INFORMATION — Constant

Different players may have different observations of the inner state.

source

ReinforcementLearningBase.MINIMAL_ACTION_SET — Constant

Alias for MinimalActionSet()

source

ReinforcementLearningBase.PERFECT_INFORMATION — Constant

All players observe the same state

source

ReinforcementLearningBase.SAMPLED_STOCHASTIC — Constant

The environment contains a chance player, and the probability is unknown. Usually only a dummy action is allowed in this case.

Note

The chance player (chance_player(env)) must appear in the result of RLBase.players(env). The result of action_space(env, chance_player) should contain only one dummy action.

source

ReinforcementLearningBase.SEQUENTIAL — Constant

An environment with the DynamicStyle of SEQUENTIAL must take actions from different players one by one.

source

ReinforcementLearningBase.SIMULTANEOUS — Constant

An environment with the DynamicStyle of SIMULTANEOUS must take in actions from some (or all) players at the same time.

source

ReinforcementLearningBase.SPECTATOR — Constant

SPECTATOR

Spectator is a special player who doesn't take any action.

source

ReinforcementLearningBase.STEP_REWARD — Constant

Alias for StepReward()

source

ReinforcementLearningBase.STOCHASTIC — Constant

There is no chance player in the environment, but the game is stochastic. To help increase reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
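As a minimal sketch of the keyword-argument convention, the following uses only the Random standard library. The `NoisyWalk` type and the `step!` function are hypothetical stand-ins for illustration and are not part of ReinforcementLearningBase; a real environment would subtype AbstractEnv and implement the interface functions listed on this page.

```julia
using Random

# Hypothetical stochastic environment: a one-dimensional random walk.
mutable struct NoisyWalk{R<:AbstractRNG}
    position::Int
    rng::R
end

# Accept the RNG as a keyword argument so that callers control the randomness.
NoisyWalk(; rng::AbstractRNG = Random.default_rng()) = NoisyWalk(0, rng)

# Move one step left or right, drawn from the environment's own RNG.
function step!(env::NoisyWalk)
    env.position += rand(env.rng, (-1, 1))
    return env.position
end

# Two environments built from equally seeded RNGs follow the same trajectory.
a = NoisyWalk(rng = MersenneTwister(123))
b = NoisyWalk(rng = MersenneTwister(123))
trace_a = [step!(a) for _ in 1:10]
trace_b = [step!(b) for _ in 1:10]
```

Passing the generator in, rather than calling the global one inside the environment, is what makes the two traces comparable.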

source

ReinforcementLearningBase.TERMINAL_REWARD — Constant

The reward is received only when the environment terminates.

source

ReinforcementLearningBase.ZERO_SUM — Constant

Rewards of all players sum to 0. A special case of CONSTANT_SUM.

source

ReinforcementLearningBase.AbstractEnv — Type

act!(env::AbstractEnv, action, player=current_player(env))

Supertype of all reinforcement learning environments.

source

ReinforcementLearningBase.AbstractEnvironmentModel — Type

TODO:

Describe how to model a reinforcement learning environment. TODO: need more investigation. Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/

Analytic gradient computation
Sampling-based planning
Model-based data generation
Value-equivalence prediction

See also: Model-based Reinforcement Learning: A Survey; Tutorial on Model-Based Methods in Reinforcement Learning.

source

ReinforcementLearningBase.AbstractPolicy — Type

plan!(π::AbstractPolicy, env) -> action

The policy is the most basic concept in reinforcement learning. Here an agent's action is determined by plan!, which takes a policy and an environment and returns an action.

Note

See discussions here if you are wondering why we define the input as AbstractEnv instead of state.

Warning

The policy π may change its internal state but it shouldn't change env. When it's really necessary, remember to make a copy of env to keep the original env untouched.
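The warning above can be illustrated with a small look-ahead sketch. Everything here is a hypothetical stand-in written in plain Julia: `CounterEnv`, `apply!` (playing the role of act!) and `choose_action` (playing the role of plan!) are not part of the package.

```julia
# Hypothetical environment: a counter that should reach a target value.
mutable struct CounterEnv
    value::Int
    target::Int
end

Base.copy(env::CounterEnv) = CounterEnv(env.value, env.target)

# Stand-in for act!: add the chosen increment to the counter.
apply!(env::CounterEnv, action::Int) = (env.value += action; env)

# Stand-in for plan!: try every action on a copy of env and keep the one
# that ends closest to the target. The original env is never modified.
function choose_action(env::CounterEnv, actions = (1, 2, 3))
    best, best_gap = first(actions), typemax(Int)
    for a in actions
        trial = apply!(copy(env), a)
        gap = abs(trial.target - trial.value)
        if gap < best_gap
            best, best_gap = a, gap
        end
    end
    return best
end

env = CounterEnv(0, 2)
action = choose_action(env)
```

After the call, `env.value` is still 0 because every simulated step ran on a copy.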

source

ReinforcementLearningBase.ConstantSum — Type

AbstractUtilityStyle for environments where the sum of all players' rewards is constant.

source

ReinforcementLearningBase.Deterministic — Type

AbstractChanceStyle for fully deterministic games without a ChancePlayer.

source

ReinforcementLearningBase.Episodic — Type

The environment will terminate in finite steps.

source

ReinforcementLearningBase.FullActionSet — Type

The action space of the environment may contain illegal actions. For environments of FULL_ACTION_SET, legal_action_space and legal_action_space_mask must also be defined.

source

ReinforcementLearningBase.GeneralSum — Type

AbstractUtilityStyle for environments where the sum of all players' rewards is not constant.

source

ReinforcementLearningBase.GoalState — Type

Use it to represent the goal state

source

ReinforcementLearningBase.IdenticalUtility — Type

AbstractUtilityStyle for environments where all players get the same reward.

source

ReinforcementLearningBase.ImperfectInformation — Type

Some players' actions are not known to the other players.

source

ReinforcementLearningBase.InformationSet — Type

See the definition of information set

source

ReinforcementLearningBase.InternalState — Type

Use it to represent the internal state.

source

ReinforcementLearningBase.MinimalActionSet — Type

All actions in the action space of the environment are legal

source

ReinforcementLearningBase.MultiAgent — Method

MultiAgent(n::Integer) -> MultiAgent{n}()

n must be ≥ 2.

source

ReinforcementLearningBase.NeverEnding — Type

The environment can run infinitely.

source

ReinforcementLearningBase.Observation — Type

Sometimes people from different fields refer to the same thing by different names. Here we set Observation{Any}() as the default state style in this package.

See discussions here

source

ReinforcementLearningBase.PerfectInformation — Type

All players' actions are visible to the other players.

source

ReinforcementLearningBase.Sequential — Type

Players act one after the other.

source

ReinforcementLearningBase.Simultaneous — Type

Players act at the same time.

source

ReinforcementLearningBase.SingleAgent — Type

AbstractNumAgentStyle for environments with a single agent

source

ReinforcementLearningBase.StepReward — Type

A reward is available after each step.

source

ReinforcementLearningBase.Stochastic — Type

Stochastic()

Default ChanceStyle.

source

ReinforcementLearningBase.TerminalReward — Type

The reward is received only when the environment terminates.

source

ReinforcementLearningBase.ZeroSum — Type

AbstractUtilityStyle for environments where the sum of all players' rewards is equal to zero.

source

Base.:== — Method

Base.:(==)(env1::T, env2::T) where T<:AbstractEnv

Warning

Only check the state of all players in the env.

source

Base.copy — Method

Make an independent copy of env.

Note

The rng (if env has one) is also copied!
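One consequence of copying the rng is that the original and the copy produce the same random sequence from the point of the copy onward. The sketch below shows this with a hypothetical `DiceEnv` type and `roll!` function defined only for illustration; it does not use the package's own copy method.

```julia
using Random

# Hypothetical environment that owns its random number generator.
mutable struct DiceEnv
    last_roll::Int
    rng::MersenneTwister
end

# Copy the generator too, so the two environments do not share RNG state.
Base.copy(env::DiceEnv) = DiceEnv(env.last_roll, copy(env.rng))

roll!(env::DiceEnv) = (env.last_roll = rand(env.rng, 1:6))

env = DiceEnv(0, MersenneTwister(42))
twin = copy(env)

# Both start from the same RNG state, so they roll the same numbers,
# yet advancing one does not affect the other.
rolls_env = [roll!(env) for _ in 1:5]
rolls_twin = [roll!(twin) for _ in 1:5]
```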

source

Random.seed! — Method

Set the seed of the internal rng.

source

ReinforcementLearningBase.ActionStyle — Method

ActionStyle(env::AbstractEnv)

For environments with discrete actions, specify whether the current state of env contains a full action set or a minimal action set. By default, MINIMAL_ACTION_SET is returned.

source

ReinforcementLearningBase.ChanceStyle — Method

ChanceStyle(env) = STOCHASTIC

Specify which role chance plays in the env. Possible returns are:

STOCHASTIC. This is the default return.
DETERMINISTIC
EXPLICIT_STOCHASTIC
SAMPLED_STOCHASTIC

source

ReinforcementLearningBase.DefaultStateStyle — Method

Specify the default state style when calling state(env).

source

ReinforcementLearningBase.DynamicStyle — Method

DynamicStyle(env::AbstractEnv) = SEQUENTIAL

Only valid in environments with a NumAgentStyle of MultiAgent. Determine whether the players can play simultaneously or not. Possible returns are:

SEQUENTIAL. This is the default return.
SIMULTANEOUS.

source

ReinforcementLearningBase.InformationStyle — Method

InformationStyle(env) = IMPERFECT_INFORMATION

Distinguish environments between PERFECT_INFORMATION and IMPERFECT_INFORMATION. IMPERFECT_INFORMATION is returned by default.

source

ReinforcementLearningBase.NumAgentStyle — Method

NumAgentStyle(env)

Number of agents involved in the env. Possible returns are:

SingleAgent. This is the default return.
MultiAgent.

source

ReinforcementLearningBase.RewardStyle — Method

Specify whether we can get a reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default) or TERMINAL_REWARD.

Note

Environments of TERMINAL_REWARD style can be viewed as a subset of environments of STEP_REWARD style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of TERMINAL_REWARD style.

source

ReinforcementLearningBase.StateStyle — Method

StateStyle(env::AbstractEnv)

Define the possible styles of state(env). Possible values are:

Observation{T}. This is the default return.
InternalState{T}
InformationSet{T}
A tuple containing several of the above.

You can also define your own customized state style when necessary. Returning a tuple is useful for environments which provide more than one kind of state.

source

ReinforcementLearningBase.UtilityStyle — Method

UtilityStyle(env::AbstractEnv)

Specify the utility style in multi-agent environments. Possible values are:

GENERAL_SUM. The default return.
ZERO_SUM
CONSTANT_SUM
IDENTICAL_UTILITY

source

ReinforcementLearningBase.action_space — Function

action_space(env, player=current_player(env))

Get all available actions from the environment. See also: legal_action_space

source

ReinforcementLearningBase.chance_player — Method

chance_player(env)

Only valid for environments with a chance player.

source

ReinforcementLearningBase.child — Method

child(env::AbstractEnv, action)

Treat the env as a game tree. Create an independent child after applying action.

source

ReinforcementLearningBase.current_player — Method

current_player(env)

Return the next player to take action. For extensive-form games, a chance player may be returned (see also chance_player). For SIMULTANEOUS environments, a simultaneous player is always returned (see also simultaneous_player).

source

ReinforcementLearningBase.is_terminated — Method

is_terminated(env, player=current_player(env))

source

ReinforcementLearningBase.legal_action_space — Function

legal_action_space(env, player=current_player(env))

For environments of MINIMAL_ACTION_SET, the result is the same as that of action_space.

source

ReinforcementLearningBase.legal_action_space_mask — Function

legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}

Required for environments of FULL_ACTION_SET. As a default implementation, legal_action_space_mask creates a mask of action_space with the subset legal_action_space.
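The relationship between the full action space, the legal subset and the mask can be sketched in plain Julia. The board encoding and the variable names below are assumptions made for illustration and do not reproduce the package's default implementation.

```julia
# Hypothetical 3x3 board stored as a flat vector: 0 marks an empty cell.
board = [1, 0, 2, 0, 0, 1, 2, 0, 1]

# Full action space: one action per cell, legal or not.
all_actions = 1:9

# Legal actions: only the empty cells can be played.
legal_actions = [a for a in all_actions if board[a] == 0]

# Boolean mask aligned with all_actions; true marks a legal action.
mask = [a in legal_actions for a in all_actions]

# Indexing the full action space with the mask recovers the legal subset.
recovered = all_actions[mask]
```

A mask of this shape is convenient for policies that output one score per action, since illegal entries can be filtered out by position.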

source

ReinforcementLearningBase.next_player! — Method

next_player!(env::E) where {E<:AbstractEnv}

Advance to the next player. This is a no-op for single-player and simultaneous games. Sequential MultiAgent games should implement this method.
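For a sequential multi-agent game, advancing to the next player often amounts to rotating through a fixed player list. The following sketch shows that idea with a hypothetical `TurnTracker` type and hypothetical `whose_turn` and `rotate!` functions; an actual environment would implement next_player! and current_player on its own env type instead.

```julia
# Hypothetical turn bookkeeping for a sequential game.
mutable struct TurnTracker
    players::Vector{Symbol}
    index::Int
end

# Start with the first player in the list.
TurnTracker(players) = TurnTracker(collect(players), 1)

whose_turn(t::TurnTracker) = t.players[t.index]

# Advance to the next player, wrapping around after the last one.
function rotate!(t::TurnTracker)
    t.index = mod1(t.index + 1, length(t.players))
    return whose_turn(t)
end

order = TurnTracker((:alice, :bob, :carol))
seen = [whose_turn(order)]
for _ in 1:3
    push!(seen, rotate!(order))
end
```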

source

ReinforcementLearningBase.optimise! — Method

RLBase.optimise!(π::AbstractPolicy, experience)

Optimise the policy π with online/offline experience or parameters.

source

ReinforcementLearningBase.players — Method

players(env::RLBaseEnv)

Players in the game. This is a no-op for single-player games. MultiAgent games should implement this method.

source

ReinforcementLearningBase.priority — Method

priority(π::AbstractPolicy, experience)

Usually used in offline policies to evaluate the priorities of the experience.

source

ReinforcementLearningBase.prob — Function

Get the action distribution of chance player.

Note

Only valid for environments of EXPLICIT_STOCHASTIC style. The current player of env must be the chance player.
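To make the idea of a known chance distribution concrete, here is a sketch that represents the chance player's action distribution as a plain probability vector and samples from it. The outcome names, the probabilities and the `sample_chance` helper are assumptions for illustration; the return type of prob in the package is not specified on this page.

```julia
using Random

# Hypothetical chance outcomes and their known probabilities.
outcomes = (:low, :mid, :high)
probabilities = [0.2, 0.5, 0.3]

# Draw one outcome by walking the cumulative distribution.
function sample_chance(rng::AbstractRNG, outcomes, probabilities)
    u = rand(rng)
    cumulative = 0.0
    for (outcome, p) in zip(outcomes, probabilities)
        cumulative += p
        u < cumulative && return outcome
    end
    return last(outcomes)  # guard against floating-point round-off
end

rng = MersenneTwister(1)
draws = [sample_chance(rng, outcomes, probabilities) for _ in 1:1000]
```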

source

ReinforcementLearningBase.prob — Method

prob(π::AbstractPolicy, env, action)

Only valid for environments with discrete actions.

source

ReinforcementLearningBase.prob — Method

prob(π::AbstractPolicy, env) -> Distribution

Get the probability distribution of actions based on policy π given an env.

source

ReinforcementLearningBase.reset! — Method

Reset the internal state of an environment

source

ReinforcementLearningBase.reward — Function

reward(env, player=current_player(env))

source

ReinforcementLearningBase.simultaneous_player — Method

simultaneous_player(env)

Only valid for environments of SIMULTANEOUS style.

source

ReinforcementLearningBase.spectator_player — Method

spectator_player(env)

Used in imperfect-information multi-agent environments.

source

ReinforcementLearningBase.state — Method

state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

The state can be of any type. However, most neural-network-based algorithms assume an AbstractArray is returned. For environments that provide several kinds of state (inner state, information state, etc.), users need to provide style to declare which kind of state they want.

Warning

The state may be reused and be mutated at each step. Always remember to make a copy if this is not what you expect.
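The aliasing problem described in this warning can be reproduced with a few lines of plain Julia. `BufferEnv`, `observe` (standing in for state) and `advance!` (standing in for a step) are hypothetical names used only for this sketch.

```julia
# Hypothetical environment that reuses one buffer for its state.
mutable struct BufferEnv
    buffer::Vector{Float64}
end

# Stand-in for state: returns the internal buffer without copying it.
observe(env::BufferEnv) = env.buffer

# Stand-in for a step: updates the buffer in place.
advance!(env::BufferEnv) = (env.buffer .+= 1.0; env)

env = BufferEnv(zeros(3))
aliased = observe(env)          # shares memory with env.buffer
snapshot = copy(observe(env))   # independent copy of the current state

advance!(env)
# aliased now reflects the new state, while snapshot keeps the old one.
```

Storing `aliased` in a replay buffer would silently record the latest state at every position, which is why taking a copy is the safer default.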

source

ReinforcementLearningBase.state_space — Method

state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

Describe all possible states.

source

ReinforcementLearningBase.test_interfaces! — Method

Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.

source

ReinforcementLearningBase.walk — Method

walk(f, env::AbstractEnv)

Call f with env and its descendants. Only use it with small games.
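The traversal can be pictured with a tiny hand-written game tree. In this sketch, `Sticks`, `legal_moves`, `next_state` (playing the role of child) and `visit` (playing the role of walk) are hypothetical stand-ins rather than the package's implementation.

```julia
# Hypothetical game: players alternately remove 1 or 2 sticks until none remain.
struct Sticks
    remaining::Int
end

legal_moves(g::Sticks) = [m for m in (1, 2) if m <= g.remaining]

# Stand-in for child: an independent successor state after applying a move.
next_state(g::Sticks, move::Int) = Sticks(g.remaining - move)

# Stand-in for walk: call f on the state and, recursively, on every descendant.
function visit(f, g::Sticks)
    f(g)
    for m in legal_moves(g)
        visit(f, next_state(g, m))
    end
    return nothing
end

# Count every node in the tree rooted at four sticks.
visited = Ref(0)
visit(Sticks(4)) do g
    visited[] += 1
end
```

Even this toy game visits 12 nodes from a start of four sticks, and the count grows exponentially with the number of sticks, which is the reason for restricting such a traversal to small games.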

source