Multiplayer Interface
CommonRLInterface provides a basic interface for multiplayer games.
Sequential games
Sequential games should implement the optional function players to return a range of player ids, and player to indicate which player's turn it is. There is no requirement that players play in the order returned by the players function. Only the action for the current player should be supplied to act!, but rewards for all players should be returned. observe returns the observation for only the current player.
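As a rough illustration of the sequential case, here is a minimal sketch of a two-player turn-based environment. The CountingGame type, its fields, and the play helper are hypothetical names invented for this example, and returning the rewards as a vector indexed by player id is an assumption about one reasonable way to "return rewards for all players".

```julia
using CommonRLInterface
const RL = CommonRLInterface

# Hypothetical game: players alternate adding 1 or 2 to a running total,
# and whoever brings the total to 5 or more wins.
mutable struct CountingGame <: RL.AbstractEnv
    total::Int
    turn::Int   # id of the player who moves next
end
CountingGame() = CountingGame(0, 1)

RL.reset!(env::CountingGame) = (env.total = 0; env.turn = 1; nothing)
RL.actions(::CountingGame) = (1, 2)
RL.observe(env::CountingGame) = env.total        # observation for the current player
RL.terminated(env::CountingGame) = env.total >= 5

RL.players(::CountingGame) = 1:2                 # static, does not depend on state
RL.player(env::CountingGame) = env.turn          # whose turn it is

# Only the current player's action is supplied; rewards for both are returned.
function RL.act!(env::CountingGame, a)
    mover = env.turn
    env.total += a
    rewards = [0.0, 0.0]
    if env.total >= 5
        rewards[mover] = 1.0
        rewards[3 - mover] = -1.0
    end
    env.turn = 3 - mover
    return rewards
end

# Hypothetical helper: every player always plays action `a`;
# returns the cumulative reward of each player.
function play(env, a)
    RL.reset!(env)
    totals = zeros(length(RL.players(env)))
    while !RL.terminated(env)
        totals .+= RL.act!(env, a)
    end
    return totals
end

play(CountingGame(), 2)
```

The loop never needs to know the turn order in advance; a real agent loop would query player(env) before each call to act! to decide whose policy to consult.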
Simultaneous Games/Multi-agent (PO)MDPs
Environments in which all players take actions at once should implement the all_act! and all_observe optional functions which take a collection of actions for all players and return observations for each player, respectively.
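The simultaneous case might look something like the following sketch of a one-shot matching-pennies game. The MatchingPennies type and its last field are hypothetical, and the choice to let each player observe the opponent's most recent move (0 before any move) is purely an assumption made for illustration.

```julia
using CommonRLInterface
const RL = CommonRLInterface

# Hypothetical one-shot game: player 1 wins if both choices match,
# player 2 wins otherwise.
mutable struct MatchingPennies <: RL.AbstractEnv
    last::Vector{Int}   # most recent joint action, empty before play
end
MatchingPennies() = MatchingPennies(Int[])

RL.reset!(env::MatchingPennies) = (empty!(env.last); nothing)
RL.actions(::MatchingPennies) = (1, 2)
RL.terminated(env::MatchingPennies) = !isempty(env.last)
RL.players(::MatchingPennies) = 1:2

# Takes one action per player and returns one reward per player.
function RL.all_act!(env::MatchingPennies, actions::AbstractVector)
    env.last = collect(Int, actions)
    r = actions[1] == actions[2] ? 1.0 : -1.0
    return [r, -r]
end

# Returns one observation per player: the opponent's last move, or 0 if none.
RL.all_observe(env::MatchingPennies) =
    isempty(env.last) ? [0, 0] : [env.last[2], env.last[1]]

env = MatchingPennies()
RL.reset!(env)
rewards = RL.all_act!(env, [1, 2])   # player 1 plays 1, player 2 plays 2
obs = RL.all_observe(env)
```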
Indicating reward properties
The UtilityStyle trait can be used to indicate that the rewards will meet properties, for example that rewards for all players are identical or that the game is zero-sum.
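A sketch of how the trait might be declared and consumed is shown below. The TugOfWar type and the needs_per_player_values helper are hypothetical; only UtilityStyle and the four style types come from the interface.

```julia
using CommonRLInterface
const RL = CommonRLInterface

# Hypothetical environment type; the rest of the interface is omitted here.
struct TugOfWar <: RL.AbstractEnv end

# Declare that rewards across players always sum to zero.
RL.UtilityStyle(::TugOfWar) = RL.ZeroSum()

# Hypothetical solver-side helper that branches on the trait by dispatch:
# with identical utilities a single shared value estimate is enough.
needs_per_player_values(env) = needs_per_player_values(RL.UtilityStyle(env))
needs_per_player_values(::RL.IdenticalUtility) = false
needs_per_player_values(::RL.UtilityStyle) = true

style = RL.UtilityStyle(TugOfWar())
```

Dispatching on the returned style object, rather than comparing with ==, lets an algorithm add specialised methods for each style without touching the environment definition.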
CommonRLInterface.players — Function
players(env::AbstractEnv)
Return an ordered iterable collection of integer indices for all players, starting with one.
This function is a static property of the environment; the value it returns should not change based on the state.
Example
players(::MyEnv) = 1:2
CommonRLInterface.player — Function
player(env::AbstractEnv)
Return the index of the player who should play next in the environment.
CommonRLInterface.all_act! — Function
all_act!(env::AbstractEnv, actions::AbstractVector)
Take actions for all players, advance the environment env forward, and return rewards for all players.
Environments that support simultaneous actions by all players should implement this in addition to or instead of act!.
CommonRLInterface.all_observe — Function
all_observe(env::AbstractEnv)
Return observations from the environment for all players.
Environments that support simultaneous actions by all players should implement this in addition to or instead of observe.
CommonRLInterface.UtilityStyle — Type
UtilityStyle(env)
Trait that allows an environment to declare certain properties about the relative utility for the players.
Possible returns are:
ZeroSum()
ConstantSum()
GeneralSum()
IdenticalUtility()
See the docstrings for each for more details.
CommonRLInterface.ZeroSum — Type
If UtilityStyle(env) == ZeroSum(), then the sum of the rewards returned by act! is always zero.
CommonRLInterface.ConstantSum — Type
If UtilityStyle(env) == ConstantSum(), then the sum of the rewards returned by act! will always be the same constant.
CommonRLInterface.GeneralSum — Type
If UtilityStyle(env) == GeneralSum(), the sum of rewards over a trajectory can take any form.
CommonRLInterface.IdenticalUtility — Type
If UtilityStyle(env) == IdenticalUtility(), all entries of the reward returned by act! will be identical, so every player receives the same reward.