A Beginner's Guide to ReinforcementLearning.jl

What are legal_action_space and legal_action_space_mask?

For environments with the FULL_ACTION_SET action style, the legal actions cannot be determined ahead of time, so you need to define legal_action_space(env) to return the valid actions at each step. For MultiAgent environments, legal_action_space(env, player) should also be defined. Note that the result of legal_action_space(env) at each step must be a subset of action_space(env).
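To make the subset requirement concrete, here is a plain-Julia sketch that does not depend on ReinforcementLearning.jl. It models a hypothetical 3×3 board where an action is a cell index and only empty cells are legal. The struct `ToyBoardEnv` and the functions `full_actions`, `legal_actions` and `play!` are illustrative stand-ins for action_space(env), legal_action_space(env) and stepping the environment; they are not RLBase APIs.

```julia
# Hypothetical environment: a 3x3 board stored as a vector of 9 cells.
# 0 means empty; any other value is the id of the player occupying the cell.
struct ToyBoardEnv
    board::Vector{Int}
end

ToyBoardEnv() = ToyBoardEnv(zeros(Int, 9))

# Stand-in for action_space(env): every cell index, fixed for the whole game.
full_actions(env::ToyBoardEnv) = 1:9

# Stand-in for legal_action_space(env): only the currently empty cells.
# This set changes after every move, so it cannot be known ahead of time.
legal_actions(env::ToyBoardEnv) = findall(==(0), env.board)

# Apply a move by marking the chosen cell with the player's id.
function play!(env::ToyBoardEnv, action::Int, player::Int)
    action in legal_actions(env) || throw(ArgumentError("illegal action $action"))
    env.board[action] = player
    return env
end

env = ToyBoardEnv()
play!(env, 5, 1)
play!(env, 1, 2)

# The legal actions at every step are a subset of the full action space.
println(legal_actions(env))                               # [2, 3, 4, 6, 7, 8, 9]
println(issubset(legal_actions(env), full_actions(env)))  # true
```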

To handle FULL_ACTION_SET environments with discrete actions, some algorithms need a mask that marks which of the full actions (the result of action_space(env)) are currently legal. For example, neural network based algorithms usually apply this mask to the last output layer so that only legal actions can be selected. In this case, legal_action_space_mask(env) should also be implemented. In most cases it can simply be defined like this:

```julia
RLBase.legal_action_space_mask(env::YourEnv) =
    map(action_space(env)) do action
        action in legal_action_space(env)
    end
```
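The following plain-Julia sketch shows how such a mask is typically used. The mask is built the same way as in the definition above (map over the full action space and test membership in the legal set); the outputs of illegal actions are then replaced with -Inf before taking the argmax, so only a legal action can be chosen. The `q_values` vector and the `masked_argmax` helper are hypothetical, and no neural network library is involved.

```julia
# Full action space and the currently legal subset (hypothetical values).
all_actions = 1:5
legal = [2, 4, 5]

# Boolean mask over the full action space, built like the definition above.
mask = map(all_actions) do action
    action in legal
end

# Pretend these are the outputs of a network's last layer, one per action.
q_values = [3.0, 1.0, 9.0, 2.5, 0.5]

# Replace the outputs of illegal actions with -Inf so argmax never picks them.
function masked_argmax(values::AbstractVector{<:Real}, mask::AbstractVector{Bool})
    masked = ifelse.(mask, values, -Inf)
    return argmax(masked)
end

# Index into the full action space (equal to the action here, since actions are 1:5).
best = masked_argmax(q_values, mask)
println(mask)  # Bool[0, 1, 0, 1, 1]
println(best)  # 4
```

Note that action 3 has the largest raw output but is never selected, because the mask removes it before the argmax.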

How to write an environment wrapper?

Sometimes you may want to build a new environment on top of an existing one. To write such an environment wrapper, define your structure as a subtype of AbstractEnvWrapper and store the original environment in the env field. By default, all environment-related APIs defined in RLBase are then forwarded to the inner env, so you only need to implement the interfaces whose behavior you want to change.

The following example defines a wrapper to clip the reward:

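```julia
using ReinforcementLearning

struct ClipRewardWrapper{T} <: AbstractEnvWrapper
    env::T
end

# Clip the reward of the wrapped environment to [-0.1, 0.1];
# every other API call is forwarded to env.env by default.
RLBase.reward(env::ClipRewardWrapper) = clamp(reward(env.env), -0.1, 0.1)
```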
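The forwarding behavior can be illustrated without the library. In this plain-Julia sketch, `CountEnv`, `state`, `reward`, `AbstractToyWrapper` and `ClipToyWrapper` are hypothetical names: generic methods defined on the abstract wrapper type delegate to the `env` field, and a concrete wrapper overrides only the method it changes. ReinforcementLearning.jl has its own forwarding machinery for AbstractEnvWrapper; this only mirrors the idea.

```julia
# A tiny hypothetical environment with two "API" functions.
struct CountEnv
    n::Int
end
state(env::CountEnv) = env.n
reward(env::CountEnv) = 2.5 * env.n

# Hypothetical abstract wrapper: every subtype stores the wrapped env in `env`.
abstract type AbstractToyWrapper end

# Default behavior: forward each API call to the inner environment.
state(w::AbstractToyWrapper) = state(w.env)
reward(w::AbstractToyWrapper) = reward(w.env)

# A concrete wrapper that changes only the reward, like ClipRewardWrapper.
struct ClipToyWrapper{T} <: AbstractToyWrapper
    env::T
end
reward(w::ClipToyWrapper) = clamp(reward(w.env), -0.1, 0.1)

w = ClipToyWrapper(CountEnv(4))
println(state(w))   # 4   (forwarded unchanged)
println(reward(w))  # 0.1 (10.0 clipped to the upper bound)
```

Because the more specific `reward(::ClipToyWrapper)` method wins over the generic one, only the reward is affected while `state` still reaches the inner environment.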

How to write a customized hook?

In most cases, you don't need to write a customized hook. Generic hooks such as DoEveryNSteps are provided so that you can inject logic at the appropriate time.

However, if you do need to write a customized hook, the following methods must be provided:

- (hook::YourHook)(::PreActStage, agent, env, action); note the extra action argument.
- (hook::YourHook)(::PostActStage, agent, env)
- (hook::YourHook)(::PreEpisodeStage, agent, env)
- (hook::YourHook)(::PostEpisodeStage, agent, env)

If your hook is a subtype of AbstractHook, all of the above methods have a default implementation that simply returns nothing, so you only need to extend the methods you actually need.
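The default-method mechanism can be sketched in plain Julia. The stage types `PreEpisode`, `PreAct`, `PostAct` and `PostEpisode`, the abstract type `ToyAbstractHook`, the `StepCounter` hook and the `run_toy_episode` loop are all hypothetical. They only mirror the calling convention listed above (including the extra action argument at the pre-act stage) and are not the library's own types.

```julia
# Hypothetical stage markers, dispatched on as the first argument of a hook call.
struct PreEpisode end
struct PreAct end
struct PostAct end
struct PostEpisode end

abstract type ToyAbstractHook end

# Default implementations: do nothing and return `nothing`.
(hook::ToyAbstractHook)(::PreEpisode, agent, env) = nothing
(hook::ToyAbstractHook)(::PreAct, agent, env, action) = nothing
(hook::ToyAbstractHook)(::PostAct, agent, env) = nothing
(hook::ToyAbstractHook)(::PostEpisode, agent, env) = nothing

# A customized hook only extends the stage it cares about.
mutable struct StepCounter <: ToyAbstractHook
    steps::Int
end
StepCounter() = StepCounter(0)
(hook::StepCounter)(::PostAct, agent, env) = (hook.steps += 1; nothing)

# A minimal loop that calls the hook at every stage.
# `agent` is a placeholder function and `env` is unused here.
function run_toy_episode(hook, agent, env; n_steps = 3)
    hook(PreEpisode(), agent, env)
    for _ in 1:n_steps
        action = agent(env)
        hook(PreAct(), agent, env, action)
        # ... the environment would be stepped with `action` here ...
        hook(PostAct(), agent, env)
    end
    hook(PostEpisode(), agent, env)
    return hook
end

counter = run_toy_episode(StepCounter(), _ -> 1, nothing)
println(counter.steps)  # 3
```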

How to evaluate an agent during training?

Well, just like a matryoshka doll, we run an experiment inside another experiment with the help of a hook!

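```julia
run(
    agent,
    env,
    stop_condition,
    DoEveryNSteps(EVALUATION_FREQ) do t, agent, env
        # Nested experiment: evaluate the current agent with its own
        # stop condition and hook.
        run(agent, env, eval_stop_condition, eval_hook)
    end,
)
```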
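The nesting can be sketched without the library. In this plain-Julia sketch, `ToyEveryNSteps`, `train!` and `evaluate` are hypothetical stand-ins for DoEveryNSteps and the two run calls: the outer loop calls the hook after every step, and every n-th call launches a small inner evaluation loop. The agent is a placeholder function whose return value is treated as the per-step reward.

```julia
# Hypothetical hook that calls `f(t, agent, env)` every `n` training steps.
mutable struct ToyEveryNSteps{F}
    f::F
    n::Int
    t::Int
end
ToyEveryNSteps(f, n::Int) = ToyEveryNSteps(f, n, 0)

function (hook::ToyEveryNSteps)(agent, env)
    hook.t += 1
    if hook.t % hook.n == 0
        hook.f(hook.t, agent, env)
    end
    return nothing
end

# Hypothetical inner "experiment": run a few steps and sum the rewards.
evaluate(agent, env; n_steps = 5) = sum(agent(env) for _ in 1:n_steps)

# Hypothetical outer "experiment": call the hook after each training step.
function train!(agent, env, hook; n_steps = 10)
    for _ in 1:n_steps
        # ... one training step would happen here ...
        hook(agent, env)
    end
    return nothing
end

eval_log = Tuple{Int,Int}[]
hook = ToyEveryNSteps(4) do t, agent, env
    push!(eval_log, (t, evaluate(agent, env)))
end

train!(_ -> 1, nothing, hook)
println(eval_log)  # [(4, 5), (8, 5)]
```

With 10 training steps and an evaluation frequency of 4, the inner evaluation runs at steps 4 and 8 only.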


This website is built with Franklin.jl using the DistillTemplate (licensed under Apache License 2.0) and Documenter.jl. The source code of this website is licensed under the MIT License. The JuliaReinforcementLearning organization was first created by Johanni Brea and then co-maintained by Jun Tian. We thank all the contributors.
