A Beginner's Guide to ReinforcementLearning.jl

From Novice to Professional

Table of Contents

  1. What are legal_action_space and legal_action_space_mask?
  2. How to write a customized environment?
  3. How to write an environment wrapper?
  4. How to write a customized stop condition?
  5. How to write a customized hook?
  6. How to use TensorBoard?
  7. How to evaluate an agent during training?

Here we collect some common questions and answers to help you gain a better understanding of ReinforcementLearning.jl.

What are legal_action_space and legal_action_space_mask?

For environments of FULL_ACTION_SET, the legal actions cannot be determined ahead of time, so legal_action_space(env) needs to be defined to return the valid actions at each step. For environments of MultiAgent, legal_action_space(env, player) should also be defined. Note that the result of legal_action_space(env) at each step must always be a subset of action_space(env).
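As a sketch, consider a hypothetical board-game environment (MyBoardEnv and its board field are illustrative names, not part of the library) where only empty cells are playable:

```julia
# Hypothetical example: empty cells (stored as 0 in `board`) are the
# only legal moves at the current step.
struct MyBoardEnv <: AbstractEnv
    board::Vector{Int}   # 0 = empty, 1/2 = player marks
end

RLBase.action_space(env::MyBoardEnv) = Base.OneTo(length(env.board))

# Declare that the set of legal actions changes from step to step.
RLBase.ActionStyle(::MyBoardEnv) = FULL_ACTION_SET

# The indices of empty cells; always a subset of action_space(env).
RLBase.legal_action_space(env::MyBoardEnv) = findall(iszero, env.board)
```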

To handle environments of FULL_ACTION_SET with discrete actions, some algorithms need the mask of legal actions relative to the full action set (the result of action_space(env)). For example, in neural-network-based algorithms this mask is usually applied to the last output layer so that only legal actions can be selected. In such cases legal_action_space_mask should also be implemented. Most of the time it can be defined simply as:

RLBase.legal_action_space_mask(env::YourEnv) = map(action_space(env)) do action
    action in legal_action_space(env)
end

How to write a customized environment?

See the detailed blog.
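In short, a customized environment implements a handful of RLBase methods. Below is a minimal sketch; the GuessEnv name and its guessing-game logic are purely illustrative, not part of the library or the blog post:

```julia
# A hypothetical one-shot guessing game: the agent picks a number in
# 1:10 and receives reward 1.0 if it matches the hidden target.
mutable struct GuessEnv <: AbstractEnv
    target::Int
    guess::Union{Int,Nothing}
end
GuessEnv() = GuessEnv(rand(1:10), nothing)

RLBase.action_space(::GuessEnv) = Base.OneTo(10)
RLBase.state(env::GuessEnv) = isnothing(env.guess) ? 0 : env.guess
RLBase.state_space(::GuessEnv) = 0:10
RLBase.reward(env::GuessEnv) = env.guess == env.target ? 1.0 : 0.0
RLBase.is_terminated(env::GuessEnv) = !isnothing(env.guess)
RLBase.reset!(env::GuessEnv) = (env.target = rand(1:10); env.guess = nothing)

# Acting on the environment is plain function application.
(env::GuessEnv)(action) = (env.guess = action)
```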

How to write an environment wrapper?

Sometimes, you may want to build a new environment on top of an existing one. To write such an environment wrapper, you only need to define your structure as a subtype of AbstractEnvWrapper and store the original environment in the env field. By default, all environment-related APIs defined in RLBase are then forwarded to the inner env, and you only need to override the interfaces you want to change.

The following example defines a wrapper to clip the reward:

struct ClipRewardWrapper{T} <: AbstractEnvWrapper
    env::T
end

RLBase.reward(env::ClipRewardWrapper) = clamp(reward(env.env), -0.1, 0.1)

How to write a customized stop condition?

A stop condition is simply a function that is executed after each interaction with the environment and returns a Bool indicating whether to stop the experiment.

function my_stop_condition(agent, env)::Bool
    # return true to stop the experiment
end

Usually a closure or a callable object is used to store some intermediate data.
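For example, a closure can capture a step counter across calls (the stop_after_n_steps name is illustrative; the package also ships ready-made conditions such as StopAfterStep that work the same way):

```julia
# A closure-based stop condition: the captured `counter` Ref is the
# intermediate data, incremented on every call.
function stop_after_n_steps(n)
    counter = Ref(0)
    function (agent, env)
        counter[] += 1
        counter[] >= n   # true once `n` interactions have happened
    end
end

stop_condition = stop_after_n_steps(10_000)
```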

How to write a customized hook?

In most cases, you don't need to write a customized hook. Some very general hooks are provided so that you can inject runtime logic at the appropriate time.

However, if you do need to write a customized hook, a method must be provided for each stage at which the hook should run.

If your hook is a subtype of AbstractHook, all of these methods have a default implementation that simply returns nothing, so you only need to extend the methods you actually need.
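A sketch of such a hook, assuming the stage-based call convention where hooks are invoked with a stage argument like PostActStage (the StepCounterHook name is illustrative):

```julia
# A hook that counts environment steps. Only the stage we care about
# is extended; all other stages fall back to the default `nothing`.
mutable struct StepCounterHook <: AbstractHook
    steps::Int
end
StepCounterHook() = StepCounterHook(0)

(h::StepCounterHook)(::PostActStage, agent, env) = h.steps += 1
```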

How to use TensorBoard?

This package adopts a non-invasive approach to logging: you can log anything you like with a hook. For example, to log the loss at each step, you can use DoEveryNStep:

DoEveryNStep() do t, agent, env
    with_logger(lg) do  # lg is an existing logger, e.g. a TBLogger from TensorBoardLogger.jl
        @info "training" loss = agent.policy.learner.loss
    end
end

How to evaluate an agent during training?

Well, just like the matryoshka doll, we run an experiment inside an experiment with a hook!

DoEveryNStep(EVALUATION_FREQ) do t, agent, env
    run(agent, env, eval_stop_condition, eval_hook)
end