Chapter 8.6 Trajectory Sampling
The general function
run(policy, env, stop_condition, hook) is very flexible and powerful. However, we are not restricted to it alone. In this notebook, we'll see how to use some of the components provided in
ReinforcementLearning.jl to carry out specific experiments.
First, let's define the environment mentioned in Chapter 8.6:
Note that this environment is not described very clearly in the book; part of the information is inferred from the Lisp source code.
Actually, the Lisp code is not perfect either; I spent a whole afternoon figuring out its logic. So good luck if you also want to understand it.
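For reference, the task in Chapter 8.6 is a randomly generated episodic MDP: each of the n states has two actions, each (state, action) pair leads to b successor states chosen at random, every transition terminates with probability 0.1, and each branch's expected reward is drawn from a standard normal distribution. Below is a minimal, self-contained sketch of that task; the names (BranchTask, step) are my own and do not follow the environment interface of ReinforcementLearning.jl.

```julia
using Random

# A hypothetical, simplified sketch of the Chapter 8.6 task (my own names,
# not the actual environment defined in this notebook):
# successors[k, a, s] is the k-th successor of state s under action a,
# and rewards[k, a, s] is the reward received on that branch.
struct BranchTask
    successors::Array{Int,3}
    rewards::Array{Float64,3}
end

# Randomly generate a task with `n` states and branching factor `b`.
function BranchTask(n::Int, b::Int; rng=Random.GLOBAL_RNG)
    BranchTask(rand(rng, 1:n, b, 2, n), randn(rng, b, 2, n))
end

# Sample one transition; returns (next_state, reward, terminated).
# Every transition terminates with probability 0.1 (0 marks the terminal
# state); otherwise one of the b branches is followed uniformly at random.
function step(task::BranchTask, s::Int, a::Int; rng=Random.GLOBAL_RNG)
    if rand(rng) < 0.1
        return 0, 0.0, true
    end
    k = rand(rng, 1:size(task.successors, 1))
    return task.successors[k, a, s], task.rewards[k, a, s], false
end
```

For example, BranchTask(1000, 3) builds a 1000-state task with branching factor 3, matching one of the configurations compared in the book's figure.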
The definitions above are just like those of any other environment we've defined in previous chapters. Now we'll add an extra function to make it work for our planning purposes.
sweep (generic function with 1 method)
on_policy (generic function with 1 method)
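To make the contrast between the two update distributions concrete, here is a hedged, self-contained sketch of what a uniform sweep and an on-policy update loop can look like for this task (the actual sweep and on_policy functions in this notebook may differ in their details). Both apply the same one-step expected update, Q(s, a) ← 0.9 · mean over branches of (r + maxₐ′ Q(s′, a′)), where the 0.9 factor accounts for the 0.1 termination probability; they differ only in which (state, action) pairs get updated. The task is represented here by plain successors and rewards arrays, with successors[k, a, s] the k-th successor of state s under action a.

```julia
using Random

# One expected update of Q[s, a]; the 0.9 factor is the continuation
# probability (1 - 0.1 chance of termination).
function expected_update!(Q, successors, rewards, s, a)
    b = size(successors, 1)
    total = 0.0
    for k in 1:b
        s′ = successors[k, a, s]
        total += rewards[k, a, s] + maximum(view(Q, s′, :))
    end
    Q[s, a] = 0.9 * total / b
end

# Uniform distribution: cycle through every (state, action) pair in order.
function sweep!(Q, successors, rewards; n_updates=1000)
    n = size(Q, 1)
    pairs = [(s, a) for s in 1:n for a in 1:2]
    for i in 1:n_updates
        s, a = pairs[mod1(i, length(pairs))]
        expected_update!(Q, successors, rewards, s, a)
    end
end

# On-policy distribution: update the pairs encountered while following an
# ε-greedy policy from the start state, restarting on termination.
function on_policy!(Q, successors, rewards; n_updates=1000, ε=0.1,
                    rng=Random.GLOBAL_RNG)
    s = 1
    for _ in 1:n_updates
        a = rand(rng) < ε ? rand(rng, 1:2) : argmax(view(Q, s, :))
        expected_update!(Q, successors, rewards, s, a)
        # Simulate one transition: terminate w.p. 0.1, else follow a
        # uniformly random branch.
        if rand(rng) < 0.1
            s = 1
        else
            s = successors[rand(rng, 1:size(successors, 1)), a, s]
        end
    end
end
```

A typical usage: generate successors = rand(1:n, b, 2, n) and rewards = randn(b, 2, n), initialize Q = zeros(n, 2), run both loops on copies of Q, and periodically evaluate the greedy policy from the start state, as in the book's experiment.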