Chapter 8.6 Trajectory Sampling

The general `run(policy, env, stop_condition, hook)` function is flexible and powerful, but we are not restricted to it. In this notebook, we'll see how to use some of the components provided by ReinforcementLearning.jl to carry out a few specific experiments.

First, let's define the environment mentioned in Chapter 8.6:
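The original code cells are not shown here, so below is a minimal sketch of the task from Chapter 8.6, written in plain Julia without the ReinforcementLearning.jl environment interface. All names (`RandomWalkMDP`, `sample_step`) are illustrative, not the notebook's actual definitions: each of `ns` states has `na` actions, every (state, action) pair branches to `b` randomly chosen successor states with equal probability, each branch has an expected reward drawn from a standard normal, and every transition terminates the episode with probability 0.1.

```julia
using Random

# Illustrative sketch of the randomly generated MDP from Chapter 8.6
# (not the notebook's actual code).
struct RandomWalkMDP
    ns::Int
    b::Int
    successors::Array{Int,3}      # successors[k, a, s] -> k-th possible next state
    rewards::Array{Float64,3}     # rewards[k, a, s]    -> expected reward on that branch
end

function RandomWalkMDP(ns::Int, na::Int, b::Int; rng=Random.default_rng())
    successors = rand(rng, 1:ns, b, na, ns)   # b equiprobable branches per (s, a)
    rewards = randn(rng, b, na, ns)           # expected rewards ~ N(0, 1)
    RandomWalkMDP(ns, b, successors, rewards)
end

# Sample one step; returns (reward, next_state), where next_state == 0
# signals termination (probability 0.1 on every transition).
function sample_step(env::RandomWalkMDP, s::Int, a::Int; rng=Random.default_rng())
    k = rand(rng, 1:env.b)
    r = env.rewards[k, a, s]
    s′ = rand(rng) < 0.1 ? 0 : env.successors[k, a, s]
    (r, s′)
end
```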


Note that this environment is not described very clearly in the book; part of the information is inferred from the Lisp source code.


Admittedly, the Lisp code is not perfect either; it took me a whole afternoon to figure out its logic. So good luck if you also want to understand it.

The definitions above are just like those of the environments we've defined in previous chapters. Now we'll add an extra function to make the environment work for our planning purposes.
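For planning, we need access to the model rather than only sampled transitions: an *expected* (full-backup) update of a single state–action pair. The following is a hedged sketch of what such a helper might look like, operating directly on the arrays from the environment sketch above rather than the notebook's actual types; the 0.9 factor is assumed to account for the 0.1 termination probability on every transition.

```julia
# Illustrative one-pair expected update (assumed form, not the
# notebook's actual helper). With b equiprobable branches per (s, a)
# and a 0.1 chance of termination, the full backup is
#   Q(s, a) = (1/b) * Σₖ [ rₖ + 0.9 * max_a′ Q(s′ₖ, a′) ]
function expected_update!(Q, successors, rewards, s, a)
    b = size(successors, 1)
    Q[a, s] = sum(
        rewards[k, a, s] + 0.9 * maximum(@view Q[:, successors[k, a, s]])
        for k in 1:b
    ) / b
    Q
end
```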

sweep (generic function with 1 method)
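The sweep strategy distributes updates uniformly: it cycles through every state–action pair in order, applying one expected update to each. A minimal sketch under the same assumptions as above (plain arrays, a 0.9 continuation factor; the notebook's actual `sweep` may be structured differently):

```julia
# Illustrative uniform-sweep strategy: visit every (s, a) pair in order,
# applying the full-backup expected update to each (inlined here so the
# sketch stands alone).
function uniform_sweep!(Q, successors, rewards; n_sweeps=1)
    b, na, ns = size(successors)
    for _ in 1:n_sweeps, s in 1:ns, a in 1:na
        Q[a, s] = sum(
            rewards[k, a, s] + 0.9 * maximum(@view Q[:, successors[k, a, s]])
            for k in 1:b
        ) / b
    end
    Q
end
```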
on_policy (generic function with 1 method)
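The on-policy strategy instead distributes updates according to the on-policy distribution: it simulates trajectories from the start state under an ε-greedy policy and applies the expected update to each state–action pair as it is encountered. A sketch under the same assumptions (all names illustrative, not the notebook's):

```julia
using Random

# Illustrative on-policy trajectory sampling: follow an ε-greedy policy
# from the start state, applying a full-backup expected update to each
# visited (s, a) pair; restart the trajectory on termination.
function on_policy_updates!(Q, successors, rewards; n_updates=1000, ϵ=0.1,
                            start=1, rng=Random.default_rng())
    b, na, ns = size(successors)
    s = start
    for _ in 1:n_updates
        a = rand(rng) < ϵ ? rand(rng, 1:na) : argmax(@view Q[:, s])
        # expected update of the pair actually visited
        Q[a, s] = sum(
            rewards[k, a, s] + 0.9 * maximum(@view Q[:, successors[k, a, s]])
            for k in 1:b
        ) / b
        # then sample one step of the trajectory (0.1 termination probability)
        k = rand(rng, 1:b)
        s = rand(rng) < 0.1 ? start : successors[k, a, s]
    end
    Q
end
```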
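What the experiment actually plots is the value of the start state under the greedy policy as planning proceeds. A hedged sketch of that evaluation step, consistent with the array-based sketches above (the notebook's own evaluation code may differ): roll out the greedy policy many times and average the undiscounted returns, with episodes ending with probability 0.1 per step.

```julia
using Random, Statistics

# Illustrative Monte Carlo estimate of the start state's value under
# the greedy policy with respect to Q.
function greedy_start_value(Q, successors, rewards; start=1,
                            n_episodes=1000, rng=Random.default_rng())
    b = size(successors, 1)
    returns = map(1:n_episodes) do _
        s, G = start, 0.0
        while true
            a = argmax(@view Q[:, s])      # greedy action
            k = rand(rng, 1:b)             # sample one of b branches
            G += rewards[k, a, s]
            rand(rng) < 0.1 && break       # 0.1 termination probability
            s = successors[k, a, s]
        end
        G
    end
    mean(returns)
end
```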