
To describe the Grid World in Example 3.5, we'll create a distributional environment model. Here "distributional" means that, given a state-action pair, the model returns every possible next state together with its reward, termination flag, and probability.
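A minimal sketch of such a model follows, written in Python under stated assumptions. The (row, column) coordinates, the action encoding, and the names `model`, `ACTIONS`, `A`, `B` are hypothetical and are not the notebook's actual API. The dynamics follow Example 3.5:

* Every action taken in A jumps to A' with reward +10.
* Every action taken in B jumps to B' with reward +5.
* A move that would leave the grid keeps the agent in place with reward -1.
* All other moves give reward 0.

Grid World is deterministic and has no terminal states, so each call returns a single outcome with probability 1 and `done = False`.

```python
from typing import List, Tuple

# Hypothetical encoding: states are (row, col) with (0, 0) at the top-left.
GRID_SIZE = 5
# Actions 0..3 = north, south, east, west, as (row, col) offsets.
ACTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1)]
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)

# One possible outcome: (next_state, reward, done, probability)
Outcome = Tuple[Tuple[int, int], float, bool, float]


def model(state: Tuple[int, int], action: int) -> List[Outcome]:
    """Return every possible outcome of taking `action` in `state`."""
    if state == A:  # any action in A teleports to A' with reward +10
        return [(A_PRIME, 10.0, False, 1.0)]
    if state == B:  # any action in B teleports to B' with reward +5
        return [(B_PRIME, 5.0, False, 1.0)]
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < GRID_SIZE and 0 <= c < GRID_SIZE:
        return [((r, c), 0.0, False, 1.0)]
    # Moving off the grid leaves the agent where it is, with reward -1
    return [(state, -1.0, False, 1.0)]
```

Returning a list of outcomes, rather than a single sampled one, lets planning methods take exact expectations. Those methods need the full distribution, not samples.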

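The state-value function of the equiprobable random policy, displayed as a 5×5 table that mirrors the grid layout: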
5×5 Array{Float64,2}:
  3.30943     8.78964    4.42795    5.32267    1.49249
  1.52199     2.99266    2.25045    1.90786    0.54769
  0.0512075   0.738502   0.673411   0.358465  -0.40287
 -0.973217   -0.435172  -0.354592  -0.585334  -1.18281
 -1.85733    -1.34491   -1.22898   -1.42265   -1.97492