Allen's REINFORCE notes

=== Motivation ===
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as
<math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right] </math>
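
To make the objective concrete, here is a minimal Python sketch that estimates the expectation by averaging the summed rewards over sampled trajectories, then takes the argmax over a few candidate values of <math>\theta</math>. Everything about the setup is an assumption for illustration and is not from these notes: a single-state toy problem with two actions, a scalar <math>\theta</math> with a sigmoid policy instead of a neural network, and the hypothetical helper names <code>sample_return</code> and <code>estimate_objective</code>.

```python
import math
import random


def sample_return(theta, horizon, rng):
    """Sample one trajectory tau ~ p_theta(tau) and return sum_t r(s_t, a_t).

    Toy assumption: one state, two actions, and a sigmoid policy
    pi_theta(a=1) = 1 / (1 + exp(-theta)). Action 1 pays reward 1,
    action 0 pays reward 0.
    """
    p_action1 = 1.0 / (1.0 + math.exp(-theta))
    total = 0.0
    for _ in range(horizon):
        action = 1 if rng.random() < p_action1 else 0
        total += float(action)  # r(s_t, a_t) equals the action in this toy problem
    return total


def estimate_objective(theta, num_trajectories=2000, horizon=5, seed=0):
    """Monte Carlo estimate of E_{tau ~ p_theta(tau)}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    returns = [sample_return(theta, horizon, rng) for _ in range(num_trajectories)]
    return sum(returns) / num_trajectories


# Brute-force argmax over a small, hand-picked set of candidate parameters.
candidates = [-2.0, 0.0, 2.0]
estimates = {theta: estimate_objective(theta) for theta in candidates}
best_theta = max(estimates, key=estimates.get)
print(estimates, best_theta)
```

A grid search like this only works because <math>\theta</math> is a single number here. With neural network parameters the argmax cannot be enumerated, so the objective has to be optimized some other way.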