Changes

Allen's REINFORCE notes

325 bytes added, 21:41, 24 May 2024

→‎Motivation

[[Category:Reinforcement Learning]]

=== Motivation ===
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as <math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>.
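
The expectation over trajectories <math>\tau \sim p_\theta(\tau)</math> generally cannot be computed exactly, so in practice it is approximated by averaging the summed rewards of sampled trajectories. The following is a minimal sketch of that Monte Carlo estimate, not part of the original notes. It assumes a hypothetical toy problem (a single state, two actions, reward 1 for action 1 and 0 for action 0) and a softmax policy whose logits are the parameters <math>\theta</math>. The names <code>sample_return</code> and <code>estimate_objective</code> are illustrative only.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_return(theta, horizon, rng):
    """Sample one trajectory and return sum_t r(s_t, a_t).

    Assumed toy setup: one state, two actions,
    reward 1.0 for action 1 and 0.0 for action 0.
    """
    probs = softmax(theta)
    total_reward = 0.0
    for _ in range(horizon):
        action = 0 if rng.random() < probs[0] else 1
        total_reward += 1.0 if action == 1 else 0.0
    return total_reward


def estimate_objective(theta, horizon=10, num_trajectories=2000, seed=0):
    """Monte Carlo estimate of E_{tau ~ p_theta}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    returns = [sample_return(theta, horizon, rng) for _ in range(num_trajectories)]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Parameters that favour the rewarded action should score higher.
    print(estimate_objective([0.0, 0.0]))
    print(estimate_objective([0.0, 5.0]))
```

Under these assumptions, parameters that put more probability on the rewarded action yield a larger estimate, which is the quantity the <math>\text{argmax}</math> over <math>\theta</math> seeks to maximize.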

=== Learning ===

Allen12, 53 edits
