Changes

Allen's REINFORCE notes

342 bytes added, 23:08, 25 May 2024


</syntaxhighlight>

=== Objective Function ===

The goal of reinforcement learning is to maximize the expected reward over the entire episode. We use <math>R(\tau)</math> to denote the total reward accumulated along a trajectory <math>\tau</math> sampled by following our policy <math>\pi_\theta</math>. Thus we want to maximize <math>E_{\tau \sim \pi_\theta}[R(\tau)]</math>.
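As a rough illustration of this objective, the expectation can be approximated by sampling many trajectories and averaging their total rewards. The sketch below assumes a made-up toy environment (not from these notes) in which the policy chooses "right" with probability <code>p_right</code> at every step and earns a reward of 1 each time it does; the names <code>sample_trajectory</code> and <code>estimate_expected_reward</code> are hypothetical.

```python
import random


def total_reward(rewards):
    """R(tau): the sum of per-step rewards along one trajectory."""
    return sum(rewards)


def sample_trajectory(p_right, horizon, rng):
    """Hypothetical toy environment (an assumption for illustration).

    At each step the policy picks 'right' with probability p_right and
    receives reward 1.0 for it, otherwise 0.0.
    """
    return [1.0 if rng.random() < p_right else 0.0 for _ in range(horizon)]


def estimate_expected_reward(p_right, horizon, num_episodes, seed=0):
    """Monte Carlo estimate of E_{tau ~ pi}[R(tau)]: average R over samples."""
    rng = random.Random(seed)
    returns = [
        total_reward(sample_trajectory(p_right, horizon, rng))
        for _ in range(num_episodes)
    ]
    return sum(returns) / num_episodes
```

In this toy setting the true expected reward is <code>horizon * p_right</code>, so the estimate should approach that value as the number of sampled episodes grows.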

=== Loss Function ===

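The goal of REINFORCE is to maximize the expected cumulative reward <math>E_{\tau \sim \pi_\theta}[R(\tau)]</math>. Because a loss is conventionally minimized, we do so by running gradient descent on the negative of this objective, which is equivalent to gradient ascent on the expected reward.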
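One common way to turn this into a loss is the surrogate <math>L(\theta) = -\frac{1}{N}\sum_{i} R(\tau_i) \sum_{t} \log \pi_\theta(a_t | s_t)</math>, whose gradient is a sample estimate of the negative policy gradient. The sketch below is only an illustration under stated assumptions: it uses a hypothetical one-parameter Bernoulli policy (probability of action 1 is <code>sigmoid(theta)</code>), a made-up reward of 1 per "right" action, no baseline, and a hand-derived gradient instead of an autodiff library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_loss_and_grad(theta, episodes):
    """Surrogate loss -mean(R * sum_t log pi(a_t)) and its gradient.

    episodes: list of (actions, total_reward), with actions in {0, 1}.
    The policy is a hypothetical Bernoulli with P(a=1) = sigmoid(theta).
    """
    p = sigmoid(theta)
    loss, grad = 0.0, 0.0
    for actions, ret in episodes:
        log_prob = sum(math.log(p) if a == 1 else math.log(1.0 - p)
                       for a in actions)
        # For a sigmoid-parameterized Bernoulli, d/dtheta log pi(a) = a - p.
        dlog_prob = sum(a - p for a in actions)
        loss -= ret * log_prob
        grad -= ret * dlog_prob
    n = len(episodes)
    return loss / n, grad / n


def train(theta=0.0, steps=200, lr=0.05, horizon=5, batch=16, seed=0):
    """Gradient descent on the surrogate loss in the toy setting."""
    rng = random.Random(seed)
    for _ in range(steps):
        p = sigmoid(theta)
        episodes = []
        for _ in range(batch):
            actions = [1 if rng.random() < p else 0 for _ in range(horizon)]
            # Assumed reward: 1.0 for every 'right' (1) action taken.
            episodes.append((actions, float(sum(actions))))
        _, grad = reinforce_loss_and_grad(theta, episodes)
        theta -= lr * grad  # descending the loss ascends the expected reward
    return theta
```

Starting from <code>theta = 0</code> (a 50/50 policy), repeated descent steps should push the policy toward the rewarded action. In practice a baseline is usually subtracted from the return to reduce variance, which this sketch omits.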

Allen12


Humanoid Robots Wiki β
