Humanoid Robots Wiki β

Allen's REINFORCE notes

342 bytes added, 25 May
</syntaxhighlight>
 
=== Objective Function ===
 
The goal of reinforcement learning is to maximize the expected total reward over an entire episode. We use <math>R(\tau)</math> to denote the total reward along a trajectory <math>\tau</math> generated by following our policy <math>\pi_\theta</math>. Thus we want to maximize <math>E_{\tau \sim \pi_\theta}[R(\tau)]</math>.
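
The expectation in this objective can be approximated by sampling trajectories and averaging their total rewards. The sketch below is only an illustration under assumed conditions: the toy environment (reward 1 whenever action 1 is chosen, 0 otherwise), the fixed action probability <code>p_action_1</code>, and all function names are hypothetical and do not come from these notes.

<syntaxhighlight lang="python">
import random


def sample_trajectory(p_action_1, horizon, rng):
    """Roll out one episode of a toy task and return its per-step rewards.

    Assumed toy setup: at every step the policy picks action 1 with
    probability p_action_1, and action 1 pays reward 1.0 (action 0 pays 0.0).
    """
    rewards = []
    for _ in range(horizon):
        action = 1 if rng.random() < p_action_1 else 0
        rewards.append(1.0 if action == 1 else 0.0)
    return rewards


def total_reward(rewards):
    """R(tau): the sum of rewards collected along one trajectory."""
    return sum(rewards)


def estimate_expected_return(p_action_1, horizon, num_episodes, seed=0):
    """Monte Carlo estimate of E[R(tau)]: average R(tau) over sampled episodes."""
    rng = random.Random(seed)
    returns = [
        total_reward(sample_trajectory(p_action_1, horizon, rng))
        for _ in range(num_episodes)
    ]
    return sum(returns) / num_episodes
</syntaxhighlight>

In this toy setup the exact expected return is <code>horizon * p_action_1</code>, so <code>estimate_expected_return(0.7, 10, 5000)</code> should land close to 7.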
=== Loss Function ===
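The goal of REINFORCE is to maximize the expected cumulative reward. We do so using gradient descent on the negative of this objective, which is equivalent to gradient ascent on the expected reward itself.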
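
As a rough illustration of how gradient descent can be applied here, one commonly used surrogate loss is <math>L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} R(\tau_i) \sum_t \log \pi_\theta(a_t \mid s_t)</math>, whose gradient is a sample estimate of the negative gradient of the expected reward. The sketch below assumes a hypothetical one-parameter policy (action 1 is chosen with probability <code>sigmoid(theta)</code>) and a toy reward of 1 per step for action 1; none of these names or hyperparameters come from these notes, and no baseline or discounting is used.

<syntaxhighlight lang="python">
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_episode(theta, horizon, rng):
    """Sample one episode; return (actions, rewards) for the assumed toy task."""
    p = sigmoid(theta)
    actions = [1 if rng.random() < p else 0 for _ in range(horizon)]
    rewards = [float(a) for a in actions]  # action 1 pays 1.0, action 0 pays 0.0
    return actions, rewards


def loss_gradient(theta, episodes):
    """Gradient of the surrogate loss -mean_i[R(tau_i) * sum_t log pi(a_t)].

    For a Bernoulli policy with p = sigmoid(theta),
    d/dtheta log pi(a) = a - p.
    """
    p = sigmoid(theta)
    grad = 0.0
    for actions, rewards in episodes:
        episode_return = sum(rewards)
        score = sum(a - p for a in actions)
        grad += -episode_return * score
    return grad / len(episodes)


def train(num_updates=200, batch_size=16, horizon=5, learning_rate=0.05, seed=0):
    """Plain gradient descent on the surrogate loss; returns the final theta."""
    rng = random.Random(seed)
    theta = 0.0  # starts with a 50/50 policy
    for _ in range(num_updates):
        episodes = [run_episode(theta, horizon, rng) for _ in range(batch_size)]
        theta -= learning_rate * loss_gradient(theta, episodes)
    return theta
</syntaxhighlight>

Calling <code>sigmoid(train())</code> gives the learned probability of choosing the rewarded action, which should rise well above the initial 0.5 as the loss is minimized.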