Allen's REINFORCE notes

<syntaxhighlight lang="text">
Initialize a neural network whose input dimensions = observation dimensions and output dimensions = action dimensions
For each episode:
    While not terminated:
        Get an observation from the environment
        Pass the observation through the network to get action probabilities and sample an action
        Step the environment with that action and store the reward (and the action's probability)
    Calculate the loss over the entire trajectory as a function of the stored probabilities and rewards
    Recall that the loss is differentiable with respect to each network parameter - thus, compute how changes in the parameters change the loss (the gradient)
    Use gradient descent on this loss to update the network weights
 
</syntaxhighlight>
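
Below is a minimal concrete sketch of the loop above, written with PyTorch and a Gymnasium environment. The choice of CartPole-v1, the hidden-layer size, the learning rate, and the discount factor are illustrative assumptions rather than part of these notes; the structure itself (sample actions from the network's output distribution, collect rewards, then take one gradient step on a log-probability-weighted loss) follows the pseudocode.

<syntaxhighlight lang="python">
# Illustrative REINFORCE sketch. PyTorch, Gymnasium, CartPole-v1, and all
# hyperparameters below are assumptions chosen for the example.
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.n

# Policy network: input dims = observation dims, output dims = action dims (logits)
policy = nn.Sequential(
    nn.Linear(obs_dim, 64),
    nn.Tanh(),
    nn.Linear(64, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99  # discount factor (illustrative)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    terminated = truncated = False
    while not (terminated or truncated):
        # Get the action distribution from the policy and sample an action
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        # Step the environment with that action and store the reward
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)

    # Discounted return-to-go for each timestep of the trajectory
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)

    # Trajectory loss: negative log-probabilities weighted by returns, summed
    loss = -(torch.stack(log_probs) * returns).sum()

    # Backpropagate and take one gradient descent step on the policy weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
</syntaxhighlight>

The exact form of the trajectory loss used in the gradient step is the subject of the next section.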
=== Loss Function ===