Allen's REINFORCE notes
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi_\theta</math>, which we encode in a neural network with parameters <math>\theta</math>. <math>\pi_\theta</math> is a mapping from observations to actions. The optimal parameters are defined as <math>\theta^* = \arg\max_{\theta} \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t} r(s_t, a_t)\right]</math>. Let's unpack what this means. In plain English: the optimal policy is the one for which the expected total reward along a trajectory (<math>\tau</math>) determined by the policy is highest over all policies.
Overview
Initialize neural network with input dimensions = observation dimensions and output dimensions = action dimensions
For each episode:
While not terminated:
Get observation from environment
Use policy network to map observation to action distribution
Randomly sample one action from action distribution
Compute the log probability of that action occurring
Step environment using action and store reward
Calculate loss over entire trajectory as function of probabilities and rewards
Recall that the loss is differentiable with respect to each parameter - thus, compute the gradient of the loss with respect to the parameters
Using that gradient, take a gradient descent step to update the weights (see the sketch below)
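Here is a minimal sketch of this loop in PyTorch; the Gymnasium environment "CartPole-v1", the network size, learning rate, and episode count are illustrative assumptions rather than anything prescribed by these notes:

<syntaxhighlight lang="python">
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]  # input dims = observation dims
act_dim = env.action_space.n              # output dims = action dims

# Policy network mapping observations to action logits
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    terminated = truncated = False
    while not (terminated or truncated):
        # Map observation to an action distribution
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                   # randomly sample one action
        log_probs.append(dist.log_prob(action))  # log probability of that action
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)                   # step environment, store reward
    # Loss over the whole trajectory: -R(tau) * sum_t log pi_theta(a_t | s_t)
    loss = -sum(rewards) * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()   # gradient of the loss w.r.t. every parameter
    optimizer.step()  # gradient descent update of the weights
</syntaxhighlight>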
Objective Function
The goal of reinforcement learning is to maximize the expected reward over the entire episode. We use <math>R(\tau)</math> to denote the total reward over some trajectory <math>\tau</math> defined by our policy. Thus we want to maximize <math>J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[R(\tau)\right]</math>. We can use the definition of expected value to expand this as <math>J(\theta) = \int \pi_\theta(\tau) R(\tau)\, d\tau</math>, where the probability of a given trajectory occurring can further be expressed as <math>\pi_\theta(\tau) = p(s_1)\prod_{t=1}^{T}\pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)</math>.
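As a quick numeric illustration of this factorization (the per-step probabilities below are made-up values, not from any real environment), note that taking logs turns the product into a sum of per-step terms, which is what the implementation works with:

<syntaxhighlight lang="python">
import math

# pi_theta(tau) = p(s_1) * prod_t pi_theta(a_t | s_t) * p(s_{t+1} | s_t, a_t)
p_s1 = 0.5                  # p(s_1): probability of the initial state
pi_a = [0.9, 0.7, 0.8]      # pi_theta(a_t | s_t) at each step
p_dyn = [0.6, 0.5, 0.4]     # p(s_{t+1} | s_t, a_t) at each step

p_tau = p_s1
for pa, pd in zip(pi_a, p_dyn):
    p_tau *= pa * pd
print(p_tau)                # ~0.03024

# log pi_theta(tau) decomposes into a sum of per-step log terms
log_p_tau = math.log(p_s1) + sum(math.log(pa) + math.log(pd)
                                 for pa, pd in zip(pi_a, p_dyn))
print(math.isclose(math.log(p_tau), log_p_tau))  # True
</syntaxhighlight>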
Now we want to find the gradient of <math>J(\theta)</math>, namely <math>\nabla_\theta J(\theta)</math>. The important step here is called the Log Derivative Trick.
Log Derivative Trick
Suppose we'd like to find <math>\nabla_{x_1}\log(f(x_1, x_2, x_3, \ldots))</math>. By the chain rule this is equal to <math>\frac{\nabla_{x_1}f(x_1, x_2, x_3, \ldots)}{f(x_1, x_2, x_3, \ldots)}</math>. Thus, by rearranging, we can take the gradient of any function with respect to some variable as <math>\nabla_{x_1}f(x_1, x_2, x_3, \ldots) = f(x_1, x_2, x_3, \ldots)\,\nabla_{x_1}\log(f(x_1, x_2, x_3, \ldots))</math>.
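A quick way to sanity-check this identity numerically is with an autograd library; the sketch below uses PyTorch, and the particular positive function <math>f</math> is an arbitrary example chosen for illustration:

<syntaxhighlight lang="python">
import torch

x1 = torch.tensor(2.0, requires_grad=True)
x2 = torch.tensor(3.0)

# An arbitrary positive, differentiable f(x1, x2)
f = x1 ** 2 * x2 + 1.0                          # f = 13 at (2, 3)
(grad_f,) = torch.autograd.grad(f, x1)          # direct gradient: 2 * x1 * x2 = 12

log_f = torch.log(x1 ** 2 * x2 + 1.0)
(grad_log_f,) = torch.autograd.grad(log_f, x1)  # gradient of log f: 12 / 13

# The identity: grad f = f * grad log f
print(torch.isclose(grad_f, f.detach() * grad_log_f))  # tensor(True)
</syntaxhighlight>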
Loss Function
The goal of REINFORCE is to maximize the expected cumulative reward. We do so by running gradient descent on the surrogate loss <math>L(\theta) = -R(\tau)\sum_t \log \pi_\theta(a_t \mid s_t)</math>, built from the log probabilities and rewards stored over one trajectory.
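To spell out why this loss is the right one (a sketch of the standard derivation, making explicit the step these notes set up above): applying the log derivative trick with <math>\pi_\theta(\tau)</math> in the role of <math>f</math> gives

<math>\nabla_\theta J(\theta) = \int \nabla_\theta \pi_\theta(\tau)\, R(\tau)\, d\tau = \int \pi_\theta(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\, d\tau = \mathbb{E}_{\tau \sim \pi_\theta}\left[\nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\right]</math>,

and since the initial-state and dynamics factors of <math>\pi_\theta(\tau)</math> do not depend on <math>\theta</math>, <math>\nabla_\theta \log \pi_\theta(\tau) = \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)</math>. Minimizing <math>L(\theta)</math> by gradient descent therefore follows a sample estimate of the policy gradient.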