Difference between revisions of "Allen's REINFORCE notes"
(→Motivation) |
|||
Line 7: | Line 7: | ||
[[Category:Reinforcement Learning]] | [[Category:Reinforcement Learning]] | ||
− | === Motivation === | + | === Motivation === |
+ | |||
+ | Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math> which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as | ||
+ | <math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right] </math> | ||
+ | |||
=== Learning === | === Learning === | ||
Revision as of 21:41, 24 May 2024
Allen's REINFORCE notes
Links
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as
<math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>
Learning
Learning involves the agent taking actions and the environment returning a new state and reward.
- Input: <math>s_t</math>: States at each time step
- Output: <math>a_t</math>: Actions at each time step
- Data:
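- Learn to maximize the expected total reward <math>E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>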
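The expectation in the objective can be approximated by sampling trajectories and averaging their summed rewards. The sketch below is only an illustration under stated assumptions: the toy environment `step`, the one-parameter `policy`, and the helpers `rollout` and `estimate_objective` are hypothetical names that do not come from these notes, and a single scalar `theta` stands in for the neural network parameters.

```python
import random


def step(state, action):
    """Hypothetical toy environment: a 1-D walk that pays 1.0 for moving right."""
    next_state = state + action
    reward = 1.0 if action == 1 else 0.0
    return next_state, reward


def policy(state, theta, rng):
    """Hypothetical stochastic policy: pick +1 with probability theta, else -1.

    The state is ignored here to keep the example tiny.
    """
    return 1 if rng.random() < theta else -1


def rollout(theta, horizon, rng):
    """Sample one trajectory tau and return it with sum_t r(s_t, a_t)."""
    state, total, trajectory = 0, 0.0, []
    for _ in range(horizon):
        action = policy(state, theta, rng)          # agent takes an action
        next_state, reward = step(state, action)    # environment returns state and reward
        trajectory.append((state, action, reward))
        total += reward
        state = next_state
    return trajectory, total


def estimate_objective(theta, horizon=10, num_trajectories=1000, seed=0):
    """Monte Carlo estimate of E_{tau ~ p_theta}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    returns = [rollout(theta, horizon, rng)[1] for _ in range(num_trajectories)]
    return sum(returns) / len(returns)
```

Calling `estimate_objective(theta)` for several values of `theta` and keeping the best one is a crude stand-in for the argmax over parameters. It is a brute-force search, not the REINFORCE gradient update.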