Allen's REINFORCE notes
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as

<math>\theta^* = \operatorname{argmax}_\theta \, E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>

where the expectation is over trajectories <math>\tau = (s_1, a_1, \ldots, s_T, a_T)</math> drawn from the distribution <math>p_\theta(\tau)</math> induced by the policy.
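To make the objective concrete, here is a minimal sketch of estimating <math>E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math> by Monte Carlo, i.e. averaging total reward over sampled trajectories. The helper <code>sample_trajectory</code> is hypothetical (the notes do not define one); it stands in for rolling out <math>\pi_\theta</math> in the environment and returning the per-step rewards.

<syntaxhighlight lang="python">
import numpy as np

def estimate_objective(sample_trajectory, num_samples=100):
    """Monte Carlo estimate of J(theta) = E_{tau ~ p_theta(tau)}[ sum_t r(s_t, a_t) ]."""
    # Each call to sample_trajectory() plays out one episode under pi_theta
    # and returns the list of rewards r(s_t, a_t) collected along the way.
    returns = [sum(sample_trajectory()) for _ in range(num_samples)]
    return float(np.mean(returns))

# Dummy stand-in sampler: a "trajectory" of 10 random per-step rewards.
dummy_sampler = lambda: list(np.random.rand(10))
print(estimate_objective(dummy_sampler))
</syntaxhighlight>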
Learning

Learning involves the agent taking actions and the environment returning a new state and reward; a sketch of this interaction loop is given after the list below.

* Input: <math>s_t</math>: states at each time step
* Output: <math>a_t</math>: actions at each time step
* Data: <math>(s_1, a_1, r_1, \ldots, s_T, a_T, r_T)</math>: trajectories collected by running the policy
* Learn <math>\pi_\theta(a_t \mid s_t)</math> to maximize <math>E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>
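As promised above, a minimal sketch of this interaction loop that collects one trajectory. It assumes a Gymnasium-style environment API (an illustrative choice; the notes do not name an environment or library), and a uniformly random action stands in for sampling from <math>\pi_\theta(a_t \mid s_t)</math>.

<syntaxhighlight lang="python">
import gymnasium as gym

env = gym.make("CartPole-v1")  # any episodic environment works here

states, actions, rewards = [], [], []
s, _ = env.reset()
done = False
while not done:
    a = env.action_space.sample()  # stand-in for a_t ~ pi_theta(a_t | s_t)
    s_next, r, terminated, truncated, _ = env.step(a)
    states.append(s)
    actions.append(a)
    rewards.append(r)
    s = s_next
    done = terminated or truncated

# One trajectory tau = (s_1, a_1, r_1, ..., s_T, a_T, r_T) and its return.
total_return = sum(rewards)
</syntaxhighlight>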