Difference between revisions of "Allen's REINFORCE notes"
Revision as of 21:42, 24 May 2024
Allen's REINFORCE notes
Links
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as
<math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>
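Here <math>\tau = (s_1, a_1, \ldots, s_T, a_T)</math> denotes a trajectory. As a sketch, assuming the standard Markov decision process setting (an assumption, since the notes do not spell it out at this point), the trajectory distribution induced by a policy <math>\pi_\theta</math> factorizes as
<math>p_\theta(\tau) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t) \, p(s_{t+1} \mid s_t, a_t)</math>
so the expectation averages the total reward over trajectories generated by running the policy in the environment.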
Learning
Learning involves the agent taking actions and the environment returning a new state and reward.
- Input: <math>s_t</math>: States at each time step
- Output: <math>a_t</math>: Actions at each time step
- Data:
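- Learn to maximize the expected total reward <math>E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>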
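The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <code>ToyEnv</code>, <code>policy</code>, <code>rollout</code>, <code>estimate_objective</code> and the single policy parameter <code>p_right</code> are hypothetical names invented for this example, and the one-parameter policy stands in for the neural network. The agent picks an action, the environment returns a new state and a reward, and averaging the total reward over many episodes gives a Monte Carlo estimate of the quantity to maximize.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment: the state is an integer position."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); moving right earns reward 1
        self.state += action
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return self.state, reward, done


def policy(state, p_right, rng):
    """Stochastic policy with one parameter: the probability of moving right."""
    return 1 if rng.random() < p_right else -1


def rollout(env, p_right, rng):
    """Run one episode and return the data as a list of (s_t, a_t, r_t) tuples."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, p_right, rng)           # agent acts
        next_state, reward, done = env.step(action)    # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def estimate_objective(env, p_right, n_episodes, rng):
    """Monte Carlo estimate of the expected total reward per episode."""
    total = 0.0
    for _ in range(n_episodes):
        total += sum(r for _, _, r in rollout(env, p_right, rng))
    return total / n_episodes
```

In this toy setting the expected total reward is <code>horizon * p_right</code>, so the estimate grows as <code>p_right</code> approaches 1. That is the sense in which learning adjusts the policy parameters to maximize the objective.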