Difference between revisions of "Allen's REINFORCE notes"
Revision as of 21:43, 24 May 2024
Allen's REINFORCE notes
Links
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as
<math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>
where <math>\tau</math> is a trajectory of states and actions, and <math>p_\theta(\tau)</math> is the probability of that trajectory when actions are drawn from <math>\pi_\theta</math>.
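In general the expectation in this objective cannot be computed in closed form, so it is estimated by averaging the total reward of sampled trajectories. The Python sketch below illustrates that Monte Carlo estimate on a hypothetical toy problem that is not part of these notes: the state is drawn uniformly from {0, 1}, the reward is 1 when the action matches the state, and a single scalar θ parameterizes the policy. For this toy setup the exact objective is T·sigmoid(θ), which makes the estimate easy to sanity-check. All function names are made up for illustration, and the final grid search over a few θ values only illustrates the argmax; it is not how REINFORCE searches for θ*.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def policy_prob_action1(theta: float, state: int) -> float:
    # Hypothetical toy policy pi_theta(a=1 | s): a single scalar parameter
    # controls how strongly the chosen action tracks the state.
    return sigmoid(theta * (2 * state - 1))


def sample_trajectory_return(theta: float, horizon: int, rng: random.Random) -> float:
    """Sample one trajectory tau ~ p_theta(tau) and return sum_t r(s_t, a_t)."""
    total = 0.0
    for _ in range(horizon):
        s = rng.randint(0, 1)  # toy environment: state drawn uniformly from {0, 1}
        a = 1 if rng.random() < policy_prob_action1(theta, s) else 0
        total += 1.0 if a == s else 0.0  # toy reward r(s, a): 1 when action matches state
    return total


def estimate_objective(theta: float, horizon: int = 10, n_traj: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_{tau ~ p_theta(tau)}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    returns = [sample_trajectory_return(theta, horizon, rng) for _ in range(n_traj)]
    return sum(returns) / n_traj


if __name__ == "__main__":
    # Crude grid search standing in for argmax_theta (illustration only; REINFORCE
    # instead follows a gradient estimate of this objective).
    candidates = [-2.0, 0.0, 2.0]
    scores = {th: estimate_objective(th) for th in candidates}
    best = max(scores, key=scores.get)
    print(scores, "best theta:", best)
```

Averaging over more trajectories (a larger `n_traj`) reduces the variance of the estimate, at the cost of more environment interaction.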
Learning
Learning involves the agent taking actions and the environment returning a new state and reward.
- Input: <math>s_t</math>: States at each time step
- Output: <math>a_t</math>: Actions at each time step
- Data: <math>(s_1, a_1, r_1, \dots, s_T, a_T, r_T)</math>
- Learn <math>\pi_\theta : s_t \rightarrow a_t</math> to maximize <math>\sum_t r_t</math>
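The interaction loop above can be sketched as follows, assuming a hypothetical toy environment with `reset()` and `step(action)` methods (a common convention, not something these notes specify). `ChainEnv` is a short chain where action 1 moves right, action 0 moves left, and the episode ends with reward 1 on reaching the goal. `rollout` collects the data <math>(s_1, a_1, r_1, \dots, s_T, a_T, r_T)</math> as a list of <math>(s_t, a_t, r_t)</math> tuples, and `total_reward` computes <math>\sum_t r_t</math>. The policy here ignores the state and is not a neural network; it only stands in for <math>\pi_\theta</math>.

```python
import math
import random
from typing import List, Tuple

Transition = Tuple[int, int, float]  # (s_t, a_t, r_t)


class ChainEnv:
    """Hypothetical toy environment: positions 0..length, start at 0, goal at `length`.

    Action 1 moves right, action 0 moves left (clamped at 0). The reward is 1.0
    on the step that reaches the goal and 0.0 otherwise.
    """

    def __init__(self, length: int = 3):
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        if action == 1:
            self.pos = min(self.length, self.pos + 1)
        else:
            self.pos = max(0, self.pos - 1)
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done


def policy(theta: float, state: int, rng: random.Random) -> int:
    # Hypothetical stochastic policy pi_theta(a | s) with P(a = 1) = sigmoid(theta).
    # It ignores `state` only to keep the example small.
    p_right = 1.0 / (1.0 + math.exp(-theta))
    return 1 if rng.random() < p_right else 0


def rollout(env: ChainEnv, theta: float, max_steps: int, rng: random.Random) -> List[Transition]:
    """Run one episode and collect (s_1, a_1, r_1, ..., s_T, a_T, r_T)."""
    trajectory: List[Transition] = []
    s = env.reset()
    for _ in range(max_steps):
        a = policy(theta, s, rng)
        s_next, r, done = env.step(a)
        trajectory.append((s, a, r))  # r_t is the reward for taking a_t in s_t
        s = s_next
        if done:
            break
    return trajectory


def total_reward(trajectory: List[Transition]) -> float:
    """The quantity sum_t r_t that learning tries to maximize (in expectation)."""
    return sum(r for _, _, r in trajectory)


if __name__ == "__main__":
    rng = random.Random(0)
    traj = rollout(ChainEnv(length=3), theta=1.0, max_steps=20, rng=rng)
    print(traj)
    print("sum of rewards:", total_reward(traj))
```

With a very large θ the policy effectively always moves right, so the episode reaches the goal in three steps and the sum of rewards is 1; with a very negative θ it never leaves the start state and collects no reward before `max_steps` runs out.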