Allen's REINFORCE notes
Links
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as <math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>
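The expectation in this objective is usually not available in closed form, so one common approach is to approximate it by averaging the total reward over sampled trajectories. The sketch below illustrates that idea under loud assumptions that are not part of these notes: the one-parameter sigmoid policy, the stateless toy environment (action 1 pays reward 1, action 0 pays 0), and the names `policy_prob_one`, `sample_return`, and `estimate_objective` are all hypothetical.

```python
import math
import random


def policy_prob_one(theta: float) -> float:
    """Probability of choosing action 1 under a one-parameter sigmoid policy (assumed)."""
    return 1.0 / (1.0 + math.exp(-theta))


def sample_return(theta: float, horizon: int, rng: random.Random) -> float:
    """Sample one trajectory tau ~ p_theta(tau) and return its total reward sum_t r(s_t, a_t).

    Toy environment (hypothetical): there is no meaningful state,
    action 1 pays reward 1 and action 0 pays reward 0.
    """
    total = 0.0
    for _ in range(horizon):
        action = 1 if rng.random() < policy_prob_one(theta) else 0
        total += float(action)
    return total


def estimate_objective(theta: float, horizon: int = 10,
                       num_trajectories: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_{tau ~ p_theta(tau)}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    returns = [sample_return(theta, horizon, rng) for _ in range(num_trajectories)]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Larger theta favors the rewarded action, so the estimate should grow with theta.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, estimate_objective(theta))
```

In this toy setting the estimate increases with <math>\theta</math>, which is the sense in which <math>\theta^*</math> is the argmax of the expected total reward.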
Learning
Learning involves the agent taking actions and the environment returning a new state and reward.
- Input: <math>s_t</math>: States at each time step
- Output: <math>a_t</math>: Actions at each time step
- Data:
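- Learn to maximize the expected total reward <math>E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>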
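The interaction described above (the agent takes an action, the environment returns a new state and a reward) can be sketched as a simple loop. This is only an illustration: the `ToyChainEnv` class, its `reset`/`step` interface, the random policy, and the choice to record each step as a `(state, action, reward)` tuple are assumptions made for the example, not something these notes specify.

```python
import random
from typing import Callable, List, Tuple

Step = Tuple[int, int, float]  # (state, action, reward), an assumed record format


class ToyChainEnv:
    """Hypothetical environment: positions 0..length-1 on a line.

    Action 1 moves right, action 0 moves left (clipped at position 0).
    Reaching the last position pays reward 1 and ends the episode.
    """

    def __init__(self, length: int = 5, max_steps: int = 20) -> None:
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def random_policy(state: int, rng: random.Random) -> int:
    """Placeholder policy that ignores the state and picks an action uniformly."""
    return rng.randint(0, 1)


def run_episode(env: ToyChainEnv,
                policy: Callable[[int, random.Random], int],
                rng: random.Random) -> List[Step]:
    """Run one episode and return the visited (state, action, reward) steps."""
    trajectory: List[Step] = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, rng)                  # output: action at this time step
        next_state, reward, done = env.step(action)  # environment returns new state and reward
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


if __name__ == "__main__":
    episode = run_episode(ToyChainEnv(), random_policy, random.Random(0))
    print(len(episode), sum(r for _, _, r in episode))
```

Summing the rewards of one such episode gives a single sample of the quantity the agent learns to maximize.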