Allen's REINFORCE notes
Links
Motivation
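Recall that the objective of Reinforcement Learning is to find an optimal policy $\pi^*$, which we encode in a neural network with parameters $\theta^*$. These optimal parameters are defined as
$$\theta^* = \arg\max_\theta \; \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=1}^{T} r_t\right]$$
where $\tau = (s_1, a_1, r_1, \dots, s_T, a_T, r_T)$ is a trajectory collected by running the policy $\pi_\theta$.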
Learning
Learning involves the agent taking actions and the environment returning a new state and reward.
- Input: $s_t$: States at each time step
- Output: $a_t$: Actions at each time step
- Data: $(s_1, a_1, r_1, \dots, s_T, a_T, r_T)$
- Learn $\pi_\theta : s_t \to a_t$ to maximize $\sum_{t=1}^{T} r_t$
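The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyEnv`, `random_policy`, `rollout`, and `total_reward` are hypothetical names that do not come from these notes or from any library, and the random policy is a stand-in for a learned network. The sketch shows how one episode produces the data (a state, action, and reward per time step) and the quantity learning tries to maximize (the sum of rewards).

```python
import random


class ToyEnv:
    """Hypothetical 1-D corridor: start at position 0, reward 1.0 on reaching `goal`."""

    def __init__(self, goal=3, horizon=10):
        self.goal = goal
        self.horizon = horizon
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.pos += action
        self.t += 1
        reward = 1.0 if self.pos == self.goal else 0.0
        done = self.pos == self.goal or self.t >= self.horizon
        return self.pos, reward, done


def random_policy(state, rng):
    """Stand-in for a learned policy: ignores the state, picks a direction uniformly."""
    return rng.choice([-1, 1])


def rollout(env, policy, rng):
    """Run one episode and return the data as a list of (state, action, reward)."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, rng)                   # agent acts
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def total_reward(trajectory):
    """Sum of rewards over the episode, the quantity learning tries to maximize."""
    return sum(r for _, _, r in trajectory)


if __name__ == "__main__":
    rng = random.Random(0)
    traj = rollout(ToyEnv(), random_policy, rng)
    print(len(traj), total_reward(traj))
```

A policy that always moves right reaches the goal in three steps and collects a total reward of 1.0, while the random policy often runs out the horizon with a total of 0.0. Closing that gap is what learning the policy parameters is meant to do.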
State vs. Observation