Allen's REINFORCE notes
Allen's REINFORCE notes
Links
Motivation
Recall that the objective of Reinforcement Learning is to find an optimal policy which we encode in a neural network with parameters . These optimal parameters are defined as . Let's unpack what this means. To phrase it in english, this is basically saying that the optimal policy is one such that the expected value of the total reward over following a trajectory determined by the policy is the highest over all policies.
Learning
Learning involves the agent taking actions and the environment returning a new state and reward.
- Input: : States at each time step
- Output: : Actions at each time step
- Data:
- Learn to maximize