Allen's REINFORCE notes

From Humanoid Robots Wiki
Revision as of 21:43, 24 May 2024 by Allen12 (talk | contribs)


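Recall that the objective of Reinforcement Learning is to find an optimal policy $\pi^*$, which we encode in a neural network with parameters $\theta^*$. These optimal parameters are defined as

$\theta^* = \arg\max_\theta \, E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]$
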
Learning involves the agent taking actions and the environment returning a new state and reward.

  • Input: $s_t$: States at each time step
  • Output: $a_t$: Actions at each time step
  • Data: $(s_1, a_1, r_1, \ldots, s_T, a_T, r_T)$
  • Learn $\pi_\theta : s_t \to a_t$ to maximize $\sum_t r_t$
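
The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the setup used in these notes: the environment `ToyLineEnv`, the function `random_policy`, and the helpers `rollout` and `total_reward` are all hypothetical names invented for the example. It shows the agent choosing an action, the environment returning a new state and reward, and the resulting data being collected as (state, action, reward) tuples whose rewards sum to the quantity being maximized.

```python
import random


class ToyLineEnv:
    """Hypothetical 1-D environment: start at 0, reward 1.0 on reaching the goal."""

    def __init__(self, goal=3, horizon=10):
        self.goal = goal
        self.horizon = horizon
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.pos += action
        self.t += 1
        reward = 1.0 if self.pos == self.goal else 0.0
        done = self.pos == self.goal or self.t >= self.horizon
        return self.pos, reward, done


def random_policy(state, rng):
    # Stand-in for a learned policy: ignores the state and acts uniformly at random.
    return rng.choice([-1, 1])


def rollout(env, policy, rng):
    """Run one episode and return the data as a list of (state, action, reward)."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, rng)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def total_reward(trajectory):
    # The sum of rewards over the episode, i.e. the quantity to maximize.
    return sum(r for _, _, r in trajectory)


rng = random.Random(0)
traj = rollout(ToyLineEnv(), random_policy, rng)
print(len(traj), total_reward(traj))
```

Averaging `total_reward` over many rollouts gives a Monte Carlo estimate of the expected return for a given policy, which is the expectation that appears in the objective.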

State vs. Observation