Humanoid Robots Wiki β

Allen's REINFORCE notes

Revision as of 21:43, 24 May 2024 by Allen12

Links

Motivation

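Recall that the objective of reinforcement learning is to find an optimal policy $\pi^*$, which we encode in a neural network with parameters $\theta^*$. These optimal parameters are defined as $\theta^* = \arg\max_\theta \, E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]$, where $\tau$ is a trajectory sampled by running the policy and $r(s_t, a_t)$ is the reward received at time step $t$.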

Learning

Learning proceeds through interaction: at each time step the agent takes an action, and the environment returns a new state and a reward.

  • Input: $s_t$: States at each time step
  • Output: $a_t$: Actions at each time step
  • Data: $(s_1, a_1, r_1, \ldots, s_T, a_T, r_T)$
  • Learn $\pi_\theta : s_t \to a_t$ to maximize $\sum_t r_t$
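The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from these notes: `ToyEnv`, `random_policy`, `collect_trajectory`, and `total_reward` are hypothetical names, the environment is a made-up 1-D walk that pays a reward of 1 for each step to the right, and the policy is uniformly random rather than a neural network.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment: the state is an integer position.

    Assumption for illustration: moving right (+1) earns reward 1.0,
    moving left (-1) earns 0.0, and an episode lasts `horizon` steps.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action):
        # The environment returns the new state and the reward for this action.
        self.state += action
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return self.state, reward, done


def random_policy(state, rng):
    """Placeholder policy: ignores the state and picks -1 or +1 uniformly."""
    return rng.choice([-1, 1])


def collect_trajectory(env, policy, rng):
    """Roll out one episode and return a list of (state, action, reward) tuples."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, rng)                  # agent acts on the current state
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def total_reward(trajectory):
    """Sum of rewards over the episode, the quantity the policy should maximize."""
    return sum(reward for _, _, reward in trajectory)


rng = random.Random(0)
trajectory = collect_trajectory(ToyEnv(horizon=5), random_policy, rng)
```

Calling `total_reward(trajectory)` gives the return of one sampled episode. Averaging that value over many rollouts would give a Monte Carlo estimate of the expected return for this policy.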

State vs. Observation