Humanoid Robots Wiki β

Allen's REINFORCE notes

Revision as of 21:41, 24 May 2024 by Allen12 (talk | contribs) (Motivation)

Links

Motivation

Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as

<math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

  • Input: <math>s_t</math> : States at each time step

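  • Output: <math>a_t</math> : Actions at each time step
  • Data: <math>(s_1, a_1, r_1, \ldots, s_T, a_T, r_T)</math>, with <math>r_t = r(s_t, a_t)</math>
  • Learn <math>\pi_\theta : s_t \to a_t</math> to maximize <math>\sum_t r(s_t, a_t)</math>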
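
The interaction loop and the objective above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from these notes: the one-dimensional environment `toy_env_step`, the one-parameter sigmoid policy `policy`, and the helper names `rollout` and `estimate_objective` are all hypothetical. The sketch shows the agent acting, the environment returning a new state and reward, and a Monte Carlo average of the summed reward standing in for the expectation over trajectories.

```python
import math
import random


def toy_env_step(state, action, rng):
    """Hypothetical environment: reward 1 if the action matches the sign of the state."""
    correct_action = 1 if state >= 0 else 0
    reward = 1.0 if action == correct_action else 0.0
    next_state = rng.uniform(-1.0, 1.0)  # next state is drawn independently here
    return next_state, reward


def policy(theta, state, rng):
    """Hypothetical stochastic policy: choose action 1 with probability sigmoid(theta * state)."""
    prob_action_one = 1.0 / (1.0 + math.exp(-theta * state))
    return 1 if rng.random() < prob_action_one else 0


def rollout(theta, horizon, rng):
    """Collect one trajectory as a list of (s_t, a_t, r_t) tuples."""
    state = rng.uniform(-1.0, 1.0)
    trajectory = []
    for _ in range(horizon):
        action = policy(theta, state, rng)                      # agent acts
        next_state, reward = toy_env_step(state, action, rng)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def estimate_objective(theta, horizon=10, num_rollouts=1000, seed=0):
    """Monte Carlo estimate of the expected summed reward under the policy with parameter theta."""
    rng = random.Random(seed)
    returns = [
        sum(reward for _, _, reward in rollout(theta, horizon, rng))
        for _ in range(num_rollouts)
    ]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    for theta in (-5.0, 0.0, 5.0):
        print(theta, estimate_objective(theta))
```

In this toy setting a larger positive `theta` makes the policy agree with the rewarded action more often, so the estimated objective grows with `theta`; finding <math>\theta^*</math> amounts to searching for the parameter value that makes this estimate as large as possible.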

State vs. Observation