Allen's PPO Notes

From Humanoid Robots Wiki
Revision as of 19:28, 26 May 2024 by Allen12 (talk | contribs)

Intuition: We want to avoid policy updates that are too large.

  1. Smaller policy updates are more likely to converge to an optimal policy.
  2. Too large an update can push the policy "off the cliff" into a bad region from which it may be impossible to recover.

How we solve this: Measure how much the policy changes w.r.t. the previous policy (as a probability ratio), then clip that ratio to $[1-\varepsilon, 1+\varepsilon]$, removing the incentive to go too far.
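
A minimal sketch of the clipping idea, assuming the standard PPO clipped surrogate objective, min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the ratio of the new to the old action probability and A is an advantage estimate. These notes do not spell out the full objective, so the function name `clipped_surrogate`, its arguments, and the default `epsilon=0.2` are illustrative assumptions rather than anything fixed here.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, epsilon=0.2):
    """Clipped objective for one (state, action) sample, to be maximized.

    logp_new / logp_old: log-probability of the action under the new / old policy.
    advantage: advantage estimate for that action (assumed given).
    epsilon: clip range (0.2 is an assumed, commonly used value).
    """
    # Probability ratio: how much the policy changed w.r.t. the previous one.
    ratio = math.exp(logp_new - logp_old)
    # Clip the ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # Taking the minimum removes any gain from moving the ratio past the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


# Ratio 0.9 / 0.5 = 1.8 with a positive advantage: capped at 1.2 * advantage.
too_far_up = clipped_surrogate(math.log(0.9), math.log(0.5), advantage=1.0)

# Ratio 0.1 / 0.5 = 0.2 with a negative advantage: capped at 0.8 * advantage.
too_far_down = clipped_surrogate(math.log(0.1), math.log(0.5), advantage=-1.0)

# Ratio 1.0 (policy unchanged): the clip has no effect.
unchanged = clipped_surrogate(math.log(0.5), math.log(0.5), advantage=1.0)
```

In the first two cases the objective stops improving once the ratio leaves the clip range, so there is no incentive to push the policy further in that direction.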