Humanoid Robots Wiki β

Changes

Allen's PPO Notes

330 bytes added, 26 May
Intuition: We want to avoid policy updates that are too large.
#Smaller policy updates are more likely to converge to an optimal policy.
#Falling "off the cliff" (one oversized update that lands on a bad policy) might make it impossible to recover.
How we solve this: Measure how much the policy changes w.r.t. the previous policy using the probability ratio between the two, then clip that ratio to <math>[1-\varepsilon, 1+\varepsilon]</math>, which removes the incentive to move too far.
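
As a rough illustration of the clipping idea, here is a minimal sketch in plain Python. The function name <code>clipped_surrogate</code>, its arguments, and the default <code>epsilon=0.2</code> are assumptions made for this example and do not come from the notes. It follows the standard PPO form, in which the objective takes the minimum of the unclipped and clipped ratio times an advantage estimate.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, epsilon=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    new and previous policies. advantages: advantage estimates.
    epsilon=0.2 is an assumed default, not a value from the notes.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio: how much the policy changed for this action.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum removes any gain from pushing the ratio
        # beyond the clip range in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratio 1.5 with a positive advantage: the gain is capped at 1.2 * advantage.
capped_gain = clipped_surrogate([math.log(1.5)], [0.0], [1.0])
# Ratio 0.5 with a negative advantage: the value is held at 0.8 * advantage.
capped_loss = clipped_surrogate([math.log(0.5)], [0.0], [-1.0])
```

With a ratio of 1.5 and a negative advantage, the minimum selects the unclipped term, so moving in a harmful direction is still penalized in full. Clipping only removes the reward for overshooting.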