Allen's PPO Notes

No change in size, 26 May
#Smaller policy updates more likely to converge to optimal
#Falling "off the cliff" might mean it's impossible to recover
How we solve this: measure how much the policy changes w.r.t. the previous policy as a probability ratio, then clip that ratio to <math>[1-\varepsilon, 1+\varepsilon]</math>, removing the incentive to go too far.
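
A minimal sketch of the clipping idea, assuming the standard PPO clipped surrogate (the minimum of the unclipped and clipped terms, each weighted by an advantage estimate). The advantage weighting, the function names <code>probability_ratio</code> and <code>clipped_surrogate</code>, and the default <code>epsilon=0.2</code> are illustrative assumptions, not something stated in these notes.

```python
import math


def probability_ratio(new_log_prob, old_log_prob):
    """Ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities."""
    return math.exp(new_log_prob - old_log_prob)


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample clipped objective (assumed form, to be maximized).

    The ratio is clipped to [1 - epsilon, 1 + epsilon], and the smaller of
    the clipped and unclipped terms is kept, so moving the policy further
    than the clip range earns no extra objective value.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Positive advantage: the gain stops growing once the ratio passes 1 + epsilon.
    for r in (1.0, 1.2, 1.5, 3.0):
        print(r, clipped_surrogate(r, advantage=1.0))
```

With a positive advantage, every ratio above <math>1 + \varepsilon</math> yields the same objective value, which is the "removing incentive to go too far" effect described above.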