Allen's PPO Notes

Intuition: We want to avoid policy updates that are too large

Smaller policy updates are more likely to converge to an optimal policy
Falling "off the cliff" with one oversized update might make it impossible to recover

How we solve this: Measure how much the policy changes relative to the previous policy, and clip that ratio to <math>[1-\varepsilon, 1+\varepsilon]</math>, removing the incentive to go too far.
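
A minimal sketch of the clipping idea for a single sample, assuming the standard clipped surrogate form from PPO. The function name `clipped_surrogate`, the log-probability inputs, the advantage estimate, and the default `epsilon=0.2` are assumptions made for illustration and do not come from these notes.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, epsilon=0.2):
    """Per-sample clipped objective (a quantity to maximize).

    logp_new / logp_old: log-probability of the taken action under the
    new and previous policy (hypothetical inputs for this sketch).
    advantage: estimate of how much better the action was than average.
    """
    # How much the policy changed w.r.t. the previous one.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum gives a pessimistic bound: once the ratio leaves
    # the interval in the direction that improves the objective, the value
    # stops growing, so there is no incentive to go further.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With `epsilon=0.2`, a ratio of 1.5 and an advantage of 1.0 give 1.2 rather than 1.5, so pushing the ratio beyond 1.2 earns nothing extra. With a negative advantage, the same cap applies once the ratio drops below 0.8.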
