Difference between revisions of "Reinforcement Learning"
(Created page with " ==Training algorithms== ===A2C=== ===PPO===")
(→PPO)
Line 6: | Line 6:
===A2C=== | ===A2C===
− ===PPO=== | + ===[https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO]===