Difference between revisions of "Reinforcement Learning"
| (→PPO) | (→A2C) |
| Line 4: | Line 4: |
| ==Training algorithms== | ==Training algorithms== |
| − ===A2C=== | + ===[https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C]=== |
| ===[https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO]=== | ===[https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO]=== |

