== Training algorithms ==
* [https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C] (Advantage Actor-Critic; see also the [https://cs224r.stanford.edu/slides/cs224r-actor-critic-split.pdf slides on actor-critic methods] from Stanford CS224R [1])
* [https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO] (Proximal Policy Optimization)
* [https://spinningup.openai.com/en/latest/algorithms/sac.html SAC] (Soft Actor-Critic)
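The following is a minimal, scalar-level sketch of one core quantity from each algorithm listed above, written in their standard textbook forms: the one-step temporal-difference advantage and policy-gradient loss of A2C, the clipped surrogate objective of PPO, and the soft Bellman target of SAC. The function names and default hyperparameters (gamma=0.99, clip_eps=0.2, alpha=0.2) are hypothetical, illustrative choices rather than values from any particular implementation. Practical implementations operate on batches of tensors, usually estimate PPO advantages with generalized advantage estimation, and often tune the SAC entropy coefficient automatically.

```python
import math

# Hypothetical helper names; each function handles a single transition (scalars only).

def a2c_advantage(reward, value, next_value, done, gamma=0.99):
    """One-step TD advantage used by A2C: r + gamma * V(s') - V(s)."""
    # No bootstrapping from V(s') when the episode has terminated.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value

def a2c_policy_loss(log_prob, advantage):
    """Policy-gradient loss for one sample; the advantage is treated as a constant."""
    return -log_prob * advantage

def ppo_clipped_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized) for one sample."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum gives a pessimistic bound on the unclipped objective.
    return min(ratio * advantage, clipped_ratio * advantage)

def sac_soft_target(reward, next_q1, next_q2, next_log_prob, done,
                    gamma=0.99, alpha=0.2):
    """SAC soft Bellman target with clipped double-Q.

    next_q1, next_q2: target-network Q-values at (s', a'), where a' is sampled
    from the current policy; next_log_prob is log pi(a'|s').
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value

if __name__ == "__main__":
    print(a2c_advantage(1.0, 0.5, 2.0, False, gamma=0.9))
    print(ppo_clipped_objective(math.log(2.0), 0.0, 1.0))
    print(sac_soft_target(1.0, 3.0, 2.0, -1.0, False, gamma=0.5))
```

In the PPO case, a new policy that doubles the action's probability has a ratio of 2. With a positive advantage, the clipped term (ratio 1.2) is selected, so the objective stops rewarding further movement. With a negative advantage, the unclipped term is kept, so the penalty is not softened.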
== References ==
* [1] [https://cs224r.stanford.edu/slides/cs224r-actor-critic-split.pdf Stanford CS224R, slides on actor-critic methods]
== Resources ==