Reinforcement Learning
== Training algorithms ==
* [https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C]
* [https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO]
* [https://spinningup.openai.com/en/latest/algorithms/sac.html SAC]
== Resources ==
* [https://mandi-zhao.gitbook.io/deeprl-notes Mandi Zhao's Reinforcement Learning Notes]
* [https://cs224r.stanford.edu/slides/cs224r-actor-critic-split.pdf Stanford CS224R Actor Critic Slides]
[[Category: Software]]
Revision as of 06:22, 16 May 2024