Difference between revisions of "Reinforcement Learning"
|  (→Training algorithms) |  (→Training algorithms) | ||
| Line 1: | Line 1: | ||
| == Training algorithms == | == Training algorithms == | ||
| − | * [https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C] (also see slides on Actor Critic methods at [ | + | * [https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C] (also see slides on Actor Critic methods at [1]) | 
| * [https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO] | * [https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO] | ||
| * [https://spinningup.openai.com/en/latest/algorithms/sac.html SAC] | * [https://spinningup.openai.com/en/latest/algorithms/sac.html SAC] | ||
| + | |||
| + | == References == | ||
| + | |||
| + | * [1] [https://cs224r.stanford.edu/slides/cs224r-actor-critic-split.pdf Stanford CS224R] | ||
| == Resources == | == Resources == | ||

