Difference between revisions of "Reinforcement Learning"

From Humanoid Robots Wiki
(Training algorithms)
Line 1:
 
== Training algorithms ==
 
* [https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C] (also see slides on Actor Critic methods at [https://cs224r.stanford.edu/slides/cs224r-actor-critic-split.pdf] Stanford CS224R)
+
* [https://en.wikipedia.org/wiki/Advantage_Actor_Critic A2C] (also see slides on Actor Critic methods at [1])
 
* [https://en.wikipedia.org/wiki/Proximal_policy_optimization PPO]
 
* [https://spinningup.openai.com/en/latest/algorithms/sac.html SAC]
 
 +
 +
== References ==
 +
 +
* [1] [https://cs224r.stanford.edu/slides/cs224r-actor-critic-split.pdf Stanford CS224R]
  
 
== Resources ==
 

Revision as of 06:21, 16 May 2024

Training algorithms

  • A2C (also see slides on Actor Critic methods at [1])
  • PPO
  • SAC

References

  • [1] Stanford CS224R

Resources