From Humanoid Robots Wiki

Revision as of 20:24, 24 May 2024

Allen's REINFORCE notes

Contents

1 Links
2 Motivation
3 Learning
4 State vs. Observation

Links

/RLbook2020

Motivation

Learning

Learning proceeds through interaction: at each time step the agent takes an action, and the environment returns a new state and a reward.

Input: $s_{t}$ : States at each time step
Output: $a_{t}$ : Actions at each time step
Data: $(s_{1},a_{1},r_{1},\ldots ,s_{T},a_{T},r_{T})$
Learn $\pi _{\theta }:s_{t}\to a_{t}$ to maximize $\sum _{t}r_{t}$
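
The setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, the tabular softmax form of $\pi _{\theta }$, and the helper names `policy_probs`, `rollout`, and `total_reward` are all hypothetical and do not come from these notes. The sketch collects one trajectory $(s_{1},a_{1},r_{1},\ldots ,s_{T},a_{T},r_{T})$ and computes the quantity to be maximized, $\sum _{t}r_{t}$; it does not perform any parameter update.

```python
import math
import random


class Corridor:
    """Hypothetical toy environment: positions 0..length-1, start at 0,
    reward 1.0 for reaching the last position, 0.0 otherwise."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left; position is clamped.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def policy_probs(theta, state):
    # pi_theta(a | s): softmax over the action preferences theta[state].
    # (A tabular softmax is an assumption made for this sketch.)
    prefs = theta[state]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(env, theta, max_steps, rng):
    # Collect one episode as a list of (s_t, a_t, r_t) triples.
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        probs = policy_probs(theta, state)
        action = rng.choices([0, 1], weights=probs)[0]
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def total_reward(trajectory):
    # The objective from the notes: the sum of r_t over the episode.
    return sum(r for _, _, r in trajectory)


rng = random.Random(0)
env = Corridor()
theta = [[0.0, 0.0] for _ in range(env.length)]  # uniform initial policy
traj = rollout(env, theta, max_steps=50, rng=rng)
print(len(traj), total_reward(traj))
```

Learning would then adjust `theta` so that trajectories with a larger `total_reward` become more likely.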

State vs. Observation

Reinforcement Learning