Allen's REINFORCE notes

From Humanoid Robots Wiki

Revision as of 20:11, 24 May 2024 by Allen12 (talk | contribs)




Contents

1 Links
2 Motivation
3 Learning
4 State vs. Observation

Links

[1]

Motivation

Learning

Learning proceeds through interaction: at each time step the agent takes an action, and the environment returns a new state and a reward.

Input: $s_{t}$ : States at each time step
Output: $a_{t}$ : Actions at each time step
Data: $(s_{1},a_{1},r_{1},\ldots ,s_{T},a_{T},r_{T})$
Learn $\pi _{\theta }:s_{t}\to a_{t}$ to maximize $\sum _{t}r_{t}$
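
The interaction loop and the data it produces can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyEnv`, `random_policy`, `rollout`, and `total_reward` are hypothetical names invented here and do not come from these notes or from any library, and the uniform random policy is a stand-in for $\pi _{\theta }$ with no learning step. The sketch collects one trajectory $(s_{1},a_{1},r_{1},\ldots ,s_{T},a_{T},r_{T})$ and computes the quantity to be maximized, $\sum _{t}r_{t}$.

```python
import random


class ToyEnv:
    """Hypothetical 1-D corridor: states 0..length-1, reward 1.0 on reaching the last state."""

    def __init__(self, length=5, horizon=20):
        self.length = length
        self.horizon = horizon

    def reset(self):
        self.state = 0
        self.t = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        self.t += 1
        reached_goal = self.state == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.t >= self.horizon
        return self.state, reward, done


def random_policy(state, rng):
    """Stand-in for pi_theta: ignores the state and picks a direction uniformly."""
    return rng.choice([-1, 1])


def rollout(env, policy, rng):
    """Collect one trajectory as a list of (s_t, a_t, r_t) tuples."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, rng)                   # agent acts
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def total_reward(trajectory):
    """The objective: the sum of rewards over the trajectory."""
    return sum(r for _, _, r in trajectory)


rng = random.Random(0)
traj = rollout(ToyEnv(), random_policy, rng)
print(len(traj), total_reward(traj))
```

Swapping `random_policy` for a parameterized policy and adjusting its parameters to increase `total_reward` is where an actual learning algorithm would enter; that step is deliberately left out of this sketch.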

State vs. Observation


Reinforcement Learning