Difference between revisions of "Allen's REINFORCE notes"

Revision as of 21:41, 24 May 2024

Allen's REINFORCE notes

Links

RLbook2020

Motivation

Recall that the objective of Reinforcement Learning is to find an optimal policy $\pi^{*}$, which we encode in a neural network with parameters $\theta^{*}$. These optimal parameters are defined as

$\theta^{*} = \text{argmax}_{\theta} E_{\tau \sim p_{\theta}(\tau)}\left[\sum_{t} r(s_{t}, a_{t})\right]$

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

Input: $s_{t}$ : States at each time step

Output: $a_{t}$ : Actions at each time step
Data: $(s_{1},a_{1},r_{1},...,s_{T},a_{T},r_{T})$
Learn $\pi _{\theta }:s_{t}\to a_{t}$ to maximize $\sum _{t}r_{t}$
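
The interaction loop and the objective above can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and does not come from the notes: the 1-D environment `toy_env_step`, the single-parameter policy `policy` (which ignores the state and moves right with probability `theta`), and the helper names. It collects trajectories $(s_{1},a_{1},r_{1},...,s_{T},a_{T},r_{T})$ and averages $\sum _{t}r_{t}$ over many rollouts to approximate the expectation that $\theta^{*}$ maximizes.

```python
import random


def toy_env_step(state, action):
    """Hypothetical 1-D environment (an assumption, not from the notes).

    The state is an integer position. Action 1 moves right, action 0 moves
    left. The reward is 1.0 whenever the new position is positive.
    """
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state > 0 else 0.0
    return next_state, reward


def policy(theta, state, rng):
    """Hypothetical stochastic policy pi_theta: move right with probability theta.

    A real policy network would condition on the state; this toy one ignores it.
    """
    return 1 if rng.random() < theta else 0


def rollout(theta, horizon, rng):
    """Collect one trajectory as a list of (s_t, a_t, r_t) tuples."""
    state = 0
    trajectory = []
    for _ in range(horizon):
        action = policy(theta, state, rng)
        next_state, reward = toy_env_step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def trajectory_return(trajectory):
    """Sum of rewards along one trajectory: sum_t r_t."""
    return sum(reward for _, _, reward in trajectory)


def estimate_objective(theta, horizon=10, num_rollouts=1000, seed=0):
    """Monte Carlo estimate of E_{tau ~ p_theta}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    returns = [
        trajectory_return(rollout(theta, horizon, rng))
        for _ in range(num_rollouts)
    ]
    return sum(returns) / len(returns)
```

Calling `estimate_objective(1.0)` returns `10.0`, because a policy that always moves right earns a reward of 1 at each of the 10 steps. Comparing estimates across values of `theta` is a crude stand-in for the argmax over $\theta$; the sketch only evaluates the objective and does not perform any gradient-based learning.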

State vs. Observation

@@ Line 7: / Line 7: @@
 [[Category:Reinforcement Learning]]
 === Motivation ===
+Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as
+<math>\theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right] </math>
 === Learning ===
