Changes

Allen's Reinforcement Learning Notes

14 bytes added, 18:15, 21 May 2024

→‎Markov Chain & Decision Process

Markov Chain: <math> M = \{S, T\} </math>, where S is the state space and T is the transition operator. The state space is the set of all states, and can be discrete or continuous. For a discrete state space, the transition probabilities are represented in a matrix whose (i, j)-th entry is the probability of going into state i from state j, <math> T_{i, j} = p(s_{t+1} = i | s_t = j) </math>. If <math> \mu_t </math> is the vector of state probabilities at time t, the distribution at the next time step is obtained by applying the transition operator: <math> \mu_{t+1} = T \mu_t </math>.
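
The matrix update can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the 3-state matrix, its values, and the helper name step are hypothetical and not part of the notes. It follows the convention above, so each column of T sums to 1.

```python
# Hypothetical 3-state chain (values are made up for illustration).
# T[i][j] = p(s_{t+1} = i | s_t = j), so every column sums to 1.
T = [
    [0.9, 0.5, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.5, 1.0],
]


def step(T, mu):
    """Return mu_next with mu_next[i] = sum_j T[i][j] * mu[j]."""
    n = len(mu)
    return [sum(T[i][j] * mu[j] for j in range(n)) for i in range(n)]


mu0 = [1.0, 0.0, 0.0]  # start in state 0 with certainty
mu1 = step(T, mu0)     # one step: equals column 0 of T
mu2 = step(T, mu1)     # two steps
```

Because the columns of T each sum to 1, every result of step is again a probability distribution.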

Markov Decision Process: <math> M = \{S, A, T, r\} </math>, where A is the action space. T is now a tensor indexed by the next state, the current state, and the current action. We let <math> T_{i, j, k} = p(s_{t+1} = i | s_t = j, a_t = k)</math>. r is the reward function, which assigns a scalar reward to each state-action pair.
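
As a rough sketch of the tensor indexing, the snippet below stores a tiny MDP as nested lists. The 2-state, 2-action tensor, the reward table r, and the helper names next_state_dist and propagate are all hypothetical choices for illustration, not something defined in the notes.

```python
# Hypothetical MDP with 2 states and 2 actions (values are made up).
# T[i][j][k] = p(s_{t+1} = i | s_t = j, a_t = k); for each (j, k) the
# entries sum to 1 over i.
T = [
    [[0.8, 0.1], [0.3, 0.0]],  # next state i = 0
    [[0.2, 0.9], [0.7, 1.0]],  # next state i = 1
]

# Assumed reward table: r[j][k] is the reward for action k in state j.
r = [
    [0.0, 1.0],
    [0.5, 0.0],
]


def next_state_dist(T, j, k):
    """Distribution over the next state given current state j and action k."""
    return [T[i][j][k] for i in range(len(T))]


def propagate(T, mu, k):
    """Next-state distribution when action k is taken from state distribution mu."""
    return [
        sum(T[i][j][k] * mu[j] for j in range(len(mu)))
        for i in range(len(T))
    ]


dist = next_state_dist(T, 0, 1)         # fixed state 0, action 1
mu_next = propagate(T, [0.5, 0.5], 0)   # uniform over states, action 0
```

Fixing the action index k reduces the tensor to an ordinary Markov chain transition matrix, which is why propagate has the same form as the matrix update for a Markov chain.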

=== Reinforcement Learning Algorithms - High-level ===

Ben


Humanoid Robots Wiki β
