Changes

Allen's Reinforcement Learning Notes

2 bytes added, 05:44, 21 May 2024

→‎Markov Chain & Decision Process

The reward is a function of the state and action, r(s, a), that returns a scalar and tells us which states and actions are better. When choosing hyperparameters, we need to make sure the agent works toward long-term goals instead of always chasing the immediate reward.

=== Markov Chain & Decision Process ===

Markov Chain: <math> M = \{S, T\} </math>, where S is the state space and T is the transition operator. The state space is the set of all states, and can be discrete or continuous. For a discrete state space, the transition probabilities are represented as a matrix whose (i, j)'th entry is the probability of going into state i from state j, <math> T_{i,j} = p(s_{t+1} = i \mid s_t = j) </math>. The state distribution at the next time step is obtained by multiplying the current state distribution by the transition operator: <math> \mu_{t+1} = T \mu_t </math>.
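The matrix form of the transition operator can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two-state chain, its probabilities, and the helper names <code>step</code> and <code>columns_sum_to_one</code> are made up for this example and are not part of the notes. It follows the convention above, where entry (i, j) is the probability of going into state i from state j, so each column sums to 1.

```python
# Hypothetical 2-state Markov chain; the numbers are illustrative only.
# T[i][j] = probability of moving into state i given the chain is in state j,
# so each COLUMN of T must sum to 1.
T = [
    [0.9, 0.5],
    [0.1, 0.5],
]


def columns_sum_to_one(T, tol=1e-9):
    """Check that every column of T is a valid probability distribution."""
    n = len(T)
    return all(abs(sum(T[i][j] for i in range(n)) - 1.0) < tol for j in range(n))


def step(T, mu):
    """Advance the state distribution one time step: mu_next = T * mu."""
    n = len(T)
    return [sum(T[i][j] * mu[j] for j in range(n)) for i in range(n)]


mu0 = [1.0, 0.0]    # start in state 0 with certainty
mu1 = step(T, mu0)  # distribution after one step
mu2 = step(T, mu1)  # distribution after two steps

print(columns_sum_to_one(T))
print(mu1)
print(mu2)
```

Applying <code>step</code> repeatedly is the same as multiplying by successive powers of T. With the opposite convention (entry (i, j) is the probability of going from i to j), the rows would sum to 1 and the distribution would be multiplied from the left as a row vector instead.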

Allen12


Humanoid Robots Wiki β
