Changes

Allen's Reinforcement Learning Notes

2 bytes added, 05:44, 21 May 2024

→‎Markov Chain & Decision Process

The reward is a function of the state and action, <math> r(s, a) \in \mathbb{R} </math>, which tells us which states and actions are better. When choosing hyperparameters, we need to make sure the agent works toward long-term goals instead of always chasing immediate reward.
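To make the trade-off between immediate and long-term reward concrete, here is a minimal sketch. It assumes the hyperparameter in question is a discount factor, which these notes do not name explicitly; <code>gamma</code>, <code>discounted_return</code>, and both reward sequences are hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Two hypothetical reward sequences (assumptions, not from the notes):
# "greedy" collects a small reward immediately, "patient" a larger one later.
greedy = [1.0, 0.0, 0.0, 0.0]
patient = [0.0, 0.0, 0.0, 5.0]

for gamma in (0.5, 0.9):
    g = discounted_return(greedy, gamma)
    p = discounted_return(patient, gamma)
    better = "patient" if p > g else "greedy"
    print(f"gamma={gamma}: greedy={g:.3f}, patient={p:.3f}, preferred={better}")
```

With the smaller discount factor the immediate reward wins, while the larger one favors the delayed reward, which is the kind of behavior the hyperparameter choice is meant to control.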

=== Markov Chain & Decision Process ===

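Markov Chain: <math> M = \{S, T\} </math>, where S is the state space and T is the transition operator. The state space is the set of all states and can be discrete or continuous. For a discrete state space, the transition probabilities are represented as a matrix whose (i, j)-th entry is the probability of moving into state i from state j: <math> T_{i,j} = p(s_{t+1} = i \mid s_t = j) </math>. If the probabilities of being in each state at time t are collected in a vector <math> \mu_t </math>, the next time step is obtained by multiplying by the transition operator: <math> \mu_{t+1} = T \mu_t </math>.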
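The matrix form of the transition operator can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two-state matrix <code>T</code>, the starting distribution <code>mu0</code>, and the helper <code>step</code> are hypothetical. Entry <code>T[i][j]</code> is the probability of entering state i from state j, so each column sums to 1.

```python
def step(T, mu):
    """One time step of a Markov chain: mu_next[i] = sum over j of T[i][j] * mu[j]."""
    n = len(mu)
    return [sum(T[i][j] * mu[j] for j in range(n)) for i in range(n)]


# Hypothetical two-state chain (an assumption for illustration).
# Column j lists the probabilities of moving from state j to each state i.
T = [
    [0.9, 0.5],
    [0.1, 0.5],
]

mu0 = [1.0, 0.0]      # start in state 0 with certainty
mu1 = step(T, mu0)    # distribution after one step
mu2 = step(T, mu1)    # distribution after two steps

print(mu1)
print(mu2)
```

Because every column of <code>T</code> sums to 1, each result is still a probability distribution over the states.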

Allen12

