The reward is a function of the state and action, r(s, a), that maps each state-action pair to a number (typically a real-valued scalar) telling us which states and actions are better. When choosing hyperparameters, we need to make sure the agent works toward long-term goals instead of always seeking immediate reward.
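As a rough illustration of the trade-off between immediate and long-term reward, the sketch below compares two made-up reward sequences under a discount factor. The discount factor `gamma`, the helper `discounted_return`, and the reward values are all assumptions introduced for this example; the text above does not name a specific hyperparameter.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, with the reward at step t weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical reward sequences (illustrative values only).
immediate = [5, 0, 0, 0]   # small reward right away
delayed = [0, 0, 0, 10]    # larger reward, but only at the last step

# A small gamma (assumed hyperparameter) favours the immediate reward.
short_sighted = (discounted_return(immediate, 0.5),
                 discounted_return(delayed, 0.5))

# A gamma closer to 1 favours the larger, delayed reward.
far_sighted = (discounted_return(immediate, 0.9),
               discounted_return(delayed, 0.9))
```

Under these assumptions, the same two sequences are ranked differently depending on `gamma`, which is one way a hyperparameter choice can push an agent toward or away from long-term goals.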
=== Markov Chain & Decision Process ===
Markov Chain: <math> M = \{S, T\} </math>, where S is the state space and T is the transition operator. The state space is the set of all states, and can be discrete or continuous. The transition probabilities are represented in a matrix whose (i, j)-th entry is the probability of moving into state i from state j, so we can obtain the state distribution at the next time step by multiplying the current state distribution by the transition operator.
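The matrix form can be sketched in a few lines of plain Python. This is a minimal example under stated assumptions: the two-state chain and its probabilities are invented for illustration, and `step` is a hypothetical helper name. It follows the convention above, where entry (i, j) is the probability of moving into state i from state j, so each column sums to 1 and the distribution is treated as a column vector.

```python
def step(T, p):
    """Return the next-step distribution T p.

    T[i][j] is the probability of moving into state i from state j,
    and p[j] is the current probability of being in state j.
    """
    n = len(p)
    return [sum(T[i][j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical two-state chain; each column of T sums to 1.
T = [
    [0.9, 0.5],
    [0.1, 0.5],
]

p0 = [1.0, 0.0]   # start in state 0 with certainty
p1 = step(T, p0)  # distribution after one time step
p2 = step(T, p1)  # distribution after two time steps
```

Because every column of `T` sums to 1, each result of `step` is again a probability distribution. With the opposite convention (entry (i, j) as the probability of going from i to j), the distribution would instead be a row vector multiplied on the left.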