=== State vs. Observation ===
A state <math>s_t</math> is a complete representation of the physical world, while an observation <math>o_t</math> is some subset or lossy representation of <math>s_t</math>. They are not necessarily the same: we cannot always infer <math>s_t</math> from <math>o_t</math>, but <math>o_t</math> is always inferable from <math>s_t</math>. Viewed as a network of conditional probabilities, we have
* <math>s_1 \rightarrow o_1 \xrightarrow{\pi_\theta} a_1</math> (policy)
* <math>s_1, a_1 \xrightarrow{p(s_{t+1} \mid s_t, a_t)} s_2</math> (dynamics)
Note that <math>\theta</math> represents the parameters of the policy (for example, the weights of a neural network). Assumption: the Markov property, which states that future states are independent of past states given the present state. This is the fundamental difference between states and observations: the state satisfies the Markov property, while the observation in general does not.
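To make the distinction concrete, here is a minimal sketch in Python (all names are hypothetical) of a toy system where the state is a (position, velocity) pair but the agent only observes a noisy position, so the observation carries only partial information about the state:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def dynamics(state, action):
    """p(s_{t+1} | s_t, a_t): Markovian -- depends only on the current
    state and action, never on earlier states."""
    position, velocity = state
    velocity = velocity + action
    position = position + velocity
    return np.array([position, velocity])

def observe(state):
    """o_t from s_t: the observation is a lossy view of the state
    (velocity is hidden, position is noisy), so s_t cannot in general
    be recovered from o_t alone."""
    position, _velocity = state
    return position + rng.normal(scale=0.1)

def policy(observation, theta):
    """pi_theta(a_t | o_t): a trivial linear policy; theta stands in
    for, e.g., the parameters of a neural network."""
    return -theta * observation

# Roll out a short trajectory: s_t -> o_t -> a_t -> s_{t+1}.
state = np.array([1.0, 0.0])
theta = 0.5
for t in range(3):
    obs = observe(state)          # o_t is inferable from s_t ...
    action = policy(obs, theta)   # ... and the policy only sees o_t
    state = dynamics(state, action)
    print(f"t={t}: o_t={obs:.3f}, a_t={action:.3f}, s_(t+1)={state}")
</syntaxhighlight>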