Changes
no edit summary
A state is a complete representation of the physical world while the observation is some subset or representation of s. They are not necessarily the same in that we can't always infer s_t from o_t, but o_t is inferable from s_t. To think of it as a network of conditional probability, we have
* <math> s_1 -> o_1 - (pi_theta\pi_\theta) -> a_1 </math> (policy)
* <math> s_1, a_1 - (p(s_{t+1} | s_t, a_t) -> s_2 </math> (dynamics)