Environment and State
The environment is the entity that the agent interacts with. It presents information to the agent in the form of a state.
In RL, the history (or trajectory) is the sequence of all observations, actions, and rewards exchanged between the agent and the environment up to the current time step, and the state is a function of that history.
\[\mathbb{H}_t \;\text{or}\; \tau_t = \left( O_0, A_0, R_0, O_1, A_1, R_1, \dots, O_t, A_t, R_t \right) \implies S_t = f(\mathbb{H}_t) \]
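To make the notation concrete, here is a minimal Python sketch. All names are hypothetical and the environment dynamics are random placeholders; it only shows how a trajectory can be stored as a list of (observation, action, reward) tuples and how a state can be derived as a function of that history.

```python
from typing import List, NamedTuple
import random

class Step(NamedTuple):
    observation: int
    action: int
    reward: float

def collect_trajectory(num_steps: int = 5) -> List[Step]:
    """Roll out a short history/trajectory (O_0, A_0, R_0, ..., O_t, A_t, R_t)."""
    trajectory: List[Step] = []
    for _ in range(num_steps):
        observation = random.randint(0, 3)  # O_k: what the agent sees
        action = random.randint(0, 1)       # A_k: what the agent does
        reward = random.random()            # R_k: scalar feedback
        trajectory.append(Step(observation, action, reward))
    return trajectory

def state_from_history(history: List[Step]) -> int:
    """S_t = f(H_t): here f simply keeps the most recent observation."""
    return history[-1].observation

h_t = collect_trajectory()
print("history:", h_t)
print("state  :", state_from_history(h_t))
```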
A state signal that retains all relevant information from the history is said to have the Markov property, and RL tasks with such states are called Markov decision processes. That is, the environment needs only the current state and action, not the full history, to determine the distribution over the next state.
\[\therefore \; p(s' \mid s, a) = p(s' \mid h, a)\]
Note
The future is independent of the past given the present
\[\mathbb{H}_{t} \longrightarrow S_t \longrightarrow \mathbb{H}_{t+1}\]
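The sketch below (a hypothetical two-state MDP, not taken from the text) illustrates the Markov property: the transition table is indexed only by the current state and action, so conditioning on the rest of the history changes nothing.

```python
import random

# Hypothetical two-state, two-action MDP.
# P[s][a] maps each next state s' to p(s' | s, a); the table is indexed
# only by the current state and action, never by earlier history.
P = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}},
}

def next_state(s, a):
    """Sample s' ~ p(. | s, a)."""
    next_states, probs = zip(*P[s][a].items())
    return random.choices(next_states, weights=probs, k=1)[0]

def rollout(s0, actions):
    """The full sequence of states is recorded, but only the current state
    is ever used to sample the next one -- the Markov property in action."""
    states = [s0]
    for a in actions:
        states.append(next_state(states[-1], a))
    return states

print(rollout(0, [1, 0, 1, 1]))
```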
However, a state can be defined for both the environment and the agent.

- The environment state is the environment's internal state.
  - It may or may not be fully visible to the agent.
- The agent state is the agent's internal representation, built as a function of the history.
- Full observability: the agent directly observes the environment state, so the agent state equals the environment state. [Markov Decision Process (MDP)]
- Partial observability: the agent observes the environment state only indirectly, for example through noisy or incomplete observations. [Partially Observable Markov Decision Process (POMDP)] A sketch contrasting the two cases follows this list.
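The distinction can be made concrete with a small sketch (a hypothetical 1-D corridor environment, not from the text): under full observability the observation is the environment state itself, while under partial observability several environment states collapse to the same observation.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor with positions 0..4; the environment's
    internal state is the agent's true position."""
    def __init__(self):
        self.position = 2  # environment state

    def step(self, action):
        """Move left (-1) or right (+1), clipped to the corridor."""
        self.position = max(0, min(4, self.position + action))

def full_observation(env):
    # Full observability (MDP): the observation is the environment state itself.
    return env.position

def partial_observation(env):
    # Partial observability (POMDP): the agent only senses whether it is
    # touching a wall, so several underlying positions look identical.
    return int(env.position in (0, 4))

env = CorridorEnv()
env.step(+1)
print("full observation   :", full_observation(env))
print("partial observation:", partial_observation(env))
```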