Environment and State
The environment is the entity that the agent interacts with. It presents information to the agent in the form of a state.
In RL, the history (or trajectory) is the sequence of all observations, actions, and rewards exchanged between the agent and the environment up to the current time step, and the state is a function of that history.
\[\mathbb{H}_t \;\text{or}\; \tau_t = \left( O_0, A_0, R_0, O_1, A_1, R_1, \dots, O_t, A_t, R_t \right) \implies S_t = f(\mathbb{H}_t) \]
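To make the notation concrete, here is a minimal Python sketch. All names are hypothetical and the environment dynamics are random placeholders; it only shows how a trajectory can be stored as a list of (observation, action, reward) tuples and how a state can be derived as a function of that history.

```python
from typing import List, NamedTuple
import random

class Step(NamedTuple):
    observation: int
    action: int
    reward: float

def collect_trajectory(num_steps: int = 5) -> List[Step]:
    """Roll out a short history/trajectory (O_0, A_0, R_0, ..., O_t, A_t, R_t)."""
    trajectory: List[Step] = []
    for _ in range(num_steps):
        observation = random.randint(0, 3)  # O_k: what the agent sees
        action = random.randint(0, 1)       # A_k: what the agent does
        reward = random.random()            # R_k: scalar feedback
        trajectory.append(Step(observation, action, reward))
    return trajectory

def state_from_history(history: List[Step]) -> int:
    """S_t = f(H_t): here f simply keeps the most recent observation."""
    return history[-1].observation

h_t = collect_trajectory()
print("history:", h_t)
print("state  :", state_from_history(h_t))
```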
A state signal that retains all relevant information from the history is said to have the Markov property, and RL tasks with such states are called Markov decision processes. That is, the environment needs only the current state and action, not the full history, to determine the distribution over the next state.
\[\therefore \; p(s' \mid s, a) = p(s' \mid h, a)\]
Note
The future is independent of the past given the present
\[\mathbb{H}_{t} \longrightarrow S_t \longrightarrow \mathbb{H}_{t+1}\]
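The sketch below (a hypothetical two-state MDP, not taken from the text) illustrates the Markov property: the transition table is indexed only by the current state and action, so conditioning on the rest of the history changes nothing.

```python
import random

# Hypothetical two-state, two-action MDP.
# P[s][a] maps each next state s' to p(s' | s, a); the table is indexed
# only by the current state and action, never by earlier history.
P = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}},
}

def next_state(s, a):
    """Sample s' ~ p(. | s, a)."""
    next_states, probs = zip(*P[s][a].items())
    return random.choices(next_states, weights=probs, k=1)[0]

def rollout(s0, actions):
    """The full sequence of states is recorded, but only the current state
    is ever used to sample the next one -- the Markov property in action."""
    states = [s0]
    for a in actions:
        states.append(next_state(states[-1], a))
    return states

print(rollout(0, [1, 0, 1, 1]))
```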
However, a state can be defined for both the environment and the agent.

- The environment state is the environment's internal state.
  - It may or may not be fully visible to the agent.
- The agent state is the agent's internal representation, built as a function of the history.
- Full observability: the agent directly observes the environment state, so the agent state equals the environment state. [Markov Decision Process (MDP)]
- Partial observability: the agent observes the environment state only indirectly, for example through noisy or incomplete observations. [Partially Observable Markov Decision Process (POMDP)] A sketch contrasting the two cases follows this list.
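The distinction can be made concrete with a small sketch (a hypothetical 1-D corridor environment, not from the text): under full observability the observation is the environment state itself, while under partial observability several environment states collapse to the same observation.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor with positions 0..4; the environment's
    internal state is the agent's true position."""
    def __init__(self):
        self.position = 2  # environment state

    def step(self, action):
        """Move left (-1) or right (+1), clipped to the corridor."""
        self.position = max(0, min(4, self.position + action))

def full_observation(env):
    # Full observability (MDP): the observation is the environment state itself.
    return env.position

def partial_observation(env):
    # Partial observability (POMDP): the agent only senses whether it is
    # touching a wall, so several underlying positions look identical.
    return int(env.position in (0, 4))

env = CorridorEnv()
env.step(+1)
print("full observation   :", full_observation(env))
print("partial observation:", partial_observation(env))
```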