Action and Policy

A policy \pi is the agent’s strategy for choosing actions, while an action is the agent’s degree of freedom to act on the environment in order to maximize the reward.

A policy maps states to actions, \pi : S \longrightarrow A. There are the following approaches for training this function to find the optimal policy.
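At its simplest, a deterministic policy is just a lookup from states to actions. A minimal sketch, where the state and action labels are hypothetical placeholders:

```python
# A tabular deterministic policy: a dictionary mapping each state to an action.
# The states "s0".."s2" and actions "left"/"right" are made up for illustration.
policy = {
    "s0": "left",
    "s1": "right",
    "s2": "right",
}

def pi(state):
    """Return the action the policy selects in the given state."""
    return policy[state]

print(pi("s1"))  # right
```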

  1. Policy-based: Directly train the policy.
    \[\begin{split}a = \pi ( s ) \quad \text{or} \quad a = \underset{a}{\arg\max} \; \pi ( a \mid s )\end{split}\]
  2. Value-based: Train a value function, and act greedily with respect to it, so that the policy moves to the state (or takes the action) with the highest value.
    \[\textbf{State-Value Function: } V_{\pi} (s) = \mathbb{E}_{\pi} [ G_t \mid S_t = s ], \]
    \[\textbf{Action-Value Function: } Q_{\pi} (s, a) = \mathbb{E}_{\pi} [ G_t \mid S_t = s, A_t = a ]\]
  3. Model-based: Learn a model of the environment’s dynamics and use it to plan actions.
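The first two approaches can be sketched side by side. Below, a policy-based agent stores action probabilities \pi(a|s) and acts by argmax, while a value-based agent stores an action-value table Q(s, a) and acts greedily on it; all the numbers are hypothetical placeholders, not learned values.

```python
import numpy as np

# Hypothetical toy problem: 3 states, 2 actions (made-up numbers).
n_states, n_actions = 3, 2

# Policy-based: a table of action probabilities pi(a|s),
# one row per state, one column per action.
pi_table = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])

def act_policy_based(s):
    # a = argmax_a pi(a|s): pick the most probable action under the policy.
    return int(np.argmax(pi_table[s]))

# Value-based: an action-value table Q(s, a); the greedy policy
# picks the action with the highest estimated value.
q_table = np.array([
    [1.0, 0.0],
    [0.3, 0.7],
    [0.2, 0.9],
])

def act_value_based(s):
    # a = argmax_a Q(s, a): act greedily with respect to the value function.
    return int(np.argmax(q_table[s]))

print(act_policy_based(1))  # 1
print(act_value_based(2))   # 1
```

In practice the tables above would be replaced by learned function approximators (e.g. neural networks), but the action-selection rule is the same.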