Action and Policy
A policy is the agent's strategy for choosing actions. An action is one of the choices available to the agent, which it uses to try to maximize reward.
A policy maps states to actions. There are the following approaches to training this function to find the optimal policy:
- Policy-based: Directly train the policy.
\[a = \pi(s) \quad \text{or} \quad a = \operatorname*{arg\,max}_{a'} \, \pi(a' \mid s)\]
- Value-based: Train a value function, and let the policy choose the action that leads to the highest value.
\[\textbf{State-Value Function:} \quad V_{\pi}(s) = \mathbb{E}_{\pi} [ G_t \mid s ]\]
\[\textbf{Action-Value Function:} \quad Q_{\pi}(s, a) = \mathbb{E}_{\pi} [ G_t \mid s, a ]\]
- Model-based
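
To make the difference between the policy-based and value-based approaches concrete, here is a minimal sketch of how an action might be selected in each case. The toy state and action names, the probability table `policy_probs`, and the value table `q_values` are all hypothetical and filled in by hand purely for illustration. In practice these would be learned, and the model-based approach is not covered here.

```python
import random

# Hypothetical toy problem: three states and two actions (illustrative only).
ACTIONS = ["left", "right"]

# Policy-based view: the policy pi(a | s) is the object being trained.
# Here it is a hand-written table of action probabilities per state.
policy_probs = {
    0: {"left": 0.9, "right": 0.1},
    1: {"left": 0.3, "right": 0.7},
    2: {"left": 0.4, "right": 0.6},
}


def greedy_action_from_policy(state):
    """Return argmax over a of pi(a | s): the most probable action."""
    probs = policy_probs[state]
    return max(probs, key=probs.get)


def sample_action_from_policy(state, rng=random):
    """Sample an action a ~ pi(. | s) from a stochastic policy."""
    probs = policy_probs[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Value-based view: an action-value function Q(s, a) is the object being
# trained, and the policy is derived from it by acting greedily.
q_values = {
    0: {"left": 1.0, "right": 0.2},
    1: {"left": -0.5, "right": 2.0},
    2: {"left": 0.0, "right": 0.4},
}


def greedy_action_from_q(state):
    """Return argmax over a of Q(s, a): the highest-valued action."""
    q = q_values[state]
    return max(q, key=q.get)


if __name__ == "__main__":
    for s in policy_probs:
        print(s, greedy_action_from_policy(s), greedy_action_from_q(s))
```

In both cases the result is a mapping from state to action. What differs is which quantity is trained: the action probabilities themselves, or the values from which a greedy policy is read off.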