Anatomy of an OpenAI Gym
OpenAI Gym, to quote the official documentation, is a toolkit for developing and comparing reinforcement learning algorithms. It lets us work with everything from simple games to complex physics-based environments, on which RL algorithm implementations can be studied.
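As a quick orientation before building anything custom, the snippet below sketches the standard agent-environment interaction loop on the built-in CartPole-v1 environment, assuming the classic gym API used throughout this section, in which reset() returns an observation and step() returns a four-tuple (newer gym releases split done into terminated and truncated).
import gym

# Run one episode of a built-in environment with random actions
env = gym.make("CartPole-v1")
obs = env.reset()                                   # initial observation
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()              # sample a random action
    obs, reward, done, info = env.step(action)      # one state transition
    total_reward += reward
env.close()
print("Episode return:", total_reward)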
In addition to the environments that ship with the full gym installation, other third-party environments are available on the internet (some free, some requiring a license). The gym library also provides an easy-to-use class for defining a custom environment of our choice.
At a minimum, any custom environment must inherit from gym.Env and define the following four methods:
__init__(): Defines the observation and action spaces of the environment using the classes in gym.spaces.
step(): Defines the transition of the environment from the current state to the next state, given an action as input.
reset(): Resets the state of the environment according to some initial-state assumptions.
render(): Defining this is optional; it displays the agent-environment interaction through an episode. Training the agent can proceed even without it.
# Conda library installations
'''
!conda install -n env_name -c conda-forge gym[all]
!conda install -n env_name -c conda-forge atari_py
!conda install -n env_name -c conda-forge box2d-py
!conda install -n env_name -c conda-forge stable-baselines3[extra]
'''
import gym
from gym import Env
from gym.spaces import Discrete, Box, Dict, Tuple, MultiBinary, MultiDiscrete
import numpy as np
import random
import os
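Before defining the environment class, it helps to see how the space classes imported above are used. The following is a small illustrative sketch; the shapes, bounds, and names are arbitrary.
# Illustrative gym.spaces definitions (values chosen arbitrarily)
action_space = Discrete(3)                                  # three discrete actions: 0, 1, 2
observation_space = Box(low=0.0, high=100.0, shape=(4,))    # four continuous readings in [0, 100]
compound_space = Dict({
    "position": Box(low=-1.0, high=1.0, shape=(2,)),
    "switches": MultiBinary(4),
})

print(action_space.sample())        # e.g. 1
print(observation_space.sample())   # e.g. array of four float32 values
print(compound_space.sample())      # dict-like sample with 'position' and 'switches' entries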
class CustomEnv(Env):
    def __init__(self):
        '''
        ##############################################
        self.action_space = ### gym.spaces object ###
        self.observation_space = ### gym.spaces object ###
        Also initialize here:
        - Environment state
        - Hyperparameters
        - Bookkeeping variables
        - Memory buffer and other information
        ##############################################
        '''
        raise NotImplementedError

    def step(self, action):
        '''
        ##############################################
        Apply 'action' to the environment
        Observe the state transition
        Define the rewards
        ##############################################
        # Return must be in this format
        return self.state, reward, done, info
        '''
        raise NotImplementedError

    def render(self):
        '''
        ##############################################
        Define a visualization or a simple print
        ##############################################
        '''
        pass

    def reset(self):
        '''
        ##############################################
        Reset the environment state before starting a new episode
        ##############################################
        # Return must be in this format
        return self.state
        '''
        raise NotImplementedError
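To make the skeleton concrete, here is a hypothetical toy environment: a one-dimensional corridor in which the agent starts in the middle and must walk to the rightmost cell. The dynamics, rewards, and episode cap are invented purely for illustration, and the code reuses the imports above.
class CorridorEnv(Env):
    # Hypothetical toy environment: reach the rightmost cell of a 10-cell corridor
    def __init__(self):
        self.action_space = Discrete(2)          # 0 = move left, 1 = move right
        self.observation_space = Discrete(10)    # cell index 0..9
        self.state = 5                           # start in the middle
        self.max_steps = 50                      # episode length cap
        self.steps = 0

    def step(self, action):
        # Apply the action and keep the agent inside the corridor
        self.state += 1 if action == 1 else -1
        self.state = int(np.clip(self.state, 0, 9))
        self.steps += 1

        # +10 for reaching the goal cell, -1 per step otherwise
        done = self.state == 9 or self.steps >= self.max_steps
        reward = 10.0 if self.state == 9 else -1.0
        info = {}
        return self.state, reward, done, info

    def render(self):
        # Simple text rendering: 'A' marks the agent's cell
        print("".join("A" if i == self.state else "." for i in range(10)))

    def reset(self):
        self.state = 5
        self.steps = 0
        return self.state

# Sanity check: run one episode with random actions
env = CorridorEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.render()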
In addition to setting up custom environments, one should also understand the following concepts [source]:
Wrappers: This class provides the functionality to modify various parts of an existing environment (observations, actions, rewards, episode length) to suit specific needs, without touching the environment's own code; a sketch follows after this list.
Vectorized Environments: Many algorithms run several copies of the environment in parallel, each in its own thread or process, to speed up training and improve efficiency. Vectorization of the environment is a form of wrapper; a sketch is shown below as well.
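As an illustration of the wrapper idea, the snippet below sketches a hypothetical reward-scaling wrapper built on gym's RewardWrapper base class; the class name and scale factor are made up for the example.
class ScaledReward(gym.RewardWrapper):
    # Hypothetical wrapper that rescales every reward by a constant factor
    def __init__(self, env, scale=0.1):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        # Called by gym on each step's reward before it is returned
        return reward * self.scale

env = ScaledReward(gym.make("CartPole-v1"), scale=0.1)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())   # reward is now scaled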
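And as a brief sketch of vectorization, the following uses DummyVecEnv from stable-baselines3 (installed above) to step four copies of CartPole-v1 in lockstep from a single process; SubprocVecEnv would instead give each copy its own process.
from stable_baselines3.common.vec_env import DummyVecEnv

# Four independent copies of the environment, batched behind one interface
vec_env = DummyVecEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

obs = vec_env.reset()                                        # batched observations, one row per copy
actions = [vec_env.action_space.sample() for _ in range(4)]  # one action per copy
obs, rewards, dones, infos = vec_env.step(actions)           # batched transition
vec_env.close()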