Anatomy of an OpenAI Gym

OpenAI Gym, quoting the official documentation, is a toolkit for developing and comparing reinforcement learning algorithms. It lets us work with everything from simple games to complex physics-based environments, on which RL algorithm implementations can be studied.

In addition to the environments that ship with the full gym installation, third-party environments are available on the internet (some free, others requiring a license). The gym library provides an easy-to-use base class for defining a custom environment of our choice.

At a minimum, any custom environment must inherit from gym.Env and define the following four methods:

  1. __init__(): Defines the observation and action spaces of the environment using the classes in gym.spaces.

  2. step(): Defines the transition of the environment from the current state to the next state, given an action as input.

  3. reset(): Resets the environment to an initial state before a new episode begins.

  4. render(): Displays the agent-environment interaction through an episode. Defining it is not mandatory; training can proceed without it.

# Conda library installations
'''
!conda install -n env_name -c conda-forge gym[all]
!conda install -n env_name -c conda-forge atari_py
!conda install -n env_name -c conda-forge box2d-py
!conda install -n env_name -c conda-forge stable-baselines3[extra]
'''
import gym
from gym import Env
from gym.spaces import Discrete, Box, Dict, Tuple, MultiBinary, MultiDiscrete 
import numpy as np
import random
import os

class CustomEnv(Env):
    def __init__(self):
        '''
        self.action_space = ### gym.spaces object ###
        self.observation_space = ### gym.spaces object ###
        ##############################################
        - Environment state
        - Hyperparameters
        - Bookkeeping variables
        - Memory buffer and other information
        ##############################################
        '''
        raise NotImplementedError

    def step(self, action):
        '''
        ##############################################
        Apply 'action' to the environment
        Observe the state transition
        Compute the reward
        ##############################################
        # Return must be in this format
        return self.state, reward, done, info
        '''
        raise NotImplementedError

    def render(self):
        '''
        ##############################################
        Define a visualization or a simple print
        ##############################################
        '''
        pass

    def reset(self):
        '''
        ##############################################
        Reset the environment state before starting a new episode
        ##############################################
        # Return must be in this format
        return self.state
        '''
        raise NotImplementedError

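To make the template concrete, below is a minimal sketch of a toy environment, a hypothetical 1-D corridor task that is not part of the original example, with the four methods filled in. It reuses the imports above and follows the classic gym API, in which step() returns (observation, reward, done, info) and reset() returns the initial observation.

class CorridorEnv(Env):
    '''Toy example: the agent starts in cell 0 and must walk to the last cell.'''
    def __init__(self, length=10, max_steps=50):
        # Two discrete actions: 0 = move left, 1 = move right
        self.action_space = Discrete(2)
        # Observation: the agent's current cell index, wrapped in a 1-element Box
        self.observation_space = Box(low=0, high=length - 1, shape=(1,), dtype=np.float32)
        self.length = length
        self.max_steps = max_steps
        self.state = 0
        self.steps = 0

    def step(self, action):
        # Apply the action and keep the agent inside the corridor
        move = 1 if action == 1 else -1
        self.state = int(np.clip(self.state + move, 0, self.length - 1))
        self.steps += 1
        # Reward: +1 for reaching the goal, a small penalty per step otherwise
        reached_goal = self.state == self.length - 1
        reward = 1.0 if reached_goal else -0.01
        done = reached_goal or self.steps >= self.max_steps
        info = {}
        return np.array([self.state], dtype=np.float32), reward, done, info

    def render(self):
        # Simple text visualization of the corridor
        print(''.join('A' if i == self.state else '-' for i in range(self.length)))

    def reset(self):
        self.state = 0
        self.steps = 0
        return np.array([self.state], dtype=np.float32)

A short interaction loop exercising this environment with a random policy:

env = CorridorEnv()
obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()          # random action, just to test the API
    obs, reward, done, info = env.step(action)
    episode_return += reward
env.render()
print('episode return:', episode_return)
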
In addition to setting up a custom environment, one should also understand the following:

  1. Wrappers: This class provides the functionality to modify parts of an existing environment to suit specific needs without rewriting its code (a short sketch follows this list).

  2. Vectorized Environments: Many algorithms run multiple copies of the environment in parallel, one per thread or process, to speed up training and improve sample throughput. Vectorization of the environment is itself a form of wrapper (see the second sketch below).
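
As an illustration of a wrapper, here is a minimal sketch based on gym.RewardWrapper (the ScaledRewardWrapper name and the scale parameter are made up for this example) that rescales every reward returned by the wrapped environment, without touching the environment's own code:

class ScaledRewardWrapper(gym.RewardWrapper):
    def __init__(self, env, scale=0.1):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        # Called by gym.RewardWrapper on the reward of every step()
        return reward * self.scale

# Wrap the hypothetical CorridorEnv defined earlier
wrapped_env = ScaledRewardWrapper(CorridorEnv(), scale=0.1)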
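
For vectorized environments, one option (assuming stable-baselines3 is installed as in the conda cell above) is its DummyVecEnv, which runs several environment copies behind a single batched interface; SubprocVecEnv does the same across separate processes. A minimal sketch using the hypothetical CorridorEnv:

from stable_baselines3.common.vec_env import DummyVecEnv

n_envs = 4
vec_env = DummyVecEnv([lambda: CorridorEnv() for _ in range(n_envs)])
obs = vec_env.reset()                               # batched observations, shape (n_envs, 1)
actions = [vec_env.action_space.sample() for _ in range(n_envs)]
obs, rewards, dones, infos = vec_env.step(actions)  # one entry per environment copy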