In this article, I skip the installation of OpenAI Gym; please check the official website for instructions.

Basic usage

From [1]

Check the number of environments available.
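
The listing can be produced with a few lines of Python. This is a minimal sketch rather than the original code: the helper names `list_env_specs` and `print_env_summary` are hypothetical, and it assumes that Gym's registry is either dict-like (newer releases) or an object with an `all()` method (older releases).

```python
def list_env_specs(registry):
    """Return every EnvSpec held by a Gym registry.

    Assumption: newer Gym releases expose the registry as a dict-like
    object, older ones as an object with an all() method.
    """
    if hasattr(registry, "values"):
        return list(registry.values())
    return list(registry.all())


def print_env_summary(n_first=4):
    """Print the number of registered environments and the first few specs."""
    import gym  # imported here so the helper above works without Gym installed

    specs = list_env_specs(gym.envs.registry)
    print(f"In total we have {len(specs)} envs available!")
    print(f"The first {n_first} envs are:")
    for spec in specs[:n_first]:
        print("-" * 20)
        print(spec)
```

Calling `print_env_summary()` prints the count followed by the first four specs. The exact count depends on the Gym version and on which extras (such as the Atari environments) are installed.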



In total we have 790 envs available!
The first 4 envs are:
--------------------
EnvSpec(id='ALE/Tetris-v5', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=27000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'tetris', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace='ALE', name='Tetris', version=5)
--------------------
EnvSpec(id='ALE/Tetris-ram-v5', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=27000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'tetris', 'obs_type': 'ram', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace='ALE', name='Tetris-ram', version=5)
--------------------
EnvSpec(id='Adventure-v0', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=10000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'adventure', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': (2, 5)}, namespace=None, name='Adventure', version=0)
--------------------
EnvSpec(id='AdventureDeterministic-v0', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=100000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'adventure', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace=None, name='AdventureDeterministic', version=0)

Play MountainCar randomly
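
Random play means sampling an action from the action space at every step. The sketch below shows one way to do this; the function names `random_rollout` and `play_mountain_car` are hypothetical. It accepts both the older 4-tuple and the newer 5-tuple return value of `env.step`, because the step API differs between Gym releases.

```python
def random_rollout(env, n_steps=3):
    """Take n_steps random actions in env and return the collected transitions."""
    env.reset()
    transitions = []
    for i in range(n_steps):
        action = env.action_space.sample()  # uniformly random action
        result = env.step(action)
        if len(result) == 5:
            # Gym >= 0.26: (obs, reward, terminated, truncated, info)
            observation, reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            # Older Gym: (obs, reward, done, info)
            observation, reward, done, info = result
        print("-" * 20, i, "-" * 20)
        print("observation:", observation)
        print("reward:", reward)
        print("done:", done)
        print("info", info)
        transitions.append((observation, reward, done, info))
        if done:
            env.reset()  # start a new episode once the current one ends
    return transitions


def play_mountain_car(n_steps=3):
    """Run a short random rollout on MountainCar-v0."""
    import gym  # assumes Gym is installed

    env = gym.make("MountainCar-v0")
    try:
        return random_rollout(env, n_steps)
    finally:
        env.close()
```

Calling `play_mountain_car(3)` prints one block per step. The numbers differ from run to run because the actions are random.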


One frame of the game

The output of the first 3 steps is shown below:

-------------------- 0 --------------------
observation: [-0.44981995 0.00045449]
reward: -1.0
done: False
info {}
-------------------- 1 --------------------
observation: [ 0.05265583 -0.00346465]
reward: -1.0
done: False
info {}
-------------------- 2 --------------------
observation: [-0.08554003 -0.00142082]
reward: -1.0
done: False
info {}

What is the meaning of the output?

Could the observation be the (x, y) coordinates of the car?

The code used to visualize the observation can be found here:

In order to verify our guess, let’s plot the observations.
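
A scatter plot of the first observation component against the second is enough for this check. The following is a minimal sketch assuming matplotlib is installed; `to_xy` and `plot_observations` are hypothetical helper names, and the `flip_x` option simply mirrors the first component in case the plot comes out mirrored.

```python
def to_xy(observations, flip_x=False):
    """Split a sequence of 2-element observations into two lists of floats."""
    sign = -1.0 if flip_x else 1.0
    xs = [sign * float(obs[0]) for obs in observations]
    ys = [float(obs[1]) for obs in observations]
    return xs, ys


def plot_observations(observations, flip_x=False):
    """Scatter-plot the first observation component against the second."""
    import matplotlib.pyplot as plt  # assumes matplotlib is installed

    xs, ys = to_xy(observations, flip_x)
    plt.scatter(xs, ys, s=4)
    plt.xlabel("observation[0]")
    plt.ylabel("observation[1]")
    plt.show()
```

The observations can be collected from any rollout, for example by appending the first element of each `env.step` result to a list.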

It looks like the plot is flipped in X. Let’s flip it back by multiplying X by -1, which gives the following:

This time, the observations seem to match the curve in the game visualization. It is just not clear why there are 3 curves instead of one. Let’s check the source code.

Before checking the source code, we can print the action_space, observation_space and reward_range using the API described in [2].
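
These three attributes can be read directly from the environment object. The helper below is a small sketch; the name `describe_env` is hypothetical, and `reward_range` is read with `getattr` on the assumption that not every environment API version defines it.

```python
def describe_env(env):
    """Print and return the basic spaces of a Gym environment."""
    summary = {
        "action_space": env.action_space,
        "observation_space": env.observation_space,
        # reward_range may be missing in some environment API versions
        "reward_range": getattr(env, "reward_range", None),
    }
    print(env)
    for name, value in summary.items():
        print(name, value)
    return summary
```

With Gym installed, `describe_env(gym.make("MountainCar-v0"))` prints the wrapped environment followed by the three attributes. For MountainCar-v0 the action space is Discrete(3).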



MountainCar env
<TimeLimit<OrderEnforcing<StepAPICompatibility<PassiveEnvChecker<MountainCarEnv<MountainCar-v0>>>>>>
action_space
Box([-1.2 -0.07], [0.6 0.07], (2,), float32)
(-inf, inf)

From the source code, we can find:
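
The snippet from the source is not reproduced here. As a stand-in, the following is a plain-Python paraphrase of the update rule in Gym's MountainCarEnv as I understand it. The position and speed bounds come from the observation_space printed above; the force, gravity and goal constants are assumptions and should be checked against the actual source.

```python
import math

MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5  # assumed value
FORCE = 0.001        # assumed value
GRAVITY = 0.0025     # assumed value


def mountain_car_step(position, velocity, action):
    """One step; action is 0 (push left), 1 (no push) or 2 (push right)."""
    # The action adds -FORCE, 0 or +FORCE; the slope adds a gravity term.
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops at the left wall
    done = position >= GOAL_POSITION
    reward = -1.0  # every step costs -1
    return (position, velocity), reward, done
```

Two things stand out in this sketch: the observation is the pair (position, velocity), and from any given state the three actions lead to three different next velocities.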

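This makes sense: the observation is (position, velocity), not (x, y). The environment has 3 discrete actions (accelerate left, do not accelerate, accelerate right), so at each position there are 3 possible states, and each state needs its own velocity to be maintained. This would explain the 3 curves.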

Play a MountainCar with PPO

Install stable-baselines3

pip install stable-baselines3[extra]

On Mac, you may need the following command to install stable-baselines3 (the quotes keep zsh from interpreting the square brackets):

pip install 'stable-baselines3[extra]'


Code is slightly adjusted from [4].
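
The original listing is not reproduced here. As a rough guide, a minimal PPO training run with stable-baselines3 could look like the sketch below. The function names `train_ppo` and `run_policy`, the number of timesteps and the log directory are illustrative assumptions, not the settings used for the results in this article.

```python
def train_ppo(env_id="MountainCar-v0", total_timesteps=100_000,
              log_dir="./ppo_mountaincar_tensorboard/"):
    """Train a PPO agent and write TensorBoard logs to log_dir."""
    from stable_baselines3 import PPO  # assumes stable-baselines3 is installed

    model = PPO("MlpPolicy", env_id, verbose=1, tensorboard_log=log_dir)
    model.learn(total_timesteps=total_timesteps)
    return model


def run_policy(model, n_steps=200):
    """Roll out the policy in the model's vectorized env; return the total reward."""
    vec_env = model.get_env()
    obs = vec_env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action, _state = model.predict(obs, deterministic=True)
        # The vectorized env resets itself when an episode ends.
        obs, rewards, dones, infos = vec_env.step(action)
        total_reward += float(rewards[0])  # single environment assumed
    return total_reward
```

A typical use would be `model = train_ppo()` followed by `print(run_policy(model))`. The logs written to `log_dir` can then be opened by pointing TensorBoard's `--logdir` option at that directory.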

Note: a powerful server may be needed in order to avoid the segmentation fault.

TensorBoard visualization of the training process

Training process. The TensorBoard log can be found in the GitHub repo listed at the end of this article.

All the code can be found in this GitHub repo.


