OpenAI Gym
In this article, I will skip how to install OpenAI Gym. Please check the official website.
Basic usage
From [1]
Check the number of environments available.
Code
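The listing code was embedded in the original post; below is a minimal sketch of what it likely looked like. Depending on the gym version, envs.registry is either a dict of EnvSpec objects or an object exposing .all(), so the sketch handles both cases.

from gym import envs

# List every registered environment. Newer gym releases expose the registry
# as a dict of EnvSpec objects; older releases expose an .all() method.
registry = envs.registry
all_envs = list(registry.values()) if hasattr(registry, "values") else list(registry.all())

print(f"In total we have {len(all_envs)} envs available!")
print("The first 4 envs are:")
for spec in all_envs[:4]:
    print("-" * 20)
    print(spec)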
Output
In total we have 790 envs available!
The first 4 envs are:
--------------------
EnvSpec(id='ALE/Tetris-v5', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=27000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'tetris', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace='ALE', name='Tetris', version=5)
--------------------
EnvSpec(id='ALE/Tetris-ram-v5', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=27000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'tetris', 'obs_type': 'ram', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace='ALE', name='Tetris-ram', version=5)
--------------------
EnvSpec(id='Adventure-v0', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=10000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'adventure', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': (2, 5)}, namespace=None, name='Adventure', version=0)
--------------------
EnvSpec(id='AdventureDeterministic-v0', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=100000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'adventure', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace=None, name='AdventureDeterministic', version=0)
Play a MountainCar randomly
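The rollout code was embedded in the original post as a gist; the sketch below reproduces the general idea (a random policy on MountainCar-v0 with the pre-0.26 step API), though the exact loop and number of steps may differ from the original.

import gym

# Minimal random-policy rollout on MountainCar-v0 (old 4-tuple step API).
env = gym.make("MountainCar-v0")
observation = env.reset()

for step in range(3):
    action = env.action_space.sample()  # sample a random action: 0, 1, or 2
    observation, reward, done, info = env.step(action)
    print("-" * 20, step, "-" * 20)
    print("observation:", observation)
    print("reward:", reward)
    print("done:", done)
    print("info", info)
    if done:
        observation = env.reset()

env.close()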
Output
The output of the first 3 steps is shown below:
-------------------- 0 --------------------
observation: [-0.44981995 0.00045449]
reward: -1.0
done: False
info {}
-------------------- 1 --------------------
observation: [ 0.05265583 -0.00346465]
reward: -1.0
done: False
info {}
-------------------- 2 --------------------
observation: [-0.08554003 -0.00142082]
reward: -1.0
done: False
info {}
What is the meaning of the output?
Could the observation be the (x, y) coordinates of the MountainCar?
To verify our guess, let's plot the observations. The code used to visualize them is shown below:
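The original plotting gist is not preserved; the sketch below shows the idea: collect observations from a random rollout and scatter them as if they were (x, y) coordinates. The number of steps is arbitrary.

import gym
import numpy as np
import matplotlib.pyplot as plt

# Collect observations from a random rollout and plot them as (x, y) pairs.
env = gym.make("MountainCar-v0")
env.reset()

observations = []
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    observations.append(observation)
    if done:
        env.reset()
env.close()

observations = np.array(observations)
plt.scatter(observations[:, 0], observations[:, 1], s=2)
plt.xlabel("observation[0]")
plt.ylabel("observation[1]")
plt.show()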
It looks like the plot is flipped in X. Let's flip it back by multiplying X by -1, which gives us the following:
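Continuing the sketch above (reusing the same observations array), the flip is just a sign change on the first component:

plt.scatter(-observations[:, 0], observations[:, 1], s=2)  # multiply X by -1
plt.xlabel("-observation[0]")
plt.ylabel("observation[1]")
plt.show()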
This time, the observations seem to match the curve in the game visualization. It is just not clear why the plot shows 3 curves instead of one. Let's check the source code.
Before checking the source code, we can print the action_space, observation_space, and reward_range based on the API described in [2].
Code
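A sketch of the inspection code (the original was embedded as a gist):

import gym

# Print the spaces and reward range of the wrapped MountainCar-v0 env.
env = gym.make("MountainCar-v0")
print("MountainCar env")
print(env)
print("action_space")
print(env.action_space)
print("observation_space")
print(env.observation_space)
print("reward_range")
print(env.reward_range)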
Output
MountainCar env
<TimeLimit<OrderEnforcing<StepAPICompatibility<PassiveEnvChecker<MountainCarEnv<MountainCar-v0>>>>>>
action_space
Discrete(3)
observation_space
Box([-1.2 -0.07], [0.6 0.07], (2,), float32)
reward_range
(-inf, inf)
From the source code [3], we can find the following:
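The relevant dynamics from [3] are paraphrased in the condensed, standalone sketch below (the function name mountain_car_step is mine; see the linked source for the full environment class). Note that the observation is simply (position, velocity), and the track height used for rendering is based on sin(3 * position).

import math
import numpy as np

# Constants paraphrased from gym/envs/classic_control/mountain_car.py
FORCE, GRAVITY, MAX_SPEED = 0.001, 0.0025, 0.07
MIN_POSITION, MAX_POSITION, GOAL_POSITION = -1.2, 0.6, 0.5

def mountain_car_step(state, action):
    # One step of the MountainCar dynamics; the observation is (position, velocity).
    position, velocity = state
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = np.clip(velocity, -MAX_SPEED, MAX_SPEED)
    position = np.clip(position + velocity, MIN_POSITION, MAX_POSITION)
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    done = bool(position >= GOAL_POSITION)
    reward = -1.0  # constant -1 reward per step, as seen in the output above
    return np.array([position, velocity], dtype=np.float32), reward, done

def height(position):
    # The track shape used for rendering: sin(3 * position), scaled and shifted.
    return np.sin(3 * position) * 0.45 + 0.55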
This makes sense: at each position we have 3 states, and each state needs one specific velocity to keep it in that state.
Play a MountainCar with PPO
Install stable-baselines3
pip install stable-baselines3[extra]
On Mac, you may need the following command to install stable-baselines3:
pip install 'stable-baselines3[extra]'
Code [4]
The code is slightly adjusted from [4].
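The training gist itself was not preserved; the sketch below follows the standard stable-baselines3 PPO pattern, so the hyperparameters, timestep budget, and paths are illustrative rather than the exact values from [4].

import gym
from stable_baselines3 import PPO

# Train PPO on MountainCar-v0 and log training metrics to TensorBoard.
env = gym.make("MountainCar-v0")
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./ppo_mountaincar_tensorboard/")
model.learn(total_timesteps=100_000)
model.save("ppo_mountaincar")

# Roll out the trained policy for one episode.
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
env.close()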
Note: a powerful server may be needed in order to avoid segmentation errors.
TensorBoard visualization of the training process
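Assuming the tensorboard_log directory from the sketch above, the training curves can be inspected with:

tensorboard --logdir ./ppo_mountaincar_tensorboard/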
All the code can be found in this GitHub repo.
References
[1] Getting started with OpenAI Gym, TDS blog
[2] OpenAI Gym official website, Gym API
[3] Mountain Car source code