OpenAI Gym
In this article, I will skip how to install OpenAI Gym. Please check the official website.
Basic usage
From [1]
Check the number of environments available.
Code
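The listing code was embedded in the original post; below is a minimal sketch of what it likely looked like. Depending on the gym version, envs.registry is either a dict of EnvSpec objects or an object exposing .all(), so the sketch handles both cases.

from gym import envs

# List every registered environment. Newer gym releases expose the registry
# as a dict of EnvSpec objects; older releases expose an .all() method.
registry = envs.registry
all_envs = list(registry.values()) if hasattr(registry, "values") else list(registry.all())

print(f"In total we have {len(all_envs)} envs available!")
print("The first 4 envs are:")
for spec in all_envs[:4]:
    print("-" * 20)
    print(spec)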
Output
In total we have 790 envs available!
The first 4 envs are:
--------------------
EnvSpec(id='ALE/Tetris-v5', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=27000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'tetris', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace='ALE', name='Tetris', version=5)
--------------------
EnvSpec(id='ALE/Tetris-ram-v5', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=27000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'tetris', 'obs_type': 'ram', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace='ALE', name='Tetris-ram', version=5)
--------------------
EnvSpec(id='Adventure-v0', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=10000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'adventure', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': (2, 5)}, namespace=None, name='Adventure', version=0)
--------------------
EnvSpec(id='AdventureDeterministic-v0', entry_point='gym.envs.atari:AtariEnv', reward_threshold=None, nondeterministic=False, max_episode_steps=100000, order_enforce=True, autoreset=False, disable_env_checker=False, new_step_api=False, kwargs={'game': 'adventure', 'obs_type': 'rgb', 'repeat_action_probability': 0.25, 'full_action_space': False, 'frameskip': 4}, namespace=None, name='AdventureDeterministic', version=0)
Play a MountainCar randomly
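The rollout code was embedded in the original post as a gist; the sketch below reproduces the general idea (a random policy on MountainCar-v0 with the pre-0.26 step API), though the exact loop and number of steps may differ from the original.

import gym

# Minimal random-policy rollout on MountainCar-v0 (old 4-tuple step API).
env = gym.make("MountainCar-v0")
observation = env.reset()

for step in range(3):
    action = env.action_space.sample()  # sample a random action: 0, 1, or 2
    observation, reward, done, info = env.step(action)
    print("-" * 20, step, "-" * 20)
    print("observation:", observation)
    print("reward:", reward)
    print("done:", done)
    print("info", info)
    if done:
        observation = env.reset()

env.close()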
Output
The output of the first 3 steps is shown below:
-------------------- 0 --------------------
observation: [-0.44981995 0.00045449]
reward: -1.0
done: False
info {}
-------------------- 1 --------------------
observation: [ 0.05265583 -0.00346465]
reward: -1.0
done: False
info {}
-------------------- 2 --------------------
observation: [-0.08554003 -0.00142082]
reward: -1.0
done: False
info {}
What is the meaning of the output?
Could the observation be the (x, y) coordinates of the MountainCar?
To verify our guess, let's plot the observations. The code used to visualize them is shown below:
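The original plotting gist is not preserved; the sketch below shows the idea: collect observations from a random rollout and scatter them as if they were (x, y) coordinates. The number of steps is arbitrary.

import gym
import numpy as np
import matplotlib.pyplot as plt

# Collect observations from a random rollout and plot them as (x, y) pairs.
env = gym.make("MountainCar-v0")
env.reset()

observations = []
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    observations.append(observation)
    if done:
        env.reset()
env.close()

observations = np.array(observations)
plt.scatter(observations[:, 0], observations[:, 1], s=2)
plt.xlabel("observation[0]")
plt.ylabel("observation[1]")
plt.show()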
It looks like the plot is flipped in X. Let's flip it back by multiplying X by -1, which gives us the following:
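Continuing the sketch above (reusing the same observations array), the flip is just a sign change on the first component:

plt.scatter(-observations[:, 0], observations[:, 1], s=2)  # multiply X by -1
plt.xlabel("-observation[0]")
plt.ylabel("observation[1]")
plt.show()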
This time, the observations seem to match the curve in the game visualization. It is just not clear why the plot shows 3 curves instead of one. Let's check the source code.
Before checking the source code, we can print the action_space, observation_space, and reward_range based on the API described in [2].
Code
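A sketch of the inspection code (the original was embedded as a gist):

import gym

# Print the spaces and reward range of the wrapped MountainCar-v0 env.
env = gym.make("MountainCar-v0")
print("MountainCar env")
print(env)
print("action_space")
print(env.action_space)
print("observation_space")
print(env.observation_space)
print("reward_range")
print(env.reward_range)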
Output
MountainCar env
<TimeLimit<OrderEnforcing<StepAPICompatibility<PassiveEnvChecker<MountainCarEnv<MountainCar-v0>>>>>>
action_space
Discrete(3)
observation_space
Box([-1.2 -0.07], [0.6 0.07], (2,), float32)
reward_range
(-inf, inf)
From the source code [3], we can find the following:
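The relevant dynamics from [3] are paraphrased in the condensed, standalone sketch below (the function name mountain_car_step is mine; see the linked source for the full environment class). Note that the observation is simply (position, velocity), and the track height used for rendering is based on sin(3 * position).

import math
import numpy as np

# Constants paraphrased from gym/envs/classic_control/mountain_car.py
FORCE, GRAVITY, MAX_SPEED = 0.001, 0.0025, 0.07
MIN_POSITION, MAX_POSITION, GOAL_POSITION = -1.2, 0.6, 0.5

def mountain_car_step(state, action):
    # One step of the MountainCar dynamics; the observation is (position, velocity).
    position, velocity = state
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = np.clip(velocity, -MAX_SPEED, MAX_SPEED)
    position = np.clip(position + velocity, MIN_POSITION, MAX_POSITION)
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    done = bool(position >= GOAL_POSITION)
    reward = -1.0  # constant -1 reward per step, as seen in the output above
    return np.array([position, velocity], dtype=np.float32), reward, done

def height(position):
    # The track shape used for rendering: sin(3 * position), scaled and shifted.
    return np.sin(3 * position) * 0.45 + 0.55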
This makes sense: at each position we have 3 states, and each state needs one specific velocity to keep it in that state.
Play a MountainCar with PPO
Install stable-baselines3
pip install stable-baselines3[extra]
On Mac, you may need the following command to install stable-baselines3:
pip install 'stable-baselines3[extra]'
Code [4]
The code is slightly adjusted from [4].
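The training gist itself was not preserved; the sketch below follows the standard stable-baselines3 PPO pattern, so the hyperparameters, timestep budget, and paths are illustrative rather than the exact values from [4].

import gym
from stable_baselines3 import PPO

# Train PPO on MountainCar-v0 and log training metrics to TensorBoard.
env = gym.make("MountainCar-v0")
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./ppo_mountaincar_tensorboard/")
model.learn(total_timesteps=100_000)
model.save("ppo_mountaincar")

# Roll out the trained policy for one episode.
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
env.close()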
Note: a powerful server may be needed in order to avoid segmentation errors.
TensorBoard visualization of the training process
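Assuming the tensorboard_log directory from the sketch above, the training curves can be inspected with:

tensorboard --logdir ./ppo_mountaincar_tensorboard/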
All the code can be found in this GitHub repo.
References
[1] Getting started with OpenAI Gym, TDS blog
[2] OpenAI Gym official website, Gym API
[3] Mountain Car source code