Proximal Policy Optimization in Reinforcement Learning

1 min read · Jul 30, 2022

Proximal Policy Optimization (PPO) is a popular reinforcement learning algorithm. In this article, I collect the tutorials I found most helpful while learning it.

A general RL tutorial summary can be found here

Must see

Please check [1], [2], and [11]. If you read Chinese, there are also tutorials in [2], [5], and [12].

Reference

[0] https://openai.com/blog/openai-baselines-ppo/

[1] Proximal Policy Optimization Algorithms paper

[2] https://www.bilibili.com/video/BV1L3411373y?p=2&vd_source=b6240f85372d36f441bc6159b90611de

[3] https://zhuanlan.zhihu.com/p/377768673

[4] https://www.jianshu.com/p/9f113adc0c50

[5] https://www.bilibili.com/video/BV1MW411w79n?p=2&vd_source=b6240f85372d36f441bc6159b90611de

[6] https://zhuanlan.zhihu.com/p/48293363

[7] https://blog.csdn.net/cindy_1102/article/details/87905272

[8] PPO source code reading: https://blog.csdn.net/jinzhuojun/article/details/80417179

[9] First author: John Schulman

[10] First author's advisor, Pieter Abbeel: https://people.eecs.berkeley.edu/~pabbeel/

[11] https://youtu.be/PtAIh9KSnjo

[12] https://zhuanlan.zhihu.com/p/389283724


Written by Jimmy (xiaoke) Shen
