Proximal Policy Optimization in Reinforcement Learning

Jimmy (xiaoke) Shen
Jul 30, 2022

Proximal Policy Optimization (PPO) is a popular algorithm for reinforcement learning. In this article, I collect some tutorials I found helpful during my learning process.
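For orientation before the references, here is a minimal, framework-free sketch of the clipped surrogate objective that PPO maximizes, as defined in the paper [1]: L^CLIP(θ) = E_t[min(r_t(θ)·Â_t, clip(r_t(θ), 1 − ε, 1 + ε)·Â_t)], where r_t(θ) is the probability ratio between the new and old policies and Â_t is an advantage estimate. The sketch assumes the ratios and advantages have already been computed; the function name `clipped_surrogate` is hypothetical, and practical implementations operate on tensors and add value-function and entropy terms to this objective.

```python
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      epsilon: float = 0.2) -> float:
    """Sample estimate of the PPO clipped surrogate objective L^CLIP.

    Hypothetical helper for illustration only.
    ratios[t]     = pi_theta(a_t | s_t) / pi_theta_old(a_t | s_t)
    advantages[t] = advantage estimate A_hat_t
    epsilon       = clipping range (0.2 is a common default)
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and of equal length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon]
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(r * a, clipped_r * a)

    # Average over timesteps (the empirical expectation E_t)
    return total / len(ratios)
```

For example, `clipped_surrogate([1.5, 0.5], [1.0, -1.0])` gives roughly 0.2: the first ratio is clipped down to 1.2 because its advantage is positive, and the second is clipped up to 0.8 because its advantage is negative, so neither sample can push the policy far from the old one. In a gradient-based framework you would minimize the negative of this value.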

A general RL tutorial summary can be found here

Must see

Please check [1][2][11]. If you read Chinese, there are also some tutorials in [2][5][12].

Further reading about the first author of the PPO algorithm

See [9][10]

Of course, readers with different backgrounds will have different learning goals. If you want to understand the basics and know how to use the PPO algorithm, the material here is largely enough. However, if you want to do research, I highly recommend reading the papers and blog posts from [9]. They are really helpful.

References

[0] https://openai.com/blog/openai-baselines-ppo/

[1] Proximal Policy Optimization Algorithms (paper)

[2] https://www.bilibili.com/video/BV1L3411373y?p=2&vd_source=b6240f85372d36f441bc6159b90611de

[3] https://zhuanlan.zhihu.com/p/377768673

[4] https://www.jianshu.com/p/9f113adc0c50

[5] https://www.bilibili.com/video/BV1MW411w79n?p=2&vd_source=b6240f85372d36f441bc6159b90611de

[6] https://zhuanlan.zhihu.com/p/48293363

[7] https://blog.csdn.net/cindy_1102/article/details/87905272

[8] PPO source code reading: https://blog.csdn.net/jinzhuojun/article/details/80417179

[9] John Schulman, first author of PPO

[10] Pieter Abbeel, the first author's advisor: https://people.eecs.berkeley.edu/~pabbeel/

[11] https://youtu.be/PtAIh9KSnjo

[12] https://zhuanlan.zhihu.com/p/389283724
