Reinforcement learning

Screenshot from [1]

According to [1], RL algorithms can be grouped into three families:

  • Value-based
  • Policy-based
  • Combined value- and policy-based (for example, actor-critic methods)
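To make the difference between the first two families concrete, here is a minimal sketch on a two-armed bandit. It is not taken from [1]; the payoffs in REWARDS, the function names, and all hyperparameters are assumptions chosen for illustration. The value-based learner estimates how good each action is and acts greedily on those estimates, while the policy-based learner adjusts action probabilities directly with a REINFORCE-style update.

```python
import math
import random

# Hypothetical deterministic payoff of each arm (an assumption for this sketch).
REWARDS = [0.2, 1.0]


def value_based(steps=500, alpha=0.1, epsilon=0.1, seed=0):
    """Learn action values Q(a) and act epsilon-greedily on them."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(2)                      # explore
        else:
            a = max(range(2), key=lambda i: q[i])     # exploit current estimates
        # Move the estimate for the chosen arm toward the observed reward.
        q[a] += alpha * (REWARDS[a] - q[a])
    return q


def softmax(prefs):
    """Turn real-valued preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_based(steps=500, lr=0.1, seed=0):
    """Learn policy parameters directly (REINFORCE-style, no baseline)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1       # sample from the policy
        r = REWARDS[a]
        for i in range(2):
            # Gradient of log pi(a) with respect to theta[i] for a softmax policy.
            grad_log = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * r * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print("Q-values:", value_based())
    print("Policy probabilities:", policy_based())
```

In this toy setting both learners should end up preferring the second arm: the first through a larger value estimate, the second through a larger action probability. Methods in the third family keep both kinds of estimate at once.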

A good textbook and some good lectures can be found in [2][3][4].

This article is not complete yet; I will keep updating it. If you have any suggestions, please leave a comment. Thanks!

References

[1] Lihong Yi, video lecture (in Chinese)

[2] textbook

[3] Lectures by David Silver (DeepMind)

[4] Lecture by John Schulman (first author of PPO), OpenAI
