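Reinforcement learning is currently a popular research direction. Categorizing reinforcement learning methods and papers helps clarify which method suits which application scenario. This article groups reinforcement learning work into categories and lists the corresponding papers.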
11. Imitation Learning and Inverse Reinforcement Learning Series
Algorithm name: Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy
Paper title: Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy
Venue: PhD thesis, Carnegie Mellon University, 2010
Paper link: http://www.cs.cmu.edu/~bziebart/publications/thesis-bziebart.pdf
Current Google Scholar citation count: 291
Algorithm name: GCL
Paper title: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Venue: ICML, 2016
Paper link: https://arxiv.org/abs/1603.00448
Current Google Scholar citation count: 538
Algorithm name: GAIL
Paper title: Generative Adversarial Imitation Learning
Venue: NIPS, 2016
Paper link: https://arxiv.org/abs/1606.03476
Current Google Scholar citation count: 1194
Algorithm name: DeepMimic
Paper title: DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
Venue: ACM Transactions on Graphics (TOG), 2018
Paper link: https://xbpeng.github.io/projects/DeepMimic/2018_TOG_DeepMimic.pdf
Current Google Scholar citation count: 354
Algorithm name: VAIL
Paper title: Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Venue: ICLR, 2019
Paper link: https://arxiv.org/abs/1810.00821
Current Google Scholar citation count: 95
Algorithm name: MetaMimic
Paper title: One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL
Venue: arXiv, 2018
Paper link: https://arxiv.org/abs/1810.05017
Current Google Scholar citation count: 12
12. Reproducibility, Analysis, and Critique Series
Algorithm name: rllab
Paper title: Benchmarking Deep Reinforcement Learning for Continuous Control
Venue: ICML, 2016
Paper link: https://arxiv.org/abs/1604.06778
Current Google Scholar citation count: 1106
Algorithm name: Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
Paper title: Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
Venue: ICML, 2017
Paper link: https://arxiv.org/abs/1708.04133
Current Google Scholar citation count: 149
Algorithm name: Deep Reinforcement Learning that Matters
Paper title: Deep Reinforcement Learning that Matters
Venue: AAAI, 2018
Paper link: https://arxiv.org/abs/1709.06560
Current Google Scholar citation count: 852
Algorithm name: Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
Paper title: Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
Venue: European Workshop on Reinforcement Learning, 2018
Paper link: https://arxiv.org/abs/1810.02525
Current Google Scholar citation count: 8
Algorithm name: Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?
Paper title: Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?
Venue: CoRR, 2018
Paper link: https://openreview.net/forum?id=qGJDBwwhFTN
Current Google Scholar citation count: 47
Algorithm name: Simple Random Search Provides a Competitive Approach to Reinforcement Learning
Paper title: Simple Random Search Provides a Competitive Approach to Reinforcement Learning
Venue: NIPS, 2018
Paper link: https://arxiv.org/abs/1803.07055
Current Google Scholar citation count: 195
Algorithm name: Benchmarking Model-Based Reinforcement Learning
Paper title: Benchmarking Model-Based Reinforcement Learning
Venue: arXiv, 2019
Paper link: https://arxiv.org/abs/1907.02057
Current Google Scholar citation count: 95
13. Bonus: Classic Papers in RL Theory or Review Series
Algorithm name: Policy Gradient Methods for Reinforcement Learning with Function Approximation
Paper title: Policy Gradient Methods for Reinforcement Learning with Function Approximation
Venue: NIPS, 1999
Paper link: https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf
Current Google Scholar citation count: 4152
Algorithm name: An Analysis of Temporal-Difference Learning with Function Approximation
Paper title: An Analysis of Temporal-Difference Learning with Function Approximation
Venue: IEEE Transactions on Automatic Control, 1997
Paper link: http://web.mit.edu/jnt/www/Papers/J063-97-bvr-td.pdf
Current Google Scholar citation count: 1302
Algorithm name: Reinforcement Learning of Motor Skills with Policy Gradients
Paper title: Reinforcement Learning of Motor Skills with Policy Gradients
Venue: Neural Networks, 2008
Paper link: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Neural-Netw-2008-21-682_4867%5b0%5d.pdf
Current Google Scholar citation count: 911
Algorithm name: Approximately Optimal Approximate Reinforcement Learning
Paper title: Approximately Optimal Approximate Reinforcement Learning
Venue: ICML, 2002
Paper link: https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/KakadeLangford-icml2002.pdf
Current Google Scholar citation count: 516
Algorithm name: A Natural Policy Gradient
Paper title: A Natural Policy Gradient
Venue: NIPS, 2001
Paper link: https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf
Current Google Scholar citation count: 844
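Algorithm name: Algorithms for Reinforcement Learning
Paper title: Algorithms for Reinforcement Learning
Venue: Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool), 2010
Paper link: https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
Current Google Scholar citation count: 1152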