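Reinforcement learning is currently an active research area. Categorizing reinforcement learning methods and their papers helps clarify which method suits which application scenario. This article groups reinforcement learning methods into categories and lists the corresponding papers.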
7. Meta-RL Series
Algorithm: RL^2
Paper title: RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning
Venue: ICLR, 2017
Paper link: https://arxiv.org/abs/1611.02779
Google Scholar citations (at time of writing): 472
Algorithm: MAML
Paper title: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Venue: ICML, 2017
Paper link: https://arxiv.org/abs/1703.03400
Google Scholar citations (at time of writing): 3383
Algorithm: SNAIL
Paper title: A Simple Neural Attentive Meta-Learner
Venue: ICLR, 2018
Paper link: https://openreview.net/pdf?id=B1DmUzWAW
Google Scholar citations (at time of writing): 576
8. Scaling RL Series
Algorithm: IMPALA
Paper title: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Venue: ICML, 2018
Paper link: https://arxiv.org/abs/1802.01561
Google Scholar citations (at time of writing): 583
Algorithm: Ape-X
Paper title: Distributed Prioritized Experience Replay
Venue: ICLR, 2018
Paper link: https://openreview.net/forum?id=H1Dy---0Z
Google Scholar citations (at time of writing): 303
Algorithm: R2D2
Paper title: Recurrent Experience Replay in Distributed Reinforcement Learning
Venue: ICLR, 2019
Paper link: https://openreview.net/forum?id=r1lyTjAqYX
Google Scholar citations (at time of writing): 139
9. RL in the Real World Series
Algorithm: QT-Opt
Paper title: QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
Venue: Proceedings of the 2nd Conference on Robot Learning (CoRL), 2018
Paper link: https://arxiv.org/abs/1806.10293
Google Scholar citations (at time of writing): 310
Algorithm: Horizon
Paper title: Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
Venue: arXiv preprint
Paper link: https://arxiv.org/abs/1811.00260
Google Scholar citations (at time of writing): 66
10. Safety Series
Algorithm: LFP
Paper title: Deep Reinforcement Learning From Human Preferences
Venue: NIPS, 2017
Paper link: https://arxiv.org/abs/1706.03741
Google Scholar citations (at time of writing): 412
Algorithm: CPO
Paper title: Constrained Policy Optimization
Venue: ICML, 2017
Paper link: https://arxiv.org/abs/1705.10528
Google Scholar citations (at time of writing): 368
Algorithm: HIRL
Paper title: Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
Venue: AAMAS, 2018
Paper link: https://arxiv.org/abs/1707.05173
Google Scholar citations (at time of writing): 105
Algorithm: Leave No Trace
Paper title: Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
Venue: ICLR, 2018
Paper link: https://arxiv.org/abs/1711.06782
Google Scholar citations (at time of writing): 56