LLaMA: Open and Efficient Foundation Language Models Feb 2023 Hugo Touvr...
Scaling Laws vs Model Architectures: How does Inductive Bias Influence S...
UL2: Unifying Language Learning Paradigms https://arxiv.org/abs/2205.051...
Transcending Scaling Laws with 0.1% Extra Compute https://arxiv.org/abs/...
Emergent Abilities of Large Language Models https://arxiv.org/abs/2206.0...
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age...
Scaling Laws for Autoregressive Generative Modeling Oct 2020 https://arx...
Scaling Laws for Neural Language Models Jan 2020 https://arxiv.org/abs/2...
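The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (...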
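The abstract fragment above names minimax domain reweighting but is cut off before it describes the update. The sketch below shows one step of a minimax-style reweighting, under an assumption the text does not state: a group-DRO-style exponentiated-gradient ascent on each domain's excess loss (proxy-model loss minus reference-model loss, clipped at zero), followed by smoothing toward uniform. The function name `reweight_domains`, its `step_size` and `smoothing` parameters, and the example losses are all hypothetical illustrations, not taken from the paper.

```python
import math
from typing import List, Sequence


def reweight_domains(
    weights: Sequence[float],
    proxy_losses: Sequence[float],
    reference_losses: Sequence[float],
    step_size: float = 1.0,
    smoothing: float = 1e-3,
) -> List[float]:
    """One hypothetical minimax step: upweight domains where the proxy lags the reference."""
    k = len(weights)
    # Excess loss per domain, clipped at zero (assumed form of the inner "max" player).
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]
    # Exponentiated-gradient ascent: multiply each weight by exp(step_size * excess),
    # computed in log space and shifted by the max for numerical stability.
    logits = [math.log(w) + step_size * e for w, e in zip(weights, excess)]
    shift = max(logits)
    unnormalized = [math.exp(x - shift) for x in logits]
    total = sum(unnormalized)
    updated = [u / total for u in unnormalized]
    # Mix with the uniform distribution so no domain's weight collapses to zero.
    return [(1.0 - smoothing) * u + smoothing / k for u in updated]


# Hypothetical example: three domains (Wikipedia, books, web text), uniform start.
new_weights = reweight_domains([1 / 3, 1 / 3, 1 / 3], [2.0, 3.0, 2.5], [2.0, 2.5, 2.5])
print([round(w, 3) for w in new_weights])
```

Only the second domain has positive excess loss here, so it gains weight while the other two shrink equally. The smoothing term is an assumed design choice that keeps every domain sampled during the next training step.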