Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? Yi Tay,...
UL2: Unifying Language Learning Paradigms https://arxiv.org/abs/2205.05131v3 Yi Tay, Mo...
Transcending Scaling Laws with 0.1% Extra Compute https://arxiv.org/abs/2210.11399 Yi T...
Emergent Abilities of Large Language Models https://arxiv.org/abs/2206.07682 Jason Wei,...
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Covera...
Scaling Laws for Autoregressive Generative Modeling Oct 2020 https://arxiv.org/abs/2010...
Scaling Laws for Neural Language Models Jan 2020 https://arxiv.org/abs/2001.08361 Jared...
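The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first, over the domains, uses...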
LoRA: Low-Rank Adaptation of Large Language Models Jun 2021 Edward J. Hu*, Yelong Shen*...
May 2023 https://arxiv.org/abs/2305.11206 [Meta AI, Carnegie Mellon University, Univers...