Overview
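From the two-gram bumblebee bat to whales weighing many tons, Earth is home to a rich variety of mammals, humans included, that over a vast span of time have adapted to nearly every environment on the planet. Mammals are among the most diverse groups of animals, in both size and shape. Ever since the life sciences emerged, researchers have wanted to know when, how, and under what selective pressures this mammalian variation arose. Studying evolutionary history can also shed light on human health: genes that are conserved across many species are likely to be essential for normal function, so changes to them may cause disease.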
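On April 28, 2023, the international Zoonomia Project collaboration, the world's largest comparative genomics resource for mammals, published 11 research papers in a single issue of Science. The scientists catalogued genomic diversity across 240 mammalian species, representing more than 80% of mammalian families. Some of the findings pinpoint parts of the human genome that have stayed unchanged over millions of years of evolution, information that may illuminate human health and disease.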
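The Zoonomia Project is a large international research effort led by scientists at institutions including MIT and Harvard University. The researchers sequenced a range of mammalian genomes and then aligned and jointly analysed the genomes of hundreds of species, a massive computational task that opens a new door to understanding mammals, mammalian evolution, and ourselves. From this alignment, they identified key regions of the genome that are the most conserved, or unchanged, across mammalian species and millions of years of evolution.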
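The authors hypothesized that, although these regions do not code for proteins, they may contain instructions that direct when and how much protein is produced, and that mutations in them could play an important role in the origin of disease or in the distinctive traits of mammalian species. Their analyses supported this hypothesis and showed that at least 10% of the human genome is functional, about ten times the roughly 1% that codes for proteins. The results further indicate that genetic variants may play a causal role in rare and common human diseases, including cancer.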
01
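If something is important for a species to function normally, it tends to be preserved during evolution; this is the concept of evolutionary constraint. Evolutionary constraint therefore measures how much a given region of the genome has changed across the evolutionary tree of life. In one paper from the Science special issue, Leveraging base-pair mammalian constraint to understand genetic variation and human disease, Sullivan et al. observed that DNA sequences that stay unchanged across many species and long evolutionary time, as well as sequences that suddenly begin accumulating mutations in one or a few lineages, both strongly indicate functional relevance and evolutionary forces at work. By studying patients with medulloblastoma, the researchers also found mutations at evolutionarily conserved positions in the human genome, which they believe may cause brain tumors to grow faster or resist treatment. The results suggest that using these data and methods in disease research can make it easier to find genetic changes that increase disease risk.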
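To make the idea of constraint concrete, here is a minimal sketch that scores each column of a toy multi-species alignment by the fraction of species sharing the most common base. This is only an illustration: the function name `column_conservation` and the toy sequences are assumptions made up for this example, and the score is a simple proxy, not the statistical method used in the paper.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, for each alignment column, the fraction of all sequences
    that carry the most common base (gaps '-' never count as a match)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]
        if not bases:
            # Column made entirely of gaps: treat as unconserved.
            scores.append(0.0)
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical toy alignment: one row per species, one column per position.
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGAAC",
    "ACG-GC",
]
scores = column_conservation(toy_alignment)
print(scores)
```

Columns that score 1.0 are identical in every species and would be candidates for constrained positions; columns with lower scores have tolerated change. A real analysis would also account for the phylogeny and the expected neutral substitution rate.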
02
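In Evolutionary constraint and innovation across hundreds of placental mammals, researchers identified parts of the genome linked to some exceptional traits in the mammalian world, such as extraordinary brain size, a superior sense of smell, and the ability to hibernate through winter. The authors also used the genomes to show that estimates of effective population size and diversity can help predict risk for species that are hard to monitor and sample.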
03
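Another study, A genomic timescale for placental mammal evolution, shows that mammals had already begun to change and diverge before about 65 million years ago, when an asteroid struck Earth and the dinosaurs went extinct.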
Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting.
04
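Another study, Three-dimensional genome rewiring in loci with human accelerated regions, used Zoonomia data and experimental analyses to examine human accelerated regions (HARs), sequences that are conserved in other mammals but changed rapidly in the human lineage, and related them to human-specific changes in how the genome folds in three dimensions.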
The HAR is nearby and regulates gene A, but not gene B, as the chimpanzee genome folds. An insertion in the human genome brings the HAR closer to gene B, causing expression of gene B. The HAR adapts to being in gene B’s regulatory domain through substitutions to previously conserved nucleotides.
05
A study titled Comparative genomics of Balto, a famous historic dog, captures lost diversity of 1920s sled dogs offers a genetic explanation for why Balto, a famous sled dog of the 1920s, was able to survive the harsh environment of Alaska.
In an unsupervised admixture analysis, Balto’s ancestry, representing 20th-century Alaskan sled dogs, is assigned predominantly to four Arctic lineage dog populations. He had no discernable wolf ancestry. The Alaskan sled dogs (a working population) did not fall into a distinct ancestry cluster but shared about a third of their ancestry with Balto in the supervised admixture analysis. Balto and working sled dogs carried fewer constrained and missense rare variants than modern dog breeds.
06
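In The functional and evolutionary impacts of human-specific deletions in conserved elements, Xue et al. report on the structure of the genome. After identifying deletions that span only a few bases, they analysed the ability of these deletions to regulate gene expression in multiple human cell types and explored whether they might contribute to uniquely human phenotypes. Complex cognitive function once again emerged as one of the main beneficiaries of sequence change during human evolution: genes near these small deletions are systematically enriched for roles in brain and neuronal function. After confirming experimentally that the deletions are functional in multiple cell types, the authors also observed that many of them increase gene expression in human cells, which is a driver of new function.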
We assessed 10,032 hCONDELs across diverse, biologically relevant datasets and identified tissue-specific enrichment (top left). The regulatory impact of hCONDELs was characterized by comparing chimp and human sequences in MPRAs (bottom left). The ability of hCONDELs to either improve or perturb activating and repressing gene-regulatory elements was assessed (top right). The deleted chimpanzee sequence was reintroduced back into human cells, causing a cascade of transcriptional differences for an hCONDEL regulating LOXL2 (bottom right).
07
In Relating enhancer genetic variation across mammals to complex phenotypes using machine learning, researchers used machine learning to identify regions of the genome associated with brain size.
TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype.
08
The study Mammalian evolution of human cis-regulatory elements and transcription factor binding sites describes how regulatory sequences in the human genome have evolved.
(A) Distribution of human cCREs by the number of genomes they align.
(B) Projection of cCREs by alignments to the other 240 mammalian genomes.
(C) Projection of HNF4A sites (constrained, red; unconstrained, blue).
(D) Heritability enrichment for 69 human traits in partitions of TFBSs ordered by evolutionary constraint.
(E) Heritability enrichment for human traits by subsets of TFBSs.
09
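The study Insights into mammalian TE diversity through the curation of 248 genome assemblies examined the transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. It found that although mammals are similar in total TE content and diversity, they differ substantially in recent TE accumulation. At any given time, a mammal tends to accumulate only a few types of TE, with one type dominating. The study also found an association between dietary habits and DNA transposon invasions.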
Five categories of TE were examined: DNA transposons, long interspersed elements (LINEs), long terminal repeat (LTR) retrotransposons, rolling circle (RC) transposons, and short interspersed elements (SINEs). Species with the highest and lowest proportions for each TE type are indicated by a picture of the organism and its common name. With regard to RC and DNA transposons, we found that most mammalian genome assemblies exhibit essentially zero recent accumulation (RC: 240 of 248 mammals had <0.1%; DNA: 210 of 248 mammals had <0.1%).
10
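The study The contribution of historical processes to contemporary extinction risk in placental mammals examined genetic variation in single genomes from 240 mammal species. It found that species with historically small populations carry a larger proportion of deleterious alleles, owing to the long-term accumulation and fixation of genetic load, and face a higher risk of extinction.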
Across 240 mammals, species with smaller historical Ne had lower genetic diversity, higher genetic load, and were more likely to be threatened with extinction. Genomic data were used to train models that predict whether a species is threatened, which can be valuable for assessing extinction risk in species lacking ecological or census data.
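As an illustration of how genetic diversity can be read from a single diploid genome, the sketch below computes heterozygosity as the fraction of called sites at which the two alleles differ. The function name `heterozygosity` and the genotype calls are hypothetical, chosen only for this example; the study's own diversity and load estimates are more involved.

```python
def heterozygosity(genotypes):
    """Fraction of called sites at which the two alleles differ.

    `genotypes` is a list of (allele_1, allele_2) tuples for one individual;
    use None for a site with no genotype call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    heterozygous = sum(1 for a, b in called if a != b)
    return heterozygous / len(called)


# Hypothetical genotype calls at five sites in one genome.
toy_genotypes = [("A", "A"), ("A", "G"), None, ("C", "C"), ("T", "C")]
h = heterozygosity(toy_genotypes)
print(h)
```

A lower value means less variation between the two copies of the genome, which is the kind of signal expected in species whose populations have long been small.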
11
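The study Integrating gene annotation with orthology inference at scale presents TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation with orthology inference. The researchers applied it to 488 placental mammals and 501 birds, creating the largest comparative gene resource to date.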
Orthologous, but not paralogous, genes have partially aligning intronic and intergenic regions. TOGA uses this principle to infer orthologous gene loci and integrates orthology inference with gene annotation. Using a reference species, TOGA can be applied to hundreds of aligned query genomes to provide rich comparative genomics resources.
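The principle in the caption above, that orthologous loci also align in their introns and intergenic flanks while paralogous copies mostly align only in exons, can be sketched with a simple threshold rule. Everything here is an assumption for illustration: the `CandidateLocus` fields, the 0.3 threshold, and the numbers are invented, and this heuristic is not TOGA's actual classifier.

```python
from dataclasses import dataclass


@dataclass
class CandidateLocus:
    """Hypothetical summary of how a reference gene aligns to one query locus."""
    name: str
    exon_aligned: float   # fraction of exonic bases that align (0 to 1)
    flank_aligned: float  # fraction of intronic + intergenic bases that align (0 to 1)


def looks_orthologous(locus, flank_threshold=0.3):
    """Orthologs are expected to align outside exons as well;
    the threshold is an arbitrary illustrative value."""
    return locus.flank_aligned >= flank_threshold


def pick_ortholog(candidates, flank_threshold=0.3):
    """Return the passing candidate with the best non-exonic alignment, or None."""
    passing = [c for c in candidates if looks_orthologous(c, flank_threshold)]
    if not passing:
        return None
    return max(passing, key=lambda c: c.flank_aligned)


# Two hypothetical loci whose exons both align well to the reference gene.
candidates = [
    CandidateLocus("locus_1", exon_aligned=0.95, flank_aligned=0.62),
    CandidateLocus("locus_2", exon_aligned=0.90, flank_aligned=0.03),
]
best = pick_ortholog(candidates)
print(best.name)
```

Both loci look similar if only exons are considered; the non-exonic alignment is what separates the likely ortholog from the likely paralog or processed copy.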
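The papers in this Science special issue compare the genomes of 240 mammalian species, including many that are threatened or endangered. The DNA samples were collected and provided by more than 50 institutions worldwide. Together, the findings illustrate how comparative genomics can not only reveal how certain species achieve extraordinary feats, but also help scientists better understand which parts of our genome are functional and how they influence health and disease.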
References:
1. Bogdan M. Kirilenko et al. Integrating gene annotation with orthology inference at scale. Science (2023).
2. Aryn P. Wilder et al. The contribution of historical processes to contemporary extinction risk in placental mammals. Science (2023).
3. Nicole M. Foley et al. A genomic timescale for placental mammal evolution. Science (2023).
4. Austin B. Osmanski et al. Insights into mammalian TE diversity through the curation of 248 genome assemblies. Science (2023).
5. James R. Xue et al. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science (2023).
6. Matthew J. Christmas and Irene M. Kaplow et al. Evolutionary constraint and innovation across hundreds of placental mammals. Science (2023).
7. Katherine L. Moon et al. Comparative genomics of Balto, a famous historic dog, captures lost diversity of 1920s sled dogs. Science (2023).
8. Gregory Andrews et al. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science (2023).
9. Kathleen C. Keough et al. Three-dimensional genome rewiring in loci with human accelerated regions. Science (2023).
10. Irene M. Kaplow et al. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science (2023).
11. Patrick F. Sullivan et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science (2023).