Cell type discovery and representation in the era of high-content single cell phenotyping
Authors and affiliations:
Trygve Bakken†, Lindsay Cowell†, Brian D. Aevermann, Mark Novotny, Rebecca Hodge, Jeremy A. Miller, Alexandra Lee, Ivan Chang, Jamison McCorrison, Bali Pulendran, Yu Qian, Nicholas J. Schork, Roger S. Lasken, Ed S. Lein and Richard H. Scheuermann
- J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA
- Department of Pathology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Journal and publication date:
BMC Bioinformatics
- Published: 21 December 2017
Abstract:
Background
A fundamental characteristic of multicellular organisms is the specialization of functional cell types through the process of differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues, they can also be used as cellular biomarkers of a variety of different disease states and therapeutic/vaccine responses. In order to serve as a reference for cell type representation, the Cell Ontology has been developed to provide a standard nomenclature of defined cell types for comparative analysis and biomarker discovery. Historically, these cell types have been defined based on unique cellular shapes and structures, anatomic locations, and marker protein expression. However, we are now experiencing a revolution in cellular characterization resulting from the application of new high-throughput, high-content cytometry and sequencing technologies. The resulting explosion in the number of distinct cell types being identified is challenging the current paradigm for cell type definition in the Cell Ontology.
Results
In this paper, we provide examples of state-of-the-art cellular biomarker characterization using high-content cytometry and single cell RNA sequencing, and present strategies for standardized cell type representations based on the data outputs from these cutting-edge technologies, including “context annotations” in the form of standardized experiment metadata about the specimen source analyzed and marker genes that serve as the most useful features in machine learning-based cell type classification models. We also propose a statistical strategy for comparing new experiment data to these standardized cell type representations.
Conclusion
The advent of high-throughput/high-content single cell technologies is leading to an explosion in the number of distinct cell types being identified. It will be critical for the bioinformatics community to develop and adopt data standard conventions that will be compatible with these new technologies and support the data representation needs of the research community. The proposals enumerated here will serve as a useful starting point to address these challenges.
Keywords
- Cell ontology
- Single cell transcriptomics
- Cell phenotype
- Peripheral blood mononuclear cells
- Neuron
- Next generation sequencing
- Cytometry
- Open biomedical ontologies
- Marker genes
Selected figures:
Fig. 1 Identification of myeloid cell subtypes using manual gating and directed automated filtering.
The investigative team established a gating hierarchy (a series of iterative two-dimensional manual data partitions) in which peripheral blood mononuclear cells (PBMC) are assessed for expression of HLA-DR and CD3; CD3- cells (Population #5) are assessed for expression of CD19 and CD14; CD19- cells (Population #7) are then assessed for expression of HLA-DR and CD16; HLA-DR+ cells (Population #10) are assessed for expression of HLA-DR and CD14; CD14- cells (Population #19) are assessed for expression of CD123 and CD141; CD141- cells (Population #21) are assessed for expression of CD11c and CD123; and CD11c+ cells (Population #23) are assessed for expression of CD1c and CD16. Manual gating results are shown in the top panel; directed automated filtering results using the DAFi method, a modified version of the FLOCK algorithm [21], are shown in the bottom panel.
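The gating order described in the Fig. 1 caption can be sketched as a chain of filters. This is only a rough illustration, not the authors' manual gating or the DAFi method: real gates are two-dimensional regions, whereas the sketch below reduces each step to a single-marker threshold. The `threshold` value, the cell representation, and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Cell = Dict[str, float]  # marker name -> measured intensity (hypothetical units)

# (marker, keep_positive, resulting population) in the order given in Fig. 1.
GATES: List[Tuple[str, bool, str]] = [
    ("CD3", False, "Population #5"),     # keep CD3- cells
    ("CD19", False, "Population #7"),    # keep CD19- cells
    ("HLA-DR", True, "Population #10"),  # keep HLA-DR+ cells
    ("CD14", False, "Population #19"),   # keep CD14- cells
    ("CD141", False, "Population #21"),  # keep CD141- cells
    ("CD11c", True, "Population #23"),   # keep CD11c+ cells
]


def gate(cells: List[Cell], marker: str, keep_positive: bool,
         threshold: float = 1.0) -> List[Cell]:
    """Keep cells that are positive (or negative) for one marker.

    The single shared threshold is an assumption; real gates are drawn per plot.
    """
    return [c for c in cells if (c[marker] > threshold) == keep_positive]


def run_hierarchy(cells: List[Cell], threshold: float = 1.0):
    """Apply the gates in sequence and record the size of each population."""
    counts: Dict[str, int] = {}
    current = cells
    for marker, keep_positive, label in GATES:
        current = gate(current, marker, keep_positive, threshold)
        counts[label] = len(current)
    return current, counts


# Three made-up cells: one passes every gate, one is CD3+, one is CD14+.
toy_cells = [
    {"CD3": 0, "CD19": 0, "HLA-DR": 5, "CD14": 0, "CD141": 0, "CD11c": 5},
    {"CD3": 5, "CD19": 0, "HLA-DR": 0, "CD14": 0, "CD141": 0, "CD11c": 0},
    {"CD3": 0, "CD19": 0, "HLA-DR": 5, "CD14": 5, "CD141": 0, "CD11c": 5},
]
final_cells, population_counts = run_hierarchy(toy_cells)
```

Cells remaining in `final_cells` correspond to Population #23, which the caption says is then assessed for CD1c and CD16.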
Fig. 2 Cell type representations in the Cell Ontology.
a The expanded is_a hierarchy of the monocyte branch. b The expanded is_a hierarchy of the dendritic cell branch. c An example of a cell type term record for dendritic cell. Note the presence of both textual definitions in the “definition” field, and the components of the logical axioms in the “has part”, “lacks_plasma_membrane_part”, and “subClassOf” fields
Fig. 3 Cell type clustering and marker gene expression from RNA sequencing of single nuclei isolated from layer 1 cortex of post-mortem human brain.
a Heatmap of CPM expression levels of a subset of genes that show selective expression in the 11 clusters of cells identified by principal component analysis (not shown). An example of the statistical methods used to identify cell clusters and marker genes from single cell/single nuclei data can be found in [13]. b Violin plots of selected marker genes in each of the 11 cell clusters. c The expanded is_a hierarchy of the neuron branch of the Cell Ontology, with the interneuron sub-branch highlighted
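Cell types are distinguished by clustering on marker gene expression levels. a. Heatmap of the PCA-based clustering results, showing 11 cell types. b. Violin plots of the expression (in CPM) of each marker gene across the 11 cell types. c. Cell type tree; the highlighted interneuron branch is the closest match to the cell types identified here.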
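The expression values in Fig. 3 are reported in CPM (counts per million). As a minimal sketch of that normalization, assuming raw per-gene read counts for a single nucleus are available as a dictionary, CPM rescales each count so that the values for one nucleus sum to one million. The function name and the counts below are illustrative and are not taken from the paper.

```python
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts for one cell or nucleus to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were counted for this cell")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Made-up read counts for one nucleus; "OTHER" stands in for all remaining genes.
nucleus_counts = {"GAD1": 150, "GRIK3": 50, "OTHER": 800}
nucleus_cpm = counts_per_million(nucleus_counts)
```

Because the scaling is done per nucleus, CPM values can be compared across nuclei that were sequenced to different depths.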
Cell population identification from single cell transcriptional profiling
While flow cytometry relies on detection of a pre-selected set of proteins to help define a cell’s “parts list”, transcriptional profiling uses unbiased RNA detection and quantification to characterize the parts list. Recently, the RNA sequencing technology for transcriptional profiling has been optimized for use on single cells, so-called single cell RNA sequencing (scRNAseq). The application of scRNAseq to samples from a variety of different normal and abnormal tissues is revealing a level of cellular complexity that was unanticipated only a few years ago. Thus, we are experiencing an explosion in the number of new cell types being identified using these unbiased high-throughput/high-content experimental technologies.
As an example, our group has recently completed an analysis of the transcriptional profiles of single nuclei from post-mortem human brain using single nucleus RNA sequencing (snRNAseq). Single nuclei from cortical layer 1 of the middle temporal gyrus were sorted into individual wells of a microtiter plate for snRNAseq analysis, and specific cell type clusters were identified using iterative principal component analysis (unpublished). A heatmap of gene expression values reveals the differential expression pattern across cells from the 11 different neuronal cell clusters identified (Fig. 3a). Note that cells in all 11 clusters express GAD1 (top row), a well-known marker of inhibitory interneurons. Violin plots of selected marker genes for each cell cluster demonstrate their selective expression patterns (Fig. 3b). For example, GRIK3 is selectively expressed in the i2 cluster.
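The contrast between a marker shared by every cluster (GAD1) and one restricted to a single cluster (GRIK3 in i2) can be illustrated with a simple per-cluster summary. The sketch below is not the authors' statistical method, which is unpublished here; it only computes the fraction of cells in each cluster that express a gene and applies two hypothetical cutoffs (`on` and `off`). The cluster labels i1 and i3 and all expression values are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each cell is (cluster label, {gene: expression in CPM}).
LabelledCell = Tuple[str, Dict[str, float]]


def fraction_expressing(cells: List[LabelledCell], gene: str,
                        min_cpm: float = 1.0) -> Dict[str, float]:
    """Fraction of cells per cluster with expression of `gene` at or above min_cpm."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for cluster, expression in cells:
        totals[cluster] += 1
        if expression.get(gene, 0.0) >= min_cpm:
            hits[cluster] += 1
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


def clusters_expressing(fractions: Dict[str, float],
                        on: float = 0.5, off: float = 0.1) -> List[str]:
    """Clusters where the gene is 'on', provided all other clusters are 'off'.

    Returns an empty list when expression is ambiguous in any remaining cluster.
    """
    high = [c for c, f in fractions.items() if f >= on]
    others_low = all(f <= off for c, f in fractions.items() if c not in high)
    return high if others_low else []


# Toy data: GAD1 is detected everywhere, GRIK3 only in the i2 cluster.
toy_nuclei: List[LabelledCell] = [
    ("i1", {"GAD1": 120.0, "GRIK3": 0.0}),
    ("i1", {"GAD1": 90.0}),
    ("i2", {"GAD1": 100.0, "GRIK3": 40.0}),
    ("i2", {"GAD1": 80.0, "GRIK3": 25.0}),
    ("i3", {"GAD1": 110.0, "GRIK3": 0.0}),
]
grik3_clusters = clusters_expressing(fraction_expressing(toy_nuclei, "GRIK3"))
gad1_clusters = clusters_expressing(fraction_expressing(toy_nuclei, "GAD1"))
```

In this toy data a gene reported in a single cluster behaves like a selective marker, while a gene reported in every cluster behaves like the shared GAD1 marker.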
To determine whether the distinct cell types reflected in these snRNAseq-derived clusters have been previously reported, we examined the neuronal branch of the Cell Ontology (CL) (Fig. 3c) and found that the cerebral cortex GABAergic interneuron is probably the closest match, based on the following relevant definitions:
cerebral cortex GABAergic interneuron - A GABAergic interneuron that is part_of a cerebral cortex.
GABAergic interneuron – An interneuron that uses GABA as a vesicular neurotransmitter.
interneuron – Most generally any neuron which is not motor or sensory. Interneurons may also refer to neurons whose axons remain within a particular brain region as contrasted with projection neurons which have axons projecting to other brain regions.
neuron - The basic cellular unit of nervous tissue. Each neuron consists of a body, an axon, and dendrites. Their purpose is to receive, conduct, and transmit impulses in the nervous system.
Given these definitions, it appears that each of the cell types defined by these single nuclei expression clusters represents a novel cell type that should be positioned under the cerebral cortex GABAergic interneuron parent class in the CL.
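The chain of definitions above implies an is_a path from cerebral cortex GABAergic interneuron up to neuron, with each new cluster-derived cell type attached beneath it. A minimal sketch of that placement follows, assuming a single-parent excerpt of the hierarchy stored as a dictionary. The actual Cell Ontology allows multiple parents and additional relations, and the child term name used here is a hypothetical placeholder rather than a proposed CL term.

```python
from typing import Dict, List, Optional

# Single-parent excerpt of the neuron branch, taken from the definitions above.
# None marks the top of this excerpt, not the root of the Cell Ontology.
is_a: Dict[str, Optional[str]] = {
    "neuron": None,
    "interneuron": "neuron",
    "GABAergic interneuron": "interneuron",
    "cerebral cortex GABAergic interneuron": "GABAergic interneuron",
}


def add_cell_type(hierarchy: Dict[str, Optional[str]], name: str, parent: str) -> None:
    """Attach a new cell type under an existing parent class."""
    if parent not in hierarchy:
        raise KeyError(f"unknown parent class: {parent}")
    if name in hierarchy:
        raise ValueError(f"cell type already present: {name}")
    hierarchy[name] = parent


def ancestors(hierarchy: Dict[str, Optional[str]], name: str) -> List[str]:
    """Walk the is_a links from a cell type up to the top of the excerpt."""
    chain: List[str] = []
    parent = hierarchy[name]
    while parent is not None:
        chain.append(parent)
        parent = hierarchy[parent]
    return chain


# Hypothetical placeholder name for the cell type behind the i2 cluster.
new_type = "i2 cluster cell type (placeholder)"
add_cell_type(is_a, new_type, "cerebral cortex GABAergic interneuron")
i2_lineage = ancestors(is_a, new_type)
```

Walking the links from the new term recovers every parent definition it inherits, which is the reasoning used above to choose its parent class.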
Fig. 4 Proposed cell type names and definitions for cell types identified from the snRNAseq experiment shown in Fig. 3
Translation team:
王俊豪、叶名琛、郑易民、陈志荣、邓峻玮、郑凌伶