Cell type discovery and representation in the era of high-content single cell phenotyping

Authors and affiliations:

Trygve Bakken†, Lindsay Cowell†, Brian D. Aevermann, Mark Novotny, Rebecca Hodge, Jeremy A. Miller, Alexandra Lee, Ivan Chang, Jamison McCorrison, Bali Pulendran, Yu Qian, Nicholas J. Schork, Roger S. Lasken, Ed S. Lein and Richard H. Scheuermann

  • J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA
  • Department of Pathology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA

Journal and publication date:

BMC Bioinformatics (BMC series)

  • Published: 21 December 2017

Abstract:

Background

A fundamental characteristic of multicellular organisms is the specialization of functional cell types through the process of differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues, they can also be used as cellular biomarkers of a variety of different disease states and therapeutic/vaccine responses. In order to serve as a reference for cell type representation, the Cell Ontology has been developed to provide a standard nomenclature of defined cell types for comparative analysis and biomarker discovery. Historically, these cell types have been defined based on unique cellular shapes and structures, anatomic locations, and marker protein expression. However, we are now experiencing a revolution in cellular characterization resulting from the application of new high-throughput, high-content cytometry and sequencing technologies. The resulting explosion in the number of distinct cell types being identified is challenging the current paradigm for cell type definition in the Cell Ontology.

Results

In this paper, we provide examples of state-of-the-art cellular biomarker characterization using high-content cytometry and single cell RNA sequencing, and present strategies for standardized cell type representations based on the data outputs from these cutting-edge technologies, including “context annotations” in the form of standardized experiment metadata about the specimen source analyzed and marker genes that serve as the most useful features in machine learning-based cell type classification models. We also propose a statistical strategy for comparing new experiment data to these standardized cell type representations.

Conclusion

The advent of high-throughput/high-content single cell technologies is leading to an explosion in the number of distinct cell types being identified. It will be critical for the bioinformatics community to develop and adopt data standard conventions that will be compatible with these new technologies and support the data representation needs of the research community. The proposals enumerated here will serve as a useful starting point to address these challenges.

Keywords

  • Cell ontology
  • Single cell transcriptomics
  • Cell phenotype
  • Peripheral blood mononuclear cells
  • Neuron
  • Next generation sequencing
  • Cytometry
  • Open biomedical ontologies
  • Marker genes

Selected figures and text:

Fig. 1 Identification of myeloid cell subtypes using manual gating and directed automated filtering.

A gating hierarchy (a series of iterative two-dimensional manual data partitions) has been established by the investigative team, in which peripheral blood mononuclear cells (PBMC) are assessed for expression of HLA-DR and CD3; CD3- cells (Population #5) are assessed for expression of CD19 and CD14; CD19- cells (Population #7) are then assessed for expression of HLA-DR and CD16; HLA-DR+ cells (Population #10) are assessed for expression of HLA-DR and CD14; CD14- cells (Population #19) are assessed for expression of CD123 and CD141; CD141- cells (Population #21) are assessed for expression of CD11c and CD123; and CD11c+ cells (Population #23) are assessed for expression of CD1c and CD16. Manual gating results are shown in the top panel; directed automated filter results using the DAFi method, a modified version of the FLOCK algorithm [21], are shown in the bottom panel.
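
The gating path described in this caption can be written down as an ordered list of steps. The sketch below is only an illustration: the gate order, marker signs, and population numbers are taken from the caption, while the `Gate` type, the function names, and the simplification of each two-dimensional gate to a single positive/negative call per marker are assumptions made here, not part of the published DAFi or FLOCK methods.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gate(NamedTuple):
    axes: Tuple[str, str]  # the two markers plotted at this step (from the caption)
    marker: str            # marker that defines the population carried forward
    positive: bool         # True selects marker+ cells, False selects marker- cells
    population: int        # population number given in the caption


# Gate order, signs, and population numbers follow the Fig. 1 caption.
GATES: List[Gate] = [
    Gate(("HLA-DR", "CD3"), "CD3", False, 5),
    Gate(("CD19", "CD14"), "CD19", False, 7),
    Gate(("HLA-DR", "CD16"), "HLA-DR", True, 10),
    Gate(("HLA-DR", "CD14"), "CD14", False, 19),
    Gate(("CD123", "CD141"), "CD141", False, 21),
    Gate(("CD11c", "CD123"), "CD11c", True, 23),
]


def deepest_population(cell: Dict[str, bool]) -> Optional[int]:
    """Return the last population a cell reaches, or None if it fails the first gate.

    `cell` maps marker name -> True (positive) / False (negative). Real gates act
    on continuous intensities; thresholding is assumed to have happened already.
    """
    reached = None
    for gate in GATES:
        # A missing marker yields None, which never equals True/False, so we stop.
        if cell.get(gate.marker) != gate.positive:
            break
        reached = gate.population
    return reached


def phenotype(population: int) -> str:
    """Spell out the cumulative marker phenotype that defines a population."""
    parts = []
    for gate in GATES:
        parts.append(gate.marker + ("+" if gate.positive else "-"))
        if gate.population == population:
            return " ".join(parts)
    raise KeyError(f"unknown population #{population}")


example_cell = {"CD3": False, "CD19": False, "HLA-DR": True,
                "CD14": False, "CD141": False, "CD11c": True}
print(deepest_population(example_cell))  # population reached by this toy cell
print(phenotype(23))                     # cumulative phenotype of Population #23
```

Writing the path as data makes the cumulative phenotype of each numbered population explicit, which is the kind of information a standardized cell type representation needs to capture.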

Fig. 2 Cell type representations in the Cell Ontology.

(a) The expanded is_a hierarchy of the monocyte branch. (b) The expanded is_a hierarchy of the dendritic cell branch. (c) An example of a cell type term record for dendritic cell. Note the presence of both textual definitions in the “definition” field and the components of the logical axioms in the “has part”, “lacks_plasma_membrane_part”, and “subClassOf” fields.

Fig. 3 Cell type clustering and marker gene expression from RNA sequencing of single nuclei isolated from layer 1 cortex of post-mortem human brain.

(a) Heatmap of CPM expression levels of a subset of genes that show selective expression in the 11 clusters of cells identified by principal component analysis (not shown). An example of the statistical methods used to identify cell clusters and marker genes from single cell/single nuclei data can be found in [13]. (b) Violin plots of selected marker genes in each of the 11 cell clusters. (c) The expanded is_a hierarchy of the neuron branch of the Cell Ontology, with the interneuron sub-branch highlighted.

Cell population identification from single cell transcriptional profiling

While flow cytometry relies on detection of a pre-selected set of proteins to help define a cell’s “parts list”, transcriptional profiling uses unbiased RNA detection and quantification to characterize the parts list. Recently, the RNA sequencing technology for transcriptional profiling has been optimized for use on single cells, so-called single cell RNA sequencing (scRNAseq). The application of scRNAseq to samples from a variety of different normal and abnormal tissues is revealing a level of cellular complexity that was unanticipated only a few years ago. Thus, we are experiencing an explosion in the number of new cell types being identified using these unbiased high-throughput/high-content experimental technologies.

As an example, our group has recently completed an analysis of the transcriptional profiles of single nuclei from post-mortem human brain using single nucleus RNA sequencing (snRNAseq). Single nuclei from cortical layer 1 of the middle temporal gyrus were sorted into individual wells of a microtiter plate for snRNAseq analysis, and specific cell type clusters were identified using iterative principal component analysis (unpublished). A heatmap of gene expression values reveals the differential expression pattern across cells from the 11 different neuronal cell clusters identified (Fig. 3a). Note that cells in all 11 clusters express GAD1 (top row), a well-known marker of inhibitory interneurons. Violin plots of selected marker genes for each cell cluster demonstrate their selective expression patterns (Fig. 3b). For example, GRIK3 is selectively expressed in the i2 cluster.
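
To make the distinction between a broadly expressed gene (such as GAD1) and a cluster-selective marker (such as GRIK3) concrete, here is a minimal sketch. It is not the authors' unpublished clustering method: the CPM formula is the standard counts-per-million scaling, but the selectivity rule, the `min_cpm` and `min_fold` thresholds, the cluster names other than i2, and all numbers in `TOY_MEANS` are invented for illustration only.

```python
from typing import Dict, Optional


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale one cell's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a cell with zero total counts")
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


def selective_cluster(
    gene_means: Dict[str, float],
    min_cpm: float = 10.0,   # hypothetical expression floor
    min_fold: float = 4.0,   # hypothetical fold change over the runner-up cluster
) -> Optional[str]:
    """Return the single cluster in which a gene is selectively expressed, else None.

    `gene_means` maps cluster name -> mean CPM of one gene in that cluster.
    """
    if len(gene_means) < 2:
        raise ValueError("need at least two clusters to judge selectivity")
    ranked = sorted(gene_means.items(), key=lambda kv: kv[1], reverse=True)
    (top_cluster, top), (_, runner_up) = ranked[0], ranked[1]
    if top >= min_cpm and top >= min_fold * runner_up:
        return top_cluster
    return None


# Invented mean CPM values for three toy clusters; not data from the paper.
TOY_MEANS = {
    "GAD1": {"i1": 120.0, "i2": 110.0, "i3": 130.0},
    "GRIK3": {"i1": 2.0, "i2": 90.0, "i3": 1.0},
}

for gene, means in TOY_MEANS.items():
    print(gene, selective_cluster(means))
```

With these toy values, a gene expressed at similar levels everywhere is not reported as a marker of any one cluster, while a gene concentrated in one cluster is. Real analyses would typically use a statistical test across individual cells rather than a fixed fold-change rule.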

In order to determine whether the distinct cell types reflected in these snRNAseq-derived clusters have been previously reported, we examined the neuronal branch of the Cell Ontology (CL) (Fig. 3c) and found that the cerebral cortex GABAergic interneuron is probably the closest match, based on the following relevant definitions:

cerebral cortex GABAergic interneuron – A GABAergic interneuron that is part_of a cerebral cortex.

GABAergic interneuron – An interneuron that uses GABA as a vesicular neurotransmitter.

interneuron – Most generally any neuron which is not motor or sensory. Interneurons may also refer to neurons whose axons remain within a particular brain region as contrasted with projection neurons which have axons projecting to other brain regions.

neuron – The basic cellular unit of nervous tissue. Each neuron consists of a body, an axon, and dendrites. Their purpose is to receive, conduct, and transmit impulses in the nervous system.

Given these definitions, it appears that each of the cell types defined by these single nuclei expression clusters represents a novel cell type that should be positioned under the cerebral cortex GABAergic interneuron parent class in the CL.
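
The placement argued for here can be sketched as a small is_a map. This is an assumption-laden illustration rather than the actual CL file: each term is given a single parent inferred from the definitions quoted above (the real ontology may assign several parents), the function names are made up for this example, and the new term name is a placeholder, not one of the names proposed in Fig. 4.

```python
from typing import Dict, List, Optional

# child term -> is_a parent, read off the four definitions quoted in the text.
# "neuron" is treated as the root of this excerpt only.
IS_A: Dict[str, Optional[str]] = {
    "neuron": None,
    "interneuron": "neuron",
    "GABAergic interneuron": "interneuron",
    "cerebral cortex GABAergic interneuron": "GABAergic interneuron",
}


def add_term(hierarchy: Dict[str, Optional[str]], name: str, parent: str) -> None:
    """Add a new cell type under an existing parent class."""
    if parent not in hierarchy:
        raise KeyError(f"unknown parent term: {parent}")
    if name in hierarchy:
        raise ValueError(f"term already exists: {name}")
    hierarchy[name] = parent


def ancestors(hierarchy: Dict[str, Optional[str]], name: str) -> List[str]:
    """Walk the is_a chain from a term up to the root, nearest parent first."""
    chain = []
    parent = hierarchy[name]
    while parent is not None:
        chain.append(parent)
        parent = hierarchy[parent]
    return chain


# Placeholder name for one snRNAseq-derived cluster (hypothetical label).
NEW_TERM = "cluster i2 cell type (placeholder)"
add_term(IS_A, NEW_TERM, "cerebral cortex GABAergic interneuron")
print(ancestors(IS_A, NEW_TERM))
```

Because is_a is transitive, a term added this way inherits every ancestor's defining properties, so each new cluster-derived type would still count as a GABAergic interneuron, an interneuron, and a neuron.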

Fig. 4 Proposed cell type names and definitions for cell types identified from the snRNAseq experiment shown in Fig. 3

Translation team:

王俊豪、叶名琛、郑易民、陈志荣、邓峻玮、郑凌伶
