Load in the data
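The ability to make simultaneous measurements of multiple data types from the same cell, known as multimodal analysis, represents a new and exciting frontier for single-cell genomics. For example, CITE-seq enables the simultaneous measurement of transcriptomes and cell-surface proteins from the same cell. Other exciting multimodal technologies, such as the 10x Multiome kit, allow for the paired measurement of cellular transcriptome and chromatin accessibility (i.e. scRNA-seq + scATAC-seq). Other modalities that can be measured alongside cellular transcriptomes include genetic perturbations, cellular methylomes, and hashtag oligos from cell hashing. We have designed Seurat v4 to enable the seamless storage, analysis, and exploration of diverse multimodal single-cell datasets.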
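In this vignette, we present an introductory workflow for creating a multimodal Seurat object and performing an initial analysis. For example, we demonstrate how to cluster a CITE-seq dataset on the basis of the measured cellular transcriptomes, and subsequently discover cell surface proteins that are enriched in each cluster. We note that Seurat v4 also enables more advanced techniques for the analysis of multimodal data, in particular the application of our weighted nearest neighbor (WNN) approach, which clusters cells simultaneously based on a weighted combination of both modalities. You can explore this functionality here.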
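Here, we analyze a dataset of 8,617 cord blood mononuclear cells (CBMCs), where transcriptomic measurements are paired with abundance estimates for 13 surface proteins, whose levels are quantified with DNA-barcoded antibodies. First, we load in two count matrices: one for the RNA measurements, and one for the antibody-derived tags (ADT). You can download the ADT file here and the RNA file here.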
library(Seurat)
library(ggplot2)
library(patchwork)
# Load in the RNA UMI matrix
# Note that this dataset also contains ~5% of mouse cells, which we can use as negative controls
# for the protein measurements. For this reason, the gene expression matrix has HUMAN_ or MOUSE_
# prepended to each gene name.
cbmc.rna <- as.sparse(read.csv(file = "../data/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", sep = ",",
    header = TRUE, row.names = 1))
# To make life a bit easier going forward, we're going to discard all but the top 100 most
# highly expressed mouse genes, and remove the 'HUMAN_' prefix from the human gene names
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)
# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "../data/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz", sep = ",",
    header = TRUE, row.names = 1))
# Note that since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))
## [1] TRUE
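CollapseSpeciesExpressionMatrix() performs the species clean-up described in the comments above. To make that step concrete, here is a base-R sketch of the same idea on a tiny made-up matrix: keep every human gene with its `HUMAN_` prefix stripped, and keep only the `ncontrols` mouse genes with the highest average expression. `collapse_species_sketch()` is a hypothetical helper written for illustration; Seurat's own function may differ in details such as row order.

```r
# Hypothetical helper illustrating the idea behind CollapseSpeciesExpressionMatrix();
# a simplified sketch, not Seurat's implementation.
collapse_species_sketch <- function(counts, prefix = "HUMAN_", controls = "MOUSE_",
                                    ncontrols = 100) {
  human <- counts[startsWith(rownames(counts), prefix), , drop = FALSE]
  mouse <- counts[startsWith(rownames(counts), controls), , drop = FALSE]
  # Keep every human gene, dropping the species prefix from its name
  rownames(human) <- substring(rownames(human), nchar(prefix) + 1)
  # Keep only the `ncontrols` control genes with the highest average expression
  top <- order(rowMeans(mouse), decreasing = TRUE)[seq_len(min(ncontrols, nrow(mouse)))]
  rbind(human, mouse[top, , drop = FALSE])
}

# Toy matrix: genes in rows, cells in columns (made-up values)
toy <- matrix(c(5, 0, 3,
                1, 2, 0,
                9, 9, 9,
                0, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("HUMAN_CD3E", "HUMAN_MS4A1", "MOUSE_Actb", "MOUSE_Xist"),
                              c("cell1", "cell2", "cell3")))
collapsed <- collapse_species_sketch(toy, ncontrols = 1)
rownames(collapsed)  # -> "CD3E" "MS4A1" "MOUSE_Actb"
```

Only the single most highly expressed mouse gene survives here because `ncontrols = 1`; the vignette relies on the default of 100.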
Set up a Seurat object, add the RNA and protein data
Now we create a Seurat object, and add the ADT data as a second assay.
# creates a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)
# We can see that by default, the cbmc object contains an assay storing RNA measurement
Assays(cbmc)
## [1] "RNA"
# create a new assay to store ADT information
adt_assay <- CreateAssayObject(counts = cbmc.adt)
# add this assay to the previously created Seurat object
cbmc[["ADT"]] <- adt_assay
# Validate that the object now contains multiple assays
Assays(cbmc)
## [1] "RNA" "ADT"
# Extract a list of features measured in the ADT assay
rownames(cbmc[["ADT"]])
## [1] "CD3" "CD4" "CD8" "CD45RA" "CD56" "CD16" "CD10" "CD11c"
## [9] "CD14" "CD19" "CD34" "CCR5" "CCR7"
# Note that we can easily switch back and forth between the two assays to specify the default
# for visualization and analysis
# List the current default assay
DefaultAssay(cbmc)
## [1] "RNA"
# Switch the default to ADT
DefaultAssay(cbmc) <- "ADT"
DefaultAssay(cbmc)
## [1] "ADT"
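Adding the ADT matrix as a second assay relies on both matrices describing the same cells, which is why the column names were compared with all.equal() earlier. The toy sketch below models that idea with a plain list, not Seurat's actual object: `new_toy_object()`, `add_toy_assay()` and `set_toy_default()` are hypothetical helpers that refuse an assay whose cells differ from the existing ones and only allow switching the default to an assay that exists. Reordering the new assay's columns to match is a choice made for this sketch.

```r
# Toy stand-in for a multi-assay object (NOT Seurat's internal structure)
new_toy_object <- function(rna) {
  list(assays = list(RNA = rna), default = "RNA")
}

# Add an assay only if it describes exactly the same cells, reordering its columns to match
add_toy_assay <- function(obj, name, counts) {
  cells <- colnames(obj$assays[[1]])
  if (ncol(counts) != length(cells) || !setequal(colnames(counts), cells)) {
    stop("the new assay must contain exactly the same cells as the object")
  }
  obj$assays[[name]] <- counts[, cells, drop = FALSE]
  obj
}

# Switch the default assay, refusing names that do not exist
set_toy_default <- function(obj, assay) {
  if (!assay %in% names(obj$assays)) stop("unknown assay: ", assay)
  obj$default <- assay
  obj
}

# Made-up RNA and ADT matrices for the same three cells (ADT columns in a different order)
rna <- matrix(1:6, nrow = 2, dimnames = list(c("CD3E", "CD19"), c("c1", "c2", "c3")))
adt <- matrix(7:12, nrow = 2, dimnames = list(c("CD3", "CD19"), c("c3", "c1", "c2")))
obj <- add_toy_assay(new_toy_object(rna), "ADT", adt)
obj <- set_toy_default(obj, "ADT")
names(obj$assays)  # "RNA" "ADT"
```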
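Cluster cells on the basis of their scRNA-seq profiles
The steps below represent a quick clustering of the CBMCs based on the scRNA-seq data. For more detail on individual steps or more advanced options, see our PBMC clustering guided tutorial here.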
# Note that all operations below are performed on the RNA assay. Set and verify that the default
# assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)
## [1] "RNA"
# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)
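The first few calls in this pipeline can be approximated in base R to show what they compute. This sketch assumes the default LogNormalize method (counts divided by each cell's total, multiplied by a scale factor of 10,000, then log1p) and ScaleData's default cap of 10 on scaled values. It skips variable-feature selection and runs `prcomp()` on made-up data, so it illustrates the arithmetic rather than reproducing Seurat's results.

```r
set.seed(1)
# Toy counts: 50 genes x 40 cells of made-up Poisson data
counts <- matrix(rpois(50 * 40, lambda = 3), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:40)))

# LogNormalize: divide by each cell's total counts, multiply by scale.factor, then log1p
log_normalize <- function(counts, scale.factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}
norm <- log_normalize(counts)

# Center and scale each gene across cells, then cap large values at 10
scaled <- t(scale(t(norm)))
scaled[scaled > 10] <- 10

# PCA with cells as observations (prcomp expects observations in rows)
pca <- prcomp(t(scaled), center = FALSE, scale. = FALSE, rank. = 10)
dim(pca$x)  # 40 cells x 10 principal components
```

The neighbor graph, clustering and UMAP steps then work on a subset of these principal components, which is what `dims = 1:30` selects in the vignette.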
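Visualize multiple modalities side-by-side
Now that we have obtained clusters from scRNA-seq profiles, we can visualize the expression of either protein or RNA molecules in our dataset. Importantly, Seurat provides two ways to switch between modalities and to specify which modality you are interested in analyzing or visualizing. This is particularly important because, in some cases, the same feature can be present in multiple modalities; for example, this dataset contains independent measurements of the B cell marker CD19 at both the protein and RNA levels.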
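The two mechanisms used in this section, a default assay and assay keys such as `adt_` and `rna_`, amount to a simple lookup rule: a feature name that starts with an assay's key is read from that assay, and any other name is read from the default assay. `resolve_feature()` below is a hypothetical helper sketching that rule; it is not Seurat's internal code.

```r
# Hypothetical helper (not Seurat code): decide which assay a feature name refers to
resolve_feature <- function(feature, keys, default) {
  # keys: named character vector mapping assay name -> key, e.g. c(RNA = "rna_")
  for (assay in names(keys)) {
    key <- keys[[assay]]
    if (startsWith(feature, key)) {
      # A key prefix overrides the default assay; strip it to get the bare feature name
      return(list(assay = assay, feature = substring(feature, nchar(key) + 1)))
    }
  }
  # No recognized key: look the feature up in the default assay
  list(assay = default, feature = feature)
}

keys <- c(RNA = "rna_", ADT = "adt_")
str(resolve_feature("adt_CD19", keys, default = "RNA"))  # ADT assay, feature "CD19"
str(resolve_feature("CD19", keys, default = "RNA"))      # RNA assay, feature "CD19"
```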
# Normalize ADT data with the centered log-ratio (CLR) transform, applied within each cell (margin = 2)
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"
# Note that the following command is an alternative but returns the same result
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")
# Now, we will visualize CD19 levels for RNA and protein. By setting the default assay, we can
# visualize one or the other
DefaultAssay(cbmc) <- "ADT"
p1 <- FeaturePlot(cbmc, "CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
DefaultAssay(cbmc) <- "RNA"
p2 <- FeaturePlot(cbmc, "CD19") + ggtitle("CD19 RNA")
# place plots side-by-side
p1 | p2
# Alternately, we can use specific assay keys to specify a specific modality. Identify the key
# for the RNA and protein assays
Key(cbmc[["RNA"]])
## [1] "rna_"
Key(cbmc[["ADT"]])
## [1] "adt_"
# Now, we can include the key in the feature name, which overrides the default assay
p1 <- FeaturePlot(cbmc, "adt_CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
p2 <- FeaturePlot(cbmc, "rna_CD19") + ggtitle("CD19 RNA")
p1 | p2
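The ADT data above is normalized with `normalization.method = "CLR"` and `margin = 2`. The sketch below reimplements a centered log-ratio transform in base R, assuming the per-vector formula used by Seurat's CLR method is `log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))`; with `margin = 2` the transform is applied to each cell's vector of protein counts. `clr()` and `clr_normalize()` are illustrative names, and the counts are made up.

```r
# Centered log-ratio transform of one vector, mirroring the formula we assume
# Seurat's "CLR" method uses (verify against your installed version)
clr <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))
}

# margin = 2 applies clr() to each column (cell); margin = 1 to each row (feature)
clr_normalize <- function(counts, margin = 2) {
  out <- apply(counts, margin, clr)
  # apply() stacks each result as a column, so transpose back for margin = 1
  if (margin == 1) t(out) else out
}

# Made-up ADT counts: 3 proteins x 2 cells (filled column by column)
adt <- matrix(c(10, 0, 90, 5, 5, 40), nrow = 3,
              dimnames = list(c("CD3", "CD4", "CD19"), c("c1", "c2")))
adt_clr <- clr_normalize(adt, margin = 2)
round(adt_clr, 3)
```

Zero counts stay at zero after the transform, and within each cell the ordering of proteins is preserved, which makes CLR values comparable across cells with different total antibody counts.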
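Identify cell surface markers for scRNA-seq clusters
We can leverage our paired CITE-seq measurements to help annotate clusters derived from scRNA-seq, and to identify both protein and RNA markers.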
# as we know that CD19 is a B cell marker, we can identify cluster 6 as expressing CD19 on the
# surface
VlnPlot(cbmc, "adt_CD19")
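# we can also identify alternative protein and RNA markers for a cluster through differential
# expression. Note that ident.1 = 5 below tests cluster 5; set ident.1 to the CD19+ cluster
# identified above to look for B cell markers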
adt_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "ADT")
rna_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "RNA")
head(adt_markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## CD10 1.161293e-206 0.4512418 1 1 1.509680e-205
## CCR7 2.052649e-189 0.2835441 1 1 2.668443e-188
## CD34 9.647958e-188 0.4379917 1 1 1.254234e-186
## CCR5 4.601039e-150 0.2871257 1 1 5.981350e-149
## CD45RA 6.699498e-86 -2.2198583 1 1 8.709348e-85
## CD14 3.093576e-62 -0.7499958 1 1 4.021649e-61
head(rna_markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## AC109351.1 0 0.3203893 0.265 0.005 0
## CTD-2090I13.1 0 2.0024376 0.972 0.062 0
## DCAF5 0 0.6637418 0.619 0.055 0
## DYNLL2 0 2.0387603 0.984 0.094 0
## FAM186B 0 0.3000479 0.244 0.002 0
## HIST2H2AB 0 1.3104432 0.812 0.013 0
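The columns in the marker tables above can be read with a simplified re-creation of a cluster-versus-rest test. The sketch below assumes a Wilcoxon rank-sum test per feature, reports `pct.1`/`pct.2` as the fraction of cells with a non-zero value in each group, and applies a Bonferroni correction over all features in the assay. That last point is consistent with the ADT table, where each `p_val_adj` equals 13 times `p_val`, one multiple per ADT feature. The fold change here is a plain log2 ratio of group means with a pseudocount, not necessarily the formula Seurat uses, and `find_markers_sketch()` is a hypothetical helper run on made-up data.

```r
# Simplified cluster-vs-rest marker test (a sketch, not FindMarkers itself)
find_markers_sketch <- function(data, clusters, ident.1) {
  in1 <- clusters == ident.1
  stats <- apply(data, 1, function(x) {
    c(p_val = suppressWarnings(wilcox.test(x[in1], x[!in1])$p.value),
      # Simplified fold change: log2 ratio of group means with a pseudocount of 1
      avg_log2FC = log2((mean(x[in1]) + 1) / (mean(x[!in1]) + 1)),
      # Fraction of cells with a non-zero value in each group
      pct.1 = round(mean(x[in1] > 0), 3),
      pct.2 = round(mean(x[!in1] > 0), 3))
  })
  res <- as.data.frame(t(stats))
  # Bonferroni correction over every feature tested, capped at 1
  res$p_val_adj <- pmin(res$p_val * nrow(data), 1)
  res[order(res$p_val), ]
}

set.seed(42)
# Made-up data: 20 cells in cluster 5 and 40 cells in other clusters
data <- rbind(CD19 = c(rpois(20, 8), rpois(40, 1)),
              CD3  = c(rpois(20, 1), rpois(40, 8)),
              CD14 = rpois(60, 2))
clusters <- rep(c(5, 0), times = c(20, 40))
res <- find_markers_sketch(data, clusters, ident.1 = 5)
head(res)
```

This also explains why `pct.1` and `pct.2` are 1 for every row of the ADT table: after CLR normalization, essentially every cell has a non-zero value for every protein.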
Additional visualizations of multimodal data
# Draw ADT scatter plots (like biaxial plots for FACS). Note that you can even 'gate' cells if
# desired by using HoverLocator and FeatureLocator
FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3")
# view relationship between protein and RNA
FeatureScatter(cbmc, feature1 = "adt_CD3", feature2 = "rna_CD3E")
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8")
# Let's look at the raw (non-normalized) ADT counts. You can see the values are quite high,
# particularly in comparison to RNA values. This is due to the significantly higher protein copy
# number in cells, which significantly reduces 'drop-out' in ADT data
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8", slot = "counts")
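FeatureScatter() plots two features against each other, and HoverLocator/FeatureLocator let you pick out cells interactively. The non-interactive analog of a FACS gate is a threshold on each axis. The sketch below applies one to made-up normalized ADT values and also computes the Pearson correlation between the two markers, the kind of value FeatureScatter() reports in its plot title. `gate_cells()` and the thresholds of 1 are hypothetical choices for illustration.

```r
# Sketch of a FACS-style two-marker gate on a plain feature x cell matrix.
# gate_cells() and the thresholds below are hypothetical, chosen for illustration.
gate_cells <- function(data, feature1, feature2, min1 = -Inf, max2 = Inf) {
  keep <- data[feature1, ] > min1 & data[feature2, ] < max2
  colnames(data)[keep]
}

# Made-up normalized ADT values for four cells
adt <- rbind(CD19 = c(3.1, 0.2, 2.8, 0.1),
             CD3  = c(0.3, 2.9, 0.2, 3.3))
colnames(adt) <- paste0("cell", 1:4)

# CD19-high, CD3-low cells: a crude "B cell" gate
b_cells <- gate_cells(adt, "CD19", "CD3", min1 = 1, max2 = 1)
b_cells  # "cell1" "cell3"

# Pearson correlation between the two markers across cells
cor(adt["CD19", ], adt["CD3", ])
```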
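Loading data from 10X multi-modal experiments
Seurat is also able to analyze data from multimodal 10X experiments processed using CellRanger v3; as an example, we recreate the plots above using a dataset of 7,900 peripheral blood mononuclear cells (PBMCs), freely available from 10X Genomics here.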
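For a CellRanger v3 output directory with more than one feature type, `Read10X()` returns a list with one matrix per type, which is why the code below indexes `pbmc10k.data` by `"Gene Expression"` and `"Antibody Capture"`. The base-R sketch below shows the idea of that split on a tiny made-up matrix, assuming the type labels come from the third column of `features.tsv.gz`; `split_by_feature_type()` is a hypothetical helper, not part of Seurat.

```r
# Sketch: divide a combined feature x cell matrix into a list keyed by feature type.
# split_by_feature_type() is a hypothetical helper.
split_by_feature_type <- function(counts, feature_types) {
  # feature_types holds one entry per row, e.g. the third column of features.tsv.gz
  rows_by_type <- split(seq_len(nrow(counts)), feature_types)
  lapply(rows_by_type, function(rows) counts[rows, , drop = FALSE])
}

# Made-up combined matrix: two gene rows, two antibody rows, two cell barcodes
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("CD3E", "MS4A1", "CD3_TotalSeqB", "CD19_TotalSeqB"),
                                 c("AAAC-1", "AAAG-1")))
types <- c("Gene Expression", "Gene Expression", "Antibody Capture", "Antibody Capture")
by_type <- split_by_feature_type(counts, types)
names(by_type)
rownames(by_type[["Antibody Capture"]])
```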
pbmc10k.data <- Read10X(data.dir = "../data/pbmc10k/filtered_feature_bc_matrix/")
rownames(x = pbmc10k.data[["Antibody Capture"]]) <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "",
    x = rownames(x = pbmc10k.data[["Antibody Capture"]]))
pbmc10k <- CreateSeuratObject(counts = pbmc10k.data[["Gene Expression"]], min.cells = 3, min.features = 200)
pbmc10k <- NormalizeData(pbmc10k)
pbmc10k[["ADT"]] <- CreateAssayObject(pbmc10k.data[["Antibody Capture"]][, colnames(x = pbmc10k)])
pbmc10k <- NormalizeData(pbmc10k, assay = "ADT", normalization.method = "CLR")
plot1 <- FeatureScatter(pbmc10k, feature1 = "adt_CD19", feature2 = "adt_CD3", pt.size = 1)
plot2 <- FeatureScatter(pbmc10k, feature1 = "adt_CD4", feature2 = "adt_CD8a", pt.size = 1)
plot3 <- FeatureScatter(pbmc10k, feature1 = "adt_CD3", feature2 = "CD3E", pt.size = 1)
(plot1 + plot2 + plot3) & NoLegend()
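Two steps in the code above are easy to misread: the `gsub()` pattern that tidies the antibody names, and the subsetting of the ADT matrix to `colnames(x = pbmc10k)`, which is needed because `min.cells` and `min.features` filter the RNA data before the ADT assay is added. The sketch below reproduces both on made-up inputs. `filter_counts()` is a hypothetical, simplified stand-in for those filters; the order in which it applies them (cells first, then features) is an assumption.

```r
# 1. What the gsub() pattern does to antibody names (made-up examples in the 10X style).
# "[control_]*" is a character class: it matches any run of the letters c, o, n, t, r, l
# and "_", which is enough to strip both "_TotalSeqB" and "_control_TotalSeqB".
adt_names <- c("CD3_TotalSeqB", "CD8a_TotalSeqB", "IgG1_control_TotalSeqB")
clean <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", x = adt_names)
clean  # "CD3" "CD8a" "IgG1"

# 2. Simplified sketch of the min.features / min.cells filters (hypothetical helper)
filter_counts <- function(counts, min.cells = 3, min.features = 200) {
  # drop cells with fewer than min.features detected features
  counts <- counts[, colSums(counts > 0) >= min.features, drop = FALSE]
  # then drop features detected in fewer than min.cells of the remaining cells
  counts[rowSums(counts > 0) >= min.cells, , drop = FALSE]
}

rna <- matrix(c(1, 1, 0, 0,
                1, 0, 1, 0,
                0, 0, 0, 1,
                2, 3, 0, 0,
                0, 0, 0, 0),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("g", 1:5), paste0("c", 1:4)))
rna_kept <- filter_counts(rna, min.cells = 2, min.features = 2)

# 3. Keep only the ADT columns for cells that survived RNA filtering
adt <- matrix(1:8, nrow = 2, dimnames = list(clean[1:2], paste0("c", 1:4)))
adt_kept <- adt[, colnames(rna_kept), drop = FALSE]
```

Without step 3 the ADT matrix would still contain cells that were removed from the RNA data, and the two assays would no longer describe the same set of cells.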
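Additional functionality for multimodal data in Seurat
Seurat v4 also includes additional functionality for the analysis, visualization, and integration of multimodal datasets. For more information, please explore the resources below: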