scRNA-seq文章复现：A Cellular Taxonomy of the Bone Marrow Stroma in Homeostasis and Leukemia （1）

2020-04-13
参考文献：A Cellular Taxonomy of the Bone Marrow Stroma in Homeostasis and Leukemia
目标：利用文章中描述的工具和方法，复现文中主要的scRNA-seq分析图表

前言

最有效的学习方法就是在实践中学习，今天开始要尝试复现的文章是A Cellular Taxonomy of the Bone Marrow Stroma in Homeostasis and Leukemia，该文章于2019年发表在Cell 杂志上。文章的分析代码并未开源，因此只能根据原文描述来尽量复现其结果。

摘要

Stroma is a poorly defined non-parenchymal component of virtually every organ with key roles in organ development, homeostasis, and repair. Studies of the bone marrow stroma have defined individual populations in the stem cell niche regulating hematopoietic regeneration and capable of initiating leukemia. Here, we use single-cell RNA sequencing (scRNAseq) to define a cellular taxonomy of the mouse bone marrow stroma and its perturbation by malignancy. We identified seventeen stromal subsets expressing distinct hematopoietic regulatory genes spanning new fibroblastic and osteoblastic subpopulations including distinct osteoblast differentiation trajectories. Emerging acute myeloid leukemia impaired mesenchymal osteogenic differentiation and reduced regulatory molecules necessary for normal hematopoiesis. These data suggest that tissue stroma responds to malignant cells by disadvantaging normal parenchymal cells. Our taxonomy of the stromal compartment provides a comprehensive bone marrow cell census and experimental support for cancer cell crosstalk with specific stromal elements to impair normal tissue function and thereby enable emergent cancer.

文章思路

利用单细胞测序分析小鼠正常骨髓基质，鉴定了17个细胞亚群及其基因表达特征，以及在稳定造血状态下表达关键龛位（niche）因子的基质细胞。
进一步推断细胞亚群的分化关系。
描绘了急性髓细胞白血病（Acute myeloid leukemia , AML）对骨髓微环境的整体影响及相应的细胞和分子异常。

scRNA-seq

建库方式：Chromium Single Cell 30 v2 Reagent Kit (10x Genomics)
上游分析：Cellranger toolkit (version 2.0.1, 10X Genomics)
表达矩阵下载：GSE128423
下游分析：Seurat v2.3.4、destiny等

这么多样本，重新跑Cellranger流程比较花时间，我们直接从下载表达矩阵开始。按照Cellranger输出格式来组织文件，每个数据集放在单独的文件夹，内含matrix.mtx.gz、barcodes.tsv.gz和genes.tsv.gz（CellRanger 3.0以上版本改为features.tsv.gz）三个文件。这里我手动改成了3.0的形式，以便于保持.gz压缩格式读入Seurat

lyc@lyc-VirtualBox:~/lyc-1995/BM_Mouse/data$ tree
.
├── b1
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── b2
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── b3
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── b4
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── bm1
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── bm2
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── bm3
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── bm4
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ctrl_10May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ctrl_16May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ctrl_26May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ctrl_7Jun
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ctrl_8May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── MLL_10May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── MLL_26May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── MLL_31May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── MLL_8May
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── std1
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── std2
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── std3
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── std4
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── std5
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── std6
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz

安装Seurat

Seurat目前已经更新到3.1.4版，但基本流程和2.3.4并没有太大的差别。详见：单细胞Seurat包升级，换汤不换药
我们直接安装最新版Seurat：

> install.packages('Seurat')
> library(Seurat)

分析稳态下的小鼠骨髓（n = 6）

主要目的是得到Figure S1中的两张细胞分群tSNE图：

标注的std7和std8可能是笔误吧……

读取表达矩阵

> getwd()
[1] "/home/lyc/lyc-1995/BM_Mouse"

批量读取表达矩阵，并在细胞barcode前添加样本标签：

sample.id <- paste0('std', 1:6)
raw.data <- lapply(sample.id, function(x) {
  counts <- Read10X(data.dir = file.path('data', x))
  colnames(counts) <- paste0(x, '_', colnames(counts))
  return(counts)
})
names(raw.data) <- sample.id

数据质控

原文对数据质控的描述：

Pre-processing of scRNA-seq data
ScRNA-Seq data were demultiplexed, aligned to the mouse genome, version mm10, and UMI-collapsed with the Cellranger toolkit (version 2.0.1, 10X Genomics). We excluded cells with fewer than 500 detected genes (where each gene had to have at least one UMI aligned). Gene expression was represented as the fraction of its UMI count with respect to total UMI in the cell and then multiplied by 10,000. We denoted it by TP10K – transcripts per 10K transcripts.

Filtering hematopoietic clusters and doublets
Based on cluster annotations with characteristic genes, we removed hematopoietic clusters from further analysis. It is further expected that a small fraction of data should consist of cell doublets (and to an even lesser extent of higher order multiplets) due to co-encapsulation into droplets and/or as occasional pairs of cells that were not dissociated in sample preparation. Therefore, when we found small clusters of cells expressing both hematopoietic and stromal markers we removed them from further analysis (original cluster 14). A small number of additional clusters and subclusters was marked by genes differentially expressed in at least two larger stromal clusters and were annotated as doublets if their average number of expressed genes was higher than the averages for corresponding suspected singlet cluster sources and/or they were not characterized by specific differentially expressed genes (original clusters 18 and 19). All marked doublets were removed from the discussion.

按照原文描述，先过滤掉检测基因数少于500的细胞，再根据下游聚类分析移除一些推定的doublets。我们先据此构建Seurat对象：

seu <- lapply(raw.data, CreateSeuratObject, min.features = 500)

检查了第一步质控，留下36,000+细胞，和作者原文给出的30,543数量差得有点多啊。由于没有源代码，只能猜测是作者省略了一些方法学上的描述。我们先按照常规流程走一遍看看吧：

seu.merge <- merge(seu[[1]], seu[2:length(seu)])
> dim(seu.merge)
[1] 27998 36181

进一步检查UMI数、基因数和线粒体相关UMI比例：

seu.merge[["percent.mt"]] <- PercentageFeatureSet(seu.merge, pattern = "^mt-")
VlnPlot(seu.merge, features = c('nCount_RNA', 'nFeature_RNA', 'percent.mt'))

线粒体基因高比例的细胞还是不少的，通常代表低质量细胞。而且也存在UMI数和基因数异常高的细胞，可能提示潜在的doublets。
再检查一下UMI数、基因数和线粒体基因比例的相关性：

plot1 <- FeatureScatter(seu.merge, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot2 <- FeatureScatter(seu.merge, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot1 + plot2

根据以上质控指标，设置进一步过滤：

seu.clean <- subset(seu.merge, nCount_RNA < 50000 & nFeature_RNA < 6000 & percent.mt < 7.5)
rm(seu, seu.merge);gc(reset = TRUE)
> dim(seu.clean)
[1] 27998 34062

呃……还是多出来很多细胞。文章对这部分质控没有更多描述了，我们先暂且搁置这个问题。

Normalization

用的是最经典的LogNormalize，这个过程简单来说就是先将每个细胞内的每个基因的原始UMI counts除以每个细胞的总UMI counts，再乘以一个常数（默认为10,000）进行缩放，最后用log1p函数进行对数转换，以达到拉齐细胞间测序深度的目的。

> seu.clean <- NormalizeData(seu.clean)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

Feature selection and Dimensionality reduction

Dimensionality reduction
We performed dimensionality reduction using gene expression data for a subset of variable genes. The variable genes were selected based on dispersion of binned variance to mean expression ratios using FindVariableGenes function of Seurat package (Satija et al., 2015) followed by filtering of cell-cycle, ribosomal protein, and mitochondrial genes. Next, we performed principal component analysis (PCA) and reduced the data to the top 50 PCA components (number of components was chosen based on standard deviations of the principal components – in a plateau region of an ‘‘elbow plot’’).

利用Seurat 2.3.4 版的FindVariableGenes函数，基于每个基因的平均表达量和离散程度，选择高可变基因进行下游的降维和聚类分析。在Seurat 3 中对应FindVariableFeatures函数，将selection.method参数改为'mvp'来对应 2.3.4 版中的方法。
原文没有给出具体参数，这里使用默认参数：

> seu.clean <- FindVariableFeatures(seu.clean, selection.method = 'mvp')
Calculating gene means
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variance to mean ratios
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> length(VariableFeatures(seu.clean))
[1] 1127

top10 <- head(VariableFeatures(seu.clean), 10)

plot1 <- VariableFeaturePlot(seu.clean)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2

筛选到了1,127个高可变基因。作者同时从其中剔除了线粒体基因、核糖体基因和细胞周期相关的基因，但是并没有指出挑选基因的规则，原文附件也没有给定基因集，这里我们同样只能按照一般做法来尝试：

mt.genes <- grep(pattern = '^mt-', rownames(seu.clean), value = TRUE)
ribo.genes <- grep(pattern = '^(Rpl[0-9]|Rps[0-9])', rownames(seu.clean), value = TRUE)
# 细胞周期基因
install.packages('org.Mm.eg.db')
library(org.Mm.eg.db)
cellcycle <- select(org.Mm.eg.db, keys = "GO:0007049", columns = "SYMBOL", keytype = "GOALL")
cellcycle <- unique(cellcycle$SYMBOL)

rm.genes <- c(mt.genes, ribo.genes, cellcycle)
VariableFeatures(seu.clean) <- setdiff(VariableFeatures(seu.clean), rm.genes)
> length(VariableFeatures(seu.clean))
[1] 1043

主成分分析（PCA），将1,000多个高可变基因通过线性变换投射到低维空间，默认计算50个主成分。PCA之前需要将表达矩阵中心化（centered）和标准化（standardized），使得每个基因均值为0、方差为1：

> seu.clean <- ScaleData(seu.clean)
Centering and scaling data matrix
  |============================================================================================================================================| 100%
> seu.clean <- RunPCA(seu.clean)
PC_ 1 
Positive:  Col1a2, Cst3, Mgp, Igfbp5, Abi3bp, Comp, Dcn, Cxcl14, Clu, Serping1 
       1500015O10Rik, Fam46a, Htra1, Fmod, Spp1, Ibsp, Mt2, Fibin, Chad, Pam 
       Gsn, Scara3, Itm2a, Olfml3, Col1a1, Col3a1, Ogn, Meg3, Pth1r, Col11a1 
Negative:  Fabp4, Cldn5, Cdh5, Lrg1, Tfpi, Kdr, Stab2, Gpihbp1, Mmrn2, Emcn 
       Fam167b, Esam, Ctla2a, Gpm6a, Stab1, Cd93, Apold1, Flt1, Gm1673, Cd36 
       Kcnj8, Ptprb, Mrc1, Flt4, Cyp4b1, Pecam1, Ecscr, Dnase1l3, Ushbp1, Abcc9 
PC_ 2 
Positive:  Tmem176b, Cxcl12, Vcam1, Gas6, Lpl, Gdpd2, Fbln5, Esm1, Nrp1, Kitl 
       Adipoq, Serping1, Hp, Dpep1, Pappa, Rarres2, Cxcl14, Epas1, Cdh11, Cyp1b1 
       Ebf3, 1500009L16Rik, Sfrp4, Tnc, Lepr, Angptl4, Trf, Arrdc4, Agt, Chrdl1 
Negative:  Chchd10, Rac2, Vpreb3, Cd79a, Cd79b, Ptprcap, Coro1a, Cd37, Mzb1, Lrmp 
       Pafah1b3, Blnk, Cnp, Arl5c, Rhoh, Laptm5, Cd72, Pou2af1, Gmfg, Cd53 
       Siglecg, Atp1b1, Fcrla, Xrcc6, Dusp2, Cytip, Tifa, Spib, Dnajc7, Bcl7a 
PC_ 3 
Positive:  Comp, Chad, Fmod, Pcolce2, 1500015O10Rik, Col11a1, Meg3, Anxa8, Ndufa4l2, Cilp2 
       Mfge8, Dcn, Hapln1, Acan, Scrg1, Cilp, Fibin, Ucma, 3110079O15Rik, Mgp 
       Col2a1, Igfbp6, Nbl1, Prg4, Tnfrsf11b, Dhx58os, Crispld1, Col11a2, S100a4, Tppp3 
Negative:  Ebf1, Vpreb3, Cd79a, Cd79b, Zeb2, Hp, Tifa, Ptprcap, Rac2, Gdpd2 
       Coro1a, Chchd10, Esm1, Adipoq, Cxcl12, Mzb1, Dpep1, Cd37, Blnk, Lpl 
       Lrmp, Pappa, Arl5c, Cd72, Kitl, Pou2af1, Atp1b1, Siglecg, Cyp1b1, Rhoh 
PC_ 4 
Positive:  Igfbp6, Nbl1, Crip1, Col3a1, Tppp3, Dcn, S100a4, Col1a1, Abi3bp, Cilp2 
       Mustn1, Cdh13, Ly6c1, Cilp, Tnxb, Ebf1, Angptl7, Lgals1, Col1a2, Ly6a 
       Vpreb3, Thbs4, Anxa8, Medag, Cd79a, Slurp1, Cd79b, Htra1, Clec3b, Cav1 
Negative:  Alox5ap, Lyz2, Slpi, Pglyrp1, Rgs18, Plek, Bin2, Wfdc21, Tmem40, Tyrobp 
       Hcst, Lcn2, S100a8, Prkar2b, Ppbp, Fcer1g, Ncf1, S100a9, Mcemp1, Ifitm6 
       Gp1bb, Nfe2, Ngp, Pf4, Gp9, Chil3, Camp, Fermt3, Clec1b, Ly6c2 
PC_ 5 
Positive:  Col9a2, Col9a1, Col9a3, Mia, 3110079O15Rik, Matn3, Fxyd2, Col11a2, Lect1, Col27a1 
       Hapln1, Col2a1, Acan, Scrg1, Ucma, Epyc, Col11a1, Prkg2, Il17b, Pth1r 
       Ppa1, Panx3, Serpina1a, Dhx58os, C1qtnf3, Serpina1d, Bhlhe41, Calml3, Pla2g5, Tnni2 
Negative:  Igfbp6, S100a4, Alox5ap, Tppp3, Lyz2, Nbl1, Slpi, Col3a1, Rgs18, Plek 
       Bin2, Pglyrp1, Tmem40, Ppbp, Col1a1, Mustn1, Gp9, Tnxb, Stx11, Dcn 
       Wfdc21, Hcst, Clec1b, Tyrobp, Itga2b, Rgs10, Fcer1g, Abi3bp, Pf4, Fermt3

查看碎石图：

ElbowPlot(seu.clean, ndims = 50)

文章没说选择多少个PC来进行下游分析，根据碎石图拐点，我们大致选择25个PC左右，进行tSNE降维：

seu.clean <- RunTSNE(seu.clean, dims = 1:25)

原文的tSNE图显示数据集之间几乎没有批次效应，我们来检查一下：

DimPlot(seu.clean, reduction = 'tsne')

DimPlot(seu.clean, reduction = 'tsne', split.by = 'orig.ident', ncol = 3)

可以看到6个数据集之间确实没有明显的批次效应。

无监督聚类

原文对于无监督聚类策略的描述：

Clustering and sub-clustering
We used graph-based clustering of the PCA reduced data with the Louvain Method (Blondel et al., 2008) after computing a shared nearest neighbor graph (Satija et al., 2015). We visualized the clusters on a 2D map produced with t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten and Hinton, 2008). For sub-clustering, we applied the same procedure of finding variable genes, dimensionality reduction, and clustering to the restricted set of data (usually restricted to one initial cluster).

作者在PCA降维后的数据空间内计算 shared nearest neighbor graph（SNN graph），再利用Louvain算法基于图聚类（graph-based clustering）获得细胞分群。作者针对初始聚类结果的子集重复上述的“特征选择-降维-聚类”流程进行亚聚类（sub-clustering）分析。原文没有给出具体的分辨率（resolution）参数，但是根据Figure S1C可以看出似乎是先得到了10个初始分群，亚聚类分析后再得出33个亚群。
Seurat 3把原Seurat 2的FindClusters函数拆分成FindNeighbors和FindClusters，分别用于计算SNN graph和聚类，其实并没有太大的变化。
和tSNE降维时一样，我们使用1-25个主成分构建SNN graph，FindClusters的默认参数是algorithm = 1（Louvain），分辨率是resolution = 0.8：

> seu.clean <- FindNeighbors(seu.clean, dims = 1:25)
Computing nearest neighbor graph
Computing SNN
> seu.clean <- FindClusters(seu.clean)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 34062
Number of edges: 1270143

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9290
Number of communities: 33
Elapsed time: 8 seconds

哎哟，直接得到了和原文FigureS1C一样的亚群数（33个）？可视化检查一下：

DimPlot(seu.clean, reduction = 'tsne', label = TRUE, repel = TRUE) + NoLegend()

当然也不能排除作者是得到了33个聚类后根据已知marker基因在FigureS1中做了手动标注

过滤造血干细胞和doublets

由于作者关注的是骨髓基质细胞，因此对造血干祖细胞相关的亚群进行了过滤，并将同时表达两种或以上细胞类型marker的聚类判断为doublets加以去除：

Filtering hematopoietic clusters and doublets
Based on cluster annotations with characteristic genes, we removed hematopoietic clusters from further analysis. It is further expected that a small fraction of data should consist of cell doublets (and to an even lesser extent of higher order multiplets) due to co-encapsulation into droplets and/or as occasional pairs of cells that were not dissociated in sample preparation. Therefore, when we found small clusters of cells expressing both hematopoietic and stromal markers we removed them from further analysis (original cluster 14). A small number of additional clusters and subclusters was marked by genes differentially expressed in at least two larger stromal clusters and were annotated as doublets if their average number of expressed genes was higher than the averages for corresponding suspected singlet cluster sources and/or they were not characterized by specific differentially expressed genes (original clusters 18 and 19). All marked doublets were removed from the discussion.

1. 基于差异表达分析得到各亚群marker基因

Differential expression of gene signatures
For each cluster, we used the Wilcoxon Rank-Sum Test to find genes that had significantly different RNA-seq TP10K expression when compared to the remaining clusters (paired tests when indicated) (after multiple hypothesis testing correction). As a support measure for ranking differentially expressed genes we also used the area under receiver operating characteristic (ROC) curve.

作者基于LogNormalize后的表达值，使用了Wilcoxon Rank-Sum Test的统计检验方式进行差异表达。Seurat中通过FindAllMarkers函数实现，默认参数test.use = "wilcox"。
利用future包的plan函数开启并行运算（根据自己的系统配置合理选择核心数），详见 Parallelization in Seurat with future。
最后不要忘记调回单核运算，并释放内存。

library(future)
plan('multiprocess', workers = 8)
all.markers <- FindAllMarkers(seu.clean)
plan('sequential');gc(reset = TRUE)

即使是在并行运算下，差异表达也花了超过了1小时。接下来根据p值对差异基因进行过滤（p_val 或p_val_adj都行）。尽管文中没有明说是否做了这一步，但单细胞测序数据本身就具有高噪声的特点，根据统计学显著性筛选基因可以减少假阳性结果：

all.markers <- subset(all.markers, p_val_adj < 0.05)

作者同时还使用了接收者操作特征（receiver operating characteristic curve, ROC）曲线的检验方法作为辅助。我们令test.use = "roc"：

all.markers.roc <- FindAllMarkers(seu.clean, test.use = 'roc')

需要注意的是ROC方法不支持future的并行计算，这真是要算到地老天荒了……
关于ROC方法，Seurat帮助文档里面已经说得比较清楚了：

"roc" : Identifies 'markers' of gene expression using ROC analysis. For each gene, evaluates (using AUC) a classifier built on that gene alone, to classify between two groups of cells. An AUC value of 1 means that expression values for this gene alone can perfectly classify the two groupings (i.e. Each of the cells in cells.1 exhibit a higher level than each of the cells in cells.2). An AUC value of 0 also means there is perfect classification, but in the other direction. A value of 0.5 implies that the gene has no predictive power to classify the two groups. Returns a 'predictive power' (abs(AUC-0.5) * 2) ranked matrix of putative differentially expressed genes.

ROC曲线可以用来评估给定二分类模型的优劣。简单地说，就是对于每一个给定的基因，评估其能否较准确地识别出目的细胞亚群。ROC的曲线下面积（AUC）可以反映分类模型的正确率，大于或小于0.5都是有意义的（小于0.5时说明基于该基因的分类模型总是给出错误预测，此时我们取相反结果即可得到正确预测，也就意味着该基因可能在目的亚群中表达下调），而AUC在0.5附近时说明基于该基因的分类模型近似于随机分类，该基因可能在两群细胞之间没有显著差异表达。Seurat根据每个基因的AUC计算了分类power代替p值。

2. 基于已知的亚群marker基因

作者同时还比较了已知的不同细胞亚群marker基因的表达分布（Figure S1D）：

我们知道Seurat可以在降维图中展示基因表达特征：

FeaturePlot(seu.clean, features = c('Cd79a', 'Cd79b'))

同样可以利用AddModuleScore来评估一个基因集的表达情况。我们根据Figure S1D选择基因集：

cd <- list(
  C1 = c('Cd79a', 'Cd79b'),
  C2 = c('Gypa', 'Hbb-bt', 'Rhag', 'Rhd', 'Tfrc'),
  C3 = c('Cd52', 'Cd177', 'Plaur', 'Clec4a2'),
  C4 = c('Cd52', 'Selplg', 'Ms4a6c', 'Cd53'),
  C5 = c('Gp9', 'Itga2b', 'Cd9', 'Gp1bb'),
  C6 = c('Ms4a3', 'Clec12a', 'Fcgr3')
)
seu.clean <- AddModuleScore(
  object = seu.clean,
  features = cd,
  ctrl = 5,
  name = 'Known_markers'
)
FeaturePlot(seu.clean, features = paste0('Known_markers', 1:6), ncol = 3)

基本和原文吻合。

我们先把重要的数据保存下来：

save(seu.clean, all.markers, all.markers.roc, file = file.path('output', 'tmp1.RData'))

最后编辑于：2020.04.30 13:35:50

禁止转载，如需转载请通过简信或评论联系作者。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 193,968评论 5赞 459
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 81,682评论 2赞 371
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 141,254评论 0赞 319
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 52,074评论 1赞 263
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 60,964评论 4赞 355
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 46,055评论 1赞 272
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 36,484评论 3赞 381
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 35,170评论 0赞 253
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 39,433评论 1赞 290
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 34,512评论 2赞 308
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 36,296评论 1赞 325
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 32,184评论 3赞 312
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 37,545评论 3赞 298
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 28,880评论 0赞 17
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 30,150评论 1赞 250
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 41,437评论 2赞 341
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 40,630评论 2赞 335