《拟时序分析》5.monocle3的降维、分群、聚类

上次教程已经带领大家先简单的认识一下monocle3的特点与功能(单细胞测序数据进阶分析—《拟时序分析》4.初识monocle3)。虽然是录了五十分钟，但是鉴于monocle3的坑还是很多，所以每一部分的功能还是需要带大家展开了解一下。这次内容主要带领大家学习monocle3的降维、分群和聚类功能，不知道大家是更喜欢用Seurat还是这次的monocle3呢？
视频中提到的Rmakdown还没有制作完，大家不要着急，我们先把这节课的测试文件与代码放在这个链接中，monocle3系列课程制作完毕会再进行更新，请大家持续关注：

资料链接：
https://pan.baidu.com/s/16lPDInnenycR9sTcABolSg?pwd=tdp7

你也可以直接前往monocle3官网教程进行学习：

https://cole-trapnell-lab.github.io/monocle3/docs/differential/

视频教程总是上传失败，大家直接去B站看吧：
https://www.bilibili.com/video/BV1br4y1x7Hf?p=9

2.1.monocle内置
2.1.1.降维

rm(list = ls());gc()

##            used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells  9994083 533.8   27726538 1480.8  27726538 1480.8
## Vcells 18353825 140.1  158881073 1212.2 198600815 1515.3

library(monocle3)library(dplyr)
cds <- readRDS('test.data/simple.rds')#怎么构建cds就不重复了，直接读一下上一部分的
cds <- preprocess_cds(cds, num_dim = 100)#主要是完成normalize，上一个视频刚讲过
plot_pc_variance_explained(cds)

image.png

cds <- reduce_dimension(cds)#降维，相当于Seurat::RunUMAP()

## No preprocess_method specified, using preprocess_method = 'PCA'

#这里我们没有指定reduction_method，但是默认是"UMAP"，有"UMAP", "tSNE", "PCA", "LSI", "Aligned"可供选择
 #一般大于10000个细胞认为是大数据集，这时可以用 umap.fast_sgd=TRUE

umap.fast_sgd参数或者设置cores可以加快运算速度，但是可以能会让降维图最终显得有些不同，所以要慎重选择

red.1 <- plot_cells(cds, color_cells_by="cao_cell_type",                    cell_size = 2,
                    graph_label_size = 5, 
                   group_label_size  = 3)#类似于Seurat::DimPlot()
## No trajectory to plot. Has learn_graph() been called yet?
cds <- reduce_dimension(cds,umap.fast_sgd=T)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
red.2 <- plot_cells(cds, color_cells_by="cao_cell_type",                    cell_size = 2,                    graph_label_size = 5,                    group_label_size  = 3)
## No trajectory to plot. Has learn_graph() been called yet?
#这些参数的默认值都太小了，不利于我们观察，每次要敲很多参数，这就是我吐槽monocle3代码不友好的地方
cds <- reduce_dimension(cds,cores =5)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
red.3 <- plot_cells(cds, color_cells_by="cao_cell_type",
                    cell_size = 2,
                   graph_label_size = 5,                    group_label_size  = 3)
## No trajectory to plot. Has learn_graph() been called yet?suppressMessages(library(patchwork))red.1|red.2|red.3#线程对降维的影响似乎要小一些
## Warning: Removed 1 rows containing missing values (geom_text_repel).
## Removed 1 rows containing missing values (geom_text_repel).
## Removed 1 rows containing missing values (geom_text_repel).
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

image.png

试试tSNE

cds <- reduce_dimension(cds, reduction_method="tSNE")

## No preprocess_method specified, using preprocess_method = 'PCA'

plot_cells(cds, color_cells_by="cao_cell_type", reduction_method="tSNE")#给大家看看默认效果多让人抓狂

## No trajectory to plot. Has learn_graph() been called yet?

## Warning: Removed 1 rows containing missing values (geom_text_repel).

image.png

#看看UMAP和tSNE的差别吧red.1|plot_cells(cds, reduction_method="tSNE",
                color_cells_by="cao_cell_type",               cell_size = 2,
               group_label_size = 3)

## No trajectory to plot. Has learn_graph() been called yet?

## Warning: Removed 1 rows containing missing values (geom_text_repel).
## Removed 1 rows containing missing values (geom_text_repel).

## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

image.png

2.1.2.分群

cds <- cluster_cells(cds)#分群
#这里计算方式用的是community detection，类似于Seurat::FindClusters()，但原理有所不同cds <- cluster_cells(cds, resolution=0.1)
#貌似现在只能选择分辨率，不能选择分群数量了，我觉得还挺好的一个功能没了
unique(partitions(cds))#查看分群ID

## [1] 1 3 2
## Levels: 1 2 3

plot_cells(cds)

## No trajectory to plot. Has learn_graph() been called yet?

image.png

plot_cells(cds, genes=c("cpna-2", "egl-21", "ram-2", "inos-1"))#看看基因在降维图中的表达量吧

## No trajectory to plot. Has learn_graph() been called yet?

image.png

#相当于Seurat::FeaturePlot()

2.1.3.去批次

names(cds@colData)#用于去批次的变量从这里挑

## [1] "plate"         "cao_cluster"   "cao_cell_type" "cao_tissue"   
## [5] "Size_Factor"

bfalig <- plot_cells(cds, color_cells_by="plate", label_cell_groups=FALSE)

## No trajectory to plot. Has learn_graph() been called yet?

cds <- align_cds(cds, alignment_group = "plate")#去除批次效应

## Aligning cells from different batches using Batchelor. 
## Please remember to cite:
##   Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). 'Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.' Nat. Biotechnol., 36(5), 421-427. doi: 10.1038/nbt.4091

aftalig <- plot_cells(cds, color_cells_by="plate", label_cell_groups=FALSE)

## No trajectory to plot. Has learn_graph() been called yet?

bfalig|aftalig#看看效果

image.png

2.1.4.计算并展示marker基因

#计算marker基因,类似于Seurat::FindAllMarkers()marker_test_res <- top_markers(cds, group_cells_by="cao_cell_type", 
                               reference_cells=1000, cores=2)

marker_test_res[1:4,1:4]

##          gene_id gene_short_name               cell_group marker_score
## 1 WBGene00000004           aat-3        GABAergic neurons    0.4279421
## 2 WBGene00000016           abf-5        Pharyngeal muscle    0.3333333
## 3 WBGene00000031           abu-8     Pharyngeal epithelia    0.3269503
## 4 WBGene00000037           ace-3 Canal associated neurons    0.4750789

top_specific_markers <- marker_test_res %>%  filter(fraction_expressing >= 0.10) %>%  group_by(cell_group) %>%  top_n(1, pseudo_R2)top_specific_markers[1:4,1:4]

## # A tibble: 4 × 4
## # Groups:   cell_group [4]
##   gene_id        gene_short_name cell_group           marker_score
##   <chr>          <fct>           <chr>                       <dbl>
## 1 WBGene00000485 che-3           Dopaminergic neurons        0.591
## 2 WBGene00000681 col-107         Seam cells                  0.326
## 3 WBGene00000753 col-180         Non-seam hypodermis         0.355
## 4 WBGene00001172 egl-3           Pharyngeal neurons          0.297

top_specific_marker_ids <- unique(top_specific_markers %>% pull(gene_id))plot_genes_by_group(cds,                    top_specific_marker_ids[1:10],
                    group_cells_by="cao_cell_type", 
                   max.size=3)#相当于Seurat::DotPlot

## Warning in if (ordering_type == "cluster_row_col") {: the condition has length >
## 1 and only the first element will be used

## Warning in stats::cor(t(res)): the standard deviation is zero

## Warning in if (axis_order == "marker_group") {: the condition has length > 1 and
## only the first element will be used

image.png

plot_genes_by_group(cds,                    top_specific_marker_ids[1:10],
                    max.size=3,
                    group_cells_by="partition")#根据cluster分组绘制

## Warning in if (ordering_type == "cluster_row_col") {: the condition has length >
## 1 and only the first element will be used

## Warning in if (ordering_type == "cluster_row_col") {: the condition has length >
## 1 and only the first element will be used

image.png

2.1.5.注释细胞

colData(cds)$assigned_cell_type <- as.character(partitions(cds))
colData(cds)$assigned_cell_type <-   dplyr::recode(colData(cds)$assigned_cell_type,                "1"="Bio_1",                "2"="Bio_2",                "3"="Bio_3", 
               '4'='Bio_4')#值得表扬的是没有像Seurat一样从0开始编号plot_cells(cds, group_cells_by="partition",            color_cells_by="assigned_cell_type", 
          cell_size = 2,           graph_label_size = 5,           group_label_size  = 3)

## No trajectory to plot. Has learn_graph() been called yet?

image.png

2.1.6.取子集以及亚群分析

try(cds_subset <- choose_cells(cds))#交互式的圈出一些细胞用于下游分析

## Error : choose_cells only works in interactive mode.

cds_subset <- cds #这里我本来细胞数就比较少就不取了class(cds_subset)

## [1] "cell_data_set"
## attr(,"package")
## [1] "monocle3"

pr_graph_test_res <- graph_test(cds_subset,
                                 neighbor_graph="knn", cores=2)

通过marker基因集可视化帮助判断

#专门用来分析亚群的marker genepr_deg_ids <- row.names(subset(pr_graph_test_res,
                                morans_I > 0.01 & q_value < 0.05))
gene_module_df <- find_gene_modules(cds_subset[pr_deg_ids,],                                     resolution=1e-3)#Seurat::AddModuleScore()
gene_module_df[1:5,1:5]

## # A tibble: 5 × 5
##   id             module supermodule dim_1  dim_2
##   <chr>          <fct>  <fct>       <dbl>  <dbl>
## 1 WBGene00000002 11     5            1.19  2.30 
## 2 WBGene00000003 2      2            9.66 -2.55 
## 3 WBGene00000004 5      3            4.72  0.633
## 4 WBGene00000006 2      2           10.7  -2.83 
## 5 WBGene00000010 3      1           -5.50 -2.16

plot_cells(cds_subset, genes=gene_module_df,            show_trajectory_graph=FALSE,
            label_cell_groups=FALSE,           cell_size = 0.5,           graph_label_size = 5,
           group_label_size  = 3)#相当于基因集打分

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

image.png

2.2.Garnett
这个功能旨在帮助大家自动注释细胞

2.2.1.利用Marker制作Garnett文件

assigned_type_marker_test_res <- top_markers(cds,                                             
group_cells_by="assigned_cell_type",                                             
reference_cells=1000,                                             cores=2)

garnett_markers <- assigned_type_marker_test_res %>%  filter(marker_test_q_value < 0.01 & specificity >= 0.5) %>%  group_by(cell_group) %>%  top_n(5, marker_score)garnett_markers <- garnett_markers %>%   group_by(gene_short_name) %>%  filter(n() == 1)garnett_markers[1:5,1:5]

## # A tibble: 5 × 5
## # Groups:   gene_short_name [5]
##   gene_id        gene_short_name cell_group marker_score mean_expression
##   <chr>          <fct>           <chr>             <dbl>           <dbl>
## 1 WBGene00000063 act-1           Bio_1             0.750            3.39
## 2 WBGene00001172 egl-3           Bio_2             0.229            1.88
## 3 WBGene00001189 egl-21          Bio_2             0.212            1.39
## 4 WBGene00002048 ida-1           Bio_2             0.291            2.14
## 5 WBGene00003369 mlc-1           Bio_1             0.738            1.41

generate_garnett_marker_file(garnett_markers, 
                            file="./marker_file.txt")

## Garnett marker file written to ./marker_file.txt

2.2.2.利用先验Marker制作Garnett文件

#如果你有一些先验的marker，也可以自行修改Garnett文件#目前Garnett可以支持monocle2和monocle3
if(!require(org.Mm.eg.db))BiocManager::install(
  c("org.Mm.eg.db", "org.Hs.eg.db","org.Ce.eg.db"))

## Warning: package 'AnnotationDbi' was built under R version 4.1.1

if(!require(garnett))devtools::install_github("cole-trapnell-lab/garnett", ref="monocle3")
unique(clusters(cds))

## [1] 6 1 4 3 5 7 2 8 9
## Levels: 1 2 3 4 5 6 7 8 9

colData(cds)$garnett_cluster <- clusters(cds)#根据marker基因注释细胞
worm_classifier <- train_cell_classifier(cds = cds,
                                         marker_file = "./marker_file.txt",
                                          db=org.Ce.eg.db::org.Ce.eg.db,
                                         cds_gene_id_type = "ENSEMBL",
                                         num_unknown = 50,
                                         #防止被错误的分配，可以输入unknown的数量
                                         marker_file_gene_id_type = "SYMBOL",
                                         cores=4)#读取并构建garnett注释

## Warning in make_name_map(parse_list,
## as.character(row.names(rowData(norm_cds))), : 1 genes could not be converted
## from SYMBOL to ENSEMBL These genes are listed below:

## Warning in make_name_map(parse_list, as.character(row.names(rowData(norm_cds))), : The following genes from the cell type definition file are not presentin the cell dataset.  Please check these genes for errors. Cell typedetermination will continue, ignoring these genes.
## tag-80

## training_sample
## Cell type Bio_1 Cell type Bio_2 Cell type Bio_3         Unknown 
##             118             106              64              50

rownames(cds)[1:5]

## [1] "WBGene00000001" "WBGene00000002" "WBGene00000003" "WBGene00000004"
## [5] "WBGene00000005"

class(worm_classifier)#如果你对一个组织器官比较了解，可以给出一些先验的marker

## [1] "garnett_classifier"
## attr(,"package")
## [1] "garnett"

cds <- classify_cells(cds, worm_classifier, 
                     db = org.Ce.eg.db::org.Ce.eg.db,
                      cluster_extend = TRUE,
                      cds_gene_id_type = "ENSEMBL")#把garnett的注释加进来unique
(colData(cds)$garnett_cluster)

## [1] 6 1 4 3 5 7 2 8 9
## Levels: 1 2 3 4 5 6 7 8 9

unique(colData(cds)$cluster_ext_type)

## [1] "Cell type Bio_1" "Cell type Bio_3" "Unknown"         "Cell type Bio_2"

plot_cells(cds,show_trajectory_graph = F,           group_cells_by="partition",
           color_cells_by="cluster_ext_type",           cell_size = 2,
           graph_label_size = 5,           group_label_size  = 3)|
plot_cells(cds,show_trajectory_graph = F,           group_cells_by="partition",
           color_cells_by="assigned_cell_type",           cell_size = 2, 
          graph_label_size = 5,           group_label_size  = 3)

image.png

欢迎关注同名公众号~

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 202,607评论 5赞 476
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,047评论 2赞 379
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 149,496评论 0赞 335
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,405评论 1赞 273
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,400评论 5赞 364
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,479评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,883评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,535评论 0赞 256
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,743评论 1赞 295
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,544评论 2赞 319
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,612评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,309评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,881评论 3赞 306
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,891评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,136评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 42,783评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,316评论 2赞 342

《拟时序分析》5.monocle3的降维、分群、聚类

推荐阅读更多精彩内容