《拟时序分析》5.monocle3的降维、分群、聚类

上次教程已经带领大家先简单的认识一下monocle3的特点与功能(单细胞测序数据进阶分析—《拟时序分析》4.初识monocle3)。虽然是录了五十分钟,但是鉴于monocle3的坑还是很多,所以每一部分的功能还是需要带大家展开了解一下。这次内容主要带领大家学习monocle3的降维分群聚类功能,不知道大家是更喜欢用Seurat还是这次的monocle3呢?
视频中提到的Rmakdown还没有制作完,大家不要着急,我们先把这节课的测试文件与代码放在这个链接中,monocle3系列课程制作完毕会再进行更新,请大家持续关注:

资料链接:
https://pan.baidu.com/s/16lPDInnenycR9sTcABolSg?pwd=tdp7

你也可以直接前往monocle3官网教程进行学习:

https://cole-trapnell-lab.github.io/monocle3/docs/differential/

视频教程总是上传失败,大家直接去B站看吧:
https://www.bilibili.com/video/BV1br4y1x7Hf?p=9

2.1.monocle内置
2.1.1.降维

rm(list = ls());gc()
##            used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells  9994083 533.8   27726538 1480.8  27726538 1480.8
## Vcells 18353825 140.1  158881073 1212.2 198600815 1515.3
library(monocle3)library(dplyr)
cds <- readRDS('test.data/simple.rds')#怎么构建cds就不重复了,直接读一下上一部分的
cds <- preprocess_cds(cds, num_dim = 100)#主要是完成normalize,上一个视频刚讲过
plot_pc_variance_explained(cds)
image.png
cds <- reduce_dimension(cds)#降维,相当于Seurat::RunUMAP()
## No preprocess_method specified, using preprocess_method = 'PCA'
#这里我们没有指定reduction_method,但是默认是"UMAP",有"UMAP", "tSNE", "PCA", "LSI", "Aligned"可供选择
 #一般大于10000个细胞认为是大数据集,这时可以用 umap.fast_sgd=TRUE

umap.fast_sgd参数或者设置cores可以加快运算速度,但是可以能会让降维图最终显得有些不同,所以要慎重选择

red.1 <- plot_cells(cds, color_cells_by="cao_cell_type",                    cell_size = 2,
                    graph_label_size = 5, 
                   group_label_size  = 3)#类似于Seurat::DimPlot()
## No trajectory to plot. Has learn_graph() been called yet?
cds <- reduce_dimension(cds,umap.fast_sgd=T)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
red.2 <- plot_cells(cds, color_cells_by="cao_cell_type",                    cell_size = 2,                    graph_label_size = 5,                    group_label_size  = 3)
## No trajectory to plot. Has learn_graph() been called yet?
#这些参数的默认值都太小了,不利于我们观察,每次要敲很多参数,这就是我吐槽monocle3代码不友好的地方
cds <- reduce_dimension(cds,cores =5)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
red.3 <- plot_cells(cds, color_cells_by="cao_cell_type",
                    cell_size = 2,
                   graph_label_size = 5,                    group_label_size  = 3)
## No trajectory to plot. Has learn_graph() been called yet?suppressMessages(library(patchwork))red.1|red.2|red.3#线程对降维的影响似乎要小一些
## Warning: Removed 1 rows containing missing values (geom_text_repel).
## Removed 1 rows containing missing values (geom_text_repel).
## Removed 1 rows containing missing values (geom_text_repel).
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
image.png

试试tSNE

cds <- reduce_dimension(cds, reduction_method="tSNE")
## No preprocess_method specified, using preprocess_method = 'PCA'
plot_cells(cds, color_cells_by="cao_cell_type", reduction_method="tSNE")#给大家看看默认效果多让人抓狂
## No trajectory to plot. Has learn_graph() been called yet?
## Warning: Removed 1 rows containing missing values (geom_text_repel).

image.png
#看看UMAP和tSNE的差别吧red.1|plot_cells(cds, reduction_method="tSNE",
                color_cells_by="cao_cell_type",               cell_size = 2,
               group_label_size = 3)
## No trajectory to plot. Has learn_graph() been called yet?
## Warning: Removed 1 rows containing missing values (geom_text_repel).
## Removed 1 rows containing missing values (geom_text_repel).
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
image.png

2.1.2.分群

cds <- cluster_cells(cds)#分群
#这里计算方式用的是community detection,类似于Seurat::FindClusters(),但原理有所不同cds <- cluster_cells(cds, resolution=0.1)
#貌似现在只能选择分辨率,不能选择分群数量了,我觉得还挺好的一个功能没了
unique(partitions(cds))#查看分群ID
## [1] 1 3 2
## Levels: 1 2 3
plot_cells(cds)
## No trajectory to plot. Has learn_graph() been called yet?
image.png
plot_cells(cds, genes=c("cpna-2", "egl-21", "ram-2", "inos-1"))#看看基因在降维图中的表达量吧
## No trajectory to plot. Has learn_graph() been called yet?

image.png
#相当于Seurat::FeaturePlot()

2.1.3.去批次

names(cds@colData)#用于去批次的变量从这里挑
## [1] "plate"         "cao_cluster"   "cao_cell_type" "cao_tissue"   
## [5] "Size_Factor"
bfalig <- plot_cells(cds, color_cells_by="plate", label_cell_groups=FALSE)
## No trajectory to plot. Has learn_graph() been called yet?
cds <- align_cds(cds, alignment_group = "plate")#去除批次效应
## Aligning cells from different batches using Batchelor. 
## Please remember to cite:
##   Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). 'Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.' Nat. Biotechnol., 36(5), 421-427. doi: 10.1038/nbt.4091
aftalig <- plot_cells(cds, color_cells_by="plate", label_cell_groups=FALSE)
## No trajectory to plot. Has learn_graph() been called yet?
bfalig|aftalig#看看效果
image.png

2.1.4.计算并展示marker基因

#计算marker基因,类似于Seurat::FindAllMarkers()marker_test_res <- top_markers(cds, group_cells_by="cao_cell_type", 
                               reference_cells=1000, cores=2)
marker_test_res[1:4,1:4]
##          gene_id gene_short_name               cell_group marker_score
## 1 WBGene00000004           aat-3        GABAergic neurons    0.4279421
## 2 WBGene00000016           abf-5        Pharyngeal muscle    0.3333333
## 3 WBGene00000031           abu-8     Pharyngeal epithelia    0.3269503
## 4 WBGene00000037           ace-3 Canal associated neurons    0.4750789
top_specific_markers <- marker_test_res %>%  filter(fraction_expressing >= 0.10) %>%  group_by(cell_group) %>%  top_n(1, pseudo_R2)top_specific_markers[1:4,1:4]
## # A tibble: 4 × 4
## # Groups:   cell_group [4]
##   gene_id        gene_short_name cell_group           marker_score
##   <chr>          <fct>           <chr>                       <dbl>
## 1 WBGene00000485 che-3           Dopaminergic neurons        0.591
## 2 WBGene00000681 col-107         Seam cells                  0.326
## 3 WBGene00000753 col-180         Non-seam hypodermis         0.355
## 4 WBGene00001172 egl-3           Pharyngeal neurons          0.297
top_specific_marker_ids <- unique(top_specific_markers %>% pull(gene_id))plot_genes_by_group(cds,                    top_specific_marker_ids[1:10],
                    group_cells_by="cao_cell_type", 
                   max.size=3)#相当于Seurat::DotPlot
## Warning in if (ordering_type == "cluster_row_col") {: the condition has length >
## 1 and only the first element will be used
## Warning in stats::cor(t(res)): the standard deviation is zero
## Warning in if (axis_order == "marker_group") {: the condition has length > 1 and
## only the first element will be used
image.png
plot_genes_by_group(cds,                    top_specific_marker_ids[1:10],
                    max.size=3,
                    group_cells_by="partition")#根据cluster分组绘制
## Warning in if (ordering_type == "cluster_row_col") {: the condition has length >
## 1 and only the first element will be used

## Warning in if (ordering_type == "cluster_row_col") {: the condition has length >
## 1 and only the first element will be used
image.png

2.1.5.注释细胞

colData(cds)$assigned_cell_type <- as.character(partitions(cds))
colData(cds)$assigned_cell_type <-   dplyr::recode(colData(cds)$assigned_cell_type,                "1"="Bio_1",                "2"="Bio_2",                "3"="Bio_3", 
               '4'='Bio_4')#值得表扬的是没有像Seurat一样从0开始编号plot_cells(cds, group_cells_by="partition",            color_cells_by="assigned_cell_type", 
          cell_size = 2,           graph_label_size = 5,           group_label_size  = 3)
## No trajectory to plot. Has learn_graph() been called yet?
image.png

2.1.6.取子集以及亚群分析

try(cds_subset <- choose_cells(cds))#交互式的圈出一些细胞用于下游分析
## Error : choose_cells only works in interactive mode.
cds_subset <- cds #这里我本来细胞数就比较少就不取了class(cds_subset)
## [1] "cell_data_set"
## attr(,"package")
## [1] "monocle3"
pr_graph_test_res <- graph_test(cds_subset,
                                 neighbor_graph="knn", cores=2)

通过marker基因集可视化帮助判断

#专门用来分析亚群的marker genepr_deg_ids <- row.names(subset(pr_graph_test_res,
                                morans_I > 0.01 & q_value < 0.05))
gene_module_df <- find_gene_modules(cds_subset[pr_deg_ids,],                                     resolution=1e-3)#Seurat::AddModuleScore()
gene_module_df[1:5,1:5]

## # A tibble: 5 × 5
##   id             module supermodule dim_1  dim_2
##   <chr>          <fct>  <fct>       <dbl>  <dbl>
## 1 WBGene00000002 11     5            1.19  2.30 
## 2 WBGene00000003 2      2            9.66 -2.55 
## 3 WBGene00000004 5      3            4.72  0.633
## 4 WBGene00000006 2      2           10.7  -2.83 
## 5 WBGene00000010 3      1           -5.50 -2.16
plot_cells(cds_subset, genes=gene_module_df,            show_trajectory_graph=FALSE,
            label_cell_groups=FALSE,           cell_size = 0.5,           graph_label_size = 5,
           group_label_size  = 3)#相当于基因集打分
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
image.png

2.2.Garnett
这个功能旨在帮助大家自动注释细胞

2.2.1.利用Marker制作Garnett文件

assigned_type_marker_test_res <- top_markers(cds,                                             
group_cells_by="assigned_cell_type",                                             
reference_cells=1000,                                             cores=2)
garnett_markers <- assigned_type_marker_test_res %>%  filter(marker_test_q_value < 0.01 & specificity >= 0.5) %>%  group_by(cell_group) %>%  top_n(5, marker_score)garnett_markers <- garnett_markers %>%   group_by(gene_short_name) %>%  filter(n() == 1)garnett_markers[1:5,1:5]
## # A tibble: 5 × 5
## # Groups:   gene_short_name [5]
##   gene_id        gene_short_name cell_group marker_score mean_expression
##   <chr>          <fct>           <chr>             <dbl>           <dbl>
## 1 WBGene00000063 act-1           Bio_1             0.750            3.39
## 2 WBGene00001172 egl-3           Bio_2             0.229            1.88
## 3 WBGene00001189 egl-21          Bio_2             0.212            1.39
## 4 WBGene00002048 ida-1           Bio_2             0.291            2.14
## 5 WBGene00003369 mlc-1           Bio_1             0.738            1.41
generate_garnett_marker_file(garnett_markers, 
                            file="./marker_file.txt")
## Garnett marker file written to ./marker_file.txt

2.2.2.利用先验Marker制作Garnett文件

#如果你有一些先验的marker,也可以自行修改Garnett文件#目前Garnett可以支持monocle2和monocle3
if(!require(org.Mm.eg.db))BiocManager::install(
  c("org.Mm.eg.db", "org.Hs.eg.db","org.Ce.eg.db"))
## Warning: package 'AnnotationDbi' was built under R version 4.1.1
if(!require(garnett))devtools::install_github("cole-trapnell-lab/garnett", ref="monocle3")
unique(clusters(cds))
## [1] 6 1 4 3 5 7 2 8 9
## Levels: 1 2 3 4 5 6 7 8 9
colData(cds)$garnett_cluster <- clusters(cds)#根据marker基因注释细胞
worm_classifier <- train_cell_classifier(cds = cds,
                                         marker_file = "./marker_file.txt",
                                          db=org.Ce.eg.db::org.Ce.eg.db,
                                         cds_gene_id_type = "ENSEMBL",
                                         num_unknown = 50,
                                         #防止被错误的分配,可以输入unknown的数量
                                         marker_file_gene_id_type = "SYMBOL",
                                         cores=4)#读取并构建garnett注释
## Warning in make_name_map(parse_list,
## as.character(row.names(rowData(norm_cds))), : 1 genes could not be converted
## from SYMBOL to ENSEMBL These genes are listed below:
## Warning in make_name_map(parse_list, as.character(row.names(rowData(norm_cds))), : The following genes from the cell type definition file are not presentin the cell dataset.  Please check these genes for errors. Cell typedetermination will continue, ignoring these genes.
## tag-80
## training_sample
## Cell type Bio_1 Cell type Bio_2 Cell type Bio_3         Unknown 
##             118             106              64              50
rownames(cds)[1:5]
## [1] "WBGene00000001" "WBGene00000002" "WBGene00000003" "WBGene00000004"
## [5] "WBGene00000005"
class(worm_classifier)#如果你对一个组织器官比较了解,可以给出一些先验的marker
## [1] "garnett_classifier"
## attr(,"package")
## [1] "garnett"
cds <- classify_cells(cds, worm_classifier, 
                     db = org.Ce.eg.db::org.Ce.eg.db,
                      cluster_extend = TRUE,
                      cds_gene_id_type = "ENSEMBL")#把garnett的注释加进来unique
(colData(cds)$garnett_cluster)
## [1] 6 1 4 3 5 7 2 8 9
## Levels: 1 2 3 4 5 6 7 8 9
unique(colData(cds)$cluster_ext_type)
## [1] "Cell type Bio_1" "Cell type Bio_3" "Unknown"         "Cell type Bio_2"
plot_cells(cds,show_trajectory_graph = F,           group_cells_by="partition",
           color_cells_by="cluster_ext_type",           cell_size = 2,
           graph_label_size = 5,           group_label_size  = 3)|
plot_cells(cds,show_trajectory_graph = F,           group_cells_by="partition",
           color_cells_by="assigned_cell_type",           cell_size = 2, 
          graph_label_size = 5,           group_label_size  = 3)
image.png

欢迎关注同名公众号~

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 202,607评论 5 476
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,047评论 2 379
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 149,496评论 0 335
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,405评论 1 273
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,400评论 5 364
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,479评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,883评论 3 395
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,535评论 0 256
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,743评论 1 295
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,544评论 2 319
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,612评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,309评论 4 318
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,881评论 3 306
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,891评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,136评论 1 259
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,783评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,316评论 2 342

推荐阅读更多精彩内容