First, open RStudio.
Once it is open, use getwd() to check the current working directory.
getwd()
If the directory differs from the one used last time, set it again
setwd("X:/xxx/xxxx")
setwd("F:/Rstudio_data/001_singlecell_code/raw/BC21/")
Reload the R packages
library(Seurat)
library(tidyverse)
library(patchwork)
library(dplyr)
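Load the data. load() restores the saved objects under their original names (assumed here to include scRNA1) and returns only those names, so its result is not assigned
load("scRNA1.Rdata")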
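The difference between the two save formats used in these notes (.Rdata at load time, .rds at save time) can be sketched with a small stand-in object. This is only an illustration in base R: `obj` and the temporary file names are hypothetical and are not part of the original workflow.

```r
# Round-trip a small stand-in object through both serialization routes.
obj <- list(name = "demo", values = 1:3)   # hypothetical stand-in for a Seurat object

rdata_file <- tempfile(fileext = ".Rdata")
rds_file   <- tempfile(fileext = ".rds")

save(obj, file = rdata_file)    # stores the object together with its name
saveRDS(obj, file = rds_file)   # stores the object only, without a name

rm(obj)
loaded_names <- load(rdata_file)   # recreates `obj`; returns the names it restored
restored     <- readRDS(rds_file)  # returns the object itself

print(loaded_names)
print(identical(obj, restored))
```

In short, load() is called for its side effect, while the result of readRDS() has to be assigned to a variable.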
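The official recommendation is 2,000 highly variable genes, and many papers use 3,000 instead; the right number depends on your own experiment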
scRNA1 <- FindVariableFeatures(scRNA1, selection.method = "vst", nfeatures = 3000)
Identify the 10 most highly variable genes; the top 10 are extracted so that they can be labeled in the plot
top10 <- head(VariableFeatures(scRNA1), 10)
Plot the variable features with and without labels; first, the plot without labels
plot1 <- VariableFeaturePlot(scRNA1)
Add labels for the top 10 genes to the plot
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE, size=2.5)
plot <- plot1 + plot2 + plot_layout(guides = "collect") & theme(legend.position = "bottom")
plot
Data scaling (centering)
If memory allows, it is best to scale all genes
scale.genes <- rownames(scRNA1)
scRNA1 <- ScaleData(scRNA1, features = scale.genes)
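If memory is limited, scale only the highly variable genes
scale.genes <- VariableFeatures(scRNA1)
scRNA1 <- ScaleData(scRNA1, features = scale.genes)
After normalization and scaling, the scRNA1 object holds three sets of expression data, which can be retrieved with the commands below (in Seurat v5 the slot argument is named layer)
Raw count matrix
GetAssayData(scRNA1,slot="counts",assay="RNA")
Normalized expression matrix
GetAssayData(scRNA1,slot="data",assay="RNA")
Scaled (centered) expression matrix
GetAssayData(scRNA1,slot="scale.data",assay="RNA")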
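What scaling does to each gene can be sketched in base R on a toy matrix. The matrix values and names below are made up for illustration; the clipping at 10 mirrors the scale.max default of ScaleData(), and the rest is a simplified stand-in rather than Seurat's actual implementation.

```r
# Toy matrix: 3 genes (rows) x 4 cells (columns); values are made up.
expr <- matrix(c(1, 2, 3, 4,
                 0, 0, 5, 5,
                 2, 4, 6, 8),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# scale() works column-wise, so transpose to put genes in columns, then back.
scaled <- t(scale(t(expr), center = TRUE, scale = TRUE))

# Clip extreme values, mirroring the scale.max = 10 default of ScaleData().
scaled[scaled > 10] <- 10

print(round(rowMeans(scaled), 10))   # per-gene mean after scaling
print(apply(scaled, 1, sd))          # per-gene standard deviation after scaling
```

After this step every gene has mean 0 and standard deviation 1 across cells, so highly expressed genes no longer dominate the PCA.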
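Cell-cycle regression: the highly variable genes found in the previous step often include cell-cycle-related genes.
These genes can shift the clustering, so that cells of the same type are split apart because they are in different cell-cycle phases.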
cc.genes
CaseMatch(c(cc.genes$s.genes,cc.genes$g2m.genes),VariableFeatures(scRNA1))
Cell-cycle scoring
g2m_genes = cc.genes$g2m.genes
g2m_genes = CaseMatch(search = g2m_genes, match = rownames(scRNA1))
s_genes = cc.genes$s.genes
s_genes = CaseMatch(search = s_genes, match = rownames(scRNA1))
scRNA1 <- CellCycleScoring(object=scRNA1, g2m.features=g2m_genes, s.features=s_genes)
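CellCycleScoring() adds the S.Score, G2M.Score and Phase columns to the metadata. How a phase can be derived from the two scores is sketched below in base R. The rule shown (G1 when both scores are negative, otherwise the phase with the higher score) reflects my understanding of Seurat's behavior and should be treated as an assumption; `assign_phase` and the demo scores are hypothetical.

```r
# Assign a phase label from the two scores of a single cell.
assign_phase <- function(s_score, g2m_score) {
  if (s_score < 0 && g2m_score < 0) return("G1")   # neither program is active
  if (s_score == g2m_score) return("Undecided")    # tie between the two scores
  if (s_score > g2m_score) "S" else "G2M"          # otherwise the higher score wins
}

# Made-up scores for four cells.
scores <- data.frame(S.Score   = c(-0.2, 0.4, 0.1, 0.3),
                     G2M.Score = c(-0.1, 0.1, 0.5, 0.3))
scores$Phase <- mapply(assign_phase, scores$S.Score, scores$G2M.Score)
print(scores)
```

The real assignments for this dataset can be inspected with table(scRNA1$Phase).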
Check how the cell-cycle genes affect cell clustering
scRNAa <- RunPCA(scRNA1, features = c(s_genes, g2m_genes))
p <- DimPlot(scRNAa, reduction = "pca", group.by = "Phase")
p
VlnPlot(scRNAa, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB","G2M.Score","S.Score"), ncol = 6)
ggsave("cellcycle_pca.png", p, width = 8, height = 6)
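To remove the cell-cycle effect, regress out the two scores. The result is stored in scRNAb here; assign it to scRNA1 instead if the later steps should use the regressed data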
scRNAb <- ScaleData(scRNA1, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(scRNA1))
PCA dimensionality reduction and principal component selection
PCA dimensionality reduction
scRNA1 <- RunPCA(scRNA1, features = VariableFeatures(scRNA1))
plot1 <- DimPlot(scRNA1, reduction = "pca", group.by="orig.ident")
### Show the plot
plot1
#### Determine the 'dimensionality' of the dataset
### ElbowPlot() offers a quick check of the dimensionality reduction result
plot2 <- ElbowPlot(scRNA1, ndims=20, reduction="pca")
## Show the plot
plot2
### The elbow (inflection point) is usually taken as the number of dimensions to keep.
plotc <- plot1+plot2
ggsave("pca.pdf", plot = plotc, width = 8, height = 4)
ggsave("pca.png", plot = plotc, width = 8, height = 4)
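Later analyses need a number of PCs chosen from the right-hand plot. The usual choice is all PCs before the point where the slope levels off; for this plot I suggest the first 13 PCs.
The plot suggests that the PCs up to about PC 13 still carry distinguishing signal.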
pc.num=1:13
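Reading the elbow by eye can be cross-checked with a simple numeric heuristic. The sketch below is one possible rule and is not part of the original workflow: `elbow_pc`, its `min_drop` threshold and the demo standard deviations are all hypothetical, and the variance shares are computed only over the PCs supplied.

```r
# Return the index of the first PC after the last large drop in variance
# share, i.e. the point where the curve flattens.
elbow_pc <- function(stdev, min_drop = 1) {
  pct   <- stdev^2 / sum(stdev^2) * 100   # percent of variance per PC
  drops <- -diff(pct)                      # decrease from PC i to PC i + 1
  big   <- which(drops > min_drop)         # drops larger than the threshold
  if (length(big) == 0) return(1L)         # flat curve: no clear elbow
  max(big) + 1L
}

stdev_demo <- c(10, 8, 6, 3, 2.9, 2.8, 2.7)   # made-up PC standard deviations
print(elbow_pc(stdev_demo))
```

With a Seurat object the input would be Stdev(scRNA1, reduction = "pca"). The number returned is only a starting point and should still be compared against the elbow plot.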
Cell clustering
Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!
scRNA1 <- FindNeighbors(scRNA1, dims = pc.num)
scRNA1 <- FindClusters(scRNA1, resolution = 0.5)
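The resolution is user-defined. Datasets with more cells generally call for a higher resolution; with more than 20,000 cells it is typically set above 1.0
Check how many cells are in each cluster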
table(scRNA1@meta.data$seurat_clusters)
metadata <- scRNA1@meta.data
cell_cluster <- data.frame(cell_ID=rownames(metadata), cluster_ID=metadata$seurat_clusters)
dir.create("cluster", showWarnings = FALSE)
write.csv(cell_cluster,'cluster/cell_cluster.csv',row.names = FALSE)
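Beyond the total number of cells per cluster, it is often useful to see how each sample contributes to each cluster. A minimal base R sketch follows; `meta_demo` is a made-up stand-in for scRNA1@meta.data and only reuses the column names orig.ident and seurat_clusters.

```r
# Made-up metadata: 7 cells from two samples, assigned to two clusters.
meta_demo <- data.frame(
  orig.ident      = c("A", "A", "A", "B", "B", "B", "B"),
  seurat_clusters = factor(c(0, 0, 1, 0, 1, 1, 1))
)

# Cell counts per sample (rows) and cluster (columns).
counts <- table(meta_demo$orig.ident, meta_demo$seurat_clusters)

# Fraction of each sample's cells that falls into each cluster.
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 3))
```

A cluster made up almost entirely of one sample can point to a batch effect rather than a distinct cell type.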
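There are two methods for visualizing the reduced dimensions: tSNE and UMAP
Nonlinear dimensionality reduction: the goal here is visualization rather than feature extraction (as with PCA), although it can also be used for feature extraction.
tSNE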
scRNA1 = RunTSNE(scRNA1, dims = pc.num)
embed_tsne <- Embeddings(scRNA1, 'tsne')
write.csv(embed_tsne,'embed_tsne.csv')
plot1 = DimPlot(scRNA1, reduction = "tsne")
## Show the plot
plot1
### label = TRUE shows the cluster labels on the plot
DimPlot(scRNA1, reduction = "tsne",label = TRUE)
### You will see that every cluster is labeled on the plot
ggsave("tSNE.pdf", plot = plot1, width = 8, height = 7)
## The line above saves the figure
UMAP: the second visualization method
scRNA1 <- RunUMAP(scRNA1, dims = pc.num)
embed_umap <- Embeddings(scRNA1, 'umap')
write.csv(embed_umap,'embed_umap.csv')
plot2 = DimPlot(scRNA1, reduction = "umap")
plot2
ggsave("UMAP.pdf", plot = plot2, width = 8, height = 7)
Combine the tSNE and UMAP plots
plotc <- plot1+plot2+ plot_layout(guides = 'collect')
plotc
ggsave("tSNE_UMAP.pdf", plot = plotc, width = 10, height = 5)
Save the data from this session
saveRDS(scRNA1, file="scRNA1.rds")