简介
目前,单细胞技术发展迅速,单细胞转录组测序可以帮助我们获得组织中每个细胞的基因表达水平,但却无法获得对应空间位置上的信息。空间转录组技术(spatial transcriptomics technologies)使我们能够利用空间定位信息对组织位置进行基因表达谱分析,从而表征空间组织结构,更好的帮助我们去理解组织异质性。然而,基于高通量测序的空转技术还达不到单细胞转录组的分辨率,测得的每个孔(spot)可能包含10个以上的细胞,这会导致我们收集的特定位置的基因表达是同质或者异质细胞类型的混合状态。因此,需要使用细胞类型去卷积(deconvolution)的分析工具去解析每个孔中的细胞类型组成,从而实现对细胞类型的空间定位。现在此类分析工具出来很多,像SPOTlight、RCTD、Cell2location、STdeconvolve。也有文章对此类工具进行了测试比较,感兴趣的可以看看这篇文章:A comprehensive comparison on cell-type composition inference for spatial transcriptomics data,今天我们给大家介绍一款发表在Nature Biotechnology杂志上的软件——CARD,它是由美国密歇根大学、生物统计系周翔课题组开发的,是一种对空间转录组数据实现细胞类型去卷积分析的方法。
软件介绍
CARD设计用于去卷积空间转录组数据,并基于参考scRNA-seq数据推断每个空间位置上的细胞类型组成,如下图所示。CARD需要带有细胞类型特异性基因表达信息的scRNA-seq数据(左图)以及带有定位信息的空间转录组学数据(右图)。有了这两个数据输入,CARD通过一个非负矩阵执行去卷积分解框架,并输出跨空间位置的估计细胞类型组成。CARD的一个独特之处在于,它能够通过CAR模型解释跨空间位置的细胞类型组成的空间相关性。通过计算空间位置上细胞类型组成的空间相关性,CARD还能够在原始研究中没有测量到的位置上输入细胞类型组成和基因表达水平,从而促进在组织上构建精细的高分辨率空间地图。此外,如果相应单细胞转录组数据缺失,无法构建参考矩阵时,我们可以输入marker基因,CARD 可以进行无细胞类型特异性参考矩阵的去卷积分析。
软件实操
下面我们使用代码来进行CARD软件分析,具体示例代码见CARD分析流程。
数据准备
我们首先需要准备两个数据,一个是单细胞转录组数据(原始counts表达矩阵+每个细胞的注释矩阵),还有一个空间转录组数据(原始counts表达矩阵+每个spot的空间位置矩阵)。示例数据获取方式:
wget https://github.com/YingMa0107/CARD/raw/master/data/spatial_location.RData ###空转的空间位置矩阵
wget https://github.com/YingMa0107/CARD/raw/master/data/spatial_count.RData ###空转的counts表达矩阵
wget https://github.com/YingMa0107/CARD/raw/master/data/sc_meta.RData ###单细胞转录组的细胞注释矩阵
wget https://github.com/YingMa0107/CARD/raw/master/data/sc_count.RData ###细胞转录组的counts表达矩阵
接下来我们安装和加载CARD包,导入所需数据:
### 安装CARD包
devtools::install_github('YingMa0107/CARD')
###加载CARD包
library(CARD)
###导入数据
load("./spatial_count.RData")###导入空转的counts表达矩阵
spatial_count[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
10x10 10x13 10x14 10x15
X5S_rRNA . . . .
X5_8S_rRNA . . . .
X7SK . . . .
A1BG.AS1 . . . .
load("./spatial_location.RData")###导入空转的空间位置矩阵
spatial_location[1:4,]
x y
10x10 10 10
10x13 10 13
10x14 10 14
10x15 10 15
load("./sc_count.RData")
sc_count[1:4,1:4]###导入单细胞转录组的counts表达矩阵
4 x 4 sparse Matrix of class "dgCMatrix"
Cell1 Cell2 Cell3 Cell4
A1BG . . . .
A1CF . . . 1
A2M . . . .
A2ML1 . . . .
load("./sc_meta.RData")###导入单细胞转录组的细胞注释矩阵
sc_meta[1:4,]
cellID cellType sampleInfo
Cell1 Cell1 Acinar_cells sample1
Cell2 Cell2 Ductal_terminal_ductal_like sample1
Cell3 Cell3 Ductal_terminal_ductal_like sample1
Cell4 Cell4 Ductal_CRISP3_high-centroacinar_like sample1
##创建对象
然后我们用导入的数据创建CARD对象,创建完成后空转数据储存在CARD_obj@spatial_countMat和CARD_obj@spatial_location,单细胞转录组数据储存在CARD_obj@sc_eset中:
CARD_obj = createCARDObject(
sc_count = sc_count,
sc_meta = sc_meta,
spatial_count = spatial_count,
spatial_location = spatial_location,
ct.varname = "cellType",
ct.select = unique(sc_meta$cellType),
sample.varname = "sampleInfo",
minCountGene = 100,
minCountSpot = 5)
## QC on scRNASeq dataset! ...
## QC on spatially-resolved dataset! ..
##去卷积分析
现在我们把所有东西都存储在CARD对象中,我们可以使用CARD去卷积空间转录组数据,完成后结果储存在CARD_obj@Proportion_CARD中,获得每个spot中各种细胞类型的组成占比。
CARD_obj = CARD_deconvolution(CARD_object = CARD_obj)
## create reference matrix from scRNASeq...
## Select Informative Genes! ...
## Deconvolution Starts! ...
## Deconvolution Finish! ...
print(CARD_obj@Proportion_CARD[1:2,])
Acinar_cells Ductal_terminal_ductal_like
10x10 6.370860e-02 0.02391251
10x13 7.818278e-08 0.02996182
Ductal_CRISP3_high-centroacinar_like Cancer_clone_A Ductal_MHC_Class_II
10x10 0.1801753 4.229928e-04 0.021557706
10x13 0.9620440 1.910394e-07 0.006437371
Cancer_clone_B mDCs_A Ductal_APOL1_high-hypoxic Tuft_cells
10x10 4.339480e-05 0.011136707 4.967235e-04 2.025089e-03
10x13 2.319262e-05 0.001013068 3.864917e-06 1.796433e-06
mDCs_B pDCs Endocrine_cells Endothelial_cells Macrophages_A
10x10 0.0792525811 7.432979e-07 6.848627e-03 1.722855e-01 9.909662e-02
10x13 0.0004695018 1.566377e-11 3.925412e-11 4.198468e-11 2.766705e-05
Mast_cells Macrophages_B T_cells_&_NK_cells Monocytes RBCs
10x10 7.411147e-11 3.090675e-02 4.976256e-06 2.663846e-06 3.84187e-10
10x13 2.387132e-11 9.499908e-06 1.172387e-11 2.000116e-06 1.00967e-06
Fibroblasts
10x10 3.081225e-01
10x13 4.898874e-06
可视化
CARD软件提供两种比较好的可视化函数,CARD.visualize.pie和CARD.visualize.prop,前者是在空转位置上绘制每个spot的细胞类型组成拼图;后者是在空转位置上绘制每种细胞类型在所有spot中的占比情况,用颜色代表占比的高低。
colors = c("#FFD92F","#4DAF4A","#FCCDE5","#D9D9D9","#377EB8","#7FC97F","#BEAED4",
"#FDC086","#FFFF99","#386CB0","#F0027F","#BF5B17","#666666","#1B9E77","#D95F02",
"#7570B3","#E7298A","#66A61E","#E6AB02","#A6761D")#可以自定义颜色
p1 <- CARD.visualize.pie(proportion = CARD_obj@Proportion_CARD,spatial_location = CARD_obj@spatial_location, colors = colors)
print(p1)
这个函数不能设置每个spot饼图的大小,如果想自己调整,可以手动更改函数内部再绘图,如下所示:
CARD.visualize.pie2 <- function(proportion,spatial_location,colors = NULL,round_size){
res_CARD = as.data.frame(proportion)
res_CARD = res_CARD[,mixedsort(colnames(res_CARD))]
location = as.data.frame(spatial_location)
if(sum(rownames(res_CARD)==rownames(location))!= nrow(res_CARD)){
stop("The rownames of proportion data does not match with the rownames of spatial location data")
}
colorCandidate = c("#1e77b4","#ff7d0b","#ceaaa3","#2c9f2c","#babc22","#d52828","#9267bc",
"#8b544c","#e277c1","#d42728","#adc6e8","#97df89","#fe9795","#4381bd","#f2941f","#5aa43a","#cc4d2e","#9f83c8","#91675a",
"#da8ec8","#929292","#c3c237","#b4e0ea","#bacceb","#f7c685",
"#dcf0d0","#f4a99f","#c8bad8",
"#F56867", "#FEB915", "#C798EE", "#59BE86", "#7495D3",
"#D1D1D1", "#6D1A9C", "#15821E", "#3A84E6", "#997273",
"#787878", "#DB4C6C", "#9E7A7A", "#554236", "#AF5F3C",
"#93796C", "#F9BD3F", "#DAB370", "#877F6C", "#268785")
if(is.null(colors)){
#colors = brewer.pal(11, "Spectral")
if(ncol(res_CARD) > length(colorCandidate)){
colors = colorRampPalette(colorCandidate)(ncol(res_CARD))
}else{
colors = colorCandidate[sample(1:length(colorCandidate),ncol(res_CARD))]
}
}else{
colors = colors
}
data = cbind(res_CARD,location)
ct.select = colnames(res_CARD)
p = suppressMessages(ggplot() + geom_scatterpie(aes(x=x, y=y,r = round_size),data=data, ###r默认是0.52
cols=ct.select,color=NA) + coord_fixed(ratio = 1) +
scale_fill_manual(values = colors)+
theme(plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
panel.background = element_blank(),
plot.background = element_blank(),
panel.border = element_rect(colour = "grey89", fill=NA, size=0.5),
axis.text =element_blank(),
axis.ticks =element_blank(),
axis.title =element_blank(),
legend.title=element_text(size = 16,face="bold"),
legend.text=element_text(size = 15),
legend.key = element_rect(colour = "transparent", fill = "white"),
legend.key.size = unit(0.45, 'cm'),
strip.text = element_text(size = 16,face="bold"),
legend.position="bottom")+
guides(fill=guide_legend(title="Cell Type")))
return(p)
}
p1 <- CARD.visualize.pie2(proportion = CARD_obj@Proportion_CARD,spatial_location = CARD_obj@spatial_location, colors = colors, round_size=5)
我们同时可以选择一些感兴趣的细胞类型分别进行可视化,查看细胞类型比例的空间分布:
ct.visualize = c("Acinar_cells","Cancer_clone_A","Cancer_clone_B","Ductal_terminal_ductal_like","Ductal_CRISP3_high-centroacinar_like","Ductal_MHC_Class_II","Ductal_APOL1_high-hypoxic","Fibroblasts")
p2 <- CARD.visualize.prop(
proportion = CARD_obj@Proportion_CARD,
spatial_location = CARD_obj@spatial_location,
ct.visualize = ct.visualize, ### 选择要可视化的细胞类型
colors = c("lightblue","lightyellow","red"), ###可以手动设置颜色
NumCols = 4) ###图中展示的细胞类型的列数
print(p2)
精细化空间图谱
同时,CARD还可以将细胞类型组成和基因表达水平归到未测量的组织位置上,构建更精细的空间组织图。也就是如果空转上没测到的孔,可以用单细胞转录组数据将其还原,分析结果储存在CARD_obj@refined_prop和CARD_obj@refined_expression中。
CARD_obj = CARD.imputation(CARD_obj,NumGrids = 2000,ineibor = 10,exclude = NULL)
## The rownames of locations are matched ...
## Make grids on new spatial locations ...
p5 <- CARD.visualize.prop(
proportion = CARD_obj@refined_prop,
spatial_location = location_imputation,
ct.visualize = ct.visualize,
colors = c("lightblue","lightyellow","red"),
NumCols = 4)
print(p5)
CARD还可以提供在增强分辨率下的细胞类型比例后,预测增强分辨率下的空间基因表达,与原始分辨率相比,CARD有助于构建精细的空间组织图,其分辨率远远高于原始研究中测量到的分辨率。
###增强分辨率下的基因表达
p6 <- CARD.visualize.gene(
spatial_expression = CARD_obj@refined_expression,
spatial_location = location_imputation,
gene.visualize = c("Tm4sf1","S100a4","Tff3","Apol1","Crisp3","CD248"),
colors = NULL,
NumCols = 6)
print(p6)
###原始的基因表达
p7 <- CARD.visualize.gene(
spatial_expression = CARD_obj@spatial_countMat,
spatial_location = CARD_obj@spatial_location,
gene.visualize = c("Tm4sf1","S100a4","Tff3","Apol1","Crisp3","CD248"),
colors = NULL,
NumCols = 6)
print(p7)
不使用单细胞转录组数据进行去卷积
开发者同时扩展了CARD用以支持无参考细胞类型的去卷积分析,简称为CARDfree。CARDfree不再需要外部单细胞参考数据集,只需要用户输入之前已知的细胞类型标记的基因名称列表。
wget https://github.com/YingMa0107/CARD/raw/master/data/markerList.RData#软件提供的参考marker基因列表,可以自行提供
###导入参考marker基因列表
load("./markerList.RData")
CARDfree_obj = createCARDfreeObject(
markerList = markerList,
spatial_count = spatial_count,
spatial_location = spatial_location,
minCountGene = 100,
minCountSpot =5) ###创建CARDfree对象
CARDfree_obj = CARD_refFree(CARDfree_obj)
colors = c("#FFD92F","#4DAF4A","#FCCDE5","#D9D9D9","#377EB8","#7FC97F","#BEAED4","#FDC086","#FFFF99","#386CB0","#F0027F","#BF5B17","#666666","#1B9E77","#D95F02","#7570B3","#E7298A","#66A61E","#E6AB02","#A6761D")
CARDfree_obj@Proportion_CARD = CARDfree_obj@Proportion_CARD[,c(8,10,14,2,1,6,12,18,7,13,20,19,16,17,11,15,4,9,3,5)]
colnames(CARDfree_obj@Proportion_CARD) = paste0("CT",1:20)
p8 <- CARD.visualize.pie(CARDfree_obj@Proportion_CARD,CARDfree_obj@spatial_location,colors = colors)
print(p8)
绘制单细胞分辨率的空间图谱
此外,CARD还可以在非单细胞分辨率的空间转录组上构建单细胞分辨率空间转录组。根据用于去卷积的参考单细胞转录组数据,从非单细胞分辨率空间转录组数据推断出每个测量空间位置的单细胞分辨率基因表达。
scMapping = CARD_SCMapping(CARD_obj,shapeSpot="Square",numCell=20,ncore=10)###shapeSpot:一个字符,指示单个细胞的采样空间坐标是位于类方形区域还是类圆形区域。该区域的中心是在非单细胞分辨率空间转录组数据中测量的空间位置。默认为“Square”,另一个选项为“Circle”
library(SingleCellExperiment)
MapCellCords = as.data.frame(colData(scMapping))
count_SC = assays(scMapping)$counts
library(ggplot2)
df = MapCellCords
colors = c("#8DD3C7","#CFECBB","#F4F4B9","#CFCCCF","#D1A7B9","#E9D3DE","#F4867C","#C0979F",
"#D5CFD6","#86B1CD","#CEB28B","#EDBC63","#C59CC5","#C09CBF","#C2D567","#C9DAC3","#E1EBA0",
"#FFED6F","#CDD796","#F8CDDE")
p9 = ggplot(df, aes(x = x, y = y, colour = CT)) +
geom_point(size = 3.0) +
scale_colour_manual(values = colors) +
#facet_wrap(~Method,ncol = 2,nrow = 3) +
theme(plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
panel.background = element_rect(colour = "white", fill="white"),
plot.background = element_rect(colour = "white", fill="white"),
legend.position="bottom",
panel.border = element_rect(colour = "grey89", fill=NA, size=0.5),
axis.text =element_blank(),
axis.ticks =element_blank(),
axis.title =element_blank(),
legend.title=element_text(size = 13,face="bold"),
legend.text=element_text(size = 12),
legend.key = element_rect(colour = "transparent", fill = "white"),
legend.key.size = unit(0.45, 'cm'),
strip.text = element_text(size = 15,face="bold"))+
guides(color=guide_legend(title="Cell Type"))###可视化每个细胞的细胞类型及其空间位置信息
print(p9)
总结
与其他去卷积分析软件相比,CARD优势在于除了传统基于单细胞转录组数据去卷积分析外,还可以基于细胞类型的marker基因进行分析,最后还可以绘制单细胞层次的空间图谱。