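Hello everyone! The Mid-Autumn Festival is over, and I hope you all had a good holiday. Now that the break has ended, it's back to work. Today we share a new topic: computing homotypic and heterotypic scores for spatial transcriptomics data. It is a bit more involved than usual, so get ready.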
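How familiar are you with the R package STutility? It has many features worth learning, and the two introduced today both come from this package. Another point worth noting is that STutility uses NMF (non-negative matrix factorization) for linear dimensionality reduction. We have shared quite a lot about NMF before, so feel free to go back and review those posts.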
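Now let's get started. Before this analysis, the spatial transcriptomics data need some basic processing. That includes dimensionality reduction and clustering, differential expression and enrichment analysis, and annotation of the spatial data by integrating it with single-cell data. The analyses below build on these results.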
Part 1: Homotypic score calculation (within-class neighborhood analysis) (the simpler of the two)
To assess how strongly spots within each class cluster in space, a network approach was applied using the GetSpatNet function in STUtility (see the STUtility project page). The spot degree, k, was computed for every spot in the network. This is the number of directly adjacent spots. For Visium the maximum spot degree is six, because a spot has at most six immediate neighbors. The network's average degree was then calculated as

$$\langle k \rangle_{obs} = \frac{2E}{N},$$

where $E$ and $N$ are the total number of edges and nodes (spots) in the network, respectively. To account for differences in network size, the average degree was also computed for random networks in which the spots' class labels had been shuffled within each sample. Based on the network size in each sample, the expected average degree was calculated as

$$\langle k \rangle_{exp} = \frac{1}{N_{perm}} \sum_{i=1}^{N_{perm}} \langle k \rangle_{i},$$

where $N_{perm}$ is the number of permutations and $\langle k \rangle_{i}$ is the average degree of the $i$-th shuffled network. A final score, $S = \langle k \rangle_{obs} - \langle k \rangle_{exp}$, was then computed as the difference between the observed and expected average degree. It therefore shows how far the observed value exceeds what would be expected by chance. (Note that the emphasis here is on the network's edges and nodes, and the edges matter more. They define which spots count as neighbors, which makes this a neat way to measure spatial similarity.)
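Here is a minimal base-R sketch that makes the homotypic score concrete. It is not STUtility's GetSpatNet implementation. It rests on these assumptions:

- Spots sit on a toy square lattice, and two spots are neighbors when their distance is at most `radius`. Visium actually uses a hexagonal layout with up to six neighbors.
- There is a single sample.
- Shuffling the class labels gives the random networks.

`homotypic_score()` is a hypothetical helper name.

```r
# Hypothetical helper, NOT an STUtility function. Assumptions: square toy lattice,
# distance-based adjacency (<= radius), a single sample, label shuffling as null model.
homotypic_score <- function(coords, labels, class, radius = 1,
                            n_perm = 100, seed = 1) {
  d <- as.matrix(dist(coords))
  adj <- d > 0 & d <= radius               # logical adjacency matrix, no self-loops

  # Average degree <k> = 2E/N of the sub-network formed by the spots of `class`
  avg_degree <- function(lab) {
    idx <- which(lab == class)
    if (length(idx) == 0) return(0)
    n_edges <- sum(adj[idx, idx, drop = FALSE]) / 2  # symmetric matrix counts each edge twice
    2 * n_edges / length(idx)
  }

  k_obs <- avg_degree(labels)
  set.seed(seed)
  k_perm <- replicate(n_perm, avg_degree(sample(labels)))  # shuffled class labels
  k_exp <- mean(k_perm)                                    # <k>_exp = mean over permutations
  c(obs = k_obs, exp = k_exp, score = k_obs - k_exp)
}

# Toy example: 10 x 10 grid, left half labelled "A", right half "B".
# For class A, E = 85 edges among N = 50 spots, so obs = 3.4.
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))
labels <- ifelse(coords[, "x"] <= 5, "A", "B")
res <- homotypic_score(coords, labels, class = "A")
print(res)
```

A spatially compact class like "A" gets a clearly positive score. A class scattered at random across the tissue would score close to zero.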
Part 2: Heterotypic score calculation (between-class neighborhood analysis)
To quantify the class identities of the spots neighboring each spot, the RegionNeighbours function from STUtility was applied (we share code for this at the end). The number of times spots of one class were found adjacent to another class was then summarized in an adjacency matrix, $A_{obs}$, of dimensions $N \times N$. Here $N$ is the total number of unique classes, i.e. the number of cell types, and the resulting matrix describes the diversity of neighboring cell types. The diagonal of the matrix was filled with the total number of neighbors of the corresponding class. Large classes tend to have more neighbors simply by chance. The class-class neighbor counts were therefore corrected by computing a score and comparing it with what would be expected at random, similar in spirit to a permutation significance test. The "expected" values were generated by shuffling the spots' class identities within each sample and then building an adjacency matrix, $A_{exp}$. This process was repeated for a total of 50 permutations. The mean and standard deviation of each matrix position across the iterations were then calculated to produce the matrices $\bar{A}_{exp}$ and $\sigma_{A_{exp}}$, respectively. Finally, a z-score was calculated for each position in the matrix as

$$A_Z = \frac{A_{obs} - \bar{A}_{exp}}{\sigma_{A_{exp}}}.$$

A positive value $x$ means that the class-class relationship was observed $x$ standard deviations more often than expected by chance, and a negative value means it was observed less often. (Another clever design, and very useful for studying neighborhood diversity.)
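The heterotypic z-score can also be sketched in base R. This illustrates the idea under stated assumptions and is not STUtility's code:

- Spots lie on a toy square lattice with distance-based adjacency.
- There is a single sample.
- The diagonal is read as the total number of neighbors (of any class) around the spots of that class.

`class_adjacency()` and `heterotypic_z()` are hypothetical helpers.

```r
# Hypothetical helpers, NOT STUtility functions.
# Count how often spots of class a are adjacent to spots of class b.
class_adjacency <- function(adj, labels, classes) {
  A <- matrix(0, length(classes), length(classes),
              dimnames = list(classes, classes))
  for (a in classes) {
    for (b in classes) {
      if (a != b) A[a, b] <- sum(adj[labels == a, labels == b, drop = FALSE])
    }
    # Diagonal (assumed interpretation): all neighbours of any class around class a
    A[a, a] <- sum(adj[labels == a, , drop = FALSE])
  }
  A
}

heterotypic_z <- function(coords, labels, radius = 1, n_perm = 50, seed = 1) {
  d <- as.matrix(dist(coords))
  adj <- d > 0 & d <= radius
  classes <- sort(unique(labels))
  A_obs <- class_adjacency(adj, labels, classes)
  set.seed(seed)
  # classes x classes x n_perm array of adjacency matrices under shuffled labels
  A_perm <- replicate(n_perm, class_adjacency(adj, sample(labels), classes))
  A_mean <- apply(A_perm, c(1, 2), mean)
  A_sd <- apply(A_perm, c(1, 2), sd)
  # z-score per matrix position; a position with sd = 0 would give NaN/Inf
  list(obs = A_obs, z = (A_obs - A_mean) / A_sd)
}

# Toy example: 10 x 10 grid, left half "A", right half "B".
# "A" and "B" only touch along one border (10 adjacent pairs).
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))
labels <- ifelse(coords[, "x"] <= 5, "A", "B")
res <- heterotypic_z(coords, labels)
print(res$obs)
print(round(res$z, 2))
```

Because the two classes are spatially segregated, the A-B z-score comes out strongly negative. They are neighbors far less often than random shuffling would predict.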
Next, let's look at the code for the heterotypic score calculation (between-class neighborhood analysis), starting with the RegionNeighbours function it relies on.
Sometimes it can be useful to extract the “neighborhood” of a set of spots. As an example, we show how this can be applied to find all the neighboring spots of any region of interest.
First, load the spatial transcriptomics data we processed earlier.
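library(STutility)
library(Seurat)
# spatialRDS: path to the Seurat object saved from the earlier preprocessing steps
se <- readRDS(spatialRDS)
# Plot the cluster labels on the first two sections
FeatureOverlay(se, features = "seurat_clusters", sampleids = 1:2, ncols = 2)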
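We have not integrated single-cell and spatial data here, so the cluster labels are used for this demonstration. In real use you can select your own regions, or use the cell types obtained from the integrated analysis.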
1. Connected Spatial Network
Once you have defined a region of interest and want to find all spots neighboring this region, you can use the RegionNeighbours function to detect them automatically.
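# Use the cluster labels as the active identities
se <- SetIdent(se, value = "seurat_clusters")
# Find all spots adjacent to cluster 2; the result is stored in the meta.data column "nbs_2"
se <- RegionNeighbours(se, id = "2", verbose = TRUE)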
By default, RegionNeighbours finds all spots that neighbor the selected id while ignoring their own labels. These spots are therefore all simply labeled "nbs_2", as in "neighbors to 2". The output is stored as a new column in the meta.data slot, in this example called "nbs_2". Neighborhood detection is applied to each section separately, so it can be run on multiple sections at the same time.
# Show the detected neighbours of cluster 2 on both sections
FeatureOverlay(se, features = "nbs_2", ncols = 2, sampleids = 1:2, cols = c("red", "lightgray"), pt.size = 2)
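Conceptually, the default neighbor detection can be mimicked in a few lines of base R. This is a hypothetical sketch, not RegionNeighbours itself. It assumes distance-based adjacency on a toy square lattice and processes each section separately. The `keep_within_id` flag imitates keep.within.id. The two toy sections deliberately share the same coordinates, which shows why per-section processing matters.

```r
# Hypothetical helper, NOT the STutility implementation.
# Label spots adjacent to region `id`, one section at a time.
find_region_neighbours <- function(coords, labels, section, id, radius = 1,
                                   keep_within_id = FALSE) {
  out <- rep(NA_character_, length(labels))
  nb_label <- paste0("nbs_", id)
  for (s in unique(section)) {
    idx <- which(section == s)                    # spots of this section only
    d <- as.matrix(dist(coords[idx, , drop = FALSE]))
    adj <- d > 0 & d <= radius
    in_id <- labels[idx] == id
    # A neighbour lies outside `id` but touches at least one `id` spot
    is_nb <- !in_id & rowSums(adj[, in_id, drop = FALSE]) > 0
    out[idx[is_nb]] <- nb_label
    if (keep_within_id) out[idx[in_id]] <- as.character(id)
  }
  out
}

# Two 5 x 5 sections that share the same coordinates
grid <- as.matrix(expand.grid(x = 1:5, y = 1:5))
coords <- rbind(grid, grid)
section <- rep(c("S1", "S2"), each = nrow(grid))
labels <- rep("1", nrow(coords))
labels[section == "S1" & coords[, "x"] == 3 & coords[, "y"] == 3] <- "2"  # centre spot
labels[section == "S2" & coords[, "x"] == 1 & coords[, "y"] == 1] <- "2"  # corner spot
nbs <- find_region_neighbours(coords, labels, section, id = "2")
table(section, nbs, useNA = "ifany")
```

The centre spot in S1 has four lattice neighbours and the corner spot in S2 has two. No spot is linked across sections, even though the coordinates overlap.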
Setting keep.within.id = TRUE also keeps all spots that belong to the id group itself.
# Same as above, but also keep the spots of cluster 2 itself (labelled "2" in the "nbs_2" column)
se <- SetIdent(se, value = "seurat_clusters")
se <- RegionNeighbours(se, id = 2, keep.within.id = TRUE, verbose = TRUE)
FeatureOverlay(se, features = "nbs_2", ncols = 2, sampleids = 1:2, cols = c("red", "lightgray"), pt.size = 2)
Using these two sets of spots, we can run a DE analysis to check what genes are up-regulated outside the cluster border.
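library(magrittr)
library(dplyr)
# Compare spots inside cluster 2 ("2") with the spots bordering it ("nbs_2")
se <- SetIdent(se, value = "nbs_2")
nbs_2.markers <- FindMarkers(se, ident.1 = "2", ident.2 = "nbs_2")
nbs_2.markers$gene <- rownames(nbs_2.markers)
# Seurat >= 4 names the fold-change column avg_log2FC; older versions use avg_logFC
fc.col <- intersect(c("avg_log2FC", "avg_logFC"), colnames(nbs_2.markers))[1]
nbs_2.markers$fc <- nbs_2.markers[[fc.col]]
# Keep only cluster 2 and its neighbours for plotting
se.subset <- SubsetSTData(se, expression = nbs_2 %in% c("2", "nbs_2"))
# Top 40 genes by absolute fold change, ordered from most up in "2" to most up in "nbs_2"
sorted.marks <- nbs_2.markers %>% top_n(n = 40, wt = abs(fc))
sorted.marks <- sorted.marks[order(sorted.marks$fc, decreasing = TRUE), ]
DoHeatmap(se.subset, features = sorted.marks$gene, group.colors = c("red", "lightgray"), disp.min = -2, disp.max = 2)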
From this DE test we can see, for example, that COX6C and FCGR3B are up-regulated inside the cluster, whereas LGALS1 and CYBA are more highly expressed outside the cluster border.
# Spatial expression of two genes up inside cluster 2 and two genes up outside its border
FeatureOverlay(se.subset, features = c("COX6C", "FCGR3B", "LGALS1", "CYBA"), pt.size = 2,
               ncols = 2, cols = c("darkblue", "cyan", "yellow", "red", "darkred"))
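FindMarkers uses a Wilcoxon rank-sum test by default. Below is a simplified base-R stand-in for the "2" vs "nbs_2" comparison, run on simulated counts, to illustrate roughly what such a test does. It is not Seurat's implementation, and it differs from Seurat in these ways:

- `simple_markers()` is a hypothetical helper.
- The fold change is a plain log2 ratio of group means with a pseudocount of 1.
- The p-values are Bonferroni-adjusted across the genes tested.

```r
# Hypothetical simplified analogue of a two-group marker test, NOT Seurat's code.
simple_markers <- function(expr, group, ident.1, ident.2) {
  g1 <- group == ident.1
  g2 <- group == ident.2
  per_gene <- t(apply(expr, 1, function(x) {
    # Wilcoxon rank-sum test; ties force the normal approximation, hence suppressWarnings
    p <- suppressWarnings(wilcox.test(x[g1], x[g2])$p.value)
    # log2 fold change of group means with a pseudocount of 1 (assumption)
    lfc <- log2(mean(x[g1]) + 1) - log2(mean(x[g2]) + 1)
    c(p_val = p, avg_log2FC = lfc)
  }))
  res <- as.data.frame(per_gene)
  res$p_val_adj <- p.adjust(res$p_val, method = "bonferroni")
  res$gene <- rownames(res)
  res[order(res$avg_log2FC, decreasing = TRUE), ]   # most up in ident.1 first
}

# Simulated counts for 30 spots inside the cluster and 30 bordering spots
set.seed(42)
group <- rep(c("2", "nbs_2"), c(30, 30))
expr <- rbind(
  up_in_2    = c(rpois(30, 20), rpois(30, 2)),
  up_outside = c(rpois(30, 2), rpois(30, 20)),
  flat       = rpois(60, 5)
)
markers <- simple_markers(expr, group, ident.1 = "2", ident.2 = "nbs_2")
print(markers)
```

Positive fold changes mark genes higher inside the cluster and negative ones mark genes higher outside its border. This is the same reading as the heatmap above.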
Finally, if you want to keep the labels of the neighboring spots, set keep.idents = TRUE. Each neighboring spot then keeps its own identity, labeled in the form "label"_nb_to_2.
# Keep each neighbour's own label, e.g. "1_nb_to_2" for a cluster-1 spot bordering cluster 2
se <- SetIdent(se, value = "seurat_clusters")
se <- RegionNeighbours(se, id = 2, keep.idents = TRUE, verbose = TRUE)
FeatureOverlay(se, features = "nbs_2", ncols = 2, sampleids = 1:2, pt.size = 2)
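The keep.idents naming can also be mimicked in base R. This is a hypothetical sketch for a single section on a toy 3 x 3 lattice, and it assumes distance-based adjacency. Each neighbor of `id` keeps its own cluster label with the suffix "_nb_to_<id>", and all other spots are NA. `label_neighbours_keep_idents()` is an illustrative name, not an STutility function.

```r
# Hypothetical helper, NOT the STutility implementation (single section assumed).
label_neighbours_keep_idents <- function(coords, labels, id, radius = 1) {
  d <- as.matrix(dist(coords))
  adj <- d > 0 & d <= radius
  in_id <- labels == id
  # Neighbours: outside `id` but adjacent to at least one `id` spot
  is_nb <- !in_id & rowSums(adj[, in_id, drop = FALSE]) > 0
  out <- rep(NA_character_, length(labels))
  out[is_nb] <- paste0(labels[is_nb], "_nb_to_", id)
  out
}

# Toy 3 x 3 grid (x varies fastest): cluster "2" in the centre, "1" and "3" around it
coords <- as.matrix(expand.grid(x = 1:3, y = 1:3))
labels <- c("1", "1", "1",
            "3", "2", "1",
            "3", "3", "3")
nbs <- label_neighbours_keep_idents(coords, labels, id = "2")
print(matrix(nbs, nrow = 3, byrow = TRUE))
```

Keeping the neighbor's own identity is what allows counting which classes border a region. Those counts are the raw material for the class-class adjacency matrix described in Part 2.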
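A reminder once again: clusters are used here only for demonstration. A real neighborhood analysis should be run after integrating the spatial data with single-cell data, so please keep this in mind.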
Life is good, and even better with you.