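Before spatial transcriptomics, we could already infer CNAs from single-cell data with inferCNV. I won't go into how inferCNV works here; see "InferCNV: Inferring copy number alterations from tumor single cell RNA-Seq data". In short, it calls CNAs from gene expression levels together with each gene's position on the chromosome. I have tested inferCNV on 10X spatial transcriptomics data, and the inferred tumor regions matched those identified on the image. There is one caveat: you must supply a reference, otherwise the results are unsatisfactory. This post introduces a tool built specifically to infer CNAs from 10X spatial transcriptomics data and the stained image: STARCH, described in the paper "STARCH: Copy number and clone inference from spatial transcriptomics data". Let's look at how the tool works.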
About the tool (key points only):
(1) Intratumor heterogeneity, one of the hallmarks of cancer, is characterized by distinct subpopulations of cells with both genetic and transcriptional diversity. These cell subpopulations, or clones, are spatially organized within a tumor. In other words, tumor cells are spatially organized. (Contrast this with single-cell data, which carries no spatial position: tumor cells inferred from it cannot be fully confirmed and need validation by other methods.)
(2) STARCH relies on the idea that most large CNAs result in correlated changes in gene expression across the multiple adjacent genes whose copy number is altered by the CNA. (Same principle as inferCNV.) STARCH also leverages the observations that the CNAs in a tumor are related by a shared evolutionary history, and that nearby spots are likely to be genetically similar. (That is, tumor evolution.) These spatial correlations amplify the weak signal of CNPs in individual spots. (CNP: copy number profile.)
(3) STARCH derives clones and CNPs that are more coherent across layers compared to existing methods. In other words, the tool's inferred results are stable.
Assumptions behind spatial CNA inference
(1) The sequenced sample contains a small number of tumor clones and each clone is characterized by a distinct copy number profile. In other words, the method is meant for tumor samples; normal samples are not suitable.
(2) CNAs that distinguish clones span multiple genes, creating dependencies between the copy numbers of adjacent genes. Same principle as inferCNV.
(3) Most of the cells in a spot belong to the same clone. That is, the cells within one spot are assumed to share a single clone.
(4) Nearby spots are likely to contain cells from the same clone, leading to spatial dependencies between the clone assignment for each spot. Neighboring spots are therefore expected to show similar expression.
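To make assumption (4) concrete, here is a minimal sketch of what a "spatial dependency" between spots can look like in code. It is not STARCH's actual model: the function names (`spot_neighbors`, `smooth_labels`), the distance radius and the majority-vote rule are all my own assumptions for illustration. The idea is simply that a spot whose label disagrees with all of its neighbors is probably noise.

```python
import numpy as np

def spot_neighbors(coords, radius):
    """Boolean adjacency matrix: True where two distinct spots lie within `radius`."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return (dist <= radius) & ~np.eye(len(coords), dtype=bool)

def smooth_labels(labels, adj, n_clones):
    """One round of majority voting: each spot takes the most common clone
    label among itself and its neighbors (hypothetical smoothing rule)."""
    onehot = np.eye(n_clones, dtype=int)[labels]   # spots x clones
    votes = adj.astype(int) @ onehot + onehot      # neighbor votes plus own vote
    return votes.argmax(axis=1)

# Toy example: five spots in a row, the middle one carries an outlier label.
coords = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]], dtype=float)
labels = np.array([0, 0, 1, 0, 0])
adj = spot_neighbors(coords, radius=1.0)
smoothed = smooth_labels(labels, adj, n_clones=2)
print(smoothed)
```

On real 10X data the coordinates would come from the spot positions, and a proper probabilistic model would replace the voting step; the sketch only shows how neighboring spots can pull a noisy assignment back in line.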
The authors note that "These assumptions are well supported by biological evidence." That is, each assumption has backing: First, tumors evolve through repeated clonal expansions [Nowell, 1976], and most tumors have a small number of subpopulations sharing similar sets of mutations and CNAs [Gerstung et al., 2020]. Second, the median size of CNAs in solid tumors is reported to be approximately 700Kb [Zack et al., 2013], which is significantly longer than the median size (24Kb) of a gene [Fuchs et al., 2014]. Third, several studies have demonstrated that nearby cells are likely to have the same somatic mutations [Merlo et al., 2006, Navin et al., 2010]. Thus, both the cells within each spot as well as the cells in nearby spots are likely to belong to the same clone. So the assumptions look credible and sound.
We use these assumptions to process the spatial transcriptomics data for our model. First, based on the assumption that CNAs span many genes, we reduce much of the variance in expression attributed due to regulatory mechanisms by taking the average expression of multiple nearby genes into bins without losing information regarding their copy number status. So here, too, the average over a gene's upstream and downstream neighbors stands in for that gene's expression value. Second, we scale the expression of each bin to be proportional to its copy number by normalizing the expression value by the expression value of the same bin in normal spots that do not harbor CNAs. Beyond this point the paper gets into some fairly deep algorithms. Math is not my strong suit, so experts reading this are welcome to chime in.
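The two preprocessing steps just described (average nearby genes into bins, then divide by the same bin in normal spots) can be sketched roughly as follows. This is my own illustration, not the STARCH source code: the function name `bin_and_normalize`, the fixed `bin_size` of 20 genes, and the assumption that you already know which spots are normal are all hypothetical.

```python
import numpy as np

def bin_and_normalize(expr, gene_order, normal_mask, bin_size=20, eps=1e-9):
    """expr: spots x genes expression matrix (one chromosome).
    gene_order: indices that sort the genes by genomic position.
    normal_mask: boolean vector marking spots assumed to carry no CNAs.
    Returns a spots x bins matrix of expression relative to normal spots."""
    ordered = expr[:, gene_order]
    n_spots, n_genes = ordered.shape
    n_bins = n_genes // bin_size
    if n_bins == 0:
        raise ValueError("fewer genes than bin_size")
    # Step 1: average consecutive genes into bins (leftover genes are dropped).
    binned = ordered[:, : n_bins * bin_size].reshape(n_spots, n_bins, bin_size).mean(axis=2)
    # Step 2: scale each bin by its mean expression in the normal spots.
    baseline = binned[normal_mask].mean(axis=0)
    return binned / (baseline + eps)

# Toy data: 6 spots, 40 genes; the last 3 spots have a gain over genes 20-39.
expr = np.ones((6, 40))
expr[3:, 20:] = 1.5
gene_order = np.arange(40)
normal_mask = np.array([True, True, True, False, False, False])
ratios = bin_and_normalize(expr, gene_order, normal_mask)
print(np.round(ratios, 2))
```

In the toy data the normal spots come out at a ratio of about 1 in both bins, while the last three spots show about 1.5 in the second bin only, which is the kind of signal a CNA caller would then work from.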
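Comparing the results
Judging from the results, this tool's output agrees more closely with the actual results. Of course, that is what you'd expect from a published paper: who would publish the bad ones?
If you work on spatial data, give it a try; it may help you publish better papers. STARCH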