将空间位置信息和转录组分析相结合,对于癌症、免疫、肿瘤免疫相互作用,组织微环境,神经和发育等领域,有着令人期待的应用前景。
单细胞的一切分析,加前缀Spatial 都是一个新的分析点,因此Scanpy 扩展后也可用于空间转录组数据分析。 https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html
单细胞转录数据分析之Scanpy:https://www.jianshu.com/p/e22a947e6c60
单细胞转录组之Scanpy - 轨迹推断/拟时序分析:https://www.jianshu.com/p/0b2ca0e0b544
单细胞转录组之Scanpy - 样本整合分析:https://www.jianshu.com/p/beef8a8be360
单细胞空间转录分析之Scanpy:https://www.jianshu.com/p/8dc231c06932
单细胞空间转录分析之Scanpy-多样本整合:https://www.jianshu.com/p/caa98aeac191
单细胞空间转录分析之Seurat:https://www.jianshu.com/p/c9a601ced91f
单细胞空间转录分析之Seurat-多样本整合(浅谈空间批次):https://www.jianshu.com/p/609b04096b79
python 包Scanpy很多函数是借鉴了R包Seurat,所以这两种方法分析结果差异不大,大家可以对照Seurat分析,上面网址也提供了Seurat包处理单细胞空间转录分析过程。
和分析单细胞转录组数据一样,单细胞空间转录组主要包括了:质控(QC),标准化(Normalization),降维聚类(Dimensional reduction and clustering),Cluster marker genes, Spatially variable genes
为了和Seurat结果比较,我们使用了相同的一套数据集:https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Mouse_Brain_Sagittal_Anterior, 新鲜的冷冻小鼠脑组织, 前牙矢状切面,可以参考前面讲述的ABA大脑图谱:https://www.jianshu.com/p/5d087fffeb35
导入相关包
import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sc.logging.print_versions()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
import os ##设置输入路径
os.chdir('/home/wucheng/jianshu/scanpy/spatial') ##修改路径
results_file = 'Anterior.h5ad' ##设置结果文件保存路径
读取数据
adata = sc.read_visium("/home/wucheng/data_set/Spatial/Mouse/Brain_Section1_Sagittal_Anterior/Brain_anterior1/outs")
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 2695 × 32285
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
预处理
adata.var["mt"] = adata.var_names.str.match("^MT-|^mt-") #计算线粒体基因比例 ,这儿没有用作者推荐的str.startswith
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
adata.var["RP"] = adata.var_names.str.match("^RPS|^RPL|^Rps|^Rpl") #同样我们也可以计算核糖体基因比例
sc.pp.calculate_qc_metrics(adata, qc_vars=["RP"], inplace=True)
我们根据总counts和表达的genes对spots进行一些基本的过滤:
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"], kde=False,bins=60, ax=axs[0]) #bins柱子的数目
sns.distplot(adata.obs["n_genes_by_counts"], kde=False, bins=60, ax=axs[1])
sns.distplot(adata.obs["pct_counts_mt"], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_RP"], kde=False, bins=60, ax=axs[3])
plt.savefig("QC_plot.pdf")
这儿和Seurat得到的QC小提琴图一样,只是形式不同。
# 进一步细化,用于过滤spots
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] < 10000], kde=False, bins=40, ax=axs[0])
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] > 40000], kde=False, bins=40, ax=axs[1])
sns.distplot(adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_mt"][adata.obs["pct_counts_mt"] >10], kde=False, bins=30, ax=axs[3])
plt.savefig("QC_plot1.pdf")
过滤 可以看到更加质量评估我们是可以去除一些低质量的细胞的,但是因为Seurat对此数据集未作过滤,我们这儿也不过滤,实际可以按照下面过滤细胞
sc.pp.filter_cells(adata, min_counts=5000) #filtered out 80 cells that have less than 5000 counts
sc.pp.filter_cells(adata, max_counts=50000) #filtered out 39 cells that have more than 50000 counts
adata = adata[adata.obs["pct_counts_mt"] < 25]
print(f"#cells after MT filter: {adata.n_obs}") #cells after MT filter: 2502
sc.pp.filter_genes(adata, min_cells=10)
标准化,HVG
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000)
PCA,聚类,可视化
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, key_added="clusters")
sc.pl.umap(adata, color=["total_counts", "clusters"], wspace=0.4)
plt.savefig("umap_cluster.pdf")
sc.pl.spatial(adata, img_key="hires", color=["total_counts", "clusters"])
plt.savefig("spatial_cluster.pdf")
显示每簇位置
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["0"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C0.pdf")
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["1"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C1.pdf")
关键基因的表达可视化
sc.pl.umap(adata, color=["Hpca", "Ttr"])
plt.savefig("umap_gene_exp.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Hpca", "Ttr"])
plt.savefig("spatial_gene_exp.pdf")
每一簇marker genes
sc.tl.rank_genes_groups(adata, "clusters", method="t-test")
sc.pl.rank_genes_groups_heatmap(adata, groups="0", n_genes=20, groupby="clusters") #热图可视化
plt.savefig("Cluster0_marker_heatmap.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Ppp1r1b", "Gpr88","Penk"], alpha=0.7)
plt.savefig("Cluster0_spatial_marker.pdf")
#输出每一簇marker genes
pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(5)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
pd.DataFrame( {group + '_' + key[:1]: result[key][group] for group in groups for key in ['names', 'pvals']}).head(5)
res = pd.DataFrame( {group + '_' + key: result[key][group] for group in groups for key in ['names', 'pvals','logfoldchanges','pvals_adj','scores']})
res.to_csv("dif_markers.csv") #输出保存
res
0_names 0_pvals 0_logfoldchanges 0_pvals_adj 0_scores 1_names 1_pvals ... 16_pvals_adj 16_scores 17_names 17_pvals 17_logfoldchanges 17_pvals_adj 17_scores
0 Ppp1r1b 0.000000e+00 3.872787 0.000000e+00 87.917786 Camk2n1 0.000000e+00 ... 8.478769e-43 100.765488 Neurod6 9.957260e-22 4.388139 4.848720e-20 30.808397
1 Gpr88 0.000000e+00 4.403049 0.000000e+00 85.114883 Nrgn 0.000000e+00 ... 8.057953e-47 90.942802 Shisa6 3.514023e-18 4.730407 1.345792e-16 23.194458
2 Penk 0.000000e+00 4.161317 0.000000e+00 81.762627 Cck 5.144804e-319 ... 3.267212e-36 61.300213 Kcnab2 5.501745e-19 1.907810 2.231455e-17 21.838722
3 Pde1b 0.000000e+00 3.090780 0.000000e+00 79.073380 Nptxr 4.326942e-262 ... 3.116630e-38 54.045643 Nell2 1.468953e-18 2.391361 5.804790e-17 21.818993
4 Pde10a 0.000000e+00 3.938498 0.000000e+00 77.363052 Camk2a 1.998454e-264 ... 3.185207e-28 42.650623 Slc17a7 6.004060e-21 2.086003 2.793099e-19 21.763395
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
32280 Slc17a7 5.392274e-138 -2.638275 3.108743e-135 -31.638594 Pcp4l1 4.422776e-130 ... 1.257510e-237 -36.938381 Htr1b 7.360363e-161 -28.121906 4.752586e-157 -28.987629
32281 Cck 2.454884e-151 -2.442926 1.617468e-148 -33.223202 Mal 3.544791e-138 ... 7.927483e-259 -38.930893 Shisa8 5.588661e-167 -28.738472 4.510748e-163 -29.622454
32282 Stmn1 1.575607e-143 -1.639276 9.782401e-141 -34.924587 Plp1 8.282063e-151 ... 5.406098e-260 -39.055321 Dlx6 8.146516e-176 -28.056419 8.767009e-172 -30.527283
32283 Olfm1 1.527035e-160 -2.071354 1.173817e-157 -35.563046 Mobp 1.650645e-178 ... 1.075188e-280 -40.989178 Cdkn1a 6.932729e-189 -28.244112 1.119116e-184 -31.842714
32284 Nrn1 6.550518e-207 -3.461612 9.194933e-204 -37.689751 Mbp 1.564585e-190 ... 5.941411e-33 -46.059914 Rxrg 4.265618e-220 -29.022448 1.377155e-215 -34.893322
[32285 rows x 90 columns]
空间特异性genes
import SpatialDE #导入包pip install spatialde
counts = pd.DataFrame(adata.X.todense(), columns=adata.var_names, index=adata.obs_names)
coord = pd.DataFrame(adata.obsm['spatial'], columns=['x_coord', 'y_coord'], index=adata.obs_names)
results = SpatialDE.run(coord, counts)
results.index = results["g"]
adata.var = pd.concat([adata.var, results.loc[adata.var.index.values, :]], axis=1)
results.sort_values("qval").head(10)
FSV M g l max_delta max_ll max_mu_hat max_s2_t_hat model n s2_FSV s2_logdelta time BIC max_ll_null LLR pval qval
g
Dgkb 0.443220 4 Dgkb 456.331505 1.223975 -2400.272142 0.960307 0.261535 SE 2695 0.000034 0.000653 0.006909 4832.140898 -3068.090390 667.818248 0.0 0.0
Dlg4 0.250555 4 Dlg4 456.331505 2.914374 -2323.810810 1.374468 0.131051 SE 2695 0.000026 0.000777 0.007951 4679.218235 -2650.952950 327.142139 0.0 0.0
Eif5a 0.183140 4 Eif5a 456.331505 4.345842 -1760.856838 2.325085 0.125122 SE 2695 0.000025 0.001113 0.008964 3553.310290 -1946.308672 185.451834 0.0 0.0
Nlgn2 0.157334 4 Nlgn2 456.331505 5.218447 -2322.206831 1.041536 0.074196 SE 2695 0.000024 0.001289 0.009007 4676.010275 -2486.621146 164.414315 0.0 0.0
Atp1b2 0.209184 4 Atp1b2 456.331505 3.683460 -2412.570465 1.691318 0.129979 SE 2695 0.000022 0.000824 0.008972 4856.737545 -2670.858217 258.287751 0.0 0.0
Efnb3 0.298024 4 Efnb3 456.331505 2.294977 -2520.276506 0.884368 0.160919 SE 2695 0.000022 0.000557 0.007954 5072.149626 -3041.920377 521.643871 0.0 0.0
Naa38 0.119160 4 Naa38 456.331505 7.202339 -2284.186889 1.567088 0.075790 SE 2695 0.000027 0.002230 0.007951 4599.970392 -2377.750298 93.563409 0.0 0.0
Apcdd1 0.409983 4 Apcdd1 858.638612 1.321026 -1339.982577 0.374860 0.112883 SE 2695 0.000240 0.004741 0.006995 2711.561768 -1548.322641 208.340064 0.0 0.0
Rnf227 0.281065 4 Rnf227 456.331505 2.492253 -2386.449172 1.251820 0.148585 SE 2695 0.000029 0.000776 0.007941 4804.494957 -2690.556304 304.107132 0.0 0.0
Kcnab3 0.361125 4 Kcnab3 456.331505 1.723719 -2047.492804 0.551128 0.142294 SE 2695 0.000023 0.000479 0.007971 4126.582221 -2618.187562 570.694759 0.0 0.0
res = results.sort_values("qval")
res.to_csv("dif_spatial.csv") #输出保存
sc.pl.spatial(adata, img_key="hires", color=["Dgkb", "Dlg4","Eif5a"], alpha=0.7)
plt.savefig("spatial_Var_feature.pdf")
保存数据
adata.write(results_file)
我们可以发现与Seurat相比,分类结果还是有差异,不过大的区域识别两种方法都没有什么问题。