1. Downloading the data
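While reading Single-Cell Reconstruction of Progression Trajectory Reveals Intervention Principles in Pathological Cardiac Hypertrophy, I noticed how the authors chose ligands for their cell-cell interaction analysis: they intersected the highly variable genes of each cell cluster, found with Seurat, with a list of genes encoding secreted proteins.
So I went looking for a list of genes encoding secreted proteins, and found that the Human Protein Atlas (HPA) provides this information.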
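On the HPA website, click "Subcellular", scroll down and click "Secreted proteins", then scroll down further to the table of secreted-protein classes.
Any of these classes can be used; here I chose the bottom row, which contains the most proteins. Clicking it opens the page for that class.
Click "JSON" next to "Download" to save the secreted-protein gene list and its annotations locally.
Read the file into R: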
```r
install.packages("jsonlite")   # only needed the first time
library(jsonlite)
# fromJSON() simplifies the array of gene records into a data.frame
jsonData <- fromJSON("protein_class_SPOCTOPUS.json")
jsonData[1:6, 1:5]   # first 6 genes, first 5 columns
#   Ensembl         Gene    Gene synonym           Gene description                          Uniprot
# 1 ENSG00000121410 A1BG                                   Alpha-1-B glycoprotein                    P04217
# 2 ENSG00000175899 A2M     CPAMD5, FWP007, S863-7 Alpha-2-macroglobulin                     P01023
# 3 ENSG00000166535 A2ML1   CPAMD9, FLJ25179, p170 Alpha-2-macroglobulin like 1              A8K2U0
# 4 ENSG00000118017 A4GNT   alpha4GnT              Alpha-1,4-N-acetylglucosaminyltransferase Q9UNA3
# 5 ENSG00000114771 AADAC   CES5A1, DAC            Arylacetamide deacetylase                 P22760
# 6 ENSG00000197953 AADACL2 MGC72001               Arylacetamide deacetylase like 2          Q6P093
```
2. Converting human genes to mouse orthologs
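The downloaded list is for human genes, but I want to analyze mouse data, so the human genes have to be converted to their mouse orthologs. Here I use the biomaRt package for this.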
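```r
library(biomaRt)
listMarts()   # list the BioMart databases available on the server
# Connect to the human and mouse gene datasets on Ensembl
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# getLDS() links the two datasets: query by human HGNC symbol,
# return the matching mouse MGI symbols
hsa2mus_all <- getLDS(attributes = c("hgnc_symbol"),
                      filters = "hgnc_symbol",
                      values = jsonData$Gene,
                      mart = human,
                      attributesL = c("mgi_symbol"),
                      martL = mouse,
                      uniqueRows = TRUE)
head(hsa2mus_all)
#   HGNC.symbol MGI.symbol
# 1 DSG2        Dsg2
# 2 CCDC134     Ccdc134
# 3 ZNF419      Zfy1
# 4 ZNF419      Zfy2
# 5 FAM177A1    Fam177a
# 6 FAM177A1    Fam177a2
# One human gene can map to several mouse genes, so de-duplicate
MouseSecretedGene <- unique(hsa2mus_all$MGI.symbol)
```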
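The head() output already shows that some human genes map to more than one mouse gene (ZNF419 to Zfy1 and Zfy2, FAM177A1 to Fam177a and Fam177a2). A minimal base-R sketch for spotting these one-to-many mappings before using the list; `toy_map` is a hypothetical stand-in for `hsa2mus_all` that simply copies the six rows printed above, so the snippet runs offline:

```r
# Hypothetical toy stand-in for hsa2mus_all: the six rows printed by head() above
toy_map <- data.frame(
  HGNC.symbol = c("DSG2", "CCDC134", "ZNF419", "ZNF419", "FAM177A1", "FAM177A1"),
  MGI.symbol  = c("Dsg2", "Ccdc134", "Zfy1", "Zfy2", "Fam177a", "Fam177a2"),
  stringsAsFactors = FALSE
)

# Rows per human symbol (= distinct mouse orthologs, since uniqueRows = TRUE)
n_per_human <- table(toy_map$HGNC.symbol)

# Human genes that map to more than one mouse gene
multi_mapped <- names(n_per_human)[n_per_human > 1]
print(multi_mapped)

# Show those one-to-many rows for manual review
print(toy_map[toy_map$HGNC.symbol %in% multi_mapped, ])

# Same de-duplication step as in the main code
mouse_genes <- unique(toy_map$MGI.symbol)
print(length(mouse_genes))
```

Keeping every mouse ortholog, as unique() does, errs on the side of including candidate ligands; reviewing the multi-mapped rows shows where that might add questionable genes.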
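The resulting genes can then be intersected with the highly variable genes of each Seurat cell cluster, for example by drawing a Venn diagram.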
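A minimal base-R sketch of that intersection step. It assumes a marker table with `cluster` and `gene` columns, as returned by Seurat's FindAllMarkers(). The data frame `markers`, the placeholder genes GeneA/GeneB/GeneC, and the short `secreted` vector are hypothetical toy inputs standing in for the real marker table and `MouseSecretedGene`:

```r
# Hypothetical toy inputs: in practice `markers` would come from
# Seurat::FindAllMarkers() and `secreted` would be MouseSecretedGene
markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1", "2"),
  gene    = c("Dsg2", "GeneA", "Fam177a", "Ccdc134", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)
secreted <- c("Dsg2", "Ccdc134", "Fam177a", "Fam177a2", "Zfy1", "Zfy2")

# Split marker genes by cluster, then keep only those in the secreted list
genes_by_cluster <- split(markers$gene, markers$cluster)
ligands_by_cluster <- lapply(genes_by_cluster, intersect, y = secreted)
print(ligands_by_cluster)

# Overall overlap: the shared region of a two-set Venn diagram
all_ligands <- intersect(unique(markers$gene), secreted)
print(all_ligands)
```

The per-cluster list gives candidate secreted ligands for each cluster, which is the form needed when pairing ligands with receptors on other clusters. A cluster with no overlap simply gets an empty character vector.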
Note: the conversion may fail with an error like the one below. This example comes from a mouse-to-macaque conversion, where `macaque` is the macaque mart:
```r
gene.mo2ma <- getLDS(attributes = c("external_gene_name"),
                     filters = "external_gene_name",
                     values = c("Gad1", "Sst"),
                     mart = mouse,
                     attributesL = c("external_gene_name", "chromosome_name"),
                     martL = macaque,
                     uniqueRows = TRUE)
# Error: biomaRt has encountered an unexpected server error.
# Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)
```
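This appears to be a problem on the Ensembl server side rather than in the code. The query only ran once the datasets were built from the December 2021 archive site, so it is presumably a bug in the 2022 release.
The workaround is to use the `host` argument to point useMart() at the 2021 archive: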
```r
# Build both marts from the Dec 2021 Ensembl archive instead of the current release
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
hsa2mus_all <- getLDS(attributes = c("hgnc_symbol"),
                      filters = "hgnc_symbol",
                      values = jsonData$Gene,
                      mart = human,
                      attributesL = c("mgi_symbol"),
                      martL = mouse,
                      uniqueRows = TRUE)
```
The query then runs normally.
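If a given host fails again later, one option is to try several candidate hosts in turn and keep the first that answers. Below is a minimal base-R sketch of that idea. `first_working_host()` and `fake_connect()` are hypothetical helpers written for illustration, not part of biomaRt, and the demo uses a fake connector so it runs offline. Since the error above was raised by getLDS() itself, the function passed as `connect` should build the marts and run the query, not just call useMart():

```r
# Hypothetical helper: call connect(host) for each host in order and return the
# first result that does not raise an error (a NULL result also counts as failure)
first_working_host <- function(hosts, connect) {
  for (h in hosts) {
    result <- tryCatch(connect(h), error = function(e) {
      message("Failed on ", h, ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(list(host = h, value = result))
  }
  stop("All hosts failed")
}

# Offline demo: a fake connector that only "works" for the archive host
fake_connect <- function(h) {
  if (grepl("dec2021", h, fixed = TRUE)) "query result" else stop("unexpected server error")
}
res <- first_working_host(
  c("https://www.ensembl.org", "https://dec2021.archive.ensembl.org/"),
  fake_connect
)
print(res$host)

# With biomaRt (requires network), `connect` would wrap the whole query, e.g.:
# run_query <- function(h) {
#   human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = h)
#   mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host = h)
#   getLDS(attributes = "hgnc_symbol", filters = "hgnc_symbol",
#          values = jsonData$Gene, mart = human,
#          attributesL = "mgi_symbol", martL = mouse, uniqueRows = TRUE)
# }
# res <- first_working_host(c("https://www.ensembl.org",
#                             "https://dec2021.archive.ensembl.org/"), run_query)
```

Different Ensembl releases can return slightly different ortholog tables, so it is worth recording `res$host` alongside the resulting gene list.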