Chapter 3 Universal enrichment analysis
clusterProfiler supports both hypergeometric test and gene set enrichment analyses of many ontology/pathway, but it’s still not enough for users may want to analyze their data with unsupported organisms, slim version of GO, novel functional annotation (e.g. GO via BlastGO or KEGG via KAAS), unsupported ontologies/pathways or customized annotations.
clusterProfiler provides enricher
function for hypergeometric test and GSEA
function for gene set enrichment analysis that are designed to accept user defined annotation. They accept two additional parameters TERM2GENE and TERM2NAME. As indicated in the parameter names, TERM2GENE is a data.frame with first column of term ID and second column of corresponding mapped gene and TERM2NAME is a data.frame with first column of term ID and second column of corresponding term name. TERM2NAME is optional.
3.1 Input data
For over representation analysis, all we need is a gene vector, that is a vector of gene IDs. These gene IDs can be obtained by differential expression analysis (e.g. with DESeq2 package).
For gene set enrichment analysis, we need a ranked list of genes. DOSE provides an example dataset geneList
which was derived from R
package breastCancerMAINZ that contained 200 samples, including 29 samples in grade I, 136 samples in grade II and 35 samples in grade III. We computed the ratios of geometric means of grade III samples versus geometric means of grade I samples. Logarithm of these ratios (base 2) were stored in geneList
dataset.
The geneList
contains three features:
- numeric vector: fold change or other type of numerical variable
- named vector: every number was named by the corresponding gene ID
- sorted vector: number should be sorted in decreasing order
Suppose you are importing your own data from a csv file and the file contains two columns, one for gene ID (no duplicated allowed) and another one for fold change, you can prepare your own geneList
via the following command:
d <- read.csv(your_csv_file)
## assume that 1st column is ID
## 2nd column is fold change
## feature 1: numeric vector
geneList <- d[,2]
## feature 2: named vector
names(geneList) <- as.character(d[,1])
## feature 3: decreasing order
geneList <- sort(geneList, decreasing = TRUE)
We can load the sample data into R via:
data(geneList, package="DOSE")
head(geneList)
## 4312 8318 10874 55143 55388 991
## 4.572613 4.514594 4.418218 4.144075 3.876258 3.677857
Suppose we define fold change greater than 2 as DEGs:
gene <- names(geneList)[abs(geneList) > 2]
head(gene)
## [1] "4312" "8318" "10874" "55143" "55388" "991"
3.2 WikiPathways analysis
WikiPathways is a continuously updated pathway database curated by a community of researchers and pathway enthusiasts. WikiPathways produces monthly releases of gmt files for supported organisms at data.wikipathways.org. Download the appropriate gmt file and then generate TERM2GENE
and TERM2NAME
to use enricher
and GSEA
functions.
library(magrittr)
library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(ont, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp <- enricher(gene, TERM2GENE = wpid2gene, TERM2NAME = wpid2name)
head(ewp)
## ID
## WP2446 WP2446
## WP2361 WP2361
## WP179 WP179
## WP3942 WP3942
## WP4240 WP4240
## WP2328 WP2328
## Description
## WP2446 Retinoblastoma (RB) in Cancer
## WP2361 Gastric Cancer Network 1
## WP179 Cell Cycle
## WP3942 PPAR signaling pathway
## WP4240 Regulation of sister chromatid separation at the metaphase-anaphase transition
## WP2328 Allograft Rejection
## GeneRatio BgRatio pvalue p.adjust
## WP2446 11/95 88/6249 6.801697e-08 1.054263e-05
## WP2361 6/95 29/6249 3.772735e-06 2.923870e-04
## WP179 10/95 122/6249 1.384549e-05 7.153503e-04
## WP3942 7/95 67/6249 6.210513e-05 2.406574e-03
## WP4240 4/95 16/6249 7.931988e-05 2.458916e-03
## WP2328 7/95 90/6249 4.016758e-04 1.037663e-02
## qvalue
## WP2446 9.450779e-06
## WP2361 2.621058e-04
## WP179 6.412648e-04
## WP3942 2.157336e-03
## WP4240 2.204258e-03
## WP2328 9.301967e-03
## geneID
## WP2446 8318/9133/7153/6241/890/983/81620/7272/1111/891/24137
## WP2361 4605/7153/11065/22974/6286/6790
## WP179 8318/991/9133/890/983/7272/1111/891/4174/9232
## WP3942 4312/9415/9370/5105/2167/3158/5346
## WP4240 991/1062/4085/9232
## WP2328 10563/6373/4283/3002/10578/3117/730
## Count
## WP2446 11
## WP2361 6
## WP179 10
## WP3942 7
## WP4240 4
## WP2328 7
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
head(ewp2)
## ID
## WP4172 WP4172
## WP3932 WP3932
## WP306 WP306
## WP236 WP236
## WP474 WP474
## WP2911 WP2911
## Description
## WP4172 PI3K-Akt Signaling Pathway
## WP3932 Focal Adhesion-PI3K-Akt-mTOR-signaling pathway
## WP306 Focal Adhesion
## WP236 Adipogenesis
## WP474 Endochondral Ossification
## WP2911 miRNA targets in ECM and membrane receptors
## setSize enrichmentScore NES pvalue
## WP4172 311 -0.3406070 -1.467847 0.001347709
## WP3932 281 -0.3903410 -1.664489 0.001383126
## WP306 188 -0.4230017 -1.731019 0.001449275
## WP236 125 -0.4387536 -1.694197 0.001526718
## WP474 59 -0.5111776 -1.751890 0.001572327
## WP2911 21 -0.6953475 -1.888706 0.001757469
## p.adjust qvalues rank
## WP4172 0.03001658 0.02487562 2133
## WP3932 0.03001658 0.02487562 1994
## WP306 0.03001658 0.02487562 2221
## WP236 0.03001658 0.02487562 2301
## WP474 0.03001658 0.02487562 2142
## WP2911 0.03001658 0.02487562 2629
## leading_edge
## WP4172 tags=24%, list=17%, signal=21%
## WP3932 tags=26%, list=16%, signal=22%
## WP306 tags=28%, list=18%, signal=23%
## WP236 tags=34%, list=18%, signal=28%
## WP474 tags=47%, list=17%, signal=40%
## WP2911 tags=76%, list=21%, signal=60%
## core_enrichment
## WP4172 5228/7424/7157/627/2252/7059/92579/5563/5295/6794/1288/7010/3910/3371/3082/1291/4602/3791/1027/90993/3441/3643/1129/2322/1975/7450/596/3685/1942/2149/1280/4804/3675/595/2261/7248/2246/4803/2259/3912/1902/1278/1277/2846/2057/1293/2247/55970/5618/7058/10161/56034/3693/4254/3480/4908/5159/1292/3908/2690/3909/8817/4915/3551/2791/63923/3913/3667/1287/3679/7060/3479/80310/1311/5105/1101
## WP3932 2252/7059/51719/8660/5563/5295/6794/1288/7010/3910/3371/3082/3791/1301/1027/90993/3643/1129/1975/7450/3685/2034/1942/2149/1280/4804/3675/2261/7248/2246/4803/2259/3912/1902/2308/1278/1277/81617/2846/2057/2247/1281/50509/1290/55970/5618/7058/10161/56034/3693/4254/6720/3480/5159/3991/1289/1292/3908/2690/3909/8817/3551/2791/63923/3913/3667/3679/7060/3479/80310/1311/1101/3169
## WP306 10420/5595/5228/7424/1499/4636/83660/7059/5295/1288/23396/3910/3371/3082/394/3791/7450/596/3685/1280/3675/595/2318/3912/1793/1278/5753/1277/10398/55742/50509/1290/2317/7058/25759/56034/3693/3480/5159/857/1292/3908/3909/63923/3913/3679/7060/3479/10451/80310/1311/1101
## WP236 9611/8321/5925/8609/1499/2908/4088/8660/3399/6778/8648/6258/2662/5914/6776/2034/196/8204/4692/4208/2308/5468/6695/4023/1592/7350/81029/3952/1675/5618/6720/3991/2487/3667/3572/3479/6424/9370/5105/652/5346/2625
## WP474 9759/4921/51719/860/5745/1028/1280/2261/4256/7042/4208/4322/8100/7078/2247/11096/85477/3480/2690/8817/5167/2737/5744/2487/5327/1300/1473/3479
## WP2911 6383/3672/7057/3915/3910/1291/1278/1293/1281/50509/1290/7058/3693/1289/1292/3913
You may want to convert the gene IDs to gene symbols, which can be done by setReadable
function.
library(org.Hs.eg.db)
ewp <- setReadable(ewp, org.Hs.eg.db, keyType = "ENTREZID")
ewp2 <- setReadable(ewp2, org.Hs.eg.db, keyType = "ENTREZID")
head(ewp)
## ID
## WP2446 WP2446
## WP2361 WP2361
## WP179 WP179
## WP3942 WP3942
## WP4240 WP4240
## WP2328 WP2328
## Description
## WP2446 Retinoblastoma (RB) in Cancer
## WP2361 Gastric Cancer Network 1
## WP179 Cell Cycle
## WP3942 PPAR signaling pathway
## WP4240 Regulation of sister chromatid separation at the metaphase-anaphase transition
## WP2328 Allograft Rejection
## GeneRatio BgRatio pvalue p.adjust
## WP2446 11/95 88/6249 6.801697e-08 1.054263e-05
## WP2361 6/95 29/6249 3.772735e-06 2.923870e-04
## WP179 10/95 122/6249 1.384549e-05 7.153503e-04
## WP3942 7/95 67/6249 6.210513e-05 2.406574e-03
## WP4240 4/95 16/6249 7.931988e-05 2.458916e-03
## WP2328 7/95 90/6249 4.016758e-04 1.037663e-02
## qvalue
## WP2446 9.450779e-06
## WP2361 2.621058e-04
## WP179 6.412648e-04
## WP3942 2.157336e-03
## WP4240 2.204258e-03
## WP2328 9.301967e-03
## geneID
## WP2446 CDC45/CCNB2/TOP2A/RRM2/CCNA2/CDK1/CDT1/TTK/CHEK1/CCNB1/KIF4A
## WP2361 MYBL2/TOP2A/UBE2C/TPX2/S100P/AURKA
## WP179 CDC45/CDC20/CCNB2/CCNA2/CDK1/TTK/CHEK1/CCNB1/MCM5/PTTG1
## WP3942 MMP1/FADS2/ADIPOQ/PCK1/FABP4/HMGCS2/PLIN1
## WP4240 CDC20/CENPE/MAD2L1/PTTG1
## WP2328 CXCL13/CXCL11/CXCL9/GZMB/GNLY/HLA-DQA1/C7
## Count
## WP2446 11
## WP2361 6
## WP179 10
## WP3942 7
## WP4240 4
## WP2328 7
head(ewp2)
## ID
## WP4172 WP4172
## WP3932 WP3932
## WP306 WP306
## WP236 WP236
## WP474 WP474
## WP2911 WP2911
## Description
## WP4172 PI3K-Akt Signaling Pathway
## WP3932 Focal Adhesion-PI3K-Akt-mTOR-signaling pathway
## WP306 Focal Adhesion
## WP236 Adipogenesis
## WP474 Endochondral Ossification
## WP2911 miRNA targets in ECM and membrane receptors
## setSize enrichmentScore NES pvalue
## WP4172 311 -0.3406070 -1.467847 0.001347709
## WP3932 281 -0.3903410 -1.664489 0.001383126
## WP306 188 -0.4230017 -1.731019 0.001449275
## WP236 125 -0.4387536 -1.694197 0.001526718
## WP474 59 -0.5111776 -1.751890 0.001572327
## WP2911 21 -0.6953475 -1.888706 0.001757469
## p.adjust qvalues rank
## WP4172 0.03001658 0.02487562 2133
## WP3932 0.03001658 0.02487562 1994
## WP306 0.03001658 0.02487562 2221
## WP236 0.03001658 0.02487562 2301
## WP474 0.03001658 0.02487562 2142
## WP2911 0.03001658 0.02487562 2629
## leading_edge
## WP4172 tags=24%, list=17%, signal=21%
## WP3932 tags=26%, list=16%, signal=22%
## WP306 tags=28%, list=18%, signal=23%
## WP236 tags=34%, list=18%, signal=28%
## WP474 tags=47%, list=17%, signal=40%
## WP2911 tags=76%, list=21%, signal=60%
## core_enrichment
## WP4172 PGF/VEGFC/TP53/BDNF/FGF7/THBS3/G6PC3/PRKAA2/PIK3R1/STK11/COL4A6/TEK/LAMA4/TNC/HGF/COL6A1/MYB/KDR/CDKN1B/CREB3L1/IFNA4/INSR/CHRM2/FLT3/EIF4B/VWF/BCL2/ITGAV/EFNA1/F2R/COL2A1/NGFR/ITGA3/CCND1/FGFR3/TSC1/FGF1/NGF/FGF14/LAMB1/LPAR1/COL1A2/COL1A1/LPAR4/EPOR/COL6A3/FGF2/GNG12/PRLR/THBS2/LPAR6/PDGFC/ITGB5/KITLG/IGF1R/NTF3/PDGFRB/COL6A2/LAMA2/GHR/LAMA3/FGF18/NTRK2/IKBKB/GNG11/TNN/LAMB2/IRS1/COL4A5/ITGA7/THBS4/IGF1/PDGFD/COMP/PCK1/CHAD
## WP3932 FGF7/THBS3/CAB39/IRS2/PRKAA2/PIK3R1/STK11/COL4A6/TEK/LAMA4/TNC/HGF/KDR/COL11A1/CDKN1B/CREB3L1/INSR/CHRM2/EIF4B/VWF/ITGAV/EPAS1/EFNA1/F2R/COL2A1/NGFR/ITGA3/FGFR3/TSC1/FGF1/NGF/FGF14/LAMB1/LPAR1/FOXO1/COL1A2/COL1A1/CAB39L/LPAR4/EPOR/FGF2/COL3A1/COL5A3/COL5A2/GNG12/PRLR/THBS2/LPAR6/PDGFC/ITGB5/KITLG/SREBF1/IGF1R/PDGFRB/LIPE/COL5A1/COL6A2/LAMA2/GHR/LAMA3/FGF18/IKBKB/GNG11/TNN/LAMB2/IRS1/ITGA7/THBS4/IGF1/PDGFD/COMP/CHAD/FOXA1
## WP306 TESK2/MAPK3/PGF/VEGFC/CTNNB1/MYL5/TLN2/THBS3/PIK3R1/COL4A6/PIP5K1C/LAMA4/TNC/HGF/ARHGAP5/KDR/VWF/BCL2/ITGAV/COL2A1/ITGA3/CCND1/FLNC/LAMB1/DOCK1/COL1A2/PTK6/COL1A1/MYL9/PARVA/COL5A3/COL5A2/FLNB/THBS2/SHC2/PDGFC/ITGB5/IGF1R/PDGFRB/CAV1/COL6A2/LAMA2/LAMA3/TNN/LAMB2/ITGA7/THBS4/IGF1/VAV3/PDGFD/COMP/CHAD
## WP236 NCOR1/FZD1/RB1/KLF7/CTNNB1/NR3C1/SMAD3/IRS2/ID3/STAT6/NCOA1/RXRG/GDF10/RARA/STAT5A/EPAS1/AHR/NRIP1/NDN/MEF2C/FOXO1/PPARG/SPOCK1/LPL/CYP26A1/UCP1/WNT5B/LEP/CFD/PRLR/SREBF1/LIPE/FRZB/IRS1/IL6ST/IGF1/SFRP4/ADIPOQ/PCK1/BMP4/PLIN1/GATA3
## WP474 HDAC4/DDR2/CAB39/RUNX2/PTH1R/CDKN1C/COL2A1/FGFR3/MGP/TGFB2/MEF2C/MMP13/IFT88/TIMP3/FGF2/ADAMTS5/SCIN/IGF1R/GHR/FGF18/ENPP1/GLI3/PTHLH/FRZB/PLAT/COL10A1/CST5/IGF1
## WP2911 SDC2/ITGA1/THBS1/LAMC1/LAMA4/COL6A1/COL1A2/COL6A3/COL3A1/COL5A3/COL5A2/THBS2/ITGB5/COL5A1/COL6A2/LAMB2
As an alternative to manually downloading gmt files, install the rWikiPathways package to gain scripting access to the latest gmt files using the downloadPathwayArchive
function.
3.3 Cell Marker
cell_markers <- vroom::vroom('http://bio-bigdata.hrbmu.edu.cn/CellMarker/download/Human_cell_markers.txt') %>%
tidyr::unite("cellMarker", tissueType, cancerType, cellName, sep=", ") %>%
dplyr::select(cellMarker, geneID) %>%
dplyr::mutate(geneID = strsplit(geneID, ', '))
cell_markers
## # A tibble: 2,868 x 2
## cellMarker geneID
## <chr> <list>
## 1 Kidney, Normal, Proximal tubular cell <chr [1…
## 2 Liver, Normal, Ito cell (hepatic stellate cell) <chr [1…
## 3 Endometrium, Normal, Trophoblast cell <chr [1…
## 4 Germ, Normal, Primordial germ cell <chr [1…
## 5 Corneal epithelium, Normal, Epithelial cell <chr [1…
## 6 Placenta, Normal, Cytotrophoblast <chr [1…
## 7 Periosteum, Normal, Periosteum-derived progenit… <chr [4…
## 8 Amniotic membrane, Normal, Amnion epithelial ce… <chr [2…
## 9 Primitive streak, Normal, Primitive streak cell <chr [2…
## 10 Adipose tissue, Normal, Stromal vascular fracti… <chr [1…
## # … with 2,858 more rows
y <- enricher(gene, TERM2GENE=cell_markers, minGSSize=1)
DT::datatable(as.data.frame(y))
<label style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">Show <select name="DataTables_Table_0_length" aria-controls="DataTables_Table_0" class="" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none; font-family: inherit; font-size: 16px; margin: 0px; text-transform: none;"><option value="10" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">10</option><option value="25" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">25</option><option value="50" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">50</option><option value="100" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">100</option></select> entries</label>
<label style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">Search:<input type="search" class="" placeholder="" aria-controls="DataTables_Table_0" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none; font-family: inherit; font-size: 16px; margin: 0px 0px 0px 0.5em; line-height: normal; -webkit-appearance: textfield;"></label>
ID | Description | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | geneID | Count | |
---|---|---|---|---|---|---|---|---|---|
Embryonic prefrontal cortex, Normal, Neural progenitor cell | Embryonic prefrontal cortex, Normal, Neural progenitor cell | Embryonic prefrontal cortex, Normal, Neural progenitor cell | 33/153 | 166/11582 | 4.43518066168319e-30 | 6.34230834620697e-28 | 4.80866955950915e-28 | 991/9493/9833/9133/10403/7153/259266/6241/9787/11065/55355/55872/51203/83461/22974/10460/79019/55839/890/983/4085/5080/332/3832/9212/51659/9055/891/4174/9232/1602/2018/4137 | 33 |
Large intestine, Normal, MKI67+ progenitor cell | Large intestine, Normal, MKI67+ progenitor cell | Large intestine, Normal, MKI67+ progenitor cell | 20/153 | 111/11582 | 1.42606245514518e-17 | 1.0196346554288e-15 | 7.7307596252607e-16 | 991/9133/10403/7153/259266/6241/9787/11065/51203/22974/10460/890/983/332/9212/9055/891/9232/2167/51705 | 20 |
Kidney, Normal, Neutrophil | Kidney, Normal, Neutrophil | Kidney, Normal, Neutrophil | 6/153 | 42/11582 | 0.0000170645935795602 | 0.000813412293959037 | 0.000616720399542001 | 6280/6279/597/7850/6286/366 | 6 |
Large intestine, Normal, DCLK1+ progenitor cell | Large intestine, Normal, DCLK1+ progenitor cell | Large intestine, Normal, DCLK1+ progenitor cell | 7/153 | 99/11582 | 0.000329875455341899 | 0.0117930475284729 | 0.00894136102637251 | 1307/730/54829/10234/79901/57758/4969 | 7 |
Kidney, Renal Cell Carcinoma, Neutrophil | Kidney, Renal Cell Carcinoma, Neutrophil | Kidney, Renal Cell Carcinoma, Neutrophil | 6/153 | 77/11582 | 0.0005276542867749 | 0.0150909126017621 | 0.0114417666395399 | 6280/6279/597/7850/6286/366 | 6 |
Fetal gonad, Normal, Endothelial cell | Fetal gonad, Normal, Endothelial cell | Fetal gonad, Normal, Endothelial cell | 9/153 | 209/11582 | 0.00181179940206466 | 0.0431812190825411 | 0.0327395330548526 | 9493/23397/9787/10460/55839/332/7272/4521/9232 | 9 |
Showing 1 to 6 of 6 entries
Previous1Next
3.4 MSigDb analysis
Molecular Signatures Database contains 8 major collections:
- H: hallmark gene sets
- C1: positional gene sets
- C2: curated gene sets
- C3: motif gene sets
- C4: computational gene sets
- C5: GO gene sets
- C6: oncogenic signatures
- C7: immunologic signatures
Users can download GMT files from Broad Institute and use read.gmt
to parse the file to be used in enricher()
and GSEA()
.
There is an R package, msigdbr, that already packed the MSigDB gene sets in tidy data format that can be used directly with clusterProfiler.
It supports several specices:
library(msigdbr)
msigdbr_show_species()
## [1] "Bos taurus" "Caenorhabditis elegans"
## [3] "Canis lupus familiaris" "Danio rerio"
## [5] "Drosophila melanogaster" "Gallus gallus"
## [7] "Homo sapiens" "Mus musculus"
## [9] "Rattus norvegicus" "Saccharomyces cerevisiae"
## [11] "Sus scrofa"
We can retrieve all human gene sets:
m_df <- msigdbr(species = "Homo sapiens")
head(m_df, 2) %>% as.data.frame
## gs_name gs_id gs_cat gs_subcat human_gene_symbol
## 1 AAACCAC_MIR140 M12609 C3 MIR ABCC4
## 2 AAACCAC_MIR140 M12609 C3 MIR ABRAXAS2
## species_name entrez_gene gene_symbol sources
## 1 Homo sapiens 10257 ABCC4 <NA>
## 2 Homo sapiens 23172 ABRAXAS2 <NA>
Or specific collection. Here we use C6, oncogenic gene sets as an example:
m_t2g <- msigdbr(species = "Homo sapiens", category = "C6") %>%
dplyr::select(gs_name, entrez_gene)
head(m_t2g)
## # A tibble: 6 x 2
## gs_name entrez_gene
## <chr> <int>
## 1 AKT_UP_MTOR_DN.V1_DN 25864
## 2 AKT_UP_MTOR_DN.V1_DN 95
## 3 AKT_UP_MTOR_DN.V1_DN 137872
## 4 AKT_UP_MTOR_DN.V1_DN 134
## 5 AKT_UP_MTOR_DN.V1_DN 55326
## 6 AKT_UP_MTOR_DN.V1_DN 216
em <- enricher(gene, TERM2GENE=m_t2g)
em2 <- GSEA(geneList, TERM2GENE = m_t2g)
head(em)
## ID
## RPS14_DN.V1_DN RPS14_DN.V1_DN
## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP
## VEGF_A_UP.V1_DN VEGF_A_UP.V1_DN
## PRC2_EZH2_UP.V1_DN PRC2_EZH2_UP.V1_DN
## CSR_LATE_UP.V1_UP CSR_LATE_UP.V1_UP
## ATF2_S_UP.V1_UP ATF2_S_UP.V1_UP
## Description GeneRatio
## RPS14_DN.V1_DN RPS14_DN.V1_DN 22/185
## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP 16/185
## VEGF_A_UP.V1_DN VEGF_A_UP.V1_DN 15/185
## PRC2_EZH2_UP.V1_DN PRC2_EZH2_UP.V1_DN 14/185
## CSR_LATE_UP.V1_UP CSR_LATE_UP.V1_UP 12/185
## ATF2_S_UP.V1_UP ATF2_S_UP.V1_UP 12/185
## BgRatio pvalue p.adjust
## RPS14_DN.V1_DN 185/10962 4.855041e-13 8.205020e-11
## GCNP_SHH_UP_LATE.V1_UP 186/10962 9.357887e-08 7.907414e-06
## VEGF_A_UP.V1_DN 193/10962 8.887046e-07 5.006369e-05
## PRC2_EZH2_UP.V1_DN 190/10962 3.907976e-06 1.651120e-04
## CSR_LATE_UP.V1_UP 166/10962 2.368079e-05 8.004105e-04
## ATF2_S_UP.V1_UP 190/10962 8.879028e-05 2.143651e-03
## qvalue
## RPS14_DN.V1_DN 6.694846e-11
## GCNP_SHH_UP_LATE.V1_UP 6.452017e-06
## VEGF_A_UP.V1_DN 4.084923e-05
## PRC2_EZH2_UP.V1_DN 1.347223e-04
## CSR_LATE_UP.V1_UP 6.530911e-04
## ATF2_S_UP.V1_UP 1.749102e-03
## geneID
## RPS14_DN.V1_DN 10874/55388/991/9493/1062/4605/9133/23397/79733/9787/55872/83461/54821/51659/9319/9055/10112/4174/5105/2532/7021/79901
## GCNP_SHH_UP_LATE.V1_UP 55388/7153/79733/6241/9787/51203/983/9212/1111/9319/9055/3833/6790/4174/3169/1580
## VEGF_A_UP.V1_DN 8318/9493/1062/9133/10403/6241/9787/4085/332/3832/7272/891/23362/2167/10234
## PRC2_EZH2_UP.V1_DN 8318/55388/4605/23397/9787/55355/10460/81620/2146/7272/9212/11182/3887/24137
## CSR_LATE_UP.V1_UP 55143/2305/4605/6241/11065/55872/983/332/2146/51659/9319/1580
## ATF2_S_UP.V1_UP 4312/6280/2305/9493/7153/2146/10112/3887/8544/22885/10234/10974
## Count
## RPS14_DN.V1_DN 22
## GCNP_SHH_UP_LATE.V1_UP 16
## VEGF_A_UP.V1_DN 15
## PRC2_EZH2_UP.V1_DN 14
## CSR_LATE_UP.V1_UP 12
## ATF2_S_UP.V1_UP 12
head(em2)
## ID Description setSize
## LEF1_UP.V1_DN LEF1_UP.V1_DN LEF1_UP.V1_DN 186
## LTE2_UP.V1_UP LTE2_UP.V1_UP LTE2_UP.V1_UP 183
## RAF_UP.V1_DN RAF_UP.V1_DN RAF_UP.V1_DN 189
## IL2_UP.V1_DN IL2_UP.V1_DN IL2_UP.V1_DN 182
## ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN 181
## ATF2_UP.V1_DN ATF2_UP.V1_DN ATF2_UP.V1_DN 177
## enrichmentScore NES pvalue
## LEF1_UP.V1_DN -0.4857056 -1.987140 0.001381215
## LTE2_UP.V1_UP -0.3938357 -1.608416 0.001383126
## RAF_UP.V1_DN -0.5521243 -2.254389 0.001383126
## IL2_UP.V1_DN -0.4252756 -1.732758 0.001388889
## ATF2_S_UP.V1_DN -0.4460160 -1.815767 0.001394700
## ATF2_UP.V1_DN -0.5008413 -2.032113 0.001402525
## p.adjust qvalues rank
## LEF1_UP.V1_DN 0.01354839 0.007394831 1705
## LTE2_UP.V1_UP 0.01354839 0.007394831 2110
## RAF_UP.V1_DN 0.01354839 0.007394831 2035
## IL2_UP.V1_DN 0.01354839 0.007394831 2774
## ATF2_S_UP.V1_DN 0.01354839 0.007394831 2531
## ATF2_UP.V1_DN 0.01354839 0.007394831 2055
## leading_edge
## LEF1_UP.V1_DN tags=35%, list=14%, signal=31%
## LTE2_UP.V1_UP tags=33%, list=17%, signal=28%
## RAF_UP.V1_DN tags=43%, list=16%, signal=37%
## IL2_UP.V1_DN tags=39%, list=22%, signal=31%
## ATF2_S_UP.V1_DN tags=39%, list=20%, signal=32%
## ATF2_UP.V1_DN tags=38%, list=16%, signal=32%
## core_enrichment
## LEF1_UP.V1_DN 79153/22998/10325/23221/79170/9187/4300/51479/25956/7552/8644/11162/7168/1490/54463/65084/25837/80303/1809/3397/4208/11223/79762/6451/481/54795/51626/8991/5950/5627/64699/6414/10103/221078/9961/5874/56898/90865/83989/5002/6653/56034/55314/85458/956/54502/4832/5783/1363/53832/54985/80221/214/50853/51149/57088/5157/2947/79932/7031/6505/9338/25924/80736/3169/10551
## LTE2_UP.V1_UP 23074/64759/1601/6558/7138/11138/10497/81031/665/2878/3554/4925/55005/9252/2034/5283/54869/5025/6019/65124/10979/51435/26999/4212/444/23588/55556/2954/93129/79762/54453/79789/316/79679/151011/51280/9630/57037/5627/9867/55198/6653/10455/5101/65055/8309/9429/23303/9590/253190/4982/4128/4680/1846/51760/80129/4857/5507/3158/5304/10974
## RAF_UP.V1_DN 23314/221037/1960/51340/55105/10265/22996/9687/5357/1952/12/4602/9231/596/51454/57419/595/4256/23022/8777/7837/7042/3397/57613/323/1831/6451/54843/7358/2353/8773/6938/8991/64699/57007/23389/10560/7227/3485/79068/10769/4254/26018/3480/2674/23327/857/116039/6542/2690/1955/1363/26353/7033/5376/89927/51363/8821/2239/6947/4886/214/3487/3667/5157/54847/54898/7031/6505/57535/10451/18/771/80129/5174/5507/56521/8839/8614/5241/10551/57758
## IL2_UP.V1_DN 51276/10858/5333/2977/79874/10014/7773/64798/26959/6588/579/777/5158/23361/9915/5087/64759/2535/6123/7010/51285/126353/80201/79841/1028/2322/4222/947/9187/225/9976/5592/54996/3202/55779/23148/5334/8605/57326/26115/54453/55766/64927/6591/55204/9940/57608/653483/9886/10279/55805/10628/901/9891/7102/27244/79921/4674/8292/5125/79843/6857/79940/23541/3357/50853/9863/79932/90362/55663/4250
## ATF2_S_UP.V1_DN 8714/53616/55667/51279/4642/7130/7139/2/710/80352/4608/11145/881/2252/3399/22846/10687/12/1397/1301/3554/2619/26011/78987/9284/1306/6586/9627/4054/54813/115207/55890/26249/4803/11075/3400/79789/8527/2353/10580/79987/80760/5493/90865/3485/3751/2202/2199/10610/1047/1294/23492/3908/5364/658/5350/2121/2331/4330/9737/4982/57088/8490/9358/6424/1307/3708/125/1311/56521/10351
## ATF2_UP.V1_DN 11145/9738/2252/8660/10687/5357/80201/12/22998/3554/2619/23556/8840/9284/57419/5737/6586/9627/8863/10085/115207/55890/23015/629/3212/3400/50650/8527/2353/79987/80760/9668/1012/10614/5493/8553/23126/90865/3485/3751/2202/57212/2199/10610/3075/6387/22834/1909/10129/57161/5350/2121/2331/4330/2952/4982/57088/9358/6424/6038/51313/1311/56521/10351/5304/4239/79901
We can test with other collections, for example, using C3 to test whether the genes are up/down-regulated by sharing specific motif.
m_t2g <- msigdbr(species = "Homo sapiens", category = "C3") %>%
dplyr::select(gs_name, entrez_gene)
head(m_t2g)
## # A tibble: 6 x 2
## gs_name entrez_gene
## <chr> <int>
## 1 AAACCAC_MIR140 10257
## 2 AAACCAC_MIR140 23172
## 3 AAACCAC_MIR140 81
## 4 AAACCAC_MIR140 90
## 5 AAACCAC_MIR140 8754
## 6 AAACCAC_MIR140 11096
em3 <- GSEA(geneList, TERM2GENE = m_t2g)
head(em3)
## ID
## GGATTA_PITX2_Q2 GGATTA_PITX2_Q2
## TTTGCAC_MIR19A_MIR19B TTTGCAC_MIR19A_MIR19B
## CAGTATT_MIR200B_MIR200C_MIR429 CAGTATT_MIR200B_MIR200C_MIR429
## AAAYRNCTG_UNKNOWN AAAYRNCTG_UNKNOWN
## TTGCACT_MIR130A_MIR301_MIR130B TTGCACT_MIR130A_MIR301_MIR130B
## ATGTACA_MIR493 ATGTACA_MIR493
## Description
## GGATTA_PITX2_Q2 GGATTA_PITX2_Q2
## TTTGCAC_MIR19A_MIR19B TTTGCAC_MIR19A_MIR19B
## CAGTATT_MIR200B_MIR200C_MIR429 CAGTATT_MIR200B_MIR200C_MIR429
## AAAYRNCTG_UNKNOWN AAAYRNCTG_UNKNOWN
## TTGCACT_MIR130A_MIR301_MIR130B TTGCACT_MIR130A_MIR301_MIR130B
## ATGTACA_MIR493 ATGTACA_MIR493
## setSize enrichmentScore
## GGATTA_PITX2_Q2 455 -0.3365909
## TTTGCAC_MIR19A_MIR19B 397 -0.3672122
## CAGTATT_MIR200B_MIR200C_MIR429 380 -0.3607152
## AAAYRNCTG_UNKNOWN 290 -0.4355277
## TTGCACT_MIR130A_MIR301_MIR130B 313 -0.3890571
## ATGTACA_MIR493 251 -0.3980657
## NES pvalue
## GGATTA_PITX2_Q2 -1.495575 0.001230012
## TTTGCAC_MIR19A_MIR19B -1.612649 0.001254705
## CAGTATT_MIR200B_MIR200C_MIR429 -1.581860 0.001257862
## AAAYRNCTG_UNKNOWN -1.863237 0.001319261
## TTGCACT_MIR130A_MIR301_MIR130B -1.675646 0.001322751
## ATGTACA_MIR493 -1.676300 0.001347709
## p.adjust qvalues rank
## GGATTA_PITX2_Q2 0.04123119 0.02860412 2464
## TTTGCAC_MIR19A_MIR19B 0.04123119 0.02860412 4078
## CAGTATT_MIR200B_MIR200C_MIR429 0.04123119 0.02860412 3186
## AAAYRNCTG_UNKNOWN 0.04123119 0.02860412 2952
## TTGCACT_MIR130A_MIR301_MIR130B 0.04123119 0.02860412 2861
## ATGTACA_MIR493 0.04123119 0.02860412 3911
## leading_edge
## GGATTA_PITX2_Q2 tags=26%, list=20%, signal=22%
## TTTGCAC_MIR19A_MIR19B tags=44%, list=33%, signal=30%
## CAGTATT_MIR200B_MIR200C_MIR429 tags=34%, list=25%, signal=26%
## AAAYRNCTG_UNKNOWN tags=38%, list=24%, signal=29%
## TTGCACT_MIR130A_MIR301_MIR130B tags=36%, list=23%, signal=28%
## ATGTACA_MIR493 tags=45%, list=31%, signal=32%
## core_enrichment
## GGATTA_PITX2_Q2 25836/311/3199/4604/57556/8019/3321/57835/8929/6671/1760/10401/4430/9497/5530/6532/54894/23528/5228/5087/2295/6453/56171/3232/4921/3005/221037/56912/10265/10521/10497/23767/8322/37/23429/6561/4303/57082/845/94/51447/9922/6258/23243/2571/10253/4602/29119/55068/126393/7402/56675/6302/5624/7716/7779/596/4300/57804/79923/1280/100506658/9627/3675/26037/2261/2804/65084/11030/3216/57326/140890/23255/5396/9854/1907/54843/57037/862/23452/55152/1848/8835/89795/57007/64221/5136/7227/10324/55273/6310/2202/56034/956/54502/51474/4254/6925/6097/744/443/23446/4306/4915/4487/25803/4081/4223/10472/388677/55107/347902/3708/10742/51313/4857/730/2532/4036/3169
## TTTGCAC_MIR19A_MIR19B 22911/1605/659/10318/4299/6249/81609/57194/9686/10425/9839/23013/22936/989/51742/4154/55625/51460/51592/11127/3976/54980/9751/56937/6045/4204/27252/4862/27236/2115/5869/152006/57698/57018/23032/56145/10150/55054/4090/25852/10618/2060/7107/10771/7071/9044/10904/9752/6431/8473/83452/392/10395/6777/51230/22889/23236/55652/158471/6595/905/56146/4026/79665/5932/115/8325/546/10154/9655/23499/54521/26959/23612/3321/56929/54778/26060/56848/9098/107/10420/23001/29922/9759/29116/23405/94134/2295/6453/83660/51719/8654/11138/2186/23341/8038/860/2004/9256/6548/23621/23503/29/394/220594/831/92689/55082/54014/1901/23111/7552/27303/1490/51454/57419/66008/2318/6252/7248/4929/23041/9900/55727/91694/22863/1389/54796/3400/9649/23253/54453/7048/22862/22905/23365/79899/5412/60481/54838/89795/5793/23389/7227/6310/3625/26018/23492/2099/23116/744/775/90355/388/23414/11069/6857/26960/55638/10472/2152/2697/3131/7786/55107/9863/23261/3479/3708/10742/2018/4036/2066
## CAGTATT_MIR200B_MIR200C_MIR429 7071/31/23291/5179/57092/9969/9865/8473/9644/10395/6383/5569/1392/4008/9647/8613/6405/4884/8491/9706/2263/6567/53353/64766/5310/546/5128/55145/54469/200576/8434/7003/687/3915/1456/3321/10499/23189/26060/55578/1998/51421/22864/10484/10140/23001/4661/253782/9807/83660/95681/1960/2908/51719/2186/22846/108/9710/80205/2104/9522/7320/23243/23258/4602/129642/2669/51439/7716/54014/23047/5099/2823/3340/1942/7552/1843/100506658/8204/10979/79618/23015/8848/4692/5334/57515/26249/6563/23122/6196/1902/4077/2494/10370/3953/51339/79365/54838/89795/8076/5577/3751/6310/1009/29994/10769/1501/57509/3899/2273/23116/4908/27244/775/5783/65055/4035/23414/2737/5744/6857/55800/10186/3131/9828/6935/4982/219654/1602/3708/4857
## AAAYRNCTG_UNKNOWN 598/1392/4008/11080/79893/4724/8515/8613/81847/4297/3218/27314/7025/6595/8516/29966/11334/90/54329/89797/120/783/4750/8289/5071/3386/8929/2776/5530/23564/25959/4776/607/8038/5926/1288/2104/4303/845/51447/23243/22902/3082/54879/5745/4211/9024/23047/1942/79625/27303/4628/55911/51585/3202/6252/23338/64112/7482/444/55128/26249/64236/3397/64393/23635/23253/4013/1278/5252/323/1831/10370/1384/5549/26575/1272/89795/81603/55799/8470/7227/54875/761/477/29951/10468/6444/4254/7102/744/2045/4915/658/23414/11069/55800/10186/10472/2205/1287/7122/3479/1846/2018/2066/3169/8614/79901
## TTGCACT_MIR130A_MIR301_MIR130B 9044/1808/196441/10904/27430/192670/8473/392/2909/83482/4762/4297/23236/27314/5932/4884/9706/90/546/9655/84196/5562/8626/757/26959/26060/56848/9098/23328/113791/107/10420/23001/29922/4661/23405/94134/2295/60468/55914/11138/2186/23341/6500/108/8038/51257/860/8648/80205/2004/9256/23243/23503/23469/4602/220594/8452/51111/55082/54014/1901/23111/7552/57419/389136/171586/3213/3202/6252/7248/23181/7482/55727/5797/91694/22863/10612/23253/8829/7048/753/23365/79365/79899/5468/5793/7227/6310/3625/26018/2099/23116/744/90355/23446/4306/23414/11069/64061/90627/26960/4223/55638/10472/2697/7786/23261/3708/10742/2018/2066
## ATGTACA_MIR493