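A student asked me for help: the expression matrix they downloaded was empty. As a beginner, processing the raw data is still hard for them, but they need it urgently. So here I am to help.
1. Initial exploration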
library(tinyarray)
gse_number = "GSE61260"
b = geo_download(gse_number)
## Warning in geo_download(gse_number): exp is empty
pd = b$pd # the expression matrix is empty, but the clinical information is there
gpl_number = b$gpl # the GPL accession is correct too
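The expression matrix really is empty. However, the supplementary files include raw data in CEL format, which you can see on the dataset's GEO page.
The page also offers a file labeled as normalized data, and at a glance it looks like an ordinary matrix. I'm not sure about it, though, so let's process the raw data ourselves and see.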
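Whether a submitter's "normalized" matrix can be used directly partly depends on its scale. One quick check is a quantile-based heuristic similar to the one in GEO2R's generated R scripts. If the 99th percentile is above 100, or the range is wide while the lower quartile is positive, the values are probably still on a linear scale and need a log2. This is a sketch: `needs_log2()` is a hypothetical helper, the thresholds are a rule of thumb, and the matrices below are simulated, not the GSE61260 file.

```r
# Hypothetical helper: heuristic check for whether an expression matrix still needs log2.
# Thresholds follow the rule of thumb used in GEO2R-style scripts (an assumption, not a standard).
needs_log2 <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1.0), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

set.seed(1)
raw_like <- matrix(2^runif(200, 4, 14), nrow = 50)  # simulated linear-scale intensities (~16 to ~16000)
log_like <- log2(raw_like)                           # the same values on log2 scale (~4 to ~14)

needs_log2(raw_like)  # linear scale: should be flagged
needs_log2(log_like)  # already log2: should not be flagged
```

If the check fires on real data, the usual follow-up is to set non-positive values to `NaN` and take `log2()` before any downstream analysis.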
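2. Processing the raw CEL data
The raw data 👉 500+ MB. Processing it yourself is genuinely laborious, since download speed and computing resources can both be limiting. Just grabbing a ready-made matrix would be a lot more fun O(∩_∩)O.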
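If you do take the raw route, the CEL files can be fetched from within R as well as by hand. Here is a sketch using GEOquery's `getGEOSuppFiles()`, which saves supplementary files into a `<GSE>/` folder. It assumes the archive follows GEO's usual `<GSE>_RAW.tar` naming and that GEOquery is installed. `fetch_cel()` and `raw_tar_path()` are hypothetical helper names, and the unpacked folder is chosen to match the `GSE61260_RAW/` directory used below.

```r
# Sketch: download and unpack the supplementary RAW tarball of a GEO series.
# Assumes GEO's usual <GSE>_RAW.tar naming; helper names are hypothetical.
raw_tar_path <- function(gse, base_dir = ".") {
  # getGEOSuppFiles(makeDirectory = TRUE) saves files into <base_dir>/<GSE>/
  file.path(base_dir, gse, paste0(gse, "_RAW.tar"))
}

fetch_cel <- function(gse, base_dir = ".", exdir = paste0(gse, "_RAW")) {
  GEOquery::getGEOSuppFiles(gse, makeDirectory = TRUE, baseDir = base_dir)
  # unpack the tarball; for this series it should contain the *.CEL.gz files
  utils::untar(raw_tar_path(gse, base_dir), exdir = exdir)
  invisible(exdir)
}

# Usage (downloads 500+ MB, so not run here):
# fetch_cel("GSE61260")   # should create GSE61260_RAW/ with the CEL.gz files
```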
# BiocManager::install(c("oligo", "pd.hugene.1.1.st.v1"), ask = FALSE, update = FALSE)
library(oligo)
dir='GSE61260_RAW/'
od=getwd()
setwd(dir)
celFiles <- list.celfiles(listGzipped = TRUE)
celFiles
## [1] "GSM1501013_A1970-01.CEL.gz" "GSM1501014_A1891-01.CEL.gz"
## [3] "GSM1501015_A1891-03.CEL.gz" "GSM1501016_A1359-01.CEL.gz"
## [5] "GSM1501017_A1891-04.CEL.gz" "GSM1501018_A1970-02.CEL.gz"
## [7] "GSM1501019_A1891-05.CEL.gz" "GSM1501020_A1359-02.CEL.gz"
## [9] "GSM1501021_A1891-06.CEL.gz" "GSM1501022_A1891-07.CEL.gz"
## [11] "GSM1501023_A1891-08.CEL.gz" "GSM1501024_A1359-03.CEL.gz"
## [13] "GSM1501025_A1970-04.CEL.gz" "GSM1501026_A1970-05.CEL.gz"
## [15] "GSM1501027_A1970-06.CEL.gz" "GSM1501028_A1359-04.CEL.gz"
## [17] "GSM1501029_A1891-09.CEL.gz" "GSM1501030_A1970-07.CEL.gz"
## [19] "GSM1501031_A1891-10.CEL.gz" "GSM1501032_A1359-05.CEL.gz"
## [21] "GSM1501033_A1970-08.CEL.gz" "GSM1501034_A1891-11.CEL.gz"
## [23] "GSM1501035_A1970-09.CEL.gz" "GSM1501036_A1970-10.CEL.gz"
## [25] "GSM1501037_A1970-11.CEL.gz" "GSM1501038_A1891-12.CEL.gz"
## [27] "GSM1501039_A1970-12.CEL.gz" "GSM1501040_A1891-13.CEL.gz"
## [29] "GSM1501041_A1970-13.CEL.gz" "GSM1501042_A1359-07.CEL.gz"
## [31] "GSM1501043_A1891-14.CEL.gz" "GSM1501044_A1970-14.CEL.gz"
## [33] "GSM1501045_A1359-08.CEL.gz" "GSM1501046_A1359-09.CEL.gz"
## [35] "GSM1501047_A1359-10.CEL.gz" "GSM1501048_A1970-15.CEL.gz"
## [37] "GSM1501049_A1970-16.CEL.gz" "GSM1501050_A1970-17.CEL.gz"
## [39] "GSM1501051_A1970-18.CEL.gz" "GSM1501052_A1970-19.CEL.gz"
## [41] "GSM1501053_A1891-15.CEL.gz" "GSM1501054_A1359-12.CEL.gz"
## [43] "GSM1501055_A1970-20.CEL.gz" "GSM1501056_A1359-13.CEL.gz"
## [45] "GSM1501057_A1359-14.CEL.gz" "GSM1501058_A1359-15.CEL.gz"
## [47] "GSM1501059_A1359-16.CEL.gz" "GSM1501060_A1359-17.CEL.gz"
## [49] "GSM1501061_A1970-21.CEL.gz" "GSM1501062_A1359-18.CEL.gz"
## [51] "GSM1501063_A1891-17.CEL.gz" "GSM1501064_A1970-23.CEL.gz"
## [53] "GSM1501065_A1970-24.CEL.gz" "GSM1501066_A1970-25.CEL.gz"
## [55] "GSM1501067_A1891-18.CEL.gz" "GSM1501068_A1359-20.CEL.gz"
## [57] "GSM1501069_A1359-21.CEL.gz" "GSM1501070_A1970-26.CEL.gz"
## [59] "GSM1501071_A1359-22.CEL.gz" "GSM1501072_A1359-23.CEL.gz"
## [61] "GSM1501073_A1359-24.CEL.gz" "GSM1501074_A1649-01.CEL.gz"
## [63] "GSM1501075_A1359-25.CEL.gz" "GSM1501076_A1359-26.CEL.gz"
## [65] "GSM1501077_A1891-19.CEL.gz" "GSM1501078_A1891-20.CEL.gz"
## [67] "GSM1501079_A1970-28.CEL.gz" "GSM1501080_A1359-27.CEL.gz"
## [69] "GSM1501081_A1359-29.CEL.gz" "GSM1501082_A1359-30.CEL.gz"
## [71] "GSM1501083_A1891-21.CEL.gz" "GSM1501084_A1359-32.CEL.gz"
## [73] "GSM1501085_A1359-33.CEL.gz" "GSM1501086_A1359-34.CEL.gz"
## [75] "GSM1501087_A1359-35.CEL.gz" "GSM1501088_A1359-36.CEL.gz"
## [77] "GSM1501089_A1359-37.CEL.gz" "GSM1501090_A1359-38.CEL.gz"
## [79] "GSM1501091_A1359-39.CEL.gz" "GSM1501092_A1359-40.CEL.gz"
## [81] "GSM1501093_A1891-22.CEL.gz" "GSM1501094_A1359-41.CEL.gz"
## [83] "GSM1501095_A1359-44.CEL.gz" "GSM1501096_A1359-45.CEL.gz"
## [85] "GSM1501097_A1359-46.CEL.gz" "GSM1501098_A1359-49.CEL.gz"
## [87] "GSM1501099_A1970-29.CEL.gz" "GSM1501100_A1970-30.CEL.gz"
## [89] "GSM1501101_A1649-05.CEL.gz" "GSM1501102_A1970-31.CEL.gz"
## [91] "GSM1501103_A1359-54.CEL.gz" "GSM1501104_A1649-08.CEL.gz"
## [93] "GSM1501105_A1359-55.CEL.gz" "GSM1501106_A1649-06.CEL.gz"
## [95] "GSM1501107_A1891-24.CEL.gz" "GSM1501108_A1970-32.CEL.gz"
## [97] "GSM1501109_A1934-01.CEL.gz" "GSM1501110_A1359-57.CEL.gz"
## [99] "GSM1501111_A1359-59.CEL.gz" "GSM1501112_A1934-02.CEL.gz"
## [101] "GSM1501113_A1970-33.CEL.gz" "GSM1501114_A1359-61.CEL.gz"
## [103] "GSM1501115_A1359-62.CEL.gz" "GSM1501116_A1359-63.CEL.gz"
## [105] "GSM1501117_A1359-64.CEL.gz" "GSM1501118_A1970-34.CEL.gz"
## [107] "GSM1501119_A1934-04.CEL.gz" "GSM1501120_A1934-05.CEL.gz"
## [109] "GSM1501121_A1934-06.CEL.gz" "GSM1501122_A1934-07.CEL.gz"
## [111] "GSM1501123_A1934-08.CEL.gz" "GSM1501124_A1934-09.CEL.gz"
## [113] "GSM1501125_A1970-35.CEL.gz" "GSM1501126_A1934-10.CEL.gz"
## [115] "GSM1501127_A1934-11.CEL.gz" "GSM1501128_A1934-12.CEL.gz"
## [117] "GSM1501129_A1934-13.CEL.gz" "GSM1501130_A1934-14.CEL.gz"
## [119] "GSM1501131_A1934-15.CEL.gz" "GSM1501132_A1934-16.CEL.gz"
## [121] "GSM1501133_A1970-36.CEL.gz" "GSM1501134_A1970-37.CEL.gz"
## [123] "GSM1501135_A1970-38.CEL.gz" "GSM1501136_A1970-39.CEL.gz"
## [125] "GSM1501137_A1970-40.CEL.gz" "GSM1501138_A1970-41.CEL.gz"
## [127] "GSM1501139_A1970-42.CEL.gz" "GSM1501140_A1970-43.CEL.gz"
## [129] "GSM1501141_A1970-44.CEL.gz" "GSM1501142_A1970-45.CEL.gz"
## [131] "GSM1501143_A1970-46.CEL.gz" "GSM1501144_A1970-47.CEL.gz"
## [133] "GSM1501145_A1970-48.CEL.gz" "GSM1501146_A1970-49.CEL.gz"
affyRaw <- read.celfiles( celFiles )
## Reading in : GSM1501013_A1970-01.CEL.gz
## Reading in : GSM1501014_A1891-01.CEL.gz
## Reading in : GSM1501015_A1891-03.CEL.gz
## Reading in : GSM1501016_A1359-01.CEL.gz
## Reading in : GSM1501017_A1891-04.CEL.gz
## Reading in : GSM1501018_A1970-02.CEL.gz
## Reading in : GSM1501019_A1891-05.CEL.gz
## Reading in : GSM1501020_A1359-02.CEL.gz
## Reading in : GSM1501021_A1891-06.CEL.gz
## Reading in : GSM1501022_A1891-07.CEL.gz
## Reading in : GSM1501023_A1891-08.CEL.gz
## Reading in : GSM1501024_A1359-03.CEL.gz
## Reading in : GSM1501025_A1970-04.CEL.gz
## Reading in : GSM1501026_A1970-05.CEL.gz
## Reading in : GSM1501027_A1970-06.CEL.gz
## Reading in : GSM1501028_A1359-04.CEL.gz
## Reading in : GSM1501029_A1891-09.CEL.gz
## Reading in : GSM1501030_A1970-07.CEL.gz
## Reading in : GSM1501031_A1891-10.CEL.gz
## Reading in : GSM1501032_A1359-05.CEL.gz
## Reading in : GSM1501033_A1970-08.CEL.gz
## Reading in : GSM1501034_A1891-11.CEL.gz
## Reading in : GSM1501035_A1970-09.CEL.gz
## Reading in : GSM1501036_A1970-10.CEL.gz
## Reading in : GSM1501037_A1970-11.CEL.gz
## Reading in : GSM1501038_A1891-12.CEL.gz
## Reading in : GSM1501039_A1970-12.CEL.gz
## Reading in : GSM1501040_A1891-13.CEL.gz
## Reading in : GSM1501041_A1970-13.CEL.gz
## Reading in : GSM1501042_A1359-07.CEL.gz
## Reading in : GSM1501043_A1891-14.CEL.gz
## Reading in : GSM1501044_A1970-14.CEL.gz
## Reading in : GSM1501045_A1359-08.CEL.gz
## Reading in : GSM1501046_A1359-09.CEL.gz
## Reading in : GSM1501047_A1359-10.CEL.gz
## Reading in : GSM1501048_A1970-15.CEL.gz
## Reading in : GSM1501049_A1970-16.CEL.gz
## Reading in : GSM1501050_A1970-17.CEL.gz
## Reading in : GSM1501051_A1970-18.CEL.gz
## Reading in : GSM1501052_A1970-19.CEL.gz
## Reading in : GSM1501053_A1891-15.CEL.gz
## Reading in : GSM1501054_A1359-12.CEL.gz
## Reading in : GSM1501055_A1970-20.CEL.gz
## Reading in : GSM1501056_A1359-13.CEL.gz
## Reading in : GSM1501057_A1359-14.CEL.gz
## Reading in : GSM1501058_A1359-15.CEL.gz
## Reading in : GSM1501059_A1359-16.CEL.gz
## Reading in : GSM1501060_A1359-17.CEL.gz
## Reading in : GSM1501061_A1970-21.CEL.gz
## Reading in : GSM1501062_A1359-18.CEL.gz
## Reading in : GSM1501063_A1891-17.CEL.gz
## Reading in : GSM1501064_A1970-23.CEL.gz
## Reading in : GSM1501065_A1970-24.CEL.gz
## Reading in : GSM1501066_A1970-25.CEL.gz
## Reading in : GSM1501067_A1891-18.CEL.gz
## Reading in : GSM1501068_A1359-20.CEL.gz
## Reading in : GSM1501069_A1359-21.CEL.gz
## Reading in : GSM1501070_A1970-26.CEL.gz
## Reading in : GSM1501071_A1359-22.CEL.gz
## Reading in : GSM1501072_A1359-23.CEL.gz
## Reading in : GSM1501073_A1359-24.CEL.gz
## Reading in : GSM1501074_A1649-01.CEL.gz
## Reading in : GSM1501075_A1359-25.CEL.gz
## Reading in : GSM1501076_A1359-26.CEL.gz
## Reading in : GSM1501077_A1891-19.CEL.gz
## Reading in : GSM1501078_A1891-20.CEL.gz
## Reading in : GSM1501079_A1970-28.CEL.gz
## Reading in : GSM1501080_A1359-27.CEL.gz
## Reading in : GSM1501081_A1359-29.CEL.gz
## Reading in : GSM1501082_A1359-30.CEL.gz
## Reading in : GSM1501083_A1891-21.CEL.gz
## Reading in : GSM1501084_A1359-32.CEL.gz
## Reading in : GSM1501085_A1359-33.CEL.gz
## Reading in : GSM1501086_A1359-34.CEL.gz
## Reading in : GSM1501087_A1359-35.CEL.gz
## Reading in : GSM1501088_A1359-36.CEL.gz
## Reading in : GSM1501089_A1359-37.CEL.gz
## Reading in : GSM1501090_A1359-38.CEL.gz
## Reading in : GSM1501091_A1359-39.CEL.gz
## Reading in : GSM1501092_A1359-40.CEL.gz
## Reading in : GSM1501093_A1891-22.CEL.gz
## Reading in : GSM1501094_A1359-41.CEL.gz
## Reading in : GSM1501095_A1359-44.CEL.gz
## Reading in : GSM1501096_A1359-45.CEL.gz
## Reading in : GSM1501097_A1359-46.CEL.gz
## Reading in : GSM1501098_A1359-49.CEL.gz
## Reading in : GSM1501099_A1970-29.CEL.gz
## Reading in : GSM1501100_A1970-30.CEL.gz
## Reading in : GSM1501101_A1649-05.CEL.gz
## Reading in : GSM1501102_A1970-31.CEL.gz
## Reading in : GSM1501103_A1359-54.CEL.gz
## Reading in : GSM1501104_A1649-08.CEL.gz
## Reading in : GSM1501105_A1359-55.CEL.gz
## Reading in : GSM1501106_A1649-06.CEL.gz
## Reading in : GSM1501107_A1891-24.CEL.gz
## Reading in : GSM1501108_A1970-32.CEL.gz
## Reading in : GSM1501109_A1934-01.CEL.gz
## Reading in : GSM1501110_A1359-57.CEL.gz
## Reading in : GSM1501111_A1359-59.CEL.gz
## Reading in : GSM1501112_A1934-02.CEL.gz
## Reading in : GSM1501113_A1970-33.CEL.gz
## Reading in : GSM1501114_A1359-61.CEL.gz
## Reading in : GSM1501115_A1359-62.CEL.gz
## Reading in : GSM1501116_A1359-63.CEL.gz
## Reading in : GSM1501117_A1359-64.CEL.gz
## Reading in : GSM1501118_A1970-34.CEL.gz
## Reading in : GSM1501119_A1934-04.CEL.gz
## Reading in : GSM1501120_A1934-05.CEL.gz
## Reading in : GSM1501121_A1934-06.CEL.gz
## Reading in : GSM1501122_A1934-07.CEL.gz
## Reading in : GSM1501123_A1934-08.CEL.gz
## Reading in : GSM1501124_A1934-09.CEL.gz
## Reading in : GSM1501125_A1970-35.CEL.gz
## Reading in : GSM1501126_A1934-10.CEL.gz
## Reading in : GSM1501127_A1934-11.CEL.gz
## Reading in : GSM1501128_A1934-12.CEL.gz
## Reading in : GSM1501129_A1934-13.CEL.gz
## Reading in : GSM1501130_A1934-14.CEL.gz
## Reading in : GSM1501131_A1934-15.CEL.gz
## Reading in : GSM1501132_A1934-16.CEL.gz
## Reading in : GSM1501133_A1970-36.CEL.gz
## Reading in : GSM1501134_A1970-37.CEL.gz
## Reading in : GSM1501135_A1970-38.CEL.gz
## Reading in : GSM1501136_A1970-39.CEL.gz
## Reading in : GSM1501137_A1970-40.CEL.gz
## Reading in : GSM1501138_A1970-41.CEL.gz
## Reading in : GSM1501139_A1970-42.CEL.gz
## Reading in : GSM1501140_A1970-43.CEL.gz
## Reading in : GSM1501141_A1970-44.CEL.gz
## Reading in : GSM1501142_A1970-45.CEL.gz
## Reading in : GSM1501143_A1970-46.CEL.gz
## Reading in : GSM1501144_A1970-47.CEL.gz
## Reading in : GSM1501145_A1970-48.CEL.gz
## Reading in : GSM1501146_A1970-49.CEL.gz
setwd(od)
eset <- rma(affyRaw)
## Background correcting
## Normalizing
## Calculating Expression
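The three messages printed by `rma()` correspond to its three steps: background correction, quantile normalization, and summarizing each probe set's probes into one value per array by median polish. The toy sketch below shows the last two steps in base R. It is only an illustration on simulated data and not oligo's actual implementation: background correction is skipped, ties are handled more crudely than in dedicated code, and "the first 4 probes form a probe set" is an assumption made for the example.

```r
# Toy illustration of two RMA ingredients in base R (not oligo's own code).

# 1) Quantile normalization: give every array (column) the same intensity distribution.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its array
  ref   <- rowMeans(apply(x, 2, sort))               # mean of the k-th smallest value across arrays
  out   <- apply(ranks, 2, function(r) ref[r])       # put reference values back in rank order
  dimnames(out) <- dimnames(x)
  out
}

set.seed(42)
n_probe <- 100
n_array <- 3
raw <- matrix(2^rnorm(n_probe * n_array, mean = 8), nrow = n_probe,
              dimnames = list(paste0("p", seq_len(n_probe)), paste0("array", seq_len(n_array))))
raw[, 3] <- raw[, 3] * 2   # simulate array 3 being systematically brighter

qn <- quantile_normalize(raw)

# 2) Median polish on log2 values of one probe set (rows = probes, columns = arrays):
#    fits overall + probe effect + array effect; per-array expression = overall + array effect.
ps   <- log2(qn[1:4, ])
fit  <- medpolish(ps, maxiter = 50L, trace.iter = FALSE)
expr <- fit$overall + fit$col
expr
```

After step 1 every column holds exactly the same set of values, which is why the brightness shift of array 3 no longer shows up in `expr`.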
eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 33297 features, 134 samples
## element names: exprs
## protocolData
## rowNames: GSM1501013_A1970-01.CEL.gz GSM1501014_A1891-01.CEL.gz ...
## GSM1501146_A1970-49.CEL.gz (134 total)
## varLabels: exprs dates
## varMetadata: labelDescription channel
## phenoData
## rowNames: GSM1501013_A1970-01.CEL.gz GSM1501014_A1891-01.CEL.gz ...
## GSM1501146_A1970-49.CEL.gz (134 total)
## varLabels: index
## varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.hugene.1.1.st.v1
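Note that the sample names in this printout are the CEL file names, while `pd` from GEO uses GSM accessions as row names. Each file name starts with its accession, so a small string operation lines the two up. The sketch below runs on two names copied from the listing above. `cel_to_gsm()` is a hypothetical helper, and the commented lines assume `rownames(pd)` are GSM IDs, so check that on your own object first.

```r
# Hypothetical helper: "GSM1501013_A1970-01.CEL.gz" -> "GSM1501013"
cel_to_gsm <- function(x) sub("_.*$", "", basename(x))

files <- c("GSM1501013_A1970-01.CEL.gz", "GSM1501014_A1891-01.CEL.gz")
cel_to_gsm(files)
# [1] "GSM1501013" "GSM1501014"

# On the real data (not run here), after exp = exprs(eset):
# colnames(exp) <- cel_to_gsm(colnames(exp))
# stopifnot(all(rownames(pd) %in% colnames(exp)))
# exp <- exp[, rownames(pd)]   # reorder columns to follow pd
```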
save(eset,celFiles,file = "eset.Rdata")
library(GEOquery)
exp = exprs(eset) # only the expression matrix is usable; eset's phenoData/featureData are just placeholders
save(gse_number,exp,pd,gpl_number,file = "step1output.Rdata")
OK! With the expression matrix, the clinical information and the GPL accession in hand, the downstream workflow can get going.