An Introduction to the Literature on Transcriptome Tools

Note: the following content is reposted from 360图书馆 (360 Library).
I. Alignment Tools
(Kim et al., 2015) HISAT: a fast spliced aligner with low memory requirements. Nature methods.

STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays. It has the potential to accurately align the long (several kilobases) reads that are emerging from third-generation sequencing technologies.

(Dobin et al., 2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics.

A self-training algorithm for splice junction detection using RNA-seq.

(Li et al., 2013) TrueSight: a new algorithm for splice junction detection using RNA-seq. Nucleic acids research.

A toolkit for processing next-generation sequencing data. Its programs are also implemented in the Bioconductor R package Rsubread.

(Liao et al., 2013) The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic acids research.

(Rogers et al., 2012) SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data. Genome biology.

(Philippe et al., 2013) CRAC: an integrated approach to the analysis of RNA-seq reads. Genome biology.

A fast splice junction mapper for RNA-Seq reads. TopHat aligns RNA-Seq reads to mammalian-sized genomes using the high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

(Kim et al., 2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome biology.

(Chu et al., 2015) SpliceJumper: a classification-based approach for calling splicing junctions from RNA-seq data. BMC bioinformatics.

(Srivastava et al., 2016) RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics.

A framework for genome-based transcript reconstruction and quantification. CIDANE is engineered not only to assemble RNA-seq reads ab initio, but also to make use of the growing annotation of known splice sites, transcription start and end sites, or even full-length transcripts, available for most model organisms. To some extent, CIDANE is able to recover splice junctions that are invisible to existing bioinformatics tools.

(Canzar et al., 2016) CIDANE: comprehensive isoform discovery and abundance estimation. Genome biology.

An open-source tool for accurate genome-guided transcriptome assembly from RNA-seq reads, based on the splice graph model. An extension of the program CLASS, CLASS2 jointly optimizes read patterns and the number of supporting reads to score and prioritize transcripts, using a novel, scalable and efficient dynamic programming algorithm.

(Song et al., 2016) CLASS2: accurate and efficient splice variant annotation from RNA-seq reads. Nucleic acids research.

II. Read Counting
An RNA-seq read counting tool that builds upon the speed of featureCounts and implements the counting modes of HTSeq. VERSE is more than 30x faster than HTSeq when computing the same gene counts. VERSE also supports a hierarchical assignment scheme, which allows reads to be assigned uniquely and sequentially to different types of features according to user-defined priorities.

(Zhu et al., 2016) VERSE: a versatile and efficient RNA-Seq read counting tool. bioRxiv.
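The hierarchical assignment idea can be illustrated with a small toy example. The sketch below is only an illustration of assigning each read to the first feature type, in a user-defined priority order, that it overlaps. It is not VERSE's code or interface; the function names, the half-open interval convention, and the single-chromosome setup are all assumptions made for this example.

```python
# Illustrative sketch only; this is not VERSE's actual code or interface.
# Intervals are half-open (start, end) pairs on a single chromosome.

def overlaps(a, b):
    """Return True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def assign_read(read, features_by_type, priority):
    """Assign a read to the first feature type, in priority order, that it overlaps."""
    for feature_type in priority:
        for interval in features_by_type.get(feature_type, []):
            if overlaps(read, interval):
                return feature_type
    return "unassigned"


# Hypothetical annotation and priority order (exons are tried before introns).
features = {
    "exon": [(100, 200), (300, 400)],
    "intron": [(200, 300)],
}
priority = ["exon", "intron"]

reads = [(150, 180), (190, 240), (220, 260), (500, 550)]
labels = [assign_read(r, features, priority) for r in reads]
print(labels)
```

The second read spans an exon-intron boundary; because exons come first in the priority list, it is counted once, as exonic, rather than being discarded or counted twice.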

A tool for RNA-Seq data analysis that counts for each gene how many aligned reads overlap its exons.

(Anders et al., 2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature protocols.
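As a rough illustration of counting, for each gene, how many aligned reads overlap its exons, the sketch below counts a read for a gene only when it overlaps the exons of exactly one gene. This is a simplified toy version, not the implementation of any tool listed here. The labels `no_feature` and `ambiguous`, the half-open intervals, and the single-chromosome, unstranded setup are assumptions for the example.

```python
from collections import Counter

# Simplified toy version of exon-overlap read counting (single chromosome,
# unstranded, half-open intervals). Not the code of any published tool.

def overlaps(a, b):
    """Return True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_reads(reads, gene_exons):
    """Count reads per gene; reads hitting zero or several genes are set aside."""
    counts = Counter({gene: 0 for gene in gene_exons})
    for read in reads:
        hits = {
            gene
            for gene, exons in gene_exons.items()
            if any(overlaps(read, exon) for exon in exons)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["no_feature"] += 1   # hypothetical label for unmatched reads
        else:
            counts["ambiguous"] += 1    # hypothetical label for multi-gene reads
    return counts


gene_exons = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(380, 500)],
}
reads = [(120, 170), (350, 390), (450, 480), (600, 650)]
counts = count_reads(reads, gene_exons)
print(dict(counts))
```

Setting ambiguous reads aside rather than splitting them is one possible design choice; it keeps per-gene counts as integers, which count-based differential expression methods expect.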

A package that provides efficient low-level and highly reusable S4 classes for storing ranges of integers, RLE vectors (Run-Length Encoding) and, more generally, data that can be organized sequentially (formally defined as Vector objects), as well as views on these Vector objects. IRanges also provides efficient list-like classes for storing big collections of instances of the basic classes. All classes in the package use consistent naming and share the same rich and consistent Vector API as much as possible.

(Lawrence et al., 2013) Software for computing and annotating genomic ranges. PLoS computational biology.
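Run-length encoding, mentioned above, stores a vector as (value, run length) pairs, which is compact for data such as per-base coverage where long stretches share one value. The following is a minimal Python sketch of the idea, not the R/S4 implementation in IRanges; the function names and the example coverage vector are made up for illustration.

```python
from itertools import groupby

# Minimal run-length encoding sketch; not the IRanges (R) implementation.

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full list."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical per-base coverage along a short region.
coverage = [0, 0, 0, 2, 2, 5, 5, 5, 5, 0]
runs = rle_encode(coverage)
print(runs)
```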

A read summarization program, which counts mapped reads for the genomic features such as genes and exons.

(Liao et al., 2013) featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics.

III. Quantification
A fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly. Its input can include not only the spliced read alignments used by reference-based assemblers, but also longer contigs that were assembled de novo from unambiguous, non-branching parts of a transcript.

(Pertea et al., 2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology.

A computational approach that measures changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. EISA reveals both transcriptional and post-transcriptional contributions to expression changes, increasing the amount of information that can be gained from RNA-seq data sets.

(Gaidatzis et al., 2015) Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nature biotechnology.
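One way to picture the exon/intron comparison is as a difference of log fold changes: the change in intronic (pre-mRNA) reads is taken as a proxy for the transcriptional change, and whatever exonic change remains after subtracting it is attributed to post-transcriptional regulation. The sketch below is a toy calculation under that assumption, not the authors' pipeline. The function names and pseudocount are illustrative, and the counts are assumed to be already normalized for library size.

```python
import math

# Toy exon/intron split for one gene. Counts are assumed to be library-size
# normalized already; the pseudocount and function names are illustrative.

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control counts, stabilized by a pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def exon_intron_split(exon_ctrl, exon_trt, intron_ctrl, intron_trt):
    """Split an expression change into transcriptional and post-transcriptional parts."""
    delta_exon = log2_fold_change(exon_trt, exon_ctrl)
    delta_intron = log2_fold_change(intron_trt, intron_ctrl)
    return {
        "transcriptional": delta_intron,
        "post_transcriptional": delta_exon - delta_intron,
    }


# Exonic reads rise fourfold while intronic reads stay flat.
result = exon_intron_split(exon_ctrl=99, exon_trt=399, intron_ctrl=49, intron_trt=49)
print(result)
```

In this made-up case the whole two-unit log2 change lands in the post-transcriptional term, since the intronic signal did not move.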

Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

(Trapnell et al., 2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology.

A method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-Seq peak calling.

(Guttman et al., 2010) Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature biotechnology.

Accurate quantification of transcriptome from RNA-Seq data by effective length normalization.

(Lee et al., 2011) Accurate quantification of transcriptome from RNA-Seq data by effective length normalization. Nucleic acids research.
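To show what normalizing by effective length can look like in practice, here is a small sketch that uses a commonly used definition, effective length = transcript length - mean fragment length + 1, and converts counts to TPM. This definition is an assumption for illustration and is not necessarily the one used by Lee et al.; the function names and numbers are invented.

```python
# Sketch of effective-length normalization using a common textbook definition.
# This is not necessarily the normalization defined in Lee et al. (2011).

def effective_length(length, mean_fragment_length):
    """Number of positions a fragment of mean length can start from (at least 1)."""
    return max(length - mean_fragment_length + 1, 1)


def tpm(counts, lengths, mean_fragment_length):
    """Convert read counts to transcripts per million using effective lengths."""
    rates = {
        t: counts[t] / effective_length(lengths[t], mean_fragment_length)
        for t in counts
    }
    total = sum(rates.values())
    return {t: 1e6 * rate / total for t, rate in rates.items()}


# Two hypothetical transcripts with equal counts but different lengths.
counts = {"tx1": 100, "tx2": 100}
lengths = {"tx1": 1199, "tx2": 2199}
values = tpm(counts, lengths, mean_fragment_length=200)
print(values)
```

With equal read counts, the shorter transcript receives twice the abundance of the longer one, because the same number of reads came from half as many possible fragment start positions.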

An integrated alignment workflow and a simple counting-based approach to derive estimates for gene, exon and exon-exon junction expression. In contrast to previous counting-based approaches, EQP takes into account only reads whose alignment pattern agrees with the splicing pattern of the features of interest. This leads to improved gene expression estimates as well as to the generation of exon counts that allow disambiguating reads between overlapping exons.

(Schuierer and Roma, 2016) The exon quantification pipeline (EQP): a comprehensive approach to the quantification of gene, exon and junction expression from RNA-seq data. Nucleic acids research.

RNA-eXpress was designed as a user-friendly solution to extract and annotate biologically important transcripts from next-generation RNA sequencing data.

(Forster et al., 2013) RNA-eXpress annotates novel transcript features in RNA-seq data. Bioinformatics.

A versatile model to account for sequence-specific bias that commonly occurs at the ends of fragments. Isolator analyzes RNA-Seq experiments using a simple Bayesian hierarchical model. Combined with aggressive bias correction, it produces estimates that are simultaneously accurate and show high agreement between samples. Isolator is uniquely able to compute posterior probabilities corresponding to arbitrarily complex questions, within the confines of the model.

(Jones et al., 2016) Isolator: accurate and stable analysis of isoform-level expression in RNA-Seq experiments. bioRxiv.

IV. Normalization and Differential Expression
A method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

(Love et al., 2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology.

A software package designed to facilitate flexible differential expression analysis of RNA-Seq data. Ballgown can also be used to visualize the transcript assembly on a gene-by-gene basis, extract abundance estimates for exons, introns, transcripts or genes, and perform linear model–based differential expression analyses.

(Frazee et al., 2015) Ballgown bridges the gap between transcriptome assembly and expression analysis. Nature biotechnology.

A package to dampen the effect of outliers on count-based differential expression analyses. edgeR uses empirical Bayes estimation and exact tests based on the negative binomial distribution and is useful for differential signal analysis with other types of genome-scale count data. It requires a delicate tradeoff to maintain high power while at the same time achieving a decent resistance to the presence of outliers. In particular, it is difficult to know exactly what an outlier is and where the line should be drawn to identify it as such.

(Zhou et al., 2014) Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic acids research.
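The negative binomial distribution mentioned above is used for count data because its variance can exceed the mean. Under the usual mean-dispersion parameterization, the variance is mean + dispersion × mean², and it reduces to the Poisson variance when the dispersion is zero. The sketch below only evaluates that relationship; it is not edgeR code, and the function name and example values are illustrative.

```python
# Mean-variance relationship of the negative binomial in its mean-dispersion
# parameterization. Illustrative only; this is not edgeR code.

def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


mean_count = 100.0
poisson_like = nb_variance(mean_count, dispersion=0.0)   # no overdispersion
overdispersed = nb_variance(mean_count, dispersion=0.1)  # hypothetical dispersion
print(poisson_like, overdispersed)
```

Because the extra term grows with the square of the mean, a single unusually large count in a highly expressed gene can inflate the apparent dispersion, which is one reason outlier handling matters for these methods.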

A differential transcript expression (DTE) analysis algorithm. SDEAP estimates the number of conditions directly from the input samples using a Dirichlet mixture model and discovers alternative splicing events using a new graph modular decomposition algorithm. By taking advantage of these technical improvements, SDEAP was able to outperform the other DTE analysis methods in extensive experiments on simulated data and real data with qPCR validation. The predictions of SDEAP also allow users to classify samples of cancer subtypes and cell-cycle phases more accurately.

(Yang and Jiang, 2016) SDEAP: a splice graph based differential transcript expression analysis tool for population data. Bioinformatics.

Enables rapid interpretation of complex gene expression studies as well as other high-throughput genomics assays. variancePartition is a statistical and visualization framework, used to prioritize drivers of variation based on a genome-wide summary, and identify genes that deviate from the genome-wide trend. This tool quantifies variation in each expression trait attributable to differences in disease status, sex, cell or tissue type, ancestry, genetic background, experimental stimulus, or technical variables.

(Hoffman and Schadt, 2016) variancePartition: interpreting drivers of variation in complex gene expression studies. BMC bioinformatics.

A realistic framework to assess the impact of the key components of the statistical framework for differential analyses of RNA-seq data. This tool is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models.

(Rigaill et al., 2016) Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis. Briefings in Bioinformatics.

Detects differentially expressed (DE) genes in RNA-seq data with a high level of heterogeneity, such as cancer RNA-seq data. ELTSeq is an empirical likelihood ratio test (ELT) with a mean-variance relationship constraint for the differential expression analysis of RNA sequencing (RNA-seq) data. As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about the read-count distribution. It also incorporates a constraint for read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It is intended to advance transcriptomics studies of cancers and other complex diseases.

(Xu and Chen, 2016) An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq. Briefings in Bioinformatics.

A package for detecting differentially expressed (DE) genes in time course RNA-Seq data. The negative binomial mixed-effect model (NBMM) method is applied to gene expression data on a gene-by-gene basis. A parallel computing option is implemented in the timeSeq package to speed up the computation. The authors report that the approach outperforms other currently available methods on both synthetic and real data.

(Sun et al., 2016) Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model. BMC Bioinformatics.

A method for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of degrees of freedom is reduced to one through the first-order decomposition of the interaction, leading to a dramatic improvement in power when testing for DE genes across more than two conditions.

(Kang et al., 2016) multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions. BMC bioinformatics.

(Jia et al., 2015) MetaDiff: differential isoform expression analysis using random-effects meta-regression. BMC bioinformatics.

Provides a data-driven solution to test the assumptions of global normalization methods. Group level information about each sample (such as tumor/normal status) must be provided because the test assesses if there are global differences in the distributions between the user-defined groups.

(Hicks and Irizarry, 2015) quantro: a data-driven approach to guide the choice of an appropriate normalization method. Genome biology.

A Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data.

(Gu et al., 2014) BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data. BMC bioinformatics.

An algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries.

(Trapnell et al., 2013) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature biotechnology.

(Li et al., 2012) Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics.

A package to identify differentially expressed genes or isoforms in RNA-seq data from different samples. DEGseq also encourages users to export gene expression values in a table format that can be directly processed by edgeR (Robinson, 2009), an R package implementing a method based on the negative binomial distribution to model overdispersion relative to Poisson for digital gene expression data with few replicates (Robinson and Smyth, 2007).

(Wang et al., 2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics.

V. Gene Fusion

An enhanced version of TopHat with the ability to align reads across fusion points, which result from the breakage and rejoining of two different chromosomes or from rearrangements within a chromosome.

(Kim and Salzberg, 2011) TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome biology.

A Python package to annotate and visualize gene fusions. For a given gene fusion, AGFusion predicts the cDNA, CDS, and protein sequences resulting from the fusion of all combinations of transcripts and saves them to FASTA files. AGFusion can also plot the protein domain architecture of the fusion transcripts.

(Murphy and Elemento, 2016) AGFusion: annotate and visualize gene fusions. bioRxiv.

A toolkit for fusion gene and chimeric transcript detection from RNA-seq data. InFusion is a computational method for the discovery of chimeric transcripts from RNA-seq data, capable of detecting alternatively spliced chimeric transcripts and fusion genes involving non-coding regions. InFusion can detect fusions that involve intergenic regions, and it analyzes and filters putative fusion events based on coverage depth, genomic context and strand specificity.

(Okonechnikov et al., 2016) InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data. PLoS One.

VI. Alternative Splicing
(Reuter et al., 2016) PreTIS: A Tool to Predict Non-canonical 5’ UTR Translational Initiation Sites in Human and Mouse. PLoS computational biology.

(Afsari et al., 2016) Splice Expression Variation Analysis (SEVA) for Differential Gene Isoform Usage in Cancer. bioRxiv.

The DEXSeq method is implemented as an open-source Bioconductor package, which facilitates data visualization and exploration. It can detect with high sensitivity genes, and in many cases exons, that are subject to differential exon usage.

(Anders et al., 2012) Detecting differential usage of exons from RNA-seq data. Genome research.

(Liu et al., 2012) Detection, annotation and visualization of alternative splicing from RNA-Seq data with SplicingViewer. Genomics.

(Ryan et al., 2012) SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts. Bioinformatics.

Alternative Splicing transcriptional landscape visualization tool.

(Foissac and Sammeth, 2007) ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic acids research.

VII. Allele-Specific Expression
(Deonovic et al., 2016) IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Research.

(Soderlund et al., 2014) Allele workbench: transcriptome pipeline and interactive graphics for allele-specific expression. PLoS One.

(Romanel et al., 2015) ASEQ: fast allele-specific studies from next-generation sequencing data. BMC medical genomics.

(Nariai et al., 2015) A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes. BMC genomics.
