Fusion detection with FACTERA

FACTERA: https://factera.stanford.edu/download.php

FACTERA (Fusion And Chromosomal Translocation Enumeration and Recovery Algorithm) is a tool for detection of genomic fusions in paired-end targeted (or genome-wide) sequencing data.

Command

perl factera.pl [options] tumor.bam exons.bed hg19.2bit [optional: targets.bed]
The main program is a Perl script, so you can modify parts of it yourself to make it detect more fusions.

Input

tumor.bam should consist of paired-end reads aligned by a mapping algorithm capable of soft-clipping, such as BWA. The BAM file does not need to be realigned or deduped, but should be position-sorted and have a corresponding index file (bam.bai created using SAMtools index) in the same directory in order to estimate the total sequencing depth in the neighborhood of each detected fusion.
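The index requirement can be checked before a run. The following is a minimal Python sketch, not part of FACTERA: the helper name `find_bam_index` is hypothetical, and accepting `tumor.bai` in addition to the `tumor.bam.bai` name written by SAMtools index is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the index file sitting next to bam_path, or None if absent."""
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # tumor.bam.bai (samtools index)
        bam.with_suffix(".bai"),           # tumor.bai (assumed alternative name)
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If the function returns None, running `samtools index tumor.bam` on the position-sorted BAM creates the index in the same directory.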

exons.bed contains chromosomal coordinates (such as exon boundaries) in BED format: three coordinate columns (chr start end) followed by a fourth column that contains gene names, exon names, or any arbitrary identifier. The fourth column is used to group corresponding coordinates. This allows the resolution of fusion detection to be restricted to inter-gene or inter-exon fusions, for example. Make sure to use coordinates from the same genome version as the 2bit reference sequence required in the third argument.
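To illustrate what grouping by the fourth column means, here is a small Python sketch. It is not FACTERA's own code; the function name `group_bed_by_name` and the example coordinates are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]


def group_bed_by_name(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Group BED intervals by the identifier in column 4."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Without a name column, every interval forms its own group
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        groups[name].append((chrom, start, end))
    return dict(groups)


# Hypothetical records: two exons of GENE_A and one exon of GENE_B
example = [
    "chr1\t100\t200\tGENE_A",
    "chr1\t300\t400\tGENE_A",
    "chr2\t500\t600\tGENE_B",
]
groups = group_bed_by_name(example)
print(sorted(groups))
```

With gene symbols in column 4, both GENE_A exons fall into one group, which corresponds to inter-gene resolution. Using a distinct name per exon would instead keep each exon separate.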

Users can download an exons.bed file that combines hg19 RefSeq and GENCODE v17 exon coordinates (downloaded from UCSC on 02-23-14) with corresponding HUGO gene symbols in column 4. With this file, FACTERA will identify inter-gene fusions. A separate exons.bed file is provided for identifying inter-exon fusions in hg19.

hg19.2bit is a 2bit-encoded human reference genome, used for fast genome subsequence retrieval. Of note, FACTERA is not restricted to human sequences, and any 2bit reference genome can be used as long as coordinates in exons.bed are consistent. To create a 2bit file for a genome of interest, download the FASTA-to-2bit conversion tool (faToTwoBit) from the appropriate system folder and follow its usage instructions.

targets.bed is optional and allows the user to restrict the FACTERA search to genomic regions of interest, such as those targeted by a sequencing capture library. Format is a standard 3-column BED (chr start end). The use of a targets.bed file can greatly improve running time when only a subset of sequenced regions is known to be relevant for fusion detection.

Output

Each FACTERA run produces 9 main output files, each of which is described below:

parameters.txt = all input files and parameter values.

discordantpair.depth.txt = ranked list of discordant read clusters.

discordantpair.details.txt = discordant read positions.

fusiontargets.bed = bed coordinates of candidate fusions – used to restrict search space for soft-clipped reads.

blastreads.fa = used to build blast database of soft-clipped, improperly paired, and unmapped reads.

blastquery.fa = file used to search individual candidate fusion sequences (query) for hits in blastreads.fa (target database).

fusionseqs.fa = all detected breakpoints with 500bp of additional flanking sequences.

fusions.bed = bed output for detected fusions. Useful for comparing runs or somatic vs germline (column 4 is fusion ID).

fusions.txt = all detected fusion events, including details, described below:

Field Description
Est_Type Estimated structural variant type: TRA = translocation; INV = inversion; DEL = deletion; '-' = not determined
Region1 Name of genomic region closest to breakpoint 1 (e.g., gene 1, exon 1, etc.)
Region2 Name of genomic region closest to breakpoint 2 (e.g., gene 2, exon 2, etc.)
Break1 Chromosomal breakpoint 1
Break2 Chromosomal breakpoint 2
Break_support1 Number of reads supporting breakpoint 1
Break_support2 Number of reads supporting breakpoint 2
Break_Offset Breakpoint adjustment in bases (e.g., owing to microhomology)
Order1 Orientation of read clipping with respect to breakpoint 1: CN, clipped followed by not clipped; NC, vice versa
Order2 Same as Order1, but for breakpoint 2
Break_depth Number of breakpoint-spanning reads
Proper_pair_support Number of properly paired and previously soft-clipped reads that map to fusion
Unmapped_support Number of previously unmapped reads that map to fusion
Improper_pair_support Number of previously discordantly paired reads that map to fusion
Paired_end_depth Total number of paired-end reads that flank breakpoint
Total_depth Mean total depth for regions flanking both breakpoints (+/-500bp by default)
Fusion_seq Estimated fusion sequence (50 bases flanking breakpoint by default)
Non-templated_seq Non-templated (i.e., non-reference) sequence segment (if any) enclosed in brackets
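The fields above can be read programmatically. The Python sketch below assumes that fusions.txt is tab-delimited with a header row whose column names match the field names listed here; that layout is an assumption and should be checked against a real output file. The helper names and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_fusions(handle: TextIO) -> List[Dict[str, str]]:
    """Read fusion records into dictionaries keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_by_break_depth(
    fusions: List[Dict[str, str]], min_depth: int
) -> List[Dict[str, str]]:
    """Keep fusions with at least min_depth breakpoint-spanning reads."""
    return [f for f in fusions if int(f["Break_depth"]) >= min_depth]


# Made-up rows using only a subset of the columns
sample = (
    "Est_Type\tRegion1\tRegion2\tBreak_depth\n"
    "TRA\tGENE_A\tGENE_B\t12\n"
    "DEL\tGENE_C\tGENE_D\t3\n"
)
fusions = load_fusions(io.StringIO(sample))
kept = filter_by_break_depth(fusions, min_depth=5)
print([(f["Region1"], f["Region2"]) for f in kept])
```

For a real run, open the fusions.txt file from the output directory and pass the file handle to `load_fusions` instead of the in-memory sample.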

Requirements

Unix operating system (Linux, Mac OS X, etc.)
Perl 5, with the following external dependency: Statistics::Descriptive.
To install Statistics::Descriptive from CPAN, issue the following command:
sudo cpan Statistics::Descriptive
Other Perl dependencies are included in the Perl 5 Core Modules and should already be installed: IPC::Open3, List::Util, File::Spec, Symbol, Getopt::Std, File::Basename.
twoBitToFa
Find and download the executable from the appropriate system folder, then copy/link/move it to a directory on your PATH (e.g., /usr/bin).
hg19.2bit to run FACTERA on the hg19 human genome.
Note that hg38.2bit is now available. To use another reference genome, make sure that input BED coordinates are consistent (the exons.bed file provided here is currently hg19 only).
blast+
After downloading, find blastn and makeblastdb in ncbi-blast-version/bin and copy/link/move them to a directory on your PATH (e.g., /usr/bin).
SAMtools
After downloading, find samtools and copy/link/move it to a directory on your PATH (e.g., /usr/bin).
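Whether the external executables are reachable on PATH can be verified with a few lines of Python. This is only a convenience sketch: the function name `missing_tools` is hypothetical, and the check does not cover the Perl module Statistics::Descriptive or the 2bit reference file.

```python
import shutil
from typing import Iterable, List

# Executables named in the requirements above
REQUIRED_TOOLS = ("perl", "twoBitToFa", "blastn", "makeblastdb", "samtools")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```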

Options (defaults): Description
-o Output directory (tumor.bam directory).
-r <int> Minimum number of breakpoint-spanning reads needed for output (5).
-m <int> Minimum number of discordant reads needed for a candidate fusion (2).
-x <int> Maximum number of breakpoints to examine for any given pair of genomic regions (5).
-s <int> Minimum number of reads with the same breakpoint (1).
-f <0-1> Minimum fraction of read bases required for alignment to fusion template (0.9).
-S <0-1> Minimum similarity required for alignment of read to fusion template (0.95).
-k <int> k-mer size for fragment comparison (10 bases).
-c <int> Minimum size of soft-clipped region to consider (16 bases).
-b <int> Number of bases flanking breakpoint for fusion template (500).
-p <int> Number of threads for blastn search (4; 10 or more recommended).
-a <int> Number of bases flanking breakpoint to provide in output (50).
-e Disable grouping of input coordinates by column 4 of exons.bed (off).
-v Disable verbose output (off).
-t Disable running time output (off).
-C Disable addition of 'chr' prefix to chromosome names (off).***
-F Force remake of BLAST database for a particular input (off).
*** Required if 'chr' is absent from all input files, including reference.2bit.
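When FACTERA is launched from a pipeline, the argument order shown under Command (options first, then tumor.bam, exons.bed, the 2bit file, and the optional targets.bed) can be assembled in Python. The wrapper below is a sketch; `build_factera_command` and its parameters are hypothetical names, and only flags from the list above should be passed.

```python
from typing import Dict, List, Optional, Union

OptionValue = Union[int, float, str, bool]


def build_factera_command(
    bam: str,
    exons_bed: str,
    two_bit: str,
    targets_bed: Optional[str] = None,
    options: Optional[Dict[str, OptionValue]] = None,
    script: str = "factera.pl",
) -> List[str]:
    """Assemble the argument list for a FACTERA run."""
    cmd = ["perl", script]
    for flag, value in (options or {}).items():
        if value is True:
            cmd.append(f"-{flag}")  # switch without a value, e.g. -e or -C
        elif value is not False and value is not None:
            cmd.extend([f"-{flag}", str(value)])
    cmd.extend([bam, exons_bed, two_bit])
    if targets_bed:
        cmd.append(targets_bed)
    return cmd


cmd = build_factera_command(
    "tumor.bam",
    "exons.bed",
    "hg19.2bit",
    options={"r": 5, "m": 2, "p": 10, "o": "factera_out"},
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. The directory name `factera_out` is a placeholder.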

FAQ

1. Which aligners are supported?

Answer: FACTERA was developed and optimized using targeted sequencing data aligned by bwa aln, and we currently recommend that users employ bwa aln for best performance. While FACTERA can be applied to data mapped by bwa mem, users should be aware of the following considerations when interpreting results. The most notable difference between bwa aln and mem with respect to fusion detection is the use of hard clipping in addition to soft clipping by bwa mem. Absent from bwa aln, hard clipping enables bwa mem to improve the mapping rate by realigning (rather than truncating) sufficiently long read segments in chimeric sequences. In contrast, bwa aln will truncate such reads without realignment (soft-clipping), and FACTERA leverages soft clipped, but not hard clipped, reads for breakpoint detection. Hard clipped reads will be supported in a future release of FACTERA, and we will notify registered users when this version is available.
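The soft-clip versus hard-clip distinction is visible in the CIGAR string of each alignment, where `S` marks soft-clipped bases and `H` marks hard-clipped bases. The Python sketch below shows one way to tell them apart. It is not FACTERA's implementation; the function names are hypothetical, and the threshold of 16 simply mirrors the documented default of the -c option.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_lengths(cigar: str) -> Tuple[List[int], List[int]]:
    """Return (soft_clip_lengths, hard_clip_lengths) for one CIGAR string."""
    soft, hard = [], []
    for length, op in CIGAR_OP.findall(cigar):
        if op == "S":
            soft.append(int(length))
        elif op == "H":
            hard.append(int(length))
    return soft, hard


def has_usable_soft_clip(cigar: str, min_clip: int = 16) -> bool:
    """True if any soft-clipped segment is at least min_clip bases long."""
    soft, _ = clip_lengths(cigar)
    return any(n >= min_clip for n in soft)


for cigar in ("60M40S", "40H60M", "90M10S"):
    print(cigar, has_usable_soft_clip(cigar))
```

In this sketch a read with CIGAR `40H60M` carries no soft-clipped bases at all, which is why hard-clipped alignments contribute nothing to a soft-clip based breakpoint search.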

2. According to the paper, FACTERA has high specificity. Why does FACTERA report some fusions that appear to be false positives?

Answer: False positive calls may arise from mapping artifacts (due to repeat sequences), PCR template switching, and other sequencing errors, and are increasingly difficult to avoid as the sequencing space grows in size and complexity. While we have implemented a variety of post-processing algorithms to reduce the false positive rate compared to previous methods (see the paper under Reference), the elimination of all fusions with repetitive content would risk discarding genuine events. We therefore recommend that users inspect the FACTERA output for possible false positives by using BLAT and the UCSC human genome browser. This is particularly important when using FACTERA to analyze exome or genome-scale datasets. In cases where paired normal datasets are available, we recommend leveraging this information to reduce the false positive rate. Finally, we would welcome suggestions from users on how to best discriminate real fusions in repeat regions from sequencing artifacts. Please send us your feedback/suggestions along with fusion results that you suspect are not real. This will help us to compile a blacklist of poorly behaving genomic regions that might be useful as a post-processing filter.
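One simple way to use a paired normal sample is to run FACTERA on both samples and drop tumor calls that also appear in the normal. The Python sketch below matches calls on the unordered pair of Region1 and Region2 names. This matching rule is an assumption chosen for illustration, not a procedure described by the FACTERA authors; comparing the fusion IDs in column 4 of fusions.bed would be an alternative. The function names and example records are made up.

```python
from typing import Dict, FrozenSet, Iterable, List


def region_pair(fusion: Dict[str, str]) -> FrozenSet[str]:
    """Unordered pair of region names, so A-B and B-A compare equal."""
    return frozenset((fusion["Region1"], fusion["Region2"]))


def subtract_normal(
    tumor: Iterable[Dict[str, str]], normal: Iterable[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Keep tumor fusions whose region pair is absent from the normal calls."""
    seen_in_normal = {region_pair(f) for f in normal}
    return [f for f in tumor if region_pair(f) not in seen_in_normal]


# Hypothetical calls from a tumor sample and its matched normal
tumor_calls = [
    {"Region1": "GENE_A", "Region2": "GENE_B"},
    {"Region1": "GENE_C", "Region2": "GENE_D"},
]
normal_calls = [{"Region1": "GENE_D", "Region2": "GENE_C"}]

somatic = subtract_normal(tumor_calls, normal_calls)
print(somatic)
```

Matching on region names is coarse, since two different breakpoints in the same gene pair are treated as the same event, so calls removed this way may still deserve a manual look.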

Reference

Aaron M. Newman, Scott V. Bratman, Henning Stehr, Luke J. Lee, Chih Long Liu, Maximilian Diehn* and Ash A. Alizadeh* (2014) FACTERA: a practical method for the discovery of genomic rearrangements at breakpoint resolution, Bioinformatics DOI: 10.1093/bioinformatics/btu549.
