ANNOVAR annotation details (Part 1)

Categories in the Func.refGeneWithVer column of ANNOVAR output

Some columns in the ANNOVAR annotation output deserve careful study. The most commonly used values are listed below:

Value | Default precedence | Explanation | Sequence Ontology
exonic | 1 | variant overlaps a coding exon | exon_variant (SO:0001791)
splicing | 1 | variant is within 2 bp of a splicing junction (use -splicing_threshold to change this) | splicing_variant (SO:0001568)
ncRNA | 2 | variant overlaps a transcript without coding annotation in the gene definition (see note (5) below for more explanation) | non_coding_transcript_variant (SO:0001619)
UTR5 | 3 | variant overlaps a 5' untranslated region | 5_prime_UTR_variant (SO:0001623)
UTR3 | 3 | variant overlaps a 3' untranslated region | 3_prime_UTR_variant (SO:0001624)
intronic | 4 | variant overlaps an intron | intron_variant (SO:0001627)
upstream | 5 | variant overlaps the 1-kb region upstream of the transcription start site | upstream_gene_variant (SO:0001631)
downstream | 5 | variant overlaps the 1-kb region downstream of the transcription end site (use -neargene to change this) | downstream_gene_variant (SO:0001632)
intergenic | 6 | variant is in an intergenic region | intergenic_variant (SO:0001628)
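
As a quick way to see which of these values actually occur in a result file, the column can be tallied. The following is a minimal sketch, assuming a tab-delimited table with a header row that contains the Func.refGeneWithVer column; the other column names and the three demo rows are made up for illustration and are not taken from a real run.

```python
import csv
import io
from collections import Counter

def count_func_values(handle, column="Func.refGeneWithVer"):
    """Count how often each value appears in one column of a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)

# Tiny made-up table standing in for a real annotated file (assumed layout).
demo = io.StringIO(
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGeneWithVer\n"
    "1\t100\t100\tA\tG\texonic\n"
    "1\t200\t200\tC\tT\tintronic\n"
    "1\t300\t300\tG\tA\tintronic\n"
)
counts = count_func_values(demo)
print(counts.most_common())  # [('intronic', 2), ('exonic', 1)]
```

For a real file, the same function could be called on an opened file handle instead of the in-memory demo table.
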
The annotation precedence is as follows:

The value in the first column follows this precedence (as of the December 2010 and later versions of ANNOVAR): exonic = splicing > ncRNA > UTR5/UTR3 > intronic > upstream/downstream > intergenic. This precedence decides which function is printed when a variant fits multiple functional categories.
If the users want to have all functional consequences printed out (rather than just the most important one defined by the precedence above), the --separate argument should be used. In this case, several output lines may be present for each variant, representing several possible functional consequences.
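In other words, by default each line of the result file reports one category for one site, chosen by the precedence above. With --separate, several lines of the result file can describe the same site, one line per category.
The precedence order can also be customized with the -precedence argument.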
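
The selection logic can be sketched in a few lines of Python. This is only an illustration of the rule described here, not ANNOVAR's own code: the function name `report`, its `separate` flag, and the way ties are ordered are assumptions made for the example, and the numeric levels are copied from the table above.

```python
# Default precedence levels as listed in the table above (1 = highest).
DEFAULT_PRECEDENCE = {
    "exonic": 1, "splicing": 1,
    "ncRNA": 2,
    "UTR5": 3, "UTR3": 3,
    "intronic": 4,
    "upstream": 5, "downstream": 5,
    "intergenic": 6,
}

def report(categories, separate=False, precedence=DEFAULT_PRECEDENCE):
    """Return the categories to print for one variant.

    separate=False: only the top-ranked level (categories tied at that level stay together).
    separate=True: every category, ordered by precedence.
    """
    listed = list(precedence)  # dict order breaks ties, e.g. UTR5 before UTR3
    ordered = sorted(set(categories), key=lambda c: (precedence[c], listed.index(c)))
    if separate:
        return ordered
    top = precedence[ordered[0]]
    return [c for c in ordered if precedence[c] == top]

print(report({"intronic", "UTR3"}))                 # ['UTR3']
print(report({"intronic", "UTR3"}, separate=True))  # ['UTR3', 'intronic']
print(report({"UTR5", "UTR3", "intronic"}))         # ['UTR5', 'UTR3']

# A user-defined order (hypothetical): rank intronic above the UTRs.
custom = {**DEFAULT_PRECEDENCE, "intronic": 2}
print(report({"intronic", "UTR3"}, precedence=custom))  # ['intronic']
```

Passing a different dictionary plays the role that the -precedence argument plays on the command line; the exact syntax of that argument is not covered here.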

Detailed explanation of each annotation category:

(1) "exonic" here refers only to the coding portion of exons, not to the UTR portion, because two keywords (UTR5, UTR3) are specifically reserved for UTR annotations.
(2) "splicing" in ANNOVAR is defined by default as a variant within 2 bp of an exon/intron boundary; the threshold can be changed with the --splicing_threshold argument. Before Feb 2013, "exonic,splicing" meant a variant inside an exon but close to the exon/intron boundary. This behavior existed for historical reasons: a user had requested that exonic variants near splice sites also be annotated as splicing. According to the ANNOVAR author, it kept drawing complaints from users despite the detailed explanation on the ANNOVAR website, so starting from Feb 2013 "splicing" refers only to the 2 bp in the intron next to an exon. To restore the earlier behavior, add the -exonicsplicing argument.
(3) If a variant is located in both a 5' UTR and a 3' UTR (possibly of two different genes), "UTR5,UTR3" is printed as the output.
(4) The terms "upstream" and "downstream" refer to the region within 1 kb of the transcription start site or the transcription end site, respectively, taking into account the strand of the mRNA; the --neargene argument can be used to adjust this threshold.
(5) Technical Notes: ncRNA above refers to RNA without coding annotation. It does not mean that the RNA will never be translated; it only means that the user-selected gene annotation system could not give a coding sequence annotation. The RNA could still code for protein products and may be annotated as coding in future versions of the gene annotation or in another gene annotation system. For example, BC039000 is regarded as ncRNA by ANNOVAR when using UCSC Known Gene annotation, but as a protein-coding gene when using ENSEMBL annotation. If the goal is to find known (well-annotated) microRNA or other known (well-annotated) non-coding RNA, region-based annotation should be used with the wgRNA track selected (see the region-based annotation instructions in the ANNOVAR documentation).
(6) Technical Notes: if the first codon of a transcript is deleted, it will be reported as wholegene deletion by ANNOVAR because the gene cannot be translated.
(7) If a variant is located in both a downstream and an upstream region (possibly of 2 different genes), "upstream,downstream" is printed as the output.
In the June 2011 version of ANNOVAR, the splicing annotation was improved: if the splice site is in an intron, all isoforms and the corresponding base changes are printed. For example:

splicing SMS(NM_004595:c.447+2T>G) X 21895357 21895357 T G hetero 8 15
splicing DMD(NM_004011:c.48+1A>C) X 31803228 31803228 T G homo 117 30
splicing BAGE(NM_001187:c.14+1A>G),BAGE4(NM_181704:c.14+1A>G),BAGE5(NM_182484:c.14+1A>G) 21 10120594 10120594 T C hetero 66 53


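The gene field of the splicing lines shown above packs gene, transcript and base change into one string, which is easy to take apart. The sketch below assumes exactly the form seen in those three lines, GENE(TRANSCRIPT:CHANGE) entries separated by commas, with one transcript per gene; other layouts (for example several transcripts inside one pair of parentheses) are not handled. The function name is made up for illustration.

```python
import re

# One entry looks like GENE(TRANSCRIPT:CHANGE), e.g. SMS(NM_004595:c.447+2T>G).
ENTRY = re.compile(r"([^(),]+)\(([^:()]+):([^()]+)\)")

def parse_splicing_detail(field):
    """Split a comma-separated splicing annotation into (gene, transcript, change) tuples."""
    return [m.groups() for m in ENTRY.finditer(field)]

line = ("splicing BAGE(NM_001187:c.14+1A>G),BAGE4(NM_181704:c.14+1A>G),"
        "BAGE5(NM_182484:c.14+1A>G) 21 10120594 10120594 T C hetero 66 53")

# The example lines are whitespace-separated; the first seven fields are used here.
func, detail, chrom, start, end, ref, alt = line.split()[:7]
entries = parse_splicing_detail(detail)
for gene, transcript, change in entries:
    print(gene, transcript, change)
```

Each isoform then becomes one tuple, which makes it straightforward to filter by transcript ID or to report the change per gene.
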
The following example shows how these categories are assigned:

[Figure: SNP1 to SNP5 and deletion 1 to deletion 6 positioned around the NADK gene]

SNP1 is an intergenic variant, as it is more than 1 kb away from the genes on either side;
SNP2 is a downstream variant, as it lies within 1 kb of the 3' end of the NADK gene (note the direction of transcription);
SNP3 is a UTR3 variant (in the figure, exons and UTRs are both drawn as blue bars, but the UTR bars are shorter);
SNP4 is an intronic variant;
SNP5 is an exonic variant.
Deletions are handled in the same way as SNPs:
deletion 1 is an intergenic variant;
deletion 2 is a downstream variant;
deletion 3 is a UTR3 variant;
deletion 4 overlaps both a UTR3 and an intron, and based on the precedence rule only the UTR3 annotation is kept by default;
deletion 5 is an intronic variant;
deletion 6 overlaps both an exon and an intron, and based on the precedence rule it is an exonic variant.
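
The same reasoning can be reproduced with a small interval check. The sketch below is a toy model under several assumptions: a single hypothetical transcript on the minus strand with invented coordinates, closed intervals, a 1-kb flank as in the default described earlier, and no handling of splicing or ncRNA. It is not how ANNOVAR is implemented.

```python
# Precedence levels for the categories used in this toy model (1 = highest).
RANK = {"exonic": 1, "UTR5": 3, "UTR3": 3, "intronic": 4,
        "upstream": 5, "downstream": 5, "intergenic": 6}

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def classify(start, end, features, strand, neargene=1000):
    """Classify a variant spanning [start, end] against one toy transcript.

    features: (start, end, kind) tuples tiling the transcript in genomic order.
    """
    tx_start, tx_end = features[0][0], features[-1][1]
    hits = {kind for f_start, f_end, kind in features
            if overlaps(start, end, f_start, f_end)}
    # Which flank counts as upstream depends on the strand.
    left, right = ("upstream", "downstream") if strand == "+" else ("downstream", "upstream")
    if overlaps(start, end, tx_start - neargene, tx_start - 1):
        hits.add(left)
    if overlaps(start, end, tx_end + 1, tx_end + neargene):
        hits.add(right)
    if not hits:
        return "intergenic"
    # Keep the highest-ranked category; the name is only a deterministic tie-break.
    return min(hits, key=lambda kind: (RANK[kind], kind))

# Hypothetical minus-strand transcript: the 3' end is on the left.
toy = [(2000, 2199, "UTR3"), (2200, 2599, "intronic"),
       (2600, 2799, "exonic"), (2800, 2899, "UTR5")]

print(classify(500, 500, toy, "-"))    # intergenic: more than 1 kb from the gene
print(classify(1500, 1500, toy, "-"))  # downstream: left flank of a minus-strand gene
print(classify(2190, 2210, toy, "-"))  # UTR3: spans UTR3 and intron, UTR3 wins
print(classify(2590, 2610, toy, "-"))  # exonic: spans intron and exon, exonic wins
```

Switching the strand to "+" turns the left flank into the upstream side, which is why the direction of transcription matters for SNP2 and deletion 2.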

How gene names in the output are determined:

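(1) ANNOVAR reports gene names as they are defined in the gene annotation database in use (such as RefSeq, UCSC Gene and Ensembl Gene); these gene definitions are generally the ones provided by the user.
(2) For some complicated cases:
① If a gene is annotated as both coding and non-coding (multiple transcripts, some coding, some non-coding), it is annotated as coding by default.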
② If a gene or a transcript has one or several non-coding definitions but without coding definition, it will be regarded as ncRNA in annotation output.
③ If a transcript maps to multiple locations as "coding transcripts", but some have a complete ORF and some do not (that is, they have a premature stop codon), then the ones without a complete ORF are ignored.
④ If a transcript maps to multiple locations, all as "coding transcripts", but none has a complete ORF, then this transcript will not be used in exonic_variant_function annotation and the corresponding annotation will be marked as "UNKNOWN".
⑤ NEW in July 2014: If a transcript maps to multiple genomic locations, all mappings will be used in the annotation process. Previously, only the "most likely" mapping was used in annotation.
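
Rules ① to ④ can be written down as two small functions. This is a sketch of the decision logic only, under the assumption that each genomic placement of a transcript can be described by two flags; the `Mapping` class, the function names and the transcript IDs are hypothetical and do not come from ANNOVAR.

```python
from dataclasses import dataclass

@dataclass
class Mapping:
    """One genomic placement of a transcript (hypothetical structure)."""
    transcript: str
    coding: bool
    complete_orf: bool = False

def gene_status(mappings):
    """Rules 1 and 2: any coding definition makes the gene coding, otherwise ncRNA."""
    return "coding" if any(m.coding for m in mappings) else "ncRNA"

def usable_coding_mappings(mappings):
    """Rules 3 and 4: keep only coding placements with a complete ORF.

    An empty result means the exonic annotation would be marked UNKNOWN.
    """
    return [m for m in mappings if m.coding and m.complete_orf]

# Made-up transcript IDs for illustration.
mixed = [Mapping("TX_A", coding=True, complete_orf=True),
         Mapping("TX_A", coding=True, complete_orf=False),
         Mapping("TX_B", coding=False)]
only_noncoding = [Mapping("TX_B", coding=False)]
broken = [Mapping("TX_C", coding=True), Mapping("TX_C", coding=True)]

print(gene_status(mixed))                       # coding
print(gene_status(only_noncoding))              # ncRNA
print(len(usable_coding_mappings(mixed)))       # 1
print(usable_coding_mappings(broken) or "UNKNOWN")  # UNKNOWN
```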

Original source:
http://annovar.openbioinformatics.org/en/latest/user-guide/gene/
