GWAS, SNPs, and Disease

Three ways to look up SNP information

Source: http://www.bio-info-trainee.com/2100.html#more-2100

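Studies have reported that rs7574865 in STAT4 and rs9275319 in HLA-DQ are genetic susceptibility loci for hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) in the population.

In other words, variation at these two sites is associated with susceptibility to HBV-related hepatocellular carcinoma; an association of this kind does not by itself show that the variants are the cause. Each rsID is the dbSNP identifier of one variant site (after variants are called, annotation tools such as VEP or snpEff attach these IDs to them), so an rsID can be used to find the site's position in the genome. dbSNP can be used to look up the genomic coordinates of an rsID.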

Method 1:
Download the file All_20160601.vcf.gz (a very large file):

mkdir -p ~/annotation/variation/human/dbSNP
cd ~/annotation/variation/human/dbSNP
## https://www.ncbi.nlm.nih.gov/projects/SNP/
## ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/
## ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/
nohup wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_20160601.vcf.gz &
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_20160601.vcf.gz.tbi

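Running this produced an error: No such directory ‘snp/organisms/human_9606_b147_GRCh37p13/VCF’. The build 147 directory appears to be no longer available at that path, so check the current layout of the NCBI FTP site before downloading.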
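Once a dbSNP VCF is available locally, finding the coordinates of an rsID is a scan over the ID column. The sketch below is one minimal way to do it in Python; the function names are hypothetical, and the toy records use made-up IDs and coordinates purely for illustration.

```python
import gzip
import io


def find_rsids(vcf_lines, wanted):
    """Return {rsID: (chrom, pos, ref, alt)} for the requested IDs."""
    wanted = set(wanted)
    found = {}
    for line in vcf_lines:
        if line.startswith("#"):          # skip meta and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:               # skip malformed lines
            continue
        chrom, pos, ids, ref, alt = fields[:5]
        for rsid in ids.split(";"):       # the ID column may hold several IDs
            if rsid in wanted:
                found[rsid] = (chrom, int(pos), ref, alt)
        if len(found) == len(wanted):     # stop early once everything is found
            break
    return found


def find_rsids_in_file(path, wanted):
    """Same lookup for a gzip-compressed VCF on disk (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return find_rsids(handle, wanted)


# Toy VCF content: IDs and positions are invented, not real dbSNP records.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\trs111\tA\tG\t.\t.\t.\n"
    "2\t2000\trs222\tC\tT\t.\t.\t.\n"
)
hits = find_rsids(toy_vcf, ["rs222"])
print(hits)
```

With the real file, the call would look like find_rsids_in_file("All_20160601.vcf.gz", ["rs7574865", "rs9275319"]); a linear scan of a file this size is slow, so an indexed query is preferable when the .tbi index is present.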

Method 2:
You can also use the web version of the database and edit the URL directly (for a small number of lookups):
https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=7574865
https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs9275319

Method 3:
SNPedia, again by editing the URL directly (advantage: it collects links to many other databases)
https://www.snpedia.com/index.php/Rs7574865
https://www.snpedia.com/index.php/Rs9275319
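Because both web lookups only differ in the rsID embedded in the URL, the links can be generated for a list of IDs. A minimal sketch, assuming the URL patterns shown in Methods 2 and 3 (the function name snp_urls is hypothetical):

```python
def snp_urls(rsid):
    """Build the dbSNP and SNPedia lookup URLs for one rsID."""
    rsid = rsid.strip().lower()
    number = rsid[2:] if rsid.startswith("rs") else rsid
    if not number.isdigit():
        raise ValueError("not an rsID: " + repr(rsid))
    return {
        # dbSNP pattern from Method 2 (numeric part only)
        "dbSNP": "https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=" + number,
        # SNPedia pattern from Method 3 (capital R, then the number)
        "SNPedia": "https://www.snpedia.com/index.php/Rs" + number,
    }


for rsid in ["rs7574865", "rs9275319"]:
    for site, url in snp_urls(rsid).items():
        print(rsid, site, url)
```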


Extension: how to run a GWAS analysis

Method 1:
Analysis with plink
The plink website: https://www.cog-genomics.org/plink2/
SNP filtering and GWAS with plink
GWAS analysis with plink

Method 2:
Analysis with an R package (drawing Manhattan plots)
Postgwas: Advanced GWAS Interpretation in R
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071775
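A Manhattan plot places each SNP at its genomic position on the x axis and -log10(p) on the y axis. The sketch below shows only that coordinate transformation in Python; it does not use Postgwas, the chromosome lengths and p-values are made up, and the 5e-8 line is the conventional genome-wide significance threshold rather than a value taken from the source.

```python
import math


def manhattan_points(results, chrom_lengths):
    """Convert (chrom, pos, p) rows into (x, y) plot coordinates.

    chrom_lengths is an ordered list of (chrom, length); each chromosome
    is shifted by the total length of the chromosomes before it.
    """
    offsets = {}
    running = 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = running
        running += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


# Made-up example data, for illustration only.
toy_lengths = [("1", 1000), ("2", 800)]
toy_results = [("1", 100, 0.01), ("2", 50, 1e-8)]

points = manhattan_points(toy_results, toy_lengths)
threshold = -math.log10(5e-8)   # assumed conventional significance line

for (x, y), (chrom, pos, p) in zip(points, toy_results):
    flag = "above" if y > threshold else "below"
    print(chrom, pos, "x =", x, "y =", round(y, 2), flag, "threshold")
```

The resulting (x, y) pairs can be passed to any scatter-plot function.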


How to call SNPs and indels

Reference: http://blog.sina.com.cn/s/blog_83f77c940102w2eb.html


How to filter SNPs

Source: http://blog.sina.com.cn/s/blog_83f77c940102w2eg.html

  1. Missing rates
    GENO > 0.05

Shortly we will apply more stringent criteria, such that GENO > 0.05. In this case, 0.05*89 = 4.45 samples, meaning that if a SNP is missing in 4.45 or more samples (that is, 5 or more), that SNP will be removed from the dataset.

Here 89 is the total number of samples, and 89 × GENO gives a threshold of 4.45. A called SNP that is missing in 4 or fewer samples is kept; one that is missing in 5 or more samples is removed.
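The threshold arithmetic can be checked directly. A minimal Python sketch, assuming the same 89 samples and GENO cutoff of 0.05 (the function name passes_geno is hypothetical and this is not how plink is implemented):

```python
def passes_geno(n_missing, n_samples, geno=0.05):
    """Keep a SNP when its missing rate does not exceed the GENO cutoff."""
    return n_missing / n_samples <= geno


n_samples = 89
cutoff = 0.05 * n_samples   # 4.45 samples

# Which missing counts survive the filter?
kept = [m for m in range(8) if passes_geno(m, n_samples)]
print("cutoff in samples:", cutoff)
print("missing counts that are kept:", kept)
```

A SNP missing in 4 samples has a missing rate of 4/89, which is below 0.05, while 5/89 is above it, matching the 4-versus-5 boundary described above.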

  2. Minor allele frequency (MAF)
    Tip: remove SNPs with MAF < 0.03; if there are many SNPs, the cutoff can be raised to MAF < 0.05

MAF is the Minor Allele Frequency. It can be used to exclude SNPs which are not informative because they show little variation in the sample set being analyzed. For instance, if a SNP shows variation in only 1 of the 89 individuals, it is not useful statistically and should be removed.

In other words, a SNP whose minor allele is seen in only a few samples should be removed. Because each diploid sample carries two alleles, the cutoff expressed as an allele count is MAF × 2 × (total number of samples).
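As a worked example of the MAF calculation, the sketch below computes MAF from genotype counts for a diploid sample set. The function name is hypothetical, and the counts reproduce the "variation in only 1 of the 89 individuals" case, assuming that individual is heterozygous.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF from genotype counts; every diploid sample contributes two alleles."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped samples")
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_freq, 1 - alt_freq)   # the rarer allele defines the MAF


# 88 homozygous-reference samples and 1 heterozygote out of 89 samples
maf = minor_allele_frequency(88, 1, 0)
print("MAF =", round(maf, 4))
print("removed at MAF < 0.03:", maf < 0.03)
```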

  3. Removing SNPs out of Hardy-Weinberg equilibrium (HWE; p-value < 10^-6 to 10^-4)

Population genetic theory suggests that under ‘normal’ conditions, there is a predictable relationship between allele frequencies and genotype frequencies. In cases where the genotype distribution is different from what one would expect based on the allele frequencies, one potential explanation for this is genotyping error. Natural selection is another explanation. For this reason, we typically check for deviation from Hardy-Weinberg equilibrium in the controls for a case-control study. For a quantitative trait, PLINK just uses everyone. PLINK can generate p-values for deviation from HWE for each SNP. Low p-values indicate that a SNP is out of HWE.
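To make the "expected genotype frequencies" idea concrete, here is a minimal Python sketch of a chi-square goodness-of-fit test against Hardy-Weinberg proportions (1 degree of freedom). This is a textbook approximation for illustration; PLINK reports an exact test, so its p-values will differ, especially at low counts. The function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of observed genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1 - p                            # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts exactly at HWE proportions, and counts with no heterozygotes at all
in_hwe = hwe_chi_square(25, 50, 25)
out_of_hwe = hwe_chi_square(50, 0, 50)
print("in HWE:     chi2 = %.1f, p = %.3g" % in_hwe)
print("out of HWE: chi2 = %.1f, p = %.3g" % out_of_hwe)
```

With a cutoff of 0.0001, the second SNP would be removed and the first kept.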

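  4. SNP filtering starting from a VCF file
    Use vcftools to convert the VCF to plink input format (this writes .ped and .map files), then use those files as input for filtering; the --make-bed option in the plink step writes the filtered result as binary .bed/.bim/.fam files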
vcftools --vcf my.vcf --plink --out plink

plink --noweb --file plink --geno 0.05 --maf 0.05 --hwe 0.0001 --make-bed --out QC

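If you are not yet sure what GWAS and SNPs are, here are the definitions:

Source: http://www.biotrainee.com:8080/thread-1487-1-1.html
Genome-wide association studies (GWAS) use the sequence variation present across the whole human genome, namely single nucleotide polymorphisms (SNPs), to screen for the SNPs that are associated with disease.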

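  • Which diseases are associated with SNPs?
    In recent years, genome-wide association studies (GWAS), using large cohorts and high-density SNP (single nucleotide polymorphism) markers, have mapped thousands of SNP loci associated with complex diseases, and these association signals replicate well across repeated studies. Examples include common human conditions such as obesity, diabetes, and schizophrenia.
  • Sources of error in SNP results?
    Because of sampling error from random sampling (unavoidable in practice) and the complex linkage disequilibrium (LD) between SNPs, the SNPs that a GWAS maps are usually not the causal sites themselves.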
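Linkage disequilibrium between two SNPs is commonly summarized as r², which is 1 when the alleles at the two sites always travel together and 0 when they are independent. A minimal Python sketch, assuming phased haplotype counts are available (the function name and the counts are hypothetical):

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total          # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total          # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B                 # the LD coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


perfect_ld = ld_r_squared(50, 0, 0, 50)     # A always occurs with B
no_ld = ld_r_squared(25, 25, 25, 25)        # alleles combine at random
print("perfect LD r^2 =", perfect_ld)
print("no LD r^2 =", no_ld)
```

When r² between a genotyped SNP and a causal variant is high, the genotyped SNP carries the association signal even though it is not the causal site, which is why a GWAS hit usually marks a region rather than a single variant.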

A 2016 article in PLOS ONE describes SNPs and osteoarthritis.
It is not a top-tier journal, but the paper is of good quality.

Functional Characterization of the Osteoarthritis Susceptibility Mapping to CHST11—A Bioinformatics and Molecular Study

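The title shows that this is a study of osteoarthritis and that the target gene is CHST11. Carbohydrate sulfotransferase 11 is an enzyme that in humans is encoded by the CHST11 gene. The gene is located at chr12: 104,455,295-104,762,014 (GRCh38). For functional studies of CHST11, the Sanger Institute in Cambridge, UK, has produced a knockout mouse line for this gene, Chst11^tm1a(KOMP)Wtsi. The gene is mainly related to bone and cartilage phenotypes. Phenotyping of the mouse found an abnormality: Homozygous viability at P14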

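A 2012 article in The Lancet also reports that variation in this gene is associated with osteoarthritis; that journal's standing needs no introduction.

Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study

Next, we look at these two papers in turn, at the gene, at its SNP, and at the research and discussion on the functional analysis.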

(I) Background on osteoarthritis:

What is OA?

(1)Osteoarthritis (OA) is a common disease of older individuals that is characterized by the focal loss of articular cartilage. This loss usually occurs gradually over many years and typically results in chronic pain and severely impaired joint function by the sixth or seventh decade of life.

(2)Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people.

Genetic features of OA?

(1)OA is polygenic and unlike many other common arthritic diseases, there are no OA risk-conferring loci of large singular impact.
(2)It is a complex disease of the musculoskeletal system with both genetic and environmental risk factors. From the results of heritability studies in twins, sibling pairs, and families, genetic factors are estimated to account for about 50% of the risk of developing osteoarthritis in the hip or knee, although precise estimates vary according to sex, affected site, and severity of disease.

(II) Research methods:

(1)Focus on functional analysis

  • Identification of SNPs in LD with rs835487
  • Identification of Sequences Homologous to the Enhancer in Non-Human Mammals
  • Cloning of pGL3-Promoter Luciferase Reporter Plasmids
  • Transfection of Cell Lines
  • Electrophoretic Mobility Shift Assays (EMSAs)
  • Ethics Statement, Cartilage Collection and Nucleic Acid Extraction
  • Gene Expression, Genotyping and AEI Analysis
  • Chondrogenic Differentiation of MSCs

(2)Focus on association analysis (GWAS)

  • We undertook a large genome-wide association study (GWAS) in 7,410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7,473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent.

(III) Conclusions

(1)rs835487 (allele G; THR) located within intron two of CHST11 is associated with hip OA
