EndHiC
Compared with HiC-Pro, EndHiC is much simpler to install: download it and it is ready to use.
Installing EndHiC
git clone git@github.com:fanagislab/EndHiC.git
All the scripts you need are in the cloned folder, so you can call them directly.
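Since "installation" is only a clone, a quick check that the scripts used later in this post are actually present can save a confusing error further down. The snippet below is a minimal sketch: `check_endhic_scripts` is a hypothetical helper name, not something EndHiC ships, and the script list is simply the set of files called in this post. If you have no SSH key registered with GitHub, the HTTPS form of the same repository address (`https://github.com/fanagislab/EndHiC.git`) can be cloned instead.

```shell
# Sketch: confirm a cloned EndHiC directory holds the scripts this post calls.
# check_endhic_scripts is a hypothetical helper, not part of EndHiC itself.
check_endhic_scripts() {
    _dir=$1
    _missing=0
    for _script in fastaDeal.pl matrix2heatmap.py endhic.pl \
                   cluster2agp.pl agp2fasta.pl cluster2bed.pl; do
        if [ ! -f "$_dir/$_script" ]; then
            echo "missing: $_script" >&2   # report every absent script
            _missing=1
        fi
    done
    return "$_missing"                     # 0 only if all scripts were found
}
```

For example, `check_endhic_scripts ./EndHiC && echo "EndHiC looks complete"` run from the directory where you cloned it.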
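So how do you use it? I have to say, the instructions on GitHub are really sketchy.
Reading the script in the example they provide is more direct.
Using EndHiC
The example script provided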
$ cat biosoft/EndHiC/z.testing_data/Arabidopsis_thalina/work.sh
##Atha.contigs.fa is generated by Hifiasm
##AthaHiC_100000_abs.bed, AthaHiC_100000.matrix, AthaHiC_100000_iced.matrix are generated by HiC-pro using Atha.contigs.fa as the reference genome
gzip -d Atha.contigs.fa.gz
##get contig length
perl ../../fastaDeal.pl -attr id:len Atha.contigs.fa > Atha.contigs.fa.len
##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
../../matrix2heatmap.py AthaHiC_100000_abs.bed AthaHiC_100000.matrix 10
##Run one round, when the contig assembly is quite good
perl ../../endhic.pl Atha.contigs.fa.len AthaHiC_100000_abs.bed AthaHiC_100000.matrix AthaHiC_100000_iced.matrix
ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./
##convert cluster file to agp file
perl ../../cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster Atha.contigs.fa.len > Atha.scaffolds.agp
##get final scaffold sequence file
perl ../../agp2fasta.pl Atha.scaffolds.agp Atha.contigs.fa > Atha.scaffolds.fa
##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
../../cluster2bed.pl AthaHiC_100000_abs.bed z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
../../matrix2heatmap.py clusterA_100000_abs.bed AthaHiC_100000.matrix 10
##Here, Arabidopsis thalina has 5 chromosomes, and all these chromosomes can be successfully scaffolded by EndHiC
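The "get contig length" step in the example feeds everything that follows, so it can be worth cross-checking its output independently. The sketch below computes contig IDs and lengths with plain awk. `fasta_lengths` is a hypothetical helper, not a replacement for `fastaDeal.pl`, and I am assuming (from the `id:len` attribute string) that the `.len` file lists each contig ID with its length; compare the two by eye rather than expecting byte-identical files.

```shell
# Sketch: print "<contig id><TAB><length>" for every record in a FASTA file.
# fasta_lengths is a hypothetical cross-check helper, not fastaDeal.pl.
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len   # flush the previous record
            id = substr($1, 2)                # header up to first whitespace, minus ">"
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")               # ignore stray whitespace and CR
            len += length($0)                 # sequences may span many lines
        }
        END { if (id != "") print id "\t" len }
    ' "$1"
}
```

Running `fasta_lengths Atha.contigs.fa | head` next to `head Atha.contigs.fa.len` gives a quick side-by-side view.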
The input data are the files HiC-Pro produced in the previous step:
Modified script
contig=/share/home/off/Work/Genome_assembly/Assembly/contig.fa ##contig file; must be the same contig file that was used in HiC-Pro
endhic_dir=/share/home/off_wenhao/biosoft/EndHiC ##EndHiC installation path
name=dlo ##species name; must also match the name set in HiC-Pro, i.e. the prefix of the HiC-Pro output folder `**_outdir_new`
##get contig length
perl ${endhic_dir}/fastaDeal.pl -attr id:len ${contig} > contigs.fa.len
##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
hic_pro_dir=/share/home/off/Work/Genome_assembly/Assembly/08.EndHiC/01.hicprp/${name}_outdir_new/hic_results/matrix/${name}
${endhic_dir}/matrix2heatmap.py ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10
##Run one round, when the contig assembly is quite good
perl ${endhic_dir}/endhic.pl contigs.fa.len ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix ${hic_pro_dir}/iced/100000/${name}_100000_iced.matrix
ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./
##convert cluster file to agp file
perl ${endhic_dir}/cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster contigs.fa.len > scaffolds.agp
##get final scaffold sequence file
perl ${endhic_dir}/agp2fasta.pl scaffolds.agp ${contig} > ${name}.scaffolds.fa
##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
${endhic_dir}/cluster2bed.pl ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
${endhic_dir}/matrix2heatmap.py clusterA_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10
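The script above depends on three HiC-Pro outputs sitting at specific paths under `hic_pro_dir`, so a typo in `name` or the path usually only shows up once a later step fails. A small pre-flight check can catch that early. This is a sketch only: `check_hicpro_inputs` is a hypothetical helper, and the directory layout it tests is just the one used in the script above (`raw/100000/` and `iced/100000/`), with the resolution as an optional third argument.

```shell
# Sketch: verify the HiC-Pro files used by the script above exist and are non-empty.
# check_hicpro_inputs is a hypothetical helper; the layout mirrors the paths above.
check_hicpro_inputs() {
    _dir=$1                 # e.g. the value of hic_pro_dir
    _name=$2                # e.g. the value of name
    _res=${3:-100000}       # bin size; defaults to the 100 kb used in this post
    _status=0
    for _f in \
        "$_dir/raw/$_res/${_name}_${_res}_abs.bed" \
        "$_dir/raw/$_res/${_name}_${_res}.matrix" \
        "$_dir/iced/$_res/${_name}_${_res}_iced.matrix"; do
        if [ ! -s "$_f" ]; then
            echo "missing or empty: $_f" >&2
            _status=1
        fi
    done
    return "$_status"
}
```

In the script this could be called as `check_hicpro_inputs "${hic_pro_dir}" "${name}" || exit 1` right after `hic_pro_dir` is defined.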
Results
clusterA.id.len
clusterA_100000_abs.bed
clusterA_100000_abs.bed.pdf
endhic.100000.10.iced.sh
endhic.100000.20.iced.sh
endhic.100000.5.iced.sh
endhic.100000.10.raw.sh
endhic.100000.20.raw.sh
endhic.100000.5.raw.sh
endhic.100000.15.raw.sh
endhic.100000.25.raw.sh
endhic.Round_A.sh
endhic.100000.15.iced.sh
endhic.100000.25.iced.sh
endhic.log
EndHic.sh
dlo.scaffolds.fa
Round_A.01.contig_end_contact_results/
Round_A.02.GFA_contig_graph_results/
Round_A.03.cluster_order_orient_results/
Round_A.04.summary_and_merging_results/
scaffolds.agp
contigs.fa.len
z.EndHiC.A.results.summary.cluster
z.EndHiC.A.results.summary.cluster.GFA.v1.2.GFA
z.EndHiC.A.results.summary.cluster.GFA
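There are a lot of files, but the only two we actually need are scaffolds.agp and prefix.scaffolds.fa (dlo.scaffolds.fa here). The former is the map (AGP) file that records how the contigs are placed within each scaffold; the latter is the scaffold sequence file.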
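A simple consistency check on these two files is to count the scaffolds named in the AGP and the records in the FASTA. The sketch below assumes that `scaffolds.agp` follows the usual tab-separated AGP layout (object name in column 1, `#` for comment lines) and that `agp2fasta.pl` writes one FASTA record per AGP object; I have not verified either against the EndHiC source, and all three function names are hypothetical. A mismatch is a prompt to look closer, not necessarily an error.

```shell
# Sketch: compare scaffold counts between an AGP file and a FASTA file.
# Assumes standard tab-separated AGP with the scaffold name in column 1.
count_agp_scaffolds() {
    awk -F '\t' '!/^#/ && NF > 0 { seen[$1] = 1 }
                 END { n = 0; for (k in seen) n++; print n }' "$1"
}

count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

compare_agp_fasta() {
    _agp_n=$(count_agp_scaffolds "$1")
    _fa_n=$(count_fasta_records "$2")
    echo "AGP scaffolds: $_agp_n, FASTA records: $_fa_n"
    [ "$_agp_n" -eq "$_fa_n" ]     # exit status 0 only when the counts agree
}
```

With the files from this run, that would be `compare_agp_fasta scaffolds.agp dlo.scaffolds.fa`.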