Last Updated: 07/26/2019
Cupcake is a collection of tools for downstream analysis of third-generation sequencing data.
cDNA_Cupcake is a miscellaneous collection of Python and R scripts used for analyzing sequencing data. Most of the scripts only require Biopython. For scripts that require additional libraries, it will be specified in documentation.
https://github.com/Magdoll/cDNA_Cupcake
Current version: 8.2
I found a good introduction here:
https://github.com/Magdoll/cDNA_Cupcake/wiki#refgmap
- First, fetch the package with git
git clone https://github.com/Magdoll/cDNA_Cupcake.git
This failed with an error:
(base) [jing@localhost ~]$ git clone https://github.com/Magdoll/cDNA_Cupcake.git
Cloning into 'cDNA_Cupcake'...
error: RPC failed; curl 56 OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
fatal: the remote end hung up unexpectedly
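A search turned up this answer:
git clone fails with "error: RPC failed"
#Solution:
#Raise Git's HTTP transfer buffer limit (here to 524288000 bytes, i.e. 500 MiB).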
git config --global http.postBuffer 524288000
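After running the command above, the clone proceeded normally.
This step is slow: it ran from 14:20 to 15:05, the connection dropped and I had to restart, so it took about 60 minutes in total.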
(base) [jing@localhost ~]$ git clone https://github.com/Magdoll/cDNA_Cupcake.git
Cloning into 'cDNA_Cupcake'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (164/164), done.
remote: Compressing objects: 100% (115/115), done.
Receiving objects: 18% (301/1615), 9.45 MiB | 49.00 KiB/s
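Since the clone dropped partway and had to be restarted by hand, a small retry wrapper can save some babysitting. This is only a sketch: `retry` and `RETRY_DELAY` are names made up for illustration, not part of git or Cupcake. The commented usage line also adds `--depth 1`, a standard git option that fetches only the latest commit and so reduces the amount of data to download.

```shell
# retry N CMD...: run CMD up to N times, pausing between attempts.
# (retry and RETRY_DELAY are illustrative names, not part of git or Cupcake.)
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        echo "attempt $i/$attempts failed: $*" >&2
        # Pause before the next attempt, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "${RETRY_DELAY:-5}"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage (needs network access, so it is left commented out here):
# retry 3 git clone --depth 1 https://github.com/Magdoll/cDNA_Cupcake.git
```

A shallow clone has no history, which is fine if the goal is only to run the scripts.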
Then run the following:
export PATH=$PATH:/home/jing/cDNA_Cupcake/sequence/
export PATH=$PATH:/home/jing/cDNA_Cupcake/rarefaction/
Replace the paths with your own.
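The two export lines only affect the current shell, and running them twice adds the same directories twice. A minimal sketch that avoids the duplication is below; `CUPCAKE_HOME` and `path_append` are hypothetical names chosen for this example, and the default location assumes the repository was cloned into the home directory.

```shell
# CUPCAKE_HOME is a hypothetical variable name; point it at your own clone.
CUPCAKE_HOME="${CUPCAKE_HOME:-$HOME/cDNA_Cupcake}"

# Append a directory to PATH only if it is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # otherwise add it at the end
    esac
}

path_append "$CUPCAKE_HOME/sequence"
path_append "$CUPCAKE_HOME/rarefaction"
export PATH
```

Putting these lines in `~/.bashrc` makes the setting persist across login sessions.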
- Install Cupcake ToFU
This is needed because, as the wiki puts it: The only exception is Cupcake ToFU, which does require compiling and installation.
https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake-ToFU%3A-supporting-scripts-for-Iso-Seq-after-clustering-step
After downloading, run:
cd cDNA_Cupcake
python setup.py build
python setup.py install
This fails with errors.
Install whatever is reported missing:
conda install numpy
yum search zlib
After installing, rerun the installation.
Next, yum search gcc
and install it.
Still fails.
Try:
yum install gcc libffi-devel python-devel openssl-devel
Still fails.
I installed a pile of packages and it still does not work. If anyone has gotten this installed, could you tell me how?
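One thing worth checking in a situation like this: the prompt shows a conda `(base)` environment, while `yum` installs headers and libraries for the system Python, so the two may not be talking to each other. The sketch below only reports what the current shell can see. The list of prerequisites (gcc, numpy, Cython, Biopython) is my assumption about what the build needs, not something taken from the Cupcake documentation, and `have_cmd` / `have_module` are helper names made up here.

```shell
# Report which build prerequisites are visible in the current environment.
# The prerequisite list is an assumption, not taken from the Cupcake docs.
PYTHON="${PYTHON:-python}"

# True if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }
# True if the chosen interpreter can import the named module.
have_module() { "$PYTHON" -c "import $1" >/dev/null 2>&1; }

for cmd in gcc "$PYTHON"; do
    if have_cmd "$cmd"; then
        echo "ok       command $cmd"
    else
        echo "MISSING  command $cmd"
    fi
done

# Bio is the import name of Biopython.
for mod in numpy Cython Bio; do
    if have_module "$mod"; then
        echo "ok       module  $mod"
    else
        echo "MISSING  module  $mod"
    fi
done

# Show which interpreter would run setup.py (conda's or the system one).
command -v "$PYTHON" || true
```

If the interpreter printed on the last line lives under a conda directory, missing modules would need to be installed with conda or pip in that environment rather than with yum.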
What to do after Iso Seq Cluster? https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki/What-to-do-after-Iso-Seq-Cluster%3F
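What can Cupcake ToFU do?
After the cluster step, we should already have high-quality (HQ) isoform sequences that meet the following criteria:
- The sequences are full-length (they contain the 5' UTR and the polyA tail)
- High quality (predicted accuracy is >= 99% by default)
- Supported by at least 2 full-length reads (subreads?)
Aside: reads of such high quality may not be strictly necessary; plenty of useful information can be mined without them.
These HQ isoforms still contain redundant sequences, so the output of the previous step does not truly represent the set of unique isoforms in the sample. There are two reasons:
- Clustering algorithm tradeoff between sensitivity and specificity.
- Natural 5' degradation in RNA.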
So the steps that follow are:
- Best practice for aligning Iso Seq to reference genome: minimap2, GMAP, STAR, BLAT
- Collapse identical isoforms to obtain final set of unique, full-length, high-quality isoforms
- Fusion finding -- tutorial to come soon
Cupcake ToFU can handle steps (2), (3), and (5).
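To make the align-then-collapse part concrete, here is a rough sketch of how those two steps could be chained. The minimap2 options and the `collapse_isoforms_by_sam.py` arguments are written as I recall them from the Cupcake wiki, so check them against the wiki for your version. The file names, `DRY_RUN`, and the `run` helper are placeholders invented for this example. By default the script only prints the commands, so it is safe to run before the tools are installed.

```shell
# File names are placeholders; DRY_RUN and run() are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands
REF="${REF:-reference.fasta}"    # hypothetical reference genome
HQ="${HQ:-hq_isoforms.fastq}"    # HQ isoforms from the cluster step

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@"
    fi
}

# Align: spliced alignment of the HQ isoforms to the genome.
run minimap2 -ax splice -uf --secondary=no -C5 -o "$HQ.sam" "$REF" "$HQ"

# Sort the SAM by reference name, then by position, before collapsing.
run sort -k 3,3 -k 4,4n -o "$HQ.sorted.sam" "$HQ.sam"

# Collapse: merge redundant isoforms into a set of unique ones.
run collapse_isoforms_by_sam.py --input "$HQ" --fq -s "$HQ.sorted.sam" \
    --dun-merge-5-shorter -o collapsed
```

Setting `DRY_RUN=0` executes the commands for real, which requires minimap2 and a working Cupcake ToFU installation.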