A New Transcriptome Assembly Method

admin, 2025-02-08 14:41:44

HISAT, StringTie and Ballgown (Part 1)

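Today I'd like to share the new transcriptome analysis workflow built on HISAT, StringTie and Ballgown, which was just published in Nature Protocols on August 11 (see the original article for details). With this, the TopHat-Cufflinks pipeline can more or less be retired.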

First, a quick look at what each tool does:

HISAT: the alignment tool, similar to TopHat2, which we covered yesterday.

StringTie: the transcript assembly and quantification tool.

Ballgown: the differential expression analysis tool.

[Figure: main workflow diagram]

Detailed steps

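1. Build the index

First, extract the splice sites and exons with the scripts below (this requires a reference annotation in GTF format; skip this step if you do not have one):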

$ extract_splice_sites.py chrX_data/genes/chrX.gtf >chrX.ss

$ extract_exons.py chrX_data/genes/chrX.gtf >chrX.exon

Then build the HISAT2 index:

$ hisat2-build --ss chrX.ss --exon chrX.exon chrX_data/genome/chrX.fa chrX_tran

The --ss and --exon options can be omitted if you skipped the extraction step above. Indexing with annotation requires 9 GB of RAM for chromosome X and 160 GB for the whole human genome. The amount of memory is much smaller if one omits annotation information. Indexing chromosome X using one CPU core takes <10 min. It should take ~2 h to build an index for the whole human genome using eight CPU cores.

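2. Align the reads

Align each sample to the reference genome. The commands below read the index from chrX_data/indexes/chrX_tran; if you built the index in the current directory in step 1, pass -x chrX_tran instead.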

$ hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam

$ hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188104_chrX_1.fastq.gz -2 chrX_data/samples/ERR188104_chrX_2.fastq.gz -S ERR188104_chrX.sam

Sort the alignments and convert SAM to BAM:

$ samtools sort -@ 8 -o ERR188044_chrX.bam ERR188044_chrX.sam

$ samtools sort -@ 8 -o ERR188104_chrX.bam ERR188104_chrX.sam
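With more than a couple of samples, typing the align-and-sort commands by hand gets error-prone. The following is a minimal sketch, not part of the protocol, that prints the same two commands for every sample ID in a list. The names SAMPLES, INDEX, READS and print_align_cmds are illustrative assumptions; the flags and file-name pattern are copied from the commands above.

```shell
#!/bin/sh
# Sketch: print the hisat2 and samtools commands for each sample.
# SAMPLES, INDEX and READS are hypothetical variable names chosen for this example.
SAMPLES="ERR188044 ERR188104"
INDEX="chrX_data/indexes/chrX_tran"
READS="chrX_data/samples"

print_align_cmds() {
    for s in $SAMPLES; do
        # Align the paired-end reads of one sample (same flags as above).
        echo "hisat2 -p 8 --dta -x $INDEX -1 $READS/${s}_chrX_1.fastq.gz -2 $READS/${s}_chrX_2.fastq.gz -S ${s}_chrX.sam"
        # Sort the SAM output and write it as BAM.
        echo "samtools sort -@ 8 -o ${s}_chrX.bam ${s}_chrX.sam"
    done
}

# Dry run: only prints the commands, nothing is executed.
print_align_cmds
```

Once the printed commands look right, they can be executed by piping the output to a shell, for example `print_align_cmds | sh`.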

3. Assemble transcripts

$ stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam

$ stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188104_chrX.gtf -l ERR188104 ERR188104_chrX.bam

4. Merge the transcripts from all samples

$ stringtie --merge -p 8 -G chrX_data/genes/chrX.gtf -o stringtie_merged.gtf chrX_data/mergelist.txt

chrX_data/mergelist.txt lists the path of each sample's GTF file, one per line.
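For your own data you will need to write this list yourself. A minimal sketch, assuming the per-sample GTF files follow the <sample>_chrX.gtf naming used in step 3 and sit in the current directory; write_mergelist is a hypothetical helper name, not a StringTie command.

```shell
#!/bin/sh
# Sketch: write one per-sample GTF path per line into a merge list.
# write_mergelist is a hypothetical helper; usage: write_mergelist OUTFILE SAMPLE...
write_mergelist() {
    out=$1
    shift
    : > "$out"                       # create or truncate the list file
    for s in "$@"; do
        echo "${s}_chrX.gtf" >> "$out"
    done
}

write_mergelist mergelist.txt ERR188044 ERR188104
cat mergelist.txt
```

The resulting file can then be passed as the last argument of `stringtie --merge` in place of chrX_data/mergelist.txt.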

5. Estimate transcript abundances

$ stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188044/ERR188044_chrX.gtf ERR188044_chrX.bam

$ stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188104/ERR188104_chrX.gtf ERR188104_chrX.bam
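The -B option makes StringTie write Ballgown input tables (*.ctab files) next to the output GTF, so each sample directory under ballgown/ should end up with five of them. Before moving on to R, it can help to confirm that they are all there. The sketch below is an assumption-laden sanity check rather than part of the protocol; check_ballgown_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that every sample directory holds the five .ctab tables
# that `stringtie -B` is expected to write. Hypothetical helper, not a StringTie tool.
check_ballgown_dir() {
    dir=$1
    status=0
    for sample in "$dir"/*/; do
        for f in e_data.ctab e2t.ctab i_data.ctab i2t.ctab t_data.ctab; do
            if [ ! -f "$sample$f" ]; then
                echo "missing: $sample$f"
                status=1
            fi
        done
    done
    return $status
}

# Returns 0 only if every sample directory is complete.
check_ballgown_dir ballgown || echo "some Ballgown tables are missing"
```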

6. Load the Ballgown R package and its helper packages

$ R

R version 3.2.2 (2015-08-14) -- "Fire Safety"

Platform: x86_64-apple-darwin13.4.0 (64-bit)

>library(ballgown)

>library(RSkittleBrewer)

>library(genefilter)

>library(dplyr)

>library(devtools)

7. Load the phenotype data

An example file called geuvadis_phenodata.csv is included with the data files for this protocol (ChrX_data). In general, you will have to create this file yourself. It contains information about your RNA-seq samples, formatted as illustrated in this csv (comma-separated values) file.

>pheno_data = read.csv("geuvadis_phenodata.csv")

8. Load the abundance files produced by StringTie

To do this, we use the ballgown command with the following three parameters: the directory in which the data are stored (dataDir, which here is named simply 'ballgown'), a pattern that appears in the sample names (samplePattern) and the phenotypic information that we loaded in the previous step (pData). Note that once a Ballgown object is created, any other Bioconductor package can be applied for data analysis or data visualization.

>bg_chrX = ballgown(dataDir = "ballgown", samplePattern = "ERR", pData=pheno_data)
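A common stumbling block at this step is a phenotype table whose sample IDs do not line up with the sample directories. As far as I know, Ballgown expects the first column of pData to hold the sample names in the same order as the directories under dataDir. The sketch below checks that from the shell before starting R; check_pheno_ids is a hypothetical helper, and it assumes a comma-separated file with one header row and the IDs in the first column.

```shell
#!/bin/sh
# Sketch: compare the first CSV column (after the header) with the
# alphabetically sorted sample directory names. Hypothetical helper.
check_pheno_ids() {
    csv=$1
    dir=$2
    if [ ! -f "$csv" ] || [ ! -d "$dir" ]; then
        echo "check_pheno_ids: missing $csv or $dir" >&2
        return 2
    fi
    # Drop the header, keep column 1, strip quotes and carriage returns.
    ids=$(tail -n +2 "$csv" | cut -d, -f1 | tr -d '"\r')
    dirs=$(ls "$dir")
    [ "$ids" = "$dirs" ]
}

if check_pheno_ids geuvadis_phenodata.csv ballgown; then
    echo "sample IDs match the ballgown directories"
else
    echo "sample IDs do not match (or the inputs are missing)"
fi
```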

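9. Filter out low-abundance genes. The command below keeps only the transcripts whose expression variance across samples is greater than 1.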

>bg_chrX_filt = subset(bg_chrX,"rowVars(texpr(bg_chrX)) >1",genomesubset=TRUE)

10. Identify differentially expressed transcripts (here between sexes, adjusting for population)

>results_transcripts = stattest(bg_chrX_filt,feature="transcript",covariate="sex",adjustvars =c("population"), getFC=TRUE, meas="FPKM")

11. Identify differentially expressed genes

>results_genes = stattest(bg_chrX_filt, feature="gene",covariate="sex", adjustvars = c("population"), getFC=TRUE,meas="FPKM")

There are a few more steps, which will be covered in the next part tomorrow.
