Introduction to blasr, a PacBio Data Alignment Tool - Industry Insights - 衍因科技 official website

admin, 2025-02-05 10:24:44

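Third-generation sequencing is becoming more and more common, so this post introduces blasr, a tool for aligning third-generation reads. It is similar to blast but differs in emphasis: blasr focuses on aligning long sequences along their whole length and tolerates a certain error rate, whereas blast focuses on stricter local alignment. The history of how the various alignment tools evolved is shown in the figure below:

[Figure: evolution of alignment tools]

Without further ado, here is how to use it.

I. Main parameters

1. Input sequencing data file

reads.fasta: a plain FASTA file, which is the most common choice. The following formats are also accepted: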

reads.bax.h5|reads.plx.h5

2. Alignment parameters

-minMatch m (12)

Minimum seed length. Higher minMatch will speed up alignment, but decrease sensitivity.

-maxMatch l (inf)

Stop mapping a read to the genome when the LCP length reaches l. This is useful when the query is part of the reference, for example when constructing pairwise alignments for de novo assembly.

-maxLCPLength l (inf)

The same as -maxMatch.

-maxAnchorsPerPosition m (10000)

Do not add anchors from a position if it matches to more than 'm' locations in the target.

-advanceExactMatches E (0)

Another trick for speeding up alignments with match - E fewer anchors. Rather than finding anchors between the read and the genome at every position in the read, when an anchor is found at position i in a read of length L, the next position in a read to find an anchor is at i+L-E. Use this when aligning already assembled contigs.
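The position arithmetic behind -advanceExactMatches can be illustrated with a small Python sketch. This is not blasr's implementation: `anchor_search_positions` and the `find_anchor_length` callback are hypothetical names, and the sketch assumes that L is the length of the exact match found at position i and that E = 0 means the skipping is switched off.

```python
from typing import Callable, List


def anchor_search_positions(
    read_length: int,
    find_anchor_length: Callable[[int], int],
    advance_e: int,
) -> List[int]:
    """Return the read positions at which an anchor lookup is attempted.

    find_anchor_length(i) is a hypothetical callback that returns the length
    of the exact match starting at read position i (0 if there is none).
    """
    positions = []
    i = 0
    while i < read_length:
        positions.append(i)
        match_len = find_anchor_length(i)
        if advance_e > 0 and match_len > advance_e:
            # Jump ahead: the next lookup happens at i + L - E.
            i += match_len - advance_e
        else:
            # Default behaviour: try the very next position.
            i += 1
    return positions


def always_20(position: int) -> int:
    """Toy callback: every position starts a 20-base exact match."""
    return 20


print(len(anchor_search_positions(100, always_20, advance_e=0)))
print(anchor_search_positions(100, always_20, advance_e=5))
```

In this toy case the number of lookups drops from one per base to one every 15 bases, which is why the option suits highly accurate inputs such as assembled contigs.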

-nCandidates n (10)

Keep up to 'n' candidates for the best alignment. A large value of n will slow mapping because the slower dynamic programming steps are applied to more clusters of anchors, which can be a rate-limiting step when reads are very long.

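3. Other parameters

-nproc N (1): number of CPUs to use

-minPctIdentity p (0): minimum percent identity

-minReadLength l (50): minimum read length required for alignment

-minSubreadLength l (0)

Do not align subreads of length less than l.

-bestn n (10): number of best alignments to report

-sam: write output in SAM format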

-clipping [none|hard|subread|soft] (none)

Use no/hard/subread/soft clipping for SAM output.

-out out (terminal): output file name

-unaligned file: write unaligned reads to this file

-m t: output format

If not printing SAM, modify the output of the alignment.

t=0 Print blast like output with |'s connecting matched nucleotides.

1 Print only a summary: score and pos.

2 Print in Compare.xml format.

3 Print in vulgar format (deprecated).

4 Print a longer tabular version of the alignment.

5 Print in a machine-parsable format that is read by compareSequences.py.

II. Usage

blasr reads genome.fasta [-options]
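When blasr is called from a pipeline, the command line can be assembled programmatically. The following is a minimal Python sketch, assuming the single-dash option spellings listed in this article (check `blasr -h` for the installed version, since spellings may differ between releases). `build_blasr_command` and the file names are hypothetical.

```python
import shlex
from typing import List, Optional


def build_blasr_command(
    reads: str,
    genome: str,
    out: Optional[str] = None,
    nproc: int = 1,
    bestn: int = 10,
    min_pct_identity: float = 0,
    sam: bool = False,
) -> List[str]:
    """Build an argument list of the form: blasr reads genome.fasta [-options]."""
    cmd = [
        "blasr", reads, genome,
        "-nproc", str(nproc),
        "-bestn", str(bestn),
        "-minPctIdentity", str(min_pct_identity),
    ]
    if sam:
        cmd.append("-sam")
    if out is not None:
        cmd += ["-out", out]
    return cmd


cmd = build_blasr_command(
    "reads.fasta", "genome.fasta", out="aln.sam", nproc=8, bestn=1, sam=True
)
print(shlex.join(cmd))
# To actually run it (requires blasr on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.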

III. Output formats

(a) blasr option: -m 0

Blast-like human-readable output with |'s connecting matched nucleotides.

(b) blasr option: -m 1

Space-delimited summary of alignments containing 13 fields:

qName tName qStrand tStrand score percentSimilarity tStart tEnd tLength qStart qEnd qLength nCells

(c) blasr option: -m 2

XML format.

(d) blasr option: -m 3

Vulgar format (deprecated).

(e) blasr option: -m 4

Space-delimited summary of alignments containing 13 fields:

qName tName score percentSimilarity qStrand qStart qEnd qLength tStrand tStart tEnd tLength mapQV
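Because the -m 4 output is plain space-delimited text, it is easy to load into a script. Here is a minimal Python sketch that maps one line onto the 13 field names above. `parse_m4_line` is a hypothetical helper and the sample line is made up for illustration, not real blasr output.

```python
from typing import Dict, Union

M4_FIELDS = [
    "qName", "tName", "score", "percentSimilarity", "qStrand",
    "qStart", "qEnd", "qLength", "tStrand", "tStart", "tEnd",
    "tLength", "mapQV",
]
M4_INT_FIELDS = {
    "score", "qStart", "qEnd", "qLength", "tStart", "tEnd", "tLength", "mapQV",
}


def parse_m4_line(line: str) -> Dict[str, Union[str, int, float]]:
    """Parse one space-delimited -m 4 line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(M4_FIELDS):
        raise ValueError(f"expected {len(M4_FIELDS)} fields, got {len(values)}")
    record: Dict[str, Union[str, int, float]] = dict(zip(M4_FIELDS, values))
    for name in M4_INT_FIELDS:
        record[name] = int(values[M4_FIELDS.index(name)])
    record["percentSimilarity"] = float(values[M4_FIELDS.index("percentSimilarity")])
    # Strand fields are kept as strings, since their encoding is not assumed here.
    return record


# Made-up example line, for illustration only.
sample = "m1/8/0_1200 chr1 -5400 87.5 0 10 1190 1200 0 5000 6150 100000 254"
rec = parse_m4_line(sample)
aligned_fraction = (rec["qEnd"] - rec["qStart"]) / rec["qLength"]
print(rec["qName"], rec["tName"], round(aligned_fraction, 3))
```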

(f) blasr option: -m 5

Space-delimited machine-parsable format containing 19 fields:

qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq
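The -m 5 format carries the per-alignment match, mismatch, insertion and deletion counts, so an identity value can be derived from it directly. The Python sketch below is one way to do that; `parse_m5_line` and `alignment_identity` are hypothetical helpers, the identity definition (matches divided by all alignment columns) is only one common convention, and the sample line is made up.

```python
from typing import Dict, Union

M5_FIELDS = [
    "qName", "qLength", "qStart", "qEnd", "qStrand",
    "tName", "tLength", "tStart", "tEnd", "tStrand",
    "score", "numMatch", "numMismatch", "numIns", "numDel", "mapQV",
    "qAlignedSeq", "matchPattern", "tAlignedSeq",
]
M5_INT_FIELDS = {
    "qLength", "qStart", "qEnd", "tLength", "tStart", "tEnd",
    "score", "numMatch", "numMismatch", "numIns", "numDel", "mapQV",
}


def parse_m5_line(line: str) -> Dict[str, Union[str, int]]:
    """Parse one space-delimited -m 5 line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(M5_FIELDS):
        raise ValueError(f"expected {len(M5_FIELDS)} fields, got {len(values)}")
    record: Dict[str, Union[str, int]] = {}
    for name, value in zip(M5_FIELDS, values):
        record[name] = int(value) if name in M5_INT_FIELDS else value
    return record


def alignment_identity(record: Dict[str, Union[str, int]]) -> float:
    """Matches divided by alignment columns (an assumed definition)."""
    columns = (
        record["numMatch"] + record["numMismatch"]
        + record["numIns"] + record["numDel"]
    )
    return record["numMatch"] / columns if columns else 0.0


# Made-up example line, for illustration only.
sample_m5 = (
    "m1/8/0_10 10 0 10 + chr1 1000 100 109 + -40 8 1 1 0 254 "
    "ACGTTACGTA ||||*|||*| ACGT-ACGAA"
)
m5 = parse_m5_line(sample_m5)
print(m5["qName"], alignment_identity(m5))
```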

(g) blasr option: -sam

SAM format. The tags in the SAM file are described below:

(1) "XS": 1 plus (first base of SEQ in 0 based coordinate of zmw unrolled polymerase read), inclusive, where SEQ is SAM mandatory field column 10.

(2) "XE": 1 plus (last base of SEQ in 0 based coordinate of zmw unrolled polymerase read), exclusive.

(3)"XL": number of aligned query bases

(4)"XQ": length of zmw unrolled polymerase read.

(5) "XT": number of continuous reads, always 1 for blasr.

(6) "YS": first base of query subread in 0 based coordinate of zmw unrolled polymerase read, inclusive. Subread names follow the pattern movie/zmw/YS_YE.

(7) "YE": last base of query subread in 0 based coordinate of zmw unrolled polymerase read, exclusive.

(8)"ZM": zmw number.
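These tags appear as optional TAG:TYPE:VALUE fields after the 11 mandatory SAM columns, so they can be pulled out with a few lines of Python. The sketch below is a minimal illustration; `parse_sam_tags` is a hypothetical helper, the SAM record is made up, and the SEQ-length check assumes no clipping has shortened SEQ.

```python
from typing import Dict, Union


def parse_sam_tags(sam_line: str) -> Dict[str, Union[str, int, float]]:
    """Return the optional tags of one SAM alignment line as a dict."""
    columns = sam_line.rstrip("\n").split("\t")
    tags: Dict[str, Union[str, int, float]] = {}
    for field in columns[11:]:  # optional fields start at column 12
        tag, value_type, value = field.split(":", 2)
        if value_type == "i":
            tags[tag] = int(value)
        elif value_type == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


# Made-up SAM record, for illustration only.
sam_line = "\t".join([
    "m1/8/0_10", "0", "chr1", "101", "254", "10M", "*", "0", "0",
    "ACGTACGTAC", "*",
    "XS:i:1", "XE:i:11", "XL:i:10", "XQ:i:10", "XT:i:1",
    "YS:i:0", "YE:i:10", "ZM:i:8",
])
tags = parse_sam_tags(sam_line)
# XS is inclusive and XE is exclusive, both offset by 1, so XE - XS is the
# number of read bases covered by SEQ.
seq_span = tags["XE"] - tags["XS"]
print(tags["ZM"], tags["YS"], tags["YE"], seq_span)
```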
