Introduction to blasr, an alignment tool for PacBio data

admin 2025-02-05

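Third-generation sequencing is becoming increasingly common, so this post introduces blasr, a tool for aligning third-generation sequencing reads. It resembles blast but differs in emphasis: blasr focuses on aligning long sequences along their whole length and tolerates a certain error rate, whereas blast favors stricter local alignment. The evolution of the various alignment tools is shown in the figure below:

Without further ado, here is how to use it.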

I. Main parameters

1. Input sequencing data files

  reads.fasta: a plain FASTA file, the most common choice; the following formats are also accepted:

  reads.bax.h5|reads.plx.h5

2. Alignment parameters

  -minMatch m (12)

    Minimum seed length. Higher minMatch will speed up alignment, but decrease sensitivity.

  -maxMatch l (inf)

    Stop mapping a read to the genome when the LCP length reaches l. This is useful when the query is part of the reference, for example when constructing pairwise alignments for de novo assembly.

  -maxLCPLength l (inf)

    The same as -maxMatch.

  -maxAnchorsPerPosition m (10000)

    Do not add anchors from a position if it matches to more than 'm' locations in the target.

  -advanceExactMatches E (0)

    Another trick for speeding up alignments by using fewer anchors. Rather than finding anchors between the read and the genome at every position in the read, when an anchor is found at position i in a read of length L, the next position in a read to find an anchor is at i+L-E. Use this when aligning already assembled contigs.

  -nCandidates n (10)

    Keep up to 'n' candidates for the best alignment. A large value of n will slow mapping because the slower dynamic programming steps are applied to more clusters of anchors, which can be a rate-limiting step when reads are very long.

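3. Other parameters

  -nproc N (1): number of CPUs to use

  -minPctIdentity p (0): minimum percent identity

  -minReadLength l (50): minimum read length required for alignment

  -minSubreadLength l (0)

    Do not align subreads of length less than l.

  -bestn n (10): number of best alignments to report

  -sam: write output in SAM format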

  -clipping [none|hard|subread|soft] (none)

    Use no/hard/subread/soft clipping for SAM output.

  -out out (terminal): output file name

  -unaligned file: write unaligned reads to this file

  -m t: output format

  If not printing SAM, modify the output of the alignment.

  t=0 Print blast-like output with |'s connecting matched nucleotides.

    1 Print only a summary: score and pos.

    2 Print in Compare.xml format.

    3 Print in vulgar format (deprecated).

    4 Print a longer tabular version of the alignment.

    5 Print in a machine-parsable format that is read by compareSequences.py.

II. Usage

blasr reads genome.fasta [-options]
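
As a rough illustration of the usage line above, the following Python sketch assembles a blasr command from the options described in this post and prints it without running anything. The helper name build_blasr_command and the file names reads.fasta, genome.fasta and aligned.sam are made up for the example, and the single-dash flag spelling simply follows the option list in this post; check the help output of your installed version before relying on it.

```python
import shlex


def build_blasr_command(reads, genome, out_path, nproc=1, bestn=10, sam=True):
    """Assemble a blasr argument list (hypothetical helper; nothing is executed)."""
    # Positional arguments follow the usage line: blasr reads genome.fasta [-options]
    cmd = ["blasr", reads, genome]
    if sam:
        # -sam switches the output to SAM format
        cmd.append("-sam")
    # Flags as listed in this post; spelling may differ between blasr versions
    cmd += ["-nproc", str(nproc), "-bestn", str(bestn), "-out", out_path]
    return cmd


# File names below are placeholders, not real data
cmd = build_blasr_command("reads.fasta", "genome.fasta", "aligned.sam", nproc=8, bestn=1)
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to hand to subprocess.run later without worrying about shell quoting.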

III. Output formats

(a) blasr option: -m 0

Blast-like human-readable output with |'s connecting matched nucleotides.

(b) blasr option: -m 1

Space-delimited summary of alignments (described as containing 11 fields, although 13 names are listed):

qName tName qStrand tStrand score percentSimilarity tStart tEnd tLength qStart qEnd qLength nCells

(c) blasr option: -m 2

XML format.

(d) blasr option: -m 3

Vulgar format (deprecated).

(e) blasr option: -m 4

Space-delimited summary of alignments containing 13 fields:

qName tName score percentSimilarity qStrand qStart qEnd qLength tStrand tStart tEnd tLength mapQV
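
To make the -m 4 layout concrete, here is a minimal Python sketch that splits one line into the 13 named fields listed above. The function name parse_m4_line and the sample line are invented for illustration (it is not real blasr output), and the choice of which columns to convert to numbers is an assumption; the strand columns are left as strings.

```python
# Field order as listed for -m 4 in this post
M4_FIELDS = [
    "qName", "tName", "score", "percentSimilarity", "qStrand",
    "qStart", "qEnd", "qLength", "tStrand", "tStart", "tEnd",
    "tLength", "mapQV",
]
# Assumption: these columns hold integers; strands stay as strings
M4_INT_FIELDS = ["score", "qStart", "qEnd", "qLength",
                 "tStart", "tEnd", "tLength", "mapQV"]


def parse_m4_line(line):
    """Turn one space-delimited -m 4 line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(M4_FIELDS):
        raise ValueError(f"expected {len(M4_FIELDS)} fields, got {len(values)}")
    record = dict(zip(M4_FIELDS, values))
    for name in M4_INT_FIELDS:
        record[name] = int(record[name])
    record["percentSimilarity"] = float(record["percentSimilarity"])
    return record


# Made-up example line, not taken from a real run
sample = "movie/1/0_500 chr1 -2300 92.5 0 0 500 500 0 1000 1500 50000 254"
rec = parse_m4_line(sample)
aligned_query_span = rec["qEnd"] - rec["qStart"]
print(rec["qName"], rec["tName"], aligned_query_span, rec["percentSimilarity"])
```

Checking the field count up front catches lines produced with a different -m setting before they are silently misread.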

(f) blasr option: -m 5

Space-delimited machine-parsable format containing 19 fields:

qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq
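
The -m 5 format carries match, mismatch and indel counts, so it is convenient for computing a per-alignment identity. The sketch below is one possible way to do that in Python; parse_m5_line and alignment_identity are hypothetical helper names, the sample line is made up, and defining identity as numMatch divided by the number of alignment columns is an assumption rather than blasr's own definition.

```python
# Field order as listed for -m 5 in this post
M5_FIELDS = [
    "qName", "qLength", "qStart", "qEnd", "qStrand",
    "tName", "tLength", "tStart", "tEnd", "tStrand",
    "score", "numMatch", "numMismatch", "numIns", "numDel",
    "mapQV", "qAlignedSeq", "matchPattern", "tAlignedSeq",
]
# Assumption: these columns hold integers; strands and sequences stay as strings
M5_INT_FIELDS = ["qLength", "qStart", "qEnd", "tLength", "tStart", "tEnd",
                 "score", "numMatch", "numMismatch", "numIns", "numDel", "mapQV"]


def parse_m5_line(line):
    """Turn one space-delimited -m 5 line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(M5_FIELDS):
        raise ValueError(f"expected {len(M5_FIELDS)} fields, got {len(values)}")
    record = dict(zip(M5_FIELDS, values))
    for name in M5_INT_FIELDS:
        record[name] = int(record[name])
    return record


def alignment_identity(record):
    """Matches divided by all alignment columns (one common definition)."""
    columns = (record["numMatch"] + record["numMismatch"]
               + record["numIns"] + record["numDel"])
    return record["numMatch"] / columns if columns else 0.0


# Made-up example line: 18 matches, 1 mismatch, 1 deletion
sample = ("movie/1/0_19 19 0 19 + chr1 1000 100 120 + -85 18 1 0 1 254 "
          "ACGTACGTAC-TACGTACGT ||||||||||*||||||||* ACGTACGTACGTACGTACGA")
rec = parse_m5_line(sample)
identity = alignment_identity(rec)
print(rec["qName"], rec["tName"], round(identity, 3))
```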

(g) blasr option: -sam

SAM format. The SAM tags are described below:

  (1) "XS": 1 plus (first base of SEQ in 0-based coordinate of zmw unrolled polymerase read), inclusive, where SEQ is SAM mandatory field column 10.

  (2) "XE": 1 plus (last base of SEQ in 0-based coordinate of zmw unrolled polymerase read), exclusive.

  (3) "XL": number of aligned query bases.

  (4) "XQ": length of zmw unrolled polymerase read.

  (5) "XT": number of continuous reads, always 1 for blasr.

  (6) "YS": first base of query subread in 0-based coordinate of zmw unrolled polymerase read, inclusive (read name format: movie/zmw/YS_YE).

  (7) "YE": last base of query subread in 0-based coordinate of zmw unrolled polymerase read, exclusive.

  (8) "ZM": zmw number.
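
The tags above appear in the optional columns of each SAM record, after the 11 mandatory fields, in the standard TAG:TYPE:VALUE form. The following Python sketch pulls them into a dict; parse_sam_tags is a hypothetical helper, the sample record is made up, and treating these tags as integer-typed (type code i) is an assumption about blasr's output.

```python
def parse_sam_tags(sam_line):
    """Collect optional TAG:TYPE:VALUE fields from one SAM alignment line."""
    columns = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Columns 1-11 are the mandatory SAM fields; optional tags start at column 12
    for field in columns[11:]:
        tag, type_code, value = field.split(":", 2)
        # Integer-typed tags are converted; everything else is kept as text
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Made-up SAM record; tag values are assumed for illustration only
sample = "\t".join([
    "movie/1/0_19", "0", "chr1", "101", "254", "19M", "*", "0", "0",
    "ACGTACGTACTACGTACGT", "*",
    "XS:i:1", "XE:i:20", "XL:i:19", "XQ:i:19", "ZM:i:1",
])
tags = parse_sam_tags(sample)
# XS is inclusive and XE is exclusive, so their difference is the SEQ span
seq_span = tags["XE"] - tags["XS"]
print(tags["ZM"], seq_span, tags["XL"])
```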

 
