MAKER2 Gene Annotation Pipeline

admin 14 2025-02-09

I. Development of gene annotation software

1. First-generation genome projects benefited greatly from large bodies of pre-existing knowledge. For D. melanogaster, C. elegans, and Homo sapiens, for example, hundreds of published gene models already existed and could be used to train and optimize gene prediction and annotation tools for each genome.

2. Second-generation projects rarely have access to such information, which limits their ability to train ab initio gene finders.

3. There are no objective standards with which to gauge annotation accuracy, so quality control is a significant issue for these projects.

II. MAKER1

III. MAKER2

1. MAKER2 integrates three gene prediction tools: Augustus, GeneMark-ES, and SNAP.

2. MAKER2 uses Annotation Edit Distance (AED) to evaluate the quality of genome annotations and to identify and prioritize problematic annotations.
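As a rough illustration of how AED can be used to prioritize problematic annotations, the sketch below reads mRNA features from GFF3 lines and lists them from worst to best AED. It assumes the AED is stored in an `_AED` attribute in column 9, as in MAKER's GFF3 output. The function names, the example records, and the 0.5 cutoff are hypothetical choices made for illustration only.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute string such as 'ID=a;_AED=0.10' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def rank_by_aed(gff3_lines, cutoff=0.5):
    """Return (id, aed) pairs for mRNAs with AED >= cutoff, worst first.

    The cutoff of 0.5 is an arbitrary illustrative value, not a MAKER2 default.
    """
    flagged = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only transcript-level features are considered here
        attrs = parse_attributes(cols[8])
        if "_AED" not in attrs:
            continue
        aed = float(attrs["_AED"])
        if aed >= cutoff:
            flagged.append((attrs.get("ID", "?"), aed))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up example records (not real MAKER2 output).
example = [
    "##gff-version 3",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1;_AED=0.05",
    "chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=mrna2;Parent=gene2;_AED=1.00",
    "chr1\tmaker\tmRNA\t4000\t4800\t.\t+\t.\tID=mrna3;Parent=gene3;_AED=0.62",
]
for mrna_id, aed in rank_by_aed(example):
    print(mrna_id, aed)
```

With these made-up records, `mrna2` and `mrna3` are flagged for review, while `mrna1` (AED 0.05) falls below the cutoff and is treated as well supported.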

IV. Features

1. Improving ab initio gene predictors

Even when given the same incorrect parameter files for comparison, MAKER2 can substantially improve the performance of ab initio gene predictors in situations where training data may be of poor quality.

2. MAKER2 vs. SNAP

(b) The Pfam domain content of SNAP-produced ab initio predictions compared to MAKER2-SNAP gene annotations for the L. humile (Argentine ant) genome.

(c) The Pfam domain content of SNAP ab initio gene predictions and MAKER2-SNAP annotations in the S. mediterranea genome.

3. Number of predicted genes:

Actual gene count: approximately 15,000 genes for S. mediterranea; approximately 17,000 for L. humile

SNAP alone: 63,622 and 420,224 gene predictions

MAKER2's supervised, SNAP-based annotation: 13,785 gene annotations for L. humile and 17,883 for S. mediterranea

V. The AED evaluation metric

1. Given a gene prediction i and a reference j,

  Sensitivity: SN = |i∩j|/|j|;

  Specificity: SP = |i∩j|/|i|;

2. Experimental evidence (RNA-seq, EST, and protein homology datasets) is used to approximate the reference. In SN = |i∩j|/|j|, the value |i∩j| is the number of nucleotides in a gene prediction overlapped by experimental evidence, and |j| is the total base pair count for the experimental evidence in that cluster.

3. Because we are not comparing to a high-quality reference, it is more correct to refer to the average of sensitivity and specificity as the congruency rather than the accuracy: C = (SN+SP)/2.

4. The incongruency, or distance between i and j, then becomes D = 1-C. A value of 0 indicates complete agreement of an annotation with the evidence, and values at or near 1 indicate disagreement or no evidence support.

5. Properties:

1) AED is similar to the sensitivity and specificity measures used to judge gene-finder performance, but it differs in that no reference gene-model is used. 

2) AED measures the distance between two annotations and makes no assumptions as to which one is the more correct. MAKER2 uses it as a means to quantify the congruency between a gene annotation and its supporting evidence: EST, protein, and mRNA-seq alignments.
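The formulas in this section can be checked with a small worked example. The sketch below is a simplified illustration, not MAKER2's own implementation: it assumes exons and evidence alignments are given as 1-based, inclusive (start, end) intervals on the same strand, and it expands them into sets of nucleotide positions so that |i∩j|, |i|, and |j| are plain set sizes. The function names and coordinates are made up.

```python
def covered_positions(intervals):
    """Expand 1-based, inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def aed(prediction, evidence):
    """Return (SN, SP, C, D) for a prediction i against pooled evidence j."""
    i = covered_positions(prediction)
    j = covered_positions(evidence)
    if not i or not j:
        # No prediction or no evidence: treated here as no support (D = 1).
        return 0.0, 0.0, 0.0, 1.0
    overlap = len(i & j)          # |i ∩ j|
    sn = overlap / len(j)         # SN = |i ∩ j| / |j|
    sp = overlap / len(i)         # SP = |i ∩ j| / |i|
    c = (sn + sp) / 2             # congruency
    return sn, sp, c, 1 - c       # D = 1 - C


# Hypothetical coordinates: a 200 nt prediction and 400 nt of evidence
# that share 100 nt.
prediction = [(101, 200), (301, 400)]
evidence = [(151, 200), (351, 700)]
sn, sp, c, d = aed(prediction, evidence)
print(sn, sp, c, d)  # 0.25 0.5 0.375 0.625
```

Here the prediction is only half supported (SP = 0.5) and covers a quarter of the evidence (SN = 0.25), so the distance D = 0.625 is closer to 1 than to 0. Expanding intervals into position sets is convenient for short examples, but interval arithmetic would be preferable for genome-scale data.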

VI. Correlation between domain content and AED

VII. Correlation between orthology and AED

VIII. Other applications

Re-annotation of existing genomes and legacy annotations

Gold curve: AED distribution of high-quality ‘gold standard’ annotations 

Red curve: 5a.59 

Blue curve: MAKER2’s first pass, de novo annotations 

Purple curve: automatic MAKER2-based update/revision of the Maize 4a.53 
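Curves like these can be compared numerically as well as visually. The sketch below assumes each curve is a cumulative distribution, that is, the fraction of annotations whose AED is at or below each threshold. The function name and both sets of AED values are invented for illustration and do not come from the maize data.

```python
def cumulative_fraction(aed_values, thresholds):
    """For each threshold x, return the fraction of annotations with AED <= x."""
    total = len(aed_values)
    if total == 0:
        raise ValueError("need at least one AED value")
    return [sum(1 for v in aed_values if v <= x) / total for x in thresholds]


# Made-up AED values for two hypothetical annotation sets.
well_supported = [0.0, 0.25, 0.5, 1.0]
poorly_supported = [0.5, 0.75, 1.0, 1.0]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

print(cumulative_fraction(well_supported, thresholds))
# [0.25, 0.5, 0.75, 0.75, 1.0]
print(cumulative_fraction(poorly_supported, thresholds))
# [0.0, 0.0, 0.25, 0.5, 1.0]
```

A curve that rises earlier, as in the first set, means more annotations sit at low AED and therefore agree better with the evidence.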

IX. Runtime

X. Conclusions

1. MAKER2 improves the performance of ab initio gene finders when training data are of poor quality.

2. By aligning evidence from ESTs, mRNA-seq, and protein homology, MAKER2 also provides a means of adding experimental data to new and existing annotation datasets for purposes of quality control.

3. MAKER2 can update and revise legacy annotation datasets automatically.

XI. References

Holt, C., & Yandell, M. (2011). MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics, 12, 491.
