I. Development of gene annotation software
1. The first generation of genome projects benefited greatly from large bodies of pre-existing knowledge. For D. melanogaster, C. elegans, and Homo sapiens, for example, hundreds of published gene models already existed and could be used to train and optimize gene prediction and annotation tools for each genome.
2. Second-generation projects rarely have access to such information, which limits their ability to train ab initio gene-finders.
3. There are no objective standards with which to gauge annotation accuracy. Quality control is thus a significant issue for these projects.
![MAKER2 gene annotation workflow](https://www.yanyin.tech/cms/manage/file/4f90ebd61543415ba313f0de20484636)
II. MAKER1
III. MAKER2
1. MAKER2 includes three gene prediction tools: Augustus, GeneMark-ES, and SNAP.
2. MAKER2 uses Annotation Edit Distance (AED) to evaluate the quality of genome annotations and to identify and prioritize problematic annotations.
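As a rough illustration of how AED can be used to prioritize problematic annotations, the sketch below ranks mRNA features from GFF3-style lines by their AED score. It assumes the score is stored in an `_AED` attribute on each mRNA feature; the function names, the 0.5 cutoff, and the example records are hypothetical and chosen only for illustration.

```python
def parse_attributes(column):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    pairs = (field.split("=", 1) for field in column.strip().split(";") if "=" in field)
    return {key: value for key, value in pairs}


def rank_by_aed(gff_lines, threshold=0.5):
    """Return (ID, AED) pairs with AED >= threshold, worst-supported first.

    The threshold of 0.5 is an arbitrary illustrative cutoff.
    """
    flagged = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "mRNA":
            continue  # only transcript-level features are scored here
        attributes = parse_attributes(columns[8])
        if "_AED" not in attributes:  # assumed attribute name
            continue
        score = float(attributes["_AED"])
        if score >= threshold:
            flagged.append((attributes.get("ID", "?"), score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up records, not real MAKER2 output.
example = [
    "##gff-version 3",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna-1;Parent=gene-1;_AED=0.08",
    "chr1\tmaker\tmRNA\t2000\t3500\t.\t-\t.\tID=mrna-2;Parent=gene-2;_AED=0.62",
    "chr1\tmaker\tmRNA\t5000\t5600\t.\t+\t.\tID=mrna-3;Parent=gene-3;_AED=1.00",
]
problematic = rank_by_aed(example)
print(problematic)
```

Annotations at the top of the returned list have the least evidence support and would be the first candidates for manual review.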
IV. Features
1. Improving ab initio gene-predictors
When the same incorrect parameter files are used for comparison, MAKER2 can substantially improve the performance of ab initio gene-predictors in situations where training data may be of poor quality.
2. MAKER2 vs. SNAP
(b) The Pfam domain content of SNAP ab initio predictions compared to MAKER2-SNAP gene annotations for the L. humile (Argentine ant) genome.
(c) The Pfam domain content of SNAP ab initio gene predictions and MAKER2-SNAP annotations in the S. mediterranea genome.
3. Predicted gene counts:
Actual gene counts: approximately 15,000 genes for S. mediterranea; approximately 17,000 for L. humile
SNAP alone: 63,622 and 420,224 gene predictions for L. humile and S. mediterranea, respectively
MAKER2’s supervised SNAP-based annotation: 13,785 gene annotations for L. humile and 17,883 for S. mediterranea
V. Evaluation metric: AED
1. Given a gene prediction i and a reference j,
Sensitivity: SN = |i∩j|/|j|;
Specificity: SP = |i∩j|/|i|;
2. Experimental evidence (RNA-seq, EST, and protein homology datasets) is used to approximate the reference. In SN = |i∩j|/|j|, the value |i∩j| represents the number of nucleotides in a gene prediction overlapped by experimental evidence, and |j| represents the total base pair count for the experimental evidence in that cluster.
3. Because we are not comparing to a high-quality reference, it is more correct to refer to the average of sensitivity and specificity as the congruency rather than the accuracy: C = (SN+SP)/2.
4. The incongruency, or distance between i and j, then becomes D = 1-C, with a value of 0 indicating complete agreement of an annotation with the evidence, and values at or near 1 indicating disagreement or no evidence support.
5. Characteristics:
1) AED is similar to the sensitivity and specificity measures used to judge gene-finder performance, but it differs in that no reference gene-model is used.
2) AED measures the distance between two annotations and makes no assumptions as to which one is more correct. In MAKER2 it is used as a means to quantify the congruency between a gene annotation and its supporting evidence: EST, protein, and mRNA-seq alignments.
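The definitions of SN, SP, C, and D above can be worked through on a toy example. The sketch below is a minimal illustration, not MAKER2's implementation: it assumes the prediction and the evidence are each given as lists of half-open (start, end) coordinates on the same strand, and all function names and coordinates are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Total number of bases covered by an interval set: |i| or |j|."""
    return sum(end - start for start, end in merge(intervals))


def overlap(a, b):
    """Number of bases shared by two interval sets: |i ∩ j|."""
    total = 0
    for a_start, a_end in merge(a):
        for b_start, b_end in merge(b):
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


def aed(prediction, evidence):
    """Distance D = 1 - (SN + SP) / 2 between a prediction and its evidence."""
    shared = overlap(prediction, evidence)
    if shared == 0:
        return 1.0  # no evidence support at all
    sn = shared / covered(evidence)    # SN = |i ∩ j| / |j|
    sp = shared / covered(prediction)  # SP = |i ∩ j| / |i|
    return 1 - (sn + sp) / 2


# Toy coordinates, invented for illustration.
prediction = [(100, 200), (300, 400)]  # two exons, 200 bp in total
evidence = [(150, 200), (300, 450)]    # aligned evidence, 200 bp in total
print(aed(prediction, evidence))       # 150 shared bases: SN = SP = 0.75, D = 0.25
```

Here 150 of the 200 predicted bases are supported and 150 of the 200 evidence bases are covered, so SN = SP = 0.75, C = 0.75, and D = 0.25.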
VI. Correlation between domain content and AED
VII. Correlation between orthology and AED
VIII. Other applications
Re-annotation of existing genomes and legacy annotations
Gold curve: AED distribution of high-quality ‘gold standard’ annotations
Red curve: the maize 5a.59 annotations
Blue curve: MAKER2’s first pass, de novo annotations
Purple curve: automatic MAKER2-based update/revision of the maize 4a.53 annotations
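Curves like these compare annotation sets by their AED distributions. As a hedged sketch of one way such a curve could be computed, the snippet below calculates the cumulative fraction of annotations at or below each AED threshold. The function name and the scores are made up for illustration and do not come from the maize datasets.

```python
def cumulative_fraction(aed_scores, thresholds):
    """Fraction of annotations with AED <= each threshold."""
    total = len(aed_scores)
    if total == 0:
        raise ValueError("at least one AED score is required")
    return [sum(1 for score in aed_scores if score <= t) / total for t in thresholds]


# Hypothetical AED scores for a small annotation set.
scores = [0.0, 0.1, 0.2, 0.4, 0.9]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = cumulative_fraction(scores, thresholds)
print(list(zip(thresholds, curve)))
```

A curve that rises quickly at low AED values indicates that most annotations in the set agree well with their evidence.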
IX. Run time
X. Conclusions
1. MAKER2 improves the performance of ab initio gene-finders when training data are of poor quality.
2. By aligning evidence from ESTs, mRNA-seq, and protein homology, MAKER2 also adds experimental evidence to new and existing annotation datasets for purposes of quality control.
3. MAKER2 can update and revise legacy annotation datasets automatically.
XI. References
Holt, C., & Yandell, M. (2011). MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics, 12, 491.