Protein-coding gene annotation protocol for the eastern banjo frog
Qiye Li, Qunfei Guo, Yang Zhou, Huishuang Tan, Terry Bertozzi, Yuanzhen Zhu, Ji Li, Stephen Donnellan, Guojie Zhang
Abstract
This protocol describes the protein-coding gene annotation of the Limnodynastes dumerilii dumerilii genome assembly, which combines homology-based and de novo predictions to build gene models.
Steps
Gene prediction preparation
Download protein sequences of diverse vertebrate species: Danio rerio (Ensembl, release-98), Xenopus tropicalis (Ensembl, release-98), Xenopus laevis (NCBI, GCF_001663975.1), Nanorana parkeri (NCBI, GCF_000935625.1), Microcaecilia unicolor (NCBI, GCF_901765095.1), Rhinatrema bivittatum (NCBI, GCF_901001135.1), Anolis carolinensis (Ensembl, release-98), Gallus gallus (Ensembl, release-98) and Homo sapiens (Ensembl, release-98).
Homology-based prediction
Align the protein sequences from the previous step to the Limnodynastes dumerilii dumerilii genome assembly using TBLASTN (blast-2.2.26).
Extract the genomic sequences of the candidate loci together with 5 kb flanking sequences.
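The flank extension in this step can be sketched as follows. This is a minimal illustration, not the authors' script: the `Locus` type, the function names, and the 1-based inclusive coordinate convention are assumptions made for the example. The only value taken from the protocol is the 5 kb flank. The sketch clamps the extended interval to the scaffold boundaries so that loci near scaffold ends do not produce invalid coordinates.

```python
from typing import NamedTuple

FLANK = 5_000  # 5 kb flanking sequence on each side, as stated in the protocol


class Locus(NamedTuple):
    """A candidate locus (hypothetical type; coordinates are 1-based, inclusive)."""
    scaffold: str
    start: int
    end: int


def add_flanks(locus: Locus, scaffold_length: int, flank: int = FLANK) -> Locus:
    """Extend a locus by `flank` bp on both sides, clamped to the scaffold ends."""
    start = max(1, locus.start - flank)
    end = min(scaffold_length, locus.end + flank)
    return Locus(locus.scaffold, start, end)


def extract(sequence: str, locus: Locus) -> str:
    """Return the subsequence covered by a 1-based inclusive locus."""
    return sequence[locus.start - 1:locus.end]
```

For example, a locus at positions 6,001 to 7,000 on a 20 kb scaffold would be extended to positions 1,001 to 12,000, while a locus close to a scaffold end would be truncated at that end.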
Align the homologous proteins to the genomic sequences extracted in the previous step using GeneWise (wise-2.2.0) to determine exon-intron structures.
De novo prediction
Randomly pick 1,000 homology-derived gene models of Limnodynastes dumerilii dumerilii with complete open reading frames (ORFs) and reciprocal alignment rates exceeding 90% against the X. tropicalis proteins to train AUGUSTUS (v3.3.1, RRID:SCR_008417).
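The selection of the training set can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the `GeneModel` fields are hypothetical, a "complete ORF" is taken to mean a coding sequence that starts with ATG, ends with a stop codon, has a length divisible by three, and has no internal stop codon, and the "reciprocal" criterion is interpreted as requiring the aligned fraction of both the frog gene model and the X. tropicalis protein to exceed 0.90.

```python
import random
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical record for one homology-derived gene model."""
    gene_id: str
    cds: str                        # coding sequence, 5' to 3'
    query_aligned_fraction: float   # fraction of the frog protein aligned
    target_aligned_fraction: float  # fraction of the X. tropicalis protein aligned


def has_complete_orf(cds: str) -> bool:
    """Assumed definition: ATG start, terminal stop, in frame, no internal stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


def pick_training_set(models, n=1000, min_rate=0.90, seed=None):
    """Randomly sample up to `n` models that pass both filters."""
    eligible = [
        m for m in models
        if has_complete_orf(m.cds)
        and m.query_aligned_fraction > min_rate
        and m.target_aligned_fraction > min_rate
    ]
    if len(eligible) <= n:
        return eligible
    return random.Random(seed).sample(eligible, n)
```

Passing a fixed `seed` makes the random selection reproducible, which may help when the training step needs to be repeated.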
Use the gene parameters obtained from AUGUSTUS training to predict protein-coding genes on the repeat-masked Limnodynastes dumerilii dumerilii genome assembly.
Gene model combination
Combine the gene models derived from the above two methods into a non-redundant gene set using a strategy similar to that of Xiong et al. (2016).
Gene filtration
From the combined gene set, remove genes that show BLASTP (blast-2.2.26) hits to transposon proteins in the UniProtKB/Swiss-Prot database (v2019_11) or that have more than 70% of their coding regions overlapping repetitive sequences.
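The repeat-overlap criterion can be sketched as follows. This is a minimal example, assuming that coding exons and repeat annotations for a single scaffold are available as lists of 1-based inclusive `(start, end)` intervals; the function names are hypothetical. Only the 70% threshold comes from the protocol. Intervals are merged first so that overlapping repeat annotations are not counted twice.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def overlap_bases(a, b):
    """Count bases shared by two sorted, merged interval lists."""
    total = 0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            total += hi - lo + 1
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def repeat_fraction(cds_exons, repeats):
    """Fraction of coding bases that fall inside repeat annotations."""
    cds = merge_intervals(cds_exons)
    rep = merge_intervals(repeats)
    cds_length = sum(end - start + 1 for start, end in cds)
    return overlap_bases(cds, rep) / cds_length if cds_length else 0.0


def is_repeat_derived(cds_exons, repeats, threshold=0.70):
    """True if more than `threshold` of the coding region overlaps repeats."""
    return repeat_fraction(cds_exons, repeats) > threshold
```

With these definitions, a gene whose coding region overlaps repeats over exactly 70% of its length is retained, because the protocol specifies "more than 70%".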
Functional annotation
Finally, perform functional annotation by searching the Limnodynastes dumerilii dumerilii proteins against the public databases UniProtKB/Swiss-Prot (v2019_11), NCBI nr (v20191030), and KEGG (v93.0) with BLASTP (blast-2.2.26).
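One way to turn the BLASTP results into per-gene annotations is to keep the best hit for each query protein, as sketched below. The protocol does not state the output format, the E-value cutoff, or the rule for choosing among multiple hits, so all three are assumptions here: the sketch expects the standard 12-column tabular BLAST output (query, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), uses an illustrative cutoff of 1e-5, and ranks hits by bit score.

```python
def parse_evalue(text):
    """Parse an E-value, tolerating the abbreviated form 'e-100'."""
    text = text.strip()
    if text.lower().startswith("e"):
        text = "1" + text
    return float(text)


def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} with the top-scoring hit per query.

    `max_evalue` is an illustrative cutoff, not a value from the protocol.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        query, subject = fields[0], fields[1]
        evalue = parse_evalue(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The same function can be applied separately to the results from each database, so that every protein receives at most one annotation per database.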