Protein-coding gene annotation protocol for the eastern banjo frog

Qiye Li, Qunfei Guo, Yang Zhou, Huishuang Tan, Terry Bertozzi, Yuanzhen Zhu, Ji Li, Stephen Donnellan, Guojie Zhang

Published: 2023-02-27 DOI: 10.17504/protocols.io.bc38iyrw

Abstract

This protocol describes protein-coding gene annotation of the Limnodynastes dumerilii dumerilii genome assembly. Gene models are built by combining homology-based and de novo predictions.

Steps

Gene prediction preparation

1.

Download protein sequences of diverse vertebrate species: Danio rerio (Ensembl, release-98), Xenopus tropicalis (Ensembl, release-98), Xenopus laevis (NCBI, GCF_001663975.1), Nanorana parkeri (NCBI, GCF_000935625.1), Microcaecilia unicolor (NCBI, GCF_901765095.1), Rhinatrema bivittatum (NCBI, GCF_901001135.1), Anolis carolinensis (Ensembl, release-98), Gallus gallus (Ensembl, release-98) and Homo sapiens (Ensembl, release-98).

Homology-based prediction

2.

Align the protein sequences from the previous step to the Limnodynastes dumerilii dumerilii genome assembly by using TBLASTN (blast-2.2.26).

Extract the genomic sequences of the candidate loci together with 5 kb flanking sequences.

Align the homologous proteins to the genomic sequences extracted in the previous step by using GeneWise (wise-2.2.0) to determine exon-intron structures.
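The locus extraction with 5 kb flanks can be sketched as below. This is a minimal illustration, not the authors' script: the function names, the 1-based inclusive coordinate convention and the clamping at scaffold ends are assumptions.

```python
def flanked_window(start, end, scaffold_length, flank=5000):
    """Extend a 1-based inclusive locus by `flank` bp on each side.

    The window is clamped to the scaffold so it never runs past either end
    (an assumption; the protocol does not say how scaffold edges are handled).
    """
    if not (1 <= start <= end <= scaffold_length):
        raise ValueError("expected 1 <= start <= end <= scaffold_length")
    return max(1, start - flank), min(scaffold_length, end + flank)


def extract_locus(scaffold_seq, start, end, flank=5000):
    """Return the subsequence for a candidate locus plus its flanks."""
    left, right = flanked_window(start, end, len(scaffold_seq), flank)
    # Convert the 1-based inclusive window to a Python slice.
    return scaffold_seq[left - 1:right]
```

For example, a TBLASTN hit spanning positions 6001 to 7000 on a 20,000 bp scaffold would be extracted as positions 1001 to 12000.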

de novo prediction

3.

Randomly pick 1,000 homology-derived gene models of Limnodynastes dumerilii dumerilii that have complete open reading frames (ORFs) and reciprocal aligning rates exceeding 90% against the X. tropicalis proteins, and use them to train AUGUSTUS (v3.3.1, RRID:SCR_008417).

Use the gene parameters obtained from AUGUSTUS training to predict protein-coding genes on the repeat-masked Limnodynastes dumerilii dumerilii genome assembly.
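Selecting the training set could look roughly like the sketch below. It rests on several assumptions that the protocol does not spell out: a "complete ORF" is taken to mean a coding sequence that starts with ATG, ends with a stop codon, has a length divisible by three and contains no internal stop; "reciprocal aligning rates" are taken to mean the aligned fraction of both the frog model (`query_rate`) and the X. tropicalis protein (`target_rate`). The dictionary keys and function names are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_complete_orf(cds):
    """True if the CDS has a start codon, a terminal stop and no internal stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def pick_training_set(models, n=1000, min_rate=0.90, seed=0):
    """Randomly sample up to `n` gene models that pass both filters.

    Each model is a dict with the hypothetical keys "cds", "query_rate"
    and "target_rate". A fixed seed keeps the sample reproducible.
    """
    eligible = [
        m for m in models
        if has_complete_orf(m["cds"])
        and m["query_rate"] > min_rate
        and m["target_rate"] > min_rate
    ]
    if len(eligible) <= n:
        return eligible
    return random.Random(seed).sample(eligible, n)
```

Fixing the random seed is a design choice made here for reproducibility; the protocol only states that the models are picked randomly.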

Gene model combination

4.

Combine the gene models derived from the above two methods into a non-redundant gene set using a strategy similar to that of Xiong et al. (2016).

Note
Xiong et al. (2016): Xiong Z, Li F, Li Q, Zhou L, Gamble T, Zheng J, et al. Draft genome of the leopard gecko, Eublepharis macularius. GigaScience. 2016;5(1):47. doi:10.1186/s13742-016-0151-4.

Gene filtration

5.

From the combined gene set, remove genes that show BLASTP (blast-2.2.26) hits to transposon proteins in the UniProtKB/Swiss-Prot database (v2019_11), as well as genes with more than 70% of their coding regions overlapping repetitive sequences.

Note
Use the parameters "-F F -e 1e-5" when running BLASTP.
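The 70% repeat-overlap criterion can be illustrated with the sketch below. It assumes 1-based inclusive coordinates on a single scaffold, measures the overlap in base pairs over the merged coding exons, and treats "more than 70%" as a strict inequality. The function names are hypothetical and this is not the authors' implementation.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def repeat_overlap_fraction(cds_exons, repeats):
    """Fraction of coding bases covered by at least one repeat annotation."""
    cds = merge_intervals(cds_exons)
    rep = merge_intervals(repeats)
    total = sum(end - start + 1 for start, end in cds)
    covered = 0
    for c_start, c_end in cds:
        for r_start, r_end in rep:
            lo, hi = max(c_start, r_start), min(c_end, r_end)
            if lo <= hi:
                covered += hi - lo + 1
    return covered / total if total else 0.0


def is_repeat_derived(cds_exons, repeats, threshold=0.70):
    """True if more than `threshold` of the coding region overlaps repeats."""
    return repeat_overlap_fraction(cds_exons, repeats) > threshold
```

Merging the repeat intervals first avoids counting a coding base twice when repeat annotations overlap each other.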

Functional annotation

6.

Finally, perform functional annotation by searching the Limnodynastes dumerilii dumerilii proteins against the public databases UniProtKB/Swiss-Prot (v2019_11), NCBI nr (v20191030), and KEGG (v93.0) with BLASTP (blast-2.2.26).

Note
Use the parameters "-F F -e 1e-5" when running BLASTP.
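One way to turn the BLASTP results into per-gene annotations is to keep the top-scoring hit for each protein, as sketched below. This assumes the search was written in the 12-column tabular format (for example, the "-m 8" output of the legacy blastall program) and that the best hit is chosen by bit score. The protocol does not state how hits are selected, so both points are assumptions and the function name is hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bit_score)} for the top hit per query.

    Expects 12 tab-separated columns per line: query, subject, % identity,
    alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end,
    e-value, bit score. Comment and blank lines are skipped.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bit_score = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        # Keep the hit with the highest bit score seen so far for this query.
        if query not in best or bit_score > best[query][2]:
            best[query] = (subject, evalue, bit_score)
    return best
```

The `max_evalue` default mirrors the "-e 1e-5" cutoff in the note above, so hits are filtered again here in case the search was run with a looser threshold.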
