Phylogenetic Analysis of Complete Bovine Coronavirus Genome Sequences
Aspen M Workman, Gregory P Harhay, Tara G. McDaneld, Subha Das, John Dustin Loy, Benjamin M. Hause
Disclaimer
•The findings and conclusions in this presentation are those of the authors and should not be construed to represent any official USDA or U.S. Government determination or policy.
•Mention of trade names or commercial products in this presentation is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA.
•USDA is an equal opportunity provider and employer
Abstract
Bovine coronavirus (BCoV) has spilled over to many species, including humans, where the host range variant coronavirus OC43 is endemic. Balance of the opposing activities of the surface spike (S) and hemagglutinin esterase (HE) glycoproteins control virion avidity which is critical for interspecies transmission and host adaptation. Here, 78 genomes were sequenced directly from clinical samples collected between 2013 and 2022 from cattle in 12 states, primarily in the Midwestern U.S. Relatively little genetic diversity was observed, with genomes having >98% nucleotide identity. Eleven isolates collected between 2020 and 2022 from four states (Nebraska, Colorado, California, and Wisconsin) contained a 12-nucleotide insertion in the receptor-binding domain (RBD) of the HE gene identical to one recently reported in China, and a single genome from Nebraska collected in 2020 contained a novel 12-nucleotide deletion in the HE gene RBD. Isogenic HE proteins containing either the insertion or deletion in the HE RBD maintained esterase activity and the ability to bind bovine submaxillary mucin, a substrate enriched in the receptor 9- O -acetylated-sialic acid, despite modeling that predicted structural changes in the HE R3 loop critical for receptor binding. The emergence of BCoV with structural variants in the RBD raises the possibility of further interspecies transmission.
Steps
Software
Software
Value | Label |
---|---|
MAFFT | NAME |
Kazutaka Katoh | DEVELOPER |
https://mafft.cbrc.jp/alignment/software/ | LINK |
7.450 | VERSION |
Software
Value | Label |
---|---|
RAxML | NAME |
UBUNTU Linux ` | OS_NAME |
20.4 | OS_VERSION |
https://github.com/stamatak/standard-RAxML | REPOSITORY |
Alexandros Stamatakis | DEVELOPER |
https://cme.h-its.org/exelixis/web/software/raxml/ | LINK |
8.2.12 | VERSION |
MAFFT Alignment
Align using MAFFT v7.450
The sequences were aligned using the MAFFT accuracy methods ---globalpair as defined in the mafft manual page below
Keep in mind ...
- The commands below generate files that contain both STDOUT (mafft output to the "terminal") as well as the multifasta alignment file (.afa). The STDOUT is found at the beginning of the file and the alignments follow the STDOUT
- All .afa_out files were split in a text editor into two separate .out and .afa files for downstream analysis.
- The input FASTA file headers contain description information; this header information should be stripped out to facilitate readable, compact leaf names/identifiers. Use the following command to strip out the description text.
#sed to strip out the description in the header from FASTA (UBUNTU Linux 20.4)
sed '/^>/ s/ .*//' input_alignment_multifasta_file.afa > output_alignment_multifasta_file.SEQID_Only.afa
```thethe
The input FASTA files are:
192_Spike_6_14_22.fasta 192_HE_6_14_22.fasta 192_Genomes_Final_6_14_22.fasta
Align 192 genomes using the default PAM (JTT) 200 substitution matrix
nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 200 --reorder 192_Genomes_Final_6_14_22.fasta > 192_Genomes_Final_6_14_22.jtt_200.globalpair.afa_out &
```Outputs : [192_Genomes_Final_6_14_22.jtt_200.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkgjrq63.afa_out)
This was split into [192_Genomes_Final_6_14_22.jtt_200.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkijrq615.afa) &[192_Genomes_Final_6_14_22.jtt_200.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkjjrq64.out)
Use sed command to strip out the description from the header in FASTA
Create SEQID_Only file [192_Genomes_Final_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkkjrq625.afa)
Align 192 BCV genomes using the PAM (JTT) 100 substitution matrix
nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 100 --reorder 192_Genomes_Final_6_14_22.fasta > 192_Genomes_Final_6_14_22.jtt_100.globalpair.afa_out &
```Outputs : [192_Genomes_Final_6_14_22.jtt_100.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5rjrq630.afa_out)
This was split into [192_Genomes_Final_6_14_22.jtt_100.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5sjrq68.afa) &[192_Genomes_Final_6_14_22.jtt_100.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5tjrq619.out)
Use sed command to strip out the description from the header in FASTA
Create SEQID_Only file [192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5ujrq628.afa)
Align SPIKE using the default PAM (JTT) 200 substitution matrix
nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 200 --reorder 192_Spike_6_14_22.fasta > 192_Spike_6_14_22.jtt_200.globalpair.afa_out &
```Outputs : [192_Spike_6_14_22.jtt_200.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5vjrq626.afa_out)
This was split into[192_Spike_6_14_22.jtt_200.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5xjrq616.afa) & [192_Spike_6_14_22.jtt_200.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5yjrq65.out)
Use sed command to strip out the description from the header in FASTA[192_Spike_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6qjrq627.afa)
Create SEQID_Only file [192_Spike_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6rjrq66.afa)
Align SPIKE using the PAM (JTT) 100 substitution matrix
nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 100 --reorder 192_Spike_6_14_22.fasta > 192_Spike_6_14_22.jtt_100.globalpair.afa_out &
```Outputs : [192_Spike_6_14_22.jtt_100.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if54jrq620.afa_out)
This was split into[192_Spike_6_14_22.jtt_100.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if55jrq69.afa) & [192_Spike_6_14_22.jtt_100.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if56jrq633.out)
Use sed command to strip out the description from the header in FASTA
Create SEQID_Only file [192_Spike_6_14_22.jtt_100.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6sjrq629.afa)
Align HE using the default PAM (JTT) 200 substitution matrix
nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 200 --reorder 192_HE_6_14_22.fasta > 192_HE_6_14_22.jtt_200.globalpair.afa_out &
```Outputs : [192_HE_6_14_22.jtt_200.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if62jrq617.afa_out)
This was split into [192_HE_6_14_22.jtt_200.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if65jrq627.afa) & [192_HE_6_14_22.jtt_200.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if66jrq618.out)
Use sed command to strip out the description from the header in FASTA
Create SEQID_Only file [192_HE_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if68jrq67.afa)
Align HE using the default PAM (JTT) 100 substitution matrix
nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 100 --reorder 192_HE_6_14_22.fasta > 192_HE_6_14_22.jtt_100.globalpair.afa_out &
```Outputs : [192_HE_6_14_22.jtt_100.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if58jrq621.afa_out)
This was split into: [192_HE_6_14_22.jtt_100.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if59jrq610.afa) & [192_HE_6_14_22.jtt_100.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6ajrq623.out)
Use sed command to strip out the description from the header in FASTA
Create SEQID_Only file[192_HE_6_14_22.jtt_100.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6bjrq611.afa)
Tree Building
RaXML Tee Building
Notes below are cut from the RaXML manual to put the command line parameters used in context
raxmlHPC reports 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.afa has 3438 DNA alignment patterns
Given the directions above from the RaXML manual, use 8 threads for raxmlHPC tree building RaXML manual, use 8 threads for raxmlHPC tree building
nohup raxmlHPC -T 8 -f a -x 12345 -p 12345 -N 2000 -m GTRGAMMA -s 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.afa -n 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000 > 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000.out &
```end
**Output :**
[RAxML_bootstrap.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6tjrq613.N2000) [192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6ujrq62.out)
[RAxML_info.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6vjrq622.N2000)
[RAxML_bipartitionsBranchLabels.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6wjrq625.N2000) [RAxML_bipartitions.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6xjrq614.N2000) [RAxML_bestTree.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6yjrq612.N2000)
Use RAxML_bipartitionsBranchLabels.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000 file to create phylogenetic tree with bootstrap support values provided as branch labels.
Visualize this tree of [192 Bovine coronavirus genomes, MAFFT Global Pair Alignment, PAM 100](https://itol.embl.de/tree/162219193109347161657293088) in the [Interactive Tree of Life](https://itol.embl.de/)
<img src="https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/iq66jrq61.jpg" alt="RAxML tree of 192 bovine coronavirus genomes. Bootstrap support values are proportional to the line thickness of the branches. Isolates with HE deletions are denoted with a filled red square while the single isolate with an HE insertion is denoted with a filled blue square. Tree metrics are best investigated in the IToL tree." loading="lazy" title="RAxML tree of 192 bovine coronavirus genomes. Bootstrap support values are proportional to the line thickness of the branches. Isolates with HE deletions are denoted with a filled red square while the single isolate with an HE insertion is denoted with a filled blue square. Tree metrics are best investigated in the IToL tree."/>
The pdf suitable for download is [RaXML tree of 192 livestock coronavirus genomes.pdf](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igmkjrq624.pdf)