Phylogenetic Analysis of Complete Bovine Coronavirus Genome Sequences

Aspen M Workman, Gregory P Harhay, Tara G. McDaneld, Subha Das, John Dustin Loy, Benjamin M. Hause

Published: 2022-08-03 DOI: 10.17504/protocols.io.kqdg3pyeql25/v1

Disclaimer

•The findings and conclusions in this presentation are those of the authors and should not be construed to represent any official USDA or U.S. Government determination or policy.

•Mention of trade names or commercial products in this presentation is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA.

•USDA is an equal opportunity provider and employer

Abstract

Bovine coronavirus (BCoV) has spilled over to many species, including humans, where the host range variant coronavirus OC43 is endemic. Balance of the opposing activities of the surface spike (S) and hemagglutinin esterase (HE) glycoproteins control virion avidity which is critical for interspecies transmission and host adaptation.  Here, 78 genomes were sequenced directly from clinical samples collected between 2013 and 2022 from cattle in 12 states, primarily in the Midwestern U.S.  Relatively little genetic diversity was observed, with genomes having >98% nucleotide identity. Eleven isolates collected between 2020 and 2022 from four states (Nebraska, Colorado, California, and Wisconsin) contained a 12-nucleotide insertion in the receptor-binding domain (RBD) of the HE gene identical to one recently reported in China, and a single genome from Nebraska collected in 2020 contained a novel 12-nucleotide deletion in the HE gene RBD. Isogenic HE proteins containing either the insertion or deletion in the HE RBD maintained esterase activity and the ability to bind bovine submaxillary mucin, a substrate enriched in the receptor 9- O -acetylated-sialic acid, despite modeling that predicted structural changes in the HE R3 loop critical for receptor binding.  The emergence of BCoV with structural variants in the RBD raises the possibility of further interspecies transmission.

Steps

Software

1.

Software

ValueLabel
MAFFTNAME
Kazutaka KatohDEVELOPER
https://mafft.cbrc.jp/alignment/software/LINK
7.450VERSION
2.

Software

ValueLabel
RAxMLNAME
UBUNTU Linux `OS_NAME
20.4OS_VERSION
https://github.com/stamatak/standard-RAxMLREPOSITORY
Alexandros StamatakisDEVELOPER
https://cme.h-its.org/exelixis/web/software/raxml/LINK
8.2.12VERSION

MAFFT Alignment

3.

Align using MAFFT v7.450

The sequences were aligned using the MAFFT accuracy methods ---globalpair as defined in the mafft manual page below

Note
DESCRIPTION MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods.Accuracy-oriented methods: • L-INS-i (probably most accurate; recommended for <200 sequences; iterative refinement method incorporating local pairwise alignment information):mafft --localpair --maxiterate 1000input [> output ]linsi input [> output ]• G-INS-i (suitable for sequences of similar lengths; recommended for <200 sequences; iterative refinement method incorporating global pairwise alignmentinformation):mafft--globalpair--maxiterate 1000input [> output ]ginsi input [> output ]

Keep in mind ...

  1. The commands below generate files that contain both STDOUT (mafft output to the "terminal") as well as the multifasta alignment file (.afa). The STDOUT is found at the beginning of the file and the alignments follow the STDOUT
  2. All .afa_out files were split in a text editor into two separate .out and .afa files for downstream analysis.
  3. The input FASTA file headers contain description information; this header information should be stripped out to facilitate readable, compact leaf names/identifiers. Use the following command to strip out the description text.
#sed to strip out the description in the header from FASTA (UBUNTU Linux 20.4)
sed '/^>/ s/ .*//'  input_alignment_multifasta_file.afa  >  output_alignment_multifasta_file.SEQID_Only.afa
```thethe



3.2.

Align 192 genomes using the default PAM (JTT) 200 substitution matrix

nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 200 --reorder 192_Genomes_Final_6_14_22.fasta > 192_Genomes_Final_6_14_22.jtt_200.globalpair.afa_out &
```Outputs :  [192_Genomes_Final_6_14_22.jtt_200.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkgjrq63.afa_out) 







This  was split into  [192_Genomes_Final_6_14_22.jtt_200.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkijrq615.afa) &[192_Genomes_Final_6_14_22.jtt_200.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkjjrq64.out) 





 

Use sed command to strip out the description from the header in FASTA





 Create SEQID_Only file [192_Genomes_Final_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igkkjrq625.afa) 











3.3.

Align 192 BCV genomes using the PAM (JTT) 100 substitution matrix

nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 100 --reorder 192_Genomes_Final_6_14_22.fasta > 192_Genomes_Final_6_14_22.jtt_100.globalpair.afa_out &
```Outputs :  [192_Genomes_Final_6_14_22.jtt_100.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5rjrq630.afa_out) 







This  was split into  [192_Genomes_Final_6_14_22.jtt_100.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5sjrq68.afa) &[192_Genomes_Final_6_14_22.jtt_100.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5tjrq619.out) 





 

Use sed command to strip out the description from the header in FASTA





 Create SEQID_Only file [192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5ujrq628.afa) 





3.4.

Align SPIKE using the default PAM (JTT) 200 substitution matrix

nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 200 --reorder 192_Spike_6_14_22.fasta > 192_Spike_6_14_22.jtt_200.globalpair.afa_out &
```Outputs :  [192_Spike_6_14_22.jtt_200.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5vjrq626.afa_out) 







This  was split into[192_Spike_6_14_22.jtt_200.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5xjrq616.afa)  & [192_Spike_6_14_22.jtt_200.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if5yjrq65.out) 







Use sed command to strip out the description from the header in FASTA[192_Spike_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6qjrq627.afa) 





 Create SEQID_Only file [192_Spike_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6rjrq66.afa) 





3.5.

Align SPIKE using the PAM (JTT) 100 substitution matrix

nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 100 --reorder 192_Spike_6_14_22.fasta > 192_Spike_6_14_22.jtt_100.globalpair.afa_out &
```Outputs :  [192_Spike_6_14_22.jtt_100.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if54jrq620.afa_out) 







This  was split into[192_Spike_6_14_22.jtt_100.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if55jrq69.afa) & [192_Spike_6_14_22.jtt_100.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if56jrq633.out) 







Use sed command to strip out the description from the header in FASTA



 Create SEQID_Only file [192_Spike_6_14_22.jtt_100.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6sjrq629.afa) 





3.6.

Align HE using the default PAM (JTT) 200 substitution matrix

nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 200 --reorder 192_HE_6_14_22.fasta > 192_HE_6_14_22.jtt_200.globalpair.afa_out &
```Outputs :  [192_HE_6_14_22.jtt_200.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if62jrq617.afa_out) 





This  was split into [192_HE_6_14_22.jtt_200.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if65jrq627.afa)  & [192_HE_6_14_22.jtt_200.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if66jrq618.out) 







Use sed command to strip out the description from the header in FASTA



Create SEQID_Only file [192_HE_6_14_22.jtt_200.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if68jrq67.afa) 





3.7.

Align HE using the default PAM (JTT) 100 substitution matrix

nohup mafft --thread 15 --globalpair --maxiterate 1000 --jtt 100 --reorder 192_HE_6_14_22.fasta > 192_HE_6_14_22.jtt_100.globalpair.afa_out &
```Outputs :  [192_HE_6_14_22.jtt_100.globalpair.afa_out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if58jrq621.afa_out) 







This  was split into: [192_HE_6_14_22.jtt_100.globalpair.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if59jrq610.afa)  & [192_HE_6_14_22.jtt_100.globalpair.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6ajrq623.out) 





Use sed command to strip out the description from the header in FASTA





 Create SEQID_Only file[192_HE_6_14_22.jtt_100.globalpair.SEQID_Only.afa](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6bjrq611.afa) 





Tree Building

4.

RaXML Tee Building

Notes below are cut from the RaXML manual to put the command line parameters used in context

Note
"This is a how-to, which describes how RAxML should best be used for a simple real-world biological analysis, given an example alignment named ex_al"Now, if you want to run a full analysis, i.e., BS and ML search type:raxmlHPC -­f a ­-x 12345 -­p 12345 ­-# 100 ­m GTRGAMMA ­-s ex_al -­n TESTThis will first conduct a BS search and once that is done a search for the best–scoring ML tree.Such a program run will return the bootstrapped trees (RAxML_bootstrap.TEST), the best scoring ML tree(RAxML_bestTree.TEST), and the BS support values drawn on the best-scoring tree as node labels (RAxML_bipartitions.TEST) as well as, more correctly since support values refer to branches as branch labels (RAxML_bipartitionsBranchLabels.TEST).Finally, note that by increasing the number of BS replicates via -# you will also make the ML thorough since for ML optimization every 5th BS tree is used as a starting point tofor ML trees.

raxmlHPC reports 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.afa has 3438 DNA alignment patterns

Note
Thus, if you run RAxML with 32 instead of 1 thread this does not mean that it will automatically become 32 times faster, it may actually even become slower. As I already mentioned, the parallel efficiency, that is, with how many threads/cores you can still execute it efficiently in parallel depends on the alignment length, or to be more precise on the number of distinct patterns in your alignment. This number is printed by RAxML to the terminal and into the RAxML_info.runID file" and look like this:Alignment has 70 distinct alignment patterns As a rule of thumbI'd use one core/thread per 500 DNA site patterns , i.e., if you have less, than it's probably better to just use the sequential version. Single-gene DNA alignments with around 1000 sites can be analyzed with 2 or at most 4 threads. Thus, the more patterns your alignment has, the more threads/cores you can use efficiently.

Given the directions above from the RaXML manual, use 8 threads for raxmlHPC tree building RaXML manual, use 8 threads for raxmlHPC tree building

nohup raxmlHPC -T 8 -f a -x 12345 -p 12345 -N 2000 -m GTRGAMMA -s 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.afa -n 192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000 >  192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000.out &
```end

 **Output :** 





[RAxML_bootstrap.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6tjrq613.N2000) [192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000.out](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6ujrq62.out) 



[RAxML_info.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6vjrq622.N2000) 



[RAxML_bipartitionsBranchLabels.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6wjrq625.N2000) [RAxML_bipartitions.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6xjrq614.N2000) [RAxML_bestTree.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/if6yjrq612.N2000) 





Use RAxML_bipartitionsBranchLabels.192_Genomes_Final_6_14_22.jtt_100.globalpair.SEQID_Only.T8.N2000 file to create phylogenetic tree with bootstrap support values provided as branch labels.





Visualize this tree of [192 Bovine coronavirus genomes, MAFFT Global Pair Alignment, PAM 100](https://itol.embl.de/tree/162219193109347161657293088) in the [Interactive Tree of Life](https://itol.embl.de/) 







<img src="https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/iq66jrq61.jpg" alt="RAxML tree of 192 bovine coronavirus genomes. Bootstrap support values are proportional to the line thickness of the branches. Isolates with HE deletions are denoted with a filled red square while the single isolate with an HE insertion is denoted with a filled blue square.  Tree metrics are best investigated in the IToL tree." loading="lazy" title="RAxML tree of 192 bovine coronavirus genomes. Bootstrap support values are proportional to the line thickness of the branches. Isolates with HE deletions are denoted with a filled red square while the single isolate with an HE insertion is denoted with a filled blue square.  Tree metrics are best investigated in the IToL tree."/>

  

The pdf suitable for download is  [RaXML tree of 192 livestock coronavirus genomes.pdf](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.kqdg3pyeql25/igmkjrq624.pdf) 









推荐阅读

Nature Protocols
Protocols IO
Current Protocols
扫码咨询