Introduction
The year 2020 will be remembered mainly for COVID‐19. Whether it is the only such year depends on the production of appropriate vaccines to protect future populations from outbreaks of this virus in years to come. As I write this, governments worldwide have asked that all our resources in medicine, industry, transportation, finance, and other fields be directed according to the best policy for containing this pandemic. One thing our phylogenetic community can provide is the best possible understanding of the origin and evolution of SARS viruses.
Common knowledge
As anyone who follows international news can tell you, coronaviruses that evolved in animals have recently been transferred to humans. The zoological source of these viruses is important to fighting this pandemic or others in the future. There is already substantial study of SARS (sometimes called SARS‐CoV, or SARS‐1) that caused an East Asian epidemic 20 years ago. The current pandemic of COVID‐19 is also called SARS‐2, or SARS‐CoV‐2 because these viruses are related. More peripherally, MERS is a similar disease that appears occasionally in the Middle East, but also spread in South Korea, with cases elsewhere in Asia, North America, and Europe.
Origin of SARS and MERS coronaviruses
The National Institutes of Health (2020), the highest medical authority of the USA, writes:
“In the first scenario, as the new coronavirus evolved in its natural hosts, possibly bats or pangolins, its spike proteins mutated to bind to molecules similar in structure to the human ACE2 protein, thereby enabling it to infect human cells. This scenario seems to fit other recent outbreaks of coronavirus‐caused disease in humans, such as SARS, which arose from cat‐like civets; and Middle East respiratory syndrome (MERS), which arose from camels.” (National Institutes of Health, 2020.)
This statement identifies pangolins and civets as serving as links between viruses that do not infect humans and the epidemics that do. Sun et al. (2020) concur:
Subsequent studies found that SARS‐CoV originated from bats and interspecies transmission to humans took place via an intermediate host: Himalayan palm civets (Paguma larvata) or raccoon dogs (Nyctereutes procyonoides).
The civet is again determined to be an intermediate. To demonstrate the broad acceptance of this connection, the crowd‐sourced pages of Wikipedia (2020) state:
In 2002–03, civets sold for meat in local markets of China's Yunnan province carried the SARS virus from horseshoe bats to humans. The resulting viral outbreak killed 774 people in 2002–2003.
Readers are invited to search the internet and find their own authorities repeating the connection through civets and pangolins. But, do the actual analyses provide adequate basis for concluding that civets and pangolins served as the origin for human SARS viruses?
No valid root for the tree
In a few cases, we can be fairly certain of the direction of transmission between species, such as with the lions and tigers in the Bronx zoo that must have contracted SARS‐CoV‐2 from humans (Wildlife Conservation Society, 2020). In most situations, we will need an evolutionary tree of viral samples to reconstruct a history of viruses and their host relationships. The point of using a tree to interpret evolutionary patterns is to trace characters of interest back in time, toward the root of the tree. This means the position of the root in the network may be especially meaningful. Without a rigorous method to establish a root, it is not safe to evaluate the direction of evolution, who is the ancestor and who is the descendant. The standard in phylogenetics today is to use several successively more distant lineages (called outgroups) to determine the direction of evolution in the group of interest (the ingroup). No such method was followed in several landmark papers.
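Why the root position matters can be shown with a small sketch. The example below is only an illustration under stated assumptions: the six‐isolate tree is hypothetical (it is not taken from any study cited here), tips are labelled only by host, and the function name fitch_root_states is my own. It applies the standard Fitch parsimony downpass to the same unrooted network rooted in two different places.

```python
def fitch_root_states(tree):
    """Fitch downpass on a rooted binary tree.

    `tree` is either a host label (a tip) or a 2-tuple of subtrees.
    Returns the set of most parsimonious host states at the root.
    """
    if isinstance(tree, str):
        return {tree}
    left, right = (fitch_root_states(child) for child in tree)
    shared = left & right
    # Keep the states both children agree on; otherwise keep all candidates.
    return shared if shared else left | right


# One hypothetical unrooted network: bat, bat | human | human | civet, civet.
# Rooted with the bat isolates as successive outgroups:
rooted_on_bats = ("bat", ("bat", ("human", ("human", ("civet", "civet")))))
# The very same network with the root forced onto the civet side:
rooted_on_civets = ("civet", ("civet", ("human", ("human", ("bat", "bat")))))

print(fitch_root_states(rooted_on_bats))
print(fitch_root_states(rooted_on_civets))
```

The branching pattern is identical in both cases; only the root differs. With the root among the bats the inferred ancestral host is a bat, and with the root forced among the civets the inferred ancestral host is a civet. The drawing alone therefore cannot tell us who gave the virus to whom.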
Tracing COVID‐19 to animals, and what animals they might be, was addressed in part by an influential paper by Guan et al. (2003). That paper was published as a Report in Science, a brief format that puts much of the methodology in supplemental material that is not presented in the publication itself. It is not unusual to find in the supplement errors that were not evident to those who read only the publication. Of Guan et al.’s phylogenetic reconstruction, Janies et al. (2008) wrote:
In the case of Guan et al., [2003; see Guan’s fig. 2 and fig. S2 of the supplemental materials] and the Chinese SARS Molecular Epidemiology Consortium [2004; see fig. S7 of their supplemental materials] these researchers simply force the root position on their drawings such that they represent SARS‐CoV isolates from animal hosts as ancestral. In other drawings, no outgroup is designated [Chinese SARS Molecular Epidemiology Consortium (2004) their fig. 2] or a human SARS‐CoV outgroup is used and the animal SARS‐CoV isolates are omitted from the tree [Chinese SARS Molecular Epidemiology Consortium (2004) their fig. S6 of the supplemental materials]. In the case of Song et al., 2005 a human SARS‐CoV is designated as the outgroup.
In an equally dangerous short‐cut, some authors choose to interpret their phylogeny by placing the root at the midpoint of the network. This method attempts to place the root in the middle of the tree by calculating all of the tip‐to‐tip distances and selecting the center of the longest distance. The symmetry created by midpoint rooting means evolutionary change is balanced around the root, and that the “direction” of evolution (primitive versus derived) implied by the tree is partly arbitrary. It is unclear how the actual history will be misrepresented by midpoint rooting, but the consequence may be severe. Li et al. (2020) use midpoint rooting of a tree to infer origins of SARS‐CoV‐2. Whereas all other analyses indicate a Chinese origin, the diagram of Li et al. (their fig. 2) places the root (origin) among samples from the USA, with most of China separate and further away. The authors do not discuss this nonsensical result, apparently unaware that it represents a grave problem.
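The midpoint procedure just described can be sketched in a few lines. This is a minimal illustration, not the implementation of any phylogenetics package and not a reanalysis of Li et al.: the four‐tip tree, its branch lengths, and the function names are all hypothetical. The code finds the longest tip‐to‐tip path and places the root halfway along it.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn (node, node, length) triples into a symmetric adjacency map."""
    adj = {}
    for a, b, length in edges:
        adj.setdefault(a, {})[b] = length
        adj.setdefault(b, {})[a] = length
    return adj


def path_between(adj, start, goal):
    """Return the unique path of nodes from start to goal in a tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def midpoint_root(edges):
    """Return ((u, v), offset): the branch holding the midpoint root and
    the distance of the root from u along that branch."""
    adj = build_adjacency(edges)
    tips = [n for n in adj if len(adj[n]) == 1]
    best_path, best_len = None, -1.0
    for a, b in combinations(tips, 2):
        path = path_between(adj, a, b)
        length = sum(adj[u][v] for u, v in zip(path, path[1:]))
        if length > best_len:
            best_path, best_len = path, length
    half, walked = best_len / 2, 0.0
    for u, v in zip(best_path, best_path[1:]):
        if walked + adj[u][v] >= half:
            return (u, v), half - walked
        walked += adj[u][v]


# Hypothetical tree: tips A, B, C, D joined through internal nodes X and Y.
balanced = [("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
            ("C", "Y", 1.5), ("D", "Y", 2.5)]
# Same topology, but the branch leading to tip D is much longer.
long_d = [("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
          ("C", "Y", 1.5), ("D", "Y", 6.0)]

print(midpoint_root(balanced))
print(midpoint_root(long_d))
```

In the first tree the root falls on the internal branch between X and Y. In the second, nothing has changed except the length of one terminal branch, yet the root moves onto the branch leading to D, which makes D appear as the sister of all other samples. The root follows branch lengths, not history.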
It is ordinary to discuss differences in topology among solution trees from a single data source, but comparing trees with different taxon sampling and midpoint rooting is completely pointless. There is no reason to expect that the point midway between the longest branch‐to‐branch distance on separate trees would match when the primary data are not the same. Yet, Stavrinides and Guttman (2004) did this in an effort to identify hybrid origin of SARS. They partitioned the SARS genome into four pieces and studied these partitions separately. The analyses included different isolates representing birds and mammals (no bats), with the number of animal isolates ranging from 9 to 14, and one terminal for SARS in each analysis. The four separate solution trees were each rooted at the midpoint and, because the trees did not match, separate origins were proposed for the different partitions. Yet, the midpoint root has no actual biological value, it is only a mathematical entity, and there are plenty of reasons to find different solutions when the taxon sampling and primary data differ across analyses, regardless of arbitrary rooting. Such an analysis is actually uninterpretable.
Using an alternative approach, Forster et al. (2020) calculated a median‐joining network to represent relatedness among 160 isolates of SARS‐CoV‐2 from around the world. In the same publication, Sánchez‐Pacheco et al. (2020) and Mavian et al. (2020) both stated grave concerns regarding the rooting of the network, and observed that directionality (ancestor‐descendant) cannot be determined and that clusters formed by this method do not reflect actual phylogenetic descent. In addition, Forster et al. misuse the terms “clade” and “derived” throughout the paper. Mavian et al. also identify inadequate sampling and a limited data set, which is a common problem (see below).
Outdated methodology
Phylogenetics as a field advances as other scientific fields do. Methods that were recommended in years past have no place today. Publishing in the respected Journal of Virology, Yang et al. (2016) used overall similarity to group taxa. This is a method that was rejected circa 1980 in the general phylogenetic community. The history of this perspective is outside the focus of the present paper. It is sufficient here to say that in evolutionary biology we are interested only in special similarity: shared, derived characters that indicate recent common ancestry, characters that may be very few in number. It is irrelevant how many similarities are shared due to a ground plan that has not changed during the time period in question. We are not trying to measure stasis. This is the fundamental and most important advance in the way we compose evolutionary trees since Darwin, and Yang et al. do not present that perspective.
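The difference between overall similarity and special similarity can be made concrete with a toy matrix. The sketch below assumes three hypothetical taxa scored for six binary characters, with 0 as the state found in the outgroup (primitive) and 1 as the derived state. The data and function names are invented for illustration and have nothing to do with the data of Yang et al.

```python
from itertools import combinations

# Hypothetical character matrix: 0 = primitive (outgroup) state, 1 = derived.
matrix = {
    "A": "000001",   # one unique derived character
    "B": "100000",   # shares derived character 1 with C
    "C": "111110",   # character 1 plus four unique derived characters
}


def overall_similarity(x, y):
    """Count every matching character, primitive or derived."""
    return sum(a == b for a, b in zip(x, y))


def shared_derived(x, y):
    """Count only characters where both taxa show the derived state."""
    return sum(a == b == "1" for a, b in zip(x, y))


def closest_pair(score):
    """Return the pair of taxa with the highest score."""
    pairs = combinations(sorted(matrix), 2)
    return max(pairs, key=lambda p: score(matrix[p[0]], matrix[p[1]]))


print(closest_pair(overall_similarity))
print(closest_pair(shared_derived))
```

Overall similarity groups A with B, because they match in four characters that simply have not changed. The only evidence of common ancestry in the matrix is the single derived character shared by B and C, and a count of shared primitive states overrides it.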
Poor taxon sampling
Most papers focus narrowly on a few animals that were mentioned in the earliest reports. Yet, there are thousands of species of mammals, and certainly, we need to review a broad sample of these to make a well‐supported conclusion that we have found a likely ancestral source, as well as address rooting of the tree of viruses. The analysis of Zhang et al. (2020, their fig. 2) is a network of 27 bat isolates, 10 human isolates, and one each from a pangolin, a mouse, and a mink. In such an analysis, all that is necessary for the pangolin to be grouped among the bats and humans is that it is not very much like the mink or mouse. This would not mean that the pangolin isolate really belongs among bats and humans, it may be simply that bats and humans constitute 37 of 40 terminals, dominating the analysis. Nonetheless, Zhang et al. wrote: “Within this group, RaTG13 [bat] and SARS‐CoV‐2 [Covid‐19] were grouped together, and Pangolin‐CoV was their closest common ancestor.” Maybe true, but this is not an adequate demonstration to conclude so. Reinforcing this error in Nature, Lam et al. (2020) provided a phylogeny of the SARS‐CoV‐2 lineage that unites three bats, six pangolins, and six humans (their fig. 1). The pangolin isolates must group with those from bats or humans because that is the only possibility. Xiao et al. (2020) demonstrated in a study that included 19 bat isolates that human isolates of SARS‐CoV‐2 come out close to a sample of pangolin isolates from rescued animals who mostly died after weeks in captivity. First, we may wonder if the pangolins got the disease from close contact with humans, as the lions and tigers in the zoo did. Further, two of the three partitioned analyses and the full genome analysis all embraced at least one bat isolate with the human and pangolin clade, so pangolins are still not closer to humans than are some bats.
This study purports to show that the pangolin isolates are like human SARS‐CoV‐2, but I expect it will be taken as support for the conclusion that pangolins are the origin of human disease. We need more sampling of possible hosts before we can conclude where the virus came from.
Function should not outweigh other data
It is possible, or likely, that some functional elements of COVID‐19 evolved independently in different hosts and then were combined to form the current pathogens, as with the “swine flu” strains of influenza (Wang et al., 2005). Focusing on the critical ACE2 binding, Andersen et al. (2020) presented an amino acid alignment of two genetic regions of particular interest, one of 58 residues and one of 33. The figure considers three bats, two humans, and a pangolin. The human sequences are not identical, nor are the bats. The pangolin sometimes matches one human but not both, and if it matches both then it also matches a bat. Among the highlighted key residues that match ACE2 contact sites, there are only two amino acids (493 and 494 in the figure) shared uniquely by pangolins and one of the humans (but not the other human) from the chosen window of 58 amino acids. Those are not very many data given the level of the problem. This piece of ambiguous evidence highlights six nucleotide base pairs among the chosen 273 bases (coding for 91 amino acids). Consider that the genome is more than 20 000 base pairs, and these authors favor a conclusion based on a small pattern that may not represent the whole picture.
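The arithmetic behind these figures is simple enough to lay out explicitly. The short sketch below uses only the numbers given in the paragraph above; the variable names are mine, and the genome length is the lower bound stated in the text, so the genome‐wide proportions are upper bounds.

```python
# Sizes of the two aligned regions, in amino acid residues.
window_residues = 58 + 33             # 91 residues in total
window_bases = window_residues * 3    # three nucleotides per codon: 273 bases

# Residues shared uniquely by the pangolin and one human (493 and 494).
unique_residues = 2
unique_bases = unique_residues * 3    # 6 bases

# Lower bound on genome length as stated in the text.
genome_bases = 20_000

share_of_window = unique_bases / window_bases     # about 2.2 % of the window
window_of_genome = window_bases / genome_bases    # about 1.4 % of the genome
share_of_genome = unique_bases / genome_bases     # about 0.03 % of the genome

print(f"{share_of_window:.2%} of the window")
print(f"{window_of_genome:.2%} of the genome is in the window")
print(f"{share_of_genome:.3%} of the genome")
```

Put this way, the evidence that favors the pangolin amounts to six bases, a few hundredths of one percent of the genome at most.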
A change in function (such as ability to bind to human ACE2 sites) is critical medically, but it does not change general criteria for evaluating ancestry of the partitions of the viral genome. In fact, a hybrid origin as we expect for SARS‐CoV‐2 would require even greater care to unravel. Yet, the significance of function of the ACE2 binding sites is taking priority and degrading a more general perspective that should be primary. Granting special status to a few important amino acids among more than 20 000 base pairs is not a sound methodology. Consider that tomorrow another important discovery will favor a different region, and then we change the tree according to those priorities. In phylogenetics, this is called “weighting,” and weighting some few data enough that they control the whole analysis is universally rejected among career systematists. That is not analysis of data, it is dictation.
As a side note, an amino acid is a “composite character,” meaning it is composed of different parts (the nucleotides) that may represent separate evolutionary events. Evolution can produce the same amino acid residue by different paths because codons need not be identical to have identical translated products (Simmons and Freudenstein, 2002). Future work would do well to examine the actual nucleotide sequence when investigating genetic evolution and ancestry, and keep “function” of the translated product as a different question, important as it may be.
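The composite nature of an amino acid character is easy to demonstrate. The sketch below uses a small subset of the standard genetic code (only the leucine and serine codons) and two hypothetical RNA sequences that I made up for the purpose; they are not taken from any viral genome.

```python
# Subset of the standard genetic code (RNA codons): leucine (L), serine (S).
CODON_TABLE = {
    "UUA": "L", "UUG": "L", "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L",
    "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S", "AGU": "S", "AGC": "S",
}


def translate(rna):
    """Translate an RNA string codon by codon using CODON_TABLE."""
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def nucleotide_differences(x, y):
    """Count positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


# Two hypothetical sequences of two codons each.
seq_x = "UUAAGU"
seq_y = "CUGUCC"

print(translate(seq_x), translate(seq_y))
print(nucleotide_differences(seq_x, seq_y))
```

Both sequences translate to the same two residues, leucine followed by serine, yet they differ at five of six nucleotide positions. An alignment of amino acids would score them as identical and hide every one of those differences.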
Inadequate consideration of ambiguity
As one who has read a lot of papers in phylogenetics, I am surprised that there is no struggle with ambiguity in these studies of an explosive radiation of viruses. We can expect the data to be rather messy. With such rapid evolution, many very short branches are likely, with the resultant phylogenetic solutions being sensitive to the parameters of analysis. Furthermore, if a virus can be passed from one species to another, then it may perhaps be passed back and forth more than once, creating ambiguity about origin and direction (see below). Authors working on far less difficult problems usually devote some discussion to what groups are well supported and what are less so. It is curious that the authors cited here are so confident of the single solution they favor. This extends explicitly to Forster et al. who state that more time‐tested methods “did not facilitate an unambiguous interpretation of the data.” Of course, if the data themselves are ambiguous, then it is misleading to offer an unambiguous conclusion.
Best analysis of this group to date
Following the SARS epidemic, Janies et al. (2008) produced an extensive phylogenetic analysis including geographic localities of origin, host species, different regions of viral genome, and complete genomes. The study included several different data sets: 83 partial genome isolates available in 2005; 157 partial genome isolates available in 2006; 114 whole viral genomes available in 2006. These include particular study of the appearance of a 29‐nucleotide region that is important to the switch between carnivores and humans as hosts. A variety of weighting schemes and methods of reconstruction were used with explicit discussion of each to evaluate the robustness of the results. The authors stated:
Many of the [research teams] that argue carnivores are the original reservoir of SARS‐CoV use a phylogeny to support their argument. However, the phylogenies in these studies often lack outgroup and rooting criteria necessary to determine the origins of SARS‐CoV. Recently, SARS‐CoV has been isolated from various species of Chiroptera from China (e.g., Rhinolophus sinicus ) thus leading to reconsideration of the original reservoir of SARS‐CoV. We evaluated the hypothesis that SARS‐CoV isolated from Chiroptera are the original zoonotic source for SARS‐CoV by sampling SARS‐CoV and non‐SARS‐CoV from diverse hosts including Chiroptera, as well as carnivores, artiodactyls, rodents, birds and humans. Regardless of alignment parameters, optimality criteria, or isolate sampling, the resulting phylogenies clearly show that the SARS‐CoV was transmitted to small carnivores well after the epidemic of SARS in humans that began in late 2002. [emphasis added]
In both the 83‐isolate analysis and the 114‐genome analysis, the civet viruses are deep inside the clade of human viruses, inside 10 nested branch points and 18 branch points, respectively. The 157‐isolate analysis of Janies et al. is reproduced here (Fig. 1). While the 157‐isolate analysis places carnivores closer to the base of the human lineage than do the 83‐ or 114‐isolate analyses, all of these analyses show that bats passed SARS‐CoV‐1 to humans (Fig. 1, circle B) before humans passed it to civets (circle A). Note also that this analysis indicates transfer from humans to carnivores and back to humans several times, but these are not the origin of the infection in humans (circle B). Recall that lions and tigers in a zoo have recently contracted SARS‐CoV‐2 from humans, and so the fact that a certain animal carries a given virus does not mean it represents the origin of human infection. Note also that most human infections represented here are independent of carnivores, simply as human to human infection spread across Asia.
COVID‐19
Common knowledge regarding SARS viruses is apparently not well supported. Are we doing any better with SARS‐CoV‐2 that causes the present COVID‐19 pandemic? There are signs that we may do better. Andersen et al. (2020) stated at the outset that “the diversity of coronaviruses in bats and other species is massively undersampled.” A more direct attack on this problem is coming. Hu et al. (2017) reported 11 new viruses from a single cave in Yunnan. These may relate to the origin of SARS‐CoV‐2. After all, the virus was identified in Wuhan, but it likely did not evolve there, or among the animals that happened to be in the meat market when scientists visited. The good thing about these studies is that they show that we are at last moving in the right direction as far as accumulating the primary data. Let us hope the sophistication of analysis will follow.
Conclusions
People outside the science of systematics often view phylogenetic study as little more than a kind of bookkeeping, an organizational tool that provides a reference point. This perspective is facilitated by the mechanization of data capture, and the ease with which mathematical operations are executed by approved computer programs where the user needs to know only how to point‐and‐click (Grant et al., 2003). At the same time, the level of sophistication in the field has grown well beyond what is easily accomplished by following the owner’s manual of a good software package. An ordinary analysis today requires an author to rationalize the choice of taxa under study, the choice of primary data, the method of character coding (including difficult issues of alignment, weighting, missing values), treatment of partitions, choice of reconstruction methods, choice of competing models, choice of measures of fit or confidence, how to resolve ambiguity, optimization of characters of interest on a tree, etc. (Wenzel, 2002; Janies, 2019). None of these is addressed adequately by the FAQ section of a software manual. It is not unusual for my colleagues in phylogenetic research today to answer the problems listed above by using several programs chained together, or multiple analyses compared as alternatives, before coming to some general conclusion. Such complexity extends well beyond the interest, patience, or capacity of the occasional users of phylogenetic information, so those who are in a hurry largely ignore these issues.
We rely on data and on epistemologically sound processes to make scientific conclusions. It is plain that even in this time of emergency, no one is paying attention to the expertise of professional phylogeneticists. Perhaps it is because especially in an emergency people want to find “the answer.” They do not want to hear that “it is complicated.” How unfortunate. We need to do more to promote our science as being important, relevant, sophisticated, and helpful in this emergency.
All legitimate peer‐reviewed papers accept that this virus came from animal pathogens, probably as a novel combination of viruses found in different species, and is not manufactured. This virus has new features that promote binding to human cells, and therefore contagion and disease in humans is a new feature added to older viral elements. Where did the virus come from? If there are partitions of the virus with different histories, what are those partitions, and what are the histories? These are questions of phylogenetics and systematics, rather than medicine and virology. This article states the problem. We are still looking for a solution.
Acknowledgements
I thank Mathieu Faure‐Brac, Daniel Janies, Denis Machado, Donna Wenzel, Ward Wheeler and John Wible for comments on the manuscript.