critical review

  

Assessment criteria include paper format, content, literature cited, and article choice. In addition to the three articles used for analysis, supporting references should also be used. A minimum of one unique supporting article is required in each of the Introduction and Conclusion sections. A student example paper has been uploaded to Blackboard. This paper is well done and meets almost all the “Meets expectations” criteria in the rubric.

Summary: sums up the strengths and weaknesses of each article; compares and contrasts articles to one another | summary lacks some details, misses a strength/weakness of one of the articles; occasionally compares and contrasts but mainly lists each article's strengths and weaknesses separately | summary lacks detail, includes either strengths or weaknesses; fails to compare and contrast articles to one another

Significance: establishes practical and theoretical significance of body of work; has your chosen article been cited by others; did your articles spark other researchers' hypotheses or questions; are there any practical applications; implications (social, political, technological, medical) of the research; cites at least one other supporting reference (unique from introduction) | logic not clear to the theoretical significance of the body of work; not thorough in establishing its significance; cites at least one other supporting reference (unique from introduction) | no connection to theoretical significance of body of work; fails to cite at least one supporting reference

Literature cited

Format: one journal format chosen and used throughout in bibliography and in-text citations | some in-text citations were not in the same format; 1-2 errors in bibliography | consistency lacking for in-text citations; bibliography with 3+ formatting errors

Subject: chosen articles were all on the same topic; topic was specific enough so that an analysis was possible | topics were not consistent or were too broad/general | each article was on a separate topic and the topics were without reasonable similarities

Citation: each reference was used and cited correctly within the body of the paper; three focal references were analyzed; at least 5 references used | references were occasionally cited incorrectly; three focal references were analyzed; 4 total references used | two or fewer references were analyzed; no supporting references used

Quantity: minimum of 5, 1 unique to intro, 1 unique to discussion and 3 critically reviewed | missing 1 unique | missing 2 unique and/or 1 of critically reviewed

Host-Microbe Coevolution: Applying Evidence from Model Systems to Complex Marine Invertebrate Holobionts

Paul A. O’Brien (a, b, c), Nicole S. Webster (b, c, d), David J. Miller (e, f), David G. Bourne (a, b, c)

a College of Science and Engineering, James Cook University, Townsville, QLD, Australia
b Australian Institute of Marine Science, Townsville, QLD, Australia
c AIMS@JCU, Townsville, QLD, Australia
d Australian Centre for Ecogenomics, University of Queensland, Brisbane, QLD, Australia
e ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, Australia
f Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia

ABSTRACT Marine invertebrates often host diverse microbial communities, making it difficult to identify important symbionts and to understand how these communities are structured. This complexity has also made it challenging to assign microbial functions and to unravel the myriad of interactions among the microbiota. Here we propose to address these issues by applying evidence from model systems of host-microbe coevolution to complex marine invertebrate microbiomes. Coevolution is the reciprocal adaptation of one lineage in response to another and can occur through the interaction of a host and its beneficial symbiont. A classic indicator of coevolution is codivergence of host and microbe, and evidence of this is found in both corals and sponges. Metabolic collaboration between host and microbe is often linked to codivergence and appears likely in complex holobionts, where microbial symbionts can interact with host cells through production and degradation of metabolic compounds. Neutral models are also useful to distinguish selected microbes against a background population consisting predominately of random associates. Enhanced understanding of the interactions between marine invertebrates and their microbial communities is urgently required as coral reefs face unprecedented local and global pressures and as active restoration approaches, including manipulation of the microbiome, are proposed to improve the health and tolerance of reef species. On the basis of a detailed review of the literature, we propose three research criteria for examining coevolution in marine invertebrates: (i) identifying stochastic and deterministic components of the microbiome, (ii) assessing codivergence of host and microbe, and (iii) confirming the intimate association based on shared metabolic function.

KEYWORDS codivergence, coevolution, marine invertebrates, microbiome, phylosymbiosis

Citation O’Brien PA, Webster NS, Miller DJ, Bourne DG. 2019. Host-microbe coevolution: applying evidence from model systems to complex marine invertebrate holobionts. mBio 10:e02241-18. https://doi.org/10.1128/mBio.02241-18.

Editor Danielle A. Garsin, University of Texas Health Science Center at Houston

Copyright © 2019 O’Brien et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

Address correspondence to David G. Bourne, [email protected].

Published 5 February 2019

MINIREVIEW Host-Microbe Biology

January/February 2019 Volume 10 Issue 1 e02241-18 mbio.asm.org

Coevolution theory dates back to the 19th century (Box 1), and coevolution is currently referred to as the reciprocal evolution of one lineage in response to another (1). This definition encompasses a broad range of interactions such as predator-prey, host-symbiont, and host-parasite interactions or interactions among the members of a community of organisms such as a host and its associated microbiome (1, 2). In the case of host-microbe associations, this has produced some of the most remarkable evolutionary outcomes that have shaped life on Earth, such as the eukaryotic cell, multicellularity, and the development of organ systems (3, 4). It is now recognized that microbial associations with a multicellular host represent the rule rather than the exception (4), but in complex associations of that kind, the extent to which coevolution operates is often unclear.

BOX 1: A BRIEF HISTORY OF COEVOLUTION

Charles Darwin once described the sudden and rapid diversification of flowering plants as an “abominable mystery,” since it could not be explained by traditional views of evolution alone (5). While his correspondent Gaston de Saporta speculated that a biological interaction between flowering plants and insects might be the cause of the phenomenon, it was not until nearly 100 years later that the concept of coevolution developed. In a pioneering study, Ehrlich and Raven (6) observed that related groups of butterflies were feeding on related groups of plants and speculated this was due to a process for which they coined the name “coevolution.” Using butterflies, they argued that plants had evolved mechanisms to overcome predation from herbivores, which in turn had evolved new ways to prey on plants. Decades on, the introduction of phylogenetics has shown that plants evolved in the absence of butterflies, which colonized the diverse group of plants after their chemical defenses were already in place (7). Nevertheless, the theory of coevolution was endorsed, and two important points came to light. First, care must be taken when inferring coevolution from seemingly parallel lines of evolution, and where possible, divergence times and common ancestry should be included. Second, coevolution can occur between communities of organisms (“guild” coevolution), as observed in the case of flowering plants, where predation and pollination from a wide variety of insects likely influenced the diversification of angiosperms (8).

Since coevolution can occur across multiple levels of interactions, multiple theories have also developed. The Red Queen theory is based on the concept of antagonistic coevolution and assumes that an adaptation that increases the fitness of one species will come at a cost to the fitness of another (9). This type of coevolution has been most pronounced in host-parasite interactions, where the antagonistic interactions are closely coupled (10). However, coevolutionary patterns may also arise in the case of mutualistic symbioses, which require reciprocal adaptations to the benefit of each partner (11). Mutualistic coevolution is associated with a number of key traits that are discussed further in this review, such as obligate symbiosis, vertical inheritance, and metabolic collaboration. Third, coevolution has also recently been placed in the context of the hologenome theory (12), which suggests that the holobiont can act as a unit of selection (but not necessarily as the primary unit) since the combined genomes influence the host phenotype on which selection may operate (13, 14). However, hologenome theory also acknowledges that selection acts on each component of the holobiont individually as well as in combination with other components (including the host). Thus, the entity that is the hologenome may be formed, in part, through coevolution of interacting holobiont compartments, in addition to neutral processes (12).

Given the ubiquitous nature of host-microbe associations and the huge metabolic potential that microorganisms represent, it is not surprising that evidence of host-microbe coevolution is emerging. Model representatives of both simple and complex associations are being used to study coevolution, allowing researchers to look for specific traits, signals, and patterns (1, 15). A well-known model system is the pea aphid and its endosymbiotic bacteria in the genus Buchnera. This insect has evolved specialized cells known as bacteriocytes to host its endosymbionts, which in turn synthesize and translocate amino acids that are missing from the diet of the pea aphids (16). Amino acid synthesis occurs through intimate cooperation between host and symbiont, with some pathways missing from the host and some from the symbiont, such that the relationship is obligate to the extent that one organism cannot survive without the other (17). Among complex systems, the human gut microbiome has been extensively studied and has been shown to be intimately associated with human health. Gut microbes have been shown to be linked with human behavior and development through metabolic processes, such as microbial regulation of the essential amino acid tryptophan (18, 19). The human microbiome contains around 150-fold more nonredundant genes than the human genome (20), and the metabolic capacity of microbes residing in the intestine is believed to have been a driving evolutionary force in the host-microbe coevolution of humans (2). In these examples, as well as many others (21–23), both host and symbiont evolved to maintain and facilitate the symbiosis. Furthermore, phylogenies of host and symbiont in these systems are often mirrored, indicating that host and symbiont are diverging in parallel (16, 24, 25), a phenomenon known as codivergence (26).

In the marine environment, invertebrates can host microbial communities as simple and stable as that of the pea aphid or as complex and dynamic as that of the human gut (Fig. 1). The Hawaiian bobtail squid, for example, maintains an exclusive symbiosis with a single bacterial symbiont which it hosts within a specialized light organ (27). On the other hand, corals host enormously diverse microbial communities, comprising thousands of species-level operational taxonomic units (OTUs), which are often influenced by season, location, host health, and host genotype (28–31). Marine sponges also host complex microbial communities with diversity comparable to that of corals (32) but with associations that are generally far more stable in space and time (33). Less-diverse microbial communities are found in the sea anemone Aiptasia, where the number of OTUs is generally in the low hundreds (34). Due to the close taxonomic relationship of Aiptasia with coral and its comparatively simple microbial community, it has been proposed as a model organism for studying coral microbiology and symbiosis (34). Some marine invertebrates also include species along a continuum of microbial diversities. Ascidians, for example, have been shown to host fewer than 10 (Polycarpa aurata) or close to 500 (Didemnum sp.) microbial OTUs within their inner tunic (35). Furthermore, species with low microbial diversity such as P. aurata can exhibit high intraspecific variation, with as few as 8% of OTUs shared among individuals of the same species (35). Taken together, the data from those studies highlight the vast spectrum of associations that marine invertebrates form with microbial communities in terms of diversity, composition, and stability (Fig. 1).

FIG 1 Spectrum of microbial diversity associated with different compartments of marine invertebrates. Microbial associations may involve a single symbiont in a specialized organ or over 1,000 operational taxonomic units (OTUs) associated with tissues. The levels of OTUs reported in the figure represent the highest recorded in the referenced study for that species. Reported levels of diversity may differ significantly within the same species across different studies.

While previous research has provided a good understanding of the composition of marine invertebrate microbiomes, our understanding of how the microbiome interacts with the host, and of the potential to coevolve, is far more limited. Moreover, the increasing number of studies generating tremendous volumes of host-associated microbiome sequence data requires theoretical development to interpret these relationships. Coevolved microbial symbionts are presumed to be intimately linked with host fitness and metabolism (36); therefore, understanding these relationships in marine invertebrates will have direct implications for health and disease processes in these animals. Three research criteria arise for examining coevolution in marine invertebrates: (i) identifying stochastic and deterministic microbial components of the microbiome, (ii) assessing codivergence of host and microbe, and (iii) confirming an intimate association between host and microbe related to shared metabolic function (metabolic collaboration). While each of these criteria may be fulfilled without the involvement of coevolution (26, 37, 38), evidence of their existence in combination provides a strong basis for establishing coevolution patterns (Fig. 2). This review positions these three criteria in coevolution as representing a complementary approach to the study of complex marine invertebrate microbiomes by drawing from examples of model systems. Focussing on keystone coral reef invertebrates, this review also evaluates the current evidence for each criterion. Finally, while parasites and pathogens also contribute to host coevolution, the focus of this review is mutualistic symbionts; thus, pathogens and parasitism are not discussed.

BOX 2: GLOSSARY

(i) Codivergence. Two organisms which speciate or diverge in parallel as illustrated by topological congruency of phylogenetic trees.
(ii) Coevolution. Reciprocal adaptation of one (or more) lineage(s) in response to another (or others).
(iii) Holobiont. A host organism and its associated microbial community.
(iv) Hologenome. The collective genomes of a host and its associated microbial community, which may act as a unit of selection or at discrete levels.
(v) Metabolic collaboration. Two or more organisms that are linked through metabolic interactions, generally to the benefit of one another.
(vi) Metagenome. The collective microbial genes recovered from an environmental sample, usually predominantly prokaryotic.
(vii) Metatranscriptomics. Quantification of the total microbial mRNA in a sample as an indication of gene expression and active microbial functions.
(viii) Microbiome. The total genetic make-up of a microbial community associated with a habitat.
(ix) Microbiota. The community of microorganisms residing in a particular habitat, usually a host organism.
(x) Phylosymbiosis. The retention of a host phylogenetic signal within its associated microbial community.
(xi) Virome. The total viral genetic content recovered from an environmental sample.

UNTANGLING PATTERNS OF HOST-MICROBE COEVOLUTION IN A WEB OF MICROBES

(i) Phylosymbiosis and neutral theory: identifying stochastic and deterministic components of the microbiome. Host-microbe coevolution may occur to some degree at the level of the hologenome, i.e., reciprocal evolution of the host genome and microbiome (12). Therefore, it is necessary to understand microbial community structure and population dynamics within the host environment. This may illustrate (i) that the microbiome associated with a host is structured through phylogenetically related host traits and may therefore retain a host phylogenetic signal (phylosymbiosis) and (ii) that certain microbes deviate from the expected patterns of neutral population dynamics, i.e., stochastic births and deaths and immigration. It is likely that phylosymbiosis and neutral population dynamics are linked; therefore, their potential to contribute to coevolution is discussed together.

[Figure 2 panels: (a) host phylogeny beside microbial dendrogram for host species A to D; (b) frequency of microbes in host (0 to 1) plotted against relative abundance of microbes in host sample (low to high), with Bacteria spp. 1 and Bacteria spp. 2 marked; (c) host phylogeny beside microbial phylogeny of Bacteria spp. 1 for hosts A to D; (d) pathway shared by Host spp. A and Bacteria spp. 1: homocysteine + serine (host diet and metabolism), cystathionine β-synthase (symbiont enzyme), cystathionine, cystathionine γ-lyase (host enzyme), cysteine.]

FIG 2 Hypothetical scenario addressing three criteria for host-microbe coevolution in species A to D. (a) Phylosymbiosis shown through hierarchical clustering of the microbial community, resulting in a microbial dendrogram which mirrors host phylogeny. (b) Neutral model showing the expected occurrence of microbes based on neutral population dynamics (blue line). As the relative abundance increases, so too does the occurrence in host samples. The members of bacterial species group 1 (Bacteria spp. 1) are therefore more abundant than would be expected by chance and may indicate active selection, while the members of Bacteria spp. 2 are less abundant. (c) Codivergence of the members of Bacteria spp. 1 with their hosts. The members of Bacteria spp. 1 are found within the microbial community of each host species and appear to be actively selected for. Their phylogeny indicates a host split at the strain level followed by diversification within each host species. Congruence between host and microbial lineages suggests important host-microbe interactions and warrants further investigation. (d) Metabolic collaboration between the members of Host spp. A and those of Bacteria spp. 1. Fluorescence in situ hybridization (FISH) confirms that the members of Bacteria spp. 1 are located within bacteriocyte cells in the tissues of Host spp. A. Genome and transcriptome data for each species suggest that the amino acid cysteine is produced by the activity of a metabolic pathway shared between host and microbe. In corals of the genus Acropora, for example, the genome is incomplete with respect to biosynthesis of cysteine and represents a potential pathway for collaborations of host and microbe (101). Hypothetically, the amino acids homocysteine and serine (potentially sourced from host diet and metabolism) are combined to form cystathionine through the enzyme cystathionine β-synthase (provided by the host’s endosymbiont). The host enzyme cystathionine γ-lyase then breaks down cystathionine to form cysteine.


The term “phylosymbiosis” is not intended to imply coevolution (12, 38); however, coevolution of a host and microbiome may reinforce patterns of phylosymbiosis. There are many host traits that correlate with host phylogeny, some of which can act as environmental filters, preventing the establishment of microbes in the host environment. Thus, neutral population dynamics, with host traits acting as an ecological filter to microbial immigration, may be sufficient to result in phylosymbiotic patterns (39, 40). However, host traits are not static; thus, the evolution of these microbial niches may further drive the radiation of the microbes that reside within them. In turn, the continuous colonization over many generations of a microbial community likely adds to the selective pressure on host traits. Therefore, ecological filtering of microbes through host traits and coevolution of a host and microbiome need not be mutually exclusive in the appearance of phylosymbiosis (39). Moreover, assessing patterns of phylosymbiosis and neutral population dynamics also allows the detection of microbes that deviate from these patterns and may identify important microbial species that are actively selected for (or against) by the host. In this context, neutral models can simulate expected microbial abundance, allowing easier detection of microbes that do not fit these patterns (41). This reasoning justifies consideration of phylosymbiosis and microbial population dynamics in assessing coevolution in complex holobionts.

Patterns of phylosymbiosis are frequently detected in complex holobionts. One particular study tested for phylosymbiosis across 24 species of terrestrial animals from 4 groups that included Peromyscus deer mice, Drosophila flies, mosquitos, and Nasonia wasps and an additional data set of 7 hominid species (42). Since these animals (with the exception of hominids) could be reared under controlled laboratory conditions, environmental influences could be eliminated, leaving the host as the sole factor influencing the microbial community. Under these conditions, phylosymbiotic patterns were clearly observed for all five groups, with phylogenetically related taxa sharing similar microbial communities and microbial dendrograms mirroring host phylogenies. Similar patterns of phylosymbiosis have been observed in a growing number of terrestrial systems, including all five gut regions in rodents (43), the skin of ungulates (44), the distal gut in hominids (45), and roots of multiple plant phyla (46), providing evidence that such patterns are common among host-associated microbiomes.
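A microbial dendrogram of the kind described above is built from pairwise dissimilarities between host microbiomes. The sketch below shows one common choice, Bray-Curtis dissimilarity, computed from OTU counts. The metric is an assumption made for illustration (the text does not state which measure the cited studies used), and the function names, host names, and counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU count dicts (0 = identical, 1 = no shared OTUs)."""
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


def dissimilarity_matrix(samples):
    """Return {(name_i, name_j): dissimilarity} for every unordered pair of samples."""
    return {
        (i, j): bray_curtis(samples[i], samples[j])
        for i, j in combinations(sorted(samples), 2)
    }


# Hypothetical OTU counts for three host species (illustrative values only).
samples = {
    "host_A": {"otu1": 40, "otu2": 10},
    "host_B": {"otu1": 30, "otu2": 20},
    "host_C": {"otu3": 50},
}
matrix = dissimilarity_matrix(samples)
```

In this toy input, host_A and host_B share most of their reads and so sit close together, while host_C shares no OTUs with either. Clustering such a matrix and comparing the result with the host tree is the step that would reveal a phylosymbiotic pattern.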

In the marine environment, two major studies, one involving 236 colonies across 32 genera of scleractinian coral collected from the east and west coasts of Australia (47) and the other involving 804 samples of 81 sponge species collected from the Atlantic Ocean, Pacific Ocean, and Indian Ocean and the Mediterranean Sea and Red Sea (32), have provided the most convincing examples of phylosymbiosis. Both studies found a significant evolutionary signal of the host with respect to microbial diversity and composition. Specifically, Mantel tests showed that closely related corals and sponges hosted microbial communities that were more similar in composition than would be expected by chance. In the case of corals, the similarity was seen in the skeleton and, to a lesser extent, in the tissue microbiome, while the mucus microbiome was more highly influenced by the surrounding environment (47). However, both studies found that host species was the strongest factor in explaining dissimilarity among microbial communities. Additional studies on both cold water and tropical sponges have found similar phylogenetic patterns within the microbiome of the host species (48, 49). Together, these results suggest that host phylogeny (or associated traits) has a significant role in structuring associated microbial communities, although there are additional factors related to host identity (and unrelated to phylogeny) that also likely play a major role.
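A Mantel test asks whether two distance matrices over the same set of hosts are correlated, using permutation to judge significance. The following is a minimal sketch, assuming Pearson correlation and a one-sided test for positive association; the exact settings of the cited studies are not given here, and the function names and toy distances are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a square matrix, visiting rows/columns in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(host_dist, microbe_dist, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, permutation p-value) for positive association."""
    n = len(host_dist)
    identity = list(range(n))
    x = upper_triangle(host_dist, identity)
    observed = pearson(x, upper_triangle(microbe_dist, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel hosts in the microbiome matrix only
        if pearson(x, upper_triangle(microbe_dist, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical data: five hosts whose microbiome dissimilarity scales with host distance.
positions = [0, 1, 3, 6, 10]
host_dist = [[abs(a - b) for b in positions] for a in positions]
microbe_dist = [[0.1 * d for d in row] for row in host_dist]
r, p = mantel(host_dist, microbe_dist)
```

Permuting host labels, rather than individual matrix cells, keeps each matrix internally consistent, which is why the test is suited to distance data where entries are not independent.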

Most studies to date have focused on the microbes that adhere to these patterns of phylosymbiosis, though more-useful information arguably could be determined from the microbes that do not. Since phylosymbiosis is a pattern that shows correlations between microbiome dissimilarity and host phylogeny, it does not indicate active microbial selection or cospeciation (38), and the species that deviate from these patterns would be interesting targets for studies of codivergence and metabolic collaboration (see below). Neutral models have been applied to three species of sponges, a jellyfish, and a sea anemone, and while neutral models have been shown to fit well to the expectation of microbial abundance in sponges (which also show phylosymbiosis), jellyfish and sea anemone microbiomes were found to be associated with a higher level of nonneutrality (40). Potential reasons for nonneutrality include a more sophisticated immune system in cnidarians that actively selects certain microbial taxa, microbiomes that are more transient in these hosts, or a combination of the two. In summary, neutral population dynamics filtered through phylogenetically related host traits likely result in, or at least contribute to, the observed patterns of phylosymbiosis. This does not necessarily mean that the pattern is unimportant or is not contributing to coevolution at the hologenome level, and it may be that the communities of microbes that follow these patterns are responsible for broad ecological functions (50). On the other hand, microbes that deviate from these patterns may be responsible for more-specific functions and are of high interest to those trying to identify symbionts and coevolution at the microbial species or strain level.
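The idea of flagging microbes that deviate from neutral expectation can be sketched with a deliberately simplified model. Assume each host is colonized by random draws from a shared source pool, so a taxon with relative abundance p is expected in a fraction 1 - (1 - p)^N of hosts that each carry N individuals. This is an illustrative stand-in, not the neutral model fitted in the cited work, and all names and numbers below are hypothetical.

```python
from math import sqrt


def expected_occurrence(rel_abundance, individuals_per_host):
    """Chance that a taxon appears at least once in a host under random sampling."""
    return 1.0 - (1.0 - rel_abundance) ** individuals_per_host


def classify_taxa(pool_abundance, observed_occurrence, individuals_per_host, n_hosts, z=1.96):
    """Label each taxon as 'neutral', 'over' or 'under' represented across hosts."""
    labels = {}
    for taxon, p in pool_abundance.items():
        expected = expected_occurrence(p, individuals_per_host)
        # Binomial standard error of an occurrence frequency measured over n_hosts.
        margin = z * sqrt(expected * (1.0 - expected) / n_hosts)
        observed = observed_occurrence[taxon]
        if observed > expected + margin:
            labels[taxon] = "over"
        elif observed < expected - margin:
            labels[taxon] = "under"
        else:
            labels[taxon] = "neutral"
    return labels


# Hypothetical inputs: pooled relative abundances and the fraction of hosts containing each taxon.
pool = {"bacteria_1": 0.001, "bacteria_2": 0.01, "bacteria_3": 0.005}
seen = {"bacteria_1": 0.95, "bacteria_2": 0.30, "bacteria_3": 0.90}
labels = classify_taxa(pool, seen, individuals_per_host=500, n_hosts=40)
```

Taxa labelled "over" occur in more hosts than their abundance predicts and are candidates for active selection by the host, while "under" taxa may be selected against. Both groups are the deviating microbes the text identifies as worth following up.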

(ii) Codivergence: microbial phylogeny and host phylogeny are congruent. The second criterion in assessing host-microbe coevolution is that of whether individual microbial lineages and their hosts have matching phylogenies (22, 24, 51). Codivergence implies a tightly coupled, long-term interaction between two species and can potentially identify beneficial symbionts (or parasites) that have coevolved with the host (26). However, it is also important to note that codivergence can arise due to processes other than coevolution, such as one species adaptively tracking another, which would imply that the evolution is not reciprocal, or two species responding independently to the same speciation event or environmental stress (37). In known cases of coevolution, phylogenies of hosts and their microbial symbionts are congruent (16, 51, 52). However, in complex and uncharacterized systems, this strategy can be reversed to identify potential symbionts. Therefore, the main value of investigating codivergence in complex associations is to identify those specific microbes on which to focus further attention.
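Topological congruence between a host tree and a symbiont tree can be illustrated by comparing the clades each tree contains. The sketch below assumes rooted trees written as nested tuples and exactly one symbiont lineage per host; it counts clades present in only one tree (a rooted Robinson-Foulds-style distance). It ignores branch lengths, divergence times, and statistical significance, which a real cophylogeny analysis would need, and all names are hypothetical.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def relabel(tree, mapping):
    """Replace each symbiont leaf with the name of the host it was sampled from."""
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def rooted_rf_distance(host_tree, symbiont_tree, symbiont_to_host):
    """Number of clades found in only one of the two trees (0 = identical topology)."""
    host_clades = clades(host_tree)
    symbiont_clades = clades(relabel(symbiont_tree, symbiont_to_host))
    return len(host_clades ^ symbiont_clades)


# Hypothetical host species A to D, each carrying one symbiont strain s1 to s4.
host_tree = ((("A", "B"), "C"), "D")
strain_host = {"s1": "A", "s2": "B", "s3": "C", "s4": "D"}
congruent = rooted_rf_distance(host_tree, ((("s1", "s2"), "s3"), "s4"), strain_host)
incongruent = rooted_rf_distance(host_tree, ((("s1", "s3"), "s2"), "s4"), strain_host)
```

A distance of zero means the symbiont tree mirrors the host tree exactly. As the text cautions, even a perfect match is only a pointer to lineages worth investigating, because congruence can arise without reciprocal evolution.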

Codivergence has been demonstrated in the case of Hydra viridissima, a freshwater relative of marine cnidarians, and its photosymbiont Chlorella (53). In this system, photosynthetically fixed carbohydrates from Chlorella are transported to its host (54), and phylogenetic analysis of 6 strains of H. viridissima and their vertically transmitted symbionts revealed clear congruency of host and symbiont topologies (55). In more-complex systems, patterns of codivergence have been illustrated in the gut microbiota of hominids (25). Analysis of fecal samples from humans, wild chimpanzees, wild bonobos, and wild gorillas showed that four clades of bacteria from the dominant families Bacteroidaceae and Bifidobacteriaceae codiverged with host phylogeny. Importantly, this example illustrates one possible way of identifying codivergence in complex holobionts where the symbionts are unknown. Since bacteria from the families Bacteroidaceae, Bifidobacteriaceae, and Lachnospiraceae are known to dominate the gut of hominids, multiple primer sets targeting each individual family were utilized, and phylogenetic analyses of the families were completed independently. Furthermore, instead of using the relatively slowly diverging 16S rRNA gene, the fast-evolving and variable gene encoding DNA gyrase subunit B was used for bacterial phylogenetics. Similar methods may be applied to complex marine invertebrates such as coral and sponges, where 16S rRNA gene studies have identified prominent bacteria.

Within complex marine invertebrate holobionts, codivergence has been most clearly demonstrated in cold-water sponges in the family Latrunculiidae. The microbiomes of six species within this family were dominated by a single betaproteobacterial OTU, and the phylogeny of this OTU was highly congruent with that of the host (56). Furthermore, gene expression analysis suggested that the dominant betaproteobacteria are active members of the microbiome rather than dormant or nonviable members; however, whether or not this potential symbiont and its host participate in metabolic collaboration is unknown, highlighting an example warranting further investigation. The microbiomes of many other marine invertebrates are dominated by members of the genus Endozoicomonas (57). A pan-genomic analysis of the genomes of seven Endozoicomonas strains representing a broad range of hosts (corals, sponges, and sea slugs) provided some evidence for codivergence (58). Strikingly, the two closely related corals Stylophora pistillata and Pocillopora verrucosa hosted Endozoicomonas with highly similar genomes. A second, large-scale study (47) found that Endozoicomonas species within the coral tissues showed strong signals of codivergence with their hosts; however, they were grouped into two major divisions, namely, the host-specific and host-generalist divisions. The presence of a host-generalist clade may partly explain why the patterns of codivergence did not hold when samples of S. pistillata and P. verrucosa were collected across 28 reefs worldwide (59). Furthermore, the genome of Endozoicomonas is large and appears to be adapted to a planktonic lifestyle (57). Having a free-living stage with respect to the Endozoicomonas life cycle suggests a facultative relationship with corals and would limit the extent of codivergence.

Codivergence may also occur between two symbionts within the microbial community associated with a single host. An interesting example occurs in lower termites, which live in a symbiotic relationship with flagellate protozoa that are essential for the breakdown of lignocellulose obtained from wood particles (60). Within the hindgut, these flagellate protozoa are associated with endosymbiotic prokaryotes, and while the functional basis of this relationship is unclear, matching phylogenies of flagellate host and prokaryote symbiont indicate codivergence (61). The microbiomes of many marine invertebrates also include both eukaryotes and prokaryotes that appear to closely interact with one another. For example, the symbiotic algae Symbiodiniaceae, which reside in the endoderm of the coral tissue, are producers of dimethylsulfoniopropionate (DMSP), which is thought to be metabolized by bacteria within the holobiont (62). Symbiodiniaceae and bacteria are also linked through the nitrogen cycle, where diazotrophs within the holobiont are postulated to fix nitrogen such that it can be used by the endosymbiotic algae (63, 64). Furthermore, the existence of a core microbiome associated with Symbiodiniaceae appears likely, with bacteria affiliated to Marinobacter, Labrenzia, and Chromatiaceae present across 18 cultures of Symbiodiniaceae spanning 5 genera (65). A range of other marine invertebrates, including soft corals, sponges, and molluscs, also host Symbiodiniaceae, and it would be valuable to investigate whether Symbiodiniaceae show codivergence and coevolution with prokaryotes in these systems.
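Congruence between a host phylogeny and a symbiont phylogeny can be given a rough numerical footing. The following is a minimal sketch, not the method used in the cited studies: it assumes that pairwise (for example, patristic) distances have already been computed for the hosts and for their matched symbionts, and it applies a Mantel-style permutation test to ask whether the two distance matrices are more strongly correlated than expected by chance. The function names, the matrix layout (row i of each matrix refers to the same host-symbiont pair), and the default of 999 permutations are illustrative assumptions.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mantel_test(host_dist, symbiont_dist, permutations=999, seed=0):
    """Correlate two symmetric distance matrices (hypothetical inputs).

    Returns the observed correlation and a one-sided permutation p-value
    obtained by shuffling the taxon labels of the symbiont matrix.
    """
    n = len(host_dist)
    pairs = list(combinations(range(n), 2))
    host_vec = [host_dist[i][j] for i, j in pairs]

    def corr(order):
        # Relabel the symbiont taxa according to `order`, then correlate.
        sym_vec = [symbiont_dist[order[i]][order[j]] for i, j in pairs]
        return pearson(host_vec, sym_vec)

    observed = corr(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Small tolerance guards against floating-point round-off.
        if corr(order) >= observed - 1e-12:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value
```

A high correlation with a low permutation p-value would be consistent with codivergence but does not demonstrate it; dedicated cophylogeny methods and estimates of divergence times remain necessary.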

(iii) Metabolic collaboration—intimate association between host and microbe. A third key feature of coevolution is that host and microbe collaborate in a way that is mutually beneficial (15). This is often related to the metabolic function of the microbe, with the host facilitating or complementing that function. This could be in the form of a specialized cell or organ to host microbial symbionts (27), a shared metabolic pathway to produce essential vitamins or amino acids (17), or microbial regulation of certain metabolites produced by the host (19). Metabolic collaboration should be validated where potential candidates for coevolution have been identified through population dynamics and codivergence, as reciprocal evolution necessitates an interaction between the two species. A key step in demonstrating an interaction, and therefore identifying potential reciprocal evolution, is to look at the genomes and transcriptomes of the host and symbionts for evidence of integrated metabolism, combined with targeted in situ visualization of metabolite passage to support the metabolic collaboration.

Sharpshooters, a group of xylem-feeding insects, provide an elegant example of metabolic collaboration between a host and bacterial symbionts. Sharpshooters host two microbial symbionts, Baumannia cicadellinicola and Sulcia muelleri, in their specialized bacteriocyte cells (36), and both symbionts show patterns of codivergence with their host (66). The genomes of B. cicadellinicola and S. muelleri predict the synthesis of vitamins and essential amino acids, respectively, which are deficient in the diet of sharpshooters (23). Furthermore, these two symbionts not only appear to complement each other in terms of their roles in supplementing the host diet, but each symbiont also appears dependent on the other. Circumstantial evidence suggests that similar functional relationships may exist among marine invertebrates, and the characterization of these should be a high priority.

Some examples of metabolic collaboration in complex marine invertebrate holobionts are represented by sponges. Genome and transcriptome data from Cymbastela concentrica and two of its bacterial symbionts (novel genomes of the Phyllobacteriaceae and Nitrosopumilales) suggest that creatine and creatinine produced by sponge metabolism are likely to be degraded to the amino acid glycine by its symbionts (67). Furthermore, gene expression data suggest that the urea produced by creatine degradation by the Phyllobacteriaceae symbiont may be transported and degraded by a third bacterial symbiont in the genus Nitrospira (67). The potential for metabolic collaboration also exists between the sponge Theonella swinhoei and its symbiont belonging to “Candidatus Entotheonella.” The genome of “Ca. Entotheonella” possesses the repertoire for production of almost all amino acids as well as rare coenzymes; however, additional research is needed to understand if these products are used by the host (68). While the following does not constitute metabolic collaboration, sponge symbionts also appear to interact with their host through eukaryote-like proteins (ELPs). For example, microbial symbionts associated with different sponges often contain genes coding for ELPs, some of which are phylogenetically similar to those found in sponges and appear to inhibit phagocytosis (69, 70). Furthermore, additional functional domains associated with ELPs suggest that these proteins are transported to the outer membrane, where they are maintained and potentially used in bacterium-host interactions (71). A symbiosis maintained through host-bacterium interactions such as this emphasizes the potential for coevolution to take place, although it does not in itself demonstrate reciprocal evolution. Finally, characterizations based on metagenomic and metatranscriptomic data sets require functional validation using techniques such as stable isotope probing (SIP) (for a review, see reference 72). For example, using 14C- and 13C-labeled bicarbonate in combination with autoradiography and nanoscale secondary ion mass spectrometry (nanoSIMS), symbionts of the colonial ciliate Zoothamnium niveum were shown to fix inorganic carbon and translocate organic carbon to their host (73). With the advent of new technology associated with SIP, future research would benefit from validating the putative microbial functions implied by genomic research.

CORE MICROBIOME AND THE POTENTIAL OF VIRUSES

A core microbial community, i.e., one that has high intraspecies stability, is often the primary focus of microbial ecologists trying to distinguish functionally important taxa from commensals or short-term visitors (74). While a few bacterial lineages have been shown to occur across a large number of corals and other invertebrate species (57, 75), evidence of the existence of a defined and stable core community remains elusive. From a taxonomic perspective, a core community may not exist; instead, a core functional capacity may exist across diverse lineages. In marine sponges, for example, different host species associate with different symbionts that perform equivalent functions (95). Namely, host-specific microbes among different sponge species appear to use different enzymes to perform the same functions in processes such as denitrification and ammonium oxidation. However, functional redundancy in microbial ecosystems may not be as common as previously thought, as rare microbial phylotypes have been implicated in specific microbial pathways, while more-abundant phylotypes are positively correlated with broader metabolic functions such as respiration (50). This may have important implications in looking at neutral population dynamics, as those rare taxa that are present more often than expected could be responsible for key microbial functions. The existence of a core community would have obvious implications for coevolution, as universally associated microbes are more likely to have coevolved with their host. If present, reconstruction of phylogenetic relationships of core taxa can illustrate whether microbes also diverge in parallel with their host, leading to further investigations that utilize integrated genomic techniques to identify core functional genes and pathways.
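In practice, the distinction between a taxonomic core and transient members is often operationalized with a prevalence threshold. The sketch below shows one hypothetical way to do so; the function name, the input layout (one dictionary of taxon abundances per sample from a single host species), and the 90% default are assumptions for illustration, not values taken from the studies cited here.

```python
def core_taxa(abundance_by_sample, min_prevalence=0.9, min_abundance=0.0):
    """Return the taxa detected in at least `min_prevalence` of samples.

    abundance_by_sample: list of dicts mapping taxon name -> abundance,
    one dict per sample (hypothetical layout). A taxon counts as detected
    in a sample when its abundance exceeds `min_abundance`.
    """
    n_samples = len(abundance_by_sample)
    if n_samples == 0:
        return set()
    detections = {}
    for sample in abundance_by_sample:
        for taxon, abundance in sample.items():
            if abundance > min_abundance:
                detections[taxon] = detections.get(taxon, 0) + 1
    return {
        taxon
        for taxon, count in detections.items()
        if count / n_samples >= min_prevalence
    }
```

Because the result depends strongly on the chosen threshold and on sequencing depth, a set produced this way is best treated as a list of candidates for phylogenetic and functional follow-up rather than as evidence of coevolution.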

While research on the microbiome of marine invertebrates has focused mostly on prokaryotes and microbial eukaryotes (box 3), there is increasing recognition of the importance of viruses as components of the holobiont, adding to the complexity of an already challenging system (76). Viruses are the most abundant biological entities in the oceans (77) and are likely to play important roles in host-microbe coevolution, as bacteria commonly acquire genes for symbiosis or pathogenicity through lateral gene transfer from viruses (78). For example, the bacterium Hamiltonella defensa is a common symbiont of aphids providing defense against wasp parasitism. However, toxin-encoding genes required for aphid protection are present only after infection by a lysogenic lambdoid bacteriophage (79). Thus, it is feasible that coevolution of host and symbiont can be made possible through the initial acquisition of symbiont genes from viruses. Furthermore, viruses structure bacterial communities through processes such as cell lysis, thereby adding another form of selective pressure to invertebrate holobionts (80). A recent study found that viral communities of corals and sponges are specific to their host species and are distinct from the viral communities inhabiting the surrounding seawater (81). Viruses of the order Caudovirales (tailed bacteriophages) were found across all viromes in the study, often as the dominant member; thus, a host-specific virome combined with a host-specific microbiome could be associated with viral selection and pressure. As a result, by influencing microbial community structure, viruses can have major effects on coevolution within the holobiont. The extent to which viruses influence marine invertebrate holobionts is still unknown; however, future research on reef holobionts would benefit from including analyses of both the viral and prokaryotic communities.

BOX 3: SYMBIODINIACEAE—AN OBLIGATE SYMBIONT AND A COEVOLVED PARTNER?

Dinoflagellates from the family Symbiodiniaceae (see reference 82 for revised taxonomy) are common symbionts of many different marine invertebrates, including cnidarians, sponges, molluscs, and protozoans (83). These photosynthetic dinoflagellates provide their host with fixed carbon and in return gain inorganic nutrients and a suitable living environment, creating a remarkable symbiosis that is responsible for the foundation of coral reef ecosystems (83, 84). The symbiotic lifestyle often leads to a reduction of genome size, and, although the genomes of Symbiodiniaceae are large by comparison with those of many other eukaryotic microbes, they are among the smallest for dinoflagellates. The relatively small genomes typical of the Symbiodiniaceae suggest some degree of adaptation to life inside the host (71), despite the fact that many members of this family are known to have a free-living stage (68, 69). An important exception to this life cycle is the dinoflagellate formerly known as clade C15, which is vertically transmitted in coral hosts, and culturing experiments suggested that it is unlikely that the strain can survive outside the host environment (85). Moreover, this symbiont appears to have lost its genomic potential for motility, representing a likely adaptation to life inside a host (85).

CHALLENGES, FURTHER CONSIDERATIONS, AND CONCLUSIONS

Illustrating reciprocal adaptation of one lineage in response to another is extremely challenging in complex symbiotic systems. While meeting the basic criteria set out in this review does not prove coevolution, it would provide support for the idea of coevolution in host-microbe systems where little is known about the evolutionary origins. In doing so, it is also likely that obligate microbes can be differentiated from transient members of the holobiont. Many factors need to be considered, including common ancestry, the origins of the host-microbe association, and the estimated times of divergence. The butterfly-plant example (box 1) highlights the necessity to distinguish the possibility of microbes colonizing their host after host evolution has taken place. In the case of the aphid-Buchnera symbiosis, the origin of infection has been dated at 150 to 250 million years ago (MYA), when aphids first diverged from a common ancestor, and Buchnera form a monophyletic group that is exclusively associated with aphids (16, 36). Within hominids, divergence times were calculated for gut bacteria that show codivergence with their host and were found to coincide with host evolution. Furthermore, the hominid-microbe association appears to have arisen from a common ancestor of all African great apes.

Vertical versus horizontal microbial acquisition may also influence patterns of evolution and should be considered within any study on host-microbe coevolution. Generally speaking, microbes that are acquired vertically, i.e., passed from parent to offspring, are more likely to have coevolved with their host. This is the case for many insect endosymbionts, and their loss of a free-living stage and their subsequent adaptation to the host environment determined many of the coevolution signals previously detailed (23, 36, 86). For example, Buchnera endosymbionts have been passed from parent to offspring for over 100 million years and, as the endosymbiont evolved, it lost many genes required for life outside the host (16). Such patterns may be far more difficult to observe in microbes acquired from the environment (horizontal transmission). Codiversification is more difficult to detect in horizontally acquired symbionts, as the selection pressures include environmental forces that act in concert with the host-imposed pressures. Invertebrates such as cnidarians and sponges can acquire microbial symbionts through both vertical transmission and horizontal transmission (87–91), and focusing initially on vertically transmitted microbes would simplify the search for coevolutionary signals.

Consideration of genetic markers and key traits of symbiosis could also be useful for identifying potentially coevolved symbionts. For example, many vertically transmitted endosymbionts have reduced genome sizes compared to their free-living relatives, since many genes may become redundant during adaptation to the host environment (36, 86). Some microbial symbionts are also housed in bacteriocytes or other specialized compartments, and microbial aggregates resembling such associations have been detected in both corals and sponges (102, 103). Microbes housed in these specialized cells represent priority candidates in the search for coevolved relationships. Other trends, such as lower G+C content, high isoelectric point values, and proteins that are quickly evolving relative to those seen with free-living bacteria, are all features of insect endosymbionts (23). Exploring these traits in more-complex systems may also have some utility in the search for coevolved symbionts. Furthermore, observing support for host-symbiont coevolution may require careful choices of appropriate genetic markers due to different divergence rates. In particular, it has been suggested that immune genes should be targeted as they are rapidly evolving and likely to directly influence the microbial community (92). Additionally, unresolved host and microbe genealogies may further confuse patterns of host-microbe coevolution; thus, robust phylogenetic trees and markers are critical to illustrate codivergence.
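Of the genomic traits listed above, G+C content is the simplest to screen for once a symbiont genome or contig is available. The following minimal sketch computes it from a nucleotide string; the function name and the decision to ignore ambiguous bases such as N are assumptions made for illustration, and a low value on its own is not evidence of coevolution.

```python
def gc_content(sequence):
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C.

    Ambiguous characters such as N are ignored (an illustrative choice).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total
```

Comparing the value for a candidate symbiont against free-living relatives, rather than against a fixed cutoff, follows the comparative framing used for insect endosymbionts in the text.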

To begin investigating host-microbe coevolution in complex holobionts, it may be useful to unify studies by investigating a number of model organisms. Marine sponges present an ideal starting point for investigating coevolution in complex systems for a variety of reasons. First, they may represent the earliest animal lineage to have diverged and they host highly stable microbial communities, increasing the likelihood of discovering coevolved symbionts. Second, metagenomic analyses in sponges are currently better developed than in other marine invertebrates with complex microbiomes, providing a solid platform with which to investigate coevolution. Third, some evidence of coevolution already exists, with sponges exhibiting codivergence and metabolic collaboration and some species hosting microbial cells within bacteriocytes. However, as yet, no research has traced all the aforementioned traits to a single holobiont species.

In this era of climate change and environmental degradation heavily impacting marine ecosystems (93, 94), there is an urgent need to better understand the microbial processes that underpin invertebrate health and evolution. Following the criteria set out in the review will not only enable exploration of evidence for coevolution but also provide a better understanding of how microbial communities are structured and identify potentially beneficial symbionts which can be targeted using genomic techniques to elucidate their specific roles within the holobiont.

ACKNOWLEDGMENTS

We thank Hillary Smith (James Cook University [JCU]) for helpful contributions to figure preparation and Nikolaos Andreakis (JCU) for helpful comments on the manuscript outline. We also thank Pedro Frade (University of Algarve) for insightful discussions on the manuscript.

P.A.O. conceived, designed, and drafted the manuscript and figures. D.G.B., N.S.W., and D.J.M. revised the manuscript and made substantial contributions to its design and intellectual content. D.G.B. contributed to figure conception and preparation.

We declare that the review was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

REFERENCES

1. Zaneveld J, Turnbaugh PJ, Lozupone C, Ley RE, Hamady M, Gordon JI, Knight R. 2008. Host-bacterial coevolution and the search for new drug targets. Curr Opin Chem Biol 12:109–114. https://doi.org/10.1016/j.cbpa.2008.01.015.

2. Van den Abbeele P, Van de Wiele T, Verstraete W, Possemiers S. 2011. The host selects mucosal and luminal associations of coevolved gut microorganisms: a novel concept. FEMS Microbiol Rev 35:681–704. https://doi.org/10.1111/j.1574-6976.2011.00270.x.

3. Archibald JM. 2015. Endosymbiosis and eukaryotic cell evolution. Curr Biol 25:R911–R921. https://doi.org/10.1016/j.cub.2015.07.055.

4. McFall-Ngai M, Hadfield MG, Bosch TCG, Carey HV, Domazet-Lošo T, Douglas AE, Dubilier N, Eberl G, Fukami T, Gilbert SF, Hentschel U, King N, Kjelleberg S, Knoll AH, Kremer N, Mazmanian SK, Metcalf JL, Nealson K, Pierce NE, Rawls JF, Reid A, Ruby EG, Rumpho M, Sanders JG, Tautz D, Wernegreen JJ. 2013. Animals in a bacterial world, a new imperative for the life sciences. Proc Natl Acad Sci U S A 110:3229–3236. https://doi.org/10.1073/pnas.1218525110.

5. Friedman WE. 2009. The meaning of Darwin’s ‘abominable mystery’. Am J Bot 96:5–21. https://doi.org/10.3732/ajb.0800150.

6. Ehrlich PR, Raven PH. 1964. Butterflies and plants: a study in coevolution. Evolution 18:586–608. https://doi.org/10.2307/2406212.

7. Janz N, Nylin S. 1998. Butterflies and plants: a phylogenetic study. Evolution 52:486–502. https://doi.org/10.1111/j.1558-5646.1998.tb01648.x.

8. Ryan MF, Byrne O. 1988. Plant-insect coevolution and inhibition of acetylcholinesterase. J Chem Ecol 14:1965–1975. https://doi.org/10.1007/BF01013489.

9. Van Valen L. 1974. Molecular evolution as predicted by natural selection. J Mol Evol 3:89–101. https://doi.org/10.1007/BF01796554.

10. Paterson S, Vogwill T, Buckling A, Benmayor R, Spiers AJ, Thomson NR, Quail M, Smith F, Walker D, Libberton B, Fenton A, Hall N, Brockhurst MA. 2010. Antagonistic coevolution accelerates molecular evolution. Nature 464:275–278. https://doi.org/10.1038/nature08798.

11. Herre EA, Knowlton N, Mueller UG, Rehner SA. 1999. The evolution of mutualisms: exploring the paths between conflict and cooperation. Trends Ecol Evol 14:49–53. https://doi.org/10.1016/S0169-5347(98)01529-8.

12. Theis KR, Dheilly NM, Klassen JL, Brucker RM, Baines JF, Bosch TCG, Cryan JF, Gilbert SF, Goodnight CJ, Lloyd EA, Sapp J, Vandenkoornhuyse P, Zilber-Rosenberg I, Rosenberg E, Bordenstein SR. 2016. Getting the hologenome concept right: an eco-evolutionary framework for hosts and their microbiomes. mSystems 1:e00028-16. https://doi.org/10.1128/mSystems.00028-16.

13. Bordenstein SR, Theis KR. 2015. Host biology in light of the microbiome: ten principles of holobionts and hologenomes. PLoS Biol 13:e1002226. https://doi.org/10.1371/journal.pbio.1002226.

14. Zilber-Rosenberg I, Rosenberg E. 2008. Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution. FEMS Microbiol Rev 32:723–735. https://doi.org/10.1111/j.1574-6976.2008.00123.x.

15. Wilson ACC, Duncan RP. 2015. Signatures of host/symbiont genome coevolution in insect nutritional endosymbioses. Proc Natl Acad Sci U S A 112:10255–10261. https://doi.org/10.1073/pnas.1423305112.

16. Baumann P, Moran NA, Baumann L. 1997. The evolution and genetics of aphid endosymbionts. Bioscience 47:12–20. https://doi.org/10.2307/1313002.

17. Russell CW, Bouvaine S, Newell PD, Douglas AE. 2013. Shared metabolic pathways in a coevolved insect-bacterial symbiosis. Appl Environ Microbiol 79:6117–6123. https://doi.org/10.1128/AEM.01543-13.

18. Collins SM, Surette M, Bercik P. 2012. The interplay between the intestinal microbiota and the brain. Nat Rev Microbiol 10:735–742. https://doi.org/10.1038/nrmicro2876.

19. Kennedy PJ, Cryan JF, Dinan TG, Clarke G. 2017. Kynurenine pathway metabolism and the microbiota-gut-brain axis. Neuropharmacology 112:399–412. https://doi.org/10.1016/j.neuropharm.2016.07.002.

20. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Doré J, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, et al. 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65. https://doi.org/10.1038/nature08821.

21. Brune A, Dietrich C. 2015. The gut microbiota of termites: digesting the diversity in the light of ecology and evolution. Annu Rev Microbiol 69:145–166. https://doi.org/10.1146/annurev-micro-092412-155715.

22. Fenn K, Blaxter M. 2004. Are filarial nematode Wolbachia obligate mutualist symbionts? Trends Ecol Evol 19:163–166. https://doi.org/10.1016/j.tree.2004.01.002.

23. Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, Khouri H, Tallon LJ, Zaborsky JM, Dunbar HE, Tran PL, Moran NA, Eisen JA. 2006. Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biol 4:e188. https://doi.org/10.1371/journal.pbio.0040188.

24. Clark MA, Moran NA, Baumann P, Wernegreen JJ. 2000. Cospeciation between bacterial endosymbionts (Buchnera) and a recent radiation of aphids (Uroleucon) and pitfalls of testing for phylogenetic congruence. Evolution 54:517–525. https://doi.org/10.1111/j.0014-3820.2000.tb00054.x.

25. Moeller AH, Caro-Quintero A, Mjungu D, Georgiev AV, Lonsdorf EV, Muller MN, Pusey AE, Peeters M, Hahn BH, Ochman H. 2016. Cospeciation of gut microbiota with hominids. Science 353:380–382. https://doi.org/10.1126/science.aaf3951.

26. Moran NA. 2006. Symbiosis. Curr Biol 16:R866–R871. https://doi.org/10.1016/j.cub.2006.09.019.

27. McFall-Ngai M. 2008. Hawaiian bobtail squid. Curr Biol 18:R1043–R1044. https://doi.org/10.1016/j.cub.2008.08.059.

28. Chen C, Tseng C, Chen CA, Tang S. 2011. The dynamics of microbial partnerships in the coral Isopora palifera. ISME J 5:728–740. https://doi.org/10.1038/ismej.2010.151.

29. Gil-Agudelo DL, Myers C, Smith GW, Kim K. 2006. Changes in the microbial communities associated with Gorgonia ventalina during aspergillosis infection. Dis Aquat Organ 69:89–94. https://doi.org/10.3354/dao069089.

30. Koren O, Rosenberg E. 2006. Bacteria associated with mucus and tissues of the coral Oculina patagonica in summer and winter. Appl Environ Microbiol 72:5254–5259. https://doi.org/10.1128/AEM.00554-06.

31. Littman RA, Willis BL, Pfeffer C, Bourne DG. 2009. Diversities of coral-associated bacteria differ with location, but not species, for three acroporid corals on the Great Barrier Reef. FEMS Microbiol Ecol 68:152–163. https://doi.org/10.1111/j.1574-6941.2009.00666.x.

32. Thomas T, Moitinho-Silva L, Lurgi M, Björk JR, Easson C, Astudillo-García C, Olson JB, Erwin PM, López-Legentil S, Luter H, Chaves-Fonnegra A, Costa R, Schupp PJ, Steindler L, Erpenbeck D, Gilbert J, Knight R, Ackermann G, Victor Lopez J, Taylor MW, Thacker RW, Montoya JM, Hentschel U, Webster NS. 2016. Diversity, structure and convergent evolution of the global sponge microbiome. Nat Commun 7:11870. https://doi.org/10.1038/ncomms11870.

33. Webster NS, Thomas T. 2016. The sponge hologenome. mBio 7:e00135-16. https://doi.org/10.1128/mBio.00135-16.

34. Röthig T, Costa RM, Simona F, Baumgarten S, Torres AF, Radhakrishnan A, Aranda M, Voolstra CR. 2016. Distinct bacterial communities associated with the coral model Aiptasia in aposymbiotic and symbiotic states with Symbiodinium. Front Mar Sci 3. https://doi.org/10.3389/fmars.2016.00234.

35. Erwin PM, Pineda MC, Webster N, Turon X, López-Legentil S. 2014. Down under the tunic: bacterial biodiversity hotspots and widespread ammonia-oxidizing archaea in coral reef ascidians. ISME J 8:575–588. https://doi.org/10.1038/ismej.2013.188.

36. Moran NA, Baumann P. 2000. Bacterial endosymbionts in animals. Curr Opin Microbiol 3:270–275. https://doi.org/10.1016/S1369-5274(00)00088-6.

37. Moran NA, Sloan DB. 2015. The hologenome concept: helpful or hollow? PLoS Biol 13:e1002311. https://doi.org/10.1371/journal.pbio.1002311.

38. Douglas AE, Werren JH. 2016. Holes in the hologenome: why host-microbe symbioses are not holobionts. mBio 7:e02099-15. https://doi.org/10.1128/mBio.02099-15.

39. Mazel F, Davis KM, Loudon A, Kwong WK, Groussin M, Parfrey LW. 2018. Is host filtering the main driver of phylosymbiosis across the tree of life? mSystems 3:e00097-18. https://doi.org/10.1128/mSystems.00097-18.

40. Sieber M, Pita L, Weiland-Bräuer N, Dirksen P, Wang J, Mortzfeld B, Franzenburg S, Schmitz RA, Baines JF, Fraune S, Hentschel U, Schulenburg H, Bosch TCG, Traulsen A. 2018. The neutral metaorganism. bioRxiv https://doi.org/10.1101/367243.

41. Sloan WT, Lunn M, Woodcock S, Head IM, Nee S, Curtis TP. 2006. Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol 8:732–740. https://doi.org/10.1111/j.1462-2920.2005.00956.x.

42. Brooks AW, Kohl KD, Brucker RM, van Opstal EJ, Bordenstein SR. 2016. Phylosymbiosis: relationships and functional effects of microbial communities across host evolutionary history. PLoS Biol 14:e2000225. https://doi.org/10.1371/journal.pbio.2000225.

43. Kohl KD, Dearing MD, Bordenstein SR. 2018. Microbial communities exhibit host species distinguishability and phylosymbiosis along the length of the gastrointestinal tract. Mol Ecol 27:1874–1883. https://doi.org/10.1111/mec.14460.

44. Ross AA, Müller KM, Weese JS, Neufeld JD. 2018. Comprehensive skin microbiome analysis reveals the uniqueness of human skin and evidence for phylosymbiosis within the class Mammalia. Proc Natl Acad Sci U S A 115:E5786–E5795. https://doi.org/10.1073/pnas.1801302115.

45. Ochman H, Worobey M, Kuo CH, Ndjango JBN, Peeters M, Hahn BH, Hugenholtz P. 2010. Evolutionary relationships of wild hominids recapitulated by gut microbial communities. PLoS Biol 8:e1000546. https://doi.org/10.1371/journal.pbio.1000546.

46. Yeoh YK, Dennis PG, Paungfoo-Lonhienne C, Weber L, Brackin R, Ragan MA, Schmidt S, Hugenholtz P. 2017. Evolutionary conservation of a core root microbiome across plant phyla along a tropical soil chronosequence. Nat Commun 8:215. https://doi.org/10.1038/s41467-017-00262-8.

47. Pollock FJ, McMinds R, Smith S, Bourne DG, Willis BL, Medina M, Thurber RV, Zaneveld JR. 2018. Coral-associated bacteria demonstrate phylosymbiosis and cophylogeny. Nat Commun 9:4921. https://doi.org/10.1038/s41467-018-07275-x.

48. Schöttner S, Hoffmann F, Cárdenas P, Rapp HT, Boetius A, Ramette A. 2013. Relationships between host phylogeny, host type and bacterial community diversity in cold-water coral reef sponges. PLoS One 8:e55505. https://doi.org/10.1371/journal.pone.0055505.

49. Easson CG, Thacker RW. 2014. Phylogenetic signal in the community structure of host-specific microbiomes of tropical marine sponges. Front Microbiol 5:532. https://doi.org/10.3389/fmicb.2014.00532.

50. Rivett DW, Bell T. 2018. Abundance determines the functional role of bacterial phylotypes in complex communities. Nat Microbiol 3:767–772. https://doi.org/10.1038/s41564-018-0180-0.

51. Nishiguchi MK, Ruby EG, McFall-Ngai MJ. 1998. Competitive dominance among strains of luminous bacteria provides an unusual form of evidence for parallel evolution in sepiolid squid-vibrio symbioses. Appl Environ Microbiol 64:3209–3213.

52. Bandi C, Anderson TJC, Genchi C, Blaxter ML. 1998. Phylogeny of Wolbachia in filarial nematodes. Proc Biol Sci 265:2407–2413. https://doi.org/10.1098/rspb.1998.0591.

53. Deines P, Bosch TCG. 2016. Transitioning from microbiome composition to microbial community interactions: the potential of the metaorganism hydra as an experimental model. Front Microbiol 7:1610. https://doi.org/10.3389/fmicb.2016.01610.

54. Mews LK, Smith DC. 1980. The green hydra symbiosis. III. The biotrophic transport of carbohydrate from alga to animal. Proc R Soc Lond B Biol Sci 209:377–401. https://doi.org/10.1098/rspb.1980.0101.

55. Kawaida H, Ohba K, Koutake Y, Shimizu H, Tachida H, Kobayakawa Y. 2013. Symbiosis between hydra and chlorella: molecular phylogenetic analysis and experimental study provide insight into its origin and evolution. Mol Phylogenet Evol 66:906–914. https://doi.org/10.1016/j.ympev.2012.11.018.

56. Matcher GF, Waterworth SC, Walmsley TA, Matsatsa T, Parker-Nance S, Davies-Coleman MT, Dorrington RA. 2017. Keeping it in the family: coevolution of latrunculid sponges and their dominant bacterial symbionts. Microbiologyopen 6:e00417. https://doi.org/10.1002/mbo3.417.

57. Neave MJ, Apprill A, Ferrier-Pagès C, Voolstra CR. 2016. Diversity and function of prevalent symbiotic marine bacteria in the genus Endozoicomonas. Appl Microbiol Biotechnol 100:8315–8324. https://doi.org/10.1007/s00253-016-7777-0.

58. Neave MJ, Michell CT, Apprill A, Voolstra CR. 2017. Endozoicomonas genomes reveal functional adaptation and plasticity in bacterial strains symbiotically associated with diverse marine hosts. Sci Rep 7:40579. https://doi.org/10.1038/srep40579.

59. Neave MJ, Rachmawati R, Xun L, Michell CT, Bourne DG, Apprill A, Voolstra CR. 2017. Differential specificity between closely related corals and abundant Endozoicomonas endosymbionts across global scales. ISME J 11:186–200. https://doi.org/10.1038/ismej.2016.95.

60. Brune A. 2014. Symbiotic digestion of lignocellulose in termite guts. Nat Rev Microbiol 12:168 –180. https://doi.org/10.1038/nrmicro3182.

61. Ikeda-Ohtsubo W, Brune A. 2009. Cospeciation of termite gut flagel- lates and their bacterial endosymbionts: Trichonympha species and ‘Candidatus Endomicrobium trichonymphae’. Mol Ecol 18:332–342. https://doi.org/10.1111/j.1365-294X.2008.04029.x.

62. Raina JB, Tapiolas D, Willis BL, Bourne DG. 2009. Coral-associated bacteria and their role in the biogeochemical cycling of sulfur. Appl Environ Microbiol 75:3492–3501. https://doi.org/10.1128/AEM.02567 -08.

63. Lema KA, Willis BL, Bourne DG. 2012. Corals form characteristic associ- ations with symbiotic nitrogen-fixing bacteria. Appl Environ Microbiol 78:3136 –3144. https://doi.org/10.1128/AEM.07800-11.

64. Rädecker N, Pogoreutz C, Voolstra CR, Wiedenmann J, Wild C. 2015. Nitrogen cycling in corals: the key to understanding holobiont func- tioning? Trends Microbiol 23:490 – 497. https://doi.org/10.1016/j.tim .2015.03.008.

65. Lawson CA, Raina JB, Kahlke T, Seymour JR, Suggett DJ. 2018. Defining the core microbiome of the symbiotic dinoflagellate, Sym- biodinium. Environ Microbiol Rep 10:7–11. https://doi.org/10.1111/ 1758-2229.12599.

66. Takiya DM, Tran PL, Dietrich CH, Moran NA. 2006. Co-cladogenesis spanning three phyla: leafhoppers (Insecta: Hemiptera: Cicadellidae) and their dual bacterial symbionts. Mol Ecol 15:4175– 4191. https://doi .org/10.1111/j.1365-294X.2006.03071.x.

67. Moitinho-Silva L, Díez-Vives C, Batani G, Esteves AIS, Jahn MT, Thomas T. 2017. Integrated metabolism in sponge-microbe symbiosis revealed by genome-centered metatranscriptomics. ISME J 11:1651–1666. https://doi.org/10.1038/ismej.2017.25.

68. Lackner G, Peters EE, Helfrich EJN, Piel J. 2017. Insights into the lifestyle of uncultured bacterial natural product factories associated with ma- rine sponges. Proc Natl Acad Sci U S A 114:E347–E356. https://doi.org/ 10.1073/pnas.1616234114.

69. Nguyen MTHD, Liu M, Thomas T. 2014. Ankyrin-repeat proteins from sponge symbionts modulate amoebal phagocytosis. Mol Ecol 23:1635–1645. https://doi.org/10.1111/mec.12384.

70. Reynolds D, Thomas T. 2016. Evolution and function of eukaryotic-like proteins from sponge symbionts. Mol Ecol 25:5242–5253. https://doi.org/10.1111/mec.13812.

71. Díez-Vives C, Moitinho-Silva L, Nielsen S, Reynolds D, Thomas T. 2017. Expression of eukaryotic-like protein in the microbiome of sponges. Mol Ecol 26:1432–1451. https://doi.org/10.1111/mec.14003.

72. Berry D, Loy A. 2018. Stable-isotope probing of human and animal microbiome function. Trends Microbiol 26:999–1007. https://doi.org/10.1016/j.tim.2018.06.004.

73. Volland JM, Schintlmeister A, Zambalos H, Reipert S, Mozetič P, Espada-Hinojosa S, Turk V, Wagner M, Bright M. 2018. NanoSIMS and tissue autoradiography reveal symbiont carbon fixation and organic carbon transfer to giant ciliate host. ISME J 12:714–727. https://doi.org/10.1038/s41396-018-0069-1.

74. Hernandez-Agreda A, Gates RD, Ainsworth TD. 2017. Defining the core microbiome in corals’ microbial soup. Trends Microbiol 25:125–140. https://doi.org/10.1016/j.tim.2016.11.003.

75. Ainsworth TD, Krause L, Bridge T, Torda G, Raina JB, Zakrzewski M, Gates RD, Padilla-Gamiño JL, Spalding HL, Smith C, Woolsey ES, Bourne DG, Bongaerts P, Hoegh-Guldberg O, Leggat W. 2015. The coral core microbiome identifies rare bacterial taxa as ubiquitous endosymbionts. ISME J 9:2261–2274. https://doi.org/10.1038/ismej.2015.39.

76. Weynberg KD, Wood-Charlson EM, Suttle CA, van Oppen MJH. 2014. Generating viral metagenomes from the coral holobiont. Front Microbiol 5:206. https://doi.org/10.3389/fmicb.2014.00206.

77. Wommack KE, Colwell RR. 2000. Virioplankton: viruses in aquatic ecosystems. Microbiol Mol Biol Rev 64:69–114. https://doi.org/10.1128/MMBR.64.1.69-114.2000.

78. Ochman H, Moran NA. 2001. Genes lost and genes found: evolution of bacterial pathogenesis and symbiosis. Science 292:1096–1099. https://doi.org/10.1126/science.1058543.

79. Oliver KM, Degnan PH, Hunter MS, Moran NA. 2009. Bacteriophages encode factors required for protection in a symbiotic mutualism. Science 325:992–994. https://doi.org/10.1126/science.1174463.

80. Bettarel Y, Bouvier T, Nguyen HK, Thu PT. 2015. The versatile nature of coral-associated viruses. Environ Microbiol 17:3433–3439. https://doi.org/10.1111/1462-2920.12579.

81. Laffy PW, Wood-Charlson EM, Turaev D, Jutz S, Pascelli C, Bell SC, Peirce TE, Weynberg KD, van Oppen MJH, Rattei T, Webster NS. 25 March 2018. Reef invertebrate viromics: diversity, host specificity and functional capacity. Environ Microbiol https://doi.org/10.1111/1462-2920.14110.

82. LaJeunesse TC, Parkinson JE, Gabrielson PW, Jeong HJ, Reimer JD, Voolstra CR, Santos SR. 2018. Systematic revision of Symbiodiniaceae highlights the antiquity and diversity of coral endosymbionts. Curr Biol 28:2570–2580.e6. https://doi.org/10.1016/j.cub.2018.07.008.

83. Stat M, Carter D, Hoegh-Guldberg O. 2006. The evolutionary history of Symbiodinium and scleractinian hosts—symbiosis, diversity, and the effect of climate change. Perspect Plant Ecol Evol Syst 8:23–43. https://doi.org/10.1016/j.ppees.2006.04.001.

84. Rowan R. 1998. Diversity and ecology of zooxanthellae on coral reefs. J Phycol 34:407–417. https://doi.org/10.1046/j.1529-8817.1998.340407.x.

85. Krueger T, Gates RD. 2012. Cultivating endosymbionts – host environmental mimics support the survival of Symbiodinium C15 ex hospite. J Exp Mar Bio Ecol 413:169–176. https://doi.org/10.1016/j.jembe.2011.12.002.

86. Fisher RM, Henry LM, Cornwallis CK, Kiers ET, West SA. 2017. The evolution of host-symbiont dependence. Nat Commun 8:15973. https://doi.org/10.1038/ncomms15973.

87. Ceh J, van Keulen M, Bourne DG. 2013. Intergenerational transfer of specific bacteria in corals and possible implications for offspring fitness. Microb Ecol 65:227–231. https://doi.org/10.1007/s00248-012-0105-z.

88. Leite DCA, Leão P, Garrido AG, Lins U, Santos HF, Pires DO, Castro CB, van Elsas JD, Zilberberg C, Rosado AS, Peixoto RS. 2017. Broadcast spawning coral Mussismilia hispida can vertically transfer its associated bacterial core. Front Microbiol 8:176. https://doi.org/10.3389/fmicb.2017.00176.

89. Sharp KH, Distel D, Paul VJ. 2012. Diversity and dynamics of bacterial communities in early life stages of the Caribbean coral Porites astreoides. ISME J 6:790–801. https://doi.org/10.1038/ismej.2011.144.

90. Sharp KH, Ritchie KB, Schupp PJ, Ritson-Williams R, Paul VJ. 2010. Bacterial acquisition in juveniles of several broadcast spawning coral species. PLoS One 5:e10898. https://doi.org/10.1371/journal.pone.0010898.

91. Sharp KH, Eam B, Faulkner DJ, Haygood MG. 2007. Vertical transmission of diverse microbes in the tropical sponge Corticium sp. Appl Environ Microbiol 73:622–629. https://doi.org/10.1128/AEM.01493-06.

92. Brucker RM, Bordenstein SR. 2012. Speciation by symbiosis. Trends Ecol Evol 27:443–451. https://doi.org/10.1016/j.tree.2012.03.011.

93. Hughes TP, Kerry JT, Álvarez-Noriega M, Álvarez-Romero JG, Anderson KD, Baird AH, Babcock RC, Beger M, Bellwood DR, Berkelmans R, Bridge TC, Butler IR, Byrne M, Cantin NE, Comeau S, Connolly SR, Cumming GS, Dalton SJ, Diaz-Pulido G, Eakin CM, Figueira WF, Gilmour JP, Harrison HB, Heron SF, Hoey AS, Hobbs JPA, Hoogenboom MO, Kennedy EV, Kuo CY, Lough JM, Lowe RJ, Liu G, McCulloch MT, Malcolm HA, McWilliam MJ, Pandolfi JM, Pears RJ, Pratchett MS, Schoepf V, Simpson T, Skirving WJ, Sommer B, Torda G, Wachenfeld DR, Willis BL, Wilson SK. 2017. Global warming and recurrent mass bleaching of corals. Nature 543:373–377. https://doi.org/10.1038/nature21707.

94. Hughes TP, Barnes ML, Bellwood DR, Cinner JE, Cumming GS, Jackson JBC, Kleypas J, Van De Leemput IA, Lough JM, Morrison TH, Palumbi SR, Van Nes EH, Scheffer M. 2017. Coral reefs in the Anthropocene. Nature 546:82–90. https://doi.org/10.1038/nature22901.

95. Fan L, Reynolds D, Liu M, Stark M, Kjelleberg S, Webster NS, Thomas T. 2012. Functional equivalence and evolutionary convergence in complex communities of microbial sponge symbionts. Proc Natl Acad Sci U S A 109:E1878–E1887. https://doi.org/10.1073/pnas.1203287109.

96. Bourne DG, Dennis PG, Uthicke S, Soo RM, Tyson GW, Webster N. 2013. Coral reef invertebrate microbiomes correlate with the presence of photosymbionts. ISME J 7:1452–1458.

97. Li J, Chen Q, Long LJ, Dong J De, Yang J, Zhang S. 2014. Bacterial dynamics within the mucus, tissue and skeleton of the coral Porites lutea during different seasons. Sci Rep 4:1–8.

98. Hakim JA, Koo H, Kumar R, Lefkowitz EJ, Morrow CD, Powell ML, Watts SA, Bej AK. 2016. The gut microbiome of the sea urchin, Lytechinus variegatus, from its natural habitat demonstrates selective attributes of microbial taxa and predictive metabolic profiles. FEMS Microbiol Ecol 92:1–12.

99. Wessels W, Sprungala S, Watson SA, Miller DJ, Bourne DG. 2017. The microbiome of the octocoral Lobophytum pauciflorum: minor differences between sexes and resilience to short-term stress. FEMS Microbiol Ecol 93:1–13.

100. Ngangbam AK, Baten A, Waters DLE, Whalan S, Benkendorff K. 2015. Characterization of bacterial communities associated with the Tyrian purple producing gland in a marine gastropod. PLoS One 10:1–19.

101. Shinzato C, Shoguchi E, Kawashima T, Hamada M, Hisata K, Tanaka M, Fujie M, Fujiwara M, Koyanagi R, Ikuta T, Fujiyama A, Miller DJ, Satoh N. 2011. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476:320–323.

102. Work TM, Aeby GS. 2014. Microbial aggregates within tissues infect a diversity of corals throughout the Indo-Pacific. Mar Ecol Prog Ser 500:1–9.

103. Maldonado M. 2007. Intergenerational transmission of symbiotic bacteria in oviparous and viviparous demosponges, with emphasis on intracytoplasmically-compartmented bacterial types. J Mar Biol Assoc United Kingdom 87:1701–1713.


  • UNTANGLING PATTERNS OF HOST-MICROBE COEVOLUTION IN A WEB OF MICROBES
    • (i) Phylosymbiosis and neutral theory—identifying stochastic and deterministic components of the microbiome.
    • (ii) Codivergence—microbial phylogeny and host phylogeny are congruent.
    • (iii) Metabolic collaboration—intimate association between host and microbe.
  • CORE MICROBIOME AND THE POTENTIAL OF VIRUSES
  • CHALLENGES, FURTHER CONSIDERATIONS, AND CONCLUSIONS
  • ACKNOWLEDGMENTS
  • REFERENCES


ESSAY

Natural experiments and long-term monitoring are critical to understand and predict marine host–microbe ecology and evolution

Matthieu Leray 1‡*, Laetitia G. E. Wilkins 2¤‡, Amy Apprill 3, Holly M. Bik 4, Friederike Clever 1,5, Sean R. Connolly 1, Marina E. De León 1,2, J. Emmett Duffy 6, Leïla Ezzat 7, Sarah Gignoux-Wolfsohn 8, Edward Allen Herre 1, Jonathan Z. Kaye 9, David I. Kline 1, Jordan G. Kueneman 1, Melissa K. McCormick 8, W. Owen McMillan 1, Aaron O’Dea 1,10*, Tiago J. Pereira 4, Jillian M. Petersen 11, Daniel F. Petticord 1, Mark E. Torchin 1, Rebecca Vega Thurber 12, Elin Videvall 13,14, William T. Wcislo 1, Benedict Yuen 11, Jonathan A. Eisen 2,15,16

1 Smithsonian Tropical Research Institute, Balboa, Ancon, Republic of Panama
2 UC Davis Genome Center, University of California, Davis, Davis, California, United States of America
3 Marine Chemistry and Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, United States of America
4 Department of Marine Sciences and Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
5 Department of Natural Sciences, Manchester Metropolitan University, Manchester, United Kingdom
6 Tennenbaum Marine Observatories Network, Smithsonian Environmental Research Center, Edgewater, Maryland, United States of America
7 Department of Ecology, Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, California, United States of America
8 Smithsonian Environmental Research Center, Edgewater, Maryland, United States of America
9 Gordon and Betty Moore Foundation, Palo Alto, California, United States of America
10 Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
11 Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
12 Department of Microbiology, Oregon State University, Corvallis, Oregon, United States of America
13 Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Washington, DC, United States of America
14 Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
15 Department of Evolution and Ecology, University of California, Davis, Davis, California, United States of America
16 Department of Medical Microbiology and Immunology, University of California, Davis, Davis, California, United States of America

¤ Current address: Max Planck Institute for Marine Microbiology, Department of Symbiosis, Bremen, Germany

‡ These authors share first authorship on this work.

* [email protected] (ML); [email protected] (AO)

Abstract

Marine multicellular organisms host a diverse collection of bacteria, archaea, microbial eukaryotes, and viruses that form their microbiome. Such host-associated microbes can significantly influence the host’s physiological capacities; however, the identity and functional role(s) of key members of the microbiome (“core microbiome”) in most marine hosts coexisting in natural settings remain obscure. Also unclear is how dynamic interactions between hosts and the immense standing pool of microbial genetic variation will affect marine ecosystems’ capacity to adjust to environmental changes. Here, we argue that significantly advancing our understanding of how host-associated microbes shape marine hosts’ plastic and adaptive responses to environmental change requires (i) recognizing that individual host–microbe systems do not exist in an ecological or evolutionary vacuum and (ii) expanding the field toward long-term, multidisciplinary research on entire communities of hosts and microbes. Natural experiments, such as time-calibrated geological events associated with well-characterized environmental gradients, provide unique ecological and evolutionary contexts to address this challenge. We focus here particularly on mutualistic interactions between hosts and microbes, but note that many of the same lessons and approaches would apply to other types of interactions.

PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 | August 19, 2021

OPEN ACCESS

Citation: Leray M, Wilkins LGE, Apprill A, Bik HM, Clever F, Connolly SR, et al. (2021) Natural experiments and long-term monitoring are critical to understand and predict marine host–microbe ecology and evolution. PLoS Biol 19(8): e3001322. https://doi.org/10.1371/journal.pbio.3001322

Published: August 19, 2021

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding: Financial support for the workshop was provided by grant GBMF5603 (https://doi.org/10.37807/GBMF5603) from the Gordon and Betty Moore Foundation (W.T. Wcislo, J.A. Eisen, co-PIs), and additional funding from the Smithsonian Tropical Research Institute and the Office of the Provost of the Smithsonian Institution (W.T. Wcislo, J.P. Meganigal, and R.C. Fleischer, co-PIs). JP was supported by a WWTF VRG Grant and the ERC Starting Grant ’EvoLucin’. LGEW has received funding from the European Union’s Framework Programme for Research and Innovation Horizon 2020 (2014-2020) under the Marie Sklodowska-Curie Grant Agreement No. 101025649. AO was supported by the Sistema Nacional de Investigadores (SENACYT, Panamá). A. Apprill was supported by NSF award OCE-1938147. D.I. Kline, M. Leray, S.R. Connolly, and M.E. Torchin were supported by a Rohr Family Foundation grant for the Rohr Reef Resilience Project, for which this is contribution #2. This is contribution #85 from the Smithsonian’s MarineGEO and Tennenbaum Marine Observatories Network. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: LTER, Long-Term Ecological Research; MarineGEO, Marine Global Earth Observatory; MBON, Marine Biodiversity Observation Network; TEP, Tropical Eastern Pacific.

Main

It is widely recognized that host-associated microbes play profound roles in the health of their marine hosts and the ecosystems they inhabit. Although some such interactions with microbes are transient, many are more persistent and can be generally described as symbioses. Symbioses come in many flavors including parasitism, commensalism, and mutualism (see Box 1), and, in this paper, we focus in particular on the mutually beneficial (i.e., mutualistic) subset of such interactions involving marine hosts. Despite the wide recognition of the importance of such mutualisms, it remains less clear how these associations scale up to drive broader ecological and evolutionary patterns and processes. For example, the contribution of microbes to host acclimatization and adaptation (see Box 1 for definitions) is an active new field of experimental research with much potential. Studies, mostly conducted in controlled laboratory settings, have evaluated the ecological costs/benefits for hosts to associate temporarily with different microbes (e.g., corals [1–4]) or to engage in obligate intimate relationships (e.g., bobtail squid with the bioluminescent bacteria Aliivibrio fischeri [5]).

Experimental studies are, however, intrinsically limited in several ways. They limit themselves to a small number of experimentally tractable hosts and microbes, and, in doing so, fail to account for the enormous complexity of interactions and variation that exist in nature between multiple hosts and their multitudes of associated microbes. Short-lived experiments (e.g., days to weeks) cannot replicate the scales of time and space involved in the potential coevolution of hosts and microbes (Box 1). Attempts to merge long-term datasets to reveal overarching patterns (e.g., [6–9]) have provided valuable insights but are shadowed by the limits and biases introduced by mixing information from different contexts or methodologies [10]. These limitations obscure general principles on the roles (mutualistic or otherwise) of host-associated microbes across host individuals, species, and communities [11–13].

Here, we demonstrate the value of moving beyond taxon-centric approaches to studying host–microbe associations in their natural evolutionary and ecological context. We suggest intensifying long-term research in well-documented “natural experiments”. Such natural experiments, including well-calibrated geological events (e.g., vicariance and creation of novel habitats accurately dated using fossil and geological data) and environmental gradients where multiple hosts and associated microbes are subjected to the same range of environmental conditions, can be particularly useful (Fig 1). These phenomena provide a unique framework for comparative studies where the processes of interest occur over spatial and evolutionary time scales that are nearly impossible to capture in laboratory experiments. The value of combining experimental and long-term field studies at natural experiments has been recognized by ecologists [14–16]. We argue that similar approaches should be applied to the study of host–microbe interactions. We highlight several natural experiments that can advance our understanding of the ecological and evolutionary mechanisms shaping host–microbe interactions (with a focus on mutualistic ones) in marine communities and ecosystems.

Box 1. Definitions of key terms

Acclimatization: The process by which an organism becomes accustomed to new environmental conditions during its lifetime.

Adaptation: A heritable trait of an organism that increases its fitness in its surrounding environment. In comparison to acclimatization, adaptations will be passed on to the next generation.

Convergent evolution: Independent origins of similar features in different organisms in response to separately experiencing similar selective pressures. Importantly, convergently originated features, also known as analogous features, were not present in the common ancestor of the taxa in question.

Genetic drift: Change in the relative frequency of genotypes due to random variation in reproduction. Such drift is more common in small populations and leads to changes in genotype frequencies independent of adaptive forces.

Host–microbe coevolution: During host–microbe coevolution, multicellular hosts and their associated microbes show a concerted and heritable response to an environmental change.

Homologous recombination: The process by which two pieces or stretches of DNA that are very similar in their sequence physically align and exchange nucleotides.

Horizontal gene transfer: The unidirectional movement of DNA, usually only small fractions of a genome, from one organism to another. Though this generally occurs more frequently within species than between, it can also occur across vast evolutionary distances.

Metagenomics: Studies of the genetic material of communities of organisms.

Phenotypic plasticity: Phenotypic plasticity is the ability of a specific genotype to produce more than one phenotype in response to a changing environment during an individual’s lifetime. These phenotypic changes may include an organism’s behavior, morphology, physiology, or other features. Phenotypic plasticity is adaptive if it increases an individual’s survival and if the ability is passed on to the next generation.

Symbioses: Symbioses are broadly defined as intimate interactions between at least two organisms where at least one of them benefits. We focus here specifically on mutually beneficial interactions (aka mutualisms) between multicellular eukaryotes and their associated microbes. These interactions may include disease resistance, predator avoidance, and nutrition. These interactions will ultimately increase host survival and fitness.

Fig 1. Examples of marine natural experiments as observatories of host–microbe interactions. Regionally focused, long-term, and taxonomically broad research programs will help fill key knowledge gaps about the nature of microbe functions and the dynamics of host–microbe interactions in changing oceans. We highlight areas of the world’s oceans where environmental gradients are well characterized, where the taxonomy and evolutionary history of the local host fauna and flora is already well established, where paleoecological studies can provide important historical context, where a long-term monitoring program is ongoing, and where there is significant research infrastructure. Long-term monitoring sites (white dots) include sites of the NSF’s LTER Network, the Smithsonian Institution’s MarineGEO network of partners, the MBON, the AIMS, and the ASSEMBLE. (1) NASA MODIS data; (2) Adapted from [93]; (3) Adapted from [73]; (4) Adapted from [74]; (5) Adapted from [94]; (6) Adapted from [95]. AIMS, Australian Institute of Marine Science; ASSEMBLE, Association of European Marine Biological Laboratories; LTER, Long-Term Ecological Research; MarineGEO, Marine Global Earth Observatory; MBON, Marine Biodiversity Observation Network.

https://doi.org/10.1371/journal.pbio.3001322.g001

Identifying important players

Marine organisms have evolved complex structural, behavioral, and chemical mechanisms to regulate the presence, abundance, and activity of their microbial associates. Hosts can limit colonization by transient opportunistic microbes that would use space and resources without providing any benefits, and some hosts can even block pathogens entirely [17–19]. Host-specific and obligate microbial associates, often called the “core microbiome” of a host population or species, are generally assumed to play more important functional roles than opportunistic and transient taxa [20]. This core microbiome is exemplified by an obligate nutritional microbial symbiosis, in which the host relies extensively on microbial partners for survival by synthesis of food, often in a nutrient-limited habitat. The host may acquire these partners horizontally (from the surrounding environment), vertically (from the parent to the offspring), or in both ways (mixed mode) [21]. Many evolved symbioses result in codependency; for example, the genomes of host-associated microbes have lost genes encoding pathways that were previously essential, such as those for motility or environmental stress responses, but that became obsolete in obligate symbiotic lifestyles [22]. In return, hosts have evolved mechanisms to maintain their associated microbes in stable intracellular environments and to support their nutritional needs [23]. Some of these nutritional associations are clearly identifiable because symbionts form massive and dense populations, sometimes only consisting of a single microbial species, in or on the bodies of their hosts. Examples include photosynthetic symbioses in cnidarians [24] and chemosynthetic symbioses in invertebrate animals such as bathymodiolin mussels, lucinid clams, Riftia tubeworms, and Astomonema nematodes [25,26]. Although widespread, host reliance on a single or few microbes for nutrition is the exception rather than the rule. The vast majority of animals and plants are instead associated with a diverse assemblage of microbes where it is challenging to differentiate between members of the core microbiome and the myriad of transient microbes and even more challenging to determine what, if any, key functional roles such microbes play.

Several approaches have been proposed to identify key microbes or functions within complex host microbiomes (reviewed in [27]). The most common practice is to identify microbial taxa that are consistently associated with a host population or species using marker gene sequencing, usually above some arbitrary prevalence threshold ([28]; but see [29,30] for alternative methods). The prevalence of a host–microbe association is typically measured without explicit attention to co-occurring and closely related host taxa, the surrounding environment, or adequacy of spatial and temporal sampling. This limited sampling and lack of context, often resulting from funding constraints, leads to several major limitations. First, a microbial taxon can be prevalent in a host population for reasons unrelated to its functional role. For example, it may originate from the host’s food or habitat, including seawater or sediment [31]. Second, even the core microbiome can change over time [32]. Functionally important microbes may fluctuate in abundance throughout host ontogeny and may also vary seasonally. Essential host-associated microbes may be overlooked if the sampling method cannot detect low abundance reliably, resulting in false negatives, or if sampling is sporadic, missing the life stage or season when particular microbes are essential. Third, many studies rely upon sequencing of rRNA genes to characterize communities, yet rRNA genes are generally too conserved to distinguish closely related taxa and reveal little directly about genomic functional potential. Clearly, understanding the functional roles of host-associated microbes requires analyses that go far beyond individual marker gene profiles and instead encompass other types of information such as whole genomes or metagenomes, transcriptomes, metabolomes, localization, biochemistry, and more. Fourth, taxon-focused studies may miss valuable information about interactions that could be gleaned from broader comparative analyses. Microbes that are specific to particular host genotypes, host species, or closely related groups of hosts, indicating a shared evolutionary history, are likely candidates for core microbes with specialized functions (e.g., gut fermenters associated with herbivores). These existing limitations could be robustly circumvented via whole-ecosystem studies where long-term collection of comprehensive genomic-level datasets (e.g., ‘omic scale information) would transform our understanding of host–microbe interactions at all levels.
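The prevalence-threshold practice described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited studies: the function name `core_taxa`, the read counts, the sample labels, and the threshold values are hypothetical. The sketch also shows the first limitation in miniature, since a taxon that passes the threshold may simply be shared with the surrounding seawater.

```python
from typing import Dict, List, Set


def core_taxa(
    samples: Dict[str, Dict[str, int]],
    min_prevalence: float = 0.9,
    min_count: int = 1,
) -> List[str]:
    """Return taxa detected in at least `min_prevalence` of the samples.

    `samples` maps a sample label to {taxon: read count}. A taxon counts as
    detected in a sample when its read count is >= `min_count`.
    """
    if not samples:
        return []
    n_samples = len(samples)
    detections: Dict[str, int] = {}
    for counts in samples.values():
        for taxon, count in counts.items():
            if count >= min_count:
                detections[taxon] = detections.get(taxon, 0) + 1
    return sorted(
        taxon for taxon, k in detections.items() if k / n_samples >= min_prevalence
    )


# Hypothetical marker-gene read counts for four individuals of one host species.
host_samples = {
    "coral_1": {"Endozoicomonas": 120, "Vibrio": 3, "Ruegeria": 0},
    "coral_2": {"Endozoicomonas": 95, "Vibrio": 0, "Ruegeria": 7},
    "coral_3": {"Endozoicomonas": 210, "Vibrio": 1, "Ruegeria": 0},
    "coral_4": {"Endozoicomonas": 64, "Vibrio": 0, "Ruegeria": 2},
}
# Hypothetical set of taxa also detected in the surrounding seawater.
seawater_taxa: Set[str] = {"Ruegeria", "Vibrio"}

# A strict threshold keeps only the taxon found in most individuals.
core = core_taxa(host_samples, min_prevalence=0.75)

# A looser threshold admits taxa that may simply come from the environment.
shared_with_seawater = [
    taxon
    for taxon in core_taxa(host_samples, min_prevalence=0.5)
    if taxon in seawater_taxa
]
```

The result depends entirely on the chosen threshold and on which samples were collected, which is the arbitrariness the text points to; a prevalence filter alone says nothing about function.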

To instigate this new approach, we recommend strategically intensifying research within a

few ocean regions. This entails collecting large scale data on host-associated microbes across

phylogenetically diverse sets of co-occurring host organisms, together with data on surround-

ing free-living microbes (i.e., in seawater and sediments) through time in areas where the sur-

rounding abiotic environment and community dynamics have been well characterized. A

regionally focused and coordinated approach will allow identifying environmental sources

and hosts that serve as reservoirs of key host-associated microbial taxa and genes. Long-term

investments in research on particular communities of hosts and microbes will also help estab-

lish links between changes in core microbiome composition, environmental factors, ecosystem

function, and resilience. Public archival of genomic data and samples (available for comple-

mentary analysis using emerging technologies) collected from a few intensively studied ocean

regions will foster transformative discoveries on dynamic host–microbe relationships. Habi-

tat-forming corals, sponges, seagrasses, and mangrove trees are important focal groups, since

breakdowns in the associations between these species and their microbiomes likely dispropor-

tionately influence other taxa and ecosystem functions. However, this should not come at the

expense of research on more inconspicuous and overlooked, yet functionally important taxa

that comprise the majority of the oceans’ biological diversity (e.g., small fish that fuel marine

food webs [33] and urchins and crustaceans that feed on algae that can displace corals [34]).

Systematic biases toward studying certain taxa (vertebrates, species with large body sizes,

charismatic fauna), partly caused by the lack of coordination, have clearly affected our under-

standing of the distribution and roles of host-associated microbes. For example, a recent

microbiome comparison of several Indo-Pacific invertebrate species demonstrated that

sponges have a less specific microbiome than had been assumed for many years [35]. Expand-

ing the taxonomic breadth of host–microbe studies will be most fruitful in areas where taxo-

nomically rigorous field guides, ecological survey data, and functional trait databases are

available. Substantial progress will also occur where phylogenetic relationships are known and

local expert taxonomists can be engaged. One of the numerous potential outcomes is

building community-wide association matrices to unveil the extent of reliance between hosts

and microbial partners (specificity versus ubiquity, obligate versus facultative) and the interac-

tions that promote the stability of core microbiomes.
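
As an illustration only, the community-wide association matrix described above could be sketched as follows. This is a minimal sketch, assuming presence/absence records of (host, microbe) pairs; the function names, the example taxa, and the use of host breadth as a crude proxy for specificity versus ubiquity are assumptions made here, not a method taken from the studies cited.

```python
def build_association_matrix(records):
    """Build a host-by-microbe presence/absence matrix.

    records: iterable of (host, microbe) detections (hypothetical input format).
    Returns (hosts, microbes, matrix) where matrix[i][j] is 1 if microbe j
    was detected in host i, else 0.
    """
    present = set(records)
    hosts = sorted({host for host, _ in present})
    microbes = sorted({microbe for _, microbe in present})
    matrix = [
        [1 if (host, microbe) in present else 0 for microbe in microbes]
        for host in hosts
    ]
    return hosts, microbes, matrix


def host_breadth(hosts, microbes, matrix):
    """Fraction of sampled hosts in which each microbe was detected.

    Values near 1 suggest a ubiquitous microbe; values near 1/len(hosts)
    suggest a host-specific one (a crude proxy, assumed for illustration).
    """
    n_hosts = len(hosts)
    return {
        microbe: sum(row[j] for row in matrix) / n_hosts
        for j, microbe in enumerate(microbes)
    }


if __name__ == "__main__":
    # Hypothetical detections across three co-occurring hosts.
    example = [
        ("coral", "taxon_A"), ("sponge", "taxon_A"), ("urchin", "taxon_A"),
        ("coral", "taxon_B"),
    ]
    hosts, microbes, matrix = build_association_matrix(example)
    print(host_breadth(hosts, microbes, matrix))
```

In practice such a matrix would be built from abundance data with replication and sampling effort taken into account; presence/absence is used here only to keep the sketch short.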

Role of microbes in host acclimatization and adaptation

Host-associated microbes can rapidly respond to extrinsic factors such as extreme or anoma-

lous environmental conditions (e.g., heatwaves, hypoxia), pathogens, anthropogenic distur-

bances (e.g., pollution, overfishing, aquaculture, invasive species), and acute and chronic

stressors [36,37]. They can also quickly change in response to factors intrinsic to the host (e.g.,

changes in host physiology [38]). The dynamic nature of microbes may provide a source of

ecological and evolutionary novelty to support potential host response mechanisms that aug-

ment the host’s own evolutionary potential. Host-associated microbial communities can shift

rapidly through the loss, gain, or replacement of individual members. Individual microbial

cells can make rapid physiological adjustments during their lifetime (plasticity) or within a few

generations (adaptation) [39] (Fig 2). In many microbes, relatively high rates of mutation and

exchange of genetic material among divergent lineages (through homologous recombination

and horizontal gene transfer) generate a high frequency of new genetic variants, some of

which may be better suited to novel conditions (Fig 2). These mechanisms contribute to fuel-

ing an immense standing pool of genetic variation that hosts can potentially draw upon. The

outcomes of the collective ecological and evolutionary response of hosts and their associated

microbes to environmental change may comprise 1 of 4 nonmutually exclusive scenarios

[40,41]: (1) Imbalance: a temporary or permanent change of host fitness and microbial func-

tions leading to increased disease susceptibility; (2) Resistance: the microbiome continues per-

forming its functions and the host does not lose or gain fitness; (3) Acclimatization: the newly

formed microbial community, in conjunction with host phenotypic plasticity, enables the

individual host to adjust and maintain performance under changing environmental conditions

(Fig 2); and (4) Adaptation: in the long term, newly formed interactions between host geno-

types and associated microbes increase the fitness of the symbiosis and they become heritable

(Fig 2).

Fig 2. Conceptual representation of the role of microbes in host acclimatization and adaptation. Microbes can frequently adapt to environmental

changes more rapidly than their host because of shorter generation times and higher standing genetic variation. Changes that occur at the levels of

individual microbes and microbiomes can rapidly generate phenotypic plasticity in a broad range of host traits (i.e., one host genotype expresses multiple

phenotypes induced by microbes). Microbially induced phenotypes may promote host adaptation if they become heritable traits. Within microbiomes,

transient microbes (thin dashed circles) have limited effects on host phenotype. On the other hand, core microbes (thick dashed circles) that engage in

prolonged relationships with hosts and potentially coevolve with hosts likely alter host phenotypes and promote host adaptation. Note that the time scale at

which evolutionary changes occur varies widely between organisms, but adaptation is generally slower than acclimatization. Plain line: nonaltered

interaction; dashed line: altered interaction; colors of microbes represent different microbial taxa.

https://doi.org/10.1371/journal.pbio.3001322.g002

The role that host-associated microbes play in their host’s response to environmental

change is also influenced by their mode of transmission (Fig 3). While vertical transmission

may help ensure the intergenerational stability of mutualistic symbioses, the dependence on

symbionts with highly simplified and inflexible genomes is a risky strategy under variable or

unpredictable stressful conditions [42,43]. Vertically transmitted symbionts have fewer

opportunities to exchange genes with the vast pool of genetic diversity available in the external

environment, which could constrain the adjustment of these associations to rapidly changing

conditions. In the marine environment, the vast majority of mutualistic symbionts are

acquired horizontally from the surrounding environment or from other hosts [44]; this

includes associations where a host is entirely dependent upon a single or a few symbionts for

nutrition (e.g., tubeworms [45]; mussels [46]). Horizontal transmission has important implica-

tions for the adaptive potential of hosts [47]. The ability to acquire microbes and genes from

the surrounding environment allows hosts to access the huge evolutionary potential contained

within the larger microbial communities. Hosts with horizontally acquired microbes could

thus be better positioned to adjust and become resilient to changing environmental condi-

tions. Selection that maintains and fine-tunes the relationship could subsequently lead to adap-

tive genetic change.

Fig 3. The role of microbes in the host’s response to environmental changes is contingent upon their predominant mode of transmission.

Microbes that are present in the marine environment represent a vast pool of standing genetic variation. The majority of marine species with horizontal

(e.g., lucinid clams and snapping shrimps) or mixed mode of symbiont acquisition (e.g., sponges) interact with a large number of microbes that they

acquire during their lifetime. The ability to draw on this large evolutionary potential by switching microbes or gaining new genes potentially allows

hosts to respond rapidly to environmental changes. At the other end of the spectrum, the few marine hosts with strictly vertically transmitted symbionts

(e.g., flatworms) have less opportunity to exchange genes to rapidly adjust the symbiosis to changing conditions.

https://doi.org/10.1371/journal.pbio.3001322.g003

Several key bottlenecks currently impede our understanding of how host-associated

microbes drive the initial response as well as long-term, evolutionary adaptation to climate

change–related disturbances in hosts with diverse microbial communities. First, changes in

microbiomes that confer adverse or beneficial outcomes for the host cannot be distinguished

from natural variability without adequate measures of host phenotypes that covary with fitness.

Unlike photosymbiotic organisms that exhibit quantifiable phenotypic responses to stress

(e.g., using a bleaching index or symbiont density), the early signs of physiological stress are

difficult to observe and measure in the vast majority of marine host–microbe associations. Sec-

ond, studies are rarely designed to disentangle causes from effects. Before–after studies corre-

late seemingly altered microbial communities with perturbations or diseases, often without

establishing causality in the relationship [48,49]. Third, most research in this field has been

conducted over temporal scales that are not suited for understanding processes of acclimatiza-

tion and adaptation that may occur over months to decades [50]. Single or multistressor

laboratory experiments conducted over days to weeks are powerful means to identify environ-

mental thresholds beyond which the host–microbiome interactions become disrupted [51].

However, how experimental results can be extrapolated to understand the response of natural

systems exposed to ambient microbes and heterogeneous stressors in their natural environ-

ment remains unclear. Fourth, the response of host–microbe mutualistic symbioses to stress-

ors is partly shaped by the environmental conditions experienced during the lifetime of the

host and by previous generations, although that information is rarely considered or available.

For example, the susceptibility of corals to future environmental changes is partly contingent

upon changes in algal symbiont composition that occurred as a result of previous exposures

to temperature anomalies (i.e., symbiont shuffling in the controversial adaptive bleaching

hypothesis [52]). Therefore, the tolerance of hosts and their host-associated microbes to envi-

ronmental change is rarely interpretable without ecological context [53]. Finally, there is a

dearth of paired host and microbial genomes in public databases. The lack of population-wide

data relating traits of interest to host and microbial genomic variation at the individual level

(i.e., genome-wide association studies) limits our understanding of how genomic innovations

contribute to host acclimatization and adaptation [54].

Bolstering our understanding of the mechanisms of host–microbe evolution requires

investing resources into long-term multidisciplinary research on diverse communities of hosts

and microbes distributed across well-characterized environmental gradients. Rigorously

designed comparative population genomic studies and field experiments (e.g., reciprocal

transplants) combined with measures of host phenotypes using methods such as in situ imag-

ing [55], immunological assays [56], gene expression [57], metabolomic profiling [58], and

behavioral assays [59] will illuminate adaptive genetic variants, how they are transferred

among microbial strains across host communities, and their impacts upon host fitness.

Repeated through time, these measures will provide unique insights into how microbiome-

mediated phenotypic plasticity may allow hosts to rapidly accommodate to novel environ-

ments or resources (e.g., microbes allow some host individuals to obtain nutrients from novel

foods) through periodic (e.g., seasonal fluctuations) and transient environmental changes (e.g.,

heat waves). For foundational, long-lived, and large colonial host species, noninvasive methods

exist for repetitive sampling of tagged individuals (e.g., for corals [60]). The focus should also

expand beyond foundation species to include small, ecologically important host organisms

and those with life history strategies that make them particularly tractable for transgenera-

tional studies. This approach will only be fruitful if integrated measures of hosts and micro-

biomes are collected over multiple generations (i.e., beyond the time scale of a typical scientific

project), where physiochemical parameters are being monitored, and where the evolutionary

history of the local host fauna and flora is already well established. Targeted comparative

research can similarly leverage natural experiments that have played out over longer time

scales. Sudden discontinuities in the distribution of many closely related populations and

species have been linked to geological vicariant effects, sharp environmental gradients, or a

combination of both [61]. Organisms on opposing sides of dispersal barriers (sometimes

impassable) follow different evolutionary trajectories under the influence of local environmen-

tal conditions [62]. These systems provide unique historical contexts in which researchers can

generate testable hypotheses about the role that host-associated microbes played in the evolu-

tion of host traits. Signatures of convergent evolution, evident at the ecosystem-wide level (i.e.,

similar patterns observed across many hosts and symbionts that have been exposed to similar

selective pressures), likely reflect fundamental principles of adaptation [63].

Examples of natural experiments

Natural experiments are past events or gradients that allow researchers to explore biological

patterns and processes on spatial and temporal scales that far exceed those possible in the labo-

ratory. Natural experiments may or may not be created or altered by humans and have been

the bread and butter of natural historians, biogeographers, and evolutionary biologists for

decades. Building on this substantial body of conceptual work, we propose that natural experi-

ments can also enlighten our understanding of the evolution and ecology of host-associated

microbes and their hosts. We present examples of natural experiments where the outcomes of

complex interactions can be observed with replication to provide insights into the processes

underlying host–microbe evolution. Our examples focus on well-characterized systems where

host evolution has already been well explored, thereby allowing “tests” that approach the rigor

of laboratory experiments. We expect that studying natural experiments like these will allow

general principles of host–microbe evolution to emerge when repeated patterns are observed

within a system or across different systems.

Biogeography

The formation of the Isthmus of Panama presents an unparalleled opportunity for exploring

the roles of biogeographic isolation and environmental change in structuring host-associated

microbes (Fig 4). In the Miocene, populations of marine organisms and their microbial symbi-

onts moved freely between the Tropical Eastern Pacific (TEP) and Caribbean in a large, unified

tropical faunal province dominated by high primary productivity and seasonal upwelling [64].

Gradually, over millions of years, this shared faunal province became severed by uplift of the

Isthmus of Panama, which finally closed approximately 2.8 Ma (million years ago) [65]. The

Caribbean became nutrient poor, causing widespread extinction and a concurrent prolifera-

tion of coral reefs and immigration of new biotas [66]. In contrast, the TEP continued to expe-

rience strong seasonal upwelling and nutrient-rich conditions. In many cases, closely related

animal hosts diverged and followed separate evolutionary trajectories, adapting to the strongly

contrasting environments on opposite sides of the Isthmus. Presumably, their associated

microbiomes did so too. Today’s Caribbean and TEP marine ecosystems of Panama and

Central America are home to hundreds of sister species that emerged through transisthmian

vicariance, representing all major taxonomic groups. Decades of research have identified phy-

logenetic relationships between hosts, as well as the behavioral, physiological, and genetic

mechanisms involved in host divergence and reproductive isolation [65]. These data place

host-associated microbes into an unrivaled ecological and evolutionary framework.

Ocean gateways that remain open today also present unique attributes suitable for natural

experiments. The narrow Strait of Bab al Mandab connects the warm and saline semi-enclosed

Red Sea with the open and more variable Arabian Sea. The Red Sea is host to many endemic

species (5% to 13% endemic across a range of taxa [67]), while the pronounced seasonal varia-

tions in the Arabian Sea have driven fine-scale local adaptations [68]. Although the Mediterra-

nean has been connected to the Atlantic through the Strait of Gibraltar since the end of the

Messinian Salinity Crisis 5.3 Ma [69], the modern Mediterranean fauna bears the more recent

imprint of Pleistocene glacial and interglacial cycles. Temperature shifts in the basin over the

last 2 to 3 million years dictated whether subtropical or higher-latitude taxa could successfully

colonize the basin from the Atlantic and drove subsequent basin-wide extinctions [70]. The historical

context of these ocean gateways and their impacts on gene flow have been explored in a myriad

of organisms ranging from plants to invertebrates, fish, and mammals.

Fig 4. Methodological approach to leveraging a natural experiment, the Isthmus of Panama, for the long-term study of host–microbe ecology and

evolution. Present-day organisms physically separated by the Isthmus of Panama are adapted to the distinct environmental conditions of the

productive TEP and the oligotrophic Caribbean. In the Gulf of Panama of the TEP, organisms experience some of the most drastic annual fluctuations

in temperature, pH, oxygen, salinity, and nutrients, due to intense seasonal upwelling. Conversely, the nearby Gulf of Chiriquí of the TEP experiences

weak to no upwelling due to trade winds being largely blocked by the Cordillera Central mountain range. Multidisciplinary and long-term research on

hosts and associated microbes across these environmental spatiotemporal gradients, where decades of taxonomic, ecological, and evolutionary research

can be leveraged, will help capture the dynamics of host–microbe interactions. TEP, Tropical Eastern Pacific.

https://doi.org/10.1371/journal.pbio.3001322.g004

Other important biogeographic regions characterized by unique environmental conditions,

long-term data collection, and good scientific infrastructure include the Great Barrier Reef

[71], the Baltic Sea [72], the Larsen B ice shelf [73], Ischia Island [74], and the French Polyne-

sian island of Moorea [75] (Fig 1). Extensive research networks such as the National Science

Foundation’s Long-Term Ecological Research (LTER) Network, the Smithsonian Institution’s

Marine Global Earth Observatory (MarineGEO) network of partners, and the Marine

Biodiversity Observation Network (MBON; Fig 1) are set to play a fundamental role in provid-

ing researchers with logistical access (field labs and sites) to these marine ecosystems and rig-

orously collected physicochemical and biological contextual data [via, for example, long-term

deployment of sondes (CTDs) and data loggers, standardized visual surveys, and other methods]

at a global scale (Fig 1). Yet these crucial long-term support networks typically overlook

host-associated microbes. They can serve as a good model going forward, or

they could be leveraged to facilitate comparative studies that map microbial variation across

communities of hosts from unique marine ecosystems to help us elucidate how host–microbe

associations adjust to changes in their environment at multiple temporal (from seasonal to

geological) and spatial scales (from local to biogeographical; Fig 4).

Emergence of volcanic islands

Novel habitats such as remote island archipelagos that formed over relatively recent geological

history also offer exceptional opportunities to study evolutionary processes in marine and ter-

restrial host-associated mutualistic microbes. Initially barren, shallow coastal areas were colo-

nized by marine organisms from neighboring areas that subsequently evolved in conditions

that are often drastically different from their native environments. Three archipelagos in particular,

Hawai’i and the Marquesas, located at the periphery of the Indo-Pacific region, and the

Galapagos in the TEP, have provided tremendous opportunities to study evolution through

comparative phylogeography (Fig 1). All three are composed of young islands (25 to 0.75 Ma,

5.5 to 0.4 Ma, and 3.2 to 0.05 Ma, respectively; reviewed in [76]) with high proportions of

endemic species (25.0% [77], 13.7% [78], and 13.6% [79] for fishes, respectively). The shallow

coastal habitats of the islands within these archipelagos were colonized sequentially by marine

species as they formed, resulting in a “progression” pattern whereby evolutionarily older line-

ages consistently occur on older islands [80]. These regions provide a unique historical context

for understanding the evolution of host-associated microbes and their roles in driving host

ecological success when new ecological opportunities emerge.

Ongoing human-induced changes

Marine communities are changing rapidly in the face of climate change and other anthropo-

genic activities [81]. The physicochemical parameters associated with the catastrophic changes

occurring over contemporary timescales are now relatively well characterized, but the effects

on most host-associated microbes are still virtually unknown [82]. Coral bleaching is a notable

exception. As host species and their associated microbes shift in distribution, they often face

novel abiotic and biotic conditions. For example, melting of ice is opening new pathways for

the movement of animals, plants, and microbes through the Arctic, from the North Pacific to

the North Atlantic, leading to one of the largest species invasions ever observed [83]. The grad-

ual increase in salinity caused by the expansion of the Panama Canal, along with predicted

increased runoff and evaporation, will likely result in greater movement of marine species

between the tropical Western Atlantic and the TEP [84] (Fig 4). Construction of the Suez

Canal in 1869 caused an influx of saline water into the Mediterranean that was followed by the

intrusion of invasive species from the subtropical Red Sea [85]. Rats introduced to islands of

the Chagos Archipelago precipitated a decline in bird density, thereby reducing the nitrogen

input on land and in the sea with downstream effects on coral reef productivity [86]. Finally,

many tropical species are expanding their distributions with the warming climate [87]. For

example, mangrove trees take advantage of the lower frequency of freezes to colonize salt

marshes [88], which allows many invertebrate and fish species to simultaneously expand their

ranges. Additional anthropogenic pressures stem from episodic or localized disasters such as

the 2010 Deepwater Horizon oil spill in the Gulf of Mexico [89], anoxic events (Bocas del Toro

[90]), sediment runoff events (Great Barrier Reef [91]), as well as water pollution and eutrophi-

cation around large urban centers such as Jakarta, Hong Kong, and Singapore [92] (Fig 1).

These anthropogenic changes provide multiple opportunities to understand how the rapid

evolutionary potential of host-associated microbes underpins adaptive evolution in hosts.

Conclusions

Understanding what changes in host-associated microbes mean for the maintenance of marine

communities and ecosystems requires measurements that go far beyond the typical life span of

a publicly funded scientific project. The integration of microbial sampling into long-term eco-

logical monitoring programs across key geographic locations will help us identify important

core and transient host-associated microbes and provide the fundamental basis for mechanis-

tic studies. Researchers should focus on the vast majority of marine animals and plants that

are able to interchange microbial partners, genes, and functions with surrounding microbial

communities. The future of marine ecosystems around the globe may in part depend upon

the ability of marine organisms to dip into the enormous pool of microbes and harness their

remarkable genetic potential.
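
To make the distinction between core and transient microbes concrete, the short sketch below classifies taxa by how consistently they are detected in repeated samples of the same host population. It is a minimal illustration under stated assumptions: the function name, the input format (one set of detected taxa per time point), and the 0.8 occupancy cutoff are hypothetical choices made here, and published definitions of a core microbiome differ in the thresholds and abundance criteria they apply.

```python
def classify_core_transient(samples, min_occupancy=0.8):
    """Label each taxon as 'core' or 'transient' from repeated sampling.

    samples: list of sets, each holding the taxa detected at one time point
             (hypothetical input format).
    min_occupancy: fraction of time points in which a taxon must be detected
                   to count as core (an assumed, adjustable cutoff).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    n_samples = len(samples)
    all_taxa = set().union(*samples)
    labels = {}
    for taxon in sorted(all_taxa):
        # Occupancy: share of time points in which the taxon was detected.
        occupancy = sum(taxon in sample for sample in samples) / n_samples
        labels[taxon] = "core" if occupancy >= min_occupancy else "transient"
    return labels


if __name__ == "__main__":
    # Hypothetical monitoring series: five time points for one host population.
    series = [
        {"taxon_A", "taxon_B"},
        {"taxon_A"},
        {"taxon_A", "taxon_C"},
        {"taxon_A", "taxon_B"},
        {"taxon_A"},
    ]
    print(classify_core_transient(series))
```

The longer the monitoring series, the more meaningful such occupancy estimates become, which is one reason the sampling must extend beyond the span of a single project.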

Acknowledgments

We thank the staff of the Smithsonian Bocas del Toro Research Station, Rachel Collin, Jennifer

McMillan, and Patricia Leiro for helping with the logistics of the #istmobiome workshop

(December 9 to 13, 2019, Bocas del Toro) during which some of these ideas were discussed.

We thank Kendall D. Clements (ORCID: 0000-0001-8512-5977), A. Murat Eren (ORCID:

0000-0001-9013-4827), Niko Leisch (ORCID: 0000-0001-7375-3749), J. Patrick Megonigal

(ORCID: 0000-0002-2018-7883), Luis C. Mejía (ORCID: 0000-0003-2135-5241), Emilia M.

Sogin (ORCID: 0000-0001-7533-3705), and Blake Ushijima (ORCID: 0000-0002-1053-5207)

for participating in the discussions. Illustrations by Natalie Renier (http://nrenier.com/),

Woods Hole Oceanographic Institution.

References

1. Chakravarti LJ, Beltran VH, van Oppen MJH. Rapid thermal adaptation in photosymbionts of reef-building

corals. Glob Chang Biol. 2017; 23:4675–4688. https://doi.org/10.1111/gcb.13702 PMID: 28447372

2. van Oppen MJH, Bongaerts P, Frade P, Peplow LM, Boyd SE, Nim HT, et al. Adaptation to reef habitats

through selection on the coral animal and its associated microbiome. Mol Ecol. 2018; 27:2956–2971.

https://doi.org/10.1111/mec.14763 PMID: 29900626

3. Rosado PM, Leite DCA, Duarte GAS, Chaloub RM, Jospin G, Nunes da Rocha U, et al. Marine probiot-

ics: increasing coral resistance to bleaching through microbiome manipulation. ISME J. 2019; 13:921–

936. https://doi.org/10.1038/s41396-018-0323-6 PMID: 30518818

4. Voolstra CR, Ziegler M. Adapting with microbial help: microbiome flexibility facilitates rapid responses

to environmental change. BioEssays. 2020; 42:2000004. https://doi.org/10.1002/bies.202000004

PMID: 32548850

5. Cohen ML, Mashanova EV, Rosen NM, Soto W. Adaptation to temperature stress by Vibrio fischeri

facilitates this microbe’s symbiosis with the Hawaiian bobtail squid (Euprymna scolopes). Evolution.

2019; 73:1885–1897. https://doi.org/10.1111/evo.13819 PMID: 31397886

6. Cornejo-Granados F, Gallardo-Becerra L, Leonardo-Reza M, Ochoa-Romo JP, Ochoa-Leyva A. A

meta-analysis reveals the environmental and host factors shaping the structure and function of the

shrimp microbiota. PeerJ. 2018; 6:e5382. https://doi.org/10.7717/peerj.5382 PMID: 30128187

7. Huggett MJ, Apprill A. Coral microbiome database: integration of sequences reveals high diversity and

relatedness of coral-associated microbes. Environ Microbiol Rep. 2019; 11:372–385. https://doi.org/10.

1111/1758-2229.12686 PMID: 30094953

8. Sullam KE, Essinger SD, Lozupone CA, O’Connor MP, Rosen GL, Knight R, et al. Environmental and

ecological factors that shape the gut bacterial communities of fish: a meta-analysis. Mol Ecol. 2012;

21:3363–3378. https://doi.org/10.1111/j.1365-294X.2012.05552.x PMID: 22486918

9. Thomas T, Moitinho-Silva L, Lurgi M, Björk JR, Easson C, Astudillo-García C, et al. Diversity, structure

and convergent evolution of the global sponge microbiome. Nat Commun. 2016; 7:11870. https://doi.

org/10.1038/ncomms11870 PMID: 27306690

10. Lozupone CA, Stombaugh J, Gonzalez A, Ackermann G, Wendel D, Vázquez-Baeza Y, et al. Meta-

analyses of studies of the human microbiota. Genome Res. 2013; 23:1704–1714. https://doi.org/10.

1101/gr.151803.112 PMID: 23861384

11. Antwis RE, Griffiths SM, Harrison XA, Aranega-Bou P, Arce A, Bettridge AS, et al. Fifty important

research questions in microbial ecology. FEMS Microbiol Ecol. 2017; 93:fix044. https://doi.org/10.1093/

femsec/fix044 PMID: 28379446

12. Cullen CM, Aneja KK, Beyhan S, Cho CE, Woloszynek S, Convertino M, et al. Emerging priorities for

microbiome research. Front Microbiol. 2020; 11:136. https://doi.org/10.3389/fmicb.2020.00136

PMID: 32140140

13. Wilkins LGE, Leray M, O’Dea A, Yuen B, Peixoto RS, Pereira TJ, et al. Host-associated microbiomes

drive structure and function of marine ecosystems. PLoS Biology. 2019; 17:e3000533. https://doi.org/

10.1371/journal.pbio.3000533 PMID: 31710600

14. Sagarin R, Pauchard A. Observational approaches in ecology open new ground in a changing world.

Front Ecol Environ. 2010; 8:379–386. https://doi.org/10.1890/090001

15. Barley SC, Meeuwig JJ. The power and the pitfalls of large-scale, unreplicated natural experiments.

Ecosystems. 2017; 20:331–339. https://doi.org/10.1007/s10021-016-0028-5

16. Hewitt JE, Thrush SF, Dayton PK, Bonsdorff E. The effect of spatial and temporal heterogeneity on the

design and analysis of empirical studies of scale-dependent systems. Am Nat. 2007; 169:398–408.

https://doi.org/10.1086/510925 PMID: 17243075

17. Douglas AE. Housing microbial symbionts: evolutionary origins and diversification of symbiotic organs

in animals. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190603. https://doi.org/10.1098/rstb.2019.

0603 PMID: 32772661

18. Foster KR, Schluter J, Coyte KZ, Rakoff-Nahoum S. The evolution of the host microbiome as an eco-

system on a leash. Nature. 2017; 548:43–51. https://doi.org/10.1038/nature23292 PMID: 28770836

19. McLaren MR, Callahan BJ. Pathogen resistance may be the principal evolutionary advantage provided

by the microbiome. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190592. https://doi.org/10.1098/

rstb.2019.0592 PMID: 32772671

20. Shade A, Handelsman J. Beyond the Venn diagram: the hunt for a core microbiome. Environ Microbiol.

2012; 14:4–12. https://doi.org/10.1111/j.1462-2920.2011.02585.x PMID: 22004523

21. Bright M, Bulgheresi S. A complex journey: transmission of microbial symbionts. Nat Rev Microbiol.

2010; 8:218–230. https://doi.org/10.1038/nrmicro2262 PMID: 20157340

22. Moran NA, McCutcheon JP, Nakabachi A. Genomics and evolution of heritable bacterial symbionts.

Annu Rev Genet. 2008; 42:165–190. https://doi.org/10.1146/annurev.genet.41.110306.130119 PMID:

18983256

23. Chomicki G, Werner GDA, West SA, Kiers ET. Compartmentalization drives the evolution of symbiotic

cooperation. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190602. https://doi.org/10.1098/rstb.

2019.0602 PMID: 32772665

24. van Oppen MJH, Medina M. Coral evolutionary responses to microbial symbioses. Philos Trans R Soc

Lond B Biol Sci. 2020; 375:20190591. https://doi.org/10.1098/rstb.2019.0591 PMID: 32772672

25. Clavijo JM, Donath A, Serôdio J, Christa G. Polymorphic adaptations in metazoans to establish and

maintain photosymbioses. Biol Rev Camb Philos Soc. 2018; 93:2006–2020. https://doi.org/10.1111/

brv.12430 PMID: 29808579

26. Dubilier N, Bergin C, Lott C. Symbiotic diversity in marine animals: the art of harnessing chemosynthe-

sis. Nat Rev Microbiol. 2008; 6:725–740. https://doi.org/10.1038/nrmicro1992 PMID: 18794911

27. Risely A. Applying the core microbiome to understand host–microbe systems. J Anim Ecol. 2020;

89:1549–1558. https://doi.org/10.1111/1365-2656.13229 PMID: 32248522

28. Astudillo-García C, Bell JJ, Webster NS, Glasl B, Jompa J, Montoya JM, et al. Evaluating the core

microbiota in complex communities: a systematic investigation. Environ Microbiol. 2017; 19:1450–

1462. https://doi.org/10.1111/1462-2920.13647 PMID: 28078754

29. Shade A, Stopnisek N. Abundance-occupancy distributions to prioritize plant core microbiome

membership. Curr Opin Microbiol. 2019; 49:50–58. https://doi.org/10.1016/j.mib.2019.09.008 PMID:

31715441

30. Clever F, Sourisse JM, Preziosi RF, Eisen JA, Rodriguez Guerra EC, Scott JJ, et al. The gut micro-

biome stability of a butterflyfish is disrupted on severely degraded Caribbean coral reefs. bioRxiv. 2020;

https://doi.org/10.1101/2020.09.21.306712

31. Zhang C, Derrien M, Levenez F, Brazeilles R, Ballal SA, Kim J, et al. Ecological robustness of the gut

microbiota in response to ingestion of transient food-borne microbes. ISME J. 2016; 10:2235–2245.

https://doi.org/10.1038/ismej.2016.13 PMID: 26953599

32. Sharp KH, Pratte ZA, Kerwin AH, Rotjan RD, Stewart FJ. Season, but not symbiont state, drives micro-

biome structure in the temperate coral Astrangia poculata. Microbiome. 2017; 5:120. https://doi.org/10.

1186/s40168-017-0329-8 PMID: 28915923

33. Brandl SJ, Tornabene L, Goatley CHR, Casey JM, Morais RA, Côté IM, et al. Demographic dynamics of

the smallest marine vertebrates fuel coral reef ecosystem functioning. Science. 2019; 364:1189–1192.

https://doi.org/10.1126/science.aav3384 PMID: 31123105

34. Kuempel CD, Altieri AH. The emergent role of small-bodied herbivores in pre-empting phase shifts on

degraded coral reefs. Sci Rep. 2017; 7:39670. https://doi.org/10.1038/srep39670 PMID: 28054550

35. Cleary DFR, Swierts T, Coelho FJRC, Polónia ARM, Huang YM, Ferreira MRS, et al. The sponge

microbiome within the greater coral reef microbial metacommunity. Nat Commun. 2019; 10:1–12.

36. Brothers CJ, Van Der Pol WJ, Morrow CD, Hakim JA, Koo H, McClintock JB. Ocean warming alters pre-

dicted microbiome functionality in a common sea urchin. Proc R Soc B. 2018; 285:20180340. https://

doi.org/10.1098/rspb.2018.0340 PMID: 29925614

37. Cavalcanti GS, Shukla P, Morris M, Ribeiro B, Foley M, Doane MP, et al. Rhodoliths holobionts in a

changing ocean: host-microbes interactions mediate coralline algae resilience under ocean acidifica-

tion. BMC Genomics. 2018; 19:701. https://doi.org/10.1186/s12864-018-5064-4 PMID: 30249182

38. Alverdy JC, Luo JN. The influence of host stress on the mechanism of infection: lost microbiomes,

emergent pathobiomes, and the role of interkingdom signaling. Front Microbiol. 2017; 8:322. https://doi.

org/10.3389/fmicb.2017.00322 PMID: 28303126

39. Brooks AN, Turkarslan S, Beer KD, Lo FY, Baliga NS. Adaptation of cells to new environments. Wiley

Interdiscip Rev Syst Biol Med. 2011; 3:544–561. https://doi.org/10.1002/wsbm.136 PMID: 21197660

40. Pita L, Rix L, Slaby BM, Franke A, Hentschel U. The sponge holobiont in a changing ocean: from

microbes to ecosystems. Microbiome. 2018; 6:46. https://doi.org/10.1186/s40168-018-0428-1 PMID:

29523192

41. Apprill A. The role of symbioses in the adaptation and stress responses of marine organisms. Ann Rev

Mar Sci. 2020; 12:291–314. https://doi.org/10.1146/annurev-marine-010419-010641 PMID: 31283425

42. Kikuchi Y, Tada A, Musolin DL, Hari N, Hosokawa T, Fujisaki K, et al. Collapse of insect gut symbiosis

under simulated climate change. mBio. 2016; 7:e01578–16. https://doi.org/10.1128/mBio.01578-16

PMID: 27703075

43. Zhang B, Leonard SP, Li Y, Moran NA. Obligate bacterial endosymbionts limit thermal tolerance of

insect host species. Proc Natl Acad Sci USA. 2019; 116:24712–24718. https://doi.org/10.1073/pnas.

1915307116 PMID: 31740601

44. Russell SL. Transmission mode is associated with environment type and taxa across bacteria-eukary-

ote symbioses: a systematic review and meta-analysis. FEMS Microbiol Lett. 2019; 366:fnz013. https://

doi.org/10.1093/femsle/fnz013 PMID: 30649338

45. Nussbaumer AD, Fisher CR, Bright M. Horizontal endosymbiont transmission in hydrothermal vent

tubeworms. Nature. 2006; 441:345–348. https://doi.org/10.1038/nature04793 PMID: 16710420

46. Salerno JL, Macko SA, Hallam SJ, Bright M, Won Y-J, McKiness Z, et al. Characterization of symbiont

populations in life-history stages of mussels from chemosynthetic environments. Biol Bull. 2005;

208:145–155. https://doi.org/10.2307/3593123 PMID: 15837964

47. Eberhard WG. Evolution in bacterial plasmids and levels of selection. Q Rev Biol. 1990; 65:3–22.

https://doi.org/10.1086/416582 PMID: 2186429

48. Hooks KB, O’Malley MA. Dysbiosis and its discontents. mBio. 2017; 8:e01492–17str. https://doi.org/10.

1128/mBio.01492-17 PMID: 29018121

49. Relman DA. Thinking about the microbiome as a causal factor in human health and disease: philosophi-

cal and experimental considerations. Curr Opin Microbiol. 2020; 54:119–126. https://doi.org/10.1016/j.

mib.2020.01.018 PMID: 32114367

50. Bénard A, Vavre F, Kremer N. Stress & symbiosis: heads or tails? Front Ecol Evol. 2020; 8:167. https://

doi.org/10.3389/fevo.2020.00167

51. Maher RL, Rice MM, McMinds R, Burkepile DE, Vega Thurber R. Multiple stressors interact primarily

through antagonism to drive changes in the coral microbiome. Sci Rep. 2019; 9:6834. https://doi.org/

10.1038/s41598-019-43274-8 PMID: 31048787

PLOS BIOLOGY

PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 August 19, 2021 15 / 18

52. Baker AC. Reef corals bleach to survive change. Nature. 2001; 411:765–766. https://doi.org/10.1038/

35081151 PMID: 11459046

53. Roach TNF, Dilworth J, H CM, Jones AD, Quinn RA, Drury C. Metabolomic signatures of coral bleach-

ing history. Nat Ecol Evol. 2021;1–9.

54. Awany D, Allali I, Dalvie S, Hemmings S, Mwaikono KS, Thomford NE, et al. Host and microbiome

genome-wide association studies: current state and challenges. Front Genet. 2019; 9:637. https://doi.

org/10.3389/fgene.2018.00637 PMID: 30723493

55. Geier B, Sogin EM, Michellod D, Janda M, Kompauer M, Spengler B, et al. Spatial metabolomics of in

situ host–microbe interactions at the micrometre scale. Nat Microbiol. 2020; 5:498–510. https://doi.org/

10.1038/s41564-019-0664-6 PMID: 32015496

56. Lozupone CA. Unraveling interactions between the microbiome and the host immune system to deci-

pher mechanisms of disease. mSystems. 2018; 3:e00183–17. https://doi.org/10.1128/mSystems.

00183-17 PMID: 29556546

57. Strader ME, Wong JM, Hofmann GE. Ocean acidification promotes broad transcriptomic responses in

marine metazoans: a literature survey. Front Zool. 2020; 17:7. https://doi.org/10.1186/s12983-020-

0350-9 PMID: 32095155

58. Galtier d’Auriac I, Quinn RA, Maughan H, Nothias L-F, Little M, Kapono CA, et al. Before platelets: the

production of platelet-activating factor during growth and stress in a basal marine organism. Proc R Soc

B. 2018; 285:20181307. https://doi.org/10.1098/rspb.2018.1307 PMID: 30111600

59. Vuong HE, Yano JM, Fung TC, Hsiao EY. The microbiome and host behavior. Annu Rev Neurosci.

2017; 40:21–49. https://doi.org/10.1146/annurev-neuro-072116-031347 PMID: 28301775

60. Greene A, Leggat W, Donahue MJ, Raymundo LJ, Caldwell JM, Moriarty T, et al. Complementary sam-

pling methods for coral histology, metabolomics and microbiome. Methods Ecol Evol. 2020; 11:1012–

1020. https://doi.org/10.1111/2041-210X.13431

61. Bowen BW, Gaither MR, DiBattista JD, Iacchei M, Andrews KR, Grant WS, et al. Comparative phylo-

geography of the ocean planet. Proc Natl Acad Sci USA. 2016; 113:7962–7969. https://doi.org/10.

1073/pnas.1602404113 PMID: 27432963

62. Lessios HA. The Great American schism: divergence of marine organisms after the rise of the Central

American Isthmus. Annu Rev Ecol Evol Syst. 2008; 39:63–91. https://doi.org/10.1146/annurev.ecolsys.

38.091206.095815

63. Sheppard SK, Guttman DS, Fitzgerald JR. Population genomics of bacterial host adaptation. Nat Rev

Genet. 2018; 19:549–565. https://doi.org/10.1038/s41576-018-0032-z PMID: 29973680

64. O’Dea A, Jackson JBC, Fortunato H, Smith JT, D’Croz L, Johnson KG, et al. Environmental change pre-

ceded Caribbean extinction by 2 million years. Proc Natl Acad Sci USA. 2007; 104:5501–5506. https://

doi.org/10.1073/pnas.0610947104 PMID: 17369359

65. O’Dea A, Lessios HA, Coates AG, Eytan RI, Restrepo-Moreno SA, Cione AL, et al. Formation of the

Isthmus of Panama. Sci Adv. 2016; 2:e1600883. https://doi.org/10.1126/sciadv.1600883 PMID:

27540590

66. O’Dea A, Jackson J. Environmental change drove macroevolution in cupuladriid bryozoans. Proc R

Soc B. 2009; 276:3629–3634. https://doi.org/10.1098/rspb.2009.0844 PMID: 19640882

67. DiBattista JD, Roberts MB, Bouwmeester J, Bowen BW, Coker DJ, Lozano-Cortés DF, et al. A review

of contemporary patterns of endemism for shallow water reef fauna in the Red Sea. J Biogeogr. 2016;

43:423–439. https://doi.org/10.1111/jbi.12649

68. DiBattista JD, Saenz-Agudelo P, Piatek MJ, Cagua EF, Bowen BW, Choat JH, et al. Population geno-

mic response to geographic gradients by widespread and endemic fishes of the Arabian Peninsula.

Ecol Evol. 2020; 10:4314–4330. https://doi.org/10.1002/ece3.6199 PMID: 32489599

69. Garcia-Castellanos D, Estrada F, Jiménez-Munt I, Gorini C, Fernàndez M, Vergés J, et al. Catastrophic flood of the Mediterranean after the Messinian salinity crisis. Nature. 2009; 462:778–781. https://doi.

org/10.1038/nature08555 PMID: 20010684

70. Patarnello T, Volckaert F a. MJ, Castilho R. Pillars of Hercules: is the Atlantic–Mediterranean transition

a phylogeographical break? Mol Ecol. 2007; 16:4426–4444. https://doi.org/10.1111/j.1365-294X.2007.

03477.x PMID: 17908222

71. De’ath G, Fabricius KE, Sweatman H, Puotinen M. The 27-year decline of coral cover on the Great Bar-

rier Reef and its causes. Proc Natl Acad Sci USA. 2012; 109:17995–17999. https://doi.org/10.1073/

pnas.1208909109 PMID: 23027961

72. Herlemann DP, Labrenz M, Jürgens K, Bertilsson S, Waniek JJ, Andersson AF. Transitions in bacterial

communities along the 2000 km salinity gradient of the Baltic Sea. ISME J. 2011; 5:1571–1579. https://

doi.org/10.1038/ismej.2011.41 PMID: 21472016

PLOS BIOLOGY

PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 August 19, 2021 16 / 18

73. Convey P, Chown SL, Clarke A, Barnes DKA, Bokhorst S, Cummings V, et al. The spatial structure of

Antarctic biodiversity. Ecol Monogr. 2014; 84:203–244. https://doi.org/10.1890/12-2216.1

74. Hall-Spencer JM, Rodolfo-Metalpa R, Martin S, Ransome E, Fine M, Turner SM, et al. Volcanic carbon

dioxide vents show ecosystem effects of ocean acidification. Nature. 2008; 454:96–99. https://doi.org/

10.1038/nature07051 PMID: 18536730

75. McCliment EA, Nelson CE, Carlson CA, Alldredge AL, Witting J, Amaral-Zettler LA. An all-taxon micro-

bial inventory of the Moorea coral reef ecosystem. ISME J. 2012; 6:309–319. https://doi.org/10.1038/

ismej.2011.108 PMID: 21900967

76. Neall VE, Trewick SA. The age and origin of the Pacific islands: a geological overview. Philos Trans R

Soc Lond B Biol Sci. 2008; 363:3293–3308. https://doi.org/10.1098/rstb.2008.0119 PMID: 18768382

77. Randall JE. Reef and Shore Fishes of the Hawaiian Islands. Sea Grant College Program, University of

Having Trouble Meeting Your Deadline?

Get your assignment on critical review completed on time. avoid delay and – ORDER NOW

Hawai‘i; 2007.

78. Delrieu-Trottin E, Williams JT, Bacchet P, Kulbicki M, Mourier J, Galzin R, et al. Shore fishes of the Mar-

quesas Islands, an updated checklist with new records and new percentage of endemic species. Check

List. 2015; 11:1758. https://doi.org/10.15560/11.5.1758

79. McCosker JE, Rosenblatt RH. The fishes of the Galápagos Archipelago: an update. Proc Calif Acad

Sci. 2010; 61:167–195.

80. Shaw KL, Gillespie RG. Comparative phylogeography of oceanic archipelagos: hotspots for inferences

of evolutionary process. Proc Natl Acad Sci USA. 2016; 113:7986–7993. https://doi.org/10.1073/pnas.

1601078113 PMID: 27432948

81. Duarte CM, Agusti S, Barbier E, Britten GL, Castilla JC, Gattuso J-P, et al. Rebuilding marine life.

Nature. 2020; 580:39–51. https://doi.org/10.1038/s41586-020-2146-7 PMID: 32238939

82. Cavicchioli R, Ripple WJ, Timmis KN, Azam F, Bakken LR, Baylis M, et al. Scientists’ warning to

humanity: microorganisms and climate change. Nat Rev Microbiol. 2019; 17:569–586. https://doi.org/

10.1038/s41579-019-0222-5 PMID: 31213707

83. VanWormer E, Mazet J a. K, Hall A, Gill VA, Boveng PL, London JM, et al. Viral emergence in marine

mammals in the North Pacific may be linked to Arctic sea ice reduction. Sci Rep. 2019; 9:15569. https://

doi.org/10.1038/s41598-019-51699-4 PMID: 31700005

84. Salgado J, Vélez MI, González-Arango C, Rose NL, Yang H, Huguet C, et al. A century of limnologi-

cal evolution and interactive threats in the Panama Canal: long-term assessments from a shallow

basin. Sci Total Environ. 2020; 729:138444. https://doi.org/10.1016/j.scitotenv.2020.138444 PMID:

32380321

85. Albano PG, Steger J, Bošnjak M, Dunne B, Guifarro Z, Turapova E, et al. Native biodiversity collapse in

the eastern Mediterranean. Proc R Soc B. 2021; 288:20202469. https://doi.org/10.1098/rspb.2020.

2469 PMID: 33402072

86. Graham NAJ, Wilson SK, Carr P, Hoey AS, Jennings S, MacNeil MA. Seabirds enhance coral reef pro-

ductivity and functioning in the absence of invasive rats. Nature. 2018; 559:250–253. https://doi.org/10.

1038/s41586-018-0202-3 PMID: 29995864

87. Wernberg T, Bennett S, Babcock RC, de Bettignies T, Cure K, Depczynski M, et al. Climate-driven

regime shift of a temperate marine ecosystem. Science. 2016; 353:169–172. https://doi.org/10.1126/

science.aad8745 PMID: 27387951

88. Saintilan N, Wilson NC, Rogers K, Rajkaran A, Krauss KW. Mangrove expansion and salt marsh decline

at mangrove poleward limits. Glob Chang Biol. 2014; 20:147–157. https://doi.org/10.1111/gcb.12341

PMID: 23907934

89. Beyer J, Trannum HC, Bakke T, Hodson PV, Collier TK. Environmental effects of the Deepwater Hori-

zon oil spill: a review. Mar Pollut Bull. 2016; 110:28–51. https://doi.org/10.1016/j.marpolbul.2016.06.

027 PMID: 27301686

90. Altieri AH, Harrison SB, Seemann J, Collin R, Diaz RJ, Knowlton N. Tropical dead zones and mass mor-

talities on coral reefs. Proc Natl Acad Sci USA. 2017; 114:3660–3665. https://doi.org/10.1073/pnas.

1621517114 PMID: 28320966

91. MacNeil MA, Mellin C, Matthews S, Wolff NH, McClanahan TR, Devlin M, et al. Water quality mediates

resilience on the Great Barrier Reef. Nat Ecol Evol. 2019; 3:620–627. https://doi.org/10.1038/s41559-

019-0832-3 PMID: 30858590

92. Heery EC, Hoeksema BW, Browne NK, Reimer JD, Ang PO, Huang D, et al. Urban coral reefs: degra-

dation and resilience of hard coral assemblages in coastal cities of East and Southeast Asia. Mar Pollut

Bull. 2018; 135:654–681. https://doi.org/10.1016/j.marpolbul.2018.07.041 PMID: 30301085

93. Robertson DR, Christy JH, Collin R, Cooke RG, D’Croz L, Kaufmann KW, et al. The Smithsonian Tropi-

cal Research Institute: marine research, education, and conversation in Panama. Smithson Contrib

Mar Sci. 2009;73–93.

PLOS BIOLOGY

PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 August 19, 2021 17 / 18

94. Berumen ML, Voolstra CR, Daffonchio D, Agusti S, Aranda M, Irigoien X, et al. The Red Sea: environ-

mental gradients shape a natural laboratory in a nascent Ocean. In: Voolstra CR, Berumen ML, editors.

Coral Reefs of the Red Sea. Cham: Springer International Publishing; 2019. pp. 1–10.

95. Archana A, Thibodeau B, Geeraert N, Xu MN, Kao S-J, Baker DM. Nitrogen sources and cycling

revealed by dual isotopes of nitrate in a complex urbanized environment. Water Res. 2018; 142:459–

470. https://doi.org/10.1016/j.watres.2018.06.004 PMID: 29913387

PLOS BIOLOGY

PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 August 19, 2021 18 / 18

Copyright of PLoS Biology is the property of Public Library of Science and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.


Resource

Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions

Highlights

- Glycan-focused language models can be used for sequence-to-function models
- Information in glycans predicts immunogenicity, pathogenicity, and taxonomic origin
- Glycan alignments shed light into bacterial virulence

Bojar et al., 2021, Cell Host & Microbe 29, 132–144
January 13, 2021 © 2020 The Author(s). Published by Elsevier Inc.
https://doi.org/10.1016/j.chom.2020.10.004

Authors

Daniel Bojar, Rani K. Powers, Diogo M. Camacho, James J. Collins

Correspondence

Diogo M. Camacho (D.M.C.), James J. Collins (J.J.C.)

In Brief

Bojar et al. present a workflow that combines machine learning and bioinformatics techniques to analyze the prominent role of glycans in host-microbe interactions. The herein developed glycan-focused language models and alignments allow for the prediction and analysis of glycan immunogenicity, association with pathogenicity, and taxonomic classification.


Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions

Daniel Bojar (1,2), Rani K. Powers (1,2), Diogo M. Camacho (1,4,*), and James J. Collins (1,2,3,4,5,*)

(1) Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
(2) Department of Biological Engineering and Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(3) Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
(4) These authors contributed equally
(5) Lead Contact
(*) Correspondence: D.M.C. and J.J.C.

https://doi.org/10.1016/j.chom.2020.10.004

SUMMARY

Glycans, the most diverse biopolymer, are shaped by evolutionary pressures stemming from host-microbe interactions. Here, we present machine learning and bioinformatics methods to leverage the evolutionary information present in glycans to gain insights into how pathogens and commensals interact with hosts. By using techniques from natural language processing, we develop deep-learning models for glycans that are trained on a curated dataset of 19,299 unique glycans and can be used to study and predict glycan functions. We show that these models can be utilized to predict glycan immunogenicity and the pathogenicity of bacterial strains, as well as investigate glycan-mediated immune evasion via molecular mimicry. We also develop glycan-alignment methods and use these to analyze virulence-determining glycan motifs in the capsular polysaccharides of bacterial pathogens. These resources enable one to identify and study glycan motifs involved in immunogenicity, pathogenicity, molecular mimicry, and immune evasion, expanding our understanding of host-microbe interactions.

INTRODUCTION

In contrast to RNA and proteins, whose sequences can be elucidated from their associated DNA sequence, glycans are the only biopolymer outside the rules of the central dogma of molecular biology. Although glycans are synthesized by DNA-encoded enzymes (Lairson et al., 2008), an individual glycan sequence is dependent on the interplay between multiple enzymes and cellular conditions. Additionally, the expansive glycan alphabet of hundreds of different monosaccharides allows for a large number of potential oligosaccharides, built with different monosaccharides, lengths, connectivity, and branching. Glycans are present as modifications on all other biopolymers (Varki, 2017), exerting varying effects on biomolecules, including stabilization and modulation of their functionality (Dekkers et al., 2017; Solá and Griebenow, 2009). Apart from influencing the function of individual proteins, glycans are also crucial for cell-cell contact in the case of glycan-glycan interactions during the attachment of pathogenic bacteria to host cells (Day et al., 2015), and they mediate essential developmental processes such as nervous system development (Haltiwanger and Lowe, 2004). Recently, Lauc et al. hypothesized that the plethora of available glycoforms and their plasticity facilitated the evolution of complex multicellular lifeforms (Lauc et al., 2014), reasoning that is supported by the essential roles of glycans in developmental processes and cell-cell communication and emphasizes the evolutionary information in glycans.

Because glycans make up the outermost layer of both eukaryotic and prokaryotic cells, cross-kingdom interactions will necessarily involve these molecules (Day et al., 2015). The prominent role of glycans in host-pathogen interactions (Varki, 2017) has resulted in evolutionary pressures and opportunities on both sides of the interaction: natural selection can modify host glycan receptors used by pathogens without losing their functionalities, whereas pathogens and commensals need to alter their glycans to evade the host immune system. These interactions provide a window into understanding glycan-mediated host-microbe relationships. Glycans display great phenotypic variability: sequences can be changed depending on environmental conditions, such as the level of extracellular metabolites (Park et al., 2017), without the need for genetic mutations, potentially facilitating rapid responses to changes in host-microbe relationships.

Given the aforementioned glycan-mediated host-microbe interactions, glycans could provide insights into pathogenicity and commensalism determinants, as, for instance, molecular mimicry of host glycans by both pathogens and commensals facilitates their immune evasion (Carlin et al., 2009; Varki and Gagneux, 2015). Additional therapeutic potential is enabled by the widespread usage of glycans by viruses for cell adhesion and entry (Thompson et al., 2019) and pathogenic bacteria (Poole et al., 2018).

© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

[Figure 1 graphic not reproduced. Panel A labels: 12,674 species-specific glycans (6,969 eukaryotic, 6,119 prokaryotic, 152 viral); 19,299 unique glycans (1,027 glycoletters, 19,866 glycowords, 9,152 glycans with at least one label). Panel B: number of glycans per taxon at the domain, kingdom, phylum, class, order, family, genus, and species levels. Panels C and D: monosaccharides paired with Fuc (branching occurrences by position, main versus side); bonds made by NeuNAc; bonds made by NeuNGc; monosaccharides with bond α2-3.]

Figure 1. Using a Curated Glycan Dataset as a Resource for Glycobiology and Analyzing Host-Microbe Interactions
(A) Building curated datasets of species-specific and unique glycan sequences. Glycans stemming from proteins, lipids, small molecules, or cellular surfaces were gathered from UniCarbKB, CSDB, GlyTouCan, and the academic literature. We deposited these datasets in our database SugarBase, containing additional associated metadata, such as linkage and immunogenicity information.

In addition to previous work developing computational approaches to glycan analysis (McDonald et al., 2016; Spahn et al., 2016), identifying relevant glycan motifs and their roles in host-microbe interactions at scale would benefit from pattern-learning algorithms, such as machine learning, that can uncover statistical dependencies in biological sequences (Camacho et al., 2018). Research on other biopolymers has shown that language models, originally developed for the analysis of human languages, perform best in this task (Alley et al., 2019; Almagro Armenteros et al., 2020; Strodthoff et al., 2020), because they can leverage evolutionarily conserved regularities and language-like properties in such sequences. Language models, with their memory-like features, are well suited for leveraging patterns and implicit structure in biopolymers such as those underlying nucleic acids (Valeri et al., 2020) and proteins (Alley et al., 2019), because information in these sequences is order dependent, and non-neighboring residues can have meaningful interactions. Applying a natural language-processing approach to biological sequences also enables learning a representation of a molecule that can be used to analyze sequence motifs and predict functional properties. These types of models are therefore a suitable starting point for the analysis of glycan sequences.
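As an illustration of what "order dependent" means for a sequence language model, the following is a minimal sketch of a count-based bigram model with add-one smoothing. It is not the model described in this article; it is a deliberately simple stand-in. The function names (`train_bigram`, `log_prob`) and the toy token sequences are assumptions made up for the example.

```python
from collections import Counter, defaultdict
import math


def train_bigram(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        tokens = ["<s>"] + list(seq) + ["</s>"]  # start/end markers
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def log_prob(seq, counts, vocab):
    """Log-probability of a token sequence under add-one smoothing."""
    tokens = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        numer = counts[prev][nxt] + 1
        denom = sum(counts[prev].values()) + len(vocab)
        total += math.log(numer / denom)
    return total


# Hypothetical toy corpus: alternating monosaccharide and bond tokens.
corpus = [
    ["Gal", "b1-4", "GlcNAc"],
    ["Gal", "b1-4", "Glc"],
    ["Fuc", "a1-2", "Gal", "b1-4", "GlcNAc"],
]
counts, vocab = train_bigram(corpus)

# Same tokens, different order: the model scores them differently.
seen_order = log_prob(["Gal", "b1-4", "GlcNAc"], counts, vocab)
shuffled = log_prob(["GlcNAc", "b1-4", "Gal"], counts, vocab)
print(seen_order > shuffled)
```

A bigram model only sees adjacent tokens, so it cannot capture interactions between non-neighboring residues; models with memory, such as recurrent networks, are the usual way to address that limitation.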

Here, we present a resource toolkit comprising machine learning and bioinformatics methods as well as a large glycan database to leverage the evolutionary information present in glycans for predictive purposes in the context of host-microbe interactions, e.g., by understanding pathogenicity-associated glycan motifs. This toolkit can be used as a complete workflow for investigating host-microbe interactions, from a glycan dataset to glycan motifs identified by machine learning and further investigated by glycan alignments, or as separate modules. Underlying all of this is our language model for glycans, SweetTalk, trained on a dataset of 19,299 unique glycan sequences. With this, we demonstrate that similarities between glycans can be visualized and used to predict glycan properties such as human immunogenicity. Another part of our platform is SweetOrigins, a language-model-based classifier predicting the taxonomic origin of glycans that we use to obtain evolution-informed representations of glycans. To achieve this in the context of glycan-mediated host-microbe interactions, we manually curated a comprehensive dataset comprising 12,674 glycans with species annotations. These datasets were combined into a database, SugarBase, that is amenable to programmatic access and integration into deep-learning pipelines, thus providing resources for analyses involving host-microbe interactions.

In this work, we demonstrate the potential and generalizability of using SugarBase, SweetTalk, SweetOrigins, and a glycan-alignment methodology for studying glycan-mediated host-microbe interactions. We show that a language-model-based classifier trained on glycan sequences can accurately predict glycan immunogenicity and the pathogenicity of E. coli strains, revealing predictive glycan motifs. We also leverage the evolutionary information gained by SweetOrigins to analyze glycan motifs that could be used for molecular-mimicry-mediated immune evasion by commensals and pathogens. Applying our glycan-alignment methodology to the example of the capsular polysaccharides of Staphylococcus aureus and Acinetobacter baumannii, we uncover a potential connection to the enterobacterial common antigen and hypothesize a mechanism for the increased virulence mediated by these glycan motifs. Taken together, these resources offer a powerful and generalizable platform for studying and understanding the role of glycans in host-microbe interactions.

Figure 1 (legend continued)
(B) Glycan species distribution in the species-specific glycan dataset. For all glycans with species information, up to the 10 most abundant classes for each taxonomic level are shown with their number of glycans.
(C and D) Analyzing the local structural context of glycoletters. We identified the most frequent monosaccharides following fucose in glycans (C), highlighting its local structural context together with its likely position in the glycan structure (main versus side branch). Additionally, we compared the binding behavior of several sialic acids (D).

RESULTS

Curating Glycan Datasets for Glycobiology and Glycan-Mediated Host-Microbe Interactions

To investigate the role of glycans in host-microbe interactions, we constructed a dataset of species-specific glycan sequences that could be used to train machine-learning models. For this, we gathered and curated a dataset with glycans from GlyTouCan (Tiemeyer et al., 2017), UniCarbKB (Campbell et al., 2014), the Carbohydrate Structure Database (CSDB) (Toukach and Egorova, 2016), and targeted literature searches (see STAR Methods). To facilitate training deep-learning models on glycan sequences, we only included glycans with fully elucidated sequences, including the determination of linkages between monosaccharides. Our dataset contained 12,674 highly diverse glycans with a deposited species association (Figure 1A; Table S1) and included glycans from 1,726 species (corresponding to 39 taxonomic phyla; Figure 1B). Specifically, our dataset contained 6,969 eukaryotic, 6,119 prokaryotic, and 152 viral glycans. Because we included all species for which we could find glycans, this dataset constituted a comprehensive snapshot of currently known species-specific glycans, with glycans from numerous bacteria, facilitating the study of glycan-mediated host-microbe interactions.
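The curation rule described above (keep only glycans with fully determined linkages, then tally by domain) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the record fields (`sequence`, `domain`), the use of `?` to mark an undetermined linkage, and the example records are all assumptions made for the example.

```python
from collections import Counter


def is_fully_elucidated(sequence: str) -> bool:
    """Assumed convention: '?' marks an undetermined linkage or position."""
    return "?" not in sequence


def curate(records):
    """Keep fully elucidated glycans; tally unique (glycan, domain) pairs."""
    kept = [r for r in records if is_fully_elucidated(r["sequence"])]
    unique_glycans = {r["sequence"] for r in kept}
    pairs = {(r["sequence"], r["domain"]) for r in kept}
    per_domain = Counter(domain for _, domain in pairs)
    return unique_glycans, per_domain


# Hypothetical records for illustration only.
records = [
    {"sequence": "Gal(b1-4)GlcNAc", "domain": "Eukarya"},
    {"sequence": "Gal(b1-4)GlcNAc", "domain": "Bacteria"},
    {"sequence": "Gal(b1-?)GlcNAc", "domain": "Eukarya"},  # dropped: unknown linkage
    {"sequence": "Man(a1-3)Man", "domain": "Eukarya"},
]
unique_glycans, per_domain = curate(records)
print(len(unique_glycans), dict(per_domain))
```

In this toy data the per-domain counts sum to 3 although there are only 2 unique glycans, because one glycan is recorded in two domains. The per-domain counts reported above sum to 13,240 (6,969 + 6,119 + 152), which exceeds 12,674; that would be consistent with some glycans being recorded in more than one domain, although this reading is an inference and is not stated in the text.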

We further reasoned that the inclusion of glycan sequences

without a deposited species label would strengthen the lan-

guage models we describe below. This approach is supported

by the success of transfer learning in the field of machine learning

(Howard and Ruder, 2018), in which models are initially trained

on large datasets without labels and then finetuned on smaller

datasets with labels. This makes more data available to learn

general patterns, such as sequence motifs, that can be lever-

aged to predict glycan properties. Accordingly, we curated a

separate dataset in which we used the databases mentioned

above to gather 19,299 unique glycan sequences, irrespective

of whether species information was available (Figure 1A;

STAR Methods; Table S2). To gain a comprehensive view of

glycobiology, we included all glycan categories, encompassing

cans with species information, up to the 10 most abundant classes for each

ost frequent monosaccharides following fucose in glycans (C), highlighting its

versus side branch). Additionally, we compared the binding behavior of several

[Figure 2 graphic (panels A–G) omitted. Recoverable labels: (A) SweetTalk schematic: data processing (GlyTouCan, literature), datasets (19,299 glycans), input featurization, embedding, forward and reverse LSTM layers, language model (glycoletters), language model output, classifiers (glycowords). (B) Glycoletter embeddings, t-SNE Dim 1 versus t-SNE Dim 2, with regions labeled Fuc, Gal, GalNAc, GlcNAc, Glc, Man, Neu, and bonds. (C) Glycowords with existing alphabet: number of glycowords (10^0 to 10^12), possible versus realized. (D) Glycowords, UMAP Dim 1 versus UMAP Dim 2, possible versus realized. (E) UMAP Dim 1 versus UMAP Dim 2, immunogenic versus non-immunogenic, with regions labeled High Mannose, High Rha, N-Glycans, and O-Glycans. (F) Number of unmasked glycowords versus probability immunogenic, masked from the non-reducing or reducing end; species labels Homo sapiens and Ruminococcus gnavus. (G) Probability immunogenic for altered glycans and the wild type (WT). Monosaccharide legend: Fuc, Gal, GalNAc, GlcNAc, ManNAc, Glc, Man, Rha, NeuNAc, NeuNGc, Xyl.]

Cell Host & Microbe 29, 132–144, January 13, 2021 135


protein-, lipid-, and small molecule-associated glycans, as well

as capsular and extracellular polysaccharides.

In our dataset, we observed 1,027 unique monosaccharides or

bonds that were present in glycan sequences and comprised the

smallest units of an alphabet for a glycan language. Analogous to

natural language processing, we termed these entities ‘‘glycolet-

ters’’ and constructed ‘‘glycowords’’ by considering trisaccha-

rides (i.e., three monosaccharides and two connecting bonds,

or five glycoletters), yielding 19,866 unique glycowords in our da-

taset. With this, we sought to incorporate local structural infor-

mation into our models and enable the discovery of relevant mo-

tifs, which usually contain subsequences larger than a single

monosaccharide. Even larger substructures would preclude

the analysis of shorter glycans and lead to an exponential in-

crease in the size of the resulting vocabulary. We would also

like to note that although we chose trisaccharides as building

blocks, glycan substructures of any length can be used to build

a vocabulary for our models without considerable changes.
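The glycoletter and glycoword definitions above can be illustrated with a minimal sketch. It assumes an unbranched glycan written in the bracket-free part of the notation used in this paper, and it assumes that consecutive glycowords are offset by one monosaccharide; the function names `glycoletters` and `glycowords` are hypothetical and are not taken from the authors' code.

```python
import re

def glycoletters(glycan: str) -> list[str]:
    """Split a linear glycan into glycoletters (monosaccharides and bonds)."""
    # Bonds are written in parentheses between monosaccharides.
    return [tok for tok in re.split(r"[()]", glycan) if tok]

def glycowords(glycan: str, n_mono: int = 3) -> list[tuple[str, ...]]:
    """Overlapping windows of n_mono monosaccharides plus the bonds between them."""
    letters = glycoletters(glycan)
    size = 2 * n_mono - 1  # 3 monosaccharides + 2 bonds = 5 glycoletters
    # Assumed stride: step by 2 so that every window starts on a monosaccharide.
    return [tuple(letters[i:i + size])
            for i in range(0, len(letters) - size + 1, 2)]

linear_glycan = "GlcNAc(b1-2)Man(a1-6)Man(b1-4)GlcNAc"
letters = glycoletters(linear_glycan)
words = glycowords(linear_glycan)
for word in words:
    print(word)
```

Changing `n_mono` changes the substructure length, which is the sense in which a vocabulary of a different word size could be built with the same procedure. Branched glycans need additional handling that this sketch does not attempt.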

To make these data and analysis resources readily accessible

and facilitate further advances in glycobiology, we created Sug-

arBase, a comprehensive glycan database with metadata and

analytical tools based on this work (Figure S1A; Table S2;

https://webapps.wyss.harvard.edu/sugarbase). SugarBase of-

fers accessible glycan data, explorable glycan representations

learned by our language models, and many of the methods

developed here as tools, such as the local structural context

of any glycoletter (Figure S1B) and glycan alignments, described

below.

Reasoning that our glycan datasets constitute broad re-

sources for glycobiology and host-microbe interactions, we set

out to investigate host glycan substructures that could be

emulated by microbes for molecular mimicry. Analyzing the envi-

ronment of the monosaccharide fucose as an example, we

observed N-acetylglucosamine (GlcNAc) and galactose (Gal)

as typical connected monosaccharides (Figure 1C), which is

consistent with the fucosyltransferase substrate specificities an-

notated in glycosyltransferase family 10 (Lombard et al., 2014).

Thus, microbial glycans containing fucose could potentially include either GlcNAc or Gal in direct proximity to maximize similarity with host glycans. This insight aids in formulating hypotheses and identifying glycan motifs relevant for molecular mimicry, as we describe below. We also differentiated binding orientation preferences for different sialic acids, a crucial monosaccharide type in host-pathogen interactions (Figure 1D; Haines-Menges et al., 2015), revealing a preference for the characteristic human monosaccharide NeuNAc to be (α2-3)-linked, relative to other sialic acids such as NeuNGc. These types of analyses can directly lead to hypotheses of glycan motifs that can be investigated by using the methods presented in this work.

Figure 2. Learning the Language of Glycans Revealed Regularities in Substructures and Can Be Used to Predict Glycan Immunogenicity

(A) Building a language model for glycobiology. We used glycowords, overlapping units consisting of three monosaccharides and two bonds, for our glycoletter-based bidirectional RNN, SweetTalk, that was trained by predicting the next glycoletter given previous glycoletters. Glycans are drawn in accordance with the symbol nomenclature for glycans (SNFG).

(B) Learned representation of glycoletters by SweetTalk. We visualized the embedding for every glycoletter by t-distributed stochastic neighbor embedding (t-SNE). Areas enriched for modified monosaccharides of one type are colored.

(C) Comparing the abundance of possible and observed glycowords. Possible glycowords were calculated from the pool of observed glycoletters and their exhaustive combination (36 bonds and 991 monosaccharides).

(D) Comparing the distribution of possible and observed glycowords. We generated 250,000 glycowords by randomly sampling from the observed pool of monosaccharides and bonds and formed their embedding by averaging their constituent glycoletter embeddings. A uniform manifold approximation and projection (UMAP) of these generated glycowords (blue) and all observed glycowords (orange) is shown.

(E) Glycan embeddings learned by the immunogenicity classifier. Embeddings for glycans from our immunogenicity dataset are shown via UMAP and colored according to whether they were immunogenic (blue) or non-immunogenic (orange).

(F) Glycoword masking to probe the immunogenicity classifier. Glycowords were progressively exchanged with padding (“masking”) from both termini (“Non-Reducing”/“Reducing”) and used as input for the trained immunogenicity classifier. Inferred immunogenicity probability indicates how crucial each region of a glycan is for prediction, with the bar representing the full-length glycan at the bottom.

(G) Glycan in silico alterations to probe the immunogenicity classifier. For 4,000 iterations, single monosaccharides or bonds were replaced with a random monosaccharide or bond. If the resulting glycowords were observed, we used them as input for the trained immunogenicity classifier. Inferred immunogenicity probability is plotted together with the altered glycan sequences, with the wildtype glycan found at the bottom. In case of ambiguity, a number indicates which monosaccharide was modified. The addition of an “S” implies a sulfurylated monosaccharide, whereas “Me” implies a methylated monosaccharide.

Using Natural Language Processing to Learn the Grammar of Glycans

Next, we used our curated dataset of 19,299 glycan sequences (Table S2) to develop a deep-learning-based language model, SweetTalk. For this, we chose a bidirectional recurrent neural network (RNN; Figure 2A; Sherstinsky, 2020), because this type of model has delivered state-of-the-art results for other biopolymers, such as protein sequences (Alley et al., 2019; Almagro Armenteros et al., 2020; Strodthoff et al., 2020). Originally developed for human languages, RNNs exhibit memory-like elements by predicting the next word given the preceding words (Sherstinsky, 2020); this enables RNNs to learn complex, order-dependent interactions in proteins by viewing amino acids as letters and predicting the next amino acid given the preceding sequence (Alley et al., 2019). A trained language model has two main uses: (1) extracting a learned representation for each word and (2) finetuning the model for predicting structural or functional properties of a sequence. For the former, a representation or embedding that characterizes a word in terms of context, usage, and meaning is constructed in the parameters of the trained model for each word in the vocabulary. This learned representation can be used to quantify the similarity of two glycan sequences or analyze language properties, which we demonstrate with the analysis of molecular mimicry in host-microbe interactions. The latter, finetuning a general language model on a predictive task such as predicting pathogenicity, is also known as transfer learning (Howard and Ruder, 2018; Tan et al., 2018), and in our case it involves general glycan features that are learned by the language model to predict functional properties.

Glycans are the only nonlinear biopolymer, with potentially multiple branches per sequence. To enable a language model despite

this branching, we extracted partially overlapping ‘‘glyco-

words’’ from the non-reducing end to the reducing end of gly-

cans in the bracket notation (Figure 2A), comprising three

monosaccharides and two bonds. These glycowords repre-

sented snapshots of structural contexts that characterize a

glycan sequence. By using monosaccharides and bonds as

‘‘glycoletters,’’ we then trained a glycoletter-based language

model, SweetTalk, predicting the next most probable glycolet-

ter given the preceding glycoletters in the context of these gly-

cowords (Table S3). This operation, instead of directly training

on full sequences, avoids learning specious relationships be-

tween glycoletters that are close in the bracket notation but

far apart in the actual glycan structure due to branching. We

then demonstrated the necessity of accounting for the order-

dependent information in glycans by training SweetTalk on

scrambled glycan sequences, randomizing the order but keep-

ing the composition of a sequence—this resulted in severely

degraded model performance, emphasizing the language-like

elements inherent in glycan sequences (Table S3). Analyzing

the learned embeddings of glycoletters after training SweetTalk

revealed similar positions in embedding space for monosac-

charides and their modified counterparts (e.g., sulfurylated

galactose, GalOS, and sulfurylated N-acetylgalactosamine,

GalNAcOS; Figure 2B), implying similarity in their language

characteristics and context. This finding is reminiscent of ob-

servations made on the popular word2vec embeddings that

also learn a representation of words in a human language by

considering their neighboring words/context, in which seman-

tically similar words form clusters (Mikolov et al., 2013).
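The training setup described in this paragraph can be sketched as follows: within one glycoword, each glycoletter is predicted from the glycoletters before it, and a scrambled control keeps the composition but destroys the order. This is an illustrative sketch only; the helper names `next_letter_pairs` and `scramble` are hypothetical, the fixed seed is an arbitrary choice, and the actual SweetTalk training code is not reproduced here.

```python
import random

def next_letter_pairs(glycoword):
    """(context, target) pairs: predict each glycoletter from those before it."""
    return [(tuple(glycoword[:i]), glycoword[i])
            for i in range(1, len(glycoword))]

def scramble(letters, seed=0):
    """Order-destroying control: same glycoletters, shuffled order."""
    shuffled = list(letters)
    random.Random(seed).shuffle(shuffled)
    return tuple(shuffled)

glycoword = ("GlcNAc", "b1-2", "Man", "a1-6", "Man")
pairs = next_letter_pairs(glycoword)
control = scramble(glycoword)

for context, target in pairs:
    print(context, "->", target)
```

A model trained on `control`-style inputs sees the same glycoletter frequencies as one trained on real glycowords, so any gap in performance between the two can be attributed to order.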

We then constructed glycoword embeddings by averaging the

embeddings of their constituent glycoletters. Our first observa-

tion was that from the close to 1.2 trillion possible glycowords

(given our observed glycoletters), only 19,866 distinct glycowords (~0.0000016%) were observed here (Figure 2C). Moreover, these 19,866 glycowords were not evenly distributed in

the learned embedding space, as existing glycowords formed

clusters compared to in silico-generated, possible glycowords

(Figure 2D). The observation that the glycoword space (and,

thus, glycan space) is sparsely populated is potentially a conse-

quence of having to evolve dedicated enzymes for constructing

specific glycan substructures from a species-specific set of

monosaccharide building blocks, making most combinations

inaccessible.
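The sparsity figures in this paragraph can be checked with a short calculation. It assumes that the number of possible glycowords is the exhaustive combination of three monosaccharide positions and two bond positions, using the counts given in the Figure 2C legend (991 monosaccharides and 36 bonds); the variable names are illustrative.

```python
n_mono, n_bond = 991, 36   # counts from the Figure 2C legend
observed = 19_866          # distinct glycowords observed in the dataset

# Assumed combinatorics: 3 monosaccharide slots and 2 bond slots per glycoword.
possible = n_mono ** 3 * n_bond ** 2
fraction_pct = 100 * observed / possible

print(f"glycoletters:       {n_mono + n_bond:,}")
print(f"possible glycowords: {possible:,}")
print(f"observed fraction:   {fraction_pct:.7f}%")
```

Under this assumption the possible vocabulary is about 1.26 trillion glycowords, and the observed fraction rounds to the ~0.0000016% quoted in the text.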

Predicting Glycan Immunogenicity with a Glycan-Based Language Model

Given the important role glycans play in human immunity (Kappler and Hennet, 2020; Reusch and Tejada, 2015), we curated known immunogenic glycans from the literature (Table S2) to finetune a SweetTalk-based classifier with glycan sequences as input to predict their immunogenicity to humans. On an independent validation dataset, our model achieved an accuracy of ~92% (F1 score or balanced F score: 0.915), in comparison with an accuracy of ~51% for a model trained on scrambled glycan sequences (Figures 2E–2G; Table S4). Alternative machine-learning models that did not treat glycan sequences as a language, such as random forest classifiers, only achieved accuracies ranging from ~80% to 88% for this task (Table S4), emphasizing the importance of order and patterns for elucidating glycan properties.

Glycans rich in rhamnose, a monosaccharide common in bacteria but not in mammals, were unambiguously assigned to an immunogenic cluster by our RNN-based model and presented the most striking motif for glycan immunogenicity (Figure 2E).

The cluster containing high-mannose glycans provided addi-

tional ambiguity, because it included both immature human gly-

cans and immunogenic fungal glycans, potentially suggesting

the immunogenicity of unintentionally exposed immature human

glycans. Indeed, the presence of immature high-mannose gly-

cans on viral surfaces has been noted to influence immunoge-

nicity, with many broadly neutralizing antibodies targeting the

high-mannose glycans on HIV glycoproteins (Lavine et al.,

2012). We also found that human mucosal O-glycans, character-

ized by their interactions with bacteria, were interspersed with

bacterial immunogenic glycans in the embedding space, in

contrast to N-linked glycans. This adds to the notion of an immu-

nological compromise of recognizing these bacterial glycans at

the expense of targeting human O-glycans with shared motifs,

such as the ABH blood group antigens (Kappler and Hennet,

2020). These analyses indicate that embeddings from glycan-

focused language models could be used to study characteristics

of glycans on a large scale and with many potential applications,

such as the exploration of glycan-immune system interactions.

Using Deep Learning to Provide Evolution-Informed Glycan Representations

We next hypothesized that the evolutionary pressures on glycans stemming from host-pathogen interactions could be extracted by a deep-learning model. For this, we constructed a language-model-based classifier, SweetOrigins, to predict the

taxonomic origin of a glycan (Figure 3A). In distinguishing taxo-

nomic classes, SweetOrigins could learn species-specific fea-

tures of glycans that are indicative of their evolutionary history.

Based on a bidirectional RNN, we first pre-trained SweetOrigins

with a SweetTalk model as described above. We then used the

language-like properties learned in this process to finetune the

model on a different task—predicting the taxonomic group of

glycans. By doing this for every taxonomic level, from the spe-

cies level up to the domain level, we obtained eight SweetOrigins

models with the same basic model architecture except for

different final layers. These final layers could learn how to

combine the extracted information from glycans for predicting

their taxonomic group, and they differed in terms of their number

of output nodes, as the number of classes varied for each taxo-

nomic level. This strategy was successful in extracting evolu-

tionary information from glycans, as SweetOrigins models clas-

sified the taxonomic group of a glycan with high accuracy

(Table 1).
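To put the accuracies in Table 1 in context, the following sketch recomputes two quantities from the table's own values: the random baseline, which is consistent with a uniform guess over the included classes (1 divided by the number of classes), and the absolute accuracy gain from data augmentation. The dictionary name `TABLE_1` and the tuple layout are illustrative choices; all numbers are copied from Table 1.

```python
# level: (classes used, reported random baseline, accuracy Base, accuracy Aug)
TABLE_1 = {
    "Domain":  (4,   0.2500, 0.9128, 0.9313),
    "Kingdom": (9,   0.1111, 0.8733, 0.8953),
    "Phylum":  (33,  0.0303, 0.7779, 0.8008),
    "Class":   (71,  0.0141, 0.6803, 0.7149),
    "Order":   (145, 0.0069, 0.4937, 0.5333),
    "Family":  (258, 0.0039, 0.4134, 0.4660),
    "Genus":   (405, 0.0025, 0.3658, 0.3849),
    "Species": (581, 0.0017, 0.3052, 0.3651),
}

# Uniform-guess baseline, rounded to the table's four decimals.
random_baseline = {lvl: round(1 / k, 4) for lvl, (k, *_rest) in TABLE_1.items()}
# Absolute accuracy gain of the augmented model over the base model.
gain = {lvl: round(aug - base, 4)
        for lvl, (_k, _rand, base, aug) in TABLE_1.items()}

for lvl in TABLE_1:
    print(f"{lvl:8s} random={random_baseline[lvl]:.4f} gain={gain[lvl]:+.4f}")
```

Even at the species level, where accuracy is lowest, the trained models remain far above the uniform-guess baseline, and the species level shows the largest gain from augmentation.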

[Figure 3 graphic (panels A–D) omitted. Recoverable labels: (A) Glycan, glycowords, glycoword embeddings, three biLSTM layers, a fully connected layer, and the classification result (Domain: Bacteria; Kingdom: Bacteria; Phylum: Proteobacteria; Class: Gammaproteobacteria; Order: Enterobacterales; Family: Enterobacteriaceae; Genus: Escherichia; Species: Escherichia coli). (B) Bracket notations of the glycan GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc, whose variants begin with GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]…, GlcNAc(b1-2)Man(a1-6)[Man(a1-3)][Xyl(b1-2)]…, Xyl(b1-2)[GlcNAc(b1-2)Man(a1-6)][Man(a1-3)]…, Xyl(b1-2)[Man(a1-3)][GlcNAc(b1-2)Man(a1-6)]…, Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]…, and Man(a1-3)[Xyl(b1-2)][GlcNAc(b1-2)Man(a1-6)]…. (C) t-SNE Dim 1 versus t-SNE Dim 2, with areas labeled αGal-engineered, O8/O9, K-12, O86/O127/O128, O13/O148/O150, F470, O6, O174, O4/O25, and J5. (D) t-SNE Dim 1 versus t-SNE Dim 2, colored by pathogenicity (Yes, Unknown, No), with example strains O157:H7, K-12, and O111:B4.]

Figure 3. Deep-Learning-Based Classifiers Use Glycans to Predict Taxonomic Origin and Pathogenicity

(A) Exemplary schematic of SweetOrigins to predict taxonomic origin from glycans. Lists of glycowords are used as input for a SweetOrigins model to predict the taxonomic class ranging from the domain level down to the species level.

(B) Glycan data augmentation strategy. Different bracket notations describing the same glycan can be generated by alternating double branches as well as replacing side branches with main branches to increase model robustness.

(C) Glycans of E. coli in embedding space distinguish strains. The embedding for all 1,010 E. coli-derived glycans with strain information from the trained species-level SweetOrigins model is plotted via t-SNE and colored for areas enriched for annotated E. coli strains.

(D) E. coli glycans predict pathogenicity. For all E. coli-derived glycans, representations learned by a model predicting pathogenicity are plotted via t-SNE and colored as to whether they stem from pathogenic, non-pathogenic, or unlabeled E. coli. Example strains for all cases are annotated.

Table 1. Metrics of Trained SweetOrigins Models

Taxonomic  Classes       Baseline accuracy Cross-entropy loss  Accuracy          MCC
level                    Random   Max      Base      Aug       Base     Aug      Base     Aug
Domain     4 (4)         0.2500   0.99     0.2841    0.1906    0.9128   0.9313   0.8134   0.8693
Kingdom    9 (11)        0.1111   0.98     0.3844    0.3249    0.8733   0.8953   0.8001   0.8390
Phylum     33 (39)       0.0303   0.98     0.8685    0.7543    0.7779   0.8008   0.7018   0.7341
Class      71 (101)      0.0141   0.96     1.3283    1.1729    0.6803   0.7149   0.6218   0.6638
Order      145 (207)     0.0069   0.92     2.2498    2.1132    0.4937   0.5333   0.4602   0.5066
Family     258 (411)     0.0039   0.90     2.9834    2.7068    0.4134   0.4660   0.3873   0.4428
Genus      405 (919)     0.0025   0.86     3.6588    3.4081    0.3658   0.3849   0.3505   0.3682
Species    581 (1,726)   0.0017   0.86     4.3704    3.9550    0.3052   0.3651   0.2870   0.3496

Taxonomic groups with fewer than five unique glycans were not used for model training or validation. Number of classes indicates the number of included taxonomic groups, whereas the full number of taxonomic groups in our dataset is given in parentheses. Models were trained with the standard set of glycans (Base) or after data augmentation (Aug). As an accuracy baseline, a random prediction of classes was used for each model. Max indicates the maximum theoretically possible accuracy given shared glycan sequences across taxonomic groups. Cross-entropy loss, accuracy, and Matthews correlation coefficient (MCC) of the trained model on a separate validation set are given for each taxonomic level. For each metric and taxonomic level, the superior value is bolded in the original table; here, the augmented (Aug) model is superior in every case.

In contrast to other biological sequences such as DNA or proteins, the number of available sequences for glycans is still limited, which is compounded by their high diversity. This is especially visible in prediction tasks in which only a few glycans per class are available, such as for the species-level SweetOrigins model, resulting in lower model performance for rare classes and less useful glycan representations for downstream analyses. As knowledge of host-microbe interactions at the species level could offer insights, we developed methods that enable training glycan-focused machine-learning models on small datasets. This goal motivated our transfer-learning approach of pre-training a language model on all glycan sequences and then finetuning the model on a smaller dataset, because this approach in natural language processing has in some cases reduced the necessary dataset size by a factor of 100 (Howard and Ruder, 2018). In other domains of deep learning, such as image classification, data augmentation routinely results in improved model quality and robustness by providing the model with slightly modified versions of the data, such as rotated images or images with changed brightness (Perez and Wang, 2017). We reasoned that the same could be achieved for biomolecules such as glycans; we thus designed a data-augmentation method, specifically for glycans, by conceptualizing glycans as graphs and forming a set of isomorphic graphs comprising slightly different lists of glycowords that we used as inputs for SweetOrigins (Figure 3B; STAR Methods). Capitalizing on the ambiguity of the bracket notation (Tanaka et al., 2014), we generated bracket notations that differed in their ordering of branches but still described the same glycan. This led to model performance improvements at every classification level, with absolute accuracy increases of up to 6 percentage points, by effectively increasing the amount of available data. As we envisioned, classifications with less data per class, such as the species level, benefited most from data augmentation (Table 1), paving the way for using glycan-based deep-learning models with smaller datasets.
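The branch-reordering idea can be illustrated on the glycan shown in Figure 3B. The sketch below handles a single branch point whose incoming chains have already been separated by hand, and it writes out every ordering in which one chain is inline and the others are bracketed. It is a simplified stand-in for the graph-based procedure described in the STAR Methods, which is not reproduced here; the function name `branch_orderings` is hypothetical.

```python
from itertools import permutations

def branch_orderings(branches, trunk):
    """All bracket notations for one branch point.

    `branches` are the chains that meet at the branch point and `trunk` is the
    remainder of the glycan toward the reducing end. The first chain of each
    ordering is written inline; the others are wrapped in square brackets.
    """
    return [order[0] + "".join(f"[{b}]" for b in order[1:]) + trunk
            for order in permutations(branches)]

# Chains and trunk taken from the glycan printed in Figure 3B.
branches = ["GlcNAc(b1-2)Man(a1-6)", "Xyl(b1-2)", "Man(a1-3)"]
trunk = "Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc"

variants = branch_orderings(branches, trunk)
for v in variants:
    print(v)
```

The three chains give six notations, whose prefixes correspond to the six variants listed for Figure 3B. Each notation describes the same molecule but yields a slightly different list of glycowords, which is what makes the variants useful as additional training inputs.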

In general, our predictions were robust, and we could, for

example, accurately predict glycans from the kingdoms Animalia

(91.1%) and Bacteria (97.2%), as well as glycans from the phyla

Chordata (91.9%) and Firmicutes (90.4%) in our validation data-

set (Figures S2A–S2C). This demonstrates that SweetOrigins

can learn glycan representations from both hosts and microbes,

enabling the analyses presented below. Any misclassifications

occurred among closely related groups, such as viral glycans

misclassified as those of their hosts (Figures S2A–S2C). Glycan

embeddings from our trained SweetOrigins model illustrated

clusters reminiscent of taxonomic groups (Figure S2D). We

next used our trained SweetOrigins models to infer the taxo-

nomic origin of the 10,333 glycans without a species label in

our dataset (Table S2). For several randomly selected glycans,

we performed literature searches to validate the predictions

made by SweetOrigins (Figure S2E; Table S5), indicating that

our trained SweetOrigins models had accurately learned spe-

cies- or group-specific glycan motifs.

We next used SweetOrigins models to investigate host-path-

ogen interactions, specifically in the context of the well-studied

bacterium E. coli. Although SweetOrigins classifiers were only

trained up to the species level, we hypothesized that subspe-

cies-level information could be extracted from the rich glycan

representation learned by the species-level SweetOrigins model.

To test this, we gathered 1,010 glycan sequences from E. coli

with strain-level annotation from CSDB and used these as inputs

to our trained model, yielding learned representations that we

used to differentiate serotypes. We could readily identify clusters

enriched for several strains in the representations, such as the

serotypes O8/O9, characterized by a special polymannose O-

antigen (Greenfield et al., 2012), and the K-12 strain popular in

molecular biology research (Figure 3C), demonstrating the diver-

sity and characteristic features of glycans for different E. coli

strains.

We next reasoned, given the prominent role of glycans in host-

microbe interactions, that these glycan differences could be

used to predict E. coli pathogenicity, because E. coli strains

can range from being non-colonizing to commensal or patho-

genic (Lim et al., 2010). Accordingly, we trained a deep-

learning-based classifier with the same language-model archi-

tecture as SweetOrigins on glycan sequences to elucidate

whether information in glycans can predict pathogenicity. With

a threshold of 0.5 in the predicted probability of pathogenicity,

we found that we were able to predict E. coli strain pathogenicity

with an accuracy of ~89% on a separate validation dataset (Figure 3D; F1 score: ~0.906). This positioned E. coli strains along a continuum of predicted pathogenicity and supported the role of

glycans in mediating pathogenicity. Interestingly, E. coli strains

such as O111:B4, which were labeled as ‘‘unknown’’ in the


dataset and therefore not available during model training, were

predicted to be among the pathogenic strains and confirmed

to cause gastric disease (Viljanen et al., 1990). Our trained model

placed the majority of E. coli glycans from unknown pathoge-

nicity strains between pathogenic and non-pathogenic strains,

adding to the notion of a continuum of pathogenicity (Casade-

vall, 2017).

Because glycans appear to be predictive of pathogenicity, we

reasoned that certain glycan motifs in E. coli strains on the path-

ogenic end of the spectrum might provide further insight into

pathogenesis. To address this notion, we identified glycan motifs

that are enriched in regions populated by predominantly patho-

genic E. coli strains in the representation learned by our model

(Figure 3D). Motifs in these pathogenicity-associated glycans

exhibited a striking resemblance to host mucosal glycans, with

an enrichment for α1-2-linked fucose and the core 1 O-glycan structure (also known as T antigen; Gal(b1-3)GalNAc) prevalent in mucins (Figures S3A and S3B). Consistent with our local structural context analysis (Figure 1B), the majority of α1-2-linked

fucose residues in pathogenic E. coli strains were linked to

galactose (Figure S3C), forming part of the human blood group

H antigen. Indeed, when analyzing the glycan motifs most pre-

dictive of E. coli strain pathogenicity, both Gal(b1-3)GalNAc

and Fuc(a1-2)Gal disaccharides were among the top 20 motifs

(Figure S3D). On the other hand, the presence of typical bacterial

glycan components, such as rhamnose or L-Glycero-D-Manno-

Heptose (LDManHep), was associated with lower predicted

pathogenicity (Figure S3D).

Using Glycan Alignments to Study Virulence Determinants in Bacterial Pathogens

To better understand the function of glycans in host-microbe interactions, we developed a sequence-alignment method. For DNA and protein sequences, alignments use sequence changes due to mutations and insertions to enable, for example, the identification of conserved motifs in protein families (Doğan and Karaçalı, 2013). To facilitate analogous analyses for glycans and capitalize on the evolutionary influence of host-pathogen interactions on glycans, we developed methods for gapped, pairwise

alignments of glycan sequences based on the Needleman-

Wunsch alignment algorithm (Needleman and Wunsch, 1970).

For this, we constructed a substitution matrix (which we termed

GLYSUM; Table S6), analogous to the BLOSUM matrices used

in protein alignments, that utilizes the likelihood of substituting

two monosaccharides to calculate alignment scores. To assess

whether our glycan alignments performed as envisioned, we

analyzed viral glycans that are predominantly derived from their

host organisms and thus should align to host glycans. As ex-

pected, the optimal alignment for the viral glycans was indeed

from their host organisms (Figures 4A and 4B), supporting the

validity of our glycan-alignment method.
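A minimal sketch of gapped, global pairwise alignment in the Needleman-Wunsch style, applied to lists of glycoletters, is given below. The scoring function `toy_score` and the gap penalty are made-up placeholders standing in for GLYSUM (Table S6), whose values are not reproduced here, so the resulting alignment score is not comparable with the scores reported in Figure 4. The two sequences are the S. aureus serotype 5 repeat and one E. coli ECA sequence as printed in Figure 4C.

```python
def needleman_wunsch(a, b, score, gap=-2):
    """Global alignment of two glycoletter lists.

    Returns (alignment score, aligned a, aligned b); gaps are marked "-".
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], out_a[::-1], out_b[::-1]

def toy_score(x, y):
    # Placeholder for a GLYSUM lookup: +5 for identity, -1 otherwise (made-up values).
    return 5 if x == y else -1

def percent_identity(aligned_a, aligned_b):
    """Share of aligned positions carrying the same glycoletter."""
    same = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100 * same / len(aligned_a)

query = "ManNAcA β1-4 FucNAcOAc α1-3 D-FucNAc β1-4 ManNAcA".split()
target = "ManNAcA β1-4 GlcNAc α1-3 D-FucNAc α1-4 ManNAcA".split()

aln_score, aln_q, aln_t = needleman_wunsch(query, target, toy_score)
identity = round(percent_identity(aln_q, aln_t), 1)
print(aln_score, identity)
```

With these placeholder scores the two sequences align without gaps, and five of seven positions match, which agrees with the percent identity of 71.4 shown for this pair in Figure 4C. A substitution matrix such as GLYSUM would replace the flat mismatch penalty with a score that depends on how readily two glycoletters substitute for each other.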

We reasoned that functionally relevant glycan motifs for host-

pathogen interactions are likely conserved to some extent and

could be analyzed with glycan alignments. As an example, we

used our glycan-alignment method to align the serotype 5

capsular polysaccharide of the clinically relevant pathogen

S. aureus, which is known to increase bacterial virulence (Tziana-

bos et al., 2001), against our dataset. Because the capsular poly-

saccharides of S. aureus mediate its evasion of the immune system (Weidenmaier and Lee, 2015), we hypothesized that

comparing these to similar sequences might offer insights to un-

derstand their pathogenicity. Notably, the best alignment results

were achieved with the enterobacterial common antigen, ECA

(Figure 4C), conserved in the Enterobacteriaceae family, which

has been shown to be important for virulence (Gilbreath et al.,

2012) and outer membrane permeability (Mitchell et al., 2018).

These findings are supported by experiments demonstrating

that ECA deficiency in E. coli can be rescued by the expression

of enzymes from serotype 5 S. aureus (Kiser and Lee, 1998).

Such a phenotype complementation could suggest that this

ECA-like glycan motif fulfills a similar role in S. aureus as the ca-

nonical ECA in E. coli.

To further probe the connection of ECA-like glycans and

increased virulence, we aligned the canonical ECA motif against

our dataset to compile a list of ECA-like sequences and their

alignment distances; we used these distances to construct a

dendrogram detailing the relationships between ECA-like glycan

sequences (Figure 4D). Although most of the S. aureus-derived

ECA-like sequences formed a separate cluster, the type 5

capsular polysaccharide was located in a different cluster with

the canonical ECA sequences. Of note, we observed an ECA-

like motif in the capsular polysaccharide of A. baumannii (Fig-

ure 4D, bold), one of the most problematic hospital-acquired

pathogens, in the same cluster dominated by canonical ECA sequences. The capsular polysaccharide of A. baumannii has been implicated in antibiotic resistance and virulence (Geisinger

and Isberg, 2015), providing an intriguing potential link to the

functions of the canonical ECA. For other pathogens, such as

Haemophilus ducreyi, the expression of a gene cluster synthe-

sizing a putative ECA-like glycan has also been linked to

increased virulence (Banks et al., 2008), further suggesting a

connection of this motif with virulence. Notably, the genera

Staphylococcus, Acinetobacter, and Haemophilus are not part

of the Enterobacteriaceae family that is typically associated

with the ECA, highlighting the importance of our glycan align-

ments for screening thousands of glycans to aid in understand-

ing motifs important for pathogenicity, such as the ECA-like gly-

cans from S. aureus and A. baumannii.

DISCUSSION

Here, we presented a set of resources—a collection of deep-

learning and bioinformatics methods, together with large,

curated datasets of glycan sequences—that can be used to

gain insights into many facets of glycan-mediated host-microbe

interactions. The aggregation of many glycan sequences in our

datasets leads to robust machine-learning models that are

largely unaffected by data-entry errors, thereby adjusting for

database errors. By training a language model to understand

the hidden grammar of glycan sequences, we demonstrated

that the information in glycans can be used to predict a range

of glycan properties, such as immunogenicity or pathogenicity.

We also showed that sequences can be compared and clustered

by learning a representation for each glycan via our trained

models. For applications involving glycoproteins, the distribution

of variant glycans on a protein (Wu et al., 2018) could be ac-

counted for by averaging their representations, potentially even

weighted by their relative abundance. By developing both

[Figure 4 graphic omitted; the text of panels A–D follows. Each alignment lists the query, the best match, and the aligned positions.]

(A) Human Immunodeficiency Virus

Query: Gal β1-4 GlcNAc β1-2 Gal β1-4 GlcNAc β1-4 Man α1-3 Gal β1-4 GlcNAc β1-2 Man α1-6 Man β1-4 GlcNAc β1-4 Fuc α1-6 GlcNAc

Match: Gal β1-4 GlcNAc β1-2 Gal β1-4 GlcNAc β1-4 Man α1-3 Gal β1-4 GlcNAc β1-2 Man α1-6 Man β1-4 GlcNAc β1-4 Fuc α1-6 GlcNAc

Positions 1–23. Alignment Score: 115; Percent Identity: 100.0; Percent Coverage: 100.0; Species: Homo sapiens

(B) SARS-CoV-2

Query: Man α1-3 Man α1-6 Man α1-6 Man α1-3 Man β1-4 GlcNAc β1-4 GlcNAc

Match: Man α1-3 Man α1-6 Man α1-6 Man α1-3 Man β1-4 GlcNAc β1-4 GlcNAc

Positions 1–13. Alignment Score: 65; Percent Identity: 100.0; Percent Coverage: 100.0; Species: Homo sapiens

(C) Staphylococcus aureus

Query (all three alignments, positions 1–7): ManNAcA β1-4 FucNAcOAc α1-3 D-FucNAc β1-4 ManNAcA

Match 1: ManNAcA β1-4 GlcNAc α1-3 D-FucNAc α1-4 ManNAcA. Alignment Score: 26; Percent Identity: 71.4; Percent Coverage: 100.0; Species: Escherichia coli

Match 2: ManNAcA β1-4 GlcNAc α1-3 FucNAc α1-4 ManNAcA. Alignment Score: 22; Percent Identity: 57.1; Percent Coverage: 100.0; Species: Escherichia coli

Match 3: ManNAcA β1-4 GlcNAcOAc α1-3 D-FucNAc α1-4 ManNAcA. Alignment Score: 21; Percent Identity: 71.4; Percent Coverage: 100.0; Species: Yersinia pestis

(D) ECA-like glycan sequences (dendrogram leaf labels), each followed by its source species:

ManNAcβ1-4Glcα1-4ManNAc S.marcescens

ManNAcAβ1-4FucNAcOAcα1-3D-FucNAcβ1-4ManNAcA S.aureus

Manα1-2Fucα1-2GlcOAcAβ1-3GalNAc P.alcalifaciens

Manα1-3FucNAcα1-3GlcNAcβ1-3FucNAc C.universalis

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA S.sonnei

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA E.coli

ManNAcOAcAβ1-3FucNAcα1-3D-FucNAcβ1-3ManNAcOAcA S.aureus

Manα1-3D-Fucα1-3GlcNAcβ1-3Rha P.agglomerans

ManNAcα1-3Rhaβ1-4GlcNAcα1-2Man.1 S.dysenteriae

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA Y.enterocolitica

ManNAcβ1-4Glcβ1-3ManNAc.1 C.werkmanii

GalNAcβ1-4GlcAβ1-3D-FucNAcNβ1-3GalNAc P.temperata

ManNAcβ1-4Glcβ1-3ManNAc C.braakii

ManNAcAβ1-4GlcNAcα1-3FucNAcα1-4ManNAcA E.coli

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA S.enterica

Manα1-3Fucα1-3GlcNAcβ1-4GalNAc E.tarda

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA Y.pestis

Manα1-2Fucα1-2Glcβ1-3GlcNAc P.rustigianii

Manβ1-4Glcβ1-3D-FucNAcOAcα1-4GalNAc C.gillenii

ManNAcβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAc E.coli

ManNAcOAcAβ1-4FucNAcα1-3D-FucNAcβ1-4ManNAcOAcA S.aureus

Manα1-3Fucα1-3GlcNAcα1-2Man.1 Y.entomophaga

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA P.mirabilis

ManNAcAβ1-4GlcNAcOAcβ1-3D-FucNAcα1-4ManNAcA Y.pestis

ManNAcOAcAβ1-3FucNAcα1-3FucNAcα1-3ManNAcOAcA S.aureus

ManNAcOAcAβ1-3FucNAcα1-3D-FucNAcα1-3ManNAcOAcA S.aureus

ManNAcβ1-4GlcNAcα1-4ManNAc H.alvei

ManNAcAβ1-4ManNAcAβ1-3D-FucNAcα1-4ManNAcA A.baumannii

Manα1-3Fucα1-3GlcNAcα1-2Man Y.pseudotuberculosis

ManNAcAα1-4FucNAcα1-3D-FucNAcβ1-4ManNAcA S.aureus

ManNAcβ1-4GlcNAcβ1-6GlcNAcα1-4ManNAc B.anthracis

ManNAcAβ1-4GlcNAcα1-4ManNAcA P.putida ManNAcAβ1-4GlcNAcα1-4ManNAcAβ1-3D-FucNAcα1-4ManNAcA A.globiformis

ManNAcAβ1-4L-GulNAcOAcAα1-3QuiNAcNButα1-4ManNAcA A.haemolyticus ManNAcAβ1-4GlcNAcAβ1-6Glca1-4ManNAcA A.cyaneus

ManNAcα1-3Rhaβ1-4GlcNAcα1-2Man R.terrigena

ManNAcAβ1-4GlcNAcα1-3D-FucNAcα1-4ManNAcA P.shigelloides

ManNAcAβ1-4GlcNAcNAmAβ1-3GlcNAcα1-4ManNAcA E.albertii

ManNAcAβ1-4GlcNAcOAcα1-3D-FucNAcα1-4ManNAcA Y.pestis

C an

on ic

al S

. a ur

eu s

Enterobacterial common antigen

Figure 4. Glycan Alignments Identify Pathogenicity-Associated Glycan Motifs

(A and B) Viral glycans aligned to host glycans. We aligned viral glycans to all glycans and depicted the highest scoring alignment.

(C) Glycan alignments using serotype 5 capsular polysaccharide of S. aureus. The repeating unit of the glycan was aligned against our database, and the best

three alignments are shown.

(D) ECA and ECA-like glycans. We aligned the canonical ECA sequence against our entire dataset, curated ECA-like sequences from the best 50 alignments, and

constructed a dendrogram from alignment distances.

ll OPEN ACCESSResource

transfer-learning and data-augmentation methods for glycan-focused machine learning, we also addressed the pressing issue of the limited availability of glycan sequences due to experimental difficulties, enabling machine learning for many applications in glycobiology.
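
The suggestion of averaging glycan representations on a glycoprotein, optionally weighted by relative abundance, can be illustrated with a minimal sketch. The function name `weighted_mean_embedding` and the toy two-dimensional vectors are hypothetical and are not part of SweetTalk; real representations would come from the trained model.

```python
from typing import List, Sequence


def weighted_mean_embedding(
    embeddings: Sequence[Sequence[float]],
    abundances: Sequence[float],
) -> List[float]:
    """Pool per-glycan embedding vectors into one vector, weighted by abundance."""
    if not embeddings or len(embeddings) != len(abundances):
        raise ValueError("need at least one glycan and one abundance per glycan")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    dim = len(embeddings[0])
    pooled = [0.0] * dim
    for vector, weight in zip(embeddings, abundances):
        if len(vector) != dim:
            raise ValueError("all embeddings must have the same dimension")
        for k, value in enumerate(vector):
            # Each glycan contributes in proportion to its share of the total.
            pooled[k] += weight * value / total
    return pooled


# Hypothetical example: two glycoforms observed at a 3:1 ratio.
site_vector = weighted_mean_embedding([[1.0, 0.0], [0.0, 1.0]], [3, 1])
```

Passing equal abundances reduces this to the unweighted average mentioned first in the text.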

Our deep-learning strategies enabled us to introduce language models for glycans, while our curated datasets offer state-of-the-art coverage of glycan sequences across a multitude of organisms. In contrast to word2vec-type models (Mikolov et al., 2013), our language-model-based approach captured sequential information beyond mere co-occurrences in glycan sequences and thus achieved better predictive results than alternative machine-learning techniques. This also enabled us to analyze glycan motifs, such as those important for immunogenicity and pathogenicity, that are dependent on sequential information and their relative position in glycans. Additionally, starting from a glycoletter-based model allowed for the construction of embeddings for close to 1.2 trillion glycowords, making SweetTalk easily extendable to the full diversity of glycobiology. SweetTalk can also incorporate position-specific modifications, illustrating its flexibility and potential for the analysis of information-rich glycosaminoglycans to predict, for instance, viral binding such as that required for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cell entry (Liu et al., 2020).

Our resources can be utilized as a complete workflow, from a glycan dataset to motifs obtained by machine learning and further analyzed by glycan alignment, or as separate modules. The accuracy exhibited by our SweetOrigins models demonstrated that glycans can be used to distinguish closely related taxonomic groups and provided the means to leverage the evolutionary information in glycans for predictive purposes. Our observation that E. coli glycans are predictive of pathogenicity adds to the role of glycans as mediators of host-microbe relationships (Poole et al., 2018). The continuum of pathogenicity of E. coli strains, suggested by our deep-learning model, further adds to the redefinition of the notion of pathogenicity from a binary concept to a gradual, environmentally controlled process (Casadevall, 2017), mediated and influenced by glycans.

Both glycan alignments and glycan classification can connect

glycan functions with sequence patterns, which we have used to

derive insight from glycan motifs by analyzing glycans that could

potentially be used for molecular-mimicry-mediated immune

evasion by pathogenic E. coli strains. We further hypothesized

Cell Host & Microbe 29, 132–144, January 13, 2021 141


that glycan-based molecular mimicry, in addition to mimicking host glycans, could also extend to approximating glycans from other bacteria for increased virulence, as in the case of the capsular polysaccharides of S. aureus and A. baumannii, which potentially mimic the ECA of other bacteria. Our glycan-alignment method readily facilitated this hypothesis of ECA mimicry by the glycans of these pathogens, and the phenomenon is potentially of broader relevance, as other pathogens, such as H. ducreyi, are predicted to engage in ECA mimicry as well. In general, the resources developed here enable rapid discovery, understanding, and utilization of functionally relevant glycan motifs from glycan datasets, especially in the context of host-pathogen interactions. Another important feature of trained machine-learning models is the prediction of properties for newly acquired samples, such as predicting the pathogenic potential of newly identified E. coli strains based on their glycans. As glycobiology progresses, SugarBase and our deep-learning models could be readily expanded and updated, enabling an even more comprehensive investigation of glycan-mediated host-microbe interactions. This will eventually allow for precise classification at the subspecies level using language-model-based approaches, facilitating the glycan-based study of host-microbe interactions at unprecedented resolution.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

- KEY RESOURCES TABLE
- RESOURCE AVAILABILITY
  - Lead Contact
  - Materials Availability
  - Data and Code Availability
- METHOD DETAILS
  - Dataset
  - Data Processing
  - Analyzing Links in Glycan Sequences
  - Glycan In Silico Modification
  - Glycan Alignment
  - Model Training
- QUANTIFICATION AND STATISTICAL ANALYSIS

SUPPLEMENTAL INFORMATION

Supplemental Information can be found online at https://doi.org/10.1016/j.chom.2020.10.004.

ACKNOWLEDGMENTS

The authors would like to thank Jacqueline Valeri and Mathieu Groussin for

helpful discussions. This work was supported by the Predictive BioAnalytics

Initiative at the Wyss Institute for Biologically Inspired Engineering.

AUTHOR CONTRIBUTIONS

D.B. conceived the method. D.B., D.M.C., and J.J.C. designed the experiments. D.B. performed the experiments and implemented the method. R.K.P. developed the SugarBase web tool. D.M.C. and J.J.C. supervised the work. D.B., R.K.P., D.M.C., and J.J.C. wrote and edited the manuscript.


DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: June 29, 2020

Revised: September 9, 2020

Accepted: October 8, 2020

Published: October 28, 2020

REFERENCES

Alley, E.C., Khimulya, G., Biswas, S., AlQuraishi, M., and Church, G.M. (2019).

Unified rational protein engineering with sequence-based deep representation

learning. Nat. Methods 16, 1315–1322.

Almagro Armenteros, J.J., Johansen, A.R., Winther, O., and Nielsen, H. (2020). Language modelling for biological sequences – curated datasets and baselines. bioRxiv. https://doi.org/10.1101/2020.03.09.983585.

Banks, K.E., Fortney, K.R., Baker, B., Billings, S.D., Katz, B.P., Munson, R.S.,

Jr., and Spinola, S.M. (2008). The enterobacterial common antigen-like gene

cluster of Haemophilus ducreyi contributes to virulence in humans. J. Infect.

Dis. 197, 1531–1536.

Bardor, M., Faveeuw, C., Fitchette, A.-C., Gilbert, D., Galas, L., Trottein, F.,

Faye, L., and Lerouge, P. (2003). Immunoreactivity in mammals of two typical

plant glyco-epitopes, core alpha(1,3)-fucose and core xylose. Glycobiology

13, 427–434.

Bashir, S., Leviatan Ben Arye, S., Reuven, E.M., Yu, H., Costa, C., Galiñanes,

M., Bottio, T., Chen, X., and Padler-Karavani, V. (2019). Presentation Mode of

Glycans Affect Recognition of Human Serum anti-Neu5Gc IgG Antibodies.

Bioconjug. Chem. 30, 161–168.

Bovin, N., Obukhova, P., Shilova, N., Rapoport, E., Popova, I., Navakouski, M., Unverzagt, C., Vuskovic, M., and Huflejt, M. (2012). Repertoire of human natural anti-glycan immunoglobulins. Do we have auto-antibodies? Biochim. Biophys. Acta 1820, 1373–1382.

Camacho, D.M., Collins, K.M., Powers, R.K., Costello, J.C., and Collins, J.J.

(2018). Next-Generation Machine Learning for Biological Networks. Cell 173,

1581–1592.

Campbell, M.P., Peterson, R., Mariethoz, J., Gasteiger, E., Akune, Y., Aoki-

Kinoshita, K.F., Lisacek, F., and Packer, N.H. (2014). UniCarbKB: building a

knowledge platform for glycoproteomics. Nucleic Acids Res. 42, D215–D221.

Carlin, A.F., Uchiyama, S., Chang, Y.-C., Lewis, A.L., Nizet, V., and Varki, A. (2009). Molecular mimicry of host sialylated glycans allows a bacterial pathogen to engage neutrophil Siglec-9 and dampen the innate immune response. Blood 113, 3333–3336.

Casadevall, A. (2017). The Pathogenic Potential of a Microbe. MSphere 2,

e00015–e00017.

Day, C.J., Tran, E.N., Semchenko, E.A., Tram, G., Hartley-Tassell, L.E., Ng,

P.S.K., King, R.M., Ulanovsky, R., McAtamney, S., Apicella, M.A., et al.

(2015). Glycan:glycan interactions: High affinity biomolecular interactions

that can mediate binding of pathogenic bacteria to host cells. Proc. Natl.

Acad. Sci. USA 112, E7266–E7275.

Dekkers, G., Treffers, L., Plomp, R., Bentlage, A.E.H., de Boer, M., Koeleman,

C.A.M., Lissenberg-Thunnissen, S.N., Visser, R., Brouwer, M., Mok, J.Y., et al.

(2017). Decoding the Human Immunoglobulin G-Glycan Repertoire Reveals a

Spectrum of Fc-Receptor- and Complement-Mediated-Effector Activities.

Front. Immunol. 8, 877.

Doğan, T., and Karaçalı, B. (2013). Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences. PLoS One 8, e75458.

Dotan, N., Altstock, R.T., Schwarz, M., and Dukler, A. (2006). Anti-glycan antibodies as biomarkers for diagnosis and prognosis. Lupus 15, 442–450.

Geisinger, E., and Isberg, R.R. (2015). Antibiotic modulation of capsular exopolysaccharide and virulence in Acinetobacter baumannii. PLoS Pathog. 11, e1004691.

Gilbreath, J.J., Colvocoresses Dodds, J., Rick, P.D., Soloski, M.J., Merrell, D.S., and Metcalf, E.S. (2012). Enterobacterial common antigen mutants of Salmonella enterica serovar Typhimurium establish a persistent infection and provide protection against subsequent lethal challenge. Infect. Immun. 80, 441–450.

Glorot, X., and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.

Greenfield, L.K., Richards, M.R., Li, J., Wakarchuk, W.W., Lowary, T.L., and Whitfield, C. (2012). Biosynthesis of the polymannose lipopolysaccharide O-antigens from Escherichia coli serotypes O8 and O9a requires a unique combination of single- and multiple-active site mannosyltransferases. J. Biol. Chem. 287, 35078–35091.

Haines-Menges, B.L., Whitaker, W.B., Lubin, J.B., and Boyd, E.F. (2015). Host Sialic Acids: A Delicacy for the Pathogen with Discerning Taste. In Metabolism and Bacterial Pathogenesis, C. Conway, ed. (American Society of Microbiology), pp. 321–342.

Haltiwanger, R.S., and Lowe, J.B. (2004). Role of glycosylation in development. Annu. Rev. Biochem. 73, 491–537.

Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural

Comput. 9, 1735–1780.

Hong, Y., and Reeves, P.R. (2014). Diversity of o-antigen repeat unit structures

can account for the substantial sequence variation of wzx translocases.

J. Bacteriol. 196, 1713–1722.

Howard, J., and Ruder, S. (2018). Universal Language Model Fine-tuning for

Text Classification. arXiv.

Kappler, K., and Hennet, T. (2020). Emergence and significance of carbohydrate-specific antibodies. Genes Immun. 21, 224–239.

Khasbiullina, N.R., Shilova, N.V., Navakouski, M.J., Nokel, A.Yu., Blixt, O.,

Kononov, L.O., Knirel, Y.A., and Bovin, N.V. (2019). The Repertoire of

Human Antiglycan Antibodies and Its Dynamics in the First Year of Life.

Biochemistry (Mosc.) 84, 608–616.

Kiser, K.B., and Lee, J.C. (1998). Staphylococcus aureus cap5O and cap5P genes functionally complement mutations affecting enterobacterial common-antigen biosynthesis in Escherichia coli. J. Bacteriol. 180, 403–406.

Knirel, Y.A. (2011). Structure of O-Antigens. In Bacterial Lipopolysaccharides,

Y.A. Knirel and M.A. Valvano, eds. (Springer Vienna), pp. 41–115.

Lairson, L.L., Henrissat, B., Davies, G.J., and Withers, S.G. (2008).

Glycosyltransferases: structures, functions, and mechanisms. Annu. Rev.

Biochem. 77, 521–555.

Lauc, G., Krištić, J., and Zoldoš, V. (2014). Glycans – the third revolution in evolution. Front. Genet. 5, 145.

Lavine, C.L., Lao, S., Montefiori, D.C., Haynes, B.F., Sodroski, J.G., and Yang, X.; NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI) (2012). High-mannose glycan-dependent epitopes are frequently targeted in broad neutralizing antibody responses during human immunodeficiency virus type 1 infection. J. Virol. 86, 2153–2164.

Lim, J.Y., Yoon, J., and Hovde, C.J. (2010). A brief overview of Escherichia coli

O157:H7 and its plasmid O157. J. Microbiol. Biotechnol. 20, 5–14.

Liu, L., Chopra, P., Li, X., Wolfert, M.A., Tompkins, S.M., and Boons, G.-J. (2020). SARS-CoV-2 spike protein binds heparan sulfate in a length- and sequence-dependent manner. bioRxiv. 2020.05.10.087288. https://doi.org/10.1101/2020.05.10.087288.

Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., and Henrissat,

B. (2014). The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic

Acids Res. 42, D490–D495.

Lundberg, S.M., and Lee, S.-I. (2017). A Unified Approach to Interpreting

Model Predictions. In Advances in Neural Information Processing Systems,

Volume 30, I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S.

Vishwanathan, and R. Garnett, eds. (Curran Associates, Inc), pp. 4765–4774.

McDonald, A.G., Tipton, K.F., and Davey, G.P. (2016). A Knowledge-Based

System for Display and Prediction of O-Glycosylation Network Behaviour in

Response to Enzyme Knockouts. PLoS Comput. Biol. 12, e1004844.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of

Word Representations in Vector Space. arXiv.

Mitchell, A.M., Srikumar, T., and Silhavy, T.J. (2018). Cyclic Enterobacterial

Common Antigen Maintains the Outer Membrane Permeability Barrier of

Escherichia coli in a Manner Controlled by YhdP. mBio 9, e01321-18.

Needleman, S.B., and Wunsch, C.D. (1970). A general method applicable to

the search for similarities in the amino acid sequence of two proteins.

J. Mol. Biol. 48, 443–453.

Park, D., Xu, G., Barboza, M., Shah, I.M., Wong, M., Raybould, H., Mills, D.A.,

and Lebrilla, C.B. (2017). Enterocyte glycosylation is responsive to changes in

extracellular conditions: implications for membrane functions. Glycobiology

27, 847–860.

Paschinger, K., Fabini, G., Schuster, D., Rendić, D., and Wilson, I.B.H. (2005). Definition of immunogenic carbohydrate epitopes. Acta Biochim. Pol. 52, 629–632.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen,

T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An Imperative

Style, High-Performance Deep Learning Library. arXiv.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,

Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn:

Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.

Perez, L., and Wang, J. (2017). The Effectiveness of Data Augmentation in

Image Classification using Deep Learning. arXiv.

Pochechueva, T., Jacob, F., Fedier, A., and Heinzelmann-Schwarz, V. (2012). Tumor-associated glycans and their role in gynecological cancers: accelerating translational research by novel high-throughput approaches. Metabolites 2, 913–939.

Poole, J., Day, C.J., von Itzstein, M., Paton, J.C., and Jennings, M.P. (2018).

Glycointeractions in bacterial pathogenesis. Nat. Rev. Microbiol. 16, 440–452.

Reusch, D., and Tejada, M.L. (2015). Fc glycans of therapeutic antibodies as

critical quality attributes. Glycobiology 25, 1325–1334.

Samraj, A.N., Bertrand, K.A., Luben, R., Khedri, Z., Yu, H., Nguyen, D., Gregg, C.J., Diaz, S.L., Sawyer, S., Chen, X., et al. (2018). Polyclonal human antibodies against glycans bearing red meat-derived non-human sialic acid N-glycolylneuraminic acid are stable, reproducible, complex and vary between individuals: Total antibody levels are associated with colorectal cancer risk. PLoS One 13, e0197464.

Sherstinsky, A. (2020). Fundamentals of Recurrent Neural Network (RNN) and

Long Short-Term Memory (LSTM) network. Phys. Nonlinear Phenom. 404,

132306.

Silipo, A., and Molinaro, A. (2010). The Diversity of the Core Oligosaccharide in

Lipopolysaccharides. In Endotoxins: Structure, Function and Recognition, X.

Wang and P.J. Quinn, eds. (Springer Netherlands), pp. 69–99.

Solá, R.J., and Griebenow, K. (2009). Effects of glycosylation on the stability of

protein pharmaceuticals. J. Pharm. Sci. 98, 1223–1245.

Spahn, P.N., Hansen, A.H., Hansen, H.G., Arnsdorf, J., Kildegaard, H.F., and

Lewis, N.E. (2016). A Markov chain model for N-linked protein glycosylation–

towards a low-parameter tool for model-driven glycoengineering. Metab.

Eng. 33, 52–66.

Strodthoff, N., Wagner, P., Wenzel, M., and Samek, W. (2020). UDSMProt: universal deep sequence models for protein classification. Bioinformatics 36, 2401–2409.

Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., and Liu, C. (2018). A Survey on

Deep Transfer Learning. arXiv.

Tanaka, K., Aoki-Kinoshita, K.F., Kotera, M., Sawaki, H., Tsuchiya, S., Fujita,

N., Shikanai, T., Kato, M., Kawano, S., Yamada, I., and Narimatsu, H. (2014).

WURCS: the Web3 unique representation of carbohydrate structures.

J. Chem. Inf. Model. 54, 1558–1566.

Thompson, A.J., de Vries, R.P., and Paulson, J.C. (2019). Virus recognition of

glycan receptors. Curr. Opin. Virol. 34, 117–129.

Tiemeyer, M., Aoki, K., Paulson, J., Cummings, R.D., York, W.S., Karlsson,

N.G., Lisacek, F., Packer, N.H., Campbell, M.P., Aoki, N.P., et al. (2017).

GlyTouCan: an accessible glycan structure repository. Glycobiology 27,

915–919.


Toukach, P.V., and Egorova, K.S. (2016). Carbohydrate structure database

merged from bacterial, archaeal, plant and fungal parts. Nucleic Acids Res.

44 (D1), D1229–D1236.

Tsuchiya, S., Yamada, I., and Aoki-Kinoshita, K.F. (2019).

GlycanFormatConverter: a conversion tool for translating the complexities of

glycans. Bioinformatics 35, 2434–2440.

Tzianabos, A.O., Wang, J.Y., and Lee, J.C. (2001). Structural rationale for the modulation of abscess formation by Staphylococcus aureus capsular polysaccharides. Proc. Natl. Acad. Sci. USA 98, 9365–9370.

Valeri, J.A., Collins, K.M., Ramesh, P., Alcantar, M.A., Lepe, B.A., Lu, T.K., and Camacho, D.M. (2020). Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. 11, 5058. https://doi.org/10.1038/s41467-020-18676-2.

Varki, A. (2017). Biological roles of glycans. Glycobiology 27, 3–49.


Varki, A., and Gagneux, P. (2015). Biological Functions of Glycans. In

Essentials of Glycobiology, A. Varki, R.D. Cummings, J.D. Esko, P. Stanley,

G.W. Hart, M. Aebi, A.G. Darvill, T. Kinoshita, N.H. Packer, and J.H.

Prestegard, et al., eds. (Cold Spring Harbor Laboratory Press).

Viljanen, M.K., Peltola, T., Junnila, S.Y., Olkkonen, L., Järvinen, H., Kuistila, M., and Huovinen, P. (1990). Outbreak of diarrhoea due to Escherichia coli O111:B4 in schoolchildren and adults: association of Vi antigen-like reactivity. Lancet 336, 831–834.

Weidenmaier, C., and Lee, J.C. (2015). Structure and Function of Surface

Polysaccharides of Staphylococcus aureus. In Staphylococcus Aureus, F.

Bagnoli, R. Rappuoli, and G. Grandi, eds. (Springer International Publishing),

pp. 57–93.

Wu, D., Struwe, W.B., Harvey, D.J., Ferguson, M.A.J., and Robinson, C.V. (2018). N-glycan microheterogeneity regulates interactions of plasma proteins. Proc. Natl. Acad. Sci. USA 115, 8763–8768.


STAR★METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE     SOURCE                    IDENTIFIER

Software and Algorithms

PyTorch                 Paszke et al., 2019       https://github.com/pytorch/pytorch
Scikit-learn            Pedregosa et al., 2011    https://github.com/scikit-learn/scikit-learn
Apex                    N/A                       https://github.com/NVIDIA/apex
Python-alignment        N/A                       https://github.com/eseraygun/python-alignment
SHAP                    Lundberg and Lee, 2017    https://github.com/slundberg/shap
SweetTalk               This paper                https://github.com/midas-wyss/sweettalk
SweetOrigins            This paper                https://github.com/midas-wyss/sweetorigins
SugarBase               This paper                https://webapps.wyss.harvard.edu/sugarbase

RESOURCE AVAILABILITY

Lead Contact

Communication should be directed to the lead contact, James J. Collins ([email protected]).

Materials Availability

This study did not generate new unique reagents.

Data and Code Availability

Data used for all analyses can be found in the supplementary tables. All code and trained models can be found at https://github.com/midas-wyss/sweettalk and https://github.com/midas-wyss/sweetorigins.

METHOD DETAILS

Dataset

To create a comprehensive glycan dataset annotated with species labels, we manually curated 12,674 glycan sequences from three

sources: UniCarbKB (Campbell et al., 2014), the Carbohydrate Structure Database (CSDB) (Toukach and Egorova, 2016), and the

peer-reviewed scientific literature. From UniCarbKB, we compiled all glycans with species information, a length of at least three

monosaccharides to facilitate usage with machine learning models, and a working link to PubChem to retrieve their sequences.

We further complemented and extended this list by gathering glycans deposited in the Carbohydrate Structure Database (CSDB)

up to December 2019 with a length of at least three monosaccharides. For species with more than 15 strains available on CSDB,

only glycans from the first 15 strains were recorded to prevent taxonomic bias. For the model organism E. coli, all available glycan

sequences were recorded to facilitate a strain-based analysis. Labels for E. coli strain pathogenicity were assigned, if possible, via

the peer-reviewed academic literature. Finally, we performed additional literature searches, predominantly adding viral and archaeal

glycans, which are underrepresented in the other databases. We revised and completed the annotations for all species’ taxonomic

characterization (species, genus, family, order, class, phylum, kingdom, domain) based on the NCBI Taxonomy Browser. In total, the

dataset contained sequences from 1,726 different species from a range of 39 taxonomic phyla. To the best of our knowledge, this

database represents the most comprehensive and current resource of glycans and their species information to date (Table S1).

To enable transfer learning by first pre-training a language model, we also added glycan sequences that lacked species information, by extracting the Web3 Unique Representation of Carbohydrate Structures (WURCS) representation (Tanaka et al., 2014) of the set of all glycans with at least three monosaccharides deposited on GlyTouCan (Tiemeyer et al., 2017) that were also available on PubChem (n = 18,926) and the databases mentioned above; this resulted in an augmented database containing 19,299 unique glycan sequences (Table S2). For all glycans, we relied on the quality control of the respective database. All glycans in WURCS representation were reformatted into the IUPAC condensed representation, using the GlycanFormatConverter software (Tsuchiya et al., 2019). For the immunogenicity classifier, all GlycoEpitope (https://www.glycoepitope.jp) entries with a length of at least three monosaccharides were extracted. This list was further complemented by targeted literature searches (Bardor et al., 2003; Bashir et al., 2019; Bovin et al., 2012; Dotan et al., 2006; Hong and Reeves, 2014; Khasbiullina et al., 2019; Knirel, 2011; Paschinger et al., 2005; Pochechueva et al., 2012; Samraj et al., 2018; Silipo and Molinaro, 2010), resulting in the final set of immunogenic glycans (n = 685, Table S2). We included protein-, lipid-, and small molecule-associated glycans as well as capsular and extracellular polysaccharides in our dataset of 19,299 glycans. All these glycans were paired with an ID to allow for our relational database SugarBase, linking all available information (linkage type, species information, human immunogenicity, etc.) to a glycan sequence (Table S2). Additionally, we included representations learned by our language model for all observed glycoletters (monosaccharides or bonds) as well as glycowords (trisaccharides).

Data Processing

Glycan sequences were processed by removing dangling bonds (e.g., ‘(a1-’). Analogous to word stemming in natural language processing, which unifies different inflections of the same word, we removed position-specific information of monosaccharide modifications to reduce vocabulary size. Then, we harmonized capitalization and, in the case of glycan repeat structures, appended the first monosaccharide to their end to capture more sequence context. Additional steps to exclude duplicated glycans included strict ordering of multiple branches with equal lengths by ascending connection to the main branch (e.g., branch ending in ‘a1-2’ before branch ending in ‘b1-4’). For branches closest to the non-reducing end, the longest branch was defined as the main chain. Observed monosaccharide modifications necessitated a hierarchy of order (in case of multiple modifications on the same monosaccharide) to avoid duplicates or mislabeling: NAc > OAc > NGc > OGc > NS > OS > NP > OP > NAm > OAm > NBut > OBut > NProp > OProp > NMe > OMe > CMe > NFo > OFo > OPPEtn > OPEtn > OEtn > A > N > SH > OPCho > OPyr > OVac > OPam > OEtg > OFer > OSin > OAep > OCoum > ODco > OLau > OSte > OOle > OBz > OCin > OAch > OMal > OMar > OOrn > rest.

Data processing for model training included featurization of glycan sequences into glycoletters (e.g., ‘Gal’), as well as glycowords

(three monosaccharides connected by two bonds). The conversion of a glycan sequence into glycowords, from the non-reducing to

the reducing end, resulted in a list of partially overlapping glycowords, with maximum overlap so that two subsequent glycowords

only differed in one monosaccharide and one bond. The aim of these glycowords is to capture representative characteristics and

local structural contexts of a given glycan. The dataset comprising all glycowords (n = 113,112) was then used to train a context-specific, glycoletter-based language model. For scrambled glycan sequences, the order of glycoletters in any given glycan was randomly

shuffled to maintain composition but erase patterns. All abbreviations for glycan nomenclature in this work can be found in Table S7.
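
The featurization described above can be sketched as follows. This is a minimal illustration, not the SweetTalk implementation: it assumes a linear (unbranched) glycan written with bonds in parentheses, such as `Gal(b1-4)GlcNAc`, and the function names are hypothetical.

```python
import random
import re
from typing import List, Optional


def to_glycoletters(glycan: str) -> List[str]:
    """Split a linear glycan string into alternating monosaccharides and bonds."""
    return [token for token in re.split(r"[()]", glycan) if token]


def to_glycowords(glycoletters: List[str]) -> List[List[str]]:
    """Slide a window of three monosaccharides and two bonds (five glycoletters)
    along the sequence; each step advances by one monosaccharide and one bond."""
    return [glycoletters[i:i + 5] for i in range(0, len(glycoletters) - 4, 2)]


def scramble(glycoletters: List[str], seed: Optional[int] = None) -> List[str]:
    """Shuffle glycoletters: composition is kept, sequential patterns are erased."""
    shuffled = list(glycoletters)
    random.Random(seed).shuffle(shuffled)
    return shuffled


letters = to_glycoletters("Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man")
words = to_glycowords(letters)
# words[0] and words[1] share 'GlcNAc', 'b1-2', 'Man' and differ in one
# monosaccharide and one bond, matching the maximum overlap described above.
```

Branched glycans in bracket notation would need an additional parsing step that is not attempted here.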

Analyzing Links in Glycan Sequences

To determine typical local structural contexts of monosaccharides and bonds, we quantified the frequency of a given monosaccharide co-occurring with any other monosaccharide in our extensive database of unique glycans. We also compared the relative frequencies of a particular monosaccharide being observed in the glycan main branch versus a side branch in our database.
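
One way to quantify such co-occurrence is to count, for every monosaccharide, how often each other monosaccharide is directly linked to it. The sketch below makes that assumption explicit: it treats co-occurrence as direct linkage in linear glycans, which is only one possible reading of the procedure, and `neighbor_frequencies` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def neighbor_frequencies(glycans: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Relative frequency of each directly linked neighbor, per monosaccharide.

    Each glycan is a list of glycoletters that alternates
    monosaccharide, bond, monosaccharide, and so on.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for letters in glycans:
        sugars = letters[0::2]  # every second glycoletter is a monosaccharide
        for left, right in zip(sugars, sugars[1:]):
            counts[left][right] += 1
            counts[right][left] += 1
    return {
        sugar: {nb: n / sum(c.values()) for nb, n in c.items()}
        for sugar, c in counts.items()
    }


example = neighbor_frequencies([
    ["Gal", "b1-4", "GlcNAc", "b1-2", "Man"],
    ["Gal", "b1-3", "GlcNAc"],
])
```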

Glycan In Silico Modification

We performed in silico modification of glycans by replacing monosaccharides and/or bonds with other observed monosaccharides/bonds. We used exhaustive modification, replacing glycoletters with all possible glycoletters, while only retaining modified glycans comprising previously observed glycowords. This ensured physiological relevance, given the extreme sparsity of observed glycan sequences compared to the theoretical number of possibilities.
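
The filtering step can be illustrated with a short sketch. It assumes linear glycans given as lists of glycoletters and single substitutions only; the function name and the toy vocabulary are hypothetical. Substituting a bond for a monosaccharide (or vice versa) is not excluded explicitly, because such a variant cannot consist of observed glycowords and is therefore discarded by the filter.

```python
from typing import Iterable, List, Set, Tuple

Glycoword = Tuple[str, ...]


def in_silico_variants(
    glycan: List[str],
    vocabulary: Iterable[str],
    observed_glycowords: Set[Glycoword],
) -> List[List[str]]:
    """Return all single-glycoletter substitutions of `glycan` whose glycowords
    (three monosaccharides plus two bonds) have all been observed before."""
    variants = []
    for position, original in enumerate(glycan):
        for candidate in sorted(vocabulary):
            if candidate == original:
                continue
            modified = glycan[:position] + [candidate] + glycan[position + 1:]
            words = [tuple(modified[i:i + 5]) for i in range(0, len(modified) - 4, 2)]
            if words and all(word in observed_glycowords for word in words):
                variants.append(modified)
    return variants


# Toy data, invented for illustration only.
toy_observed = {
    ("Gal", "b1-3", "GlcNAc", "b1-2", "Man"),
    ("Man", "b1-4", "GlcNAc", "b1-2", "Man"),
}
toy_vocabulary = {"Gal", "Man", "GlcNAc", "b1-4", "b1-2", "b1-3"}
toy_variants = in_silico_variants(
    ["Gal", "b1-4", "GlcNAc", "b1-2", "Man"], toy_vocabulary, toy_observed
)
```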

Glycan Alignment

Global sequence alignment of glycans was implemented according to the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970) by adapting the Python Alignment library (https://github.com/eseraygun/python-alignment). For our GLYcan SUbstitution Matrix (GLYSUM; Table S6), the exhaustive list of in silico modifications resulting in glycans with observed glycowords was generated (n = 1,238,879). All thereby observed monosaccharide and/or bond substitutions were recorded in a symmetric matrix and converted into substitution frequencies by dividing them by the total number of retained modifications. The substitution score $S_{ij}$ for each possible substitution was then calculated with the following formula:

$$S_{ij} = \lambda \log\left(\frac{p_{ij}}{q_i \, q_j}\right)$$

The substitution frequency is hereby denoted as $p_{ij}$, while $q_i$ and $q_j$ describe the observed base frequencies of the respective glycoletters. Additionally, we used $\lambda$ as a scaling factor (a value of four in this work) to arrive at suitable integer values by rounding all values up or down. Substitutions never observed during this procedure received a final value of −5, lower than any of the observed substitution scores, while the diagonal values of the substitution matrix were set at 5, higher than any of the observed substitution scores. The penalty for gaps for alignments in this work was set at −5, to match the minimal substitution score.
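
The scoring scheme above can be sketched in a few lines of standard-library Python. This is an illustration rather than the adapted Python Alignment code: the natural logarithm is an assumption (the base is not stated here), and the substitution values in the usage example are invented, not taken from GLYSUM.

```python
import math
from typing import Dict, List, Tuple


def substitution_score(p_ij: float, q_i: float, q_j: float, lam: float = 4.0) -> int:
    """S_ij = lam * log(p_ij / (q_i * q_j)), rounded to the nearest integer.
    The natural logarithm is assumed."""
    return round(lam * math.log(p_ij / (q_i * q_j)))


def global_alignment_score(
    a: List[str],
    b: List[str],
    scores: Dict[Tuple[str, str], int],
    match: int = 5,
    unobserved: int = -5,
    gap: int = -5,
) -> int:
    """Needleman-Wunsch global alignment score over glycoletter sequences."""

    def score(x: str, y: str) -> int:
        if x == y:
            return match  # diagonal of the substitution matrix
        return scores.get((x, y), scores.get((y, x), unobserved))

    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two letters
                table[i - 1][j] + gap,                            # gap in b
                table[i][j - 1] + gap,                            # gap in a
            )
    return table[-1][-1]


query = ["ManNAcA", "b1-4", "FucNAcOAc", "a1-3", "D-FucNAc", "b1-4", "ManNAcA"]
target = ["ManNAcA", "b1-4", "GlcNAc", "a1-3", "D-FucNAc", "a1-4", "ManNAcA"]
invented_scores = {("FucNAcOAc", "GlcNAc"): 1, ("b1-4", "a1-4"): 0}  # hypothetical
self_score = global_alignment_score(query, query, {})
cross_score = global_alignment_score(query, target, invented_scores)
```

With a diagonal value of 5, aligning a sequence of n glycoletters to itself gives a score of 5n, so the score of a perfect match scales with glycan length.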

Model Training

All models were trained on an NVIDIA Tesla K80 GPU using PyTorch (Paszke et al., 2019). For all models, architecture and hyperparameters were optimized by minimizing the respective loss function. For the language models, we used mixed precision training utilizing the Apex library (https://github.com/nvidia/apex). For language models and classifiers, we randomly split the respective dataset into 80% for training and 20% for validation. A modified stratified shuffle split was used to randomly split glycans into training and validation sets for the species classifier so that, for every class, 80% of the glycans were present in the training set and 20% in the validation set. Further, only classes comprising at least five glycans were used for training and testing the SweetOrigins models. We employed data augmentation by forming a generalizable subset of all possible isomorphic glycans if a glycan sequence had isomorphic glycans. Specifically, we swapped the order of double branches and exhaustively exchanged the main branch with the side branches closest to the non-reducing end in the bracket notation (Figure 3B). The resulting sequence in the bracket notation still described the same glycan in a slightly different way, increasing model robustness during training. Glycans were converted into lists of glycowords describing the glycans, brought to equal lengths using a padding token facilitating model training, and used in batches of 32 glycans for training and testing.
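The per-class 80/20 split and the padding step can be sketched as follows. This is a hedged, standard-library illustration rather than the authors' "modified stratified shuffle split": the function names, the `<pad>` token string, and the rounding rule for the cut point are assumptions. The 80% training fraction and the minimum class size of five come from the text.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, train_frac=0.8, min_class_size=5, seed=0):
    """Split each class separately so every class keeps roughly 80/20."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    train, val = [], []
    for label, members in by_class.items():
        if len(members) < min_class_size:
            continue  # classes with fewer than five glycans are dropped
        members = members[:]
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        # Keep at least one example on each side of the split.
        cut = min(max(cut, 1), len(members) - 1)
        train += [(m, label) for m in members[:cut]]
        val += [(m, label) for m in members[cut:]]
    return train, val


def pad_batch(sequences, pad_token="<pad>"):
    """Right-pad lists of glycowords to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_token] * (longest - len(seq)) for seq in sequences]
```

Splitting within each class, rather than across the whole dataset, guarantees that rare classes appear in both the training and validation sets.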

SweetTalk and the SweetOrigins models for each taxonomic level consisted of a three-layered, bidirectional recurrent neural network using long short-term memory (LSTM) units (Hochreiter and Schmidhuber, 1997) with 128 nodes per layer, including an embedding layer for the glycowords. The concatenated hidden representation learned by the bidirectional LSTMs was then projected to a fully connected layer at the end for the final prediction. The language model SweetTalk was trained by predicting the next glycoletters, given preceding glycoletters, in the context of glycowords, thereby learning the local structural context of glycoletters. The embedding layer for classifiers was derived by first training a glycoletter-based language model and then extracting the learned glycoletters embedding and calculating initial glycoword embeddings for SweetOrigins. The last, fully connected layer in all models was initialized by Xavier initialization (Glorot and Bengio, 2010) and the number of nodes was determined by the number of classes for each classifier. We used a cross-entropy loss function and the ADAM optimizer with a starting learning rate of 0.0001 (decaying it with a cosine function over 100 epochs during training) and a weight decay of 0.005. Additionally, we employed an early stopping criterion after 10 epochs without improvement in validation loss for regularization.
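The learning-rate decay and early stopping rule can be sketched in plain Python. The exact cosine schedule is not given in the text, so this assumes the common half-cosine annealing from the starting rate down to a floor of zero. The names `cosine_lr` and `EarlyStopping` are hypothetical and are not part of PyTorch. The starting rate of 0.0001, the 100 epochs, and the patience of 10 epochs come from the description.

```python
import math


def cosine_lr(epoch, base_lr=1e-4, total_epochs=100, min_lr=0.0):
    """Half-cosine decay from base_lr at epoch 0 to min_lr at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation loss."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

In a training loop, `cosine_lr(epoch)` would be applied to the optimizer at the start of each epoch and `EarlyStopping.step` called once per epoch with the validation loss.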

The model for predicting E. coli strain pathogenicity followed the same architecture except for using 150 nodes per layer, a binary cross-entropy loss function, and a learning rate of 0.0005. Machine learning models used for comparison comprised random forest classifiers and support vector machines for classification. These models were implemented with scikit-learn (Pedregosa et al., 2011). Feature importances were extracted using SHAP (SHapley Additive exPlanations) values (Lundberg and Lee, 2017). Hyperparameters for all methods were optimized by maximization of accuracy via 5-fold cross-validation.
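Selecting a hyperparameter by maximizing mean accuracy over five folds can be sketched with the standard library alone. This is an illustration of the general procedure, not the scikit-learn implementation the authors used; all function names and the toy threshold classifier are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Return k (train_indices, validation_indices) pairs covering range(n)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for held_out, val_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        splits.append((train_idx, val_idx))
    return splits


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean validation accuracy; `fit` takes training data and returns a predictor."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k=5):
    """Pick the hyperparameter whose model has the highest mean accuracy."""
    return max(candidates, key=lambda name: cross_val_accuracy(candidates[name], X, y, k))


def threshold_model(threshold):
    """Toy classifier with one hyperparameter: predict 1 if x > threshold."""
    def fit(X_train, y_train):
        return lambda x: int(x > threshold)
    return fit


X = list(range(20))
y = [int(x > 9) for x in X]
candidates = {t: threshold_model(t) for t in (0.5, 9.5, 15.5)}
best_threshold = select_best(candidates, X, y)
```

Each sample is held out exactly once, so the mean accuracy reflects performance on data the model did not see during fitting.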

QUANTIFICATION AND STATISTICAL ANALYSIS

This study did not use statistical analysis. All experimental details can be found in the STAR Methods section.

Cell Host & Microbe 29, 132–144.e1–e3, January 13, 2021 e3

  • Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions
    • Introduction
    • Results
      • Curating Glycan Datasets for Glycobiology and Glycan-Mediated Host-Microbe Interactions
      • Using Natural Language Processing to Learn the Grammar of Glycans
      • Predicting Glycan Immunogenicity with a Glycan-Based Language Model
      • Using Deep Learning to Provide Evolution-Informed Glycan Representations
      • Using Glycan Alignments to Study Virulence Determinants in Bacterial Pathogens
    • Discussion
    • Supplemental Information
    • Acknowledgments
    • Author Contributions
    • Declaration of Interests
    • References
    • STAR★Methods
      • Key Resources Table
      • Resource Availability
        • Lead Contact
        • Materials Availability
        • Data and Code Availability
      • Method Details
        • Dataset
        • Data Processing
        • Analyzing Links in Glycan Sequences
        • Glycan In Silico Modification
        • Glycan Alignment
        • Model Training
      • Quantification and Statistical Analysis


Running Head: CRITIQUE OF OCEAN TEMPERATURES IN CORAL REEFS

CRITIQUE OF OCEAN TEMPERATURES IN CORAL REEFS

Madison McNeill

Introduction

Coral reefs are among the most diverse marine ecosystems in the world. They provide a home to thousands of species of plants and animals. In the last few decades, rising atmospheric carbon dioxide has driven both global warming, which raises the surface temperature of the ocean, and ocean acidification. Warmer water can lead to the bleaching of coral reefs as well as the death of coral reef fishes due to their inability to acclimate to the elevated temperature. These three papers were chosen because they illustrate the environmental impact higher temperatures have on coral reefs and the organisms that live within them.

· Dias, M., Ferreira, A., Gouveia, R., Cereja, R., & Vinagre, C. (2018). Mortality, growth and regeneration following fragmentation of reef-forming corals under thermal stress. Journal of Sea Research, 141, 71-82. doi: 10.1016/j.seares.2018.08.008

· De'ath, G., Lough, J., & Fabricius, K. (2009). Declining coral calcification on the Great Barrier Reef. Science, 323(5910), 116-119. doi: 10.1126/science.1165283

· Nilsson, G., Östlund-Nilsson, S., & Munday, P. (2010). Effects of elevated temperature on coral reef fishes: Loss of hypoxia tolerance and inability to acclimate. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 156(4), 389-393. doi: 10.1016/j.cbpa.2010.03.009

Dias et al. (2018) evaluated how elevated sea surface temperatures affected growth, mortality, and regeneration following the fragmentation of nine coral species in the Indo-Pacific, while De'ath et al. (2009) suggested that the ability of corals in the Great Barrier Reef to deposit calcium carbonate may have been depleted by a decrease in the saturation state of aragonite and rising temperature stress in this region. The third paper evaluated, Nilsson et al. (2010), examined whether elevated temperature decreased tolerance of low-oxygen conditions in two species of coral reef fishes. This experiment used adult fishes of two species and tested their ability to acclimate to higher temperatures, which differed from the other two studies in that Dias and De'ath did not study the fishes in the ecosystems, only the coral there. Dias found that whether or not a coral had a previous injury did not impact the mortality, partial mortality, or rate of growth of each fragment. However, the species of coral and the ocean temperature had significant impacts on the results for each fragment. Although the cause of the decline in calcification of Great Barrier Reef corals was not determined by De'ath’s study, the authors did find that it was largely related to increasing ocean temperatures, which caused more thermal stress in coral populations. This differed from the Nilsson paper, which showed that certain species of coral reef fishes were unable to adjust to the higher ocean temperatures brought on by global warming.

Analysis

Introduction

When the three articles’ introductions were evaluated, both similarities and differences stood out. For example, the titles of the articles varied in appropriateness. Nilsson’s title told its audience what the researchers examined, but it seemed long and bulky. In my opinion, it could have been shortened or rephrased to grab the audience’s attention more quickly, even with a change as simple as “Effects of elevated temperature on coral reef fishes.” However, Dias’s title was accurate and concise. “Mortality, growth and regeneration following fragmentation” accurately explained what was being examined in this study. De'ath’s title matched the contents of the paper as well.

The abstract’s statement of purpose of all three articles matched the introductions. For the Dias article, they stressed that the impacts of thermal stress on fragments of regenerating coral species needed to quickly be explored, while De'ath’s abstract was well written, telling readers how many coral colonies were studied and what the results showed. The abstract of Nilsson’s paper plainly stated what occurred within the first two sentences. The abstract’s statement of purpose for this article was to display how two species of coral reef fishes in the Great Barrier Reef are failing to acclimate to higher sea surface temperatures. This was plainly stated in both the abstract and introduction of the article.

The hypotheses of the three articles varied greatly. Dias stated that the change in the global climate has led to rising sea surface temperatures and ocean acidification, which jeopardized coral reef survival. With this sentence, Dias made it clear why his study efforts were so urgent. Nilsson followed a similar pattern when he clearly stated his concerns for the inability of coral reef fishes to acclimate to rising water temperatures. De'ath’s hypothesis was stated in the abstract, which said that his study suggested that the increasing thermal stress may be depleting the ability of Great Barrier Reef corals to deposit calcium carbonate. Thus, the hypotheses of all three articles were given. I also found that Dias, De'ath, and Nilsson all had a nice way of arranging their data, which allowed the information to build to what the experimental design included and what the researchers were hoping to accomplish from this experiment.

Methods

The sample selection among the three articles showed great contrast. The Dias paper used nine reef-forming coral species, while De'ath’s study examined 328 colonies of coral from the same genus, Porites, which is a stony coral. Nilsson studied adults of two species of coral reef fishes. For Dias’s paper, the methods were easy to follow and seemed easy to repeat, while De'ath’s methods were harder to follow because not all of the details appeared to be listed. The Methods section of the Nilsson article was valid and delivered with enough detail that another group could perform most of this study again. I say only most of the experiment because, although the article listed when and where the experiment was conducted, the number of adult fish of each species caught and analyzed was not given in the Methods section of the paper. This information is crucial, because a small sample size could undermine the data, while a large sample size would support the conclusions more strongly. Furthermore, if one species of coral reef fish had a much larger or smaller sample than the other, the two species would not be equally well represented in the results of this study. The number of samples was given in both of the other articles.

While some articles had strong Methods sections, others were missing key components. The experimental designs of the three articles all seemed valid. De'ath’s design seemed valid for the study being conducted, though I am unsure that this study could be repeated using the paper alone. The experimental design did make sense overall, in that Porites is commonly chosen for sclerochronological analyses because these corals are widely distributed and have annual density bands. Porites coral can also grow for hundreds of years, so choosing this genus for a large analysis made sense. The three growth parameters De'ath mentioned (skeletal density, calcification rate, and annual extension rate) are good parameters to examine in a genus of coral that has such a long life span. Dias justified every step of his experimental design, making it easy to repeat the process. For instance, Dias used corals with contrasting morphologies because they have different susceptibilities to thermal stress, which lent the overall results more credibility. These corals were held in captivity for several years, giving the researchers knowledge of the corals’ thermal history. Twenty fragments were cut from each of the nine species, half of which were used as a control. Sources of variation were eliminated in this process by cutting only one coral from each colony. These methods appear valid, and each is given a reason as to why a scientist would conduct the experiment in this way, making the overall flow of the methods logical and easy to follow. This was similar to De'ath’s paper in that De'ath listed the parameters used to test the samples, and he mentioned that because Porites has such a long lifespan, these corals have been shown to record environmental changes within their skeletons. This statement justified why De'ath chose this coral and explained why these particular parameters were chosen. However, he did not specify how to conduct these analyses.
Also, although the data were collected within a two-month period in both Dias’s and Nilsson’s experiments, De'ath’s data were a composite collection from the years 1900-2006, containing over sixteen thousand annual records from corals ranging from ten to 436 years old. Hence, the broad range of years over which the specimens were collected was overwhelming, not to mention the three growth parameters the paper mentioned but again failed to explain. Lastly, I found that Nilsson’s experimental design was carried out well, using adults of the two species of coral reef fishes and varying temperatures appropriate to the hypothesis. However, not including the number of each species caught weakens the data to a certain degree. Overall, I found that the Methods section of Nilsson’s paper was logical, but it omitted details that were pertinent to this experiment, whereas Dias included all pertinent information and De'ath failed to explain how the chosen parameters were measured.

Results

Since the focus of each study varied, the results were also quite different in composition. Dias et al. (2018) found that injury, whether present or absent, had no impact on the death or growth rate of the coral fragments studied. The researchers determined that the true factors that impacted death and growth rate of the corals analyzed were temperature and the coral species itself. These results were illustrated using tables and figures. Table 1 was difficult to follow, because some of the columns were abbreviated using terms not explained in the article. However, the numbers in the table agreed with the text, showing that injury did not impact the growth or mortality rate of the coral fragments used in this experiment. The results in De'ath’s paper were easy to follow, but showed that the cause of the decline in coral populations in the Great Barrier Reef (GBR) was still not known. Within the Results sections of the articles written by Nilsson and De'ath, I saw that the figures and tables matched the text without repeating the same information to the audience. The figures and tables were consistent with what the text had stated, showing P values that were statistically significant, and the data were very easy to understand. The table in Dias’s article could have been better presented if it had included a key denoting what each abbreviated header meant. I did not find any discrepancies between the figures and text of the three articles as far as percentages were concerned.

The results of the three studies did test the researchers’ hypotheses. For Dias’s paper, these results were shown in Figure 1, which illustrated that as temperature increased, the mortality rate of coral species also increased. As Dias mentioned in the Abstract, two coral species survived this experiment, Turbinaria reniformis and Galaxea fascicularis. Likewise, the results of the Nilsson paper tested that study’s hypothesis, which was that after a given number of days at varying temperatures, the coral reef fishes studied would fail to acclimate to those temperature changes. In the third paper, De'ath’s results also tested the hypothesis and drew on 328 colonies of massive corals from 69 reefs, which made the results broader.

Discussion

I found that none of the three articles repeated the same information in both the figures and the text. From the Discussion section of the Dias paper, a reader could tell the main point of the article, which was that coral species vary in their susceptibility to thermal stress. These coral species had the lowest mortality, partial mortality, and levels of bleaching at 26 degrees Celsius, and their growth rate peaked at this temperature. Dias found that the regeneration rate of corals generally increased as the temperature increased. These results also show that the bleaching resistance capacity of most of the corals analyzed was overcome at 32 degrees Celsius. Because this paper is so new (published in 2018), I could not find other research supporting its interpretations. However, the article does describe the direction in which the research is headed and lists other studies similar to this one.

The findings of De'ath’s and Dias’s articles supported each other and were supported by many other articles on coral reef ecology. However, I found very few articles that supported the conclusions drawn by Nilsson et al. (2010) about the effects increasing temperatures had on two species of coral reef fishes, which was a weakness of the paper. In all three journal articles, I found that the interpretation of data was logical.

Conclusion

Summary

The three articles, overall, had both strengths and weaknesses. For instance, all three papers were peer reviewed. Nilsson’s paper had few other articles that backed up its findings, while the Dias and De'ath papers backed each other up. The article by Dias et al. (2018) was very recent, which made it one of the newest published papers in its field, while the De'ath and Nilsson papers were nearly a decade older. Although the Dias paper had tables and figures that were not entirely straightforward, the content of the article itself was very easy to follow. Each section within the article was set apart, whereas in the article by De'ath, the sections (introduction, methods, etc.) were not separated from each other. The Dias and De'ath papers had appropriate titles, while Nilsson’s title was too long. The abstracts and introductions matched for all three articles. The pace of the Dias article was great, leading the audience straight into the hypothesis and objectives for the analysis. Overall, Dias and the other researchers took many steps to ensure accurate results, and the Methods section of this article was explained well enough to be repeated, which differed from Nilsson’s paper in that Nilsson did not include his sample size. When it came to reproducing the experiment, Dias included all pertinent information, but De'ath failed to explain how the chosen parameters were measured.

De'ath had a concise paper with text that accurately related to the figures mentioned. Overall, the article was concise in its findings, but not as easy to follow as it could have been if proper sections and subsections had been used. The title seemed appropriate, and readers know from the statement of purpose and the introduction that the primary goal of this study was to help determine what is causing the decline in corals’ ability to lay down the calcium carbonate skeletons that build coral reef ecosystems. The De'ath paper used a large sample, which made the results seem more inclusive than those from Dias’s sample of nine coral species with twenty fragments each. Nilsson’s article had excellent figures that were easy to interpret, in contrast to Dias’s paper, which did not explain what some of the abbreviations in its tables meant. The strengths and weaknesses of the three papers varied greatly.

Significance

When I evaluated the role these articles play in the world, I found striking similarities. Nilsson’s article focused on two species of fish that live in the Great Barrier Reef, while De'ath and Dias wrote papers on the reactions of different coral species to increasing surface temperatures. De'ath’s article has practical significance similar to the Dias paper, in that major ecosystems are dying as a result of rising ocean surface temperatures, and these researchers tried to find ways to explain these issues. Nilsson’s article examined whether increasing temperatures reduced the hypoxia tolerance of coral reef fishes. It has been cited sixty-nine times, in papers on the hypoxia tolerance of coral reef fishes, on how temperature and hypoxia affect the respiratory performance of certain tropical fishes, and in many other similar studies (Nilsson et al., 2010). De'ath et al. (2009) has been cited twenty-four times and has sparked interest in similar research in the Great Barrier Reef in the last decade. The Dias et al. paper was only published in 2018, so not many other researchers have cited it yet. This can be seen as a potential problem; however, the article was peer-reviewed by individuals who are well-educated in this particular field. The currency of the Dias article may also be seen as a good attribute, showing that this information was some of the newest in its field of interest. The research in all three articles has significance to today’s society, in that the bleaching of coral reefs has become a growing problem, and without more research to determine what factors are causing this issue, large hypoxic zones in aquatic ecosystems may result.

Overall, all three of these articles illustrated environmental significance. Human survival depends on the biodiversity of plants and animals, and many animals live in these coral reef ecosystems.

Works Cited

De'ath, G., Lough, J., & Fabricius, K. (2009). Declining coral calcification on the Great Barrier Reef. Science, 323(5910), 116-119. doi: 10.1126/science.1165283

Dias, M., Ferreira, A., Gouveia, R., Cereja, R., & Vinagre, C. (2018). Mortality, growth and regeneration following fragmentation of reef-forming corals under thermal stress. Journal of Sea Research, 141, 71-82. doi: 10.1016/j.seares.2018.08.008

Nilsson, G., Östlund-Nilsson, S., & Munday, P. (2010). Effects of elevated temperature on coral reef fishes: Loss of hypoxia tolerance and inability to acclimate. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 156(4), 389-393. doi: 10.1016/j.cbpa.2010.03.009
