Caister Academic Press

Microbial Phylogenetics

A review of current methods, computer programs and applications in microbial phylogenetics, molecular phylogeny and molecular taxonomy.

Phylogenetics of Microorganisms

from Molecular Phylogeny of Microorganisms by Aharon Oren and R. Thane Papke (2010)

A proper understanding of the diversity, systematics and nomenclature of microbes is increasingly important in many branches of biological science. The molecular approach to phylogenetic analysis, pioneered by Carl Woese in the 1970s and leading to the three-domain model (Archaea, Bacteria, Eucarya), has revolutionized our thinking about evolution in the microbial world. Technological innovation in modern molecular biology and rapid advances in computational science have led to a flood of nucleic acid sequence information, bioinformatic tools and phylogenetic inference methods. Phylogenetic analysis has long played a central role in microbiology, and the emerging fields of comparative genomics and phylogenomics require substantial knowledge and understanding of phylogenetic analysis and computational methods.

A Historical Overview

When information about the diversity within the bacterial world began to accumulate at the end of the 19th century, scientists started to include the bacteria in phylogenetic schemes to explain how life on Earth may have developed. Some of the early phylogenetic trees of the prokaryote world were morphology-based; others were based on the then-current ideas about the presumed conditions on our planet at the time that life first developed. Around 1950 many leading microbiologists had become pessimistic about the possibility of ever reconstructing bacterial phylogeny. The concept of the prokaryote-eukaryote dichotomy did little to clarify phylogenetic relationships. The developing technology of nucleic acid sequencing, together with the recognition that sequences of building blocks in informational macromolecules (nucleic acids, proteins) can be used as "molecular clocks" that contain historical information, led to the development of the three-domain model (Archaea - Bacteria - Eucarya) in the late 1970s, primarily based on small subunit ribosomal RNA sequence comparisons. The information currently accumulating from complete genome sequences of an ever increasing number of prokaryotes is now leading to further modifications of our views on microbial phylogeny.

Methods and Programs

The purpose of phylogenetic analysis is to understand the past evolutionary path of organisms. Even though we will never know for certain the true phylogeny of any organism, phylogenetic analysis provides the best available estimates, and thereby a framework for various disciplines in microbiology. Owing to technological innovation in modern molecular biology and rapid advances in computational science, accurate inference of the phylogeny of a gene or organism seems possible in the near future. Public domain databases, the literature and the World Wide Web now hold a flood of nucleic acid sequence information, bioinformatic tools and phylogenetic inference methods. Phylogenetic analysis has long played a central role in basic microbiology, for example in taxonomy and ecology. In addition, more recently emerging fields of microbiology, including comparative genomics and phylogenomics, require substantial knowledge and understanding of phylogenetic analysis and the computational skills to handle the large-scale data involved. Methods of phylogenetic analysis and the relevant computer software tools make the task more accurate, efficient and accessible.
There are four general steps in the phylogenetic analysis of molecular sequences: (i) selection of a suitable molecule or molecules (phylogenetic marker), (ii) acquisition of molecular sequences, (iii) multiple sequence alignment (MSA) and (iv) phylogenetic treeing and evaluation. The first step is to choose a suitable homologous part of the genomes to be compared. Mechanisms of molecular evolution include mutations, duplication of genes, reorganization of genomes, and genetic exchanges such as recombination, reassortment and lateral gene transfer. Although all of this information can be used to infer phylogenetic relationships of genes or organisms, information on mutations, including substitutions, insertions and deletions, is most frequently used in phylogeny reconstruction. The aim is to infer a correct organismal phylogeny using orthologous genetic loci, that is, loci in which the common ancestry of two sequences can be traced back to a speciation event. A phylogeny built from homologous genetic loci that are derived by gene duplication (paralogy) or related through lateral gene transfer (xenology) cannot reflect the evolutionary history of the organisms.
Once DNA sequence data are generated, they are subjected to multiple sequence alignment. This involves finding homologous sites, that is, positions in the molecules under study that are derived from a common ancestral position. Sequences can be aligned with one another by introducing "alignment gaps" (known in brief as "gaps"). In general, multiple sequence alignment starts by aligning a pair of sequences (pairwise alignment), and is then expanded to multiple sequences using various algorithms.
Many algorithms and computer programs have been developed in the last few decades for multiple sequence alignment, but the original Clustal series of programs is still the most widely used and produces reasonably good quality MSAs for small data sets. For a large dataset, such as massive pyrosequencing reads, the MUSCLE program offers a good compromise between accuracy and speed. The MAFFT program utilizes several different algorithmic approaches and can be used for either small or very large datasets. Other computer programs have also been developed for general multiple sequence alignment, but the above three have been the most popular and are routinely used in publications in various microbiological disciplines.
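The pairwise alignment step described above can be illustrated with a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch algorithm). This is a teaching example only, not the method used by Clustal, MUSCLE or MAFFT; the function name and the match, mismatch and gap scores are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not recommended parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = needleman_wunsch("GATTACA", "GATACA")
    print(top)
    print(bottom)
    print("score:", total)
```

When several alignments share the best score, the traceback returns only one of them, so the exact position of a gap can depend on the order in which the three moves are tested.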

Multilocus Sequence Analysis

Multilocus sequence analysis (MLSA) represents the new standard in microbial molecular systematics. In this context, MLSA is implemented in a relatively straightforward way, consisting essentially of the concatenation of several sequence partitions for the same set of organisms, resulting in a "supermatrix" which is used to infer a phylogeny by means of distance-matrix or optimality criterion-based methods. This approach is expected to have an increased resolving power due to the large number of characters analyzed, and a lower sensitivity to the impact of conflicting signals (i.e. phylogenetic incongruence) that result from possible horizontal gene transfer events. The strategies used to deal with multiple partitions can be grouped into three broad categories: the total evidence, separate analysis and combination approaches. The concatenation approach that dominates MLSAs in the microbial molecular systematics literature is known to systematists working with plants and animals as the "total molecular evidence" approach, and has been used to solve difficult phylogenetic questions such as the relationships among the major groups of cetaceans, that of microsporidia and fungi, or the phylogeny of major plant lineages. The total molecular evidence approach has been criticized because, by directly concatenating all available sequence alignments, the evidence of conflicting phylogenetic signals in the different data partitions is lost, along with the possibility of uncovering the evolutionary processes that gave rise to such contradictory signals. The nature of these conflicts is varied, but in the microbial world the strongest conflicting signals often derive from horizontal gene transfer events in the dataset.
If the individuals containing xenologous loci are not identified and removed from the supermatrix prior to phylogeny inference, the resulting hypothesis may be strongly distorted, since standard treeing methods assume a single underlying evolutionary history. Based on these arguments, the conditional data combination strategy is generally to be preferred in bacterial MLSA.
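The concatenation step that produces the supermatrix can be sketched as follows. This is a minimal illustration under stated assumptions: each partition is a dictionary mapping a taxon name to its already aligned sequence, taxa missing from a locus are padded with "?", and the uncorrected p-distance stands in for the distance-matrix methods mentioned above. The function names and the toy sequences are hypothetical.

```python
def concatenate_alignments(partitions):
    """Join per-locus alignments into one supermatrix.

    partitions: list of dicts {taxon: aligned_sequence}; within a
    partition all sequences must have the same length.
    Returns a dict {taxon: concatenated_sequence}.
    """
    taxa = sorted(set().union(*partitions))
    chunks = {taxon: [] for taxon in taxa}
    for part in partitions:
        lengths = {len(seq) for seq in part.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in a partition must be equally long")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon lacking this locus is padded with '?' (missing data).
            chunks[taxon].append(part.get(taxon, "?" * width))
    return {taxon: "".join(parts) for taxon, parts in chunks.items()}


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping gaps and missing data."""
    compared = 0
    differences = 0
    for x, y in zip(seq1, seq2):
        if x in "-?" or y in "-?":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else float("nan")


if __name__ == "__main__":
    # Toy data: two loci, the second lacking taxon C.
    locus1 = {"A": "ACGT", "B": "ACGA", "C": "ACTT"}
    locus2 = {"A": "GG-C", "B": "GGAC"}
    supermatrix = concatenate_alignments([locus1, locus2])
    for taxon, seq in supermatrix.items():
        print(taxon, seq)
    print(p_distance(supermatrix["A"], supermatrix["B"]))
```

Computing the same distances on each partition separately, before concatenation, is one crude way to notice the kind of conflicting signal that the supermatrix would otherwise hide.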

rRNA and Other Global Markers

The introduction of comparative rRNA sequence analysis represents a major milestone in the history of microbiology. The current taxonomy of prokaryotes, as well as modern probe- and chip-based identification methods, is mainly based upon rRNA-derived phylogenetic conclusions. Phylogenetic inference based on other single genes is also important; alternative global markers include elongation and initiation factors, RNA polymerase subunits, DNA gyrases, heat shock proteins and RecA. Although these comparative analyses are hampered by generally low phylogenetic information content, differences in resolving power and multiple copies of the individual markers, they broadly support the domain and prokaryotic phyla concepts.

The Phyla of Prokaryotes

There is no official classification of prokaryotes. For the higher taxa there is not even an official nomenclature: the rules of the International Code of Nomenclature of Prokaryotes do not cover taxa above the rank of class. The most commonly accepted scheme, which divides the prokaryotes into two "subkingdoms" or "domains" (Bacteria and Archaea) and classifies their species with validly published names into 27 and 2 "phyla" or "divisions", respectively (as of November 2009), is primarily based on 16S rRNA sequence comparisons. This type of classification was adopted in the latest edition of Bergey's Manual of Systematic Bacteriology. Alternative classifications have been proposed as well, based for example on the structure of the cell wall. Some 16S rRNA sequence-based phyla unite prokaryotes of similar physiological properties (for example Cyanobacteria, Chlorobi, Thermotogae); others (Euryarchaeota, Proteobacteria, Flavobacteria) contain organisms with highly disparate lifestyles. Some phyla based on deep 16S rRNA lineages are currently represented by only one or a few species. Environmental genomics/metagenomics approaches suggest the existence of many more phyla, based on the deep lineages of the 16S rRNA gene sequences recovered. Obtaining the organisms harboring these sequences and studying their properties is a major challenge of microbiology today.

Rooting the Tree of Life

Defining the evolutionary relationships between groups of organisms is a major part of modern-day microbiology. With the continuing dramatic increase in the availability of genomic data, these techniques have been extended to describing an all-encompassing "tree of life". However, identifying the location of the root of this tree corresponding to the most recent common ancestor is a challenging and distinct problem that has yet to be solved. To date, many investigations have proposed various roots, using a wide diversity of biological data and techniques. A survey of the most promising of these models illustrates the difficulty faced in reaching a scientific consensus on the issue, as well as the additional philosophical complications posed by our emerging understanding of the role of horizontal gene transfer in genome evolution.

Conserved Indels

Comparative analysis of genome sequences is leading to the discovery of large numbers of novel molecular markers that are proving very helpful in understanding many important aspects of microbial phylogeny. Of these molecular markers, the conserved inserts or deletions (indels) in protein sequences provide a particularly useful means of identifying different groups of microbes in clear molecular terms and of understanding how they have branched off from a common ancestor. Conserved indels and other novel molecular markers (viz. lineage-specific proteins) can be useful for understanding microbial phylogeny at different phylogenetic depths. Genetic and biochemical studies of these markers should also lead to the identification of novel properties that are unique to different groups of microbes.
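As a rough illustration of how a conserved indel might be located in an existing protein alignment, the sketch below reports runs of columns in which every member of a chosen group has a gap while every other taxon has a residue. It is a simplified assumption about how such a screen could work: published analyses also consider, for example, how well conserved the flanking regions are. The function name and the toy sequences are hypothetical.

```python
def group_specific_indels(alignment, group):
    """Find column runs gapped in every group member and in no other taxon.

    alignment: dict {taxon: aligned_sequence}, all of equal length.
    group: iterable of taxon names expected to share the deletion.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    group = set(group)
    members = [t for t in alignment if t in group]
    others = [t for t in alignment if t not in group]
    if not members or not others:
        raise ValueError("need at least one taxon inside and outside the group")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must be equally long")
    length = lengths.pop()

    runs = []
    start = None
    for col in range(length):
        shared_gap = all(alignment[t][col] == "-" for t in members)
        residue_elsewhere = all(alignment[t][col] != "-" for t in others)
        if shared_gap and residue_elsewhere:
            if start is None:
                start = col  # a new run begins here
        elif start is not None:
            runs.append((start, col))  # the run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, length))
    return runs


if __name__ == "__main__":
    toy = {
        "sp1": "MKT--LAV",
        "sp2": "MKS--LAV",
        "sp3": "MKTGGLAV",
        "sp4": "MRTGALAV",
    }
    print(group_specific_indels(toy, ["sp1", "sp2"]))
```

An insert shared by a group appears here as a deletion in the complementary set of taxa, so the same function can be called with the remaining taxa as the group.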

Lateral or Horizontal Gene Transfer

Efforts to construct the tree of life take their conceptual motivation from Charles Darwin's theory of evolution. Until the advent of molecular biology, however, a universal tree of life was well beyond the scope of the data and methods of traditional organismal phylogeny. The rapid development of these methods and bodies of genetic sequence from the 1970s onwards resulted in major reclassifications of life and revived ambitions to represent all organismal lineages by one true tree of life. Subsequent realization of the significance of lateral gene transfer and other non-vertical processes has subtly reconceptualized and reoriented attempts to construct this universal phylogeny.
Gene transfer has affected the formation of groups of organisms and can make relationships more difficult to define and determine. In those cases where many genes have been transferred between preferred partners, the majority of genes in a genome may reflect gene acquisition, and as a consequence, even if a coherent signal is detected, one cannot be sure that the signal is due to shared organismal ancestry. However, the presence of a particular transferred gene has been shown, in several cases, to constitute a shared derived character useful in classification. Gene transfer can put together new metabolic pathways that open up new ecological niches, and consequently, the transfer of an adaptive gene might create a new group of organisms.

Endosymbiosis and the Evolution of Plastids

Photosynthesis is one of the most successful energy production strategies on the planet and has been co-opted numerous times throughout evolutionary history via the uptake and retention of photosynthetic cells by non-photosynthetic eukaryotic heterotrophs. Whereas the result of this process is clear, what is not settled is the mode and tempo of plastid movement among eukaryotes, particularly plastids of red algal derivation. Recent changes in our understanding of the relationships between eukaryotic supergroups have only served to complicate the picture further. Of particular interest is the evolution of plastids, the relationships among photosynthetic eukaryotes, the process of endosymbiogenesis and the variation in ways plastids have been modified to suit the light harvesting needs of their hosts. The understanding of all of these factors is an active field of continued research that will undoubtedly lead to further discoveries in the coming years.



Further reading