Caister Academic Press

Bioinformatics in Microbiology

The rapid advancement of sequencing techniques, coupled with new bioinformatics methodologies for handling large-scale data analysis, is providing exciting opportunities to understand microbial communities from a wide variety of environments in ways that were previously unimaginable. Data analysis is essential for a deeper knowledge of microbes and their habitats, and for many applications of microbiology, ranging from understanding the basis of disease and host-pathogen interactions in order to design drugs and develop vaccines, to other biotechnology applications including barcoding, microbial bioremediation and biofuel production (Bishop 2014).

Bioinformatics analysis of microbial sequence data

Adapted from Meesbah Jiwaji, Gwynneth F. Matcher and Rosemary A. Dorrington writing in Bishop 2014

Bioinformatics analysis of sequence data

Bioinformatics is a rapidly growing field of science that incorporates aspects of biology, mathematics and computer science, and part of bioinformatics research involves the management and analysis of large-scale sequence data. Once sequence data have been generated, bioinformatics analysis is required. Depending on the application, input template and platform used, this initial processing includes the removal of substandard reads and the assembly of overlapping reads into contigs (in the case of whole genome sequencing). A final role played by bioinformaticists is the curation of the huge datasets generated by next-generation sequencing technologies.
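As an illustration of the "removal of substandard reads" step, the following is a minimal Python sketch rather than the method used by any particular pipeline. It assumes reads are already available as (name, sequence, quality) tuples with Phred+33 encoded quality strings, and the function names and the thresholds (mean quality of 20, minimum length of 30) are hypothetical choices made for this example only.

```python
def mean_phred(quality, offset=33):
    """Return the mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads, min_mean_quality=20, min_length=30):
    """Keep reads that are long enough and whose mean quality is high enough.

    `reads` is an iterable of (name, sequence, quality) tuples.
    Both thresholds are illustrative defaults, not recommended values.
    """
    kept = []
    for name, sequence, quality in reads:
        if len(sequence) < min_length:
            continue  # too short to be useful downstream
        if mean_phred(quality) < min_mean_quality:
            continue  # substandard base calls on average
        kept.append((name, sequence, quality))
    return kept


if __name__ == "__main__":
    example = [
        ("read1", "ACGTACGT", "IIIIIIII"),  # high quality
        ("read2", "ACGTACGT", "!!!!!!!!"),  # very low quality
        ("read3", "ACG", "III"),            # too short
    ]
    for name, _, _ in filter_reads(example, min_mean_quality=20, min_length=5):
        print(name)
```

Real workflows typically also trim low-quality ends and adapter sequences before assembly; this sketch only discards whole reads.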

Determining the nucleotide sequence of the genetic material of a target organism or population of organisms is, on its own, relatively uninformative. Defining how this nucleotide sequence is responsible for structural and metabolic functionality is more important. Bioinformatics forms the bridge between the sequence data and the biological functioning of an organism or organisms. Once the nucleotide sequence has been determined, the first step in the bioinformatics analysis of the sequence is gene prediction by detecting potential open reading frames (ORFs). This is achieved by identifying conserved sequences responsible for the initiation and termination of transcription as well as the site for the initiation of translation. Once a potential ORF is identified, the next step is to annotate the putative gene by assigning a function to the sequence. The encoded query protein sequence is compared with a database of proteins with known functions, and putative proteins with sufficient homology to a known protein can then be assigned the corresponding function. If the query protein sequence does not show high levels of similarity with known proteins, annotation by function can be carried out, whereby the domains of the putative protein are assigned a function. For example, a particular arrangement of hydrophobic regions within a protein may indicate a membrane protein. While assignment of function based on similarity to other proteins provides an extremely valuable starting point when correlating DNA sequence with cellular metabolism, it is important to keep in mind that subsequent biological validation is required for confirmation (Bishop 2014).
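The ORF detection step can be sketched in Python as follows. This is a deliberately simplified stand-in: instead of searching for the conserved transcription and translation signals described above, it only scans all six reading frames for an ATG start codon followed by an in-frame stop codon. The function names and the `min_codons` threshold are assumptions made for this example, and dedicated gene prediction tools use considerably more sophisticated models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons):
    """Find ATG...stop stretches in the three frames of one strand.

    Returns (start, end) pairs, 0-based with end exclusive; the span
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None  # look for the next candidate in this frame
    return found


def find_orfs(dna, min_codons=30):
    """Return (strand, start, end, sequence) for candidate ORFs on both strands.

    Coordinates for the "-" strand refer to the reverse-complemented sequence.
    """
    dna = dna.upper()
    rc = reverse_complement(dna)
    results = [("+", s, e, dna[s:e]) for s, e in orfs_in_strand(dna, min_codons)]
    results += [("-", s, e, rc[s:e]) for s, e in orfs_in_strand(rc, min_codons)]
    return results


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC", min_codons=3):
        print(orf)
```

Each candidate ORF returned this way would then be translated and compared against a database of proteins with known functions, as the annotation step above describes.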

Further reading