
The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High-quality alignments are difficult to build, and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation.

We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation can identify alignment errors, because positional independence is reduced in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate the causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at

Citation: Dickson RJ, Gloor GB (2012) Protein Sequence Alignment Analysis by Local Covariation: Coevolution Statistics Detect Benchmark Alignment Errors. PLoS ONE 7(6):

Editor: Bostjan Kobe, University of Queensland, Australia

Received: Decem; Accepted: Ap; Published: June 8, 2012

Copyright: © 2012 Dickson, Gloor. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: RJD is supported by a National Sciences and Engineering Council of Canada CGS scholarship. Work in the lab of GG is supported by a Discovery Grant from the National Sciences and Engineering Council of Canada. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Multiple sequence alignments are critical for generating and testing hypotheses based on protein structure, function, and phylogeny. Protein alignments are built on the assumption that each position (column) in the alignment is homologous. With structural information, homology is typically validated by demonstrating that two residues occupy the same location in 3D space, since structural homology implies sequential homology. If only sequence information is available, positions are assigned based on the conservation of residue identity or properties, which is inherently less reliable than structural inference.
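To make the idea of local covariation concrete, the following is a minimal sketch rather than LoCo's actual method. The text here does not specify the statistic LoCo uses, so plain mutual information between nearby columns is swapped in as an illustrative covariation measure. The function names, the `window` parameter, the toy sequences, and the choice to treat a gap as an ordinary symbol are all assumptions made for this example. It shows how shifting a segment in a subset of sequences makes neighbouring columns co-vary, even though the underlying residues are fully conserved.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Assumption: gaps ('-') are counted like any other symbol.
    """
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi


def local_covariation(alignment, window=2):
    """Mean mutual information between each column and its neighbours
    lying within `window` columns on either side (hypothetical score)."""
    length = len(alignment[0])
    scores = []
    for i in range(length):
        lo, hi = max(0, i - window), min(length, i + window + 1)
        neighbours = [j for j in range(lo, hi) if j != i]
        total = sum(
            mutual_information(column(alignment, i), column(alignment, j))
            for j in neighbours
        )
        scores.append(total / len(neighbours) if neighbours else 0.0)
    return scores


# Toy data: four identical sequences, gap placed consistently.
good = ["ACDEFG-HK"] * 4

# Same residues, but in two sequences the gap is misplaced,
# shifting the segment DEFG one column to the right.
bad = ["ACDEFG-HK", "ACDEFG-HK", "AC-DEFGHK", "AC-DEFGHK"]

good_scores = local_covariation(good)
bad_scores = local_covariation(bad)
```

In this toy case every column of `good` scores zero, while the columns inside the shifted segment of `bad` score above zero, because the same two sequences disagree with the others at every position in the segment. A real analysis would need many more sequences and a correction for background signal such as phylogenetic relatedness, which this sketch does not attempt.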
