[Back to Doug] [Address] [Academics] [Honors] [Publications] [Presentations] [Public Service]

Naor, D. and Brutlag, D. L. (1993). On Suboptimal Alignments of Biological Sequences. Padova, Italy: Springer-Verlag, 179-196.

On Suboptimal Alignments of Biological Sequences.

D. Naor & D. L. Brutlag

Department of Biochemistry, Stanford University School of Medicine, Stanford California 94305-5307.

It is widely accepted that the optimal alignment between a pair of proteins or nucleic acid sequences that minimizes the edit distance may not necessarily reflect the correct biological alignment. Alignments of proteins based on their structures or of DNA sequences based on evolutionary changes are often different from alignments that minimize edit distance. However, in many cases (e.g. when the sequences are close), the edit distance alignment is a good approximation to the biological one. Since, for most sequences, the true alignment is unknown, a method that either assesses the significance of the optimal alignment, or that provides few ``close'' alternatives to the optimal one, is of great importance.

A suboptimal alignment is an alignment whose score lies within the neighborhood of the optimal score. It is therefore a natural candidate for an alternative to the optimal alignment. Enumeration of suboptimal alignments is not very useful since there are many such alignments. Other approaches that use only partial information about suboptimal alignments are more successful in practice.

In this paper we study the combinatorial nature of suboptimal alignments. We define a set of ``canonical'' suboptimal alignments, and argue that these are the essential ones since any other suboptimal alignment is a combination of few canonical ones. We characterize the canonical alignments, and then show that for the additional cost of one edit distance computation a compact representation of all of them can be obtained. This representation represents all non-canonical suboptimal alignments as well, allows efficient enumerations of various sets of suboptimal alignments, and allows counting the number of suboptimal alignments. The implementation of this representation is now under development, and its applicability to real data is tested. Preliminary examples are presented to motivate the problem.

Full Text: To get Acrobat:

[Back to Doug] [Address] [Academics] [Honors] [Publications] [Presentations] [Public Service]