![]()
Naor, D. and Brutlag, D. L. (1993). On Suboptimal Alignments of Biological Sequences. Padova, Italy: Springer-Verlag, 179-196.
Department of
Biochemistry, Stanford
University School of Medicine, Stanford
California 94305-5307.
It is widely accepted that the optimal alignment between a pair of
proteins or nucleic acid sequences that minimizes the edit distance
may not necessarily reflect the correct biological alignment.
Alignments of proteins based on their structures or of DNA sequences
based on evolutionary changes are often different from alignments
that minimize edit distance. However, in many cases (e.g. when the
sequences are close), the edit distance alignment is a good
approximation to the biological one. Since, for most sequences, the
true alignment is unknown, a method that either assesses the
significance of the optimal alignment, or that provides few ``close''
alternatives to the optimal one, is of great importance.
A suboptimal alignment is an alignment whose score lies within the
neighborhood of the optimal score. It is therefore a natural
candidate for an alternative to the optimal alignment. Enumeration of
suboptimal alignments is not very useful since there are many such
alignments. Other approaches that use only partial information about
suboptimal alignments are more successful in practice.
In this paper we study the combinatorial nature of suboptimal
alignments. We define a set of ``canonical'' suboptimal alignments,
and argue that these are the essential ones since any other
suboptimal alignment is a combination of few canonical ones. We
characterize the canonical alignments, and then show that for the
additional cost of one edit distance computation a compact
representation of all of them can be obtained. This representation
represents all non-canonical suboptimal alignments as well, allows
efficient enumerations of various sets of suboptimal alignments, and
allows counting the number of suboptimal alignments. The
implementation of this representation is now under development, and
its applicability to real data is tested. Preliminary examples are
presented to motivate the problem.
![]()
[Back to Doug] [Address] [Academics] [Honors] [Publications] [Presentations] [Public Service]