An efficient string matching algorithm with k differences for nucleotide and amino acid sequences.


Several algorithms have been designed to solve the problem of optimally aligning one sequence, the pattern, of length m, with another, longer sequence, the text, of length n. These algorithms allow mismatches, deletions, and insertions. Algorithms to date run in O(mn) time. Let k be an integer giving the maximal number of differences allowed. We present a simple algorithm showing that sequences can be optimally aligned in O(k^2 n) time. For long sequences, the gain factor over the currently used algorithms is very large.
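
For orientation, the O(mn) baseline referred to above can be sketched as a column-by-column dynamic program over the pattern and the text. The following is a minimal Python sketch of that classical approach only; it is not the faster algorithm presented in this paper. The function name `k_difference_matches` and its return format are illustrative assumptions.

```python
def k_difference_matches(pattern, text, k):
    """Report where `pattern` occurs in `text` with at most k differences.

    Returns a list of (end, distance) pairs, where `end` is the 1-based
    position in `text` at which an approximate occurrence ends and
    `distance` is the smallest number of mismatches, insertions, and
    deletions for an occurrence ending there. Runs in O(mn) time.
    """
    m = len(pattern)
    # prev[i]: fewest differences between pattern[:i] and any substring
    # of the text ending at the previous text position.
    prev = list(range(m + 1))
    matches = []
    for end, ch in enumerate(text, start=1):
        # curr[0] = 0 because an occurrence may start anywhere in the text.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or mismatch
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= k:
            matches.append((end, curr[m]))
        prev = curr
    return matches


if __name__ == "__main__":
    print(k_difference_matches("ACGT", "TTACGTTT", 0))
    print(k_difference_matches("ACGT", "ACCT", 1))
```

Every one of the mn table cells is computed here regardless of k, which is the cost that a bound depending on k rather than m avoids.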
