Rapid similarity searches of nucleic acid and protein data banks.
AUTHOR(S)
Wilbur, W J
ABSTRACT
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
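The k-tuple matching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ktuple_index`, `best_diagonal`, `rank_bank`), the default `k=2`, and the scoring rule (count of shared k-tuples on the single best diagonal) are assumptions chosen for illustration. The sketch indexes the k-tuples of the query once, then scans each bank sequence and tallies matches by diagonal offset, which is the step that avoids a full residue-by-residue comparison.

```python
from collections import Counter, defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, target, k=2):
    """Return (offset, count) for the diagonal sharing the most k-tuples.

    offset = start position in target - start position in query.
    Returns (None, 0) when the two sequences share no k-tuple.
    """
    index = ktuple_index(query, k)
    diagonals = Counter()
    for j in range(len(target) - k + 1):
        # Every query position holding the same k-tuple votes for one diagonal.
        for i in index.get(target[j:j + k], ()):
            diagonals[j - i] += 1
    if not diagonals:
        return None, 0
    # Illustrative scoring only: pick the diagonal with the highest count.
    return max(diagonals.items(), key=lambda item: item[1])


def rank_bank(query, bank, k=2):
    """Score each sequence in bank (dict of name -> sequence), best first."""
    scores = {name: best_diagonal(query, seq, k)[1] for name, seq in bank.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_bank(query, bank, k)` on a small dictionary of sequences returns the names ordered by this crude score. A real search would follow up the top-ranked candidates with a proper alignment, as the abstract notes for the separate alignment implementation.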
ARTICLE ACCESS
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=393452
RELATED DOCUMENTS
- Protein sequence similarity searches using patterns as seeds.
- Rapid and accurate estimates of statistical significance for sequence data base searches.
- A computer program for the management of small cosmid banks.
- Chemical Cleveland mapping: a rapid technique for characterization of crosslinked nucleic acid-protein complexes.
- Edward Jenner's unpublished cowpox inquiry and the Royal Society: Everard Home's report to Sir Joseph Banks.