Assignment of position-specific error probability to primary DNA sequence data.

ABSTRACT

DNA sequence predicted from polyacrylamide gel-based technologies is inaccurate because of variations in the quality of the primary data due to limitations of the technology, and to sequence-specific variations due to nucleotide interactions within the DNA molecule and with the gel. The ability to recognize the probability of error in the primary data will be useful in reconstructing the target sequence of a DNA sequencing project, and in estimating the accuracy of the final sequence. This paper describes the use of linear discriminant analysis to assign position-specific probabilities of incorrect, over- and under-prediction of nucleotides for each predicted nucleotide position in primary sequence data generated by a gel-based DNA sequencing technology. Using this method, most of the error potential in primary sequence data can be assigned to a limited number of discrete positions. The use of probability values in the sequence reconstruction process, and in estimating the accuracy of consensus sequence determination is described.
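The idea of turning a discriminant score into a per-position error probability can be illustrated with a minimal sketch. It is not the paper's implementation: the two features (a relative peak height and a peak-spacing deviation), the synthetic training data, and the function names `fit_lda` and `error_probability` are all assumptions made for illustration, and the three error types named in the abstract (incorrect, over- and under-prediction) are collapsed into a single "error" class to keep the example short.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class LDA with pooled covariance; y is 0 (correct) or 1 (error)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    means, priors = [], []
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for k in (0, 1):
        Xk = X[y == k]
        means.append(Xk.mean(axis=0))
        priors.append(len(Xk) / len(X))
        centered = Xk - means[k]
        pooled += centered.T @ centered
    pooled /= len(X) - 2  # unbiased pooled within-class covariance
    return np.array(means), np.array(priors), np.linalg.inv(pooled)

def error_probability(X, model):
    """Posterior probability of the error class for each row of X."""
    means, priors, precision = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # Linear discriminant score per class:
    #   x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(prior_k)
    quad = 0.5 * np.sum(means @ precision * means, axis=1)
    scores = X @ precision @ means.T - quad + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(scores)
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

# Synthetic training set (hypothetical features, not real trace data).
rng = np.random.default_rng(0)
correct = rng.normal([1.0, 0.0], 0.2, size=(500, 2))
errors = rng.normal([0.4, 0.6], 0.2, size=(50, 2))
X_train = np.vstack([correct, errors])
y_train = np.concatenate([np.zeros(500, dtype=int), np.ones(50, dtype=int)])

model = fit_lda(X_train, y_train)

# Score every position of a (synthetic) read and list the most suspect ones.
read_features = rng.normal([1.0, 0.0], 0.2, size=(20, 2))
p_err = error_probability(read_features, model)
suspect = np.argsort(p_err)[::-1][:3]
print("expected errors in read:", p_err.sum())
print("most suspect positions:", suspect.tolist())
```

Under these assumptions, the sum of the per-position probabilities gives an expected error count for a read, and sorting them shows how the error potential concentrates on a few positions, which is the kind of information the abstract proposes using during sequence reconstruction.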
