Multi-alphabet consensus algorithm for identification of low specificity protein-DNA interactions.

ABSTRACT

A method for the identification and characterization of protein-DNA interactions is presented. We have developed an approach for finding multiple unknown patterns that occur imperfectly in a set of sequences. The patterns may contain letters from the nucleotide alphabet (A, C, G and T) as well as ambiguous characters (A/C, A/G, A/T, A/C/G, etc.). The method reveals weak DNA signals in an unaligned set of DNA fragments known to be functionally related, and it assumes no prior information about the alignment of the sequences. It determines the locations of the signals using only the information intrinsic to the sequences themselves. We have applied this method to analyze the binding sites of cAMP receptor protein (CRP, also known as catabolite activator protein, CAP). The consensus based on these data is discussed, and a comparison of the consensus with the crystal structure of the CAP-DNA complex is presented. We further show that in a mixture of DNA sequences containing binding sites for two different proteins, both classes of binding sites can be discovered simultaneously by this method. The DNA sequences of nucleosome cores from chicken erythrocytes and a set of other known nucleosomal sequences show the existence of symmetrical features in nucleosome-binding DNA sequences. We also show multi-alphabet patterns that may play a role in the phasing signal on the nucleosome DNA molecule, and we compare the results with existing models of nucleosome positioning.
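To make the idea of a multi-alphabet pattern concrete, the following is a minimal sketch of how a pattern containing ambiguous characters could be scored against unaligned sequences with mismatches allowed. It is not the consensus algorithm of the paper, which the abstract does not specify. The one-letter IUPAC codes (M for A/C, R for A/G, W for A/T, V for A/C/G, and so on), the function names `mismatches` and `best_hit`, and the example pattern `TGWGA` are all assumptions introduced here for illustration.

```python
# Illustrative sketch only: scores a given ambiguous pattern against sequences.
# It does NOT discover patterns and is not the method described in the abstract.

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT",
    "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}


def mismatches(pattern, window):
    """Count positions where the base is not allowed by the pattern symbol."""
    return sum(base not in IUPAC[sym] for sym, base in zip(pattern, window))


def best_hit(pattern, sequence):
    """Return (mismatch_count, position) of the best imperfect occurrence.

    Ties are broken by the leftmost position. Returns None when the
    sequence is shorter than the pattern.
    """
    k = len(pattern)
    hits = [
        (mismatches(pattern, sequence[i:i + k]), i)
        for i in range(len(sequence) - k + 1)
    ]
    return min(hits) if hits else None


if __name__ == "__main__":
    # Hypothetical pattern and toy sequences, chosen only for illustration.
    pattern = "TGWGA"
    for seq in ["CCTGTGACC", "CCTGCGACC"]:
        print(seq, best_hit(pattern, seq))
```

With the toy input above, the first sequence contains an exact occurrence at position 2 (zero mismatches), while the second matches at the same position with one mismatch, which is the kind of imperfect occurrence the abstract refers to.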
