A method to locate protein coding sequences in DNA of prokaryotic systems.

ABSTRACT

cDNA sequence data from E. coli phages, for which complete genome sequences are known, have been analysed. From this analysis, thirteen triplets have been identified as markers that distinguish protein-coding frames from fortuitous open reading frames. The region from -18 to +18 nucleotides around ATG/GTG has been analysed and used to distinguish initiator codons from internal ATG/GTG codons. Using these criteria, a method has been developed to locate protein-coding sequences by combining the 'gene search by signal' and 'gene search by content' approaches. Application of this method to prokaryotic systems, including those that were not part of our database, indicates that it is quite accurate and general in nature.
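The two ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract lists neither the thirteen marker triplets nor the scoring rules, so `marker_triplets` is a hypothetical caller-supplied set, `min_codons` is an arbitrary cutoff, and the window is assumed to mean 18 bases on each side of the start codon. All function names are illustrative, and only the forward strand is scanned.

```python
from collections import Counter

START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_orfs(seq, min_codons=30):
    """Return (start, end) pairs for forward-strand open reading frames.

    start is the index of the first base of an ATG/GTG codon; end is the
    index just past the in-frame stop codon. Every in-frame ATG/GTG is
    reported, so internal start-like codons appear as separate candidates.
    min_codons is an assumed cutoff, not a value taken from the abstract.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] not in START_CODONS:
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            # Keep only frames that end in a stop codon and are long enough.
            if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
    return orfs


def start_window(seq, start, flank=18):
    """'Search by signal': return the bases around a candidate start codon.

    Assumes the window is flank bases upstream, the codon itself, and
    flank bases downstream. Returns None if the window runs off either end.
    """
    lo, hi = start - flank, start + 3 + flank
    if lo < 0 or hi > len(seq):
        return None
    return seq[lo:hi].upper()


def marker_fraction(seq, start, end, marker_triplets):
    """'Search by content': fraction of in-frame triplets that are markers.

    marker_triplets is a hypothetical placeholder for the thirteen
    triplets, which the abstract does not list.
    """
    counts = Counter(seq[k:k + 3].upper() for k in range(start, end, 3))
    total = sum(counts.values())
    return sum(counts[t] for t in marker_triplets) / total if total else 0.0
```

A caller would typically run `find_candidate_orfs` first, then pass each candidate's `start` to `start_window` and its `(start, end)` pair to `marker_fraction`. How the two pieces of evidence are weighed against each other is left open here, since the abstract does not specify it.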
