A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome
AUTHOR(S)
Yandell, Mark
SOURCE
National Academy of Sciences
ABSTRACT
Five years after the completion of the sequence of the Drosophila melanogaster genome, the number of protein-coding genes it contains remains a matter of debate; the number of computational gene predictions greatly exceeds the number of validated gene annotations. We have assembled a collection of >10,000 gene predictions that do not overlap existing gene annotations and have developed a process for their validation that allows us to efficiently prioritize and experimentally validate predictions from various sources by sequencing RT-PCR products to confirm gene structures. Our data provide experimental evidence for 122 protein-coding genes. Our analyses suggest that the entire collection of predictions contains only ≈700 additional protein-coding genes. Although we cannot rule out the discovery of genes with unusual features that make them refractory to existing methods, our results suggest that the D. melanogaster genome contains ≈14,000 protein-coding genes.
ARTICLE ACCESS
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=545494
RELATED DOCUMENTS
- Assessing the Drosophila melanogaster and Anopheles gambiae Genome Annotations Using Genome-Wide Sequence Comparisons
- A novel computational approach for development of highly selective fenitrothion imprinted polymer: theoretical predictions and experimental validations
- The mouse genome: Experimental examination of gene predictions and transcriptional start sites
- Gene Discovery Using Computational and Microarray Analysis of Transcription in the Drosophila melanogaster Testis
- A computational genomics approach to the identification of gene networks