Using Text Analysis to Identify Functionally Coherent Gene Groups
AUTHOR(S)
Raychaudhuri, Soumya
SOURCE
Cold Spring Harbor Laboratory Press
ABSTRACT
The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how “functionally coherent” the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function.
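The evaluation figure quoted above (sensitivity at 100% specificity) can be illustrated with a minimal sketch. It assumes each gene group has already been given a coherence score by some scoring method; it does not implement neighbor divergence itself. The function name and the toy scores below are hypothetical and are not data from the paper. The threshold is set at the highest score of any random group, so that no random group is called functional, and sensitivity is the fraction of known functional groups scoring above that threshold.

```python
from typing import Sequence


def sensitivity_at_full_specificity(
    functional_scores: Sequence[float],
    random_scores: Sequence[float],
) -> float:
    """Fraction of functional groups scoring strictly above every random group.

    Hypothetical helper: the cutoff is the maximum random-group score, which
    guarantees zero false positives (100% specificity) on the given data.
    """
    if not functional_scores or not random_scores:
        raise ValueError("both score lists must be non-empty")
    threshold = max(random_scores)
    hits = sum(1 for score in functional_scores if score > threshold)
    return hits / len(functional_scores)


# Toy scores (assumed, for illustration only): 1900 random groups with
# scores in [0, 0.99] and 19 functional groups, 15 of which score higher
# than every random group.
random_scores = [(i % 100) / 100 for i in range(1900)]
functional_scores = [2.0 + i for i in range(15)] + [0.1, 0.2, 0.3, 0.4]

result = sensitivity_at_full_specificity(functional_scores, random_scores)
print(f"{result:.0%}")  # 15 of 19 groups, printed as 79%
```

With 19 functional groups, a sensitivity of about 79% corresponds to 15 groups scoring above every random group, which is the situation the toy data reproduces.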
ARTICLE ACCESS
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=187532
Related Documents
- TXTGate: profiling gene groups with text-based information
- Nosocomial infections in Brazilian pediatric patients: using a decision tree to identify high mortality groups
- Haplotype Analysis in Multiple Crosses to Identify a QTL Gene
- GeneClinics: A Hybrid Text/Data Electronic Publishing Model Using XML Applied to Clinical Genetic Testing
- Was Rodney Ledward a statistical outlier? Retrospective analysis using routine hospital data to identify gynaecologists' performance