Transcript identification by analysis of short sequence tags—influence of tag length, restriction site and transcript database
AUTHOR(S)
Unneberg, Per
SOURCE
Oxford University Press
ABSTRACT
There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1–30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1–5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90–95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3–30% of upstream 10 bp tags are identical to 3′ tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16–17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed to 79–83%, where the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
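The in silico tag extraction and uniqueness measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the NlaIII recognition site CATG, a 10 bp tag taken immediately downstream of the 3′-most site and excluding the site itself, and sequences supplied as plain strings with the poly(A) tail already removed. The function names `extract_3prime_tag` and `tag_uniqueness` are hypothetical.

```python
from collections import Counter


def extract_3prime_tag(mrna, site="CATG", tag_len=10):
    """Return the tag following the 3'-most recognition site, or None.

    Assumption: the tag is the tag_len bases immediately downstream of
    the site and does not include the site itself.
    """
    seq = mrna.upper()
    pos = seq.rfind(site)  # index of the 3'-most occurrence, -1 if absent
    if pos == -1:
        return None  # transcript lacks the recognition site
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    if len(tag) < tag_len:
        return None  # site lies too close to the 3' end for a full tag
    return tag


def tag_uniqueness(transcripts, site="CATG", tag_len=10):
    """Fraction of tagged transcripts whose 3' tag matches no other transcript.

    transcripts: dict mapping a transcript name to its sequence.
    Transcripts that yield no tag are excluded from the denominator
    (an assumption made for this sketch).
    """
    tags = {}
    for name, seq in transcripts.items():
        tag = extract_3prime_tag(seq, site, tag_len)
        if tag is not None:
            tags[name] = tag
    if not tags:
        return 0.0
    counts = Counter(tags.values())
    unique = sum(1 for tag in tags.values() if counts[tag] == 1)
    return unique / len(tags)
```

Passing a different `site` string models another restriction enzyme, and raising `tag_len` shows how uniqueness changes with longer tags. Closely related transcripts that share their 3′ ends will keep identical tags regardless of length, which is the effect the abstract reports.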
ARTICLE ACCESS
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=153741
Related Documents
- Discovery of three genes specifically expressed in human prostate by expressed sequence tag database analysis
- Production of full-length cDNA sequences by sequencing and analysis of expressed sequence tags from Schistosoma mansoni
- Assembly of a gene sequence tag microarray by reversible biotin-streptavidin capture for transcript analysis of Arabidopsis thaliana
- Identification and Differentiation of Leishmania Species in Clinical Samples by PCR Amplification of the Miniexon Sequence and Subsequent Restriction Fragment Length Polymorphism Analysis
- Identification and Analysis of Arabidopsis Expressed Sequence Tags Characteristic of Non-Coding RNAs