Analysis of the Clustering Algorithms for the Databases / Análise de Algoritmos de Agrupamento para Base de Dados Textuais
AUTHOR(S)
Luiz Gonzaga Paula de Almeida
PUBLICATION DATE
2008
ABSTRACT
The growing volume of digitally stored text calls for computational tools that give efficient and effective access to information and knowledge. The problem is especially relevant in biomedical research, where most new knowledge is published as scientific articles that must be easy and quick to retrieve. Text mining is the research field concerned with identifying new information and knowledge in text databases. One of its tasks, text clustering, is to find groups of correlated texts in a database. Before clustering, a text database must be converted to the widely used Vector Space Model, in which each text is represented by a vector of the frequencies of the words and terms that occur in the database. Together these vectors form the document-term matrix, which is usually sparse and high-dimensional. To attenuate the problems these properties cause, a subset of terms is normally selected, giving rise to a new document-term matrix of reduced dimension, which is then passed to the clustering algorithms. This work presents two term selection algorithms and evaluates three clustering algorithms (k-means, spectral clustering, and graph partitioning) on five pre-classified databases. The databases were pre-processed with previously described methods. The results indicate that the implemented term selection algorithms improved the performance of the clustering algorithms, and that the k-means and spectral algorithms outperformed graph partitioning.
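The pipeline outlined in the abstract (document-term matrix, term selection, clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not describe the two term selection algorithms of the thesis, so a simple document-frequency filter stands in for them, and the function names, thresholds, and toy texts are hypothetical. Only k-means, one of the three algorithms evaluated, is shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(texts):
    """Return (vocabulary, rows), where rows[i][j] counts term j in text i."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


def select_terms(vocabulary, rows, min_df=2, max_df_ratio=0.9):
    """Assumed stand-in for term selection: keep terms whose document
    frequency lies between min_df and max_df_ratio * number of texts."""
    n_docs = len(rows)
    keep = []
    for j in range(len(vocabulary)):
        df = sum(1 for row in rows if row[j] > 0)
        if min_df <= df <= max_df_ratio * n_docs:
            keep.append(j)
    return [vocabulary[j] for j in keep], [[row[j] for j in keep] for row in rows]


def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are left as is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def kmeans(vectors, k, iterations=20):
    """Plain k-means with squared Euclidean distance.
    The first k vectors seed the centroids, so the result is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            distances = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two texts on gene expression, two on protein structure.
texts = [
    "gene expression in tumor cells",
    "protein folding and protein structure",
    "tumor gene mutation changes expression",
    "membrane protein structure after folding",
]

vocabulary, rows = document_term_matrix(texts)
selected_terms, reduced_rows = select_terms(vocabulary, rows)
labels = kmeans([normalize(r) for r in reduced_rows], k=2)

print(len(vocabulary), "terms before selection,", len(selected_terms), "after")
print(labels)  # [0, 1, 0, 1]
```

On this toy corpus the filter drops the terms that occur in a single text, and k-means then separates the gene-expression texts from the protein-structure texts. Real text databases would need sparse matrices and a more careful initialization than seeding with the first k vectors.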
SUBJECT(S)
clustering analysis; computability and models of computation; feature selection; text mining
ARTICLE ACCESS
http://www.lncc.br/tdmc/tde_busca/arquivo.php?codArquivo=146
Related Documents
- Agrupamento híbrido de dados utilizando algoritmos genéticos
- A framework for cluster analysis based in the multi-objective combination of clustering algorithms
- Adaptação de viés indutivo de algoritmos de agrupamento de fluxos de dados
- Avaliação de algoritmos de agrupamento em grafos para segmentação de imagens
- Estudo e desenvolvimento de algoritmos para agrupamento fuzzy de dados em cenários centralizados e distribuídos