VISTREE: uma linguagem visual para análise de padrões arborescentes e para especificação de restrições em um ambiente de mineração de árvores

AUTOR(ES)
DATA DE PUBLICAÇÃO

2008

RESUMO

The frequent pattern mining in data represented by more complex structures like trees and graphs are growing lately. Among the reasons for this improvement is the fact that the tree and graph patterns has more information than sequential patterns, besides there is the possibility of usage of this type of mining in several areas like XML Mining,Web Mining and Bioinformatic. A problem that occurs in mining patterns in general is the great amount of patterns generated. Being some of them not interesting for users. The decrease in the quantity of patterns generated can be done restricting the patterns types produced through the user constraint. Even incorporating constraints in the mining process, the quantity of tree pattern mined is large, what make necessary one tool for pattern analysis, possibiliting the user specify queries to extract in the mass of mined patterns that satisfy the criteria of the selection in the query. The pattern mining with constraint, aim to obtain as a result of the process of mining only the patterns with the real interest for the user. The constraint about patterns will be represented related to the structure of them. One form to represent the sequential pattern mining would be through regular expressions, for the tree pattern mining, the tree automata. The use of constraints solve the problem to generate a large amout of patterns, but the mechanism used to represent the constraint is still constituted in another problem that would be the difficult for a user do the input of constraint using this mechanism. The queries about frequent patterns are made according to the characteristics of the data. One way to extract specific patterns in data structured like trees is to store the specific patterns in a XML file and make queries using one of the query languages for XML files. Among the XML query languages, the XQuery language is very used, mainly by the fact that its similar in semantic to SQL, the query language for databases. The frequently patterns queries could be made using this language, but, for this the user would have to know and be capable to express queries through it. In this research it will be presented the visual language VisTree that consists of visual tool to be used in a phase of preprocess for specification the user preferences that involves the format of the tree pattern that are interested to him, as in a phase of postprocess to analyze the mined patterns. The VisTree sintaxe is based on in a fragment of the Tree Pattern language[Chen et al. 2003, Che and Liu 2005], the core of XPath 1.0 [Clark and Derose 1999, Olteanu et al. 2002]. However, the semantic of VisTree differs from the semantic of these languages in the sense that VisTree queries return the sets of tree patterns. VisTree uses a XQuery language [Chamberlin 2003, Katz et al. 2003] like query process mechanism: the visual queries specified in VisTree are mapped in XQuery queries and theirs responses are adapted to fit the format returned by VisTree. VisTree works like a XQuery front-end. A complete system of mining tree pattern was developed to test and validate the use of VisTree language in specific contexts of applications. The system was made in a modular form, in a way to allow that new applications could be incorporated in a simple way. This research show the application of tree mining with constraint in the areas of XML Mining andWeb Mining through study case. In both applications, the system use the VisTree language in the preprocess modules (constraint input) and analysis of patterns (query input).

ASSUNTO(S)

mineração de árvores com restrição mineração de dados (computação) constraint-based tree mining banco de dados tree mining xml mining mineração de árvores ciencia da computacao datamining web mining

Documentos Relacionados