A new methodology for attribute selection in the process of knowledge extraction from databases, based on Rough Set Theory

AUTHOR(S)
PUBLICATION DATE

2008

ABSTRACT

In this dissertation, a new Feature Subset Selection (FSS) methodology is proposed for use in the Knowledge Discovery in Databases (KDD) process. Databases, designed for specific purposes, hold in their essence the knowledge intrinsic to the system in which they are applied. This knowledge is valuable for making strategic decisions in that system. The proposal of Artificial Intelligence, through Data Mining, is therefore to extract this knowledge from databases automatically. This gave rise to the concept of KDD, which denotes a process of extracting knowledge from databases. One of the stages of KDD is FSS, whose objective is to analyze a database and eliminate attributes that are not important for the knowledge to be extracted, thus reducing the volume of data to be analyzed without significantly altering its content. An analysis of existing FSS methodologies, in particular Reducts in Rough Set Theory, FOCUS, and FOCUS-2, showed that Reducts selects conditional attributes without considering the decision attribute, which is the object of the knowledge to be extracted. FOCUS and FOCUS-2 apply concepts similar to those of the Reducts methodology, which entail analyzing all combinations of examples (two by two), but they are applied only to pairs of examples belonging to different classes, and in this way they take the decision attribute into account. From this analysis, the methodology proposed in this work was developed. It uses the concepts introduced in Rough Set Theory, with a difference in the composition of the Discernibility Matrix: the decision attribute is considered in the composition of this matrix, as in FOCUS and FOCUS-2, and, additionally, examples belonging to the same class receive a differentiated treatment.
A hypothesis was then formulated: an attribute subset indicated by an FSS method should be able to distinguish all examples belonging to different classes, and should not be prevented from concluding that an example belongs to the same class as another example because all of their conditional attributes differ. To make the implementation of the proposal feasible, it was necessary to introduce a simplification in the matrix operations, since the dimensions of the matrices are, by definition, very large. With this simplification the implementation was completed, and it was subsequently evaluated. The evaluation results were, in general, satisfactory, with the exception of some points that are presented and discussed in chapters 7 and 8 of this work.
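
As an illustration only, the following Python sketch shows one possible reading of the ideas summarized above: a discernibility structure built only from pairs of examples with different decision values, and a check for same-class pairs whose selected attributes all differ. The toy table, the attribute names, and the function names are hypothetical and do not come from the dissertation. The sketch does not reproduce the proposed methodology or its matrix simplification.

```python
from itertools import combinations

# Hypothetical toy decision table: conditional attributes a, b, c
# and a decision attribute d (the class).
ATTRS = ("a", "b", "c")
TABLE = [
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]


def discernibility_entries(table, attrs, decision="d"):
    """Map each pair (i, j) of examples from different classes to the
    set of conditional attributes on which the two examples differ."""
    entries = {}
    for i, j in combinations(range(len(table)), 2):
        if table[i][decision] != table[j][decision]:
            entries[(i, j)] = {a for a in attrs if table[i][a] != table[j][a]}
    return entries


def discerns_all(subset, entries):
    """True if the subset contains, for every different-class pair,
    at least one attribute that separates the pair."""
    chosen = set(subset)
    return all(entry & chosen for entry in entries.values())


def same_class_pairs_fully_different(table, subset, decision="d"):
    """List same-class pairs that differ on every attribute of a
    non-empty subset (assumed reading of the hypothesis above)."""
    if not subset:
        return []
    return [
        (i, j)
        for i, j in combinations(range(len(table)), 2)
        if table[i][decision] == table[j][decision]
        and all(table[i][a] != table[j][a] for a in subset)
    ]


entries = discernibility_entries(TABLE, ATTRS)
print(discerns_all({"b"}, entries))
print(same_class_pairs_fully_different(TABLE, ["a", "c"]))
```

In this toy table, attribute b alone separates every pair of examples from different classes, whereas the subset {a, c} makes both same-class pairs differ on every selected attribute.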

SUBJECT(S)

databases; feature subset selection; knowledge discovery in databases; discernibility matrix; Rough Set Theory; reducts; knowledge extraction; electrical engineering
