Sumarização automática multidocumento: seleção de conteúdo com base no Modelo CST (Cross-document Structure Theory) / Multidocument summarization: content selection based on CST (Cross-document Structure Theory)
AUTHOR(S)
Maria Lucía Del Rosario Castro Jorge
PUBLICATION DATE
2010
ABSTRACT
Multidocument summarization consists of producing a summary from a group of texts on the same topic, containing the most relevant information according to the user's interest. With the rapidly growing amount of information on the internet and the short time available to learn and process the information of interest, automatic summaries have become a very important resource. In this work, we explore content selection methods for multidocument summarization based on CST (Cross-document Structure Theory), a recently proposed model that has already been investigated in Computational Linguistics. In particular, we define and formalize content selection operators based on the CST model. These operators represent possible summarization preferences and address the main challenges of multidocument summarization: redundancy, complementarity, and contradiction among pieces of information. The operators are specified as templates containing rules and functions that relate the preferences to CST relations. Specifically, we define operators for extracting main information, extracting context information, identifying authorship, treating redundancy, and showing contradictory information. We also explore the impact of the CST model on superficial summarization methods. Experiments were carried out using journalistic texts written in Brazilian Portuguese. The results show that using the CST model helps improve the informativeness and quality of automatic summaries.
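As a rough illustration of what a rule-based content selection operator over CST relations might look like, the sketch below ranks sentences by how many CST relations they take part in and then applies a redundancy rule. This is not the thesis's actual operator template: the `Relation` class, the function names, the connectivity-based ranking, the choice of Identity, Equivalence, and Subsumption as the redundancy-signalling relations, and the convention that the source of a Subsumption relation subsumes its target are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumption: these CST relation labels are treated as signalling redundancy.
REDUNDANCY_RELATIONS = {"Identity", "Equivalence", "Subsumption"}


@dataclass(frozen=True)
class Relation:
    """Hypothetical record of one CST relation between two sentences."""
    source: str  # id of the first sentence
    target: str  # id of the second sentence
    label: str   # CST relation name, e.g. "Equivalence"


def rank_by_connectivity(sentence_ids, relations):
    """Order sentences by the number of CST relations they take part in."""
    degree = {sid: 0 for sid in sentence_ids}
    for rel in relations:
        degree[rel.source] += 1
        degree[rel.target] += 1
    # sorted() is stable, so ties keep their original order.
    return sorted(sentence_ids, key=lambda sid: -degree[sid])


def apply_redundancy_operator(ranking, relations):
    """Drop one sentence from every pair linked by a redundancy relation."""
    position = {sid: i for i, sid in enumerate(ranking)}
    dropped = set()
    for rel in relations:
        if rel.label not in REDUNDANCY_RELATIONS:
            continue
        if rel.label == "Subsumption":
            # Assumed convention: the source subsumes the target.
            dropped.add(rel.target)
        else:
            # Keep the better-ranked sentence, drop the other one.
            dropped.add(max(rel.source, rel.target, key=position.__getitem__))
    return [sid for sid in ranking if sid not in dropped]


def select_content(sentence_ids, relations, max_sentences):
    """Rank, remove redundancy, then cut to the summary size."""
    ranking = rank_by_connectivity(sentence_ids, relations)
    return apply_redundancy_operator(ranking, relations)[:max_sentences]
```

Calling `select_content` with a list of sentence ids, their annotated relations, and a sentence budget returns the ids chosen for the summary. Operators for other preferences (context, authorship, contradiction) could follow the same pattern by keying on other relation labels, again as an assumption rather than a description of the original templates.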
SUBJECT(S)
CST; content selection; multidocument summarization
Related Documents
- Um sistema de disseminação seletiva da informação baseado em Cross-Document Structure Theory
- CARACTERIZAÇÃO DA COMPLEMENTARIDADE TEMPORAL: SUBSÍDIOS PARA SUMARIZAÇÃO AUTOMÁTICA MULTIDOCUMENTO
- Structure and Coding Content of CST (BART) Family RNAs of Epstein-Barr Virus
- Sumarização de vídeos de histerocopias diagnósticas
- SDIFF: UMA FERRAMENTA PARA COMPARAÇÃO DE DOCUMENTOS COM BASE NAS SUAS ESTRUTURAS SINTÁTICAS