Evaluation of the Relationship between Perceptual Speech Quality and the Recognition Accuracy of Speech Recognition Systems in Noisy Environments

André Godoi Chiovato

The goal of this work is to evaluate the distortion of noisy speech signals after they have been enhanced by noise-reduction algorithms. This is done by comparing the word accuracy (%) of a standardized Automatic Speech Recognition (ASR) system with an objective measure of perceptual speech quality (PESQ-MOS score), both obtained after applying noise-reduction methods. The test scenario, composed of the ETSI STQ-Aurora DSR Working Group database and a standardized ASR system, was used to evaluate the following algorithms: WI008 (ETSI STQ-Aurora standard), EMSR (Ephraim and Malah Noise Suppressor Rule algorithm), NMT-PSS (Noise Masking Threshold Power Spectral Subtraction) and EMSR + NMT-PSS (the EMSR algorithm combined with the concept of noise masking threshold). Moreover, a curve that models the relationship between PESQ-MOS score and recognition rate (%) is proposed. Its purpose is to predict, under certain conditions, the recognition system's performance from the PESQ evaluation. This approximation is based on the logistic curve, whose parameters have physical meanings that are validated by experimental results. Finally, some analyses are presented to indicate the favorable and unfavorable effects of the several noise types present in the Aurora 1 database on recognition system performance.
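To illustrate the kind of mapping the abstract describes, here is a minimal sketch of a logistic curve that takes a PESQ-MOS score and returns a predicted recognition rate. The abstract does not give the parametrization or the fitted values, so the three-parameter form, the parameter names (`max_accuracy`, `slope`, `midpoint`) and the default numbers are all assumptions made for illustration only.

```python
import math


def logistic_accuracy(pesq_mos, max_accuracy=100.0, slope=2.0, midpoint=2.5):
    """Predict a recognition rate (%) from a PESQ-MOS score.

    Hypothetical three-parameter logistic curve (not the fitted model
    from this work):
      max_accuracy: upper asymptote, the rate approached for clean speech
      slope:        how quickly accuracy rises as perceptual quality improves
      midpoint:     PESQ-MOS score at which accuracy is half of max_accuracy
    """
    return max_accuracy / (1.0 + math.exp(-slope * (pesq_mos - midpoint)))


if __name__ == "__main__":
    # Illustrative scores only; these are not measurements from the study.
    for score in (1.0, 2.0, 2.5, 3.0, 4.0):
        print(f"PESQ-MOS {score:.1f} -> {logistic_accuracy(score):.1f} % (illustrative)")
```

In this form each parameter has a direct reading: the asymptote bounds the achievable accuracy, the midpoint marks the quality level at which accuracy reaches half that bound, and the slope controls how sharp the transition is. In practice the parameters would be fitted to measured pairs of PESQ-MOS score and word accuracy.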
