Automatic segmentation and discriminative training applied to a connected-digit recognition system




Hidden Markov Models (HMMs) are currently the main approach to the speech recognition problem because of the good performance and high degree of flexibility they can achieve. Unfortunately, this acoustic modeling is not optimal, and some problems still affect its robustness and performance under more realistic conditions. The weakness of the temporal modeling embedded in HMMs is an example of a serious problem without well-defined solutions: the implicit state duration model, with its exponential distribution, may not describe the real duration distributions of linguistic units. The hypothesis of independence between observations is another problem that is difficult to solve, and it is inconsistent with practical experiments, because there is strong correlation between frames in the same acoustic segment. Some models and algorithms have been proposed to overcome or at least attenuate these problems, such as Stochastic Segment Models and Explicit State Duration. This thesis presents an alternative approach to alleviate these problems at relatively low computational cost. Information on phoneme boundaries in time is obtained through an automatic segmentation algorithm and used in a Weighted Viterbi Algorithm in order to penalize models that generate inconsistent segmentations. Good results were achieved under various conditions in a connected-digit application. The current objective is to extend the approach to continuous speech recognition.
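
The remark about the implicit state duration model can be made concrete with a short worked formula. The sketch below is the standard textbook result for a discrete-time HMM, not a derivation taken from the thesis; the symbols a_ii (self-transition probability of state i) and d (number of consecutive frames spent in state i) are notation assumed here for illustration.

```latex
% Probability of staying in state i for exactly d frames:
% (d - 1) self-transitions followed by one exit transition.
P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots

% Expected duration implied by the self-transition probability.
\mathbb{E}[d] = \sum_{d=1}^{\infty} d\, a_{ii}^{\,d-1}(1 - a_{ii}) = \frac{1}{1 - a_{ii}}
```

This is a geometric distribution, the discrete-time counterpart of the exponential: its most likely duration is always a single frame and the probability decays monotonically with d, which is the mismatch with real linguistic units that the abstract points to.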


Markov processes; neural networks (computing); algorithms; word processing; automatic speech recognition
