An objective evaluation framework for pathological speech synthesis

Published in ITG Conference on Speech Communication | 29.09.2021 - 01.10.2021 | Kiel, 2021

Introduction

The need for systems that process pathological speech is increasingly pressing. However, pathological speech processing remains a highly challenging area, as our understanding of speech is largely limited to typical, healthy speech. Furthermore, the development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework. In this work, we (1) use existing detection and analysis techniques to propose a general framework for evaluating synthetic speech in a consistent manner, and (2) use the proposed framework to develop a dysarthric voice conversion (VC) system based on CycleGAN-VC. We show that the developed system can produce speech exhibiting different levels of intelligibility.

Our proposed conversion scheme

A Colab demo of our conversion system is available here. Please note that the demo uses librosa’s phase vocoder instead of Praat’s PSOLA, whereas the model was trained with Praat’s PSOLA, so somewhat lower performance than in the experiments is expected.

Our proposed evaluation system for rapid development

Voice conversion results

Speaker | Original speech | Converted speech | Ground truth
M01
M04
M05
M07
M08
M09
M10
M11
M12
M14
M16
F02
F03
F04

Citation

Currently, no official pre-print or print of the work exists. For now, use the following citation if you have to reference the work:

@misc{halpern2021evaluation,
    title={An objective evaluation framework for pathological speech synthesis},
    author={Bence Mark Halpern and Julian Fritsch and Enno Hermann and Rob van Son and Odette Scharenborg and Mathew Magimai.-Doss},
    year={2021},
    primaryClass={eess.AS}
}