An objective evaluation framework for pathological speech synthesis
Published in ITG Conference on Speech Communication | 29.09.2021 - 01.10.2021 | Kiel, 2021
Introduction
The need for systems that process pathological speech is increasingly pressing. However, pathological speech processing remains highly challenging, as our understanding of speech is largely limited to typical, healthy speech. Furthermore, the development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework. In this work, (1) we use existing detection and analysis techniques to propose a general framework for evaluating synthetic speech consistently, and (2) using this framework, we develop a dysarthric voice conversion (VC) system based on CycleGAN-VC and show that it can produce speech with different levels of intelligibility.
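For reference, the original CycleGAN-VC formulation trains two generators, $G_{X\to Y}$ and $G_{Y\to X}$, and two discriminators, $D_X$ and $D_Y$, on non-parallel data using the objective below. We assume here that $X$ denotes acoustic features of healthy speakers and $Y$ those of dysarthric speakers. This direction is suggested by the results table but is not stated explicitly. The weights $\lambda_{cyc}$ and $\lambda_{id}$, and any changes this work makes to the original recipe, are not given on this page.

$$
\begin{aligned}
\mathcal{L}_{adv}(G_{X\to Y}, D_Y) &= \mathbb{E}_{y\sim P_Y}[\log D_Y(y)] + \mathbb{E}_{x\sim P_X}[\log(1 - D_Y(G_{X\to Y}(x)))] \\
\mathcal{L}_{cyc} &= \mathbb{E}_{x\sim P_X}[\lVert G_{Y\to X}(G_{X\to Y}(x)) - x\rVert_1] + \mathbb{E}_{y\sim P_Y}[\lVert G_{X\to Y}(G_{Y\to X}(y)) - y\rVert_1] \\
\mathcal{L}_{id} &= \mathbb{E}_{y\sim P_Y}[\lVert G_{X\to Y}(y) - y\rVert_1] + \mathbb{E}_{x\sim P_X}[\lVert G_{Y\to X}(x) - x\rVert_1] \\
\mathcal{L}_{full} &= \mathcal{L}_{adv}(G_{X\to Y}, D_Y) + \mathcal{L}_{adv}(G_{Y\to X}, D_X) + \lambda_{cyc}\mathcal{L}_{cyc} + \lambda_{id}\mathcal{L}_{id}
\end{aligned}
$$

The generators minimise $\mathcal{L}_{full}$ and the discriminators maximise the adversarial terms. The cycle-consistency term lets the model train without paired healthy/dysarthric recordings of the same utterance. The identity-mapping term encourages the generators to preserve linguistic content.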
Our proposed conversion scheme
A Colab demo of our conversion system is available. Note that the demo uses librosa's phase vocoder instead of Praat's PSOLA, whereas the model was trained with Praat's PSOLA, so the demo's output is expected to be somewhat worse than in the reported experiments.
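To show what the demo's phase-vocoder substitute does, here is a minimal NumPy sketch of phase-vocoder time-stretching. This is the same family of technique that librosa exposes as `librosa.phase_vocoder` and `librosa.effects.time_stretch`. It is not the demo's code. We assume the vocoder is used to change speaking rate (duration) without changing pitch, which is also a common use of PSOLA. The function names and the `n_fft=1024`, `hop=256` settings are illustrative choices of our own.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Hann-windowed STFT; returns a (bins, frames) complex matrix."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(S: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Weighted overlap-add inverse of `stft`."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(S.T, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Normalise by the summed squared window, skipping near-zero edge samples.
    ok = norm > 1e-3
    out[ok] /= norm[ok]
    return out


def phase_vocoder(S: np.ndarray, rate: float, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Read STFT frames at fractional positions spaced `rate` apart, keeping phases coherent."""
    n_bins, n_frames = S.shape
    if n_frames < 2:
        raise ValueError("need at least two STFT frames")
    steps = np.arange(0, n_frames - 1, rate)
    # Phase that each bin's centre frequency advances over one hop.
    expected = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    phase = np.angle(S[:, 0])
    out = np.zeros((n_bins, len(steps)), dtype=complex)
    for t, step in enumerate(steps):
        i = int(step)
        frac = step - i
        # Linearly interpolate magnitudes between neighbouring frames.
        mag = (1 - frac) * np.abs(S[:, i]) + frac * np.abs(S[:, i + 1])
        out[:, t] = mag * np.exp(1j * phase)
        # Deviation from the expected phase advance, wrapped to [-pi, pi].
        dphi = np.angle(S[:, i + 1]) - np.angle(S[:, i]) - expected
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + expected + dphi
    return out


def time_stretch(y: np.ndarray, rate: float, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """rate < 1 lengthens (slows) the signal, rate > 1 shortens it; pitch is kept."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    S = stft(y, n_fft=n_fft, hop=hop)
    return istft(phase_vocoder(S, rate, n_fft=n_fft, hop=hop), n_fft=n_fft, hop=hop)
```

For example, `time_stretch(y, rate=0.8)` returns a version of `y` that is roughly 1.25 times longer at the same pitch. The phase vocoder works on fixed STFT frames and is not pitch-synchronous like PSOLA, which is one plausible reason for the quality gap mentioned above.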
Our proposed evaluation system for rapid development
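Since the system is meant to show different levels of intelligibility, one common objective proxy is the word error rate (WER) of an automatic speech recognition (ASR) transcript, scored against the reference text. The sketch below assumes that choice for illustration only; it is not necessarily the measure used in this work. The function name and the lower-casing normalisation are our own, and the ASR system that would produce the hypothesis transcript is not shown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the word-level edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 0.33
```

In a framework like this, you would compute the score separately for the original, converted, and ground-truth recordings of each speaker. This lets you compare the converted speech with the real dysarthric recordings on the same scale.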
Voice conversion results
Speaker | Original speech | Converted speech | Ground truth |
---|---|---|---|
M01 | |||
M04 | |||
M05 | |||
M07 | |||
M08 | |||
M09 | |||
M10 | |||
M11 | |||
M12 | |||
M14 | |||
M16 | |||
F02 | |||
F03 | |||
F04 | |||
Citation
No official preprint or printed version of this work is currently available. Until one is, please use the following citation if you need to reference the work:
@misc{halpern2021evaluation,
title={An objective evaluation framework for pathological speech synthesis},
author={Bence Mark Halpern and Julian Fritsch and Enno Hermann and Rob van Son and Odette Scharenborg and Mathew Magimai.-Doss},
year={2021},
primaryClass={eess.AS}
}