Blog posts

2020

Human-Centered Systems Design and Unconscious Bias

5 minute read

As part of the TAPAS Marie Curie ETN, I had the opportunity, along with my research peers in the program, to listen to a series of talks on unconscious bias and human-centred systems design. Although each talk came from a different perspective, there were common lessons to be drawn from all of them. In this article, I summarise these lessons and reflect a bit on the user-centred design part.

Can we visualise style and speaker separations in Flowtron?

3 minute read

This week NVIDIA released its new TTS model, Flowtron [4]. I worked a lot with the Mellotron and its score parser for the Eurovision AI Song Contest (where we took third place, hooray!), so I was really interested in checking out this new model.

Fiddling with papers: MOSNet

4 minute read

One of the recurring themes in my PhD is the naturalness of signals, in particular the naturalness of speech. It is challenging to define what we mean by naturalness; in fact, that is the problem itself. Even if we could find a definition for it in natural language, it would not give us a mathematical formula for it.

2019

Fiddling with AI singing: the Mellotron

2 minute read

One of my mini PhD goals was to train a full-fledged Tacotron 2 model, but I had neither a project that called for it nor the resources to do it. Then my winter “vacation” came, which meant I could finally fiddle with the Mellotron model.

Transcribing 3 hours of pathological speech

4 minute read

It has again been a month since my last blog post, but here we are. Recently, I started a collaboration in Delft to explore even more aspects of pathological speech. For that, I had to transcribe three hours of oral cancer speech, and I learned many lessons that I think are interesting for other people and might be widely applicable.

My reproducibility practices for ML

2 minute read

I’m currently working on making some of my code reproducible (meaning: producing the exact same results) across different platforms, and some of the lessons learned in the past few days hit so hard that I thought it would be worth sharing them with other people.

My secondment at Oxford Wave Research

4 minute read

As part of my Marie Curie PhD program, I spent two months at Oxford Wave Research, where I learned about forensic speech technology from world-leading experts.

Toulouse: the third TAPAS training event

2 minute read

Before Interspeech, we had a quick opportunity to showcase our results so far, and talk about data management and research ethics.

A glimpse at Interspeech 2019 papers

2 minute read

Interspeech is drawing closer day by day, so I went through the planned schedule to cherry-pick the titles I found a bit more interesting than the others (from the perspective of pathological speech synthesis). I would also like to take this opportunity to look at the authors and showcase some of their impressive results.