What is it about?

Speech models, such as those for speech-to-text, are becoming an integral part of our everyday experiences. However, these models are usually evaluated only on typical speech, because labeled datasets featuring diverse speech types are scarce. To address this, we add transcripts, timestamps, and disfluency labels to the FluencyBank dataset, making it a valuable resource for evaluating speech model performance on stuttered speech. We then compare how existing speech recognition and disfluency detection models perform on typical speech versus stuttered speech.
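
As a rough illustration (not the paper's actual evaluation pipeline), one common way to compare speech recognition performance across speech types is to compute the word error rate (WER) of model transcripts against reference transcripts of the intended speech. The sketch below uses the open-source jiwer package with made-up example utterances; the specific sentences and ASR outputs are hypothetical.

```python
# Illustrative sketch (not from the paper): comparing ASR word error rate (WER)
# on typical vs. stuttered speech with the jiwer package (pip install jiwer).
# References are the *intended* speech; hypotheses are hypothetical ASR outputs.
import jiwer

# Hypothetical typical-speech utterance and its ASR transcript.
typical_ref = "i want to send a message to my sister"
typical_hyp = "i want to send a message to my sister"

# Hypothetical stuttered-speech utterance: the intended words are the same,
# but the ASR output includes transcribed repetitions and part-words.
stuttered_ref = "i want to send a message to my sister"
stuttered_hyp = "i i want to s s send a message to my si sister"

print("WER (typical):  ", jiwer.wer(typical_ref, typical_hyp))
print("WER (stuttered):", jiwer.wer(stuttered_ref, stuttered_hyp))
```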


Why is it important?

Speech-to-text has the power to ease our everyday tasks, such as sending messages or transcribing notes. However, transcription inaccuracies can lead to frustrated users, and our work finds that these errors increase with atypical speech. We hope the release of FluencyBank Timestamped will encourage researchers in the area to consider model performance on a more diverse set of speech, ultimately making speech technology more accessible for all.

Perspectives

I believe speech technology has the power to transform both how we interact with devices on a daily basis and how speech-language pathologists assess speech in the clinic. However, for this technology to be truly impactful, it needs to be adaptable to all users, including those with atypical speech patterns. By introducing FluencyBank Timestamped, we can push the boundaries of model development for diverse types of speech.

Amrit Romana
University of Michigan

Read the Original

This page is a summary of: FluencyBank Timestamped: An Updated Data Set for Disfluency Detection and Automatic Intended Speech Recognition, Journal of Speech, Language, and Hearing Research, October 2024, American Speech-Language-Hearing Association (ASHA),
DOI: 10.1044/2024_jslhr-24-00070.
