What is it about?
Speech models, such as those for speech-to-text, are becoming an integral part of our everyday experiences. However, these models are usually evaluated only on typical speech, because labeled datasets featuring diverse speech types are scarce. To address this, we add transcripts, timestamps, and disfluency labels to the FluencyBank dataset, making it a valuable resource for evaluating speech model performance on stuttered speech. We then compare how existing speech recognition and disfluency detection models perform on typical speech versus stuttered speech.
Why is it important?
Speech-to-text has the power to ease our everyday tasks, such as sending messages or transcribing notes. However, transcription inaccuracies can lead to frustrated users, and our work has found that these errors increase with atypical speech. We hope the release of FluencyBank Timestamped will encourage researchers in the area to consider model performance on a more diverse range of speech, ultimately making speech technology more accessible for all.
Read the Original
This page is a summary of: FluencyBank Timestamped: An Updated Data Set for Disfluency Detection and Automatic Intended Speech Recognition, Journal of Speech, Language, and Hearing Research, October 2024, American Speech-Language-Hearing Association (ASHA), DOI: 10.1044/2024_jslhr-24-00070.