Publication: Automatic Speech Recognition as a Clinical Tool: Implications for Speech Assessment and Intervention
Abstract
Clinician judgments for speech assessment and treatment are subjective and costly, especially for large datasets. Automatic speech recognition (ASR) systems, designed to transcribe speech, present a possible objective and efficient alternative to clinician judgments. ASR accuracy degrades as speech intelligibility declines, and the projects in this thesis leveraged that effect to test the use of ASR as a clinical tool. We developed noise-augmented ASR (nASR), in which we mixed speech samples with noise before inputting them to the ASR system. Because noise reduces ASR accuracy, nASR requires more highly articulated speech to achieve the same accuracy as standard ASR. The first study investigated the validity of ASR for dysarthria and intelligibility assessment in people with amyotrophic lateral sclerosis. Our results suggested that ASR is not a one-to-one proxy for clinician-provided transcription intelligibility but might be appropriate for coarse stratification of dysarthria severity, especially when using nASR to mitigate a ceiling effect for mildly impaired speech. The second study investigated the impact of KN95 masks on intelligibility and ASR accuracy for healthy speakers. We found acoustic and kinematic evidence that mask-wearers automatically adapted to the mask by increasing their vocal intensity, and that mask-wearing did not affect intelligibility or ASR accuracy in the laboratory setting. Additionally, speaking clearly or loudly (but not slowly) improved intelligibility and ASR accuracy for mask-wearers. The third study evaluated whether feedback from nASR accuracy could elicit clear speech, a commonly used treatment for dysarthric speech. Healthy speakers read sentences in habitual, clear (over-enunciated), and nASR conditions. The nASR noise level was calibrated for each speaker to ensure an appropriate challenge. Clear and nASR speech were more intelligible, clearer, slower, and had increased vowel distinctiveness relative to habitual speech. These results showed that nASR feedback is a viable means of eliciting clear speech and could enable more accessible, independent therapy for speakers with dysarthria. Together, these projects point to avenues of further development of ASR specifically for clinical use, including systems developed for dysarthria assessment, suggestions for mask-wearers to improve ASR accuracy, and a potential new form of clear speech therapy.
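The noise-augmentation step that the abstract describes (mixing a speech sample with noise before passing it to the ASR system) might look roughly like the sketch below. This is an illustration only, not the thesis's implementation: the abstract does not say how the noise level was specified, so the signal-to-noise ratio parameter `snr_db`, the function names `rms` and `mix_at_snr`, and the synthetic tone standing in for a speech recording are all assumptions made here.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio is snr_db.

    Hypothetical helper: the SNR-based parameterisation is an assumption,
    not something the abstract specifies.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR in dB is 20 * log10(speech_rms / noise_rms), so solve for the
    # noise RMS that yields the requested SNR and scale the noise to it.
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a 220 Hz tone in place of a speech recording,
# and seeded Gaussian noise, both 0.1 s long at 16 kHz.
sample_rate = 16000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(1600)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

# A lower snr_db means louder noise, hence a harder recognition task.
noisy_speech = mix_at_snr(speech, noise, snr_db=10.0)
```

In a setup like this, lowering `snr_db` would raise the difficulty of the recognition task, which is one way a per-speaker noise level could in principle be tuned; the calibration procedure actually used in the third study is not described in the abstract.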