Publication: Speech Perception and Auditory Attention in Noisy and Multi-Talker Environments
Abstract
A key component of communication involves listening to, perceiving, and comprehending speech accurately to receive information from another individual. More often than not, speech does not occur in quiet: the speech of interest rides on environmental noise, is masked by babble from other talkers, or competes with a talker that is equal in level and close in space. Sustaining listening towards one source while ignoring distractors requires attentional engagement to focus cognitive resources on the stimuli of interest. Unfortunately, some individuals have trouble hearing in background noise even though they have clinically normal pure-tone audiometric thresholds. Here we present three works unified by a motivation to better understand this hearing challenge and to assist individuals who have speech-in-noise (SIN) perception difficulties and/or struggle to control their auditory attention in complex scenes. To that end, this work uses computational modeling and machine learning techniques to simulate, characterize, and augment human auditory perception and attention.
In the first work, we use computational modeling to predict performance on a speech-in-noise task, given various underlying pathologies in the auditory pathway. This gives us insight into the causal relationships between cochlear damage and SIN perception, which cannot yet be derived experimentally in the lab or clinic because speech perception cannot be studied in animals and because there is no established technique for non-invasively measuring the various suspected types of peripheral degradation in living humans. We use a two-stage model to track a speech signal from the auditory periphery to the cortex, where the content of the speech signal is identified. We simulate various cochlear degradation profiles to determine how they alter the auditory nerve representation of a large set of words-in-noise. These auditory nerve responses are then passed to a simulation of the listener’s cortical processing for classification. The cortical model’s SIN perception accuracy across signal-to-noise ratios (SNRs) parallels human SIN perception performance measures collected in clinical settings. Under the normal-hearing (NH) cochlear settings, the two-stage model achieved 50% digit-recognition accuracy at −20.7 dB SNR, approximately matching published SIN perception thresholds of −22 dB SNR recorded from NH participants. These simulations complement the search for a non-invasive clinical diagnostic tool for hearing loss pathologies that may degrade SIN perception. We also show that auditory perception is not determined by peripheral health alone; central processing plays a large part in perception-related tasks.
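The 50% threshold quoted above is the SNR at which recognition accuracy crosses 0.5. As a minimal sketch of how such a threshold can be read off an accuracy-versus-SNR curve, the Python below uses simple linear interpolation between tested SNRs. The function name `snr_at_accuracy` and the sample values are hypothetical illustrations, not data or code from this work, and the abstract does not state which fitting procedure was actually used.

```python
from typing import Sequence


def snr_at_accuracy(
    snrs: Sequence[float],
    accuracies: Sequence[float],
    target: float = 0.5,
) -> float:
    """Linearly interpolate the SNR (dB) where accuracy first rises through `target`.

    Illustrative only: a fitted psychometric function would normally be
    preferred when the accuracy measurements are noisy.
    """
    if len(snrs) != len(accuracies) or len(snrs) < 2:
        raise ValueError("need at least two (SNR, accuracy) pairs of equal length")

    # Sort by SNR so that neighbouring points bracket the crossing.
    points = sorted(zip(snrs, accuracies))
    for (s0, a0), (s1, a1) in zip(points, points[1:]):
        if a0 <= target <= a1 and a1 != a0:
            return s0 + (target - a0) * (s1 - s0) / (a1 - a0)
    raise ValueError("accuracy never crosses the target within the tested SNRs")


# Made-up accuracies at four SNRs, for illustration only.
example_snrs = [-30.0, -25.0, -20.0, -15.0]
example_accuracies = [0.1, 0.3, 0.7, 0.9]
threshold_db = snr_at_accuracy(example_snrs, example_accuracies)
print(f"Estimated 50% threshold: {threshold_db:.1f} dB SNR")
```

A lower (more negative) threshold means the listener, or model, tolerates more noise before accuracy drops to 50%.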
While processing in the ascending auditory pathway is critical for speech perception, descending pathways of attention are known to be especially important in multi-talker environments. A listener’s auditory attention can change how auditory sources in the scene are encoded, based on their importance to the listener. The fields of engineering and neuroscience are developing tools that can leverage a listener’s decoded attention to control an assistive auditory technology called a cognitively-controlled hearing aid (CCHA). In contrast to traditional hearing aids, which provide frequency-dependent amplification of the whole scene, a CCHA can enhance a specific speech source and attenuate distractors, which opens up the possibility of improved speech perception with reduced listening effort. Auditory attention decoding (AAD) is the computational modeling technique that leverages cortical differences in the relative representation, location, and timing of talker sources to determine to whom a listener is attending. In the next two works, we use AAD both to characterize auditory attention in a listening task and as an augmentation tool in a neurofeedback paradigm.
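One common way to realize AAD is to reconstruct a speech envelope from the neural recording and compare it with the envelope of each candidate talker, picking the talker with the higher correlation. The Python sketch below shows only that final comparison step. It assumes the reconstruction has already been produced by some decoder, and the names `pearson` and `decode_attended_talker`, as well as the sample values, are hypothetical. The abstract does not specify the decoder used in this work, so this should not be read as its method.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def decode_attended_talker(
    reconstructed: Sequence[float],
    envelope_a: Sequence[float],
    envelope_b: Sequence[float],
) -> Tuple[str, float, float]:
    """Label the talker whose envelope correlates best with the reconstruction."""
    r_a = pearson(reconstructed, envelope_a)
    r_b = pearson(reconstructed, envelope_b)
    return ("A" if r_a >= r_b else "B"), r_a, r_b


# Made-up envelopes for one short decision window, for illustration only.
reconstructed = [0.1, 0.4, 0.9, 0.3, 0.2]
talker_a = [0.0, 0.5, 1.0, 0.4, 0.1]
talker_b = [0.9, 0.2, 0.1, 0.6, 0.8]
label, r_a, r_b = decode_attended_talker(reconstructed, talker_a, talker_b)
print(f"Decoded talker {label} (r_A={r_a:.2f}, r_B={r_b:.2f})")
```

In practice the decision window length trades off speed against accuracy: shorter windows respond faster to attention switches but yield noisier correlations.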
First, we study voluntary switches in attention between competing speech sources in a novel attention paradigm. This study provides AAD-derived measures of naturalistic switches of attention, complementing previous work that evaluated attention decoding algorithms only by artificially concatenating non-continuous data across experimental conditions. With voluntary attention switches, we find that expended listening effort, as measured by simultaneous electroencephalogram (EEG) alpha power and pupillometry, is a strong indicator of whether the listener sustains attention or shifts attention from one talker to another (minimum parietal alpha power measure, p = 0.016, and peak pupil diameter measure, p = 0.034). We also find evidence that alpha-power-based spatial measures reflect increased suppression of the distractor source following a switch in attention to a new source (p = 0.042). These physiological measures of naturalistic switching could be incorporated into a CCHA that uses multiple modalities in its model of shifting attention between multiple speech sources. Novel approaches to decoding attention remain valuable because current decoding leaves room for improvement in both accuracy and speed. Moreover, some listeners struggle severely with the attention task, which limits the quality of the attention signal available to be decoded.
The last work is motivated by the goal of developing a training paradigm to rehabilitate and assist listeners with SIN perception and attention difficulties. We implement a customized closed-loop neurofeedback system that uses real-time decoded auditory attention to indicate to listeners how well they are performing on a continuous speech attention task. During neurofeedback trials, participants are asked to attend to one talker while the closed-loop system modifies the presentation level of the unattended talker in response to how well they are attending to the correct talker. We then quantify talker representation measures over the course of the neurofeedback training session to provide insight into how the neurofeedback affects entrainment to the attended talker and suppression of the unattended talker. We had hypothesized that attended-talker entrainment would strengthen over the course of the neurofeedback session, but instead found evidence of increased unattended-talker suppression, that is, a weakened decoded representation of the distractor speech stream over the session (p = 0.012). This foundational work provides the engineering and scientific groundwork for a future multi-session clinical trial of an auditory attention training paradigm.
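The closed-loop idea described above can be illustrated with a small update rule that adjusts the unattended talker's level from a decoded attention score. The Python below is a sketch under stated assumptions only: the abstract does not give the mapping, its direction, the step size, or the level limits, so the choice to attenuate the distractor when attention is good, the function name `update_distractor_level`, and all parameter values are hypothetical.

```python
def update_distractor_level(
    current_db: float,
    attention_score: float,
    step_db: float = 1.0,
    min_db: float = -20.0,
    max_db: float = 0.0,
) -> float:
    """Return the next distractor level in dB relative to the target talker.

    `attention_score` is assumed to lie in [-1, 1], where positive values
    mean the decoder favours the target talker. ASSUMPTION: better attention
    lowers the distractor level; the real system may use another mapping.
    """
    if not -1.0 <= attention_score <= 1.0:
        raise ValueError("attention_score must be in [-1, 1]")
    if min_db > max_db:
        raise ValueError("min_db must not exceed max_db")

    # Move the level in proportion to the score, then clamp to the allowed range.
    proposed = current_db - step_db * attention_score
    return max(min_db, min(max_db, proposed))


# Simulate a few feedback updates with made-up attention scores.
level_db = -6.0
for score in [0.5, 0.5, -1.0, 1.0]:
    level_db = update_distractor_level(level_db, score)
    print(f"score={score:+.1f} -> distractor level {level_db:.1f} dB")
```

Clamping keeps the distractor within a bounded range so that the feedback stays audible and never raises the distractor above the target level in this sketch.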