Is visual salience top-down or bottom-up?

Listeners perform an amplitude-modulation [AM] detection task by attending to a tone sequence and indicating the presence of intermittent modulated target tones [orange note in Figure 1]. Concurrently, a busy acoustic scene is presented in the background, and subjects are asked to ignore it completely. Background scenes are taken from the JHU DNSS [Dichotic Natural Salience Soundscapes] database, for which behavioral estimates of salience timing and strength have been previously collected [Huang and Elhilali, 2017] [see Materials and methods for details]. In a first experiment, easy and hard AM detection tasks are interleaved in experimental blocks by changing the modulation depth of the target note [easy: 0 dB, hard: -5 dB]. As expected, subjects show a higher overall detection accuracy in the easy condition [75.4%] than in the hard condition [48.2%]. Moreover, target detection [in both easy and hard conditions] is disrupted by the presence of a salient event in the ignored background scenes: detection accuracy drops significantly over a period of up to a second after onset of the salient event [drop in detection accuracy; hard task, t[62] = 5.25, p = 1.96 × 10^-6; easy task, t[62] = 5.62, p = 4.92 × 10^-7]. Salient events attract listeners' attention away from the task at hand and cause a drop in detection accuracy that is proportional to the salience level of background distractors, especially for high- and mid-salience events [hard task - high salience event t[62] = 4.97, p = 5.57 × 10^-6; mid salience event t[62] = 3.70, p = 4.54 × 10^-4; low salience event t[62] = 0.75, p = 0.46; easy task - high salience event t[62] = 4.20, p = 8.54 × 10^-5; mid salience event t[62] = 2.29, p = 0.025; low salience event t[62] = 1.51, p = 0.14]. To further explore the neural underpinnings of changes in the attentional state of listeners, this paradigm is repeated with the easy task while neural activity is measured using electroencephalography [EEG].

Stimulus paradigm during EEG recording.

Listeners are presented with two concurrent sounds in each trial: [top stimulus] a recording of a natural audio clip, which subjects are asked to ignore; and [bottom stimulus] a rhythmic tone sequence, which subjects attend to in order to detect the presence of occasional modulated tones [shown in orange]. A segment of the neural recording from one trial is shown at the bottom. Analyses focus on changes in neural responses due to the presence of salient events in the ambient scene or target tones in the attended scene.

The attended tone sequence is presented at a regular tempo of 2.6 Hz and induces a strong overall phase-locked response around this frequency despite the concurrent presentation of a natural scene in the background. Figure 2A shows the grand average spectral profile of the neural response observed throughout the experiment. The plot clearly displays strong energy at 2.6 Hz, with a left-lateralized fronto-central response [Figure 2A, inset], consistent with activation of Heschl's gyrus and conforming to prior observations of precise phase-locking to relatively slow rates in core auditory cortex [Lütkenhöner and Steinsträter, 1998; Liégeois-Chauvel et al., 2004; Stropahl et al., 2018].
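The tone-locked energy described here can be illustrated with a single-frequency spectral estimate. The following is a minimal sketch on synthetic data, not the authors' analysis pipeline: the function name `energy_at_frequency`, the 100 Hz sampling rate, and the test signal are all assumptions made for illustration.

```python
import cmath
import math


def energy_at_frequency(signal, fs, freq):
    """Return the normalized spectral energy of `signal` at `freq` (Hz).

    The signal is projected onto a complex exponential at `freq`
    (a single-bin discrete Fourier transform) and the squared
    magnitude of the normalized projection is returned.
    """
    n = len(signal)
    acc = 0j
    for k, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * freq * k / fs)
    return abs(acc / n) ** 2


# Synthetic example (not data from the study): a unit-amplitude 2.6 Hz
# sinusoid plus a weaker 7 Hz component, sampled at a hypothetical
# 100 Hz for 10 s so that both components complete whole cycles.
fs = 100.0
times = [k / fs for k in range(1000)]
eeg = [
    math.sin(2 * math.pi * 2.6 * t) + 0.3 * math.sin(2 * math.pi * 7.0 * t)
    for t in times
]

e_tone = energy_at_frequency(eeg, fs, 2.6)   # energy at the tone rate
e_other = energy_at_frequency(eeg, fs, 4.0)  # energy at an unrelated rate
```

In this toy signal the energy concentrates at 2.6 Hz and is essentially absent at 4 Hz, which mirrors the kind of peak marked in the spectral profile.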

Phase-locking results.

[A] Spectral density across all stimuli. The peak in energy at the tone presentation frequency is marked by a red arrow. Inset shows average normalized tone-locking energy for individual electrodes. [B] Spectral density around target tones [top] and salient events [bottom]. Black lines show energy preceding the target or event, while colored lines depict energy following. Note that target tones are fewer throughout the experiment leading to lower resolution of the spectral profile. [C] Change in phase-locking energy across target tones, non-events, and salient events. [D] Change in tone-locking energy across high, mid, and low salience events. Error bars depict ±1 SEM.

Taking a closer look at this phase-locked activity aligned to the tone sequence, the response appears to change during the course of each trial, particularly when coinciding with task-specific AM tone targets, as well as when concurring with salient events in the background scene. Phase-locking near modulated-tone targets shows an increase in 2.6 Hz power relative to the average level, reflecting an expected increase in neural power induced by top-down attention [Figure 2B-top]. The same phase-locked response is notably reduced when tones coincide with salient events in the background [Figure 2B-bottom - blue curve], indicating diversion of resources away from the attended sequence and potential markers of distraction caused by salient events in the ignored background.

We contrast variability of 2.6 Hz phase-locked energy over three windows of interest in each trial: [i] near AM tone targets, [ii] near salient events, and [iii] near tones chosen randomly away from either targets or salient events and used as control baseline responses. We compare activity in each of these windows relative to a preceding window [e.g. Figure 1, post vs. pre-event interval]. Figure 2C shows that phase-locking to 2.6 Hz after target tones increases significantly [t[443]=4.65, p = 4.43 × 10^-6], whereas it decreases significantly following salient events [t[443]=5.89, p < 10^-7], relative to preceding non-target tones. A random sampling of tones away from target tones or salient events does not show any significant variability [t[443]=0.78, p=0.43, Bayes Factor 0.072], indicating a relatively stable phase-locked power in control segments of the experiment away from task-relevant targets or bottom-up background events [2C, middle bar]. Compared to each other, the top-down attentional effect due to target tones is significantly different from the inherent variability in phase-locked responses in control segments [t[886]=3.81, p = 1.48 × 10^-4], while distraction due to salient events induces a decrease in phase-locking that is significantly different from inherent variability in control segments [t[886]=3.58, p = 3.66 × 10^-3].
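The pre versus post comparison can be sketched as a paired t-test on per-window energy values. This is an illustrative assumption about the form of the test: the text reports t statistics but does not spell out the computation, and the helper `paired_t` and the numbers below are hypothetical.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for post minus pre.

    Each pair holds the tone-locked energy in the window preceding
    and the window following one event.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
    return mean_diff / sem, n - 1


# Hypothetical energies for five events (arbitrary units, not study data).
pre_energy = [1.0, 1.2, 0.9, 1.1, 1.0]
post_energy = [1.3, 1.4, 1.0, 1.5, 1.2]

t_stat, dof = paired_t(pre_energy, post_energy)
```

A positive statistic indicates an increase after the event (as reported for targets); a negative one indicates a decrease (as reported for salient events).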

Interestingly, this salience-induced decrease is modulated in strength by the level of salience of background events. The decrease in phase-locked energy is strongest for events with a higher level of salience [t[443]=3.78, p = 1.8 × 10^-4]. It is also significant for events with mid-level salience [t[443]=2.57, p=0.01], and is smaller and not significant for events with the lowest salience [t[359]=1.33, p=0.20, Bayes Factor 0.14] [Figure 2D]. A one-way ANOVA did not show a significant difference between the mean suppression at the three salience levels [F[1329]=1.65, p=0.19].
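For readers unfamiliar with the one-way ANOVA used to compare the three salience levels, the F statistic is the ratio of between-group to within-group mean squares. The sketch below uses a hypothetical helper `one_way_anova_f` and made-up numbers; it shows the standard textbook computation and is not taken from the study's code.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical changes in tone-locked energy for three salience levels.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```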

A potential confound to reduced phase-locking due to distraction could be local acoustic variability associated with salient events, rather than actual deployment of bottom-up attention that disrupts phase-locking to the attended sequence. While this possibility is unlikely given the significant effect of salient events on behavioral detection of targets, we further reassess loss of phase-locking to the attended rhythm near events by excluding salient events with the highest loudness, which could cause energetic masking effects [Moore, 2013]. This analysis confirms that phase-locking to 2.6 Hz is still significantly reduced relative to non-event control moments [t[443]=3.88, p < 10^-3]. A complementary measure of loudness is also explored by excluding events with the highest energy in one equivalent rectangular bandwidth [ERB] around the tone frequency at 440 Hz [Moore and Glasberg, 1983]. Excluding the loudest 25% of events by this measure still yields a significant reduction in tone-locking [t[443]=4.93, p = 1.17 × 10^-6]. In addition, we analyze acoustic attributes of all salient events in background scenes and compare them to those of randomly selected intervals in non-salient segments. This comparison assesses whether salient events have unique acoustic attributes that are never observed at other moments in the scene. A Bhattacharyya coefficient [BC; Kailath, 1967] reveals that salient events share the same global acoustic attributes as non-salient moments in the ambient background across a wide range of features [BC for loudness 0.9655, brightness 0.9851, pitch 0.9867, harmonicity 0.9775 and scale 0.9868].
Moreover, the significant drop in phase-locking is maintained when events are split by strength of low-level acoustic features such as harmonicity or brightness [High Harmonicity, t[443] = 3.75, p = 1.97 × 10^-4; Low Harmonicity, t[443] = 3.77, p = 1.82 × 10^-4; High Brightness, t[443] = 4.18, p = 3.51 × 10^-5; Low Brightness, t[443] = 3.26, p = 1.21 × 10^-3], further validating that the effect of salience is not solely due to low-level acoustic features.
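The Bhattacharyya coefficient between two discrete distributions p and q is the sum over bins of sqrt(p_i * q_i); it equals 1 for identical distributions and 0 for distributions with no overlap. A minimal sketch follows, assuming the feature values are first binned into histograms. The binning, the helper names, and the sample values are assumptions for illustration, not details given in the text.

```python
import math


def histogram(values, edges):
    """Normalized histogram of `values` over the bins defined by `edges`.

    Bins are half-open [lo, hi), except the last, which includes its
    upper edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i) over matching bins."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical normalized feature values (e.g. loudness scaled to 0..1).
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
salient = histogram([0.1, 0.3, 0.6, 0.9], edges)
non_salient = histogram([0.2, 0.4, 0.6, 0.7], edges)

bc_same = bhattacharyya_coefficient(salient, salient)
bc_diff = bhattacharyya_coefficient(salient, non_salient)
```

Values close to 1, such as those reported for loudness, brightness, pitch, harmonicity and scale, indicate that the two sets of moments occupy nearly the same region of feature space.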

The reduction of phase-locking to the attended sequence's rhythm in the presence of salient events raises the question of whether these attention-grabbing instances result in momentarily increased neural entrainment to the background scene. While the ambient scene does not contain a steady rate to examine exact phase-locking, its dynamic nature as a natural soundscape allows us to explore the fidelity of encoding of the stimulus envelope before and after salient events. Generally, synchronization to ignored stimuli tends to be greatly suppressed [Ding and Simon, 2012; Fuglsang et al., 2017]. Nonetheless, we note a momentary enhancement in decoding accuracy after high salience events compared to a preceding period [paired t-test, t[102] = 2.18, p=0.03], though no such effects are observed for mid [t[113]=1.09, p=0.28] and low salience [t[107]=0.24, p=0.81] events [Figure 3].

Reconstruction of ignored scene envelopes from neural responses before and after salient events for high, mid and low salience instances.

The accuracy quantifies the correlation between neural reconstructions and scene envelopes estimated using ridge regression [see Materials and methods]. Error bars depict ±1 SEM.
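The reconstruction accuracy in this caption can be sketched as a ridge-regression decoder followed by a Pearson correlation. The example below is a simplified, dependency-free illustration on synthetic data: it omits the time lags that a real decoder would typically include, and the two-channel signal, the regularization value, and all helper names are assumptions rather than details from the study.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Closed-form ridge weights: (X^T X + lam * I)^-1 X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve_linear(xtx, xty)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic scene envelope and two noisy "channels" that track it.
rng = random.Random(0)
envelope = [2.0 + math.sin(2 * math.pi * 0.05 * t)
            + 0.5 * math.sin(2 * math.pi * 0.013 * t) for t in range(400)]
eeg = [[0.8 * e + rng.gauss(0, 0.1), -0.5 * e + rng.gauss(0, 0.1)]
       for e in envelope]

# Fit on the first half, then reconstruct and score the held-out half.
weights = ridge_fit(eeg[:200], envelope[:200], lam=1.0)
reconstruction = [sum(w * x for w, x in zip(weights, row)) for row in eeg[200:]]
accuracy = pearson(reconstruction, envelope[200:])
```

Comparing such an accuracy value between a window before and a window after each salient event gives the kind of pre versus post contrast plotted in the figure.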

Next, we probe other markers of attentional shift and focus particularly on the Gamma band energy in the neural response [Ray et al., 2008]. We contrast spectral profiles of neural responses after target tones, salient events and during control tones. Figure 4A depicts a time-frequency profile of neural energy around modulated target tones [0 on the x-axis denotes the start of the target tone]. A strong increase in Gamma activity occurs after the onset of target tones and spans a broad spectral bandwidth from 40 to 120 Hz. Figure 4B shows the same time-frequency profile of neural energy relative to attended tones closest to a salient event. The figure clearly shows a decrease in spectral power after the onset of attended tones nearest salient events, which is also spectrally broad, though strongest in a high-Gamma range [60-120 Hz].

High gamma band energy results.

[A] Time-frequency spectrogram of neural responses aligned to onsets nearest modulated targets, averaged across central and frontal electrodes. Contours depict the highest 80% and 95% of the gamma response. [B] Time-frequency spectrogram of tones nearest salient events in the background scene. Contours depict the lowest 80% and 95% of the gamma response. [C] Change in energy in the high gamma frequency band [70-110 Hz] across target tones, non-events, and salient events relative to a preceding time window. [D] Change in high gamma band energy across high, mid, and low salience events. Error bars depict ±1 SEM.

Figure 4C quantifies the variations of Gamma energy relative to targets, salient events, and control tones as compared to a preceding time window. High-Gamma band energy increases significantly following target tones [t[443]=11.5, p
