The cortical auditory system is far more extensive than expected, as it includes not only the entire superior temporal gyrus (STG) but also large portions of the parietal, prefrontal, and limbic lobes. Anatomical evidence suggests that the auditory core constitutes the first stage of auditory cortical processing, with a serial progression from the core outward, first to the surrounding auditory belt and then to the parabelt. Many speech sounds and animal vocalizations, for instance, contain tonal components consisting of a fundamental frequency (f0) and higher harmonics; such sounds are commonly referred to as complex tonal stimuli. We perceive the pitch of a complex tonal stimulus by resolving the overtone harmonics to their fundamental frequency rather than perceiving each frequency separately. We hypothesized that the neural mechanism of such harmonic processing lies close to the tonotopically organized auditory core areas.

Using single- and multi-unit techniques, we recorded neurons from different subdivisions of the core and lateral belt in monkeys while they performed an auditory discrimination task. To examine whether these tonotopically organized auditory areas are best characterized by frequency or pitch, we used 84 pure-tone stimuli (110 Hz to 13.3 kHz) and 56 pitch-shifted monkey vocalizations (coos) consisting of complex-tonal segments, such that f0 was matched to each pure-tone stimulus. Once we identified a neuron's best frequency (BF), acoustic stimuli with the same center frequency but varying timbre were introduced by subtracting f0 and/or harmonics from the stimuli. Based on a neuron's BF and responsiveness to tones and bandpass noise, the recording sites were attributed to the primary and rostral core fields (A1 and R) or the middle and anterior lateral belt fields (ML and AL). The latencies to BF tone stimuli were significantly different across the four regions.
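The stimulus manipulation described above (a complex tone whose timbre changes when f0 or harmonics are subtracted) can be illustrated with a minimal numpy sketch. The parameter values here are hypothetical, not those of the experiment; removing the fundamental demonstrates the classic "missing fundamental" stimulus, whose perceived pitch remains at f0:

```python
import numpy as np

def harmonic_complex(f0, n_harmonics=8, fs=44100, dur=0.3, include_f0=True):
    """Synthesize a complex tonal stimulus: a fundamental (f0) plus
    higher harmonics. With include_f0=False the fundamental is removed,
    yet the perceived pitch remains at f0 (the "missing fundamental")."""
    t = np.arange(int(fs * dur)) / fs
    start = 1 if include_f0 else 2          # skip f0 if it is subtracted
    wave = sum(np.sin(2 * np.pi * k * f0 * t)
               for k in range(start, n_harmonics + 1))
    return wave / np.max(np.abs(wave))      # normalize to unit amplitude

# A 110 Hz complex tone, with and without its fundamental
full = harmonic_complex(110.0)
missing_f0 = harmonic_complex(110.0, include_f0=False)
```

Both waveforms have the same 110 Hz periodicity, but different timbre, which is the contrast exploited when probing whether a neuron represents frequency or pitch.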
The latency to BF tone stimuli was also significantly shorter than the latency to the best f0 harmonic stimuli in A1 but not in the other areas, suggesting a pure-tone frequency representation in A1, whereas pitch may be represented in the other areas. Based on the neurons' frequency-tuning response functions, the relations among multiple response peaks were analyzed. The number of peaks in fields R and AL was greater than in A1 and ML, and the peaks often exhibited harmonic intervals. These neurons may serve as high-resolution filters that extract specific harmonic features based on the frequency relationship between the peaks and the fundamental frequency, and may thus play an important role in harmonic processing and the perception of complex auditory stimuli.

We hypothesized that the rostral superior temporal plane (rSTP) contains the anterior extension of a rostrally directed auditory pathway and, in particular, that auditory subdivisions within the rSTP form the continuation of a stimulus-quality processing stream originating in the auditory core area A1. Previous results from our physiology and imaging studies demonstrated that the STP contains a rostrally directed, hierarchically organized auditory processing stream, with gradually increasing stimulus selectivity, and that this stream extends from the primary auditory area to the superior temporal pole. Extrapolating from the difference in neuronal response latencies between the primary and rostral fields of the core auditory cortex, regions farther rostral on the STP may be expected to show longer response latencies than those observed in caudal areas. One may also predict enhanced selectivity in these neurons for the spectral and temporal properties of the acoustic stimulus. We recorded single-unit activity across the rSTP. The stimulus set consisted of 21 sounds, each 300 ms in length, including synthesized sounds, animal vocalizations (rhesus and other species), and environmental sounds.
At more rostral locations, latencies were longer overall and more variable among neurons. Response selectivity was assessed by several metrics; these were highly variable along the rostral-caudal axis, and no clear trend toward greater selectivity among rostral neurons was evident. Whether neurons in rostral fields exhibit enhanced selectivity awaits further study.

In a separate study, we investigated whether behavioral context could be discerned in the information carried by local field potentials (LFPs) recorded from the rSTP; the LFP represents the average subthreshold activity of the neural population near the electrode tip. LFPs were recorded during an auditory delayed-match-to-sample task: 2-4 auditory stimuli were presented sequentially, with the first sound (sample) identical to the last sound (match) but not to the intervening (nonmatch) sounds. The stimuli consisted of the same 21 sounds, from 7 categories, mentioned above. LFPs were decomposed into energy and phase components via the Hilbert transform. The mutual information between each LFP component (energy and phase) and the stimulus was computed in non-overlapping 4-ms segments of the sound presentation interval. For both sample and match stimuli, LFP phase carried more information than did LFP energy. In the delayed-match-to-sample task, the sample and match stimuli are acoustically identical but appear in different behavioral contexts. Our preliminary analysis indicates that LFP information is significantly greater during presentation of the match stimulus than during presentation of the sample. This suggests that behavioral context affects information encoding by the LFP in the rSTP, and that this cortical region may participate in auditory working memory.
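The two analysis steps above (Hilbert decomposition into energy and phase, then mutual information between an LFP component and the stimulus) can be sketched as follows. This is a simplified illustration on toy data, not the actual analysis code: the FFT-based analytic signal is equivalent to the Hilbert-transform decomposition, and the plug-in MI estimator shown here omits the bias corrections a real analysis would require:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT (equivalent to a Hilbert-transform
    decomposition): the LFP's energy is |z| and its phase is angle(z)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def discrete_mi(x, y):
    """Plug-in mutual information (bits) between two discrete arrays,
    e.g. stimulus identity vs. binned LFP phase in one 4-ms segment."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    for a, b in zip(xi, yi):
        joint[a, b] += 1
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# Decompose a toy "LFP" trace into energy and phase
fs = 1000.0                                  # sampling rate (assumed)
t = np.arange(0, 0.3, 1 / fs)                # 300-ms stimulus interval
lfp = np.cos(2 * np.pi * 10 * t)             # toy 10 Hz oscillation
z = analytic_signal(lfp)
energy, phase = np.abs(z), np.angle(z)

# MI between stimulus identity and a binned LFP feature across trials
stim = np.array([0, 0, 1, 1, 2, 2])          # 3 stimuli, 2 trials each
feature_bins = np.array([0, 0, 1, 1, 2, 2])  # perfectly informative
mi = discrete_mi(stim, feature_bins)         # log2(3) ~ 1.58 bits here
```

In the actual analysis, one MI value would be computed per 4-ms segment and per component (energy, phase), then compared between sample and match presentations.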
Monkeys trained on a task designed to assess auditory recognition memory were impaired after removal of either the rostral STG or the medial temporal lobe (MTL) but were unaffected by lesions of the rhinal cortex (Rh). These results, compared with those obtained in other sensory modalities, in which Rh lesions produce severe impairment, have led us to the tentative conclusion that the monkeys were unimpaired after Rh lesions because they had performed the task using working memory. The apparent inability to form long-term auditory recognition memories suggests that the auditory stimuli were processed without the participation of the Rh cortices, and consequently ablating the Rh cortices had no deleterious effect. This apparent failure in monkeys also stands in contrast to the facility with which humans encode auditory stimuli in long-term memory (LTM), raising the question of whether the human ability is supported in some way by speech and language.

To investigate this possibility, we asked whether humans can store representations of sounds that can be neither repeated nor labeled. Young adult participants were presented with four separate study lists of auditory stimuli differing in the degree to which speech or language could support encoding and storage in LTM: words, pseudowords, nonverbal sounds, and words played backwards (reversed words). Following rapid presentation of a study list, participants performed an unrelated filler task (e.g., counting tones) for 5 min to block rehearsal in working memory, after which they performed an old/new recognition task in which they judged which of the stimuli had been presented for study. Recognition scores were highest for words (81%), somewhat lower for pseudowords and nonverbal sounds (75% each), and lowest by far for reversed words (58%; chance = 50%).
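The recognition scores reported above are percent-correct measures from an old/new judgment. A minimal sketch of how such a score is computed, with hypothetical responses (not the experimental data):

```python
import numpy as np

def recognition_score(old_judgments, is_old):
    """Percent correct in an old/new recognition task: hits on studied
    ('old') items plus correct rejections of unstudied ('new') foils.
    Chance is 50% when old and new items are balanced."""
    old_judgments = np.asarray(old_judgments, dtype=bool)
    is_old = np.asarray(is_old, dtype=bool)
    return 100.0 * (old_judgments == is_old).mean()

# Hypothetical responses: 4 studied items followed by 4 foils
judged_old = [1, 1, 1, 0, 0, 0, 1, 0]
actually_old = [1, 1, 1, 1, 0, 0, 0, 0]
score = recognition_score(judged_old, actually_old)  # 75.0
```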
Our results indicate that memory for auditory stimuli is strongly influenced by their potential association with speech and language: the more that articulation and verbal labeling can be used to support storage of auditory information in LTM, the better the recognition performance appears to be.