PSYC summary

1. What is the general question or issue that motivated the study?

2. What is the specific question or hypothesis that the experiment is designed to address? (How does the general question lead to the specific one?)

3. What is the method of the experiment(s)—the materials and procedure? If multiple experiments were done, why were they done?

4. Identify the dependent and independent variables, and identify any confounds that might flaw the results (if any).

5. What are the results (in a general way)?

6. If you had to design a follow-up study, what might you test next (and how)?

Psychonomic Bulletin & Review 1999, 6 (4), 641-646

Name that tune: Identifying popular recordings from brief excerpts

E. GLENN SCHELLENBERG
University of Toronto, Mississauga, Ontario, Canada

PAUL IVERSON
University of Washington, Seattle, Washington

and

MARGARET C. MCKINNON
University of Toronto, Mississauga, Ontario, Canada

We tested listeners’ ability to identify brief excerpts from popular recordings. Listeners were required to match 200- or 100-msec excerpts with the song titles and artists. Performance was well above chance levels for 200-msec excerpts and poorer but still better than chance for 100-msec excerpts. Performance fell to chance levels when dynamic (time-varying) information was disrupted by playing the 100-msec excerpts backward and when high-frequency information was omitted from the 100-msec excerpts; performance was unaffected by the removal of low-frequency information. In sum, successful identification required the presence of dynamic, high-frequency spectral information.

Funding for this research was provided by a grant awarded to the first author from the Natural Sciences and Engineering Research Council of Canada. We thank Dennis Phillips for extensive discussions about all aspects of the study, Susan Hall for her assistance in preparing Figure 1, and Andrea Halpern, Dan Levitin, John Wixted, and an anonymous reviewer for their insightful comments on earlier versions of the manuscript. Correspondence concerning this article should be addressed to E. G. Schellenberg, Department of Psychology, University of Toronto at Mississauga, Mississauga, ON, L5L 1C6, Canada (e-mail: g.schellenberg@utoronto.ca).

Copyright 1999 Psychonomic Society, Inc.

A song’s identity is specified by its pitch and rhythmic structure. Accordingly, these structures have been the primary focus of psychological research on music (e.g., Jones & Yee, 1993; Krumhansl, 1990). Songs are a particularly interesting domain of study because their identity is determined from abstracted information about relations between tones, rather than from the tones’ absolute characteristics. For example, the frequency (pitch) of the initial tone of “Happy Birthday” can be selected arbitrarily, but the song will retain its identity if the relations (intervals) between tones are preserved. Hence, regardless of whether a song is sung with a high or a low voice, it is recognizable if its intervallic structure is maintained. Differences in tone durations (rhythm) work similarly. Songs can be sung fast or slow and still be recognized (within limits; see Warren, Gardner, Brubaker, & Bashford, 1991), if the durational differences between consecutive tones maintain the correct ratios.

By contrast, the sound quality of musical instruments (timbre) is irrelevant to a song’s identity. “Happy Birthday” is recognizable regardless of whether it is played on a trombone or a piano. Timbre is typically defined by what it is not: characteristics of sounds other than pitch, duration, or amplitude (see, e.g., Dowling & Harwood, 1986; Hajda, Kendall, Carterette, & Harshberger, 1997). Whereas these parameters can be measured on ordinal scales, timbre is multidimensional and difficult to define (Hajda et al., 1997). Nonetheless, we know that listeners’ perception of timbre is a function of static attributes of tones, such as the steady state frequency distribution of harmonics, and of dynamic or time-varying attributes, such as changes in harmonics at tone onsets (see, e.g., Grey, 1977; Iverson & Krumhansl, 1993; McAdams, Winsberg, Donnadieu, De Soete, & Krimphoff, 1995; Pitt & Crowder, 1992).

Although a song’s identity is defined by relational information, this does not preclude the possibility that absolute information about pitch, tempo, or timbre is also stored in auditory memory. Absolute attributes of voices (e.g., pitch and timbre) are irrelevant to a word’s identity, yet talker identity is stored in episodic memory for words (Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994; Palmeri, Goldinger, & Pisoni, 1993). In the experiments conducted by Pisoni and his colleagues, participants typically heard a list of words spoken by different talkers and were asked to identify words that had been presented previously in the list. Consistent with the principle of encoding specificity (Tulving & Thomson, 1973), recognition was best if the same talker said the word both times, but relatively poor when the repeated word was said by a different talker. Voice recognition may be somewhat unique, however, in that listeners appear to rely on different cues for different speakers; for example, some famous voices are recognized equally well when they are presented backward or forward, presumably because listeners are using cues other than those based on dynamic spectral information (Van Lancker, Kreiman, & Emmorey, 1985).

Absolute attributes also play an important role in memory for popular recordings, despite their irrelevance to a song’s identity. When respondents are asked to sing short passages from well-known recordings, they tend to do so at a pitch (Levitin, 1994) and tempo (Levitin & Cook, 1996) that closely approximate those of the original recordings. Anecdotal evidence indicates that listeners can recognize songs rapidly when scanning through radio stations for a song that they like or when participating in radio contests (e.g., “Name that Tune”) that require identification of brief excerpts of recordings. Although it is possible that the limited relational information available in these segments is sufficient for recognition, we suggest that such recognition relies more on absolute information based primarily on timbre rather than on pitch or tempo. (Timbre can also refer to the global sound quality of the recording and orchestration of a particular song.) Indeed, listeners’ ability to perceive differences in timbre is remarkable. For example, sequences of 10-msec tones with identical pitch but different timbres can be distinguished from comparison sequences with the same tones played in a different order (Warren et al., 1991). Moreover, specific musical instruments can be identified in forced-choice tasks involving tones of similarly short durations (Robinson & Patterson, 1995a).

In the present investigation, listeners were asked to identify excerpts from recordings of popular songs that were too brief to contain any relational information. We selected five recordings that were highly popular in North America in the months preceding data collection and, therefore, likely to be familiar to undergraduates. Our goal was twofold: (1) to explore the limits of listeners’ ability to identify recordings from very brief excerpts and (2) to identify stimulus attributes necessary for successful identification. Although our excerpts contained absolute information about pitch and timbre, their brevity (100 or 200 msec) precluded the possibility of identifying words or multiple tones presented successively. Our hypothesis was that listeners would rely on timbre more than on absolute pitch in these brief contexts. Accordingly, the excerpts were altered in some conditions, to examine which attributes were important for identification. Specifically, we altered the distribution of frequencies in the harmonic spectrum through high-pass (frequencies < 1000 Hz attenuated) and low-pass (frequencies > 1000 Hz attenuated) filtering and the dynamic information by playing the excerpts backward. These alterations affected the timbre of the excerpts but had little impact on their perceived pitch. Thus, differential responding across conditions would indicate listeners’ greater reliance on timbre than on absolute pitch.

METHOD

Participants
The listeners were 100 undergraduates enrolled in psychology courses at a medium-sized Canadian university located a few miles from downtown Detroit. Participation in the experiment took approximately 20 min, for which the students received partial course credit. An additional 10 listeners were recruited but excluded from the testing session for failing to meet the inclusion criterion (see the Procedure section).

Apparatus and Stimulus Materials
We searched through “HOT 100” charts in Billboard magazine to select five recordings that were highly popular in North America in the months preceding data collection: (1) “Because You Loved Me,” performed by Celine Dion; (2) “Exhale (Shoop Shoop),” performed by Whitney Houston; (3) “Macarena,” performed by Los Del Rios; (4) “Missing,” performed by Everything But the Girl; and (5) “One Sweet Day,” performed by Mariah Carey and Boyz II Men. The extensive airplay accorded these songs ensured that it was likely that anyone who had listened to popular music during this period had been exposed to all of them. The recordings were purchased on compact disc. An excerpt from each disc was digitally copied onto the hard disk of a Macintosh PowerPC 7100/66AV computer in 16-bit format (sampling rate of 22.05 kHz) using the SoundEdit 16 software program. Excerpt onsets were chosen to be maximally representative of the recordings (experimenters’ judgment); each started on a downbeat at the beginning of a bar. One of the excerpts (“Macarena”) contained no vocals (see Note 1).

There were five experimental conditions. In one condition, the excerpts were 200 msec in duration; this duration was selected so that the task would be challenging but not impossible. In a second condition, the excerpts were shortened to 100 msec by deleting the second half. Frequency spectra at 50 msec from excerpt onsets are illustrated in Figure 1. In a third condition, the 100-msec excerpts were played backward (as in Van Lancker et al., 1985), which disrupted the dynamic information but had no effect on the static (steady state) information. In the remaining two conditions, the original (forward) 100-msec excerpts were high-pass or low-pass filtered (following D. L. Halpern, Blake, & Hillenbrand, 1986, but with a cutoff frequency of 1000 Hz, similar to Compton, 1963), using the SoundEdit program (see Note 2). The stimuli were presented to the listeners binaurally via headphones (Sony CD 550) at a comfortable listening level. Inclusion of 10-msec onset and offset ramps proved to be undetectable to the experimenters, so the excerpts were not ramped.
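The three stimulus manipulations (truncation, reversal, filtering) can be sketched in plain Python. This is a toy illustration of our own, not the SoundEdit 16 processing the authors used; the one-pole filters below only approximate the gradual rolloff described in Note 2, and all function names are ours:

```python
import math

SR = 22050  # sampling rate of the digitized excerpts (22.05 kHz)

def excerpt(samples, dur_ms, sr=SR):
    """Keep only the first dur_ms milliseconds (e.g., 200 -> 100 msec)."""
    return samples[: int(sr * dur_ms / 1000)]

def backward(samples):
    """Reverse playback: the static (steady state) spectrum is unchanged,
    but the dynamic (time-varying) structure is disrupted."""
    return samples[::-1]

def lowpass(samples, cutoff_hz, sr=SR):
    """One-pole low-pass filter; the rolloff is gradual, not absolute."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sr)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)  # y[n] = y[n-1] + a * (x[n] - y[n-1])
        out.append(y)
    return out

def highpass(samples, cutoff_hz, sr=SR):
    """Complement of the low-pass: original minus its low-passed version."""
    return [x - y for x, y in zip(samples, lowpass(samples, cutoff_hz, sr))]

def tone(freq_hz, dur_ms, sr=SR):
    """Pure sine tone, for exercising the filters."""
    n = int(sr * dur_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))
```

For example, with a 1000-Hz cutoff, `lowpass` passes a 200-Hz tone almost unchanged but strongly attenuates a 5000-Hz tone, while `backward` leaves static properties such as `rms` identical and alters only temporal order.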

Procedure
The listeners were tested individually; 20 were assigned to each of five conditions. They wore headphones and sat in front of the computer monitor in a quiet room. A SoundEdit file was open on the computer, which allowed the listeners to see the waveforms for each of the five excerpts. (None of the listeners reported any familiarity with waveforms.) The order of the waveforms was randomized separately for each condition. To hear an excerpt, the listeners used a mouse connected to the computer and clicked on one of the waveforms. The listeners were provided with an answer sheet that listed the five artists and song titles (alphabetical order) and were required to match the five excerpts with the five songs on the answer sheet. This method differed from multiple-choice tasks in that the five judgments from any individual listener were not independent (e.g., one error ensured another error). The listeners were allowed to hear the test excerpts repeatedly and in any order they chose.

Prior to the test session, the participants were informed that there would be a pretest, to verify that they were familiar with the five songs used in the experiment. Because many of the students might have been familiar with the recordings but not with the names of the songs, the pretest also served to familiarize or refamiliarize the participants with the song titles and artists, as was required in the subsequent experiment. The pretest involved presenting a single 20-sec excerpt from each of the recordings and requiring listeners to match the five excerpts with the five song titles and artists, as in the actual experiment. The vocals in these excerpts did not reveal the titles of the songs, and the 20-sec excerpts did not contain the excerpts used in the actual experiment. Only listeners who scored 100% were included in the final sample, but all the participants received course credit, even if they failed to meet the inclusion criterion. The listeners were tested individually or in small groups during the screening process. A delay of several minutes between the screening session and the actual experiment prevented the listeners from retaining a representation of the excerpts in working memory.

RESULTS

For each condition, there were 120 (5 × 4 × 3 × 2 × 1) possible response combinations, each of which was equally likely if the listeners were guessing. The average number of correct responses for these 120 possibilities was one. Because the distribution of scores (number correct) based on chance levels of responding was not normal, the data were analyzed with nonparametric tests. Individual listeners were classified according to whether or not they performed better than chance (score > 1 or score ≤ 1). The probability of getting more than one correct response (two, three, or five correct; see Note 3) was 31/120 if listeners were guessing. Thus, only about 1 in 4 listeners (i.e., 5.17 out of 20 in each condition) should score better than chance, if listeners as a group were guessing. Figure 2 illustrates the number of listeners who performed above chance separately for each condition. Mean scores for each condition (provided below the figure) make it clear that dichotomizing the outcome variable did not affect the overall response pattern.
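The chance baseline quoted above can be checked by brute force. The short Python sketch below (our own check, not part of the original analysis) enumerates all 120 ways a guessing listener could match five excerpts to five titles:

```python
from itertools import permutations

# Score each of the 120 possible response patterns against the true
# matching (taken, without loss of generality, as the identity 0..4).
scores = [sum(g == t for g, t in zip(p, range(5)))
          for p in permutations(range(5))]

n_patterns = len(scores)               # 120 equally likely patterns
mean_score = sum(scores) / n_patterns  # expected score under guessing: 1.0
n_above = sum(s > 1 for s in scores)   # 31 patterns score 2, 3, or 5 correct
# A score of exactly 4 never occurs: one misplaced match forces another
# (cf. Note 3).
```

This reproduces the figures in the text: the expected score is 1, 31/120 is roughly 1 in 4, and 20 × 31/120 ≈ 5.17 of the 20 listeners per condition should beat chance by luck alone.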

Chi-square goodness-of-fit tests were used separately for each condition, to examine whether the number of listeners with scores greater than 1 exceeded chance levels. Performance was much better than chance in the 200-msec condition [χ²(1, n = 20) = 49.89, p < .001], with 19 of 20 listeners performing above chance. Group responding remained above chance for the even briefer 100-msec stimuli [χ²(1, n = 20) = 8.87, p < .005]. Performance was also better than chance in the 100-msec high-pass filtered condition [χ²(1, n = 20) = 15.99, p < .001], but not in the low-pass filtered or backward conditions. A chi-square test of independence confirmed that the number of listeners performing above chance differed across conditions [χ²(4, N = 100) = 30.29, p < .001].
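The first of these statistics can be reproduced from the counts given in the text (19 of 20 listeners above chance, against an expected 20 × 31/120 under guessing). The sketch below applies the standard Pearson goodness-of-fit formula; it is our own check, and the small discrepancy from the printed 49.89 is presumably rounding in the published value:

```python
# Pearson goodness-of-fit for the 200-msec condition.
p_above = 31 / 120                           # chance P(score > 1)
n = 20                                       # listeners in the condition
observed = [19, 1]                           # above chance, not above
expected = [n * p_above, n * (1 - p_above)]  # [5.17, 14.83]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi2 ≈ 49.94, closely matching the reported χ²(1, n = 20) = 49.89.
```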

Performance in the 200-msec condition was superior to levels observed in the 100-msec condition [χ²(1, n = 40) = 8.53, p < .005]. This effect was evident for each of the five recordings and implies that successful identification of the recordings required the presence of dynamic information in the frequency spectrum, because the static (steady state) information and the absolute pitch of the excerpts would have been very similar for the 200- and the 100-msec excerpts. This hypothesis was tested directly in the next comparison, which showed that performance was poorer in the backward 100-msec condition than it was in the forward 100-msec condition [χ²(1, n = 40) = 5.23, p < .05]. This decrement was evident for four of the five songs (all but “One Sweet Day”). Because static spectral information and absolute pitch were exactly the same in these two conditions, inferior performance with the backward excerpts provides confirmation of listeners’ reliance on dynamic information in the frequency spectrum.

[Figure 1 appears here: one spectrum for each excerpt (“One Sweet Day,” “Because You Loved Me,” “Exhale (Shoop Shoop),” “Macarena,” “Missing”), with a 20-dB vertical scale bar.]

Figure 1. Relative amplitude of frequencies between 0 and 10 kHz in the unfiltered, forward excerpts. Spectra were derived using linear predictive coding (LPC) at 50 msec after the onset of each excerpt.

In the next set of analyses, differences in performance as a function of the presence of low-frequency or high-frequency information were examined. Performance in the high-pass filtered condition was no different from levels observed in the original 100-msec condition; the number of listeners scoring above chance increased for two songs (“Because You Loved Me” and “Missing”), decreased for two songs (“Exhale” and “Macarena”), and remained unchanged for one song (“One Sweet Day”). Significant performance decrements were observed, however, in the low-pass condition, as compared with the original 100-msec and the high-pass conditions [χ²(1, n = 60) = 6.54, p < .05]; indeed, the low-pass condition had the fewest above-chance listeners for all the songs but one (“One Sweet Day”). Thus, successful identification of the excerpts depended on the presence of high-frequency, but not on low-frequency, spectral information.

To examine the possibility that listeners were relying solely on vocal cues, rather than the timbre of the overall recordings, we examined song-by-song responding for each of the three conditions in which performance was better than chance. In each condition, absolute levels of performance were highest for the excerpt that did not contain any vocals (“Macarena”).

DISCUSSION

Our listeners were able to identify recordings of popular songs from excerpts as brief as 0.1 sec, provided that dynamic, high-frequency information from the recordings was present in the excerpts. The observed pattern of findings cannot be attributed to absolute-pitch cues or to recognition of specific voices. Rather, the spectra in Figure 1 show that the excerpt with the highest levels of performance (“Macarena,” no vocals) had the densest concentration of energy between 1000 and 8000 Hz, which may have contributed to its relative distinctiveness. Listeners may also have been more familiar with “Macarena” than with the other recordings.

Listeners’ ability to identify complex musical stimuli from a minimal amount of perceptual information is similar to their abilities with speech. For example, 10-msec vowels can be identified reliably (Robinson & Patterson, 1995b; Suen & Beddoes, 1972), as can individual voices from vowel samples as brief as 25 msec (Compton, 1963).

[Figure 2 appears here. Mean scores by condition: 200 msec = 3.45; 100 msec = 1.70; 100 msec backward = 1.00; 100 msec high-pass = 1.90; 100 msec low-pass = 1.20.]

Figure 2. Number of listeners exceeding chance levels (>1 correct response) for each testing condition (ns = 20). Hatched bars indicate conditions in which group performance was significantly better than chance. Mean scores (number of songs identified correctly) are provided below the figure.


When respondents are asked to identify famous voices from a set of 60 different voices, performance starts to exceed chance levels with samples of 250 msec (Schweinberger, Herholz, & Sommer, 1997). The capacity to identify speech stimuli from a minimal amount of information appears to be general enough to extend to other auditory domains—such as music—where the adaptive significance is much less obvious (Roederer, 1984). Although our findings do not imply that recognition of popular songs typically occurs in 100 msec, they provide unequivocal evidence that excerpts this brief contain information that can be used for identification. Moreover, our results reveal that such information is timbral in nature and independent of absolute-pitch cues or changes in pitch and tone durations.

Our results extend those of Levitin (1994; Levitin & Cook, 1996; see also A. R. Halpern, 1989), who reported that memory representations for popular recordings contain absolute information about pitch and tempo. With very brief presentations, however, identification of recordings is primarily a function of timbre rather than of absolute pitch or tempo. Although information about tempo was unavailable in our brief excerpts, pitch is perceptible from tones as brief as 10 msec (Warren et al., 1991). Nonetheless, performance was at chance when our 100-msec excerpts were played backward or low-pass filtered. Because both manipulations would have dramatically disrupted attributes that are critical to timbre (dynamic and static information, respectively) while having little impact on perceived pitch, it appears that timbre is more important than absolute pitch for identifying popular recordings from very brief excerpts. This finding converges with others involving music and speech, which show that timbre (i.e., a specific musical instrument or vowel) is better identified than is pitch when stimuli are extremely brief (Robinson & Patterson, 1995a, 1995b).

The listeners’ dependence on timbre rather than on absolute pitch in the present investigation could stem from (1) the importance of timbral cues (i.e., voice qualities other than pitch) in speech, (2) the relative unimportance of absolute, as compared with relative, pitch in music listening, or (3) both of these factors. Although voices vary in pitch as well as in timbre, differences in pitch (i.e., average fundamental frequency) between talkers of the same sex are relatively small; in a group of 12 women tested by Miller (1983), the SD was less than 2.5 semitones. Nonetheless, most people can rapidly identify many different female (or male) voices, despite similarities in pitch. Because of the multidimensional nature of timbre, voice-quality cues are more distinctive than those based on pitch. Extensive experience discriminating voices on the basis of timbre could, in turn, influence processing in the musical domain.

We also know that the ability to perceive musical pitch in an absolute manner is limited to a relatively small proportion of the population (approximately 1 in 10,000; see Takeuchi & Hulse, 1993). Absolute-pitch possessors can identify a note by name (e.g., C, F♯, etc.) when it is played in isolation (an ability that is qualitatively different from remembering the pitch of a recording). Because such absolute-identification abilities tend to be automatic, they can interfere with relational processing strategies that are more relevant to music listening (Miyazaki, 1993). Moreover, other evidence implies that absolute-pitch processing is actually a relatively primitive auditory strategy. For example, elevated prevalence levels have been reported among mentally retarded individuals, and absolute- rather than relative-pitch processing is the norm for nonhuman vertebrates (Ward & Burns, 1982).

At present, it is unclear why the portion of the spectrum above 1000 Hz is more important for song recognition than the portion below 1000 Hz. The high-pass filtered excerpts differed quantitatively from the low-pass excerpts (e.g., they had more spectral information, because most of the harmonics in the excerpts were above 1000 Hz; see Figure 1), and qualitative differences may also have played a role (e.g., the high frequencies may have been more distinctive). It is also possible that high-frequency timbral information is either perceived or encoded in memory with better detail, as compared with low-frequency information. Interestingly, Compton (1963) used speech samples that were low-pass and high-pass filtered much like our musical excerpts (cutoff frequency of 1020 Hz, rather than 1000 Hz) and reported results similar to ours. His respondents, who were asked to identify the talker, showed marked deficits in performance for low-pass filtered samples, but not for high-pass samples.

Performance levels in the present study were undoubtedly inflated by two factors: (1) allowing the excerpts to be heard repeatedly, which would have enhanced perceptual fluency for the repeated items (Jacoby & Dallas, 1981), and (2) the pretest session, which would have primed listeners’ memories of the songs. Indeed, exposure to the pretest excerpts could have allowed above-chance levels of performance to emerge even among listeners who had limited familiarity with the songs prior to the experiment. These listeners may have met the pretest inclusion criterion by recognizing one or two of the singers, by a process of elimination, by luck, or by a combination of these factors, all of which may have influenced performance in the subsequent test session as well. Because the listeners received course credit even if they failed to meet the inclusion criterion (which excused them from the test session), however, it is unlikely that they falsely claimed familiarity with the tunes. Moreover, the time frame of the experiment prevented the listeners from retaining one or more of the excerpts in working memory. By definition, then, the task required the listeners to rely primarily on representations in long-term memory of greater or lesser permanence. For example, such representations would be relatively permanent (or consolidated) for listeners with extensive familiarity with the tunes, but more temporary (or less consolidated) for other listeners, being retrievable only for the length of the experiment. Regardless, the results make it clear that (1) the brief stimuli contained information that listeners could compare with their representations of the recordings and (2) this information was primarily timbral in nature. Future research could examine the generalizability of these findings with a broader selection of excerpts and a less constrained task. For example, different results might be obtained with recordings of soft-rock tunes or orchestral symphonies or with individual recordings in which the overall timbre is less distinctive. Representations that vary in degree of consolidation could also differ in the way timbre is encoded.

It is important to clarify that absolute attributes in memory representations for popular songs would be stored in combination with the relational information that defines the songs. Adult, child, and infant listeners recognize similarities between sequences of pure tones presented in transposition (different absolute pitch, same pitch and temporal relations; Schellenberg & Trehub, 1996a, 1996b). It is safe to assume, then, that our listeners would recognize previously unheard versions of, say, “Macarena,” performed by different singers, on different instruments, and in a key and tempo different from the original recording. Nonetheless, our results provide converging evidence that memory representations for complex auditory stimuli contain information about the absolute properties of the stimuli, in addition to more meaningful information abstracted from the relations between stimulus components. Indeed, in contexts with an extremely limited amount of information, listeners may rely primarily on the sound quality of the stimuli for successful identification and recognition.

REFERENCES

Compton, A. J. (1963). Effects of filtering and vocal duration upon the identification of speakers, aurally. Journal of the Acoustical Society of America, 35, 1748-1752.

Dowling, W. J., & Harwood, D. L. (1986). Music cognition. San Diego: Academic Press.

Grey, J. M. (1977). Multidimensional perceptual scaling of musical tim- bres. Journal of the Acoustical Society of America, 61, 1270-1277.

Hajda, J. M., Kendall, R. A., Carterette, E. C., & Harshberger, M. L. (1997). Methodological issues in timbre research. In I. Deliège & J. Sloboda (Eds.), Perception and cognition of music (pp. 253-306). Hove, U.K.: Psychology Press.

Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition, 17, 572-581.

Halpern, D. L., Blake, R., & Hillenbrand, J. (1986). Psychoacous- tics of a chilling sound. Perception & Psychophysics, 39, 77-80.

Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attrib- utes of musical timbre. Journal of the Acoustical Society of America, 94, 2595-2603.

Jacoby, L. L., & Dallas, M. (1981). On the relationship between auto- biographical memory and perceptual learning. Journal of Experimen- tal Psychology: General, 110, 306-340.

Jones, M. R., & Yee, W. (1993). Attending to auditory events: The role of temporal organization. In S. McAdams & E. Bigand (Eds.), Think- ing in sound: The cognitive psychology of human audition (pp. 69- 112). Oxford: Oxford Universty Press, Clarendon Press.

Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.

Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence

from the production of learned melodies. Perception & Psychophysics, 56, 414-423.

Levitin, D. J., & Cook, P. R. (1996). Memory for musical tempo: Addi- tional evidence that auditory memory is absolute. Perception & Psychophysics, 58, 927-935.

McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., & Krimp- hoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psy- chological Research, 58, 177-192.

Miller, C. L. (1983). Developmental changes in male/female classifi- cation by infants. Infant Behavior & Development, 6, 313-330.

Miyazaki, K. (1993). Absolute pitch as an inability: Identification of musical intervals in a tonal context. Music Perception, 11, 55-72.

Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376.

Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech percep- tion as a talker-contingent process. Psychological Science, 5, 42-46.

Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic en- coding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, & Cogni- tion, 19, 309-328.

Pitt, M. A., & Crowder, R. G. (1992). The role of spectral and dynamic cues in imagery for musical timbre. Journal of Experimental Psychol- ogy: Human Perception & Performance, 18, 728-738.

Robinson, K., & Patterson, R. D. (1995a). The duration required to identify an instrument, the octave, or the pitch chroma of a musical note. Music Perception, 13, 1-15.

Robinson, K., & Patterson, R. D. (1995b). The stimulus duration re- quired to identify vowels, their octave, and their pitch chroma. Jour- nal of the Acoustical Society of America, 98, 1858-1865.

Roederer, J. G. (1984). The search for a survival value of music. Music Perception, 1, 350-356.

Schellenberg, E. G., & Trehub, S. E. (1996a). Children’s discrimi- nation of melodic intervals. Developmental Psychology, 32, 1039-1050.

Schellenberg, E. G., & Trehub, S. E. (1996b). Natural intervals in music: A perspective from infant listeners. Psychological Science, 7, 272-277.

Schweinberger, S. R., Herholz, A., & Sommer, W. (1997). Recog- nizing familiar voices: Influence of stimulus duration and different types of retrieval cues. Journal of Speech, Language, & Hearing Re- search, 40, 453-463.

Suen, C. Y., & Beddoes, M. P. (1972). Discrimination of vowel sounds of very short duration. Perception & Psychophysics, 11, 417-419.

Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychologi- cal Bulletin, 113, 345-361.

Tulving, E., & Thomson, D. M. (1973). Encoding specificity and re- trieval processes in episodic memory. Psychological Review, 80, 352- 373.

Van Lancker, D., Kreiman, J., & Emmorey, K. (1985). Familiar voice recognition: Patterns and parameters: Part I. Recognition of backward voices. Journal of Phonetics, 13, 19-38.

Ward, W. D., & Burns, E. M. (1982). Absolute pitch. In D. Deutsch (Ed.), The psychology of music (pp. 431-451). New York: Academic Press.

Warren, R. M., Gardner, D. A., Brubaker, B. S., & Bashford, J. A. (1991). Melodic and nonmelodic sequences of tones: Effects of duration on perception. Music Perception, 8, 277-290.

NOTES

1. Although the recording of “Macarena” contained vocals, the excerpt did not.

2. Filtering is actually gradual rather than absolute; some frequencies on the unwanted side of the cutoff point are present with monotonically decreasing amplitude (D. J. Levitin, personal communication, August 1998).

3. A score of four correct was impossible: In the present matching task, one error ensured another error.

(Manuscript received June 23, 1998; revision accepted for publication December 11, 1998.)


Ann. N.Y. Acad. Sci. 1060: 6–16 (2005). © 2005 New York Academy of Sciences. doi: 10.1196/annals.1360.002

Probing the Evolutionary Origins of Music Perception

JOSH MCDERMOTT(a) AND MARC D. HAUSER(b)

(a) Perceptual Science Group, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
(b) Cognitive Evolution Laboratory, Department of Psychology, Harvard University, Cambridge, Massachusetts 02138

ABSTRACT: Empirical data have recently begun to inform debates on the evolutionary origins of music. In this paper we discuss some of our recent findings and related theoretical issues. We claim that theories of the origins of music will be usefully constrained if we can determine which aspects of music perception are innate, and, of those, which are uniquely human and specific to music. Comparative research in nonhuman animals, particularly nonhuman primates, is thus critical to the debate. In this paper we focus on the preferences that characterize most humans’ experience of music, testing whether similar preferences exist in nonhuman primates. Our research suggests that many rudimentary acoustic preferences, such as those for consonant over dissonant intervals, may be unique to humans. If these preferences prove to be innate in humans, they may be candidates for music-specific adaptations. To establish whether such preferences are innate in humans, one important avenue for future research will be the collection of data from different cultures. This may be facilitated by studies conducted over the internet.

KEYWORDS: music; preferences; monkey; consonance; evolution; adaptation

INTRODUCTION

From the standpoint of evolutionary theory, music is among the most puzzling things that people do. As far as we know, music is universal, playing a significant role in every human culture that has ever been documented. People everywhere love music and expend valuable resources in order to produce and listen to it. Yet despite its central role in human culture, the evolutionary origins of music remain a great mystery. Unlike many other things that humans enjoy (e.g., food, sex, and sleep), music confers no obvious value to an organism, and for this reason music has puzzled evolutionary theorists since the time of Darwin.1

Although the adaptive function of music, if any, remains unknown, there is no shortage of proposals for how it might have evolved. Some have noted that music might promote social cohesion in group activities like war or religion; others have proposed a sexually selected role in courtship.1–6 Developmental psychologists have drawn attention to the pacifying effect music has on infant listeners, which could constitute an adaptive function.7 Still others suggest that music was not a product of natural selection and, instead, is a side effect of mechanisms that evolved for other functions.8 Despite the longstanding interest in music’s origins, there has thus far been little empirical data with which to decide between these and other theories (see McDermott and Hauser9 for a review).

Address for correspondence: Josh McDermott, Perceptual Science Group, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, NE20-444, 3 Cambridge Center, Cambridge, MA 02139. Voice: 617-258-9412; fax: 617-253-8335. [email protected]

Rather than continue to speculate on putative adaptive functions, we have focused on gathering further empirical constraints on music’s origins. Our approach is to examine aspects of human music perception, and for each of them attempt to answer three questions: (1) Is the feature in question innate in humans? (2) Is it unique to humans? and (3) Is it specific to music?

Each of these questions plays an important role in thinking about the evolution of music. Capacities that are innate, that is, determined from properties present in an organism at birth, are potential targets for evolutionary explanations, unlike capacities that are learned. The question of uniqueness plays an equally important role, particularly for music, because music is something that only humans do (see recent reviews10,29 for a discussion of animal song). If some feature of human music perception is found to be shared by a nonhuman animal, and that feature is assumed to be homologous to the human feature, then the feature in question must not have evolved for the purpose of making music. Testing for aspects of human music perception (e.g., octave equivalence,11–13 or relative pitch perception11,12,14) can thus place useful constraints on music’s origins. The third question of music specificity is most relevant for features of music perception that have been found to be uniquely human. If some aspect of music perception in humans is found to be innate and uniquely human, the possibility remains that it evolved to serve some uniquely human function other than music, such as language or mathematics. In contrast, perceptual capacities that are innate, unique, and specific to music are strong candidates for adaptations for music. We thus suggest that evolutionary theories of music perception would be well served by posing these three questions about different aspects of music perception.

PREFERENCES

In this paper we will discuss one particular aspect of music perception—preferences—framed by the three questions about innateness, uniqueness, and specificity. Clearly, many preferences that humans have for music are culture specific, as humans tend to prefer the music of their own culture. Preferences for entire pieces or genres of music may, however, be built on more elementary preferences that could themselves be universal and innate in humans. One simple preference that has received great attention in the music literature is that for consonance over dissonance. It has been widely appreciated since at least the time of the Greeks that some combinations of musical notes are more pleasing than others. Although the fact that consonant and dissonant intervals are perceptually distinct seems to follow from what is known about the peripheral auditory system,15–17 it remains unclear why consonance is preferable to dissonance. This preference is generally acknowledged to be widespread among Westerners, but there is surprisingly little data from other cultures to support a claim of universality.18,19 Recent work in developmental psychology, however, suggests that the preference for consonance is either innate or acquired very early, as infants as young as two months seem to exhibit the preference.20–22 There is thus some evidence that the preference is present independent of musical experience, although a larger cross-cultural database would help to augment the existing case.
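As a numerical aside (not from the paper; standard 12-tone equal temperament arithmetic), the consonant intervals at issue correspond to frequency ratios close to small integers, whereas the dissonant ones correspond to more complex ratios. The helper name `et_ratio` is illustrative:

```python
# Equal-temperament frequency ratios: each semitone multiplies frequency by 2^(1/12).
# Consonant intervals (octave, fifth, fourth) land near small-integer ratios such as
# 2/1, 3/2, and 4/3; dissonant ones (minor second, tritone) only approximate complex
# ratios such as 16/15 and 45/32.

def et_ratio(semitones: int) -> float:
    """Frequency ratio of an interval of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)

intervals = {
    "octave (12 st)":        (12, 2 / 1),
    "perfect fifth (7 st)":  (7, 3 / 2),
    "perfect fourth (5 st)": (5, 4 / 3),
    "minor second (1 st)":   (1, 16 / 15),
    "tritone (6 st)":        (6, 45 / 32),
}

for name, (st, ratio) in intervals.items():
    print(f"{name}: ET ratio {et_ratio(st):.4f}, nearby just ratio {ratio:.4f}")
```

For example, the equal-tempered fifth, 2^(7/12) ≈ 1.4983, is within 0.2% of the just 3:2 ratio whose harmonics largely coincide.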

Given the possibility that this and perhaps other elementary preferences are innate, our research has focused on the question of whether such preferences are unique to humans by testing for them in nonhuman primates. A consonance preference in a nonhuman primate would provide evidence that the preference did not evolve for the purpose of making and/or appreciating music, as nonhuman primates do not naturally make music. Conversely, any feature of music found to be uniquely human becomes a candidate for part of an adaptation for music, particularly if there is evidence that it is specific to music. Nonhuman subjects have the additional advantage of being reared in a laboratory setting, in which their exposure to music can be controlled to an extent not possible in humans for practical and ethical reasons. As a result of this high level of control, many of the concerns often voiced about the role of musical exposure in experimental results from human infants can be decisively addressed. We thus tested for various acoustic preferences, including that for consonance over dissonance, in nonhuman primates.

Our subjects in the experiments to be described are two species of New World monkey—cotton-top tamarins and common marmosets. Both species are native to the South American rain forest; their lineage diverged from that of humans approximately 48 million years ago (FIG. 1). They are generally regarded as the most primitive species of monkey, but they are small (weighing on the order of one pound) and harmless, making them useful experimental subjects. Their hearing characteristics have not been well explored, but the audiograms that have been measured in marmosets are similar to those of humans,23 and recent auditory physiology work suggests there may be higher-level similarities as well.24 Recent behavioral work in Japanese monkeys suggests that nonhuman primates can readily discriminate between consonance and dissonance,25 as one would expect given Helmholtzian theory and the recent physiological results that support it. What is unknown is whether nonhuman primates would also prefer consonance over dissonance, as many humans do. All the animals used in our experiments were reared in captivity, and none had ever heard human music prior to the onset of the experiments.

FIGURE 1. Divergence times of some of the relevant taxonomic groups used in studies of the origins and evolution of music. The cotton-top tamarins and common marmosets used in our studies are New World monkeys. (Reproduced with permission from Hauser and McDermott.10)

A METHOD TO MEASURE PREFERENCES

To measure preferences in animals, we used a behavioral method in which subjects were placed in a V-shaped maze26 (FIG. 2); related methods have been developed to test for preferences in birds.27 Each branch of the maze had a speaker at its end, and a subject’s position in the apparatus controlled its auditory environment: one sound was played out of the left speaker when the subject was in the left branch of the maze, and another out of the right speaker when it was in the right branch. The stimulus for a particular side played continuously as long as the animal was on that side, and switched as soon as the animal switched sides. If a subject preferred one of the two sounds over the other, one might expect it to spend more time in the corresponding side of the apparatus, so as to increase its exposure to the preferred sound. We left animals in the apparatus for five-minute sessions and measured the proportion of time they spent on the left and right.

FIGURE 2. Photo of the apparatus used in nonhuman primate experiments. (Reproduced with permission from McDermott and Hauser.26)
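The dependent measure can be sketched as follows. This is a hypothetical reconstruction, not the authors' analysis code, and the session-log format (timestamped side-crossing events) is an assumption; the paper reports only the resulting proportions.

```python
# Compute the proportion of session time spent on each side of the maze from a
# log of (timestamp_seconds, side) entries, one per side-crossing.
# Hypothetical data format for illustration only.

def side_proportions(crossings, session_end):
    """crossings: list of (time, side) pairs sorted by time, first entry at t=0.
    Returns a dict mapping side -> proportion of total session time."""
    totals = {}
    for (t, side), (t_next, _) in zip(crossings, crossings[1:] + [(session_end, None)]):
        totals[side] = totals.get(side, 0.0) + (t_next - t)
    total = sum(totals.values())
    return {s: d / total for s, d in totals.items()}

# Toy 5-minute (300 s) session with three crossings:
log = [(0, "left"), (60, "right"), (90, "left"), (240, "right")]
print(side_proportions(log, session_end=300))  # -> {'left': 0.7, 'right': 0.3}
```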

To verify that the method was appropriate for measuring preferences for sounds, we began by conducting two control experiments. In the first, we presented subjects with a choice between loud (90 dB) and soft (60 dB) white noise. We expected the animals to find the high-amplitude noise aversive, and to thus spend more time on the side of the soft noise. The average results from six tamarins over four sessions are shown in FIGURE 3. The animals exhibited a pronounced bias toward the soft side as early as the first session, an effect that increased in the second session. Between the second and third sessions the side–sound pairings were reversed, to rule out effects due to side biases. Following the reversal, the animals spent an average of 50% of the time on each side. Coupled with the increase in the effect from the first session to the second, this indicates that the animals had acquired a side–sound association that took time to be unlearned. By the fourth session (the second after the reversal), however, the effect had reversed, such that they again spent most of the time on the side with the soft noise. The results suggest that the animals learn to associate a side with a sound and modulate their position in the apparatus to reflect their preferences.

FIGURE 3. Results of the first control experiment, in which animals were presented with a choice between loud and soft white noise. Each bar plots the average data from 6 subjects, as a proportion of the total time spent in the apparatus. Error bars here and elsewhere denote standard errors. The dashed line denotes the reversal of the side assignment that occurred after the second session. (Reproduced with permission from McDermott and Hauser.26)

In a second control experiment, we presented tamarins with a choice between two classes of species-specific vocalizations: chirps that they emit in the presence of food, and screams that they make when being held by a veterinarian. We reasoned that they would be likely to have negative associations with the screams and positive associations with the chirps, and thus might spend more time on the side with the chirps than on the side with the screams. Recordings of the two types of vocalizations were equated in amplitude to minimize loudness differences. The same six tamarins were again run in several five-minute sessions. As shown in FIGURE 4, the tamarins spent more time on average with the chirps than with the screams, providing additional evidence that our method provides an appropriate behavioral assay for measuring preferences for sounds.

FIGURE 4. Results from the second control experiment, comparing tamarin food chirps with distress screams. Data are averages across sessions. (Reproduced with permission from McDermott and Hauser.26)

CONSONANCE AND DISSONANCE

We next proceeded to test for preferences for consonance over dissonance. Before testing our animal subjects with such stimuli, we ran an analogous experiment in humans to confirm that a behavioral method such as ours would demonstrate the consonance preference believed to be widespread in humans. Our human subjects were placed in a room divided in half with a strip of tape (FIG. 5). A concealed speaker was situated on each side of the room, and as in the animal apparatus, each speaker was assigned a particular stimulus. Only one speaker was on at a time, triggered by a subject’s position in the room. Our human subjects were given no instructions and were merely told they would be left in the room for five minutes and videotaped. All subjects were naive as to the purpose of the experiment and were involved for a single session. As with the tamarins, we measured the proportion of time spent on each side.

The consonant stimulus in this experiment was a random sequence of two-note chords, the notes of which were separated by an octave, a fifth, or a fourth. The dissonant stimulus was a similar sequence of minor seconds, tritones, and minor ninths. The notes composing the intervals were synthesized complex tones with ten harmonics. The bass note was always middle C. Each interval was 1.5 s in duration.
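Stimuli of the kind just described could be synthesized as in the sketch below. This is not the authors' code: the sample rate, equal-amplitude harmonics, and the number of chords per sequence are assumptions; the interval sizes (in semitones), the ten harmonics, the fixed middle-C bass, and the 1.5-s chord duration come from the description above.

```python
import numpy as np

SR = 44100          # sample rate in Hz (assumed; not stated in the paper)
MIDDLE_C = 261.63   # Hz, the fixed bass note
DUR = 1.5           # seconds per chord, as described

def complex_tone(f0, dur=DUR, n_harmonics=10, sr=SR):
    """Sum of the first n_harmonics of f0, equal amplitudes (simplifying assumption)."""
    t = np.arange(int(dur * sr)) / sr
    tone = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harmonics + 1))
    return tone / n_harmonics          # normalize so |tone| <= 1

def chord(interval_semitones):
    """Two-note chord: middle C plus a note `interval_semitones` above it."""
    upper = MIDDLE_C * 2 ** (interval_semitones / 12)
    return complex_tone(MIDDLE_C) + complex_tone(upper)

rng = np.random.default_rng(0)
# Consonant intervals: octave (12), fifth (7), fourth (5) semitones.
consonant = np.concatenate([chord(i) for i in rng.choice([12, 7, 5], size=8)])
# Dissonant intervals: minor second (1), tritone (6), minor ninth (13) semitones.
dissonant = np.concatenate([chord(i) for i in rng.choice([1, 6, 13], size=8)])
```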

FIGURE 6 (left) plots the average results for four human subjects, all of whom spent most of their time on the side with the consonant intervals. Typically, a human subject would wander around the room until, by chance, they crossed over the dividing line, thus changing the sound. After moving back and forth across the line a few times, they quickly realized that their position controlled the sound, and thereafter typically spent most of their time on the side of the sound they preferred. These results suggested that our method would be sufficient to demonstrate a consonance preference in nonhuman primates, were they to share this preference with humans.

FIGURE 6 (right) plots the average results for five tamarins. In contrast to the humans, they showed no effect. Note that the animals used in these experiments were the same ones used in the two control experiments, both of which yielded significant effects. Moreover, all 5 animals again showed a preference for loud over soft noise when tested at the conclusion of the consonance experiment, confirming that they had not somehow habituated to the apparatus or method. Rather, it seems that tamarins do not exhibit the preference for consonance over dissonance found in humans, even when tested with analogous methods.

FIGURE 5. Schematic of setup for human control experiments.


SCREECHING

For a second test of whether nonhuman primates might share timbral preferences with humans, we turned to a sound that many humans find highly aversive—the sound of fingernails on a blackboard. We made recordings of a very similar sound produced by scraping a metal garden tool down a glass window; many listeners informally reported the sounds to be very unpleasant. Spectrograms of the sounds we recorded revealed harmonic structure superimposed on broadband noise, similar to what has been previously described.28 Little is known about why such sounds are so unpleasant, or about the relationship between the perceptual effect they have and that of musical stimuli, but given the strength of the reaction evoked in humans, they seemed a promising stimulus with which to test for timbral preferences in nonhuman primates.

We used a concatenation of several screech recordings as an experimental stimulus. For a control stimulus, we generated white noise with the same amplitude envelope as the screech stimulus. This control stimulus was as loud as the screech stimulus, but otherwise sounded quite different, and we intended it to be much less annoying to human listeners. We again began by running an experiment with human subjects, using the same method as for the consonance experiment. As expected, our method revealed a pronounced preference in humans for the white noise over the screech, shown in FIGURE 7a. In contrast, the tamarins showed no evidence of a preference one way or the other, even when run over many sessions (FIG. 7b). Evidently the screeching sounds that are so annoying to most humans are not particularly aversive for tamarins, at least no more so than our amplitude-matched control stimulus.
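An amplitude-matched control of the kind described can be approximated by imposing the screech's slowly varying amplitude envelope on white noise. The sketch below uses a sliding RMS envelope; the windowing method, window length, and the toy "screech" signal are all assumptions, as the paper does not specify how the envelope was extracted.

```python
import numpy as np

def rms_envelope(x, win):
    """Sliding RMS amplitude envelope of signal x over a window of `win` samples."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))

def envelope_matched_noise(x, win=1024, seed=0):
    """White noise with approximately the same amplitude envelope as x."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(x))
    noise /= rms_envelope(noise, win) + 1e-12   # flatten the noise's own envelope
    return noise * rms_envelope(x, win)         # impose the target envelope

# Toy stand-in for a screech recording: a 1-s amplitude-modulated tone.
sr = 44100
t = np.arange(sr) / sr
screech = np.sin(2 * np.pi * 3000 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
control = envelope_matched_noise(screech)
```

The result follows the loudness contour of the original while discarding its spectral (timbral) structure, which is exactly the contrast the control condition needs.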

FIGURE 6. Results from experiment comparing consonant and dissonant musical intervals. (Left) Results for human subjects. (Right) Results for tamarin subjects. (Reproduced with permission from McDermott and Hauser.26)


ARE HUMAN ACOUSTIC PREFERENCES UNIQUELY HUMAN?

Two timbral preferences that are pronounced in humans thus appear to be absent in cotton-top tamarins. We have recently replicated the consonance result in common marmosets (McDermott and Hauser, unpublished data), and although it would be ideal to test other species of primates as well, our results raise the possibility that nonhuman primates may lack the timbral preferences that appear to at least partly underlie human appreciation of music.

One key difference between our primate subjects and our human subjects, however, is that the humans all had a lifetime of exposure to music, as do virtually all humans. The consonance preferences apparently present in young infants suggest that a lifetime of exposure is not necessary to develop the preference, but whatever exposure the infants inevitably had may nonetheless be important. Given this, our results suggest three main possibilities: (1) Simple acoustic preferences for consonance and other stimuli could be innate in humans, and unique to them, given the absence of such preferences in the nonhuman primates we have tested. (2) Such preferences might not be unique to humans and could primarily be the result of exposure to musical stimuli, which our nonhuman primate subjects lacked. (3) Such preferences could require exposure to music but might also involve specialized learning mechanisms that could be unique to humans, and perhaps specific to music.

A key issue, therefore, involves determining the role of exposure to music. One important avenue for future research will be to explore the effects of extended musical exposure on nonhuman animals. If nonhuman animals can develop preferences given enough exposure to human music, domain-general learning mechanisms might then also be responsible for human preferences. Conversely, if animals tested after musical exposure still do not exhibit any of the preferences found in humans, the case for uniqueness would be bolstered, for even with similar auditory experience, humans and nonhumans would exhibit different behavior. Further explorations of the effects of musical exposure on humans could help to determine whether exposure coupled with uniquely human learning mechanisms is involved, or whether the preferences in question are, in fact, innate.

FIGURE 7. Results from experiment comparing a screeching sound to amplitude-matched white noise. (Left) Results for human subjects. (Right) Results for tamarin subjects. (Reproduced with permission from McDermott and Hauser.26)

MUSIC UNIVERSALS STUDY

In an attempt to assess the effect of the varying musical exposure that occurs in different cultures, one of us (J.M.) has set up an experiment on the internet to measure aspects of music perception in people all over the world. Anyone can participate in the Music Universals Study by visiting <http://music.media.mit.edu>. Our goal is to collect large amounts of data from people with vastly different musical cultures, to examine whether any aspects of music perception are invariant across culture. Differences across cultures would suggest an important role for learning. Web-based experiments are not a replacement for conventional cross-cultural studies, as the subject pool is limited to those with internet access, but they are potentially a useful additional tool with which to ask many questions of interest in music perception. The role of musical exposure could thus also be clarified with a richer cross-cultural database.

CONCLUSIONS

We propose that evolutionary theories of music’s origins will be facilitated by investigating whether aspects of music perception are innate in humans, and, of those, whether any are unique to humans and specific to music. Our studies of preferences in nonhuman primates suggest that many simple acoustic preferences that are pronounced in humans are not shared by our primate relatives. Additional research is needed to investigate the role of musical exposure, but such preferences may thus be innate and unique to humans. Given that some of them appear to be specific to music, they are candidates for part of an adaptation for music. We believe that future research investigating the innateness, uniqueness, and specificity of other aspects of music perception will place strong constraints on the evolutionary origins of music.

ACKNOWLEDGMENTS

We are grateful to Matt Kamen, Altay Guvench, Fernando Vera, Adam Pearson, Tory Wobber, Matthew Sussman, and Alex Rosati for their assistance in running the experiments.

[Competing interests: The authors declare that they have no competing financial interests.]

REFERENCES

1. DARWIN, C. 1871. The Descent of Man and Selection in Relation to Sex. John Murray. London.
2. MERKER, B. 2000. Synchronous chorusing and human origins. In The Origins of Music. B. Merker & N. L. Wallin, Eds.: 315–327. The MIT Press. Cambridge, MA.
3. MILLER, G.F. 2001. The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature. 1st ed. Anchor Books. New York.
4. CROSS, I. 2001. Music, cognition, culture, and evolution. Ann. N. Y. Acad. Sci. 930: 28–42.
5. HURON, D. 2001. Is music an evolutionary adaptation? Ann. N. Y. Acad. Sci. 930: 43–61.
6. HAGEN, E.H. & G.A. BRYANT. 2003. Music and dance as a coalition signaling system. Hum. Nat. 14: 21–51.
7. TREHUB, S.E. 2003. The developmental origins of musicality. Nat. Neurosci. 6: 669–673.
8. PINKER, S. 1997. How the Mind Works. 1st ed. Norton. New York.
9. MCDERMOTT, J. & M.D. HAUSER. 2005. The origins of music: innateness, uniqueness, and evolution. Mus. Percept. In press.
10. HAUSER, M.D. & J. MCDERMOTT. 2003. The evolution of the music faculty: a comparative perspective. Nat. Neurosci. 6: 663–668.
11. HULSE, S.H. & J. CYNX. 1985. Relative pitch perception is constrained by absolute pitch in songbirds (Mimus, Molothrus, and Sturnus). J. Comp. Psychol. 99: 176–196.
12. D’AMATO, M.R. 1988. A search for tonal pattern perception in cebus monkeys: why monkeys can’t hum a tune. Mus. Percept. 5: 453–480.
13. WRIGHT, A.A., J.J. RIVERA, S.H. HULSE, et al. 2000. Music perception and octave generalization in rhesus monkeys. J. Exp. Psychol. Gen. 129: 291–307.
14. BROSCH, M., E. SELEZNEVA, C. BUCKS & H. SCHEICH. 2004. Macaque monkeys discriminate pitch relationships. Cognition 91: 259–272.
15. HELMHOLTZ, H.V. & A.J. ELLIS. 1954. On the Sensations of Tone as a Physiological Basis for the Theory of Music. 2nd English ed. Dover Publications. New York.
16. FISHMAN, Y.I., I.O. VOLKOV, M.D. NOH, et al. 2001. Consonance and dissonance of musical chords: neural correlates in auditory cortex of monkeys and humans. J. Neurophysiol. 86: 2761–2788.
17. TRAMO, M.J., P.A. CARIANI, B. DELGUTTE & L.D. BRAIDA. 2001. Neurobiological foundations for the theory of harmony in Western tonal music. Ann. N. Y. Acad. Sci. 930: 92–116.
18. BUTLER, J.W. & P.G. DASTON. 1968. Musical consonance as musical preference: a cross-cultural study. J. Gen. Psychol. 79: 129–142.
19. MAHER, T.F. 1976. “Need for resolution” ratings for harmonic musical intervals: a comparison between Indians and Canadians. J. Cross Cultural Psychol. 7: 259–276.
20. ZENTNER, M.R. & J. KAGAN. 1996. Perception of music by infants. Nature 383: 29.
21. TRAINOR, L.J. & B.M. HEINMILLER. 1998. The development of evaluative responses to music: infants prefer to listen to consonance over dissonance. Infant Behav. Dev. 21: 77–88.
22. TRAINOR, L.J., C.D. TSANG & V.H.W. CHEUNG. 2002. Preference for sensory consonance in two- and four-month-old infants. Mus. Percept. 20: 187–194.
23. SEIDEN, H.R. 1958. Auditory acuity of the marmoset monkey (Hapale jacchus). Unpublished doctoral dissertation, Princeton University.
24. BENDOR, D. & X. WANG. 2005. The neuronal representation of pitch in primate auditory cortex. Nature 436: 1161–1165.
25. IZUMI, A. 2000. Japanese monkeys perceive sensory consonance of chords. J. Acoust. Soc. Am. 108: 3073–3078.
26. MCDERMOTT, J. & M.D. HAUSER. 2004. Are consonant intervals music to their ears? Spontaneous acoustic preferences in a nonhuman primate. Cognition 94: B11–B21.
27. WATANABE, S. & M. NEMOTO. 1998. Reinforcing property of music in Java sparrows (Padda oryzivora). Behav. Processes 43: 211–218.
28. HALPERN, D.L., R. BLAKE & J. HILLENBRAND. 1986. Psychoacoustics of a chilling sound. Percept. Psychophys. 39: 77–80.
29. FITCH, W.T. 2005. The evolution of music in comparative perspective. Ann. N. Y. Acad. Sci. 1060: 29–49.


PSYCHOLOGICAL SCIENCE

Research Article


Copyright © 2003 American Psychological Society VOL. 14, NO. 3, MAY 2003

GOOD PITCH MEMORY IS WIDESPREAD

E. Glenn Schellenberg and Sandra E. Trehub

University of Toronto at Mississauga, Mississauga, Ontario, Canada

Abstract—Here we show that good pitch memory is widespread among adults with no musical training. We tested unselected college students on their memory for the pitch level of instrumental soundtracks from familiar television programs. Participants heard 5-s excerpts either at the original pitch level or shifted upward or downward by 1 or 2 semitones. They successfully identified the original pitch levels. Other participants who heard comparable excerpts from unfamiliar recordings could not do so. These findings reveal that ordinary listeners retain fine-grained information about pitch level over extended periods. Adults’ reportedly poor memory for pitch is likely to be a by-product of their inability to name isolated pitches.

Absolute pitch (AP; also called perfect pitch) is often viewed as a marker of musical giftedness (Takeuchi & Hulse, 1993; Ward, 1999), with an estimated incidence of 1 in 10,000. AP refers to the ability to identify or produce isolated tones in the absence of contextual cues or reference pitches. Upon awakening, for example, AP possessors can label or sing middle C (262 Hz) or concert A (440 Hz). In other words, they have long-term memory for musically relevant pitches, and they remember those pitches by name (Levitin, 1994). AP is thought to differ from other human abilities in its bimodal distribution (Takeuchi & Hulse, 1993): Either you have it or you do not. For people who do not, memory for isolated pitches is thought to fade quickly with the passage of time (Burns, 1999). According to Krumhansl (2000), “pitch memory is approximately equal for possessors and nonpossessors of AP for delays up to one minute, but only AP possessors perform above chance for longer delays” (p. 167). AP possessors do not differ from other musicians in their memory for tone frequencies that are musically irrelevant (e.g., tones outside the musical range, mistuned tones), nor do they differ in their ability to discriminate pitches or in most other musical abilities (Ward, 1999). In short, the uniqueness of AP possessors is restricted to their rapid and effortless identification and production of isolated tones.

AP is found almost exclusively among individuals who began music lessons in early childhood (Takeuchi & Hulse, 1993), which implies a critical period for its acquisition. In one large sample of musicians, 40% of those who began musical training before 4 years of age had AP, compared with 27% who began between ages 4 and 6 years, and 8% who began between ages 6 and 9 years (Baharloo, Johnston, Service, Gitschier, & Freimer, 1998). Although early training is the best predictor of AP, it does not guarantee AP. Genetic factors also make important contributions. For example, individuals with AP are considerably more likely than those without AP to have siblings with AP, even when amount of musical training and age of onset are taken into account (Baharloo, Service, Risch, Gitschier, & Freimer, 2000).

For normally developing children, relative pitch processing is thought to replace absolute pitch processing during the preschool years (Saffran & Griepentrog, 2001; Takeuchi & Hulse, 1993), with only a small minority (i.e., AP possessors) retaining both modes of processing. Relative pitch processing—a widespread skill—lies at the heart of music and its appreciation. For example, identifying a familiar tune (e.g., “The Star Spangled Banner”), whether it is performed at a high pitch level (e.g., sung by a soprano, played on a piccolo) or at a low pitch level (e.g., sung by a baritone, played on a tuba), depends on the listener’s knowledge of pitch relations. Whereas non-AP musicians share AP possessors’ explicit knowledge of musical note names and pitch intervals (i.e., relations between musical notes), they do not share AP possessors’ accurate memory for individual pitches (Benguerel & Westdal, 1991). Nevertheless, given one musical tone, such as C, non-AP musicians can use their knowledge of intervals to identify or generate other musical tones, such as F or G (5 or 7 semitones from C). AP possessors tend to approach such tasks on a tone-by-tone basis, reflecting their bias for absolute over relative processing. As a result, they name intervals more slowly and less accurately than do non-AP musicians, which implicates AP as a nonmusical mode of processing (Miyazaki, 1995).

In contrast to musicians with or without AP, nonmusicians cannot name any musical intervals or tones. Nonetheless, they can identify familiar melodies presented at novel pitch levels, and they notice when such melodies are performed incorrectly (Drayna, Manichaikul, de Lange, Snieder, & Spector, 2001), which confirms the accuracy of their implicit memory for pitch relations. There is speculation that the higher-than-usual incidence of AP in autistic and developmentally delayed populations (Heaton, Hermelin, & Pring, 1998; Heaton, Pring, & Hermelin, 1999; Lenhoff, Perales, & Hickok, 2001a, 2001b; Mottron, Peretz, Belleville, & Rouleau, 1999; Young & Nettlebeck, 1995) stems from deficient relational processing (Ward, 1999). These atypically developing individuals may fail to generalize song-defining pitch relations across pitch levels (e.g., the first four tones of "Twinkle Twinkle Little Star" can be CCGG, DDAA, EEBB, and so on, with the last two tones being 7 semitones higher than the first two).
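The transposition example above can be sketched in a few lines of Python (an illustration of the principle, not code from the study): a melody's identity lives in its interval sequence, which is unchanged when every pitch is shifted by the same amount.

```python
# Illustrative sketch: pitches coded in semitones; transposing a melody
# leaves its interval sequence, and hence its identity, unchanged.

def intervals(pitches):
    """Semitone distance between each pair of adjacent tones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

# First four tones of "Twinkle Twinkle Little Star" at three pitch levels:
ccgg = [0, 0, 7, 7]    # C C G G
ddaa = [2, 2, 9, 9]    # D D A A
eebb = [4, 4, 11, 11]  # E E B B

# All three versions share the interval pattern [0, 7, 0], so a relative
# pitch listener hears them as the same song.
assert intervals(ccgg) == intervals(ddaa) == intervals(eebb) == [0, 7, 0]
```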

Our goal in the present investigation was to demystify the phenomenon of AP by documenting adults' memory for pitch under ecologically valid conditions. We hypothesized that the reportedly poor pitch memory of ordinary adults is an artifact of conventional test procedures, which involve isolated tones and pitch-naming tasks. Isolated tones are musically meaningless to all but AP possessors, and pitch naming necessarily excludes individuals without musical training. Much recent research focuses on knowledge acquired without explicit awareness (e.g., Goshen-Gottstein, Moscovitch, & Melo, 2000; Reber & Allen, 2000; Tillman, Bharucha, & Bigand, 2000). Thus, the absence of explicit memory for pitch level does not preclude relevant implicit knowledge. We also expected that implicit memory for pitch, like most other human abilities, would be normally distributed rather than bimodally distributed.

Previous indications that nonmusicians retain in memory some sensory attributes of music arise from studies that have included meaningful test materials (Bergeson & Trehub, 2002; Halpern, 1989; Levitin & Cook, 1996; Palmer, Jungers, & Jusczyk, 2001; Schellenberg, Iverson, & McKinnon, 1999). For example, college students with limited musical training can identify familiar recordings of popular songs (i.e., songs heard previously at the same pitch level, tempo, and timbre) from excerpts as short as 100 ms (Schellenberg et al., 1999). Such brief excerpts preclude the use of relational cues, forcing listeners to rely on absolute features from the overall timbre or frequency spectrum. When adults sing hit songs from recordings heard repeatedly, almost two thirds of these productions are within 2 semitones of the recorded versions (Levitin, 1994), and their tempo (speed) is within 8% of the originals (Levitin & Cook, 1996). Adults show similar consistency in pitch level and tempo when they sing familiar songs from the folk repertoire (e.g., "Yankee Doodle") on different occasions, even though they would have heard these songs at several pitch levels and tempi (Bergeson & Trehub, 2002; Halpern, 1989).

Address correspondence to Glenn Schellenberg, Department of Psychology, University of Toronto at Mississauga, Mississauga, ON, Canada L5L 1C6; e-mail: [email protected].

PSYCHOLOGICAL SCIENCE

E. Glenn Schellenberg and Sandra E. Trehub

VOL. 14, NO. 3, MAY 2003

Although the song-production data (Bergeson & Trehub, 2002; Halpern, 1989; Levitin, 1994) imply accurate pitch memory, the contributions of cognitive and motor factors are inseparable in these studies. For example, movement patterns associated with song production (i.e., motor memory) may be implicated. Moreover, the limited pitch range of musically untrained individuals may generate pitch consistency that has little to do with memory. Nonetheless, the findings highlight the potential of familiar materials to reveal nonmusicians' memory for acoustic features.

We tested memory for the pitch level of musical recordings heard frequently at one pitch level only. We expected that contextually rich materials would reveal the generality of long-term memory for pitch and the normal distribution of this ability. In Experiment 1, adult listeners heard excerpts from highly familiar recordings. On each trial, the same instrumental excerpt was presented twice, once at the original pitch level and once shifted upward or downward in pitch by 1 or 2 semitones. Participants attempted to identify which excerpt (the first or the second) was presented at the correct pitch level, that is, the only pitch level at which they had heard the recording previously. Experiment 2 was identical except that a different group of listeners made judgments about unfamiliar recordings that were pitch-shifted by 2 semitones. In other words, it was a "control" experiment designed to ascertain whether factors other than pitch memory (e.g., the audio manipulation, composers' use of particular keys) contribute to successful identification.

EXPERIMENT 1

Method

Participants

The participants in Experiment 1 were 48 college students. Recruitment was limited to students familiar with the six television programs from which the stimuli were excerpted. The skewed distribution of musical training (i.e., years of music lessons) was typical of college populations, with a mean of 5.1, a median of 3, and a mode of zero. None of the participants reported having AP.

Stimuli and apparatus

The recordings were instrumental excerpts from six popular television programs: "E.R.," "Friends," "Jeopardy," "Law & Order," "The Simpsons," and "X-Files" (keys of B minor, A major, E-flat major, G minor, C-sharp major, and A minor, respectively). Each recording had multiple instruments, each with multiple pure-tone components. The selection criteria were as follows: (a) popularity with undergraduates, as estimated in a pilot study, and (b) a musical theme with at least 5 s of instrumental music. The theme music was saved as CD-quality sound files on an iMac computer. For five of the six programs, the 5-s instrumental excerpt was from the beginning of the program. For "Jeopardy," the excerpt was from Final Jeopardy. In all cases, the excerpt was selected to be maximally representative of the overall recording.

The excerpts were shifted in pitch by 1 or 2 semitones with Pro Tools (DigiDesign) digital-editing software, which is used commonly in professional recording studios.1 Pitch shifting had no discernible effect on tempo (speed) or overall sound quality. Within each semitone condition, the "incorrect" excerpt for a given musical selection was always shifted in one direction (upward for three, downward for three), to eliminate the option of selecting the middle pitch level and to ensure that correct and incorrect excerpts were presented equally often; the participants were divided into two equal groups, and the direction of pitch shifts was reversed for the two groups. Pitch shifts involved multiplying (for upward shifts) or dividing (for downward shifts) all frequencies in an excerpt by a factor of 1.12 for 2-semitone shifts and 1.06 for 1-semitone shifts. For example, a 2-semitone upward shift involved a change from 262 Hz to 294 Hz.

To eliminate potential cues from the electronic manipulation, we also shifted the pitch level of the correct excerpts. The original excerpts were shifted upward and then downward by 1 semitone (all frequencies multiplied and subsequently divided by 1.06) in the 2-semitone condition and by half a semitone (frequencies multiplied and divided by 1.03) in the 1-semitone condition. The monaural excerpts were presented binaurally over lightweight headphones while participants sat in a sound-attenuating booth. (Sample stimuli are available on the Web at www.erin.utoronto.ca/~w3psygs.)
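The shift factors above follow from equal temperament, in which an n-semitone shift multiplies every frequency by 2^(n/12). A minimal Python check (my own illustration, not the Pro Tools implementation):

```python
# Equal-tempered pitch shifting: an n-semitone shift scales all
# frequencies by 2**(n/12).

def shift_factor(semitones):
    """Frequency multiplier for an upward shift of the given size."""
    return 2 ** (semitones / 12)

# The factors reported in the text, to two decimal places:
assert round(shift_factor(1), 2) == 1.06  # 1-semitone shift
assert round(shift_factor(2), 2) == 1.12  # 2-semitone shift

# The worked example: a 2-semitone upward shift moves 262 Hz to 294 Hz.
assert round(262 * shift_factor(2)) == 294
```

Downward shifts divide by the same factor, which is why shifting up and then down by 1 semitone (multiply then divide by 1.06) restores the original pitch while passing the audio through the same processing.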

Procedure

Participants were tested in two test sessions on different days no more than 1 week apart. The incorrect excerpts were shifted by 2 semitones in one session and by 1 semitone in the other, with order of sessions counterbalanced. The 2-semitone pitch shifts were orthogonal to the 1-semitone shifts, such that the direction of shift was reversed for half of the excerpts across sessions. Each session consisted of five blocks of six trials. Each block had one trial for each excerpt, with trials presented in random order. The first block served as a practice block. On each trial, listeners heard one version of a 5-s excerpt at the original pitch level and another version at the altered (upward or downward) pitch, with the two excerpts separated by 2 s. Order (original-altered or altered-original) was counterbalanced. Participants were told that they would hear two versions of the same theme song on each trial, with one version at the correct pitch and the other version shifted higher or lower. Their task was to identify the excerpt (first or second) at the correct (i.e., usual) pitch level. They received no feedback for correct or incorrect responses. Participants also completed a brief questionnaire about their musical background, and they provided cumulative viewing estimates for each program (i.e., lifetime viewing estimates).

1. A free version of the software (Pro Tools Free) that includes the pitch-shifting function can be downloaded from the Internet (http://www.digidesign.com).

Good Pitch Memory Is Widespread

Results

The outcome measure was the percentage of correct responses. Because order of presentation (1- or 2-semitone change first) and stimulus set (i.e., excerpts shifted upward or downward) did not affect performance or interact with other variables, they were excluded from further consideration. Performance exceeded chance levels (50% correct) for the 1-semitone comparisons (58% correct), t(47) = 4.00, p < .001, and for the 2-semitone comparisons (70% correct), t(47) = 9.40, p < .001, with superior performance on the larger shifts, t(47) = 4.46, p < .001 (see Fig. 1). (This finding was replicated with different listeners and a slightly different task: yes/no judgments for single excerpts rather than selection of one of two alternatives. Performance remained significantly above chance and commensurate with the levels in the main study reported here.) Performance on the first trial of each excerpt significantly exceeded performance on subsequent trials, which implies that increasing exposure to pitch-shifted excerpts interfered with memory for the original pitch level.

For subsequent analyses, performance was calculated across the 1- and 2-semitone conditions. As can be seen in Figure 2, the frequency distribution for performance accuracy approximated a normal curve. Performance was far from perfect, but it was remarkably consistent, with only 3 of 48 participants performing below 50% correct (binomial test, p < .001). Performance was not significantly correlated with musical training, r = −.242, p = .952 (one-tailed).

Differences in performance among the six excerpts were examined with a one-way repeated measures analysis of variance. The analysis revealed that some musical excerpts were identified better than others, F(5, 235) = 5.59, p < .001, with performance at 60% correct or below for some excerpts ("The Simpsons"—57%; "E.R."—60%) and above 70% for others ("Friends"—71%; "X-Files"—71%). Pairwise comparisons revealed better performance for "Friends" and "X-Files" than for the other four excerpts, ps < .03, but no differences between other pairs of excerpts. For all six excerpts, performance exceeded chance levels, ps < .03.

Lifetime-viewing estimates for the TV programs are summarized in Table 1. For each program, the distribution of estimates was positively skewed because some individual estimates for particular programs were extremely high. (For example, "Friends" and "The Simpsons" are broadcast several times daily.) We evaluated the possibility that viewing estimates for a particular TV program predicted performance for that excerpt better than for the other five excerpts. Although the six within-program correlations (e.g., exposure to "X-Files" and pitch memory for "X-Files") were low (highest r = .372, p < .005, for "The Simpsons"; lowest r = .074, p > .5, for "Law & Order"), they were significantly higher than the 30 cross-program correlations (e.g., exposure to "X-Files" and pitch memory for "Friends"), p = .010 (Mann-Whitney test).
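The binomial test on the 3-of-48 result can be sketched as follows. This is a hedged illustration of the logic, assuming each listener is independently equally likely to score above or below 50% under chance and ignoring exact ties at 50%:

```python
from math import comb

# If each of n listeners independently falls above or below chance with
# probability 0.5, the number below chance is Binomial(n, 0.5).

def binom_tail_at_most(n, k, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability that 3 or fewer of 48 participants score below 50% by chance:
p_value = binom_tail_at_most(48, 3)
assert p_value < .001  # consistent with the reported p < .001
```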

EXPERIMENT 2

We conducted a further experiment to rule out the possibility that the results of Experiment 1 were due to singularly appropriate pitch levels for the original excerpts or to obscure electronic cues from the pitch-shifting manipulation. If either of these factors accounted for adults’ ability to identify the familiar recordings in Experiment 1, then similar findings should be obtained with unfamiliar recordings.

Fig. 1. Performance on excerpts pitch-shifted by 1 or 2 semitones (chance = 50%). The excerpts were familiar in Experiment 1 and unfamiliar in Experiment 2. Error bars represent standard errors.

Fig. 2. Distribution of performance levels in Experiment 1 collapsed across 1- and 2-semitone pitch shifts. The distribution, which is centered markedly to the right of chance levels (50% correct), approximates the normal curve (mean = 64%, median = 65%). Scores at the boundary between two categories were grouped in the higher category.

Table 1. Mean estimates of total number of episodes viewed for the television programs in Experiment 1

Program         Viewing estimate
E.R.            55 (114)
Friends         337 (631)
Jeopardy        370 (1,137)
Law & Order     319 (824)
The Simpsons    1,094 (2,258)
X-Files         127 (226)

Note. Standard deviations are in parentheses.


Method

The participants were 48 college students who did not take part in Experiment 1. Recruitment was limited to students who were familiar with the six programs from the previous experiment. The method was the same as in Experiment 1 with two exceptions: (a) Musical excerpts were taken from unfamiliar recordings, and (b) there was a single test session in which excerpts at the original pitch level were paired with excerpts pitch-shifted upward or downward by 2 semitones. Participants were told that they would hear a series of trials in which a musical excerpt would be played twice, once at the original pitch level and once shifted upward or downward in pitch. Their task was to identify the correct, unshifted excerpt (first or second).

In two cases, we replaced the familiar recording from Experiment 1 with an unfamiliar recording by the same composer: The theme from "Silk Stalkings" (in C minor), an HBO police drama from the 1980s, replaced the theme from "Law & Order"; and the theme from "Gremlins" (E-flat major), a 1984 film, replaced the theme from "The Simpsons." In the other instances, the unfamiliar recordings retained the style and instrumentation of the original. A pop song, "Circle of Friends" (A major, by Better Than Ezra), replaced music from "Friends"; theme music from "Match Game" (C major), a game show from the 1980s, replaced the "Jeopardy" theme; music from Ninja Gaiden (B-flat minor), a Nintendo video game, replaced the "E.R." theme; and music from Tenchu Stealth Assassin (E-flat minor), from SONY PlayStation, replaced the theme from "X-Files."

Results

Overall performance did not differ from chance levels (49% correct) and was significantly poorer than performance in the 2-semitone condition of Experiment 1, t(94) = 8.18, p < .001 (see Fig. 1). Performance ranged from 40.1% to 55.7% correct across excerpts but did not exceed chance levels for any excerpt (ps > .2). Clearly, the pitch-shifting procedure did not generate cues that enabled listeners to distinguish unfamiliar, pitch-shifted excerpts from the original versions. Moreover, there was no indication that any pitch level or key, including common keys (e.g., C major, A major), was considered more appropriate than any other.

DISCUSSION

Our results provide unequivocal evidence that adults with little musical training remember the pitch level of familiar instrumental recordings, as reflected in their ability to distinguish the correct version from versions shifted upward or downward by 1 or 2 semitones. Their failure to identify the correct pitch level of unfamiliar musical recordings rules out contributions from potential artifacts of the pitch-shifting process. Long-term memory for pitch that permits successful identification of 1-semitone alterations is especially interesting for two reasons. First, musicians with AP often make 1-semitone errors (Lockhead & Byrd, 1981; Miyazaki, 1988), which raises the possibility that ordinary adults' memory for the pitch level of highly familiar music is similar to AP possessors' memory for isolated pitches. Second, 1 semitone is the smallest meaningful difference in Western music, as well as the smallest difference specified in standard musical notation. Performers may use smaller pitch deviations for expressive purposes, but no musical culture makes systematic use of intervals smaller than a semitone (Burns, 1999).

Thus, contrary to scholarly wisdom, adults with little musical background retain fine-grained information about pitch level over extended periods. This finding advances the case that music listeners construct precise memory representations of music that include absolute as well as relational features (Dowling, 1999). It also demystifies aspects of AP such as its rarity, its bimodal distribution, and the reported critical period for AP acquisition. Once pitch-naming or reproduction requirements are eliminated and familiar materials are used, memory for specific pitch levels seems to be widespread and normally distributed. It is likely that pitch naming rather than pitch memory underlies much about AP, including its apparent bimodal distribution. Pitch naming may be an all-or-none ability, but pitch memory is not. Similarly, the unique pattern of cortical activity in AP possessors (Hirata, Kuriki, & Pantev, 1999; Ohnishi et al., 2001) may reflect distinctive auditory-verbal associations—a consequence of naming—rather than distinctive pitch processing (Zatorre, Perry, Beckett, Westbury, & Evans, 1998).

Although our findings are consistent with some previous accounts, they are notable for demonstrating considerably greater accuracy in pitch memory. For example, Levitin (1994) reported that 44% of adults' sung performances were within 2 semitones of the original recording on both of two test trials. He quantized responses to the nearest semitone, however, which means that 44% of his participants were within 2.5 semitones of the original pitch. He also ignored pitch height (e.g., Cs in different octaves were considered equivalent), such that performance could deviate from the original by no more than 6 semitones. In effect, more than half of his participants were performing at chance levels (deviations of 3 semitones or more) on at least one of two trials. Our more sensitive measure of pitch memory avoids these and other limitations of production tasks.
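The 6-semitone cap mentioned above follows from octave equivalence: when Cs in different octaves count as the same note, pitch distance wraps around modulo 12, so no two pitch classes can be more than 6 semitones apart. A small Python sketch of this point (my own illustration):

```python
# Octave-equivalent pitch distance: pitches are reduced to pitch classes
# (0-11), and distance wraps around the octave.

def pitch_class_distance(a, b):
    """Smallest semitone distance between two pitch classes."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

# C (0) to F-sharp (6) is the worst case; C to B (11) is only 1 semitone.
assert pitch_class_distance(0, 6) == 6
assert pitch_class_distance(0, 11) == 1

# No comparison can deviate by more than 6 semitones under this scoring,
# which is why "chance" in Levitin's (1994) analysis is bounded at 6.
assert max(pitch_class_distance(0, b) for b in range(12)) == 6
```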

If repeated exposure to a recording enhances memory for its pitch level, what accounts for the weak associations between exposure and pitch memory? First, self-reports of television viewing may be inaccurate. Second, individual differences associated with amount of television viewing (e.g., motivation) would complicate matters, as would factors that influence performance on standard AP tasks, such as timbre, pitch register, and pitch class (Takeuchi & Hulse, 1993). The complex association between exposure and pitch memory is exemplified by results for "The Simpsons," which had greater exposure than any other program (Table 1), the strongest association between accuracy and individual differences in exposure, and the poorest overall levels of performance. "The Simpsons" differs from the other familiar programs in that its theme music incorporates upward and downward pitch shifts (transpositions) in 2-semitone steps, which could obscure differences between the original pitch and shifts of 2 semitones or less.

We are not suggesting that the pitch-perception skills of typical college students are equivalent to those of musicians with or without AP. In general, pitch memory and the perception of pitch relations are a function of musical experience (Krumhansl, 2000). Musical training results in enhanced representation of musical features, which is reflected not only in superior performance on tests of explicit musical knowledge (e.g., naming tones, intervals, or keys; playing an instrument), but also in differential neural processing (Ohnishi et al., 2001; Pantev et al., 1998; Schneider et al., 2002). For tasks that do not depend on explicit knowledge, however, nonmusicians' abilities are surprisingly similar to those of musicians. As unfamiliar melodies unfold, trained and untrained listeners have similar expectancies about which tones will follow (Schellenberg, 1996), as do listeners of different ages (Schellenberg, Adachi, Purdy, & McKinnon, 2002). Our measure of memory for pitch level appears to be another test of implicit musical knowledge that is unrelated to musical training.


The hypothesized shift from absolute to relative pitch processing in early childhood (Saffran & Griepentrog, 2001; Takeuchi & Hulse, 1993) is at odds with our results and with considerable evidence of relative-pitch processing in infancy (Trehub, 2000). The discrepancy in experimental results could stem from the divergent procedures used for evaluating pitch processing in children and adults. Aside from the criteria for AP being applied less stringently to children than to adults, isolated pure tones—the stimuli of choice for adults—are used rarely with children. For example, children's reproduction of songs at a consistent pitch level is often offered as evidence of AP (Takeuchi & Hulse, 1993), but similar adult skills (Bergeson & Trehub, 2002; Halpern, 1989) are regarded as residual AP or pseudo-AP (Takeuchi & Hulse, 1993; Ward, 1999) rather than genuine AP. In short, the absolute-to-relative shift in pitch processing may be exaggerated or absent altogether.

On the one hand, adults outperform children on relative-pitch tasks (Schellenberg & Trehub, 1996), and adults with AP identify musical tones more accurately than do children with AP (Miller & Clausen, 1997). On the other hand, preschoolers outperform older children and adults on the acquisition and retention of labels for specific pitches (Crozier, 1997). In this respect, preschoolers are more like older autistic or developmentally delayed individuals whose cognitive inflexibility, language limitations, or focus on local rather than global details may facilitate the acquisition of pitch labels (Heaton et al., 1998, 1999). The prolonged critical period for the acquisition of AP among developmentally delayed children (Lenhoff et al., 2001b) could stem from similar factors. It is intriguing that the critical period for acquiring AP among normally developing children (before age 6 or 7) is the optimal age range for achieving nativelike phonological proficiency in a second language (Flege & Fletcher, 1992). The attentional and cognitive profile of young children may be ideally suited to rote learning, sound reproduction, and the acquisition of word-object or pitch-name associations. Why some children with early musical training acquire AP, as usually defined, and others do not may stem from genetic variations in associative abilities and from unidentified environmental factors.

In conclusion, adults with little explicit knowledge of music differ from musicians with AP, who can label isolated tones, and from musicians without AP, who can label isolated intervals. Nonetheless, the average person has rich representations of familiar music that include implicit memory for pitch level.

Acknowledgments—This research was supported by the Natural Sciences and Engineering Research Council of Canada. We thank Keira Stockdale and Will Huggon for assistance in stimulus preparation and data collection, and Mari Jones, Morris Moscovitch, Bruce Schneider, Laurel Trainor, Bill Thompson, and Lawrence Ward for helpful comments on an earlier draft.

REFERENCES

Baharloo, S., Johnston, P.A., Service, S.K., Gitschier, J., & Freimer, N.B. (1998). Absolute pitch: An approach for identification of genetic and nongenetic components. American Journal of Human Genetics, 62, 224–231.
Baharloo, S., Service, S.K., Risch, N., Gitschier, J., & Freimer, N.B. (2000). Familial aggregation of absolute pitch. American Journal of Human Genetics, 67, 755–758.
Benguerel, A.-P., & Westdal, C. (1991). Absolute pitch and the perception of sequential musical intervals. Music Perception, 9, 105–120.
Bergeson, T.R., & Trehub, S.E. (2002). Absolute pitch and tempo in mothers' songs to infants. Psychological Science, 13, 72–75.
Burns, E.M. (1999). Intervals, scales, and tuning. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 215–264). San Diego, CA: Academic Press.
Crozier, J.B. (1997). Absolute pitch: Practice makes perfect, the earlier the better. Psychology of Music, 25, 110–119.
Dowling, W.J. (1999). Development of music perception and cognition. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 603–625). San Diego, CA: Academic Press.
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of musical pitch recognition in humans. Science, 291, 1969–1972.
Flege, J.E., & Fletcher, K.L. (1992). Talker and listener effects on degree of perceived foreign accent. Journal of the Acoustical Society of America, 91, 370–389.
Goshen-Gottstein, Y., Moscovitch, M., & Melo, B. (2000). Intact implicit memory for newly formed verbal associations in amnesic patients following single study trials. Neuropsychology, 14, 570–578.
Halpern, A.R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition, 17, 572–581.
Heaton, P., Hermelin, B., & Pring, L. (1998). Autism and pitch processing: A precursor for savant musical ability? Music Perception, 15, 291–305.
Heaton, P., Pring, L., & Hermelin, B. (1999). A pseudo-savant: A case of exceptional musical splinter skills. Neurocase, 5, 503–509.
Hirata, Y., Kuriki, S., & Pantev, C. (1999). Musicians with absolute pitch show distinct neural activities in the auditory cortex. NeuroReport, 10, 999–1002.
Krumhansl, C.L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126, 159–179.
Lenhoff, H.M., Perales, O., & Hickok, G. (2001a). Absolute pitch in Williams syndrome. Music Perception, 18, 491–503.
Lenhoff, H.M., Perales, O., & Hickok, G. (2001b). Preservation of a normally transient critical period in a cognitively impaired population: Window of opportunity for acquiring absolute pitch in Williams syndrome. In C.A. Shaw & J.C. McEachern (Eds.), Toward a theory of neuroplasticity (pp. 275–287). Philadelphia: Psychology Press.
Levitin, D.J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics, 56, 414–423.
Levitin, D.J., & Cook, P.R. (1996). Memory for musical tempo: Additional evidence that auditory memory is absolute. Perception & Psychophysics, 58, 927–935.
Lockhead, G.R., & Byrd, R. (1981). Practically perfect pitch. Journal of the Acoustical Society of America, 70, 387–389.
Miller, L.K., & Clausen, H. (1997). Pitch identification in children and adults: Naming and discrimination. Psychology of Music, 25, 4–17.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Perception & Psychophysics, 44, 501–512.
Miyazaki, K. (1995). Perception of relative pitch with different references: Some absolute-pitch listeners can't tell musical interval names. Perception & Psychophysics, 57, 962–970.
Mottron, L., Peretz, I., Belleville, S., & Rouleau, N. (1999). Absolute pitch in autism: A case study. Neurocase, 5, 485–501.
Ohnishi, T., Matsuda, H., Asada, T., Aruga, M., Hirakata, M., Nishikawa, M., Katoh, A., & Imabayashi, E. (2001). Functional anatomy of musical perception in musicians. Cerebral Cortex, 11, 754–760.
Palmer, C., Jungers, M.K., & Jusczyk, P.W. (2001). Episodic memory for musical prosody. Journal of Memory and Language, 45, 526–545.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L.E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392, 811–814.
Reber, A.S., & Allen, R. (2000). Individual differences in implicit learning: Implication for the evolution of consciousness. In R.G. Kunzendorf & B. Wallace (Eds.), Advances in consciousness research: Vol. 20. Individual differences in conscious experience (pp. 227–247). Amsterdam: John Benjamins.
Saffran, J.R., & Griepentrog, G.J. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology, 37, 74–85.
Schellenberg, E.G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition, 58, 75–125.
Schellenberg, E.G., Adachi, M., Purdy, K.T., & McKinnon, M.C. (2002). Expectancy in melody: Tests of children and adults. Journal of Experimental Psychology: General, 131, 511–537.
Schellenberg, E.G., Iverson, P., & McKinnon, M.C. (1999). Name that tune: Identifying popular recordings from brief excerpts. Psychonomic Bulletin & Review, 6, 641–646.
Schellenberg, E.G., & Trehub, S.E. (1996). Children's discrimination of melodic intervals. Developmental Psychology, 32, 1039–1050.
Schneider, P., Scherg, M., Dosch, H.G., Specht, H.J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl's gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience, 5, 688–694.
Takeuchi, A.H., & Hulse, S.H. (1993). Absolute pitch. Psychological Bulletin, 113, 345–361.
Tillman, B., Bharucha, J.J., & Bigand, E. (2000). Implicit learning of tonality: A self-organizing approach. Psychological Review, 107, 885–913.
Trehub, S.E. (2000). Human processing predispositions and musical universals. In N.L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 427–448). Cambridge, MA: MIT Press.
Ward, W.D. (1999). Absolute pitch. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 265–298). San Diego, CA: Academic Press.
Young, R.L., & Nettlebeck, T. (1995). The abilities of a musical savant and his family. Journal of Autism and Developmental Disorders, 25, 231–248.
Zatorre, R.J., Perry, D.W., Beckett, C.A., Westbury, C.F., & Evans, A.C. (1998). Functional anatomy of musical processing in listeners with absolute and relative pitch. Proceedings of the National Academy of Sciences, USA, 95, 3172–3177.

(RECEIVED 6/3/02; REVISION ACCEPTED 8/31/02)
