How babies (really) learn first words

U. PENN (US) — The leading theory on how children learn their first words may need revision, according to new research.

The current, long-standing theory suggests that children learn their first words through a series of associations: they associate the words they hear with multiple possible referents in their immediate environment. Over time, children track both the words and the elements of the environment they co-occur with, eventually narrowing down which common element each word must refer to.
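
The associative account can be sketched as a simple co-occurrence tally. The toy model below is purely illustrative (the function name and data are invented for this sketch, not taken from the study): every object present when a word is heard gets a count, and the learner’s guess is the object that co-occurs with the word most often.

```python
from collections import Counter, defaultdict

def cross_situational_learn(exposures):
    """Toy associative learner: tally word-object co-occurrences across
    scenes and guess the object most often present when the word is heard."""
    counts = defaultdict(Counter)
    for word, objects_in_scene in exposures:
        for obj in objects_in_scene:
            counts[word][obj] += 1
    # The learner's guess for each word is its most frequent co-occurring object.
    return {word: tally.most_common(1)[0][0] for word, tally in counts.items()}

# "ball" is present every time the word is spoken; the distractors vary.
exposures = [
    ("ball", {"ball", "dog", "cup"}),
    ("ball", {"ball", "chair"}),
    ("ball", {"ball", "dog"}),
]
print(cross_situational_learn(exposures))  # {'ball': 'ball'}
```

The trouble the article describes is that real scenes offer essentially unbounded candidate referents, so the table of counts such a learner must maintain grows far beyond what human memory could plausibly track.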

“This sounds very plausible until you see what the real world is like,” says Lila Gleitman, a psychology professor at the University of Pennsylvania. “It turns out it’s probably impossible.”

“The theory is appealing as a simple, brute-force approach,” says postdoctoral fellow Tamara Nicol Medina. “I’ve even seen it make its way into parenting books describing how kids learn their first words.”

The new study by Gleitman and colleagues suggests that learning those first words occurs in moments of insight rather than gradually through repeated exposure. The findings are published in the journal Proceedings of the National Academy of Sciences.

Real-world learning
Experiments supporting the associative word-learning theory generally involve a series of pictures of objects, shown in pairs or small groups against a neutral background. The real world, in contrast, offers an infinite number of possible referents, which can change in type or appearance from instance to instance and may not even be present each time the word is spoken.

A small set of psychologists and linguists, including members of the Penn team, have long argued that the sheer number of statistical comparisons necessary to learn words this way is simply beyond the capabilities of human memory. Even computational models designed to compute such statistics must implement shortcuts and do not guarantee optimal learning.

“This doesn’t mean that we are bad at tracking statistical information in other realms, only that we do this kind of tracking in situations where there are a limited number of elements that we are associating with each other,” says psychology professor John Trueswell. “The moment we have to map the words we hear onto the essentially infinite ways we conceive of things in the world, brute-force statistical tracking becomes infeasible. The probability distribution is just too large.”

Eureka moment
To demonstrate this, the researchers conducted three related experiments, all involving short video segments of parents interacting with their children. Subjects, both adults and preschool-aged children, watched these videos with the sound muted except when the parent said a particular target word, which subjects were asked to guess; the target word was replaced with a beep in the first experiment and with a nonsense placeholder word in the second and third.

The first experiment was designed to determine how informative the vignettes were in connecting the target word to its meaning. If more than half of the subjects could correctly guess the target word, the vignette was deemed High Informative, or HI. If fewer than a third could, it was deemed Low Informative, or LI. The latter vastly outnumbered the former: of the 288 vignettes, 7 percent were HI and 90 percent were LI, demonstrating that even for highly frequent words, determining a word’s meaning from its visual context alone is quite difficult.

The second experiment involved showing subjects a series of vignettes with multiple target words, all consistently replaced with nonsense placeholders. The researchers carefully ordered the mixture of HI and LI examples to explore the consequences of encountering a highly informative learning instance early or late.

“In past studies of this kind, researchers used artificial stimuli with a small number of meaning options for each word; they also just looked at the final outcome of the experiment: whether you end up knowing the word or not,” Trueswell says. “What we did here was to look at the trajectory of word learning throughout the experiment, using natural contexts that contain essentially an infinite number of meaning options.”

By asking the subjects to guess the target word after each vignette, the researchers could get a sense of whether their understanding was cumulative or occurred in a “eureka” moment.

The evidence pointed strongly to the latter. Repeated exposure to the target word did not lead to improved accuracy over time, suggesting that subjects were not accumulating associations across vignettes.

Moreover, it was only when subjects saw an HI vignette first that the accuracy of their final guesses improved; early HI vignettes gave subjects the best opportunity to learn the correct word, and most guessed correctly when presented with one. Confirming evidence then helped “lock in” the correct meaning for subjects who started on the right track.

“It’s as though you know when there is good evidence, you make something like an insightful conjecture,” Gleitman says.

However, when subjects saw an LI vignette first, they tended to guess incorrectly and, although they revised these guesses throughout the experiment, they were ultimately unable to arrive at the correct meaning. This suggests that subjects retained no memory of plausible alternative meanings from earlier vignettes, including the correct one, that they could fall back on.

Memory failure
The third experiment showed that this inability to hold incorrect meanings in mind is likely integral to how word acquisition works. After a delay of a couple of days, subjects again saw vignettes for the same target words they had missed before but showed no evidence of retaining their earlier incorrect guesses.

“All of those memories go away,” Gleitman says. “And that’s great! It’s the failure of memory that’s rescuing you from remaining wrong for the rest of your life.”
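
The pattern the experiments point to, one conjecture at a time, kept while confirmed and forgotten when disconfirmed, is sometimes called “propose but verify” in the word-learning literature. A minimal sketch of that idea (hypothetical code, not the authors’ model):

```python
import random

def propose_but_verify(exposures, seed=0):
    """Toy one-hypothesis learner: keep a single guess per word while it
    is confirmed; on disconfirmation, forget it and sample a new guess
    from the current scene."""
    rng = random.Random(seed)
    hypothesis = {}
    for word, objects_in_scene in exposures:
        guess = hypothesis.get(word)
        if guess is None or guess not in objects_in_scene:
            # No guess yet, or the old guess is absent from this scene:
            # it is forgotten, and a fresh conjecture is sampled from
            # what is visible now.
            hypothesis[word] = rng.choice(sorted(objects_in_scene))
        # Otherwise the guess is confirmed and retained ("locked in").
    return hypothesis

exposures = [
    ("ball", {"ball", "dog"}),
    ("ball", {"ball", "cup"}),
    ("ball", {"ball", "chair"}),
]
print(propose_but_verify(exposures))
```

Unlike the associative tally, this learner stores only one candidate meaning per word, so a wrong early guess leaves nothing to fall back on; forgetting it is precisely what frees the learner to try again.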

Future work by members of the Penn team will investigate what makes certain interactions more or less informative when it comes to word meaning, as well as the order in which people process visual information in their environment. Both avenues of research could help rewrite textbooks and parenting guides, suggesting that rich interactions with children—and patience—are more important than abstract picture books and drilling.

Jesse Snedeker, a professor at Harvard University, contributed to the work, which was supported by the National Institutes of Health.

More news from the University of Pennsylvania: www.upenn.edu/pennnews/

3 Comments

You are free to share this article under the Creative Commons Attribution-NoDerivs 3.0 Unported license.


  1. Jill

    I think it may be a combination of both. I remember when my oldest put together groups of things as opposed to a name for a single item. One day, I was telling her “chair” for a dining room chair and could see her puzzled. She touched another chair in the living room and asked what it was called. Then there was that eureka moment – you could see it on her face. She ran around touching all the chairs in the house and telling me “chair”. She was so excited – she had discovered this amazing principle on her own.

  2. Nicholas C. Aliotti

    I enjoyed reading this study. It was not particularly new information for me. Perhaps the researchers are young… I am in my late sixties now. This relates to a lot of the research in the sixties on increasing reading skills, language acquisition, etc. during the Head Start years, and to early research on learning disabilities (see Dr. Alexander Bannatyne’s excellent book and articles on reading, language acquisition, etc.). Rebus was a program that used the picture-word association strategy but was not really helpful to children. Language acquisition is a very intuitive, creative, instinctual process. One only uses language that has meaning for them, whether you are 9 months old or an adult. The researchers used a cloze procedure, which has been around for a long time. Hopefully you are referencing some of the old pterodactyl researchers whose studies predate the internet.

  3. Andrew Weiler

    There is no doubt at all in my mind that “rich interactions with children—and patience—are more important than abstract picture books and drilling.” As a language learning specialist, I am not surprised to see the poor second-language learning results achieved when most language classes and books persist in “abstract picture book and drilling” types of exercises.
    Learning happens when our awareness is lit by our experiences. Awareness enables us to piece together the “bits” that make sense to us in some sort of whole. So the act of mum bringing a spoon to her child’s mouth as the child hears the words “eat this” at some moment coalesces into a meaningful “experience,” extracted from a wide variety of exposures in which Johnny/Betty has been involved, and s/he ends up associating “eat” with you know what!
    That can only happen when there are meaningful experiences that the learner is aware of intimately.
