Word embeddings—an algorithmic technique that can map relationships and associations between words—can measure changes in gender and ethnic stereotypes over the past century in the United States.
Researchers analyzed large databases of American books, newspapers, and other texts and examined how the linguistic changes they found correlated with US Census demographic data and with major social shifts, such as the women’s movement in the 1960s and the increase in Asian immigration.
Artificial intelligence systems and machine-learning algorithms have come under fire recently because they can pick up and reinforce existing biases in our society, depending on the data they are trained on.
“Word embeddings can be used as a microscope to study historical changes in stereotypes in our society,” says paper coauthor James Zou, an assistant professor of biomedical data science at Stanford University. “Our prior research has shown that embeddings effectively capture existing stereotypes and that those biases can be systematically removed. But we think that, instead of removing those stereotypes, we can also use embeddings as a historical lens for quantitative, linguistic, and sociological analyses of biases.”
“This type of research opens all kinds of doors to us,” says coauthor Londa Schiebinger, history professor at Stanford. “It provides a new level of evidence that allows humanities scholars to go after questions about the evolution of stereotypes and biases at a scale that has never been done before.”
Words, words, words
A word embedding is an algorithm researchers can use, or train, on a collection of text. The algorithm then assigns a geometrical vector to every word, representing each word as a point in space. The technique uses location in this space to capture associations between words in the source text.
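To make the idea of "location in space" concrete, here is a minimal sketch in Python. The three-dimensional vectors and the word list are invented for illustration (real embeddings are learned from text and typically have hundreds of dimensions), and cosine similarity is one common way to compare word vectors, not necessarily the measure used in the study.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from text and are much larger.
toy_embedding = {
    "doctor": [0.9, 0.1, 0.3],
    "nurse": [0.8, 0.3, 0.4],
    "banana": [0.1, 0.9, 0.0],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Words used in similar contexts end up pointing in similar directions.
sim_related = cosine_similarity(toy_embedding["doctor"], toy_embedding["nurse"])
sim_unrelated = cosine_similarity(toy_embedding["doctor"], toy_embedding["banana"])
print(sim_related > sim_unrelated)
```

In this toy space, "doctor" sits much closer to "nurse" than to "banana," which is the kind of association an embedding is meant to capture.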
“Embeddings are a powerful linguistic tool for measuring subtle aspects of word meaning, such as bias,” says coauthor Dan Jurafsky, a professor of linguistics and computer science.
Take the word “honorable.” Using the embedding tool, previous research found that the adjective has a closer relationship to the word “man” than to the word “woman.”
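A comparison like this can be sketched as a difference of distances. The example below is a hypothetical illustration: the two-dimensional vectors are made up, and the `relative_association` helper is an assumption about how such a score could be computed, not the study's exact metric.

```python
import math

# Made-up 2-dimensional vectors chosen so the example is easy to follow.
toy_embedding = {
    "man": [1.0, 0.0],
    "woman": [0.0, 1.0],
    "honorable": [0.7, 0.3],
}


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relative_association(word, group_a, group_b, embedding):
    """Positive when `word` lies closer to group_a than to group_b."""
    w = embedding[word]
    return distance(w, embedding[group_b]) - distance(w, embedding[group_a])


score = relative_association("honorable", "man", "woman", toy_embedding)
print(score > 0)  # in this toy space, "honorable" sits nearer to "man"
```

A score near zero would indicate that the word is about equally associated with both groups, and repeating the calculation on embeddings trained on text from different decades would show how the association shifts over time.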
In their new research, the researchers used embeddings to identify specific occupations and adjectives that were biased toward women and particular ethnic groups by decade from 1900 to the present.
The researchers trained those embeddings on newspaper databases and also used embeddings previously trained by computer science graduate student Will Hamilton on other large text datasets, such as the Google Books corpus of American books, which contains over 130 billion words published during the 20th and 21st centuries.
The researchers compared the biases found by those embeddings to demographic changes in US Census data between 1900 and the present.
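One simple way to quantify such a comparison is a correlation coefficient between a per-decade bias score and a per-decade demographic figure. The sketch below assumes Pearson correlation, and both series are placeholder numbers invented to show the calculation; they are not results from the study or from the Census.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, one per decade, invented for illustration only.
embedding_bias = [-0.30, -0.25, -0.15, -0.10, -0.02]  # hypothetical bias scores
census_share = [10.0, 15.0, 22.0, 30.0, 41.0]  # hypothetical percentages

r = pearson(embedding_bias, census_share)
print(round(r, 2))
```

A value of `r` close to 1 would mean the two series rise and fall together, while a value near 0 would mean the embedding bias carries little information about the demographic trend.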
The research findings showed quantifiable shifts in gender portrayals and biases toward Asians and other ethnic groups during the 20th century.
One of the key findings to emerge was how biases toward women changed for the better—in some ways—over time.
For example, adjectives such as “intelligent,” “logical,” and “thoughtful” were associated more with men in the first half of the 20th century. Since the 1960s, however, the same words have become more closely associated with women with each passing decade, a shift that correlates with the women’s movement of the 1960s, although a gap remains.
The research also showed a dramatic change in stereotypes toward Asians and Asian Americans.
For example, in the 1910s, words like “barbaric,” “monstrous,” and “cruel” were the adjectives most associated with Asian last names. By the 1990s, those adjectives were replaced by words like “inhibited,” “passive,” and “sensitive.” This linguistic change correlates with a sharp increase in Asian immigration to the United States in the 1960s and 1980s and a change in cultural stereotypes, the researchers say.
Overall, the researchers demonstrated that changes in the word embeddings tracked closely with demographic shifts measured by the US Census.
“The starkness of the change in stereotypes stood out to me,” says electrical engineering graduate student Nikhil Garg, who is lead author of the study. “When you study history, you learn about propaganda campaigns and these outdated views of foreign groups. But how much the literature produced at the time reflected those stereotypes was hard to appreciate.”
The researchers report their findings in the Proceedings of the National Academy of Sciences.
Source: Stanford University