U. ILLINOIS (US) — By analyzing the overall tone and geographic references of large amounts of global news media, supercomputers could help forecast human behavior.
“There’s been a huge amount of research coming out of the business literature on the power of news tone to predict economic behavior, yet there hasn’t been as much work in using it to predict social behavior,” says University of Illinois researcher Kalev Leetaru, whose findings are reported in the journal First Monday.
Leetaru combined three massive news archives totaling more than 100 million articles worldwide to explore the global consciousness of the news media. The complete New York Times, the unclassified edition of Summary of World Broadcasts, and an archive of English-language Google News articles were used to capture a cross-section of the U.S. media spanning half a century, and over a quarter-century of global media.
Leetaru used a supercomputer called Nautilus at the National Institute for Computational Sciences in Tennessee to work with this massive data set.
Using a range of advanced analysis techniques, Leetaru was able to produce a 2.4-petabyte (over 2 million gigabytes) network containing more than 10 billion people, places, things, and activities connected by over 100 trillion relationships.
Leetaru divided the data into workable amounts and allowed Nautilus to uncover patterns in these subsets. When patterns emerged, he was able to examine these in detail using three key data mining techniques: tone mining, full-text geocoding, and network analysis.
Tone mining creates a numeric measure of overall tone in a document. An algorithm counts the number of “positive” and “negative” words that appear and assigns a corresponding value, noting that “loathe” is more negative than “dislike.”
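To make the idea concrete, here is a minimal sketch of a weighted word-count tone score. The lexicon, its weights, and the `tone_score` function are hypothetical and invented for illustration; they are not taken from Leetaru's paper, which would rely on far larger tone dictionaries.

```python
import re

# Hypothetical weights, invented for illustration only.
# Stronger words carry larger magnitudes: "loathe" is more negative than "dislike".
TONE_LEXICON = {
    "loathe": -3.0,
    "dislike": -1.0,
    "bombing": -2.5,
    "hope": 1.5,
    "celebrate": 2.0,
}


def tone_score(text: str) -> float:
    """Return the mean lexicon weight per word, or 0.0 for text with no words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Words missing from the lexicon count as neutral (weight 0.0).
    total = sum(TONE_LEXICON.get(word, 0.0) for word in words)
    return total / len(words)
```

Dividing by the word count keeps long and short articles comparable, so a score below zero suggests a negative overall tone regardless of article length.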
Geocoding uses algorithms that examine the text of a news article for possible location references, disambiguate them (does this document reference Cairo, Egypt, or Cairo, Illinois?), and output an approximate geographic coordinate that can be displayed on a map.
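The disambiguation step can be sketched in a few lines. This example assumes a tiny hand-built gazetteer with context cue words for each candidate location; the `GAZETTEER` table, the cue words, and the `geocode` function are hypothetical, and the coordinates are rough approximations. A production geocoder would draw on a full gazetteer and much richer context.

```python
import re
from typing import Optional, Tuple

# Hypothetical mini-gazetteer: each place name maps to candidate locations,
# with approximate (latitude, longitude) and cue words that hint at each one.
GAZETTEER = {
    "cairo": [
        {"label": "Cairo, Egypt", "coords": (30.04, 31.24),
         "cues": {"egypt", "egyptian", "nile"}},
        {"label": "Cairo, Illinois", "coords": (37.01, -89.18),
         "cues": {"illinois", "mississippi", "ohio"}},
    ],
}


def geocode(text: str, place: str) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Pick the candidate whose cue words overlap most with the article text."""
    candidates = GAZETTEER.get(place.lower())
    if not candidates:
        return None  # Place name is not in the gazetteer.
    words = set(re.findall(r"[a-z]+", text.lower()))
    # On a tie, max() keeps the first listed candidate.
    best = max(candidates, key=lambda c: len(c["cues"] & words))
    return best["label"], best["coords"]
```

For an article mentioning the Ohio and Mississippi rivers, this sketch would choose Cairo, Illinois, while a story about protests spreading across Egypt would resolve to Cairo, Egypt.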
A third technique, network analysis, shows how the global media groups countries together.
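One simple way to illustrate this kind of grouping is to count how often two countries appear in the same article, then join countries that share enough co-mentions. The sketch below uses connected components over those strong links, which is a stand-in chosen for brevity and not necessarily the method used in the paper; the function names and the `min_links` threshold are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_mention_counts(articles):
    """Count how many articles mention each unordered pair of countries."""
    counts = Counter()
    for countries in articles:
        for a, b in combinations(sorted(set(countries)), 2):
            counts[(a, b)] += 1
    return counts


def group_countries(articles, min_links=2):
    """Group countries joined by at least `min_links` shared articles."""
    neighbors = defaultdict(set)
    for (a, b), n in co_mention_counts(articles).items():
        if n >= min_links:
            neighbors[a].add(b)
            neighbors[b].add(a)

    nodes = {country for countries in articles for country in countries}
    groups, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable over strong links.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return groups
```

Each input article is represented here as a list of the country names it mentions. Raising `min_links` splits the world into more, smaller groups, while lowering it merges them.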
One of the most unexpected findings in Leetaru’s paper involves using news to map the movements of Osama bin Laden. Leetaru was able to narrow the militant leader’s likely hiding place to a 200-kilometer radius in northern Pakistan that includes Abbottabad, where he was ultimately found.
Leetaru was also able to use news to retroactively forecast revolutions in Egypt, Tunisia, and Libya and dissect their underpinnings.
“Certainly Tunisia played a huge role in pushing Egypt over the edge, but if you trace out the tonal curve of news concerning Egypt, you see that the real down spiral didn’t happen until after January first, the day of the Coptic Church bombing.”
Using network analysis techniques, Leetaru drew on both the Summary of World Broadcasts (SWB) and New York Times archives to create world “civilizations,” essentially groups of countries that the news media tend to group together.
While the SWB news yielded seven civilizations, the Times news yielded only five, with a far greater share of countries grouped with the United States.
“Each country’s media will depict the world differently. It’s a standard principle of journalism—you write for your audience,” says Leetaru. “Still, it vividly reinforces that what we get here in the U.S. is a very U.S.-centric view of the world.”
Leetaru emphasizes that while these are captivating findings, the real goal of his work is to encourage further study.
“I see it as diving beneath the ocean—we’ve been so focused on the surface that we’re only just beginning to start exploring the entire new world that’s underneath.”
The National Science Foundation supported this project.
More news from the University of Illinois: www.ichass.illinois.edu/