140 million tweets capture COVID-19’s spread

"In a future scenario, having this data will allow researchers to be better prepared and build systems to detect community transmission, and devise interventions to not be in the current position we are now," says Juan Banda. (Credit: Getty Images)

A dataset of more than 140 million COVID-19 tweets could help represent the spread and effects of the global coronavirus pandemic.

It’s publicly available as a resource for the global research community.

The work is part of research that collects and tracks social media chatter to clarify mobility patterns during natural disasters. The rare step of making the work public before the results are finalized highlights the unprecedented threat posed by the global pandemic.

Juan Banda, assistant professor of computer science at Georgia State University, is heading up the project and working with epidemiologists and data scientists. The researchers will update the dataset every two days, and the work could have wide-reaching implications.

This word cloud shows the most common bigrams (two words that appear together) in the tweets, including "coronavirus outbreak," "coronavirus covid19," and "coronavirus cases." (Credit: Georgia State U.)
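
For readers unfamiliar with bigrams, the short Python sketch below shows one common way to count them. It is an illustration only, not the researchers' actual pipeline: the function name `top_bigrams`, the simple lowercase-and-split tokenization, and the three sample tweets are all assumptions made up for this example.

```python
from collections import Counter
import re


def top_bigrams(tweets, n=3):
    """Return the n most frequent adjacent word pairs across all tweets.

    Hypothetical helper for illustration; not taken from the project.
    """
    counts = Counter()
    for text in tweets:
        # Assumed tokenization: lowercase, keep runs of letters and digits.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # Pair each token with the one that follows it in the same tweet.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Invented sample text, not drawn from the real dataset.
sample_tweets = [
    "Coronavirus outbreak: new coronavirus cases reported",
    "Tracking the coronavirus outbreak",
    "Coronavirus cases rise again",
]

for pair, count in top_bigrams(sample_tweets, n=2):
    print(" ".join(pair), count)
```

In this toy sample, "coronavirus outbreak" and "coronavirus cases" each appear twice. A real analysis would likely also need to handle hashtags, URLs, retweets, and multiple languages, which this sketch ignores.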

“It was a big decision to make to release the data before having a few papers prepared on it, but it is for the common good,” says Banda.

“We are all on the same planet together, and any additional data that could be easily available for other researchers to analyze can make the difference. I am a big believer in open science, and this is definitely a time where it’s important to have the greatest number of eyes on the research.”

The work provides unique insight into the outbreak, including information on travel, displacement, diagnoses, treatment, and a historical record of the timing. Banda is collaborating with Gerardo Chowell, a professor of mathematical epidemiology and chair of the department of population health sciences in the School of Public Health. Chowell says the work can identify how people are getting and using information on social media.

“This dataset,” Chowell says, “will allow researchers to investigate the spread of misinformation relating to COVID-19, study the change in population behaviors and sentiments as the virus spreads in different geographic areas, and quantify the effects of social distancing efforts and changes in human mobility patterns over the course of the pandemic.”

The research team began collecting tweets dedicated to coronavirus on March 10. They have collected millions of impressions that could help scientists identify clues they might otherwise overlook. Chowell and Banda used similar research to identify patterns during the recent global Zika outbreak.

“These data provide another view of the pandemic’s impact,” says Banda. “While most efforts are focused on infection rates, hospitalizations and death toll for epidemiological use, our dataset can be used to measure from where people are getting their information (or disinformation) and gauge the sentiment of people with respect to the measures our government is taking, and more.”

So far, researchers have been collecting close to 4.5 million tweets every day. This is part of a revolution in data collection and computer science that offers new ways to track, in real time, how people are living through a pandemic, something that wasn’t possible even 10 years ago.

As scientists around the world work to reduce the toll from the outbreak, Banda and his team hope the work can improve future outcomes and even encourage the public to change behavior.

“Indirectly, by being able to tackle sources of disinformation and highlight instances of people not following rules, I believe we can get everybody to do their part in flattening the curve,” says Banda.

“In a future scenario, having this data will allow researchers to be better prepared and build systems to detect community transmission, and devise interventions to not be in the current position we are now.”

Additional data came from researchers at the University of Missouri, the Universitat Autònoma de Barcelona, Carl von Ossietzky Universität Oldenburg, and the Universität Duisburg-Essen.

Source: Georgia State University