Software mines Twitter for ‘sick’ trends
JOHNS HOPKINS (US) — Sift through 2 billion tweets and you can find a lot of useful public health intelligence on where people are sick, what ails them, and what they’re doing about it.
Computer scientists Mark Dredze and Michael Paul came up with a method that filters and categorizes health-related tweets—an approach that could be used to identify information about other trends from the Twitterverse.
“Our goal was to find out whether Twitter posts could be a useful source of public health information,” says Dredze, an assistant research professor at Johns Hopkins University. “We determined that indeed, they could. In some cases, we probably learned some things that even the tweeters’ doctors were not aware of, like which over-the-counter medicines the posters were using to treat their symptoms at home.”
Dredze and Paul, a doctoral student, fed 2 billion public tweets posted between May 2009 and October 2010 into computers, then used sophisticated software to pick out the 1.5 million messages that referred to health matters. They did not collect the tweeters’ identities.
By sorting these health-related tweets into electronic “piles,” Dredze and Paul uncovered intriguing patterns about allergies, flu cases, insomnia, cancer, obesity, depression, pain, and other ailments.
“There have been some narrow studies using Twitter posts, for example, to track the flu,” Dredze says. “But to our knowledge, no one has ever used tweets to look at as many health issues as we did.”
Dredze and Paul will present their study on July 18 in Barcelona, Spain, at the International Conference on Weblogs and Social Media, sponsored by the Association for the Advancement of Artificial Intelligence.
In addition to finding a range of health ailments in Twitter posts, the researchers were able to record many of the medications that ill tweeters consumed, thanks to posts such as: “Had to pop a Benadryl … allergies are the worst.”
Of course, the vast majority of daily tweets have nothing to do with illness. A simple approach would be to filter for words tied to illness, such as “headache” or “fever,” but this strategy is fooled by tweets such as “High price of gas is a headache for my business” or “Got a case of Bieber Fever. Love his new song.”
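To see why the simple approach breaks down, here is a minimal Python sketch of a naive keyword filter run on the three example tweets quoted in this article. The keyword list is a hypothetical choice for illustration, not one used in the study. The filter flags all three tweets as health-related, including the two that are not.

```python
import re

# Hypothetical list of illness-related keywords (an assumption for illustration).
HEALTH_KEYWORDS = {"headache", "fever", "allergies", "flu", "insomnia"}


def keyword_filter(tweet):
    """Return True if the tweet contains any health keyword (the naive approach)."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return not HEALTH_KEYWORDS.isdisjoint(words)


tweets = [
    "Had to pop a Benadryl … allergies are the worst.",
    "High price of gas is a headache for my business",
    "Got a case of Bieber Fever. Love his new song.",
]

for t in tweets:
    # All three print True: the keyword match ignores the surrounding context.
    print(keyword_filter(t), "|", t)
```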
To find the health-related posts among the messages in their original pool, the researchers applied a filtering and categorization system they devised. With this tool, computers can be taught to disregard phrases that do not really relate to one’s health, even though they contain a word commonly used in a health context. Once the unrelated tweets were removed, the remaining results provided some surprising findings.
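The article does not describe how the researchers’ filtering and categorization system works internally. As a hypothetical illustration of the general idea of a filter that learns from context, the sketch below uses multinomial naive Bayes, a standard text-classification technique chosen here for illustration, not necessarily the study’s method. The class name, function names, and the eight hand-labeled training tweets are all invented; a practical filter would be trained on far more labeled data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior score for the tweet."""
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            denom = self.totals[c] + len(self.vocab)
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented, hand-labeled training tweets (hypothetical, not from the study).
TRAIN = [
    ("terrible headache today, took some advil", "health"),
    ("fever and chills, staying home sick", "health"),
    ("allergies are the worst, cannot stop sneezing", "health"),
    ("had a sore throat so i took a benadryl", "health"),
    ("this traffic is a headache for my business", "other"),
    ("gas prices are so high right now", "other"),
    ("love the new song, bieber fever all day", "other"),
    ("got tickets to the game tonight", "other"),
]

docs, labels = zip(*TRAIN)
clf = NaiveBayesFilter().fit(docs, labels)

for tweet in [
    "Had to pop a Benadryl … allergies are the worst.",
    "High price of gas is a headache for my business",
    "Got a case of Bieber Fever. Love his new song.",
]:
    print(clf.predict(tweet), "|", tweet)
# With this toy training set the labels printed are: health, other, other
```

Because the classifier scores every word in the tweet, “headache” next to “gas” and “business” tips the tweet toward the non-health class, whereas a keyword list sees only “headache.”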
“When we started, I didn’t even know if people talked about allergies on Twitter,” Paul says. “But we found out that they do.”
In about 200,000 of the health-related tweets, the researchers were able to draw on user-provided public information to identify the U.S. state from which the message was sent. That allowed them to track some trends by time and place, such as when the allergy and flu seasons peaked in various parts of the country.
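The article does not say how user-provided profile locations were matched to states. One minimal, hypothetical approach: look for a state name or two-letter postal code in the free-text location, skip tweets where nothing matches, then count tweets per state and month to find each state’s busiest month. The four-state lookup table, the function names, and the eight allergy tweets below are invented for illustration and are not the study’s data.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical lookup covering only a few states, keyed by name and postal code.
STATES = {
    "texas": "TX", "tx": "TX",
    "florida": "FL", "fl": "FL",
    "ohio": "OH", "oh": "OH",
    "new york": "NY", "ny": "NY",
}


def state_from_profile(location):
    """Guess a state code from a free-text profile location, or return None."""
    text = location.lower()
    # Try full state names first (including multi-word names such as "new york").
    for name, code in STATES.items():
        if len(name) > 2 and name in text:
            return code
    # Fall back to standalone two-letter postal codes.
    for token in re.findall(r"[a-z]+", text):
        if len(token) == 2 and token in STATES:
            return STATES[token]
    return None


def peak_month_by_state(tweets):
    """Count located tweets per (state, month) and return each state's busiest month."""
    counts = defaultdict(Counter)
    for location, day in tweets:
        state = state_from_profile(location)
        if state is not None:  # tweets without a usable location are skipped
            counts[state][day.month] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Invented (profile location, date) pairs for allergy-related tweets.
allergy_tweets = [
    ("Austin, Texas", date(2010, 3, 2)),
    ("Houston TX", date(2010, 3, 15)),
    ("Dallas, tx", date(2010, 4, 1)),
    ("Columbus, Ohio", date(2010, 5, 3)),
    ("Cleveland OH", date(2010, 5, 20)),
    ("Toledo, Ohio", date(2010, 4, 28)),
    ("Brooklyn, NY", date(2010, 5, 11)),
    ("somewhere over the rainbow", date(2010, 4, 9)),
]

print(peak_month_by_state(allergy_tweets))  # {'TX': 3, 'OH': 5, 'NY': 5}
```

A real geocoder would need to handle ambiguity, for example “OH” used as an interjection; this sketch does not.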
“We were able to see from the tweets that the allergy season started earlier in the warmer states and later in the Midwest and the Northeast,” Dredze says.
Dredze and Paul have already begun talking to public health scientists, including some from Johns Hopkins, who say that future studies of tweets could uncover even more useful data, not only about posters’ medical problems but also about public perceptions concerning illnesses, medications, and other health issues.
Still, Dredze and Paul cautioned that trying to take the nation’s temperature by analyzing tweets has its limitations. For one thing, most users did not tweet more than once about a particular ailment, making it tough to track how long the illness lasted and whether it recurred. In addition, Twitter users tend to be young, so a tweet-based public health study would largely miss senior citizens. Also, at the moment, Twitter is dominated by users in the United States, making it less useful for research in other countries.
Although social media sites let users share a great deal of personal information with friends and strangers alike, there are limits to how deep Twitter-based research can go.
“In our study,” Paul says, “we could only learn what people were willing to share. We think there’s a limit to what people are willing to share on Twitter.”
Nevertheless, Dredze says there is still plenty of useful data left to plumb from Twitter posts. “The people I’ve talked to have felt this is a really interesting research tool,” he says, “and they have some great ideas about what they’d like to learn next from Twitter.”
More news from Johns Hopkins University: http://releases.jhu.edu/