Facebook posts alone can predict 21 diseases and conditions, including diabetes, hypertension, anxiety, and depression, a new study shows.
The study, published in PLOS ONE, included 999 participants who consented to share their social media posts and medical records, and it analyzed approximately 20 million words. The researchers looked at language patterns (words, phrases, and clusters of related words) and measured their statistical association with 21 standard categories of diagnoses drawn from the participants' medical records.
The researchers used three models to compare predictive power. One model analyzed only the language of participants' Facebook posts, another used only demographics such as age and sex, and a third combined the two datasets.
The researchers found that Facebook posts alone predicted all 21 conditions, and for 10 of them, the posts were better predictors than demographic information.
“Our predictions from language captures diagnosis of diabetes about as well as predictions based on one’s body mass index,” says senior author H. Andrew Schwartz, assistant professor of computer science in the Stony Brook University College of Engineering and Applied Sciences. “We can treat language pattern analogous to a genome and see similar diseases seem to have similar linguistic patterns.”
The method appears to be especially strong at predicting mental health conditions, such as anxiety, depression, and psychosis, in some patients. For certain conditions, including diabetes and mental health conditions, Facebook posts predicted diagnoses better than demographic information did.
“Our digital language captures powerful aspects of our lives that are likely quite different from what is captured through traditional medical data,” Schwartz says. “By looking across many medical conditions, we get a view of how conditions relate to each other, which can enable new applications for AI for medicine.”
Some of the language markers that proved more predictive than demographic data seemed intuitive. For example, words such as “drink” and “bottle” were predictive of alcohol abuse.
Others were less obvious. For example, the people who most often used religious language such as “God” or “pray” in their posts were 15 times more likely to have diabetes than those who used these terms the least. Words expressing hostility, such as “dumb” and some expletives, served as indicators of drug abuse and psychosis.
“As social media posts are often about someone’s lifestyle choices and experiences or how they’re feeling, this information could provide additional information about disease management and exacerbation,” says lead author Raina Merchant, director of Penn Medicine’s Center for Digital Health and an associate professor of emergency medicine.
Later this year, Merchant will conduct a large trial in which patients will be asked to share their social media content directly with their health care providers. The trial will show whether managing and applying this data is feasible, and how many patients would actually agree to let their accounts supplement their active care.
“One challenge with this is that there is so much data and we, as providers, aren’t trained to interpret it ourselves—or make clinical decisions based on it,” Merchant explains. “To address this, we will explore how to condense and summarize social media data.”