Why big data can’t predict the next armed conflict

The expectation that big data alone will be enough to predict armed conflict is unrealistic, according to a recent Science essay coauthored by Lars-Erik Cederman, professor of international conflict research at ETH Zurich. He explains why in this recent interview with ETH News.

Q

Your research deals with violent conflict, as well as the ability to predict it. How well can armed conflict be predicted in general?

A

The risk of future armed conflict can, in fact, be identified at an early stage. The risk is high, for example, in regions with oppressed ethnic groups. In Syria, the situation was already known to be volatile long before the civil war broke out.

But conflict is enormously complex. In terms of predictions, conflict research is much like earthquake research, in the sense that scientifically substantiated risk maps can be created. But it is almost impossible to predict whether armed conflict will actually occur in a region and, above all, exactly when and where.

Armed conflict between 1989 and 2015 (in red and pink; Syria is excluded). (Credit: ETH Zurich/Luc Girardin with data from UCDP, NASA, and ETH Zurich)

Q

What are the difficulties of making such predictions?

A

World history is never a linear sequence of logically ordered events. Instead, it is often erratic and unpredictable. This is especially true today, as demonstrated by Brexit or by the election of US president Donald Trump.

Predicting the outcome of elections and referendums is already difficult enough, even though they follow established laws and principles. But armed conflict not only occurs much less frequently, it is also much more complex.

Although its likelihood is to some extent based on regularities that can be studied, it hardly obeys established laws or timelines. This is especially true now: simply mapping the events of previous decades onto the future rarely works.

Q

Some of your science colleagues are pinning their hopes on data science, believing that intelligent computer algorithms such as those used to analyze social media posts will make it possible to predict conflict in the future.

A

A colleague and I have written about our own opinions on this subject in an essay published in Science. There’s no doubt that data science provides new tools that we can use in conflict research. And I’m convinced that big data can make our predictions even more accurate.

But some believe that amassing non-representative, unverified data will make predictions much more precise and extend their temporal and spatial reach. In our opinion, that optimism is exaggerated. That is the main point that we make in our essay.

Q

How, specifically, could conflict research benefit from data science in the future?

A

Media reports are an important source of data in conflict research. Nationalist developments and potential conflict situations, for example, can be identified by analyzing keywords. Most of this is still done manually.

In his research, my colleague Nils Weidmann, who coauthored the essay with me, shows that new developments in data science have made it possible to analyze such data automatically to a certain extent. Software that can interpret the significance of a text could pre-select press articles, for example, in order to speed up the analysis process. This would make it possible to assess political developments more quickly.

However, some scientists believe that highly complex conflict patterns can also be analyzed in a fully automated way without any significant loss of precision. Their hopes are definitely premature. As conflict researchers, we don’t expect to be out of work any time soon.

Q

Why might fully automated analysis be infeasible?

A

In our experience, computerized analysis is only possible to a limited extent. Software that can prioritize data doesn’t even exist for many of the main languages in our field. What’s more, people are needed to select the media sources.

We also have to take into account that the media in many regions are not independent, and an uncritical analysis would paint a distorted picture.

When analyzing social media, it is also important to bear in mind that some of the data is of dubious quality. In many regions of the world, especially ones where conflict is highly probable, the internet is censored and accessible only to a minority.

Q

Are there any other limits?

A

Data can only be analyzed if it is available in the first place. My work covers the situation in Burma, for example, where only a few of the people who live in the jungle are connected to the internet.

If researchers are interested in the views of the people living there, they have to conduct local surveys. However, certain information can also be obtained in such regions by using computerized methods. In our own research, we use satellite images of light emissions to draw conclusions about economic welfare and inequality. Compared to using official statistics (if any even exist), this method offers the advantage of identifying short-term developments very quickly.

Source: ETH Zurich