Team uses big data to fight human trafficking
A new tool takes one of the internet's advantages for human traffickers, its sheer size, and turns it against them.
It can be easy for online criminals to hide in plain sight, brazenly posting escort ads for underage individuals, because millions of such ads already exist and more appear every day.
There are far too many for law enforcement to review one by one, and simple tricks can hide identifying information from large-scale searches (for example, writing a phone number as “Eight-1-eight, Five 55…”).
Pedro Szekely and Craig Knoblock of the USC Viterbi Information Sciences Institute (ISI) treated this as a big data problem and created a tool that combs through escort ads, mining, decoding, and organizing the relevant data into an enormous but easily searchable database.
The tool, called DIG (for “Domain-specific Insight Graphs”), lets officers searching for a missing child believed to be trapped in the escort industry search by phone number, location, alias, or even photo, and pin down a way to reach the child.
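The article does not say how DIG decodes obfuscated contact details. As a rough illustration only, the Python sketch below shows one simple way to turn a spelled-out phone number back into digits. It assumes a North American 10-digit format, and the names `normalize_phone` and `WORD_DIGITS` are hypothetical, not part of DIG.

```python
import re
from typing import Optional

# Hypothetical lookup table (not DIG's code): spelled-out digits -> numerals.
WORD_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Match either a spelled-out digit word or a single ASCII numeral.
TOKEN = re.compile("|".join(WORD_DIGITS) + r"|[0-9]")


def normalize_phone(text: str) -> Optional[str]:
    """Return a 10-digit string if the text encodes exactly 10 digits, else None.

    Assumes a North American 10-digit format; separators such as dashes,
    commas, and spaces are skipped because the regex never matches them.
    """
    tokens = TOKEN.findall(text.lower())
    digits = "".join(WORD_DIGITS.get(tok, tok) for tok in tokens)
    return digits if len(digits) == 10 else None


if __name__ == "__main__":
    print(normalize_phone("two-one-three, Five 55, 0one2three"))  # 2135550123
    print(normalize_phone("call me tonight"))  # None
```

Because this sketch matches digit words anywhere in the text, it would misread ordinary words that contain them (for example, "someone" contains "one"), so a real extractor would need more context than this.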
“The internet contains seemingly limitless information, but we’re constrained by our ability to search that information and come up with meaningful results. DIG solves that problem,” says Szekely, research associate professor at ISI.
DIG is simple enough to use without special training. Its database currently holds 50 million web pages and 2 billion records, and it grows by roughly 5,000 web pages per hour.
“As the database continues to grow, DIG will be able to uncover new connections and patterns in the data, making it even more useful,” says Knoblock, research professor and director of information integration at ISI.
The funding and initiative to create DIG came from Memex, a Defense Advanced Research Projects Agency (DARPA) program aimed at developing next-generation internet search tools to help law enforcement agencies fight online human trafficking.
DIG's code is open source, so it is free to law enforcement agencies, and it will be upgraded quarterly over the course of the three-year project, which began in 2014.
For example, Szekely and Knoblock plan to improve DIG so that it automatically flags potential victims and identifies trafficking rings from their ads and the victims under their control.
The DIG project at ISI is now six months into its three-year run. Szekely and Knoblock are also exploring other uses for the tool, including analyzing materials science research papers and using data on the web to build a comprehensive database of the competitive landscape among private companies.