MegaFace Challenge uncovers big gaps in face recognition

In the last few years, several groups have announced that their facial recognition systems have achieved near-perfect accuracy rates, performing better than humans at picking the same face out of the crowd.

But those tests were performed on a dataset with only 13,000 images—fewer people than attend an average professional US soccer game. What happens to their performance as those crowds grow to the size of a major US city?

University of Washington researchers answered that question with the MegaFace Challenge, the world’s first competition aimed at evaluating and improving the performance of face recognition algorithms at the million-person scale.

All of the algorithms suffered in accuracy when confronted with more distractions, but some fared much better than others.

“We need to test facial recognition on a planetary scale to enable practical applications—testing on a larger scale lets you discover the flaws and successes of recognition algorithms,” says Ira Kemelmacher-Shlizerman, an assistant professor of computer science at the University of Washington and the project’s principal investigator.

“We can’t just test it on a very small scale and say it works perfectly.”

Not so perfect after all

The team first developed a dataset with one million Flickr images from around the world that are publicly available under a Creative Commons license, representing 690,572 unique individuals. Then they challenged facial recognition teams to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.

Google’s FaceNet showed the strongest performance on one test, dropping from near-perfect accuracy when confronted with a smaller number of images to 75 percent on the million-person test.

[Figure: MegaFace results. Facial recognition algorithms that fared well with 10,000 distracting images all experienced a drop in accuracy when confronted with 1 million images. But some performed much better than others. (Credit: University of Washington)]

A team from Russia’s N-TechLab came out on top on another test set, dropping to 73 percent.

By contrast, other algorithms that had performed well at a small scale fell much further, to as low as 33 percent accuracy, when confronted with the harder task.

Initial results are detailed in a paper to be presented at the IEEE Conference on Computer Vision and Pattern Recognition June 30, and ongoing results are updated on the project website. More than 300 research groups are working with MegaFace.

iPhone security

The MegaFace challenge tested the algorithms on verification, or how well they could correctly identify whether two photos were of the same person. That’s how an iPhone security feature, for instance, could recognize your face and decide whether to unlock your phone instead of asking you to type in a password.
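
As a rough illustration of verification, the sketch below compares two face "embeddings" (numeric vectors that a recognition model would produce for each photo) and accepts the pair as the same person when their cosine similarity clears a threshold. The function names, the three-number vectors, and the 0.8 threshold are all hypothetical choices for illustration, not the method used by any MegaFace entrant or by any phone.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(embedding_a, embedding_b, threshold=0.8):
    """Verification: decide whether two embeddings show the same face.

    The threshold is an assumed value; a real system would tune it
    to balance false accepts against false rejects.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# Toy embeddings (made up for this example).
enrolled = [0.9, 0.1, 0.4]           # the phone owner's stored face
attempt_owner = [0.85, 0.15, 0.45]   # a new photo of the owner
attempt_stranger = [0.1, 0.9, 0.2]   # a photo of someone else

print(same_person(enrolled, attempt_owner))
print(same_person(enrolled, attempt_stranger))
```

With these toy vectors the owner's new photo is accepted and the stranger's is rejected; real embeddings have hundreds of dimensions and far less clean separation.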

“What happens if you lose your phone in a train station in Amsterdam and someone tries to steal it?” says Kemelmacher-Shlizerman. “I’d want certainty that my phone can correctly identify me out of a million people—or 7 billion—not just 10,000 or so.”

They also tested the algorithms on identification, or how accurately they could match a photo of a single individual to a different photo of the same person buried among a million “distractors.”

That’s what happens, for instance, when law enforcement officers have a single photograph of a criminal suspect and are combing through images taken on a subway platform or at an airport to see if the person is trying to escape.
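
Identification with distractors can be sketched in the same spirit: rank every photo in the gallery by similarity to the probe photo and see where the true match lands. The helper name `rank_of_true_match` and the two-number vectors are hypothetical, and this is only a toy version of the idea, not the challenge's actual evaluation code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_of_true_match(probe, true_match, distractors):
    """Identification: position of the true match when the gallery
    (true match plus distractors) is sorted by similarity to the probe.

    Rank 1 means the algorithm picked the right person first.
    """
    target = cosine_similarity(probe, true_match)
    better = sum(1 for d in distractors if cosine_similarity(probe, d) > target)
    return better + 1

# Toy embeddings (made up for this example).
probe = [1.0, 0.0]        # the photo being searched for
true_match = [0.9, 0.3]   # a different photo of the same person

few_distractors = [[0.0, 1.0], [0.5, 0.5]]
# One extra distractor that happens to look more like the probe
# than the true match does (a "doppelganger").
many_distractors = few_distractors + [[1.0, 0.1]]

print(rank_of_true_match(probe, true_match, few_distractors))
print(rank_of_true_match(probe, true_match, many_distractors))
```

In this toy case the true match ranks first against the small gallery but slips to second once a lookalike is added, which is the effect a larger pool of distractors makes more likely.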

“You can see where the hard problems are—recognizing people across different ages is an unsolved problem. So is identifying people from their doppelgängers and matching people who are in varying poses like side views to frontal views,” says Kemelmacher-Shlizerman.

Build a better training set

The paper also analyzes age and pose invariance in face recognition when evaluated at scale.

In general, algorithms that “learned” how to find correct matches out of larger image datasets outperformed those that only had access to smaller training datasets. But the SIAT MMLab algorithm developed by a research team from China, which learned on a smaller number of images, bucked that trend by outperforming many others.

The MegaFace challenge is ongoing and still accepting results.

The team’s next steps include assembling half a million identities, each with a number of photographs, for a dataset that will be used to train facial recognition algorithms. This will help level the playing field and test which algorithms outperform others given the same amount of large-scale training data, as most researchers don’t have access to image collections as large as Google’s or Facebook’s. The training set will be released towards the end of the summer.

“State-of-the-art deep neural network algorithms have millions of parameters to learn and require a plethora of examples to accurately tune them,” says Aaron Nech, a computer science and engineering master’s student working on the training dataset. “Unlike people, these models are initially a blank slate. Having diversity in the data, such as the intricate identity cues found across more than 500,000 unique individuals, can increase algorithm performance by providing examples of situations not yet seen.”

The National Science Foundation, Intel, Samsung, Google, and the University of Washington Animation Research Labs funded the project.

Source: University of Washington