Strike a pose. Computers are watching

NYU (US) — A crowd-sourced music video is helping computer vision scientists give eyesight to machines.

Microsoft’s Kinect for Xbox 360, which detects players’ poses so that games can be controlled with the body alone, is one current application of computer vision technology. But for a computer to truly mimic the human vision system, it must reliably detect specific objects or individuals under a variety of conditions: poor lighting, cluttered backgrounds, unusual clothing, and other sources of variation.

To build such a system, developers need an algorithm that performs “pose estimation”: computer recognition of individuals or objects based on their positioning. To succeed at pose estimation, a computer must draw on a large database of people or objects in a variety of poses. After detecting a pose in its field of vision, it searches that database of images for a match.
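The matching step can be pictured as a nearest-neighbor search over stored poses. The sketch below is purely illustrative, not the NYU team’s implementation: it assumes each pose has been reduced to a flat list of joint coordinates and finds the closest stored pose by Euclidean distance.

```python
import math

def pose_distance(a, b):
    """Euclidean distance between two poses, each a flat list of
    joint coordinates, e.g. [x1, y1, x2, y2, ...]."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def best_match(query, database):
    """Return the index of the stored pose closest to the query."""
    return min(range(len(database)),
               key=lambda i: pose_distance(query, database[i]))

# Toy database: three poses with two joints each (hypothetical coordinates).
db = [
    [0.0, 0.0, 1.0, 1.0],    # arms down
    [0.0, 0.0, 1.0, -1.0],   # one arm raised
    [0.5, 0.5, 0.5, 0.5],    # crouching
]
query = [0.1, 0.0, 0.9, -0.9]
print(best_match(query, db))  # → 1 (the "one arm raised" pose)
```

A real system compares far richer image features than raw joint positions, which is exactly why nuisance factors like lighting and clothing become a problem.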

Individual frames from the band C-Mon & Kypski’s recent video for its song “More or Less” served as a unique visual database for the researchers’ work to develop computer vision technology. (Video images courtesy of C-Mon & Kypski)

“If we had many examples of people in similar pose, but under differing conditions, we could construct an algorithm that matches based on pose and ignores the distracting information—lighting, clothing, and background,” says Graham Taylor, a post-doctoral fellow at New York University (NYU). “But how do we collect such data?”

Departing from traditional data-collection methods, the NYU team turned to Dutch progressive-electro band C-Mon & Kypski and its video crowd-sourcing project One Frame of Fame. The band asks fans to replace one frame of the music video for the song “More or Less” with a capture from their webcams.

In the project, a visitor to the band’s website is shown a single frame of the video and asked to perform an imitation in front of the camera. Each new contribution is spliced into the video, which is updated once an hour.

“This turned out to be the perfect data source for developing an algorithm that learns to compute similarity based on pose,” explains Taylor. “Armed with the band’s data and a few machine learning tricks up our sleeves, we built a system that is highly effective at matching people in similar pose but under widely different settings.”
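What Taylor describes, learning a similarity measure that keys on pose while ignoring lighting, clothing, and background, is a form of metric learning. The toy sketch below is not the team’s published model; it uses hypothetical two-dimensional features (one “pose” dimension, one “lighting” dimension) and learns a nonnegative weight per dimension by pulling same-pose pairs together and pushing different-pose pairs apart.

```python
def wdist(w, a, b):
    """Weighted squared distance; the weights decide which features matter."""
    return sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b))

def learn_weights(same_pairs, diff_pairs, dims, steps=150, lr=0.01):
    """Gradient descent on a contrastive-style objective: shrink the
    distance between same-pose pairs, grow it between different-pose
    pairs, keeping every weight nonnegative."""
    w = [1.0] * dims
    for _ in range(steps):
        grad = [0.0] * dims
        for a, b in same_pairs:        # same pose: this gap is nuisance
            for k in range(dims):
                grad[k] += (a[k] - b[k]) ** 2
        for a, b in diff_pairs:        # different pose: this gap is signal
            for k in range(dims):
                grad[k] -= (a[k] - b[k]) ** 2
        w = [max(0.0, wk - lr * g) for wk, g in zip(w, grad)]
    return w

# Hypothetical pairs: features are [pose, lighting].
same_pairs = [([0.0, 0.2], [0.0, 0.9]),   # same pose, different lighting
              ([1.0, 0.1], [1.0, 0.8])]
diff_pairs = [([0.0, 0.5], [1.0, 0.5])]   # different pose, same lighting

w = learn_weights(same_pairs, diff_pairs, dims=2)
print(w)  # pose weight grows; lighting weight shrinks toward zero
```

After training, the learned distance treats two people in the same pose under different lighting as close, and two different poses under identical lighting as far apart, which is the behavior the NYU system needed at much larger scale.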

The research team will present its findings at the 24th IEEE Conference on Computer Vision and Pattern Recognition (June 21-23) in Colorado Springs.
