It’s easy to capture video with smartphones, GoPro cameras, and Google Glass, but viewing it can get boring. A new video highlighting technique can automatically pick out the interesting parts.
Called LiveLight, this method constantly evaluates action in the video, looking for visual novelty and ignoring repetitive or eventless sequences, to create a summary that lets a viewer get the gist of what happened.
It basically produces a miniature video trailer. Although not yet comparable to a professionally edited video, it can help people quickly review a long video of an event, a security camera feed, or video from a police cruiser’s windshield camera.
One potential application would be using LiveLight to automatically digest videos from GoPro or Google Glass, for example, and quickly upload thumbnail trailers to social media.
The summarization process spares users costly internet data charges and the tedious manual editing of long videos. This application, along with surveillance camera auto-summarization, is now being developed for the retail market by PanOptus Inc., a startup founded by the inventors of LiveLight.
The LiveLight video summary occurs in “quasi-real-time,” with just a single pass through the video. It’s not instantaneous, but it doesn’t take long—LiveLight might take 1 to 2 hours to process one hour of raw video and can do so on a conventional laptop.
With a more powerful backend computing facility, production time can be shortened to mere minutes, according to the researchers.
Making a ‘dictionary’
Eric P. Xing, professor of machine learning, and Bin Zhao, a PhD student in the machine learning department, presented their work on June 26 at the Computer Vision and Pattern Recognition Conference in Columbus, Ohio.
“The algorithm never looks back,” says Zhao, whose research specialty is computer vision. Rather, as the algorithm processes the video, it compiles a dictionary of its content.
The algorithm then uses the learned dictionary to decide, very efficiently, whether a newly seen segment is similar to previously observed events, such as routine traffic on a highway. Segments identified as trivial recurrences or eventless stretches are excluded from the summary.
Novel sequences not appearing in the learned dictionary, such as an erratic car or a traffic accident, would be included in the summary.
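The single-pass idea can be illustrated with a toy sketch. This is not the researchers' actual algorithm — LiveLight learns a sparse-coding dictionary over visual features, whereas this sketch substitutes a simple nearest-neighbor distance over precomputed per-segment feature vectors, and the `summarize` function and `threshold` parameter are hypothetical names chosen for illustration:

```python
import math

def summarize(segments, threshold=1.0):
    """One pass over segment feature vectors: keep a segment if it is
    far from everything in the growing 'dictionary' of past segments.
    (Toy stand-in for LiveLight's sparse-coding dictionary.)"""
    dictionary = []  # feature vectors of all segments seen so far
    summary = []     # indices of segments judged novel
    for i, feat in enumerate(segments):
        # novelty score = distance to the closest earlier segment
        if dictionary:
            novelty = min(math.dist(feat, d) for d in dictionary)
        else:
            novelty = float("inf")  # the first segment is always novel
        if novelty > threshold:
            summary.append(i)
        # "The algorithm never looks back": every segment joins the
        # dictionary, so repeats of it will later score as non-novel
        dictionary.append(feat)
    return summary
```

Because each segment is compared only against the dictionary built so far, the whole video is processed in a single forward pass, matching the "quasi-real-time" behavior described above.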
Though LiveLight can produce these summaries automatically, users can also participate in compiling the summary. In that case, Zhao says, LiveLight provides a ranked list of novel sequences for a human editor to consider for the final video.
In addition to selecting the sequences, a human editor might choose to restore some of the footage deemed worthless to provide context or visual transitions before and after the sequences of interest.
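The ranked list handed to a human editor can be sketched in the same toy setting: score every segment by its distance to the most similar earlier segment, then sort. The `rank_novelty` helper is hypothetical, and the scoring again substitutes nearest-neighbor distance for LiveLight's actual sparse-coding measure:

```python
import math

def rank_novelty(segments):
    """Score each segment by how far it is from the closest earlier
    segment, then return segment indices from most to least novel
    (a sketch of the 'ranked list' offered to a human editor)."""
    scored = []
    for i, feat in enumerate(segments):
        earlier = segments[:i]
        score = (min(math.dist(feat, d) for d in earlier)
                 if earlier else float("inf"))
        scored.append((score, i))
    # highest novelty first; the editor reviews the top of the list
    return [i for score, i in sorted(scored, reverse=True)]
```

An editor would then walk down this list, keeping the top-ranked sequences and optionally restoring surrounding footage for context, as described above.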
“We see this as potentially the ultimate unmanned tool for unlocking video data,” Xing says. Video has never been easier for the average person to shoot, but reviewing and tagging the raw video remains so tedious that ever larger volumes of video are going unwatched or discarded.
The interesting moments captured in those videos thus go unseen and unappreciated, he adds.
The ability to detect unusual behaviors amidst long stretches of tedious video could also be a boon to security firms that monitor and review surveillance camera video.
Google, the National Science Foundation, the Office of Naval Research, and the Air Force Office of Scientific Research supported the work.
Source: Carnegie Mellon