Annotations are teaching algorithms about classical music


The composer Johann Sebastian Bach left behind an incomplete fugue upon his death, either as an unfinished work or perhaps as a puzzle for future composers to solve.

A new classical music dataset—which lets machine learning algorithms learn the features of classical music from scratch—raises the likelihood that a computer could expertly finish the job.


MusicNet is the first publicly available large-scale classical music dataset with curated fine-level annotations. It’s designed to allow machine learning researchers and algorithms to tackle a wide range of open challenges—from note prediction to automated music transcription to offering listening recommendations based on the structure of a song a person likes, instead of relying on generic tags or what other customers have purchased.

“At a high level, we’re interested in what makes music appealing to the ears, how we can better understand composition, or the essence of what makes Bach sound like Bach. It can also help enable practical applications that remain challenging, like automatic transcription of a live performance into a written score,” says Sham Kakade, an associate professor of computer science and engineering and of statistics at the University of Washington.

“We hope MusicNet can spur creativity and practical advances in the fields of machine learning and music composition in many ways,” he says.

Breaking down classical music

Described in a paper available on arXiv, MusicNet is a collection of 330 freely licensed classical music recordings with annotated labels that indicate the exact start and stop time of each individual note, what instrument plays the note, and its position in the composition’s metrical structure. It includes more than 1 million individual labels from 34 hours of chamber music performances that can train computer algorithms to deconstruct, understand, predict, and reassemble components of classical music.
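To make the annotations concrete, here is a sketch of what one fine-level label of this kind carries: the sample span of a note, which instrument plays it, and where it falls in the metrical structure. The field names and values below are illustrative, not necessarily the dataset's exact schema.

```python
# Illustrative sketch of a MusicNet-style note annotation.
# Field names are assumptions for this example, not the actual schema.
from dataclasses import dataclass

SAMPLE_RATE = 44_100  # CD-quality audio, samples per second

@dataclass
class NoteLabel:
    start_sample: int   # first audio sample where the note sounds
    end_sample: int     # last audio sample of the note
    instrument: str     # which instrument plays the note
    midi_pitch: int     # 69 == A4 (440 Hz)
    beat: float         # position in the measure's metrical grid

label = NoteLabel(start_sample=134_564, end_sample=148_992,
                  instrument="violin", midi_pitch=69, beat=1.5)

# Convert the sample span to seconds to read off the note's timing.
start_s = label.start_sample / SAMPLE_RATE
duration_s = (label.end_sample - label.start_sample) / SAMPLE_RATE
print(f"A4 on {label.instrument} at {start_s:.3f}s for {duration_s:.3f}s")
```

A million such records over 34 hours of audio is what lets a learning algorithm associate raw waveforms with the notes they contain.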

“The music research community has been working for decades on hand-crafting sophisticated audio features for music analysis. We built MusicNet to give researchers a large labelled dataset to automatically learn more expressive audio features, which show potential to radically change the state-of-the-art for a wide range of music analysis tasks,” says Zaid Harchaoui, an assistant professor of statistics.

Dynamic time warping

It’s similar in design to ImageNet, a public dataset that revolutionized the field of computer vision by labeling basic objects—from penguins to parked cars to people—in millions of photographs. This vast repository of visual data that computer algorithms can learn from has enabled huge strides in everything from image searching to self-driving cars to algorithms that recognize your face in a photo album.


“An enormous amount of the excitement around artificial intelligence in the last five years has been driven by supervised learning with really big datasets, but it hasn’t been obvious how to label music,” says lead author John Thickstun, a doctoral student in computer science and engineering.

“You need to be able to say from 3 seconds and 50 milliseconds to 3 seconds and 78 milliseconds, this instrument is playing an A. But that’s impractical or impossible for even an expert musician to track with that degree of accuracy.”

The research team overcame that challenge by applying a technique called dynamic time warping—which aligns similar content happening at different speeds—to classical music performances. This allowed them to synchronize a real performance, such as Beethoven’s ‘Serioso’ string quartet, with a synthesized version of the same piece that already contained the desired musical notations and scoring in digital form.

Time warping and mapping that digital scoring back onto the original performance yields the precise timing and details of individual notes that make it easier for machine learning algorithms to learn from musical data.
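The alignment idea can be sketched with a classic dynamic time warping routine. The toy 1-D sequences below stand in for audio feature streams; this is an illustration of the technique, not the authors' pipeline, which works on real audio features.

```python
# Minimal dynamic time warping sketch: align two sequences that present
# similar content at different speeds, as when matching a live recording
# to a synthesized, pre-annotated rendition of the same score.

def dtw_path(a, b):
    """Return the minimal-cost alignment path between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = cheapest cost to align a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack to recover which frames were matched to which.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = {(i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1],
                 (i - 1, j - 1): cost[i - 1][j - 1]}
        i, j = min(moves, key=moves.get)
    return path[::-1]

# A "performance" played slower than its "score": the returned path maps
# the stretched frames back onto the original timeline.
score = [1, 2, 3, 4]
performance = [1, 1, 2, 3, 3, 4]
print(dtw_path(score, performance))
```

Once the path is known, each annotated note in the digital score can be carried across the alignment to its precise timestamp in the real recording.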

In their paper, the research team tested the ability of some common end-to-end deep learning algorithms used in speech recognition and other applications to predict missing notes from compositions. They are making the dataset publicly available so machine learning researchers and music hobbyists can adapt or develop their own algorithms to advance music transcription, composition, research, or recommendations.
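For supervised experiments like these, the note annotations first have to become training targets. One common framing—assumed here for illustration, not taken from the paper—marks, for each short analysis frame of audio, which of the 128 MIDI pitches are sounding:

```python
# Sketch: converting note annotations into frame-wise multi-label
# targets for supervised learning. The hop size and tuple format are
# assumptions for this example.
import numpy as np

HOP = 512  # samples between successive analysis frames (assumed)

def labels_to_targets(labels, n_frames):
    """labels: list of (start_sample, end_sample, midi_pitch) tuples."""
    y = np.zeros((n_frames, 128), dtype=np.float32)
    for start, end, pitch in labels:
        first = start // HOP
        last = min(end // HOP, n_frames - 1)
        y[first:last + 1, pitch] = 1.0  # note is "on" in these frames
    return y

# Two overlapping notes: A4 (pitch 69) and C5 (pitch 72) sounding
# together in the middle of a short clip.
targets = labels_to_targets([(0, 2048, 69), (1024, 3072, 72)], n_frames=8)
print(targets[:, 69])  # A4 is active in frames 0 through 4
```

A model trained against such targets can then be asked to fill in notes that have been masked out, which is the prediction task the paper evaluates.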

“No one’s really been able to extract the properties of music in this way, which opens so many opportunities for creative play,” says Kakade.

For instance, imagine asking your computer to improvise a performance in the style of songs you’ve listened to, or humming a melody and telling the machine to turn it into a fugue on command.

“I’m really interested in the artistic opportunities. Any composer who crafts their art with the assistance of a computer—which includes many modern musicians—could use these tools,” says Thickstun. “If the machine has a higher understanding of what they’re trying to do, that just gives the artist more power.”

The Washington Research Foundation and the Canadian Institute for Advanced Research (CIFAR), where Harchaoui is an associate fellow, funded the research.

Source: University of Washington