For privacy, ‘shuffle’ data on cloud servers


To keep data safe in the cloud, a group of computer scientists suggests doing the Melbourne Shuffle: not the dance move, but a new computer algorithm of the same name.

The computing version of the Melbourne Shuffle aims to hide patterns that may emerge as users access data on cloud servers. Patterns of access could provide important information about a dataset—information that users don’t necessarily want others to know—even if the data files themselves are encrypted.

“Encrypting data is an important security measure. However, privacy leaks can occur even when accessing encrypted data,” says Olga Ohrimenko, lead author of a new paper describing the algorithm. “The objective of our work is to provide a higher level of privacy guarantees, beyond what encryption alone can achieve.”

The paper is available on arXiv, an open-access repository for math and computer science papers. Ohrimenko, who recently received her Ph.D. from Brown University and now works at Microsoft Research, co-authored the work with Roberto Tamassia and Eli Upfal, professors of computer science at Brown, and Michael Goodrich from the University of California, Irvine.

Encrypted, but not secure

Cloud computing is increasing in popularity as more individuals use services like Google Drive and more companies outsource their data to providers like Amazon Web Services. As the amount of data on the cloud grows, so do concerns about keeping it secure.

Most cloud service providers encrypt the data they store. Larger companies generally encrypt their own data before sending it to the cloud, not only to protect it from hackers but also to keep cloud providers themselves from snooping around in it.

But while encryption renders data files unreadable, it can’t hide patterns of data access. Those patterns can be a serious security issue. For example, a service provider, or someone eavesdropping on that provider, might be able to figure out that after accessing files at certain locations on the cloud server, a company tends to come out with a negative earnings report the following week. Eavesdroppers may have no idea what’s in those particular files, but they know that accessing them is correlated with negative earnings.

That’s not the only potential security issue, either.

“The pattern of accessing data could give away some information about what kind of computation we’re performing or what kind of program we’re running on the data,” says Tamassia, chair of the department of computer science.

Some programs have very particular ways in which they access data. By observing those patterns, someone might be able to deduce, for example, that a company seems to be running a program that processes bankruptcy proceedings.

Shuffle security

The Melbourne Shuffle aims to hide those patterns by shuffling the location of data on cloud servers. Ohrimenko named it after a dance that originated in Australia, where she did her undergraduate work.

“The contribution of our paper is specifically a novel data shuffling method that is provably secure and computationally more efficient than previous methods,” Ohrimenko says.

It works by pulling small chunks of data down from the cloud and placing them in a user’s local memory. Once the data is out of view of the server’s prying eyes, it’s rearranged, shuffled like a deck of cards, and then sent back to the cloud server. Repeating this over and over with new blocks of data eventually shuffles all of the data on the cloud.
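For readers who want to see the chunk-by-chunk idea in code, here is a toy Python sketch under simplifying assumptions. It is not the Melbourne Shuffle from the paper, which follows a more structured procedure and comes with a security proof; it only mimics the download, shuffle, upload loop described above. The "server" is a plain Python list, and the function names `shuffle_round` and `chunked_reshuffle` are hypothetical. A real deployment would also re-encrypt every block before uploading it, which this sketch leaves out.

```python
import random


def shuffle_round(server, chunk_size, rng):
    """Download `chunk_size` random blocks, permute them locally, write them back."""
    # Choose which server slots to pull into local memory.
    slots = rng.sample(range(len(server)), chunk_size)
    # "Download": copy those blocks out of the server's view.
    local = [server[i] for i in slots]
    # Rearrange them locally, like shuffling a small hand of cards.
    rng.shuffle(local)
    # Assumption: a real system would re-encrypt each block here so the server
    # cannot match uploaded blocks to downloaded ones. This toy skips that step.
    for i, block in zip(slots, local):
        server[i] = block


def chunked_reshuffle(server, chunk_size, rounds, seed=None):
    """Repeat shuffle_round many times so that, over time, all blocks get mixed."""
    rng = random.Random(seed)
    for _ in range(rounds):
        shuffle_round(server, chunk_size, rng)
    return server


if __name__ == "__main__":
    data = [f"block-{i}" for i in range(16)]
    print(chunked_reshuffle(data, chunk_size=4, rounds=50, seed=1))
```

Because each round only moves blocks among the slots it just downloaded, the server always holds a rearrangement of the original data; running more rounds mixes it more thoroughly.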

The result is that data accessed in one spot today may be in a different spot tomorrow. So even when a user accesses the same data over and over, the access pattern looks essentially random to the server or an eavesdropper.
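The effect on what the server sees can be illustrated with a toy Python sketch. It assumes the user privately keeps a position map recording which server slot holds each logical block, and, for brevity, it reshuffles all blocks at once between days rather than in small chunks. The function `access_log` and its parameters are hypothetical, made up for this illustration rather than taken from the paper.

```python
import random


def access_log(num_blocks, target, days, seed=None):
    """Slots the server observes when the user reads logical block `target` once per day.

    Toy model (hypothetical): the user keeps a private position map and
    reshuffles every block between days.
    """
    rng = random.Random(seed)
    # position[b] is the server slot currently holding logical block b.
    # This map lives only on the user's side, never on the server.
    position = list(range(num_blocks))
    observed = []
    for _ in range(days):
        # The server sees only which slot was read, not which logical block it was.
        observed.append(position[target])
        # Reshuffle before the next day: assign every block a fresh random slot.
        position = list(range(num_blocks))
        rng.shuffle(position)
    return observed


if __name__ == "__main__":
    print(access_log(num_blocks=1000, target=42, days=5, seed=7))
```

The user reads the same logical block every day, yet the slot numbers the server records keep changing, so repeated reads of one file no longer stand out in the log.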

“What we do is we obfuscate the access pattern,” Tamassia says. “It becomes unfeasible for the cloud provider to figure out what the user is doing.”

The researchers envision deploying their shuffle algorithm through a software application or a hardware device that users keep at their location. It could also be deployed in the form of a tamper-proof chip controlled by the user and installed at the data center of the cloud provider.

Source: Brown University