5 tactics for dumping digital trash

JOHNS HOPKINS (US) — The “reduce, reuse, and recycle” philosophy may help manage garbage both in the real and digital worlds, say researchers.

Inside most computers there’s a wasteland where old, rarely used, and unneeded files pile up. Such data depletes precious storage space, bogs down system efficiency, and saps energy. Like household garbage, it has to go.

Johns Hopkins University computer scientists Ragib Hasan and Randal Burns suggest familiar “green” solutions to this digital waste data problem: reduce, reuse, recycle, recover and, only if necessary, dispose.

As with real-world trash, researchers say, the best way to deal with digital garbage is not to generate it at all. The worst is to simply throw it away. (Credit: Ragib Hasan and Randal Burns, JHU)


“In everyday life, ‘waste’ is something we don’t need or don’t want or can’t use anymore, so we look for ways to reuse it, recycle it or get rid of it,” says Hasan, an adjunct assistant professor. “We decided to apply the same concepts to the waste data that builds up inside of our computers and storage devices.”

Hasan and Burns, an associate professor, first needed to figure out what kind of computer data might qualify as “waste.” They settled on four categories:

  • Unintentional waste data, created as a side effect or by-product of a process, with no purpose.
  • Used data, which has served its purpose and is no longer useful to the owner.
  • Degraded data, which has deteriorated to a point where it is no longer useful.
  • Unwanted data, which was never useful to the computer user in the first place.

The researchers found no shortage of files and computer code that fit into these categories.

“Our everyday data processing activities create massive amounts of data,” the researchers write in a paper posted on the scholarly website arXiv. “Like physical waste and trash, unwanted and unused data also pollutes the digital environment. . . . We propose using the lessons from real life waste management in handling waste data.”

A user may not even be aware that much of this binary waste is piling up and impairing a computer’s efficiency.

“If you have a lot of debris in the street, traffic slows down,” Hasan says. “And if you have too much waste data in your computer, your applications may slow down because they don’t have the space they require.”

Even though data storage devices have become less expensive, Hasan says, hard drives can still run out of room. In addition, flash-based storage, such as memory cards, supports only a limited number of write-erase cycles, and frequent deleting of waste data can shorten its lifespan.

How then can the clutter inside computers be curbed? Hasan and Burns devised a five-tier pyramid of options, inspired by real-world waste reduction tactics:

Reduce: At the top of the pyramid, the most preferred option is to cut back on the amount of waste data that flows into a computer to begin with. This can be done, the researchers say, by encouraging software makers to design programs to leave fewer unneeded files behind after a program is installed. To coax the software makers to comply, computers could be set up to “punish” programs that do excessive data dumping; such programs would be forced to run more slowly.

Reuse: Software makers also could break their complex strings of code into smaller modules that could do double duty. If two programs are found to use identical modules, one copy might be eliminated in a process called “data deduplication.” This is the second-best option in the waste-management pyramid, the researchers say.
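To make the idea of deduplication concrete, here is a minimal Python sketch. It is not the researchers' method: it assumes that "identical modules" means byte-for-byte identical contents, and the function names (find_duplicates, deduplicate) and the sample file names are hypothetical, chosen only for illustration. Each module is identified by a SHA-256 digest of its contents, so two programs that ship the same module end up sharing one stored copy.

```python
import hashlib
from collections import defaultdict


def find_duplicates(modules):
    """Group module names whose contents are byte-for-byte identical.

    `modules` maps a (hypothetical) module name to its contents as bytes.
    """
    by_digest = defaultdict(list)
    for name, content in modules.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Only groups with more than one name are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def deduplicate(modules):
    """Keep one stored copy per distinct content.

    Returns (store, index): `store` maps digest -> contents,
    `index` maps each module name -> the digest of its contents.
    """
    store = {}
    index = {}
    for name, content in modules.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # first copy wins, later ones are dropped
        index[name] = digest
    return store, index


if __name__ == "__main__":
    sample = {
        "app_a/parser.py": b"def parse(): pass",
        "app_b/parser.py": b"def parse(): pass",
        "app_a/ui.py": b"def draw(): pass",
    }
    print(find_duplicates(sample))
    store, index = deduplicate(sample)
    print(len(sample), "modules stored as", len(store), "copies")
```

In this sketch the two parser modules share a single stored copy, while every name can still be resolved through the index.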

Recycle: Just as discarded plastic can be refashioned into new soda bottles, some files could be repurposed. For example, when old software is about to be removed, a computer could look for useful pieces of the program that could be put to work in other applications.

Recover: Even when waste data can’t be reused or recycled, digital leftovers might yield information worth studying after private identification details are removed. In their paper, the researchers suggest that “obsolete data can also be mined to gather patterns about historical trends.”

Dispose: Sitting at the bottom of the pyramid, this is the least desirable option, the researchers say. It is also the messiest, given the energy used to completely eliminate old files and the real-world pollution created when an old hard drive or other storage medium is destroyed. One alternative, the scientists say, could be a “digital landfill”: a “semi-volatile storage device” that would provide a temporary home for data designed to fade away automatically over time, freeing up space for more virtual refuse.
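The researchers describe the digital landfill as a storage device, and the article gives no design details. As a loose software analogy only, the Python sketch below models a store whose contents expire after a fixed lifetime. The class name DigitalLandfill, its methods, and the fixed-lifetime rule are all assumptions made for illustration. Time is passed in explicitly so the behavior is easy to follow.

```python
class DigitalLandfill:
    """Toy model of storage whose contents fade away after a fixed lifetime.

    Hypothetical illustration; not the device proposed by the researchers.
    """

    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self._items = {}  # name -> (data, time the data was dumped)

    def dump(self, name, data, now):
        """Place waste data in the landfill at time `now` (in seconds)."""
        self._items[name] = (data, now)

    def fetch(self, name, now):
        """Return the data if it has not faded yet, otherwise None."""
        self._decay(now)
        entry = self._items.get(name)
        return entry[0] if entry is not None else None

    def count(self, now):
        """Number of items still held at time `now`."""
        self._decay(now)
        return len(self._items)

    def _decay(self, now):
        # Drop every item whose age has reached the lifetime.
        expired = [
            name
            for name, (_, dumped_at) in self._items.items()
            if now - dumped_at >= self.lifetime
        ]
        for name in expired:
            del self._items[name]


if __name__ == "__main__":
    landfill = DigitalLandfill(lifetime_seconds=10)
    landfill.dump("old_log.txt", b"stale entries", now=0)
    print(landfill.fetch("old_log.txt", now=5))   # still present
    print(landfill.fetch("old_log.txt", now=10))  # faded away
```

Nothing is deleted by hand in this model: once an item's age reaches the lifetime, the space it used simply becomes available again.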

Hasan acknowledges that most computer users don’t give much thought to the clutter piling up in their laptops, particularly when extra storage media and devices are relatively cheap. But he points out that more users are moving toward cloud computing, in which files are sent over the Internet to a site where an enormous number of them can be stored. As this continues, central storage sites could find themselves drowning in waste data.

“Someday, this could become a problem as we begin using up these storage resources,” Hasan says. “Maybe we should start talking about it now.”

The work was supported in part by a National Science Foundation grant.

More news from Johns Hopkins University: http://releases.jhu.edu/