
Are there distributed data stores like this that are also resilient to intentional sabotage?

I've been looking recently at long-term digital preservation systems -- tools designed to archive large amounts of data for decades. This is the Library of Alexandria problem -- how do we preserve all this data we're generating against once-in-a-century disasters?

So this 2005 paper lists thirteen different threats to long-term archives: Media Failure; Hardware Failure; Software Failure; Communication Errors; Failure of Network Services; Media & Hardware Obsolescence; Software Obsolescence; Operator Error; Natural Disaster; External Attack; Internal Attack; Economic Failure; Organizational Failure.[1]

Fault-tolerant distributed data stores are exciting, because they solve a bunch of those problems off the bat -- media failure, hardware failure, communication errors, failure of network services, hardware obsolescence, and natural disaster.

They also help to address software failure, software obsolescence, and economic failure, because archival projects are always strapped for resources and it's great to rely on tools that exist for totally distinct, commercially-valuable reasons.

But that still leaves operator error, external attack, and internal attack -- burning down the Library.

Hence my original question: are there distributed data stores that can be configured to resist intentional destruction of data?

[1] http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.ht...



The problem with internal/external attacks is that we (as a society) don't really want to prevent them. The reason is simple: child porn. To date, the Bitcoin blockchain (and related ideas) is the only kind of data storage that is effectively resistant to attacks (i.e. to changing history), but luckily it cannot handle amounts of data large enough to be viable for child porn (or most other forms of media). Tor, on the other hand, gets a bad rep precisely because it doesn't prevent it (despite its numerous other, beneficial uses).

The core of the issue is that humans view different information differently (child porn vs. Mona Lisa), whereas for computers, bits are bits and numbers are numbers. As long as child porn remains illegal and socially unacceptable, we'll want to enable attacks on data, i.e. for someone (usually internal operators) to be able to delete some kind of information, corrupt it or at least track it. Of course, this necessarily means that all information stored in the same data-store will be vulnerable.


You're conflating the archival properties of the medium with the decision about what to save. Oil paint on canvas is durable. It doesn't mean that a museum needs to retain every piece of crap that anyone paints.


The problem is that, from a software perspective, there is no meaningful distinction between removing content because it's crap/immoral and an operator destroying it.

So it would probably need to be append-only (write-once) to prevent people from burning it down, which would necessarily mean that, once content is included, it cannot be modified or removed.
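To make that concrete, here is a minimal sketch of a store where content can only be added, with each entry committing to the hash of the one before it. The class and method names (AppendOnlyLog, append, verify) are made up for illustration and don't come from any real system. It only shows the tamper-evidence part: a real system would still need independent replicas to actually stop an operator from deleting things.

```python
import hashlib
import json


class AppendOnlyLog:
    """Toy in-memory log (hypothetical): every entry commits to its predecessor."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash: str, payload: str) -> str:
        # Hash the previous entry's hash together with the new payload.
        data = json.dumps([prev_hash, payload]).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def append(self, payload: str) -> str:
        """Add an entry and return its hash. There is no update or delete."""
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        digest = self._digest(prev, payload)
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; an edited or removed earlier entry breaks it."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != self._digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True
```

Note that chopping entries off the end is not detected by verify() on its own. For that, the latest hash would have to be kept somewhere the attacker can't reach.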


Journaling or storing incremental backups (perhaps offline?) of validated/verified checkpoints may address this, although it sounds like something you wouldn't be happy with since it's not a 'built-in' feature but an additional backup & maintenance process that a system administrator would need to implement.

I guess you're asking whether there exists a distributed, fault-tolerant data store with a form of version control (similar to git/cvs/perforce) as part of the native feature set.




LOCKSS is definitely a giant in this field, and David Rosenthal (who wrote the paper I linked as well) is great.

But LOCKSS occupies a small niche. My hope is really that at some point a commercially-focused project with a ton of engineering effort and battle testing behind it will displace a lot of what LOCKSS has had to do manually. Seems like that might happen as web services get more and more distributed and fault-tolerant.


> are there distributed data stores that can be configured to resist intentional destruction of data?

Well, Git has checksums on everything.
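For anyone curious what that looks like, here is a rough sketch of how Git names a file's contents (a "blob"), assuming the default SHA-1 object format: it hashes a short header plus the raw bytes. The helper names git_blob_id and is_intact are just for illustration, not Git's API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a blob under the SHA-1 object format."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, expected_id: str) -> bool:
    """True if the bytes still hash to the ID recorded earlier."""
    return git_blob_id(content) == expected_id
```

The result should match what `git hash-object <file>` prints for the same bytes. This only detects changes, though. It doesn't stop anyone from deleting the repository, so you'd still want copies held by someone else.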



