Remember WebDAV? It was a similar concept, but never really found its footing and most of the implementations were pretty shaky. I always thought it was a good idea though.
The RFC is super dumb though. For instance, when handling a PROPFIND request (more or less listing files/folders), the server is not required to honor the Depth header (which controls how many levels are returned). There is also no mechanism for the server to advertise whether it honored the Depth header. That makes the Depth header useless: the client has no way to know whether the hierarchy really was only one level deep or the server simply ignored the header. Therefore, your only safe option is to always scan the full hierarchy, issuing a PROPFIND at each level.
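The forced level-by-level scan can be sketched like this. `propfind_depth1` is a hypothetical callback standing in for a real `PROPFIND` with `Depth: 1`; the point is that recursion on the client is the only portable strategy:

```python
# Since a server may silently ignore "Depth: infinity", the only portable
# strategy is one Depth-1 PROPFIND per collection, recursing on the client.
# `propfind_depth1(url)` is a hypothetical helper returning
# (files, subcollections) for a single level.

def walk(url, propfind_depth1):
    files, collections = propfind_depth1(url)
    all_files = list(files)
    for c in collections:
        all_files.extend(walk(c, propfind_depth1))
    return all_files

# Example against a fake server hierarchy:
tree = {
    "/":     ([],               ["/a/", "/b/"]),
    "/a/":   (["/a/x.txt"],     []),
    "/b/":   (["/b/y.txt"],     ["/b/c/"]),
    "/b/c/": (["/b/c/z.txt"],   []),
}
print(walk("/", lambda u: tree[u]))  # every file, via one request per folder
```

Note the cost: one round trip per collection, even when the server could have answered everything in a single `Depth: infinity` request.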
The RFC is full of those kinds of crazy gotchas, not to mention the overuse of "MAY" and "SHOULD", which will drive you crazy if you try to implement a client or server.
If you want to descend further into the insanity, just look at how crazily over-engineered the locking mechanism is. I have no words for it.
Unfortunately, even if you think you implemented the whole RFC correctly, your implementation will interoperate with almost nothing, because few implementations in the wild are any good. A useful WebDAV implementation must be full of vendor-specific workarounds.
There was an even worse protocol back then called CMIS. It was an attempt to define a standard API for content management which turned into this absolute enterprise monstrosity that would make SOAP blush. It was also impossible to implement.
It's interesting looking back but I think developing a standard has a higher chance of success coming from some dude's GitHub than it does with $1T of market cap behind it.
Limitations on depth makes sense, because the actual storage implementation can make recursive retrievals very costly. A folder could be an abstraction for a remote resource.
Of course it would still make sense for server to tell client about this (there are files/no files/I don’t know).
Every single major operating system ships with built-in WebDAV remote-filesystem support, and it works reasonably well. Subversion's HTTP access is built on WebDAV, and it can be wired (via autoversioning) to automatically commit changes these clients store.
Maybe it's on the decline, but I'd hardly put it as something that "never really found its footing". It is still a decent way to do fileshares over the public internet without sshfs, etc.
That's not really what the link is, though. It's an adaptation layer to turn a random network resource into something that looks like a filesystem.
At the time, I worked for a large porn company and we couldn't host stuff 'in the cloud' because they didn't allow porn there.
We invested a ton of money into an Isilon NAS to store our image/video content and the best way to get stuff off it over HTTP was via webdav. Unfortunately, there wasn't a good Java client.
So, I built a simple proxy that would accept regular GET requests and on the back end, use webdav to retrieve content from the Isilon. In front of that proxy was our CDN.
Since then, Sardine has been the basis for quite a few other projects.
I implemented the same for Ruby. With monkey patching, you write normal file-writing code, but it uses the WebDAV protocol to write files whose names start with http.
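The same trick translates to other languages. Here is an illustrative Python sketch (names like `webdav_put` and the dict-backed "server" are stand-ins, not a real client): patch the built-in `open` so that http-looking paths are routed to a WebDAV PUT while everything else behaves normally.

```python
import builtins

remote = {}  # stands in for a WebDAV server (hypothetical)

def webdav_put(url, body):
    # In real code this would issue an HTTP PUT; here we just record it.
    remote[url] = body

class WebDAVFile:
    """File-like object that buffers writes and PUTs them when closed."""
    def __init__(self, url):
        self.url, self.buf = url, []
    def write(self, data):
        self.buf.append(data)
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        webdav_put(self.url, "".join(self.buf))

_real_open = builtins.open

def open_with_webdav(path, mode="r", *args, **kwargs):
    # Route http(s) "paths" to WebDAV; everything else stays normal.
    if isinstance(path, str) and path.startswith(("http://", "https://")):
        return WebDAVFile(path)
    return _real_open(path, mode, *args, **kwargs)

builtins.open = open_with_webdav  # the monkey patch

# "Normal" file-writing code now transparently targets WebDAV:
with open("http://example.com/dav/note.txt", "w") as f:
    f.write("hello")

builtins.open = _real_open  # undo the patch when done
```

The charm (and the danger) is that calling code never knows it is talking to the network.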
A WebDAV file could be used as a WebFS Cell. The locking would allow emulation of the compare-and-swap functionality. The reference HTTP Cell: client and server are way less complicated than WebDAV though, as many here have alluded to.
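The compare-and-swap emulation could look roughly like this in-memory sketch (the `threading.Lock` plays the role of a WebDAV LOCK/UNLOCK pair; nothing here is a real client):

```python
import threading

class LockedCell:
    """In-memory stand-in for a WebDAV resource guarded by an exclusive
    lock. compare_and_swap emulates CAS: LOCK, GET, conditional PUT, UNLOCK."""
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()  # plays the role of WebDAV LOCK/UNLOCK

    def compare_and_swap(self, expected, new):
        with self.lock:           # LOCK
            current = self.value  # GET
            if current != expected:
                return False      # someone else changed it; UNLOCK and fail
            self.value = new      # PUT under the lock token
            return True           # UNLOCK happens on exit

cell = LockedCell("v1")
assert cell.compare_and_swap("v1", "v2")      # succeeds
assert not cell.compare_and_swap("v1", "v3")  # stale expectation fails
```

With real WebDAV you pay two extra round trips (LOCK and UNLOCK) per swap, which is part of why the plain HTTP Cell ends up simpler.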
Someone else described my "httpdirlist" specification as "like WebDAV but better" or "like WebDAV but less messy" (actually I do not remember the exact wording).
Now I see that Wikipedia also lists several alternatives to WebDAV too, but I think httpdirlist is good.
I always thought WebDAV was sabotaged by different players for some reason (e.g. Apple's implementation caused loss of data). "Never attribute to malice", but support was so bad that almost no other conclusion was possible.
"Never attribute to malice what can be adequately explained by stupidity" has led me astray so many times in life I've come to largely disbelieve it is a useful mantra. It lets bad actors hide behind stupidity and cause chaos on purpose, while good people let it happen because "it's simply accidental, right"?
I think it is slightly more nuanced, in that bad decisions get made on accident and overlooked on purpose. If you want to sabotage something, put lots of people that make many accidental mistakes on a project and put your people in a position to overlook them.
2. Consider reducing your desktop resolution for webcasts. I can see that there are dialogues open. I can't for the life of me see what's presented in them.
I was not aware of Plan 9's WebFS. It looks like it presents websites in the local file system. IPFS can do something very similar with its content. I have used that before.
The name here comes from using web resources (referred to by url) as building blocks for a file system. I guess "Web for FS" rather than "Web as FS".
The notion of remote services accessed via local filesystem dynamics is pretty well established. Among implementations:
- NFS, particularly with the Solaris-originated concept of automounts over a /net mountpoint.
- Various virtual filesystems. Midnight Commander ("mc") offers several of these, including archive formats (tar, cpio, afio, rpm, deb) and remote (FTP, SSH).
- SMB/CIFS/Samba
- Various FUSE filesystems, including again ssh, ftp, and others. These generally require specifying in advance specific mountpoints.
The notion of on-demand access to remote resources over protocols (e.g., http/https, or others), under filesystem dynamics, is interesting -- you can use any general tool, utility, or application for access, mediated through the filesystem by way of drivers, rather than a specific application (e.g., Web browser, FTP client, etc.)
There are numerous issues. In particular, applications tend not to respond well to remote resources disappearing, changing, or failing to return from change requests -- NFS's behaviour with nonresponsive remote hosts is ... notorious.
Consistency, availability, and partition tolerance (CAP) are long-standing concerns, and there's no way to solve for all three. I'd add latency as another major consideration.
The general notion of managing and tracking changes locally and pushing them to a remote has merit. I wasn't aware of a "gitfs" ... though of course, one does exist, TIL: https://www.presslabs.com/code/gitfs/ Using git (or another versioning system) as a mediator for revisioned remote/local access seems promising. Obviously not viable for very-high-change-rate systems, but adequate for many occasionally-modified resources.
I'm not sure if you're looking at using your WebFS itself as a publishing mechanism, though in general I think I'd recommend against doing this. For small-n peer-to-peer distribution that's probably workable, but for large-scale provisioning-and-request systems, relying on HTTP or other established transports is likely more sensible.
One area I've recognised as being particularly fraught is the whole notion of security and privacy. Providing unfiltered local access to remote resources which may change arbitrarily is a great way for allowing malware onto local systems -- your transport layer should probably implement some level of security and mounts deny direct execution of content. The fact that remote content could be copied to an executable mountpoint remains, and would make numerous attacks possible.
Similarly: access, update, write, and/or publishing actions all leak considerable information which could be of concern to specific users or organisations. Hash-based indexing (already addressed in this thread) being only one of several such vectors.
SMB/CIFS/Samba are not meant to be used directly over the internet, i.e. without a tunnel. The best alternative right now is Dropbox unless you are a developer.
The point isn't whether or not these are protocols that are utilised on the naked Internet, but that they offer access to network services via filesystem semantics.
That is, rather than use a specific client or API to access remote content, or copying it locally as a separate step, you simply open a file in an existing application, or, within a program, using fopen() or equivalent operators. The networking is ... translucently ... handled in the background by the filesystem interface and/or driver(s).
The reasons SMB is not generally used or advised over the Internet are worth looking at, as this touches on many of the security / privacy concerns of any such service.
> If you have ever thought "x can probably be used as a file system"
...you might also want to take a look at Storage Combinators[1]. Not quite the same problem space, but abstracting away a bit from both concrete filesystems and other storage mechanisms to get to a composable abstraction.
Note: I am the primary author, and also taking a good look at WebFS for further inspiration... :-)
One issue that networked filesystems have is with mutations, especially with multiple writers. WebDAV, NFS, etc. try to address this with locking, but that doesn't allow simultaneous writes, which means it's dangerous to work offline. It's possible to solve this by using an OT or CRDT algorithm behind the scenes. This is what we are building into an extension of HTTP called braid (https://braid.news). It could be useful for a web-backed filesystem. It automates synchronization.
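The lock-free idea is easiest to see with a tiny CRDT. This is a generic last-writer-wins map sketch, not braid's actual algorithm: each key carries a (timestamp, replica) tag, and merge is order-independent, so two writers can diverge offline and converge later.

```python
# Minimal last-writer-wins map CRDT. Each entry carries a (timestamp,
# replica_id) tag; merge keeps the entry with the highest tag. Because
# merge is commutative and associative, replicas converge without locks.

def lww_set(state, key, value, ts, replica):
    state[key] = ((ts, replica), value)

def lww_merge(a, b):
    merged = dict(a)
    for key, (tag, value) in b.items():
        if key not in merged or tag > merged[key][0]:
            merged[key] = (tag, value)
    return merged

# Two replicas edit the same file metadata while offline:
alice, bob = {}, {}
lww_set(alice, "/notes.txt", "draft 1", ts=1, replica="alice")
lww_set(bob,   "/notes.txt", "draft 2", ts=2, replica="bob")

# Merging in either order yields the same converged state:
assert lww_merge(alice, bob) == lww_merge(bob, alice)
```

LWW silently drops the losing write, which is why richer CRDTs (or OT) matter for actual file contents; but the convergence-without-locking property is the same.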
I see very little docs and absolutely no fancy info like gifs or videos explaining what it's possible to do with this tool. I see some example folder, but I can't really say the difference between webfs and ipfs. (Just a little feedback, I mean no offence)
You will probably be interested in Peergos [0][1], which at its lowest level is an encrypted global filesystem also built on IPFS and also only using the block api.
Encryption keys are derived from a secret and the hash of the data. The secret is set per Volume. An empty secret gives plain convergent encryption, which may suit public data; a non-empty secret makes keys convergent only with other keys generated under the same secret. This amounts to deduplication within a Volume, plus privacy.
All a storage provider sees are many small encrypted blobs, so the size of large files is not leaked either.
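My reading of that key-derivation scheme, as a sketch: mix the per-Volume secret with the hash of the plaintext. HMAC-SHA-256 is one plausible construction here, not necessarily what is actually used.

```python
import hashlib
import hmac

def content_key(secret: bytes, data: bytes) -> bytes:
    # Key depends on the volume secret and the plaintext's hash, so
    # identical data within one volume derives the same key (dedup),
    # while different volumes derive unrelated keys (privacy).
    return hmac.new(secret, hashlib.sha256(data).digest(),
                    hashlib.sha256).digest()

data = b"same bytes"
assert content_key(b"volume-A", data) == content_key(b"volume-A", data)  # dedup
assert content_key(b"volume-A", data) != content_key(b"volume-B", data)  # isolation
# An empty secret degenerates to plain convergent encryption:
assert content_key(b"", data) == content_key(b"", data)
```

The trade-off is the usual convergent-encryption one: anyone who holds the secret and guesses the plaintext can confirm the guess.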
WebFS stores data in things it calls "Stores". Stores can be anything that takes data and gives back a key to retrieve the data. Right now we have an IPFS and HTTP store built. An FTP server could also work as a store.
If you or a friend run the HTTP or FTP server then it will persist the data for you. IPFS doesn't incentivize data persistence so if WebFS is working on top of IPFS it inherits that problem. You could run WebFS on top of one of the storage networks and persisting your blobs would be incentivized.
WebFS is storage layer agnostic. Give it a Store, and it will give you a file system.
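As described, the Store contract is tiny. A hedged sketch of what an implementation might look like (class and method names are my own, not WebFS's actual API):

```python
import hashlib

class MemoryStore:
    """Toy Store: put(data) -> key, get(key) -> data. An HTTP, FTP, or
    IPFS-backed store would implement the same two methods; only the
    transport behind them changes."""
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-addressed key
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

store = MemoryStore()
key = store.put(b"hello webfs")
assert store.get(key) == b"hello webfs"
```

Content addressing is a natural fit because the key can be verified against the returned bytes, so the store itself never needs to be trusted.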
So if WebFS is running on a system with access to Tor SocksPorts, can Stores be onion URLs?
Edit: If not, one could presumably route WebFS through OnionCat's IPv6 /48. But that only works with v2 onions, which are deprecated. However, tinc works with v3 onions. And either of those gives you UDP transport.
Yes. WebFS doesn't actually use any of the file/directory functionality provided by IPFS, or any encryption features. We only use the get/put block functionality. Everything is encrypted in WebFS before being posted to a Store.
The data encryption keys are generated using a secret and the hash of the data being encrypted. That key is stored in the reference to that data. This continues recursively to the superblock which is not encrypted.
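The recursion bottoms out something like this sketch. The XOR "cipher" and the helper names are purely illustrative (a real system would use an authenticated cipher like AES-GCM); the point is that each reference carries both the storage id and the decryption key, all the way up to a plaintext superblock.

```python
import hashlib
import json

def derive_key(secret: bytes, data: bytes) -> bytes:
    # Key = H(secret || H(data)): convergent within one secret.
    return hashlib.sha256(secret + hashlib.sha256(data).digest()).digest()

def xor_encrypt(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher for illustration only.
    stream = (key * (len(data) // len(key) + 1))[:len(data)]
    return bytes(a ^ b for a, b in zip(data, stream))

def store_put(store: dict, blob: bytes) -> str:
    bid = hashlib.sha256(blob).hexdigest()
    store[bid] = blob
    return bid

def put_encrypted(store: dict, secret: bytes, data: bytes) -> dict:
    key = derive_key(secret, data)
    blob_id = store_put(store, xor_encrypt(key, data))
    return {"id": blob_id, "key": key.hex()}  # the reference carries the key

secret, store = b"volume-secret", {}
leaf_ref = put_encrypted(store, secret, b"file contents")
# A directory is itself just data containing references, encrypted the
# same way; only the top-level superblock reference stays in plaintext.
dir_ref = put_encrypted(store, secret, json.dumps([leaf_ref]).encode())
superblock = {"root": dir_ref}
```

So the server only ever sees opaque blobs, while anyone holding the superblock reference can unwind the whole tree.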
What sort of content do you say Tor onions can't host?
"Tor onion" just means that a server is (ideally) only reachable as an onion URL, which is only accessible via the Tor network. There is the limitation that Tor only handles TCP. Otherwise, one can route anything over Tor. In my experience, that includes HTTP(S), FTP, Tahoe-LAFS, SSH, RDP, Mumble, OpenVPN and tinc. And others, if I spent more time remembering what I've played with.
What you're describing is a Tor hidden service. Hidden services are separate from the Tor relay network itself, which is what I thought you were referring to as "Tor onions".
Hidden services are optimized for confidentiality over performance. Using them for bulk data storage would place a lot of load on the relay network, and it's not clear what security problem this arrangement would solve.
As far as I know, "hidden service" is deprecated, with "onion service" the current term. And it does tend to get shortened to "onion". But I admit that it was confusing. Because relays used to be called "onion routers". Which is also more or less deprecated, I think.
The security problem is Stores being physically located and compromised, based on IP addresses found in traffic logs.
Well, I fear that even if the information is encrypted, once quantum computing breaks modern AES encryption standards that's going to be a yikes. So I'd be more comfortable with encryption as well as access controls.
This is a legitimate concern. WebFS is designed for the p2p storage use case. Persisting data with p2p storage means that it can live forever. All the secrets in WebFS are randomly generated and there are no user supplied (potentially weak) passwords.
w.r.t. quantum computing: it is possible for WebFS to use symmetric cryptography for all remote data. Although, many Cell implementations in the near term will likely use elliptic curves or RSA.
I guess. But access controls really just keep the punters out. Any serious adversary will just track down the stores. And even if they're on dedicated servers with FDE, keys can be obtained from RAM.
Just to clarify: All data is encrypted on the client, going after a server backing a Store will get you encrypted blobs. Encryption keys would not exist on the server in plaintext.
HTTP is basically a filesystem protocol that supports magical files -- not too unlike sharing named pipes over SMB. Not too unlike what it'd be like if one could open(2) AF_LOCAL sockets on Unix/POSIX systems instead of having to connect(2) to them -- if that had been so in 1982 in BSD, it would be true now, NFS would have supported the same, etc.
https://en.wikipedia.org/wiki/WebDAV