
Protobuf crashes on data larger than 2GB (JSON does not).

This severely limits its usefulness, unless you want to build workarounds for this shortcoming into your application.

Other protobuf-like formats, such as Cap'n Proto, don't have that restriction because they don't use 32-bit integers for sizes.



You really don't want to put >2GB in a single protobuf (or JSON object). That would imply that in order to extract any one bit of data in that 2GB, you have to parse the entire 2GB. If you have that much data, you want to break it up into smaller chunks and put them in a database or at least a RecordIO.
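The chunking idea above can be sketched as simple length-prefixed framing, which is the principle behind RecordIO-style files: each record carries its own length, so a reader can process or skip records one at a time instead of parsing 2GB at once. The function names and 4-byte little-endian framing here are illustrative assumptions, not an actual RecordIO API:

```python
import struct

def write_records(f, records):
    # Each record: 4-byte little-endian length prefix, then the payload.
    for payload in records:
        f.write(struct.pack("<I", len(payload)))
        f.write(payload)

def read_records(f):
    # Yield payloads one at a time; never loads the whole file.
    while True:
        header = f.read(4)
        if not header:
            return
        (length,) = struct.unpack("<I", header)
        yield f.read(length)
```

Each record could then be an independently parseable protobuf or JSON document of modest size.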

Cap'n Proto is different, since it's zero-copy and random-access. You can in fact read one bit of data out of a large file in O(1) time by mmap()ing it and using the data structure in-place.
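To illustrate the random-access idea in isolation: when a format stores fields at known offsets (as zero-copy formats like Cap'n Proto do), a reader can mmap the file and decode a single field in O(1) without parsing anything else. The fixed-offset layout below is made up for the example and is not Cap'n Proto's actual wire format:

```python
import mmap
import struct

def read_u64_at(path, offset):
    # Map the file read-only and decode one little-endian u64 in place;
    # no bytes outside the 8-byte field are ever examined.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("<Q", m, offset)[0]
```

The cost is independent of file size, which is why large messages are reasonable in a zero-copy format but not in one that must be parsed front to back.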

Hence, it makes sense for Cap'n Proto to support much larger messages, but it never made sense for Protobuf to try.

Incidentally, the 32-bit limitation on Protobuf is an implementation issue, not fundamental to the format. It's likely some Protobuf implementations do not have this limitation.

(Disclosure: I'm the author of Protobuf v2 and Cap'n Proto.)


Generally speaking, if you have 2GB of data, why would you want it inside a protobuf, or worse, JSON? You clearly aren't going to open it in a text editor or send it over AJAX; just store the bulk of the data as a separate binary blob, and your code won't have to scan 2GB to find the end of a string.
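The "small structured part plus separate blob" pattern amounts to keeping the metadata as JSON and referencing the bulk payload by filename. A minimal sketch, with hypothetical names and layout:

```python
import json

def save(meta_path, blob_path, metadata, blob):
    # Bulk bytes go in their own file, untouched by any parser.
    with open(blob_path, "wb") as f:
        f.write(blob)
    # The small JSON document just records where the blob lives.
    metadata = dict(metadata, blob=blob_path, size=len(blob))
    with open(meta_path, "w") as f:
        json.dump(metadata, f)
```

A reader can then inspect the metadata cheaply and only open the blob when it actually needs the bytes.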


It's not remotely a severe restriction; very few applications need messages that large.


While it’s a valid point, there are many workarounds.

Also, even in JSON, splitting documents into smaller chunks (ndjson, for instance) is standard practice to avoid having to parse everything in one go.
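ndjson (newline-delimited JSON) puts one complete JSON document per line, so a consumer can parse records incrementally instead of loading the whole payload. A minimal sketch of a streaming reader (the function name is illustrative):

```python
import json

def iter_ndjson(lines):
    # Parse one JSON document per non-empty line, yielding as we go.
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because it consumes any iterable of lines, the same function works over a file object without reading it all into memory.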



