In the last 2 decades in the industry as well I've never lost data with MongoDB,...

rantanplan · on July 24, 2016

>In the last 2 decades in the industry as well I've never lost data with MongoDB, Riak or Cassandra but have with Oracle, DB2 and PostgreSQL

Yet every test proves otherwise. Also, use Google to see how people have lost data with MongoDB. Mongo is not considered a serious piece of technology by any scientist or engineer I know. Postgres though is universally considered an engineering marvel.

>Hint: think about the schema problems associated with storing auto generated features from deep learning models.

Hint: The problem you mentioned? Even less than 1%

Calling me ignorant doesn't change reality you know.

dominotw · on July 24, 2016

Is data loss something inherent to NoSQL tech, or just a result of poor implementations?

If it's the latter, why haven't there been any reliable NoSQL implementations?

Perhaps it's well suited to non-transactional, low-fi data?

rantanplan · on July 24, 2016

NoSQL DBs usually target distributed environments.

So... enter CAP theorem. There's no free lunch. People think we can simply throw away half a century's worth of science because JSON and schemaless are teh awesome derp derp.

Implementation is surely an issue, if you take into account that the MongoDB guys had to acquire another company [1] in order to overcome their abysmal write performance. And yet there were people and benchmarks trying to tell us that Mongo was faster than the RDBMS alternatives. All this circa 2009-2012.

You know what's faster than everything? Writing to /dev/null ;)

Anyways, depending on your use case there might be a NoSQL store out there that fits your needs and actually delivers what it claims. But it's hard to sift through all the ad-driven, buzzword-ridden infomercials that get thrown around by start-up companies in the DB domain.

Also, DBs are like filesystems; even if the math/science is correct, they need at least a decade of proven track record before you can say that they work as advertised.

[1] http://www.informationweek.com/software/information-manageme...

dominotw · on July 24, 2016

> NoSQL DBs usually target distributed environments. So... enter CAP theorem.

Surely FB is not running MySQL on a single machine. Perhaps I am misunderstanding you, but saying SQL DBs don't face the issues of distribution seems a little strange.

Distribution comes into picture from shape and size of the data not data saving/retrieval techniques. yea?

rantanplan · on July 24, 2016

FB and all big companies are a very bad example. They have a ton of resources and usually they don't use vanilla products, since they have the engineering capacity to support their own forked versions. See, e.g., their own version of PHP.

Also, distributing reads is easy, writes... not so much. NoSQL systems usually offer distributed writes with the caveat of eventual consistency. RDBMS have referential integrity and other constraints which by definition cannot migrate into a distributed environment. Or at least there's no one-size-fits-all solution.
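To make the "eventual consistency" caveat concrete, here is a toy sketch. It assumes two replicas that accept writes independently and reconcile later with a last-write-wins rule. The `Replica` class, its methods, and the integer timestamps are all hypothetical, invented for illustration; real stores use more careful schemes, and this is not how any particular product works.

```python
# Toy model of eventual consistency (hypothetical names throughout).
# Assumption: every write carries a unique, comparable timestamp, and
# conflicts are resolved by keeping the write with the larger timestamp.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Accept the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Pull every entry from the other replica; write() keeps the newest.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a = Replica("a")
b = Replica("b")

# Two clients write to different replicas before any replication happens.
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)

# A read served by replica a is stale until the replicas exchange state.
stale = a.read("cart")

a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart")
```

Between the second write and the sync, the two replicas disagree; that window is the price of accepting writes in more than one place.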

> Distribution comes into picture from shape and size of the data not data saving/retrieval techniques. yea?

Most definitely not. It has nothing to do with the shape and size of data. Also, there's no such thing as "distribution" in our context. Only "distributed", from "distributed computing" [1], and it has everything to do with data saving and retrieval :)

[1] https://en.wikipedia.org/wiki/Distributed_computing

dominotw · on July 24, 2016

>RDBMS have referential integrity and other constraints which by definition cannot migrate into a distributed environment.

so,

Use an RDBMS if your data can be handled by a single machine (or you have the resources of FB)? The '99% of ppl need an RDBMS' argument boils down to: 99% of ppl have data that can be handled by a single-machine RDBMS.

Is that a good conclusion?

rantanplan · on July 24, 2016

The single machine shouldn't be the deciding factor.

If your application is like most apps (far more reads than writes), then you can easily distribute the read load across multiple machines. If you have more writes than reads (quite rare, but still), then scaling an RDBMS will be challenging.

In this case, if eventual consistency is something you can live with, a NoSQL store might be best for you.
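For the read-heavy case, a minimal sketch of what "distribute the load" can look like: all writes go to one primary, and reads rotate across replicas. `Node` and `ReadWriteRouter` are hypothetical names, and replication is done synchronously inside `write()` purely to keep the example short; real setups usually replicate asynchronously and have to deal with lag.

```python
import itertools

# Hypothetical in-memory stand-ins for database servers.
class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        # Round-robin iterator over the replicas for read traffic.
        self._next_replica = itertools.cycle(replicas)

    def write(self, key, value):
        # Every write hits the single primary, so write capacity does not
        # grow when replicas are added.
        self.primary.rows[key] = value
        # Simplifying assumption: replicas are updated immediately.
        for replica in self.replicas:
            replica.rows[key] = value
        return self.primary.name

    def read(self, key):
        # Reads are spread across replicas, so read capacity scales out.
        node = next(self._next_replica)
        return node.name, node.rows.get(key)


router = ReadWriteRouter(Node("primary"), [Node("replica-1"), Node("replica-2")])
router.write("user:1", "alice")
served_by = [router.read("user:1")[0] for _ in range(4)]
```

Adding replicas raises read throughput, but the primary remains the only place writes land, which is why the write-heavy case is the hard one.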