Wrapping both caching logic and database access in an ORM like system is no doub...

nettdata · on May 3, 2011

The two big systems I architected where I made the decision to go with ORMs were the online EA Sports system (all EA Sports games on all platforms, currently running in a 7-node Oracle cluster), and most recently, the Need For Speed World Online system. We launched the EA Sports system with Madden, and went from 50 to 11 million users hitting the DB in less than an hour. Then we rolled out the other EA Sports games. Needless to say, both systems were slightly bigger than a simple blogging site.

In both cases, we had a large number of smart developers who we empowered with the use of an ORM; they understood the domain model, and they didn't have to worry about waiting for a "DB type" to write stored procedures, or develop a data model, etc. As a matter of fact, in both cases, I was the only DBA on the project, and it was a predominantly part-time role. We'd meet, ensure we were all on the same page with the object/data model, and then they'd go and build it. The developers were able to immediately build and run and test and integrate something that was functional and operational, when they needed it. This was HUGE, and something that most people don't properly appreciate. Timelines were already insane enough as it was; the last thing we needed to do was artificially constrain ourselves by waiting for other (db) devs before work could go on. Especially when requirements had the potential to change from one day to the next.

In both situations, we took advantage of very, very sophisticated testing procedures that would happen nightly, both functional and stress/load, and they pointed us at the bottlenecks of each nightly build that would require tuning and investigation. We intentionally set up our testing to be able to monitor and test the effectiveness of the ORM, and to point it out when it didn't work efficiently. The devs would do the majority of the heavy lifting with the initial data model, and the results would be tested, reviewed, and then modified if required. The performance problems were not a lot of effort to fix, either. Usually it was a very slight data model change, or using a named query to take advantage of database-specific features. And CLOBs. Every database seems to handle them differently, so we had to hack some solutions.

Having done large scale database development for almost 25 years, using the classic stored procedure approach and the ORM approach, I'll say again that ORMs are a great solution for certain projects with the right staff, and aren't a crutch or some lazy choice if used properly.

fendale · on May 3, 2011

As another 'Oracle guy' this is an interesting post. I have said before on here, if you pay for Oracle, and also pay for decent storage arrays, Oracle can shift a serious amount of data before it reaches its limit. In my opinion, it really seems to be an order of magnitude better than its closest open source competitor.

jacques_chester · on May 3, 2011

I have to admit that when I hear "relational doesn't scale", I usually rewrite it in my head as "MySQL doesn't scale".

Because people have been running massive systems on Oracle, DB2, Sybase, Teradata etc. for years now.

mkjones · on May 4, 2011

I think when people say "relational doesn't scale," what they mean is "MySQL often requires application-level changes to scale out."

I assume (among this crowd, anyway) that scaling out is more desirable vs. scaling up because the majority of the hardware costs are variable, whereas scaling up requires a step function of large cash investments that startups often can't afford.

Do those massive systems on Oracle etc scale out, or simply scale up with expensive hardware?

fendale · on May 4, 2011

You can certainly scale out with Oracle RAC. At some point the bottleneck will likely be disk, however, so buying a high end storage system would probably become the priority.

jacques_chester · on May 4, 2011

> Do those massive systems on Oracle etc scale out, or simply scale up with expensive hardware?

Both. If you still need ACID guarantees and want hundreds of thousands (or even millions, if necessary) of TPM, you will need to pay the piper.

n_are_q · on May 3, 2011

My experience is from writing a bunch of middle tier code at MySpace in the 06-07 time frame, the MySpace heyday when they were pushing more traffic than Google (true story). Anyway, the user facing product might have sucked, but we did scale (that's why Friendster was Friendster and we were MySpace :). In an environment with 450+ million users, we had extensive caching systems and still had to use every SQL trick in the book to get our systems to scale well. I know because my job was working with the DBAs to bridge the SQL and front end worlds together. I can say with great certainty that front end developers who did not know SQL and were simply following a logical object model would not have produced code that scaled in our environment; there were way too many things that were done that were extremely non-obvious. Since MySpace I've been working at a Python/Postgres startup where we've been applying the same principles pretty successfully, at a much different scale of course. If nothing else, I think the no-ORM approach will at least give you more bang for your buck.

Separating your data access code out of the application logic also allows you to change it much more easily as data conditions change, including on the fly, without an application deployment. That's often extremely useful.
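As a rough illustration of that separation (a minimal sketch only, not MySpace's actual code: the `QUERIES` registry, the `friends` table, and every function name here are hypothetical, and SQLite stands in for a real database), the application logic calls a data access function and never sees the SQL, so the query text can be swapped without touching the caller:

```python
import sqlite3

# Hypothetical query registry. In a real system this might live in a config
# file or in stored procedures so it can change without a code deployment.
QUERIES = {
    "friends_of": (
        "SELECT friend_id FROM friends WHERE user_id = ? ORDER BY friend_id"
    ),
}


def friends_of(conn, user_id):
    """Data access layer: the only place that touches SQL."""
    rows = conn.execute(QUERIES["friends_of"], (user_id,)).fetchall()
    return [row[0] for row in rows]


def friend_count_label(conn, user_id):
    """Application logic: no SQL here, only calls into the data access layer."""
    n = len(friends_of(conn, user_id))
    return f"{n} friend{'s' if n != 1 else ''}"


def make_demo_db():
    """Build a tiny in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
    conn.executemany(
        "INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3), (2, 1)]
    )
    return conn
```

Replacing the entry in `QUERIES` (say, to add a `LIMIT` or a hint) changes what `friends_of` returns while `friend_count_label` stays untouched, which is the property being described.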

MySpace scale may be at an extreme end of the spectrum, but we had formidable hardware to throw at it too (although x86, so nothing TOO crazy). So I think the ratio of hardware to scale at other sites is comparable, and so I think the same lessons apply. I have no experience working with Oracle, but would you say that a 7-node Oracle cluster is some pretty serious hardware? I myself really don't know, but it is a question I have :).

EDIT: I'm not discounting your experience, I just want to point out that I've experienced conditions where I think the ORM approach would have broken down. If others have had different experiences, the more data points the better, but I think the scale/complexity/cost (hw) ratios play into the debate as well.

EDIT #2: Oh and I forgot to mention that the automated test suite you had is an incredible asset, and no doubt made it easier to discover problems early and deal with them effectively. But you do have to invest resources in creating one, and something like that is no small cost at a start up.

nettdata · on May 4, 2011

The point of my post was to say that if you take a serious look at the ORM you want to use, fully understand the issues you may have with it, and design/adapt your development process to help mitigate the issues you may run into, there are huge advantages to using it.

I was just pointing out that ORMs are indeed quite effective in online systems that are more complex than a blogging site.

If you're going to say "no, don't use it", based on a development situation that is very much an outlier (MySpace), and use that experience to discount it for any but trivial use, then I'm not sure what to say.

They can and do offer real-world advantages with minimal downside in reasonably complex and large systems, as I've tried to demonstrate, if you treat them like any other tool and don't use them blindly.

As to your environment, the data requirements were quite different than ours. Our systems were more like online banking systems; very much an even split of fast writes and reads, transactionally bound to third party systems (in-game payment, in-game "real time" use of consumables, etc), real-time analytics for fraud detection, etc. We were very much high IO, and our caching opportunities were few and far between.

And in our environment, we HAD to have sophisticated testing. I ensured that the stress and load testing was done so that we could directly simulate the load of our expected user base, with realistic profiles, in order to better engineer our databases and disk IO. It also allowed us to measure the impacts of feature additions, etc. If it failed in Production, it made the news, and we had millions of gamer-freaks bitching everywhere.

In my case, the middle-tier was not an issue... we enabled minimal caching on a per-box basis, and other than that, they were stateless, and we could add/remove them at will; the application WAS the database.

And you can still abstract various parts of the database while using an ORM. We did write a few special stored procedures, and used some forced query plans, views, etc., to tweak the performance.

And yes, Oracle can scale out quite well. Cache Fusion, high speed and low latency interconnects, and shared block access provide incredible scaling without having to do anything special in the middle tier.

n_are_q · on May 4, 2011

It's interesting to hear that this has worked well; obviously this wasn't a small project. Your point about knowing how to use your tool definitely rings true. Also interesting that you had a use case where data loss and integrity actually mattered and in real time, unlike a social network or most startups operating today. Going with a heavy Oracle system instead of trying to roll your own creative distributed architecture definitely seems to make sense in that scenario. Just out of curiosity, was this Java/Hibernate?

nettdata · on May 4, 2011

On one system we used Java/Oracle/Hibernate and went with the big single cluster. The other system was a .NET stack, using NHibernate and a large number of SQL Server instances. We also worked with Microsoft on integrating their latest (at the time beta) caching servers. We did indeed have to roll our own distributed architecture in that case, but it's not like we had to drop ORM to do it.

nettdata · on May 4, 2011

If anyone has any questions about how I've used ORM, etc., feel free to email me at nettdata@gmail.com if you like. I don't usually keep tabs on old threads, and have no problems sharing some of my experiences in this.

spudlyo · on May 3, 2011

Thanks for sharing your experiences. This is why I read HN.