Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've never really digged into column-oriented storage, so had a quick skim... Would the below excerpts/example be a fair note of pros/cons of the general idea?

> Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.

Example: SELECT sum(a) FROM things;

> Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.

Example: UPDATE things SET a = a+1;

> Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.

Example: SELECT * FROM things;

> Row-oriented organizations are more efficient when writing a new row if all of the row data is supplied at the same time, as the entire row can be written with a single disk seek.

Example: INSERT INTO things (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);



That's pretty accurate. Column-stores also make data compression significantly more effective because they are storing many values of the same type together [1].

The improvements to your first two points are typically 1-2 magnitudes faster with column-stores [2].

[1] http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf

[2] http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf


That sounds like a pretty good summary.

From a workload perspective, row-stores are predominant in the OLTP (transactional insert/update) domain. They are also used in OLAP and data warehousing. Still, column stores have benefits when the underlying tables have many columns and the user's running analytic queries over a small subset of those columns.


Also column-oriented db are much more efficient in filtering rows, specially for very large datasets with complex filtering expressions. This is what makes them good at data analysis.

I consider different from your first point because sometimes is not the aggregation, but the drill down what is needed




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: