Aren’t most SQL queries a table scan at some point? I guess you’d range-shard the index and hash-shard the actual data, but I don’t know if that buys you much, since now the index and the data live on separate machines. Plus, you’d probably need to completely disallow queries on unindexed tables (i.e. you’re ClickHouse, not Spanner). That being said, there is also interesting work in the auto-indexing field that might provide a way out of the problem (i.e. transparently generating an index for the ranges your hottest queries touch), but I think you’re still left with the amplification problem: the number of machines that need to get hit to access the underlying values.
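To make the amplification concrete, here’s a rough sketch of what I mean. Everything in it is made up for illustration (the 8-shard cluster, the `user-N` keys, the function names); it just shows that a range lookup which hits one contiguous slice of a range-sharded index can still resolve to primary keys scattered across many hash-sharded data machines.

```python
import bisect
import hashlib

NUM_DATA_SHARDS = 8  # assumed cluster size, purely illustrative


def data_shard_for(primary_key: str) -> int:
    """Hash-shard a primary key onto one of the data shards."""
    digest = hashlib.sha256(primary_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DATA_SHARDS


def range_lookup(index, lo, hi):
    """Return primary keys whose indexed value v satisfies lo <= v < hi.

    `index` is a sorted list of (indexed_value, primary_key) tuples,
    standing in for a range-sharded secondary index.
    """
    start = bisect.bisect_left(index, (lo,))
    end = bisect.bisect_left(index, (hi,))
    return [pk for _, pk in index[start:end]]


def shards_touched(index, lo, hi):
    """Set of data shards that must be contacted to fetch the matching rows."""
    return {data_shard_for(pk) for pk in range_lookup(index, lo, hi)}


if __name__ == "__main__":
    index = sorted((i, f"user-{i}") for i in range(1000))
    touched = shards_touched(index, 100, 200)
    print(f"100 contiguous index entries -> {len(touched)} of {NUM_DATA_SHARDS} data shards")
```

The index read is cheap and local; the row fetches are what fan out.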
Also, I’m not saying “list all”. I’m saying even list 10 or list 1000 is the same problem: you still have to contact every server in your cluster and do a map/reduce to get the result. Sure, list operations may be less common, but their cost seems far higher, since it grows with the number of servers rather than with the size of the result.
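Here’s a minimal sketch of the “list 10” case, assuming keys are hash-sharded and each shard keeps its own keys sorted. The names (`build_cluster`, `list_first`) and the in-memory lists standing in for servers are hypothetical; the point is only that the coordinator has to ask every shard for its local first N and merge, no matter how small N is.

```python
import hashlib
import heapq


def shard_for(key: str, num_shards: int) -> int:
    """Hash-shard a key onto one of `num_shards` servers."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_cluster(keys, num_shards):
    """Distribute keys across shards; each shard holds its keys in sorted order."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    for shard in shards:
        shard.sort()
    return shards


def list_first(shards, limit):
    """Return the `limit` smallest keys cluster-wide and the number of shards contacted."""
    # Map: any shard might hold the globally smallest keys, so all are asked.
    partials = [shard[:limit] for shard in shards]
    # Reduce: merge the sorted partial results and keep the first `limit`.
    merged = list(heapq.merge(*partials))[:limit]
    return merged, len(partials)


if __name__ == "__main__":
    keys = [f"key-{i:05d}" for i in range(10_000)]
    shards = build_cluster(keys, num_shards=16)
    for limit in (10, 1000):
        result, contacted = list_first(shards, limit)
        print(f"list {limit}: contacted {contacted} shards, got {len(result)} keys")
```

Shrinking the limit shrinks the payload each shard sends back, but not the number of shards you have to talk to.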