DuckDB is mind-blowingly awesome. It is like SQLite (lightweight, embeddable, serverless, can run in-memory), but it's optimized for columnar, analytics-style workloads. It can work with files on the filesystem, S3, etc. without copying them (it reads only the necessary regions of the file) by just doing `select * from 's3://....something.parquet'`. It supports automatic compression and automatic indexing. It can read JSON Lines, Parquet, CSV, its own db format, SQLite databases, Excel, Google Sheets... It has a very convenient SQL dialect with many QoL improvements and extensions (and is largely PostgreSQL-compatible). Best of all: it's incredibly fast. Sometimes it's so fast that I find myself puzzled, "how can it possibly analyze 10M rows in 0.1 seconds?", and I find it difficult to replicate the performance in pure Rust. It is an extremely useful tool. In the last year it has become one of my use-every-day tools, because the scope of problems you can just throw DuckDB at is gigantic. If you have a whole bunch of structured data that you want to learn something about, chances are DuckDB is the ideal tool.
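To make that concrete, here's roughly what querying a Parquet file in place looks like (the bucket, file, and column names here are hypothetical, just to illustrate the shape of it):

```sql
-- Query a remote Parquet file directly; DuckDB fetches only the
-- byte ranges (column chunks) it needs to answer the query.
SELECT event_type, count(*) AS n
FROM 's3://my-bucket/events.parquet'   -- hypothetical path
WHERE event_date >= DATE '2024-01-01'
GROUP BY event_type
ORDER BY n DESC;
```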
PS: Not associated with the DuckDB team at all, I just love DuckDB so much that I shill for them when I see them on HN.
I'm sorry, I must be exceptionally stupid (or haven't seriously worked in this particular problem domain and thus lack awareness), but I still can't figure out the use cases from this feature list.
What sort of thing should I be working on, to think "oh, maybe I want this DuckDB thing here to do this for me?"
I guess I don't really get the "that you want to learn something about" bit.
I’m not the person you asked, but here are some random, assorted examples of “structured data you want to learn something about”:
- data you’ve pulled from an API, such as stock history or weather data,
- banking records you want to analyze for patterns, trends, unauthorized transactions, etc.,
- your personal fitness data, such as workouts, distance, pace, etc.,
- your personal sleep patterns (data retrieved from a sleep tracking device),
- data you’ve pulled from an enterprise database at work — could be financial data, transactions, inventory, transit times, or anything else stored there that you might need to pull and analyze.
Here’s a personal example: I recently downloaded a publicly available dataset that came in the form of a 30 MB CSV file. But instead of using commas to separate fields, it used the pipe character (‘|’). I used DuckDB to quickly read the data from the file. I could have actually queried the file directly with DuckDB SQL, but in my case I saved it to a local DuckDB database and queried it from there.
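A minimal sketch of that workflow (file, table, and column names are hypothetical; `delim` and `header` are real `read_csv` options):

```sql
-- Load a pipe-delimited file into a local table in one step.
CREATE TABLE mydata AS
SELECT * FROM read_csv('dataset.csv', delim = '|', header = true);

-- From here on, query the local copy instead of re-parsing the file.
SELECT count(*) FROM mydata;
```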
My dumb guy heuristic for DuckDB vs SQLite is something like:
- Am I doing data analysis?
- Is it read-heavy, write-light, using complex queries over large datasets?
- Is the dataset large (several GB to terabytes or more)?
- Do I want to use parquet/csv/json data without transformation steps?
- Do I need to distribute the workload across multiple cores?
If any of those are a yes, I might want DuckDB.
- Do I need to write data frequently?
- Are ACID transactions important?
- Do I need concurrent writers?
- Are my data sets tiny?
- Are my queries super simple?
If most of the first questions are no and some of these are yes, SQLite is the right call.