The Quiet Power of SQL

(blog.sturdystatistics.com)

12 points | by kianN 2 days ago

2 comments

  • tkcranny a day ago

    While it’s hardly insightful that SQL is useful, I would have liked to read more about what the actual workload involving DuckDB on a local machine looked like. I’m fully on board that local or single-VM workloads can do an awful lot, but I’ve never been particularly satisfied with the pipelines I’ve seen (including my own). Usually they’re piles of scripts and intermediate data files sitting around, and they’re hard to make idempotent and hard to understand if you aren’t the author.

    Also fwiw there’s no such thing as an M4 Ultra chip. That detail was either a mistake or hallucinated.

    • mkmccjr 18 hours ago

      Original author here -- thank you for your thoughtful comment.

      You're absolutely right that saying "SQL is useful" isn't exactly novel. My goal with the blog post was to describe the practical impact of leaning into SQL (and DuckDB) at our company.

      I'm not the SQL expert on our team (that's my colleague Kian) but I've seen the difference he's made with his expertise. A lot of the work we migrated into SQL was originally implemented as the kind of multi-step pipelines you described: we used multiple libraries, wrote intermediate files, and had to translate data between different formats.

      Kian recently rewrote a large stage of our pipeline so it runs entirely inside a single SQL script. It's a complicated script to be sure, but that's because the logic it implements is complex. And with CTEs, temp tables, and DuckDB's higher-order functions, it ended up being dramatically clearer than the original sprawl of code. More importantly, it's self-contained, and easy to inspect. Consolidating the logic into one place made a big difference for us.

      And thank you for catching my error about the CPU type. We recently moved from M2 Ultra servers to M4 machines, and I mistakenly conflated the two when I wrote "M4 Ultra." I've corrected the post.
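
For readers wondering what "one self-contained SQL script" with temp tables and CTEs might look like in practice, here is a minimal sketch. It is not the authors' pipeline: the table names, columns, and data are all hypothetical, and it uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra installs. DuckDB-specific features such as higher-order list functions are not shown.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
# The whole pipeline lives in one SQL script; each stage is a temp table
# or a named CTE rather than an intermediate file on disk.
SCRIPT = """
CREATE TEMP TABLE raw_events (user_id INTEGER, amount REAL);
INSERT INTO raw_events VALUES (1, 10.0), (1, 5.0), (2, 5.0), (3, NULL);

-- Stage 1: drop unusable rows into a temp table.
CREATE TEMP TABLE clean_events AS
SELECT user_id, amount
FROM raw_events
WHERE amount IS NOT NULL;

-- Stage 2: named CTE steps build the final summary.
CREATE TEMP TABLE user_summary AS
WITH totals AS (
    SELECT user_id, SUM(amount) AS total, COUNT(*) AS n_events
    FROM clean_events
    GROUP BY user_id
),
grand AS (
    SELECT SUM(total) AS grand_total FROM totals
)
SELECT t.user_id, t.total, t.n_events, t.total / g.grand_total AS share
FROM totals AS t
CROSS JOIN grand AS g;
"""


def run_pipeline():
    """Run the script against a fresh in-memory database and return the summary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCRIPT)
        return conn.execute(
            "SELECT user_id, total, n_events, share "
            "FROM user_summary ORDER BY user_id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Because every run starts from an empty in-memory database, rerunning the script gives the same result each time, and any intermediate stage (here `clean_events`) can be inspected with an ordinary SELECT instead of by opening a file.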