DuckDB also runs in Excel, by the way, via the free xlwings Lite add-in that you can install from the add-in store. It uses the Python package and lets you write scripts and custom functions, as well as use a Jupyter-like notebook workflow.
If you start with Excel, I'll counter with Postgres: https://github.com/duckdb/pg_duckdb.
I haven't found the time to check this on one of our installations, though.
Hm, our internal benchmarking shows something like a 30x speedup compared to SQLite (https://github.com/ClickHouse/ClickBench shows an even greater speedup due to not considering cache size). As a back-of-the-envelope estimate, I'd put it at 8x for multithreading and 4x for SIMD. Should we expect even more?
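To spell out that estimate: a rough sketch, assuming the two factors are independent and simply multiply (an assumption for illustration, not a measured breakdown). The variable names are made up.

```python
# Back-of-the-envelope check of the numbers quoted above.
# Assumption: the multithreading and SIMD gains are independent and multiply.
multithreading_speedup = 8   # estimated gain from using multiple cores
simd_speedup = 4             # estimated gain from SIMD
observed_speedup = 30        # "something like 30x" from the internal benchmark

estimated_speedup = multithreading_speedup * simd_speedup
print(f"estimated: {estimated_speedup}x, observed: ~{observed_speedup}x")
```

So the two guesses together land close to the observed figure, which is why the question is whether anything beyond these two effects should show up.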
"Performance
Does DuckDB use SIMD?
DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture."
I benchmarked DuckDB 1.5.2 with the latest Java JDBC driver, which now supports user-defined functions. This allows very fast modifications: https://sqg.dev/blog/java-duckdb-benchmark/
I got introduced to it by Claude the other day as I was interrogating several GB of public CSV files. Seemed magical as it put them all in Parquet files and transformed what I needed into the normalized SQLite for my server. Coding agents seem quite comfortable with it!
Whoa, nice! I could see this being useful to people I work with. Do you think it would be a good setup for people who are technical but not great software developers? People who use basic R and Python for ETL and analysis, mostly.
I'm using DuckDB in another project (on my laptop) where `NetworkX` fails due to the memory limit of 32 GB. So yes, as soon as you are doing out-of-core work I'd assume the combination to be quite powerful. Knowledge of SQL would be a plus, though.
It is an educational/R&D type project. We are more backend developers, and `rill` worked fine for us as a rapid visualization frontend with a low learning curve.
Edit: still realizing that I can't use markdown on HN...
I use it almost daily. Any time I benchmark changes or analyze logs, I collect the data I need as CSV and analyze it with duckdb. The flexibility and ease mean I find so much more interesting information. It's indispensable to me now.
"Performance Does DuckDB use SIMD? DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture."
https://duckdb.org/faq
Interested here: for me it works for out of core work. Where is the limit? On a related note: do you need to handle concurrency restrictions?