Show HN: Streambed – Stream Postgres to Iceberg on S3, Supports Postgres Wire

(github.com)

43 points | by vira28 4 hours ago

2 comments

karakanb 12 minutes ago
Hi, this looks interesting, thanks for sharing. I am the builder of ingestr (https://github.com/bruin-data/ingestr), so I am very much in the same space.
I really like that you did this in Go, and I'll definitely dig a bit more into the source code to see how you tackled the CDC stuff, given that there is not many reliable CDC libraries in Go, and there are quite a few gotchas when it comes to doing CDC right. We also hand-rolled ours in ingestr, or I must say clanker-rolled, and we got quite a few things wrong in the first place.
Curious about the postgres-compatible query option: what's the usecase you have in mind there? My perception is that any org that would use Iceberg also has one or a few query engines in place, is this more for debugging stuff?
Quite cool stuff, keep it up!
cpard 1 hour ago
Replicating the Postgres WAL to S3 and Iceberg reliably is a hard problem but it’s not accurate to say that no ETL is needed here.
maybe you can say it’s more of an ELT pattern but anyone who’s interested into using this for realistic analytics they will have to transform the data at some point.
If an org is early enough to think that they can use a solution like this and just get in duckdb and start spitting out reports, they will be up for a really bad experience.
Please educate people to do the right thing and realize the scope of the work they are facing, it might feel that it hurts your growth in the short term but it will benefit you greatly in the mid-long term as a vendor.
[-]
- kikimora 1 hour ago
  IDK, AWS Zero ETL from Autora into Redshift really helped us at some point. You right that data transformation is very limited if not possible. But having data in an analytical store, being able to experiment with queries, understand what is wrong with your OLTP schema and then build ETL is way better than doing an upfront design.