The perils of UUID primary keys in SQLite

(andersmurphy.com)

54 points | by emschwartz 6 hours ago

5 comments

blopker 5 hours ago
UUIDs are way over used. There is almost always a better key to use, usually a bigint for databases. If you're making some kind of leaderless distributed data store, then maybe, but even then there are other ID sharding strategies I'd go for first depending on the constraints.
For a single database, bigints are smaller and faster, with less footguns.
UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.
[-]
- Fabricio20 3 hours ago
  > bigints are smaller and faster, with less footguns
  But be careful!! Javascript WILL interpret your bigints as Number() and round them down because they are too big without telling you!!!
  Famously seen by every snowflake user that has interacted with Javascript, quite an annoying problem.
  [-]
  - spiffytech 2 hours ago
    Fortunately we're seeing more JS DB libraries offering to read large numbers as the BigInt type.
  - paulddraper 2 hours ago
    !!
    Node.js drivers will correctly read int64 as string or bigint, not number.
    E.g. pg for PostgreSQL
    Maybe there’s a buggy driver but I don’t know it.
- JamesSwift 3 hours ago
  UUIDs also have a nice benefit of it being impossible to query the wrong table with one if you mixup what an FK goes to
  [-]
  - pyuser583 2 hours ago
    Yeah this is nice - also helps with grepping dump files.
  - mamcx 1 hour ago
    How is this done?
    [-]
    - nickpeterson 58 minutes ago
      They just mean you catch incorrect joins more easily because there is usually no overlap in keys between unrelated tables. Using int, you’re usually going to have some shared values between two unrelated tables.
    - sudoshred 1 hour ago
      Statistically impossible to inadvertently generate a collision using UUID keys. UUID is designed to be unique when generated across any computer system. Practically speaking if you have an exactly matching pair of UUIDs from disparate system you have found the exact record match. The name gives a hint "Universally unique identifier". -Not a cryptographer.
    - masklinn 54 minutes ago
      The U means if you join the wrong table your join will always come up empty.
      It does not actually make it impossible to query the wrong table it just tells you quickly when you’ve done so.
- bob1029 4 hours ago
  I am finding UUIDs help a lot if your primary schema consumer is an LLM.
  Inappropriate aliasing of integer keys allows for silent errors in queries because it will actually return some result a lot of the time. A UUID is immune to this problem. The model recognizes its mistake a lot more reliably when previously non-empty tables start showing up empty after attempting a join.
- crubier 3 hours ago
  No one ever got fired for using UUIDs
pyuser583 2 hours ago
Oh gosh the ints v uuids debate for pks. This is worse than vim v eMacs or brackets v braces.
w10-1 4 hours ago
Isn't the solution just to use the rowid (after doing the read-id-after-insert dance)?
How much trouble does SQLite reysing rowid's actually cause?
yepyoukno 6 hours ago
Perils of “UUIDv4”. Everyone knows that’s what UUIDv7 was really for, and you should always convert that to binary to optimize everything.
[-]
- JSR_FDED 4 hours ago
  Small nit: uuid7 is 128 bits (16 bytes) by definition. So there’s no need to convert it to binary. It already is. Unless you’re working with a stringified version of the uuid7.
  [-]
  - yepyoukno 4 hours ago
    Oh yes, I meant don’t store as an ID in its string format!
    [-]
    - dexterdog 3 hours ago
      It's just s dumb as storing dates as strings, but people still do it.
- themafia 2 hours ago
  > and you should always convert that to binary to optimize everything
  I disagree. I tried this once. Now you need a client access layer to touch the DB in any context. All your console tools no longer work well or at all. If they show up in URLs you need to deoptimize them for transport.
  You give up a lot of convenience for this optimization. You should be absolutely sure your design requires it before using it.
dumbledorf 6 hours ago
Wait how is sqlite doing a million inserts a second?
[-]
- kg 5 hours ago
  sqlite is really fast. I'm surprised it's only a million.
  [-]
- JSR_FDED 4 hours ago
  In batches
- smitty1e 5 hours ago
  ':memory:'
  https://sqlite.org/inmemorydb.html
  [-]
  - KPGv2 58 minutes ago
    Except this source code is not using :memory: The linked source code has
```
    (defonce db
      (d/init-db! "db/db.db"
        {:pool-size 4 :pragma {:synchronous "FULL"}}))
```
    That's writing to disk.