Jepsen: MariaDB Galera Cluster 12.1.2

(jepsen.io)

60 points | by aphyr 4 hours ago

4 comments

  • taneliv 1 hour ago
    While Jepsen (and this article) is focused on behavior under node failure and network partitions, this caught my eye:

    > It also exhibits Stale Read, Lost Update, and other forms of G-single in healthy clusters

    This looks like quite a fundamental issue.

  • linsomniac 3 hours ago
    I really like Galera for low-volume clustering because of its true multi-master nature. I've been using it for over a decade on a clustered mail server to store account information, and more recently I've pumped the log information in there so each user can see their related log messages. The user base is around 6,000 users, and it's been a real workhorse.
  • constructrurl 3 hours ago
    The finding that Galera's consistency guarantees can degrade below Read Uncommitted under faults is pretty striking. The P4 lost update anomaly happening even in a healthy cluster with no faults injected is the part that should really worry people - it means the issue isn't just about crash recovery, it's about the fundamental replication protocol.

    I wonder how many production Galera deployments are running with innodb_flush_log_at_trx_commit=0 as recommended in the docs, silently exposed to the MDEV-38974 coordinated crash data loss. That's the kind of default that works fine until the one day your datacenter loses power and you discover your "instantly replicated" writes were never actually durable.

  • linsomniac 3 hours ago
    I realize that we like to use the page title here on HN, but this really should be something like "Data loss cases with MariaDB Galera Cluster 12.1.2".
    • hu3 3 hours ago
      One of the reasons is that this kind of title editorialization fosters generic commentary in reaction to titles.