Understanding Clojure's Persistent Vectors, pt. 1 (2013)

(hypirion.com)

77 points | by mirzap 4 days ago

6 comments

  • gleenn 4 hours ago
    IMHO, this is one of the coolest aspects of Clojure: the data structures were designed to be immutable as efficiently as possible. This allows for some of the most critical aspects of the language to function. If you're whole program passes around values that are immutable, you don't ever have to worry about which part of the code owns what. Rust is such a different language but the borrow-checker is solving a nearly identical problem: who owns this memory and who is allowed to change it? In Rust, one owner ever gets to write to a piece of memory and you can move that around. It catches lots of bugs and is extremely efficient, but it is definitely a difficult aspect of the language to master. Clojure just says everyone gets to read and share everything, because every mutation is a copy. One piece of your codebase can never stomp on or free some memory that another part is using. Big tradeoffs with garbage collection but geeze does it make reasoning about Clojure programs SO much easier. And the cornerstone of that idea is fast, immutable data structures by default. Maps are implemented similarly in that they are fast to make mutated copies with structural sharing to save memory.
    • rienbdj 2 hours ago
      There are also atomic references when mutability is the best approach. This is forcing single writer but in the API rather than as a proof.
  • bjoli 25 minutes ago
    He went on to implement https://github.com/hypirion/c-rrb Which are just like clojures vectors but has fast insertions/deletes and merges.

    I semi-ported it to c# here: https://github.com/bjoli/RrbList/tree/main/src/Collections

    It is faster than clojures vectors (running on the JVM, so apples and cucumbers) in all cases, and mostly beats scala's vectors except for splitting which is crazy fast in scala's vectors).

  • a-french-anon 1 hour ago
    If you want to go further, the HAMT designed by Bagwell was apparently improved upon by Steindorfer to give CHAMP. See this FSet MR and the paper link within for more: https://gitlab.common-lisp.net/fset/fset/-/merge_requests/1
  • kunley 29 minutes ago
    Would be cool to have recent findings on persistent data structures via the lenses of cache locality
    • bjoli 23 minutes ago
      I know people using ref counting to support using allocation arenas for immutable structures. For some workloads that gives a pretty crazy performance boost.

      Just pre-allocating leaf nodes can reduce iteration overhead by 40%.

  • goldfishgold 6 hours ago
    Scala has a similar data structure https://www.scala-lang.org/api/2.13.6/scala/collection/immut.... Something was in the air in 2013.
    • swannodette 5 hours ago
      Clarifying Clojure had them in 2007. Scala implementations were inspired by Clojure’s. Both Odersky and Bagwell have given credit to Hickey.
    • bjoli 32 minutes ago
      Those are not really the same. Those are N=32 finger trees which have extra benefits (quick slices, for example, quicker insertions).
    • pjmlp 2 hours ago
      Real World Haskell was published in 2008, followed by Real World OCaml in 2013.

      Scala got introduced in 2004, with the first Programming in Scala book, in 2008.

      HN had plenty of PureScript and Elm.

      FP finally was going to get their spotlight, and then mainstream languages got enough FP features to keep being the go to tooling.

    • stingraycharles 6 hours ago
      It’s not a secret that they came from the same zeitgeist but took wildly different approaches, but inspired each other.

      I still don’t understand why they’re referred to as persistent vectors rather than immutable vectors, but I digress.

      • lemming 5 hours ago
        I still don’t understand why they’re referred to as persistent vectors rather than immutable vectors, but I digress.

        I believe that immutable just means, well, immutable, but persistent means that updates are achieved via structural sharing, so they’re efficient.

        • whateveracct 4 hours ago
          structural sharing = log n updates

          if you think immutable updates are O(n) in 2026, you're so far behind the curve it's laughable

          it's crazy how many ppl i interview just stop thinking and insist you can't do better than O(n)

      • mrkeen 2 hours ago
        Because standard libraries in mainstream languages are name-squatting on 'immutable' pretty hard.

        You wanted 2+1 to yield 3, but instead you get a runtime exception telling you that 2 can't be changed.

    • whateveracct 4 hours ago
      okasaki came first

      haskell too had them (IntMap honestly works fine in that use case)

      • Sesse__ 51 minutes ago
        Yeah, “Purely Functional Data Structures” came out in 1996! I read most of it recently, when I needed a C++ copy-on-write hash map. It was still fairly relevant (although the “obvious” solution of a HAMT was also the one we decided on).
  • MORPHOICES 2 hours ago
    [dead]