21 comments

  • rs545837 11 hours ago
    Some context on the validation so far: Elijah Newren, who wrote git's merge-ort (the default merge strategy), reviewed weave and said language-aware content merging is the right approach, that he's been asked about it enough times to be certain there's demand, and that our fallback-to-line-level strategy for unsupported languages is "a very reasonable way to tackle the problem." Taylor Blau from the Git team said he's "really impressed" and connected us with Elijah. The creator of libgit2 starred the repo. Martin von Zweigbergk (creator of jj) has also been excited about the direction. We are also working with GitButler team to integrate it as a research feature.

    The part that's been keeping me up at night: this becomes critical infrastructure for multi-agent coding. When multiple agents write code in parallel (Cursor, Claude Code, Codex all ship this now), they create worktrees for isolation. But when those branches merge back, git's line-level merge breaks on cases where two agents added different functions to the same file. weave resolves these cleanly because it knows they're separate entities. 31/31 vs git's 15/31 on our benchmark.

    Weave also ships as an MCP server with 14 tools, so agents can claim entities before editing, check who's touching what, and detect conflicts before they happen.

    • deckar01 11 hours ago
      Does this actually matter for multi-agent use cases? Surely people that are using swarms of AI agents to write code are just letting them resolve merge conflicts.
      • vidarh 2 hours ago
        I'm running agents doing merges right now, and yes and no. They can resolve merges, but it often takes multiple extra rounds. If you can avoid that more often it will definitely save both time and money.
      • rs545837 10 hours ago
        So you don't feel I'm biased about my own thing, here's more context that it's not just me: it's actually people saying on twitter how often merging breaks when you're running production-level code and frequently merging different branches.

        https://x.com/agent_wrapper/status/2026937132649247118 https://x.com/omega_memory/status/2028844143867228241 https://x.com/vincentmvdm/status/2027027874134343717

        • deckar01 10 hours ago
          Those users all work for companies that sell AI tools. And the first one literally says they let AI fix merge conflicts. The second one is in a thread advocating for 0 code review (which this can’t guarantee) (and also ew). The third is also saying to just have another bot handle merging.
          • rs545837 10 hours ago
            Thanks a lot for the fair criticism, appreciate it! You're right that those links aren't the strongest evidence. The real argument isn't "people are complaining on twitter." It's simpler than that: when two agents add different functions to the same file, git creates a conflict that doesn't need to exist. Weave just knows they're separate entities and merges cleanly. Whether you let AI resolve the false conflict or avoid it entirely is a design choice; we think avoiding it is better.
            • deckar01 10 hours ago
              Dear god, it’s bots all the way down.
              • rs545837 10 hours ago
                What do you mean?
                • deckar01 9 hours ago
                  It’s your GitHub profile. It looks suspiciously just like the other 10 GitHub users that have been spamming AI generated issues and PRs for the last 2 weeks. They always go quiet eventually. I suspect because they are violating GitHub’s ToS, but maybe they just run out of free tokens.
                  • rs545837 9 hours ago
                    Thanks again for the criticism. Tackling each of your points:

                    On GitHub's ToS: since you suspect a violation, let me walk through what the terms actually prohibit.

                    > What violates it:

                            1. Automated bulk issues/PRs on repos we don't own
                            2. Fake stars or engagement farming
                            3. Using bot accounts
                    
                    We own the repo, there's not even a single fake star, I don't even know how to create a bot account lol.

                    > The scenario where we run out of free tokens:

                    OpenAI and Anthropic have been sponsoring my company with credits, because I am trying to architect new software for a post-AGI world, so if I run out I will ask them for more tokens.

                    • deckar01 9 hours ago
                      And you are opening issues on projects trying to get them to adopt your product. Seems like spam to me. How much are you willing to spend maintaining this project if those free tokens go away?
                      • rs545837 9 hours ago
                        When you're just a normal guy genuinely trying to build something great and there's nobody who believes in you yet, the only thing you can do is go to projects you admire and ask "would this help you?" Patrick Collison did the same thing early on, literally taking people's laptops to install Stripe.
                  • Palanikannan 9 hours ago
                    https://github.com/Ataraxy-Labs/weave/pull/11

                    Dude, did you just call me AI-generated haha, I've been actively using weave for a GUI I've been building for blazingly fast diffs

                    https://x.com/Palanikannan_M/status/2022190215021126004

                    So whenever I run into bugs I've patched locally in my clone, I try to let the clanker raise a PR upstream; insane how easy things are now.

                    • deckar01 9 hours ago
                      I think you accidentally switched accounts.
                      • rs545837 8 hours ago
                        Nope, that's another user; he has been working with me on weave. Check the PRs you're calling AI generated.
    • kubb 7 hours ago
      Congrats on getting acknowledged by people with credibility.

      I also think that this approach has a lot of potential. Keep up the good work sir.

      • rs545837 6 hours ago
        Thanks a lot! Appreciate it.
  • gritzko 11 hours ago
    At this point, the question is: why keep files as blobs in the first place? If a revision control system stores ASTs instead, all the work is AST-level. One can then run SQL-level queries to see what is changing where. Like

      - do any concurrent branches touch this function?
      - what new uses did this function accrete recently?
      - did we create any actual merge conflicts?
    
    Almost LSP-level querying, involving versions and branches. Beagle is a revision control system like that [1].

    It is quite early stage, but the surprising finding is: instead of being a depository of source code blobs, an SCM can be the hub of all activities. Beagle's architecture is extremely open in the assumption that a lot of things can be built on top of it. Essentially, it is a key-value db; keys are URIs and values are BASON (binary mergeable JSON) [2]. Can't be more open than that.

    [1]: https://github.com/gritzko/librdx/tree/master/be

    [2]: https://github.com/gritzko/librdx/blob/master/be/STORE.md

    • rs545837 11 hours ago
      This is the right question. Storing ASTs directly would make all of this native instead of layered on top.

      The pragmatic reason weave works at the git layer: adoption. Getting people to switch merge drivers is hard enough, getting them to switch VCS is nearly impossible. So weave parses the three file versions on the fly during merge, extracts entities, resolves per-entity, and writes back a normal file that git stores as a blob. You get entity-level merging without anyone changing their workflow.
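      For readers unfamiliar with git's merge driver interface mentioned here: a custom content merger plugs in through .gitattributes plus a merge.<name>.driver config entry. A minimal sketch follows; the exact `weave merge` command line is illustrative, not necessarily weave's documented CLI:

```shell
# Route Python files through a custom merge driver named "weave".
echo '*.py merge=weave' >> .gitattributes

# Register the driver. At merge time git replaces %O, %A, and %B with
# temp-file paths for the base, ours (also the output file), and theirs
# versions -- exactly the three file versions weave parses.
git config merge.weave.name "weave entity-level merge"
git config merge.weave.driver "weave merge %O %A %B"
```

The driver's exit status tells git whether the merge was clean, so everything downstream (index, history, commands) stays stock git.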

      But you're pointing at the ceiling of that approach. A VCS that stores ASTs natively could answer "did any concurrent branches touch this function?" as a query, not as a computation. That's a fundamentally different capability. Beagle looks interesting, will dig into the BASON format.

      We built something adjacent with sem (https://github.com/ataraxy-labs/sem) which extracts the entity dependency graph from git history. It can answer "what new uses did this function accrete" and "what's the blast radius of this change" but it's still a layer on top of git, not native storage.

    • zokier 5 hours ago
      > At this point, the question is: why keep files as blobs in the first place. If a revision control system stores AST trees instead, all the work is AST-level.

      The problem is that disks (and storage in general) store only bytes so you inherently need to deal with bytes at some point. You could view source code files as the serialization of the AST (or other parse tree).

      This is especially apparent with LISPs and their sexprs, but equally applies to other languages too.

    • pfdietz 9 hours ago
      Well, if you're programming in C or C++, there may not be a parse tree. Tree-sitter makes a best effort attempt to parse but it can't in general due to the preprocessor.
      • rs545837 9 hours ago
        Great point. C/C++ with macros and preprocessor directives is where tree-sitter's error recovery gets stretched. We support both C and C++ in sem-core (https://github.com/Ataraxy-Labs/sem), but the entity extraction is best-effort for heavily macro'd code. For most application-level C++ it works well, but something like the Linux kernel would be rough. Honestly that's an argument for gritzko's AST-native storage approach, where the parser can be more tightly integrated.
        • pfdietz 4 hours ago
          It's an argument against preprocessors for programming languages.

          Tree-sitter's error handling is constrained by its intended use in editors, so incrementality and efficiency are important. For diffing/merging, a more elaborate parsing algorithm might be better, for example one that uses an Earley/CYK-like algorithm but attempts to minimize some error term (which a dynamic programming algorithm could be naturally extended to.)

    • orbisvicis 5 hours ago
      That's a really good point. I'm not familiar with Unison, but I think that's the idea behind the language?

      https://www.unison-lang.org/

      • rs545837 5 hours ago
        This is actually cool, gonna check it out.
    • samuelstros 8 hours ago
      How do you get blob file writes fast?

      I built lix [0] which stores AST’s instead of blobs.

      Direct AST writing works for apps that are “ast aware”. And I can confirm, it works great.

      But all software just writes bytes atm.

      The binary -> parse -> diff pipeline is too slow.

      The parse and diff step need to get out of the hot path. That semi defeats the idea of a VCS that stores ASTs though.

      [0] https://github.com/opral/lix

      • gritzko 7 hours ago
        I only diff the changed files. Producing a blob out of a BASON AST is trivial (one scan). Things may get slow for larger files, e.g. the tree-sitter C++ parser is a 25MB C file, 750KLoC. It takes a couple of seconds to import. But it never changes, so no biggie.

        There is room for improvement, but that is not a show-stopper so far. I plan to round-trip the Linux kernel with full history; that must show all the bottlenecks.

        P.S. I checked lix. It uses a SQL database. That solves some things, but also creates an impedance mismatch. Must be a 10x slowdown at least. I use key-value and a custom binary format, so it works nicely. Can go one level deeper still and use a custom storage engine; it will be even faster. Git is all custom.

      • rs545837 8 hours ago
        This is exactly why weave stays on top of git instead of replacing storage. Parsing three file versions at merge time is fine (about 5-67ms in our tests). Parsing on every read/write would be a different story. I know about lix, but will check it out again.
    • jerf 36 minutes ago
      Everything on a disk ends up as a linear sequence of bytes. This is the source of the term "serialization", which I think is easy to hear as a magic word without realizing that it is actually telling you something important in its etymology: It is the process of taking an arbitrary data structure and turning it into something that can be sent or stored serially, that is, in an order, one bit at a time if you really get down to it. To turn something into a file, to send something over a socket, to read something off a sheet of paper to someone else, it has to be serialized.

      The process of taking such a linear stream and reconstructing the arbitrary data structure used to generate it (or, in more sophisticated cases, something related to it if not identical), is deserialization. You can't send anyone a cyclic graph directly but you can send them something they can deserialize into a cyclic graph if you arrange the serialization/deserialization protocol correctly. They may deserialize it into a raw string in some programming language so they can run regexes over it. They may deserialize it into a stream of tokens. This all happens from the same source of serialized data.

      So let's say we have an AST in memory. As complicated as your language likes, however recursive, however cross-"module", however bizarre it may be. But you want to store it on a disk or send it somewhere else. In that case it must be serialized and then deserialized.

      What determines what the final user ends up with is not the serialization protocol. What determines what the final user ends up with is the deserialization procedure they use. They may, for instance, drop everything except some declaration of what a "package" is if they're just doing some initial scan. They may deserialize it into a compiler's AST. They may deserialize it into tree sitter's AST. They may deserialize it into some other proprietary AST used by a proprietary static code analyzer with objects designed to not just represent the code but also be immediately useful in complicated flow analyses that no other user of the data is interested in using.

      The point of this seemingly rambling description of what serialization is is that

      "why keep files as blobs in the first place. If a revision control system stores AST trees instead"

      doesn't correspond to anything actionable or real. Structured text files are already your programming language's code stored as ASTs. The corresponding deserialization format involves "parsing" them, which is a perfectly sensible and very, very common deserialization method. For example, the HTML you are reading was deserialized into the browser's data structures, which are substantially richer than "just" an AST of HTML due to all the stuff a browser does with the HTML, with a very complicated parsing algorithm defined by the HTML standard. The textual representation may be slightly suboptimal for some purposes but they're pretty good at others (e.g., lots of regexes run against code over the years). If you want some other data structure in the consumer, the change has to happen in the code that consumes the serialized stream. There is no way to change the code as it is stored on disk to make it "more" or "less" AST-ish than it already is, and always has been.

      You can see that in the article under discussion. You don't have to change the source code, which is to say, the serialized representation of code on the disk, to get this new feature. You just have to change the deserializer, in this case, to use tree sitter to parse instead of deserializing into "an array of lines which are themselves just strings except maybe we ignore whitespace for some purposes".

      Once you see the source code as already being an AST, it is easy to see that there are multiple ways you could store it that could conceivably be optimized for other uses... but nothing you do to the serialization format is going to change what is possible at all, only adjust the speed at which it can be done. There is no "more AST-ish" representation that will make this tree sitter code any easier to write. What is on the disk is already maximally "AST-ish" as it is today. There isn't any "AST-ish"-ness being left on the table. The problem was always the consumers, not the representation.

      And as far as I can tell, it isn't generally the raw deserialization speed nowadays that is the problem with source code. Optimizing the format for any other purpose would break the simple ability to read it as source code, which is valuable in its own right. But then, nothing stops you from representing source code in some other way right now if you want... but that doesn't open up possibilities that were previously impossible, it just tweaks how quickly some things will run.
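      The point is easy to make concrete: the bytes on disk never change; only the deserializer does. A minimal Python illustration of the same serialized source read as "an array of lines" (git's view) and as an AST (the tree-sitter/weave-style view):

```python
import ast

# The same serialized bytes ("the file"), deserialized two ways.
src = "def add(a, b):\n    return a + b\n"

# 1. As an array of lines -- what a line-level merge operates on.
lines = src.splitlines()

# 2. As an AST -- what an entity-level merge operates on.
tree = ast.parse(src)
func_names = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

print(lines)       # ['def add(a, b):', '    return a + b']
print(func_names)  # ['add']
```

Nothing about the storage format changed between the two views; the consumer did.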

    • philipallstar 3 hours ago
      You might need a bit more than ASTs, as you need code to be human-readable as well as machine-readable. Maybe CSTs?
    • handfuloflight 9 hours ago
      Well, I'll be diving in. Thank you for sharing. Same for Weave.
      • rs545837 9 hours ago
        Awesome, let me know how it goes. Happy to help if you hit any rough edges.
  • _flux 8 hours ago
    How does it compare to https://mergiraf.org/ ? I've had good experience with it so far, although I rarely even need it.

    It's also based on treesitter, but probably otherwise a more baseline algorithm. I wonder if that "entity-awareness" actually then brings something to the table in addition to the AST.

    edit: man, I tried searching this thread for mention of the tool for a few times, but apparently its name is not mergigraf

  • rohitpaulk 3 hours ago
    Saw this on Twitter a few weeks ago, interesting approach.

    The post links to the GitHub repo, but imo the website does a better job of explaining what it does: https://ataraxy-labs.github.io/weave/

  • keysersoze33 11 hours ago
    Interesting that Weave tries to solve Mergiraf's shortcomings (also Tree-sitter based):

    > git merges lines. mergiraf merges tree nodes. weave merges entities. [1]

    I've been using mergiraf for ~6 months and tried to use it to resolve a conflict from multiple Claude instances editing a large bash script. Sadly neither supports bash out of the box, which makes me suspect that classic merge is better in this/some cases...

    Will try adding the bash grammar to mergiraf or weave next time

    [1] https://ataraxy-labs.github.io/weave/

    • rs545837 11 hours ago
      Hey, author here. This comparison came up a lot when weave went viral on X (https://x.com/rs545837/status/2021020365376671820).

      The key difference: mergiraf matches individual AST nodes (GumTree + PCS triples). Weave matches entities (functions, classes, methods) as whole units. Simpler, faster, and conflicts are readable ("conflict in validate_token" instead of a tree of node triples).
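      For readers curious what "entities as whole units" means mechanically, here is a toy sketch (not weave's actual implementation) of per-entity three-way merging over top-level Python functions, using only the stdlib ast module:

```python
import ast

def entities(src: str) -> dict:
    """Map top-level function names to their exact source text."""
    tree = ast.parse(src)
    return {node.name: ast.get_source_segment(src, node)
            for node in tree.body if isinstance(node, ast.FunctionDef)}

def entity_merge(base: str, ours: str, theirs: str):
    """Classic 3-way merge rules, applied per function instead of per line."""
    b, o, t = entities(base), entities(ours), entities(theirs)
    merged, conflicts = {}, []
    for name in sorted(b.keys() | o.keys() | t.keys()):
        bv, ov, tv = b.get(name), o.get(name), t.get(name)
        if ov == tv:
            result = ov              # both sides agree (or both deleted it)
        elif tv == bv:
            result = ov              # only our side changed this entity
        elif ov == bv:
            result = tv              # only their side changed it
        else:
            conflicts.append(name)   # both changed it differently
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts
```

Two branches that each add a different function to the same file both land in `merged` with no conflict, which is exactly the case a line-level merge trips over when the additions are textually adjacent.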

      The other big gap: weave ships as an MCP server with 14 tools for agent coordination. Agents can claim entities before editing and detect conflicts before they merge. That's the piece mergiraf doesn't have.

      On bash: weave falls back to line-level for unsupported languages, so it'll work as well as git does there.

      Adding a bash tree-sitter grammar would unlock entity-level merging for it. I can work on it tonight if you want it urgently.

      Cheers,

  • ezoe 6 hours ago
    So Weave claims AI-based development increases git conflict frequency.

    Given that most git conflicts are easy to solve even by a person who wasn't involved in the changes, or who doesn't know the programming language, it's natural to let AI handle them.

    Solving a git conflict is most often simple text manipulation without needing much context. I see no reason current AI models can't do it.

    • rs545837 6 hours ago
      What interests me is what happens when you start seeing the diffs as entities instead of lines: you get much better semantic info.

      If you have a language-specific parser, you can build a merge algorithm like weave. But the bigger win isn't resolving the conflicts git shows you; it's catching the ones git misses entirely. In those cases weave is much better. There are also other things like confidence-scored conflict classification. You should try it out; it improves agent performance, especially if you are a power user.

  • 50lo 9 hours ago
    If both sides refactor the same function into multiple smaller ones (extract method) or rename it, can Weave detect that as a structural refactor, or does it become “delete + add”? Any heuristics beyond name matching?
    • rs545837 9 hours ago
      Yes, weave detects renames via structural_hash (AST-normalized hash that ignores identifier names). If both sides rename the same function, it matches by structure and merges cleanly.
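      To illustrate the idea (a sketch of the concept, not weave's implementation): hash the AST after anonymizing identifiers, so two functions that differ only in names collide, while any structural change produces a different hash, which is also consistent with the caveat below that refactors break it.

```python
import ast
import hashlib

class _Anon(ast.NodeTransformer):
    """Replace every identifier with a placeholder so the hash
    depends only on structure, not on names."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)
    def visit_arg(self, node):
        node.arg = "_"
        return node
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_"
        return node

def structural_hash(src: str) -> str:
    tree = _Anon().visit(ast.parse(src))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()
```

Renaming `f`/`x` to `g`/`y` leaves the hash unchanged; swapping `+` for `-` changes it.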
      • gritzko 9 hours ago
        This will not work for refactors. In fact, any other change will break the hash. I know because I used this approach for quite some time.
        • rs545837 9 hours ago
          Thanks a lot, I will test it out as you said. In the meantime, could you also open an issue on the repo, so it helps me and others track it?
          • gritzko 9 hours ago
            I will ask Claude to open it, thanks!
            • rs545837 9 hours ago
              Thanks, lemme know how it goes, I will review and we can discuss over the issue.
  • orbisvicis 5 hours ago
    It's still possible for two commits to conflict only semantically, one obsoleting the other. Merging both would lead to dead code, so perhaps stricter (line-based or AST-based) conflicts would be preferable.
    • rs545837 5 hours ago
      You're right, that's a real risk. weave runs post-merge validation for exactly this: it checks entity dependencies after the merge, so if one side obsoletes what the other side depends on, it warns you even when the textual merge is clean.
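      A toy version of that kind of post-merge check (illustrative only, not weave's actual validator): flag calls to functions that no surviving entity in the merged file defines.

```python
import ast
import builtins

def undefined_calls(src: str) -> set:
    """Names called somewhere in the file but defined nowhere in it."""
    tree = ast.parse(src)
    defined = {n.name for n in ast.walk(tree)
               if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}
    called = {n.func.id for n in ast.walk(tree)
              if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    # Subtract builtins so calls like print() or len() are not flagged.
    return called - defined - set(dir(builtins))

merged_src = """
def process(data):
    return normalize(data)  # 'normalize' was removed by the other branch
"""
print(undefined_calls(merged_src))  # {'normalize'}
```

The textual merge here is perfectly clean; only a dependency-level check catches the dead reference.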
  • sea-gold 13 hours ago
    Website: https://ataraxy-labs.github.io/weave/

    I haven't tried it but this sounds like it would be really valuable to me.

    • rs545837 11 hours ago
      Haha, thanks for the feedback! Yeah, multi-agent workflows were especially kept in mind when designing this, so I hope it helps. I am always here for feedback and feature requests.
  • spacecrafter3d 10 hours ago
    Awesome, I've been wanting this for a long time! Any chance of Swift being supported?
    • rs545837 9 hours ago
      Swift isn't supported yet but adding a new language is straightforward since we use tree-sitter. There's already a tree-sitter-swift grammar. Would happily accept a PR for it, if you are down for it.
  • taejavu 9 hours ago
    I tried this with the kind of merge conflict I'd expect it to solve automatically, and it didn't. Is it supposed to work while rebasing, or is it strictly for merges?
    • rs545837 9 hours ago
      Thanks for trying it! Would love to know what the merge conflict looked like, if you can share the repo or a minimal repro, I'll dig into why it didn't resolve. That kind of feedback is exactly what helps us improve.
  • conartist6 5 hours ago
    Wow, I read your company description and I can confirm that we are competitors in the fiercest, most direct sense. We both want to create the dev platform of the future, but our philosophies could not be any more polar opposite.

    I hate AI. I fucking hate it. It doesn't help that the people who are infatuated with it keep shitting on me endlessly. 100,000 people scream that I'm obsolete, I'm replaceable, that I'm effectively worthless without an AI by my side, without that big ol condom stretched over my brain.

    So out of spite, perhaps, I give my blood sweat and tears to my work and what I build I build for others who care so much it hurts and who live their struggles in blood and in sweat and in tears. My wrists are in so much pain that I can barely sleep at night but I don't stop coding.

    You make clear that you will fight to give AI as much advantage as it can possibly have over people. "We're done building for human hands," says your website.

    I will fight with all I have to give human hands as much advantage as they can get over AI.

    • Palanikannan 5 hours ago
      and I love hacker news, this is what I pay my internet bills for
    • rs545837 5 hours ago
      you made me emotional!
  • WalterGR 10 hours ago
    It’s described as a “merge driver for Git”. Is it usable independently of git? Can I use it to diff arbitrary files?
    • rs545837 10 hours ago
      We got asked this on the X thread too, when we went viral here https://x.com/rs545837/status/2021020365376671820. Your git doesn't change at all. Weave plugs in through git's merge driver interface (.gitattributes), so git still handles everything, it just calls weave for the content merge step instead of its default line-level algorithm. All your git commands, workflow, and history stay exactly the same.

      For diffing arbitrary files outside git, we built sem (https://github.com/ataraxy-labs/sem) which does entity-level diffs. sem diff file1.py file2.py shows you which functions changed, were added, or deleted rather than line-level changes

  • kelseydh 12 hours ago
    Very cool, would love to see Ruby support added.
    • rs545837 11 hours ago
      Thanks for the request, our team is already working on it, and in fact we were going to ship Ruby tonight!

      Cheers,

  • SurvivorForge 10 hours ago
    The entity-level approach is a meaningful step up from line-based merging. Anyone who has dealt with a merge conflict where git splits a function signature across conflict markers knows how much context is lost at the line level. Curious how this handles languages with significant whitespace like Python, where indentation changes can shift the semantic meaning of entire blocks.
    • hrmtst93837 8 hours ago
      I think for Python entity-level merging has to treat indentation as structural rather than cosmetic, because a shifted indent can change which block a statement belongs to. In my experience the pragmatic approach is to parse into an AST with a tolerant parser like parso or tree-sitter, perform a 3-way AST merge that matches functions and classes by name and signature, then reserialize while preserving comment and whitespace spans. The practical tradeoff is that conflicted code is often syntactically invalid, so you need error tolerant recovery or a token-level fallback that normalizes INDENT and DEDENT tokens and runs an LCS style merge on tokens when AST matching fails. I've found combining node-matching heuristics with a lightweight reindent pass cuts down the number of manual fixes, but you still get a few gnarly cases when someone renamed a symbol and moved its body in the same commit.
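      The "indentation is structural" point is easy to demonstrate with the stdlib ast module: shifting one statement's indent changes which node owns it, with no other textual difference.

```python
import ast

inside  = "class C:\n    def m(self):\n        pass\n    x = 1\n"
outside = "class C:\n    def m(self):\n        pass\nx = 1\n"

def owner_of_assign(src: str) -> str:
    """Type name of the node whose body directly contains the assignment."""
    for node in ast.walk(ast.parse(src)):
        body = getattr(node, "body", None)
        if isinstance(body, list) and any(isinstance(c, ast.Assign) for c in body):
            return type(node).__name__
    return "?"

print(owner_of_assign(inside))   # ClassDef: x is a class attribute
print(owner_of_assign(outside))  # Module: x is a module-level variable
```

A line-level merge that reindents `x = 1` produces a textually tiny, semantically large change, which is why an entity-aware merger has to treat the indent as part of the structure.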
      • rs545837 7 hours ago
        Really appreciate the detail here, this is clearly hard-won experience. I agree that indentation is structural in Python.
    • rs545837 10 hours ago
      Thanks for commenting; good question. Python was one of the trickier ones to get right. Tree-sitter parses the full AST including indentation structure, so weave knows that an indented block belongs to its parent class or function. When merging, it matches entities by name and reconstructs with the original indentation preserved.

      We also handle Python class merge specifically: if both sides add different methods to the same class, weave merges them as separate inner entities rather than treating the whole class as one conflicting block. The indentation is derived from the AST structure, not from line diffing, so it can't accidentally shift a method out of its class scope.

  • Palanikannan 10 hours ago
    Dude, I tried this for a huge merge conflict and it was able to auto-resolve so much. I also came across sem for giving my agents context on what changed when reviewing some code, and surprisingly, I feel git is done for good. Far fewer tokens, and much more accurate. I can see something big coming out of this; things like weave come once in a century
    • Erlangen 8 hours ago
      You have write access to this git repo as a first time contributor. You are likely the owner of this repo as well.

      You made a pull request not from your own fork, but from a separate branch, https://github.com/Ataraxy-Labs/weave/pull/9

      • duskdozer 8 hours ago
        He was also pushing code to a branch on the same repo three weeks ago: https://github.com/Ataraxy-Labs/weave/commits/fix/utf8-panic... but is acting in this post like he just discovered the project.
        • rs545837 8 hours ago
          Yup, he works at Plane, and he used weave on merge conflicts there, which helped him; he just talked about it. He has been contributing to and using weave since then.
      • rs545837 8 hours ago
        Yeah he is the other contributor to the repo.
    • rs545837 9 hours ago
      Haha appreciate the love man! Still early days but the fact that entity-level context cuts tokens that much validates the whole thesis. Glad it's working for you, keep the feedback coming.
    • CollinEMac 8 hours ago
      >I feel git is done for good

      I'm either not understanding your comment or not understanding the project. Isn't this built on top of git?

      • rs545837 8 hours ago
        I think he was just being sarcastic; git's so amazing it can never be done for good. Yeah, it's just a merge driver that sits on top of git.
  • rkagerer 6 hours ago
    No C#?
    • rs545837 6 hours ago
      C# is supported! It goes through sem-core's (the underlying parsing library we use in weave) tree-sitter-c-sharp plugin. Classes, methods, interfaces, enums, and structs are all extracted with it. Let me know if you hit anything.
  • alkonaut 7 hours ago
    > This happens constantly when multiple AI agents work on the same codebase

    What?

    Is the idea of "multiple agents" of flesh and blood writing code that far-fetched now?

    • rs545837 6 hours ago
      I meant when they each work on a separate branch and merge back, you get similar kinds of conflicts, and a bunch of them shouldn't even be conflicts, so weave is trying to solve that.
      • alkonaut 3 hours ago
        I still don't understand how it makes a difference if the agent is a human or a bot?
        • vidarh 53 minutes ago
          There's no fundamental difference, but in practice the difference is one of frequency.

          E.g. I sometimes have 10+ agents kicking off changes to the same small project at the same time, all likely to want to merge within the next 15-30 minutes. At that rate, merge conflicts happen very frequently. The agents can mostly resolve them themselves, but it wastes tokens, and time.

  • esafak 12 hours ago
    Are agents any good with it?
    • rs545837 11 hours ago
      Yes, I designed it especially for agents; there's also a weave MCP server I built that you can check out.

      The good part is that this research extends really well to code review, because tracking entities is semantically richer than tracking lines.