This is one of the (several?) things that make me very worried about Rust long-term. I love the language, and reach for it even when it sometimes isn't the most appropriate thing. But reading some of the made-up syntax in the "Removing Coherence" section makes my head hurt.
When I used to write Scala, I accepted the fact that I don't have a background in type/set/etc. theory, and that there were some facets of the language that I'd probably never understand, and some code that others had written that I'd probably never understand.
With a language like Rust, I feel like we're getting there. Certain GAT syntxes sometimes take some time for me to wrap my head around when I encounter them. Rust feels like it shouldn't be a language where you need to have some serious credentials to be able to understand all its features and syntax.
On the other end we have Go, which was explicitly designed to be easy to learn (and, unrelatedly, I don't like for quite a few reasons). But I was hoping that we could have a middle ground here, and that Rust could be a fully-graspable systems-level language.
Then again, for more comparison, I haven't used C++ since before they added lambdas. I wonder if C++ has some hairy concepts and syntax today on par with Rust's more difficult parts.
Then you just implement Serialize and Deserialize for TypeWithDifferentSerialization.
This cover most occasional cases where you need to work around the orphan rule. And semantically, it's pretty reasonable: If a type behaves differently, then it really isn't the same type.
The alternative is to have a situation where you have library A define a data type, library B define an interface, and library C implement the interface from B for the type from A. Very few languages actually allow this, because you run into the problem where library D tries to do the same thing library C did, but does it differently. There are workarounds, but they add complexity and confusion, which may not be worth it.
The gotcha is what happens when TypeWithSomeSerialization is not something you’re using directly but is contained within SomeOtherTypeWithSomeSerialization which you are using directly. Then things get messy.
I don't think explicit naming of impls is wise. They will regularly be TraitImpl or similar and add no real value. If you want to distinguish traits, perhaps force them to be within separate modules and use mod_a::mod_b::<Trait for Type> syntax.
> An interesting outcome of removing coherence and having trait bound parameters is that there becomes a meaningful difference between having a trait bound on an impl or on a struct:
I don't think fully-qualified paths are enough on their own. You also need some way to designate that an impl is symbolically unique and has to be referred to by path. Otherwise, you still end up with a problem where the compiler doesn't know which implementation to use unless you precisely name it.
You depend on crates A and B. A impls Foo for Bar. You pass an instance of Bar to a function that accepts `impl Foo`. You are happy. Later crate B adds an impl of Foo for Bar. Clearly _at least_ one of these must be an orphan impl, but both could be. Suddenly it's ambiguous which implementation of Foo you're talking about, so you break because B added an impl.
There are many potential problems of this flavor with letting any `impl Trait for Type` be an orphan impl and then referenced by path. What happens, for example, if an impl that was an orphan impl in one version of A becomes a coherent impl in a later version of A?
I think there has to be special syntax for named/path-referenced/symbolic impls, even if the impl does not have an identifier name, so that the compiler can know "this impl only resolves if you tell me _specifically this impl_" and the impl provider has a way to create a solid consumer contract about how to use that impl in particular.
Also, not having an identifier name would mean you can't have different impls of Foo for Bar in the same module. That's probably not a limitation anyone would care about, but it's there.
Note the use case - someone wants to have the ability to replace a base-level crate such as serde.
When something near the bottom needs work, should there be a process for fixing it, which is a people problem? Or should there be a mechanism for bypassing it, which is a technical solution to a people problem? This is one of the curses of open source.
The first approach means that there will be confrontations which must be resolved. The second means a proliferation of very similar packages.
This is part of the life cycle of an open source language. Early on, you don't have enough packages to get anything done, and are grateful that someone took the time to code something.
Then it becomes clear that the early packages lacked something, and additional packages appear. Over time, you're drowning in cruft. In a previous posting, I mentioned ten years of getting a single standard ISO 8601 date parser adopted, instead of six packages with different bugs.
Someone else went through the same exercise with Javascript.
Go tends to take the first approach, while Python takes the second. One of Go's strengths is that most of the core packages are maintained and used internally by Google. So you know they've been well-exercised.
Between Github and AI, it's all too easy to create minor variants of packages. Plus we now have package supply chain attacks. Curation has thus become more important. At this point in history, it's probably good to push towards the first approach.
This is interesting but I wonder if you would accept that this also has the downside of moving at the speed of humans.
In a situation where you're building, I find the orphan rule frustrating because you can be stuck in a situation where you are unable to help yourself without forking half of the crates in the ecosystem.
Looking for improvements upstream, even with the absolute best solutions for option 1, has the fundamental downside that you can't unstick yourself.
This is also where I find it surprising that this article doesn't mention Scala at all. There are MANY UX/DX challenges with the implicit and witness system in Scala, so I would never guess suggest it directly, but never have I felt more enabled to solve my own problems in a language (and yes the absolute most complex, Haskell-in-Scala libraries can absolutely an impediment to this).
With AI this pace difference is even more noticeable.
I do think that the way that Scala approaches this by using imports historically was quite interesting. Using a use statement to bring a trait definition into scope isn't discussed in any of these proposals I think?
The problem is existentials, or rather the existence of existentials without the ability to explicitly override them. Even in Haskell, overriding typeclass instances requires turning off orphan checks, which is a rather large hammer.
So once you've identified this, now you might consider the universe of possible solutions to the problem. One of those solutions might be removing existentials from your language; think about how Scala would work if implicits were removed (I haven't used Scala 3, maybe this happened?). Another solution might be to decouple the whole concept of "existential implementations of typed extension points" from libraries (or crates, or however you compile and distribute code), and require bringing instances into scope via imports or similar.
Two things are true for sure, though: libraries already depend on the current behavior, whether that makes sense or not; and forcing users to understand coherence (which instance is used by which code) is almost always a giant impediment to getting users to like your language. Hence, "orphan rules", and why everyone hates Scala 2 implicits.
That was my first thought! I never had this problem with Scala (2.x for me, but I guess there's similar syntax/concepts in 3).
The article author does talk about naming trait impls and how to use them at call sites, but never seems to consider the idea that you could import a trait impl and use it everywhere within that scope, without extra onerous syntax.
Does this still solve the "HashMap" problem though? I guess it depends on when the named impl "binds". E.g. the named Hash impl would have to bind to the HashMap itself at creation, not at calls to `insert()` or `get()`. Which... seems like a reasonable thing?
> When something near the bottom needs work, should there be a process for fixing it, which is a people problem? Or should there be a mechanism for bypassing it, which is a technical solution to a people problem?
I don't think it's a people problem in the way we usually talk about the folly of creating technical solutions to people problems.
If something like serde is foundational, you simply can't radically change it without causing problems for lots and lots of people. That's a technical problem, not a people problem, even if serde needs radical change in order to evolve in the ways it needs to.
But sure, ok, let's imagine that wasn't the case. Let's say some new group of people decide that serde is lacking in some serious way, and they want to implement their changes. They can even do so without breaking compatibility with existing users of the crate. But the serde maintainers don't see the same problems; in fact, they believe that what this new group wants to do will actively cause more problems.
Neither group of people even needs to be right or wrong. Maybe both ways have pluses and minuses, and choosing just depends on what trade offs you value more. Neither group is wrong about wanting to either keep the status quo or make changes.
This is actually a technical problem: we need to find a way to allow both approaches coexist, without causing a ton of work for everyone else.
And even if we do run into situations where things need fixing, and things not getting fixed is a people problem, I'd argue for this particular sort of thing it's not only appropriate but essential that we have technical solutions to bypass the people problems. I mean, c'mon. People are people. People are going to be stubborn and not want change. Ossification is a real thing, and I think it's a rare project/organization that's able to avoid it. Sure, we could refuse to use technical workarounds when it's people we need to change, but in so many cases, that's just running up against a brick wall, over and over. Why do that to ourselves? Life is too short.
Having said that, I totally agree that there are situations where technical workarounds to people problems can be incredibly counter-productive, and cause more problems than they solve (like, "instead of expecting people to actually parent their kids, force everyone to give up their privacy for mandatory age verification; think of the children!"). But I don't think this is one of them.
there is no good way to handle colliding implementations. Both from parallel crates and due to changes over time.
Without it you can have many many additional forms of breakage. Worse you can have "new" breakage between two 3rd party crates without either of them changing due to some impl in a common ancestor changing (e.g. std) and this affecting two wild card implementations in each, now leading to an overlap.
When you have an overlap there are two options:
- fail compilation, but as mentioned this could be caused by a non breaking change in std in two in theory unrelated 3rd party dependencies
- try to choose one of the implementations. But that now gets very messy in multiple points: a) Which impl. to choose when. b) The user knowing which is chosen. c) Overlap with interactions with stuff like double dispatch, thread local variables, and in general side effects. The issues here are similar to specialization (and part why that is stuck in limbo), but a magnitude more complex as specialization is only (meant) for optimizations, while this can be deeply different behavior. Like `foo.bar()` with the same `use Bar as _;` might in one context return an `u32` and in another a `String`
In many other ecosystems it's not uncommon to run into having issues where certain libraries can't be used together at all. In rust that is close to not a thing (no_mange collisions and C dependencies are the only exception I can think of).
Similar, in my experience the likely hood of running into unintended breaking changes is lower in the rust ecosystem then e.g. python or js, that is partially due to coherence rules forcing a more clean design.
Also people are forced to have a somewhat clean dependency tree between crates in ways not all languages requires. This can help with incremental builds and compiler time, a area rust needs any help it can get. (As a side note, clean dependency structures in your modules can (sometimes) help will rust better parallelizing code gen, too.)
So overall it I think it's good.
Through it can be very annoying. And there is some potential for improvement in many ways.
---
EDIT: sorry some keyboard fat-fingering somehow submitted a half written response without me pressing enter...
I will never stop hating on the orphan rule, a perfect summary of what’s behind a lot of rust decisions. Purism and perfectionism at the cost of making a useful language, no better way to torpedo your ecosystem and make adding dependencies really annoying for no reason. Like not even a —dangerously-disable-the-orphan-rule, just no concessions here.
I think there are legitimate criticisms of Rust that fall in this category, but the orphan rule ain’t it.
In most other languages, it is simply not possible to “add” an interface to a class you don’t own. Rust let’s you do that if you own either the type or or the interface. That’s strictly more permissive than the competition.
The reasons those other languages have for not letting you add your interface to foreign types, or extend them with new members, are exactly the same reasons that Rust has the orphan rule.
When I used to write Scala, I accepted the fact that I don't have a background in type/set/etc. theory, and that there were some facets of the language that I'd probably never understand, and some code that others had written that I'd probably never understand.
With a language like Rust, I feel like we're getting there. Certain GAT syntxes sometimes take some time for me to wrap my head around when I encounter them. Rust feels like it shouldn't be a language where you need to have some serious credentials to be able to understand all its features and syntax.
On the other end we have Go, which was explicitly designed to be easy to learn (and, unrelatedly, I don't like for quite a few reasons). But I was hoping that we could have a middle ground here, and that Rust could be a fully-graspable systems-level language.
Then again, for more comparison, I haven't used C++ since before they added lambdas. I wonder if C++ has some hairy concepts and syntax today on par with Rust's more difficult parts.
https://tartanllama.xyz/posts/cpp-initialization-is-bonkers/
Let's say you have one library with:
And you want to define a custom serialization. In this case, you can write: Then you just implement Serialize and Deserialize for TypeWithDifferentSerialization.This cover most occasional cases where you need to work around the orphan rule. And semantically, it's pretty reasonable: If a type behaves differently, then it really isn't the same type.
The alternative is to have a situation where you have library A define a data type, library B define an interface, and library C implement the interface from B for the type from A. Very few languages actually allow this, because you run into the problem where library D tries to do the same thing library C did, but does it differently. There are workarounds, but they add complexity and confusion, which may not be worth it.
> An interesting outcome of removing coherence and having trait bound parameters is that there becomes a meaningful difference between having a trait bound on an impl or on a struct:
This seems unfortunate to me.
You depend on crates A and B. A impls Foo for Bar. You pass an instance of Bar to a function that accepts `impl Foo`. You are happy. Later crate B adds an impl of Foo for Bar. Clearly _at least_ one of these must be an orphan impl, but both could be. Suddenly it's ambiguous which implementation of Foo you're talking about, so you break because B added an impl.
There are many potential problems of this flavor with letting any `impl Trait for Type` be an orphan impl and then referenced by path. What happens, for example, if an impl that was an orphan impl in one version of A becomes a coherent impl in a later version of A?
I think there has to be special syntax for named/path-referenced/symbolic impls, even if the impl does not have an identifier name, so that the compiler can know "this impl only resolves if you tell me _specifically this impl_" and the impl provider has a way to create a solid consumer contract about how to use that impl in particular.
Also, not having an identifier name would mean you can't have different impls of Foo for Bar in the same module. That's probably not a limitation anyone would care about, but it's there.
When something near the bottom needs work, should there be a process for fixing it, which is a people problem? Or should there be a mechanism for bypassing it, which is a technical solution to a people problem? This is one of the curses of open source. The first approach means that there will be confrontations which must be resolved. The second means a proliferation of very similar packages.
This is part of the life cycle of an open source language. Early on, you don't have enough packages to get anything done, and are grateful that someone took the time to code something. Then it becomes clear that the early packages lacked something, and additional packages appear. Over time, you're drowning in cruft. In a previous posting, I mentioned ten years of getting a single standard ISO 8601 date parser adopted, instead of six packages with different bugs. Someone else went through the same exercise with Javascript.
Go tends to take the first approach, while Python takes the second. One of Go's strengths is that most of the core packages are maintained and used internally by Google. So you know they've been well-exercised.
Between Github and AI, it's all too easy to create minor variants of packages. Plus we now have package supply chain attacks. Curation has thus become more important. At this point in history, it's probably good to push towards the first approach.
In a situation where you're building, I find the orphan rule frustrating because you can be stuck in a situation where you are unable to help yourself without forking half of the crates in the ecosystem.
Looking for improvements upstream, even with the absolute best solutions for option 1, has the fundamental downside that you can't unstick yourself.
With AI this pace difference is even more noticeable.
I do think that the way that Scala approaches this by using imports historically was quite interesting. Using a use statement to bring a trait definition into scope isn't discussed in any of these proposals I think?
So once you've identified this, now you might consider the universe of possible solutions to the problem. One of those solutions might be removing existentials from your language; think about how Scala would work if implicits were removed (I haven't used Scala 3, maybe this happened?). Another solution might be to decouple the whole concept of "existential implementations of typed extension points" from libraries (or crates, or however you compile and distribute code), and require bringing instances into scope via imports or similar.
Two things are true for sure, though: libraries already depend on the current behavior, whether that makes sense or not; and forcing users to understand coherence (which instance is used by which code) is almost always a giant impediment to getting users to like your language. Hence, "orphan rules", and why everyone hates Scala 2 implicits.
The article author does talk about naming trait impls and how to use them at call sites, but never seems to consider the idea that you could import a trait impl and use it everywhere within that scope, without extra onerous syntax.
Does this still solve the "HashMap" problem though? I guess it depends on when the named impl "binds". E.g. the named Hash impl would have to bind to the HashMap itself at creation, not at calls to `insert()` or `get()`. Which... seems like a reasonable thing?
I don't think it's a people problem in the way we usually talk about the folly of creating technical solutions to people problems.
If something like serde is foundational, you simply can't radically change it without causing problems for lots and lots of people. That's a technical problem, not a people problem, even if serde needs radical change in order to evolve in the ways it needs to.
But sure, ok, let's imagine that wasn't the case. Let's say some new group of people decide that serde is lacking in some serious way, and they want to implement their changes. They can even do so without breaking compatibility with existing users of the crate. But the serde maintainers don't see the same problems; in fact, they believe that what this new group wants to do will actively cause more problems.
Neither group of people even needs to be right or wrong. Maybe both ways have pluses and minuses, and choosing just depends on what trade offs you value more. Neither group is wrong about wanting to either keep the status quo or make changes.
This is actually a technical problem: we need to find a way to allow both approaches coexist, without causing a ton of work for everyone else.
And even if we do run into situations where things need fixing, and things not getting fixed is a people problem, I'd argue for this particular sort of thing it's not only appropriate but essential that we have technical solutions to bypass the people problems. I mean, c'mon. People are people. People are going to be stubborn and not want change. Ossification is a real thing, and I think it's a rare project/organization that's able to avoid it. Sure, we could refuse to use technical workarounds when it's people we need to change, but in so many cases, that's just running up against a brick wall, over and over. Why do that to ourselves? Life is too short.
Having said that, I totally agree that there are situations where technical workarounds to people problems can be incredibly counter-productive, and cause more problems than they solve (like, "instead of expecting people to actually parent their kids, force everyone to give up their privacy for mandatory age verification; think of the children!"). But I don't think this is one of them.
And IMHO coherence and orphan rules have majorly contributed to the quality of the eco system.
Without it you can have many many additional forms of breakage. Worse you can have "new" breakage between two 3rd party crates without either of them changing due to some impl in a common ancestor changing (e.g. std) and this affecting two wild card implementations in each, now leading to an overlap.
When you have an overlap there are two options:
- fail compilation, but as mentioned this could be caused by a non breaking change in std in two in theory unrelated 3rd party dependencies
- try to choose one of the implementations. But that now gets very messy in multiple points: a) Which impl. to choose when. b) The user knowing which is chosen. c) Overlap with interactions with stuff like double dispatch, thread local variables, and in general side effects. The issues here are similar to specialization (and part why that is stuck in limbo), but a magnitude more complex as specialization is only (meant) for optimizations, while this can be deeply different behavior. Like `foo.bar()` with the same `use Bar as _;` might in one context return an `u32` and in another a `String`
In many other ecosystems it's not uncommon to run into having issues where certain libraries can't be used together at all. In rust that is close to not a thing (no_mange collisions and C dependencies are the only exception I can think of).
Similar, in my experience the likely hood of running into unintended breaking changes is lower in the rust ecosystem then e.g. python or js, that is partially due to coherence rules forcing a more clean design.
Also people are forced to have a somewhat clean dependency tree between crates in ways not all languages requires. This can help with incremental builds and compiler time, a area rust needs any help it can get. (As a side note, clean dependency structures in your modules can (sometimes) help will rust better parallelizing code gen, too.)
So overall it I think it's good.
Through it can be very annoying. And there is some potential for improvement in many ways.
---
EDIT: sorry some keyboard fat-fingering somehow submitted a half written response without me pressing enter...
EDIT 2: Fix spelling and sentence structure.
In most other languages, it is simply not possible to “add” an interface to a class you don’t own. Rust let’s you do that if you own either the type or or the interface. That’s strictly more permissive than the competition.
The reasons those other languages have for not letting you add your interface to foreign types, or extend them with new members, are exactly the same reasons that Rust has the orphan rule.
Rust: if you spent 3 weeks understanding the syntax and borrow-checker, here are all of the other problems, and the list keeps growing.
Man this cracks me up.