I stumbled across a … bug(?) … a limitation(?) … an issue(!) with the Rust compiler. Fortunately, I was able to find out what was going on in the Rust programming language forum. I thought this issue was interesting enough and subtle enough to try to explain to my blog audience.

To be clear, this is an issue. This is a limitation in the Rust trait solver. The maintainers of the Rust compiler didn’t make Rust work this way for a principled reason. There’s no particularly strong theoretical reason Rust has to have this limitation.

So, while understanding this issue will help you understand Rust, at a certain level this issue is just an accident. While reading my explanation, keep in mind that I'm explaining how Rust works in order to show how this limitation arises from the way Rust happens to be implemented. This isn't an explanation of why this limitation is necessary or fundamental, because it isn't – it's a limitation that could be, and maybe at some point will be, fixed.

The Issue

Here’s my version of a minimal reproduction of the issue in question, also available as a Rust playground permalink:

use std::marker::PhantomData;

pub trait Simple {
    type State;
    fn do_stuff();
}

struct Parameterized<T> {
    _phantom: PhantomData<T>,
}

impl<T> Simple for Parameterized<T> {
    type State = u32;

    fn do_stuff()
    // where Self: Simple
    {
        let state: Self::State = 3;
        println!("{state}");
    }
}

If you uncomment the where clause, all of a sudden the compiler forgets that Self::State and u32 are the same type within the scope of the do_stuff() function. It complains that you're assigning to a variable of the wrong type – that, in fact, it can't know what type <Parameterized<T> as Simple>::State possibly could be! Strange!

If the where clause is gone, it works. If you don’t use a parameterized type, it works. But with the where clause on a parameterized type, it breaks.
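To check the "non-parameterized type" half of that claim, here is a sketch with a hypothetical unit struct Plain standing in for Parameterized<T>. Based on the behavior described above, the same redundant where clause should compile here, because Plain has no type parameters:

```rust
pub trait Simple {
    type State;
    fn do_stuff();
}

// Hypothetical struct with no type parameters, in place of Parameterized<T>.
struct Plain;

impl Simple for Plain {
    type State = u32;

    fn do_stuff()
    where
        Self: Simple, // the same redundant bound that breaks Parameterized<T>
    {
        let state: Self::State = 3;
        println!("{state}");
    }
}

// Outside the impl, <Plain as Simple>::State is also known to be u32.
fn initial_state() -> <Plain as Simple>::State {
    3
}

fn main() {
    Plain::do_stuff();
    assert_eq!(initial_state(), 3u32);
}
```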

Trait bounds in method signatures in trait impls are weird

To be clear, uncommenting the where clause isn’t an immediate error, even though it’s not in the trait definition. Generally, the compiler lets you add redundant where clauses in trait implementations as long as your implementation will work everywhere the trait’s signature can be called. You can also remove where clauses and other trait bounds that you aren’t particularly using, such that code like this is completely kosher:

pub trait Simple {
    fn do_something<T: Clone>(&self, other: T);
}

impl Simple for u32 {
    fn do_something<T>(&self, _other: T) where u32: Simple {}
}

<u32 as Simple>::do_something() here takes the same number of type parameters, the same number of regular parameters, and its total constraint set is less restrictive than Simple::do_something(), so even though the signature looks different, it still passes muster. In our implementation, we didn’t need the : Clone, so we didn’t need to specify it.

However, that doesn’t mean we can now call <u32 as Simple>::do_something() on a value that doesn’t implement Clone. This still errors:

fn main() {
    struct Foo;
    1u32.do_something(Foo);
}

No: while the body of <u32 as Simple>::do_something() is compiled against the signature we wrote for that specific implementation, calls from the outside are still validated against the trait's version of the function signature, even if we call it directly on a concrete value of our specific type.
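The same rule applies in generic code. In this sketch, forward is a hypothetical helper. Because the call inside it is checked against the trait's signature, forward has to carry the T: Clone promise itself, even though the impl it ends up calling never clones anything:

```rust
pub trait Simple {
    fn do_something<T: Clone>(&self, other: T);
}

impl Simple for u32 {
    // The impl drops `T: Clone`; its body never clones anything.
    fn do_something<T>(&self, _other: T)
    where
        u32: Simple,
    {
    }
}

// Hypothetical generic caller. Deleting `T: Clone` here makes the call
// below fail to compile, because callers see the trait's signature.
fn forward<T: Clone>(receiver: u32, value: T) -> u32 {
    receiver.do_something(value);
    receiver
}

fn main() {
    // `String` implements Clone, so the caller's promise holds.
    assert_eq!(forward(7, String::from("cloneable")), 7);
}
```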

Trait bounds are promises callers make to callees

So, a function signature constitutes a contract between caller and callee, specifying what types of values the caller needs to provide as arguments (and which, in turn, the callee can expect to receive), and what types of value the callee may provide as return values (and which, therefore, the caller can expect to receive). In this framing, trait bounds, like the : Clone in T: Clone or the where clause where Self: Simple, add to the agreement, serving as additional promises the caller makes to the callee, and therefore, in turn, additional guarantees the callee can rely upon.

Each trait bound (also known as a constraint, which is to say a limitation) restricts the types the function can be called with. Each bound takes the set of possible valid calls of that function and makes that set smaller – or exactly the same size, in the case of redundant bounds.

Given this, it makes sense that we can remove bounds when implementing a method of a trait – as long as we’re removing limitations on the caller, we can’t be breaking any callers. Callers can still uphold promises we’re not using.

It also makes sense that we can add redundant bounds. If we’re an impl of Simple, of course we already know Self: Simple, so why not say so in a where clause? Doesn’t every method of trait Trait basically have an implicit Self: Trait bound on it? We’re not asking the caller to promise anything we don’t already know to be true.
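Here is a small illustration of that implicit bound, using a hypothetical Counter trait. Inside the trait's own default method, the body can call self.count() without writing where Self: Counter, because the compiler already assumes it:

```rust
trait Counter {
    fn count(&self) -> u32;

    // No `where Self: Counter` is written, yet the body may call
    // `self.count()`: every trait method implicitly has that bound.
    fn doubled(&self) -> u32 {
        self.count() * 2
    }
}

// Hypothetical implementor used only for the example.
struct Three;

impl Counter for Three {
    fn count(&self) -> u32 {
        3
    }
}

fn main() {
    assert_eq!(Three.doubled(), 6);
}
```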

So far, so good. Rust's behavior is in line with common sense (or at least my intuition, which I hope that you share). So, on to my next idea: since trait bounds are promises that the caller makes and that the callee can rely on, if the caller makes additional promises, we shouldn't break the callee.

Let me put it another way: Sometimes, trait bounds are necessary because the implementation code inside the function, the callee, needs to rely on the promises conveyed by the trait bound. If we put a Clone bound on T, it’s probably because the callee wants to call T::clone() (or at least because they want to reserve the right to).

So, if we remove a trait bound, we might break the implementation code by pulling a promise out from underneath it. If we add a trait bound, however, that should never happen. The implementation should just stay valid. It stands to reason, right?
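In ordinary generic code, that intuition does hold. Here is a sketch with two hypothetical functions that share an identical body. The second one adds an extra Debug bound that the body never uses, and it still compiles; the only effect is that fewer callers qualify:

```rust
use std::fmt::Debug;

// The body relies only on `T: PartialOrd + Copy`.
// Assumes `items` is non-empty (indexing `items[0]` panics otherwise).
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut best = items[0];
    for &item in items {
        if item > best {
            best = item;
        }
    }
    best
}

// Identical body with an extra, unused `Debug` bound. Adding the bound
// shrinks the set of valid callers but does not invalidate the body.
fn largest_debug<T: PartialOrd + Copy + Debug>(items: &[T]) -> T {
    let mut best = items[0];
    for &item in items {
        if item > best {
            best = item;
        }
    }
    best
}

fn main() {
    assert_eq!(largest(&[3, 9, 4]), 9);
    assert_eq!(largest_debug(&[3, 9, 4]), 9);
}
```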

Well, it does stand to reason. But clearly, as we can tell from our little example up there, it’s not true. We can add a trait bound, add information that our function implementation should know, add additional promises the caller needs to make, and it makes our implementation, somehow, invalid.

Trait bounds are capabilities callees rely on

To the caller, trait bounds are promises you have to be able to uphold, and they constrain how you can call the function. To the callee, inside the function itself, trait bounds are facts you can rely on. For example, if you see a trait bound T: Clone, it means you get to use the clone() method to duplicate values of type T. Without it, the compiler will complain if you use the clone() method.
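Here is a minimal example of a bound acting as a capability, using a hypothetical duplicate function. The T: Clone bound is the only thing that lets the body call value.clone():

```rust
// `T: Clone` is what permits `value.clone()`. Remove the bound and the
// compiler reports that no `clone` method is available for `T`.
fn duplicate<T: Clone>(value: T) -> (T, T) {
    (value.clone(), value)
}

fn main() {
    let (a, b) = duplicate(String::from("hi"));
    assert_eq!(a, b);
    assert_eq!(a, "hi");
}
```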

In order to implement this sort of validation, the compiler needs some way of tracking all of this information. So, when you have T: Clone in your signature, the compiler internally has to remember that T implements Clone when it’s reading your function. In its internal data structures about the type variable T, it has to hold on to this bound, along with all other information it knows about T.
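To make that bookkeeping concrete, here is a toy sketch of such an environment of facts. Everything in it (BoundEnv, assume, holds) is hypothetical and is not how rustc actually stores this. It only shows the idea of remembering which traits each type variable is known to implement:

```rust
use std::collections::{HashMap, HashSet};

// Toy, hypothetical model: for each type variable, the set of traits
// the function body may assume it implements.
struct BoundEnv {
    bounds: HashMap<&'static str, HashSet<&'static str>>,
}

impl BoundEnv {
    fn new() -> Self {
        BoundEnv { bounds: HashMap::new() }
    }

    // Record a bound from the signature, such as `T: Clone`.
    fn assume(&mut self, ty: &'static str, tr: &'static str) {
        self.bounds.entry(ty).or_default().insert(tr);
    }

    // Is `ty: tr` known inside the function body?
    fn holds(&self, ty: &str, tr: &str) -> bool {
        self.bounds.get(ty).map_or(false, |set| set.contains(tr))
    }
}

fn main() {
    let mut env = BoundEnv::new();
    env.assume("T", "Clone"); // from `fn f<T: Clone>(...)`
    assert!(env.holds("T", "Clone")); // so calling `clone()` on a T is allowed
    assert!(!env.holds("T", "Debug")); // no such bound, so `{:?}` on a T is not
}
```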

So, let’s return to our example:

impl<T> Simple for Parameterized<T> {
    type State = u32;

    fn do_stuff()
    // where Self: Simple
    {
        let state: Self::State = 3;
        println!("{state}");
    }
}

Here, we use Self::State. Self::State is an associated type defined in the Simple trait, so we can only write Self::State if we know that Self implements Simple. And since we’re in an impl block for Parameterized<T>, Self here is another way of writing the type Parameterized<T>.

Without the where clause, we know that Self implements Simple because we're in the body of the actual impl Simple for Parameterized<T>. In an impl Simple for X block, we always know that Self: Simple, and, specifically, that Self's implementation of Simple is this very block.

Based off of this information, the compiler not only gives us permission to write Self::State, it also lets us use the knowledge that Self::State in this context is u32, as it also comes from the fact that we’re in that impl block. The internal data the compiler uses to track information about Self can’t really be represented as Rust code, because it’s not just Self: Simple, but also Self::State IS IN FACT the type u32.

When we add the where clause, we now have another trait bound that says Self: Simple. This version of the trait bound, unlike the version implied by the impl, does NOT convey that Self::State is the type u32. When we write Self: Simple in a where clause, Rust just believes us – because it knows it won’t let us call the function in situations where it’s not true.

So, within the function do_stuff(), we now have two sources of the fact that Self: Simple. Either one allows us to write Self::State. But only the version implied by the impl allows us to know that Self::State is in fact u32. So, when we write Self::State, which version does Rust use to interpret it?

You might think the Rust compiler would consult both, as a human reading the code probably would. The Rust compiler knows, both from the fact that we're in the impl block and from the where clause, that Self implements the trait Simple. It also knows from the impl block version that Self::State is u32. So, it has enough information to conclude that Self::State is u32 and to allow this code to compile.

But that's not what the compiler actually does. When compiling do_stuff(), the Rust compiler prefers the version of this trait bound from the where clause, which tells Rust nothing about whether Self::State is u32. So when you write Self::State, Rust concludes that it could be any type at all. And so, when you try to write = 3, it concludes you can't do that, not without more information: not without spelling out that Self::State is u32, for example with an associated-type binding like where Self: Simple<State = u32>, which a plain where Self: Simple doesn't provide.
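Here is a minimal sketch of restating the equality explicitly. It assumes the same Simple trait and Parameterized<T> struct as the reproduction above, and the only change is that the where clause carries an associated-type binding. My expectation, which is worth verifying on your own toolchain, is that the where-clause bound now supplies Self::State == u32 itself:

```rust
use std::marker::PhantomData;

pub trait Simple {
    type State;
    fn do_stuff();
}

struct Parameterized<T> {
    _phantom: PhantomData<T>,
}

impl<T> Simple for Parameterized<T> {
    type State = u32;

    fn do_stuff()
    // Unlike a bare `Self: Simple`, this bound also states the equality
    // `Self::State == u32`, so the body can rely on it.
    where
        Self: Simple<State = u32>,
    {
        let state: Self::State = 3;
        println!("{state}");
    }
}

fn main() {
    <Parameterized<()> as Simple>::do_stuff();
}
```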

It's kind of like shadowing, where the version of the trait bound in the where clause shadows the version implied by the impl. But unlike shadowing, this behavior isn't necessary. With variable shadowing, it's important to have a rule for which variable you mean, or else to ban overlapping variable names; it doesn't make sense to declare that you obviously mean both. In this situation, though, the compiler could keep both trait bounds and merge the information from them, since they represent facts that don't contradict each other.
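The difference between the two strategies can be sketched with a toy model. All of the names here (Source, Fact, resolve_shadowing, resolve_merging) are hypothetical, and this is an illustration rather than rustc's real data model. Each fact records where Self: Simple came from and, if it knows, what Self::State is:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Source {
    ImplBlock,
    WhereClause,
}

struct Fact {
    source: Source,
    state: Option<&'static str>, // `Some("u32")` if this fact pins down `Self::State`
}

// Roughly the behavior described above: a where-clause fact, if present,
// wins outright, even though it says nothing about `Self::State`.
fn resolve_shadowing(facts: &[Fact]) -> Option<&'static str> {
    let chosen = facts
        .iter()
        .find(|f| f.source == Source::WhereClause)
        .or_else(|| facts.first());
    chosen.and_then(|f| f.state)
}

// The alternative: consult every fact. Because these facts never
// contradict each other, the first known answer is enough.
fn resolve_merging(facts: &[Fact]) -> Option<&'static str> {
    facts.iter().find_map(|f| f.state)
}

fn main() {
    let facts = [
        Fact { source: Source::ImplBlock, state: Some("u32") },
        Fact { source: Source::WhereClause, state: None },
    ];
    assert_eq!(resolve_shadowing(&facts), None); // `Self::State` is unknown
    assert_eq!(resolve_merging(&facts), Some("u32")); // `Self::State` is u32
}
```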

More importantly, there’s no intentional design decision that this comes from. There’s no Rust standard that mandates this error message. There isn’t even a principled reason that Rust should output an error in this situation – I would argue that it shouldn’t (and it would be less annoying for me if it didn’t).

Why's it matter?

I can see why, in the minimal example I give, this doesn't seem like a big deal. I can already hear the objections to this whole exercise: Why shouldn't Rust give an error message for saying the same thing twice, redundantly? Have you considered, Jimmy, just not doing that?

Well, in my case, the situation was more complicated. I didn't just have where Self: Simple in an impl<T> Simple for SomeType<T> block. I actually had where Self: Complicated<T>, where Complicated<T> was a subtrait of Simple, which hid the implied Self: Simple bound from me. The code worked, and worked well, until I added a type parameter to SomeType. Then it broke.
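The hiding happens because a bound on a subtrait brings its supertraits along with it. Here is a minimal sketch with hypothetical names (Machine, start, and a simplified Complicated<T> that is not the real trait). Only S: Complicated<T> is written, yet the body can use Simple's items:

```rust
pub trait Simple {
    type State;
    fn initial() -> Self::State;
}

// Hypothetical, simplified stand-in: a subtrait of Simple.
pub trait Complicated<T>: Simple {}

// Only `S: Complicated<T>` is written, but the compiler elaborates it to
// `S: Simple` as well, so `S::State` and `S::initial()` are usable here.
fn start<S: Complicated<T>, T>() -> S::State {
    S::initial()
}

struct Machine;

impl Simple for Machine {
    type State = u32;
    fn initial() -> u32 {
        0
    }
}

impl Complicated<u8> for Machine {}

fn main() {
    assert_eq!(start::<Machine, u8>(), 0);
}
```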

For something a little closer to my actual example, see my Rust forum post.

I was able to work around the issue easily by switching where Self: Complicated<T> to where T: Complicated<Self> and updating the semantics of Complicated to match. But I wasn't able to do this until I understood why I was getting this error message. The error message was especially confusing since it only showed up when SomeType was parameterized!

The main problem here isn't the limitation in Rust itself. I think the limitation should be fixed, and I might try my hand at fixing it if I ever have the bandwidth (which is unlikely; my time is limited). But the more important problem is that the error message was deeply misleading.

But I’m glad I had this experience, because I learned more about Rust trait semantics.