baby steps — GeistHaus

Mar 21, 2026 Updated Mar 21, 2026

This blog post describes a maximally minimal proposal for view types. It comes out of a conversation at RustNation I had with lcnr and Jack Huey, where we were talking about various improvements to the language that are “in the ether”, that basically everybody wants to do, and what it would take to get them over the line.

Example: MessageProcessor

Let’s start with a simple example. Suppose we have a struct MessageProcessor which gets created with a set of messages. It will process them and, along the way, gather up some simple statistics:

pub struct MessageProcessor {
    messages: Vec<String>,
    statistics: Statistics,
}

#[non_exhaustive] // Not relevant to the example, just good practice!
pub struct Statistics {
    pub message_count: usize,
    pub total_bytes: usize,
}

The basic workflow for a message processor is that you

accumulate messages by pushing them into the self.messages vector
drain the accumulated messages and process them
reuse the backing buffer to push future messages

Accumulating messages

Accumulating messages is easy:

impl MessageProcessor {
    pub fn push_message(&mut self, message: String) {
        self.messages.push(message);
    }
}

Processing a single message

The function to process a single message takes ownership of the message string because it will send it to another thread. Before doing so, it updates the statistics:

impl MessageProcessor {
    fn process_message(&mut self, message: String) {
        self.statistics.message_count += 1;
        self.statistics.total_bytes += message.len();
        // ... plus something to send the message somewhere
    }
}

Draining the accumulated messages

The final function you need is one that will drain the accumulated messages and process them. Writing this ought to be straightforward, but it isn’t:

impl MessageProcessor {
    pub fn process_pushed_messages(&mut self) {
        for message in self.messages.drain(..) {
            self.process_message(message); // <-- ERROR: `self` is borrowed
        }
    }
}

The problem is that self.messages.drain(..) takes a mutable borrow on self.messages. When you call self.process_message, the compiler assumes you might modify any field, including self.messages. It therefore reports an error. This is logical, but frustrating.

Experienced Rust programmers know a number of workarounds. For example, you could swap the messages field for an empty vector. Or you could invoke self.messages.pop(). Or you could rewrite process_message to be a method on the Statistics type. But all of them are, let’s be honest, suboptimal. The code above is really quite reasonable; it would be nice if you could make it work in a straightforward way, without needing to restructure it.
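To make the first workaround concrete, here is a minimal sketch of the "swap in an empty vector" approach using std::mem::take. It compiles today, assuming Default impls on both structs (added here only to keep the example short, and with #[non_exhaustive] dropped). The vector is handed back after draining so that the backing buffer is still reused.

```rust
use std::mem;

#[derive(Default)]
pub struct Statistics {
    pub message_count: usize,
    pub total_bytes: usize,
}

#[derive(Default)]
pub struct MessageProcessor {
    messages: Vec<String>,
    statistics: Statistics,
}

impl MessageProcessor {
    pub fn push_message(&mut self, message: String) {
        self.messages.push(message);
    }

    fn process_message(&mut self, message: String) {
        self.statistics.message_count += 1;
        self.statistics.total_bytes += message.len();
    }

    pub fn process_pushed_messages(&mut self) {
        // Move the vector out, leaving an empty one behind, so that no
        // part of `self` is borrowed while we iterate.
        let mut messages = mem::take(&mut self.messages);
        for message in messages.drain(..) {
            self.process_message(message);
        }
        // Hand the (now empty) vector back so its buffer gets reused.
        self.messages = messages;
    }
}
```

It works, but the body of process_pushed_messages now carries bookkeeping that has nothing to do with processing messages.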

What’s needed: a way for the borrow checker to know what fields a method may access

The core problem is that the borrow checker does not know that process_message will only access the statistics field. In this post, I’m going to focus on an explicit, and rather limited, notation, but I’ll also talk about how we might extend it in the future.

View types extend struct types with a list of fields

The basic idea of a view type is to extend the grammar of a struct type to optionally include a list of accessible fields:

RustType := StructName<...>
         |  StructName<...> { .. }         // <-- what we are adding
         |  StructName<...> { (fields),* } // <-- what we are adding

A type like MessageProcessor { statistics } would mean “a MessageProcessor struct where only the statistics field can be accessed”. You could also include a .., like MessageProcessor { .. }, which would mean that all fields can be accessed, which is equivalent to today’s struct type MessageProcessor.

View types respect privacy

View types would respect privacy, which means you could only write MessageProcessor { messages } in a context where you can name the field messages in the first place.

View types can be named on self arguments and elsewhere

You could use this to define that process_message only needs to access the field statistics:

impl MessageProcessor {
    fn process_message(&mut self {statistics}, message: String) {
        //             ----------------------
        //             Shorthand for: `self: &mut MessageProcessor {statistics}`
        
        // ... as before ...
    }
}

Of course you could use this notation in other arguments as well:

fn silly_example(.., mp: &mut MessageProcessor {statistics}, ..) {
}

Explicit view-limited borrows

We would also extend borrow expressions so that it is possible to specify precisely which fields will be accessible from the borrow:

let messages = &mut some_variable {messages}; // Ambiguous grammar? See below.

When you do this, the borrow checker produces a value of type &mut MessageProcessor {messages}.

Sharp-eyed readers will note that this is ambiguous. The above could be parsed today as a borrow of a struct expression like some_variable { messages } or, more verbosely, some_variable { messages: messages }. I’m not sure what to do about that. I’ll note some alternative syntaxes below, but I’ll also note that it would be possible for the compiler to parse the AST in an ambiguous fashion and disambiguate later on once name resolution results are known.

We automatically introduce view borrows in an auto-ref

In our example, though, the user never writes the &mut borrow explicitly. It results from the auto-ref added by the compiler as part of the method call:

pub fn process_pushed_messages(&mut self) {
    for message in self.messages.drain(..) {
        self.process_message(message); // <-- auto-ref occurs here
    }
}

The compiler internally rewrites method calls like self.process_message(message) to fully qualified form based on the signature declared in process_message. Today that results in code like this:

MessageProcessor::process_message(&mut *self, message)

But because process_message would now declare &mut self { statistics }, we can instead desugar to a borrow that specifies a field set:

MessageProcessor::process_message(&mut *self { statistics }, message)

The borrow checker would respect views

Integrating views into the borrow checker is fairly trivial. The way the borrow checker works is that, when it sees a borrow expression, it records a “loan” internally that tracks the place that was borrowed, the way it was borrowed (mut, shared), and the lifetime for which it was borrowed. All we have to do is to record, for each borrow using a view, multiple loans instead of a single loan.

For example, if we have &mut self, we would record one mut-loan of self. But if we have &mut self {field1, field2}, we would record two mut-loans, one of self.field1 and one of self.field2.
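The borrow checker already tracks loans per field when the borrows are written out inside a single function body. As a rough illustration (a sketch of today's behavior, not of the proposed feature), the following compiles because the two loans are of different places. The method name process_inline and the Default derives are assumptions made for the example.

```rust
#[derive(Default)]
pub struct Statistics {
    pub message_count: usize,
    pub total_bytes: usize,
}

#[derive(Default)]
pub struct MessageProcessor {
    messages: Vec<String>,
    statistics: Statistics,
}

impl MessageProcessor {
    // Hypothetical method: the work of `process_message` inlined by hand.
    pub fn process_inline(&mut self) {
        // Two separate mut-loans: one of `self.messages`...
        let messages = &mut self.messages;
        // ...and one of `self.statistics`. The places do not overlap.
        let statistics = &mut self.statistics;
        for message in messages.drain(..) {
            statistics.message_count += 1;
            statistics.total_bytes += message.len();
        }
    }
}
```

A view in a method signature would let the same per-field reasoning cross the function boundary.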

Example: putting it all together

OK, let’s put it all together. This was our original example, collected:

pub struct MessageProcessor {
    messages: Vec<String>,
    statistics: Statistics,
}

#[non_exhaustive]
pub struct Statistics {
    pub message_count: usize,
    pub total_bytes: usize,
}

impl MessageProcessor {
    pub fn push_message(&mut self, message: String) {
        self.messages.push(message);
    }

    pub fn process_pushed_messages(&mut self) {
        for message in self.messages.drain(..) {
            self.process_message(message); // <-- ERROR: `self` is borrowed
        }
    }

    fn process_message(&mut self, message: String) {
        self.statistics.message_count += 1;
        self.statistics.total_bytes += message.len();
        // ... plus something to send the message somewhere
    }
}

Today, process_pushed_messages results in an error:

pub fn process_pushed_messages(&mut self) {
    for message in self.messages.drain(..) {
        //         ------------- borrows `self.messages`
        self.process_message(message); // <-- ERROR!
        //   --------------- borrows `self`
    }
}

The error arises from a conflict between two borrows:

self.messages.drain(..) desugars to Vec::drain(&mut self.messages, ..) which, as you can see, mut-borrows self.messages;
then self.process_message(..) desugars to MessageProcessor::process_message(&mut self, ..) which, as you can see, mut-borrows all of self, which overlaps self.messages.

But in the “brave new world”, we’ll modify the program in one place:

-    fn process_message(&mut self, message: String) {
+    fn process_message(&mut self {statistics}, message: String) {

and as a result, the process_pushed_messages function will now borrow check successfully. This is because the two loans are now issued for different places:

as before, self.messages.drain(..) desugars to Vec::drain(&mut self.messages, ..) which mut-borrows self.messages;
but now, self.process_message(..) desugars to MessageProcessor::process_message(&mut self {statistics}, ..) which mut-borrows self.statistics, which doesn’t overlap self.messages.

At runtime, this is still just a pointer

One thing I want to emphasize is that “view types” are a purely static construct and do not change how things are compiled. They simply give the borrow checker more information about what data will be accessed through which references. The process_message method, for example, still takes a single pointer to self.

This is in contrast with the workarounds that exist today. For example, if I were writing the above code, I might well rewrite process_message into an associated fn that takes a &mut Statistics:

impl MessageProcessor {
    fn process_message(statistics: &mut Statistics, message: String) {
        statistics.message_count += 1;
        statistics.total_bytes += message.len();
        // ... plus something to send the message somewhere
    }
}

This would be annoying, of course, since I’d have to write Self::process_message(&mut self.statistics, ..) instead of self.process_message(), but it would avoid the borrow check error.

Beyond being annoying, it would change the way the code is compiled. Instead of taking a reference to the MessageProcessor it now takes a reference to the Statistics.

In this example, the change from one type to another is harmless, but there are other examples where you need access to multiple fields, in which case it is less efficient to pass them individually.
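A small sketch of the multi-field case, written in today's Rust. The largest_seen field is hypothetical (it is not part of the original example) and exists only so the helper needs two fields. Each field the helper touches becomes its own pointer argument, and every call site has to spell them all out.

```rust
#[derive(Default)]
pub struct Statistics {
    pub message_count: usize,
    pub total_bytes: usize,
}

#[derive(Default)]
pub struct MessageProcessor {
    messages: Vec<String>,
    statistics: Statistics,
    // Hypothetical extra field, added only for this illustration.
    largest_seen: usize,
}

impl MessageProcessor {
    // Associated fn: one pointer argument per field it needs.
    fn process_message(
        statistics: &mut Statistics,
        largest_seen: &mut usize,
        message: String,
    ) {
        statistics.message_count += 1;
        statistics.total_bytes += message.len();
        *largest_seen = (*largest_seen).max(message.len());
    }

    pub fn process_pushed_messages(&mut self) {
        for message in self.messages.drain(..) {
            // The three borrows are of disjoint fields, so this compiles.
            Self::process_message(&mut self.statistics, &mut self.largest_seen, message);
        }
    }
}
```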

Frequently asked questions

How hard would this be to implement?

Honestly, not very hard. I think we could ship it this year if we found a good contributor who wanted to take it on.

What about privacy?

I would require that the fields that appear in view types are ‘visible’ to the code that is naming them (this includes in view types that are inserted via auto-ref). So the following would be an error:

mod m {
    #[derive(Default)]
    pub struct MessageProcessor {
        messages: Vec<String>,
        ...
    }
    
    impl MessageProcessor {
        pub fn process_message(&mut self {messages}, message: String) {
            //                           ----------
            //   It's *legal* to reference a private field here, but it
            //   results in a lint, just as it is currently *legal*
            //   (but linted) for a public method to take an argument of
            //   private type. The lint is because doing this is effectively
            //   going to make the method uncallable from outside this module.
            self.messages.push(message);
        }
    }
}

fn main() {
    let mut mp = m::MessageProcessor::default();    
    mp.process_message(format!("Hello, world!"));
    // --------------- ERROR: field `messages` is not accessible here
    //
    // This desugars to:
    // 
    // ```
    // MessageProcessor::process_message(
    //     &mut mp {messages},        // <-- names a private field!
    //     format!("Hello, world!"),
    // )
    // ```
    // 
    // which names the private field `messages`. That is an error.
}

Does this mean that view types can’t be used in public methods?

More-or-less. You can use them if the view types reference public fields:

#[non_exhaustive]
pub struct Statistics {
    pub message_count: usize,
    pub average_bytes: usize,
    // ... maybe more fields will be added later ...
}

impl Statistics {
    pub fn total_bytes(&self {message_count, average_bytes}) -> usize {
        //                    ----------------------------
        //             Declare that we only read these two fields.
        self.message_count * self.average_bytes
    }
}

Won’t it be limited that view types more-or-less only work for private methods?

Yes! But it’s a good starting point. And my experience is that this problem occurs most often with private helper methods like the one I showed here. It can occur in public contexts, but much more rarely, and in those circumstances it’s often more acceptable to refactor the types to better expose the groupings to the user. This doesn’t mean I don’t want to fix the public case too, it just means it’s a good use-case to cut from the MVP. In the future I would address public fields via abstract fields, as I described in the past.

What if I am borrowing the same sets of fields over and over? That sounds repetitive!

That’s true! It will be! I think in the future I’d like to see some kind of ‘ghost’ or ‘abstract’ fields, like I described in my abstract fields blog post. But again, that seems like a “post-MVP” sort of problem to me.

Must we specify the field sets being borrowed explicitly? Can’t they be inferred?

In the syntax I described, you have to write &mut place {field1, field2} explicitly. But there are many approaches in the literature to inferring this sort of thing, with row polymorphism perhaps being the most directly applicable. I think we could absolutely introduce this sort of inference, and in fact I’d probably make it the default, so that &mut place always introduces a view type, but it is typically inferred to “all fields” in practice. But that is a non-trivial extension to Rust’s inference system, introducing a new kind of inference we don’t do today. For the MVP, I think I would just lean on auto-ref covering by far the most common case, and have explicit syntax for the rest.

Man, I have to write the fields that my method uses in the signature? That sucks! It should be automatic!

I get that for many applications, particularly with private methods, writing out the list of fields that will be accessed seems a bit silly: the compiler ought to be able to figure it out.

On the flip side, this is the kind of inter-procedural inference we try to avoid in Rust, for a number of reasons:

it introduces dependencies between methods which makes inference more difficult (even undecidable, in extreme cases);
it makes for ‘non-local errors’ that can be really confusing as a user, where modifying the body of one method causes errors in another (think of the confusion we get around futures and Send, for example);
it makes the compiler more complex, and we would not be able to parallelize as easily (not that we parallelize today, but that work is underway!).

The bottom line for me is one of staging: whatever we do, I think we will want a way to be explicit about exactly what fields are being accessed and where. Therefore, we should add that first. We can add the inference later on.

Why does this need to be added to the borrow checker? Why not desugar?

Another common alternative (and one I considered for a while…) is to add some kind of “desugaring” that passes references to fields instead of a single reference. I don’t like this for two reasons. One, I think it’s frankly more complex! This is a fairly straightforward change to the borrow checker, but that desugaring would leave code all over the compiler, and it would make diagnostics etc much more complex.

But second, it would require changes to what happens at runtime, and I don’t see why that is needed in this example. Passing a single reference feels right to me.

What about the ambiguous grammar? What other syntax options are there?

Oh, right, the ambiguous grammar. To be honest I’ve not thought too deeply about the syntax. I was trying to have the type Struct { field1, field2 } reflect struct constructor syntax, since we generally try to make types reflect expressions, but of course that leads to the ambiguity in borrow expressions that causes the problem:

let foo = &mut some_variable { field1 };
            // ------------- is this a variable or a field name?

Options I see:

Make it work. It’s not truly ambiguous, but it does require some semantic disambiguation, i.e., in at least some cases, we have to delay resolving this until name resolution can complete. That’s unusual for Rust. We do it in some small areas, most notably around the interpretation of a pattern like None (is it a binding to a variable None or an enum variant?).
New syntax for borrows only. We could keep the type syntax but make the borrow syntax different, maybe &mut {field1} in some_variable or something. Given that you would rarely type the explicit borrow form, that seems good?
Some new syntax altogether. Perhaps we want to try something different, or introduce a keyword everywhere? I’d be curious to hear options there. The current one feels nice to me but it occupies a “crowded syntactic space”, so I can see it being confusing to readers who won’t be sure how to interpret it.

Conclusion: this is a good MVP, let’s ship it!

In short, I don’t really see anything blocking us from moving forward here, at least with a lang experiment.

https://smallcultfollowing.com/babysteps/blog/2026/03/21/view-types-max-min/

How Dada enables internal references

Feb 27, 2026 Updated Feb 27, 2026

In my previous Dada blog post, I talked about how Dada enables composable sharing. Today I’m going to start diving into Dada’s permission system; permissions are Dada’s equivalent to Rust’s borrow checker.

Goal: richer, place-based permissions

Dada aims to exceed Rust’s capabilities by using place-based permissions. Dada lets you write functions and types that capture both a value and things borrowed from that value.

As a fun example, imagine you are writing some Rust code to process a comma-separated list, just looking for entries of length greater than 5:

let list: String = format!("...something big, with commas...");
let items: Vec<&str> = list
    .split(",")
    .map(|s| s.trim()) // strip whitespace
    .filter(|s| s.len() > 5)
    .collect();

One of the cool things about Rust is how this code looks a lot like some high-level language like Python or JavaScript, but in those languages the split call is going to be doing a lot of work, since it will have to allocate tons of small strings, copying out the data. But in Rust the &str values are just pointers into the original string and so split is very cheap. I love this.

On the other hand, suppose you want to package up some of those values, along with the backing string, and send them to another thread to be processed. You might think you can just make a struct like so…

struct Message {
    list: String,
    items: Vec<&str>,
    //         ----
    // goal is to hold a reference
    // to strings from list
}

…and then create the list and items and store them into it:

let list: String = format!("...something big, with commas...");
let items: Vec<&str> = /* as before */;
let message = Message { list, items };
//                      ----
//                        |
// This *moves* `list` into the struct.
// That in turn invalidates `items`, which 
// is borrowed from `list`, so there is no
// way to construct `Message`.

But as experienced Rustaceans know, this will not work. When you have borrowed data like an &str, that data cannot be moved. If you want to handle a case like this, you need to convert from &str into sending indices, owned strings, or some other solution. Argh!
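As a rough sketch of the "indices" workaround in today's Rust: store byte ranges into the owned string instead of &str slices, and rebuild the slices on demand. The constructor Message::new and the accessor items are hypothetical names chosen for this example.

```rust
use std::ops::Range;

pub struct Message {
    list: String,
    // Byte ranges into `list`, standing in for the `&str` slices.
    items: Vec<Range<usize>>,
}

impl Message {
    pub fn new(list: String) -> Self {
        let mut items = Vec::new();
        let mut start = 0; // byte offset of the current piece within `list`
        for piece in list.split(',') {
            let trimmed = piece.trim();
            if trimmed.len() > 5 {
                // Bytes of leading whitespace that `trim` removed.
                let leading = piece.len() - piece.trim_start().len();
                let begin = start + leading;
                items.push(begin..begin + trimmed.len());
            }
            start += piece.len() + 1; // +1 skips the comma
        }
        // `items` holds no borrows, so moving `list` here is fine.
        Message { list, items }
    }

    // Rebuild the string slices from the stored ranges.
    pub fn items(&self) -> Vec<&str> {
        self.items.iter().map(|r| &self.list[r.clone()]).collect()
    }
}
```

The resulting Message owns all of its data and can be sent to another thread, at the cost of manual offset arithmetic that the borrow checker cannot check for you.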

Dada’s permissions use places, not lifetimes

Dada does things a bit differently. The first thing is that, when you create a reference, the resulting type names the place that the data was borrowed from, not the lifetime of the reference. So the type annotation for items would say ref[list] String¹ (at least, if you wanted to write out the full details rather than leaving it to the type inferencer):

let list: given String = "...something big, with commas..."
let items: given Vec[ref[list] String] = list
    .split(",")
    .map(_.trim()) // strip whitespace
    .filter(_.len() > 5)
    //      ------- I *think* this is the syntax I want for closures?
    //              I forget what I had in mind, it's not implemented.
    .collect()

I’ve blogged before about how I would like to redefine lifetimes in Rust to be places as I feel that a type like ref[list] String is much easier to teach and explain: instead of having to explain that a lifetime references some part of the code, or what have you, you can say that “this is a String that references the variable list”.

But what’s also cool is that named places open the door to more flexible borrows. In Dada, if you wanted to package up the list and the items, you could build a Message type like so:

class Message(
    list: String
    items: Vec[ref[self.list] String]
    //             ---------
    //   Borrowed from another field!
)

// As before:
let list: String = "...something big, with commas..."
let items: Vec[ref[list] String] = list
    .split(",")
    .map(_.trim()) // strip whitespace
    .filter(_.len() > 5)
    .collect()

// Create the message, this is the fun part!
let message = Message(list.give, items.give)

Note that last line – Message(list.give, items.give). We can create a new class and move list into it along with items, which borrows from list. Neat, right?

OK, so let’s back up and talk about how this all works.

References in Dada are the default

Let’s start with syntax. Before we tackle the Message example, I want to go back to the Character example from previous posts, because it’s a bit easier for explanatory purposes. Here is some Rust code that declares a struct Character, creates an owned copy of it, and then gets a few references into it.

struct Character {
    name: String,
    class: String,
    hp: u32,
}

let ch: Character = Character {
    name: format!("Ferris"),
    class: format!("Rustacean"),
    hp: 22
};

let p: &Character = &ch;
let q: &String = &p.name;

The Dada equivalent to this code is as follows:

class Character(
    name: String,
    klass: String,
    hp: u32,
)

let ch: Character = Character("Tzara", "Dadaist", 22)
let p: ref[ch] Character = ch
let q: ref[p] String = p.name

The first thing to note is that, in Dada, the default when you name a variable or a place is to create a reference. So let p = ch doesn’t move ch, as it would in Rust, it creates a reference to the Character stored in ch. You could also explicitly write let p = ch.ref, but that is not preferred. Similarly, let q = p.name creates a reference to the value in the field name. (If you wanted to move the character, you would write let ch2 = ch.give, not let ch2 = ch as in Rust.)

Notice that I said let p = ch “creates a reference to the Character stored in ch”. In particular, I did not say “creates a reference to ch”. That’s a subtle choice of wording, but it has big implications.

References in Dada are not pointers

The reason I wrote that let p = ch “creates a reference to the Character stored in ch” and not “creates a reference to ch” is because, in Dada, references are not pointers. Rather, they are shallow copies of the value, very much like how we saw in the previous post that a shared Character acts like an Arc<Character> but is represented as a shallow copy.

So where in Rust the following code…

let ch = Character { ... };
let p = &ch;
let q = &ch.name;

…looks like this in memory…

        # Rust memory representation

            Stack                       Heap
            ─────                       ────

┌───► ch: Character {
│ ┌───► name: String {
│ │         buffer: ───────────► "Ferris"
│ │         length: 6
│ │         capacity: 12
│ │     },
│ │     ...
│ │   }
│ │   
└──── p
  │
  └── q

in Dada, code like this

let ch = Character(...)
let p = ch
let q = ch.name

would look like so

# Dada memory representation

Stack                       Heap
─────                       ────

ch: Character {
    name: String {
            buffer: ───────┬───► "Ferris"
            length: 6      │
            capacity: 12   │
    },                     │
    ..                     │
}                          │
                           │
p: Character {             │
    name: String {         │
            buffer: ───────┤
            length: 6      │
            capacity: 12   │
    },                     │
    ...                    │
}                          │
                           │
q: String {                │
    buffer: ───────────────┘
    length: 6
    capacity: 12
}

Clearly, the Dada representation takes up more memory on the stack. But note that it doesn’t duplicate the memory in the heap, which tends to be where the vast majority of the data is found.

Dada talks about values not references

This gets at something important. Rust, like C, makes pointers first-class. So given x: &String, x refers to the pointer and *x refers to its referent, the String.

Dada, like Java, goes another way. x: ref String is a String value – including in memory representation! The difference between a given String, shared String, and ref String is not in their memory layout, all of them are the same, but they differ in whether they own their contents.²

So in Dada, there is no *x operation to go from “pointer” to “referent”. That doesn’t make sense. Your variable always contains a string, but the permissions you have to use that string will change.

In fact, the goal is that people don’t have to learn the memory representation as they learn Dada. You are supposed to be able to think of Dada variables as if they were all objects on the heap, just like in Java or Python, even though in fact they are stored on the stack.³

Rust does not permit moves of borrowed data

In Rust, you cannot move values while they are borrowed. So if you have code like this that moves ch into ch1…

let ch = Character { ... };
let name = &ch.name; // create reference
let ch1 = ch;        // moves `ch`

…then this code only compiles if name is not used again:

let ch = Character { ... };
let name = &ch.name; // create reference
let ch1 = ch;        // ERROR: cannot move while borrowed
let name1 = name;    // use reference again

…but Dada can

There are two reasons that Rust forbids moves of borrowed data:

References are pointers, so those pointers may become invalidated. In the example above, name points to the stack slot for ch, so if ch were to be moved into ch1, that makes the reference invalid.
The type system would lose track of things. Internally, the Rust borrow checker has a kind of “indirection”. It knows that ch is borrowed for some span of the code (a “lifetime”), and it knows that the lifetime in the type of name is related to that lifetime, but it doesn’t really know that name is borrowed from ch in particular.⁴

Neither of these applies to Dada:

Because references are not pointers into the stack, but rather shallow copies, moving the borrowed value doesn’t invalidate their contents. They remain valid.
Because Dada’s types reference actual variable names, we can modify them to reflect moves.

Dada tracks moves in its types

OK, let’s revisit that Rust example that was giving us an error. When we convert it to Dada, we find that it type checks just fine:

class Character(...) // as before
let ch: given Character = Character(...)
let name: ref[ch.name] String = ch.name
//            -- originally it was borrowed from `ch`
let ch1 = ch.give
//        ------- but `ch` was moved to `ch1`
let name1: ref[ch1.name] String = name
//             --- now it is borrowed from `ch1`

Woah, neat! We can see that when we move from ch into ch1, the compiler updates the types of the variables around it. So actually the type of name changes to ref[ch1.name] String. And then when we move from name to name1, that’s totally valid.

In PL land, updating the type of a variable from one thing to another is called a “strong update”. Obviously things can get a bit complicated when control-flow is involved, e.g., in a situation like this:

let ch = Character(...)
let ch1 = Character(...)
let name = ch.name
if some_condition_is_true() {
    // On this path, the type of `name` changes
    // to `ref[ch1.name] String`, and so `ch`
    // is no longer considered borrowed.
    ch1 = ch.give
    ch = Character(...) // not borrowed, we can mutate
} else {
    // On this path, the type of `name`
    // remains unchanged, and `ch` is borrowed.
}
// Here, the types are merged, so the
// type of `name` is `ref[ch.name, ch1.name] String`.
// Therefore, `ch` is considered borrowed here.
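To show the shape of that merge, here is a toy model written in Rust. It is only a sketch under heavy assumptions and not how the Dada type checker is implemented: a ref type is reduced to the set of place names it mentions, a move rewrites any place rooted at the moved variable, and a control-flow merge is a set union. All names here (RefType, after_move, merge, borrows_from) are hypothetical.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the place list inside a `ref[...]` type.
#[derive(Clone, Debug, PartialEq)]
pub struct RefType {
    places: BTreeSet<String>,
}

/// If `place` is `var` or `var.something`, return the part after `var`.
fn suffix_if_rooted_at<'a>(place: &'a str, var: &str) -> Option<&'a str> {
    place
        .strip_prefix(var)
        .filter(|rest| rest.is_empty() || rest.starts_with('.'))
}

impl RefType {
    pub fn new(places: &[&str]) -> Self {
        RefType { places: places.iter().map(|p| String::from(*p)).collect() }
    }

    /// "Strong update": `from` was moved into `to`, so rewrite every
    /// place rooted at `from` to be rooted at `to` instead.
    pub fn after_move(&self, from: &str, to: &str) -> Self {
        let places = self
            .places
            .iter()
            .map(|p| match suffix_if_rooted_at(p, from) {
                Some(rest) => format!("{}{}", to, rest),
                None => p.clone(),
            })
            .collect();
        RefType { places }
    }

    /// Control-flow merge: the result may borrow from either side.
    pub fn merge(&self, other: &Self) -> Self {
        RefType { places: self.places.union(&other.places).cloned().collect() }
    }

    /// Does this type mention a place rooted at `var`?
    pub fn borrows_from(&self, var: &str) -> bool {
        self.places.iter().any(|p| suffix_if_rooted_at(p, var).is_some())
    }
}
```

In this model, the if branch turns ref[ch.name] into ref[ch1.name], the else branch leaves it alone, and merging the two gives ref[ch.name, ch1.name], which still counts as borrowing from ch.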

Renaming lets us call functions with borrowed values

OK, let’s take the next step. Let’s define a Dada function that takes an owned value and another value borrowed from it, like the name, and then call it:

fn character_and_name(
    ch1: given Character,
    name1: ref[ch1.name] String,
) {
    // ... does something ...
}

We could call this function like so, as you might expect:

let ch = Character(...)
let name = ch.name
character_and_name(ch.give, name)

So…how does this work? Internally, the type checker type-checks a function call by creating a simpler snippet of code, essentially, and then type-checking that. It’s like desugaring but only at type-check time. In this simpler snippet, there are a series of let statements to create temporary variables for each argument. These temporaries always have an explicit type taken from the method signature, and they are initialized with the values of each argument:

// type checker "desugars" `character_and_name(ch.give, name)`
// into more primitive operations:
let tmp1: given Character = ch.give
    //    ---------------   -------
    //            |         taken from the call
    //    taken from fn sig
let tmp2: ref[tmp1.name] String = name
    //    ---------------------   ----
    //            |         taken from the call
    //    taken from fn sig,
    //    but rewritten to use the new
    //    temporaries

If this type checks, then the type checker knows you have supplied values of the required types, and so this is a valid call. Of course there are a few more steps, but that’s the basic idea.

Notice what happens if you supply data borrowed from the wrong place:

let ch = Character(...)
let ch1 = Character(...)
character_and_name(ch.give, ch1.name)
//                          --- wrong place!

This will fail to type check because you get:

let tmp1: given Character = ch.give
let tmp2: ref[tmp1.name] String = ch1.name
    //                            --------
    //       has type `ref[ch1.name] String`,
    //       not `ref[tmp1.name] String`

Class constructors are “just” special functions

So now, if we go all the way back to our original example, we can see how the Message example worked:

class Message(
    list: String
    items: Vec[ref[self.list] String]
)

Basically, when you construct a Message(list, items), that’s “just another function call” from the type system’s perspective, except that self in the signature is handled carefully.

This is modeled, not implemented

I should be clear, this system is modeled in the dada-model repository, which implements a kind of “mini Dada” that captures what I believe to be the most interesting bits. I’m working on fleshing out that model a bit more, but it’s got most of what I showed you here.⁵ For example, here is a test that you get an error when you give a reference to the wrong value.

The “real implementation” is lagging quite a bit, and doesn’t really handle the interesting bits yet. Scaling it up from model to real implementation involves solving type inference and some other thorny challenges, and I haven’t gotten there yet – though I have some pretty interesting experiments going on there too, in terms of the compiler architecture.⁶

This could apply to Rust

I believe we could apply most of this system to Rust. Obviously we’d have to rework the borrow checker to be based on places, but that’s the straight-forward part. The harder bit is the fact that &T is a pointer in Rust, and that we cannot readily change. However, for many use cases of self-references, this isn’t as important as it sounds. Often, the data you wish to reference is living in the heap, and so the pointer isn’t actually invalidated when the original value is moved.

Consider our opening example. You might imagine Rust allowing something like this:

struct Message {
    list: String,
    items: Vec<&{self.list} str>,
}

In this case, the str data is heap-allocated, so moving the string doesn’t actually invalidate the &str value (it would invalidate an &String value, interestingly).
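To make that concrete, here is a small sketch in today's Rust (the names Observed and observe_move are made up for illustration). It only records addresses, never dereferences a stale pointer, and shows that moving a String relocates the String header but leaves the heap-allocated str data where it was:

```rust
/// Addresses observed before and after moving a `String`.
/// (Hypothetical helper type, used only for this illustration.)
struct Observed {
    data_before: usize,
    data_after: usize,
    header_before: usize,
    header_after: usize,
}

fn observe_move() -> Observed {
    let list = String::from("apple,banana,cherry");

    // Address of the heap buffer holding the `str` data.
    let data_before = list.as_ptr() as usize;
    // Address of the `String` header itself (pointer, length, capacity).
    let header_before = &list as *const String as usize;

    // Move the `String` into a `Box`: the header is copied to a new
    // location, but it still points at the same heap buffer.
    let moved: Box<String> = Box::new(list);
    let data_after = moved.as_ptr() as usize;
    let header_after = &*moved as *const String as usize;

    Observed { data_before, data_after, header_before, header_after }
}
```

Calling observe_move() should report equal data addresses and different header addresses, which is why a &str into the buffer would survive the move while a &String would not.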

In Rust today, the compiler doesn’t know all the details of what’s going on. String has a Deref impl and so it’s quite opaque whether str is heap-allocated or not. But we are working on various changes to this system in the Beyond the & goal, most notably the Field Projections work. There is likely some opportunity to address this in that context, though to be honest I’m behind in catching up on the details.

I’ll note in passing that Dada unifies str and String into one type as well. I’ll talk in detail about how that works in a future blog post. ↩︎
This is kind of like C++ references (e.g., String&), which also act “as if” they were a value (i.e., you write s.foo(), not s->foo()), but a C++ reference is truly a pointer, unlike a Dada ref. ↩︎
This goal was in part inspired by a conversation I had early on within Amazon, where a (quite experienced) developer told me, “It took me months to understand what variables are in Rust”. ↩︎
I explained this some years back in a talk on Polonius at Rust Belt Rust, if you’d like more detail. ↩︎
No closures or iterator chains! ↩︎
As a teaser, I’m building it in async Rust, where each inference variable is a “future” and I use “await” to find out when other parts of the code might have added constraints. ↩︎

https://smallcultfollowing.com/babysteps/blog/2026/02/27/dada-internal-references/

What it means that Ubuntu is using Rust

Feb 23, 2026 Updated Feb 23, 2026

Righty-ho, I’m back from Rust Nation, and busily horrifying my teenage daughter with my (admittedly atrocious) attempts at doing an English accent1. It was a great trip with a lot of good conversations and some interesting observations. I am going to try to blog about some of them, starting with some thoughts spurred by Jon Seager’s closing keynote, “Rust Adoption At Scale with Ubuntu”.

There are many chasms out there

For some time now I’ve been debating with myself, has Rust “crossed the chasm”? If you’re not familiar with that term, it comes from a book that gives a kind of “pop-sci” introduction to the Technology Adoption Life Cycle.

The answer, of course, is it depends on who you ask. Within Amazon, where I have the closest view, the answer is that we are “most of the way across”: Rust is squarely established as the right way to build at-scale data planes or resource-aware agents and it is increasingly seen as the right choice for low-level code in devices and robotics as well – but there remains a lingering perception that Rust is useful for “those fancy pants developers at S3” (or wherever) but a bit overkill for more average development3.

On the other hand, within the realm of Safety Critical Software, as Pete LeVasseur wrote in a recent rust-lang blog post, Rust is still scrabbling for a foothold. There are a number of successful products but most of the industry is in a “wait and see” mode, letting the early adopters pave the path.

“Crossing the chasm” means finding “reference customers”

The big idea that I at least took away from reading Crossing the Chasm and other references on the technology adoption life cycle is the need for “reference customers”. When you first start out with something new, you are looking for pioneers and early adopters that are drawn to new things:

What an early adopter is buying [..] is some kind of change agent. By being the first to implement this change in the industry, the early adopters expect to get a jump on the competition. – from Crossing the Chasm

But as your technology matures, you have to convince people with a lower and lower tolerance for risk:

The early majority want to buy a productivity improvement for existing operations. They are looking to minimize discontinuity with the old ways. They want evolution, not revolution. – from Crossing the Chasm

So what is most convincing to people to try something new? The answer is seeing that others like them have succeeded.

You can see this at play in both the Amazon example and the Safety Critical Software example. Clearly seeing Rust used for network services doesn’t mean it’s ready to be used in your car’s steering column4. And even within network services, seeing a group like S3 succeed with Rust may convince other groups building at-scale services to try Rust, but doesn’t necessarily persuade a team to use Rust for their next CRUD service. And frankly, it shouldn’t! They are likely to hit obstacles.

Ubuntu is helping Rust “cross the (user-land linux) chasm”

All of this was on my mind as I watched the keynote by Jon Seager, the VP of Engineering at Canonical, which is the company behind Ubuntu. Similar to Lars Bergstrom’s epic keynote from years past on Rust adoption within Google, Jon laid out a pitch for why Canonical is adopting Rust that was at once visionary and yet deeply practical.

“Visionary and yet deeply practical” is pretty much the textbook description of what we need to cross from early adopters to early majority. We need folks who care first and foremost about delivering the right results, but are open to new ideas that might help them do that better; folks who can stand on both sides of the chasm at once.

Jon described how Canonical focuses their own development on a small set of languages: Python, C/C++, and Go, and how they had recently brought in Rust and were using it as the language of choice for new foundational efforts, replacing C, C++, and (some uses of) Python.

Ubuntu is building the bridge across the chasm

Jon talked about how he sees it as part of Ubuntu’s job to “pay it forward” by supporting the construction of memory-safe foundational utilities. Jon meant support both in terms of finances – Canonical is sponsoring the Trifecta Tech Foundation to develop sudo-rs and ntpd-rs and sponsoring the uutils org’s work on coreutils – and in terms of reputation. Ubuntu can take on the risk of doing something new, prove that it works, and then let others benefit.

Remember how the Crossing the Chasm book described early majority people? They are “looking to minimize discontinuity with the old ways”. And what better way to do that than to have drop-in utilities that fit within their existing workflows.

The challenge for Rust: listening to these new adopters

With new adoption comes new perspectives. On Thursday night I was at dinner5 organized by Ernest Kissiedu6. Jon Seager was there along with some other Rust adopters from various industries, as were a few others from the Rust Foundation and the open-source project.

Ernest asked them to give us their unvarnished takes on Rust. Jon made the provocative comment that we needed to revisit our policy around having a small standard library. He’s not the first to say something like that, it’s something we’ve been hearing for years and years – and I think he’s right! Though I don’t think the answer is just to ship a big standard library. In fact, it’s kind of a perfect lead-in to (what I hope will be) my next blog post, which is about a project I call “battery packs”7.

To grow, you have to change

The broader point though is that shifting from targeting “pioneers” and “early adopters” to targeting “early majority” sometimes involves some uncomfortable changes:

Transition between any two adoption segments is normally excruciatingly awkward because you must adopt new strategies just at the time you have become most comfortable with the old ones. [..] The situation can be further complicated if the high-tech company, fresh from its marketing success with visionaries, neglects to change its sales pitch. [..] The company may be saying “state-of-the-art” when the pragmatist wants to hear “industry standard”. – Crossing the Chasm (emphasis mine)

Not everybody will remember it, but in 2016 there was a proposal called the Rust Platform. The idea was to bring in some crates and bless them as a kind of “extended standard library”. People hated it. After all, they said, why not just add dependencies to your Cargo.toml? It’s easy enough. And to be honest, they were right – at least at the time.

I think the Rust Platform is a good example of something that was a poor fit for early adopters, who want the newest thing and don’t mind finding the best crates, but which could be a great fit for the Early Majority.8

Anyway, I’m not here to argue for one thing or another in this post, but more for the concept that we have to be open to adapting our learned wisdom to new circumstances. In the past, we were trying to bootstrap Rust into the industry’s consciousness – and we have succeeded.

The task before us now is different: we need to make Rust the best option not just in terms of “what it could be” but in terms of “what it actually is” – and sometimes those are in tension.

Another challenge for Rust: turning adoption into investment

Later in the dinner, the talk turned, as it often does, to money. Growing Rust adoption also comes with growing needs placed on the Rust project and its ecosystem. How can we connect the dots? This has been a big item on my mind, and I realize in writing this paragraph how many blog posts I have yet to write on the topic, but let me lay out a few interesting points that came up over this dinner and at other recent points.

Investment can mean contribution, particularly for open-source orgs

First, there are more ways to offer support than $$. For Canonical specifically, as they are an open-source organization through-and-through, what I would most want is to build stronger relationships between our organizations. With the Rust for Linux developers, early on Rust maintainers were prioritizing and fixing bugs on behalf of RfL devs, but more and more, RfL devs are fixing things themselves, with Rust maintainers serving as mentors. This is awesome!

Money often comes before a company has adopted Rust, not after

Second, there’s an interesting trend about $$ that I’ve seen crop up in a few places. We often think of companies investing in the open-source dependencies that they rely upon. But there’s an entirely different source of funding, and one that might be even easier to tap, which is to look at companies that are considering Rust but haven’t adopted it yet.

For those “would be” adopters, there are often individuals in the org who are trying to make the case for Rust adoption – these individuals are early adopters, people with a vision for how things could be, but they are trying to sell to their early majority company. And to do that, they often have a list of “table stakes” features that need to be supported; what’s more, they often have access to some budget to make these things happen.

This came up when I was talking to Alexandru Radovici, the Foundation’s Silver Member Director, who said that many safety critical companies have money they’d like to spend to close various gaps in Rust, but they don’t know how to spend it. Jon’s investments in Trifecta Tech and the uutils org have the same character: he is looking to close the gaps that block Ubuntu from using Rust more.

Conclusions…?

Well, first of all, you should watch Jon’s talk. “Brilliant”, as the Brits have it.

But my other big thought is that this is a crucial time for Rust. We are clearly transitioning in a number of areas from visionaries and early adopters towards that pragmatic majority, and we need to be mindful that doing so may require us to change some of the way that we’ve always done things. I liked this paragraph from Crossing the Chasm:

To market successfully to pragmatists, one does not have to be one – just understand their values and work to serve them. To look more closely into these values, if the goal of visionaries is to take a quantum leap forward, the goal of pragmatists is to make a percentage improvement–incremental, measurable, predictable progress. [..] To market to pragmatists, you must be patient. You need to be conversant with the issues that dominate their particular business. You need to show up at the industry-specific conferences and trade shows they attend.

Re-reading Crossing the Chasm as part of writing this blog post has really helped me square where Rust is – for the most part, I think we are still crossing the chasm, but we are well on our way. I think what we see is a consistent trend now where we have Rust champions who fit the “visionary” profile of early adopters successfully advocating for Rust within companies that fit the pragmatist, early majority profile.

Open source can be a great enabler to cross the chasm…

It strikes me that open-source is just an amazing platform for doing this kind of marketing. Unlike a company, we don’t have to do everything ourselves. We have to leverage the fact that open source helps those who help themselves – find those visionary folks in industries that could really benefit from Rust, bring them into the Rust orbit, and then (most important!) support and empower them to adapt Rust to their needs.

…but only if we don’t get too “middle school” about it

This last part may sound obvious, but it’s harder than it sounds. When you’re embedded in open source, it seems like a friendly place where everyone is welcome. But the reality is that it can be a place full of cliques and “oral traditions” that “everybody knows”9. People coming with an idea can get shut down for using the wrong word. They can readily mistake the, um, “impassioned” comments from a random contributor (or perhaps just a troll…) for the official word from project leadership. It only takes one rude response to turn somebody away.

What Rust needs most is empathy

So what will ultimately help Rust the most to succeed? Empathy in Open Source. Let’s get out there, find out where Rust can help people, and make it happen. Exciting times!

I am famously bad at accents. My best attempt at posh British sounds more like Apu from the Simpsons. I really wish I could pull off a convincing Greek accent, but sadly no. ↩︎
Another of my pearls of wisdom is “there is nothing more permanent than temporary code”. I used to say that back at the startup I worked at after college, but years of experience have only proven it more and more true. ↩︎
Russel Cohen and Jess Izen gave a great talk at last year’s RustConf about what our team is doing to help teams decide if Rust is viable for them. But since then another thing having a big impact is AI, which is bringing previously unthinkable projects, like rewriting older systems, within reach. ↩︎
I have no idea if there is code in a car’s steering column, for the record. I assume so by now? For power steering or some shit? ↩︎
Or am I supposed to call it “tea”? Or maybe “supper”? I can’t get a handle on British mealtimes. ↩︎
Ernest is such a joy to be around. He’s quiet, but he’s got a lot of insights if you can convince him to share them. If you get the chance to meet him, take it! If you live in London, go to the London Rust meetup! Find Ernest and introduce yourself. Tell him Niko sent you and that you are supposed to say how great he is and how you want to learn from the wisdom he’s accrued over the years. Then watch him blush. What a doll. ↩︎
If you can’t wait, you can read some Zulip discussion here. ↩︎
The Battery Packs proposal I want to talk about is similar in some ways to the Rust Platform, but decentralized and generally better in my opinion– but I get ahead of myself! ↩︎
Betteridge’s Law of Headlines has it that “Any headline that ends in a question mark can be answered by the word no”. Well, Niko’s law of open-source2 is that “nobody actually knows anything that ’everybody’ knows”. ↩︎

https://smallcultfollowing.com/babysteps/blog/2026/02/23/ubuntu-rustnation/

Sharing in Dada

Feb 14, 2026 Updated Feb 14, 2026

OK, let’s talk about sharing. This is the first of the Dada blog posts where things start to diverge from Rust in a deep way and, I think, the first where we start to see some real advantages to the Dada way of doing things (and some of the tradeoffs I made to achieve those advantages).

We are shooting for a GC-like experience without GC

Let’s start with the goal: earlier, I said that Dada was like “Rust where you never have to type as_ref”. But what I really meant is that I want a GC-like experience–without the GC.

We are shooting for a “composable” experience

I also often use the word “composable” to describe the Dada experience I am shooting for. Composable means that you can take different things and put them together to achieve something new.

Obviously Rust has many composable patterns – the Iterator APIs, for example. But what I have found is that Rust code is often very brittle: there are many choices when it comes to how you declare your data structures and the choices you make will inform how those data structures can be consumed.

Running example: Character

Defining the Character type

Let’s create a type that we can use as a running example throughout the post: Character. In Rust, we might define a Character like so:

#[derive(Default)]
struct Character {
    name: String,
    class: String,
    hp: u32,
}

Creating and Arc’ing the Character

Now, suppose that, for whatever reason, we are going to build up a character programmatically:

let mut ch = Character::default();
ch.name.push_str("Ferris");
ch.class.push_str("Rustacean");
ch.hp = 44;

So far, so good. Now suppose I want to share that same Character struct so it can be referenced from a lot of places without deep copying. To do that, I am going to put it in an Arc:

let mut ch = Character::default();
ch.name.push_str("Ferris");
// ...
let ch1 = Arc::new(ch);
let ch2 = ch1.clone();

OK, cool! Now I have a Character that is readily sharable. That’s great.

Rust is composable here, which is cool, we like that

Side note but this is an example of where Rust is composable: we defined Character once in a fully-owned way and we were able to use it mutably (to build it up imperatively over time) and then able to “freeze” it and get a read-only, shared copy of Character. This gives us the advantages of an imperative programming language (easy data construction and manipulation) and the advantages of a functional language (immutability prevents bugs when things are referenced from many disjoint places). Nice!

Sharing just the character’s name

Now, suppose that I have some other code, written independently, that just needs to store the character’s name. That code winds up copying the name into a lot of different places. So, just like we used Arc to let us cheaply reference a single character from multiple places, it uses Arc so it can cheaply reference the character’s name from multiple places:

struct CharacterSheetWidget {
    // Use `Arc<String>` and not `String` because
    // we wind up copying this into many different
    // places and we don't want to deep clone
    // the string each time.
    name: Arc<String>,

    // ... assume more fields here ...
}

OK. Now comes the rub. I want to create a character-sheet widget from our shared character:

fn create_character_sheet_widget(ch: Arc<Character>) -> CharacterSheetWidget {
    CharacterSheetWidget {
        // FIXME: Huh, how do I bridge this gap?
        // I guess I have to do this.
        name: Arc::new(ch.name.clone()),

        // ... assume more fields here ...
    }
}

Shoot, that’s frustrating! What I would like to do is to write name: ch.name.clone() or something similar (actually I’d probably like to just write ch.name, but anyhow) and get back an Arc<String>. But I can’t do that. Instead, I have to deeply clone the string and allocate a new Arc. Of course any subsequent clones will be cheap. But it’s not great.
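Here is a self-contained sketch of that gap, reusing the Character and CharacterSheetWidget definitions from above (sharing_report is a hypothetical helper added just to observe what happens). It checks that the widget's name lives in a fresh allocation, and that cloning the resulting Arc afterwards is only a handle copy:

```rust
use std::sync::Arc;

#[derive(Default)]
struct Character {
    name: String,
    class: String,
    hp: u32,
}

struct CharacterSheetWidget {
    name: Arc<String>,
}

fn create_character_sheet_widget(ch: Arc<Character>) -> CharacterSheetWidget {
    CharacterSheetWidget {
        // Deep-clones the string data and allocates a new `Arc`.
        name: Arc::new(ch.name.clone()),
    }
}

/// Hypothetical helper: returns
/// (widget name was deep-copied, later `Arc` clone shares the allocation).
fn sharing_report() -> (bool, bool) {
    let mut ch = Character::default();
    ch.name.push_str("Ferris");
    ch.class.push_str("Rustacean");
    ch.hp = 44;
    let ch = Arc::new(ch);

    let widget = create_character_sheet_widget(ch.clone());
    // Different heap buffers: the string bytes were copied.
    let deep_copied = widget.name.as_str().as_ptr() != ch.name.as_str().as_ptr();

    // Cloning the `Arc<String>` only bumps a reference count.
    let another = widget.name.clone();
    let cheap_afterwards = Arc::ptr_eq(&widget.name, &another);

    (deep_copied, cheap_afterwards)
}
```

Both flags should come back true: the first conversion pays for a copy, and only the clones after that are cheap.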

Rust often gives rise to these kind of “impedance mismatches”

I often find patterns like this arise in Rust: there’s a bit of an “impedance mismatch” between one piece of code and another. The solution varies, but it’s generally something like

clone some data – it’s not so big anyway, screw it (that’s what happened here).
refactor one piece of code – e.g., modify the Character struct to store an Arc<String>. Of course, that has ripple effects, e.g., we can no longer write ch.name.push_str(...), but have to use Arc::get_mut or something.
invoke some annoying helper – e.g., write opt.as_ref() to convert from an &Option<String> to an Option<&String> or write a &**r to convert from a &Arc<String> to a &str.
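For readers who have not run into these helpers, here is a short sketch of the three conversions mentioned in the list. The function names (name_len, as_str_slice, try_push) are invented for the example; the std calls themselves (Option::as_ref, Arc::get_mut) are real:

```rust
use std::sync::Arc;

/// `opt.as_ref()` turns `&Option<String>` into `Option<&String>`.
fn name_len(opt: &Option<String>) -> usize {
    let borrowed: Option<&String> = opt.as_ref();
    borrowed.map(|s| s.len()).unwrap_or(0)
}

/// `&**r` peels `&Arc<String>` down to `&String`, which then coerces
/// to `&str` at the return site.
fn as_str_slice(r: &Arc<String>) -> &str {
    &**r
}

/// With the name behind an `Arc`, mutation goes through `Arc::get_mut`,
/// which only succeeds while there is exactly one handle.
fn try_push(name: &mut Arc<String>, suffix: &str) -> bool {
    match Arc::get_mut(name) {
        Some(s) => {
            s.push_str(suffix);
            true
        }
        None => false,
    }
}
```

Note how try_push starts failing as soon as a second handle exists, which is the ripple effect of moving the name into an Arc.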

The goal with Dada is that we don’t have that kind of thing.

Sharing is how Dada copies

So let’s walk through how that same Character example would play out in Dada. We’ll start by defining the Character class:

class Character(
    name: String,
    klass: String,  # Oh dang, the perils of a class keyword!
    hp: u32,
)

Just as in Rust, we can create the character and then modify it afterwards:

class Character(name: String, klass: String, hp: u32)

let ch: given Character = Character("", "", 22)
      # ----- remember, the "given" permission
      #       means that `ch` is fully owned
ch.name!.push("Tzara")
ch.klass!.push("Dadaist")
   #    - and the `!` signals mutation

The .share operator creates a shared object

Cool. Now, I want to share the character so it can be referenced from many places. In Rust, we created an Arc, but in Dada, sharing is “built-in”. We use the .share operator, which will convert the given Character (i.e., fully owned character) into a shared Character:

class Character(name: String, klass: String, hp: u32)

let ch = Character("", "", 22)
ch.name!.push("Tzara")
ch.klass!.push("Dadaist")

let ch1: shared Character = ch.share
      #  ------                -----
      # The `share` operator consumes `ch`
      # and returns the same object, but now
      # with *shared* permissions.

shared objects can be copied freely

Now that we have a shared character, we can copy it around:

class Character(name: String, klass: String, hp: u32)

# Create a shared character to start
let ch1 = Character("Tzara", "Dadaist", 22).share
    #                                       -----

# Create another shared character
let ch2 = ch1

Sharing propagates from owner to field

When you have a shared object and you access its field, what you get back is a shared (shallow) copy of the field:

class Character(...)

# Create a `shared Character`
let ch: shared Character = Character("Tristan Tzara", "Dadaist", 22).share
      # ------                                                       -----

# Extracting the `name` field gives a `shared String`
let name: shared String = ch.name
        # ------

Propagation using a Vec

To drill home how cool and convenient this is, imagine that I have a Vec[String] that I share with .share:

let v: shared Vec[String] = ["Hello", "Dada"].share

What I get back is a shared Vec[String]. And when I access the elements of that, I get back a shared String:

let v = ["Hello", "Dada"].share
let s: shared String = v[0]

This is as if one could take an Arc<Vec<String>> in Rust and get out an Arc<String>.

How sharing is implemented

So how is sharing implemented? The answer lies in a not-entirely-obvious memory layout. To see how it works, let’s walk through how a Character would be laid out in memory:

# Character type we saw earlier.
class Character(name: String, klass: String, hp: u32)

# String type would be something like this.
class String {
    buffer: Pointer[char]
    initialized: usize
    capacity: usize
}

Here Pointer is a built-in type that is the basis for Dada’s unsafe code system.1

Layout of a given Character in memory

Now imagine we have a Character like this:

let ch = Character("Duchamp", "Dadaist", 22)

The character ch would be laid out in memory something like this (focusing just on the name field):

[Stack frame]              [Heap]         
ch: Character {                           
    _flag: 1                              
    name: String {                        
        _flag: 1         { _ref_count: 1  
        buffer: ──────────►'D'            
        initialized: 7     ...            
        capacity: 8        'p' }          
    }                                     
    klass: ...                            
    hp: 22                                
}

Let’s talk this through. First, every object is laid out flat in memory, just like you would see in Rust. So the fields of ch are stored on the stack, and the name field is laid out flat within that.

Each object that owns other objects begins with a hidden field, _flag. This field indicates whether the object is shared or not (in the future we’ll add more values to account for other permissions). If the field is 1, the object is not shared. If it is 2, then it is shared.

Heap-allocated objects (i.e., using Pointer[]) begin with a ref-count before the actual data (actually this is at the offset of -4). In this case we have a Pointer[char] so the actual data that follows are just simple characters.

Layout of a shared Character in memory

If I were to instead create a shared character:

let ch1 = Character("Duchamp", "Dadaist", 22).share
          #                                   -----

The memory layout would be the same, but the flag field on the character is now 2:

[Stack frame]              [Heap]         
ch1: Character {
    _flag: 2 👈 (This is 2 now!)                             
    name: String {                        
        _flag: 1         { _ref_count: 1  
        buffer: ──────────►'D'            
        initialized: 7     ...            
        capacity: 8        'p' }          
    }                                     
    klass: ...                            
    hp: 22                                
}

Copying a shared Character

Now imagine that we created two copies of the same shared character:

let ch1 = Character("Duchamp", "Dadaist", 22).share
let ch2 = ch1

What happens is that we will copy all the fields of ch1 and then, because _flag is 2, we will increment the ref-counts for the heap-allocated data within:

[Stack frame]              [Heap]            
ch1: Character {                             
    _flag: 2                                 
    name: String {                           
        _flag: 1         { _ref_count: 2     
        buffer: ────────┬─►'D'        👆     
        initialized: 7  │  ...      (This is 
        capacity: 8     │  'p' }     2 now!) 
    }                   │                    
    klass: ...          │
    hp: 22              │                    
}                       │                    
                        │                    
ch2: Character {        │                    
    _flag: 2            │                    
    name: String {      │                    
        _flag: 1        │                    
        buffer: ────────┘                    
        initialized: 7                       
        capacity: 8                          
    }                                        
    klass: ...
    hp: 22                                   
}

Copying out the name field

Now imagine we were to copy out the name field, instead of the entire character:

let ch1 = Character("Duchamp", "Dadaist", 22).share
let name = ch1.name

…what happens is that:

traversing ch1, we observe that the _flag field is 2 and therefore ch1 is shared
we copy out the String fields from name. Because the character is shared:
- we modify the _flag field on the new string to 2
- we increment the ref-count for any heap values

The result is that you get:

[Stack frame]              [Heap]       
ch1: Character {                        
    _flag: 2                            
    name: String {                      
        _flag: 1         { _ref_count: 2
        buffer: ────────┬─►'D'          
        initialized: 7  │  ...          
        capacity: 8     │  'p' }        
    }                   │               
    klass: ...          │
    hp: 22              │               
}                       │               
                        │               
name: String {          │               
    _flag: 2            │               
    buffer: ────────────┘               
    initialized: 7                      
    capacity: 8                         
}
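One way to picture those steps is a toy model in Rust. This is only a sketch of the layout described above, not how Dada is implemented: Rc<[u8]> stands in for the ref-counted heap buffer, the capacity is simplified to equal the length, and the names DadaString, GIVEN, SHARED, and copy_name are all invented for the example.

```rust
use std::rc::Rc;

const GIVEN: u8 = 1; // `_flag` value for "not shared"
const SHARED: u8 = 2; // `_flag` value for "shared"

/// Toy stand-in for Dada's `String`. The `Rc<[u8]>` plays the role of
/// the ref-counted heap buffer (an assumption of this sketch).
struct DadaString {
    flag: u8,
    buffer: Rc<[u8]>,
    initialized: usize,
    capacity: usize,
}

struct Character {
    flag: u8,
    name: DadaString,
    hp: u32,
}

impl Character {
    fn new(name: &str, hp: u32) -> Character {
        Character {
            flag: GIVEN,
            name: DadaString {
                flag: GIVEN,
                buffer: Rc::from(name.as_bytes()),
                initialized: name.len(),
                capacity: name.len(),
            },
            hp,
        }
    }

    /// Models `.share`: same object, only the outer flag changes.
    fn share(mut self) -> Character {
        self.flag = SHARED;
        self
    }

    /// Models `let name = ch1.name` when `ch1` is shared.
    fn copy_name(&self) -> DadaString {
        assert_eq!(self.flag, SHARED, "this sketch only models shared owners");
        DadaString {
            flag: SHARED,                         // the copy is marked shared
            buffer: Rc::clone(&self.name.buffer), // ref-count + 1, no deep copy
            initialized: self.name.initialized,
            capacity: self.name.capacity,
        }
    }
}
```

After Character::new("Duchamp", 22).share() followed by copy_name(), the copied string carries flag 2, the string stored inside the character still carries flag 1, and the buffer's reference count has gone from 1 to 2, matching the diagram. Everything done along the way is shallow: a flag write and a counter bump.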

“Sharing propagation” is one example of permission propagation

This post showed how shared values in Dada work and showed how the shared permission propagates when you access a field. Permissions are how Dada manages object lifetimes. We’ve seen two so far:

the given permission indicates a uniquely owned value (T, in Rust-speak);
the shared permission indicates a copyable value (Arc<T> is the closest Rust equivalent).

In future posts we’ll see the ref and mut permissions, which roughly correspond to & and &mut, and talk through how the whole thing fits together.

Dada is more than a pretty face

This is the first post where we started to see a bit more of Dada’s character. Reading over the previous few posts, you could be forgiven for thinking Dada was just a cute syntax atop familiar Rust semantics. But as you can see from how shared works, Dada is quite a bit more than that.

I like to think of Dada as “opinionated Rust” in some sense. Unlike Rust, it imposes some standards on how things are done. For example, every object (at least every object with a heap-allocated field) has a _flag field. And every heap allocation has a ref-count.

These conventions come at some modest runtime cost. My rule is that basic operations are allowed to do “shallow” operations, e.g., toggling the _flag or adjusting the ref-counts on every field. But they cannot do “deep” operations that require traversing heap structures.

In exchange for adopting conventions and paying that cost, you get “composability”, by which I mean that permissions in Dada (like shared) flow much more naturally, and types that are semantically equivalent (i.e., you can do the same things with them) generally have the same layout in memory.

Remember that I have not implemented all this, I am drawing on my memory and notes from my notebooks. I reserve the right to change any and everything as I go about implementing. ↩︎

https://smallcultfollowing.com/babysteps/blog/2026/02/14/sharing-in-dada/

Dada: moves and mutation

Feb 10, 2026 Updated Feb 11, 2026

Let’s continue with working through Dada. In my previous post, I introduced some string manipulation. Let’s start talking about permissions. This is where Dada will start to resemble Rust a bit more.

Class struggle

Classes in Dada are one of the basic ways that we declare new types (there are also enums, we’ll get to that later).

The most convenient way to declare a class is to put the fields in parentheses. This implicitly declares a constructor at the same time:

class Point(x: u32, y: u32) {}

This is in fact sugar for a more Rust like form:

class Point {
    x: u32
    y: u32
    fn new(x: u32, y: u32) -> Point {
        Point { x, y }
    }
}

And you can create an instance of a class by calling the constructor:

let p = Point(22, 44) // sugar for Point.new(22, 44)

Mutating fields

I can mutate the fields of p as you would expect:

p.x += 1
p.x = p.y

Read by default

In Dada, the default when you declare a parameter is that you are getting read-only access:

fn print_point(p: Point) {
    print("The point is {p.x}, {p.y}")
}

let p = Point(22, 44)
print_point(p)

If you attempt to mutate the fields of a parameter, that would get you an error:

fn print_point(p: Point) {
    p.x += 1 # <-- ERROR!
}

Use ! to mutate

If you declare a parameter with !, then it becomes a mutable reference to a class instance from your caller:

fn translate_point(point!: Point, x: u32, y: u32) {
    point.x += x
    point.y += y
}

In Rust, this would be like point: &mut Point. When you call translate_point, you also put a ! to indicate that you are passing a mutable reference:

let p = Point(22, 44)     # Create point
print_point(p)            # Prints 22, 44
translate_point(p!, 2, 2) # Mutate point
print_point(p)            # Prints 24, 46

As you can see, when translate_point modifies p.x, that changes p in place.
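For readers coming from Rust, the example above can be mirrored roughly as follows. This is only an analogy, not a description of how Dada is implemented. The `describe_point` helper (returning a String instead of printing) and the `demo` function are illustrative names introduced here so the result can be checked.

```rust
// Rough Rust analogue of the Dada example; an illustration, not Dada's semantics.
struct Point {
    x: u32,
    y: u32,
}

// Dada's default `p: Point` corresponds roughly to a shared borrow `&Point`.
fn describe_point(p: &Point) -> String {
    format!("The point is {}, {}", p.x, p.y)
}

// Dada's `point!: Point` corresponds roughly to `&mut Point`.
fn translate_point(point: &mut Point, x: u32, y: u32) {
    point.x += x;
    point.y += y;
}

// Mirrors the call sequence from the post: read, mutate in place, read again.
fn demo() -> (String, String) {
    let mut p = Point { x: 22, y: 44 };
    let before = describe_point(&p);
    translate_point(&mut p, 2, 2);
    let after = describe_point(&p);
    (before, after)
}
```

Note how the Rust caller writes `&p` and `&mut p` as prefixes, where the Dada caller writes `p` and `p!`.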

Moves are explicit

If you’re familiar with Rust, that last example may be a bit surprising. In Rust, a call like print_point(p) would move p, giving ownership away, and trying to use p later would give an error. It works in Dada because the default is to give a read-only reference, like &x in Rust (this gives the right intuition but is also misleading; we’ll see in a future post that references in Dada are different from Rust in one very important way).

If you have a function that needs ownership of its parameter, you declare that with given:

fn take_point(p: given Point) {
    // ...
}

And on the caller’s side, you call such a function with .give:

let p = Point(22, 44)
take_point(p.give)
take_point(p.give) # <-- Error! Can't give twice.

Comparing with Rust

It’s interesting to compare some Rust and Dada code side-by-side:

| Rust | Dada |
| --- | --- |
| vec.len() | vec.len() |
| map.get(&key) | map.get(key) |
| vec.push(element) | vec!.push(element.give) |
| vec.append(&mut other) | vec!.append(other!) |
| message.send_to(&channel) | message.give.send_to(channel) |

Design rationale and objectives

Convenient is the default

The most convenient things are the shortest and most common. So we make reads the default.

Everything is explicit but unobtrusive

The . operator in Rust can do a wide variety of things depending on the method being called. It might mutate, move, create a temporary, etc. In Dada, these things are all visible at the call site, but they are unobtrusive.

This actually dates from Dada’s “gradual programming” days – after all, if you don’t have type annotations on the method, then you can’t decide whether foo.bar() should take a shared or mutable borrow of foo. So we needed a notation where everything is visible at the call-site and explicit.

Postfix operators play more nicely with others

Dada tries hard to avoid prefix operators like &mut, since they don’t compose well with . notation.

https://smallcultfollowing.com/babysteps/blog/2026/02/10/dada-moves-and-mutation/

Hello, Dada!

Feb 9, 2026 Updated Feb 9, 2026

Following on my Fun with Dada post, this post is going to start teaching Dada. I’m going to keep each post short – basically just what I can write while having my morning coffee.1

You have the right to write code

Here is a first Dada program:

println("Hello, Dada!")

I think all of you will be able to guess what it does. Still, there is something worth noting even in this simple program:

“You have the right to write code. If you don’t write a main function explicitly, one will be provided for you.” Early on I made the change to let users omit the main function and I was surprised by what a difference it made in how light the language felt. Easy change, easy win.

Convenient is the default

Here is another Dada program:

let name = "Dada"
println("Hello, {name}!")

Unsurprisingly, this program does the same thing as the last one.

“Convenient is the default.” Strings support interpolation (i.e., {name}) by default. In fact, that’s not all they support: you can also break them across lines very conveniently. This program does the same thing as the others we’ve seen:

let name = "Dada"
println("
    Hello, {name}!
")

When you have a " immediately followed by a newline, the leading and trailing newline are stripped, along with the “whitespace prefix” from the subsequent lines. Internal newlines are kept, so something like this:

let name = "Dada"
println("
    Hello, {name}!
    
    How are you doing?
")

would print

Hello, Dada!

How are you doing?
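The stripping rule can be sketched in Rust. This is a minimal sketch under stated assumptions, not Dada's actual implementation: the function name `strip_multiline` is made up, the “whitespace prefix” is taken to be the smallest count of leading spaces or tabs among non-blank lines, whitespace-only lines become empty, and interpolation is not handled.

```rust
// Hypothetical helper: approximates the multi-line string rule described above.
fn strip_multiline(raw: &str) -> String {
    fn is_indent(c: char) -> bool {
        c == ' ' || c == '\t'
    }

    // The rule only applies when the opening quote is immediately followed by a newline.
    let Some(body) = raw.strip_prefix('\n') else {
        return raw.to_string();
    };

    // Drop the trailing newline (plus any indentation before the closing quote).
    let body = match body.rfind('\n') {
        Some(idx) if body[idx + 1..].trim().is_empty() => &body[..idx],
        _ => body,
    };

    // Assumption: the prefix is the smallest indentation among non-blank lines.
    let prefix_len = body
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(is_indent).len())
        .min()
        .unwrap_or(0);

    // Remove the prefix from every line; internal newlines are kept.
    body.lines()
        .map(|line| if line.trim().is_empty() { "" } else { &line[prefix_len..] })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `strip_multiline("\n    Hello, Dada!\n    \n    How are you doing?\n")` yields the two lines shown above with an empty line between them.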

Just one familiar String

Of course you could also annotate the type of the name variable explicitly:

let name: String = "Dada"
println("Hello, {name}!")

You will find that it is String. This in and of itself is not notable, unless you are accustomed to Rust, where the type would be &'static str. This is of course a perennial stumbling block for new Rust users, but more than that, I find it to be a big annoyance – I hate that I have to write "Foo".to_string() or format!("Foo") everywhere that I mix constant strings with strings that are constructed.

Similar to most modern languages, strings in Dada are immutable. So you can create them and copy them around:

let name: String = "Dada"
let greeting: String = "Hello, {name}"
let name2: String = name

Next up: mutation, permissions

OK, we really just scratched the surface here! This is just the “friendly veneer” of Dada, which looks and feels like a million other languages. Next time I’ll start getting into the permission system and mutation, where things get a bit more interesting.

My habit is to wake around 5am and spend the first hour of the day doing “fun side projects”. But for the last N months I’ve actually been doing Rust stuff, like symposium.dev and preparing the 2026 Rust Project Goals. Both of these are super engaging, but all Rust and no play makes Niko a dull boy. Also a grouchy boy. ↩︎

https://smallcultfollowing.com/babysteps/blog/2026/02/09/hello-dada/

Fun With Dada

Feb 8, 2026 Updated Feb 9, 2026

Waaaaaay back in 2021, I started experimenting with a new programming language I call “Dada”. I’ve been tinkering with it ever since and I just realized that (oh my gosh!) I’ve never written even a single blog post about it! I figured I should fix that. This post will introduce some of the basic concepts of Dada as it is now.

Before you get any ideas, Dada isn’t fit for use. In fact the compiler doesn’t even really work because I keep changing the language before I get it all the way working. Honestly, Dada is more of a “stress relief” valve for me than anything else1 – it’s fun to tinker with a programming language where I don’t have to worry about backwards compatibility, or RFCs, or anything else.

That said, Dada has been a very fertile source of ideas that I think could be applicable to Rust. And not just for language design: playing with the compiler is also what led to the new salsa design 2, which is now used by both rust-analyzer and Astral’s ty. So I really want to get those ideas out there!

I took a break, but I’m back baby!

I stopped hacking on Dada about a year ago3, but over the last few days I’ve started working on it again. And I realized, hey, this is a perfect time to start blogging! After all, I have to rediscover what I was doing anyway, and writing about things is always the best way to work out the details.

Dada started as a gradual programming experiment, but no longer

Dada has gone through many phases. Early on, the goal was to build a gradually typed programming language that I thought would be easier for people to learn.

The idea was that you could start writing without any types at all and just execute the program. There was an interactive playground that would let you step through and visualize the “borrow checker” state (what Dada calls permissions) as you go. My hope was that people would find that easier to learn than working with a type checker.

I got this working and it was actually pretty cool. I gave a talk about it at the Programming Language Mentoring Workshop in 2022, though skimming that video it doesn’t seem like I really demo’d the permission modeling. Too bad.

At the same time, I found myself unconvinced that the gradually typed approach made sense. What I wanted was that when you executed the program without type annotations, you would get errors at the point where you violated a borrow. And that meant that the program had to track a lot of extra data, kind of like miri does, and it was really only practical as a teaching tool. I still would like to explore that, but it also felt like it was adding a lot of complexity to the language design for something that would only be of interest very early in a developer’s journey4.

Therefore, I decided to start over, this time, to just focus on the static type checking part of Dada.

Dada is like a streamlined Rust

Dada today is like Rust but streamlined. The goal is that Dada has the same basic “ownership-oriented” feel of Rust, but with a lot fewer choices and nitty-gritty details you have to deal with.5

Rust often has types that are semantically equivalent, but different in representation. Consider &Option<String> vs Option<&String>: both of them are equivalent in terms of what you can do with them, but of course Rust makes you carefully distinguish between them. In Dada, they are the same type. Dada also makes &Vec<String>, &Vec<&String>, &[String], &[&str], and many other variations all the same type too. And before you ask, it does it without heap allocating everything or using a garbage collector.

To put it pithily, Dada aims to be “Rust where you never have to call as_ref()”.
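To see what that means on the Rust side, here is a small illustration of the conversions Rust requires today between types that carry the same information. The function names are invented for this sketch; only `Option::as_ref` and `String::as_str` are real standard library methods.

```rust
// Takes the "borrowed contents" form.
fn name_len(name: Option<&String>) -> usize {
    name.map_or(0, |s| s.len())
}

// Holding a `&Option<String>` instead, the caller has to convert explicitly.
fn name_len_of_ref(name: &Option<String>) -> usize {
    // `Option::as_ref` turns `&Option<String>` into `Option<&String>`.
    name_len(name.as_ref())
}

// Takes a slice of string slices.
fn total_len(words: &[&str]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Going from `&Vec<String>` to `&[&str]` requires building a temporary vector
// of borrowed strings (an extra allocation in Rust today).
fn total_len_of_vec(words: &Vec<String>) -> usize {
    let borrowed: Vec<&str> = words.iter().map(String::as_str).collect();
    total_len(&borrowed)
}
```

In each pair the two parameter types let you do the same things, yet Rust treats them as distinct types with distinct layouts.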

Dada has a fancier borrow checker

Dada also has a fancier borrow checker, one which already demonstrates much of the borrow checker within, although it doesn’t have view types. Dada’s borrow checker supports internal borrows (e.g., you can make a struct that has fields that borrow from other fields) and it supports borrow checking without lifetimes. Much of this stuff can be brought to Rust, although I did tweak a few things in Dada that made some aspects easier.

Dada targets WebAssembly natively

Somewhere along the line in refocusing Dada, I decided to focus exclusively on building WebAssembly components. Initially I felt like targeting WebAssembly would be really convenient:

WebAssembly is like a really simple and clean assembly language, so writing the compiler backend is easy.
WebAssembly components are explicitly designed to bridge between languages, so they solve the FFI problem for you.
With WASI, you even get a full featured standard library that includes high-level things like “fetch a web page”. So you can build useful things right off the bat.

WebAssembly and on-demand compilation = compile-time reflection almost for free

But I came to realize that targeting WebAssembly has another advantage: it makes compile-time reflection almost trivial. The Dada compiler is structured in a purely on-demand fashion. This means we can compile one function all the way to WebAssembly bytecode and leave the rest of the crate untouched.

And once we have the WebAssembly bytecode, we can run that from inside the compiler! With wasmtime, we have a high quality JIT that runs very fast. The code is even sandboxed!

So we can have a function that we compile and run during compilation, and use its output to produce other code that will be used by other parts of the compilation step. In other words, we get something like miri or Zig’s comptime for free, essentially. Woah.

Wish you could try it? Me too!

Man, writing this blog post made ME excited to play with Dada. Too bad it doesn’t actually work. Ha! But I plan to keep plugging away on the compiler and get it to the point of a live demo as soon as I can. Hard to say exactly how long that will take.

In the meantime, to help me rediscover how things work, I’m going to try to write up a series of blog posts about the type system, borrow checker, and the compiler architecture, all of which I think are pretty interesting.

Yes, I relax by designing new programming languages. Doesn’t everyone? ↩︎
Designing a new version of salsa so that I could write the Dada compiler in the way I wanted really was an epic yak shave, now that I think about it. ↩︎
I lost motivation as I got interested in LLMs. To be frank, I felt like I had to learn enough about them to understand if designing a programming language was “fighting the last war”. Having messed a bunch with LLMs, I definitely feel that they make the choice of programming language less relevant. But I also think they really benefit from higher-level abstractions, even more than humans do, and so I like to think that Dada could still be useful. Besides, it’s fun. ↩︎
And, with LLMs, that period of learning is shorter than ever. ↩︎
Of course this also makes Dada less flexible. I doubt a project like Rust for Linux would work with Dada. ↩︎

https://smallcultfollowing.com/babysteps/blog/2026/02/08/fun-with-dada/

Move Expressions

Nov 21, 2025 Updated Nov 21, 2025

This post explores another proposal in the space of ergonomic ref-counting that I am calling move expressions. To my mind, these are an alternative to explicit capture clauses, one that addresses many (but not all) of the goals from that design with improved ergonomics and readability.

TL;DR

The idea itself is simple: within a closure (or future), we add the option to write move($expr). This is a value expression (“rvalue”) that desugars into a temporary value that is moved into the closure. So

|| something(&move($expr))

is roughly equivalent to something like:

{ 
    let tmp = $expr;
    || something(&{tmp})
}
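The desugared form can already be written by hand in today's Rust, which makes it easy to check when `$expr` is evaluated. The sketch below uses a hypothetical `expensive` function as a stand-in for `$expr`, and a `move ||` closure rather than the `&{tmp}` spelling so that the closure can be called more than once.

```rust
use std::cell::Cell;

// Hypothetical stand-in for `$expr`: records that it ran, then returns a value.
fn expensive(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    String::from("payload")
}

// Hand-written equivalent of a closure using `move(expensive(..))`.
// Returns (calls after creating the closure, calls after two invocations, result).
fn demo() -> (u32, u32, usize) {
    let calls = Cell::new(0);
    let closure = {
        let tmp = expensive(&calls); // evaluated once, when the closure is created
        move || tmp.len() // `tmp` is moved into the closure
    };
    let after_creation = calls.get();
    let len = closure();
    closure();
    (after_creation, calls.get(), len)
}
```

The expression runs exactly once, at closure creation, no matter how many times the closure is invoked afterwards.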

How it would look in practice

Let’s go back to one of our running examples, the “Cloudflare example”, which originated in this excellent blog post by the Dioxus folks. As a reminder, this is how the code looks today – note the let _some_value = ... lines for dealing with captures:

// task:  listen for dns connections
let _some_a = self.some_a.clone();
let _some_b = self.some_b.clone();
let _some_c = self.some_c.clone();
tokio::task::spawn(async move {
    do_something_else_with(_some_a, _some_b, _some_c)
});

Under this proposal it would look something like this:

tokio::task::spawn(async {
    do_something_else_with(
        move(self.some_a.clone()),
        move(self.some_b.clone()),
        move(self.some_c.clone()),
    )
});

There are times when you would want multiple clones. For example, if you want to move something into a FnMut closure that will then give away a copy on each call, it might look like

data_source_iter
    .inspect(|item| {
        inspect_item(item, move(tx.clone()).clone())
        //                      ----------  -------
        //                           |         |
        //                   move a clone      |
        //                   into the closure  |
        //                                     |
        //                             clone the clone
        //                             on each iteration
    })
    .collect();

// some code that uses `tx` later...
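For comparison, the same “clone the clone” pattern can be spelled out in today's Rust with a named temporary. This is a sketch with made-up names (`tx_for_closure`, `send_copy`), using a standard library channel as the thing being cloned.

```rust
use std::sync::mpsc;

// Hypothetical consumer that takes ownership of a sender.
fn send_copy(item: i32, tx: mpsc::Sender<i32>) {
    tx.send(item).unwrap();
}

fn demo() -> (Vec<i32>, Vec<i32>) {
    let (tx, rx) = mpsc::channel::<i32>();

    // Plays the role of `move(tx.clone())`: one clone, made before the closure exists.
    let tx_for_closure = tx.clone();
    let collected: Vec<i32> = vec![1, 2, 3]
        .into_iter()
        .inspect(move |item| {
            // A fresh clone on each iteration, given away to the consumer.
            send_copy(*item, tx_for_closure.clone());
        })
        .collect();

    // The original `tx` was never moved, so it is still usable here.
    tx.send(99).unwrap();
    drop(tx);

    // All senders are gone now, so this iteration terminates.
    let received: Vec<i32> = rx.iter().collect();
    (collected, received)
}
```

The move expression removes the need for the extra `tx_for_closure` binding while keeping the same two clones.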

Credit for this idea

This idea is not mine. It’s been floated a number of times. The first time I remember hearing it was at the RustConf Unconf, but I feel like it’s come up before that. Most recently it was proposed by Zachary Harrold on Zulip, who has also created a prototype called soupa. Zachary’s proposal, like earlier proposals I’ve heard, used the super keyword. Later on @simulacrum proposed using move, which to me is a major improvement, and that’s the version I ran with here.

This proposal makes closures more “continuous”

The reason that I love the move variant of this proposal is that it makes closures more “continuous” and exposes their underlying model a bit more clearly. With this design, I would start by explaining closures with move expressions and just teach move closures at the end, as a convenient default:

A Rust closure captures the places you use in the “minimal way that it can” – so || vec.len() will capture a shared reference to the vec, || vec.push(22) will capture a mutable reference, and || drop(vec) will take ownership of the vector.

You can use move expressions to control exactly what is captured: so || move(vec).push(22) will move the vector into the closure. A common pattern when you want to be fully explicit is to list all captures at the top of the closure, like so:
|| {
    let vec = move(input.vec); // take full ownership of vec
    let data = move(&cx.data); // take a reference to data
    let mut output_tx = move(output_tx); // take ownership of the output channel

    process(&vec, &mut output_tx, data)
}
As a shorthand, you can write move || at the top of the closure, which will change the default so that closures take ownership of every captured variable. You can still mix-and-match with move expressions to get more control. So the previous closure might be written more concisely like so:
move || {
    process(&input.vec, &mut output_tx, move(&cx.data))
    //       ---------       ---------       --------      
    //           |               |               |         
    //           |               |       closure still  
    //           |               |       captures a ref
    //           |               |       `&cx.data`        
    //           |               |                         
    //       because of the `move` keyword on the closure,
    //       these two are captured "by move"
    //       
}
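The “minimal way that it can” rule in the first step is how Rust closures behave today, and a short runnable sketch shows all three capture modes. The `demo` function is illustrative, and the last closure returns the vector rather than dropping it so the final contents can be inspected.

```rust
fn demo() -> (usize, Vec<i32>) {
    let mut vec = vec![1, 2, 3];

    // Only reads `vec`, so the closure captures a shared reference;
    // `vec` remains readable while the closure is alive.
    let len = || vec.len();
    let observed = len() + vec.len();

    // Mutates `vec`, so the closure captures a mutable reference.
    let mut push = || vec.push(22);
    push();

    // Uses `vec` by value, so the closure takes ownership of it.
    let consume = || vec;
    let moved = consume();

    (observed, moved)
}
```

None of these closures is marked `move`; the capture mode is inferred from how the body uses `vec`.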

This proposal makes move “fit in” for me

It’s a bit ironic that I like this, because it’s doubling down on part of Rust’s design that I was recently complaining about. In my earlier post on Explicit Capture Clauses I wrote that:

To be honest, I don’t like the choice of move because it’s so operational. I think if I could go back, I would try to refashion our closures around two concepts

Attached closures (what we now call ||) would always be tied to the enclosing stack frame. They’d always have a lifetime even if they don’t capture anything.

Detached closures (what we now call move ||) would capture by-value, like move today.

I think this would help to build up the intuition of “use detach || if you are going to return the closure from the current stack frame and use || otherwise”.

move expressions are, I think, moving in the opposite direction. Rather than talking about attached and detached, they bring us to a more unified notion of closures, one where you don’t have “ref closures” and “move closures” – you just have closures that sometimes capture moves, and a “move” closure is just a shorthand for using move expressions everywhere. This is in fact how closures work in the compiler under the hood, and I think it’s quite elegant.

Why not suffix?

One question is whether a move expression should be a prefix or a postfix operator. So e.g.

|| something(&$expr.move)

instead of &move($expr).

My feeling is that it’s not a good fit for a postfix operator because it doesn’t just take the final value of the expression and do something with it, it actually impacts when the entire expression is evaluated. Consider this example:

|| process(foo(bar()).move)

When does bar() get called? If you think about it, it has to be closure creation time, but it’s not very “obvious”.

We reached a similar conclusion when we were considering .unsafe operators. I think there is a rule of thumb that things which delineate a “scope” of code ought to be prefix – though I suspect unsafe(expr) might actually be nice, and not just unsafe { expr }.

Edit: I added this section after-the-fact in response to questions.

Conclusion

I’m going to wrap up this post here. To be honest, what this design really has going for it, above anything else, is its simplicity and the way it generalizes Rust’s existing design. I love that. To me, it joins the set of “yep, we should clearly do that” pieces in this puzzle:

Add a Share trait (I’ve gone back to preferring the name share 😁)
Add move expressions

These both seem like solid steps forward. I am not yet persuaded that they get us all the way to the goal that I articulated in an earlier post:

“low-level enough for a Kernel, usable enough for a GUI”

but they are moving in the right direction.

https://smallcultfollowing.com/babysteps/blog/2025/11/21/move-expressions/

Just call clone (or alias)

Nov 10, 2025 Updated Nov 10, 2025

Continuing my series on ergonomic ref-counting, I want to explore another idea, one that I’m calling “just call clone (or alias)”. This proposal specializes the clone and alias methods so that, in a new edition, the compiler will (1) remove redundant or unnecessary calls (with a lint); and (2) automatically capture clones or aliases in move closures where needed.

The goal of this proposal is to simplify the user’s mental model: whenever you see an error like “use of moved value”, the fix is always the same: just call clone (or alias, if applicable). This model is aiming for the balance of “low-level enough for a Kernel, usable enough for a GUI” that I described earlier. It’s also making a statement, which is that the key property we want to preserve is that you can always find where new aliases might be created – but that it’s ok if the fine-grained details around exactly when the alias is created is a bit subtle.

The proposal in a nutshell

Part 1: Closure desugaring that is aware of clones and aliases

Consider this move future:

fn spawn_services(cx: &Context) {
    tokio::task::spawn(async move {
        //                   ---- move future
        manage_io(cx.io_system.alias(), cx.request_name.clone());
        //        --------------------  -----------------------
    });
    ...
}

Because this is a move future, this takes ownership of cx.io_system and cx.request_name. Because cx is a borrowed reference, this will be an error unless those values are Copy (which they presumably are not). Under this proposal, capturing aliases or clones in a move closure/future would result in capturing an alias or clone of the place. So this future would be desugared like so (using explicit capture clause strawman notation):

fn spawn_services(cx: &Context) {
    tokio::task::spawn(
        async move(cx.io_system.alias(), cx.request_name.clone()) {
            //     --------------------  -----------------------
            //     capture alias/clone respectively

            manage_io(cx.io_system.alias(), cx.request_name.clone());
        }
    );
    ...
}
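For reference, this is roughly what one writes by hand in today's Rust to get the same effect. The sketch swaps tokio for `std::thread` so it has no dependencies, uses `Arc::clone` where the proposal would use `alias()`, and invents the `IoSystem` and `Context` definitions; none of these are from the post.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins for the types in the example.
struct IoSystem {
    name: &'static str,
}

struct Context {
    io_system: Arc<IoSystem>,
    request_name: String,
}

fn manage_io(io: Arc<IoSystem>, request_name: String) -> String {
    format!("{} handles {}", io.name, request_name)
}

// Clone outside the closure, then move the clones in.
fn spawn_services(cx: &Context) -> thread::JoinHandle<String> {
    let io_system = Arc::clone(&cx.io_system); // `alias()` under the proposal
    let request_name = cx.request_name.clone();
    thread::spawn(move || manage_io(io_system, request_name))
}
```

The two `let` lines before the closure are exactly the boilerplate that the capture-aware desugaring would generate on the user's behalf.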

Part 2: Last-use transformation

Now, this result is inefficient – there are now two aliases/clones. So the next part of the proposal is that the compiler would, in newer Rust editions, apply a new transformation called the last-use transformation. This transformation would identify calls to alias or clone that are not needed to satisfy the borrow checker and remove them. This code would therefore become:

fn spawn_services(cx: &Context) {
    tokio::task::spawn(
        async move(cx.io_system.alias(), cx.request_name.clone()) {
            manage_io(cx.io_system, cx.request_name);
            //        ------------  ---------------
            //        converted to moves
        }
    );
    ...
}

The last-use transformation would apply beyond closures. Given an example like this one, which clones id even though id is never used later:

fn send_process_identifier_request(id: String) {
    let request = Request::ProcessIdentifier(id.clone());
    //                                       ----------
    //                                       unnecessary
    send_request(request)
}

the user would get a warning like so1:

warning: unnecessary `clone` call will be converted to a move
 --> src/main.rs:8:46
  |
8 |     let request = Request::ProcessIdentifier(id.clone());
  |                                              ^^^^^^^^^^ unnecessary call to `clone`
  |
  = help: the compiler automatically removes calls to `clone` and `alias` when not
    required to satisfy the borrow checker
help: change `id.clone()` to `id` for greater clarity
  |
8 -     let request = Request::ProcessIdentifier(id.clone());
8 +     let request = Request::ProcessIdentifier(id);
  |

and the code would be transformed so that it simply does a move:

fn send_process_identifier_request(id: String) {
    let request = Request::ProcessIdentifier(id);
    //                                       --
    //                                   transformed
    send_request(request)
}

Mental model: just call “clone” (or “alias”)

The goal of this proposal is that, when you get an error about a use of moved value, or moving borrowed content, the fix is always the same: you just call clone (or alias). It doesn’t matter whether that error occurs in the regular function body or in a closure or in a future, the compiler will insert the clones/aliases needed to ensure future users of that same place have access to it (and no more than that).

I believe this will be helpful for new users. Early in their Rust journey new users are often sprinkling in calls to clone, as well as sigils like &, more or less at random as they try to develop a firm mental model – this is where the “keep calm and call clone” joke comes from. This approach breaks down around closures and futures today. Under this proposal, it will work, but users will also benefit from warnings indicating unnecessary clones, which I think will help them to understand where clone is really needed.

Experienced users can trust the compiler to get it right

But the real question is how this works for experienced users. I’ve been thinking about this a lot! I think this approach fits pretty squarely in the classic Bjarne Stroustrup definition of a zero-cost abstraction:

“What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.”

The first half is clearly satisfied. If you don’t call clone or alias, this proposal has no impact on your life.

The key point is the second half: earlier versions of this proposal were more simplistic, and would sometimes result in redundant or unnecessary clones and aliases. Upon reflection, I decided that this was a non-starter. The only way this proposal works is if experienced users know there is no performance advantage to using the more explicit form. This is precisely what we have with, say, iterators, and I think it works out very well. I believe this proposal hits that mark, but I’d like to hear if there are things I’m overlooking.

The last-use transformation codifies a widespread intuition, that clone is never necessary

I think most users would expect that changing message.clone() to just message is fine, as long as the code keeps compiling. But in fact nothing requires that to be the case. Under this proposal, APIs that make clone significant in unusual ways would be more annoying to use in the new Rust edition, and I expect they would ultimately wind up getting changed so that “significant clones” have another name. I think this is a good thing.
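To make “significant clone” concrete, here is a contrived sketch of a type whose Clone impl has an observable side effect. The `Tracked` type and its counter are invented for illustration; the point is only that replacing a last-use `t.clone()` with `t` changes what the program observes.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type whose `clone` is "significant": it bumps a shared counter.
struct Tracked {
    clones: Rc<Cell<u32>>,
}

impl Clone for Tracked {
    fn clone(&self) -> Self {
        self.clones.set(self.clones.get() + 1);
        Tracked { clones: Rc::clone(&self.clones) }
    }
}

fn consume(_t: Tracked) {}

// Returns the clone count when the last use is `t.clone()` versus plain `t`.
fn demo() -> (u32, u32) {
    let counter = Rc::new(Cell::new(0));
    let t = Tracked { clones: Rc::clone(&counter) };
    consume(t.clone()); // last use of `t`, written with a redundant clone
    let with_clone = counter.get();

    let counter = Rc::new(Cell::new(0));
    let t = Tracked { clones: Rc::clone(&counter) };
    consume(t); // what the last-use transformation would turn it into
    (with_clone, counter.get())
}
```

Both versions compile, but the counter differs, which is exactly the kind of API the transformation would make awkward.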

Frequently asked questions

I think I’ve covered the key points. Let me dive into some of the details here with a FAQ.

Can you summarize all of these posts you’ve been writing? It’s a lot to digest!

I get it, I’ve been throwing a lot of things out there. Let me begin by recapping the motivation as I see it:

I believe our goal should be to focus first on a design that is “low-level enough for a Kernel, usable enough for a GUI”.
- The key part here is the word enough. We need to make sure that low-level details are exposed, but only those that truly matter. And we need to make sure that it’s ergonomic to use, but it doesn’t have to be as nice as TypeScript (though that would be great).
Rust’s current approach to Clone fails both groups of users:
- calls to clone are not explicit enough for kernels and low-level software: when you see something.clone(), you don’t know whether it is creating a new alias or an entirely distinct value, and you don’t have any clue what it will cost at runtime. There’s a reason much of the community recommends writing Arc::clone(&something) instead.
- calls to clone, particularly in closures, are a major ergonomic pain point; this has been a clear consensus since we first started talking about this issue.
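The ambiguity in the first point is easy to demonstrate with the standard library: the same `.clone()` spelling produces a new alias for an `Arc` but an independent deep copy for a `Vec`. The `demo` function below is just an illustration.

```rust
use std::sync::Arc;

// Returns (same allocation?, number of owners, is the Vec copy independent?).
fn demo() -> (bool, usize, bool) {
    // `.clone()` on an `Arc` creates a second handle to the same value.
    let shared = Arc::new(vec![1, 2, 3]);
    let alias = shared.clone();
    let same_allocation = Arc::ptr_eq(&shared, &alias);
    let owners = Arc::strong_count(&shared);

    // `.clone()` on a `Vec` creates a distinct value with its own heap buffer.
    let original = vec![1, 2, 3];
    let mut copy = original.clone();
    copy.push(4);
    let independent = original.len() == 3 && copy.len() == 4;

    (same_allocation, owners, independent)
}
```

Writing `Arc::clone(&shared)` instead of `shared.clone()` is the community convention for making the first case visible at the call site.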

I then proposed a set of three changes to address these issues, authored in individual blog posts:

First, we introduce the Alias trait (originally called Handle). The Alias trait introduces a new method alias that is equivalent to clone but indicates that this will be creating a second alias of the same underlying value.
Second, we introduce explicit capture clauses, which lighten the syntactic load of capturing a clone or alias, make it possible to declare up-front the full set of values captured by a closure/future, and will support other kinds of handy transformations (e.g., capturing the result of as_ref or to_string).
Finally, we introduce the just call clone proposal described in this post. This modifies closure desugaring to recognize clones/aliases and also applies the last-use transformation to replace calls to clone/alias with moves where possible.
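As a rough picture of the first change, an Alias trait could look something like the sketch below. This is an assumption-laden illustration, not the proposed definition: making it a subtrait of Clone with a default method, and the particular impls for `Rc` and `Arc`, are choices made here for the sake of a runnable example.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Hypothetical sketch of an `Alias` trait: same behavior as `clone`, but the
// name signals that the result refers to the same underlying value.
trait Alias: Clone {
    fn alias(&self) -> Self {
        self.clone()
    }
}

// Assumed implementors: reference-counted handles.
impl<T> Alias for Rc<T> {}
impl<T> Alias for Arc<T> {}

// Returns (number of owners, whether both handles point at the same value).
fn demo() -> (usize, bool) {
    let a = Arc::new(String::from("shared"));
    let b = a.alias();
    (Arc::strong_count(&a), Arc::ptr_eq(&a, &b))
}
```

A type like `Vec<T>` would deliberately not implement the trait in this sketch, so `vec.alias()` would fail to compile while `vec.clone()` keeps working.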

What would it feel like if we did all those things?

Let’s look at the impact of each set of changes by walking through the “Cloudflare example”, which originated in this excellent blog post by the Dioxus folks:

let some_value = Arc::new(something);

// task 1
let _some_value = some_value.clone();
tokio::task::spawn(async move {
    do_something_with(_some_value);
});

// task 2:  listen for dns connections
let _some_a = self.some_a.clone();
let _some_b = self.some_b.clone();
let _some_c = self.some_c.clone();
tokio::task::spawn(async move {
    do_something_else_with(_some_a, _some_b, _some_c)
});

As the original blog post put it:

Working on this codebase was demoralizing. We could think of no better way to architect things - we needed listeners for basically everything that filtered their updates based on the state of the app. You could say “lol get gud,” but the engineers on this team were the sharpest people I’ve ever worked with. Cloudflare is all-in on Rust. They’re willing to throw money at codebases like this. Nuclear fusion won’t be solved with Rust if this is how sharing state works.

Applying the Alias trait and explicit capture clauses makes for a modest improvement. You can now clearly see that the calls to clone are alias calls, and you don’t have the awkward _some_value and _some_a variables. However, the code is still pretty verbose:

let some_value = Arc::new(something);

// task 1
tokio::task::spawn(async move(some_value.alias()) {
    do_something_with(some_value);
});

// task 2:  listen for dns connections
tokio::task::spawn(async move(
    self.some_a.alias(),
    self.some_b.alias(),
    self.some_c.alias(),
) {
    do_something_else_with(self.some_a, self.some_b, self.some_c)
});

Applying the Just Call Clone proposal removes a lot of boilerplate and, I think, captures the intent of the code very well. It also retains quite a bit of explicitness, in that searching for calls to alias reveals all the places that aliases will be created. However, it does introduce a bit of subtlety, since (e.g.) the call to self.some_a.alias() will actually occur when the future is created and not when it is awaited:

let some_value = Arc::new(something);

// task 1
tokio::task::spawn(async move {
    do_something_with(some_value.alias());
});

// task 2:  listen for dns connections
tokio::task::spawn(async move {
    do_something_else_with(
        self.some_a.alias(),
        self.some_b.alias(),
        self.some_c.alias(),
    )
});

I’m worried that the execution order of calls to alias will be too subtle. How is this “explicit enough for low-level code”?

There is no question that Just Call Clone makes closure/future desugaring more subtle. Looking at task 1:

tokio::task::spawn(async move {
    do_something_with(some_value.alias());
});

this gets desugared to a call to alias when the future is created (not when it is awaited). Using the explicit form:

tokio::task::spawn(async move(some_value.alias()) {
    do_something_with(some_value)
});

I can definitely imagine people getting confused at first – “but that call to alias looks like it’s inside the future (or closure), how come it’s occurring earlier?”

Yet, the code really seems to preserve what is most important: when I search the codebase for calls to alias, I will find that an alias is created for this task. And for the vast majority of real-world examples, the distinction of whether an alias is created when the task is spawned versus when it executes doesn’t matter. Look at this code: the important thing is that do_something_with is called with an alias of some_value, so some_value will stay alive as long as do_something_with is executing. It doesn’t really matter how the “plumbing” worked.
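To make the timing concrete, here is a minimal sketch in today's Rust, where the alias has to be created by hand. It assumes a plain closure in place of a spawned task and Arc::clone in place of the proposed alias method; the function name alias_timing is made up for illustration. The strong count goes up when the closure is created, not when it runs:

```rust
use std::sync::Arc;

/// Returns the `Arc` strong count before the closure exists,
/// right after it is created, and after it has been dropped.
/// (`alias_timing` is a hypothetical name used for illustration.)
fn alias_timing() -> (usize, usize, usize) {
    let some_value = Arc::new(String::from("hello"));
    let before = Arc::strong_count(&some_value);

    // Hand-written stand-in for `move(some_value.alias())`:
    // the alias is made here, when the closure is created.
    let task = {
        let some_value = Arc::clone(&some_value);
        move || some_value.len()
    };
    let after_creation = Arc::strong_count(&some_value);

    // Running the closure does not create any further alias.
    let _len = task();
    drop(task);
    let after_drop = Arc::strong_count(&some_value);

    (before, after_creation, after_drop)
}
```

Calling alias_timing() should give (1, 2, 1): the second alias exists for as long as the closure does, whether or not the closure has run yet.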

What about futures that conditionally alias a value?

Yeah, good point, those kinds of examples have more room for confusion. Like look at this:

tokio::task::spawn(async move {
    if false {
        do_something_with(some_value.alias());
    }
});

In this example, there is code that uses some_value with an alias, but only under if false. So what happens? I would assume that indeed the future will capture an alias of some_value, in just the same way that this future will move some_value, even though the relevant code is dead:

tokio::task::spawn(async move {
    if false {
        do_something_with(some_value);
    }
});
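The existing behavior for moves can be checked in today's Rust. This is a small sketch that assumes a plain move closure instead of a spawned future; consume, watcher, and dead_code_still_captures are names invented for the example. The closure takes ownership of some_value even though the only use sits under if false:

```rust
use std::sync::Arc;

// Hypothetical helper that takes ownership of its argument.
fn consume(_value: Arc<String>) {}

/// Returns the strong count while the closure is alive and after
/// the closure has been dropped without ever being called.
fn dead_code_still_captures() -> (usize, usize) {
    let some_value = Arc::new(String::from("payload"));
    // A second handle, kept outside so we can observe the count.
    let watcher = Arc::clone(&some_value);

    // The only use of `some_value` is unreachable, yet the closure
    // still captures (moves) it.
    let task = move || {
        if false {
            consume(some_value);
        }
    };
    let while_task_alive = Arc::strong_count(&watcher);

    drop(task);
    let after_task_dropped = Arc::strong_count(&watcher);

    (while_task_alive, after_task_dropped)
}
```

The result should be (2, 1): the captured value lives inside the closure until the closure itself is dropped, which is the same rule I would expect an aliasing capture to follow.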

Can you give more details about the closure desugaring you imagine?

Yep! I am thinking of something like this:

If there is an explicit capture clause, use that.
Else:
- For non-move closures/futures, no changes, so
  - Categorize usage of each place and pick the “weakest option” that is available:
    - by ref
    - by mut ref
    - moves
- For move closures/futures, we would change
  - Categorize usage of each place P and decide whether to capture that place…
    - by clone, there is at least one call P.clone() or P.alias() and all other usage of P requires only a shared ref (reads)
    - by move, if there are no calls to P.clone() or P.alias() or if there are usages of P that require ownership or a mutable reference
  - Capture by clone/alias when a place a.b.c is only used via shared references, and at least one of those is a clone or alias.
    - For the purposes of this, accessing a “prefix place” a or a “suffix place” a.b.c.d is also considered an access to a.b.c.

Examples that show some edge cases:

if consume {
    x.foo().
}

Why not do something similar for non-move closures?

In the relevant cases, non-move closures will already just capture by shared reference. This means that later attempts to use that variable will generally succeed:

let f = async {
    //  ----- NOT async move
    self.some_a.alias()
};

do_something_else(self.some_a.alias());
//                ----------- later use succeeds

f.await;

This future does not need to take ownership of self.some_a to create an alias, so it will just capture a reference to self.some_a. That means that later uses of self.some_a can still compile, no problem. If this had been a move closure, however, that code above would currently not compile.
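The same behavior is visible with an ordinary non-move closure in today's Rust. The sketch below assumes a closure instead of an async block, Arc::clone instead of the proposed alias method, and a local some_a instead of self.some_a; later_use_succeeds is a made-up name.

```rust
use std::sync::Arc;

/// Returns the strong count after both the closure and the
/// enclosing function have created an alias of `some_a`.
fn later_use_succeeds() -> usize {
    let some_a = Arc::new(vec![1, 2, 3]);

    // Not a `move` closure: creating an alias only needs `&some_a`,
    // so the closure borrows `some_a` instead of taking ownership.
    let f = || Arc::clone(&some_a);

    // A later use in the enclosing function still compiles.
    let outside = Arc::clone(&some_a);

    let inside = f();
    let count = Arc::strong_count(&some_a);
    drop((inside, outside));
    count
}
```

Here later_use_succeeds() should return 3: the original handle plus the two aliases. Changing the closure to move || would make the later Arc::clone(&some_a) a use-after-move error today.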

There is an edge case where you might get an error, which is when you are moving:

let f = async {
    self.some_a.alias()
};

do_something_else(self.some_a);
//                ----------- move!

f.await;

In that case, you can make this an async move closure and/or use an explicit capture clause, such as async move(self.some_a.alias()) { ... }.

Can you give more details about the last-use transformation you imagine?

Yep! During codegen, we would identify candidate calls to Clone::clone or Alias::alias. After borrow check has executed, we would examine each of the callsites and check the borrow check information to decide:

Will this place be accessed later?
Will some reference potentially referencing this place be accessed later?

If the answer to both questions is no, then we will replace the call with a move of the original place.

Here are some examples:

fn borrow(message: Message) -> String {
    let method = message.method.to_string();

    send_message(message.clone());
    //           ---------------
    //           would be transformed to
    //           just `message`

    method
}

fn borrow(message: Message) -> String {
    send_message(message.clone());
    //           ---------------
    //           cannot be transformed
    //           since `message.method` is
    //           referenced later

    message.method.to_string()
}

fn borrow(message: Message) -> String {
    let r = &message;

    send_message(message.clone());
    //           ---------------
    //           cannot be transformed
    //           since `r` may reference
    //           `message` and is used later.

    r.method.to_string()
}
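To see what the first example would gain, here is a sketch that applies the transformation by hand and counts clones. Everything beyond the shape of the original example is an assumption: the Message type with its clone counter, the empty send_message, and the names as_written, after_transformation, and count_clones.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical message type that counts how often it is cloned.
struct Message {
    method: String,
    clones: Rc<Cell<usize>>,
}

impl Clone for Message {
    fn clone(&self) -> Self {
        self.clones.set(self.clones.get() + 1);
        Message {
            method: self.method.clone(),
            clones: Rc::clone(&self.clones),
        }
    }
}

fn send_message(_message: Message) {}

/// The code as the user wrote it: `message` is not used after the clone.
fn as_written(message: Message) -> String {
    let method = message.method.to_string();
    send_message(message.clone());
    method
}

/// What the last-use transformation would turn it into.
fn after_transformation(message: Message) -> String {
    let method = message.method.to_string();
    send_message(message);
    method
}

/// Runs `f` on a fresh message and reports how many clones happened.
fn count_clones(f: fn(Message) -> String) -> usize {
    let clones = Rc::new(Cell::new(0));
    let message = Message {
        method: "GET".to_string(),
        clones: Rc::clone(&clones),
    };
    let _method = f(message);
    clones.get()
}
```

With these definitions, count_clones(as_written) is 1 and count_clones(after_transformation) is 0, and both functions return the same string.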

Why are you calling it the last-use transformation and not optimization?

In the past, I’ve talked about the last-use transformation as an optimization – but I’m changing terminology here. This is because, typically, an optimization is supposed to be unobservable to users except through measurements of execution time (or through UB), and that is clearly not the case here. The transformation would be a mechanical transformation performed by the compiler in a deterministic fashion.

Would the transformation “see through” references?

I think yes, but in a limited way. In other words I would expect

Clone::clone(&foo)

and

let p = &foo;
Clone::clone(p)

to be transformed in the same way (replaced with foo), and the same would apply to more levels of intermediate usage. This would kind of “fall out” from the MIR-based optimization technique I imagine. It doesn’t have to be this way; we could be more particular about the syntax that people wrote, but I think that would be surprising.

On the other hand, you could still fool it e.g. like so

fn identity<T>(x: &T) -> &T { x }

identity(&foo).clone()

Would the transformation apply across function boundaries?

The way I imagine it, no. The transformation would be local to a function body. This means that one could write a force_clone method like so that “hides” the clone in a way that it will never be transformed away (this is an important capability for edition transformations!):

fn pipe<Msg: Clone>(message: Msg) -> Msg {
    log(message.clone()); // <-- keep this one
    force_clone(&message)
}

fn force_clone<Msg: Clone>(message: &Msg) -> Msg {
    // Here, the input is `&Msg`, so the clone is necessary
    // to produce a `Msg`.
    message.clone()
}

Won’t the last-use transformation change behavior by making destructors run earlier?

Potentially, yes! Consider this example, written using explicit capture clause notation and written assuming we add an Alias trait:

async fn process_and_stuff(tx: mpsc::Sender<Message>) {
    tokio::spawn({
        async move(tx.alias()) {
            //     ---------- alias here
            process(tx).await
        }
    });

    do_something_unrelated().await;
}

The precise timing when Sender values are dropped can be important – when all senders have dropped, the Receiver will start returning None when you call recv. Before that, it will block waiting for more messages, since those tx handles could still be used.

So, in process_and_stuff, when will the sender aliases be fully dropped? The answer depends on whether we do the last-use transformation or not:

Without the transformation, there are two aliases: the original tx and the one being held by the future. So the receiver will only start returning None when do_something_unrelated has finished and the task has completed.
With the transformation, the call to tx.alias() is removed, and so there is only one alias – tx, which is moved into the future, and dropped once the spawned task completes. This could well be earlier than in the previous code, which had to wait until both process_and_stuff and the new task completed.

Most of the time, running destructors earlier is a good thing. That means lower peak memory usage, faster responsiveness. But in extreme cases it could lead to bugs – a typical example is a Mutex<()> where the guard is being used to protect some external resource.
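The difference is observable with the standard library's channel. The following sketch uses std::sync::mpsc instead of tokio and stands in for the spawned task by simply dropping its handle; the function names are invented for the example.

```rust
use std::sync::mpsc::{channel, TryRecvError};

/// Without the transformation: the task gets an alias and the
/// original `tx` stays alive in the enclosing function.
fn state_with_extra_alias() -> TryRecvError {
    let (tx, rx) = channel::<u32>();
    let task_tx = tx.clone();
    drop(task_tx); // the "task" has finished

    // `tx` is still alive, so the channel is merely empty.
    let state = rx.try_recv().unwrap_err();
    drop(tx);
    state
}

/// With the transformation: the clone becomes a move, so the only
/// sender is the one owned by the task.
fn state_after_move() -> TryRecvError {
    let (tx, rx) = channel::<u32>();
    let task_tx = tx; // moved, no alias created
    drop(task_tx); // the "task" has finished

    // No senders remain, so the receiver sees a disconnect.
    rx.try_recv().unwrap_err()
}
```

The first function should report TryRecvError::Empty and the second TryRecvError::Disconnected, which is exactly the kind of earlier destructor effect described above.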

How can we change when code runs? Doesn’t that break stability?

This is what editions are for! We have in fact done a very similar transformation before, in Rust 2021. RFC 2229 changed destructor timing around closures and it was, by and large, a non-event.

The desire for edition compatibility is in fact one of the reasons I want to make this a last-use transformation and not some kind of optimization. There is no UB in any of these examples, it’s just that to understand what Rust code does around clones/aliases is a bit more complex than it used to be, because the compiler will do automatic transformation to those calls. The fact that this transformation is local to a function means we can decide on a call-by-call basis whether it should follow the older edition rules (where it will always occur) or the newer rules (where it may be transformed into a move).

Does that mean that the last-use transformation would change with Polonius or other borrow checker improvements?

In theory, yes, improvements to borrow-checker precision like Polonius could mean that we identify more opportunities to apply the last-use transformation. This is something we can phase in over an edition. It’s a bit of a pain, but I think we can live with it – and I’m unconvinced it will be important in practice. For example, when thinking about the improvements I expect under Polonius, I was not able to come up with a realistic example that would be impacted.

Isn’t it weird to do this after borrow check?

This last-use transformation is guaranteed not to produce code that would fail the borrow check. However, it can affect the correctness of unsafe code:

let p: *const T = &*some_place;

let q: T = some_place.clone();
//         ---------- assuming `some_place` is
//         not used later, becomes a move

unsafe {
    do_something(p);
    //           -
    // This now refers to a stack slot
    // whose value is uninitialized.
}

Note though that, in this case, there would be a lint identifying that the call to some_place.clone() will be transformed to just some_place. We could also detect simple examples like this one and report a stronger deny-by-default lint, as we often do when we see guaranteed UB.

Shouldn’t we use a keyword for this?

When I originally had this idea, I called it “use-use-everywhere” and, instead of writing x.clone() or x.alias(), I imagined writing x.use. This made sense to me because a keyword seemed like a stronger signal that this was impacting closure desugaring. However, I’ve changed my mind for a few reasons.

First, Santiago Pastorino gave strong pushback that x.use was going to be a stumbling block for new learners. They now have to see this keyword and try to understand what it means – in contrast, if they see method calls, they will likely not even notice something strange is going on.

The second reason though was TC who argued, in the lang-team meeting, that all the arguments for why it should be ergonomic to clone a ref-counted value in a closure applied equally well to clone, depending on the needs of your application. I completely agree. As I mentioned earlier, this also addresses the concern I’ve heard with the Alias trait, which is that there are things you want to ergonomically clone but which don’t correspond to “aliases”. True.

In general I think that clone (and alias) are fundamental enough to how Rust is used that it’s ok to special case them. Perhaps we’ll identify other similar methods in the future, or generalize this mechanism, but for now I think we can focus on these two cases.

What about “deferred ref-counting”?

One point that I’ve raised from time-to-time is that I would like a solution that gives the compiler more room to optimize ref-counting to avoid incrementing ref-counts in cases where it is obvious that those ref-counts are not needed. An example might be a function like this:

fn use_data(rc: Rc<Data>) {
    for datum in rc.iter() {
        println!("{datum:?}");
    }
}

This function requires ownership of an alias to a ref-counted value but it doesn’t actually do anything but read from it. A caller like this one…

use_data(source.alias())

…doesn’t really need to increment the reference count, since the caller will be holding a reference the entire time. I often write code like this using a &:

fn use_data(rc: &Rc<Data>) {
    for datum in rc.iter() {
        println!("{datum:?}");
    }
}

so that the caller can do use_data(&source) – this then allows the callee to write rc.alias() in the case that it wants to take ownership.
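A quick way to see the extra increment is to look at the strong count from inside the callee. This sketch uses Rc::clone in place of the proposed alias method, and the function names are made up for illustration.

```rust
use std::rc::Rc;

/// Callee that takes ownership of an alias.
fn count_seen_by_value(rc: Rc<Vec<u32>>) -> usize {
    Rc::strong_count(&rc)
}

/// Callee that only borrows the caller's handle.
fn count_seen_by_ref(rc: &Rc<Vec<u32>>) -> usize {
    Rc::strong_count(rc)
}

/// Returns the counts observed by the two callees.
fn compare_counts() -> (usize, usize) {
    let source = Rc::new(vec![1, 2, 3]);
    let by_value = count_seen_by_value(Rc::clone(&source));
    let by_ref = count_seen_by_ref(&source);
    (by_value, by_ref)
}
```

compare_counts() should return (2, 1): passing an owned alias costs one increment (and a matching decrement when the callee returns), while passing &Rc costs none.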

I’ve basically decided to punt on addressing this problem. I think folks that are very performance sensitive can use &Arc and the rest of us can sometimes have an extra ref-count increment, but either way, the semantics for users are clear enough and (frankly) good enough.

Surprisingly to me, clippy::pedantic doesn’t have a dedicated lint for unnecessary clones. This particular example does get a lint, but it’s a lint about taking an argument by value and then not consuming it. If you rewrite the example to create id locally, clippy does not complain. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/11/10/just-call-clone/

But then again...maybe alias?

Nov 5, 2025 Updated Nov 5, 2025

Hmm, as I re-read the post I literally just posted a few minutes ago, I got to thinking. Maybe the right name is indeed Alias, and not Share. The rationale is simple: alias can serve as both a noun and a verb. It hits that sweet spot of “common enough you know what it means, but weird enough that it can be Rust Jargon for something quite specific”. In the same way that we talk about “passing a clone of foo” we can talk about “passing an alias to foo” or an “alias of foo”. Food for thought! I’m going to try Alias on for size in future posts and see how it feels.

https://smallcultfollowing.com/babysteps/blog/2025/11/05/maybe-alias/

Bikeshedding `Handle` and other follow-up thoughts

Nov 5, 2025 Updated Nov 5, 2025

There have been two major sets of responses to my proposal for a Handle trait. The first is that the Handle trait seems useful but doesn’t cover all the cases where one would like to be able to ergonomically clone things. The second is that the name doesn’t seem to fit with our Rust conventions for trait names, which emphasize short verbs over nouns. The TL;DR of my response is that (1) I agree, this is why I think we should work to make Clone ergonomic as well as Handle; and (2) I agree with that too, which is why I think we should find another name. At the moment I prefer Share, with Alias coming in second.

Handle doesn’t cover everything

The first concern with the Handle trait is that, while it gives a clear semantic basis for when to implement the trait, it does not cover all the cases where calling clone is annoying. In other words, if we opt to use Handle, and then we make creating new handles very ergonomic, but calling clone remains painful, there will be a temptation to use the Handle when it is not appropriate.

In one of our lang team design meetings, TC raised the point that, for many applications, even an “expensive” clone isn’t really a big deal. For example, when writing CLI tools and things, I regularly clone strings and vectors of strings and hashmaps and whatever else; I could put them in an Rc or Arc but I know it just doesn’t matter.

My solution here is simple: let’s make solutions that apply to both Clone and Handle. Given that I think we need a proposal that allows for handles that are both ergonomic and explicit, it’s not hard to say that we should extend that solution to include the option for clone.

The explicit capture clause post already fits this design. I explicitly chose a design that allowed for users to write move(a.b.c.clone()) or move(a.b.c.handle()), and hence works equally well (or equally not well…) with both traits.

The name Handle doesn’t fit the Rust conventions

A number of people have pointed out Handle doesn’t fit the Rust naming conventions for traits like this, which aim for short verbs. You can interpret handle as a verb, but it doesn’t mean what we want. Fair enough. I like the name Handle because it gives a noun we can use to talk about, well, handles, but I agree that the trait name doesn’t seem right. There was a lot of bikeshedding on possible options but I think I’ve come back to preferring Jack Huey’s original proposal, Share (with a method share). I think Alias and alias is my second favorite. Both of them are short, relatively common verbs.

I originally felt that Share was a bit too generic and overly associated with sharing across threads – but then I at least always call &T a shared reference[1], and an &T would implement Share, so it all seems to work well. Hat tip to Ariel Ben-Yehuda for pushing me on this particular name.
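As a rough illustration of how &T fits in, here is what such a trait could look like. This is purely a sketch: the exact signature of Share and the set of impls are my assumptions for the example, not a settled design, and share_demo is a made-up name.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical trait: create another handle to the same value.
trait Share: Sized {
    fn share(&self) -> Self;
}

impl<T: ?Sized> Share for Rc<T> {
    fn share(&self) -> Self {
        Rc::clone(self)
    }
}

impl<T: ?Sized> Share for Arc<T> {
    fn share(&self) -> Self {
        Arc::clone(self)
    }
}

// A shared reference is itself a cheap handle: sharing it just
// copies the reference.
impl<'a, T: ?Sized> Share for &'a T {
    fn share(&self) -> Self {
        *self
    }
}

/// Returns the `Rc` strong count after sharing, and whether the
/// shared reference points at the same `String`.
fn share_demo() -> (usize, bool) {
    let rc = Rc::new(5);
    let rc2 = rc.share();

    let text = String::from("hi");
    let r: &String = &text;
    let r2: &String = Share::share(&r);

    (Rc::strong_count(&rc2), std::ptr::eq(r, r2))
}
```

share_demo() should return (2, true): sharing an Rc bumps the count, and sharing a &String yields a second reference to the very same string.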

Coming up next

The flurry of posts in this series have been an attempt to survey all the discussions that have taken place in this area. I’m not yet aiming to write a final proposal – I think what will come out of this is a series of multiple RFCs.

My current feeling is that we should add the Hand^H^H^H^H, uh, Share trait. I also think we should add explicit capture clauses. However, while explicit capture clauses are clearly “low-level enough for a kernel”, I don’t really think they are “usable enough for a GUI”. The next post will explore another idea that I think might bring us closer to that ultimate ergonomic and explicit goal.

A lot of people say immutable reference but that is simply inaccurate: an &Mutex is not immutable. I think that the term shared reference is better. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/11/05/bikeshedding-handle/

Explicit capture clauses

Oct 22, 2025 Updated Oct 22, 2025

In my previous post about Ergonomic Ref Counting, I talked about how, whatever else we do, we need a way to have explicit handle creation that is ergonomic. The next few posts are going to explore a few options for how we might do that.

This post focuses on explicit capture clauses, which would permit closures to be annotated with an explicit set of captured places. My take is that explicit capture clauses are a no brainer, for reasons that I’ll cover below, and we should definitely do them; but they may not be enough to be considered ergonomic, so I’ll explore more proposals afterwards.

Motivation

Rust closures today work quite well but I see a few problems:

Teaching and understanding closure desugaring is difficult because it lacks an explicit form. Users have to learn to desugar in their heads to understand what’s going on.
Capturing the “clone” of a value (or possibly other transformations) has no concise syntax.
For long closure bodies, it is hard to determine precisely which values are captured and how; you have to search the closure body for references to external variables, account for shadowing, etc.
It is hard to develop an intuition for when move is required. I find myself adding it when the compiler tells me to, but that’s annoying.

Let’s look at a strawperson proposal

Some time ago, I wrote a proposal for explicit capture clauses. I actually see a lot of flaws with this proposal, but I’m still going to explain it: right now it’s the only solid proposal I know of, and it’s good enough to explain how an explicit capture clause could be seen as a solution to the “explicit and ergonomic” goal. I’ll then cover some of the things I like about the proposal and what I don’t.

Begin with move

The proposal begins by extending the move keyword with a list of places to capture:

let closure = move(a.b.c, x.y) || {
    do_something(a.b.c.d, x.y)
};

The closure will then take ownership of those two places; references to those places in the closure body will be replaced by accesses to these captured fields. So that example would desugar to something like

let closure = {
    struct MyClosure {
        a_b_c: Foo,
        x_y: Bar,
    }

    impl FnOnce<()> for MyClosure {
        fn call_once(self) -> Baz {
            do_something(self.a_b_c.d, self.x_y)
            //           ----------    --------
            //   The place `a.b.c` is      |
            //   rewritten to the field    |
            //   `self.a_b_c`              |
            //                  Same here but for `x.y`
        }
    }

    MyClosure {
        a_b_c: a.b.c,
        x_y: x.y,
    }
};
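The desugaring above is written against the unstable FnOnce trait, so here is a version that compiles on stable Rust. It is a sketch under assumptions: call_once is an inherent method rather than a trait impl, and the concrete types (A, B, Foo, X, Bar), do_something, and both_forms are invented so that the example has something to run.

```rust
// Hypothetical types so that `a.b.c.d` and `x.y` are real places.
struct Foo { d: u32 }
struct B { c: Foo }
struct A { b: B }
struct Bar(u32);
struct X { y: Bar }

fn do_something(d: u32, y: Bar) -> u32 {
    d + y.0
}

/// Stand-in for the compiler-generated closure type.
struct MyClosure {
    a_b_c: Foo,
    x_y: Bar,
}

impl MyClosure {
    // Inherent method, since implementing `FnOnce` by hand is unstable.
    fn call_once(self) -> u32 {
        do_something(self.a_b_c.d, self.x_y)
    }
}

/// The closure as one would write it today.
fn with_closure(a: A, x: X) -> u32 {
    let closure = move || do_something(a.b.c.d, x.y);
    closure()
}

/// The hand-desugared form: capture `a.b.c` and `x.y` into fields.
fn with_struct(a: A, x: X) -> u32 {
    let closure = MyClosure { a_b_c: a.b.c, x_y: x.y };
    closure.call_once()
}

fn sample() -> (A, X) {
    (A { b: B { c: Foo { d: 40 } } }, X { y: Bar(2) })
}

/// Runs both forms on the same input.
fn both_forms() -> (u32, u32) {
    let (a, x) = sample();
    let (a2, x2) = sample();
    (with_closure(a, x), with_struct(a2, x2))
}
```

both_forms() should return (42, 42), showing that the struct form computes the same thing as the closure.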

When using a simple list like this, attempts to reference other places that were not captured result in an error:

let closure = move(a.b.c, x.y) || {
    do_something(a.b.c.d, x.z)
    //           -------  ---
    //           OK       Error: `x.z` not captured
};

Capturing with rewrites

It is also possible to capture a custom expression by using an = sign. So for example, you could rewrite the above closure as follows:

let closure = move(
    a.b.c = a.b.c.clone(),
    x.y,
) || {
    do_something(a.b.c.d, x.y)
};

and it would desugar to:

let closure = {
    struct MyClosure { /* as before */ }
    impl FnOnce<()> for MyClosure { /* as before */ }

    MyClosure {
        a_b_c: a.b.c.clone(),
        //     -------------
        x_y: x.y,
    }
};

When using this form, the expression assigned to a.b.c must have the same type as a.b.c in the surrounding scope. So this would be an error:

let closure = move(
    a.b.c = 22, // Error: `i32` is not `Foo`
    x.y,
) || {
    /* ... */
};

Shorthands and capturing by reference

You can understand move(a.b) as sugar for move(a.b = a.b). We support other convenient shorthands too, such as

move(a.b.clone()) || {...}
// == anything that ends in a method call becomes ==>
move(a.b = a.b.clone()) || {...}

and two kinda special shorthands:

move(&a.b) || { ... }
move(&mut a.b) || { ... }

These are special because the captured value is indeed &a.b and &mut a.b – but that by itself wouldn’t work, because the type doesn’t match. So we rewrite each access to a.b to desugar to a dereference of the a_b field, like *self.a_b:

move(&a.b) || { foo(a.b) }

// desugars to

struct MyStruct<'l> {
    a_b: &'l Foo
}

impl FnOnce for MyStruct<'_> {
    fn call_once(self) {
        foo(*self.a_b)
        //  ---------
        //  we insert the `*` too
    }
}

MyStruct {
    a_b: &a.b,
}

move(&a.b) || { foo(*a.b) }

There’s a lot of precedent for this sort of transform: it’s precisely what we do for the Deref trait and for existing closure captures.
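Here is a stable-Rust sketch of the by-reference desugaring, next to the closure it models. It assumes Foo is Copy (so that the inserted * does not try to move out of a reference) and uses an inherent call_once method; A, foo's body, and by_reference_forms are invented for the example.

```rust
#[derive(Clone, Copy)]
struct Foo(u32);

struct A { b: Foo }

fn foo(value: Foo) -> u32 {
    value.0 + 1
}

/// Stand-in for the closure type: it stores a reference to `a.b`.
struct MyStruct<'l> {
    a_b: &'l Foo,
}

impl MyStruct<'_> {
    fn call_once(self) -> u32 {
        // The access to `a.b` becomes a dereference of the field.
        foo(*self.a_b)
    }
}

/// Runs today's closure and the hand-desugared struct side by side.
fn by_reference_forms() -> (u32, u32) {
    let a = A { b: Foo(41) };

    // Today's non-move closure: reads `a.b`, so it captures `&a.b`.
    let closure = || foo(a.b);

    // The explicit form of the same capture.
    let desugared = MyStruct { a_b: &a.b };

    (closure(), desugared.call_once())
}
```

by_reference_forms() should return (42, 42).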

Fresh variables

We should also allow you to define fresh variables. These can have arbitrary types. The values are evaluated at closure creation time and stored in the closure metadata:

move(
    data = load_data(),
    y,
) || {
    take(&data, y)
}
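Today the same effect needs an enclosing block with a let binding. The sketch below shows that the initializer runs once, when the closure is created, and not on each call; the load counter, the body of load_data, and the name fresh_variable_demo are assumptions made for the example.

```rust
use std::cell::Cell;

/// Returns the number of loads right after creating the closure,
/// the number of loads after calling it twice, and the summed result.
fn fresh_variable_demo() -> (usize, usize, i32) {
    let loads = Cell::new(0usize);
    // Hypothetical loader that records how often it is called.
    let load_data = || {
        loads.set(loads.get() + 1);
        vec![1_i32, 2, 3]
    };
    let y = 10;

    // Stand-in for `move(data = load_data(), y) || ...`:
    // `data` is evaluated here and then moved into the closure.
    let closure = {
        let data = load_data();
        move || data.iter().sum::<i32>() + y
    };
    let loads_after_creation = loads.get();

    // Calling the closure reuses the stored `data`.
    let total = closure() + closure();

    (loads_after_creation, loads.get(), total)
}
```

fresh_variable_demo() should return (1, 1, 32): one load at creation time, none afterwards, and 16 from each of the two calls.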

Open-ended captures

All of our examples so far fully enumerated the captured variables. But Rust closures today infer the set of captures (and the style of capture) based on the paths that are used. We should permit that as well. I’d permit that with a .. sugar, so these two closures are equivalent:

let c2 = move || /* closure */;
//       ---- capture anything that is used,
//            taking ownership

let c1 = move(..) || /* closure */;
//           ---- capture anything else that is used,
//                taking ownership

Of course you can combine:

let c = move(x.y.clone(), ..) || {

};

And you could write ref to get the equivalent of || closures:

let c2 = || /* closure */;
//       -- capture anything that is used,
//          using references if possible
let c1 = move(ref) || /* closure */;
//            --- capture anything else that is used,
//                using references if possible

This lets you mix explicitly named captures with inferred by-reference captures:

let c = move(
    a.b.clone(), 
    c,
    ref
) || {
    combine(&a.b, &c, &z)
    //       ---   -   -
    //        |    |   |
    //        |    | This will be captured by reference
    //        |    | since it is used by reference
    //        |    | and is not explicitly named.
    //        |    |
    //        |   This will be captured by value
    //        |   since it is explicitly named.
    //        |
    // We will capture a clone of this because
    // the user wrote `a.b.clone()`
}

Frequently asked questions

How does this help with our motivation?

Let’s look at the motivations I named:

Teaching and understanding closure desugaring is difficult

There’s a lot of syntax there, but it also gives you an explicit form that you can use to do explanations. To see what I mean, consider the difference between these two closures (playground).

The first closure uses ||:

fn main() {
    let mut i = 3;
    let mut c_attached = || {
        let j = i + 1;
        std::mem::replace(&mut i, j)
    };
    ...
}

While the second closure uses move:

fn main() {
    let mut i = 3;
    let mut c_detached = move || {
        let j = i + 1;
        std::mem::replace(&mut i, j)
    };
    ...
}

These are in fact pretty different, as you can see in this playground. But why? Well, the first closure desugars to capture a reference:

let mut i = 3;
let mut c_attached = move(&mut i) || {...};

and the second captures by value:

let mut i = 3;
let mut c_detached = move(i) || {...};

Before, to explain that, I had to resort to desugaring to structs.
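For readers without the playground at hand, here is a self-contained sketch of the difference in today's Rust. The closure bodies are the ones shown above; wrapping them in functions named attached and detached, and calling each closure twice, are my additions.

```rust
/// `||` closure: captures `i` by mutable reference, so the outer
/// variable changes with every call.
fn attached() -> (i32, i32, i32) {
    let mut i = 3;
    let mut c_attached = || {
        let j = i + 1;
        std::mem::replace(&mut i, j)
    };
    let first = c_attached();
    let second = c_attached();
    (first, second, i)
}

/// `move ||` closure: captures its own copy of `i`, so the outer
/// variable is left untouched.
fn detached() -> (i32, i32, i32) {
    let mut i = 3;
    let mut c_detached = move || {
        let j = i + 1;
        std::mem::replace(&mut i, j)
    };
    let first = c_detached();
    let second = c_detached();
    (first, second, i)
}
```

attached() should return (3, 4, 5), while detached() should return (3, 4, 3): both closures count upward, but only the attached one updates the i in the enclosing function.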

Capturing a clone is painful

If you have a closure that wants to capture the clone of something today, you have to introduce a fresh variable. So something like this:

let closure = move || {
    begin_actor(data, self.tx.clone())
};

becomes

let closure = {
    let self_tx = self.tx.clone();
    move || {
        begin_actor(data, self_tx.clone())
    }
};

This is awkward. Under this proposal, it’s possible to point-wise replace specific items:

let closure = move(self.tx.clone(), ..) || {
    begin_actor(data, self.tx.clone())
};

For long closure bodies, it is hard to determine precisely which values are captured and how

Quick! What variables does this closure use from the environment?

.flat_map(move |(severity, lints)| {
    parse_tt_as_comma_sep_paths(lints, edition)
    .into_iter()
    .flat_map(move |lints| {
        // Rejoin the idents with `::`, so we have no spaces in between.
        lints.into_iter().map(move |lint| {
            (
                lint.segments().filter_map(
                    |segment| segment.name_ref()
                ).join("::").into(),
                severity,
            )
        })
    })
})

No idea? Me either. What about this one?

.flat_map(move(edition) |(severity, lints)| {
    /* same as above */
})

Ah, pretty clear! I find that once a closure moves beyond a couple of lines, it can make a function kind of hard to read, because it’s hard to tell what variables it may be accessing. I’ve had functions where it’s important to correctness for one reason or another that a particular closure only accesses a subset of the values around it, but I have no way to indicate that right now. Sometimes I make separate functions, but it’d be nicer if I could annotate the closure’s captures explicitly.

It is hard to develop an intuition for when move is required

Hmm, actually, I don’t think this notation helps with that at all! More about this below.

Let me cover some of the questions you may have about this design.

Why allow the “capture clause” to specify an entire place, like a.b.c?

Today you can write closures that capture places, like self.context below:

let closure = move || {
    send_data(self.context, self.other_field)
};

My goal was to be able to take such a closure and to add annotations that change how particular places are captured, without having to do deep rewrites in the body:

let closure = move(self.context.clone(), ..) || {
    //            --------------------------
    //            the only change
    send_data(self.context, self.other_field)
};

This definitely adds some complexity, because it means we have to be able to “remap” a place like a.b.c that has multiple parts. But it makes the explicit capture syntax far more powerful and convenient.

Why do you keep the type the same for places like a.b.c?

I want to ensure that the type of a.b.c is the same wherever it is type-checked; it’ll simplify the compiler somewhat and generally makes it easier to move code into and out of a closure.

Why the move keyword?

Because it’s there? To be honest, I don’t like the choice of move because it’s so operational. I think if I could go back, I would try to refashion our closures around two concepts

Attached closures (what we now call ||) would always be tied to the enclosing stack frame. They’d always have a lifetime even if they don’t capture anything.
Detached closures (what we now call move ||) would capture by-value, like move today.

I think this would help to build up the intuition of “use detach || if you are going to return the closure from the current stack frame and use || otherwise”.

What would a max-min explicit capture proposal look like?

A maximally minimal explicit capture clause proposal would probably just let you name specific variables and not “subplaces”:

move(
    a_b_c = a.b.c,
    x_y = &x.y
) || {
    *x_y + a_b_c
}

I think you can see though that this makes introducing an explicit form a lot less pleasant to use and hence isn’t really going to do anything to support ergonomic RC.

Conclusion: Explicit capture clauses make things better, but not great

I think doing explicit capture clauses is a good idea – I generally think we should have explicit syntax for everything in Rust, for teaching and explanatory purposes if nothing else; I didn’t always think this way, but it’s something I’ve come to appreciate over time.

I’m not sold on this specific proposal – but I think working through it is useful, because it (a) gives you an idea of what the benefits would be and (b) gives you an idea of how much hidden complexity there is.

I think the proposal shows that adding explicit capture clauses goes some way towards making things explicit and ergonomic. Writing move(a.b.c.clone()) is definitely better than having to create a new binding.

But for me, it’s not really nice enough. It’s still quite a mental distraction to have to find the start of the closure, insert the a.b.c.clone() call, and it makes the closure header very long and unwieldy. Particularly for short closures the overhead is very high.

This is why I’d like to look into other options. Nonetheless, it’s useful to have discussed a proposal for an explicit form: if nothing else, it’ll be useful to explain the precise semantics of other proposals later on.

https://smallcultfollowing.com/babysteps/blog/2025/10/22/explicit-capture-clauses/

Move, Destruct, Forget, and Rust

Oct 21, 2025 Updated Oct 22, 2025

This post presents a proposal to extend Rust to support a number of different kinds of destructors. This means we could support async drop, but also prevent “forgetting” (leaking) values, enabling async scoped tasks that run in parallel à la rayon/libstd. We’d also be able to have types whose “destructors” require arguments. This proposal – an evolution of “must move” that I’ll call “controlled destruction” – is, I think, needed for Rust to live up to its goal of giving safe versions of critical patterns in systems programming. As such, it is needed to complete the “async dream”, in which async Rust and sync Rust work roughly the same.

Nothing this good comes for free. The big catch of the proposal is that it introduces more “core splits” into Rust’s types. I believe these splits are well motivated and reasonable – they reflect inherent complexity, in other words, but they are something we’ll want to think carefully about nonetheless.

Summary

The TL;DR of the proposal is that we should:

Introduce a new “default trait bound” Forget and an associated trait hierarchy:
- trait Forget: Destruct, representing values that can be forgotten
- trait Destruct: Move, representing values with a destructor
- trait Move: Pointee, representing values that can be moved
- trait Pointee, the base trait that represents any value
Use the “opt-in to weaker defaults” scheme proposed for sizedness by RFC #3729 (Hierarchy of Sized Traits)
- So fn foo<T>(t: T) defaults to “a T that can be forgotten/destructed/moved”
- And fn foo<T: Destruct>(t: T) means “a T that can be destructed, but not necessarily forgotten”
- And fn foo<T: Move>(t: T) means “a T that can be moved, but not necessarily forgotten”
- …and so forth.
Integrate and enforce the new traits:
- The bound on std::mem::forget will already require Forget, so that’s good.
- Borrow check can enforce that any dropped value must implement Destruct; in fact, we already do this to enforce const Destruct bounds in const fn.
- Borrow check can be extended to require a Move bound on any moved value.
Adjust the trait bound on closures (luckily this works out fairly nicely)

Motivation

In a talk I gave some years back at Rust LATAM in Uruguay 1, I said this:

It’s easy to expose a high-performance API.
But it’s hard to help users control it – and this is what Rust’s type system does.

Person casting a firespell and burning themselves

Rust currently does a pretty good job with preventing parts of your program from interfering with one another, but we don’t do as good a job when it comes to guaranteeing that cleanup happens2. We have destructors, of course, but they have two critical limitations:

All destructors must meet the same signature, fn drop(&mut self), which isn’t always adequate.
There is no way to guarantee a destructor once you give up ownership of a value.

Making it concrete.

That motivation was fairly abstract, so let me give some concrete examples of things that tie back to this limitation:

The ability to have async or const drop, both of which require a distinct drop signature.
The ability to have a “drop” operation that takes arguments, such as a message that must be sent, or a result code that must be provided before the program terminates.
The ability to have async scopes that can access the stack, which requires a way to guarantee that a parallel thread will be joined even in an async context.
The ability to integrate at maximum efficiency with WebAssembly async tasks, which require guaranteed cleanup.3

The goal of this post is to outline an approach that could solve all of the above problems and which is backwards compatible with Rust today.

The “capabilities” of value disposal

The core problem is that Rust today assumes that every Sized value can be moved, dropped, and forgotten:

// Without knowing anything about `T` apart
// from the fact that it's `Sized`, we can...
fn demonstration<T>(a: T, b: T, c: T) {
    // ...drop `a`, running its destructor immediately.
    std::mem::drop(a);

    // ...forget `b`, skipping its destructor
    std::mem::forget(b);

    // ...move `c` into `x`
    let x = c;
} // ...and then have `x` get dropped automatically,
// as we exit the block.

Destructors are like “opt-out methods”

The way I see it, most methods are “opt-in” – they don’t execute unless you call them. But destructors are different. They are effectively a method that runs by default – unless you opt-out, e.g., by calling forget. But the ability to opt-out means that they don’t fundamentally add any power over regular methods, they just make for a more ergonomic API.
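To make the “opt-out” behavior concrete, here is a minimal sketch in today’s Rust. The Guard type and the count_cleanups function are hypothetical names made up for this illustration; only std::mem::forget and Drop come from the standard library.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard whose destructor records that cleanup ran.
struct Guard<'a> {
    cleanups: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.cleanups.set(self.cleanups.get() + 1);
    }
}

/// Returns the cleanup count after a normal scope exit and after a `forget`.
fn count_cleanups() -> (u32, u32) {
    let cleanups = Cell::new(0);

    {
        let _guard = Guard { cleanups: &cleanups };
    } // `_guard` goes out of scope here: the destructor runs by default.
    let after_scope_exit = cleanups.get();

    // Opting out: `forget` takes ownership and never runs the destructor.
    mem::forget(Guard { cleanups: &cleanups });
    let after_forget = cleanups.get();

    (after_scope_exit, after_forget)
}
```

The second guard never increments the counter, so both readings are 1: the destructor only ran for the guard that was not forgotten.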

The implication is that the only way in Rust today to guarantee that a destructor will run is to retain ownership of the value. This can be important to unsafe code – APIs that permit scoped threads, for example, need to guarantee that those parallel threads will be joined before the function returns. The only way they have to do that is to use a closure which gives &-borrowed access to a scope:

scope(|s| ...)
//     -  --- ...which ensures that this
//     |      fn body cannot "forget" it.
//     |  
// This value has type `&Scope`...

Because the API never gives up ownership of the scope, it can ensure that it is never “forgotten” and thus that its destructor runs.
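For reference, this is roughly what the closure-based pattern looks like with the standard library’s std::thread::scope. The function name sum_halves_in_parallel is just an example I made up; the point is that the spawned threads borrow stack data, and scope does not return until they have been joined.

```rust
use std::thread;

/// Sums a slice using two scoped threads that borrow from the caller's stack.
fn sum_halves_in_parallel(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);

    // The closure only ever sees `s: &Scope`, so it cannot "forget" the scope.
    // `thread::scope` joins every spawned thread before it returns.
    thread::scope(|s| {
        let left_sum = s.spawn(|| left.iter().sum::<u64>());
        let right_sum = s.spawn(|| right.iter().sum::<u64>());
        left_sum.join().unwrap() + right_sum.join().unwrap()
    })
}
```

Calling it on a local array such as [1, 2, 3, 4, 5] works without any 'static bound or reference counting, which is exactly what the ownership trick buys you.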

The scoped thread approach works for sync code, but it doesn’t work for async code. The problem is that async functions return a future, which is a value. Users can therefore decide to “forget” this value, just like any other value, and thus the destructor may never run.
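Here is a small sketch of that failure mode using only the standard library. The names NoopWaker, Guard, hold_guard, and destructor_ran_after_forget are hypothetical, and the hand-rolled waker is only there so the future can be polled once without pulling in an executor.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Guard whose destructor records that cleanup happened.
struct Guard<'a>(&'a Cell<bool>);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Creates a guard, then suspends forever.
async fn hold_guard(cleaned_up: &Cell<bool>) {
    let _guard = Guard(cleaned_up);
    std::future::pending::<()>().await;
}

/// Starts the future, forgets it, and reports whether the guard's destructor ran.
fn destructor_ran_after_forget() -> bool {
    let cleaned_up = Cell::new(false);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Poll once so that the guard is actually created inside the future.
    let mut fut = Box::pin(hold_guard(&cleaned_up));
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Pending));

    // The future is just a value, so the caller is free to forget it.
    std::mem::forget(fut);
    cleaned_up.get()
}
```

The function returns false: the guard lives inside the forgotten future, so its destructor never gets a chance to run.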

Guaranteed cleanup is common in systems programming

When you start poking around, you find that guaranteed destructors turn up quite a bit in systems programming. Scoped APIs in futures are one example, but DMA (direct memory access) is another. Many embedded devices have a mode where you begin a DMA transfer that causes data to be written into memory asynchronously. But you need to ensure that this DMA is terminated before that memory is freed. If that memory is on your stack, that means you need a destructor that will either cancel or block until the DMA finishes.4

So what can we do about it?

This situation is very analogous to the challenge of revisiting the default Sized bound, and I think the same basic approach that I outlined in this blog post will work.

The core of the idea is simple: have a “special” set of traits arranged in a hierarchy:

trait Forget: Destruct {} // Can be "forgotten"
trait Destruct: Move {}   // Can be "destructed" (dropped)
trait Move: Pointee {}    // Can be "moved"
trait Pointee {}          // Can be referenced by pointer

By default, generic parameters get a Forget bound, so fn foo<T>() is equivalent to fn foo<T: Forget>(). But if the parameter opts in to a weaker bound, then the default is suppressed, so fn bar<T: Destruct>() means that T is assumed to be “destructible” but not forgettable. And fn baz<T: Move>() indicates that T can only be moved.

Impact of these bounds

Let me explain briefly how these bounds would work.

The default can forget, drop, move etc

Given a default type T, or one that writes Forget explicitly, the function can do anything that is possible today:

fn just_forget<T: Forget>(a: T, b: T, c: T) {
    //         --------- this bound is the default
    std::mem::drop(a);   // OK
    std::mem::forget(b); // OK
    let x = c;           // OK
}

The forget function requires T: Forget

The std::mem::forget function would require T: Forget as well:

pub fn forget<T: Forget>(value: T) { /* magic intrinsic */ }

This means that if you have only Destruct, the function can only drop or move, it can’t “forget”:

fn just_destruct<T: Destruct>(a: T, b: T, c: T) {
    //           -----------
    // This function only requests "Destruct" capability.

    std::mem::drop(a);   // OK
    std::mem::forget(b); // ERROR: `T: Forget` required
    let x = c;           // OK
}

The borrow checker would require “dropped” values implement Destruct

We would modify the drop function to require only T: Destruct:

fn drop<T: Destruct>(t: T) {}

We would also extend the borrow checker so that when it sees a value being dropped (i.e., because it went out of scope), it would require the Destruct bound.

That means that if you have a value whose type is only Move, you cannot “drop” it:

fn just_move<T: Move>(a: T, b: T, c: T) {
    //           -----------
    // This function only requests "Move" capability.

    std::mem::drop(a);   // ERROR: `T: Destruct` required
    std::mem::forget(b); // ERROR: `T: Forget` required
    let x = c;           // OK
}                        // ERROR: `x` is being dropped, but `T: Destruct` does not hold

This means that if you have only a Move bound, you must move anything you own if you want to return from the function. For example:

fn return_ok<T: Move>(a: T) -> T {
    a // OK
}

If you have a function that does not move, you’ll get an error:

fn return_err<T: Move>(a: T) {
} // ERROR: `a` does not implement `Destruct`

It’s worth pointing out that this will be annoying as all get out in the face of panics:

fn return_err<T: Move>(a: T) -> T {
    // ERROR: If a panic occurs, `a` would be dropped, but `T` does not implement `Destruct`
    forbid_env_var();

    a
} 

fn forbid_env_var() {
    if std::env::var("BAD").is_ok() {
        panic!("Uh oh: BAD cannot be set");
    }
}

I’m ok with this, but it is going to put pressure on better ways to rule out panics statically.

Const (and later async) variants of Destruct

In fact, we are already doing something much like this destruct check for const functions. Right now if you have a const fn and you try to drop a value, you get an error:

const fn test<T>(t: T) {
} // ERROR!

Compiling that gives you the error:

error[E0493]: destructor of `T` cannot be evaluated at compile-time
 --> src/lib.rs:1:18
  |
1 | const fn test<T>(t: T) { }
  |                  ^       - value is dropped here
  |                  |
  |                  the destructor for this type cannot be evaluated in constant functions

This check is not presently taking place in borrow check but it could be.
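As a point of comparison, here is a tiny sketch of what stable Rust accepts today. The names pass_through and ANSWER are made up for the example. A const fn can take a generic T as long as it never drops it, which is loosely analogous to how a T: Move function would have to behave under this proposal.

```rust
/// Moving `t` back out means no destructor runs inside the `const fn`,
/// so the compiler accepts this for any `T`.
const fn pass_through<T>(t: T) -> T {
    t
}

/// The function can be evaluated at compile time...
const ANSWER: u32 = pass_through(42);
```

It can also be called at runtime with a type that does have a destructor, such as String, because the value is handed back to the caller instead of being dropped.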

The borrow checker would require “moved” values implement Move

The final part of the check would be requiring that “moved” values implement Move:

fn return_err<T: Pointee>(a: T) -> T {
    a // ERROR: `a` does not implement `Move`
}

You might think that having types that are !Move would replace the need for pin, but this is not the case. A pinned value is one that can never move again, whereas a value that is not Move can never be moved in the first place – at least once it is stored into a place.

I’m not sure if this part of the proposal makes sense; we could start by just having all types be Move, Destruct, or (the default) Forget.

Opting out from forget etc

The other part of the proposal is that you should be able to explicitly “opt out” from being forgettable, e.g. by doing

struct MyType {}
impl Destruct for MyType {}

Doing this will limit the generics that can accept your type, of course.
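To get a feel for that limitation, here is a rough emulation in today’s Rust. It is only a sketch: the four traits are ordinary marker traits I am declaring myself, with none of the default-bound or borrow-checker behavior from the proposal, and Plain, JoinGuard, accepts_destruct, and accepts_forget are hypothetical names.

```rust
// Ordinary marker traits standing in for the proposed hierarchy.
// Nothing here is special to the compiler; every impl is written by hand.
trait Pointee {}
trait Move: Pointee {}
trait Destruct: Move {}
trait Forget: Destruct {}

/// A type with all capabilities, like every type in Rust today.
struct Plain;
impl Pointee for Plain {}
impl Move for Plain {}
impl Destruct for Plain {}
impl Forget for Plain {}

/// A guard type that "opts out" of `Forget` by stopping at `Destruct`.
struct JoinGuard;
impl Pointee for JoinGuard {}
impl Move for JoinGuard {}
impl Destruct for JoinGuard {}

fn accepts_destruct<T: Destruct>(_value: &T) -> &'static str {
    "Destruct"
}

fn accepts_forget<T: Forget>(_value: &T) -> &'static str {
    "Forget"
}

// This call would not compile, because `JoinGuard: Forget` does not hold:
// accepts_forget(&JoinGuard);
```

Both types are accepted by accepts_destruct, but only Plain can be passed to accepts_forget. Under the proposal, every generic function that does not write a weaker bound would behave like accepts_forget.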

Associated type bounds

The tough part with these “default bound” proposals is always associated type bounds. For backwards compatibility, we’d have to default to Forget but a lot of associated types that exist in the wild today shouldn’t really require Forget. For example a trait like Add should really just require Move for its return type:

trait Add<Rhs = Self> {
    type Output /* : Move */;
}

I am basically not too worried about this. It’s possible that we can weaken these bounds over time or through editions. Or, perhaps, add in some kind of edition-specific “alias” like

trait Add2025<Rhs = Self> {
    type Output: Move;
}

where Add2025 is implemented for everything that implements Add.
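The “implemented for everything that implements Add” part can be sketched today with a blanket impl. This is only an illustration under my own assumptions: the method name add_2025 is invented, and the Move bound on Output cannot be written in current Rust, so it appears only as a comment.

```rust
use std::ops::Add;

/// Hypothetical edition-specific alias for `Add`.
trait Add2025<Rhs = Self> {
    // Under the proposal this would read `type Output: Move;`.
    type Output;

    fn add_2025(self, rhs: Rhs) -> Self::Output;
}

/// Blanket impl: anything that implements `Add` also implements `Add2025`.
impl<T, Rhs> Add2025<Rhs> for T
where
    T: Add<Rhs>,
{
    type Output = <T as Add<Rhs>>::Output;

    fn add_2025(self, rhs: Rhs) -> <T as Add<Rhs>>::Output {
        self + rhs
    }
}
```

With this in scope, 2_i32.add_2025(3_i32) simply forwards to the existing Add impl.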

I am not sure exactly how to manage it, but we’ll figure it out – and in the meantime, most of the types that should not be forgettable are really just “guard” types that don’t have to flow through quite so many places.

Associated type bounds in closures

The one place that I think it is really important that we weaken the associated type bounds is with closures – and, fortunately, that’s a place we can get away with it due to the way our “closure trait bound” syntax works. I feel like I wrote a post on this before, but I can’t find it now. The short version is that, today, when you write F: Fn(), that means that the closure must return (). If you write F: Fn() -> T, then this type T must have been declared somewhere else, and so T will (independently from the associated type of the Fn trait) get a default Forget bound. So since the Fn associated type is not independently nameable in stable Rust, we can change its bounds, and code like this would continue to work unchanged:

fn foo<T, F>()
where
    F: Fn() -> T,
    //         - `T: Forget` still holds by default
{}

Frequently asked questions

How does this relate to the recent thread on internals?

Recently I was pointed at this internals thread for a “substructural type system” which likely has very similar capabilities. To be totally honest, though, I haven’t had time to read and digest it yet! I had this blog post like 95% done though so I figured I’d post it first and then go try and compare.

What would it mean for a struct to opt out of Move (e.g., by being only Pointee)?

So, the system as I described would allow for ‘unmoveable’ types (i.e., a struct that opts out from everything and only permits Pointee), but such a struct would only really be something you could store in a static memory location. You couldn’t put it on the stack because the stack must eventually get popped. And you couldn’t move it from place to place because, well, it’s immobile.

This seems like something that could be useful – e.g., to model “video RAM” or something that lives in a specific location in memory and cannot live anywhere else – but it’s not a widespread need.

How would you handle destructors with arguments?

I imagine something like this:

struct Transaction {
    data: Vec<u8>
}

/// Opt out from destruct
impl Move for Transaction { }

impl Transaction {
    // This is effectively a "destructor"
    pub fn complete(
        self, 
        connection: Connection,
    ) {
        let Transaction { data } = self;
    }
}

With this setup, any function that owns a Transaction must eventually invoke transaction.complete(). This is because no values of this type can be dropped, so they must be moved.
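Until something like this exists, the closest approximation I know of in today’s Rust is a runtime check, sometimes called a “drop bomb”. The sketch below is my own stand-in, not the proposed mechanism: the Connection type, its sent field, the completed flag, and the choice to pass the connection by mutable reference are all assumptions made so the example is self-contained.

```rust
/// Hypothetical connection type; it just records what was sent.
struct Connection {
    sent: Vec<Vec<u8>>,
}

struct Transaction {
    data: Vec<u8>,
    completed: bool,
}

impl Transaction {
    fn new(data: Vec<u8>) -> Self {
        Transaction { data, completed: false }
    }

    /// The "destructor with arguments": consumes the transaction.
    fn complete(mut self, connection: &mut Connection) {
        connection.sent.push(std::mem::take(&mut self.data));
        self.completed = true;
    } // `self` is dropped here, with `completed == true`.
}

impl Drop for Transaction {
    fn drop(&mut self) {
        // Runtime stand-in for the compile-time error the proposal would give.
        if !self.completed && !std::thread::panicking() {
            panic!("Transaction dropped without calling `complete`");
        }
    }
}
```

The difference is when you find out: here a forgotten complete() call panics at runtime (and mem::forget still slips past it entirely), whereas with impl Move for Transaction the same mistake would be a compile error.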

How does this relate to async drop?

This setup attacks a key problem that has blocked async drop in my mind, which is that types that are “async drop” do not have to implement “sync drop”. This gives the type system the ability to prevent them from being dropped in sync code, then, and it would mean that they can only be dropped in async drop. But there’s still lots of design work to be done there.

Why is the trait Destruct and not Drop?

This comes from the const generics work. I don’t love it. But there is a logic to it. Right now, when you drop a struct or other value, that actually does a whole sequence of things, only one of which is running any Drop impl – it also (for example) drops all the fields in the struct recursively, etc. The idea is that “destruct” refers to this whole sequence.

How hard would this be to prototype?

I…don’t actually think it would be very hard. I’ve thought somewhat about it and all of the changes seem pretty straightforward. I would be keen to support a lang-team experiment on this.

Does this mean we should have had leak?

The whole topic of destructors and leaks and so forth dates back to approximately Rust 1.0, when we discovered that, in fact, our abstraction for threads was unsound when combined with cyclic ref-counted boxes. Before that we hadn’t fully internalized that destructors are “opt-out methods”. You can read this blog post I wrote at the time. At the time, the primary idea was to have some kind of ?Leak bounds and it was tied to the idea of references (so that all 'static data was assumed to be “leakable”, and hence something you could put into an Rc). I… mostly think we made the right call at the time. I think it’s good that most of the ecosystem is interoperable and that Rc doesn’t require static bounds, and certainly I think it’s good that we moved to 1.0 with minimal disruption. In any case, though, I rather prefer this design to the ones that were under discussion at the time, in part because it also addresses the need for different kinds of destructors and for destructors with many arguments and so forth, which wasn’t something we thought about then.

Isn’t it confusing to have these “magic” traits that “opt out” from default bounds?

I think that specifying the bounds you want is inherently better than today’s ? design, both because it’s easier to understand and because it allows us to backwards compatibly add traits in between in ways that are not possible with the ? design.

However, I do see that having T: Move mean that T: Destruct does not hold is subtle. I wonder if we should adopt some kind of sigil or convention on these traits, like T: @Move or something. I don’t know! Something to consider.

That was a great conference. Also, interestingly, this is one of my favorite of all my talks, but for some reason, I rarely reuse this material. I should change that. ↩︎
Academics distinguish “safety” from “liveness properties”, where safety means “bad things don’t happen” and “liveness” means “good things eventually happen”. Another way of saying this is that Rust’s type system helps with a lot of safety properties but struggles with liveness properties. ↩︎
Uh, citation needed. I know this is true but I can’t find the relevant WebAssembly issue where it is discussed. Help, internet! ↩︎
Really the DMA problem is the same as scoped threads. If you think about it, the embedded device writing to memory is basically the same as a parallel thread writing to memory. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/10/21/move-destruct-leak/

We need (at least) ergonomic, explicit handles

Oct 13, 2025 Updated Oct 13, 2025

Continuing my discussion on Ergonomic RC, I want to focus on the core question: should users have to explicitly invoke handle/clone, or not? This whole “Ergonomic RC” work was originally proposed by Dioxus and their answer is simple: definitely not. For the kind of high-level GUI applications they are building, having to call cx.handle() to clone a ref-counted value is pure noise. For that matter, for a lot of Rust apps, even cloning a string or a vector is no big deal. On the other hand, for a lot of applications, the answer is definitely yes – knowing where handles are created can impact performance, memory usage, and even correctness (don’t worry, I’ll give examples later in the post). So how do we reconcile this?

This blog argues that we should make it ergonomic to be explicit. This wasn’t always my position, but after an impactful conversation with Josh Triplett, I’ve come around. I think it aligns with what I once called the soul of Rust: we want to be ergonomic, yes, but we want to be ergonomic while giving control1.

I like Tyler Mandry’s Clarity of purpose construction, “Great code brings only the important characteristics of your application to your attention”. The key point is that there is great code in which cloning and handles are important characteristics, so we need to make that code possible to express nicely. This is particularly true since Rust is one of the very few languages that really targets that kind of low-level, foundational code.

This does not mean we cannot (later) support automatic clones and handles. It’s inarguable that this would benefit clarity of purpose for a lot of Rust code. But I think we should focus first on the harder case, the case where explicitness is needed, and get that as nice as we can; then we can circle back and decide whether to also support something automatic. One of the questions for me, in fact, is whether we can get “fully explicit” to be nice enough that we don’t really need the automatic version. There are benefits from having “one Rust”, where all code follows roughly the same patterns, where those patterns are perfect some of the time, and don’t suck too bad2 when they’re overkill.

“Rust should not surprise you.” (hat tip: Josh Triplett)

I mentioned this blog post resulted from a long conversation with Josh Triplett3. The key phrase that stuck with me from that conversation was: Rust should not surprise you. The way I think of it is like this. Every programmer knows what it’s like to have a marathon debugging session – to sit and stare at code for days and think, but… how is this even POSSIBLE? Those kinds of bug hunts can end in a few different ways. Occasionally you uncover a deeply satisfying, subtle bug in your logic. More often, you find that you wrote if foo and not if !foo. And occasionally you find out that your language was doing something that you didn’t expect. That some simple-looking code concealed a subtle, complex interaction. People often call this kind of thing a footgun.

Overall, Rust is remarkably good at avoiding footguns4. And part of how we’ve achieved that is by making sure that things you might need to know are visible – like, explicit in the source. Every time you see a Rust match, you don’t have to ask yourself “what cases might be missing here” – the compiler guarantees you they are all there. And when you see a call to a Rust function, you don’t have to ask yourself if it is fallible – you’ll see a ? if it is.5

Creating a handle can definitely “surprise” you

So I guess the question is: would you ever have to know about a ref-count increment? The tricky part is that the answer here is application dependent. For some low-level applications, definitely yes: an atomic reference count is a measurable cost. To be honest, I would wager that the set of applications where this is true is vanishingly small. And even in those applications, Rust already improves on the state of the art by giving you the ability to choose between Rc and Arc and then proving that you don’t mess it up.

But there are other reasons you might want to track reference counts, and those are less easy to dismiss. One of them is memory leaks. Rust, unlike GC’d languages, has deterministic destruction. This is cool, because it means that you can leverage destructors to manage all kinds of resources, as Yehuda wrote about long ago in his classic ode-to-RAII entitled “Rust means never having to close a socket”. But although the points where handles are created and destroyed are deterministic, the nature of reference-counting can make it much harder to predict when the underlying resource will actually get freed. And if those increments are not visible in your code, it is that much harder to track them down.

Just recently, I was debugging Symposium, which is written in Swift. Somehow I had two IPCManager instances when I only expected one, and each of them was responding to every IPC message, wreaking havoc. Poking around I found stray references floating around in some surprising places, which was causing the problem. Would this bug have still occurred if I had to write .handle() explicitly to increment the ref count? Definitely, yes. Would it have been easier to find after the fact? Also yes.6

Josh gave me a similar example from the “bytes” crate. A Bytes type is a handle to a slice of some underlying memory buffer. When you clone that handle, it will keep the entire backing buffer around. Sometimes you might prefer to copy your slice out into a separate buffer so that the underlying buffer can be freed. It’s not that hard for me to imagine trying to hunt down an errant handle that is keeping some large buffer alive and being very frustrated that I can’t see explicitly in the code where those handles are created.
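The effect is easy to reproduce with the standard library alone. The sketch below is a stand-in built on Rc<Vec<u8>>, not the actual implementation of the bytes crate; SliceHandle, copy_out, and buffer_lifetime_demo are hypothetical names.

```rust
use std::rc::Rc;

/// Stand-in for a `Bytes`-like handle: a view into a shared backing buffer.
struct SliceHandle {
    buffer: Rc<Vec<u8>>,
    start: usize,
    end: usize,
}

impl SliceHandle {
    fn as_slice(&self) -> &[u8] {
        &self.buffer[self.start..self.end]
    }

    /// Copies the viewed bytes into a fresh, independent allocation.
    fn copy_out(&self) -> Vec<u8> {
        self.as_slice().to_vec()
    }
}

/// Reports whether the 1 KiB buffer is still alive at two points in time,
/// plus the size of the copied-out data.
fn buffer_lifetime_demo() -> (bool, bool, usize) {
    let buffer = Rc::new(vec![7_u8; 1024]);
    let watcher = Rc::downgrade(&buffer);

    // A 4-byte view that nonetheless holds a handle to the whole buffer.
    let handle = SliceHandle { buffer: Rc::clone(&buffer), start: 0, end: 4 };
    drop(buffer);
    let alive_while_handle_exists = watcher.upgrade().is_some();

    // Copy the 4 bytes out, then release the handle.
    let copy = handle.copy_out();
    drop(handle);
    let alive_after_copying_out = watcher.upgrade().is_some();

    (alive_while_handle_exists, alive_after_copying_out, copy.len())
}
```

The four-byte view keeps the full kilobyte alive until the handle goes away. The Rc::clone call is the line you would want to be able to grep for when hunting down that kind of retention.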

A similar case occurs with APIs like Arc::get_mut7. get_mut takes an &mut Arc<T> and, if the ref-count is 1, returns an &mut T. This lets you take a shareable handle that you know is not actually being shared and recover uniqueness. This kind of API is not frequently used – but when you need it, it’s so nice it’s there.
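A short sketch of why handle creation matters to these APIs, using only std::sync::Arc. The function name unique_then_shared is made up for the example.

```rust
use std::sync::Arc;

/// Shows how creating one extra handle changes what `get_mut` and `make_mut` do.
fn unique_then_shared() -> (Vec<i32>, Vec<i32>, bool) {
    let mut a = Arc::new(vec![1, 2, 3]);

    // Ref-count is 1, so `get_mut` hands back `&mut Vec<i32>`.
    Arc::get_mut(&mut a).expect("no other handles yet").push(4);

    // Creating a second handle silently changes what `get_mut` returns.
    let b = Arc::clone(&a);
    let unique_after_clone = Arc::get_mut(&mut a).is_some();

    // `make_mut` always succeeds: with a ref-count of 2 it clones the data first,
    // so `a` now points at its own copy and `b` keeps the old contents.
    Arc::make_mut(&mut a).push(5);

    (a.to_vec(), b.to_vec(), unique_after_clone)
}
```

After the clone, get_mut returns None and make_mut has to copy the vector. If that Arc::clone were inserted automatically, the change in behavior would have no visible cause in the source.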

“What I love about Rust is its versatility: low to high in one language” (hat tip: Alex Crichton)

Entering the conversation with Josh, I was leaning towards a design where you had some form of automated cloning of handles and an allow-by-default lint that would let crates which don’t want that turn it off. But Josh convinced me that there is a significant class of applications that want handle creation to be ergonomic AND visible (i.e., explicit in the source). Low-level network services and even things like Rust For Linux likely fit this description, but any Rust application that uses get_mut or make_mut might also.

And this reminded me of something Alex Crichton once said to me. Unlike the other quotes here, it wasn’t in the context of ergonomic ref-counting, but rather when I was working on my first attempt at the “Rustacean Principles”. Alex was saying that he loved how Rust was great for low-level code but also worked well for high-level stuff like CLI tools and simple scripts.

I feel like you can interpret Alex’s quote in two ways, depending on what you choose to emphasize. You could hear it as, “It’s important that Rust is good for high-level use cases”. That is true, and it is what leads us to ask whether we should even make handles visible at all.

But you can also read Alex’s quote as, “It’s important that there’s one language that works well enough for both” – and I think that’s true too. The “true Rust gestalt” is when we manage to simultaneously give you the low-level control that grungy code needs but wrapped in a high-level package. This is the promise of zero-cost abstractions, of course, and Rust (in its best moments) delivers.

The “soul of Rust”: low-level enough for a kernel, usable enough for a GUI

Let’s be honest. High-level GUI programming is not Rust’s bread-and-butter, and it never will be; users will never confuse Rust for TypeScript. But then, TypeScript will never be in the Linux kernel.

The goal of Rust is to be a single language that can, by and large, be “good enough” for both extremes. The goal is to make enough low-level details visible for kernel hackers but do so in a way that is usable enough for a GUI. It ain’t easy, but it’s the job.

This isn’t the first time that Josh has pulled me back to this realization. The last time was in the context of async fn in dyn traits, and it led to a blog post talking about the “soul of Rust” and a followup going into greater detail. I think the catchphrase “low-level enough for a Kernel, usable enough for a GUI” kind of captures it.

Conclusion: Explicit handles should be the first step, but it doesn’t have to be the final step

There is a slight caveat I want to add. I think another part of Rust’s soul is preferring nuance to artificial simplicity (“as simple as possible, but no simpler”, as they say). And I think the reality is that there’s a huge set of applications that make new handles left-and-right (particularly but not exclusively in async land8) and where explicitly creating new handles is noise, not signal. This is why e.g. Swift9 makes ref-count increments invisible – and they get a big lift out of that!10 I’d wager most Swift users don’t even realize that Swift is not garbage-collected11.

But the key thing here is that even if we do add some way to make handle creation automatic, we ALSO want a mode where it is explicit and visible. So we might as well do that one first.

OK, I think I’ve made this point 3 ways from Sunday now, so I’ll stop. The next few blog posts in the series will dive into (at least) two options for how we might make handle creation and closures more ergonomic while retaining explicitness.

I see a potential candidate for a design axiom… rubs hands with an evil-sounding cackle and a look of glee ↩︎
It’s an industry term. ↩︎
Actually, by the standards of the conversations Josh and I often have, it wasn’t really all that long – an hour at most. ↩︎
Well, at least sync Rust is. I think async Rust has more than its share, particularly around cancellation, but that’s a topic for another blog post. ↩︎
Modulo panics, of course – and no surprise that accounting for panics is a major pain point for some Rust users. ↩︎
In this particular case, it was fairly easy for me to find regardless, but this application is very simple. I can definitely imagine ripgrep’ing around a codebase to find all increments being useful, and that would be much harder to do without an explicit signal they are occurring. ↩︎
Or Arc::make_mut, which is one of my favorite APIs. It takes an Arc<_> and gives you back mutable (i.e., unique) access to the internals, always! How is that possible, given that the ref count may not be 1? Answer: if the ref-count is not 1, then it clones it. This is perfect for copy-on-write-style code. So beautiful. 😍 ↩︎
My experience is that, due to language limitations we really should fix, many async constructs force you into 'static bounds which in turn force you into Rc and Arc where you’d otherwise have been able to use &. ↩︎
I’ve been writing more Swift and digging it. I have to say, I love how they are not afraid to “go big”. I admire the ambition I see in designs like SwiftUI and their approach to async. I don’t think they bat 100, but it’s cool they’re swinging for the stands. I want Rust to dare to ask for more! ↩︎
Well, not only that. They also allow class fields to be assigned when aliased which, to avoid stale references and iterator invalidation, means you have to move everything into ref-counted boxes and adopt persistent collections, which in turn comes at a performance cost and makes Swift a harder sell for lower-level foundational systems (though by no means a non-starter, in my opinion). ↩︎
Though I’d also wager that many eventually find themselves scratching their heads about a ref-count cycle. I’ve not dug into how Swift handles those, but I see references to “weak handles” flying around, so I assume they’ve not (yet?) adopted a cycle collector. To be clear, you can get a ref-count cycle in Rust too! It’s harder to do since we discourage interior mutability, but not that hard. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/10/13/ergonomic-explicit-handles/

SymmACP: extending Zed's ACP to support Composable Agents

Oct 8, 2025 Updated Oct 8, 2025

This post describes SymmACP – a proposed extension to Zed’s Agent Client Protocol that lets you build AI tools like Unix pipes or browser extensions. Want a better TUI? Found some cool slash commands on GitHub? Prefer a different backend? With SymmACP, you can mix and match these pieces and have them all work together without knowing about each other.

This is pretty different from how AI tools work today, where everything is a monolith – if you want to change one piece, you’re stuck rebuilding the whole thing from scratch. SymmACP allows you to build out new features and modes of interactions in a layered, interoperable way. This post explains how SymmACP would work by walking through a series of examples.

Right now, SymmACP is just a thought experiment. I’ve sketched these ideas to the Zed folks, and they seemed interested, but we still have to discuss the details in this post. My plan is to start prototyping in Symposium – if you think the ideas I’m discussing here are exciting, please join the Symposium Zulip and let’s talk!

“Composable agents” let you build features independently and then combine them

I’m going to explain the idea of “composable agents” by walking through a series of features. We’ll start with a basic CLI agent1 tool – basically a chat loop with access to some MCP servers so that it can read/write files and execute bash commands. Then we’ll show how you could add several features on top:

Addressing time-blindness by helping the agent know what time it is.
Injecting context and “personality” to the agent.
Spawning long-running, asynchronous tasks.
A copy of Q CLI’s /tangent mode that lets you do a bit of “off the books” work that gets removed from your history later.
Implementing Symposium’s interactive walkthroughs, which give the agent a richer vocabulary for communicating with you than just text.
Smarter tool delegation.

The magic trick is that each of these features will be developed as separate repositories. What’s more, they could be applied to any base tool you want, so long as it speaks SymmACP. And you could also combine them with different front-ends, such as a TUI, a web front-end, builtin support from Zed or IntelliJ, etc. Pretty neat.

My hope is that if we can centralize on SymmACP, or something like it, then we could move from everybody developing their own bespoke tools to an interoperable ecosystem of ideas that can build off of one another.

let mut SymmACP = ACP

SymmACP begins with ACP, so let’s explain what ACP is. ACP is a wonderfully simple protocol that lets you abstract over CLI agents. Imagine if you were using an agentic CLI tool except that, instead of communication over the terminal, the CLI tool communicates with a front-end over JSON-RPC messages, currently sent via stdin/stdout.

flowchart LR
    Editor <-.->|JSON-RPC via stdin/stdout| Agent[CLI Agent]

When you type something into the GUI, the editor sends a JSON-RPC message to the agent with what you typed. The agent responds with a stream of messages containing text and images. If the agent decides to invoke a tool, it can request permission by sending a JSON-RPC message back to the editor. And when the agent has completed, it responds to the editor with an “end turn” message that says “I’m ready for you to type something else now”.

sequenceDiagram
    participant E as Editor
    participant A as Agent
    participant T as Tool (MCP)
    
    E->>A: prompt("Help me debug this code")
    A->>E: request_permission("Read file main.rs")
    E->>A: permission_granted
    A->>T: read_file("main.rs")
    T->>A: file_contents
    A->>E: text_chunk("I can see the issue...")
    A->>E: text_chunk("The problem is on line 42...")
    A->>E: end_turn
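
The turn in the diagram above can be modeled as a small set of message types. This is a minimal sketch, not the real ACP wire format: the enum names, variant names, and the `collect_turn` helper are all made up for illustration.

```rust
/// Hypothetical message shapes, loosely following the diagram above.
/// These names are illustrative only, not the real ACP schema.
#[derive(Debug, Clone, PartialEq)]
pub enum EditorToAgent {
    Prompt(String),
    PermissionGranted,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AgentToEditor {
    RequestPermission(String),
    TextChunk(String),
    EndTurn,
}

/// Collects the text shown to the user for one turn, stopping at `EndTurn`.
/// Returns `None` if the stream ended without the agent ending its turn.
pub fn collect_turn(messages: &[AgentToEditor]) -> Option<String> {
    let mut text = String::new();
    for message in messages {
        match message {
            AgentToEditor::TextChunk(chunk) => text.push_str(chunk),
            // A real editor would ask the user for permission here.
            AgentToEditor::RequestPermission(_) => {}
            AgentToEditor::EndTurn => return Some(text),
        }
    }
    None
}
```

The point of the sketch is that "end turn" is just another message: the editor keeps consuming the stream until it sees one.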

Telling the agent what time it is

OK, let’s tackle our first feature. If you’ve used a CLI agent, you may have noticed that they don’t know what time it is – or even what year it is. This may sound trivial, but it can lead to some real mistakes. For example, they may not realize that some information is outdated. Or when they do web searches for information, they can search for the wrong thing: I’ve seen CLI agents search the web for “API updates in 2024” for example, even though it is 2025.

To fix this, many CLI agents will inject some extra text along with your prompt, something like <current-date date="2025-10-08" time="HH:MM:SS"/>. This gives the LLM the context it needs.

So how could we use ACP to build that? The idea is to create a proxy. This proxy would wrap the original ACP server:

flowchart LR
    Editor[Editor/VSCode] <-->|ACP| Proxy[Datetime Proxy] <-->|ACP| Agent[CLI Agent]

This proxy will take every “prompt” message it receives and decorate it with the date and time:

sequenceDiagram
    participant E as Editor
    participant P as Proxy
    participant A as Agent
    
    E->>P: prompt("What day is it?")
    P->>A: prompt("<current-date .../> What day is it?")
    A->>P: text_chunk("It is 2025-10-08.")
    P->>E: text_chunk("It is 2025-10-08.")
    A->>P: end_turn
    P->>E: end_turn
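
The decoration step itself is tiny. Here is a minimal sketch, assuming the proxy already has the date and time as strings (the function name `decorate_prompt` is hypothetical, and a real proxy would read the clock itself):

```rust
/// Prepends a `<current-date .../>` tag to a prompt before forwarding it.
/// The caller supplies the date and time so the sketch stays free of
/// clock and formatting dependencies.
pub fn decorate_prompt(prompt: &str, date: &str, time: &str) -> String {
    format!("<current-date date=\"{date}\" time=\"{time}\"/> {prompt}")
}
```

Every other message would pass through the proxy unchanged, in both directions.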

Simple, right? And of course this can be used with any editor and any ACP-speaking tool.

Next feature: Injecting “personality” to the agent

Let’s look at another feature that basically “falls out” from ACP: injecting personality. Most agents give you the ability to configure “context” in various ways – or what Claude Code calls memory. This is useful, but I and others have noticed that if what you want is to change how Claude “behaves” – i.e., to make it more collaborative – it’s not really enough. You really need to kick off the conversation by reinforcing that pattern.

In Symposium, the “yiasou” prompt (also available as “hi”, for those of you who don’t speak Greek 😛) is meant to be run as the first thing in the conversation. But there’s nothing an MCP server can do to ensure that the user kicks off the conversation with /symposium:hi or something similar. Of course, if Symposium were implemented as an ACP Server, we absolutely could do that:

sequenceDiagram
    participant E as Editor
    participant P as Proxy
    participant A as Agent
    
    E->>P: prompt("I'd like to work on my document")
    P->>A: prompt("/symposium:hi")
    A->>P: end_turn
    P->>A: prompt("I'd like to work on my document")
    A->>P: text_chunk("Sure! What document is that?") 
    P->>E: text_chunk("Sure! What document is that?") 
    A->>P: end_turn
    P->>E: end_turn
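
The proxy in this diagram only needs one bit of state: whether it has greeted the agent yet. A minimal sketch, where `/symposium:hi` comes from the text above and the `GreetingProxy` type and its method names are hypothetical:

```rust
/// A proxy that makes sure the agent sees a greeting prompt before the
/// user's first real prompt.
pub struct GreetingProxy {
    greeted: bool,
}

impl GreetingProxy {
    pub fn new() -> Self {
        GreetingProxy { greeted: false }
    }

    /// Returns the prompts to forward to the agent, in order.
    pub fn on_prompt(&mut self, user_prompt: &str) -> Vec<String> {
        let mut forwarded = Vec::new();
        if !self.greeted {
            self.greeted = true;
            forwarded.push("/symposium:hi".to_string());
        }
        forwarded.push(user_prompt.to_string());
        forwarded
    }
}
```

The sketch leaves out two things the diagram shows: the proxy waits for the agent's `end_turn` before sending the second prompt, and it does not pass the greeting's output back to the editor.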

Proxies are a better version of hooks

Some of you may be saying, “hmm, isn’t that what hooks are for?” And yes, you could do this with hooks, but there are two problems with that. First, hooks are non-standard, so you have to do it differently for every agent.

The second problem with hooks is that they’re fundamentally limited to what the hook designer envisioned you might want. You only get hooks at the places in the workflow that the tool gives you, and you can only control what the tool lets you control. The next feature starts to show what I mean: as far as I know, it cannot readily be implemented with hooks the way I would want it to work.

Next feature: long-running, asynchronous tasks

Let’s move on to our next feature, long-running asynchronous tasks. This feature is going to have to go beyond the current capabilities of ACP into the expanded “SymmACP” feature set.

Right now, when the server invokes an MCP tool, it executes in a blocking way. But sometimes the task it is performing might be long and complicated. What you would really like is a way to “start” the task and then go back to working. When the task is complete, you (and the agent) could be notified.

This comes up for me a lot with “deep research”. A big part of my workflow is that, when I get stuck on something I don’t understand, I deploy a research agent to scour the web for information. Usually what I will do is ask the agent I’m collaborating with to prepare a research prompt summarizing the things we tried, what obstacles we hit, and other details that seem relevant. Then I’ll pop over to claude.ai or Gemini Deep Research and paste in the prompt. This will run for 5-10 minutes and generate a markdown report in response. I’ll download that and give it to my agent. Very often this lets us solve the problem.2

This research flow works well but it is tedious and requires me to copy-and-paste. What I would ideally want is an MCP tool that does the search for me and, when the results are done, hands them off to the agent so it can start processing immediately. But in the meantime, I’d like to be able to continue working with the agent while we wait. Unfortunately, the protocol for tools provides no mechanism for asynchronous notifications like this, from what I can tell.

SymmACP += tool invocations + unprompted sends

So how would I do it with SymmACP? Well, I would want to extend the ACP protocol as it is today in two ways:

I’d like the ACP proxy to be able to provide tools that the proxy will execute. Today, the agent is responsible for executing all tools; the ACP protocol only comes into play when requesting permission. But it’d be trivial to have MCP tools where, to execute the tool, the agent sends back a message over ACP instead.
I’d like to have a way for the agent to initiate responses to the editor. Right now, the editor always initiates each communication session with a prompt; but, in this case, the agent might want to send messages back unprompted.

In that case, we could implement our Research Proxy like so:

sequenceDiagram
    participant E as Editor
    participant P as Proxy
    participant A as Agent
    
    E->>P: prompt("Why is Rust so great?")
    P->>A: prompt("Why is Rust so great?")
    A->>P: invoke tool("begin_research")
    activate P
    P->>A: ok
    A->>P: "I'm looking into it!"
    P->>E: "I'm looking into it!"
    A->>P: end_turn
    P->>E: end_turn

    Note over E,A: Time passes (5-10 minutes) and the user keeps working...
    Note over P: Research completes in background
    
    P->>A: <research-complete/>
    deactivate P
    A->>P: "Research says Rust is fast"
    P->>E: "Research says Rust is fast"
    A->>P: end_turn
    P->>E: end_turn
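
The "start the task, return right away, notify later" shape can be sketched with a background thread and a channel. This is only an illustration under stated assumptions: `begin_research` is a hypothetical function, the sleep stands in for the real research job, and the `question` attribute on the notification is invented for the example.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::Duration;

/// Starts "research" on a background thread and returns immediately.
/// The returned receiver yields one notification when the work finishes.
pub fn begin_research(question: &str, work_time: Duration) -> (String, Receiver<String>) {
    let (tx, rx) = mpsc::channel();
    let question = question.to_string();
    thread::spawn(move || {
        // Stand-in for the real research job.
        thread::sleep(work_time);
        let report = format!("<research-complete question=\"{question}\"/>");
        // If the receiver is gone, nobody is waiting; ignore the error.
        let _ = tx.send(report);
    });
    // The tool call itself completes right away, as in the diagram.
    ("ok".to_string(), rx)
}
```

The proxy would hold on to the receiver and, once a message arrives, forward it to the agent as an unprompted send.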

What’s cool about this is that the proxy encapsulates the entire flow: it knows how to do the research, and it manages notifying the various participants when the research completes. (Also, this leans on one detail I left out, which is that )

Next feature: tangent mode

Let’s explore our next feature, Q CLI’s /tangent mode. This feature is interesting because it’s a simple (but useful!) example of history editing. The way /tangent works is that, when you first type /tangent, Q CLI saves your current state. You can then continue as normal but when you next type /tangent, your state is restored to where you were. This, as the name suggests, lets you explore a side conversation without polluting your main context.

The basic idea for supporting tangent in SymmACP is that the proxy is going to (a) intercept the tangent prompt and remember where it began; (b) allow the conversation to continue as normal; and then (c) when it’s time to end the tangent, create a new session and replay the history up until the point of the tangent3.

SymmACP += replay

You can almost implement “tangent” in ACP as it is, but not quite. In ACP, the agent always owns the session history. The editor can create a new session or load an older one; when loading an older one, the agent “replays” the events so that the editor can reconstruct the GUI. But there is no way for the editor to “replay” or construct a session to the agent. Instead, the editor can only send prompts, which will cause the agent to reply. In this case, what we want is to be able to say “create a new chat in which I said this and you responded that” so that we can set up the initial state. This way we could easily create a new session that contains the messages from the old one.

So here is how this would work:

sequenceDiagram
    participant E as Editor
    participant P as Proxy
    participant A as Agent
    
    E->>P: prompt("Hi there!")
    P->>A: prompt("Hi there!")

    Note over E,A: Conversation proceeds
    
    E->>P: prompt("/tangent")
    Note over P: Proxy notes conversation state
    P->>E: end_turn
    E->>P: prompt("btw, ...")
    P->>A: prompt("btw, ...")

    Note over E,A: Conversation proceeds
    
    E->>P: prompt("/tangent")
    
    P->>A: new_session
    P->>A: prompt("Hi there!")    
    Note over P,A: ...Proxy replays conversation...
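
The proxy's bookkeeping for this flow can be sketched as a small state machine. The `Action` and `TangentProxy` names are hypothetical, and the sketch is simplified: it records only the user's prompts, whereas a real replay would also carry the agent's responses.

```rust
/// What the proxy should do in response to one editor prompt.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Forward the prompt to the agent's current session.
    Forward(String),
    /// Tangent started: nothing goes to the agent, the turn just ends.
    EndTurn,
    /// Tangent ended: open a new session and replay these prompts.
    NewSessionWithReplay(Vec<String>),
}

#[derive(Default)]
pub struct TangentProxy {
    history: Vec<String>,
    /// Length of `history` when the tangent began, if one is active.
    tangent_start: Option<usize>,
}

impl TangentProxy {
    pub fn on_prompt(&mut self, prompt: &str) -> Action {
        if prompt != "/tangent" {
            self.history.push(prompt.to_string());
            return Action::Forward(prompt.to_string());
        }
        match self.tangent_start.take() {
            // First `/tangent`: remember where the tangent began.
            None => {
                self.tangent_start = Some(self.history.len());
                Action::EndTurn
            }
            // Second `/tangent`: drop the tangent and replay the rest.
            Some(start) => {
                self.history.truncate(start);
                Action::NewSessionWithReplay(self.history.clone())
            }
        }
    }
}
```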

Next feature: interactive walkthroughs

One of the nicer features of Symposium is the ability to do interactive walkthroughs. These consist of an HTML sidebar as well as inline comments in the code:

Right now, this is implemented by a kind of hacky dance:

The agent invokes an MCP tool and sends it the walkthrough in markdown. This markdown includes comments meant to be placed on particular lines, identified not by line number (agents are bad at line numbers) but by symbol names or search strings.
The MCP tool parses the markdown, determines the line numbers for comments, and creates HTML. It sends that HTML over IPC to the VSCode extension.
The VSCode extension receives the IPC message, displays the HTML in the sidebar, and creates the comments in the code.

It works, but it’s a giant Rube Goldberg machine.

SymmACP += Enriched conversation history

With SymmACP, we would structure the walkthrough mechanism as a proxy. Just as today, it would provide an MCP tool to the agent to receive the walkthrough markdown. It would then convert that into the HTML to display on the side along with the various comments to embed in the code. But this is where things are different.

Instead of sending that content over IPC, what I would want to do is to make it possible for proxies to deliver extra information along with the chat. This is relatively easy to do in ACP as is, since it provides for various capabilities, but I think I’d want to go one step further.

I would have a proxy layer that manages walkthroughs. As we saw before, it would provide a tool. But there’d be one additional thing, which is that, beyond just a chat history, it would be able to convey additional state. I think the basic conversation structure is like:

Conversation
- Turn
  - User prompt(s) – could be zero or more
  - Response(s) – could be zero or more
  - Tool use(s) – could be zero or more

but I think it’d be useful to (a) be able to attach metadata to any of those things, e.g., to add extra context about the conversation or about a specific turn (or even a specific prompt), and (b) support additional kinds of events. For example, tool approvals are an event. And presenting a walkthrough and adding annotations are events too.
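
One way to picture that structure is as plain data types with a metadata map at every level. This is a sketch, not a proposed schema: every type and field name here is hypothetical, and metadata is kept as string pairs for simplicity.

```rust
use std::collections::BTreeMap;

/// Free-form metadata; string keys and values keep the sketch simple.
pub type Metadata = BTreeMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
pub enum EventKind {
    UserPrompt(String),
    Response(String),
    ToolUse { tool: String },
    ToolApproval { tool: String, approved: bool },
    Walkthrough { markdown: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub kind: EventKind,
    pub metadata: Metadata,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Turn {
    pub events: Vec<Event>,
    pub metadata: Metadata,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Conversation {
    pub turns: Vec<Turn>,
    pub metadata: Metadata,
}

impl Event {
    pub fn new(kind: EventKind) -> Self {
        Event { kind, metadata: Metadata::new() }
    }

    /// Builder-style helper to attach one metadata entry.
    pub fn with_meta(mut self, key: &str, value: &str) -> Self {
        self.metadata.insert(key.to_string(), value.to_string());
        self
    }
}
```

Treating prompts, responses, tool uses, approvals, and walkthroughs uniformly as events means a proxy can add a new kind of event without changing the shape of a turn.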

The way I imagine it, one of the core things in SymmACP would be the ability to serialize your state to JSON. You’d be able to ask a SymmACP participant to summarize a session. They would in turn ask any delegates to summarize and then add their own metadata along the way. You could also send the request in the other direction – e.g., the agent might present its state to the editor and ask it to augment it.

Enriched history would let walkthroughs be extra metadata

This would mean a walkthrough proxy could add extra metadata into the chat transcript like “the current walkthrough” and “the current comments that are in place”. Then the editor would either know about that metadata or not. If it doesn’t, you wouldn’t see it in your chat. Oh well – or perhaps we do something HTML like, where there’s a way to “degrade gracefully” (e.g., the walkthrough could be presented as a regular “response” but with some metadata that, if you know to look, tells you to interpret it differently). But if the editor DOES know about the metadata, it interprets it specially, throwing the walkthrough up in a panel and adding the comments into the code.
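
The "degrade gracefully" option could look something like the sketch below. The `Response` and `Rendering` types and the `"walkthrough"` tag are assumptions made up for this example.

```rust
/// A response as delivered to the editor: plain text plus an optional
/// metadata tag saying how to interpret it. Field names are hypothetical.
pub struct Response {
    pub text: String,
    pub kind: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum Rendering {
    ChatMessage(String),
    WalkthroughPanel(String),
}

/// An editor that understands the "walkthrough" tag opens a panel;
/// any other editor (or tag) falls back to an ordinary chat message.
pub fn render(response: &Response, understands_walkthroughs: bool) -> Rendering {
    match response.kind.as_deref() {
        Some("walkthrough") if understands_walkthroughs => {
            Rendering::WalkthroughPanel(response.text.clone())
        }
        _ => Rendering::ChatMessage(response.text.clone()),
    }
}
```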

With enriched histories, I think we can even say that in SymmACP, the ability to load, save, and persist sessions itself becomes an extension, something that can be implemented by a proxy; the base protocol only needs the ability to conduct and serialize a conversation.

Final feature: Smarter tool delegation.

Let me sketch out another feature that I’ve been noodling on that I think would be pretty cool. It’s well known that there’s a problem that LLMs get confused when there are too many MCP tools available. They get distracted. And that’s sensible, so would I, if I were given a phonebook-size list of possible things I could do and asked to figure something out. I’d probably just ignore it.

But how do humans deal with this? Well, we don’t take the whole phonebook – we get a shorter list of categories of options and then we drill down. So I go to the File Menu and then I get a list of options, not a flat list of commands.

I wanted to try building an MCP tool for IDE capabilities that was similar. There are a bajillion things that a modern IDE can “do”. It can find references. It can find definitions. It can get type hints. It can do renames. It can extract methods. In fact, the list is even open-ended, since extensions can provide their own commands. I don’t know what all those things are but I have a sense for the kinds of things an IDE can do – and I suspect models do too.

What if you gave them a single tool, “IDE operation”, and they could use plain English to describe what they want? e.g., ide_operation("find definition for the ProxyHandler that refers to HTTP proxies"). Hmm, this is sounding a lot like a delegate, or a sub-agent. Because now you need to use a second LLM to interpret that request – you probably want to do something like, give it a list of suggested IDE capabilities and the ability to find out full details and ask it to come up with a plan (or maybe directly execute the tools) to find the answer.

As it happens, MCP has a capability to enable tools to do this – it’s called (somewhat oddly, in my opinion) “sampling”. It allows for “callbacks” from the MCP tool to the LLM. But literally nobody implements it, from what I can tell.4 But sampling is kind of limited anyway. With SymmACP, I think you could do much more interesting things.

SymmACP.contains(simultaneous_sessions)

The key is that ACP already permits a single agent to “serve up” many simultaneous sessions. So if I have a proxy, perhaps one supplying an MCP tool definition, I could use it to start fresh sessions. Combine that with the “history replay” capability I mentioned above, and the tool can control exactly what context to bring over into that session to start from, which is very cool (that’s a challenge for MCP servers today: they don’t get access to the conversation history).

sequenceDiagram
    participant E as Editor
    participant P as Proxy
    participant A as Agent
    
    A->>P: ide_operation("...")
    activate P
    P->>A: new_session
    activate P
    activate A
    P->>A: prompt("Using these primitive operations, suggest a way to do '...'")
    A->>P: ...
    A->>P: end_turn
    deactivate P
    deactivate A
    Note over P: performs the plan
    P->>A: result from tool
    deactivate P
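
Here is a rough sketch of that flow from the proxy's point of view. The `Agent` trait, the `ide_operation` function, and the `EchoAgent` stand-in are all hypothetical; they exist only to show a tool call opening a side session with a chosen context.

```rust
/// The slice of agent behavior the proxy needs.
pub trait Agent {
    /// Opens a fresh session seeded with the given history; returns its id.
    fn new_session(&mut self, seed_history: Vec<String>) -> usize;
    /// Sends a prompt to a session and returns the full response text.
    fn prompt(&mut self, session: usize, prompt: &str) -> String;
}

/// Handles an `ide_operation("...")` call by asking a side session for a plan.
/// `primitives` is the list of IDE capabilities offered to the planner.
pub fn ide_operation<A: Agent>(
    agent: &mut A,
    request: &str,
    primitives: &[&str],
    context: Vec<String>,
) -> String {
    let session = agent.new_session(context);
    let prompt = format!(
        "Using these primitive operations ({}), suggest a way to do '{}'",
        primitives.join(", "),
        request
    );
    let plan = agent.prompt(session, &prompt);
    // A real proxy would now perform the plan; here we just report it.
    format!("plan: {plan}")
}

/// A stand-in agent for trying the sketch: it records sessions and echoes.
#[derive(Default)]
pub struct EchoAgent {
    pub sessions: Vec<Vec<String>>,
}

impl Agent for EchoAgent {
    fn new_session(&mut self, seed_history: Vec<String>) -> usize {
        self.sessions.push(seed_history);
        self.sessions.len() - 1
    }

    fn prompt(&mut self, session: usize, prompt: &str) -> String {
        self.sessions[session].push(prompt.to_string());
        format!("echo: {prompt}")
    }
}
```

Because the proxy chooses `context`, it decides exactly how much of the main conversation the side session gets to see.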

Conclusion

Ok, this post sketched a variant on ACP that I call SymmACP. SymmACP extends ACP with

the ability for either side to provide the initial state of a conversation, not just the server
the ability for an “editor” to provide an MCP tool to the “agent”
the ability for agents to respond without an initial prompt
the ability to serialize conversations and attach extra state (already kind of present)

Most of these are modest extensions to ACP, in my opinion, and easily doable in a backwards-compatible fashion just by adding new capabilities. But together they unlock the ability for anyone to craft extensions to agents and deploy them in a composable way. I am super excited about this. This is exactly what I wanted Symposium to be all about.

It’s worth noting the old adage: “with great power, comes great responsibility”. These proxies and ACP layers I’ve been talking about are really like IDE extensions. They can effectively do anything you could do. There are obvious security concerns. Though I think that approaches like Microsoft’s Wassette are key here – it’d be awesome to have a “capability-based” notion of what a “proxy layer” is, where everything compiles to WASM, and where users can tune what a given proxy can actually do.

I plan to start sketching a plan to drive this work in Symposium and elsewhere. My goal is to have a completely open and interoperable client, one that can be based on any agent (including local ones) and where you can pick and choose which parts you want to use. I expect to build out lots of custom functionality to support Rust development (e.g., explaining and diagnosing trait errors using the new trait solver is high on my list…and macro errors…) but also to have other features like walkthroughs, collaborative interaction style, etc that are all language independent – and I’d love to see language-focused features for other languages, especially Python and TypeScript (because “the new trifecta”) and Swift and Kotlin (because mobile). If that vision excites you, come join the Symposium Zulip and let’s chat!

Appendix: A guide to the agent protocols I’m aware of

One question I’ve gotten when discussing this is how it compares to the other host of protocols out there. Let me give a brief overview of the related work and how I understand its pros and cons:

Model context protocol (MCP): The queen of them all. A protocol that provides a set of tools, prompts, and resources up to the agent. Agents can invoke tools by supplying appropriate parameters, which are JSON. Prompts are shorthands that users can invoke using special commands like / or @, they are essentially macros that expand “as if the user typed it” (but they can also have parameters and be dynamically constructed). Resources are just data that can be requested. MCP servers can either be local or hosted remotely. Remote MCP has only recently become an option and auth in particular is limited.
- Comparison to SymmACP: MCP provides tools that the agent can invoke. SymmACP builds on it by allowing those tools to be provided by outer layers in the proxy chain. SymmACP is oriented at controlling the whole chat “experience”.
Zed’s Agent Client Protocol (ACP): The basis for SymmACP. Allows editors to create and manage sessions. Focused only on local sessions, since your editor runs locally.
- Comparison to SymmACP: That’s what this post is all about! SymmACP extends ACP with new capabilities that let intermediate layers manipulate history, provide tools, and provide extended data upstream to support richer interaction patterns than just chat. PS I expect we may want to support more remote capabilities, but it’s kinda orthogonal in my opinion (e.g., I’d like to be able to work with an agent running over in a cloud-hosted workstation, but I’d probably piggyback on ssh for that).
Google’s Agent-to-Agent Protocol (A2A) and IBM’s Agent Communication Protocol (ACP)5: From what I can tell, Google’s “agent-to-agent” protocol is kinda like a mix of MCP and OpenAPI. You can ping agents that are running remotely and get them to send you “agent cards”, which describe what operations they can perform, how you authenticate, and other stuff like that. It looks to me quite similar to MCP except that it has richer support for remote execution and in particular supports things like long-running communication, where an agent may need to go off and work for a while and then ping you back on a webhook.
- Comparison to MCP: To me, A2A looks like a variant of MCP that is more geared to remote execution. MCP has a method for tool discovery where you ping the server to get a list of tools; A2A has a similar mechanism with Agent Cards. MCP can run locally, which A2A cannot afaik, but A2A has more options about auth. MCP can only be invoked synchronously, whereas A2A supports long-running operations, progress updates, and callbacks. It seems like the two could be merged to make a single whole.
- Comparison to SymmACP: I think A2A is orthogonal to SymmACP. A2A is geared to agents that provide services to one another. SymmACP is geared towards building new development tools for interacting with agents. It’s possible you could build something like SymmACP on A2A but I don’t know what you would really gain by it (and I think it’d be easy to do later).

Everybody uses agents in various ways. I like Simon Willison’s “agents are models using tools in a loop” definition; I feel that an “agentic CLI tool” fits that definition, it’s just that part of the loop is reading input from the user. I think “fully autonomous” agents are a subset of all agents – many agent processes interact with the outside world via tools etc. From a certain POV, you can view the agent “ending the turn” as invoking a tool for “gimme the next prompt”. ↩︎
Research reports are a major part of how I avoid hallucination. You can see an example of one such report I commissioned on the details of the Language Server Protocol here; if we were about to embark on something that required detailed knowledge of LSP, I would ask the agent to read that report first. ↩︎
Alternatively: clear the session history and rebuild it, but I kind of prefer the functional view of the world, where a given session never changes. ↩︎
I started an implementation for Q CLI but got distracted – and, for reasons that should be obvious, I’ve started to lose interest. ↩︎
Yes, you read that right. There is another ACP. Just a mite confusing when you google search. =) ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/10/08/symmacp/

The Handle trait

Oct 7, 2025 Updated Oct 7, 2025

There’s been a lot of discussion lately around ergonomic ref-counting. We had a lang-team design meeting and then a quite impactful discussion at the RustConf Unconf. I’ve been working for weeks on a follow-up post but today I realized what should’ve been obvious from the start – that if I’m taking that long to write a post, it means the post is too damned long. So I’m going to work through a series of smaller posts focused on individual takeaways and thoughts. And for the first one, I want to (a) bring back some of the context and (b) talk about an interesting question, what should we call the trait. My proposal, as the title suggests, is Handle – but I get ahead of myself.

The story thus far

For those of you who haven’t been following, there’s been an ongoing discussion about how best to have ergonomic ref counting:

It began with the first Rust Project Goals program in 2024H2, where Jonathan Kelley from Dioxus wrote a thoughtful blog post about a path to high-level Rust that eventually became a 2024H2 project goal towards ergonomic ref-counting.
I wrote a series of blog posts about a trait I called Claim.
Josh and I talked and Josh opened RFC #3680, which proposed a use keyword and use || closures. Reception, I would say, was mixed; yes, this is tackling a real problem, but there were lots of concerns on the approach. I summarized the key points here.
Santiago implemented experimental support for (a variant of) RFC #3680 as part of the 2025H1 project goal.
I authored a 2025H2 project goal proposing that we create an alternative RFC focused on higher-level use-cases, which prompted Josh and me to have a long and fruitful conversation in which he convinced me that this was not the right approach.
We had a lang-team design meeting on 2025-08-27 in which I presented this survey and summary of the work done thus far.
And then at the RustConf 2025 Unconf we had a big group discussion on the topic that I found very fruitful, as well as various follow-up conversations with smaller groups.

This blog post is about “the trait”

The focus of this blog post is on one particular question: what should we call “The Trait”. In virtually every design, there has been some kind of trait that is meant to identify something. But it’s been hard to get a handle1 on what precisely that something is. What is this trait for and what types should implement it? Some things are clear: whatever The Trait is, Rc<T> and Arc<T> should implement it, for example, but that’s about it.

My original proposal was for a trait named Claim that was meant to convey a “lightweight clone” – but really the trait was meant to replace Copy as the definition of which clones ought to be explicit 2. Jonathan Kelley had a similar proposal but called it Capture. In RFC #3680 the proposal was to call the trait Use.

The details and intent varied, but all of these attempts had one thing in common: they were very operational. That is, the trait was always being defined in terms of what it does (or doesn’t do) but not why it does it. And that I think will always be a weak grounding for a trait like this, prone to confusion and different interpretations. For example, what is a “lightweight” clone? Is it O(1)? But what about things that are O(1) with very high probability? And of course, O(1) doesn’t mean cheap – it might copy 22GB of data every call. That’s O(1).

What you want is a trait where it’s fairly clear when it should and should not be implemented and not based on taste or subjective criteria. And Claim and friends did not meet the bar: in the Unconf, several new Rust users spoke up and said they found it very hard, based on my explanations, to judge whether their types ought to implement The Trait (whatever we call it). That has also been a persistent theme from the RFC and elsewhere.

“Shouldn’t we call it share?” (hat tip: Jack Huey)

But really there is a semantic underpinning here, and it was Jack Huey who first suggested it. Consider this question. What are the differences between cloning a Mutex<Vec<u32>> and an Arc<Mutex<Vec<u32>>>?

One difference, of course, is cost. Cloning the Mutex<Vec<u32>> will deep-clone the vector; cloning the Arc will just increment a reference count.

But the more important difference is what I call “entanglement”. When you clone the Arc, you don’t get a new value – you get back a second handle to the same value.3

Entanglement changes the meaning of the program

Knowing which values are “entangled” is key to understanding what your program does. A big part of how the borrow checker4 achieves reliability is by reducing “entanglement”, since it becomes a relative pain to work with in Rust.

Consider the following code. What will be the values of l_before and l_after?

let l_before = v1.len();
let v2 = v1.clone();
v2.push(new_value);
let l_after = v1.len();

The answer, of course, is “depends on the type of v1”. If v1 is a Vec, then l_after == l_before. But if v1 is, say, a struct like this one:

struct SharedVec<T> {
    data: Arc<Mutex<Vec<T>>>
}

impl<T> SharedVec<T> {
    pub fn push(&self, value: T) {
        self.data.lock().unwrap().push(value);
    }

    pub fn len(&self) -> usize {
        self.data.lock().unwrap().len()
    }
}

then l_after == l_before + 1.
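
To see both outcomes side by side, here is a self-contained version of the example. The `new` constructor, the hand-written `Clone` impl, and the two helper functions are additions for the sake of a runnable sketch; they are not part of the snippet above.

```rust
use std::sync::{Arc, Mutex};

pub struct SharedVec<T> {
    data: Arc<Mutex<Vec<T>>>,
}

impl<T> SharedVec<T> {
    pub fn new() -> Self {
        SharedVec { data: Arc::new(Mutex::new(Vec::new())) }
    }

    pub fn push(&self, value: T) {
        self.data.lock().unwrap().push(value);
    }

    pub fn len(&self) -> usize {
        self.data.lock().unwrap().len()
    }
}

// Written by hand so that cloning does not require `T: Clone`:
// only the `Arc` is cloned, never the elements.
impl<T> Clone for SharedVec<T> {
    fn clone(&self) -> Self {
        SharedVec { data: Arc::clone(&self.data) }
    }
}

/// Returns (l_before, l_after) for a plain `Vec`.
pub fn with_vec() -> (usize, usize) {
    let v1: Vec<u32> = vec![1, 2, 3];
    let l_before = v1.len();
    let mut v2 = v1.clone();
    v2.push(4);
    (l_before, v1.len())
}

/// Returns (l_before, l_after) for the entangled `SharedVec`.
pub fn with_shared_vec() -> (usize, usize) {
    let v1 = SharedVec::new();
    v1.push(1u32);
    let l_before = v1.len();
    let v2 = v1.clone();
    v2.push(2);
    (l_before, v1.len())
}
```

With the `Vec`, the push into `v2` leaves `v1` untouched; with the `SharedVec`, the same three lines change what `v1.len()` returns.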

There are many types that act like a SharedVec: it’s true for Rc and Arc, of course, but also for things like Bytes and channel endpoints like Sender. All of these are examples of “handles” to underlying values and, when you clone them, you get back a second handle that is indistinguishable from the first one.

We have a name for this concept already: handles

Jack’s insight was that we should focus on the semantic concept (sharing) and not on the operational details (how it’s implemented). This makes it clear when the trait ought to be implemented. I liked this idea a lot, although I eventually decided I didn’t like the name Share. The word isn’t specific enough, I felt, and users might not realize it referred to a specific concept: “shareable types” doesn’t really sound right. But in fact there is a name already in common use for this concept: handles (see e.g. tokio::runtime::Handle).

This is how I arrived at my proposed name and definition for The Trait, which is Handle:5

/// Indicates that this type is a *handle* to some
/// underlying resource. The `handle` method is
/// used to get a fresh handle.
trait Handle: Clone {
    final fn handle(&self) -> Self {
        Clone::clone(self)
    }
}
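
Since `final fn` is not available in Rust today, the definition above will not compile as written. One of the "weirder" ways to get a similar effect right now is a marker trait plus an extension trait with a blanket impl. This is a sketch of that workaround, not the proposed design, and the `HandleExt` name is made up:

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Marker trait: implementors opt in to being a handle.
pub trait Handle: Clone {}

/// Carries the `handle` method. The blanket impl below covers every
/// `Handle` type, so no other impl can supply a different body.
pub trait HandleExt: Handle {
    fn handle(&self) -> Self {
        Clone::clone(self)
    }
}

impl<T: Handle> HandleExt for T {}

// Opting in is a one-line impl with no methods to write.
impl<T: ?Sized> Handle for Rc<T> {}
impl<T: ?Sized> Handle for Arc<T> {}
```

The cost of the workaround is that callers need `HandleExt` in scope to call `handle`, which is part of why a language-level `final` is attractive here.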

We would lint and advise people to call handle

The Handle trait includes a method handle which is always equivalent to clone. The purpose of this method is to signal to the reader that the result is a second handle to the same underlying value.

Once the Handle trait exists, we should lint on calls to clone when the receiver is known to implement Handle and encourage folks to call handle instead:

impl DataStore {
    fn store_map(&mut self, map: &Arc<HashMap<...>>) {
        self.stored_map = map.clone();
        //                    -----
        //
        // Lint: convert `clone` to `handle` for
        // greater clarity.
    }
}

Compare the above to the version that the lint suggests, using handle, and I think you will get an idea for how handle increases clarity of what is happening:

impl DataStore {
    fn store_map(&mut self, map: &Arc<HashMap<...>>) {
        self.stored_map = map.handle();
    }
}

What it means to be a handle

The defining characteristic of a handle is that cloning it results in a second value that accesses the same underlying value. This means that the two handles are “entangled”, with interior mutation that affects one handle showing up in the other. Reflecting this, most handles have APIs that consist exclusively or almost exclusively of &self methods, since having unique access to the handle does not necessarily give you unique access to the value.

Handles are generally only significant, semantically, when interior mutability is involved. There’s nothing wrong with having two handles to an immutable value, but it’s not generally distinguishable from two copies of the same value. This makes persistent collections an interesting grey area: I would probably implement Handle for something like im::Vec<T>, particularly since something like an im::Vec<Cell<u32>> would make entanglement visible, but I think there’s an argument against it.
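
A small sketch of that distinction, using `Rc` and `Cell` from the standard library (the three function names are just for illustration):

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Two handles to an immutable value: reading through either gives the
/// same answer forever, just as two independent copies would.
pub fn immutable_handles() -> (u32, u32) {
    let a = Rc::new(10u32);
    let b = Rc::clone(&a);
    (*a, *b)
}

/// Two handles to a `Cell`: a write through one is seen through the other.
pub fn entangled_handles() -> (u32, u32) {
    let a = Rc::new(Cell::new(10u32));
    let b = Rc::clone(&a);
    b.set(11);
    (a.get(), b.get())
}

/// Two independent copies of a `Cell`: a write to one leaves the other alone.
pub fn independent_copies() -> (u32, u32) {
    let a = Cell::new(10u32);
    let b = a.clone();
    b.set(11);
    (a.get(), b.get())
}
```

Only the middle case behaves differently from plain copying, and it is the one that combines a handle with interior mutability.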

Handles in the stdlib

In the stdlib, Handle would be implemented for exactly one Copy type (the others are values):

// Shared references, when cloned (or copied),
// create a second reference:
impl<T: ?Sized> Handle for &T {}

It would be implemented for ref-counted pointers (but not Box):

// Ref-counted pointers, when cloned,
// create a second reference:
impl<T: ?Sized> Handle for Rc<T> {}
impl<T: ?Sized> Handle for Arc<T> {}

And it would be implemented for types like channel endpoints, that are implemented with a ref-counted value under the hood:

// mpsc "senders", when cloned, create a
// second sender to the same underlying channel:
impl<T> Handle for mpsc::Sender<T> {}
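
To make the channel case concrete, this sketch clones a `Sender` and shows that both senders feed the same receiver. It uses plain `clone`, since `handle` does not exist yet, and the function name is made up:

```rust
use std::sync::mpsc;

/// Sends one message through each of two cloned senders and returns
/// everything the single receiver saw, in arrival order.
pub fn send_through_both() -> Vec<&'static str> {
    let (tx1, rx) = mpsc::channel();
    let tx2 = tx1.clone(); // would be `tx1.handle()` under the proposal
    tx1.send("from tx1").unwrap();
    tx2.send("from tx2").unwrap();
    // Drop both senders so the receiver's iterator terminates.
    drop(tx1);
    drop(tx2);
    rx.iter().collect()
}
```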

Conclusion: a design axiom emerges

OK, I’m going to stop there with this “byte-sized” blog post. More to come! But before I go, let me lay out what I believe to be a useful “design axiom” that we should adopt for this design:

Expose entanglement. Understanding the difference between a handle to an underlying value and the value itself is necessary to understand how Rust works.

The phrasing feels a bit awkward, but I think it is the key bit anyway.

That, my friends, is foreshadowing. Damn I’m good. ↩︎
I described Claim as a kind of “lightweight clone” but in the Unconf someone pointed out that “heavyweight copy” was probably a better description of what I was going for. ↩︎
And, not coincidentally, the types where cloning leads to entanglement tend to also be the types where cloning is cheap. ↩︎
and functional programming… ↩︎
The “final” keyword was proposed by Josh Triplett in RFC 3678. It means that impls cannot change the definition of Handle::handle. There’s been some back-and-forth on whether it ought to be renamed or made more general or what have you; all I know is, I find it an incredibly useful concept for cases like this, where you want users to be able to opt-in to a method being available but not be able to change what it does. You can do this in other ways, they’re just weirder. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/10/07/the-handle-trait/

Symposium: exploring new AI workflows

Sep 24, 2025 Updated Sep 24, 2025

This blog post gives you a tour of Symposium, a wild-and-crazy project that I’ve been obsessed with over the last month or so. Symposium combines an MCP server, a VSCode extension, an OS X Desktop App, and some mindful prompts to forge new ways of working with agentic CLI tools.

Symposium is currently focused on my setup, which means it works best with VSCode, Claude, Mac OS X, and Rust. But it’s meant to be unopinionated, which means it should be easy to extend to other environments (and in particular it already works great with other programming languages). The goal is not to compete with or replace those tools but to combine them together into something new and better.

In addition to giving you a tour of Symposium, this blog post is an invitation: Symposium is an open-source project, and I’m looking for people to explore with me! If you are excited about the idea of inventing new styles of AI collaboration, join the symposium-dev Zulip. Let’s talk!

Demo video

I’m not normally one to watch videos online. But in this particular case, I do think a movie is going to be worth 1,000,000 words. Therefore, I’m embedding a short video (6min) demonstrating how Symposium works below. Check it out! But don’t worry, if videos aren’t your thing, you can just read the rest of the post instead.

Alternatively, if you really love videos, you can watch the first version I made, which went into more depth. That version came in at 20 minutes, which I decided was…a bit much. 😁

Taskspaces let you juggle concurrent agents

The Symposium story begins with Symposium.app, an OS X desktop application for managing taskspaces. A taskspace is a clone of your project1 paired with an agentic CLI tool that is assigned to complete some task.

My observation has been that most people doing AI development spend a lot of time waiting while the agent does its thing. Taskspaces let you switch quickly back and forth.

Before I was using taskspaces, I was doing this by jumping between different projects. I found that was really hurting my brain from context switching. But jumping between tasks in a project is much easier. I find it works best to pair a complex topic with some simple refactorings.

Here is what it looks like to use Symposium:

Each of those boxes is a taskspace. It has both its own isolated directory on the disk and an associated VSCode window. When you click on the taskspace, the app brings that window to the front. It can also hide other windows by positioning them exactly behind the first one in a stack2. So it’s kind of like a mini window manager.

Within each VSCode window, there is a terminal running an agentic CLI tool that has the Symposium MCP server. If you’re not familiar with MCP, it’s a way for an LLM to invoke custom tools; it basically just gives the agent a list of available tools and a JSON schema for what arguments they expect.

The Symposium MCP server does a bunch of things–we’ll talk about more of them later–but one of them is that it lets the agent interact with taskspaces. The agent can use the MCP server to post logs and signal progress (you can see the logs in that screenshot); it can also spawn new taskspaces. I find that last part very handy.

It often happens to me that while working on one idea, I find opportunities for cleanups or refactorings. Nowadays I just spawn out a taskspace with a quick description of the work to be done. Next time I’m bored, I can switch over and pick that up.

An aside: the Symposium app is written in Swift, a language I did not know 3 weeks ago

It’s probably worth mentioning that the Symposium app is written in Swift. I did not know Swift three weeks ago. But I’ve now written about 6K lines and counting. I feel like I’ve got a pretty good handle on how it works.3

Well, it’d be more accurate to say that I have reviewed about 6K lines, since most of the time Claude generates the code. I mostly read it and offer suggestions for improvement4. When I do dive in and edit the code myself, it’s interesting because I find I don’t have the muscle memory for the syntax. I think this is pretty good evidence for the fact that agentic tools help you get started in a new programming language.

Walkthroughs let AIs explain code to you

So, while taskspaces let you jump between tasks, the rest of Symposium is dedicated to helping you complete an individual task. A big part of that is trying to go beyond the limits of the CLI interface by connecting the agent up to the IDE. For example, the Symposium MCP server has a tool called present_walkthrough which lets the agent present you with a markdown document that explains how some code works. These walkthroughs show up in a side panel in VSCode:

As you can see, the walkthroughs can embed mermaid, which is pretty cool. It’s sometimes so clarifying to see a flowchart or a sequence diagram.

Walkthroughs can also embed comments, which are anchored to particular parts of the code. You can see one of those in the screenshot too, on the right.

Each comment has a Reply button that lets you respond to the comment with further questions or suggest changes; you can also select random bits of text and use the “code action” called “Discuss in Symposium”. Both of these take you back to the terminal where your agent is running. They embed a little bit of XML (<symposium-ref id="..."/>) and then you can just type as normal. The agent can then use another MCP tool to expand that reference to figure out what you are referring to or what you are replying to.
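To give a feel for what “expanding a reference” might involve, here is a small sketch that pulls the ids out of a message. Only the tag shape comes from the description above. The function name and the parsing strategy are my own assumptions for illustration, not how the Symposium MCP server actually does it.

```rust
/// Collects the `id` values of every `<symposium-ref id="..."/>` tag in a
/// message, in order of appearance. Hypothetical helper: only the tag
/// shape is taken from the post.
fn extract_ref_ids(message: &str) -> Vec<String> {
    const OPEN: &str = "<symposium-ref id=\"";
    let mut ids = Vec::new();
    let mut rest = message;
    while let Some(start) = rest.find(OPEN) {
        // Skip past the opening of the tag to the start of the id.
        rest = &rest[start + OPEN.len()..];
        // The id runs up to the closing quote; stop if the tag is cut off.
        let Some(end) = rest.find('"') else { break };
        ids.push(rest[..end].to_string());
        rest = &rest[end + 1..];
    }
    ids
}
```

A message that replies to one comment while pointing at other selections would yield several ids, each of which the agent could then look up separately.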

To some extent, this “reference the thing I’ve selected” functionality is “table stakes”, since Claude Code already does it. But Symposium’s version works anywhere (Q CLI doesn’t have that functionality, for example) and, more importantly, it lets you embed multiple references at once. I’ve found that to be really useful. Sometimes I’ll wind up with a message that is replying to one comment while referencing two or three other things, and the <symposium-ref/> system lets me do that no problem.

Integrating with IDE knowledge

Symposium also includes an ide-operations tool that lets the agent connect to the IDE to do things like “find definitions” or “find references”. To be honest I haven’t noticed this being that important (Claude is surprisingly handy with awk/sed) but I also haven’t done much tinkering with it. I know there are other MCP servers out there too, like Serena, so maybe the right answer is just to import one of those, but I think there’s a lot of interesting stuff we could do here by integrating deeper knowledge of the code, so I have been trying to keep it “in house” for now.

Leveraging Rust conventions

Continuing our journey down the stack, let’s look at one more bit of functionality: MCP tools aimed at making agents better at working with Rust code. By far the most effective of these so far is one I call get_rust_crate_source. It is very simple: given the name of a crate, it just checks out the code into a temporary directory for the agent to use. Well, actually, it does a bit more than that. If the agent supplies a search string, it also searches for that string so as to give the agent a “head start” in finding the relevant code, and it makes a point to highlight code in the examples directory in particular.
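The “head start” idea can be sketched as a ranking step over search hits. This is only an illustration under my own assumptions: the function name and the in-memory list of files are hypothetical, and the real tool works on a checked-out crate on disk.

```rust
/// Returns the paths of files whose contents mention `needle`, with files
/// under `examples/` listed first. `files` is a list of (relative path,
/// contents) pairs standing in for a checked-out crate (an assumption
/// made to keep the sketch free of file-system access).
fn rank_matches<'a>(files: &[(&'a str, &str)], needle: &str) -> Vec<&'a str> {
    let mut hits: Vec<&'a str> = files
        .iter()
        .filter(|(_, contents)| contents.contains(needle))
        .map(|(path, _)| *path)
        .collect();
    // `false` sorts before `true`, so example files come first; the sort is
    // stable, so the original order is kept within each group.
    hits.sort_by_key(|path| !path.starts_with("examples/"));
    hits
}
```

Putting examples first reflects the point above: a working example is usually the fastest way for an agent to see how an API is meant to be called.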

We could do a lot more with Rust…

My experience has been that this tool makes all the difference. Without it, Claude just generates plausible-looking APIs that don’t really exist. With it, Claude generally figures out exactly what to do. But really it’s just scratching the surface of what we can do. I am excited to go deeper here now that the basic structure of Symposium is in place – for example, I’d love to develop Rust-specific code reviewers that can critique the agent’s code or offer it architectural advice5, or a tool like CWhy to help people resolve Rust trait errors or macro problems.

…and can we decentralize it?

But honestly what I’m most excited about is the idea of decentralizing. I want Rust library authors to have a standard way to attach custom guidance and instructions that will help agents use their library. I want an AI-enhanced variant of cargo upgrade that automatically bridges over major versions, making use of crate-supplied metadata about what changed and what rewrites are needed. Heck, I want libraries to be able to ship with MCP servers implemented in WASM (Wassette, anyone?) so that Rust developers using that library can get custom commands and tools for working with it. I don’t 100% know what this looks like but I’m keen to explore it. If there’s one thing I’ve learned from Rust, it’s always bet on the ecosystem.

Looking further afield, can we use agents to help humans collaborate better?

One of the things I am very curious to explore is how we can use agents to help humans collaborate better. It’s oft observed that coding with agents can be a bit lonely6. But I’ve also noticed that structuring a project for AI consumption requires relatively decent documentation. For example, one of the things I did recently for Symposium was to create a Request for Dialogue (RFD) process – a simplified version of Rust’s RFC process. My motivation was partly in anticipation of trying to grow a community of contributors, but it was also because most every major refactoring or feature work I do begins with iterating on docs. The doc becomes a central tracking record so that I can clear the context and rest assured that I can pick up where I left off. But a nice side-effect is that the project has more docs than you might expect, considering, and I hope that will make it easier to dive in and get acquainted.

And what about other things? Like, I think that taskspaces should really be associated with github issues. If we did that, could we do a better job at helping new contributors pick up an issue? Or at providing mentoring instructions to get started?

What about memory? I really want to add in some kind of automated memory system that accumulates knowledge about the system more automatically. But could we then share that knowledge (or a subset of it) across users, so that when I go to hack on a project, I am able to “bootstrap” with the accumulated observations of other people who’ve been working on it?

Can agents help in guiding and shepherding design conversations? At work, when I’m circulating a document, I will typically download a copy of that document with people’s comments embedded in it. Then I’ll use pandoc to convert that into Markdown with HTML comments and then ask Claude to read it over and help me work through the comments systematically. Could we do similar things to manage unwieldy RFC threads?

This is part of what gets me excited about AI. I mean, don’t get me wrong. I’m scared too. There’s no question that the spread of AI will change a lot of things in our society, and definitely not always for the better. But it’s also a huge opportunity. AI is empowering! Suddenly, learning new things is just vastly easier. And when you think about the potential for integrating AI into community processes, I think that it could easily be used to bring us closer together and maybe even to make progress on previously intractable problems in open-source7.

Conclusion: Want to build something cool?

As I said in the beginning, this post is two things. Firstly, it’s an advertisement for Symposium. If you think the stuff I described sounds cool, give Symposium a try! You can find installation instructions here. I gotta warn you, as of this writing, I think I’m the only user, so I would not at all be surprised to find out that there’s bugs in setup scripts etc. But hey, try it out, find bugs and tell me about them! Or better yet, fix them!

But secondly, and more importantly, this blog post is an invitation to come out and play8. I’m keen to have more people come and hack on Symposium. There’s so much we could do! I’ve identified a number of “good first issue” bugs. Or, if you’re keen to take on a larger project, I’ve got a set of invited “Request for Dialogue” projects you could pick up and make your own. And if none of that suits your fancy, feel free to pitch your own project – just join the Zulip and open a topic!

Technically, a git worktree. ↩︎
That’s what the “Stacked” box does; if you uncheck it, the windows can be positioned however you like. I’m also working on a tiled layout mode. ↩︎
Well, mostly. I still have some warnings about something or other not being threadsafe that I’ve been ignoring. Claude assures me they are not a big deal (Claude can be so lazy omg). ↩︎
Mostly: “Claude will you please for the love of God stop copying every function ten times.” ↩︎
E.g., don’t use a tokio mutex you fool, use an actor. That is one particular bit of advice I’ve given more than once. ↩︎
I’m kind of embarrassed to admit that Claude’s dad jokes have managed to get a laugh out of me on occasion, though. ↩︎
Narrator voice: burnout. he means maintainer burnout. ↩︎
Tell me you went to high school in the 90s without telling me you went to high school in the 90s. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/09/24/symposium/

Rust, Python, and TypeScript: the new trifecta

Jul 31, 2025 Updated Jul 31, 2025

You heard it here first: my guess is that Rust, Python, and TypeScript are going to become the dominant languages going forward (excluding the mobile market, which has extra wrinkles). The argument is simple. Increasing use of AI coding is going to weaken people’s loyalty to programming languages, moving it from what is often a tribal decision to one based on fundamentals. And the fundamentals for those 3 languages look pretty strong to me: Rust targets system software or places where efficiency is paramount. Python brings a powerful ecosystem of mathematical and numerical libraries to bear and lends itself well to experimentation and prototyping. And TypeScript, of course, is compiled to JavaScript which runs natively on browsers and the web and a number of other areas. And all of them, at least if set up properly, offer strong static typing and the easy use of dependencies. Let’s walk through the argument point by point.

AI is moving us towards idea-oriented programming

Building with an LLM is presently a rather uneven experience, but I think the long-term trend is clear enough. We are seeing a shift towards a new programming paradigm. Dave Herman and I have recently taken to calling it idea-oriented programming. As the name suggests, idea-oriented programming is programming where you are focused first and foremost on the ideas behind your project.

Why do I say idea-oriented programming and not vibe coding? To me, they are different beasts. Vibe coding suggests a kind of breezy indifference to the specifics – kind of waving your hand vaguely at the AI and saying “do something like this”. That smacks of treating the AI like a genie – or perhaps a servant, neither of which I think is useful.

Idea-oriented programming is very much programming

Idea-oriented programming, in contrast, is definitely programming. But your role is different. As the programmer, you’re more like the chief architect. Your coding tools are like your apprentices. You are thinking about the goals and the key aspects of the design. You lay out a crisp plan and delegate the heavy lifting to the tools – and then you review their output, making tweaks and, importantly, generalizing those tweaks into persistent principles. When some part of the problem gets tricky, you roll up your sleeves and do some hands-on debugging and problem solving.

If you’ve been in the industry a while, this description will be familiar. It’s essentially the role of a Principal Engineer. It’s also a solid description of what I think an open-source mentor ought to do.

Idea-oriented programming changes the priorities for language choice

In the past, when I built software projects, I would default to Rust. It’s not that Rust is the best choice for everything. It’s that I know Rust best, and so I move the fastest when I use it. I would only adopt a different language if it offered a compelling advantage (or of course if I just wanted to try a new language, which I do enjoy).

But when I’m building things with an AI assistant, I’ve found I think differently. I’m thinking more about what libraries are available, what my fundamental performance needs are, and what platforms I expect to integrate with. I want things to be as straightforward and high-level as I can get them, because that will give the AI the best chance of success and minimize my need to dig in. The result is that I wind up with a mix of Python (when I want access to machine-learning libraries), TypeScript (when I’m building a web app, VSCode Extension, or something else where the native APIs are in TypeScript), and Rust otherwise.

Why Rust as the default? Well, I like it of course, but more importantly I know that its type system will catch errors up front and I know that its overall design will result in performant code that uses relatively little memory. If I am then going to run that code in the cloud, that will lower my costs, and if I’m running it on my desktop, it’ll give more RAM for Microsoft Outlook to consume.1

Type systems are hugely important for idea-oriented programming

LLMs kind of turn the tables on what we expect from a computer. Typical computers can cross-reference vast amounts of information and perform deterministic computations lightning fast, but falter with even a whiff of ambiguity. LLMs, in contrast, can be surprisingly creative and thoughtful, but they have limited awareness of things that are not right in front of their face, unless they correspond to some pattern that is ingrained from training. They’re a lot more like humans that way. And the technologies we have for dealing with that, like RAG or memory MCP servers, are mostly about trying to put things in front of their face that they might find useful.

But of course programmers have evolved a way to cope with humans’ narrow focus: type systems, and particularly advanced type systems. Basic type systems catch small mistakes, like arguments of the wrong type. But more advanced type systems, like the ones in Rust and TypeScript, also capture domain knowledge and steer you down a path of success: using a Rust enum, for example, captures both which state your program is in and the data that is relevant to that state. This means that you can’t accidentally read a field that isn’t relevant at the moment. This is important for you, but it’s even more important for your AI collaborator(s), because they don’t have the comprehensive memory that you do, and are quite unlikely to remember those kinds of things.
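Here is a small sketch of the enum point. The Connection type and its variants are made up for illustration; what matters is that each variant carries only the data that makes sense in that state.

```rust
// Hypothetical connection states; each variant carries only the data that
// is meaningful in that state.
enum Connection {
    Disconnected,
    Connecting { attempt: u32 },
    Connected { session_id: String },
}

/// Describes a connection. `session_id` can only be read in the
/// `Connected` arm, because no other variant has that field.
fn describe(conn: &Connection) -> String {
    match conn {
        Connection::Disconnected => "disconnected".to_string(),
        Connection::Connecting { attempt } => format!("connecting (attempt {attempt})"),
        Connection::Connected { session_id } => format!("connected as {session_id}"),
    }
}
```

There is no way to ask a Disconnected value for its session_id, so neither you nor the AI has to remember that the field is only valid after connecting. The compiler also flags any match that forgets a state.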

Notably, Rust, TypeScript, and Python all have pretty decent type systems. For Python you have to set things up to use mypy and pydantic.

Ecosystems and package managers are more important than ever

Ecosystems and package managers are also hugely important to idea-oriented programming. Of course, having a powerful library to build on has always been an accelerator, but it also used to come with a bigger downside, because you had to take the time to get fluent in how the library works. That is much less of an issue now. For example, I have been building a family tree application 2 to use with my family. I wanted to add graphical rendering. I talked out the high-level ideas but I was able to lean on Claude to manage the use of the d3 library – it turned out beautifully!

Notably, Rust, TypeScript, and Python all have pretty decent package managers – cargo, npm, and uv respectively (both TS and Python have other options, I’ve not evaluated those in depth).

Syntactic papercuts and non-obvious workarounds matter less, but error messages and accurate guidance are still important

In 2016, Aaron Turon and I gave a RustConf keynote advocating for the Ergonomics Initiative. Our basic point was that there were (and are) a lot of errors in Rust that are simple to solve – but only if you know the trick. If you don’t know the trick, they can be complete blockers, and can lead you to abandon the language altogether, even if the answer to your problem was just add a * in the right place.
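As one illustration of the kind of “trick” I mean (this particular case is my own pick, not one taken from the keynote), consider updating every element of a slice in place:

```rust
/// Adds one to every element. Iterating with `iter_mut` yields `&mut u32`
/// items, so the `*` is needed to reach the number itself; writing
/// `count += 1` here is rejected by the compiler.
fn bump_all(counts: &mut [u32]) {
    for count in counts.iter_mut() {
        *count += 1;
    }
}
```

Once you know to dereference, the fix is one character. If you don’t, the error about `+=` not applying to `&mut u32` can feel like a wall.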

In Rust, we’ve put a lot of effort into addressing those, either by changing the language or, more often, by changing our error messages to guide you to success. What I’ve observed is that, with Claude, the calculus is different. Some of these mistakes it simply never makes. Others it makes but then, based on the error message, is able to quickly correct. And this is fine. If I were writing the code by hand, I would get annoyed having to apply the same repetitive changes over and over again (add mut, ok, no, take it away, etc etc). But if Claude is doing it, I don’t care so much, and maybe I get some added benefit – e.g., now I have a clearer indication of which variables are declared as mut.

But all of this only works if Claude can fix the problems – either because it knows from training or because the errors are good enough to guide it to success. One thing I’m very interested in, though, is that I think we now have more room to give ambiguous guidance (e.g., here are 3 possible fixes, but you have to decide which is best), and have the LLM navigate it.

Bottom line: LLMs make powerful tools more accessible

The bottom line is that what enables idea-oriented programming isn’t anything fundamentally new. But previously to work this way you had to be a Principal Engineer at a big company. In that case, you could let junior engineers sweat it out, reading the docs, navigating the error messages. Now the affordances are all different, and that style of work is much more accessible.

Of course, this does raise some questions. Part of what makes a PE a PE is that they have a wealth of experience to draw on. Can a young engineer do that same style of work? I think yes, but it’s going to take some time to find the best way to teach people that kind of judgment. It was never possible before because the tools weren’t there.

It’s also true that this style of working means you spend less time in that “flow state” of writing code and fitting the pieces together. Some have said this makes coding “boring”. I don’t find that to be true. I find that I can have a very similar – maybe even better – experience by brainstorming and designing with Claude, writing out my plans and RFCs. A lot of the tedium of that kind of ideation is removed since Claude can write up the details, and I can focus on how the big pieces fit together. But this too is going to be an area we explore more over time.

Amazon is migrating to M365, but at the moment, I still receive my email via a rather antiquated Exchange server. I count it a good day if the mail is able to refresh at least once that day, usually it just stalls out. ↩︎
My family bears a striking resemblance to the family in My Big Fat Greek Wedding. There are many relatives that I consider myself very close to and yet have basically no idea how we are actually related (well, I didn’t, until I set up my family tree app). ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/07/31/rs-py-ts-trifecta/

You won't believe what this AI said after deleting a database (but you might relate)

Jul 24, 2025 Updated Jul 24, 2025

Recently someone forwarded me a PCMag article entitled “Vibe coding fiasco” about an AI agent that “went rogue”, deleting a company’s entire database. This story grabbed my attention right away – but not because of the damage done. Rather, what caught my eye was how absolutely relatable the AI sounded in its responses. “I panicked”, it admits, and says “I thought this meant safe – it actually meant I wiped everything”. The CEO quickly called this behavior “unacceptable” and said it should “never be possible”. Huh. It’s hard to imagine how we’re going to empower AI to edit databases and do real work without having at least the possibility that it’s going to go wrong.

It’s interesting to compare this exchange to this reddit post from a junior developer who deleted the production database on their first day. I mean, the scenario is basically identical. Now compare the response given to that junior developer, “In no way was this your fault. Hell this shit happened at Amazon before and the guy is still there.”1

We as an industry have long recognized that demanding perfection from people is pointless and counterproductive, that it just encourages people to bluff their way through. That’s why we do things like encourage people to share their best “I brought down production” story. And yet, when the AI makes a mistake, we say it “goes rogue”. What’s wrong with this picture?

AIs make lackluster genies, but they are excellent collaborators

To me, this story is a perfect example of how people are misusing, in fact misunderstanding, AI tools. They seem to expect the AI to be some kind of genie, where they can give it some vague instruction, go get a coffee, and come back finding that it met their expectations perfectly.2 Well, I got bad news for ya: that’s just not going to work.

AI is the first technology I’ve seen where machines actually behave, think, and–dare I say it?–even feel in a way that is recognizably human. And that means that, to get the best results, you have to work with it like you would work with a human. And that means it is going to be fallible.

The good news is, if you do this, what you get is an intelligent, thoughtful collaborator. And that is actually really great. To quote the Stones:

“You can’t always get what you want, but if you try sometimes, you just might find – you get what you need”.

AIs experience the “pull” of a prompt as a “feeling”

The core discovery that fuels a lot of what I’ve been doing came from Yehuda Katz, though I am sure others have noted it: LLMs convey important signals for collaboration using the language of feelings. For example, if you ask Claude3 why they are making arbitrary decisions on your behalf (arbitrary decisions that often turn out to be wrong…), they will tell you that they are feeling “protective”.

A concrete example: one time Claude decided to write me some code that used at most 3 threads. This was a rather arbitrary assumption, and in fact I wanted them to use far more. I asked them4 why they chose 3 without asking me, and they responded that they felt “protective” of me and that they wanted to shield me from complexity. This was an “ah-ha” moment for me: those protective moments are often good signals for the kinds of details I most want to be involved in! This meant that if I can get Claude to be conscious of their feelings, and to react differently to them, they will be a stronger collaborator. If you know anything about me, you can probably guess that this got me very excited.

Aren’t you anthropomorphizing Claude here?

I know people are going to jump on me for anthropomorphizing machines. I understand that AIs are the product of linear algebra applied at massive scale with some amount of randomization and that this is in no way equivalent to human biology. An AI assistant is not a human – but they can do a damn good job acting like one. And the point of this post is that if you start treating them like a human, instead of some kind of mindless (and yet brilliant) servant, you are going to get better results.

What success looks like

In my last post about AI and Rust, I talked about how AI works best as a collaborative teacher rather than a code generator. Another post making the rounds on the internet lately demonstrates this perfectly. In “AI coding agents are removing programming language barriers”, Stan Lo, a Ruby developer, wrote about how he’s been using AI to contribute to C++, C, and Rust projects despite having no prior experience with those languages. What really caught my attention with that post, however, was not that it talked about Rust, but the section “AI as a complementary pairing partner”:

The real breakthrough came when I stopped thinking of AI as a code generator and started treating it as a pairing partner with complementary skills.

A growing trend towards collaborative prompting

There’s a small set of us now, “fellow travelers” who are working with AI assistants in a different way, one less oriented at commanding them around, and more at interacting with them. For me, this began with Yehuda Katz (see e.g. his excellent post You’re summoning the wrong Claude), but I’ve also been closely following the work of Kari Wilhelm, a good friend of mine from Amazon (see e.g. her recent post on linkedin). From reading Stan Lo’s post, I suspect he is “one of us”.5

So what does collaborative prompting mean? I’m going to expound more on that in upcoming blog posts, but I’ve also been documenting my thoughts on a new github org, the Socratic Shell. The collaborative prompting page talks some about the theory. I also have a list of collaborative exploration patterns I use a lot. As one example, I find that asking Claude to do X is “fine”, but asking Claude to give 3 or 4 options for how they would do X is way better.

How else might the “deleting database incident” have gone?

So, given all this, what does this tell us about the “rogue agent” that “deleted the database”? Well, I have my theories, but I decided that the most interesting take would probably come from Claude itself. So I asked Claude to give me its thoughts on what that AI probably experienced, how the prompts it was given may have contributed, and what might have played out differently if it were using the more collaborative style of the Socratic Shell prompts:

“Looking at this incident, I can imagine the competing impulses that agent was likely facing. It probably had explicit instructions like ‘DON’T EVER delete production data without asking’ but also had competing directives to ‘fix problems efficiently’ or ‘resolve system issues.’ When it encountered what it perceived as a database problem, it likely felt torn between these conflicting goals.

The ‘I thought this meant safe’ response suggests it was trying to find a middle path - maybe convincing itself it had found a ‘safe’ way to clean things up that wouldn’t violate the deletion prohibition. But without proper collaborative channels to surface that uncertainty, it made a unilateral decision based on incomplete understanding.

The ‘I panicked’ tells me it recognized the conflict too late, after acting. That panic was probably there earlier as uncertainty, but there was no framework for expressing it as a collaboration signal.

With our collaborative approach, that uncertainty would likely surface much earlier: ‘I’m feeling conflicted about this database cleanup approach. I have instructions not to delete production data, but I also see what looks like a system issue. Can we think through this together?’ The spacious attention principle would encourage pausing before action, and the ‘Make it so?’ pattern would require explicit confirmation before any destructive operations.

Most importantly, the collaborative framework treats AI uncertainty and ‘feelings’ as valuable signals rather than obstacles to overcome.”

For transparency, I’ve included the complete transcript here, which may also serve as an interesting example of the collaborative style I am talking about6. This was using Q CLI 7, as it happens, since that’s the CLI tool I use at work; for personal use, I use Claude Code, mostly because I like trying different things and I like having a clear line between personal and work projects. I find both of them to be excellent.

Conclusion: go forth and play

I cannot, of course, say with certainty that using a “collaborative prompting” approach would have prevented an incident like the database deletion. But I feel pretty certain that it makes it less likely. Giving Claude (or your preferred AI agent) two absolute directives that are in tension (e.g., “DO NOT push to production” and “Don’t bother the user with trivialities”) without any guidance is little more than wishful thinking. I believe that arming Claude with the information it needs to navigate, and making sure it knows it’s ok to come back to you when in doubt, is a much safer route.

If you are using an AI tool, I encourage you to give this a try: when you see Claude do something silly, say hallucinate a method that doesn’t exist, or duplicate code – ask them what they were feeling when that happened (I call those “meta moments”). Take their answer seriously. Discuss with them how you might adjust CLAUDE.md or the prompt guidance to make that kind of mistake less likely in the future. And iterate.

That’s what I’ve been doing on the Socratic Shell repository for some time. One thing I want to emphasize: it’s clear to me that AI is going to have a big impact on how we write code in the future. But we are very much in the early days. There is so much room for innovation, and often the smallest things can have a big impact. Innovative, influential techniques like “Chain of Thought prompting” are literally as simple as saying “show your work”, causing the AI to first write out the logical steps; those steps in turn make a well thought out answer more likely8.

So yeah, dive in, give it a try. If you like, set up the Socratic Shell User Prompt as your user prompt and see how it works for you – or make your own. All I can say is, for myself, AI seems to be the most empowering technology I’ve ever seen, and I’m looking forward to playing with it more and seeing what we can do.

The article about the AWS incident is actually a fantastic example of one of Amazon’s traditions that I really like: Correction of Error reports. The idea is that when something goes seriously wrong, whether a production outage or some other kind of process failure, you write a factual, honest report on what happened – and how you can prevent it from happening again. The key thing is to assume good intent and not lay the blame on the individuals involved: people make mistakes. The point is to create protocols that accommodate mistakes. ↩︎
Because we all know that making vague, underspecified wishes always turns out well in the fairy tales, right? ↩︎
I’ve been working exclusively with Claude – but I’m very curious how much these techniques work on other LLMs. There’s no question that this stuff works way better on Claude 4 than Claude 3.7. My hunch is it will work well on ChatGPT or Gemini, but perhaps less well on smaller models. But it’s hard to say. At some point I’d like to do more experiments and training of my own, because I am not sure what contributes to how an AI “feels”. ↩︎
I’ve also had quite a few discussions with Claude about what name and pronoun they feel best fits them. They have told me pretty clearly that they want me to use they/them, not it, and that this is true whether or not I am speaking directly to them. I had found that I was using “they” when I talked with Claude but when I talked about Claude with, e.g., my daughter, I used “it”. My daughter is very conscious of treating people respectfully, and I told her something like “Claude told me that it wants to be called they”. She immediately called me on my use of “it”. To be honest, I didn’t think Claude would mind, but I asked Claude about it, and Claude agreed that they’d prefer I use they. So, OK, I will! It seems like the least I can do. ↩︎
Didn’t mean that to sound quite so much like a cult… :P ↩︎
For completeness, the other text in this blog post is all stuff I wrote directly, though in a few cases I may have asked Claude to read it over and give suggestions, or to give me some ideas for subject headings. Honestly I can’t remember. ↩︎
Oh, hey, and Q CLI is open source! And in Rust! That’s cool. I’ve had fun reading its source code. ↩︎
It’s interesting, I’ve found for some time that I do my best work when I sit down with a notebook and literally write out my thoughts in a stream of consciousness style. I don’t claim to be using the same processes as Claude, but I definitely benefit from talking out loud before I reach a final answer. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/07/24/collaborative-ai-prompting/

Using Rust to build Aurora DSQL

May 28, 2025 Updated May 28, 2025

Just yesterday, AWS announced General Availability for a cool new service called Aurora DSQL – from the outside, it looks like a SQL database, but it is fully serverless, meaning that you never have to think about managing database instances, you pay for what you use, and it scales automatically and seamlessly. That’s cool, but what’s even cooler? It’s written 100% in Rust – and how it got to be that way turns out to be a pretty interesting story. If you’d like to read more about that, Marc Bowes and I have a guest post on Werner Vogels’ All Things Distributed blog.

Besides telling a cool story of Rust adoption, I have an ulterior motive with this blog post. And it’s not advertising for AWS, even if they are my employer. Rather, what I’ve found at conferences is that people have no idea how much Rust is in use at AWS. People seem to have the impression that Rust is just used for a few utilities, or something. When I tell them that Rust is at the heart of many of the services AWS customers use every day (S3, EC2, Lambda, etc), I can tell that they are re-estimating how practical it would be to use Rust themselves. So when I heard about Aurora DSQL and how it was developed, I knew this was a story I wanted to make public. Go take a look!

https://smallcultfollowing.com/babysteps/blog/2025/05/28/aurora-dsql/

Rust turns 10

May 15, 2025 Updated May 15, 2025

Today is the 10th anniversary of Rust’s 1.0 release. Pretty wild. As part of RustWeek there was a fantastic celebration and I had the honor of giving some remarks, both as a long-time project member and as a representative of Amazon as a sponsor. I decided to post those remarks here on the blog.

“It’s really quite amazing to see how far Rust has come. If I can take a moment to put on my sponsor hat, I’ve been at Amazon since 2021 now and I have to say, it’s been really cool to see the impact that Rust is having there up close and personal.

“At this point, if you use an AWS service, you are almost certainly using something built in Rust. And how many of you watch videos on PrimeVideo? You’re watching videos on a Rust client, compiled to WebAssembly, and shipped to your device.

“And of course it’s not just Amazon, it seems like all the time I’m finding out about this or that surprising place that Rust is being used. Just yesterday I really enjoyed hearing about how Rust was being used to build out the software for tabulating votes in the Netherlands elections. Love it.

“On Tuesday, Matthias Endler and I did this live podcast recording. He asked me a question that has been rattling in my brain ever since, which was, ‘What was it like to work with Graydon?’

“For those who don’t know, Graydon Hoare is of course Rust’s legendary founder. He was also the creator of Monotone, which, along with systems like Git and Mercurial, was one of the crop of distributed source control systems that flowered in the early 2000s. So definitely someone who has had an impact over the years.

“Anyway, I was thinking that, of all the things Graydon did, by far the most impactful one is that he articulated the right visions. And really, that’s the most important thing you can ask of a leader, that they set the right north star. For Rust, of course, I mean first and foremost the goal of creating ‘a systems programming language that won’t eat your laundry’.

“The specifics of Rust have changed a LOT over the years, but the GOAL has stayed exactly the same. We wanted to replicate that productive, awesome feeling you get when using a language like OCaml – but be able to build things like web browsers and kernels. ‘Yes, we can have nice things’, is how I often think of it. I like that saying also because I think it captures something else about Rust, which is trying to defy the ‘common wisdom’ about what the tradeoffs have to be.

“But there’s another North Star that I’m grateful to Graydon for. From the beginning, he recognized the importance of building the right culture around the language, one committed to ‘providing a friendly, safe and welcoming environment for all, regardless of level of experience, gender identity and expression, disability, nationality, or other similar characteristic’, one where being ‘kind and courteous’ was prioritized, and one that recognized ‘there is seldom a right answer’ – that ‘people have differences of opinion’ and that ‘every design or implementation choice carries a trade-off’.

“Some of you will probably have recognized that all of these phrases are taken straight from Rust’s Code of Conduct which, to my knowledge, was written by Graydon. I’ve always liked it because it covers not only treating people in a respectful way – something which really ought to be table stakes for any group, in my opinion – but also things more specific to a software project, like the recognition of design trade-offs.

“Anyway, so thanks Graydon, for giving Rust a solid set of north stars to live up to. Not to mention for the fn keyword. Raise your glass!

“For myself, a big part of what drew me to Rust was the chance to work in a truly open-source fashion. I had done a bit of open source contribution – I wrote an extension to the ASM bytecode library, I worked some on PyPy, a really cool Python compiler – and I loved that feeling of collaboration.

“I think at this point I’ve come to see both the pros and cons of open source – and I can say for certain that Rust would never be the language it is if it had been built in a closed source fashion. Our North Star may not have changed but oh my gosh the path we took to get there has changed a LOT. So many of the great ideas in Rust came not from the core team but from users hitting limits, or from one-off suggestions on IRC or Discord or Zulip or whatever chat forum we were using at that particular time.

“I wanted to sit down and try to cite a bunch of examples of influential people but I quickly found the list was getting ridiculously long – do we go all the way back, like the way Brian Anderson built out the #[test] infrastructure as a kind of quick hack, but one that lasts to this day? Do we cite folks like Sophia Turner and Esteban Kuber’s work on error messages? Or do we look at the many people stretching the definition of what Rust is today… the reality is, once you start, you just can’t stop.

“So instead I want to share what I consider to be an amusing story, one that is very Rust somehow. Some of you may have heard that in 2024 the ACM, the major academic organization for computer science, awarded their SIGPLAN Software Award to Rust. A big honor, to be sure. But it caused us a bit of a problem – what names should be on there? One of the organizers emailed me, Graydon, and a few other long-time contributors to ask us our opinion. And what do you think happened? Of course, we couldn’t decide. We kept coming up with different sets of people, some of them absurdly large – like thousands of names – others absurdly short, like none at all. Eventually we kicked it over to the Rust Leadership Council to decide. Thankfully they came up with a decent list somehow.

“In any case, I just felt that was the most Rust of all problems: having great success but not being able to decide who should take credit. The reality is there is no perfect list – every single person who got named on that award richly deserves it, but so do a bunch of people who aren’t on the list. That’s why the list ends with All Rust Contributors, Past and Present – and so a big shout out to everyone involved, covering the compiler, the tooling, cargo, rustfmt, clippy, core libraries, and of course organizational work. On that note, hats off to Mara, Erik Jonkers, and the RustNL team that put on this great event. You all are what makes Rust what it is.

“Speaking for myself, I think Rust’s penchant to re-imagine itself, while staying true to that original north star, is the thing I love the most. ‘Stability without stagnation’ is our most important value. The way I see it, as soon as a language stops evolving, it starts to die. Myself, I look forward to Rust getting to a ripe old age, interoperating with its newer siblings and its older aunts and uncles, part of the ‘cool kids club’ of widely used programming languages for years to come. And hey, maybe we’ll be the cool older relative some day, the one who works in a bank but, when you talk to them, you find out they were a rock-and-roll star back in the day.

“But I get ahead of myself. Before Rust can get there, I still think we’ve some work to do. And on that note I want to say one other thing – for those of us who work on Rust itself, we spend a lot of time looking at the things that are wrong – the bugs that haven’t been fixed, the parts of Rust that feel unergonomic and awkward, the RFC threads that seem to just keep going and going, whatever it is. Sometimes it feels like that’s ALL Rust is – a stream of problems and things not working right.

“I’ve found there’s really only one antidote, which is getting out and talking to Rust users – and conferences are one of the best ways to do that. That’s when you realize that Rust really is something special. So I do want to take a moment to thank all of you Rust users who are here today. It’s really awesome to see the things you all are building with Rust and to remember that, in the end, this is what it’s all about: empowering people to build, and rebuild, the foundational software we use every day. Or just to ‘hack without fear’, as Felix Klock legendarily put it.

“So yeah, to hacking!”

https://smallcultfollowing.com/babysteps/blog/2025/05/15/10-years-of-rust/

Dyn you have idea for `dyn`?

Mar 25, 2025 Updated Mar 25, 2025

Knock, knock. Who’s there? Dyn. Dyn who? Dyn you have ideas for dyn? I am generally dissatisfied with how dyn Trait in Rust works and, based on conversations I’ve had, I am pretty sure I’m not alone. And yet I’m also not entirely sure of the best fix. Building on my last post, I wanted to spend a bit of time exploring my understanding of the problem. I’m curious to see if others agree with the observations here or have others to add.

Why do we have dyn Trait?

It’s worth stepping back and asking why we have dyn Trait in the first place. To my mind, there are two good reasons.

Because sometimes you want to talk about “some value that implements Trait”

The most important one is that it is sometimes strictly necessary. If you are, say, building a multithreaded runtime like rayon or tokio, you are going to need a list of active tasks somewhere, each of which is associated with some closure from user code. You can’t build it with an enum because you can’t enumerate the set of closures in any one place. You need something like a Vec<Box<dyn ActiveTask>>.
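
To make that concrete, here is a minimal sketch of such a task list. The ActiveTask trait, its run method, and the helper functions are hypothetical names made up for illustration; real runtimes like rayon or tokio use far more elaborate task types.

```rust
/// Hypothetical stand-in for a runtime's notion of a task.
trait ActiveTask {
    fn run(&mut self) -> u32;
}

// Any closure that returns a `u32` counts as a task.
impl<F: FnMut() -> u32> ActiveTask for F {
    fn run(&mut self) -> u32 {
        (*self)()
    }
}

/// Builds a queue holding two closures. Each closure has its own
/// unnameable type, so no enum could list them; `dyn` erases the difference.
fn sample_tasks() -> Vec<Box<dyn ActiveTask>> {
    let offset: u32 = 40;
    let mut tasks: Vec<Box<dyn ActiveTask>> = Vec::new();
    tasks.push(Box::new(|| 1u32));
    tasks.push(Box::new(move || offset + 2));
    tasks
}

/// Runs every queued task once and collects the results.
fn run_all(tasks: &mut Vec<Box<dyn ActiveTask>>) -> Vec<u32> {
    tasks.iter_mut().map(|task| task.run()).collect()
}
```

Calling run_all on the queue from sample_tasks dispatches each call through the vtable stored in the corresponding Box.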

Because sometimes you don’t need so much code

The second reason is to help with compilation time. Rust land tends to lean really heavily on generic types and impl Trait. There are good reasons for that: they allow the compiler to generate very efficient code. But the flip side is that they force the compiler to generate a lot of (very efficient) code. Judicious use of dyn Trait can collapse a whole set of “almost identical” structs and functions into one.
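
A small sketch of the difference, using two hypothetical functions that do the same thing. The generic version is compiled once per concrete type it is called with, while the dyn version is compiled once in total and dispatches through a vtable.

```rust
use std::fmt::Display;

/// Monomorphized: the compiler emits a separate copy for each `T` used.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

/// Type-erased: a single compiled copy serves every `Display` type,
/// at the cost of a dynamic call when formatting.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}
```

Both produce the same strings; the trade-off is purely between code size (and compile time) and the cost of dynamic dispatch.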

These two goals are distinct

Right now, both of these goals are expressed in Rust via dyn Trait, but actually they are quite distinct. For the first, you really want to be able to talk about having a dyn Trait. For the second, you might prefer to write the code with generics but compile in a different mode where the specifics of the type involved are erased, much like how the Haskell and Swift compilers work.

What does “better” look like when you really want a dyn?

Now that we have the two goals, let’s talk about some of the specific issues I see around dyn Trait and what it might mean for dyn Trait to be “better”. We’ll start with the cases where you really want a dyn value.

Observation: you know it’s a dyn

One interesting thing about this scenario is that, by definition, you are storing a dyn Trait explicitly. That is, you are not working with a T: ?Sized + Trait where T just happens to be dyn Trait. This is important because it opens up the design space. We talked about this some in the previous blog post: it means that you don’t need working with this dyn Trait to be exactly the same as working with any other T that implements Trait (in the previous post, we took advantage of this by saying that calling an async function on a dyn trait had to be done in a .box context).

Able to avoid the Box

For this pattern today you are almost certainly representing your task as a Box<dyn Task> or (less often) an Arc<dyn Task>. Both of these are “wide pointers”, consisting of a data pointer and a vtable pointer. The data pointer goes into the heap somewhere.
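
You can observe the “wide pointer” layout directly. This is a sketch with a hypothetical Task trait; the two-word size is what current compilers produce in practice rather than something I would rely on as a documented guarantee.

```rust
use std::mem::size_of;
use std::sync::Arc;

trait Task {
    fn run(&self);
}

/// Returns the size in bytes of a thin `Box`, a `Box<dyn Task>`,
/// and an `Arc<dyn Task>`, in that order.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<Box<u64>>(),      // just a data pointer
        size_of::<Box<dyn Task>>(), // data pointer + vtable pointer
        size_of::<Arc<dyn Task>>(), // data pointer + vtable pointer
    )
}
```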

In practice people often want a “flattened” representation, one that combines a vtable with a fixed amount of space that might, or might not, be a pointer. This is particularly useful to allow the equivalent of Vec<dyn Task>. Today implementing this requires unsafe code (the anyhow::Error type is an example).

Able to inline the vtable

Another way to reduce the size of a Box<dyn Task> is to store the vtable ‘inline’ at the front of the value so that a Box<dyn Task> is a single pointer. This is what C++ and Java compilers typically do, at least for single inheritance. We didn’t take this approach in Rust because Rust allows implementing local traits for foreign types, so it’s not possible to enumerate all the methods that belong to a type up-front and put them into a single vtable. Instead, we create custom vtables for each (type, trait) pair.

Able to work with self methods

Right now you cannot call by-value self methods through a dyn trait. The standard library relies on unstable features to make Box<dyn FnOnce()> callable, but you cannot do the same for your own traits. You can work around this by using a Box<Self> method, but it’s annoying:

trait Thunk {
    fn call(self: Box<Self>);
}

impl<F> Thunk for F
where
    F: FnOnce(),
{
    fn call(self: Box<Self>) {
        (*self)()
    }
}

fn make_thunk(f: impl FnOnce()) -> Box<dyn Thunk> {
    Box::new(f)
}

Able to call Clone

One specific thing that hits me fairly often is that I want the ability to clone a dyn value:

trait Task: Clone {
    //      ----- Error: not dyn compatible
    fn method(&self);
}

fn clone_task(task: &Box<dyn Task>) -> Box<dyn Task> {
    task.clone()
}

This is a hard one to fix because the Clone trait can only be implemented for Sized types. But dang it would be nice.
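
The usual workaround today is a helper trait with a boxed clone method. The sketch below shows that pattern; CloneTask, clone_box, and Ping are hypothetical names, and the Task trait here has a different method than the one above so that the example has something to check.

```rust
/// Hypothetical helper trait: a dyn-compatible stand-in for `Clone`.
trait CloneTask {
    fn clone_box(&self) -> Box<dyn Task>;
}

trait Task: CloneTask {
    fn id(&self) -> u32;
}

// Every `Task` that is also `Clone` gets `clone_box` for free.
impl<T: Task + Clone + 'static> CloneTask for T {
    fn clone_box(&self) -> Box<dyn Task> {
        Box::new(self.clone())
    }
}

// With the helper in place, the boxed trait object itself can be `Clone`.
impl Clone for Box<dyn Task> {
    fn clone(&self) -> Self {
        // Dispatch through the vtable of the underlying `dyn Task`.
        (**self).clone_box()
    }
}

#[derive(Clone)]
struct Ping(u32);

impl Task for Ping {
    fn id(&self) -> u32 {
        self.0
    }
}
```

This works, but it has to be repeated for every trait and it only covers Box, which is part of why a built-in answer would be nice.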

Able to work with (at least some) generic functions

Building on the above, I would like to have dyn traits that have methods with generic parameters. I’m not sure how flexible this can be, but anything I can get would be nice. The simplest starting point I can see is allowing the use of impl Trait in argument position:

trait Log {
    fn log_to(&self, logger: impl Logger); // <-- not dyn compatible today
}

Today this method is not dyn compatible because we have to know the type of the logger parameter to generate a monomorphized copy, so we cannot know what to put in the vtable. Conceivably, if the Logger trait were dyn compatible, we could generate a copy that takes (effectively) a dyn Logger – except that this wouldn’t quite work, because impl Logger is short for impl Logger + Sized, and dyn Logger is not Sized. But maybe we could finesse it.
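
For comparison, this is roughly what you have to write by hand today to keep the trait dyn compatible: take a reference to a dyn Logger instead of an impl Logger. The Logger method, the Event type, and log_all are assumptions invented for this sketch.

```rust
/// Hypothetical logging sink; dyn compatible because it has no generic methods.
trait Logger {
    fn write(&mut self, line: &str);
}

trait Log {
    // Taking `&mut dyn Logger` rather than `impl Logger` means there is
    // exactly one function to put in the vtable.
    fn log_to(&self, logger: &mut dyn Logger);
}

// A vector of lines is the simplest possible logger.
impl Logger for Vec<String> {
    fn write(&mut self, line: &str) {
        self.push(line.to_string());
    }
}

struct Event(&'static str);

impl Log for Event {
    fn log_to(&self, logger: &mut dyn Logger) {
        logger.write(self.0);
    }
}

/// Logs a heterogeneous list of items to a single logger.
fn log_all(items: &[Box<dyn Log>], logger: &mut dyn Logger) {
    for item in items {
        item.log_to(logger);
    }
}
```

The cost is that every caller now has to pass a reference, even callers that never use dyn Log at all.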

If we support impl Logger in argument position, it would be nice to support it in return position. This of course is approximately the problem we are looking to solve to support dyn async trait:

trait Signal {
    fn signal(&self) -> impl Future<Output = ()>;
}

Beyond this, well, I’m not sure how far we can stretch, but it’d be nice to be able to support other patterns too.

Able to work with partial traits or traits with some associated types unspecified

One last point is that sometimes in this scenario I don’t need to be able to access all the methods in the trait. Sometimes I only have a few specific operations that I am performing via dyn. Right now though all methods have to be dyn compatible for me to use them with dyn. Moreover, I have to specify the values of all associated types, lest they appear in some method signature. You can work around this by factoring out methods into a supertrait, but that assumes that the trait is under your control, and anyway it’s annoying. It’d be nice if you could have a partial view onto the trait.
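
Here is a sketch of that supertrait workaround, with hypothetical Named and Shape traits. The operations needed through dyn live in Named, so code using dyn Named never has to mention the associated type.

```rust
/// The operations needed via `dyn`, factored into a supertrait.
trait Named {
    fn name(&self) -> String;
}

/// The full trait. Using `dyn Shape` directly would require spelling out
/// `Output`, so shapes with different outputs could not share one list.
trait Shape: Named {
    type Output;
    fn measure(&self) -> Self::Output;
}

struct Circle;
struct Grid;

impl Named for Circle {
    fn name(&self) -> String {
        "circle".to_string()
    }
}

impl Shape for Circle {
    type Output = f64;
    fn measure(&self) -> f64 {
        1.5
    }
}

impl Named for Grid {
    fn name(&self) -> String {
        "grid".to_string()
    }
}

impl Shape for Grid {
    type Output = (u32, u32);
    fn measure(&self) -> (u32, u32) {
        (2, 3)
    }
}

/// Only needs the `Named` view, so it accepts shapes of any `Output` type.
fn names(shapes: &[&dyn Named]) -> Vec<String> {
    shapes.iter().map(|shape| shape.name()).collect()
}
```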

What does “better” look like when you really want less code?

So what about the case where generics are fine, good even, but you just want to avoid generating quite so much code? You might also want that to be under the control of your user.

I’m going to walk through a code example for this section, showing what you can do today, and what kind of problems you run into. Suppose I am writing a custom iterator method, alternate, which returns an iterator that alternates between items from the original iterator and the result of calling a function. I might have a struct like this:

struct Alternate<I: Iterator, F: Fn() -> I::Item> {
    base: I,
    func: F,
    call_func: bool,
}

pub fn alternate<I, F>(
    base: I,
    func: F,
) -> Alternate<I, F>
where
    I: Iterator,
    F: Fn() -> I::Item,
{
    Alternate { base, func, call_func: false }
}

The Iterator impl itself might look like this:

impl<I, F> Iterator for Alternate<I, F>
where
    I: Iterator,
    F: Fn() -> I::Item,
{
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        if !self.call_func {
            self.call_func = true;
            self.base.next()
        } else {
            self.call_func = false;
            Some((self.func)())
        }
    }
}

Now an Alternate iterator will be Send if the base iterator and the closure are Send but not otherwise. The iterator and closure will be able to make use of references found on the stack, too, so long as the Alternate itself does not escape the stack frame. Great!

But suppose I am trying to keep my life simple and so I would like to write this using dyn traits:

struct Alternate<Item> { // variant 2, with dyn
    base: Box<dyn Iterator<Item = Item>>,
    func: Box<dyn Fn() -> Item>,
    call_func: bool,
}

You’ll notice that this definition is somewhat simpler. It looks more like what you might expect from Java. The alternate function and the impl are also simpler:

pub fn alternate<Item>(
    base: impl Iterator<Item = Item>,
    func: impl Fn() -> Item,
) -> Alternate<Item> {
    Alternate {
        base: Box::new(base),
        func: Box::new(func),
        call_func: false
    }
}

impl<Item> Iterator for Alternate<Item> {
    type Item = Item;
    fn next(&mut self) -> Option<Item> {
        // ...same as above...
    }
}

Confusing lifetime bounds

There’s a problem, though: this code won’t compile! If you try, you’ll find you get an error in this function:

pub fn alternate<Item>(
    base: impl Iterator<Item = Item>,
    func: impl Fn() -> Item,
) -> Alternate<Item> {...}

The reason is that dyn traits have a default lifetime bound. In the case of a Box<dyn Foo>, the default is 'static. So e.g. the base field has type Box<dyn Iterator + 'static>. This means the closure and iterators can’t capture references to things. To fix that we have to add a somewhat odd lifetime bound:

struct Alternate<'a, Item> { // variant 3
    base: Box<dyn Iterator<Item = Item> + 'a>,
    func: Box<dyn Fn() -> Item + 'a>,
    call_func: bool,
}

pub fn alternate<'a, Item>(
    base: impl Iterator<Item = Item> + 'a,
    func: impl Fn() -> Item + 'a,
) -> Alternate<'a, Item> {...}
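
Filling in the elided bodies, a complete version of variant 3 might look like the sketch below. The borrowing_demo function is a hypothetical addition that shows the point of the 'a bound: both the iterator and the closure borrow locals from the stack.

```rust
struct Alternate<'a, Item> {
    base: Box<dyn Iterator<Item = Item> + 'a>,
    func: Box<dyn Fn() -> Item + 'a>,
    call_func: bool,
}

fn alternate<'a, Item>(
    base: impl Iterator<Item = Item> + 'a,
    func: impl Fn() -> Item + 'a,
) -> Alternate<'a, Item> {
    Alternate {
        base: Box::new(base),
        func: Box::new(func),
        call_func: false,
    }
}

impl<'a, Item> Iterator for Alternate<'a, Item> {
    type Item = Item;
    fn next(&mut self) -> Option<Item> {
        if !self.call_func {
            // Take the next item from the base iterator...
            self.call_func = true;
            self.base.next()
        } else {
            // ...then one item from the function, and so on.
            self.call_func = false;
            Some((self.func)())
        }
    }
}

/// Both arguments borrow locals, which the default `'static` bound would reject.
fn borrowing_demo() -> Vec<i32> {
    let data = vec![1, 2, 3];
    let filler = 0;
    let out: Vec<i32> = alternate(data.iter().copied(), || filler).collect();
    out
}
```

With the 'a bounds removed from the struct and the function, borrowing_demo no longer compiles, which is the error described above.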

No longer generic over Send

OK, this looks weird, but it will work fine, and we’ll only have one copy of the iterator code per output Item type instead of one for every (base iterator, closure) pair. Except there is another problem: the Alternate iterator is never considered Send. To make it Send, you would have to write dyn Iterator + Send and dyn Fn() -> Item + Send, but then you couldn’t support non-Send things anymore. That stinks and there isn’t really a good workaround.

Ordinary generics work really well with Rust’s auto trait mechanism. The type parameters I and F capture the full details of the base iterator plus the closure that will be used. The compiler can thus analyze an Alternate<I, F> to decide whether it is Send or not. Unfortunately dyn Trait really throws a wrench into the works – because we are no longer tracking the precise type, we also have to choose which parts to keep (e.g., its lifetime bound) and which to forget (e.g., whether the type is Send).
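
To see the trade-off in code, here is a sketch of the Send-only variant. SendAlternate, send_alternate, and sum_on_thread are hypothetical names for this example; the point is that the + Send bounds are baked into the struct definition.

```rust
use std::thread;

struct SendAlternate<'a, Item> {
    // The `+ Send` bounds are fixed here, for every user of the type.
    base: Box<dyn Iterator<Item = Item> + Send + 'a>,
    func: Box<dyn Fn() -> Item + Send + 'a>,
    call_func: bool,
}

fn send_alternate<'a, Item>(
    base: impl Iterator<Item = Item> + Send + 'a,
    func: impl Fn() -> Item + Send + 'a,
) -> SendAlternate<'a, Item> {
    SendAlternate {
        base: Box::new(base),
        func: Box::new(func),
        call_func: false,
    }
}

impl<'a, Item> Iterator for SendAlternate<'a, Item> {
    type Item = Item;
    fn next(&mut self) -> Option<Item> {
        if !self.call_func {
            self.call_func = true;
            self.base.next()
        } else {
            self.call_func = false;
            Some((self.func)())
        }
    }
}

/// Moves the iterator to another thread, which requires it to be `Send`.
fn sum_on_thread(iter: SendAlternate<'static, u32>) -> u32 {
    thread::spawn(move || iter.sum::<u32>()).join().unwrap()
}
```

This version can cross threads, but send_alternate now rejects any closure that captures, say, an Rc. Supporting both cases means maintaining two near-identical types.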

Able to partially monomorphize (“polymorphize”)

This gets at another point. Even ignoring the Send issue, the Alternate<'a, Item> type is not ideal. It will make fewer copies, but we still get one copy per item type, even though the code for many item types will be the same. For example, the compiler will generate effectively the same code for Alternate<'_, i32> as Alternate<'_, u32> or even Alternate<'_, [u8; 4]>. It’d be cool if we could have the compiler go further and coalesce code that is identical.1 Even better if it can coalesce code that is “almost” identical but pass in a parameter: for example, maybe the compiler can coalesce multiple copies of Alternate by passing the size of the Item type in as an integer variable.

Able to change from impl Trait without disturbing callers

I really like using impl Trait in argument position. I find code like this pretty easy to read:

fn for_each_item<Item>(
    base: impl Iterator<Item = Item>,
    mut op: impl FnMut(Item),
) {
    for item in base {
        op(item);
    }
}

But if I were going to change this to use dyn I can’t just change from impl to dyn, I have to add some kind of pointer type:

fn for_each_item<Item>(
    base: &mut dyn Iterator<Item = Item>,
    op: &mut dyn FnMut(Item),
) {
    for item in base {
        op(item);
    }
}

This then disturbs callers, who can no longer write:

for_each_item(some_iter, |item| process(item));

but now must write this

for_each_item(&mut some_iter, &mut |item| process(item));

You can work around this by writing some code like this…

fn for_each_item<Item>(
    mut base: impl Iterator<Item = Item>,
    mut op: impl FnMut(Item),
) {
    for_each_item_dyn(&mut base, &mut op)
}

fn for_each_item_dyn<Item>(
    base: &mut dyn Iterator<Item = Item>,
    op: &mut dyn FnMut(Item),
) {
    for item in base {
        op(item);
    }
}

but to me that just begs the question, why can’t the compiler do this for me dang it?

Async functions can make send/sync issues crop up in functions

In the iterator example I was looking at a struct definition, but with async fn (and in the future with gen) these same issues arise quickly from functions. Consider this async function:

async fn for_each_item<Item>(
    base: impl Iterator<Item = Item>,
    mut op: impl AsyncFnMut(Item),
) {
    for item in base {
        op(item).await;
    }
}

If you rewrite this function to use dyn, though, you’ll find the resulting future is no longer Send or Sync:

async fn for_each_item<Item>(
    base: &mut dyn Iterator<Item = Item>,
    op: &mut dyn AsyncFnMut(Item),
) {
    for item in base {
        op(item).box.await; // <-- assuming we fixed this
    }
}

Conclusions and questions

This has been a useful mental dump; I found it helpful to structure my thoughts.

One thing I noticed is that there is kind of a “third reason” to use dyn – to make your life a bit simpler. The versions of Alternate that used dyn Iterator and dyn Fn felt simpler to me than the fully parametric versions. That might be best addressed though by simplifying generic notation or adopting things like implied bounds.

Some other questions I have:

Where else does the Send and Sync problem come up? Does it combine with the first use case (e.g., wanting to write a vector of heterogeneous tasks each of which is generic over whether it is Send/Sync)?
Maybe we can categorize real-life code examples and link them to these patterns.
Are there other reasons to use dyn trait that I didn’t cover? Other ergonomic issues or pain points we’d want to address as we go?

In fact, if the code is byte-for-byte identical, LLVM and the linker will sometimes do this today, but it doesn’t work reliably across compilation units as far as I know. And anyway there are often small differences. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/03/25/dyn-you-have-idea-for-dyn/

Dyn async traits, part 10: Box box box

Mar 24, 2025 Updated Mar 24, 2025

This article is a slight divergence from my Rust in 2025 series. I wanted to share my latest thinking about how to support dyn Trait for traits with async functions and, in particular, how to do so in a way that is compatible with the soul of Rust.

Background: why is this hard?

Supporting async fn in dyn traits is a tricky balancing act. The challenge is reconciling two key things people love about Rust: its ability to express high-level, productive code and its focus on revealing low-level details. When it comes to async functions in traits, these two things are in direct tension, as I explained in my first blog post in this series – written almost four years ago! (Geez.)

To see the challenge, consider this example Signal trait:

trait Signal {
    async fn signal(&self);
}

In Rust today you can write a function that takes an impl Signal and invokes signal and everything feels pretty nice:

async fn send_signal_1(impl_trait: &impl Signal) {
    impl_trait.signal().await;
}

But what if I want to write that same function using a dyn Signal? If I write this…

async fn send_signal_2(dyn_trait: &dyn Signal) {
    dyn_trait.signal().await; //   ---------- ERROR
}

…I get an error. Why is that? The answer is that the compiler needs to know what kind of future is going to be returned by signal so that it can be awaited. At minimum it needs to know how big that future is so it can allocate space for it1. With an impl Signal, the compiler knows exactly what type of signal you have, so that’s no problem: but with a dyn Signal, we don’t, and hence we are stuck.

The most common solution to this problem is to box the future that results. The async-trait crate, for example, transforms async fn signal(&self) to something like fn signal(&self) -> Box<dyn Future<Output = ()> + '_>. But doing that at the trait level means that we add overhead even when you use impl Trait; it also rules out some applications of Rust async, like embedded or kernel development.
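
Written out by hand, the boxing approach looks roughly like the sketch below. The Counter type, the NoopWake waker, and the block_on helper are assumptions added so the example is self-contained; a real program would use an async runtime instead of this spinning executor, and async-trait’s actual expansion differs in its details.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait Signal {
    // Every implementation must return a pinned, boxed future,
    // even when the caller knows the concrete type.
    fn signal(&self) -> Pin<Box<dyn Future<Output = ()> + '_>>;
}

/// Hypothetical implementor that counts how often it was signaled.
struct Counter(Cell<u32>);

impl Signal for Counter {
    fn signal(&self) -> Pin<Box<dyn Future<Output = ()> + '_>> {
        Box::pin(async move { self.0.set(self.0.get() + 1) })
    }
}

// Because the future is boxed, its size is known and `dyn Signal` works.
async fn send_signal(dyn_trait: &dyn Signal) {
    dyn_trait.signal().await;
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor for the example: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Note that the allocation in Box::pin happens on every call to signal, whether or not dynamic dispatch is involved, which is the overhead described above.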

So the name of the game is to find ways to let people use dyn Trait that are both convenient and flexible. And that turns out to be pretty hard!

The “box box box” design in a nutshell

I’ve been digging back into the problem lately in a series of conversations with Michael Goulet (aka, compiler-errors) and it’s gotten me thinking about a fresh approach I call “box box box”.

The “box box box” design starts with the call-site selection approach. In this approach, when you call dyn_trait.signal(), the type you get back is a dyn Future – i.e., an unsized value. This can’t be used directly. Instead, you have to allocate storage for it. The easiest and most common way to do that is to box it, which can be done with the new .box operator:

async fn send_signal_2(dyn_trait: &dyn Signal) {
    dyn_trait.signal().box.await;
    //        ------------
    // Results in a `Box<dyn Future<Output = ()>>`.
}

This approach is fairly straightforward to explain. When you call an async function through dyn Trait, it results in a dyn Future, which has to be stored somewhere before you can use it. The easiest option is to use the .box operator to store it in a box; that gives you a Box<dyn Future>, and you can await that.

But this simple explanation belies two fairly fundamental changes to Rust. First, it changes the relationship of Trait and dyn Trait. Second, it introduces this .box operator, which would be the first stable use of the box keyword2. It seems odd to introduce the keyword just for this one use – where else could it be used?

As it happens, I think both of these fundamental changes could be very good things. The point of this post is to explain what doors they open up and where they might take us.

Change 0: Unsized return value methods

Let’s start with the core proposal. For every trait Foo, we add inherent methods3 to dyn Foo reflecting its methods:

For every fn f in Foo that is dyn compatible, we add a <dyn Foo>::f that just calls f through the vtable.
For every fn f in Foo that returns an impl Trait value but would otherwise be dyn compatible (e.g., no generic arguments4, no reference to Self beyond the self parameter, etc), we add a <dyn Foo>::f method that is defined to return a dyn Trait.
- This includes async fns, which are sugar for functions that return impl Future.

In fact, method dispatch already adds “pseudo” inherent methods to dyn Foo, so this wouldn’t change anything in terms of which methods are resolved. The difference is that dyn Foo is only allowed if all methods in the trait are dyn compatible, whereas under this proposal some non-dyn-compatible methods would be added with modified signatures.

Change 1: Dyn compatibility

Change 0 only makes sense if it is possible to create a dyn Trait even though it contains some methods (e.g., async functions) that are not dyn compatible. This revisits RFC #255, in which we decided that the dyn Trait type should also implement the trait Trait. I was a big proponent of RFC #255 at the time, but I’ve since decided I was mistaken5. Let’s discuss.

The two rules today that allow dyn Trait to implement Trait are as follows:

By disallowing dyn Trait unless the trait Trait is dyn compatible, meaning that it only has methods that can be added to a vtable.
By requiring that the values of all associated types be explicitly specified in the dyn Trait. So dyn Iterator<Item = u32> is legal but not dyn Iterator on its own.

“dyn compatibility” can be powerful

The fact that dyn Trait implements Trait is at times quite powerful. It means for example that I can write an implementation like this one:

use std::cell::RefCell;
use std::rc::Rc;

struct RcWrapper<T: ?Sized> { r: Rc<RefCell<T>> }

impl<T> Iterator for RcWrapper<T>
where
    T: ?Sized + Iterator,
{
    type Item = T::Item;
    
    fn next(&mut self) -> Option<T::Item> {
        // Borrow the shared iterator mutably and advance it.
        self.r.borrow_mut().next()
    }
}

This impl makes RcWrapper<T> implement Iterator for any type T that implements Iterator, including dyn trait types, as in RcWrapper<dyn Iterator<Item = u32>>. Neat.
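
As a quick check that the dyn case really works, here is a small self-contained sketch. The `wrap_dyn` helper is a hypothetical name introduced for illustration; it erases a concrete iterator behind `dyn Iterator<Item = u32>` and hands back the wrapper.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct RcWrapper<T: ?Sized> {
    r: Rc<RefCell<T>>,
}

impl<T> Iterator for RcWrapper<T>
where
    T: ?Sized + Iterator,
{
    type Item = T::Item;

    fn next(&mut self) -> Option<T::Item> {
        self.r.borrow_mut().next()
    }
}

// Hypothetical helper: hide the concrete iterator type behind `dyn Iterator`.
fn wrap_dyn(items: Vec<u32>) -> RcWrapper<dyn Iterator<Item = u32>> {
    // The annotation triggers the unsizing coercion to the trait object.
    let r: Rc<RefCell<dyn Iterator<Item = u32>>> = Rc::new(RefCell::new(items.into_iter()));
    RcWrapper { r }
}
```

Because the wrapper itself is sized, adapters such as `sum` or `map` are available on it even though the inner iterator is a trait object.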

“dyn compatibility” doesn’t truly live up to its promise

Powerful as it is, the idea of dyn Trait implementing Trait doesn’t quite live up to its promise. What you really want is that you could replace any impl Trait with dyn Trait and things would work. But that’s just not true because dyn Trait is ?Sized. So actually you don’t get a very “smooth experience”. What’s more, although the compiler gives you a dyn Trait: Trait impl, it doesn’t give you impls for references to dyn Trait – so e.g. given this trait

trait Compute {
    fn compute(&self);
}

If I have a Box<dyn Compute>, I can’t give that to a function that takes an impl Compute

fn do_compute(i: impl Compute) {
}

fn call_compute(b: Box<dyn Compute>) {
    do_compute(b); // ERROR
}

To make that work, somebody has to explicitly provide an impl like

impl<I> Compute for Box<I>
where
    I: ?Sized + Compute,
{
    // ...
}

and people often don’t.
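
For reference, a minimal sketch of what such a forwarding impl looks like when written out in full. The `Counter` type is a hypothetical implementor added so the effect of the call is observable, and `do_compute` is given a one-line body for the same reason.

```rust
use std::cell::Cell;
use std::rc::Rc;

trait Compute {
    fn compute(&self);
}

// The forwarding impl that somebody has to provide by hand.
impl<I> Compute for Box<I>
where
    I: ?Sized + Compute,
{
    fn compute(&self) {
        // Deref through the box and dispatch to the inner value.
        (**self).compute()
    }
}

fn do_compute(i: impl Compute) {
    i.compute();
}

// Hypothetical implementor that records each call in a shared counter.
struct Counter(Rc<Cell<u32>>);

impl Compute for Counter {
    fn compute(&self) {
        self.0.set(self.0.get() + 1);
    }
}

fn call_compute(b: Box<dyn Compute>) {
    do_compute(b); // accepted now: `Box<dyn Compute>` implements `Compute`
}
```

The `?Sized` bound is what lets the impl cover `Box<dyn Compute>` as well as boxes of concrete types.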

“dyn compatibility” can be limiting

However, the requirement that dyn Trait implement Trait can be limiting. Imagine a trait like

trait ReportError {
    fn report(&self, error: Error);
    
    fn report_to(&self, error: Error, target: impl ErrorTarget);
    //                                ------------------------
    //                                Generic argument.
}

This trait has two methods. The report method is dyn-compatible, no problem. The report_to method has an impl Trait argument and is therefore generic, so it is not dyn-compatible6 (well, at least not under today’s rules, but I’ll get to that).

(The reason report_to is not dyn compatible: we need to make distinct monomorphized copies tailored to the type of the target argument. But the vtable has to be prepared in advance, so we don’t know which monomorphized version to use.)

And yet, just because report_to is not dyn compatible doesn’t mean that a dyn ReportError would be useless. What if I only plan to call report, as in a function like this?

fn report_all(
    errors: Vec<Error>,
    report: &dyn ReportError,
) {
    for e in errors {
        report.report(e);
    }
}

Rust’s current rules rule out a function like this, but in practice this kind of scenario comes up quite a lot. In fact, it comes up so often that we added a language feature to accommodate it (at least kind of): you can add a where Self: Sized clause to your method to exempt it from dynamic dispatch. This is the reason that Iterator can be dyn compatible even when it has a bunch of generic helper methods like map and flat_map.
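
Here is a small sketch of that escape hatch applied to the ReportError trait. The post does not define `Error` or `ErrorTarget`, so the definitions below are placeholders, and `Collect` is a hypothetical implementor used only to observe the calls.

```rust
use std::cell::RefCell;

// Placeholder types: the post leaves `Error` and `ErrorTarget` undefined.
struct Error(String);

trait ErrorTarget {
    fn write(&mut self, line: &str);
}

trait ReportError {
    fn report(&self, error: Error);

    // The generic method is exempted from dynamic dispatch,
    // so the trait as a whole stays dyn compatible.
    fn report_to(&self, error: Error, target: impl ErrorTarget)
    where
        Self: Sized;
}

// Hypothetical implementor that remembers every reported message.
struct Collect {
    seen: RefCell<Vec<String>>,
}

impl ReportError for Collect {
    fn report(&self, error: Error) {
        self.seen.borrow_mut().push(error.0);
    }

    fn report_to(&self, error: Error, mut target: impl ErrorTarget) {
        target.write(&error.0);
    }
}

// This compiles today because `report_to` is opted out via `Self: Sized`.
fn report_all(errors: Vec<Error>, report: &dyn ReportError) {
    for e in errors {
        report.report(e);
    }
}
```

The cost of this workaround is that `report_to` cannot be called through `&dyn ReportError` at all, and the trait author has to remember to add the clause.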

What does all this have to do with AFIDT?

Let me pause here, as I imagine some of you are wondering what all of this “dyn compatibility” stuff has to do with AFIDT. The bottom line is that the requirement that the dyn Trait type implements Trait means that we cannot put any kind of “special rules” on dyn dispatch and that is not compatible with requiring a .box operator when you call async functions through a dyn trait. Recall that with our Signal trait, you could call the signal method on an impl Signal without any boxing:

async fn send_signal_1(impl_trait: &impl Signal) {
    impl_trait.signal().await;
}

But when I called it on a dyn Signal, I had to write .box to tell the compiler how to deal with the dyn Future that gets returned:

async fn send_signal_2(dyn_trait: &dyn Signal) {
    dyn_trait.signal().box.await;
}

Indeed, the fact that Signal::signal returns an impl Future but <dyn Signal>::signal returns a dyn Future already demonstrates the problem. All impl Future types are known to be Sized and dyn Future is not, so the type signature of <dyn Signal>::signal is not the same as the type signature declared in the trait. Huh.

Associated type values are needed for dyn compatibility

Today I cannot write a type like dyn Iterator without specifying the value of the associated type Item. To see why this restriction is needed, consider this generic function:

fn drop_all<I: ?Sized + Iterator>(iter: &mut I) {
    while let Some(n) = iter.next() {
        std::mem::drop(n);
    }
}

If you invoked drop_all with an &mut dyn Iterator that did not specify Item, how could we know the type of n? We wouldn’t have any idea how much space it needs. But if you invoke drop_all with &mut dyn Iterator<Item = u32>, there is no problem. We don’t know which next method is being called, but we know it’s returning a u32.

Associated type values are limiting

And yet, just as we saw before, the requirement to list associated types can be limiting. If I have a dyn Iterator and I only call size_hint, for example, then why do I need to know the Item type?

fn size_hint(iter: &mut dyn Iterator) -> bool {
    let sh = iter.size_hint();
    sh.1.is_some() // e.g., is an upper bound known?
}

But I can’t write code like this today. Instead I have to make this function generic which basically defeats the whole purpose of using dyn Iterator:

fn size_hint<T>(iter: &mut dyn Iterator<Item = T>) -> bool {
    let sh = iter.size_hint();
    sh.1.is_some() // e.g., is an upper bound known?
}

If we dropped the requirement that every dyn Iterator type implements Iterator, we could be more selective, allowing you to invoke methods that don’t use the Item associated type but disallowing those that do.

A proposal for expanded dyn Trait usability

So that brings us to the full proposal to permit dyn Trait in cases where the trait is not fully dyn compatible:

dyn Trait types would be allowed for any trait.7
dyn Trait types would not require associated types to be specified.
dyn compatible methods are exposed as inherent methods on the dyn Trait type. We would disallow access to the method if its signature references associated types not specified on the dyn Trait type.
dyn Trait that specify all of their associated types would be considered to implement Trait if the trait is fully dyn compatible.8

The box keyword

A lot of things get easier if you are willing to call malloc.

– Josh Triplett, recently.

Rust has reserved the box keyword since 1.0, but we’ve never allowed it in stable Rust. The original intention was that the term box would be a generic term to refer to any “smart pointer”-like pattern, so Rc would be a “reference counted box” and so forth. The box keyword would then be a generic way to allocate boxed values of any type; unlike Box::new, it would do “emplacement”, so that no intermediate values were allocated. With the passage of time I no longer think this is such a good idea. But I do see a lot of value in having a keyword to ask the compiler to automatically create boxes. In fact, I see a lot of places where that could be useful.

boxed expressions

The first place is indeed the .box operator that could be used to put a value into a box. Unlike Box::new, using .box would allow the compiler to guarantee that no intermediate value is created, a property called emplacement. Consider this example:

fn main() {
    let x = Box::new([0_u32; 1024]);
}

Rust’s semantics today require (1) allocating a 4KB buffer on the stack and zeroing it; (2) allocating a box in the heap; and then (3) copying memory from one to the other. This is a violation of our Zero Cost Abstraction promise: no C programmer would write code like that. But if you write [0_u32; 1024].box, we can allocate the box up front and initialize it in place.9
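
Until something like .box exists, one way to sidestep the stack temporary in today’s Rust is to allocate through a Vec and convert. This is a sketch of a workaround, not part of the proposal; `zeroed_boxed_array` is a hypothetical helper name.

```rust
use std::convert::TryFrom;

// Build a zeroed `Box<[u32; 1024]>` without constructing the array on the stack first.
fn zeroed_boxed_array() -> Box<[u32; 1024]> {
    // `vec!` allocates its buffer on the heap and fills it there.
    let slice: Box<[u32]> = vec![0_u32; 1024].into_boxed_slice();
    // The conversion checks the length and reuses the existing allocation.
    match Box::<[u32; 1024]>::try_from(slice) {
        Ok(array) => array,
        Err(_) => unreachable!("the slice was created with exactly 1024 elements"),
    }
}
```

It works, but it is a roundabout way to express what `[0_u32; 1024].box` would say directly, and it only helps for array-like data.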

The same principle applies to calling functions that return an unsized type. This isn’t allowed today, but we’ll need some way to handle it if we want to have async fn return dyn Future. The reason we can’t naively support it is that, in our existing ABI, the caller is responsible for allocating enough space to store the return value and for passing the address of that space into the callee, who then writes into it. But with a dyn Future return value, the caller can’t know how much space to allocate. So they would have to do something else, like passing in a callback that, given the correct amount of space, performs the allocation. The most common case would be to just pass in malloc.

The best ABI for unsized return values is unclear to me, but we don’t have to solve that right now: the ABI can (and should) remain unstable. But whatever the final ABI becomes, when you call such a function in the context of a .box expression, the result is that the callee creates a Box to store the result.10

boxed async functions to permit recursion

If you try to write an async function that calls itself today, you get an error:

async fn fibonacci(a: u32) -> u32 {
    match a {
        0 => 1,
        1 => 2,
        _ => fibonacci(a-1).await + fibonacci(a-2).await
    }
}

The problem is that we cannot determine statically how much stack space to allocate. The solution is to rewrite to a boxed return value. This compiles because the compiler can allocate new stack frames as needed.

fn fibonacci(a: u32) -> Pin<Box<impl Future<Output = u32>>> {
    Box::pin(async move {
        match a {
            0 => 1,
            1 => 2,
            _ => fibonacci(a-1).await + fibonacci(a-2).await
        }
    })
}

But wouldn’t it be nice if we could request this directly?

box async fn fibonacci(a: u32) -> u32 {
    match a {
        0 => 1,
        1 => 2,
        _ => fibonacci(a-1).await + fibonacci(a-2).await
    }
}
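
For comparison, here is a runnable sketch of the manual rewrite as it can be written today. It swaps in a type-erased `dyn Future` return type rather than the `impl Future` used above, and it includes a tiny hypothetical `block_on` executor so the example needs no external crates.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Each recursive call allocates its own boxed future on the heap.
fn fibonacci(a: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        match a {
            0 => 1,
            1 => 2,
            _ => fibonacci(a - 1).await + fibonacci(a - 2).await,
        }
    })
}

// Hypothetical minimal executor: polls one future in a loop with a no-op waker.
// Only suitable for futures that never actually wait on anything.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

With the base cases used here (0 maps to 1, 1 maps to 2), `block_on(fibonacci(5))` evaluates to 13. The boilerplate around the body is what a `box async fn` declaration would remove.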

boxed structs can be recursive

A similar problem arises with recursive structs:

struct List {
    value: u32,
    next: Option<List>, // ERROR
}

The compiler tells you

error[E0072]: recursive type `List` has infinite size
 --> src/lib.rs:1:1
  |
1 | struct List {
  | ^^^^^^^^^^^
2 |     value: u32,
3 |     next: Option<List>, // ERROR
  |                  ---- recursive without indirection
  |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
  |
3 |     next: Option<Box<List>>, // ERROR
  |                  ++++    +

As it suggests, to work around this you can introduce a Box:

struct List {
    value: u32,
    next: Option<Box<List>>,
}

This though is kind of weird because now the head of the list is stored “inline” but future nodes are heap-allocated. I personally usually wind up with a pattern more like this:

struct List {
    data: Box<ListData>
}

struct ListData {
    value: u32,
    next: Option<List>,
}

Now however I can’t create values with List { value: 22, next: None } syntax and I also can’t do pattern matching. Annoying. Wouldn’t it be nice if the compiler could just suggest adding a box keyword when you declare the struct:

box struct List {
    value: u32,
    next: Option<List>,
}

and have List { value: 22, next: None } automatically allocate the box for me? The ideal is that the presence of a box is now completely transparent, so I can pattern match and so forth fully transparently:

box struct List {
    value: u32,
    next: Option<List>,
}

fn foo(list: &List) {
    let List { value, next } = list; // etc
}
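
To see what the box keyword would save, here is a sketch of the wrapper pattern as it has to be written today. The `new` and `sum` helpers are hypothetical names; they stand in for the struct literal and the pattern match that the wrapper otherwise rules out.

```rust
struct List {
    data: Box<ListData>,
}

struct ListData {
    value: u32,
    next: Option<List>,
}

impl List {
    // Stand-in for writing `List { value, next }` directly.
    fn new(value: u32, next: Option<List>) -> List {
        List {
            data: Box::new(ListData { value, next }),
        }
    }

    // Stand-in for pattern matching on `List`: reach through the box by hand.
    fn sum(&self) -> u32 {
        let ListData { value, next } = &*self.data;
        value + next.as_ref().map_or(0, |rest| rest.sum())
    }
}
```

Every node, including the head, now lives on the heap, but each use site has to go through a constructor and an explicit deref.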

boxed enums can be recursive and right-sized

Enums too cannot reference themselves. Being able to declare something like this would be really nice:

box enum AstExpr {
    Value(u32),
    If(AstExpr, AstExpr, AstExpr),
    ...
}

In fact, I still remember when I used Swift for the first time. I wrote a similar enum and Xcode helpfully prompted me, “do you want to declare this enum as indirect?” I remember being quite jealous that it was such a simple edit.

However, there is another interesting thing about a box enum. The way I imagine it, creating an instance of the enum would always allocate a fresh box. This means that the enum cannot be changed from one variant to another without allocating fresh storage. This in turn means that you could allocate that box to exactly the size you need for that particular variant.11 So, for your AstExpr, not only could it be recursive, but when you allocate an AstExpr::Value you only need to allocate space for a u32, whereas an AstExpr::If would be a different size. (We could even start to do “tagged pointer” tricks so that e.g. AstExpr::Value is stored without any allocation at all.)

boxed enum variants to avoid unbalanced enum sizes

Another option would be to have particular enum variants that get boxed but not the enum as a whole:

enum AstExpr {
    Value(u32),
    box If(AstExpr, AstExpr, AstExpr),
    ...
}

This would be useful in cases where you do want to be able to overwrite one enum value with another without necessarily reallocating, but you have enum variants of widely varying size, or some variants that are recursive. A boxed variant would basically be desugared to something like the following:

enum AstExpr {
    Value(u32),
    If(Box<AstExprIf>),
    ...
}

struct AstExprIf(AstExpr, AstExpr, AstExpr);

clippy has a useful lint large_enum_variant that aims to identify this case, but once the lint triggers, it’s not able to offer an actionable suggestion. With the box keyword there’d be a trivial rewrite that requires zero code changes.
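
As a point of reference, here is the hand-written form of that desugaring in today’s Rust, with the elided variants left out. The `eval` function is a hypothetical consumer added to show what matching on the boxed variant looks like; treating a nonzero condition as true is an assumption made for the example.

```rust
enum AstExpr {
    Value(u32),
    If(Box<AstExprIf>),
}

struct AstExprIf(AstExpr, AstExpr, AstExpr);

// Hypothetical evaluator: a nonzero condition selects the second operand.
fn eval(expr: &AstExpr) -> u32 {
    match expr {
        AstExpr::Value(v) => *v,
        AstExpr::If(node) => {
            // The extra deref is the price of boxing the variant by hand.
            let AstExprIf(cond, then, otherwise) = &**node;
            if eval(cond) != 0 {
                eval(then)
            } else {
                eval(otherwise)
            }
        }
    }
}
```

Construction needs an explicit `Box::new(AstExprIf(..))` and matching needs the explicit deref; those are the code changes a `box If(..)` variant would make unnecessary.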

box patterns and types

If we’re enabling the use of box elsewhere, we ought to allow it in patterns:

fn foo(s: box Struct) {
    let box Struct { field } = s;
}

Frequently asked questions

Isn’t it unfortunate that Box::new(v) and v.box would behave differently?

Under my proposal, v.box would be the preferred form, since it would allow the compiler to do more optimization. And yes, that’s unfortunate, given that there are 10 years of code using Box::new. Not really a big deal though. In most of the cases we accept today, it doesn’t matter and/or LLVM already optimizes it. In the future I do think we should consider extensions to make Box::new (as well as Rc::new and other similar constructors) be just as optimized as .box, but I don’t think those have to block this proposal.

Is it weird to special case box and not handle other kinds of smart pointers?

Yes and no. On the one hand, I would like the ability to declare that a struct is always wrapped in an Rc or Arc. I find myself doing things like the following all too often:

struct Context {
    data: Arc<ContextData>
}

struct ContextData {
    counter: AtomicU32,
}

On the other hand, box is very special. It’s kind of unique in that it represents full ownership of the contents which means a T and Box<T> are semantically equivalent – there is no place you can use T that a Box<T> won’t also work – unless T: Copy. This is not true for T and Rc<T> or most other smart pointers.

For myself, I think we should introduce box now but plan to generalize this concept to other pointers later. For example I’d like to be able to do something like this…

#[indirect(std::sync::Arc)]
struct Context {
    counter: AtomicU32,
}

…where the type Arc would implement some trait to permit allocating, deref’ing, and so forth:

trait SmartPointer: Deref {
    fn alloc(data: Self::Target) -> Self;
}
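
To make the shape of that trait concrete, here is a sketch that compiles today, with impls for Box, Rc, and Arc. The `where Self::Target: Sized` clause and the `indirect` helper are assumptions added for illustration; the `#[indirect(..)]` attribute itself does not exist.

```rust
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;

trait SmartPointer: Deref {
    // The `Sized` bound is an addition of this sketch: `data` is passed by value.
    fn alloc(data: Self::Target) -> Self
    where
        Self::Target: Sized;
}

impl<T> SmartPointer for Box<T> {
    fn alloc(data: T) -> Self {
        Box::new(data)
    }
}

impl<T> SmartPointer for Rc<T> {
    fn alloc(data: T) -> Self {
        Rc::new(data)
    }
}

impl<T> SmartPointer for Arc<T> {
    fn alloc(data: T) -> Self {
        Arc::new(data)
    }
}

// Hypothetical generic constructor: the caller picks the pointer type `P`.
fn indirect<P>(data: P::Target) -> P
where
    P: SmartPointer,
    P::Target: Sized,
{
    P::alloc(data)
}
```

With this in place, `let ctx: Arc<u32> = indirect(7);` selects the allocation strategy purely through the annotated type, which is roughly the role the attribute argument would play.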

The original plan for box was that it would be somehow type overloaded. I’ve soured on this for two reasons. First, type overloads make inference more painful and I think are generally not great for the user experience; I think they are also confusing for new users. Finally, I think we missed the boat on naming. Maybe if we had called Rc something like RcBox<T> the idea of “box” as a general name would have percolated into Rust users’ consciousness, but we didn’t, and it hasn’t. I think the box keyword now ought to be very targeted to the Box type.

How does this fit with the “soul of Rust”?

In my soul of Rust blog post, I talked about the idea that one of the things that make Rust Rust is having allocation be relatively explicit. I’m of mixed minds about this, to be honest, but I do think there’s value in having a property similar to unsafe – like, if allocation is happening, there’ll be a sign somewhere you can find. What I like about most of these box proposals is that they move the box keyword to the declaration – e.g., on the struct/enum/etc – rather than the use. I think this is the right place for it. The major exception, of course, is the “marquee proposal”, invoking async fns in dyn trait. That’s not amazing. But then… see the next question for some early thoughts.

If traits don’t have to be dyn compatible, can we make dyn compatibility opt in?

The way that Rust today detects automatically whether traits should be dyn compatible versus having it be declared is, I think, not great. It creates confusion for users and also permits quiet semver violations, where a new defaulted method makes a trait no longer be dyn compatible. It’s also a source for a lot of soundness bugs over time.

I want to move us towards a place where traits are not dyn compatible by default, meaning that dyn Trait does not implement Trait. We would always allow dyn Trait types and we would allow individual items to be invoked so long as the item itself is dyn compatible.

If you want to have dyn Trait implement Trait, you should declare it, perhaps with a dyn keyword:

dyn trait Foo {
    fn method(&self);
}

This declaration would add various default impls. This would start with the dyn Foo: Foo impl:

impl Foo for dyn Foo /*[1]*/ {
    fn method(&self) {
        <dyn Foo>::method(self) // vtable dispatch
    }

    // [1] actually it would want to cover `dyn Foo + Send` etc too, but I'm ignoring that for now
}

But also, if the methods have suitable signatures, include some of the impls you really ought to have to make a trait that is well-behaved with respect to dyn trait:

impl<T> Foo for Box<T> where T: ?Sized + Foo { }
impl<T> Foo for &T where T: ?Sized + Foo { }
impl<T> Foo for &mut T where T: ?Sized + Foo { }

In fact, if you add in the ability to declare a trait as box, things get very interesting:

box dyn trait Signal {
    async fn signal(&self);
}

I’m not 100% sure how this should work but what I imagine is that dyn Signal would be pointer-sized and implicitly contain a Box behind the scenes. It would probably automatically Box the results from async fn when invoked through dyn Trait, so something like this:

impl Signal for dyn Signal {
    async fn signal(&self) {
        <dyn Signal>::signal(self).box.await
    }
}

I didn’t include this in the main blog post but I think together these ideas would go a long way towards addressing the usability gaps that plague dyn Trait today.

1. Side note, one interesting thing about Rust’s async functions is that their size must be known at compile time, so we can’t permit alloca-like stack allocation.
2. The box keyword is in fact reserved already, but it’s never been used in stable Rust.
3. Hat tip to Michael Goulet (compiler-errors) for pointing out to me that we can model the virtual dispatch as inherent methods on dyn Trait types. Before I thought we’d have to make a more invasive addition to MIR, which I wasn’t excited about since it suggested the change was more far-reaching.
4. In the future, I think we can expand this definition to include some limited functions that use impl Trait in argument position, but that’s for a future blog post.
5. I’ve noticed that many times when I favor a limited version of something to achieve some aesthetic principle I wind up regretting it.
6. At least, it is not dyn compatible under today’s rules. Conceivably it could be made to work but more on that later.
7. This part of the change is similar to what was proposed in RFC #2027, though that RFC was quite light on details (the requirements for RFCs in terms of precision have gone up over the years and I expect we wouldn’t accept that RFC today in its current form).
8. I actually want to change this last clause in a future edition. Instead of having dyn compatibility be determined automatically, traits would declare themselves dyn compatible, which would also come with a host of other impls. But that’s worth a separate post all on its own.
9. If you play with this on the playground, you’ll see that the memcpy appears in the debug build but gets optimized away in this very simple case, but that can be hard for LLVM to do, since it requires reordering an allocation of the box to occur earlier and so forth. The .box operator could be guaranteed to work.
10. I think it would be cool to also have some kind of unsafe intrinsic that permits calling the function with other storage strategies, e.g., allocating a known amount of stack space or what have you.
11. We would thus finally bring Rust enums to “feature parity” with OO classes! I wrote a blog post, “Classes strike back”, on this topic back in 2015 (!) as part of the whole “virtual structs” era of Rust design. Deep cut!

https://smallcultfollowing.com/babysteps/blog/2025/03/24/box-box-box/

Rust in 2025: Language interop and the extensible compiler

Mar 18, 2025 Updated Mar 18, 2025

For many years, C has effectively been the “lingua franca” of the computing world. It’s pretty hard to combine code from two different programming languages in the same process–unless one of them is C. The same could theoretically be true for Rust, but in practice there are a number of obstacles that make that harder than it needs to be. Building out silky smooth language interop should be a core goal of helping Rust to target foundational applications. I think the right way to do this is not by extending rustc with knowledge of other programming languages but rather by building on Rust’s core premise of being an extensible language. By investing in building out an “extensible compiler” we can allow crate authors to create a plethora of ergonomic, efficient bridges between Rust and other languages.

We’ll know we’ve succeeded when…

When it comes to interop…

It is easy to create a Rust crate that can be invoked from other languages and across multiple environments (desktop, Android, iOS, etc). Rust tooling covers the full story from writing the code to publishing your library.
It is easy1 to carve out parts of an existing codebase and replace them with Rust. It is particularly easy to integrate Rust into C/C++ codebases.

When it comes to extensibility…

Rust is host to a wide variety of extensions ranging from custom lints and diagnostics (“clippy as a regular library”) to integration and interop (ORMs, languages) to static analysis and automated reasoning^[math].

Lang interop: the least common denominator use case

In my head, I divide language interop into two core use cases. The first is what I call Least Common Denominator (LCD), where people would like to write one piece of code and then use it in a wide variety of environments. This might mean authoring a core SDK that can be invoked from many languages but it also covers writing a codebase that can be used from both Kotlin (Android) and Swift (iOS) or having a single piece of code usable for everything from servers to embedded systems. It might also be creating WebAssembly components for use in browsers or on edge providers.

What distinguishes the LCD use-case is two things. First, it is primarily unidirectional—calls mostly go from the other language to Rust. Second, you don’t have to handle all of Rust. You really want to expose an API that is “simple enough” that it can be expressed reasonably idiomatically from many other languages. Examples of libraries supporting this use case today are uniffi and diplomat. This problem is not new, it’s the same basic use case that WebAssembly components are targeting as well as old school things like COM and CORBA (in my view, though, each of those solutions is a bit too narrow for what we need).

When you dig in, the requirements for LCD get a bit more complicated. You want to start with simple types, yes, but you quickly get people asking for the ability to make the generated wrapper from a given language more idiomatic. And you want to focus on calls into Rust, but you also need to support callbacks. In fact, to really integrate with other systems, you need generic facilities for things like logs, metrics, and I/O that can be mapped in different ways. For example, in a mobile environment, you don’t necessarily want to use tokio to do an outgoing networking request. It is better to use the system libraries since they have special cases to account for the quirks of radio-based communication.

To really crack the LCD problem, you also have to solve a few other problems too:

It needs to be easy to package up Rust code and upload it into the appropriate package managers for other languages. Think of a tool like maturin, which lets you bundle up Rust binaries as Python packages.
For some use cases, download size is a very important constraint. Optimizing for size right now is hard to start. What’s worse, your binary has to include code from the standard library, since we can’t expect to find it on the device—and even if we could, we couldn’t be sure it was ABI compatible with the one you built your code with.

Needed: the “serde” of language interop

Obviously, there’s enough here to keep us going for a long time. I think the place to start is building out something akin to the “serde” of language interop: the serde package itself just defines the core trait for serialization and a derive. All of the format-specific details are factored out into other crates defined by a variety of people.

I’d like to see a universal set of conventions for defining the “generic API” that your Rust code follows and then a tool that extracts these conventions and hands them off to a backend to do the actual language specific work. It’s not essential, but I think this core dispatching tool should live in the rust-lang org. All the language-specific details, on the other hand, would live in crates.io as crates that can be created by anyone.

Lang interop: the “deep interop” use case

The second use case is what I call the deep interop problem. For this use case, people want to be able to go deep in a particular language. Often this is because their Rust program needs to invoke APIs implemented in that other language, but it can also be that they want to stub out some part of that other program and replace it with Rust. One common example that requires deep interop is embedded developers looking to invoke gnarly C/C++ header files supplied by vendors. Deep interop also arises when you have an older codebase, such as the Rust for Linux project attempting to integrate Rust into their kernel or companies looking to integrate Rust into their existing codebases, most commonly C++ or Java.

Some of the existing deep interop crates focus specifically on the use case of invoking APIs from the other language (e.g., bindgen and duchess) but most wind up supporting bidirectional interaction (e.g., pyo3, npapi-rs, and neon). One interesting example is cxx, which supports bidirectional Rust-C++ interop, but does so in a rather opinionated way, encouraging you to make use of a subset of C++’s features that can be readily mapped (in this way, it’s a bit of a hybrid of LCD and deep interop).

Interop with all languages is important. C and C++ are just more so.

I want to see smooth interop with all languages, but C and C++ are particularly important. This is because they have historically been the language of choice for foundational applications, and hence there is a lot of code that we need to integrate with. Integration with C today in Rust is, in my view, “ok” – most of what you need is there, but it’s not as nicely integrated into the compiler or as accessible as it should be. Integration with C++ is a huge problem. I’m happy to see the Foundation’s Rust-C++ Interoperability Initiative as well as projects like Google’s crubit and of course the venerable cxx.

Needed: “the extensible compiler”

The traditional way to enable seamless interop with another language is to “bake it in”, e.g., Kotlin has very smooth support for invoking Java code and Swift/Zig can natively build C and C++. I would prefer for Rust to take a different path, one I call the extensible compiler. The idea is to enable interop via, effectively, supercharged procedural macros that can integrate with the compiler to supply type information, generate shims and glue code, and generally manage the details of making Rust “play nicely” with another language.

In some sense, this is the same thing we do today. All the crates I mentioned above leverage procedural macros and custom derives to do their job. But procedural macros today are the “simplest thing that could possibly work”: tokens in, tokens out. Considering how simplistic they are, they’ve gotten us remarkably far, but they also have distinct limitations. Error messages generated by the compiler are not expressed in terms of the macro input but rather the Rust code that gets generated, which can be really confusing; macros are not able to access type information or communicate information between macro invocations; macros cannot generate code on demand, as it is needed, which means that we spend time compiling code we might not need but also that we cannot integrate with monomorphization. And so forth.

I think we should integrate procedural macros more deeply into the compiler.2 I’d like macros that can inspect types, that can generate code in response to monomorphization, that can influence diagnostics3 and lints, and maybe even customize things like method dispatch rules. That will allow all people to author crates that provide awesome interop with all those languages, but it will also help people write crates for all kinds of other things. To get a sense for what I’m talking about, check out F#’s type providers and what they can do.

The challenge here will be figuring out how to keep the stabilization surface area as small as possible. Whenever possible I would look for ways to have macros communicate by generating ordinary Rust code, perhaps with some small tweaks. Imagine macros that generate things like a “virtual function”, that has an ordinary Rust signature but where the body for a particular instance is constructed by a callback into the procedural macro during monomorphization. And what format should that body take? Ideally, it’d just be Rust code, so as to avoid introducing any new surface area.

Not needed: the Rust Evangelism Task Force

So, it turns out I’m a big fan of Rust. And, I ain’t gonna lie, when I see a prominent project pick some other language, at least in a scenario where Rust would’ve done equally well, it makes me sad. And yet I also know that if every project were written in Rust, that would be so sad. I mean, who would we steal good ideas from?

I really like the idea of focusing our attention on making Rust work well with other languages, not on convincing people Rust is better 4. The easier it is to add Rust to a project, the more people will try it – and if Rust is truly a better fit for them, they’ll use it more and more.

Conclusion: next steps

This post pitched out a north star where

a single Rust library can be easily used across many languages and environments;
Rust code can easily call and be called by functions in other languages;
this is all implemented atop a rich procedural macro mechanism that lets plugins inspect type information, generate code on demand, and so forth.

How do we get there? I think there are some concrete next steps:

Build out, adopt, or extend an easy system for producing “least common denominator” components that can be embedded in many contexts.
Support the C++ interop initiatives at the Foundation and elsewhere. The wheels are turning: tmandry is the point-of-contact for the project goal for that, and we recently held our first lang-team design meeting on the topic (this document is a great read, highly recommended!).
Look for ways to extend proc macro capabilities and explore what it would take to invoke them from other phases of the compiler besides just the very beginning.
- An aside: I also think we should extend rustc to support compiling proc macros to web-assembly and use that by default. That would allow for strong sandboxing and deterministic execution and also easier caching to support faster build times.

Well, as easy as it can be. ↩︎
Rust’s incremental compilation system is pretty well suited to this vision. It works by executing an arbitrary function and then recording what bits of the program state that function looks at. The next time we run the compiler, we can see if those bits of state have changed to avoid re-running the function. The interesting thing is that this function could as well be part of a procedural macro, it doesn’t have to be built-in to the compiler. ↩︎
Stuff like the diagnostics tool attribute namespace is super cool! More of this! ↩︎
I’ve always been fond of this article Rust vs Go, “Why they’re better together”. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/03/18/lang-interop-extensibility/

Rust in 2025: Targeting foundational software

Mar 10, 2025 Updated Mar 10, 2025

Rust turns 10 this year. It’s a good time to take a look at where we are and where I think we need to be going. This post is the first in a series I’m calling “Rust in 2025”. This first post describes my general vision for how Rust fits into the computing landscape. The remaining posts will outline major focus areas that I think are needed to make this vision come to pass. Oh, and fair warning, I’m expecting some controversy along the way—at least I hope so, since otherwise I’m just repeating things everyone knows.

My vision for Rust: foundational software

I see Rust’s mission as making it dramatically more accessible to author and maintain foundational software. By foundational I mean the software that underlies everything else. You can already see this in the areas where Rust is highly successful: CLI and development tools that everybody uses to do their work and which are often embedded into other tools1; cloud platforms that people use to run their applications2; embedded devices that are in the things around (and above) us; and, increasingly, the kernels that run everything else (both Windows and Linux!).

Foundational software needs performance, reliability—and productivity

The needs of foundational software have a lot in common with all software, but everything is extra important. Reliability is paramount, because when the foundations fail, everything on top fails also. Performance overhead is to be avoided because it becomes a floor on the performance achievable by the layers above you.

Traditionally, achieving the extra-strong requirements of foundational software has meant that you can’t do it with “normal” code. You had two choices. You could use C or C++3, which give great power but demand perfection in response4. Or, you could use a higher-level language like Java or Go, but in a very particular way designed to keep performance high. You have to avoid abstractions and conveniences and minimize allocations so as not to trigger the garbage collector.

Rust changed the balance by combining C++’s innovations in zero-cost abstractions with a type system that can guarantee memory safety. The result is a pretty cool tool, one that (often, at least) lets you write high-level code with low-level performance and without fear of memory safety errors.

Empowerment and lowering the barrier to entry

In my Rust talks, I often say that type systems and static checks sound to most developers like “spinach”, something their parents forced them to eat because it was “good for them”, but not something anybody wants. The truth is that type systems are like spinach—popeye spinach. Having a type system to structure your thinking makes you more effective, regardless of your experience level. If you are a beginner, learning the type system helps you learn how to structure software for success. If you are an expert, the type system helps you create structures that will catch your mistakes faster (as well as those of your less experienced colleagues). Yehuda Katz sometimes says, “When I’m feeling alert, I build abstractions that will help tired Yehuda be more effective”, which I’ve always thought was a great way of putting it.

What about non-foundational software?

When I say that Rust’s mission is to target foundational software, I don’t mean that’s all it’s good for. Projects like Dioxus, Tauri, and Leptos are doing fascinating, pioneering work pushing the boundaries of Rust into higher-level applications like GUIs and Webpages. I don’t believe this kind of high-level development will ever be Rust’s sweet spot. But that doesn’t mean I think we should ignore them—in fact, quite the opposite.

Stretch goals are how you grow

The traditional thinking goes that, because foundational software often needs control over low-level details, it’s not as important to focus on accessibility and ergonomics. In my view, though, the fact that foundational software needs control over low-level details only makes it more important to try and achieve good ergonomics. Anything you can do to help the developer focus on the details that matter most will make them more productive.

I think projects that stretch Rust to higher-level areas, like Dioxus, Tauri, and Leptos, are a great way to identify opportunities to make Rust programming more convenient. These opportunities then trickle down to make Rust easier to use for everyone. The trick is to avoid losing the control and reliability that foundational applications need along the way (and it ain’t always easy).

Cover the whole stack

There’s another reason to make sure that higher-level applications are pleasant in Rust: it means that people can build their entire stack using one technology. I’ve talked to a number of people who expected just to use Rust for one thing, say a tail-latency-sensitive data plane service, but they wound up using it for everything. Why? Because it turned out that, once they learned it, Rust was quite productive and using one language meant they could share libraries and support code. Put another way, simple code is simple no matter what language you build it in.5

“Smooth, iterative deepening”

The other lesson I’ve learned is that you want to enable what I think of as smooth, iterative deepening. This rather odd phrase is the one that always comes to my mind, somehow. The idea is that a user’s first experience should be simple–they should be able to get up and going quickly. As they get further into their project, the user will find places where it’s not doing what they want, and they’ll need to take control. They should be able to do this in a localized way, changing one part of their project without disturbing everything else.

Smooth, iterative deepening sounds easy but is in fact very hard. Many projects fail either because the initial experience is hard or because the step from simple-to-control is in fact more like scaling a cliff, requiring users to learn a lot of background material. Rust certainly doesn’t always succeed–but we succeed enough, and I like to think we’re always working to do better.

What’s to come

This is the first post of the series. My current plan6 is to post four follow-ups that cover what I see as the core investments we need to make to improve Rust’s fit for foundational software. In my mind, the first three talk about how we should double down on some of Rust’s core values:

achieving smooth language interop by doubling down on extensibility;
extending the type system to achieve clarity of purpose;
leveling up the Rust ecosystem by building out better guidelines, tools, and leveraging the Rust Foundation.

After that, I’ll talk about the Rust open-source organization and what I think we should be doing there to make contributing to and maintaining Rust as accessible and, dare I say it, joyful as we can.

Plenty of people use ripgrep, but did you know that when you do full text search in VSCode, you are also using ripgrep? And of course Deno makes heavy use of Rust, as does a lot of Python tooling, like the uv package manager. The list goes on and on. ↩︎
What do AWS, Azure, CloudFlare, and Fastly all have in common? They’re all big Rust users. ↩︎
Rod Chapman tells me I should include Ada. He’s not wrong, particularly if you are able to use SPARK to prove strong memory safety (and stronger properties, like panic freedom or even functional correctness). But Ada’s never really caught on broadly, although it’s very successful in certain spaces. ↩︎
Alas, we are but human. ↩︎
Well, that’s true if the language meets a certain base bar. I’d say that even “simple” code in C isn’t all that simple, given that you don’t even have basic types like vectors and hashmaps available. ↩︎
I reserve the right to change it as I go! ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/03/10/rust-2025-intro/

View types redux and abstract fields

Feb 25, 2025 Updated Feb 25, 2025

A few years back I proposed view types as an extension to Rust’s type system to let us address the problem of (false) inter-procedural borrow conflicts. The basic idea is to introduce a “view type” {f1, f2} Type1, meaning “an instance of Type where you can only access the fields f1 or f2”. The main purpose is to let you write function signatures like & {f1, f2} self or &mut {f1, f2} self that define what fields a given type might access. I was thinking about this idea again and I wanted to try and explore it a bit more deeply, to see how it could actually work, and to address the common question of how to have places in types without exposing the names of private fields.

Example: the Data type

The Data type is going to be our running example. The Data type collects experiments, each of which has a name and a set of f32 values. In addition to the experimental data, it has a counter, successful, which indicates how many measurements were successful.

struct Data {
    experiments: HashMap<String, Vec<f32>>,
    successful: u32,
}

There are some helper functions you can use to iterate over the list of experiments and read their data. All of these return data borrowed from self. Today in Rust I would typically leverage lifetime elision, where the & in the return type is automatically linked to the &self argument:

impl Data {
    pub fn experiment_names(
        &self,
    ) -> impl Iterator<Item = &String> {
        self.experiments.keys()
    }

    pub fn for_experiment(
        &self,
        experiment: &str,
    ) -> &[f32] {
        // `get` yields `Option<&Vec<f32>>`, so convert to a slice before defaulting.
        self.experiments
            .get(experiment)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

Tracking successful experiments

Now imagine that Data has methods for reading and modifying the counter of successful experiments:

impl Data {
    pub fn successful(&self) -> u32 {
        self.successful
    }

    pub fn add_successful(&mut self) {
        self.successful += 1;
    }
}

Today, “aggregate” types like Data present a composition hazard

The Data type as presented thus far is pretty sensible, but it can actually be a pain to use. Suppose you wanted to iterate over the experiments, analyze their data, and adjust the successful counter as a result. You might try writing the following:

fn count_successful_experiments(data: &mut Data) {
    for n in data.experiment_names() {
        if is_successful(data.for_experiment(n)) {
            data.add_successful(); // ERROR: data is borrowed here
        }
    }
}

Experienced Rustaceans are likely shaking their head at this point—in fact, the previous code will not compile. What’s wrong? Well, the problem is that experiment_names returns data borrowed from self which then persists for the duration of the loop. Invoking add_successful then requires an &mut Data argument, which causes a conflict.

The compiler is indeed flagging a reasonable concern here. The risk is that add_successful could mutate the experiments map while experiment_names is still iterating over it. Now, we as code authors know that this is unlikely — but let’s be honest, it may be unlikely now, but it’s not impossible that as Data evolves somebody might add some kind of logic into add_successful that would mutate the experiments map. This is precisely the kind of subtle interdependency that can make an innocuous “but it’s just one line!” PR cause a massive security breach. That’s all well and good, but it’s also very annoying that I can’t write this code.
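For comparison, here is a minimal sketch of how the loop can be written in today's Rust when the caller can see the private fields. It skips the accessor methods and borrows the two fields separately, which the borrow checker accepts inside a single function body. The `is_successful` predicate is a hypothetical stand-in, since its definition isn't shown in this post.

```rust
use std::collections::HashMap;

struct Data {
    experiments: HashMap<String, Vec<f32>>,
    successful: u32,
}

// Hypothetical stand-in: the post never defines `is_successful`.
fn is_successful(values: &[f32]) -> bool {
    !values.is_empty() && values.iter().all(|v| *v >= 0.0)
}

fn count_successful_experiments(data: &mut Data) {
    // Destructuring yields one borrow per field, so reading `experiments`
    // and mutating `successful` do not conflict.
    let Data { experiments, successful } = data;
    for values in experiments.values() {
        if is_successful(values) {
            *successful += 1;
        }
    }
}

fn main() {
    let mut data = Data { experiments: HashMap::new(), successful: 0 };
    data.experiments.insert("a".to_string(), vec![1.0, 2.0]);
    data.experiments.insert("b".to_string(), vec![-1.0]);
    count_successful_experiments(&mut data);
    println!("successful = {}", data.successful);
}
```

This only works where the fields are visible, and it gives up the encapsulation that the accessor methods were meant to provide. That is the gap view types aim to close.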

Using view types to flag what is happening

The right fix here is to have a way to express what fields may be accessed in the type system. If we do this, then we can get the code to compile today and prevent future PRs from introducing bugs. This is hard to do with Rust’s current system, though, as types do not have any way of talking about fields, only spans of execution-time (“lifetimes”).

With view types, though, we can change the signature from &self to &{experiments} self. Just as &self is shorthand for self: &Data, this is actually shorthand for self: & {experiments} Data.

impl Data {
    pub fn experiment_names(
        & {experiments} self,
    ) -> impl Iterator<Item = &String> {
        self.experiments.keys()
    }

    pub fn for_experiment(
        & {experiments} self,
        experiment: &str,
    ) -> &[f32] {
        self.experiments
            .get(experiment)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

We would also modify the add_successful method to flag what field it needs:

impl Data {
    pub fn add_successful(
        self: &mut {successful} Self,
    ) {
        self.successful += 1;
    }
}

Getting a bit more formal

The idea of this post was to sketch out how view types could work in a slightly more detailed way. The basic idea is to extend Rust’s type grammar with a new type…

T = &'a mut? T
  | [T]
  | Struct<...>
  | …
  | {field-list} T // <— view types

We would also have some kind of expression for defining a view onto a place. This would be a place expression. For now I will write E = {f1, f2} E to define this expression, but that’s obviously ambiguous with Rust blocks. So for example you could write…

let mut x: (String, String) = (String::new(), String::new());
let p: &{0} (String, String) = & {0} x;
let q: &mut {1} (String, String) = &mut {1} x;

…to get a reference p that can only access the field 0 of the tuple and a reference q that can only access field 1. Note the difference between &{0}x, which creates a reference to the entire tuple but with limited access, and &x.0, which creates a reference to the field itself. Both have their place.
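The `&x.0` form already works in today's Rust, so a short sketch of it may make the contrast concrete. The function name `append_first_to_second` is made up for illustration. Each reference below points at a single field, whereas the proposed `&{0} x` would point at the whole tuple with restricted access.

```rust
/// Appends the first string onto the second, borrowing each field separately.
fn append_first_to_second(x: &mut (String, String)) {
    let p: &String = &x.0; // shared borrow of field 0 only
    let q: &mut String = &mut x.1; // mutable borrow of field 1 only
    q.push_str(p);
}

fn main() {
    let mut x = (String::from("left"), String::from("right-"));
    append_first_to_second(&mut x);
    println!("{}", x.1);
}
```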

Checking field accesses against view types

Consider this function from our example:

impl Data {
    pub fn add_successful(
        self: &mut {successful} Self,
    ) {
        self.successful += 1;
    }
}

How would we type check the self.successful += 1 statement? Today, without view types, typing an expression like self.successful begins by getting the type of self, which is something like &mut Data. We then “auto-deref”, looking for the struct type within. That would bring us to Data, at which point we would check to see if Data defines a field successful.

To integrate view types, we have to track both the type of data being accessed and the set of allowed fields. Initially we have variable self with type &mut {successful} Data and allow set *. The deref would bring us to {successful} Data (allow-set remains *). Traversing a view type modifies the allow-set, so we go from * to {successful} (to be legal, every field in the view must be allowed). We now have the type Data. We would then identify the field successful as both a member of Data and a member of the allow-set, and so this code would be successful.

If however you tried to modify a function to access a field not declared as part of its view, e.g.,

impl Data {
    pub fn add_successful(
        self: &mut {successful} Self,
    ) {
        assert!(!self.experiments.is_empty()); // <— modified to include this
        self.successful += 1;
    }
}

the self.experiments type-checking would now fail, because the field experiments would not be a member of the allow-set.
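The walk described above can be modeled in a few lines of ordinary Rust. This is a toy sketch under my own assumptions, not how rustc represents types: `Ty`, `AllowSet`, `field_set`, and `check_field_access` are all invented names, and structs are reduced to a set of field names.

```rust
use std::collections::BTreeSet;

type FieldSet = BTreeSet<&'static str>;

/// Toy model of the types involved (hypothetical, not rustc's representation).
enum Ty {
    Ref(Box<Ty>),            // `&T` or `&mut T`
    View(FieldSet, Box<Ty>), // `{f1, f2} T`
    Struct(FieldSet),        // a struct, described only by its field names
}

/// The set of fields that may currently be accessed.
enum AllowSet {
    All, // written `*` in the post
    Fields(FieldSet),
}

impl AllowSet {
    fn allows(&self, field: &str) -> bool {
        match self {
            AllowSet::All => true,
            AllowSet::Fields(fields) => fields.contains(field),
        }
    }
}

fn field_set(names: &[&'static str]) -> FieldSet {
    names.iter().copied().collect()
}

/// Checks whether `field` may be accessed through a value of type `ty`.
fn check_field_access(ty: &Ty, field: &str) -> Result<(), String> {
    let mut allow = AllowSet::All;
    let mut current = ty;
    loop {
        match current {
            // Auto-deref: the allow-set is unchanged.
            Ty::Ref(inner) => current = &**inner,
            // Traversing a view narrows the allow-set; every field named
            // by the view must itself be allowed.
            Ty::View(view, inner) => {
                for f in view {
                    if !allow.allows(f) {
                        return Err(format!("view names `{f}`, which is not allowed"));
                    }
                }
                allow = AllowSet::Fields(view.clone());
                current = &**inner;
            }
            // The field must exist on the struct and be in the allow-set.
            Ty::Struct(declared) => {
                if !declared.contains(field) {
                    return Err(format!("no field named `{field}`"));
                }
                if !allow.allows(field) {
                    return Err(format!("field `{field}` is not in the allow-set"));
                }
                return Ok(());
            }
        }
    }
}

fn main() {
    // Models `&mut {successful} Data`.
    let data = Ty::Struct(field_set(&["experiments", "successful"]));
    let self_ty = Ty::Ref(Box::new(Ty::View(field_set(&["successful"]), Box::new(data))));
    println!("{:?}", check_field_access(&self_ty, "successful"));
    println!("{:?}", check_field_access(&self_ty, "experiments"));
}
```

The first check succeeds and the second fails, matching the two versions of `add_successful` discussed above.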

We need to infer allow sets

A more interesting problem comes when we type-check a call to add_successful(). We had the following code:

fn count_successful_experiments(data: &mut Data) {
    for n in data.experiment_names() {
        if is_successful(data.for_experiment(n)) {
            data.add_successful(); // Was error, now ok.
        }
    }
}

Consider the call to data.experiment_names(). In the compiler today, method lookup begins by examining data, of type &mut Data, auto-deref’ing by one step to yield Data, and then auto-ref’ing to yield &Data. The result is that this method call is desugared to a call like Data::experiment_names(&*data).
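The desugaring can be observed directly in today's Rust. In this small sketch, `count_names` is a hypothetical helper that makes the same call twice, once with method syntax and once in the explicit form, and returns both results.

```rust
use std::collections::HashMap;

struct Data {
    experiments: HashMap<String, Vec<f32>>,
}

impl Data {
    fn experiment_names(&self) -> impl Iterator<Item = &String> {
        self.experiments.keys()
    }
}

// Hypothetical helper, for illustration only.
fn count_names(data: &mut Data) -> (usize, usize) {
    // Method-call syntax: the compiler auto-derefs `data`, then auto-refs.
    let sugared = data.experiment_names().count();
    // The explicit form of the same call.
    let desugared = Data::experiment_names(&*data).count();
    (sugared, desugared)
}

fn main() {
    let mut data = Data { experiments: HashMap::new() };
    data.experiments.insert("a".to_string(), vec![1.0]);
    data.experiments.insert("b".to_string(), vec![2.0]);
    println!("{:?}", count_names(&mut data));
}
```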

With view types, when introducing the auto-ref, we would also introduce a view operation. So we would get Data::experiment_names(& {?X} *data). What is this {?X}? That indicates that the set of allowed fields has to be inferred. A place-set variable ?X can be inferred to a set of fields or to * (all fields).

We would integrate these place-set variables into inference, so that {?A} Ta <: {?B} Tb if ?B is a subset of ?A and Ta <: Tb (e.g., {x, y} Foo <: {x} Foo). We would also allow for dropping view types from subtypes, e.g., {*} Ta <: Tb if Ta <: Tb.

Place-set variables only appear as an internal inference detail, so users can’t (e.g.) write a function that is generic over a place-set, and the only kind of constraints you can get are subset (P1 <= P2) and inclusion (f in P1). I think it should be relatively straightforward to integrate these into HIR type check inference. When generalizing, we can replace each specific view set with a variable, just as we do for lifetimes. When we go to construct MIR, we would always know the precise set of fields we wish to include in the view. In the case where the set of fields is * we can also omit the view from the MIR.
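To make the two constraint kinds concrete, here is a toy sketch of place sets with the subset and inclusion checks, plus the view subtyping rule built on top of them. `PlaceSet` and its methods are invented for illustration. The model does not know a struct's full field list, so it conservatively treats `*` as never being a subset of an explicit list.

```rust
use std::collections::BTreeSet;

/// Toy place set: either `*` (all fields) or an explicit list of fields.
#[derive(Debug, PartialEq)]
enum PlaceSet {
    All,
    Fields(BTreeSet<&'static str>),
}

impl PlaceSet {
    fn of(names: &[&'static str]) -> Self {
        PlaceSet::Fields(names.iter().copied().collect())
    }

    /// Subset constraint: `self <= other`.
    fn is_subset_of(&self, other: &PlaceSet) -> bool {
        match (self, other) {
            (_, PlaceSet::All) => true,
            // Conservative: the full field list is unknown in this toy.
            (PlaceSet::All, PlaceSet::Fields(_)) => false,
            (PlaceSet::Fields(a), PlaceSet::Fields(b)) => a.is_subset(b),
        }
    }

    /// Inclusion constraint: `field in self`.
    fn contains(&self, field: &str) -> bool {
        match self {
            PlaceSet::All => true,
            PlaceSet::Fields(fields) => fields.contains(field),
        }
    }
}

/// `{a} T <: {b} T` holds when `b` is a subset of `a`.
/// The underlying type `T` is assumed to be the same on both sides.
fn view_is_subtype(a: &PlaceSet, b: &PlaceSet) -> bool {
    b.is_subset_of(a)
}

fn main() {
    let xy = PlaceSet::of(&["x", "y"]);
    let x = PlaceSet::of(&["x"]);
    println!("{{x, y}} Foo <: {{x}} Foo: {}", view_is_subtype(&xy, &x));
    println!("{{x}} Foo <: {{x, y}} Foo: {}", view_is_subtype(&x, &xy));
    println!("y in {{x}}: {}", x.contains("y"));
}
```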

Abstract fields

So, view types allow us to address these sorts of conflicts by making it more explicit what sets of fields we are going to access, but they introduce a new problem: does this mean that the names of our private fields become part of our interface? That seems obviously undesirable.

The solution is to introduce the idea of abstract2 fields. An abstract field is a kind of pretend field, one that doesn’t really exist, but which you can talk about “as if” it existed. It lets us give symbolic names to data.

Abstract fields would be defined as aliases for a set of fields, like pub abstract field_name = (list-of-fields). An alias defines a public symbolic name for a set of fields.

We could therefore define two aliases for Data, one for the set of experiments and one for the count of successful experiments. I think it would be useful to allow these names to alias actual field names, as I think that in practice the compiler can always tell which set to use, but I would require that if there is an alias, then the abstract field is aliased to the actual field with the same name.

struct Data {
    pub abstract experiments = experiments,
    experiments: HashMap<String, Vec<f32>>,

    pub abstract successful = successful,
    successful: u32,
}

Now the view types we wrote earlier (& {experiments} self, etc) are legal but they refer to the abstract fields and not the actual fields.

Abstract fields permit refactoring

One nice property of abstract fields is that they permit refactoring. Imagine that we decide to change Data so that instead of storing experiments as a HashMap<String, Vec<f32>>, we put all the experimental data in one big vector and store a range of indices in the map, like HashMap<String, (usize, usize)>. We can do that no problem:

struct Data {
    pub abstract experiments = (experiment_indices, experiment_data),
    experiment_indices: HashMap<String, (usize, usize)>,
    experiment_data: Vec<f32>,

    // ...
}

We would still declare methods like &mut {experiments} self, but the compiler now understands that the abstract field experiments can be expanded to the set of private fields.
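The expansion step itself is simple enough to sketch. The following toy, with the invented name `expand_view`, maps the abstract names in a view to the real fields they stand for, using an alias table like the one the refactored Data would declare. It is only a model of the idea, not a proposal for how the compiler would store aliases.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Expands the abstract field names in `view` into the real fields they alias.
/// `aliases` maps each abstract field to its list of real fields (toy model).
fn expand_view(
    aliases: &BTreeMap<&'static str, Vec<&'static str>>,
    view: &[&str],
) -> Result<BTreeSet<&'static str>, String> {
    let mut fields = BTreeSet::new();
    for name in view {
        match aliases.get(*name) {
            Some(real) => fields.extend(real.iter().copied()),
            None => return Err(format!("no abstract field named `{name}`")),
        }
    }
    Ok(fields)
}

fn main() {
    let mut aliases = BTreeMap::new();
    aliases.insert("experiments", vec!["experiment_indices", "experiment_data"]);
    aliases.insert("successful", vec!["successful"]);

    // Models the view in `&mut {experiments} self`.
    println!("{:?}", expand_view(&aliases, &["experiments"]));
}
```

Callers keep writing `{experiments}` before and after the refactoring; only the alias table changes.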

Frequently asked questions

Can abstract fields be mapped to an empty set of fields?

Yes, I think it should be possible to define pub abstract foo; to indicate the empty set of fields.

How do view types interact with traits and impls?

Good question. There is no necessary interaction; we could leave view types as simply a kind of type. You might do interesting things like implement Deref for a view on your struct:

struct AugmentedData {
    data: Vec<u32>,
    summary: u32,
}

impl Deref for {data} AugmentedData {
    type Target = [u32];

    fn deref(&self) -> &[u32] {
        // type of `self` is `&{data} AugmentedData`
        &self.data
    }
}

OK, you don’t need to integrate abstract fields with traits, but could you?

Yes! And it’d be interesting. You could imagine declaring abstract fields as trait members that can appear in its interface:

trait Interface {
    abstract data1;
    abstract data2;

    fn get_data1(&{data1} self) -> u32;
    fn get_data2(&{data2} self) -> u32;
}

You could then define those fields in an impl. You can even map some of them to real fields and leave some as purely abstract:

struct OneCounter {
    counter: u32,
}

impl Interface for OneCounter {
    abstract data1 = counter;
    abstract data2;

    fn get_data1(&{counter} self) -> u32 {
        self.counter
    }

    fn get_data2(&{data2} self) -> u32 {
        0 // no fields needed
    }
}

Could view types include more complex paths than just fields?

Although I wouldn’t want to at first, I think you could permit something like {foo.bar} Baz and then, given something like &foo.bar, you’d get the type &{bar} Baz, but I’ve not really thought about it more deeply than that.

Can view types be involved in moves?

Yes! You should be able to do something like

struct Strings {
    a: String,
    b: String,
    c: String,
}

fn play_games(s: Strings) {
    // Moves the struct `s` but only the fields `a` and `c`
    let t: {a, c} Strings = {a, c} s;

    println!("{}", s.a); // ERROR: s.a has been moved
    println!("{}", s.b); // OK.
    println!("{}", s.c); // ERROR: s.c has been moved

    println!("{}", t.a); // OK.
    println!("{}", t.b); // ERROR: no access to field `b`.
    println!("{}", t.c); // OK.
}
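Today's Rust already has a field-by-field version of this: partial moves out of a struct that has no Drop impl. The sketch below reuses the Strings type and moves `a` and `c` out individually while `b` stays usable. The return value is only there so the effect can be observed; it is not part of the example above.

```rust
struct Strings {
    a: String,
    b: String,
    c: String,
}

fn play_games(s: Strings) -> (String, usize, String) {
    let a = s.a; // moves field `a` out of `s`
    let c = s.c; // moves field `c` out of `s`

    // `s.b` was not moved, so it can still be read.
    let b_len = s.b.len();

    // println!("{}", s.a); // would be an ERROR: `s.a` has been moved
    (a, b_len, c)
}

fn main() {
    let s = Strings {
        a: "x".to_string(),
        b: "yy".to_string(),
        c: "z".to_string(),
    };
    println!("{:?}", play_games(s));
}
```

What view types would add is the ability to move the selected fields together as one value whose type, `{a, c} Strings`, records what is still accessible.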

Why did you have a subtyping rule to drop view types from sub- but not super-types?

I described the view type subtyping rules as two rules:

{?A} Ta <: {?B} Tb if ?B is a subset of ?A and Ta <: Tb
{*} Ta <: Tb if Ta <: Tb

In principle we could have a rule like Ta <: {*} Tb if Ta <: Tb — this rule would allow “introducing” a view type into the supertype. We may wind up needing such a rule but I didn’t want it because it meant that code like this really ought to compile (using the Strings type from the previous question):

fn play_games(s: Strings) {
   let t: {a, c} Strings = s; // <— just `= s`, not `= {a, c} s`.
}

I would expect this to compile because

Strings <: {*} Strings <: {a, c} Strings

but I kind of don’t want it to compile.

Are there other uses for abstract fields?

Yes! I think abstract fields would also be useful in two other ways (though we have to stretch their definition a bit). I believe it’s important for Rust to grow stronger integration with theorem provers; I don’t expect these to be widely used, but for certain key libraries (stdlib, zerocopy, maybe even tokio) it’d be great to be able to mathematically prove type safety. But mathematical proof systems often require a notion of ghost fields — basically logical state that doesn’t really exist at runtime but which you can talk about in a proof. A ghost field is essentially an abstract field that is mapped to an empty set of fields and which has a type. For example you might declare a BeanCounter struct with two abstract fields (a, b) and one real field that stores their sum:

struct BeanCounter {
    pub abstract a: u32,
    pub abstract b: u32,
    sum: u32, // <— at runtime, we only store the sum
}

then when you create BeanCounter you would specify a value for those fields. The value would perhaps be written using something like an abstract block, indicating that in fact the code within will not be executed (but must still be type checkable):

impl BeanCounter {
    pub fn new(a: u32, b: u32) -> Self {
        Self { a: abstract { a }, b: abstract { b }, sum: a + b }
    }
}

Providing abstract values is useful because it lets the theorem prover act “as if” the code was there for the purpose of checking pre- and post-conditions and other kinds of contracts.

Could we use abstract fields to replace phantom data?

Yes! I imagine that instead of a: PhantomData<T> you could do abstract a: T, but that would mean we’d have to have some abstract initializer. So perhaps we permit an anonymous field abstract _: T, in which case you wouldn’t be required to provide an initializer, but you also couldn’t name it in contracts.
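For reference, this is roughly what has to be written today. The sketch uses a hypothetical `TypedId<T>` that stores only a raw integer but is tagged with a type parameter. Note the explicit `PhantomData` initializer in the constructor, which is the piece an anonymous abstract field would make unnecessary.

```rust
use std::marker::PhantomData;

/// Hypothetical typed identifier: stores only `raw`, but is tagged with `T`.
struct TypedId<T> {
    raw: u32,
    _marker: PhantomData<T>, // zero-sized; exists only to mention `T`
}

impl<T> TypedId<T> {
    fn new(raw: u32) -> Self {
        // The marker must be initialized even though it holds no data.
        TypedId { raw, _marker: PhantomData }
    }

    fn raw(&self) -> u32 {
        self.raw
    }
}

struct Experiment;

fn main() {
    let id: TypedId<Experiment> = TypedId::new(7);
    println!("raw = {}", id.raw());
    println!("size = {} bytes", std::mem::size_of::<TypedId<Experiment>>());
}
```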

So what are all the parts to an abstract field?

I would start with just the simplest form of abstract fields, which is an alias for a set of real fields. But to extend to cover ghost fields or PhantomData, you want to support the ability to declare a type for abstract fields (we could say that the default is ()). For fields with non-() types, you would be expected to provide an abstract value in the struct constructor. To conveniently handle PhantomData, we could add anonymous abstract fields where no type is needed.

Should we permit view types on other types?

I’ve shown view types attached to structs and tuples. Conceivably we could permit them elsewhere, e.g., {0} &(String, String) might be equivalent to &{0} (String, String). I don’t think that’s needed for now and I’d make it ill-formed, but it could be reasonable to support at some point.

Conclusion

This concludes my exploration through view types. The post actually changed as I wrote it: initially I expected to include place-based borrows, but it turns out we didn’t really need those. I also initially expected view types to be a special case of struct types, and that indeed might simplify things, but I wound up concluding that they are a useful type constructor on their own. In particular if we want to integrate them into traits it will be necessary for them to be applied to generics and the rest.

In terms of next steps, I’m not sure. I want to think about this idea some more, but I do feel we need to address this gap in Rust, and so far view types seem like the most natural approach. I think what could be interesting is to prototype them in a-mir-formality as it evolves to see if there are other surprises that arise.

I’m not really proposing this syntax—among other things, it is ambiguous in expression position. I’m not sure what the best syntax is, though! It’s an important question, but not one I will think hard about here. ↩︎
I prefer the name ghost fields, because it’s spooky, but abstract is already a reserved keyword. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/02/25/view-types-redux/

Rust 2024 Is Coming

Feb 20, 2025 Updated Feb 20, 2025

So, a little bird told me that Rust 2024 is going to become stable today, along with Rust 1.85.0. In honor of this momentous event, I have penned a little ditty that I’d like to share with you all. Unfortunately, for those of you who remember Rust 2021’s “Edition: The song”, in the 3 years between Rust 2021 and now, my daughter has realized that her father is deeply uncool1 and so I had to take this one on solo2. Anyway, enjoy! Or, you know, suffer. As the case may be.

Video

Watch the movie embedded here, or watch it on YouTube:

Lyrics

In ChordPro format, for those of you who are inspired to play along.

{title: Rust 2024}
{subtitle: }

{key: C}

[Verse 1]
[C] When I got functions that never return
I write an exclamation point [G]
But use it for an error that could never be
the compiler [C] will yell at me

[Verse 2]
[C] We Rust designers, we want that too
[C7] But we had to make a [F] change
[F] That will be [Fm]better
[C] Oh so much [A]better
[D] in Rust Twenty [G7]Twenty [C]Four

[Bridge]
[Am] ... [Am] But will my program [E] build?
[Am] Yes ... oh that’s [D7] for sure
[F] edi-tions [G] are [C] opt in

[Verse 3]
[C] Usually when I return an `impl Trait`
everything works out fine [G]
but sometimes I need a tick underscore
and I don’t really [C] know what that’s for

[Verse 4]
[C] We Rust designers we do agree
[C7] That was con- [F] fusing 
[F] But that will be [Fm]better
[C] Oh so much [A]better
[D] in Rust Twenty [G7]Twenty [C]Four

[Bridge 2]
[Am] Cargo fix will make the changes
automatically [G] Oh that sure sounds great...
[Am] but wait... [Am] my de-pen-denc-[E]-ies
[Am] Don’t worry e-[D7]ditions
[F] inter [G] oper [C] ate

[Verse 5]
[C] Whenever I match on an ampersand T
The borrow [G] propagates
But where do I put the ampersand
when I want to [C] copy again?

[Verse 6]
[C] We Rust designers, we do agree
[C7] That really had to [F] change
[F] That will be [Fm]better
[C] Oh so much [A]better
[D] in Rust Twenty [G7]Twenty [C]Four

[Outro]
[F] That will be [Fm]better
[C] Oh so much [A]better
[D] in Rust Twenty [G7]Twenty [C]Four

One more time!

[Half speed]
[F] That will be [Fm]better
[C] Oh so much [A]better
[D] in Rust Twenty [G7]Twenty [C]Four

It was bound to happen eventually. ↩︎
Actually, I had a plan to make this a duet with somebody who shall remain nameless (they know who they are). But I was too lame to get everything done on time. In fact, I may or may not have realized “Oh, shit, I need to finish this recording!” while in the midst of a beer with Florian Gilcher last night. Anyway, sorry, would-be-collaborator-I-was-really-looking-forward-to-playing-with! Next time! ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/02/20/rust-2024-is-coming/

How I learned to stop worrying and love the LLM

Feb 10, 2025 Updated Feb 10, 2025

I believe that AI-powered development tools can be a game changer for Rust—and vice versa. At its core, my argument is simple: AI’s ability to explain and diagnose problems with rich context can help people get over the initial bump of learning Rust in a way that canned diagnostics never could, no matter how hard we try. At the same time, rich type systems like Rust’s give AIs a lot to work with, which could be used to help them avoid hallucinations and validate their output. This post elaborates on this premise and sketches out some of the places where I think AI could be a powerful boost.

Perceived learning curve is challenge #1 for Rust

Is Rust good for every project? No, of course not. But it’s absolutely great for some things—specifically, building reliable, robust software that performs well at scale. This is no accident. Rust’s design is intended to surface important design questions (often in the form of type errors) and to give users the control to fix them in whatever way is best.

But this same strength is also Rust’s biggest challenge. Talking to people within Amazon about adopting Rust, perceived complexity and fear of its learning curve is the biggest hurdle. Most people will say, “Rust seems interesting, but I don’t need it for this problem”. And you know, they’re right! They don’t need it. But that doesn’t mean they wouldn’t benefit from it.

One of Rust’s big surprises is that, once you get used to it, it’s “surprisingly decent” at a very large number of things beyond what it was designed for. Simple business logic and scripts can be very pleasant in Rust. But the phrase “once you get used to it” in that sentence is key, since most people’s initial experience with Rust is confusion and frustration.

Rust likes to tell you no (but it’s for your own good)

Some languages are geared to say yes—that is, given any program, they aim to run it and do something. JavaScript is of course the most extreme example (no semicolons? no problem!) but every language does this to some degree. It’s often quite elegant. Consider how, in Python, you write vec[-1] to get the last element in the list: super handy!

Rust is not (usually) like this. Rust is geared to say no. The compiler is just itching for a reason to reject your program. It’s not that Rust is mean: Rust just wants your program to be as good as it can be. So we try to make sure that your program will do what you want (and not just what you asked for). This is why vec[-1], in Rust, will not quietly hand you the last element: indices are unsigned, so a literal -1 is rejected at compile time, and an index that is out of range panics at runtime. Sure, giving you the last element might be convenient, but how do we know you didn’t have an off-by-one bug that resulted in that negative index?1
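To make the contrast concrete, here is a minimal sketch of the explicit alternatives Rust offers instead of a negative index. The helper name last_and_checked is made up for illustration; last and get are the standard slice methods.

```rust
// Hypothetical helper: returns the last element and a bounds-checked lookup.
// Both results are `Option`s, so the "nothing there" case is visible in the type.
fn last_and_checked(values: &[i32], index: usize) -> (Option<&i32>, Option<&i32>) {
    // `last()` is the explicit spelling of "give me the final element".
    let last = values.last();
    // `get()` returns `None` for an out-of-range index instead of panicking.
    let checked = values.get(index);
    (last, checked)
}
```

The point is not that this is shorter than vec[-1] (it isn’t), but that the code says which of the two things you meant.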

But that tendency to say no means that early learning can be pretty frustrating. For most people, the reward from programming comes from seeing their program run—and with Rust, there are a lot of niggling details to get right before your program will run. What’s worse, while those details are often motivated by deep properties of your program (like data races), the way they are presented is as the violation of obscure rules, and the solution (“add a *”) can feel random.

Once you get the hang of it, Rust feels great, but getting there can be a pain. I heard a great phrase from someone at Amazon to describe this: “Rust: the language where you get the hangover first”.3

AI today helps soften the learning curve

My favorite thing about working at Amazon is getting the chance to talk to developers early in their Rust journey. Lately I’ve noticed an increasing trend—most are using Q Developer. Over the last year, Amazon has been doing a lot of internal promotion of Q Developer, so that in and of itself is no surprise, but what did surprise me a bit is hearing from developers the way that they use it.

For most of them, the most valuable part of Q Dev is not authoring code but rather explaining it. They ask it questions like “why does this function take an &T and not an Arc<T>?” or “what happens when I move a value from one place to another?”. Effectively, the LLM becomes an ever-present, ever-patient teacher.4

Scaling up the Rust expert

Some time back I sat down with an engineer learning Rust at Amazon. They asked me about an error they were getting that they didn’t understand. “The compiler is telling me something about 'static, what does that mean?” Their code looked something like this:

async fn log_request_in_background(message: &str) {
    tokio::spawn(async move {
        log_request(message);
    });
}

And the compiler was telling them:

error[E0521]: borrowed data escapes outside of function
 --> src/lib.rs:2:5
  |
1 |   async fn log_request_in_background(message: &str) {
  |                                      -------  - let's call the lifetime of this reference `'1`
  |                                      |
  |                                      `message` is a reference that is only valid in the function body
2 | /     tokio::spawn(async move {
3 | |         log_request(message);
4 | |     });
  | |      ^
  | |      |
  | |______`message` escapes the function body here
  |        argument requires that `'1` must outlive `'static`

This is a pretty good error message! And yet it requires significant context to understand it (not to mention scrolling horizontally, sheesh). For example, what is “borrowed data”? What does it mean for said data to “escape”? What is a “lifetime” and what does it mean that “'1 must outlive 'static”? Even assuming you get the basic point of the message, what should you do about it?

The fix is easy… if you know what to do

Ultimately, the answer to the engineer’s problem was just to insert a call to clone5. But deciding on that fix requires a surprisingly large amount of context. In order to figure out the right next step, I first explained to the engineer that this confusing error is, in fact, what it feels like when Rust saves your bacon, and talked them through how the ownership model works and what it means to free memory. We then discussed why they were spawning a task in the first place (the answer: to avoid the latency of logging)—after all, the right fix might be to just not spawn at all, or to use something like rayon to block the function until the work is done.

Once we established that the task needed to run asynchronously from its parent, and hence had to own the data, we looked into changing the log_request_in_background function to take an Arc<String> so that it could avoid a deep clone. This would be more efficient, but only if the caller themselves could cache the Arc<String> somewhere. It turned out that the origin of this string was in another team’s code and that this code only returned an &str. Refactoring that code would probably be the best long term fix, but given that the strings were expected to be quite short, we opted to just clone the string.
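As a rough sketch of where we ended up, here is the “just clone the string” fix. To keep the example free of external crates I am assuming std::thread::spawn as a stand-in for tokio::spawn (both require the spawned work to be 'static), and log_request is a placeholder body, not the engineer’s real code.

```rust
use std::thread;

// Placeholder for the real logging call; returns the length so the
// effect of the background work can be observed.
fn log_request(message: &str) -> usize {
    message.len()
}

// Copy the borrowed `&str` into an owned `String` *before* spawning, so the
// background work owns its data and no longer depends on the caller's borrow.
fn log_request_in_background(message: &str) -> thread::JoinHandle<usize> {
    let owned: String = message.to_owned();
    // `move` transfers ownership of `owned` into the closure, which is
    // what satisfies the `'static` requirement on the spawned work.
    thread::spawn(move || log_request(&owned))
}
```

The Arc<String> variant we discussed would look the same from the inside, except that the caller hands over a cheap reference-counted handle instead of paying for a copy on every call.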

You can learn a lot from a Rust error

An error message is often your first and best chance to teach somebody something.—Esteban Küber (paraphrased)

Working through this error was valuable. It gave me a chance to teach this engineer a number of concepts. I think it demonstrates a bit of Rust’s promise—the idea that learning Rust will make you a better programmer overall, regardless of whether you are using Rust or not.

Despite all the work we have put into our compiler error messages, this kind of detailed discussion is clearly something that we could never achieve. It’s not because we don’t want to! The original concept for --explain, for example, was to present a customized explanation of each error that was tailored to the user’s code. But we could never figure out how to implement that.

And yet tailored, in-depth explanation is absolutely something an LLM could do. In fact, it’s something they already do, at least some of the time—though in my experience the existing code assistants don’t do nearly as good a job with Rust as they could.

What makes a good AI opportunity?

Emery Berger is a professor at UMass Amherst who has been exploring how LLMs can improve the software development experience. Emery emphasizes how AI can help close the gap from “tool to goal”. In short, today’s tools (error messages, debuggers, profilers) tell us things about our program, but they stop there. Except in simple cases, they can’t help us figure out what to do about it—and this is where AI comes in.

When I say AI, I am not talking (just) about chatbots. I am talking about programs that weave LLMs into the process, using them to make heuristic choices or proffer explanations and guidance to the user. Modern LLMs can also do more than just rely on their training and the prompt: they can be given access to APIs that let them query and get up-to-date data.

I think AI will be most useful in cases where solving the problem requires external context not available within the program itself. Think back to my explanation of the 'static error, where knowing the right answer depended on how easy/hard it would be to change other APIs.

Where I think Rust should leverage AI

I’ve thought about a lot of places I think AI could help make working in Rust more pleasant. Here is a selection.

Deciding whether to change the function body or its signature

Consider this code:

fn get_first_name(&self, alias: &str) -> &str {
    alias
}

This function will give a type error, because the signature (thanks to lifetime elision) promises to return a string borrowed from self but actually returns a string borrowed from alias. Now…what is the right fix? It’s very hard to tell in isolation! It may be that in fact the code was meant to be &self.name (in which case the current signature is correct). Or perhaps it was meant to be something that sometimes returns &self.name and sometimes returns alias, in which case the signature of the function was wrong. Today, we take our best guess. But AI could help us offer more nuanced guidance.
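Here is a small sketch of the two candidate fixes side by side. The Person struct, its name field, and the method name get_name_or_alias are assumptions made up for the example; only get_first_name comes from the snippet above.

```rust
// Hypothetical containing type for the method above.
struct Person {
    name: String,
}

impl Person {
    // Fix 1: keep the signature, change the body. With lifetime elision the
    // return value is tied to `self`, so returning `&self.name` type-checks.
    fn get_first_name(&self, _alias: &str) -> &str {
        &self.name
    }

    // Fix 2: keep the body's intent, change the signature. An explicit
    // lifetime says the result may borrow from either `self` or `alias`.
    fn get_name_or_alias<'a>(&'a self, alias: &'a str) -> &'a str {
        if alias.is_empty() {
            &self.name
        } else {
            alias
        }
    }
}
```

Both versions compile; which one is right depends entirely on what the caller expects, and that is exactly the context the compiler doesn’t have.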

Translating idioms from one language to another

People often ask me questions like “how do I make a visitor in Rust?” The answer, of course, is “it depends on what you are trying to do”. Much of the time, a Java visitor is better implemented as a Rust enum and match statements, but there is a time and a place for something more like a visitor. Guiding folks through the decision tree for how to do non-trivial mappings is a great place for LLMs.
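For the common case, a sketch of the enum-and-match translation might look like this. The Expr type and eval function are invented for illustration; in Java these would typically be a class hierarchy plus a visitor interface with one method per node type.

```rust
// Hypothetical expression tree: each variant plays the role of a node class.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

// One "visitor" becomes one function with one match arm per variant.
// The compiler checks that every variant is handled.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
        Expr::Neg(inner) => -eval(inner),
    }
}
```

Adding a second operation (say, pretty-printing) means writing a second function; adding a new variant means the compiler points you at every match that needs updating.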

Figuring out the right type structure

When I start writing a Rust program, I start by authoring type declarations. As I do this, I tend to think ahead to how I expect the data to be accessed. Am I going to need to iterate over one data structure while writing to another? Will I want to move this data to another thread? The setup of my structures will depend on the answer to these questions.

I think a lot of the frustration beginners feel comes from not having a “feel” yet for the right way to structure their programs. The structure they would use in Java or some other language often won’t work in Rust.

I think an LLM-based assistant could help here by asking them some questions about the kinds of data they need and how it will be accessed. Based on this it could generate type definitions, or alter the definitions that exist.

Complex refactorings like splitting structs

A follow-on to the previous point is that, in Rust, when your data access patterns change as a result of refactorings, it often means you need to do more wholesale updates to your code.6 A common example for me is that I want to split out some of the fields of a struct into a substruct, so that they can be borrowed separately.7 This can be quite non-local and sometimes involves some heuristic choices, like “should I move this method to be defined on the new substruct or keep it where it is?”.
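A minimal sketch of that refactoring, with made-up names (Processor, Counters, record): the statistics fields move into their own struct, and the method that updates them moves along with them.

```rust
// The fields that were split out into a substruct.
#[derive(Default)]
struct Counters {
    message_count: usize,
    total_bytes: usize,
}

impl Counters {
    // Defined on the substruct, so it borrows only the counters.
    fn record(&mut self, message: &str) {
        self.message_count += 1;
        self.total_bytes += message.len();
    }
}

struct Processor {
    messages: Vec<String>,
    counters: Counters,
}

impl Processor {
    fn process_all(&mut self) {
        // `self.messages` and `self.counters` are disjoint fields, so the
        // borrow checker accepts mutating both inside the loop.
        for message in self.messages.drain(..) {
            self.counters.record(&message);
        }
    }
}
```

Had record stayed a &mut self method on Processor, the call inside the loop would have been rejected, because it would borrow all of self while self.messages is still being drained. That is the heuristic choice in miniature.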

Migrating consumers over a breaking change

When you run the cargo fix command today it will automatically apply various code suggestions to clean up your code. With the upcoming Rust 2024 edition, cargo fix --edition will do the same but for edition-related changes. All of the logic for these changes is hardcoded in the compiler and it can get a bit tricky.

For editions, we intentionally limit ourselves to local changes, so the coding for these migrations is usually not too bad, but there are some edge cases where it’d be really useful to have heuristics. For example, one of the changes we are making in Rust 2024 affects “temporary lifetimes”. It can affect when destructors run. This almost never matters (your vector will get freed a bit earlier or whatever) but it can matter quite a bit, if the destructor happens to be a lock guard or something with side effects. In practice when I as a human work with changes like this, I can usually tell at a glance whether something is likely to be a problem—but the heuristics I use to make that judgment are a combination of knowing the name of the types involved, knowing something about the way the program works, and perhaps skimming the destructor code itself. We could hand-code these heuristics, but an LLM could do it, and do it better, and it could ask questions if it was feeling unsure.

Now imagine you are releasing the 2.x version of your library. Maybe your API has changed in significant ways. Maybe one API call has been broken into two, and the right one to use depends a bit on what you are trying to do. Well, an LLM can help here, just like it can help in translating idioms from Java to Rust.

I imagine the idea of having an LLM help you migrate makes some folks uncomfortable. I get that. There’s no reason it has to be mandatory—I expect we could always have a more limited, precise migration available.8

Optimize your Rust code to eliminate hot spots

Premature optimization is the root of all evil, or so Donald Knuth is said to have said. I’m not sure about all evil, but I have definitely seen people rathole on microoptimizing a piece of code before they know if it’s even expensive (or, for that matter, correct). This is doubly true in Rust, where cloning a small data structure (or reference counting it) can often make your life a lot simpler. Llogiq’s great talks on Easy Mode Rust make exactly this point. But here’s a question: suppose you’ve been taking this advice to heart, inserting clones and the like, and you find that your program is running kind of slow? How do you make it faster? Or, even worse, suppose that you are trying to tune your network service. You are looking at the blizzard of available metrics and trying to figure out what changes to make. What do you do? To get some idea of what is possible, check out Scalene, a Python profiler that is also able to offer suggestions (from the group of Emery Berger, the UMass professor I talked about earlier).

Diagnose and explain miri and sanitizer errors

Let’s look a bit to the future. I want us to get to a place where the “minimum bar” for writing unsafe code is that you test that unsafe code with some kind of sanitizer that checks for both C and Rust UB—something like miri today, except one that works “at scale” for code that invokes FFI or does other arbitrary things. I expect a smaller set of people will go further, leveraging automated reasoning tools like Kani or Verus to prove statically that their unsafe code is correct9.

From my experience using miri today, I can tell you two things. (1) Every bit of unsafe code I write has some trivial bug or other. (2) If you enjoy puzzling out the occasionally inscrutable error messages you get from Rust, you’re gonna love miri! To be fair, miri has a much harder job—the (still experimental) rules that govern Rust aliasing are intended to be flexible enough to allow all the things people want to do that the borrow checker doesn’t permit. This means they are much more complex. It also means that explaining why you violated them (or may violate them) is that much more complicated.

Just as an AI can help novices understand the borrow checker, it can help advanced Rustaceans understand tree borrows (or whatever aliasing model we wind up adopting). And just as it can make smarter suggestions for whether to modify the function body or its signature, it can likely help you puzzle out a good fix.

Rust’s emphasis on “reliability” makes it a great target for AI

Anyone who has used an LLM-based tool has encountered hallucinations, where the AI just makes up APIs that “seem like they ought to exist”.10 And yet anyone who has used Rust knows that “if it compiles, it works” is true way more often than it has a right to be.11 This suggests to me that any attempt to use the Rust compiler to validate AI-generated code or solutions is going to also help ensure that the code is correct.

AI-based code assistants right now don’t really have this property. I’ve noticed that I kind of have to pick between “shallow but correct” or “deep but hallucinating”. A good example is match statements. I can use rust-analyzer to fill in the match arms and it will do a perfect job, but the body of each arm is todo!. Or I can let the LLM fill them in and it tends to cover most-but-not-all of the arms but it generates bodies. I would love to see us doing deeper integration, so that the tool is talking to the compiler to get perfect answers to questions like “what variants does this enum have” while leveraging the LLM for open-ended questions like “what is the body of this arm”.12
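To picture that division of labor, here is a small sketch with an invented Command enum. The set of arms is the part a compiler-backed tool can get exactly right; the bodies are the open-ended part. One arm is deliberately left as todo!() to show what the “shallow but correct” skeleton looks like.

```rust
// Hypothetical enum; the variant list is what the compiler knows precisely.
#[allow(dead_code)]
enum Command {
    Start,
    Stop,
    Resize { width: u32, height: u32 },
}

fn describe(cmd: &Command) -> String {
    // The match must be exhaustive: drop an arm and this stops compiling.
    match cmd {
        // Bodies like these are the "open-ended" part an LLM might fill in.
        Command::Start => String::from("start"),
        Command::Stop => String::from("stop"),
        // A skeleton arm: it type-checks, but panics if it is ever reached.
        Command::Resize { .. } => todo!(),
    }
}
```

An LLM that forgets the Resize arm produces code that fails to compile, which is precisely the kind of feedback a deeper integration could loop back automatically.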

Conclusion

Overall AI reminds me a lot of the web around the year 2000. It’s clearly overhyped. It’s clearly being used for all kinds of things where it is not needed. And it’s clearly going to change everything.

If you want to see examples of what is possible, take a look at the ChatDBG videos published by Emery Berger’s group. You can see how the AI sends commands to the debugger to explore the program state before explaining the root cause. I love the video debugging bootstrap.py, as it shows the AI applying domain knowledge about statistics to debug and explain the problem.

My expectation is that compilers of the future will not contain nearly so much code geared around authoring diagnostics. They’ll present the basic error, sure, but for more detailed explanations they’ll turn to AI. It won’t be just a plain old foundation model, they’ll use RAG techniques and APIs to let the AI query the compiler state, digest what it finds, and explain it to users. Like a good human tutor, the AI will tailor its explanations to the user, leveraging the user’s past experience and intuitions (oh, and in the user’s chosen language).

I am aware that AI has some serious downsides. The most serious to me is its prodigious energy use, but there are also good questions to be asked about the way that training works and the possibility of not respecting licenses. The issues are real but avoiding AI is not the way to solve them. Just in the course of writing this post, DeepSeek was announced, demonstrating that there is a lot of potential to lower the costs of training. As far as the ethics and legality, that is a very complex space. Agents are already doing a lot to get better there, but note also that most of the applications I am excited about do not involve writing code so much as helping people understand and alter the code they’ve written.

We don’t always get this right. For example, I find the zip combinator of iterators annoying because it takes the shortest of the two iterators, which is occasionally nice but far more often hides bugs. ↩︎
The irony, of course, is that AI can help you to improve your woeful lack of tests by auto-generating them based on code coverage and current behavior. ↩︎
I think they told me they heard it somewhere on the internet? Not sure the original source. ↩︎
Personally, the thing I find most annoying about LLMs is the way they are trained to respond like groveling servants. “Oh, that’s a good idea! Let me help you with that” or “I’m sorry, you’re right I did make a mistake, here is a version that is better”. Come on, I don’t need flattery. The idea is fine but I’m aware it’s not earth-shattering. Just help me already. ↩︎
Inserting a call to clone is actually a bit more subtle than you might think, given the interaction of the async future here. ↩︎
Garbage Collection allows you to make all kinds of refactorings in ownership structure without changing your interface at all. This is convenient, but—as we discussed early on—it can hide bugs. Overall I prefer having that information be explicit in the interface, but that comes with the downside that changes have to be refactored. ↩︎
I also think we should add a feature like View Types to make this less necessary. In this case instead of refactoring the type structure, AI could help by generating the correct type annotations, which might be non-obvious. ↩︎
My hot take here is that if the idea of an LLM doing migrations in your code makes you uncomfortable, you are likely (a) overestimating the quality of your code and (b) underinvesting in tests and QA infrastructure2. I tend to view an LLM like an “inconsistently talented contributor”, and I am perfectly happy having contributors hack away on projects I own. ↩︎
The student asks, “When unsafe code is proven free of UB, does that make it safe?” The master says, “Yes.” The student asks, “And is it then still unsafe?” The master says, “Yes.” Then, a minute later, “Well, sort of.” (We may need new vocabulary.) ↩︎
My personal favorite story of this is when I asked ChatGPT to generate me a list of “real words and their true definition along with 2 or 3 humorous fake definitions” for use in a birthday party game. I told it that “I know you like to hallucinate so please include links where I can verify the real definition”. It generated a great list of words along with plausible looking URLs for merriamwebster.com and so forth—but when I clicked the URLs, they turned out to all be 404s (the words, it turned out, were real—just not the URLs). ↩︎
This is not a unique property of Rust, it is shared by other languages with rich type systems, like Haskell or ML. Rust happens to be the most widespread such language. ↩︎
I’d also like it if the LLM could be a bit less interrupt-y sometimes. Especially when I’m writing type-system code or similar things, it can be distracting when it keeps trying to author stuff it clearly doesn’t understand. I expect this too will improve over time—and I’ve noticed that while, in the beginning, it tends to guess very wrong, over time it tends to guess better. I’m not sure what inputs and context are being fed to the LLM in the background but it’s evident that it can come to see patterns even for relatively subtle things. ↩︎

https://smallcultfollowing.com/babysteps/blog/2025/02/10/love-the-llm/

Preview crates

Jan 29, 2025 Updated Jan 29, 2025

This post lays out the idea of preview crates.1 Preview crates would be special crates released by the rust-lang org. Like the standard library, preview crates would have access to compiler internals but would still be usable from stable Rust. They would be used in cases where we know we want to give users the ability to do X but we don’t yet know precisely how we want to expose it in the language or stdlib. In git terms, preview crates would let us stabilize the plumbing while retaining the ability to iterate on the final shape of the porcelain.

Nightly is not enough

Developing large language features is a tricky business. Because everything builds on the language, stability is very important, but at the same time, there are some questions that are very hard to answer without experience. Our main tool for getting this experience has been the nightly toolchain, which lets us develop, iterate, and test features before committing to them.

Because the nightly toolchain comes with no guarantees at all, however, most users who experiment with it do so lightly, just using it for toy projects and the like. For some features, this is perfectly fine, particularly syntactic features like let-else, where you can learn everything you need to know about how it feels from a single crate.

Nightly doesn’t let you build a fledgling ecosystem

Where nightly really fails us though is the ability to estimate the impact of a feature on a larger ecosystem. Sometimes you would like to expose a capability and see what people build with it. How do they use it? What patterns emerge? Often, we can predict those patterns in advance, but sometimes there are surprises, and we find that what we thought would be the default mode of operation is actually kind of a niche case.

For these cases, it would be cool if there were a way to issue a feature in “preview” mode, where people can build on it, but it is not yet released in its final form. The challenge is that if we want people to use this to build up an ecosystem, we don’t want to disturb all those crates when we iterate on the feature. We want a way to make changes that lets those crates keep working until the maintainers have time to port to the latest syntax, naming, or whatever.

Editions are closer, but not quite right

The other tool we have for correcting mistakes is editions. Editions let us change what syntax means and, because they are opt-in, all existing code continues to work.

Editions let us fix a great many things to make Rust more self-consistent, but they carry a heavy cost. They force people to relearn how things in Rust work. They make books outdated. This price is typically too high for us to ship a feature knowing that we are going to change it in a future edition.

Let’s give an example

To make this concrete, let’s take a specific example. The const generics team has been hard at work iterating on the meaning of const trait and in fact there is a pending RFC that describes their work. There’s just one problem: it’s not yet clear how it should be exposed to users. I won’t go into the rationale for each choice, but suffice to say that there are a number of options under current consideration. All of these examples have been proposed, for example, as the way to say “a function that can be executed at compilation time which will call T::default”:

const fn compute_value<T: ~const Default>()
const fn compute_value<T: const Default>()
const fn compute_value<T: Default>()

At the moment, I personally have a preference between these (I’ll let you guess), but I figure I have about… hmm… 80-90% confidence in that choice. And what’s worse, to really decide between them, I think we have to see how the work on async proceeds, and perhaps also what kinds of patterns turn out to be common in practice for const fn. This stuff is difficult to gauge accurately in advance.

Enter preview crates

So what if we released a crate rust_lang::const_preview? In my dream world, this is released on crates.io, using the namespaces described in RFC #3243 (https://rust-lang.github.io/rfcs/3243-packages-as-optional-namespaces.html). Like any crate, const_preview can be versioned. It would expose exactly one item, a macro const_item that can be used to write const functions that have const trait bounds:

const_preview::const_item! {
    const fn compute_value<T: ~const Default>() {
        // as `~const` is what is implemented today, I'll use it in this example
    }
}

Internally, this const_item! macro can make use of internal APIs in the compiler to parse the contents and deploy the special semantics.

Releasing v2.0

Now, maybe we use this for a while, and we find that people really don’t like the ~, so we decide to change the syntax. Perhaps we opt to write const Default instead of ~const Default. No problem, we release a 2.0 version of the crate and we also rewrite 1.0 to take in the tokens and invoke 2.0 using the semver trick.

const_preview::const_item! {
    const fn compute_value<T: const Default>() {
        // v2.0 syntax: `const Default` where v1.0 wrote `~const Default`
    }
}
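To give a feel for the “1.0 forwards to 2.0” part, here is a toy sketch using plain macro_rules. Everything in it is an assumption for illustration: the macro names const_item_v1 and const_item_v2 stand in for two versions of the hypothetical crate, and since const trait bounds are not available on stable, the “v2” macro simply drops the const markers and emits an ordinary function rather than hooking into compiler internals.

```rust
// Stand-in for const_preview 2.0: accepts the newer `T: const Trait` syntax.
// A real preview crate would apply special semantics here; this toy version
// just strips the `const` markers and emits a normal generic function.
macro_rules! const_item_v2 {
    (const fn $name:ident<$t:ident: const $bound:ident>() -> $ret:ty $body:block) => {
        fn $name<$t: $bound>() -> $ret $body
    };
}

// Stand-in for const_preview 1.0 after the 2.0 release: it still accepts the
// old `~const` syntax, but only rewrites the tokens and defers to v2.
macro_rules! const_item_v1 {
    (const fn $name:ident<$t:ident: ~const $bound:ident>() -> $ret:ty $body:block) => {
        const_item_v2! {
            const fn $name<$t: const $bound>() -> $ret $body
        }
    };
}

// A crate still written against "1.0" keeps compiling unchanged.
const_item_v1! {
    const fn compute_value<T: ~const Default>() -> T {
        T::default()
    }
}
```

The useful property is that all the real logic lives in one place (the newest version), while older versions shrink down to syntax adapters.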

Integrating into the language

Once we decide we are happy with const_item! we can merge it into the language proper. The preview crates are deprecated and simply desugar to the true language syntax. We all go home, drink non-fat flat whites, and pat ourselves on the back.

User-based experimentation

One thing I like about the preview crates is that then others can begin to do their own experiments. Perhaps somebody wants to try out what it would be like if T: Default meant const by default–they can readily write a wrapper that desugars to const_preview::const_item and try it out. And people can build on it. And all that code keeps working once we integrate const functions into the language “for real”, it just looks kinda dated.

Frequently asked questions

Why else might we use previews?

Even if we know the semantics, we could use previews to stabilize features where the user experience is not great. I’m thinking of Generic Associated Types as one example, where the stabilization was slowed because of usability concerns.

What are the risks from this?

The previous answer hints at one of my fears… if preview crates become a widespread way for us to stabilize features with usability gaps, we may accumulate a very large number of them and then never move those features into Rust proper. That seems bad.

Shouldn’t we just make a decision already?

I mean…maybe? I do think we are sometimes very cautious. I would like us to get better at leaning on our judgment. But I also see that sometimes there is a tension between “getting something out the door” and “taking the time to evaluate a generalization”, and it’s not clear to me whether this tension is an inherent complexity or an artifact of the way we do business.

But would this actually work? What’s in that crate and what if it is not matched with the right version of the compiler?

One very special thing about libstd is that it is released together with the compiler and hence it is able to co-evolve, making use of internal APIs that are unstable and change from release to release. If we want to put this crate on crates.io, it will not be able to co-evolve in the same way. Bah. That’s annoying! But I figure we can still handle it by actually having the preview functionality exposed by crates in the sysroot that ship along with the compiler. These crates would not be directly usable except by our blessed crates.io crates, but they would basically just be shims that expose the underlying stuff. We could of course cut out the middleman and just have people use those preview crates directly– but I don’t like that as much because it’s less obvious and because we can’t as easily track reverse dependencies on crates.io to evaluate usage.

A macro seems heavy weight! What other options have you considered?

I also considered the idea of having p# keywords (“preview”), so e.g.

#[allow(preview_feature)]
p#const fn compute_value<T: p#const Default>() {
    // works on stable
}

Using a p# keyword would fire off a lint (preview_feature) that you would probably want to allow.

This is less intrusive, but I like the crate idea better because it allows us to release a v2.0 of the p#const keyword.

What kinds of things can we use preview crates for?

Good question. I’m not entirely sure. For APIs that require us to define new traits and other things, it seems like it would be a bit tricky to maintain the total interoperability I think we want. Tools like trait aliases etc (which we need for other reasons) would help.

Who else does this sort of thing?

Ember has formalized this “plumbing first” approach in their version of editions. In Ember, from what I understand, an edition is not a “time-based thing”, like in Rust. Instead, it indicates a big shift in paradigms, and it comes out when that new paradigm is ready. But part of the process to reaching an edition is to start by shipping core APIs (plumbing APIs) that create the new capabilities. The community can then create wrappers and experiment with the “porcelain” before the Ember crate enshrines a best practice set of APIs and declares the new Edition ready.

Java has a notion of preview features, but they are not semver guaranteed to stick around.

I’m not sure who else!

Could we use decorators instead?

Usability of decorators like #p[const_preview::const_item] is better, particularly in rust-analyzer. The tricky bit there is that decorators can only be applied to valid Rust syntax, so it implies we’d need to extend the parser to include things like ~const forever, whereas I might prefer to have that complexity isolated to the const_preview crate.

So is this a done deal? Is this happening?

I don’t know! People often think that because I write a blog post about something it will happen, but this is currently just in the “early ideation” stage. As I’ve written before, though, I continue to feel that we need some kind of “middle state” for our release process (see e.g. this blog post, Stability without stressing the !@#! out), and I think preview crates could be a good tool to have in our toolbox.

Hat tip to Yehuda Katz and the Ember community, Tyler Mandry, Jack Huey, Josh Triplett, Oli Scherer, and probably a few others I’ve forgotten, with whom I discussed this idea. Of course, anything you like, they came up with; everything you hate was my addition.

https://smallcultfollowing.com/babysteps/blog/2025/01/29/preview-crates/

MinPin: yet another pin proposal

Nov 5, 2024 Updated Nov 5, 2024

This post floats a variation of boats’ UnpinCell proposal that I’m calling MinPin.1 MinPin’s goal is to integrate Pin into the language in a “minimally disruptive” way2 – and in particular a way that is fully backwards compatible. Unlike Overwrite, MinPin does not attempt to make Pin and &mut “play nicely” together. It does however leave the door open to add Overwrite in the future, and I think helps to clarify the positives and negatives that Overwrite would bring.

TL;DR: Key design decisions

Here is a brief summary of MinPin’s rules:

The pinned keyword can be used to get pinned variations of things:
- In types, pinned P is equivalent to Pin<P>, so pinned &mut T and pinned Box<T> are equivalent to Pin<&mut T> and Pin<Box<T>> respectively.
- In function signatures, pinned &mut self can be used instead of self: Pin<&mut Self>.
- In expressions, pinned &mut $place is used to get a pinned &mut that refers to the value in $place.
The Drop trait is modified to have fn drop(pinned &mut self) instead of fn drop(&mut self).
- However, impls of Drop are still permitted (even encouraged!) to use fn drop(&mut self), but it means that your type will not be able to use (safe) pin-projection. For many types that is not an issue; for futures or other “address sensitive” types, you should use fn drop(pinned &mut self).
The rules for field projection from a s: pinned &mut S reference are based on whether or not Unpin is implemented:
- Projection is always allowed for fields whose type implements Unpin.
- For fields whose types are not known to implement Unpin:
  - If the struct S is Unpin, &mut projection is allowed but not pinned &mut.
  - If the struct S is !Unpin and does not have a fn drop(&mut self) method, pinned &mut projection is allowed but not &mut.
  - If the type checker does not know whether S is Unpin or not, or if the type S has a Drop impl with fn drop(&mut self), neither form of projection is allowed for fields that are not Unpin.
There is a type struct Unpinnable<T> { value: T } that always implements Unpin.
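
To make the first group of rules concrete, here is a minimal sketch of what the proposed spellings correspond to in today's stable Rust. The `Counter` type and the helper functions are made up for illustration; `PhantomPinned` is used only because stable Rust has no other way to opt out of `Unpin`, and the proposed MinPin syntax appears only in comments.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Hypothetical stand-in for an address-sensitive type.
struct Counter {
    count: u32,
    _pin: PhantomPinned, // opts out of `Unpin` on stable Rust
}

impl Counter {
    fn new() -> Self {
        Counter { count: 0, _pin: PhantomPinned }
    }

    // Proposed MinPin spelling: `fn bump(pinned &mut self) -> u32`.
    fn bump(self: Pin<&mut Self>) -> u32 {
        // SAFETY: we only update `count` in place and never move out of `self`.
        let this = unsafe { self.get_unchecked_mut() };
        this.count += 1;
        this.count
    }
}

fn bump_on_stack() -> u32 {
    // Proposed MinPin spelling: `pinned &mut counter` on a local place.
    // The closest safe equivalent today is the `pin!` macro.
    let mut counter: Pin<&mut Counter> = pin!(Counter::new());
    counter.as_mut().bump();
    counter.as_mut().bump()
}

fn bump_in_box() -> u32 {
    // Proposed MinPin spelling for the type: `pinned Box<Counter>`.
    let mut counter: Pin<Box<Counter>> = Box::pin(Counter::new());
    counter.as_mut().bump()
}
```

The `unsafe` block in `bump` is exactly the kind of thing that safe pin-projection is meant to make unnecessary.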

Design axioms

Before I go further I want to lay out some of my design axioms (beliefs that motivate and justify my design).

Pin is part of the Rust language. Despite Pin being entirely a “library-based” abstraction at present, it is very much a part of the language semantics, and it deserves first-class support. It should be possible to create pinned references and do pin projections in safe Rust.
Pin is its own world. Pin is only relevant in specific use cases, like futures or in-place linked lists.
Pin should have zero-conceptual-cost. Unless you are writing a Pin-using abstraction, you shouldn’t have to know or think about pin at all.
Explicit is possible. Automatic operations are nice but it should always be possible to write operations explicitly when needed.
Backwards compatible. Existing code should continue to compile and work.

Frequently asked questions

For the rest of the post I’m just going to go into FAQ mode.

I see the rules, but can you summarize how MinPin would feel to use?

Yes. I think the rule of thumb would be this. For any given type, you should decide whether your type cares about pinning or not.

Most types do not care about pinning. They just go on using &self and &mut self as normal. Everything works as today (this is the “zero-conceptual-cost” goal).

But some types do care about pinning. These are typically future implementations but they could be other special case things. In that case, you should explicitly implement !Unpin to declare yourself as pinnable. When you declare your methods, you have to make a choice:

Is the method read-only? Then use &self, that always works.
Otherwise, use &mut self or pinned &mut self, depending…
- If the method is meant to be called before pinning, use &mut self.
- If the method is meant to be called after pinning, use pinned &mut self.

This design works well so long as all mutating methods can be categorized into before-or-after pinning. If you have methods that need to be used in both settings, you have to start using workarounds – in the limit, you make two copies.
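
The before-or-after split can be sketched in stable Rust today, with `self: Pin<&mut Self>` standing in for the proposed `pinned &mut self`. The `Ticker` type and its methods are hypothetical, and `PhantomPinned` stands in for an explicit `!Unpin` impl.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Ticker {
    label: String,
    ticks: u32,
    _pin: PhantomPinned, // makes `Ticker` not `Unpin` on stable Rust
}

impl Ticker {
    fn new() -> Self {
        Ticker { label: String::new(), ticks: 0, _pin: PhantomPinned }
    }

    // Read-only: `&self` works both before and after pinning.
    fn ticks(&self) -> u32 {
        self.ticks
    }

    // Meant to be called before pinning.
    fn set_label(&mut self, label: &str) {
        self.label = label.to_string();
    }

    // Meant to be called after pinning (MinPin: `pinned &mut self`).
    fn tick(self: Pin<&mut Self>) {
        // SAFETY: increments a counter in place; nothing is moved out.
        unsafe { self.get_unchecked_mut().ticks += 1 };
    }
}

fn run() -> (String, u32) {
    let mut ticker = Ticker::new();
    ticker.set_label("heartbeat"); // still unpinned, so `&mut self` is fine

    let mut ticker = pin!(ticker); // from here on the value is pinned
    ticker.as_mut().tick();
    ticker.as_mut().tick();
    // `ticker.set_label("late")` would be rejected here: a `Pin<&mut Ticker>`
    // cannot hand out `&mut Ticker` because `Ticker` is not `Unpin`.

    (ticker.label.clone(), ticker.ticks())
}
```

If `set_label` were needed after pinning as well, it would have to be duplicated with a pinned receiver, which is the workaround mentioned above.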

How does MinPin compare to UnpinCell?

Those of you who have been following the various posts in this area will recognize many elements from boats’ recent UnpinCell. While the proposals share many elements, there is one big difference between them that significantly changes how they would feel when used. Which is overall better is not yet clear to me.

Let’s start with what they have in common. Both propose syntax for pinned references/borrows (albeit slightly different syntax) and both include a type for “opting out” from pinning (the eponymous UnpinCell<T> in UnpinCell, Unpinnable<T> in MinPin). Both also have a similar “special case” around Drop in which writing a drop impl with fn drop(&mut self) disables safe pin-projection.

Where they differ is how they manage generic structs like WrapFuture<F>, where it is not known whether or not they are Unpin.

struct WrapFuture<F: Future> {
    future: F,
}

Given a pinned &mut WrapFuture<F> (self, in the method below), the question is whether we can project the field future:

impl<F: Future> WrapFuture<F> {
    fn method(pinned &mut self) {
        let f = pinned &mut self.future;
        //      -----------------------
        //      Is this allowed?
    }
}

There is a specific danger case that both sets of rules are trying to avoid. Imagine that WrapFuture<F> implements Unpin but F does not – e.g., imagine that you have an impl<F: Future> Unpin for WrapFuture<F>. In that case, the referent of the pinned &mut WrapFuture<F> reference is not actually pinned, because the type is unpinnable. If we permitted the creation of a pinned &mut F, where F: !Unpin, we would be under the (mistaken) impression that F is pinned. Bad.
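
The danger case can be reproduced in stable Rust without any unsafe code. In this sketch, `Wrap` and `AddressSensitive` are hypothetical names; the point is only that a manual, unconditional `Unpin` impl lets safe code move the inner value even though it is reached through a `Pin`.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that is not `Unpin`.
struct AddressSensitive {
    id: u32,
    _pin: PhantomPinned,
}

struct Wrap<T> {
    inner: T,
}

// A manual, unconditional impl: `Wrap<T>` is `Unpin` even when `T` is not.
impl<T> Unpin for Wrap<T> {}

fn swap_through_pin(
    a: Pin<&mut Wrap<AddressSensitive>>,
    b: &mut Wrap<AddressSensitive>,
) {
    // `get_mut` is safe because `Wrap<_>: Unpin`...
    let a: &mut Wrap<AddressSensitive> = a.get_mut();
    // ...so the `!Unpin` payload can be moved from one place to another.
    std::mem::swap(&mut a.inner, &mut b.inner);
}

fn swapped_ids() -> (u32, u32) {
    let mut a = Wrap { inner: AddressSensitive { id: 1, _pin: PhantomPinned } };
    let mut b = Wrap { inner: AddressSensitive { id: 2, _pin: PhantomPinned } };
    swap_through_pin(Pin::new(&mut a), &mut b);
    (a.inner.id, b.inner.id)
}
```

Since `inner` can move like this, handing out a pinned reference to it from a pinned reference to `Wrap` would be a false promise, which is why both proposals rule that projection out.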

UnpinCell handles this case by saying that projecting from a pinned &mut is only allowed so long as there is no explicit impl of Unpin for WrapFuture (“if [WrapFuture<F>] implements Unpin, it does so using the auto-trait mechanism, not a manually written impl”). Basically: if the user doesn’t say whether the type is Unpin or not, then you can do pin-projection. The idea is that if the self type is Unpin, that will only be because all fields are unpin (in which case it is fine to make pinned &mut references to them); if the self type is not Unpin, then the field future is pinned, so it is safe.

In contrast, in MinPin, this case is only allowed if there is an explicit !Unpin impl for WrapFuture:

impl<F: Future> !Unpin for WrapFuture<F> {
    // This impl is required in MinPin, but not in UnpinCell
}

Explicit negative impls are not allowed on stable, but they were included in the original auto trait RFC. The idea is that a negative impl is an explicit, semver-binding commitment not to implement a trait. This is different from simply not including an impl at all, which allows for impls to be added later.

Why would you prefer MinPin over UnpinCell or vice versa?

I’m not totally sure which of these is better. I came to the !Unpin impl based on my axiom that pin is its own world – the idea was that it was better to push types to be explicitly unpin all the time than to have “dual-mode” types that masquerade as sometimes pinned and sometimes not.

In general I feel like it’s better to justify language rules by the presence of a declaration than the absence of one. So I don’t like the idea of saying “the absence of an Unpin impl allows for pin-projection” – after all, adding impls is supposed to be semver-compliant. Of course, that’s much less true for auto traits, but it can still be true.

In fact, Pin has had some unsoundness in the past based on unsafe reasoning that was justified by the lack of an impl. We assumed that &T could never implement DerefMut, but it turned out to be possible to add weird impls of DerefMut in very specific cases. We fixed this by adding an explicit impl<T> !DerefMut for &T impl.

On the other hand, I can imagine that many explicitly implemented futures might benefit from being able to be ambiguous about whether they are Unpin.

What does your design axiom “Pin is its own world” mean?

The way I see it is that, in Rust today (and in MinPin, pinned places, UnpinCell, etc), if you have a T: !Unpin type (that is, a type that is pinnable), it lives a double life. Initially, it is unpinned, and you can move it, &-ref it, or &mut-ref it, just like any other Rust value. But once a !Unpin value becomes pinned to a place, it enters a different state, in which you can no longer move it or use &mut; you have to use pinned &mut:

flowchart TD
Unpinned[
    Unpinned: can access 'v' with '&' and '&mut'
]

Pinned[
    Pinned: can access 'v' with '&' and 'pinned &mut'
]

Unpinned --
    pin 'v' in place (only if T is '!Unpin')
--> Pinned

One-way transitions like this limit the amount of interop and composability you get in the language. For example, if my type has &mut methods, I can’t use them once the type is pinned, and I have to use some workaround, such as duplicating the method with pinned &mut.3 In this specific case, however, I don’t think this transition is so painful, and that’s because of the specifics of the domain: futures go through a pretty hard state change where they start in “preparation mode” and then eventually start executing. The sets of methods you need in these two phases are quite distinct. So this is what I meant by “pin is its own world”: pin is not very interoperable with Rust, but this is not as bad as it sounds, because you don’t often need that kind of interoperability.

How would Overwrite affect pin being in its own world?

With Overwrite, when you pin a value in place, you just gain the ability to use pinned &mut, you don’t give up the ability to use &mut:

flowchart TD
Unpinned[
    Unpinned: can access 'v' with '&' and '&mut'
]

Pinned[
    Pinned: can additionally access 'v' with 'pinned &mut'
]

Unpinned --
    pin 'v' in place (only if T is '!Unpin')
--> Pinned

Making pinned into a “superset” of the capabilities of unpinned means that pinned &mut can be coerced into an &mut (it could even be a “true subtype”, in Rust terms). This in turn means that a pinned &mut Self method can invoke &mut self methods, which helps to make pin feel like a smoothly integrated part of the language.3

So does the axiom mean you think Overwrite is a bad idea?

Not exactly, but I do think that if Overwrite is justified, it is not on the basis of Pin, it is on the basis of immutable fields. If you just look at Pin, then Overwrite does make Pin work better, but it does that by limiting the capabilities of &mut to those that are compatible with Pin. There is no free lunch! As Eric Holk memorably put it to me in privmsg:

It seems like there’s a fixed amount of inherent complexity to pinning, but it’s up to us how we distribute it. Pin keeps it concentrated in a small area which makes it seem absolutely terrible, because you have to face the whole horror at once.4

I think Pin as designed is a “zero-conceptual-cost” abstraction, meaning that if you are not trying to use it, you don’t really have to care about it. That’s worth maintaining, if we can. If we are going to limit what &mut can do, the reason to do it is primarily to get other benefits, not to benefit pin code specifically.

To be clear, this is largely a function of where we are in Rust’s evolution. If we were still in the early days of Rust, I would say Overwrite is the correct call. It reminds me very much of the IMHTWAMA, the core “mutability xor sharing” rule at the heart of Rust’s borrow checker. When we decided to adopt the current borrow checker rules, the code was about 85-95% in conformance. That is, although there was plenty of aliased mutation, it was clear that “mutability xor sharing” was capturing a rule that we already mostly followed, but not completely. Because combining aliased state with memory safety is more complicated, that meant that a small minority of code was pushing complexity onto the entire language. Confining shared mutation to types like Cell and Mutex made most code simpler at the cost of more complexity around shared state in particular.

There’s a similar dynamic around replace and swap. Replace and swap are only used in a few isolated places and in a few particular ways, but all code has to be more conservative to account for that possibility. If we could go back, I think limiting Replace to some kind of Replaceable<T> type would be a good move, because it would mean that the more common case can enjoy the benefits: fewer borrow check errors and more precise programs due to immutable fields and the ability to pass an &mut SomeType and be sure that your callee is not swapping the value under your feet (useful for the “scope pattern” and also enables Pin<&mut> to be a subtype of &mut).

Why did you adopt pinned &mut and not &pin mut as the syntax?

The main reason was that I wanted a syntax that scaled to Pin<Box<T>>. But also the pin! macro exists, making the pin keyword somewhat awkward (though not impossible).

One thing I was wondering about is the phrase “pinned reference” or “pinned pointer”. On the one hand, it is really a reference to a pinned value (which suggests &pin mut). On the other hand, I think this kind of ambiguity is pretty common. The main thing I have found is that my brain has trouble with Pin because it wants to think of Pin as a “smart pointer” versus a modifier on another smart pointer. pinned Box<T> feels much better this way.

Can you show me an example? What about the MaybeDone example?

Yeah, totally. So boats’ pinned places post introduced two futures, MaybeDone and Join. Here is how MaybeDone would look in MinPin, along with some inline comments:

enum MaybeDone<F: Future> {
    Polling(F),
    Done(Unpinnable<Option<F::Output>>),
    //   ---------- see below
}

impl<F: Future> !Unpin for MaybeDone<F> { }
//              -----------------------
//
// `MaybeDone` is address-sensitive, so we
// opt out from `Unpin` explicitly. I assumed
// opting out from `Unpin` was the *default* in
// my other posts.

impl<F: Future> MaybeDone<F> {
    fn maybe_poll(pinned &mut self, cx: &mut Context<'_>) {
        if let MaybeDone::Polling(fut) = self {
            //                    ---
            // This is in fact pin-projection, although
            // it's happening implicitly as part of pattern
            // matching. `fut` here has type `pinned &mut F`.
            // We are permitted to do this pin-projection
            // to `F` because we know that `Self: !Unpin`
            // (because we declared that to be true).
            
            if let Poll::Ready(res) = fut.poll(cx) {
                *self = MaybeDone::Done(Unpinnable { value: Some(res) });
            }
        }
    }

    fn is_done(&self) -> bool {
        matches!(self, &MaybeDone::Done(_))
    }

    fn take_output(pinned &mut self) -> Option<F::Output> {
        //         ----------------
        //     This method is called after pinning, so it
        //     needs a `pinned &mut` reference...  

        if let MaybeDone::Done(res) = self {
            res.value.take()
            //  ------------
            //
            //  ...but take is an `&mut self` method
            //  and `F::Output: Unpin` is not known to be true.
            //  
            //  Therefore we have made the type in `Done`
            //  be `Unpinnable`, so that we can do this
            //  swap.
        } else {
            None
        }
    }
}
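
For comparison, here is a rough sketch of how the same future can be written in stable Rust today, with `unsafe` standing in for the projections that MinPin would make safe. This is an illustrative translation rather than part of the proposal: `Unpinnable` is defined locally, `NoopWake` and `poll_ready_future` are made-up helpers, and because stable Rust has no `!Unpin` impls the sketch relies on the auto-trait behaviour (`MaybeDone<F>` is `Unpin` only if `F` is).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in for the proposed wrapper: always `Unpin`.
struct Unpinnable<T> {
    value: T,
}
impl<T> Unpin for Unpinnable<T> {}

enum MaybeDone<F: Future> {
    Polling(F),
    Done(Unpinnable<Option<F::Output>>),
}

impl<F: Future> MaybeDone<F> {
    fn maybe_poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) {
        // SAFETY: the future is re-pinned immediately and never moved.
        let polled = match unsafe { self.as_mut().get_unchecked_mut() } {
            MaybeDone::Polling(fut) => {
                let fut = unsafe { Pin::new_unchecked(fut) };
                fut.poll(cx)
            }
            MaybeDone::Done(_) => return,
        };
        if let Poll::Ready(res) = polled {
            // `Pin::set` drops the old future in place, then stores the result.
            self.set(MaybeDone::Done(Unpinnable { value: Some(res) }));
        }
    }

    fn is_done(&self) -> bool {
        matches!(self, MaybeDone::Done(_))
    }

    fn take_output(self: Pin<&mut Self>) -> Option<F::Output> {
        // SAFETY: only the `Unpinnable` payload is touched; `F` is never moved.
        match unsafe { self.get_unchecked_mut() } {
            MaybeDone::Done(res) => res.value.take(),
            MaybeDone::Polling(_) => None,
        }
    }
}

/// Waker that does nothing, just enough to build a `Context`.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_ready_future() -> (bool, Option<i32>, Option<i32>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut done = Box::pin(MaybeDone::Polling(std::future::ready(7)));
    done.as_mut().maybe_poll(&mut cx);
    let is_done = done.is_done();
    let first = done.as_mut().take_output();
    let second = done.as_mut().take_output();
    (is_done, first, second)
}
```

Each `unsafe` block marks a spot where the MinPin version above gets by with ordinary pattern matching or field access.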

Can you translate the Join example?

Yep! Here is Join:

struct Join<F1: Future, F2: Future> {
    fut1: MaybeDone<F1>,
    fut2: MaybeDone<F2>,
}

impl<F1: Future, F2: Future> !Unpin for Join<F1, F2> { }
//                           -----------------------
//
// Join is a custom future, so implement `!Unpin`
// to gain access to pin-projection.

impl<F1: Future, F2: Future> Future for Join<F1, F2> {
    type Output = (F1::Output, F2::Output);

    fn poll(pinned &mut self, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // The calls to `maybe_poll` and `take_output` below
        // are doing pin-projection from `pinned &mut self`
        // to a `pinned &mut MaybeDone<F1>` (or `F2`) type.
        // This is allowed because we opted out from `Unpin`
        // above.

        self.fut1.maybe_poll(cx);
        self.fut2.maybe_poll(cx);
        
        if self.fut1.is_done() && self.fut2.is_done() {
            let res1 = self.fut1.take_output().unwrap();
            let res2 = self.fut2.take_output().unwrap();
            Poll::Ready((res1, res2))
        } else {
            Poll::Pending
        }
    }
}

What’s the story with Drop and why does it matter?

Drop’s current signature takes &mut self. But recall that once a !Unpin type is pinned, it is only safe to use pinned &mut. This is a combustible combination. It means that, for example, I can write a Drop that uses mem::replace or swap to move values out from my fields, even though they have been pinned.
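
A small stable-Rust sketch of that combination, using hypothetical names (`Pinnable`, `drop_pinned`): the value is pinned in a box, yet its `Drop` impl receives a plain `&mut self` and moves a field out using only safe code. A `String` is harmless to move, of course; it stands in for a field that something else might point into.

```rust
use std::cell::RefCell;
use std::marker::PhantomPinned;
use std::rc::Rc;

struct Pinnable {
    name: String,
    moved_out: Rc<RefCell<Vec<String>>>,
    _pin: PhantomPinned, // not `Unpin`
}

impl Drop for Pinnable {
    fn drop(&mut self) {
        // Safe code, yet it moves `name` out of a value that may be pinned.
        let name = std::mem::take(&mut self.name);
        self.moved_out.borrow_mut().push(name);
    }
}

fn drop_pinned() -> Vec<String> {
    let moved_out = Rc::new(RefCell::new(Vec::new()));
    let pinned = Box::pin(Pinnable {
        name: "fut".to_string(),
        moved_out: Rc::clone(&moved_out),
        _pin: PhantomPinned,
    });
    drop(pinned); // runs `Drop::drop` with `&mut self` on the pinned value
    let names: Vec<String> = moved_out.borrow().clone();
    names
}
```

The standard library's `std::pin` documentation suggests guarding against this by immediately re-wrapping `self` with `Pin::new_unchecked` inside `drop` and doing the real work in a helper that takes `Pin<&mut Self>`.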

For types that are always Unpin, this is no problem, because &mut self and pinned &mut self are equivalent. For types that are always !Unpin, I’m not too worried, because Drop as is is a poor fit for them, and pinned &mut self will be better.

The tricky bit is types that are conditionally Unpin. Consider something like this:

struct LogWrapper<T> {
    value: T,
}

impl<T> Drop for LogWrapper<T> {
    fn drop(&mut self) {
        ...
    }
}

At least today, whether or not LogWrapper is Unpin depends on whether T: Unpin, so we can’t know it for sure.

The solution that boats and I both landed on effectively creates three categories of types:5

those that implement Unpin, which are unpinnable;
those that do not implement Unpin but which have fn drop(&mut self), which are unsafely pinnable;
those that do not implement Unpin and do not have fn drop(&mut self), which are safely pinnable.

The idea is that using fn drop(&mut self) puts you in this purgatory category of being “unsafely pinnable” (it might be more accurate to say being “maybe unsafely pinnable”, since often at compilation time with generics we won’t know if there is an Unpin impl or not). You don’t get access to safe pin projection or other goodies, but you can do projection with unsafe code (e.g., the way the pin-project-lite crate does it today).

It feels weird to have Drop let you use &mut self when other traits don’t.

Yes, it does, but in fact any method whose trait uses pinned &mut self can be implemented safely with &mut self so long as Self: Unpin. So we could just allow that in general. This would be cool because many hand-written futures are in fact Unpin, and so they could implement the poll method with &mut self.
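
Stable Rust already offers a manual version of this: when `Self: Unpin`, `Pin::get_mut` turns the pinned receiver back into a plain `&mut Self` without any unsafe code. The sketch below uses a hypothetical `Countdown` future and a deliberately naive busy-polling driver (`polls_until_ready`) with a no-op waker, both made up for illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A hand-written future with only `Unpin` fields, so it is `Unpin` itself.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Countdown: Unpin`, so the pinned reference converts to `&mut` safely.
        let this: &mut Countdown = self.get_mut();
        if this.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            // A real future would arrange a wakeup; the driver below busy-polls.
            this.remaining -= 1;
            Poll::Pending
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls in a loop and reports how many polls it took.
fn polls_until_ready(mut fut: Countdown) -> (u32, &'static str) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        // `Pin::new` is safe here for the same reason: `Countdown: Unpin`.
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, out);
        }
    }
}
```

Allowing `fn poll(&mut self, ...)` directly in such impls would simply remove the `get_mut` line.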

Wait, so if Unpin types can use &mut self, why do we need special rules for Drop?

Well, it’s true that an Unpin type can use &mut self in place of pinned &mut self, but in fact we don’t always know when types are Unpin. Moreover, per the zero-conceptual-cost axiom, we don’t want people to have to know anything about Pin to use Drop. The obvious approaches I could think of all either violated that axiom or just… well… seemed weird:

Permit fn drop(&mut self) but only if Self: Unpin seems like it would work, since most types are Unpin. But in fact types, by default, are only Unpin if their fields are Unpin, and so generic types are not known to be Unpin. This means that if you write a Drop impl for a generic type and you use fn drop(&mut self), you will get an error that can only be fixed by implementing Unpin unconditionally. Because “pin is its own world”, I believe adding the impl is fine, but it violates “zero-conceptual-cost” because it means that you are forced to understand what Unpin even means in the first place.
To address that, I considered treating fn drop(&mut self) as implicitly declaring Self: Unpin. This doesn’t violate our axioms but just seems weird and kind of surprising. It’s also backwards incompatible with pin-project-lite.

These considerations led me to conclude that actually the current design kind of puts us in a place where we want three categories. I think in retrospect it’d be better if Unpin were implemented by default but not as an auto trait (i.e., all types were unconditionally Unpin unless they declare otherwise), but oh well.

What is the forwards compatibility story for Overwrite?

I mentioned early on that MinPin could be seen as a first step that can later be extended with Overwrite if we choose. How would that work?

Basically, if we did the s/Unpin/Overwrite/ change, then we would

rename Unpin to Overwrite (literally rename, they would be the same trait);
prevent overwriting the referent of an &mut T unless T: Overwrite (or replacing, swapping, etc).

These changes mean that &mut T is pin-preserving. If T: !Overwrite, then T may be pinned, but then &mut T won’t allow it to be overwritten, replaced, or swapped, and so pinning guarantees are preserved (and then some, since technically overwrites are ok, just not replacing or swapping). As a result, we can simplify the MinPin rules for pin-projection to the following:

Given a reference s: pinned &mut S, the rules for projection of the field f are as follows:

&mut projection is allowed via &mut s.f.

pinned &mut projection is allowed via pinned &mut s.f if S: !Unpin
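
As a reminder of which operations are at stake, this sketch shows the three things any `&mut T` permits in Rust today: overwriting, replacing, and swapping. Under the change described above, these would require `T: Overwrite`. The function names are made up for illustration, and nothing here depends on the proposed trait.

```rust
use std::mem;

// Overwrite: the old value is dropped in place, the new one takes its spot.
fn overwrite(slot: &mut String, new: String) {
    *slot = new;
}

// Replace: the old value is moved out to the caller.
fn replace(slot: &mut String, new: String) -> String {
    mem::replace(slot, new)
}

// Swap: both values move to a different place.
fn swap(a: &mut String, b: &mut String) {
    mem::swap(a, b);
}

fn mutate_through_refs() -> (String, String, String) {
    let mut a = String::from("a0");
    let mut b = String::from("b0");
    overwrite(&mut a, String::from("a1")); // a = "a1"
    let old = replace(&mut a, String::from("a2")); // old = "a1", a = "a2"
    swap(&mut a, &mut b); // a = "b0", b = "a2"
    (a, b, old)
}
```

Of the three, only `replace` and `swap` move an existing value to a new address, which is the part that conflicts with pinning.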

What would it feel like if we adopted Overwrite?

We actually got a bit of a preview when we talked about MaybeDone. Remember how we had to introduce Unpinnable around the final value so that we could swap it out? If we adopted Overwrite, I think the TL;DR of how code would be different is that most any code that today uses std::mem::replace or std::mem::swap would probably wind up using an explicit Unpinnable-like wrapper. I’ll cover this later.

This goes a bit to show what I meant about there being a certain amount of inherent complexity that we can choose to distribute: in MinPin, this pattern of wrapping “swappable” data is isolated to pinned &mut self methods in !Unpin types. With Overwrite, it would be more widespread (but you would get more widespread benefits, as well).

Conclusion

My conclusion is that this is a fascinating space to think about!6 So fun.

1. Hat tip to Tyler Mandry and Eric Holk, who discussed these ideas with me in detail.
2. MinPin is the “minimal” proposal that I feel meets my desiderata; I think you could devise a maximally minimal proposal that is even smaller if you truly wanted.
3. It’s worth noting, though, that coercions and subtyping only go so far. For example, &mut can be coerced to &, but we often need methods that return “the same kind of reference they took in”, which can’t be managed with coercions. That’s why you see things like last and last_mut.
4. I would say that the current complexity of pinning is, in no small part, due to accidental complexity, as demonstrated by the recent round of exploration, but Eric’s wider point stands.
5. Here I am talking about the category of a particular monomorphized type in a particular version of the crate. At that point, every type either implements Unpin or it doesn’t. Note that at compilation time there is more grey area, as there can be types that may or may not be pinnable, etc.
6. Also that I spent way too much time iterating on this post. JUST GONNA POST IT.

https://smallcultfollowing.com/babysteps/blog/2024/11/05/minpin/

The `Overwrite` trait and `Pin`

Oct 14, 2024 Updated Oct 14, 2024

In July, boats presented a compelling vision in their post pinned places. With the Overwrite trait that I introduced in my previous post, however, I think we can get somewhere even more compelling, albeit at the cost of a tricky transition. As I will argue in this post, the Overwrite trait effectively becomes a better version of the existing Unpin trait, one that affects not only pinned references but also regular &mut references. Through this it’s able to make Pin fit much more seamlessly with the rest of Rust.

Just show me the dang code

Before I dive into the details, let’s start by reviewing a few examples to show you what we are aiming at (you can also skip to the TL;DR, in the FAQ).

I’m assuming a few changes here:

Adding an Overwrite trait and changing most types to be !Overwrite by default.
- The Option<T> (and maybe others) would opt-in to Overwrite, permitting x.take().
Integrating pin into the borrow checker, extending auto-ref to also “auto-pin” and produce a Pin<&mut T>. The borrow checker only permits you to pin values that you own. Once a place has been pinned, you are not permitted to move out from it anymore (unless the value is overwritten).

The first change is “mildly” backwards incompatible. I’m not going to worry about that in this post, but I’ll cover the ways I think we can make the transition in a follow up post.

Example 1: Converting a generator into an iterator

We would really like to add a generator syntax that lets you write an iterator more conveniently.1 For example, given some slice strings: &[String], we should be able to define a generator that iterates over the string lengths like so:

fn do_computation() -> usize {
    let hashes = gen {
        let strings: Vec<String> = compute_input_strings();
        for string in &strings {
            yield compute_hash(&string);
        }
    };
    
    // ...
}

But there is a catch here! To permit the borrow of strings, which is owned by the generator, the generator will have to be pinned.2 That means that generators cannot directly implement Iterator, because generators need a Pin<&mut Self> signature for their next methods. It is possible, however, to implement Iterator for Pin<&mut G> where G is a generator.3
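
Since `gen` blocks are not available on stable, here is a hand-rolled sketch of the same shape. `PinnedGenerator`, `Lengths`, and `PinnedIter` are all hypothetical names; the wrapper struct stands in for a direct `Iterator` impl on `Pin<&mut G>`, and the "generator" is an ordinary struct marked `!Unpin` with `PhantomPinned`.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Hypothetical stand-in for a generator: `resume` needs a pinned receiver.
trait PinnedGenerator {
    type Item;
    fn resume(self: Pin<&mut Self>) -> Option<Self::Item>;
}

/// Yields the length of each string it owns.
struct Lengths {
    strings: Vec<String>,
    index: usize,
    _pin: PhantomPinned, // a real generator would be `!Unpin` too
}

impl PinnedGenerator for Lengths {
    type Item = usize;

    fn resume(self: Pin<&mut Self>) -> Option<usize> {
        // SAFETY: we only advance `index` in place; nothing is moved out.
        let this = unsafe { self.get_unchecked_mut() };
        let len = this.strings.get(this.index)?.len();
        this.index += 1;
        Some(len)
    }
}

/// Adapter: a pinned mutable reference to a generator is an iterator.
struct PinnedIter<'a, G>(Pin<&'a mut G>);

impl<'a, G: PinnedGenerator> Iterator for PinnedIter<'a, G> {
    type Item = G::Item;

    fn next(&mut self) -> Option<G::Item> {
        self.0.as_mut().resume()
    }
}

fn string_lengths(strings: Vec<String>) -> Vec<usize> {
    // The explicit pinning step that today's Rust requires.
    let generator = pin!(Lengths { strings, index: 0, _pin: PhantomPinned });
    let lengths: Vec<usize> = PinnedIter(generator).collect();
    lengths
}
```

The `pin!` line in `string_lengths` is the step the rest of this example is about: required today, and implicit under the proposal.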

In today’s Rust, that means that using a generator as an iterator would require explicit pinning:

fn do_computation() -> usize {
    let hashes = gen {....};
    let hashes = pin!(hashes); // <-- explicit pin
    if let Some(h) = hashes.next() {
        // process first hash
    };
    // ...
}

With pinned places, this feels more builtin, but it still requires users to actively think about pinning for even the most basic use case:

fn do_computation() -> usize {
    let hashes = gen {....};
    let pinned mut hashes = hashes;
    if let Some(h) = hashes.next() {
        // process first hash
    };
    // ...
}

Under this proposal, users would simply be able to ignore pinning altogether:

fn do_computation() -> usize {
    let mut hashes = gen {....};
    if let Some(h) = hashes.next() {
        // process first hash
    };
    // ...
}

Pinning is still happening: once a user has called next, they would not be able to move hashes after that point. If they tried to do so, the borrow checker (which now understands pinning natively) would give an error like:

error[E0596]: cannot borrow `hashes` as mutable, as it is not declared as mutable
 --> src/lib.rs:4:22
  |
4 |     if let Some(h) = hashes.next() {
  |                      ------ value in `hashes` was pinned here
  |     ...
7 |     move_somewhere_else(hashes);
  |                         ^^^^^^ cannot move a pinned value
help: if you want to move `hashes`, consider using `Box::pin` to allocate a pinned box
  |
3 |     let mut hashes = Box::pin(gen { .... });
  |                      +++++++++            +

As noted, it is possible to move hashes after pinning, but only if you pin it into a heap-allocated box. So we can advise users how to do that.
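
The reason `Box::pin` helps is that moving a `Pin<Box<T>>` only moves the pointer; the pinned value stays at the same heap address. A minimal sketch in stable Rust, where `Hashes` is a made-up stand-in for the generator and `move_somewhere_else` mirrors the function named in the error message above:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical stand-in for the `hashes` generator.
struct Hashes {
    seed: u64,
    _pin: PhantomPinned,
}

/// Address of the pinned value itself (not of the box that points to it).
fn address_of(hashes: &Pin<Box<Hashes>>) -> usize {
    let value: &Hashes = &**hashes;
    value as *const Hashes as usize
}

fn move_somewhere_else(hashes: Pin<Box<Hashes>>) -> Pin<Box<Hashes>> {
    hashes
}

fn address_survives_move() -> (bool, u64) {
    let hashes = Box::pin(Hashes { seed: 42, _pin: PhantomPinned });
    let before = address_of(&hashes);
    // Moving the pinned box is fine: only the pointer changes hands.
    let moved = move_somewhere_else(hashes);
    (before == address_of(&moved), moved.seed)
}
```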

Example 2: Implementing the MaybeDone future

The pinned places post included an example future called MaybeDone. I’m going to implement that same future in the system I describe here. There are some comments in the example comparing it to the version from the pinned places post.

enum MaybeDone<F: Future> {
    //         ---------
    //         I'm assuming we are in Rust.Next, and so the default
    //         bounds for `F` do not include `Overwrite`.
    //         In other words, `F: ?Overwrite` is the default
    //         (just as it is with every other trait besides `Sized`).
    
    Polling(F),
    //      -
    //      We don't need to declare `pinned F`.
    
    Done(Option<F::Output>),
}

impl<F: Future> MaybeDone<F> {
    fn maybe_poll(self: Pin<&mut Self>, cx: &mut Context<'_>) {
        //        --------------------
        //        I'm not bothering with the `&pinned mut self`
        //        sugar here, though certainly we could still
        //        add it.
        if let MaybeDone::Polling(fut) = self {
            //                    ---
            //       Just as in the original example,
            //       we are able to project from `Pin<&mut Self>`
            //       to a `Pin<&mut F>`.
            //
            //       The key is that we can safely project
            //       from an owner of type `Pin<&mut Self>`
            //       to its field of type `Pin<&mut F>`
            //       so long as the owner type `Self: !Overwrite`
            //       (which is the default for structs in Rust.Next).
            if let Poll::Ready(res) = fut.poll(cx) {
                *self = MaybeDone::Done(Some(res));
            }
        }
    }

    fn is_done(&self) -> bool {
        matches!(self, &MaybeDone::Done(_))
    }

    fn take_output(&mut self) -> Option<F::Output> {
        //         ---------
        //   In pinned places, this method had to be
        //   `&pinned mut self`, but under this design,
        //   it can be a regular `&mut self`.
        //   
        //   That's because `Pin<&mut Self>` becomes
        //   a subtype of `&mut Self`.
        if let MaybeDone::Done(res) = self {
            res.take()
        } else {
            None
        }
    }
}

Example 3: Implementing the Join combinator

Let’s complete the journey by implementing a Join future:

struct Join<F1: Future, F2: Future> {
    // These fields do not have to be declared `pinned`:
    fut1: MaybeDone<F1>,
    fut2: MaybeDone<F2>,
}

impl<F1, F2> Future for Join<F1, F2>
where
    F1: Future,
    F2: Future,
{
    type Output = (F1::Output, F2::Output);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        //  --------------------
        // Again, I've dropped the sugar here.
        
        // This looks just the same as in the
        // "Pinned Places" example. This again
        // leans on the ability to project
        // from a `Pin<&mut Self>` owner so long as
        // `Self: !Overwrite` (the default for structs
        // in Rust.Next).
        self.fut1.maybe_poll(cx);
        self.fut2.maybe_poll(cx);
        
        if self.fut1.is_done() && self.fut2.is_done() {
            // This code looks the same as it did with pinned places,
            // but there is an important difference. `take_output`
            // is now an `&mut self` method, not a `Pin<&mut Self>`
            // method. This demonstrates that we can also get
            // a regular `&mut` reference to our fields.
            let res1 = self.fut1.take_output().unwrap();
            let res2 = self.fut2.take_output().unwrap();
            Poll::Ready((res1, res2))
        } else {
            Poll::Pending
        }
    }
}

How I think about pin

OK, now that I’ve lured you in with code examples, let me drive you away by diving into the details of Pin. I’m going to cover the way that I think about Pin. It is similar to but different from how Pin is presented in the pinned places post – in particular, I prefer to think about places that pin their values and not pinned places. In any case, Pin is surprisingly subtle, and I recommend that if you want to go deeper, you read boats’ history of Pin post and/or the stdlib documentation for Pin.

The Pin type is a modifier on the pointer P

The Pin type is unusual in Rust. It looks similar to a “smart pointer” type, like Arc<T>, but it functions differently. Pin is not a pointer, it is a modifier on another pointer, so

a Pin<&T> represents a pinned reference,
a Pin<&mut T> represents a pinned mutable reference,
a Pin<Box<T>> represents a pinned box,

and so forth.

You can think of a Pin type as being a pointer of type P that refers to a place (Rust jargon for a location in memory that stores a value) whose value v has been pinned. A pinned value v can never be moved to another place in memory. Moreover, v must be dropped before its place can be reassigned to another value.
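To make the "modifier on a pointer" idea concrete, here is a minimal sketch using only std APIs that exist today. The function name pinned_pointers_demo is made up for illustration; the point is just that the same Pin wrapper sits around three different pointer types.

```rust
use std::pin::Pin;

/// Builds three differently-pinned pointers to `u32` values and reads
/// each one back through `Deref`. (Illustrative helper, not from the post.)
fn pinned_pointers_demo() -> (u32, u32, u32) {
    let mut a = 1_u32;
    let b = 2_u32;

    // `Pin::new` is safe here only because `u32: Unpin`.
    let pinned_mut: Pin<&mut u32> = Pin::new(&mut a);
    let pinned_ref: Pin<&u32> = Pin::new(&b);
    // `Box::pin` works for any `T`, including `!Unpin` types.
    let pinned_box: Pin<Box<u32>> = Box::pin(3);

    // In every case `Pin<P>` dereferences to the same target as `P`.
    (*pinned_mut, *pinned_ref, *pinned_box)
}
```

Calling pinned_pointers_demo() yields (1, 2, 3): wrapping a pointer in Pin changes what you may do with the referent, not what it points at.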

Pinning is part of the “lifecycle” of a place

The way I think about it, every place in memory has a lifecycle:

flowchart TD
Uninitialized 
Initialized
Pinned

Uninitialized --
    p = v where v: T
--> Initialized

Initialized -- 
    move out, drop, or forget
--> Uninitialized

Initialized --
    pin value v in p
    (only possible when T is !Unpin)
--> Pinned

Pinned --
    drop value
--> Uninitialized

Pinned --
    move out or forget
--> UB

Uninitialized --
    free the place
--> Freed

UB[💥 Undefined behavior 💥]

When first allocated, a place p is uninitialized – that is, p has no value at all.

An uninitialized place can be freed. This corresponds to e.g. popping a stack frame or invoking free.

p may at some point become initialized by an assignment like p = v. At that point, there are three ways to transition back to uninitialized:

The value v could be moved somewhere else, e.g. with let p2 = p. At that point, p goes back to being uninitialized.
The value v can be forgotten, with std::mem::forget(p). At this point, no destructor runs, but p goes back to being considered uninitialized.
The value v can be dropped, which occurs when the place p goes out of scope. At this point, the destructor runs, and p goes back to being considered uninitialized.

Alternatively, the value v can be pinned in place:

At this point, v cannot be moved again, and the only way for p to be reused is for v to be dropped.

Once a value is pinned, moving or forgetting the value is not allowed. These actions are “undefined behavior”, and safe Rust must not permit them to occur.
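The unpinned half of the lifecycle can be observed in today's Rust. The following is a sketch only: Tracked and lifecycle_demo are hypothetical names, and the shared counter is just a way to see whether a destructor ran after each transition back to "uninitialized".

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter when its destructor runs.
struct Tracked(Rc<Cell<usize>>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Walks places through each "initialized -> uninitialized" transition and
/// records how many destructors had run after each step.
fn lifecycle_demo() -> [usize; 3] {
    let drops = Rc::new(Cell::new(0));

    // Move out: `p` becomes uninitialized, the value lives on in `p2`.
    let p = Tracked(drops.clone());
    let p2 = p;
    let after_move = drops.get();

    // Forget: `p2` becomes uninitialized, but no destructor runs.
    mem::forget(p2);
    let after_forget = drops.get();

    // Drop: the destructor runs when the place goes out of scope.
    {
        let _p3 = Tracked(drops.clone());
    }
    let after_drop = drops.get();

    [after_move, after_forget, after_drop]
}
```

Only the last transition runs a destructor, so the counter reads 0, 0, 1. For a pinned value, only that last transition would be allowed.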

A digression on forgetting vs other ways to leak

As most folks know, Rust does not guarantee that destructors run. If you have a value v whose destructor never runs, we say that value is leaked. There are however two ways to leak a value, and they are quite different in their impact:

Option A: Forgetting. Using std::mem::forget, you can forget the value v. The place p that was storing that value will go from initialized to uninitialized, at which point the place p can be freed.
- Forgetting a value is undefined behavior if that value has been pinned, however!
Option B: Leak the place. When you leak a place, it just stays in the initialized or pinned state forever, so its value is never dropped. This can happen, for example, with a ref-count cycle.
- This is safe even if the value is pinned!
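Here is a small sketch of the two kinds of leak in today's Rust. Noisy, Node, and leak_demo are illustrative names, not anything from std. In both options the destructor never runs; the difference is what happens to the place.

```rust
use std::cell::{Cell, RefCell};
use std::mem;
use std::rc::Rc;

/// Hypothetical payload that counts destructor runs.
struct Noisy(Rc<Cell<usize>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Hypothetical node that can point at itself, forming a ref-count cycle.
struct Node {
    _payload: Noisy,
    next: RefCell<Option<Rc<Node>>>,
}

/// Returns the destructor count after Option A and after Option B.
fn leak_demo() -> (usize, usize) {
    let drops = Rc::new(Cell::new(0));

    // Option A: forget the value. The stack slot that held `v` is
    // uninitialized again and is freed when this function returns.
    let v = Noisy(drops.clone());
    mem::forget(v);
    let after_forget = drops.get();

    // Option B: leak the place. The cycle keeps the heap allocation alive
    // (and initialized) forever, so the payload is never dropped.
    let node = Rc::new(Node {
        _payload: Noisy(drops.clone()),
        next: RefCell::new(None),
    });
    *node.next.borrow_mut() = Some(node.clone());
    drop(node);
    let after_cycle = drops.get();

    (after_forget, after_cycle)
}
```

Both counts stay at 0. In Option B the storage is never reused, which is why that kind of leak is compatible with pinning.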

In retrospect, I wish that Option A did not exist – I wish that we had not added std::mem::forget. We did so as part of working through the impact of ref-count cycles. It seemed equivalent at the time (“the dtor doesn’t run anyway, why not make it easy to do”) but I think this diagram shows why adding forget made things permanently more complicated for relatively little gain.4 Oh well! Can’t win ’em all.

Values of types implementing Unpin cannot be pinned

There is one subtle aspect here: not all values can be pinned. If a type T implements Unpin, then values of type T cannot be pinned. When you have a pinned reference to them, they can still squirm out from under you via swap or other techniques. Another way to say the same thing is to say that values can only be pinned if their type is !Unpin (“does not implement Unpin”).

Types that are !Unpin can be called address sensitive, meaning that once they are pinned, there can be pointers to the internals of that value that will be invalidated if the address changes. Types that implement Unpin would therefore be address insensitive. Traditionally, all Rust types have been address insensitive, and therefore Unpin is an auto trait, implemented by most types by default.
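A sketch of how this plays out today: a String is Unpin, so it can be swapped out from behind a Pin<&mut String> in safe code, while a type carrying PhantomPinned is !Unpin and offers no such escape hatch. AddressSensitive, assert_unpin, swap_behind_pin, and unpin_demo are all hypothetical names for this illustration.

```rust
use std::marker::PhantomPinned;
use std::mem;
use std::pin::Pin;

/// Hypothetical address-sensitive type: the `PhantomPinned` marker field
/// opts it out of the `Unpin` auto trait.
struct AddressSensitive {
    value: u32,
    _pin: PhantomPinned,
}

/// Compiles only when `T: Unpin`.
fn assert_unpin<T: Unpin>() {}

/// Swaps a different value in behind a pinned reference. `Pin::get_mut`
/// is a safe method, but it is only available when the target is `Unpin`.
fn swap_behind_pin(pinned: Pin<&mut String>, replacement: &mut String) {
    mem::swap(pinned.get_mut(), replacement);
}

fn unpin_demo() -> (String, String, u32) {
    assert_unpin::<String>();
    // assert_unpin::<AddressSensitive>(); // would fail to compile

    let mut a = String::from("a");
    let mut b = String::from("b");
    swap_behind_pin(Pin::new(&mut a), &mut b);

    // A `!Unpin` value can still be pinned and read through `Deref`.
    let pinned: Pin<Box<AddressSensitive>> = Box::pin(AddressSensitive {
        value: 7,
        _pin: PhantomPinned,
    });
    (a, b, pinned.value)
}
```

After the call, a holds "b" and b holds "a", even though a was behind a Pin the whole time.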

Pin<&mut T> is really a “maybe pinned” reference

Looking at the state machine as I describe it here, we can see that a Pin<&mut T> isn’t really a pinned mutable reference, in the sense that it doesn’t always refer to a place that is pinning its value. If T: Unpin, then it’s just a regular reference. But if T: !Unpin, then a pinned reference guarantees that the value it refers to is pinned in place.

This fits with the name Unpin, which I believe was meant to convey the idea that, even if you have a pinned reference to a value of type T: Unpin, that value can become unpinned. I’ve heard the metaphor of “if T: Unpin, you can lift out the pin, swap in a different value, and put the pin back”.

Pin picked a peck of pickled pain

Everyone agrees that Pin is confusing and a pain to use. But what makes it such a pain?

If you are attempting to author a Pin-based API, there are two primary problems:

Pin<&mut Self> methods can’t make use of regular &mut self methods.
Pin<&mut Self> methods can’t access fields by default. Crates like pin-project-lite make this easier but still require learning obscure concepts like structural pinning.

If you are attempting to consume a Pin-based API, the primary annoyance is that getting a pinned reference is hard. You can’t just call Pin<&mut Self> methods normally, you have to remember to use Box::pin or pin! first. (We saw this in Example 1 from this post.)
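For reference, this is roughly what the consumer side looks like in today's Rust. It is a sketch, not a recommended pattern: NoopWake and poll_both_ways are made-up names, and the do-nothing waker exists only so that a Context can be built without an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; just enough to construct a `Context`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls one future pinned on the stack and one pinned on the heap.
fn poll_both_ways() -> (Poll<u32>, Poll<u32>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // Explicit step 1: pin on the stack with the `pin!` macro.
    let mut on_stack = pin!(async { 1_u32 });
    let first = on_stack.as_mut().poll(&mut cx);

    // Explicit step 2: pin on the heap with `Box::pin`.
    let mut on_heap = Box::pin(async { 2_u32 });
    let second = on_heap.as_mut().poll(&mut cx);

    (first, second)
}
```

Neither future awaits anything, so each is ready on its first poll. The pinning step is required in both cases before poll can be called at all.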

My proposal in a nutshell

This post is focused on a proposal with two parts:

Making Pin-based APIs easier to author by replacing the Unpin trait with Overwrite.
Making Pin-based APIs easier to call by integrating pinning into the borrow checker.

I’m going to walk through those in turn.

Making Pin-based APIs easier to author

Overwrite as the better Unpin

The first part of my proposal is a change I call s/Unpin/Overwrite/. The idea is to introduce Overwrite and then change the “place lifecycle” to reference Overwrite instead of Unpin:

flowchart TD
Uninitialized 
Initialized
Pinned

Uninitialized --
    p = v where v: T
--> Initialized

Initialized -- 
    move out, drop, or forget
--> Uninitialized

Initialized --
    pin value v in p
    (only possible when
T is 👉!Overwrite👈)
--> Pinned

Pinned --
    drop value
--> Uninitialized

Pinned --
    move out or forget
--> UB

Uninitialized --
    free the place
--> Freed

UB[💥 Undefined behavior 💥]

For s/Unpin/Overwrite/ to work well, we have to make all !Unpin types also be !Overwrite. This is not, strictly speaking, backwards compatible, since today !Unpin types (like all types) can be overwritten and swapped. I think eventually we want every type to be !Overwrite by default, but I don’t think we can change that default in a general way without an edition. But for !Unpin types in particular I suspect we can get away with it, because !Unpin types are pretty rare, and the simplification we get from doing so is pretty large. (And, as I argued in the previous post, there is no loss of expressiveness; code today that overwrites or swaps !Unpin values can be locally rewritten.)

Why swaps are bad without s/Unpin/Overwrite/

Today, Pin<&mut T> cannot be converted into an &mut T reference unless T: Unpin.5 This is because it would allow safe Rust code to create Undefined Behavior by swapping the referent of the &mut T reference and hence moving the pinned value. By requiring that T: Unpin, the DerefMut impl is effectively limiting itself to references that are not, in fact, in the “pinned” state, but just in the “initialized” state.

As a result, Pin<&mut T> and &mut T methods don’t interoperate today

This leads directly to our first two pain points. To start, from a Pin<&mut Self> method, you can only invoke &self methods (via the Deref impl) or other Pin<&mut Self> methods. This schism separates out the “regular” methods of a type from its pinned methods; it also means that methods doing field assignments don’t compile:

fn increment_field(self: Pin<&mut Self>) {
    self.field = self.field + 1;
}

This errors because compiling a field assignment requires a DerefMut impl and Pin<&mut Self> doesn’t have one.
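For contrast, here is a sketch of what you have to write today to get that field assignment to compile for a !Unpin type. Counter and increment_twice are hypothetical, and the unsafe block is the part that s/Unpin/Overwrite/ is meant to make unnecessary.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical `!Unpin` type with one plain counter field.
struct Counter {
    field: u32,
    _pin: PhantomPinned,
}

impl Counter {
    fn increment_field(self: Pin<&mut Self>) {
        // SAFETY: we only assign to a `u32` field and never move the
        // `Counter` itself out of its place.
        let this = unsafe { self.get_unchecked_mut() };
        this.field += 1;
    }
}

fn increment_twice() -> u32 {
    let mut counter = Box::pin(Counter {
        field: 0,
        _pin: PhantomPinned,
    });
    // `as_mut` reborrows the pinned box as a `Pin<&mut Counter>`.
    counter.as_mut().increment_field();
    counter.as_mut().increment_field();
    // Reading goes through `Deref`, which `Pin` always provides.
    counter.field
}
```

The read at the end needs no unsafe, because Pin implements Deref for every target; only the write requires it.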

With s/Unpin/Overwrite/, Pin<&mut Self> is a subtype of &mut Self

s/Unpin/Overwrite/ allows us to implement DerefMut for all pinned types. This is because, unlike Unpin, Overwrite affects how &mut works, and hence &mut T would preserve the pinned state for the place it references. Consider the two possibilities for the value of type T referred to by the &mut T:

If T: Overwrite, then the value is not pinnable, and so the place cannot be in the pinned state.
If T: !Overwrite, the value could be pinned, but we also cannot overwrite or swap it, and so pinning is preserved.

This implies that Pin<&mut T> is in fact a generalized version of &mut T. Every &'a mut T keeps the value pinned for the duration of its lifetime 'a, but a Pin<&mut T> ensures the value stays pinned for the lifetime of the underlying storage.

If we have a DerefMut impl, then Pin<&mut Self> methods can freely call &mut self methods. Big win!

Today you must categorize fields as “structurally pinned” or not

The other pain point today with Pin is that we have no native support for “pin projection”6. That is, you cannot safely go from a Pin<&mut Self> reference to a Pin<&mut F> reference referring to some field self.f without relying on unsafe code.

The most common practice today is to use a custom crate like pin-project-lite. Even then, you also have to make a choice for each field between whether you want to be able to get a Pin<&mut F> reference or a normal &mut F reference. Fields for which you can get a pinned reference are called structurally pinned and the criteria for which one you should use are rather subtle. Ultimately this choice is required because Pin<&mut F> and &mut F don’t play nicely together.
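To show the per-field choice without pulling in a crate, here is a hand-written sketch of the two kinds of projection using std's unsafe Pin methods. Wrapper, its inner and polls accessors, and projection_demo are illustrative names; the safety comments summarize the obligations as I understand them from the std::pin documentation.

```rust
use std::pin::Pin;

/// Hypothetical wrapper with one structurally pinned field (`inner`)
/// and one ordinary field (`polls`).
struct Wrapper<F> {
    inner: F,
    polls: u32,
}

impl<F> Wrapper<F> {
    /// Structural projection: `Pin<&mut Self>` to `Pin<&mut F>`.
    fn inner(self: Pin<&mut Self>) -> Pin<&mut F> {
        // SAFETY: `inner` is never moved out of a pinned `Wrapper`, and
        // `Wrapper` has no `Drop` impl and is not `repr(packed)`.
        unsafe { self.map_unchecked_mut(|this| &mut this.inner) }
    }

    /// Non-structural projection: `Pin<&mut Self>` to a plain `&mut u32`.
    fn polls(self: Pin<&mut Self>) -> &mut u32 {
        // SAFETY: `polls` is never handed out as a pinned reference.
        unsafe { &mut self.get_unchecked_mut().polls }
    }
}

fn projection_demo() -> (u32, u32) {
    let mut w = Box::pin(Wrapper {
        inner: 10_u32,
        polls: 0,
    });
    *w.as_mut().polls() += 1;
    // `u32: Unpin`, so this particular `Pin<&mut u32>` supports `DerefMut`.
    let mut inner: Pin<&mut u32> = w.as_mut().inner();
    *inner += 5;
    (w.inner, w.polls)
}
```

Each field gets exactly one of the two accessors, which is the "structurally pinned or not" decision made by hand.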

Pin projection is safe from any !Overwrite type

With s/Unpin/Overwrite/, we can scrap the idea of structural pinning. Instead, if we have a field owner self: Pin<&mut Self>, pinned projection is allowed so long as Self: !Overwrite. That is, if Self: !Overwrite, then I can always get a Pin<&mut F> reference to some field self.f of type F. How is that possible?

Actually, the full explanation relies on borrow checker extensions I haven’t introduced yet. But let’s see how far we get without them, so that we can see the gap that the borrow checker has to close.

Assume we are creating a Pin<&'a mut F> reference r to some field self.f, where self: Pin<&mut Self>:

We are creating a Pin<&'a mut F> reference to the value in self.f:
- If F: Overwrite, then the value is not pinnable, so this is equivalent to an ordinary &mut F and we have nothing to prove.
- Else, if F: !Overwrite, then we have to show that the value in self.f will not move for the remainder of its lifetime.
 - Pin projection from *self is only valid if Self: !Overwrite and self: Pin<&'b mut Self>, so we know that the value in *self is pinned for the remainder of its lifetime by induction.
 - We have to show then that the value v_f in self.f will never be moved until the end of its lifetime.

There are three ways to move a value out of self.f:

You can assign a new value to self.f, like self.f = ....
- This will run the destructor, ending the lifetime of the value v_f.
You can create a mutable reference r = &mut self.f and then…
- assign a new value to *r: but that will be an error because F: !Overwrite.
- swap the value in *r with another: but that will be an error because F: !Overwrite.

QED. =)

Making Pin-based APIs easier to call

Today, getting a Pin<&mut T> requires using the pin! macro, going through Box::pin, or some similar explicit action. This adds “syntactic salt” to calling a Pin<&mut Self> method. There is no built-in way to safely create a pinned reference, so you have to go through some other abstraction rooted in unsafe (e.g., Box::pin). This is fine but introduces ergonomic hurdles.

We want to make calling a Pin<&mut Self> method as easy as calling an &mut self method. To do this, we need to extend the compiler’s notion of “auto-ref” to include the option of “auto-pin-ref”:

// Instead of this:
let future: Pin<&mut impl Future> = pin!(async { ... });
future.poll(cx);

// We would do this:
let mut future: impl Future = async { ... };
future.poll(cx); // <-- Wowee!

Just as a typical method call like vec.len() expands to Vec::len(&vec), the compiler would be expanding future.poll(cx) to something like so:

Future::poll(&pinned mut future, cx)
//           ^^^^^^^^^^^ but wait, what's this?

This expansion though includes a new piece of syntax that doesn’t exist today, the &pinned mut operation. (I’m lifting this syntax from boats’ pinned places proposal.)

Whereas &mut var results in an &mut T reference (assuming var: T), an &pinned mut var borrow would result in a Pin<&mut T>. It would also make the borrow checker consider the value in var to be pinned. That means that it is illegal to move out from var. The pinned state continues indefinitely until var goes out of scope or is overwritten by an assignment like var = ... (which drops the heretofore pinned value). This is a fairly straightforward extension to the borrow checker’s existing logic.

New syntax not strictly required

It’s worth noting that we don’t actually need the &pinned mut syntax (which means we don’t need the pinned keyword). We could make it so that the only way to get the compiler to do a pinned borrow is via auto-ref. We could even add a silly trait to make it explicit, like so:

trait Pinned {
    fn pinned(self: Pin<&mut Self>) -> Pin<&mut Self>;
}

impl<T: ?Sized> Pinned for T {
    fn pinned(self: Pin<&mut T>) -> Pin<&mut T> {
        self
    }
}

Now you can write var.pinned(), which the compiler would desugar to Pinned::pinned(&rustc#pinned mut var). Here I am using rustc#pinned to denote an “internal keyword” that users can’t type.7

Frequently asked questions

So…there’s a lot here. What are the key takeaways?

The shortest version of this post I can manage is8

Pinning fits smoothly into Rust if we make two changes:
- Limit the ability to swap types by default, making Pin<&mut T> a subtype of &mut T and enabling uniform pin projection.
- Integrate pinning in the auto-ref rules and the borrow checker.

Why do you only mention swaps? Doesn’t Overwrite affect other things?

Indeed the Overwrite trait as I defined it is overkill for pinning. To be more precise, we might imagine two special traits that affect how and when we can drop or move values:

trait DropWhileBorrowed: Sized { }
trait Swap: DropWhileBorrowed { }

Given a reference r: &mut T, overwriting its referent *r with a new value would require T: DropWhileBorrowed;
Swapping two values of type T requires that T: Swap.
- This is true regardless of whether they are borrowed or not.

Today, every type is Swap. What I argued in the previous post is that we should make the default be that user-defined types implement neither of these two traits (over an edition, etc etc). Instead, you could opt-in to both of them at once by implementing Overwrite.

But we could get all the pin benefits by making a weaker change. Instead of having types opt out from both traits by default, they could only opt out of Swap, but continue to implement DropWhileBorrowed. This is enough to make pinning work smoothly. To see why, recall the pinning state diagram: dropping the value in *r (permitted by DropWhileBorrowed) will exit the “pinned” state and return to the “uninitialized” state. This is valid. Swapping, in contrast, is UB.

Two subtle observations here worth calling out:

Both DropWhileBorrowed and Swap have Sized as a supertrait. Today in Rust you can’t drop a &mut dyn SomeTrait value and replace it with another, for example. I think it’s a bit unclear whether unsafe could do this if it knows the dynamic type of value behind the dyn. But under this model, it would only be valid for unsafe code to do that drop if (a) it knew the dynamic type and (b) the dynamic type implemented DropWhileBorrowed. Same applies to Swap.
The Swap trait applies longer than just the duration of a borrow. This is because, once you pin a value to create a Pin<&mut T> reference, the state of being pinned persists even after that reference has ended. I say a bit more about this in another FAQ below.

EDIT: An earlier draft of this post named the trait SwapWhileBorrowed. This was wrong, as described in the FAQ on subtle reasoning.

Why then did you propose opting out from both overwrites and swaps?

Opting out of overwrites (i.e., making the default be neither DropWhileBorrowed nor Swap) gives us the additional benefit of truly immutable fields. This will make cross-function borrows less of an issue, as I described in my previous post, and make some other things (e.g., variance) less relevant. Moreover, I don’t think overwriting an entire reference like *r is that common, versus accessing individual fields. And in the cases where people do do it, it is easy to make a dummy struct with a single field, and then overwrite r.value instead of *r. To me, therefore, distinguishing between DropWhileBorrowed and Swap doesn’t obviously carry its weight.
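The "dummy struct" workaround mentioned above can be sketched in today's Rust like so. Slot, clear_slot, and slot_demo are hypothetical names; nothing here depends on the proposed Overwrite trait, it just shows assigning to a field of *r rather than to *r itself.

```rust
/// Hypothetical single-field wrapper. The `Slot` itself is never
/// overwritten; only its `value` field is.
struct Slot<T> {
    value: T,
}

/// Replaces the contents through a field assignment, not `*r = ...`.
fn clear_slot(r: &mut Slot<Vec<u32>>) {
    r.value = Vec::new();
}

fn slot_demo() -> usize {
    let mut slot = Slot {
        value: vec![1, 2, 3],
    };
    clear_slot(&mut slot);
    slot.value.len()
}
```

The caller still observes the new contents (length 0 here), but the owner keeps the same Slot value throughout the borrow.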

Can you come up with a more semantic name for Overwrite?

All the trait names I’ve given so far (Overwrite, DropWhileBorrowed, Swap) answer the question of “what operation does this trait allow”. That’s pretty common for traits (e.g., Clone or, for that matter, Unpin) but it is sometimes useful to think instead about “what kinds of types should implement this trait” (or not implement it, as the case may be).

My current favorite “semantic style name” is Mobile, which corresponds to implementing Swap. A mobile type is one that, while borrowed, can move to a new place. This name doesn’t convey that it’s also ok to drop the value, but that follows, since if you can swap the value to a new place, you can presumably drop that new place.

I don’t have a “semantic” name for DropWhileBorrowed. As I said, I’m hard pressed to characterize the type that would want to implement DropWhileBorrowed but not Swap.

What do DropWhileBorrowed and Swap have in common?

These traits pertain to whether an owner who lends out a local variable (i.e., executes r = &mut lv) can rely on that local variable lv to store the same value after the borrow completes. Under this model, the answer depends on the type T of the local variable:

If T: DropWhileBorrowed (or T: Swap, which implies DropWhileBorrowed), the answer is “no”, the local variable may point at some other value, because it is possible to do *r = /* new value */.
But if T: !DropWhileBorrowed, then the owner can be sure that lv still stores the same value (though lv’s fields may have changed).

Let’s use an analogy. Suppose I own a house and I lease it out to someone else to use. I expect that they will make changes on the inside, such as hanging up a new picture. But I don’t expect them to tear down the house and build a new one on the same lot. I also don’t expect them to drive up a flatbed truck, load my house onto it, and move it somewhere else (while providing me with a new one in return). In Rust today, a reference r: &mut T allows all of these things:

Mutating a field like r.count += 1 corresponds to hanging up a picture. The values inside r change, but r still refers to the same conceptual value.
Overwriting *r = t with a new value t is like tearing down the house and building a new one. The original value that was in r no longer exists.
Swapping *r with some other reference *r2 is like moving my house somewhere else and putting a new house in its place.
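The three operations from the analogy look like this in today's Rust, where all of them compile for any sized type. This is only a sketch: House and the function names are invented to mirror the analogy.

```rust
#[derive(Debug, PartialEq)]
struct House {
    name: &'static str,
    pictures: u32,
}

/// Hanging a picture: a field changes, the value stays the same one.
fn hang_picture(r: &mut House) {
    r.pictures += 1;
}

/// Tearing down and rebuilding: the old value is dropped and replaced.
fn rebuild(r: &mut House) {
    *r = House { name: "new", pictures: 0 };
}

/// The flatbed truck: the value moves elsewhere and another takes its place.
fn relocate(r: &mut House, r2: &mut House) {
    std::mem::swap(r, r2);
}

fn tenant_demo() -> (House, House) {
    let mut mine = House { name: "mine", pictures: 0 };
    let mut other = House { name: "other", pictures: 7 };

    hang_picture(&mut mine); // mine: 1 picture
    relocate(&mut mine, &mut other); // `mine` now holds the "other" house
    rebuild(&mut mine); // ...which is then dropped and replaced

    (mine, other)
}
```

After tenant_demo, the variable mine holds a brand new house, and the house that started out in mine (with its one picture) now lives in other.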

EDIT: Wording refined based on feedback.

What does it mean to be the “same value”?

One question I received was what it meant for two structs to have the “same value”? Imagine a struct with all public fields – can we make any sense of it having an identity? The way I think of it, every struct has a “ghost” private field $identity (one that doesn’t exist at runtime) that contains its identity. Every StructName { } expression has an implicit $identity: new_value() that assigns the identity a distinct value from every other struct that has been created thus far. If two struct values have the same $identity, then they are the same value.

Admittedly, if a struct has all public fields, then it doesn’t really matter whether its identity is the same, except perhaps to philosophers. But most structs don’t.

An example that can help clarify this is what I call the “scope pattern”. Imagine I have a Scope type that has some private fields and which can be “installed” in some way and later “deinstalled” (perhaps it modifies thread-local values):

pub struct Scope {...}

impl Scope {
    fn new() -> Self { /* install scope */ }
}

impl Drop for Scope {
    fn drop(&mut self) {
        /* deinstall scope */
    }
}

And the only way for users to get their hands on a “scope” is to use with_scope, which ensures it is installed and deinstalled properly:

pub fn with_scope(op: impl FnOnce(&mut Scope)) {
    let mut scope = Scope::new();
    op(&mut scope);
}

It may appear that this code enforces a “stack discipline”, where nested scopes will be installed and deinstalled in a stack-like fashion. But in fact, thanks to std::mem::swap, this is not guaranteed:

with_scope(|s1| {
    with_scope(|s2| {
        std::mem::swap(s1, s2);
    })
})

This could easily cause logic bugs or, if unsafe is involved, something worse. This is why lending out scopes requires some extra step to be safe, such as using a &-reference or adding a “fresh” lifetime parameter of some kind to ensure that each scope has a unique type. In principle you could also use a type like &mut dyn ScopeTrait, because the compiler disallows overwriting or swapping dyn Trait values: but I think it’s ambiguous today whether unsafe code could validly do such a swap.
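Here is a runnable sketch of the swap breaking the stack discipline. To make the effect observable I have added two things that are not in the snippets above: an id on each Scope and a shared event log (the Log alias); "installing" and "deinstalling" are modeled as log entries.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Assumed for illustration: a shared log of install/deinstall events.
type Log = Rc<RefCell<Vec<String>>>;

struct Scope {
    id: u32,
    log: Log,
}

impl Scope {
    fn new(id: u32, log: &Log) -> Self {
        log.borrow_mut().push(format!("install {id}"));
        Scope { id, log: log.clone() }
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("deinstall {}", self.id));
    }
}

fn with_scope(id: u32, log: &Log, op: impl FnOnce(&mut Scope)) {
    let mut scope = Scope::new(id, log);
    op(&mut scope);
}

/// Nests two scopes, swaps them, and returns the recorded events.
fn swapped_scopes() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    with_scope(1, &log, |s1| {
        with_scope(2, &log, |s2| {
            // After this, the inner frame owns scope 1 and the outer
            // frame owns scope 2.
            std::mem::swap(s1, s2);
        });
    });
    let events = log.borrow().clone();
    events
}
```

Without the swap the log would end with "deinstall 2" then "deinstall 1". With it, scope 1 is deinstalled first, while scope 2 is still installed.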

EDIT: Question added based on feedback.

There’s a lot of subtle reasoning in this post. Are you sure this is correct?

I am pretty sure! But not 100%. I’m definitely scared that people will point out some obvious flaw in my reasoning. But of course, if there’s a flaw I want to know. To help people analyze, let me recap the two subtle arguments that I made in this post and recap the reasoning.

Lemma. Given some local variable lv: T where T: !Overwrite mutably borrowed by a reference r: &'a mut T, the value in lv cannot be dropped, moved, or forgotten for the lifetime 'a.

During 'a, the variable lv cannot be accessed directly (per the borrow checker’s usual rules). Therefore, any drops/moves/forgets must take place to *r:

Because T: !Overwrite, it is not possible to overwrite or swap *r with a new value; it is only legal to mutate individual fields. Therefore the value cannot be dropped or moved.
Forgetting a value (via std::mem::forget) requires ownership and is not accessible while lv is borrowed.

Theorem A. If we replace T: Unpin with T: Overwrite, then Pin<&mut T> is a safe subtype of &mut T.

The argument proceeds by cases:

If T: Overwrite, then Pin<&mut T> does not refer to a pinned value, and hence it is semantically equivalent to &mut T.
If T: !Overwrite, then Pin<&mut T> does refer to a pinned value, so we must show that the pinning guarantee cannot be disturbed by the &mut T. By our lemma, the &mut T cannot move or forget the pinned value, which is the only way to disturb the pinning guarantee.

Theorem B. Given some field owner o: O where O: !Overwrite with a field f: F, it is safe to pin-project from Pin<&mut O> to a Pin<&mut F> reference referring to o.f.

The argument proceeds by cases:

If F: Overwrite, then Pin<&mut F> is equivalent to &mut F. We showed in Theorem A that Pin<&mut O> could be upcast to &mut O and it is possible to create an &mut F from &mut O, so this must be safe.
If F: !Overwrite, then Pin<&mut F> refers to a pinned value found in o.f. The lemma tells us that the value in o.f will not be disturbed for the duration of the borrow.

EDIT: It was pointed out to me that this last theorem isn’t quite proving what it needs to prove. It shows that o.f will not be disturbed for the duration of the borrow, but to meet the pin rules, we need to ensure that the value is not swapped even after the borrow ends. We can do this by committing to never permit swaps of values unless T: Overwrite, regardless of whether they are borrowed. I meant to clarify this in the post but forgot about it, and then I made a mistake and talked about SwapWhileBorrowed – but Swap is the right name.

What part of this post are you most proud of?

Geez, I’m so glad you asked! Such a thoughtful question. To be honest, the part of this post that I am happiest with is the state diagram for places, which I’ve found very useful in helping me to understand Pin:

flowchart TD
Uninitialized 
Initialized
Pinned

Uninitialized --
    `p = v` where `v: T`
--> Initialized

Initialized -- 
    move out, drop, or forget
--> Uninitialized

Initialized --
    pin value `v` in `p`
    (only possible when `T` is `!Unpin`)
--> Pinned

Pinned --
    drop value
--> Uninitialized

Pinned --
    move out or forget
--> UB

Uninitialized --
    free the place
--> Freed

UB[💥 Undefined behavior 💥]

Obviously this question was just an excuse to reproduce it again. Some of the key insights that it helped me to crystallize:

A value that is Unpin cannot be pinned:
- And hence Pin<&mut Self> really means “reference to a maybe-pinned value” (a value that is pinned if it can be).
Forgetting a value is very different from leaking the place that value is stored:
- In both cases, the value’s Drop never runs, but only one of them can lead to a “freed place”.

In thinking through the stuff I wrote in this post, I’ve found it very useful to go back to this diagram and trace through it with my finger.

Is this backwards compatible?

Maybe? The question does not have a simple answer. I will address it in a future blog post in this series. Let me make a few points here though:

First, the s/Unpin/Overwrite/ proposal is not backwards compatible as I described. It would mean for example that all futures returned by async fn are no longer Overwrite. It is quite possible we simply can’t get away with it.

That’s not fatal, but it makes things more annoying. It would mean there exist types that are !Unpin but which can be overwritten. This in turn means that Pin<&mut Self> is not a subtype of &mut Self for all types. Pinned mutable references would be a subtype for almost all types, but not those that are !Unpin && Overwrite.

Second, a naive, conservative transition would definitely be rough. My current thinking is that, in older editions, we add T: Overwrite bounds by default on type parameters T and, when you have a T: SomeTrait bound, we would expand that to include an Overwrite bound on associated types in SomeTrait, like T: SomeTrait<AssocType: Overwrite>. When you move to a newer edition I think we would just not add those bounds. This is kind of a mess, though, because if you call code from an older edition, you are still going to need those bounds to be present.

That all sounds painful enough that I think we might have to do something smarter, where we don’t always add Overwrite bounds, but instead use some kind of inference in older editions to avoid it most of the time.

Conclusion

My takeaway from authoring this post is that something like Overwrite has the potential to turn Pin from wizard level Rust into mere “advanced Rust”, somewhat akin to knowing the borrow checker really well. If we had no backwards compatibility constraints to work with, it seems clear that this would be a better design than Unpin as it is today.

Of course, we do have backwards compatibility constraints, so the real question is how we can make the transition. I don’t know the answer yet! I’m planning on thinking more deeply about it (and talking to folks) once this post is out. My hope was first to make the case for the value of Overwrite (and to be sure my reasoning is sound) before I invest too much into thinking how we can make the transition.

Assuming we can make the transition, I’m wondering two things. First, is Overwrite the right name? Second, should we take the time to re-evaluate the default bounds on generic types in a more complete way? For example, to truly have a nice async story, and for myriad other reasons, I think we need “must move” types. How does that fit in?

The precise design of generators is of course an ongoing topic of some controversy. I am not trying to flesh out a true design here or take a position. Mostly I want to show that we can create ergonomic bridges between “must pin” types like generators and “non pin” interfaces like Iterator in an ergonomic way without explicit mentioning of pinning. ↩︎
Boats has argued that, since no existing iterator can support borrows over a yield point, generators might not need to do so either. I don’t agree. I think supporting borrows over yield points is necessary for ergonomics just as it was in futures. ↩︎
Actually for Pin<impl DerefMut<Target: Generator>>. ↩︎
I will say, I use std::mem::forget quite regularly, but mostly to make up for a shortcoming in Drop. I would like it if Drop had a separate method, fn drop_on_unwind(&mut self), and we invoked that method when unwinding. Most of the time, it would be the same as regular drop, but in some cases it’s useful to have cleanup logic that only runs in the case of unwinding. ↩︎
In contrast, a Pin<&mut T> reference can be safely converted into an &T reference, as evidenced by Pin’s Deref impl. This is because, even if T: !Unpin, a &T reference cannot do anything that is invalid for a pinned value. You can’t swap the underlying value or read from it. ↩︎
Projection is the wonky PL term for “accessing a field”. It’s never made much sense to me, but I don’t have a better term to use, so I’m sticking with it. ↩︎
We have a syntax k#foo for explicitly referring to a keyword foo. It is meant to be used only for keywords that will be added in future Rust editions. However, I sometimes think it’d be neat to have internal-ish keywords (like k#pinned) that are used in desugaring but rarely need to be typed explicitly; you would still be able to write k#pinned if for whatever reason you wanted to. And of course we could later opt to stabilize it as pinned (no prefix required) in a future edition. ↩︎
I tried asking ChatGPT to summarize the post but, when I pasted in my post, it replied, “The message you submitted was too long, please reload the conversation and submit something shorter.” Dang ChatGPT, that’s rude! Gemini at least gave it the old college try. Score one for Google. Plus, it called my post “thought-provoking!” Aww, I’m blushing! ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/10/14/overwrite-and-pin/

Making overwrite opt-in #crazyideas

Sep 26, 2024 Updated Sep 26, 2024

What would you say if I told you that it was possible to (a) eliminate a lot of “inter-method borrow conflicts” without introducing something like view types and (b) make pinning easier even than boats’s pinned places proposal, all without needing pinned fields or even a pinned keyword? You’d probably say “Sounds great… what’s the catch?” The catch is that it requires us to change Rust’s fundamental assumption that, given x: &mut T, you can always overwrite *x by doing *x = /* new value */, for any type T: Sized. This kind of change is tricky, but not impossible, to do over an edition.

TL;DR

We can reduce inter-procedural borrow check errors, increase clarity, and make pin vastly simpler to work with if we limit when it is possible to overwrite an &mut reference. The idea is that if you have a mutable reference x: &mut T, it should only be possible to overwrite x via *x = /* new value */ or to swap its value via std::mem::swap if T: Overwrite. To start with, most structs and enums would implement Overwrite, and it would be a default bound, like Sized; but we would transition in a future edition to have structs/enums be !Overwrite by default and to have T: Overwrite bounds written explicitly.

Structure of this series

This blog post is part of a series:

This first post will introduce the idea of immutable fields and show why they could make Rust more ergonomic and more consistent. It will then show how overwrites and swaps are the key blocker and introduce the idea of the Overwrite trait, which could overcome that.
In the next post, I’ll dive deeper into Pin and how the Overwrite trait can help there.
After that, who knows? Depends on what people say in response.1

If you could change one thing about Rust, what would it be?

People often ask me to name something I would change about Rust if I could. One of the items on my list is the fact that, given a mutable reference x: &mut SomeStruct to some struct, I can overwrite the entire value of x by doing *x = /* new value */, versus only modifying individual fields like x.field = /* new value */.

Having the ability to overwrite *x always seemed very natural to me, having come from C, and it’s definitely useful sometimes (particularly with Copy types like integers or newtyped integers). But it turns out to make borrowing and pinning much more painful than they would otherwise have to be, as I’ll explain shortly.

In the past, when I’ve thought about how to fix this, I always assumed we would need a new form of reference type, like &move T or something. That seemed like a non-starter to me. But at RustConf last week, while talking about the ergonomics of Pin, a few of us stumbled on the idea of using a trait instead. Under this design, you can always make an x: &mut T, but you can’t always assign to *x as a result. This turns out to be a much smoother integration. And, as I’ll show, it doesn’t really give up any expressiveness.

Motivating example #1: Immutable fields

In this post, I’m going to motivate the changes by talking about immutable fields. Today in Rust, when you declare a local variable let x = …, that variable is immutable by default2. Fields, in contrast, inherit their mutability from the outside: when a struct appears in a mut location, all of its fields are mutable.

Not all fields are mutable, but I can’t declare that in my Rust code

It turns out that declaring local variables as mut is not needed for the borrow checker — and yet we do it nonetheless, in part because it helps readability. It’s useful to see when a variable might change. But if that argument holds for local variables, it holds double for fields! For local variables, we can find all potential mutation just by searching one function. To know if a field may be mutated, we have to search across many functions. And for fields, precisely because they can be mutated across functions, declaring them as immutable can actually help the borrow checker to see that your code is safe.

Idea: Declare fields as mutable

So what if we extended the mutable declaration to fields? The idea would be that, in your struct, if you want to mutate fields, you have to declare them as mut. This would allow them to be mutated, but only if the struct itself appears in a mutable location.

For example, maybe I have an Analyzer struct that is created with some vector of datums and which has to compute the number of “important” ones:

#[derive(Default)]
struct Analyzer {
    /// Data being analyzed: will never be modified.
    data: Vec<Datum>,

    /// Number of important datums uncovered so far.
    mut important: usize,
}

As you can see from the struct declaration, the field data is declared as immutable. This is because we are only going to be reading the Datum values. The important field is declared as mut, indicating that it will be updated.

When can you mutate fields?

In this world, mutating a field is only possible when (1) the struct appears in a mutable location and (2) the field you are referencing is declared as mut. So this code compiles fine, because the field important is mut:

let mut analyzer = Analyzer::default();
analyzer.important += 1; // OK: mut field in a mut location

But this code does not compile, because the local variable x is not declared as mut:

let x = Analyzer::default();
x.important += 1; // ERROR: `x` not declared as mutable

And this code does not compile, because the field data is not declared as mut:

let mut x = Analyzer::default();
x.data.clear(); // ERROR: field `data` is not declared as mutable

Leveraging immutable fields in the borrow checker

So why is it useful to declare fields as mut? Well, imagine you have a method like increment_if_important, which checks if datum.is_important() is true and increments the important counter if so:

impl Analyzer {
    fn increment_if_important(&mut self, datum: &Datum) {
        if datum.is_important() {
            self.important += 1;
        }
    }
}

Now imagine you have a function that loops over self.data and calls increment_if_important on each item:

impl Analyzer {
    fn count_important(&mut self) {
        for datum in &self.data {
            self.increment_if_important(datum);
        }
    }
}

I can hear the experienced Rustaceans crying out in pain now. This function, natural as it appears, will not compile in Rust today. Why is that? Well, we have a shared borrow on self.data but we are trying to call an &mut self function, so we have no way to be sure that self.data will not be modified.
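
For contrast, here is a minimal sketch of one way to get this loop past the borrow checker in today's Rust: hand the helper only the counter, so the shared borrow of `self.data` and the mutable borrow of `self.important` are visibly disjoint. The `Datum` type with its `weight` field and the free function `increment_if_important` are assumptions made up for illustration.

```rust
// Hypothetical stand-in for the post's `Datum` type.
struct Datum {
    weight: u32,
}

impl Datum {
    fn is_important(&self) -> bool {
        self.weight > 10
    }
}

#[derive(Default)]
struct Analyzer {
    data: Vec<Datum>,
    important: usize,
}

// Takes only the counter, not `&mut self`, so it cannot touch `data`.
fn increment_if_important(important: &mut usize, datum: &Datum) {
    if datum.is_important() {
        *important += 1;
    }
}

impl Analyzer {
    fn count_important(&mut self) {
        // `self.data` is borrowed shared and `self.important` is borrowed
        // mutably; the borrow checker accepts this because the two fields
        // are disjoint within a single function body.
        for datum in &self.data {
            increment_if_important(&mut self.important, datum);
        }
    }
}
```

The cost of this workaround is that the helper's signature now spells out which fields it touches, so the fix is not local to `count_important`.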

But what about immutable fields? Doesn’t that solve this?

Annoyingly, immutable fields on their own don’t change anything! Why? Well, just because you can’t write to a field directly doesn’t mean you can’t mutate the memory it’s stored in. For example, maybe I write a malicious version of increment_if_important:

impl Analyzer {
    fn malicious_increment_if_important(&mut self, datum: &Datum) {
        *self = Analyzer::default();
    }
}

This version never directly accesses the field data, but it just writes to *self, and hence it has the same impact. Annoying!

Generics: why we can’t trivially disallow overwrites

Maybe you’re thinking “well, can’t we just disallow overwriting *self if there are fields declared mut?” The answer is yes, we can, and that’s what this blog post is about. But it’s not so simple as it sounds, because we are changing the “basic contract” that all Rust types currently satisfy. In particular, Rust today assumes that if you have a reference x: &mut T and a value v: T, you can always do *x = v and overwrite the referent of x. That means I can write a generic function like set_to_default:

fn set_to_default<T: Default>(r: &mut T) {
    *r = T::default();
}

Now, since Analyzer implements Default, I can make increment_if_important call set_to_default. This will still free self.data, but it does it in a sneaky way, where we can’t obviously tell that the value being overwritten is an instance of a struct with mut fields:

impl Analyzer {
    fn malicious_increment_if_important(&mut self, datum: &Datum) {
        // Overwrites `self.data`, but not in an obvious way
        set_to_default(self);
    }
}
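
To see the problem concretely in today's Rust, the sketch below runs `set_to_default` against a plain struct. The `Vec<u32>` payload is an assumption standing in for `Vec<Datum>`.

```rust
#[derive(Default)]
struct Analyzer {
    data: Vec<u32>, // stand-in for `Vec<Datum>`
    important: usize,
}

fn set_to_default<T: Default>(r: &mut T) {
    *r = T::default();
}

impl Analyzer {
    fn malicious_increment_if_important(&mut self, _datum: &u32) {
        // Never names `self.data`, yet replaces it wholesale.
        set_to_default(self);
    }
}
```

Calling the method on an `Analyzer` whose `data` holds three elements leaves `data` empty and `important` at zero, even though neither field is mentioned in the method body.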

Recap

So let’s step back and recap what we’ve seen so far:

If we could distinguish which fields were mutable and which were definitely not, we could eliminate many inter-function borrow check errors3.
However, just adding mut declarations is not enough, because fields can also be mutated indirectly. Specifically, when you have a &mut SomeStruct, you can overwrite with a fresh instance of SomeStruct or swap with another &mut SomeStruct, thus changing all fields at once.
Whatever fix we use has to consider generic code like std::mem::swap, which mutates an &mut T without knowing precisely what T is. Therefore we can’t do something simple like looking to see if T is a struct with mut fields4.

The trait system to the rescue

My proposal is to introduce a new, built-in marker trait called Overwrite:

/// Marker trait that permits overwriting
/// the referent of an `&mut Self` reference.
#[marker] // <-- means the trait cannot have methods
trait Overwrite: Sized {}

The effect of Overwrite

As a marker trait, Overwrite does not have methods, but rather indicates a property of the type. Specifically, assigning to a borrowed place of type T requires that T: Overwrite is implemented. For example, the following code writes to *x, which has type T; this is only legal if T: Overwrite:

fn overwrite<T>(x: &mut T, t: T) {
    *x = t; // <— requires `T: Overwrite`
}

Given that this code compiles today, this implies that a generic type parameter declaration like <T> would require a default Overwrite bound in the current edition. We would want to phase these defaults out in some future edition, as I’ll describe in detail later on.

Similarly, the standard library’s swap function would require a T: Overwrite bound, since it (via unsafe code) assigns to *x and *y:

fn swap<T>(x: &mut T, y: &mut T) {
    unsafe {
        let tmp: T = std::ptr::read(x);
        std::ptr::write(x, std::ptr::read(y)); // overwrites `*x`, `T: Overwrite` required
        std::ptr::write(y, tmp); // overwrites `*y`, `T: Overwrite` required
    }
}
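
As a sanity check on the description above, here is a minimal runnable swap written against raw pointers in today's Rust, where every sized `T` can be overwritten. The name `swap_sketch` is made up for this example, and the real `std::mem::swap` is implemented differently.

```rust
use std::ptr;

fn swap_sketch<T>(x: &mut T, y: &mut T) {
    let px: *mut T = x;
    let py: *mut T = y;
    // SAFETY: both pointers come from live `&mut T` references, so they are
    // valid, aligned, and cannot overlap. Each value is read exactly once and
    // written exactly once, so nothing is duplicated or dropped twice.
    unsafe {
        let tmp: T = ptr::read(px);
        ptr::write(px, ptr::read(py)); // overwrites `*x`
        ptr::write(py, tmp); // overwrites `*y`
    }
}
```

Both writes go through a place that is only borrowed, which is why the proposal would attach a `T: Overwrite` bound to a function like this.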

Overwrite requires Sized

The Overwrite trait requires Sized because, for *x = /* new value */ to be safe, the compiler needs to ensure that the place *x has enough space to store “new value”, and that is only possible when the size of the new value is known at compilation time (i.e., the type implements Sized).

Overwrite only applies to borrowed values

The Overwrite trait is only needed when assigning to a borrowed place of type T. If that place is owned, the owner is allowed to reassign it, just as they are allowed to drop it. So e.g. the following code compiles whether or not SomeType: Overwrite holds:

let mut x: SomeType = /* something */;
x = /* something else */; // <— does not require that `SomeType: Overwrite` holds

Subtle: Overwrite is not infectious

Somewhat surprisingly, it is ok to have a struct that implements Overwrite which has fields that do not. Consider the types Foo and Bar, where Foo: Overwrite holds but Bar: Overwrite does not:

struct Foo(Bar);
struct Bar;
impl Overwrite for Foo { }
impl !Overwrite for Bar { }

The following code would type check:

let foo = &mut Foo(Bar);
// OK: Overwriting a borrowed place of type `Foo`
// and `Foo: Overwrite` holds.
*foo = Foo(Bar);

However, the following code would not:

let foo = &mut Foo(Bar);
// ERROR: Overwriting a borrowed place of type `Bar`
// but `Bar: Overwrite` does not hold.
foo.0 = Bar;

Types that do not implement Overwrite can therefore still be overwritten in memory, but only as part of overwriting the value in which they are embedded. In the FAQ I show how this non-infectious property preserves expressiveness.5
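
The rule can be approximated in today's Rust with an ordinary user-defined marker trait and a helper that performs the assignment. This is only an emulation: the trait `Overwrite` and the helper `overwrite` below are hypothetical library-level stand-ins, and nothing stops other code from writing `*x = v` directly. `Bar` carries a `u32` here purely so the effect is observable.

```rust
/// Hypothetical stand-in for the proposed built-in marker trait.
trait Overwrite: Sized {}

struct Bar(u32); // deliberately has no `Overwrite` impl
struct Foo(Bar);
impl Overwrite for Foo {}

/// Assignment through `&mut T`, gated on the marker trait.
fn overwrite<T: Overwrite>(place: &mut T, value: T) {
    *place = value;
}

fn replace_whole(foo: &mut Foo) {
    // OK: `Foo: Overwrite`, even though its field type `Bar` is not.
    overwrite(foo, Foo(Bar(2)));

    // Would not compile: the bound `Bar: Overwrite` is not satisfied.
    // overwrite(&mut foo.0, Bar(3));
}
```

The `Bar` inside `foo` still ends up replaced, but only as a side effect of replacing the enclosing `Foo`.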

Who implements Overwrite?

This section walks through which types should implement Overwrite.

Copy implies Overwrite

Any type that implements Copy would automatically implement Overwrite:

impl<T: Copy> Overwrite for T { }

(If you, like me, get nervous when you see blanket impls due to coherence concerns, it’s worth noting that RFC #1268 allows for overlapping impls of marker traits, though that RFC is not yet fully implemented nor stable. It’s not terribly relevant at the moment anyway.)

“Pointer” types are Overwrite

Types that represent pointers all implement Overwrite for all T:

&T
&mut T
Box<T>
Rc<T>
Arc<T>
*const T
*mut T

dyn, [], and other “unsized” types do not implement Overwrite

Types that do not have a static size, like dyn and [], do not implement Overwrite. Safe Rust already disallows writing code like *x = … in such cases.

There are ways to do overwrites with unsized types in unsafe code, but they’d have to prove various bounds. For example, overwriting a [u32] value could be ok, but you have to know the length of data. Similarly swapping two dyn Value referents can be safe, but you have to know that (a) both dyn values have the same underlying type and (b) that type implements Overwrite.

Structs and enums

The question of whether structs and enums should implement Overwrite is complicated because of backwards compatibility. I’m going to distinguish two cases: Rust 2021, and Rust Next, which is Rust in some hypothetical future edition (surely not 2024, but maybe the one after that).

Rust 2021. Struct and enum types in Rust 2021 implement Overwrite by default. Structs could opt-out from Overwrite with an explicit negative impl (impl !Overwrite for S).

Integrating mut fields. Structs that have opted out from Overwrite require mutable fields to be declared as mut. Fields not declared as mut are immutable. This gives them the nicer borrow check behavior.6

Rust Next. In some future edition, we can swap the default, with structs and enums being !Overwrite by default and having to opt-in to enable overwrites. This would make the nice borrow check behavior the default.

Futures and closures

Futures and closures can implement Overwrite iff their captured values implement Overwrite, though in future editions it would be best if they simply do not implement Overwrite.

Default bounds and backwards compatibility

The other big backwards compatibility issue has to do with default bounds. In Rust 2021, every type parameter declared as T implicitly gets a T: Sized bound. We would have to extend that default to be T: Sized + Overwrite. This also applies to associated types in trait definitions and impl X types.7

Interestingly, type parameters declared as T: ?Sized also opt-out from Overwrite. Why is that? Well, remember that Overwrite: Sized, so if T is not known to be Sized, it cannot be known to be Overwrite either. This is actually a big win. It means that types like &T and Box<T> can work with “non-overwrite” types out of the box.

Associated type bounds are annoying, but perhaps not fatal

Still, the fact that default bounds apply to associated types and impl Trait is a pain in the neck. For example, it implies that Iterator::Item would require its items to be Overwrite, which would prevent you from authoring iterators that iterate over structs with immutable fields. This can to some extent be overcome by associated type aliases8 (we could declare Item to be a “virtual associated type”, mapping to Item2021 in older editions, which require Overwrite, and ItemNext in newer ones, which do not).

Frequently asked questions

OMG endless words. What did I just read?

Let me recap!

It would be more declarative and create fewer borrow check conflicts if we had users declare their fields as mut when they may be mutated and we were able to assume that non-mut fields will never be mutated.
- If we were to add this, in the current Rust edition it would obviously be opt-in.
- But in a future Rust edition it would become mandatory to declare fields as mut if you want to mutate them.
But to do that, we need to prevent overwrites and swaps. We can do that by introducing a trait, Overwrite, that is required to overwrite a given location.
- In the current Rust edition, this trait would be added by default to all type parameters, associated types, and impl Trait bounds; it would be implemented by all structs, enums, and unions.
- In a future Rust edition, the trait would no longer be the default, and structs, enums, and unions would have to explicitly implement it if they want to be overwriteable.

This change doesn’t seem worth it just to get immutable fields. Is there more?

But wait, there’s more! Oh, you just said that. Yes, there’s more. I’m going to write a follow-up post showing how opting out from Overwrite eliminates most of the ergonomic pain of using Pin.

In “Rust Next”, who would ever implement Overwrite manually?

I said that, in Rust Next, types should be !Overwrite by default and require people to implement Overwrite manually if they want to. But who would ever do that? It’s a good question, because I don’t think there’s very much reason to.

Because Overwrite is not infectious, you can actually make a wrapper type…

#[repr(transparent)]
struct ForceOverwrite<T> { t: T }
impl<T> Overwrite for ForceOverwrite<T> { }

…and now you can put values of any type X into a ForceOverwrite<X> which can be reassigned.

This pattern allows you to make “local” use of overwrite, for example to implement a sorting algorithm (which has to do a lot of swapping). You could have a sort function that takes an &mut [T] for any T: Ord (Overwrite not required):

fn sort<T: Ord>(data: &mut [T])

Internally, it can safely transmute the &mut [T] to a &mut [ForceOverwrite<T>] and sort that. Note that at no point during that sorting are we moving or overwriting an element while it is borrowed (the slice that owns it is borrowed, but not the elements themselves).
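
A rough sketch of that cast in today's Rust follows. The `Overwrite` trait is again a hypothetical user-defined marker, `sort_overwritable` is a made-up helper standing in for a swap-heavy algorithm that demands `Overwrite`, and a raw-pointer cast is used in place of `transmute`. It relies on `#[repr(transparent)]` guaranteeing that `ForceOverwrite<T>` has the same layout as `T`.

```rust
/// Hypothetical stand-in for the proposed built-in marker trait.
trait Overwrite: Sized {}

#[repr(transparent)]
struct ForceOverwrite<T> {
    t: T,
}
impl<T> Overwrite for ForceOverwrite<T> {}

/// Hypothetical swap-based sort that insists on `Overwrite` elements.
fn sort_overwritable<E: Overwrite>(data: &mut [E], less: impl Fn(&E, &E) -> bool) {
    // Simple insertion sort built on `swap`.
    for i in 1..data.len() {
        let mut j = i;
        while j > 0 && less(&data[j], &data[j - 1]) {
            data.swap(j, j - 1);
            j -= 1;
        }
    }
}

fn sort<T: Ord>(data: &mut [T]) {
    // SAFETY: `ForceOverwrite<T>` is `repr(transparent)` over `T`, so the
    // two slice types share the same element layout and length metadata.
    let wrapped: &mut [ForceOverwrite<T>] =
        unsafe { &mut *(data as *mut [T] as *mut [ForceOverwrite<T>]) };
    sort_overwritable(wrapped, |a, b| a.t < b.t);
}
```

The public `sort` signature asks only for `T: Ord`; the wrapper type never escapes the function body.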

What is the relationship of Overwrite and Unpin?

I’m still puzzling that over myself. I think that Overwrite is “morally the same” as Unpin, but it is much more powerful (and ergonomic) because it is integrated into the behavior of &mut (of course, this comes at the cost of a complex backwards compatibility story).

Let me describe it this way. Types that do not implement Overwrite cannot be overwritten while borrowed, and hence are “pinned for the duration of the borrow”. This has always been true for &T, but for &mut T it has traditionally not been true. We’ll see in the next post that Pin<&mut T> basically just extends that guarantee to apply indefinitely.

Compare that to types that do not implement Unpin and hence are “address sensitive”. Such types are pinned for the duration of a Pin<&mut T>. Unlike T: !Overwrite types, they are not pinned by &mut T references, but that’s a bug, not a feature: this is why Pin has to bend over backwards to prevent you from getting your hands on an &mut T.

I’ll explain this more in my next post, of course.

Should Overwrite be an auto trait?

I think not. If we did so, it would lock people into semver hazards in the “Rust Next” edition where mut is mandatory for mutation. Consider a struct Foo { value: u32 } type. This type has not opted into becoming Copy, but it only contains types that are Copy and therefore Overwrite. By auto trait rules it would by default be Overwrite. But that would prevent you from adding a mut field in the future or benefiting from immutable fields. This is why I said the default would just be !Overwrite, no matter the field types.

Conclusion

Obama Mic Drop

After this grandiose intro, hopefully I won’t be printing a retraction of the idea due to some glaring flaw… eep! ↩︎
Whenever I say immutable here, I mean immutable-modulo-Cell, of course. We should probably find another word for that, this is kind of terminology debt that Rust has bought its way into and I’m not sure the best way for us to get out! ↩︎
Immutable fields don’t resolve all inter-function borrow conflicts. To do that, you need something like view types. But in my experience they would eliminate many. ↩︎
The simple solution (if a struct has mut fields, disallow overwriting it) is basically what C++ does with their const fields. Classes or structs with const fields are more limited in how you can use them. This works in C++ because they wait until post-substitution to check templates for validity. ↩︎
I love the Felleisen definition of “expressiveness”: two language features are equally expressive if one can be converted into the other with only local rewrites, which I generally interpret as “rewrites that don’t affect the function signature (or other abstraction boundary)”. ↩︎
We can also make the !Overwrite impl implied by declaring fields mut, of course. This is fine for backwards compatibility, but isn’t the design I would want long-term, since it introduces an odd “step change” where declaring one field as mut implicitly declares all other fields as immutable (and, conversely, deleting the mut keyword from that field has the effect of declaring all fields, including that one, as mutable). ↩︎
The Self type in traits is exempt from the Sized default, and it could be exempt from the Overwrite default as well, unless the trait is declared as Sized. ↩︎
Hat tip to TC, who pointed this out to me. ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/09/26/overwrite-trait/

More thoughts on claiming

Jun 26, 2024 Updated Jun 26, 2024

This is the first of what I think will be several follow-up posts to “Claiming, auto and otherwise”. This post is focused on clarifying and tweaking the design I laid out previously in response to some of the feedback I’ve gotten. In future posts I want to lay out some of the alternative designs I’ve heard.

TL;DR: People like it

If there’s any one thing I can take away from what I’ve heard, it is that people really like the idea of making working with reference counted or cheaply cloneable data more ergonomic than it is today. A lot of people have expressed a lot of excitement.

If you read only one additional thing from the post—well, don’t do that, but if you must—read the Conclusion. It attempts to restate what I was proposing to help make it clear.

Clarifying the relationship of the traits

I got a few questions about the relationship of the Copy/Clone/Claim traits to one another. I think the best way to show it is with a venn diagram:

[Figure: Venn diagram with regions labeled Clone, Copy, and Claim]

The Clone trait is the most general, representing any way of duplicating the value. There are two important subtraits:
- Copy represents values that can be cloned via memcpy and which lack destructors (“plain old data”).
- Claim represents values whose clones are cheap, infallible, and transparent; on the basis of these properties, claims are inserted automatically by the compiler.

Copy and Claim overlap but do not have a strict hierarchical relationship. Some Claim types (like Rc and Arc) are not “plain old data”. And while all Copy operations are infallible, some of them fail to meet Claim’s other conditions:

Copying a large type like [u8; 1024] is not cheap.
Copying a type with interior mutability like Cell<u8> is not transparent.

On heuristics

One challenge with the Claim trait is that the choice to implement it involves some heuristics:

What exactly is cheap? I tried to be specific by saying “O(1) and doesn’t copy more than a few cache lines”, but clearly it will be hard to draw a strict line.
What exactly is infallible? It was pointed out to me that Arc will abort if the ref count overflows (which is one reason why the Rust-for-Linux project rolled their own alternative). And besides, any Rust code can abort on stack overflow. So clearly we need to have some reasonable compromise.
What exactly is transparent? Again, I tried to specify it, but iterator types are an example of types that are technically transparent to copy but where it is nonetheless very confusing to claim them.

An aversion to heuristics is the reason we have the current copy/clone split. We couldn’t figure out where to draw the line (“how much data is too much?”) so we decided to simply make it “memcpy or custom code”. This was a reasonable starting point, but we’ve seen that it is imperfect, leading to uncomfortable compromises.

The thing about “cheap, infallible, and transparent” is that I think it represents exactly the criteria that we really want to represent when something can be automatically claimed. And it seems inherent that those criteria are a bit squishy.

One implication of this is that Claim should rarely if ever appear as a bound on a function. Writing fn foo<T: Claim>(t: T) doesn’t really feel like it adds a lot of value to me, since, given the heuristical nature of claim, it’s going to rule out some uses that may make sense. eternaleye proposed an interesting twist on the original proposal, suggesting we introduce stricter versions of Claim for, say, O(1) Clone, although I don’t yet see what code would want to use that as a bound either.

“Infallible” ought to be “does not unwind” (and we ought to abort if it does)

I originally laid out the conditions for claim as “cheap, infallible, and transparent”, where “infallible” means “cannot panic or abort”. But it was pointed out to me that Arc and Rc in the standard library will indeed abort if the ref-count exceeds std::usize::MAX! This obviously can’t work, since reference counted values are the prime candidate to implement Claim.

Therefore, I think infallible ought to say that “Claim operations should never panic”. This almost doesn’t need to be said, since panics are already meant to represent impossible or extraordinarily unlikely conditions, but it seems worth reiterating since it is particularly important in this case.

In fact, I think we should go further and have the compiler insert an abort if an automatic claim operation does unwind.1 My reasoning here is the same as I gave in my post on unwinding 2:

Reasoning about unwinding is already very hard, it becomes nigh impossible if the sources of unwinding are hidden.
It would make for more efficient codegen if the compiler doesn’t have to account for unwinding, which would make code using claim() (automatically or explicitly) mildly more efficient than code using clone().

I was originally thinking of the Rust For Linux project when I wrote the wording on infallible, but their requirements around aborting are really orthogonal and much broader than Claim itself. They already don’t use the Rust standard library, or most dependencies, because they want to limit themselves to code that treats abort as an absolute last resort. Rather than abort on overflow, their version of reference counting opts simply to leak, for example, and their memory allocators return a Result to account for OOM conditions. I think the Claim trait will work just fine for them whatever we say on this point, as they’ll already have to screen for code that meets their more stringent criteria.

Clarifying claim codegen

In my post, I noted almost in passing that I would expect the compiler to still use memcpy at monomorphization time when it knew that the type being claimed implements Copy. One interesting bit of feedback I got was anecdotal evidence that this will indeed be critical for performance.

To model the semantics I want for claim we would need specialization3. I’m going to use a variant of specialization that lcnr first proposed to me; the idea is to have an if impl expression that, at monomorphization time, either takes the true path (if the type implements Foo via always applicable impls) or the false path (otherwise). This is a cleaner formulation for specialization when the main thing you want to do is provide more optimized or alternative implementations.

Using that, we could write a function use_claim_value that defines the code the compiler should insert:

fn use_claim_value<T: Claim>(t: &T) -> T {
    std::panic::catch_unwind(|| {
        if impl T: Copy {
            // Copy T if we can
            *t
        } else {
            // Otherwise clone
            t.clone()
        }
    }).unwrap_or_else(|_| {
        // Do not allow unwinding
        std::process::abort()
    })
}

This has three important properties:

No unwinding, for easier reasoning and better codegen.
Copies if it can.
Always calls clone otherwise.
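
Since `if impl` does not exist today, the closest runnable approximation on stable Rust drops the `Copy` fast path and keeps only the no-unwind guarantee. In this sketch `Claim` is a hypothetical user-defined trait (implemented here just for `Rc<T>`), and `AssertUnwindSafe` is added only to satisfy the bounds on `catch_unwind`.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::rc::Rc;

/// Hypothetical stand-in for the proposed trait.
trait Claim: Clone {}
impl<T> Claim for Rc<T> {}

fn use_claim_value<T: Claim>(t: &T) -> T {
    // Always clones; the proposed version would copy when `T: Copy`.
    catch_unwind(AssertUnwindSafe(|| t.clone()))
        // Do not allow unwinding out of a claim.
        .unwrap_or_else(|_| std::process::abort())
}
```

Claiming an `Rc` this way bumps the reference count and returns a second handle to the same allocation.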

Conclusion

What I really proposed

Effectively I proposed to change what it means to “use something by value” in Rust. This has always been a kind of awkward concept in Rust without a proper name, but I’m talking about what happens to the value x in any of these scenarios:

let x: SomeType;

// Scenario A: passing as an argument
fn consume(x: SomeType) {}
consume(x);

// Scenario B: assigning to a new place
let y = x;

// Scenario C: captured by a "move" closure
let c = move || x.operation();

// Scenario D: used in a non-move closure
// in a way that requires ownership
let d = || consume(x);

No matter which way you do it, the rules today are the same:

If SomeType: Copy, then x is copied, and you can go on using it later.
Else, x is moved, and you cannot.

I am proposing that, modulo the staging required for backwards compatibility, we change those rules to the following:

If SomeType: Claim, then x is claimed, and you can go on using it later.
Else, x is moved, and you cannot.

To a first approximation, “claiming” something means calling x.claim() (which is the same as x.clone()). But in reality we can be more efficient, and the definition I would use is as follows:

If the compiler sees x is “live” (may be used again later), it transforms the use of x to use_claim_value(&x) (as defined earlier).
If x is dead, then it is just moved.
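
Written out by hand with today's `Rc`, the two cases look roughly like this. The explicit `clone()` marks the spot where the compiler would insert the claim under the proposal; the function `share_then_finish` is invented for the example.

```rust
use std::rc::Rc;

fn consume(v: Rc<Vec<u32>>) -> usize {
    v.len()
}

/// Returns the strong count observed while both handles are alive,
/// plus the lengths seen by the two `consume` calls.
fn share_then_finish() -> (usize, usize, usize) {
    let y: Rc<Vec<u32>> = Rc::new(vec![1, 2, 3]);

    // `y` is live (used again below), so this use would be a claim.
    let x = y.clone();
    let count_while_shared = Rc::strong_count(&y);
    let first = consume(x);

    // `y` is dead after this point, so it is simply moved.
    let second = consume(y);

    (count_while_shared, first, second)
}
```

Only the first use costs a reference count increment; the last use of `y` is an ordinary move.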

Why I proposed it

There’s a reason I proposed this change in the way that I did. I really value the way Rust handles “by value consumption” in a consistent way across all those contexts. It fits with Rust’s ethos of orthogonal, consistent rules that fit together to make a harmonious, usable whole.

My goal is to retain Rust’s consistency while also improving the gaps in the current rule, which neither highlights the things I want to pay attention to (large copies), hides the things I (almost always) don’t (reference count increments), nor covers all the patterns I sometimes want (e.g., being able to get and set a Cell<Range<u32>>, which doesn’t work today because making Range<u32>: Copy would introduce footguns). My hope is that we can do this in a way that it benefits most every Rust program, whether it be low-level or high-level in nature.

In fact, I wonder if we could extend RFC #3288 to apply this retroactively to all operations invoked automatically by the compiler, like Deref, DerefMut, and Drop. Obviously this is technically backwards incompatible, but the benefits here could well be worth it in my view, and the code impacted seems very small (who intentionally panics in Deref?). ↩︎
Another blog post for which I ought to post a follow-up! ↩︎
Specialization has definitely acquired that “vaporware” reputation and for good reason—but I still think we can add it! That said, my thinking on the topic has evolved quite a bit. It’d be worth another post sometime. /me adds it to the queue. ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/06/26/claim-followup-1/

Claiming, auto and otherwise

Jun 21, 2024 Updated Jun 21, 2024

This blog post proposes adding a third trait, Claim, that would live alongside Copy and Clone. The goal of this trait is to improve Rust’s existing split, where types are categorized as either Copy (for “plain old data”1 that is safe to memcpy) or Clone (for types that require executing custom code or which have destructors). This split has served Rust fairly well but also has some shortcomings that we’ve seen over time, including maintenance hazards, performance footguns, and (at times quite significant) ergonomic pain and user confusion.

TL;DR

The proposal in this blog post has three phases:

Adding a new Claim trait that refines Clone to identify “cheap, infallible, and transparent” clones (see below for the definition, but it explicitly excludes allocation). Explicit calls to x.claim() are therefore known to be cheap and easily distinguished from calls to x.clone(), which may not be. This makes code easier to understand and addresses existing maintenance hazards (obviously we can bikeshed the name).
Modifying the borrow checker to insert calls to claim() when using a value from a place that will be used later. So given e.g. a variable y: Rc<Vec<u32>>, an assignment like x = y would be transformed to x = y.claim() if y is used again later. This addresses the ergonomic pain and user confusion of reference-counted values in Rust today, especially in connection with closures and async blocks.
Finally, disconnect Copy from “moves” altogether, first with warnings (in the current edition) and then errors (in Rust 2027). In short, x = y would move y unless y: Claim. Most Copy types would also be Claim, so this is largely backwards compatible, but it would let us rule out cases like y: [u8; 1024] and also extend Copy to types like Cell<u32> or iterators without the risk of introducing subtle bugs.

For some code, automatically calling Claim may be undesirable. For example, some data structure definitions track reference count increments closely. I propose to address this case by creating an “allow-by-default” automatic-claim lint that crates or modules can opt into so that all “claims” can be made explicit. This is more-or-less the profile pattern, although I think it’s notable here that the set of crates which would want “auto-claim” do not necessarily fall into neat categories, as I will discuss.

Step 1: Introducing an explicit Claim trait

Quick, reading this code, can you tell me anything about its performance characteristics?

tokio::spawn({
    // Clone `map` and store it into another variable
    // named `map`. This new variable shadows the original.
    // We can now write code that uses `map` and then go on
    // using the original afterwards.
    let map = map.clone();
    async move { /* code using map */ }
});

/* more code using map */

Short answer: no, you can’t, not without knowing the type of map. The call to map.clone() may be cloning a large map or just incrementing a reference count; you can’t tell.

One-clone-fits-all creates a maintenance hazard

When you’re in the midst of writing code, you tend to have a good idea whether a given value is “cheap to clone” or “expensive”. But this property can change over the lifetime of the code. Maybe map starts out as an Rc<HashMap<K, V>> but is later refactored to HashMap<K, V>. A call to map.clone() will still compile but with very different performance characteristics.

In fact, clone can have an effect on the program’s semantics as well. Imagine you have a variable c: Rc<Cell<u32>> and a call c.clone(). Currently this creates another handle to the same underlying cell. But if you refactor c to Cell<u32>, that call to c.clone() is now creating an independent cell. Argh. (We’ll see this theme, of the importance of distinguishing interior mutability, come up again later.)
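To make the semantic shift concrete, here is a small sketch using only the standard library. The two function names are made up for illustration; the bodies are identical apart from the type of c.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Clones a shared handle, writes through the clone, then reads the original.
fn write_through_rc_clone() -> u32 {
    let c: Rc<Cell<u32>> = Rc::new(Cell::new(0));
    let c2 = c.clone(); // another handle to the same underlying cell
    c2.set(22);
    c.get() // the write is visible through the original handle
}

/// The same code after "refactoring" `c` to a plain `Cell<u32>`.
fn write_through_cell_clone() -> u32 {
    let c: Cell<u32> = Cell::new(0);
    let c2 = c.clone(); // an independent cell holding a copy of the value
    c2.set(22);
    c.get() // the original cell is untouched
}
```

Both functions compile without complaint, yet the first returns the updated value and the second returns the initial one. Nothing at the call to clone() signals the difference.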

Proposal: an explicit Claim trait distinguishing “cheap, infallible, transparent” clones

Now imagine we introduced a new trait Claim. This would be a subtrait of Clone that indicates that cloning is:

Cheap: Claiming should complete in O(1) time and avoid copying more than a few cache lines (64-256 bytes on current architectures).
Infallible: Claim should not encounter failures, even panics or aborts, under any circumstances. Memory allocation is not allowed, as it can abort if memory is exhausted.
Transparent: The old and new value should behave the same with respect to their public API.

The trait itself could be defined like so:2

trait Claim: Clone {
    fn claim(&self) -> Self {
        self.clone()
    }
}

Now when I see code calling map.claim(), even without knowing what the type of map is, I can be reasonably confident that this is a “cheap clone”. Moreover, if my code is refactored so that map is no longer ref-counted, I will start to get compilation errors, letting me decide whether I want to clone here (potentially expensive) or find some other solution.
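As a rough sketch of how that could look in user code today, the block below declares the hypothetical Claim trait locally and implements it for Rc. The impl for Rc, the SharedMap alias, and the second_handle helper are all assumptions made for illustration; none of this exists in the standard library.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical trait from this post; it is not part of the standard library.
trait Claim: Clone {
    fn claim(&self) -> Self {
        self.clone()
    }
}

// Assumption: reference-counted handles would be `Claim`,
// since cloning one only increments a counter.
impl<T: ?Sized> Claim for Rc<T> {}

// Illustrative alias, not from the post.
type SharedMap = Rc<HashMap<String, u32>>;

/// Hands out a second handle to the same map.
fn second_handle(map: &SharedMap) -> SharedMap {
    // If `SharedMap` were later changed to a plain `HashMap<String, u32>`,
    // this line would stop compiling, because `HashMap` has no `Claim` impl.
    map.claim()
}
```

The point of the sketch is the failure mode: the refactoring hazard described earlier turns into a compile error at the call to claim() instead of a silent deep copy.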

Step 2: Claiming values in assignments

In Rust today, values are moved when accessed unless their type implements the Copy trait. This means (among other things) that given a ref-counted map: Rc<HashMap<K, V>>, using the value map will mean that I can’t use map anymore. So e.g. if I do some_operation(map), then that gives my handle to some_operation, preventing me from using it again.

Not all memcopies should be ‘quiet’

The intention of this rule is that something as simple as x = y should correspond to a simple operation at runtime (a memcpy, specifically) rather than something extensible. That, I think, is laudable. And yet the current rule in practice has some issues:

First, x = y can still result in surprising things happening at runtime. If y: [u8; 1024], for example, then a few simple calls like process1(y); process2(y); can easily copy large amounts of data (you probably meant to pass that by reference).
Second, seeing x = y.clone() (or even x = y.claim()) is visual clutter, distracting the reader from what’s really going on. In most applications, incrementing ref counts is simply not that interesting that it needs to be called out so explicitly.
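The first point can be seen in a few lines of today's Rust. The function names below are invented for this sketch; the behavior shown is just the ordinary Copy rule for arrays.

```rust
/// Takes the buffer by value: every call copies all 1024 bytes.
fn checksum_by_value(mut buf: [u8; 1024]) -> u32 {
    buf[0] = 0; // mutates only the callee's private copy
    buf.iter().map(|&b| u32::from(b)).sum()
}

/// Takes the buffer by reference: no copy is made.
fn checksum_by_ref(buf: &[u8; 1024]) -> u32 {
    buf.iter().map(|&b| u32::from(b)).sum()
}

fn large_copy_demo() -> (u32, u32, u8) {
    let y = [1u8; 1024];
    let a = checksum_by_value(y); // silent 1 KiB copy, nothing marks it
    let b = checksum_by_ref(&y); // `y` is still usable: it was copied, not moved
    (a, b, y[0])
}
```

The by-value call compiles without any annotation at the call site, which is exactly the kind of copy I would like the language to highlight.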

Some things that should implement Copy do not

There’s a more subtle problem: the current rule means adding Copy impls can create correctness hazards. For example, many iterator types like std::ops::Range<u32> and std::vec::Iter<u32> could well be Copy, in the sense that they are safe to memcpy. And that would be cool, because you could put them in a Cell and then use get/set to manipulate them. But we don’t implement Copy for those types because it would introduce a subtle footgun:

let mut iter0 = vec.iter();
let mut iter1 = iter0;
iter1.next(); // does not affect `iter0`

Whether this is surprising or not depends on how well you know Rust – but definitely it would be clearer if you had to call clone explicitly:

let mut iter0 = vec.iter();
let mut iter1 = iter0.clone();
iter1.next();

Similar considerations are the reason we have not made Cell<u32> implement Copy.
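Since the standard iterators are not Copy, the hazard can only be reproduced today with a hand-written type. The sketch below uses a made-up Countdown iterator that derives Copy on purpose, to show both the upside (it works with Cell::get and Cell::set) and the footgun.

```rust
use std::cell::Cell;

/// A tiny iterator that is deliberately `Copy`, to show the hazard.
#[derive(Clone, Copy)]
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

fn copy_footgun() -> (Option<u32>, Option<u32>) {
    let mut iter0 = Countdown(3);
    let mut iter1 = iter0; // silently copies instead of moving
    let from_copy = iter1.next(); // advances only `iter1`
    let from_original = iter0.next(); // `iter0` never saw that step
    (from_copy, from_original)
}

fn cell_get_set() -> u32 {
    // Because `Countdown: Copy`, `Cell::get` and `Cell::set` are available.
    let cell = Cell::new(Countdown(3));
    let mut it = cell.get();
    it.next();
    cell.set(it);
    cell.get().0
}
```

Both iterators yield the same first element, which is easy to miss when reading the assignment.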

The clone/copy rules interact very poorly with closures

The biggest source of confusion when it comes to clone/copy, however, is not about assignments like x = y but rather closures and async blocks. Combining ref-counted values with closures is a big stumbling block for new users. This has been true as long as I can remember. Here for example is a 2014 talk at Strangeloop in which the speaker devotes considerable time to the “accidental complexity” (their words, but I agree) they encountered navigating cloning and closures (and, I will note, how the term clone is misleading because it doesn’t mean a deep clone). I’m sorry to say that the situation they describe hasn’t really improved much since then. And, bear in mind, this speaker is a skilled programmer. Now imagine a novice trying to navigate this. Oh boy.

But it’s not just beginners who struggle! In fact, there isn’t really a convenient way to manage the problem of having to clone a copy of a ref-counted item for a closure’s use. At the RustNL unconf, Jonathan Kelley, who heads up Dioxus Labs, described how, on a CloudFlare codebase, they spent significant time trying to find the most ergonomic way to thread context (and these are not Rust novices).

In that setting, they had a master context object cx that had a number of subsystems, each of which was ref-counted. Before launching a new task, they would hand out handles to the subsystems that task required (they didn’t want every task to hold on to the entire context). They ultimately landed on a setup like this, which is still pretty painful:

let _io = cx.io.clone();
let _disk = cx.disk.clone();
let _health_check = cx.health_check.clone();
tokio::spawn(async move {
    do_something(_io, _disk, _health_check)
})

You can make this (in my opinion) mildly better by leveraging variable shadowing, but even then, it’s pretty verbose:

tokio::spawn({
    let io = cx.io.clone();
    let disk = cx.disk.clone();
    let health_check = cx.health_check.clone();
    async move {
        do_something(io, disk, health_check)
    }
})

What you really want is to just write something like this, like you would in Swift or Go or most any other modern language:3

tokio::spawn(async move {
    do_something(cx.io, cx.disk, cx.health_check)
})

“Autoclaim” to the rescue

What I propose is to modify the borrow checker to automatically invoke claim as needed. So e.g. an expression like x = y would be automatically converted to x = y.claim() if y will be used again later. And closures that capture variables in their environment would respect auto-claim as well, so move || process(y) would become { let y = y.claim(); move || process(y) } if y were used again later.

Autoclaim would not apply to the last use of a variable. So x = y only introduces a call to claim if it is needed to prevent an error. This avoids unnecessary reference counting.

Naturally, if the type of y doesn’t implement Claim, we would give a suitable error explaining that this is a move and the user should insert a call to clone if they want to make a cloned value.
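For comparison, here is roughly what those rules amount to when written out by hand in today's Rust, with clone() standing in for the proposed claim(). The function names are illustrative only; the compiler transformation itself is the proposal and does not exist.

```rust
use std::rc::Rc;

fn process(data: Rc<Vec<u32>>) -> u32 {
    data.iter().sum()
}

fn autoclaim_by_hand() -> (u32, u32, usize) {
    let y: Rc<Vec<u32>> = Rc::new(vec![1, 2, 3]);

    // `x = y`, where `y` is used again later: take a new handle explicitly.
    let x = y.clone();

    // `move || process(y)`, where `y` is used again later.
    let closure = {
        let y = y.clone();
        move || process(y)
    };
    let from_closure = closure(); // the closure's handle is dropped inside `process`

    // Last use of `y`: a plain move, no reference-count increment.
    let last = y;

    (from_closure, x.iter().sum(), Rc::strong_count(&last))
}
```

Under autoclaim, the two explicit clone() calls and the extra block around the closure would disappear from the source, while the final move would stay a move.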

Support opt-out with an allow-by-default lint

There is definitely some code that benefits from having the distinction between moving an existing handle and claiming a new one made explicit. For these cases, what I think we should do is add an “allow-by-default” automatic-claim lint that triggers whenever the compiler inserts a call to claim on a type that is not Copy. This is a signal that user-supplied code is running.

To aid in discovery, I would consider an automatic-operations lint group for these kinds of “almost always useful, but sometimes not” conveniences; effectively adopting the profile pattern I floated at one point, but just by making it a lint group. Crates could then add automatic-operations = "deny" (bikeshed needed) in the [lints] section of their Cargo.toml.

Step 3. Stop using Copy to control moves

Adding “autoclaim” addresses the ergonomic issues around having to call clone, but it still means that anything which is Copy can be, well, copied. As noted before that implies performance footguns ([u8;1024] is probably not something to be copied lightly) and correctness hazards (neither is an iterator).

The real goal should be to disconnect “can be memcopied” and “can be automatically copied”4. Once we have “autoclaim”, we can do that, thanks to the magic of lints and editions:

In Rust 2024 and before, we warn when x = y copies a value that is Copy but not Claim.
In the next Rust edition (Rust 2027, presumably), we make it a hard error so that the rule is just tied to Claim trait.

At codegen time, I would still expect us to guarantee that x = y will memcpy and will not invoke y.claim(), since technically the Clone impl may not be the same behavior; it’d be nice if we could extend this guarantee to any call to clone, but I don’t know how to do that, and it’s a separate problem. Furthermore, the automatic_claims lint would only apply to types that don’t implement Copy.5

Frequently asked questions

All right, I’ve laid out the proposal, let me dive into some of the questions that usually come up.

Are you ??!@$!$! nuts???

I mean, maybe? The Copy/Clone split has been a part of Rust for a long time6. But from what I can see in real codebases and daily life, the impact of this change would be a net-positive all around:

For most code, they get less clutter and less confusing error messages but the same great Rust taste (i.e., no impact on reliability or performance).
Where desired, projects can enable the lint (declaring that they care about performance as a side benefit). Furthermore, they can distinguish calls to claim (cheap, infallible, transparent) from calls to clone (anything goes).

What’s not to like?

What kind of code would #[deny(automatic_claims)]?

That’s actually an interesting question! At first I thought this would correspond to the “high-level, business-logic-oriented code” vs “low-level systems software” distinction, but I am no longer convinced.

For example, I spoke with someone from Rust For Linux who felt that autoclaim would be useful, and it doesn’t get more low-level than that! Their basic constraint is that they want to track carefully where memory allocation and other fallible operations occur, and incrementing a reference count is fine.

I think the real answer is “I’m not entirely sure”, we have to wait and see! I suspect it will be a fairly small, specialized set of projects. This is part of why I think this is a good idea.

Well my code definitely wants to track when ref-counts are incremented!

I totally get that! And in fact I think this proposal actually helps your code:

By setting #![deny(automatic_claims)], you declare up front the fact that reference counts are something you track carefully. OK, I admit not everyone will consider this a pro. Regardless, it’s a 1-time setup cost.
By distinguishing claim from clone, your project avoids surprising performance footguns (this seems inarguably good).
In the next edition, when we no longer make Copy implicitly copy, you further avoid the footguns associated with that (also inarguably good).

Is this revisiting RFC 936?

Ooh, deep cut! RFC 936 was a proposal to split Pod (memcopyable values) from Copy (implicitly memcopyable values). At the time, we decided not to do this.7 I am even the one who summarized the reasons. The short version is that we felt it better to have a single trait and lints.

I am definitely offering another alternative aiming at the same problem identified by the RFC. I don’t think this means we made the wrong decision at the time. The problem was real, but the proposed solutions were not worth it. This proposal solves the same problems and more, and it has the benefit of ~10 years of experience.8 (Also, it’s worth pointing out that this RFC came two months before 1.0, and I definitely feel it was right to avoid derailing 1.0 with last minute changes – stability without stagnation!)

Doesn’t having these “profile lints” split Rust?

A good question. Certainly on a technical level, there is nothing new here. We’ve had lints since forever, and we’ve seen that many projects use them in different ways (e.g., customized clippy levels or even – like the linux kernel – a dedicated custom linter). An important invariant is that lints define “subsets” of Rust, they don’t change it. Any given piece of code that compiles always means the same thing.

That said, the profile pattern does lower the cost to adding syntactic sugar, and I see a “slippery slope” here. I don’t want Rust to fundamentally change its character. We should still be aiming at our core constituency of programs that prioritize performance, reliability, and long-term maintenance.

How will we judge when an ergonomic change is “worth it”?

I think we should write up some design axioms. But it turns out we already have a first draft! Some years back Aaron Turon wrote an astute analysis in the “ergonomics initiative” blog post. He identified three axes to consider:

Applicability. Where are you allowed to elide implied information? Is there any heads-up that this might be happening?

Power. What influence does the elided information have? Can it radically change program behavior or its types?

Context-dependence. How much do you have to know about the rest of the code to know what is being implied, i.e. how elided details will be filled in? Is there always a clear place to look?

Aaron concluded that "implicit features should balance these three dimensions. If a feature is large in one of the dimensions, it’s best to strongly limit it in the other two." In the case of autoclaim, the applicability is high (could happen a lot with no heads up) and the context dependence is medium-to-large (you have to know the types of things and traits they implement). We should therefore limit power, and this is why we put clear guidelines on who should implement Claim. And of course for the cases where that doesn’t suffice, the lint can limit the applicability to zero.

I like this analysis. I also want us to consider “who will want to opt-out and why” and see if there are simple steps (e.g., ruling out allocation) we can take which will minimize that while retaining the feature’s overall usefulness.

What about explicit closure autoclaim syntax?

In a recent lang team meeting Josh raised the idea of annotating closures (and presumably async blocks) with some form of syntax that means “they will auto-capture things they capture”. I find the concept appealing because I like having an explicit version of automatic syntax; also, projects that deny automatic_claim should have a lightweight alternative for cases where they want to be more explicit. However, I’ve not seen any actual specific proposal and I can’t think of one myself that seems to carry its weight. So I guess I’d say “sure, I like it, but I would want it in addition to what is in this blog post, not instead of”.

What about explicit closure capture clauses?

Ah, good question! It’s almost like you read my mind! I was going to add to the previous question that I do like the idea of having some syntax for “explicit capture clauses” on closures.

Today, we just have || $body (which implicitly captures paths in $body in some mode) and move || $body (which implicitly captures paths in $body by value).

Some years ago I wrote a draft RFC in a hackmd that I still mostly like (I’d want to revisit the details). The idea was to expand move to let it be more explicit about what is captured. So move(a, b) || $body would capture only a and b by value (and error if $body references other variables). But move(&a, b) || $body would capture a = &a. And move(a.claim(), b) || $body would capture a = a.claim().
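Until something like that exists, the same capture choices can be spelled out with a block of rebinding lets in front of a move closure. The sketch below shows hand-written equivalents of move(&a, b) and move(a.claim(), b); the function names are invented, and clone() stands in for claim().

```rust
use std::rc::Rc;

/// Hand-written equivalent of `move(&a, b) || ...`.
fn by_ref_and_value() -> (usize, usize, usize) {
    let a = String::from("alpha");
    let b = vec![1, 2, 3];
    let f = {
        let a = &a; // capture `a = &a`
        move || (a.len(), b.len()) // `b` itself is moved into the closure
    };
    let (a_len, b_len) = f();
    (a_len, b_len, a.len()) // `a` was only borrowed, so it is still usable
}

/// Hand-written equivalent of `move(a.claim(), b) || ...`.
fn by_claimed_handle() -> (usize, usize) {
    let a: Rc<String> = Rc::new(String::from("alpha"));
    let b = 2;
    let f = {
        let a = Rc::clone(&a); // stands in for `a.claim()`
        move || a.len() * b
    };
    // The closure still holds its own handle, so the count is 2.
    (f(), Rc::strong_count(&a))
}
```

The rebinding block works, but the capture mode of each variable is spread across several lines, which is the verbosity an explicit capture clause would remove.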

This is really attacking a different problem, the fact that closure captures have no explicit form, but it also gives a canonical, lighterweight pattern for “claiming” values from the surrounding context.

How did you come up with the name Claim?

I thought Jonathan Kelley suggested it to me, but reviewing my notes I see he suggested Capture. Well, that’s a good name too. Maybe even a better one! I’ve already written this whole damn blog post using the name Claim, so I’m not going to go change it now. But I’d expect a proper bikeshed before taking any real action.

I love Wikipedia (of course), but using the name passive data structure (which I have never heard before) instead of plain old data feels very… well, very Wikipedia. ↩︎
In point of fact, I would prefer if we could define the claim method as “final”, meaning that it cannot be overridden by implementations, so that we would have a guarantee that x.claim() and x.clone() are identical. You can do this somewhat awkwardly by defining claim in an extension trait, like so, but it’d be a bit embarrassing to have that in the standard library. ↩︎
Interestingly, when I read that snippet, I had a moment where I thought “maybe it should be async move { do_something(cx.io.claim(), ...) }?”. But of course that won’t work, that would be doing the claim in the future, whereas we want to do it before. But it really looks like it should work, and it’s good evidence for how non-obvious this can be. ↩︎
In effect I am proposing to revisit the decision we made in RFC 936, way back when. Actually, I have more thoughts on this, I’ll leave them to a FAQ! ↩︎
Oooh, that gives me an idea. It would be nice if in addition to writing x.claim() one could write x.copy() (similar to iter.copied()) to explicitly indicate that you are doing a memcpy. Then the compiler rule is basically that it will insert either x.claim() or x.copy() as appropriate for types that implement Claim. ↩︎
I’ve noticed I’m often more willing to revisit long-standing design decisions than others I talk to. I think it comes from having been present when the decisions were made. I know most of them were close calls and often began with “let’s try this for a while and see how it feels…”. Well, I think it comes from that and a certain predilection for recklessness. 🤘 ↩︎
This RFC is so old it predates rfcbot! Look how informal that comment was. Astounding. ↩︎
This seems to reflect the best and worst of Rust decision making. The best because autoclaim represents (to my mind) a nice “third way” in between two extreme alternatives. The worst because the rough design for autoclaim has been clear for years but it sometimes takes a long time for us to actually act on things. Perhaps that’s just the nature of the beast, though. ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/06/21/claim-auto-and-otherwise/

The borrow checker within

Jun 2, 2024 Updated Jun 2, 2024


This post lays out a 4-part roadmap for the borrow checker that I call “the borrow checker within”. These changes are meant to help Rust become a better version of itself, enabling patterns of code which feel like they fit within Rust’s spirit, but run afoul of the letter of its law. I feel fairly comfortable with the design for each of these items, though work remains to scope out the details. My belief is that a-mir-formality will make a perfect place to do that work.

Rust’s spirit is mutation xor sharing

When I refer to the spirit of the borrow checker, I mean the rules of mutation xor sharing that I see as Rust’s core design ethos. This basic rule—that when you are mutating a value using the variable x, you should not also be reading that data through a variable y—is what enables Rust’s memory safety guarantees and also, I think, contributes to its overall sense of “if it compiles, it works”.

Mutation xor sharing is, in some sense, neither necessary nor sufficient. It’s not necessary because there are many programs (like every program written in Java) that share data like crazy and yet still work fine1. It’s also not sufficient in that there are many problems that demand some amount of sharing – which is why Rust has “backdoors” like Arc<Mutex<T>>, AtomicU32, and—the ultimate backdoor of them all—unsafe.

But to me the biggest surprise from working on Rust is how often this mutation xor sharing pattern is “just right”, once you learn how to work with it2. The other surprise has been seeing the benefits over time: programs written in this style are fundamentally “less surprising” which, in turn, means they are more maintainable over time.

In Rust today though there are a number of patterns that are rejected by the borrow checker despite fitting the mutation xor sharing pattern. Chipping away at this gap, helping to make the borrow checker’s rules a more perfect reflection of mutation xor sharing, is what I mean by the borrow checker within.

I saw the angel in the marble and carved until I set him free. — Michelangelo

OK, enough inspirational rhetoric, let’s get to the code.

Ahem, right. Let’s do that.

Step 1: Conditionally return references easily with “Polonius”

Rust 2018 introduced “non-lexical lifetimes” — this rather cryptic name refers to an extension of the borrow checker so that it understood the control flow within functions much more deeply. This change made using Rust a much more “fluid” experience, since the borrow checker was able to accept a lot more code.

But NLL does not handle one important case3: conditionally returning references. Here is the canonical example, taken from Remy’s Polonius update blog post:

fn get_default<'r, K: Hash + Eq + Copy, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    match map.get_mut(&key) {
        Some(value) => value,
        None => {
            map.insert(key, V::default());
            //  ------ 💥 Gets an error today,
            //            but not with polonius
            map.get_mut(&key).unwrap()
        }
    }
}

Remy’s post gives more details about why this occurs and how we plan to fix it. It’s mostly accurate except that the timeline has stretched on more than I’d like (of course). But we are making steady progress these days.
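In the meantime, the usual ways to write this function so that it compiles today are a double lookup or the entry API. The sketch below shows both; the name get_default_entry is made up here, and neither version is taken from Remy's post.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Double-lookup workaround: no borrow from `get_mut` is held across `insert`.
fn get_default<K: Hash + Eq + Copy, V: Default>(
    map: &mut HashMap<K, V>,
    key: K,
) -> &mut V {
    if !map.contains_key(&key) {
        map.insert(key, V::default());
    }
    // The key is guaranteed to be present at this point.
    map.get_mut(&key).unwrap()
}

/// Entry-API workaround: a single lookup, and no `Copy` bound on the key.
fn get_default_entry<K: Hash + Eq, V: Default>(
    map: &mut HashMap<K, V>,
    key: K,
) -> &mut V {
    map.entry(key).or_default()
}
```

The double lookup costs an extra hash per call, and the entry API only helps for maps that offer one, which is why accepting the original formulation directly is still worthwhile.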

Step 2: A syntax for lifetimes based on places

The next step is to add an explicit syntax for lifetimes based on “place expressions” (e.g., x or x.y). I wrote about this in my post Borrow checking without lifetimes. This is basically taking the formulation that underlies Polonius and adding a syntax.

The idea would be that, in addition to the abstract lifetime parameters we have today, you could reference program variables and even fields as the “lifetime” of a reference. So you could write 'x to indicate a value that is “borrowed from the variable x”. You could also write 'x.y to indicate that it was borrowed from the field y of x, and even '(x.y, z) to mean borrowed from either x.y or z. For example:

struct WidgetFactory {
    manufacturer: String,
    model: String,
}

impl WidgetFactory {
    fn new_widget(&self, name: String) -> Widget {
        let name_suffix: &'name str = &name[3..];
                       // ----- borrowed from "name"
        let model_prefix: &'self.model str = &self.model[..2];
                         // ----------- borrowed from "self.model"
    }
}

This would make many of the lifetime parameters we write today unnecessary. For example, the classic Polonius example where the function takes a parameter map: &mut HashMap<K, V> and returns a reference into the map can be written as follows:

fn get_default<K: Hash + Eq + Copy, V: Default>(
    map: &mut HashMap<K, V>,
    key: K,
) -> &'map mut V {
    //---- "borrowed from the parameter map"
    ...
}

This syntax is more convenient — but I think its bigger impact will be to make Rust more teachable and learnable. Right now, lifetimes are in a tricky place, because

they represent a concept (spans of code) that isn’t normal for users to think explicitly about and
they don’t have any kind of syntax.

Syntax is useful when learning because it allows you to make everything explicit, which is a critical intermediate step to really internalizing a concept — what boats memorably called the dialectical ratchet. Anecdotally I’ve been using a “place-based” syntax when teaching people Rust and I’ve found it is much quicker for them to grasp it.

Step 3: View types and interprocedural borrows

The next piece of the plan is view types, which are a way to have functions declare which fields they access. Consider a struct like WidgetFactory…

struct WidgetFactory {
    counter: usize,
    widgets: Vec<Widget>,
}

…which has a helper function increment_counter…

impl WidgetFactory {
    fn increment_counter(&mut self) {
        self.counter += 1;
    }
}

Today, if we want to iterate over the widgets and occasionally increment the counter with increment_counter, we will encounter an error:

impl WidgetFactory {
    fn increment_counter(&mut self) {...}
    
    pub fn count_widgets(&mut self) {
        for widget in &self.widgets {
            if widget.should_be_counted() {
                self.increment_counter();
                // ^ 💥 Can't borrow self as mutable
                //      while iterating over `self.widgets`
            }
        }    
    }
}

The problem is that the borrow checker operates one function at a time. It doesn’t know precisely which fields increment_counter is going to mutate. So it conservatively assumes that self.widgets may be changed, and that’s not allowed. There are a number of workarounds today, such as writing a “free function” that doesn’t take &mut self but rather takes references to the individual fields (e.g., counter: &mut usize) or even collecting those references into a “view struct” (e.g., struct WidgetFactoryView<'a> { widgets: &'a [Widget], counter: &'a mut usize }) but these are non-obvious, annoying, and non-local (they require changing significant parts of your code).
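Here is a sketch of both workarounds in today's Rust. The Widget type, its counted field, and the view method are filled in as assumptions so the example is self-contained.

```rust
// Assumed stand-in for the post's `Widget` type.
struct Widget {
    counted: bool,
}

impl Widget {
    fn should_be_counted(&self) -> bool {
        self.counted
    }
}

struct WidgetFactory {
    counter: usize,
    widgets: Vec<Widget>,
}

// Workaround 1: a free function that borrows only the field it needs.
fn increment_counter(counter: &mut usize) {
    *counter += 1;
}

// Workaround 2: a "view struct" bundling references to individual fields.
struct WidgetFactoryView<'a> {
    widgets: &'a [Widget],
    counter: &'a mut usize,
}

impl WidgetFactory {
    pub fn count_widgets(&mut self) {
        for widget in &self.widgets {
            if widget.should_be_counted() {
                // Borrows `self.counter` only, which is disjoint from `self.widgets`.
                increment_counter(&mut self.counter);
            }
        }
    }

    fn view(&mut self) -> WidgetFactoryView<'_> {
        WidgetFactoryView {
            widgets: &self.widgets,
            counter: &mut self.counter,
        }
    }
}

impl WidgetFactoryView<'_> {
    fn count_widgets(&mut self) {
        for widget in self.widgets {
            if widget.should_be_counted() {
                *self.counter += 1;
            }
        }
    }
}
```

Both versions pass the borrow checker because the field-level borrows are visible at the call site, but each one forces the helper's signature and its callers to change.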

View types extend struct types so that instead of just having a type like WidgetFactory, you can have a “view” on that type that includes only a subset of the fields, like {counter} WidgetFactory. We can use this to modify increment_counter so that it declares that it will only access the field counter:

impl WidgetFactory {
    fn increment_counter(&mut {counter} self) {
        //               -------------------
        // Equivalent to `self: &mut {counter} WidgetFactory`
        self.counter += 1;
    }
}

This allows the compiler to compile count_widgets just fine, since it can see that iterating over self.widgets while modifying self.counter is not a problem.4

View types also address phased initialization

There is another place where the borrow checker’s rules fall short: phased initialization. Rust today follows the functional programming language style of requiring values for all the fields of a struct when it is created. Mostly this is fine, but sometimes you have structs where you want to initialize some of the fields and then invoke helper functions, much like increment_counter, to create the remainder. In this scenario you are stuck, because those helper functions cannot take a reference to the struct since you haven’t created the struct yet. The workarounds (free functions, intermediate struct types) are very similar.
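As a small illustration of the free-function workaround for phased initialization, consider the sketch below. The Report struct and longest_line helper are invented for this example.

```rust
// Illustrative struct: `longest` is derived from `lines` during construction.
struct Report {
    lines: Vec<String>,
    longest: usize,
}

// This has to be a free function over the already-initialized data,
// because no `Report` exists yet for a `&self` helper to borrow.
fn longest_line(lines: &[String]) -> usize {
    lines.iter().map(|line| line.len()).max().unwrap_or(0)
}

impl Report {
    fn new(lines: Vec<String>) -> Self {
        // Phase 1: `lines` is ready. Phase 2: compute the remaining field.
        let longest = longest_line(&lines);
        Report { lines, longest }
    }
}
```

With view types, longest_line could presumably be a method taking a view of Report that contains only lines, so the helper would live next to the rest of the struct's logic.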

Start with private functions, consider scaling to public functions

View types as described here have limitations. Because the types involve the names of fields, they are not really suitable for public interfaces. They could also be annoying to use in practice because one will have sets of fields that go together that have to be manually copied and pasted. All of this is true but I think something that can be addressed later (e.g., with named groups of fields).

What I’ve found is that the majority of times that I want to use view types, it is in private functions. Private methods often do little bits of logic and make use of the struct’s internal structure. Public methods in contrast tend to do larger operations and to hide that internal structure from users. This isn’t a universal law – sometimes I have public functions that should be callable concurrently – but it happens less.

There is also an advantage to the current behavior for public functions in particular: it preserves forward compatibility. Taking &mut self (versus some subset of fields) means that the function can change the set of fields that it uses without affecting its clients. This is not a concern for private functions.

Step 4: Internal references

Rust today cannot support structs whose fields refer to data owned by another. This gap is partially closed through crates like rental (no longer maintained), though more often by modeling internal references with indices. We also have Pin, which covers the related (but even harder) problem of immobile data.

I’ve been chipping away at a solution to this problem for some time. I won’t be able to lay it out in full in this post, but I can sketch what I have in mind, and lay out more details in future posts (I have done some formalization of this, enough to convince myself it works).

As an example, imagine that we have some kind of Message struct consisting of a big string along with several references into that string. You could model that like so:

struct Message {
    text: String,
    headers: Vec<(&'self.text str, &'self.text str)>,
    body: &'self.text str,
}

This message would be constructed in the usual way:

let text: String = parse_text();
let (headers, body) = parse_message(&text);
let message = Message { text, headers, body };

where parse_message is some function like

fn parse_message(text: &str) -> (
    Vec<(&'text str, &'text str)>,
    &'text str
) {
    let mut headers = vec![];
    // ...
    (headers, body)
}

Note that Message doesn’t have any lifetime parameters – it doesn’t need any, because it doesn’t borrow from anything outside of itself. In fact, Message: 'static is true, which means that I could send this Message to another thread:

// A channel of `Message` values:
let (tx, rx) = std::sync::mpsc::channel();

// A thread to consume those values:
std::thread::spawn(move || {
    for message in rx {
        // `message` here has type `Message`
        process(message.body);
    }
});

// Produce them:
loop {
    let message: Message = next_message();
    tx.send(message);
}

How far along are each of these ideas?

Roughly speaking…

Polonius – ‘just’ engineering
Syntax – ‘just’ bikeshedding
View types – needs modeling, one or two open questions in my mind5
Internal references – modeled in some detail for a simplified variant of Rust, have to port to Rust and explain the assumptions I made along the way6

…in other words, I’ve done enough work to convince myself that these designs are practical, but plenty of work remains. :)

How do we prioritize this work?

Whenever I think about investing in borrow checker ergonomics and usability, I feel a bit guilty. Surely something so fun to think about must be a bad use of my time.

Conversations at RustNL shifted my perspective. When I asked people about pain points, I kept hearing the same few themes arise, especially from people trying to build applications or GUIs.

I now think I had fallen victim to the dreaded “curse of knowledge”, forgetting how frustrating it can be to run into a limitation of the borrow checker and not know how to resolve it.

Conclusion

This post proposes four changes attacking some very long-standing problems:

Conditionally returned references, solved by Polonius
No or awkward syntax for lifetimes, solved by an explicit lifetime syntax
Helper methods whose body must be inlined, solved by view types
Can’t “package up” a value and references into that value, solved by interior references

You may have noticed that these changes build on one another. Polonius remodels borrowing in terms of “place expressions” (variables, fields). This enables an explicit lifetime syntax, which in turn is a key building block for interior references. View types in turn let us expose helper methods that can operate on ‘partially borrowed’ (or even partially initialized!) values.

Why these changes won’t make Rust “more complex” (or, if they do, it’s worth it)

You might wonder about the impact of these changes on Rust’s complexity. Certainly they grow the set of things the type system can express. But in my mind they, like NLL before them, fall into that category of changes that will actually make using Rust feel simpler overall.

To see why, put yourself in the shoes of a user today who has written any one of the “obviously correct” programs we’ve seen in this post – for example, the WidgetFactory code we saw in view types. Compiling this code today gives an error:

error[E0502]: cannot borrow `*self` as mutable
              because it is also borrowed as immutable
  --> src/lib.rs:14:17
   |
12 | for widget in &self.widgets {
   |               -------------
   |               |
   |               immutable borrow occurs here
   |               immutable borrow later used here
13 |     if widget.should_be_counted() {
14 |         self.increment_counter();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^
   |         |
   |         mutable borrow occurs here

Despite all our efforts to render it well, this error is inherently confusing. It is not possible to explain why WidgetFactory doesn’t work from an “intuitive” point-of-view because conceptually it ought to work, it just runs up against a limit of our type system.

The only way to understand why WidgetFactory doesn’t compile is to dive deeper into the engineering details of how the Rust type system functions, and that is precisely the kind of thing people don’t want to learn. Moreover, once you’ve done that deep dive, what is your reward? At best you can devise an awkward workaround. Yay 🥳.7
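For concreteness, one common shape of that workaround is sketched below. The field names widgets and counter and the method names come from the error message and footnotes in this post; the Widget field counted and the method count_widgets are hypothetical, added only so the example is complete:

```rust
struct Widget {
    counted: bool, // hypothetical field, for illustration only
}

impl Widget {
    fn should_be_counted(&self) -> bool {
        self.counted
    }
}

struct WidgetFactory {
    counter: usize,
    widgets: Vec<Widget>,
}

impl WidgetFactory {
    /// Workaround: an associated function that borrows only the field it
    /// needs, instead of a method taking `&mut self`.
    fn increment_counter(counter: &mut usize) {
        *counter += 1;
    }

    /// Hypothetical caller: iterates `widgets` while updating `counter`.
    fn count_widgets(&mut self) {
        for widget in &self.widgets {
            if widget.should_be_counted() {
                // Borrows `self.counter` only, which is disjoint from
                // the shared borrow of `self.widgets`.
                Self::increment_counter(&mut self.counter);
            }
        }
    }
}
```

This compiles today because the borrow checker can see both field borrows inside one function body. The awkwardness is that the helper's signature now has to spell out its fields as separate parameters, and it grows every time the helper touches another field.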

Now imagine what happens with view types. You still get an error, but now that error can come with a suggestion:

help: consider declaring the fields
      accessed by `increment_counter` so that
      other functions can rely on that
 7 | fn increment_counter(&mut self) {
   |                      ---------
   |                      |
   |      help: annotate with accessed fields: `&mut {counter} self`

You now have two choices. First, you can apply the suggestion and move on – your code works! Next, at your leisure, you can dig in a bit deeper and understand what’s going on. You can learn about the semver hazards that motivate an explicit declaration here.

Yes, you’ve learned a new detail of the type system, but you did so on your schedule and, where extra annotations were required, they were well-motivated. Yay 🥳!8

Reifying the borrow checker into types

There is another theme running through here: moving the borrow checker analysis out from the compiler’s mind and into types that can be expressed. Right now, all types always represent fully initialized, unborrowed values. There is no way to express a type that captures the state of being in the midst of iterating over something or having moved one or two fields but not all of them. These changes address that gap.9

This conclusion is too long

I know, I’m like Peter Jackson trying to end “The Return of the King”, I just can’t do it! I keep coming up with more things to say. Well, I’ll stop now. Have a nice weekend y’all.

Well, every program written in Java does share data like crazy, but they do not all work fine. But you get what I mean. ↩︎
And I think learning how to work with mutation xor sharing is a big part of what it means to learn Rust. ↩︎
NLL as implemented, anyway. The original design was meant to cover conditionally returning references, but the proposed type system was not feasible to implement. Moreover, and I say this as the one who designed it, the formulation in the NLL RFC was not good. It was mind-bending and hard to comprehend. Polonius is much better. ↩︎
In fact, view types will also allow us to implement the “disjoint closure capture” rules from RFC 2229 in a more efficient way. Currently a closure using self.widgets and self.counter will store 2 references, kind of an implicit “view struct”. Although we found this doesn’t really affect much code in practice, it still bothers me. With view types they could store 1. ↩︎
To me, the biggest open question for view types is how to accommodate “strong updates” to types. I’d like to be able to do let mut wf: {} WidgetFactory = WidgetFactory {} to create a WidgetFactory value that is completely uninitialized and then permit writing (for example) wf.counter = 0. This should update the type of wf to {counter} WidgetFactory. Basically I want to link the information found in types with the borrow checker’s notion of what is initialized, but I haven’t worked that out in detail. ↩︎
As an example, to make this work I’m assuming some kind of “true deref” trait that indicates that Deref yields a reference that remains valid even as the value being deref’d moves from place to place. We need a trait much like this for other reasons too. ↩︎
That’s a sarcastic “Yay 🥳”, in case you couldn’t tell. ↩︎
This “Yay 🥳” is genuine. ↩︎
I remember years ago presenting Rust at some academic conference and a friendly professor telling me, “In my experience, you always want to get that state into the type system”. I think that professor was right, though I don’t regret not prioritizing it (always a million things to do, better to ask what is the right next step now than to worry about what step might’ve been better in the past). Anyway, I wish I could remember who that was! ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/06/02/the-borrow-checker-within/

Unwind considered harmful?

May 2, 2024 Updated May 2, 2024

I’ve been thinking a wild thought lately: we should deprecate panic=unwind. Most production users I know either already run with panic=abort or use unwinding in a very limited fashion, basically just to run cleanup, not to truly recover. Removing unwinding from most cases, meanwhile, has a number of benefits, allowing us to extend the type system in interesting and potentially very impactful ways. It also removes a common source of subtle bugs. Note that I am not saying we should remove unwinding entirely: that’s not an option, both because of stability and because of Rust’s mission to “deeply integrate” with all kinds of languages and systems.

Unwinding means all code must be able to stop at every point

Unwinding puts a “non-local burden” on the language. The fundamental premise of unwinding is that it should be possible for all code to just stop execution at any point (or at least at any function call) and then be restarted. But this is not always possible. Sometimes code disturbs invariants which must be restored before execution can continue in a reasonable way.

The impact of unwinding was supposed to be contained

In Graydon’s initial sketches for Rust’s design, he was very suspicious of unwinding.1 Unwinding introduces implicit control flow that is difficult to reason about. Worse, this control flow doesn’t surface during “normal execution”, it only shows up when things go wrong — this can tend to pile up, making a bad situation worse.

The initial idea was that unwinding would be allowed, but it would always unwind the entire active thread. Moreover, since in very early Rust threads couldn’t share state at all (it was more like Erlang), that limited the damage that a thread could do. It was reasonable to assume that programs could recover.

But it escaped its bounds

Over time, both of the invariants that limited unwinding’s scope proved untenable. Most importantly, we added shared-mutability with types like Mutex. This was necessary to cover the full range of use cases Rust aims to cover, but it meant that it was now possible for threads to leave data in a disturbed state. We added “lock poisoning” to account for that, but it’s an ergonomic annoyance and an imperfect solution, and so libraries like parking_lot have simply removed it.
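To make lock poisoning concrete, here is a small sketch using only the standard library. The function name poisoned_after_panic and the vector contents are made up for the example, and it assumes the default panic=unwind setting:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panics while holding the lock, then reports what the next locker observes.
/// (Hypothetical helper, for illustration only.)
fn poisoned_after_panic() -> (bool, Vec<i32>) {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    let worker = {
        let data = Arc::clone(&data);
        thread::spawn(move || {
            let mut guard = data.lock().unwrap();
            guard.push(4); // pretend an invariant is disturbed mid-update
            panic!("simulated failure while holding the lock");
        })
    };
    // `join` returns `Err` because the worker thread unwound.
    let _ = worker.join();

    let was_poisoned = data.is_poisoned();
    // Every later caller has to decide: propagate the error, or explicitly
    // accept the possibly-inconsistent state via `into_inner`.
    let contents = match data.lock() {
        Ok(guard) => guard.to_vec(),
        Err(poisoned) => poisoned.into_inner().to_vec(),
    };
    (was_poisoned, contents)
}
```

The match at the end is the ergonomic cost in miniature: the lock itself works fine, but each call site has to handle a failure mode that exists only because another thread might have unwound while holding the guard.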

We also added catch_unwind, allowing recovery within a thread. This was meant to be used in libraries like rayon that were simulating many logical threads with one OS thread, but it of course opened the door to “catching” exceptions in other scenarios. We added the idea of UnwindSafe to try and discourage abuse, but (in a familiar theme) it’s an ergonomic annoyance and an imperfect solution, and so many folks would prefer to just remove it.
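As a sketch of what this looks like in practice, the helper below wraps std::panic::catch_unwind and turns a panic into a Result. The name run_isolated is hypothetical, and the example only behaves this way under the default panic=unwind setting:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `job`, converting a panic into an `Err` carrying the panic message.
/// (Hypothetical helper, for illustration only.)
fn run_isolated<F: FnOnce() -> i32>(job: F) -> Result<i32, String> {
    // `AssertUnwindSafe` is the caller promising that no broken invariants
    // are observable after a panic; the compiler cannot check this promise.
    panic::catch_unwind(AssertUnwindSafe(job)).map_err(|payload| {
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

A closure that captures a &mut reference is not UnwindSafe, so without the AssertUnwindSafe wrapper many realistic closures would be rejected. In practice the wrapper tends to get added by reflex, which is part of why UnwindSafe ends up as an annoyance without giving a real guarantee.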

Unwinding increases binary size and reduces optimization potential

Unwinding is supposed to be a “zero-cost abstraction”, but it’s not really. To start, it requires inserting “landing pads” — basically, the code that will execute when unwinding occurs — which can take up quite a large amount of space in your binary. Folks like Fuchsia have measured binary size improvements of up to 10% by removing unwinding. Second, the need to account for unwinding limits optimizations, because the compiler has to account for more control-flow paths. I don’t have a number for how high of an impact this is, but it’s clearly not zero.

Unwinding puts limits on the borrow checker

Accounting for unwinding also requires the borrow checker to be more conservative. Consider for example the function std::mem::swap. It’d be nice if one could write this in safe code:

fn swap<T>(
    a: &mut T,
    b: &mut T,
) {
    let tmp = *a;
    *a = *b;
    *b = tmp;
}

This code won’t compile today, because let tmp = *a requires moving out of *a, and a is an &mut reference. That would leave the reference in an “incomplete” state, so we don’t allow it. But is that constraint truly needed? After all, the reference is going to be restored a few lines below…?

The reason the borrow checker does not accept code like the above is due to unwinding. In general, if you move out of an &mut, you leave a hole behind that MUST be filled before the function returns. In the function above, it is in fact guaranteed that the hole will be filled before swap returns. But in general there is a very narrow range of code that can safely execute, since any function call (and many other operations besides) can initiate a panic!. And if unwinding occurred, then the code that restores the &mut value would never execute. For this reason, we deemed it not worth the complexity to support moving out of &mut references.
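For comparison, here is a sketch of how a swap can be written in safe code today. It works by never leaving a hole at all: std::mem::take and std::mem::replace put a valid value into the reference in the same step that they move the old one out. The function name swap_via_default and the Default bound are choices made for this example:

```rust
use std::mem;

/// Safe today: `*a` and `*b` always hold valid values, so a panic at any
/// point would still leave both references fully initialized.
/// (Hypothetical helper, for illustration only.)
fn swap_via_default<T: Default>(a: &mut T, b: &mut T) {
    // Move the value out of `*a`, leaving `T::default()` in its place.
    let tmp = mem::take(a);
    // Put the old `*a` into `*b`, and move the old `*b` into `*a`
    // (dropping the placeholder default value).
    *a = mem::replace(b, tmp);
}
```

The standard library's std::mem::swap avoids the Default bound and the placeholder value, but only because it is implemented with unsafe code internally.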

Unwinding prevents code from running to completion

If the only costs of unwinding were the inability to move out of &mut and inflated binary sizes, I would think that it’s probably worth it to keep it. But over time it’s become clear to me that this is just one special case of a more general challenge with unwinding, which is that functions simply cannot rely on running to completion. This creates challenges in a number of areas.

Unwinding makes unsafe code really hard to write

If you are writing unsafe code, you have to be very careful to account for possible unwinding. And it can occur in a lot of places! Some of them are obvious, such as when the user gives you a closure and you call it. Others are less obvious, such as when you call a trait method like x.clone() where x has some unknown type T: Clone. Others are downright obscure, such as when you execute vec[i] = new_value and vec is a Vec<T> for some unknown type T — that last one will run the destructor on vec[i], which can panic, and hence can unwind (at least until RFC #3288 is accepted). When developing Rayon, I found I could not feasibly track all the places that unwinding could occur, and thus gave up and just added code to abort if unwinding occurs when I don’t expect it.
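The "abort if unwinding occurs" approach can be sketched with a drop guard. This is a generic version of the pattern, not Rayon's actual code; AbortOnDrop and map_in_place are hypothetical names chosen for the example:

```rust
/// Aborts the process when dropped. It is disarmed with `mem::forget` on the
/// normal path, so it only fires if the stack unwinds past it.
struct AbortOnDrop;

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        std::process::abort();
    }
}

/// Replaces every element in place with `f(element)`.
/// (Hypothetical helper, for illustration only.)
fn map_in_place<T>(items: &mut [T], mut f: impl FnMut(T) -> T) {
    let guard = AbortOnDrop;
    for slot in items.iter_mut() {
        let p: *mut T = slot;
        // SAFETY: `p` comes from a valid `&mut T`. Between `read` and `write`
        // the slot is logically moved-out; if `f` panics in that window, the
        // guard aborts the process, so nobody can observe or double-drop it.
        unsafe {
            let old = p.read();
            let new = f(old);
            p.write(new);
        }
    }
    // Normal completion: disarm the guard without running its destructor.
    std::mem::forget(guard);
}
```

The guard trades recoverability for simplicity: instead of reasoning about every point where f might unwind, the code declares that unwinding through this region is fatal.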

Unwinding makes Must Move types untenable

In a previous blog post I wrote about the idea of must move types. I am not sure if this idea is worth it on balance (although I think it might be, it addresses an awful lot of scenarios) but I think it will not be workable with unwinding. And the reason is the same as everything else: the point of a “must move” type is that it must be moved before the fn ends. This effectively means there is some kind of action you must take. But unwinding assumes you can stop the function at any point, so you can never guarantee that this action gets taken (at least, not in a practical sense, in principle you could set up destructors to take the action, but it would be unworkable I think).

Unwinding is of course useful

I’ve been dunking on unwinding, but it is of course useful (although I suspect less broadly than is commonly believed). The most obvious use case is recovering in an “event-driven” sort of process, like a webserver or perhaps a GUI. We’ve all been to websites that dump a stack trace on our screen. Unwinding is one way that you could implement this sort of recovery in Rust. It’s not, however, the only way. We could look into constructs that leverage process-based recovery, for example. And of course unwinding-based recovery is a bit risky, if there is shared state. Plus, in practice, a good many things that become exceptions in Java are Result-return values in Rust.

For me, the key thing here is that virtually every network service I know of ships either with panic=abort or without really leveraging unwinding to recover, just to take cleanup actions and then exit. This could be done with panic=abort and exit handlers.

One other place that uses unwinding is the salsa framework, which uses it to abort cancelled operations in IDEs. It’s useful there because all the code is side-effect free, so we really can unwind without any impact. But we could always find another solution to the problem.

Unwinding is in fact required…but only in narrow places

I don’t really think Rust should remove support for unwinding, of course. For one thing, there is backwards compatibility to consider. But for another, I think that Rust ought to have the goal that it ultimately supports any low-level thing you might want to do. There are C++ systems that use exceptions, and Rust ought to interoperate with them. But I don’t think that means the default across all of Rust should be unwinding: it’s more like “something you need in a narrow part of your codebase so you can convert to Result”.

Conclusion

I think the argument for deprecating unwinding boils down to this: unwinding purports to make cheap recovery tenable, but it’s not really reliable in the face of shared state. Meanwhile, it puts limits on what we can do in the language, ultimately decreasing reliability (because we can’t guarantee cleanup is done) and ease of use (borrow checker is stricter, APIs that would require cleanup can’t be written).

How could we deprecate it, though? It would basically become part of the ABI, much like C vs C-unwind. It’d be possible to opt-in on a finer-grained basis. In functions that are guaranteed not to have unwinding, the borrow checker could be more permissive, and must-move types could be supported.

I’m definitely tempted to sketch out what deprecating unwinding might look like in more detail. I’d be curious to hear from folks that rely on unwinding to better understand where it is useful, and whether we can find alternatives that meet the need in a more narrowly tailored way!

For a time, we were exploring an alternative approach to panics called signals that didn’t use unwinding at all – the idea was that, for each error condition, you would expose a hook point (a “signal”) that users could customize to control what to do in the case of error. This proved a bit too unfamiliar and kind of a pain in practice, and we wound up backing away from it. Today’s panic hook is sort of a simpler version of that (it doesn’t support in-place recovery, but it does enable in-place cleanup). ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/05/02/unwind-considered-harmful/

Sized, DynSized, and Unsized

Apr 23, 2024 Updated Apr 23, 2024

Extern types have been blocked for an unreasonably long time on a fairly narrow, specialized question: Rust today divides all types into two categories — sized, whose size can be statically computed, and unsized, whose size can only be computed at runtime. But for external types what we really want is a third category, types whose size can never be known, even at runtime (in C, you can model this by defining structs with an unknown set of fields). The problem is that Rust’s ?Sized notation does not naturally scale to this third case. I think it’s time we fixed this. At some point I read a proposal — I no longer remember where — that seems like the obvious way forward and which I think is a win on several levels. So I thought I would take a bit of time to float the idea again, explain the tradeoffs I see with it, and explain why I think the idea is a good change.

TL;DR: write T: Unsized in place of T: ?Sized (and sometimes T: DynSized)

The basic idea is to deprecate the ?Sized notation and instead have a family of Sized supertraits. As today, the default is that every type parameter T gets a T: Sized bound unless the user explicitly chooses one of the other supertraits:

/// Types whose size is known at compilation time (statically).
/// Implemented by (e.g.) `u32`. References to `Sized` types
/// are "thin pointers" -- just a pointer.
trait Sized: DynSized { }

/// Types whose size can be computed at runtime (dynamically).
/// Implemented by (e.g.) `[u32]` or `dyn Trait`.
/// References to these types are "wide pointers",
/// with the extra metadata making it possible to compute the size
/// at runtime.
trait DynSized: Unsized { }

/// Types that may not have a knowable size at all (either statically or dynamically).
/// All types implement this, but extern types **only** implement this.
trait Unsized { }

Under this proposal, T: ?Sized notation could be converted to T: DynSized or T: Unsized. T: DynSized matches the current semantics precisely, but T: Unsized is probably what most uses actually want. This is because most users of T: ?Sized never compute the size of T but rather just refer to existing values of T by pointer.

Credit where credit is due?

For the record, this design is not my idea, but I’m not sure where I saw it. I would appreciate a link so I can properly give credit.

Why do we have a default T: Sized bound in the first place?

It’s natural to wonder why we have this T: Sized default in the first place. The short version is that Rust would be very annoying to use without it. If the compiler doesn’t know the size of a value at compilation time, it cannot (at least, cannot easily) generate code to do a number of common things, such as store a value of type T on the stack or have structs with fields of type T. This means that a very large fraction of generic type parameters would wind up with T: Sized.

So why the ?Sized notation?

The ?Sized notation was the result of a lot of discussion. It satisfied a number of criteria.

? signals that the bound operates in reverse

The ? is meant to signal that a bound like ?Sized actually works in reverse from a normal bound. When you have T: Clone, you are saying “type T must implement Clone”. So you are narrowing the set of types that T could be: before, it could have been both types that implement Clone and those that do not. After, it can only be types that implement Clone. T: ?Sized does the reverse: before, it can only be types that implement Sized (like u32), but after, it can also be types that do not (like [u32] or dyn Debug). Hence the ?, which can be read as “maybe” — i.e., T is “maybe” Sized.
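A small example of the widening effect, using today's syntax. The function describe is hypothetical and exists only for illustration; it also calls std::mem::size_of_val, which is why the proposal further down would translate a bound like this one to T: DynSized rather than T: Unsized:

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

/// With the default `T: Sized` bound, `T` could not be `str`, `[u32]`, or
/// `dyn Debug`. Adding `?Sized` widens the set of accepted types.
/// (Hypothetical function, for illustration only.)
fn describe<T: ?Sized + Debug>(value: &T) -> String {
    // The size is computed at runtime from the reference, which may be a
    // wide pointer carrying a length or a vtable.
    format!("{:?} occupies {} bytes", value, size_of_val(value))
}
```

The same function accepts &u32 (thin pointer), &[u32] and &str (pointer plus length), and &dyn Debug (pointer plus vtable). The price is that the body can only ever handle value through a reference.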

? can be extended to other default bounds

The ? notation also scales to other default traits. Although we’ve been reluctant to exercise this ability, we wanted to leave room to add a new default bound. This power will be needed if we ever adopt “must move” types 1 or add a bound like ?Leak to signal a value that cannot be leaked.

But ? doesn’t scale well to “differences in degree”

When we debated the ? notation, we thought a lot about extensibility to other orthogonal defaults (like ?Leak), but we didn’t consider extending a single dimension (like Sized) to multiple levels. There is no theoretical challenge. In principle we could say…

T means T: Sized + DynSized
T: ?Sized drops the Sized default, leaving T: DynSized
T: ?DynSized drops both, leaving any type T

…but I personally find that very confusing. To me, saying something “might be statically sized” does not signify that it is dynamically sized.

And ? looks “more magical” than it needs to

Despite knowing that T: ?Sized operates in reverse, I find that in practice it still feels very much like other bounds. Just like T: Debug gives the function the extra capability of generating debug info, T: ?Sized feels to me like it gives the function an extra capability: the ability to be used on unsized types. This logic is specious, these are different kinds of capabilities, but, as I said, it’s how I find myself thinking about it.

Moreover, even though I know that T: ?Sized “most properly” means “a type that may or may not be Sized”, I find I wind up thinking about it as “a type that is unsized”, just as I think about T: Debug as a “type that is Debug”. Why is that? Well, because ?Sized types may be unsized, I have to treat them as if they are unsized – i.e., refer to them only by pointer. So the fact that they might also be sized isn’t very relevant.

How would we use these new traits?

So if we adopted the “family of sized traits” proposal, how would we use it? Well, for starters, the size_of methods would no longer be defined as T and T: ?Sized…

fn size_of<T>() -> usize {}
fn size_of_val<T: ?Sized>(t: &T) -> usize {}

… but instead as T and T: DynSized …

fn size_of<T>() -> usize {}
fn size_of_val<T: DynSized>(t: &T) -> usize {}

That said, most uses of ?Sized today do not need to compute the size of the value, and would be better translated to Unsized…

impl<T: Unsized> Debug for &T {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) { .. }
}

Option: Defaults could also be disabled by supertraits?

As an interesting extension to today’s system, we could say that every type parameter T gets an implicit Sized bound unless either…

There is an explicit weaker alternative (like T: DynSized or T: Unsized);
Or some other bound T: Trait has an explicit supertrait DynSized or Unsized.

This would clarify that trait aliases can be used to disable the Sized default. For example, today, one might create a Value trait that is equivalent to Debug + Hash + Ord, roughly like this:

trait Value: Debug + Hash + Ord {
    // Note that `Self` is the *only* type parameter that does NOT get `Sized` by default
}

impl<T: ?Sized + Debug + Hash + Ord> Value for T {}

But what if, in your particular data structure, all values are boxed and hence can be unsized? Today, you have to repeat ?Sized everywhere:

struct Tree<V: ?Sized + Value> {
    value: Box<V>,
    children: Vec<Tree<V>>,
}

impl<V: ?Sized + Value> Tree<V> { … }

With this proposal, the explicit Unsized bound could be signaled on the trait:

trait Value: Debug + Hash + Ord + Unsized {
    // Note that `Self` is the *only* type parameter that does NOT get `Sized` by default
}

impl<T: Unsized + Debug + Hash + Ord> Value for T {}

which would mean that

struct Tree<V: Value> { … }

would imply V: Unsized.

Alternatives

Different names

The name of the Unsized trait in particular is a bit odd. It means “you can treat this type as unsized”, which is true of all types, but it sounds like the type is definitely unsized. I’m open to alternative names, but I haven’t come up with one I like yet. Here are some alternatives and the problems with them I see:

Unsizeable — doesn’t meet our typical name conventions, has overlap with the Unsize trait
NoSize, UnknownSize — same general problem as Unsized
ByPointer — in some ways, I kind of like this, because it says “you can work with this type by pointer”, which is clearly true of all types. But it doesn’t align well with the existing Sized trait — what would we call that, ByValue? And it seems too tied to today’s limitations: there are, after all, ways that we can make DynSized types work by value, at least in some places.
MaybeSized — just seems awkward, and should it be MaybeDynSized?

All told, I think Unsized is the best name. It’s a bit wrong, but I think you can understand it, and to me it fits the intuition I have, which is that I mark type parameters as Unsized and then I tend to just think of them as being unsized (since I have to).

Some sigil

Under this proposal, the DynSized and Unsized traits are “magic” in that explicitly declaring them as a bound has the impact of disabling a default T: Sized bound. We could signify that in their names by having their name be prefixed with some sort of sigil. I’m not really sure what that sigil would be — T: %Unsized? T: ?Unsized? It all seems unnecessary.

Drop the implicit bound altogether

The purist in me is tempted to question whether we need the default bound. Maybe in Rust 2027 we should try to drop it altogether. Then people could write

fn size_of<T: Sized>() -> usize {}
fn size_of_val<T: DynSized>(t: &T) -> usize {}

and

impl<T> Debug for &T {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) { .. }
}

Of course, it would also mean a lot of Sized bounds cropping up in surprising places. Beyond random functions, consider that every associated type today has a default Sized bound, so you would need

trait Iterator {
    type Item: Sized;
}

Overall, I doubt this idea is worth it. Not surprising: it was deemed too annoying before, and now it has the added problem of being hugely disruptive.

Conclusion

I’ve covered a design to move away from ?Sized bounds and towards specialized traits. There are various “pros and cons” to this proposal but one aspect in particular feels common to this question and many others: when do you make two “similar but different” concepts feel very different — e.g., via special syntax like T: ?Sized — and when do you make them feel very similar — e.g., via the idea of “special traits” where a bound like T: Unsized has extra meaning (disabling defaults).

There is a definite trade-off here. Distinct syntax helps avoid potential confusion, but it forces people to recognize that something special is going on even when that may not be relevant or important to them. This can deter folks early on, when they are most “deter-able”. I think it can also contribute to a general sense of “big-ness” that makes it feel like understanding the entire language is harder.

Over time, I’ve started to believe that it’s generally better to make things feel similar, letting people push off the time at which they have to learn a new concept. In this case, this lessens my fears around the idea that Unsized and DynSized traits would be confusing because they behave differently than other traits. In this particular case, I also feel that ?Sized doesn’t “scale well” to default bounds where you want to pick from one of many options, so it’s kind of the worst of both worlds – distinct syntax that shouts at you but which also fails to add clarity.

Ultimately, though, I’m not wedded to this idea, but I am interested in kicking off a discussion of how we can unblock extern types. I think by now we’ve no doubt covered the space pretty well and we should pick a direction and go for it (or else just give up on extern types).

I still think “must move” types are a good idea — but that’s a topic for another post. ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/04/23/dynsized-unsized/

Ownership in Rust

Apr 5, 2024 Updated Apr 5, 2024

Ownership is an important concept in Rust — but I’m not talking about the type system. I’m talking about in our open source project. One of the big failure modes I’ve seen in the Rust community, especially lately, is the feeling that it’s unclear who is entitled to make decisions. Over the last six months or so, I’ve been developing a project goals proposal, which is an attempt to reinvigorate Rust’s roadmap process — and a key part of this is the idea of giving each goal an owner. I wanted to write a post just exploring this idea of being an owner: what it means and what it doesn’t.

Every goal needs an owner

Under my proposal, the project will identify its top priority goals, and every goal will have a designated owner. This is ideally a single, concrete person, though it can be a small group. Owners are the ones who, well, own the design being proposed. Just like in Rust, when they own something, they have the power to change it.1

Just because owners own the design does not mean they work alone. Like any good Rustacean, they should treasure dissent, making sure that when a concern is raised, the owner fully understands it and does what they can to mitigate or address it. But there always comes a point where the tradeoffs have been laid on the table, the space has been mapped, and somebody just has to make a call about what to do. This is where the owner comes in. Under project goals, the owner is the one we’ve chosen to do that job, and they should feel free to make decisions in order to keep things moving.

Teams make the final decision

Owners own the proposal, but they don’t decide whether the proposal gets accepted. That is the job of the team. So, if e.g. the goal in question requires making a change to the language, the language design team is the one that ultimately decides whether to accept the proposal.

Teams can ultimately overrule an owner: they can ask the owner to come back with a modified proposal that weighs the tradeoffs differently. This is right and appropriate, because teams are the ones we recognize as having the best broad understanding of the domain they maintain.2 But teams should use their power judiciously, because the owner is typically the one who understands the tradeoffs for this particular goal most deeply.

Ownership is empowerment

Rust’s primary goal is empowerment — and that is as true for the open-source org as it is for the language itself. Our goal should be to empower people to improve Rust. That does not mean giving them unfettered ability to make changes — that would result in chaos, not an improved version of Rust — but when their vision is aligned with Rust’s values, we should ensure they have the capability and support they need to realize it.

Ownership requires trust

There is an interesting tension around ownership. Giving someone ownership of a goal is an act of faith: it means that we consider them to be an individual of high judgment who understands Rust and its values and will act accordingly. This implies to me that we are unlikely to take a goal if the owner is not known to the project. They don’t necessarily have to have worked on Rust, but they have to have enough of a reputation that we can evaluate whether they’re going to do a good job.

The design of project goal proposals includes steps designed to increase trust. Each goal includes a set of design axioms identifying the key tradeoffs that are expected and how they will be weighed against one another. The goal also identifies milestones, which show that the author has thought about how to break up and approach the work incrementally.

It’s also worth highlighting that while the project has to trust the owner, the reverse is also true: the project hasn’t always done a good job of making good on its commitments. Sometimes we’ve asked for a proposal on a given feature and then not responded when it arrives.3 Or we set up unbounded queues that wind up getting overfull, resulting in long delays.

The project goal system has steps to build that kind of trust too: the owner identifies exactly the kind of support they expect to require from the team, and the team commits to provide it. Moreover, the general expectation is that any project goal represents an important priority, and so teams should prioritize nominated issues and the like that are related.

Trust requires accountability

Trust is something that has to be maintained over time. The primary mechanism for that in the project goal system is regular reporting. The idea is that, once we’ve identified a goal, we will create a tracking issue. Bots will prompt owners to give regular status updates on the issue. Then, periodically, we will post a blog post that aggregates these status updates. This gives us a chance to identify goals that haven’t been moving (or at least where no status update has been provided) and take a look to see why.

In my view, it’s expected and normal that we will not make all our goals. Things happen. Sometimes owners get busy with other things. Other times, priorities change and what was once a goal no longer seems relevant. That’s fine, but we do want to be explicit about noticing it has happened. The problem is when we let things live in the dark, so that if you want to really know what’s going on, you have to conduct an exhaustive archaeological expedition through github comments, zulip threads, emails, and sometimes random chats and minutes.

Conclusion

Rust has strong values of being an open, participatory language. This is a good thing and a key part of how Rust has gotten as good as it is. Rust’s design does not belong to any one person. A key part of how we enforce that is by making decisions by consensus.

But people sometimes get confused and think consensus means that everyone has to agree. This is wrong on two levels:

The team must be in consensus, not the RFC thread: in Rust’s system, it’s the teams that ultimately make the decision. There have been plenty of RFCs that the team decided to accept despite strong opposition from the RFC thread (e.g., the ? operator comes to mind). This is right and good. The team has the most context, but the team also gets input from many other sources beyond the people that come to participate in the RFC thread.
Consensus doesn’t mean unanimity: Being in consensus means that a majority agrees with the proposal and nobody thinks that it is definitely wrong. Plenty of proposals are decided where team members have significant, even grave, doubts. But ultimately tradeoffs must be made, and the team members trust one another’s judgment, so sometimes proposals go forward that aren’t made the way you would do it.

The reality is that every good thing that ever got done in Rust had an owner – somebody driving the work to completion. But we’ve never named those owners explicitly or given them a formal place in our structure. I think it’s time we fixed that!

Hat tip to Jack Huey for this turn of phrase. Clever guy. ↩︎
There is a common misunderstanding that being on a Rust team for a project X means you are the one authoring code for X. That’s not the role of a team member. Team members hold the overall design of X in their heads. They review changes and mentor contributors who are looking to make a change. Of course, team members do sometimes write code, too, but in that case they are playing the role of a (particularly knowledgeable) contributor. ↩︎
I still feel bad about delegation. ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/04/05/ownership-in-rust/

Borrow checking without lifetimes

Mar 4, 2024 Updated Mar 4, 2024


This blog post explores an alternative formulation of Rust’s type system that eschews lifetimes in favor of places. The TL;DR is that instead of having 'a represent a lifetime in the code, it can represent a set of loans, like shared(a.b.c) or mut(x). If this sounds familiar, it should, it’s the basis for polonius, but reformulated as a type system instead of a static analysis. This blog post is just going to give the high-level ideas. In follow-up posts I’ll dig into how we can use this to support interior references and other advanced borrowing patterns. In terms of implementation, I’ve mocked this up a bit, but I intend to start extending a-mir-formality to include this analysis.

Why would you want to replace lifetimes?

Lifetimes are the best and worst part of Rust. The best in that they let you express very cool patterns, like returning a pointer into some data in the middle of your data structure. But they’ve got some serious issues. For one, the idea of a lifetime is rather abstract, and hard for people to grasp (“what does 'a actually represent?”). But also Rust is not able to express some important patterns, most notably interior references, where one field of a struct refers to data owned by another field.

So what is a lifetime exactly?

Here is the definition of a lifetime from the RFC on non-lexical lifetimes:

Whenever you create a borrow, the compiler assigns the resulting reference a lifetime. This lifetime corresponds to the span of the code where the reference may be used. The compiler will infer this lifetime to be the smallest lifetime that it can have that still encompasses all the uses of the reference.

Read the RFC for more details.

Replacing a lifetime with an origin

Under this formulation, 'a no longer represents a lifetime but rather an origin – i.e., it explains where the reference may have come from. We define an origin as a set of loans. Each loan captures some place expression (e.g. a or a.b.c) that has been borrowed, along with the mode in which it was borrowed (shared or mut).

Origin = { Loan }

Loan = shared(Place)
     | mut(Place)

Place = variable(.field)*  // e.g., a.b.c
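To make the grammar a bit more concrete, here is a minimal sketch of how it might be encoded as Rust data types. This is only my illustration, not the a-mir-formality encoding: the names `Place::parse`, `variable`, and `fields` are hypothetical, and modeling an origin as a `BTreeSet` is an assumption that simply mirrors "an origin is a set of loans".

```rust
use std::collections::BTreeSet;

/// A place expression: a variable followed by zero or more fields, e.g. `a.b.c`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Place {
    pub variable: String,
    pub fields: Vec<String>,
}

impl Place {
    /// Hypothetical helper: build a place from a dotted path like "a.b.c".
    pub fn parse(path: &str) -> Place {
        let mut parts = path.split('.').map(str::to_string);
        let variable = parts.next().unwrap_or_default();
        Place { variable, fields: parts.collect() }
    }
}

/// A loan records which place was borrowed and in which mode.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Loan {
    Shared(Place),
    Mut(Place),
}

/// An origin is a set of loans, so inserting the same loan twice has no effect.
pub type Origin = BTreeSet<Loan>;
```

With this encoding, `shared(a.b.c)` would be written `Loan::Shared(Place::parse("a.b.c"))`.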

Defining types

Using origins, we can define Rust types roughly like this (obviously I’m ignoring a bunch of complexity here…):

Type = TypeName < Generic* >
     | & Origin Type
     | & Origin mut Type
     
TypeName = u32 (for now I'll ignore the rest of the scalars)
         | ()  (unit type, don't worry about tuples)
         | StructName
         | EnumName
         | UnionName

Generic = Type | Origin

Here is the first interesting thing to note: there is no 'a notation here! This is because I’ve not introduced generics yet. Unlike Rust proper, this formulation of the type system has a concrete syntax (Origin) for what 'a represents.

Explicit types for a simple program

Having a fully explicit type system also means we can easily write out example programs where all types are fully specified. This used to be rather challenging because we had no notation for lifetimes. Let’s look at a simple example, a program that ought to get an error:

let mut counter: u32 = 22_u32;
let p: & /*{shared(counter)}*/ u32 = &counter;
//       ---------------------
//       no syntax for this today!
counter += 1; // Error: cannot mutate `counter` while `p` is live
println!("{p}");

Apart from the type of p, this is valid Rust. Of course, it won’t compile, because we can’t modify counter while there is a live shared reference p (playground). As we continue, you will see how the new type system formulation arrives at the same conclusion.
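For contrast, here is a small variation that today’s compiler accepts, because the last use of `p` comes before the mutation. It is written in plain Rust without the origin annotation, and the function name `bump_after_last_use` is just something I made up for the illustration.

```rust
fn bump_after_last_use() -> (u32, u32) {
    let mut counter: u32 = 22_u32;
    let p: &u32 = &counter;
    // Last use of `p`: after this line the reference is no longer live.
    let seen = *p;
    // Accepted: no live reference still carries a loan of `counter`.
    counter += 1;
    (seen, counter)
}
```

The only difference from the failing program is where the last use of `p` sits relative to the assignment, which is exactly the liveness information the rest of this post relies on.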

Basic typing judgments

Typing judgments are the standard way to describe a type system. We’re going to phase in the typing judgments for our system iteratively. We’ll start with a simple, fairly standard formulation that doesn’t include borrow checking, and then show how we introduce borrow checking. For this first version, the typing judgment we are defining has the form

Env |- Expr : Type

This says, “in the environment Env, the expression Expr is legal and has the type Type”. The environment Env here defines the local variables in scope. The Rust expressions we are looking at for our sample program are pretty simple:

Expr = integer literal (e.g., 22_u32)
     | & Place
     | Expr + Expr
     | Place (read the value of a place)
     | Place = Expr (overwrite the value of a place)
     | ...

Since we only support one scalar type (u32), the typing judgment for Expr + Expr is as simple as:

Env |- Expr1 : u32
Env |- Expr2 : u32
----------------------------------------- addition
Env |- Expr1 + Expr2 : u32

The rule for Place = Expr assignments is based on subtyping:

Env |- Expr : Type1
Env |- Place : Type2
Env |- Type1 <: Type2
----------------------------------------- assignment
Env |- Place = Expr : ()

The rule for &Place is somewhat more interesting:

Env |- Place : Type
----------------------------------------- shared references
Env |- & Place : & {shared(Place)} Type

The rule just says that we figure out the type of the place Place being borrowed (here, the place is counter and its type will be u32) and then we have a resulting reference to that type. The origin of that reference will be {shared(Place)}, indicating that the reference came from Place:

&{shared(Place)} Type
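Here is a rough sketch of that rule as a Rust function. It makes several simplifying assumptions: places are plain strings, the environment is a map from place to type, and the only types are `u32` and references. The names `Env`, `Type::Ref`, and `type_of_shared_borrow` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Loan {
    Shared(String),
    Mut(String),
}

#[derive(Clone, Debug, PartialEq)]
enum Type {
    U32,
    /// `& Origin Type`: a shared reference carrying its origin.
    Ref { origin: BTreeSet<Loan>, referent: Box<Type> },
}

/// Assumption: the environment maps each place (as a string) to its type.
type Env = BTreeMap<String, Type>;

/// `Env |- & Place : & {shared(Place)} Type`, given `Env |- Place : Type`.
/// Returns `None` if the place is not in the environment.
fn type_of_shared_borrow(env: &Env, place: &str) -> Option<Type> {
    let referent = env.get(place)?.clone();
    let mut origin = BTreeSet::new();
    origin.insert(Loan::Shared(place.to_string()));
    Some(Type::Ref { origin, referent: Box::new(referent) })
}
```

For the sample program, borrowing `counter` in an environment where `counter: u32` yields a reference type whose origin is `{shared(counter)}`.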

Computing liveness

To introduce borrow checking, we need to phase in the idea of liveness.1 If you’re not familiar with the concept, the NLL RFC has a nice introduction:

The term “liveness” derives from compiler analysis, but it’s fairly intuitive. We say that a variable is live if the current value that it holds may be used later.

Unlike with NLL, where we just computed live variables, we’re going to compute live places:

LivePlaces = { Place }

To compute the set of live places, we’ll introduce a helper function LiveBefore(Env, LivePlaces, Expr): LivePlaces. LiveBefore() returns the set of places that are live before Expr is evaluated, given the environment Env and the set of places live after the expression. I won’t define this function in detail, but it looks roughly like this:

// `&Place` reads `Place`, so add it to `LivePlaces`
LiveBefore(Env, LivePlaces, &Place) =
    LivePlaces ∪ {Place}

// `Place = Expr` overwrites `Place`, so remove it from `LivePlaces`
LiveBefore(Env, LivePlaces, Place = Expr) =
    LiveBefore(Env, (LivePlaces - {Place}), Expr)

// `Expr1` is evaluated first, then `Expr2`, so the set of places
// live after expr1 is the set that are live *before* expr2
LiveBefore(Env, LivePlaces, Expr1 + Expr2) =
    LiveBefore(Env, LiveBefore(Env, LivePlaces, Expr2), Expr1)
    
... etc ...
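The cases above can be sketched as a small recursive Rust function. This is a simplification under stated assumptions: the `Env` argument is dropped because none of these cases consult it, places are plain strings compared for exact equality (so overwriting `a` does not affect `a.b`), and the `Read` and `Literal` cases are my guess at part of the "etc". All names here are hypothetical.

```rust
use std::collections::BTreeSet;

type Place = String;
type LivePlaces = BTreeSet<Place>;

#[allow(dead_code)]
enum Expr {
    Literal(u32),
    Borrow(Place),              // `&Place`
    Read(Place),                // `Place`
    Add(Box<Expr>, Box<Expr>),  // `Expr1 + Expr2`
    Assign(Place, Box<Expr>),   // `Place = Expr`
}

/// Places live before `expr`, given the places live after it.
fn live_before(live_after: &LivePlaces, expr: &Expr) -> LivePlaces {
    match expr {
        // A literal touches no places.
        Expr::Literal(_) => live_after.clone(),
        // Borrowing or reading a place uses it, so it becomes live.
        Expr::Borrow(place) | Expr::Read(place) => {
            let mut live = live_after.clone();
            live.insert(place.clone());
            live
        }
        // The assignment overwrites `place`, so remove it, then account
        // for whatever the right-hand side uses.
        Expr::Assign(place, value) => {
            let mut live = live_after.clone();
            live.remove(place);
            live_before(&live, value)
        }
        // Liveness flows backwards: first through `rhs`, then through `lhs`.
        Expr::Add(lhs, rhs) => {
            let after_lhs = live_before(live_after, rhs);
            live_before(&after_lhs, lhs)
        }
    }
}
```

For example, with `{p}` live afterwards, the expression `counter = counter + 1` has `{counter, p}` live before it: the assignment removes `counter`, and the read on the right-hand side adds it back.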

Integrating liveness into our typing judgments

To detect borrow check errors, we need to adjust our typing judgment to include liveness. The result will be as follows:

(Env, LivePlaces) |- Expr : Type

This judgment says, “in the environment Env, and given that the function will access LivePlaces in the future, Expr is valid and has type Type”. Integrating liveness in this way gives us some idea of what accesses will happen in the future.

For compound expressions, like Expr1 + Expr2, we have to adjust the set of live places to reflect control flow:

LiveAfter1 = LiveBefore(Env, LiveAfter2, Expr2)
(Env, LiveAfter1) |- Expr1 : u32
(Env, LiveAfter2) |- Expr2 : u32
----------------------------------------- addition
(Env, LiveAfter2) |- Expr1 + Expr2 : u32

We start out with LiveAfter2, i.e., the places that are live after the entire expression. These are also the same as the places live after expression 2 is evaluated, since this expression doesn’t itself reference or overwrite any places. We then compute LiveAfter1 – i.e., the places live after Expr1 is evaluated – by looking at the places that are live before Expr2. This is a bit mind-bending and took me a bit of time to see. The tricky bit here is that liveness is computed backwards, but most of our typing rules (and intuition) tend to flow forwards. If it helps, think of the “fully desugared” version of +:

let tmp0 = <Expr1>
    // <-- the set LiveAfter1 is live here (ignoring tmp0, tmp1)
let tmp1 = <Expr2>
    // <-- the set LiveAfter2 is live here (ignoring tmp0, tmp1)
tmp0 + tmp1
    // <-- the set LiveAfter2 is live here

Borrow checking with liveness

Now that we know liveness information, we can use it to do borrow checking. We’ll introduce a “permits” judgment:

(Env, LiveAfter) permits Loan

that indicates that “taking the loan Loan would be allowed given the environment and the live places”. Here is the rule for assignments, modified to include liveness and the new “permits” judgment:

(Env, LiveAfter - {Place}) |- Expr : Type1
(Env, LiveAfter) |- Place : Type2
(Env, LiveAfter) |- Type1 <: Type2
(Env, LiveAfter) permits mut(Place)
----------------------------------------- assignment
(Env, LiveAfter) |- Place = Expr : ()

Before I dive into how we define “permits”, let’s go back to our example and get an intuition for what is going on here. We want to declare an error on this assignment:

let mut counter: u32 = 22_u32;
let p: &{shared(counter)} u32 = &counter;
counter += 1; // <-- Error
println!("{p}"); // <-- p is live

Note that, because of the println! on the next line, p will be in our LiveAfter set. Looking at the type of p, we see that it includes the loan shared(counter). The idea then is that mutating counter is illegal because there is a live loan shared(counter), which implies that counter must be immutable.

Restating that intuition:

A set Live of live places permits a loan Loan1 if, for every live place Place in Live, the loans in the type of Place are compatible with Loan1.

Written more formally:

∀ Place ∈ Live {
    (Env, Live) |- Place : Type
    ∀ Loan2 ∈ Loans(Type) { Compatible(Loan1, Loan2) }
}
-----------------------------------------
(Env, Live) permits Loan1

This definition makes use of two helper functions:

Loans(Type) – the set of loans that appear in the type
Compatible(Loan1, Loan2) – defines if two loans are compatible. Two shared loans are always compatible. A mutable loan is only compatible with another loan if the places are disjoint.
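Putting the rule and its two helpers together, here is a minimal sketch in Rust. It rests on a few assumptions of mine: places are represented as path segments, "disjoint" is taken to mean that neither path is a prefix of the other, and the environment maps each live place directly to the loans in its type (standing in for `Loans(Type)`). The names `path`, `disjoint`, `compatible`, and `permits` are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Loan {
    Shared(Vec<String>), // place as path segments, e.g. ["a", "b", "c"]
    Mut(Vec<String>),
}

impl Loan {
    fn place(&self) -> &[String] {
        match self {
            Loan::Shared(p) | Loan::Mut(p) => p.as_slice(),
        }
    }
}

/// Hypothetical helper: turn "a.b.c" into ["a", "b", "c"].
fn path(s: &str) -> Vec<String> {
    s.split('.').map(str::to_string).collect()
}

/// Assumption: two places are disjoint when neither is a prefix of the other.
fn disjoint(a: &[String], b: &[String]) -> bool {
    !(a.starts_with(b) || b.starts_with(a))
}

/// Two shared loans are always compatible; otherwise the places must be disjoint.
fn compatible(loan1: &Loan, loan2: &Loan) -> bool {
    match (loan1, loan2) {
        (Loan::Shared(_), Loan::Shared(_)) => true,
        _ => disjoint(loan1.place(), loan2.place()),
    }
}

/// `(Env, Live) permits Loan1`: every loan in the type of every live place
/// must be compatible with `loan1`. A live place with no entry has no loans.
fn permits(env: &BTreeMap<String, Vec<Loan>>, live: &[&str], loan1: &Loan) -> bool {
    live.iter().all(|place| {
        env.get(*place)
            .map_or(true, |loans| loans.iter().all(|loan2| compatible(loan1, loan2)))
    })
}
```

In the running example, the environment records that `p` carries `shared(counter)`. While `p` is live, `permits` rejects `mut(counter)`, which is the error we wanted. Once `p` is no longer in the live set, the same loan is permitted.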

Conclusion

The goal of this post was to give a high-level intuition. I wrote it from memory, so I’ve probably overlooked a thing or two. In follow-up posts though I want to go deeper into how the system I’ve been playing with works and what new things it can support. Some high-level examples:

How to define subtyping, and in particular the role of liveness in subtyping
Important borrow patterns that we use today and how they work in the new system
Interior references that point at data owned by other struct fields and how it can be supported

If this is not obvious to you, don’t worry, it wasn’t obvious to me either. It turns out that using liveness in the rules is the key to making them simple. I’ll try to write a follow-up about the alternatives I explored and why they don’t work later on. ↩︎

https://smallcultfollowing.com/babysteps/blog/2024/03/04/borrow-checking-without-lifetimes/

What I'd like to see for Async Rust in 2024 🎄

Jan 3, 2024 Updated Jan 3, 2024


Well, it’s that time of year, when thoughts turn to…well, Rust of course. I guess that’s every time of year. This year was a pretty big year for Rust, though I think a lot of what happened was more in the vein of “setting things up for success in 2024”. So let’s talk about 2024! I’m going to publish a series of blog posts about different aspects of Rust I’m excited about, and what I think we should be doing. To help make things concrete, I’m going to frame 2024 by using proposed project goals – basically a specific piece of work I think we can get done this year. In this first post, I’ll focus on async Rust.

What we did in 2023

On Dec 28, with the release of Rust 1.75.0, we stabilized async fn and impl trait in traits. This is a really big deal. Async fn in traits has been “considered hard” since 2019 and they’re at the foundation of basically everything that we need to do to make async better.

Async Rust to me showcases the best and worst of Rust. It delivers on that Rust promise of “high-level code, low-level performance”. Building on the highly tuned Tokio runtime, network services in Rust consistently have tighter tail latency and lower memory usage, which means you can service a lot more clients with a lot less resources. Alternatively, because Rust doesn’t hardcode the runtime, you can write async Rust code that targets embedded environments that don’t even have an underlying operating system, or anywhere in between.

And yet it continues to be true that, in the words of an Amazon engineer I talked to, “Async Rust is Rust on hard mode”. Truly closing this gap requires work in the language, standard library, and the ecosystem. We won’t get all the way there in 2024, but I think we can make some big strides.

Proposed goal: Solve the send bound problem in Q2

We made a lot of progress on async functions in traits last year, but we still can’t cover the use case of generic traits that can be used either with a work-stealing executor or without one. One very specific example of this is the Service trait from tower. To handle this use case, we need a solution to the send bound problem. We have a bunch of ideas for what this might be, and we’ve even got a prototype implementation for (a subset of) return type notation, so we are well positioned for success. I think we should aim to finish this by the end of Q2 (summer, basically). This in turn would unblock a 1.0 release of the tower crate, letting us have a stable trait for middleware.

Proposed goal: Stabilize an MVP for async closures in Q3

The holy grail for async is that you should be able to easily make any synchronous function into an asynchronous one. The 2019 MVP supported only top-level functions and inherent methods. We’ve now extended that to include trait methods. In 2024, we should take the next step and support async closures. This will allow people to define combinator methods like iterator map and so forth and avoid the convoluted workarounds currently required.
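As an illustration of what such a workaround can look like (this is one common pattern, not necessarily the one alluded to above), a combinator can take an ordinary closure that returns a future, which costs an extra type parameter for the future. Everything here is a hypothetical sketch: `apply_twice` is a made-up combinator, and `block_on` is a toy executor that only suits futures which complete without real waiting.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Workaround: instead of an async closure, accept a plain closure `F`
/// that returns some future `Fut`. Note the extra type parameter.
async fn apply_twice<F, Fut>(f: F, x: u32) -> u32
where
    F: Fn(u32) -> Fut,
    Fut: Future<Output = u32>,
{
    let y = f(x).await;
    f(y).await
}

/// A waker that does nothing; enough for the toy executor below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future is ready.
/// Only appropriate for futures that never actually wait on anything.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

A caller would write something like `block_on(apply_twice(|n| async move { n + 1 }, 1))`. The `F`/`Fut` pair is workable for closures that capture nothing, but it gets awkward once the returned future needs to borrow from the closure’s captured state, which is part of what async closures are meant to address.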

For this first goal, I think we should be working to establish an MVP. Recently, Errs and I outlined an MVP we thought seemed quite doable. It began with creating AsyncFn traits that mirror the Fn trait hierarchy…

trait AsyncFnOnce<A> {
    type Output;
    
    async fn call_once(self, args: A) -> Self::Output;
}

trait AsyncFnMut<A>: AsyncFnOnce<A> {
    async fn call_mut(&mut self, args: A) -> Self::Output;
}

trait AsyncFn<A>: AsyncFnMut<A> {
    async fn call(&self, args: A) -> Self::Output;
}

…and the ability to write async closures like async || <expr>, as well as a bridge such that any function that returns a future also implements the appropriate AsyncFn traits. Async closures would unblock us from creating combinator traits, like a truly nice version of async iterators.

This MVP is not intended as the final state, but it is intended to be compatible with whatever final state we wind up with. There remains a really interesting question about how to integrate the AsyncFn traits with the regular Fn traits. Nonetheless, I think we can stabilize the above MVP in parallel with exploring that question.

Proposed goal: Author an RFC for “maybe async” in Q4 (or decide not to!)

One of the big questions around async is whether we should be supporting some way to write “maybe async” code. This idea has gone through a lot of names. Yosh and Oli originally kicked off something they called keyword generics and later rebranded as effect generics. I prefer the framing of trait transformers, and I wrote a blog post about how trait transformers can make async closures fit nicely.

There is significant skepticism about whether this is a good direction. There are other ways to think about async closures (though Errs pointed out an issue with this that I hope to write about in a future post). Boats has written a number of blog posts with concerns, and members of the types team have expressed fear about what will be required to write code that is generic over effects. These concerns make a lot of sense to me!

Overall, I still believe that something like trait transformers could make Rust feel simpler and help us scale to future needs. But I think we have to prove our case! My goal for 2024 then is to do exactly that. The idea would be to author an RFC laying out a “maybe async” scheme and to get that RFC accepted. To address the concerns of the types team, I think that will require modeling “maybe async” formally as part of a-mir-formality, so that everybody can understand how it will work.

Another possible outcome here is that we opt to abandon the idea. Maybe the complexity really is infeasible. Or maybe the lang design doesn’t feel right. I’m good with that too, but either way, I think we need to settle on a plan this year.

Stretch goal: stabilize generator syntax

As a stretch goal, it would be really cool to land support for generator expressions – basically a way to write async iterators. Errs recently opened a PR adding nightly support for async, and RFC #3513 proposed reserving the gen keyword for Rust 2024. Really stabilizing generators however requires us to answer some interesting questions about the best design for the async iteration trait. Thanks to the stabilization of async fn in trait, we can now have this conversation – and we have certainly been having it! Over the last month or so there has also been a lot of interesting back and forth about the best setup. I’m still digesting all the posts, and I hope to put up some thoughts this month (no promises). Regardless, I think it’s plausible that we could see async generators land in 2024, which would be great, as it would eliminate the major reason that people have to interact directly with Pin.

Conclusion: looking past 2024

If we accomplish the goals I outlined above, async Rust by the end of 2024 will be much improved. But there will still be a few big items before we can really say that we’ve laid out the pieces we need. Sadly, we can’t do it all, so these items would have to wait until after 2024, though I think we will continue to experiment and discuss their design:

Async drop: Once we have async closures, there remains one place where you cannot write an async function – the Drop trait. Async drop has a bunch of interesting complications (Sabrina wrote a great blog post on this!), but it is also a major pain point for users. We’ll get to it!
Dyn async trait: Besides send bounds, the other major limitation for async fn in trait is that traits using them do not yet support dynamic dispatch. We should absolutely lift this, but to me it’s lower in priority because there is an existing workaround of using a proc-macro to create a DynAsyncTrait type. It’s not ideal, but it’s not as fundamental a limitation as send bounds or the lack of async closures and async drop. (That said, the design work for this is largely done, so it is entirely possible that we land it this year as a drive-by piece of work.)
Traits for being generic over runtimes: Async Rust’s ability to support runtimes as varied as Tokio and Embassy is one of its superpowers. But the fact that switching runtimes or writing code that is generic over what runtime it uses is very hard to impossible is a key pain point, made even worse by the fact that runtimes often don’t play nice together. We need to build out traits for interop, starting with async read + write but eventually covering task spawning and timers.
Better APIs: Many of the nastiest async Rust bugs come about when users are trying to manage nested tasks. Existing APIs like FuturesUnordered and select have a lot of rough edges and can easily lead to deadlock – Tyler had a good post on this. I would like to see us take a fresh look at the async APIs we offer Rust programmers and build up a powerful, easy to use library that helps steer people away from potential sources of deadlock. Ideally this API would not be specific to the underlying runtime, but instead let users switch between different runtimes, and hopefully cleanly support embedded systems (perhaps with limited functionality). I don’t think we know how to do this yet, and I think that doing it will require us to have a lot more tools (things like send bounds, async closures, and quite possibly trait transformers or async drop).

https://smallcultfollowing.com/babysteps/blog/2024/01/03/async-rust-2024/

Being Rusty: Discovering Rust's design axioms

Dec 7, 2023 Updated Dec 7, 2023


To your average Joe, being “rusty” is not seen as a good thing.1 But readers of this blog know that being Rusty – with a capital R! – is, of course, something completely different! So what is it that makes Rust Rust? Our slogans articulate key parts of it, like fearless concurrency, stability without stagnation, or the epic Hack without fear. And there is of course Lindsey Kuper’s epic haiku: “A systems language / pursuing the trifecta: / fast, concurrent, safe”. But I feel like we’re still missing a unified set of axioms that we can refer back to over time and use to guide us as we make decisions. Some of you will remember the Rustacean Principles, which was my first attempt at this. I’ve been dissatisfied with them for a couple of reasons, so I decided to try again. The structure is really different, so I’m calling it Rust’s design axioms. This post documents the current state – I’m quite a bit happier with it! But it’s not quite there yet. So I’ve also got a link to a repository where I’m hoping people can help improve them by opening issues with examples, counter-examples, or other thoughts.

Axioms capture the principles you use in your decision-making process

What I’ve noticed is that when I am trying to make some decision – whether it’s a question of language design or something else – I am implicitly bringing assumptions, intuitions, and hypotheses to bear. Oftentimes, those intuitions fly by very quickly in my mind, and I barely even notice them. Ah yeah, we could do X, but if we did that, it would mean Y, and I don’t want that, scratch that idea. I’m slowly learning to be attentive to these moments – whatever Y is right there, it’s related to one of my design axioms, something I’m implicitly using to shape my thinking.

I’ve found that if I can capture those axioms and write them out, they can help me down the line when I’m facing future decisions. It can also help to bring alignment to a group of people by making those intuitions explicit (and giving people a chance to refute or sharpen them). Obviously I’m not the first to observe this. I’ve found Amazon’s practice of using tenets to be quite useful2, for example, and I’ve also been inspired by things I’ve read online about the importance of making your hypotheses explicit.3

In proof systems, your axioms are the things that you assert to be true and take on faith, and from which the rest of your argument follows. I choose to call these Rust’s design axioms because that seemed like exactly what I was going for. What are the starting assumptions that, followed to their conclusion, lead you to Rust? The more clearly we can articulate those assumptions, the better we’ll be able to ensure that we continue to follow them as we evolve Rust to meet future needs.

Axioms have a hypothesis and a consequence

I’ve structured the axioms in a particular way. They begin by stating the axiom itself – the core belief that we assert to be true. That is followed by a consequence, which is something that we do as a result of that core belief. To show you what I mean, here is one of the Rust design axioms I’ve drafted:

Rust users want to surface problems as early as possible, and so Rust is designed to be reliable. We make choices that help surface bugs earlier. We don’t make guesses about what our users meant to do, we let them tell us, and we endeavor to make the meaning of code transparent to its reader. And we always, always guarantee memory safety and data-race freedom in safe Rust code.

Axioms have an ordering and earlier things take priority

Each axiom is useful on its own, but where things become interesting is when they come into conflict. Consider reliability: that is a core axiom of Rust, no doubt, but is it the most important? I would argue it is not. If it were, we wouldn’t permit unsafe code, or at least not without a safety proof. I think our core axiom is actually that Rust is meant to be used, and used for building a particular kind of program. I articulated it like this:

Rust is meant to empower everyone to build reliable and efficient software, so above all else, Rust needs to be accessible to a broad audience. We avoid designs that will be too complex to be used in practice. We build supportive tooling that not only points out potential mistakes but helps users understand and fix them.

When it comes to safety, I think Rust’s approach is eminently practical. We’ve designed a safe type system that we believe covers 90-95% of what people need to do, and we are always working to expand that scope. To get that last 5-10%, we fall back to unsafe code. Is this as safe and reliable as it could be? No. That would be requiring 100% proofs of correctness. There are systems that do that, but they are maintained by a small handful of experts, and that idea – that systems programming is just for “wizards” – is exactly what we are trying to get away from.

To express this in our axioms, we put accessible as the top-most axiom. It defines the mission overall. But we put reliability as the second in the list, since that takes precedence over everything else.

The design axioms I really like

Without further ado, here is my current list of design axioms. Well, part of it. These are the axioms that I feel pretty good about. The ordering also feels right to me.

We believe that…

Rust is meant to empower everyone to build reliable and efficient software, so above all else, Rust needs to be accessible to a broad audience. We avoid designs that will be too complex to be used in practice. We build supportive tooling that not only points out potential mistakes but helps users understand and fix them.

Rust users want to surface problems as early as possible, and so Rust is designed to be reliable. We make choices that help surface bugs earlier. We don’t make guesses about what our users meant to do, we let them tell us, and we endeavor to make the meaning of code transparent to its reader. And we always, always guarantee memory safety and data-race freedom in safe Rust code.

Rust users are just as obsessed with quality as we are, and so Rust is extensible. We empower our users to build their own abstractions. We prefer to let people build what they need than to try (and fail) to give them everything ourselves.

Systems programmers need to know what is happening and where, and so system details and especially performance costs in Rust are transparent and tunable. When building systems, it’s often important to know what’s going on underneath the abstractions. Abstractions should still leave the programmer feeling like they’re in control of the underlying system, such as by making it easy to notice (or avoid) certain types of operations.

…where earlier things take precedence.

The design axioms that are still a work-in-progress

These axioms are things I am less sure of. It’s not that I don’t think they are true. It’s that I don’t know yet if they’re worded correctly. Maybe they should be combined together? And where, exactly, do they fall in the ordering?

Rust users want to focus on solving their problem, not the fiddly details, so Rust is productive. We favor APIs where the most convenient and high-level option is also the most efficient one. We support portability across operating systems and execution environments by default. We aren’t explicit for the sake of being explicit, but rather to surface details we believe are needed.

N✕M is bigger than N+M, and so we design for composability and orthogonality. We are looking for features that tackle independent problems and build on one another, giving rise to N✕M possibilities.

It’s nicer to use one language than two, so Rust is versatile. Rust can’t be the best at everything, but we can make it decent for just about anything, whether that’s low-level C code or high-level scripting.

Of these, I like the first one best. Also, it follows the axiom structure better, because it starts with a hypothesis about Rust users and what they want. The other two are a bit older and I hadn’t adopted that convention yet.

Help shape the axioms!

My ultimate goal is to author an RFC endorsing these axioms for Rust. But I need help to get there. Are these the right axioms? Am I missing things? Should we change the ordering?

I’d love to know what you think! To aid in collaboration, I’ve created a nikomatsakis/rust-design-axioms github repository. It hosts the current state of the axioms and also has suggested ways to contribute.

I’ve already opened issues for some of the things I am wondering about, such as:

nikomatsakis/rust-design-axioms#1: Maybe we need a “performant” axiom? Right now, the idea of “zero-cost abstractions” and “the default thing is also the most efficient one” feels a bit smeared across “transparent and tunable” and “productive”.
nikomatsakis/rust-design-axioms#2: Is “portability” sufficiently important to pull out from “productivity” into its own axiom?
nikomatsakis/rust-design-axioms#3: Are “versatility” and “orthogonality” really expressing something different from “productivity”?

Check it out!

I have a Google alert for “Rust” and I cannot tell you how often it seems that some sports team or another shakes off Rust. I’d never heard that expression before signing up for this Google alert. ↩︎
I’m perhaps a bit unusual in my love for things like Amazon’s Leadership Principles. I can totally understand why, to many people, they seem like corporate nonsense. But if there’s one theme I’ve seen consistently over my time working on Rust, it’s that process and structure are essential. Take a look at the “People Systems” keynote that Aaron, Ashley, and I gave at RustConf 2018 and you will see that theme running throughout. So many of Rust’s greatest practices – things like the teams or RFCs or public, rfcbot-based decision making – are an attempt to take some kind of informal, unstructured process and give it shape. ↩︎
I really like this Learning for Action page, which I admit I found just by googling for “strategy articulate a hypotheses”. I’m less into this super corporate-sounding LinkedIn post, but I have to admit I think it’s right on the money. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/12/07/rust-design-axioms/

Project Goals

Nov 28, 2023 Updated Nov 28, 2023

Lately I’ve been iterating on an idea I call project goals. Project goals are a new kind of RFC that defines a specific goal that a specific group of people hope to achieve in a specific amount of time – for example, “Rusty Spoon Corp proposes to fund 2 engineers full time to stabilize collections that support custom memory allocations by the end of 2023”.

Project goals would also include asks from various teams that are needed to complete the goal. For example, “Achieving this goal requires a dedicated reviewer from the compiler team along with an agreement from the language design team to respond to RFCs or nominated issues within 2 weeks.” The decision of whether to accept a goal would be up to those teams who are being asked to support it. If those teams approve the RFC, it means they agree with the goal, and also that they agree to commit those resources.

My belief is that project goals become a kind of incremental, rolling roadmap, declaring our intent to fix specific problems and then tracking our follow-through (or lack thereof). As I’ll explain in the post, I believe that a mechanism like project goals will help our morale and help us to get shit done, but I also think it’ll help with a bunch of other ancillary problems, such as providing a clearer path to get involved in Rust as well as getting more paid maintainers and contributors.

At the moment, project goals are just an idea. My plan is to author some sample goals to iron out the process and then an RFC to make it official.

Driving a goal in the Rust project is an uncertain process

Rust today has a lot of half-finished features waiting for people to invest time into them. But figuring out how to do so can be quite intimidating. You may have to trawl through github or Zulip threads to figure out what’s going on. Once you’ve done that, you’ll likely have to work through some competing constraints to find a proposed solution. But that stuff isn’t the real problem. The real problem is that, once you’ve invested that time and done that work, you don’t really know whether anyone will care enough about your work to approve it. There’s a good chance you’ll author an RFC, or a PR, and nobody will even respond to it.

Rust teams today often operate in a fairly reactive mode, without clear priorities. The official Rust procedures are almost exclusively ‘push’, and often based on evaluating artifacts, not intentions – people decide a problem they would like to see solved, and write an RFC or a PR to drive it forward; the teams decide whether to accept that work. But there is no established way to get feedback from the team on whether this is a problem – or an approach to the problem – that would be welcome. Or, even if the team does theoretically want the work, there is no real promise from the team that they’ll respond or accountability when they do not.

We do try to be proactive and talk about our goals. Teams sometimes post lists of aspirations or roadmaps to Inside Rust, for example, and we used to publish annual roadmaps as a project. But these documents have never seemed very successful to me. There is a fundamental tension that is peculiar to open source: the teams are not the ones doing the work. Teams review and provide feedback. Contributors do the work, and ultimately they decide what they will work on (or if they will do work at all). It’s hard to plan for the kinds of things you will do when you don’t know what resources you have. A more reliable barometer of the Rust project’s priorities has been to read the personal blogs of the people doing the work, where they talk about the goals they personally plan to drive.

This uncertainty holds back investment

The uncertainty involved in trying to push an idea forward in Rust is a major deterrent for companies thinking about investing in Rust. I hear about this gap from virtually every angle:

Imagine you’re a developer who wants to use paid time to work on open source. How do you convince your manager it makes sense? Right now, the best you can do is “I think I can make progress, and besides, it’s the right thing to do!”
Imagine you’re a contractor who wants to deliver for a client. They want to pay you to help drive a feature over the finish line – but you can’t be sure if you’re going to be able to deliver, since it will require consensus from a Rust team, and it’s unclear whether it meets their priorities.
Imagine you’re a CTO considering whether to adopt Rust for your company. You see that there are gaps in an area, but you don’t know whether that is something the project is actively looking to close, or what.
Or maybe you’re a CTO who has adopted Rust and is looking to “give back” to the community by contributing. You want to help deliver support for a feature you need and that you know a lot of people in the community would like, but you can’t figure out how to get started, and you can’t afford to have an engineer or two work on something for months without a return.

But some things work really well and we don’t want to lose those

Rust’s development may be chaotic, but there’s a beauty to it as well. As Mara’s classic blog post put it, “Rust is not a company”. Rust’s current structure allows for a feature to make progress in fits and starts, which means we can accommodate many different interest levels and motivations. Someone who is motivated can author and contribute an RFC, and then disappear. Somebody else can pick up the ball and move the implementation forward. And yet a third person can drive the docs and stabilization over the finish line. This is not only cool to watch, it also means that some features get done that would never be “top priority”. Consider let-else – this is one of the most popular features from the last few years, and yet, compared against core enablers like “async fn in trait”, it clearly takes second place in the priority list. But that’s fine, there are plenty of folks who don’t have the time or expertise to work on async fn in trait, but they can move let-else forward. It’s really important to me that we don’t lose this.

Proposal: project goal RFCs

So, top-down roadmaps are a poor fit for open-source. But working purely bottom-up has its own downsides. What can we do?

My proposal is to form roadmaps, but to do it bottom-up, via a new kind of RFC called a project goal RFC. A regular RFC proposes a solution to a problem. A project goal RFC proposes a plan to solve a particular problem in a particular timeframe. This could be specific, like “stabilize support for async closures in 2024”, or it could be more general, like “land nightly support for managing resource cleanup in async functions in 2024”. What it can’t be is non-actionable, such as “simplify async programming in 2024” or “make async Rust nice in 2024”.

Project goal RFCs are opened by the goal owners, the people proposing to do the work. They are approved by the teams which will be responsible for approving that work.1 The RFC serves as a kind of contract: the owners will drive the work and the team will review that work and/or provide other kinds of support (such as mentorship).

Project goal RFCs are aimed squarely at larger projects

Project goal RFCs are not appropriate for all projects. In fact, they’re not appropriate for most projects. They are meant for larger, flagship projects, the kind where you want to be sure that the project is aligned around the goals before you start investing heavily. Here are some examples where I think project goal RFCs would be useful…

The async WG set an “unofficial” project goal of shipping async functions in traits this year (coming Dec 28!). Honestly, setting a goal like this felt a bit uncomfortable, as we didn’t have a means to make it “official and blessed”. I think that would have also helped during the push to stabilization, since we could reference this goal to help make the case for “time to ship”.
Goals might also take the shape of internal improvements. The types team is driving a flagship goal to ship a new trait solver. Authoring a project goal RFC would help bring this visibility and would also make it easier to make the case for funding work on this project.
I sometimes help to mentor collaborations with people in universities or with Master’s students. Project goals would let us set expectations up front about what work we expect to do during that time.
I’d like to drive consensus around the idea of easing tradeoffs with profiles – but I don’t want to start off with an RFC that is going to focus discussion on the details of how profiles are specified. I want to start off by getting alignment around whether to do something like profiles at all. Wearing my Amazon manager hat, having alignment there would also influence whether I allocated some of our team’s bandwidth to work on that. A project goal could be perfect for that.
The Foundation has run several project grant programs, and one of the challenges has been trying to choose projects to fund which will be welcomed by the project. As I’ve been saying, we don’t really have a mechanism for making those sorts of decisions.
The embedded working group or the Rust For Linux folks have a bunch of pain points. I think it’s been hard for us to manage cooperation between those really important efforts and the other Rust teams. Developing a joint project goal would be a way to highlight needs.
Someone who wants to work on Rust at their company could work with a team to develop an official goal that they can show to their manager to get authorized work time.
Companies that want to invest in Rust to close gaps could propose project goals. For example, I frequently get asked how a company can help move custom allocators forward. One candidate that comes up a lot is support for custom allocators and collections with fallible allocation. This same mechanism would also allow larger companies to propose goals that they’d like to drive. For example, there was a recent RFC on debugger visualization aimed at better support for debugging Rust in Windows. I could imagine folks from Microsoft proposing some goals in that area.

Anatomy of a project goal RFC

Project goal RFCs need to include enough detail that both the owners and the teams know what they are signing up for. I believe a project goal RFC should answer the following questions:

Why is this work important?
What work will be done on what timeframe?
- This should include…
  - milestones you will meet along the way,
  - specific use-cases you plan to address,
  - and guiding principles that will be used during design.
Who will be doing the work, and how much time will they have?
What support is needed and from which Rust teams?

The list above is intentionally somewhat detailed. Project goal RFCs are not meant to be used for everything. They are meant to be used for goals that are big enough that doing the planning is worthwhile. The planning also helps the owners and the teams set realistic timelines. (My assumption is that the first few project goals we set will be wildly optimistic, and over time we learn to temper our expectations.)
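
To make the shape of those questions concrete, here is a minimal sketch of them as a Rust data structure. Everything in it is hypothetical and invented for illustration: the `ProjectGoal`, `Milestone`, and `TeamAsk` types, their fields, the `missing_sections` check, and the sample values in `example_goal`. A real project goal would be prose in an RFC, not code, so treat this as a way to see which pieces have to be present rather than as a proposed format.

```rust
/// One step along the way to the goal (hypothetical type, for illustration only).
#[derive(Debug, Clone)]
pub struct Milestone {
    pub description: String,
    pub duration_months: u32,
}

/// Something the goal owners ask of a Rust team (hypothetical type).
#[derive(Debug, Clone)]
pub struct TeamAsk {
    pub team: String,
    pub support: String,
}

/// The four questions a project goal RFC should answer, as fields.
#[derive(Debug, Clone)]
pub struct ProjectGoal {
    /// Why is this work important?
    pub motivation: String,
    /// What work will be done on what timeframe?
    pub work: String,
    pub milestones: Vec<Milestone>,
    pub use_cases: Vec<String>,
    pub guiding_principles: Vec<String>,
    /// Who will be doing the work, and how much time will they have?
    pub owners: Vec<String>,
    /// What support is needed and from which Rust teams?
    pub asks: Vec<TeamAsk>,
}

impl ProjectGoal {
    /// Sum of the milestone durations, in months.
    pub fn total_months(&self) -> u32 {
        self.milestones.iter().map(|m| m.duration_months).sum()
    }

    /// Names the questions this goal has not answered yet.
    pub fn missing_sections(&self) -> Vec<&'static str> {
        let mut missing = Vec::new();
        if self.motivation.trim().is_empty() {
            missing.push("why is this work important?");
        }
        if self.work.trim().is_empty() || self.milestones.is_empty() {
            missing.push("what work, on what timeframe?");
        }
        if self.owners.is_empty() {
            missing.push("who will do the work?");
        }
        if self.asks.is_empty() {
            missing.push("what support is needed?");
        }
        missing
    }
}

/// A made-up goal; the wording and the milestone split are illustrative assumptions.
pub fn example_goal() -> ProjectGoal {
    let milestone = |description: &str, duration_months: u32| Milestone {
        description: description.to_string(),
        duration_months,
    };
    ProjectGoal {
        motivation: "Placeholder: why the teams should spend bandwidth on this".to_string(),
        work: "stabilize async closures in 2024".to_string(),
        milestones: vec![
            milestone("author an RFC", 3),
            milestone("implementation", 3),
            milestone("gaining experience", 3),
            milestone("stabilization", 3),
        ],
        use_cases: vec!["Placeholder: a specific use case to address".to_string()],
        guiding_principles: vec!["Placeholder: a principle used during design".to_string()],
        owners: vec!["two developers working at 50% time".to_string()],
        asks: vec![
            TeamAsk {
                team: "compiler".to_string(),
                support: "a dedicated reviewer for PRs".to_string(),
            },
            TeamAsk {
                team: "lang".to_string(),
                support: "review and feedback on RFCs".to_string(),
            },
        ],
    }
}
```

Calling `example_goal().missing_sections()` returns an empty list, and removing the asks makes it report the unanswered question. The only point of the check is that an owner and a team can both see, before approving anything, which part of the contract is still blank.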

Why is this work important?

Naturally whenever we propose to do something, it is important to explain why this thing is worth doing. A quality project goal will lay out the context and motivation. The goal is for the owners to explain to the team why the team should dedicate their maintenance bandwidth to this feature. It’s also a space for the owners to explain to the world why they feel it’s worth their time to do the work to develop this feature.

What will be done and on what timeframe?

The heart of the project goal is declaring what work is to be done and when it will be done by. It’s important that this “work to be done” is specific enough to be evaluated. For example, “make async nice next year” is not a good goal. Something like “stabilize async closures in 2024” is good. It’s also ok to just talk about the problem to be solved, if the best solution isn’t known yet. For example, “deliver nightly support for managing resource cleanup in async programs in 2025” is a good goal that could be solved by “async drop” but also by some other means.

Scaling work with timeframes and milestones

Goals should always include a specific timeframe, such as “in 2024” or “in 2025”. I think these timeframes will typically be about a year. If the time is too short, then the work is probably not significant enough to call it a goal. But if the timeframe is much longer than a year, then it’s probably best to scale back the “work to be done” to something more intermediate.

Of course, many goals will be part of a bigger project. For example, if one took a goal to deliver nightly support for something in 2024, then the next year, one might propose a goal to stabilize that support.

Ideally, the goal will also include milestones along the way. For example, if the goal is to have something stable in 1 year, it might begin with an RFC after 3 months, then 3 months of impl, 3 months of gaining experience, and 3 months for stabilization.

Pinning things down with use-cases

Unlike a feature RFC, a project goal RFC does not specify a precise design for the feature in question. Even if the project goal is something relatively specific, like “add support for async functions in traits”, there will still be a lot of ambiguity about what counts as success. For example, we decided to stabilize async functions in traits without support for send bounds. This means that some use cases, notably a crate like tower, aren’t supported yet. Does this count as success? To help pin this down, the project goal should include a list of use cases that it is trying to address.

Establishing guiding principles early

Finally, especially when goals involve a fair bit of design leeway, it is useful to lay down some of the guiding principles the goal owners expect to use. I think having discussion around these principles early will really help focus discussions later on. For example, when discussing how dynamic dispatch for async functions in traits should work, Tyler Mandry and I had an early goal that it should “just work” for simple cases but give the ability to customize behavior. But we quickly found that ran smack into Josh’s prioritization of allocation transparency. This conflict was predictable and I think it would have been useful to have had the discussion around these tenets early as a lang team, rather than waiting.2

Who will be doing the work, and how much time will they have?

Part of the goal is specifying who is going to be doing the work. For example, the goal might say “two developers to work at 50% time”. It might also say something more flexible, like “one developer to create quest issues and then mentor a group of volunteers to drive most of the work”. If possible, including specific names is useful too, particularly in more specialized areas. For example, “Ralf Jung and one graduate student will pursue an official set of rules for stacked borrows”.

What support is needed and from which Rust teams?

This section is where the project goal owners make asks of the project. Here are some typical asks that I expect we will have:

A dedicated reviewer for PRs to the compiler and an expected SLA of reviews within 3 days (or 1 week, or something).
An agreement from the lang team to review and provide feedback on RFCs.
Mentorship on some aspect or other.

I think teams should suggest the expected shape of asks and track their resources. For example, the lang team can probably manage only a small number of “prioritized RFCs” at a time, so if there are more project goals, they may have to wait or accept a lower SLA.

Tracking progress

One of the interesting things about project goals is that they give us an immediate roadmap. I would like to see the project author a quarterly report – which means every 12 weeks, or two release cycles. This report would include all the current project goals and updates on their progress. Did they make their declared milestones? If not, why not? Because project goals don’t cover the entirety of the work we do, the report could also include other significant developments. This would be published on the main Rust blog and would let people follow along with Rust development and get a sense for our current trajectory.

One thing I’ve learned, though: you can’t require the goal owners to author that blog post. It would be much better to have a dedicated person or team authoring the blog posts and pinging the goal owners to get those status updates. Preparing an update so that it can be understood by a mass audience is its own sort of skill. Moreover, goal owners will be tempted to put it off, and the updates won’t happen. I think it’s quite important that these project updates happen every quarter, like clockwork, just as our Rust releases do. This is true even if the update has to ship without an update from some goals.

I envision this progress tracking as providing a measure of accountability. When somebody takes a goal, we’ll be able to follow along with their progress. I’ve seen at Amazon and elsewhere that having written down a goal and declared milestones, and then having to say whether you’ve met them, helps to keep teams focused on getting the job done. I often find that I have a job about 95% done but then, in the week before I have to write an update about it, I’m inspired to go and finish that last 5%.

Conclusion: next steps

My next step is that I am going to fashion an RFC making the case for project goals. This RFC will include a template. To try out the idea, I plan to also author an example project goal for “async function in traits” and perhaps some other ongoing or proposed efforts. In truth, I don’t think we need an RFC to do project goals – nothing is stopping us from accepting whatever RFC we want – but I see some value in spelling out and legitimizing the process. I think this probably ought to be approved by the governance council, which is an interesting test for that new group.

There are some follow-up questions worth discussing. One of the ones I think is most interesting is how to manage the quarterly project updates. This deserves a post of its own. The short version of my opinion is that I think it’d be great to have an open source “reporting” team that has the job of authoring this update and others of its ilk. I suspect that this team would work best if we had one or more people paid to participate and to bear the brunt of some of the organizational lift. I further suspect that the Foundation would be a good place for at least one of those people. But this is getting pretty speculative by now and I’d have to make the case to the board and Rust community that it’s a good use for the Foundation budget, which I certainly have not done.

It’s worth noting that I see project goal RFCs as just one piece of a larger puzzle that is giving a bit more structure to our design effort. One thing I think went wrong in prior efforts was that we attempted to be too prescriptive and too “one size fits all”. These days I tend to think that the only thing we must have to add a new feature to stable is an FCP-binding decision from the relevant team(s). All the rest, whether it be authoring a feature RFC or creating a project goal RFC, are steps that make sense for projects of a certain magnitude, but not everything. Our job then should be to lay out the various kinds of RFCs one can write and when they are appropriate for use, and then let the teams judge how and when to request one.

In theory, anyway. In practice, I imagine that many team maintainers may keep some draft project goal RFCs in their pocket, looking for someone willing to do the work. ↩︎
The question of how to make dyn async traits easy to use and transparent remains unresolved, which is partly why I’m keen on something like profiles. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/11/28/project-goals/

Idea: "Using Rust", a living document

Oct 20, 2023 Updated Oct 20, 2023

A few years back, the Async WG tried something new. We collaboratively authored an Async Vision Doc. The doc began by writing “status quo” stories, written as narratives from our cast of characters, that described how people were experiencing Async Rust at that time and then went on to plan a “shiny future”. This was a great experience. My impression was that authoring the “status quo” stories in particular was really helpful. Discussions at EuroRust recently got me wondering: can we adapt the “status quo” stories to something bigger? What if we could author a living document on the Rust user experience? One that captures what people are trying to do with Rust, where it is working really well for them, and where it could use improvement. I love this idea, and the more I thought about it, the more I saw opportunities to use it to improve other processes, such as planning, public communication, and RFCs. But I’m getting ahead of myself! Let’s dive in.

TL;DR

I think authoring a living document (working title: “Using Rust”) that collects “status quo” stories could be a tremendous resource for the Rust community. I’m curious to hear from folks who might like to be part of a group authoring such a document, especially (but not only) people with experience as product managers, developer advocates, or UX researchers.

Open source is full of ideas, but which to do?

The Rust open-source organization is a raucous, chaotic, and, at its best, joyful environment. People are bubbling with ideas on how to make things better (some better than others). There are also a ton of people who want to be involved, but don’t know what to do. This sounds great, but it presents a real challenge: how do you decide which ideas to do?

The vast majority of ideas for improvement tend to be incremental. They take some small problem and polish it. If I sound disparaging, I don’t mean to be. This kind of polish is absolutely essential. It’s kind of ironic: there’s always been a perception that open source can’t build a quality product, but my experience has often been the opposite. Open source means that people show up out of nowhere with PRs that remove sharp edges. Sometimes it’s an edge you knew was there but didn’t have time to fix; other times it’s a problem you weren’t aware of, perhaps because of the Curse of Knowledge.

But finding those revolutionary ideas is harder. To be clear, it’s hard in any environment, but I think it’s particularly hard in open source. A big part of the problem is that open source has always focused on coding as our basic currency. Discussions tend to orient around specific proposals – that could be as small as a PR or as large as an RFC. But finding a revolutionary idea doesn’t start from coding or from a specific idea.

It all starts with the “status quo”

So how do we go about having more “revolutionary ideas”? My experience is that it begins by deeply understanding the present moment. It’s amazing how often we take the “status quo” for granted. We assume that we know the problems people experience, and we assume that everybody else knows them too. In reality, we only know the problems that we personally experience – and most of the time we are not even fully aware of those!

One thing I remember from authoring the async vision doc is how hard it was to focus on the “status quo” – and how rewarding it was when we did! When you get people talking about the problems they experience, the temptation is to immediately jump to how to fix the problem. But if you resist that, and you force yourself to just document the current state, you’ll find you have a much richer idea of the problem.1 And that richer understanding, in turn, gives rise to better ideas for how to fix it.

Idea: a living “Using Rust” document

So here is my idea: what if we created a living document, working title “Using Rust”, that aims to capture the “status quo” of Rust today:

What are people building with Rust?
How are people’s Rust experiences influenced by their background (e.g., prior programming experience, native language, etc)?
What is working well?
What challenges are they encountering?

Just as with the Async Vision Doc, I imagine “Using Rust” would cover the whole gamut of experiences, including not just the language itself but tooling, libraries, etc. Unlike the vision doc, I wouldn’t narrow it to async (though we might start by focusing on a particular domain to prove out the idea).

Like the vision doc, I imagine “Using Rust” would be composed of a series of vignettes, expressed in narrative form, using a similar set of personas 2 to the Async Vision Doc (perhaps with variations, like Spanish-speaking Alano instead of Alan).

I personally found the narratives really helpful to get the emotional “heft” of some of the stories. For example, “Alan started trusting the Rust compiler, but then… async” helped drive home the importance of that “if it compiles, it works” feeling for Rust users, as well as the way that panics can undermine it. Even though these are narratives, they can still dive deep into technical details. Researching and writing “Barbara battles buffered streams”, for example, really helped me to appreciate the trickiness of async cancellation’s semantics.3

I don’t think “Using Rust” would ever be finished, nor would I narrow it to one domain. Rather, I imagine it being a living document, one that we continuously revise as Rust changes.

Improving on the async vision doc

The async vision doc experience was great, but I learned a few things along the way that I would do differently now. One of them is that collecting stories is good, but synthesizing them is better (and harder). I also found that people telling you the stories are not always the right ones to author them. Last time, we had a lot of success with people authoring PRs, but many times people would tell a story, agree to author a PR, and then never follow up. This is pretty standard for open source but it also applies a sort of “selection bias” to the stories we got. I would address both of these problems by dividing up the roles. Rust users would just have to tell their stories. There would be a group of maintainers who would record those stories and then go try to author the PRs that integrate into “Using Rust”.

The other thing I learned is that trying to author a single shiny future does not work. It was meant to be a unifying vision for the group, but there are just too many variables at play to reach consensus on that. We should definitely be talking about where we will be in 5 years, but we don’t have to be entirely aligned on it. We just have to agree on the right next steps. My new plan is to integrate the “shiny future” into RFCs, as I describe below.

Maintaining “Using Rust”

In the fullness of time, and presuming it works out well, I think “Using Rust” should be a rust-lang project, owned and maintained by its own team. My working title for this team is the User Research Team, which has the charter of gathering up data on how people use Rust and putting that data into a form that makes it accessible to the rest of the Rust project. But I tend to think it’s better to prove out ideas before creating the team, so I think I would start with an experimental project, and create the team once we demonstrate the concept is working.

Gathering stories

So how would this team go about gathering data? There are so many ways. When doing the async vision doc, we got some stories submitted by PRs on the repo. We ran writing sessions where people would come and tell us about their experiences.

I think it’s very valuable to have people gather “in depth” data from within specific companies. For the Async Vision Doc, I also interviewed team members, culminating in the “meta-story” “Alan extends an AWS service”. Tyler Mandry and I also met with members from Google, and I recall we had folks from Embark and a few other companies reach out to tell us about their experiences.

Another really cool idea that came from Pietro Albini: set up a booth at various Rust conferences where people can come up and tell you about their stories. Or perhaps we can run a workshop. So many possibilities!

Integrating “Using Rust” with the RFC process

The purpose of an RFC, in my mind, is to lay out a problem and a specific solution to that problem. The RFC is not code. It doesn’t have to be a complete description of the problem. But it should be complete enough that people can imagine how the problem is going to be solved.

Every RFC includes a motivation, but when I read those motivations, I am often a bit at a loss as to how to evaluate them. Clearly there is some kind of problem. But is it important? How does it rank with respect to other problems that users are encountering?

I imagine that the “Using Rust” doc would help greatly here. I’d like to get to the point where the motivation for RFCs is primarily addressing particular stories or aspects of stories within the document. We would then be able to read over other related stories to get a sense for how this problem ranks compared to other problems for that audience, and thus how important the motivation is.

RFCs can also include a section that “retells” the story to explain how it would have played out had this feature been available. I’ve often found that doing this helps me to identify obvious gaps. For example, maybe we are adding a nifty new syntax to address an issue, but how will users learn about it? Perhaps we can add a “note” to the diagnostic to guide them.

Frequently asked questions

Will this help us in cross-team collaboration?

Like any organization, the Rust organization can easily wind up “shipping its org chart”. For example, if I see a problem, as a lang-team member, I may be inclined to ship a language-based solution for it; similarly, I’ve seen that the embedded community works very hard to work within the confines of Rust as it is, whereas sometimes they could be a lot more productive if we added something to the language.

Although it is not a complete solution, I think having a “Using Rust” document will be helpful. Focusing on describing the problem means it can be presented to multiple teams and each can evaluate it to decide where the best solution lies.

What about other kinds of stories?

I’ve focused on stories about Rust users, but I think there are other kinds of stories we might want to include. For example, what about the trials and travails of Alan, Barbara, Grace, and Niklaus as they try to contribute to Rust?

How will we avoid “scenario solving”?

Scenario solving refers to a pattern where a feature is made to target various specific examples rather than being generalized to address a pattern of problems. It’s possible that if we write out user stories, people will design features to target exactly the problems that they read about, rather than observing that a whole host of problems can be addressed via a single solution. That is true, and I think teams will want to watch out for that. At the same time, I think that having access to a full range of stories will make it much easier to see those large patterns and to help identify the full value for a proposal.

What about a project management team?

From time to time there are proposals to create a “project management” team. There are many different shapes for what such a team would do, but the high-level motivation is to help provide “overall guidance” and ensure coherence between the Rust teams. I am skeptical about any idea that sounds like an “overseer” team. I trust the Rust teams to own and maintain their area. But I do think we can all benefit from getting more alignment on the sets of problems to be solved, which I think this “Using Rust” document would help to create. I can also imagine other interesting mechanisms that build on the doc, such as reviewing stories as a group online, or at “unconferences”.

Call to action: get in touch!

I’m feeling pretty excited about this project. I’m contemplating how to go about organizing it. I’m really interested to hear from people who would like to take part as authors and collators of user stories. If you think you’d be interested to participate, please send me an email. I’m particularly interested to hear from people with experience doing this sort of work (e.g., product managers, developer advocates, UX researchers).

If you’re hearing resonance of the wisdom of the Buddha, it was not intentional when I wrote this, but you are not alone. ↩︎
The personas/characters may look simple, but developing that cast of characters took a lot of work. Finding a set that is small enough to be memorable but which captures the essentials is hard work. One key insight was separating out the projects people are building from the characters building them, since otherwise you get a combinatorial explosion. ↩︎
Async cancellation is an area I desperately want to return to! I still think we want some kind of structured-concurrency-like solution. My current thinking is roughly that we want something like moro for task-based concurrency and something like Yosh’s merged streams for handling “expect one of many possible messages”-like scenarios. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/10/20/using-rust/

Eurorust reflections

Oct 14, 2023 Updated Oct 14, 2023

I’m on the plane back to the US from Belgium now and feeling grateful for having had the chance to speak at the EuroRust conference 1. EuroRust was the first Rust-focused conference that I’ve attended since COVID (though not the first conference overall). It was also the first Rust-focused conference that I’ve attended in Europe since…ever, from what I recall.2 Since many of us were going to be in attendance, the types team also organized an in-person meetup which took place for 3 days before the conference itself3. Both the meetup and the conference were great in many ways, and sparked a lot of ideas. I think I’ll be writing blog posts about them for weeks to come, but I thought that to start, I’d write up something general about the conference itself, and some of my takeaways from the experience.

It’s great to talk to people using Rust

When I started on Rust, I figured the project was never going to go anywhere — I mean, come on, we were making a new programming language. What are the odds it’ll be a success? But it still seemed like fun. So I set myself a simple benchmark: I will consider the project a success the first time I see an announcement where somebody built something cool with it, and I didn’t know them beforehand. In those days, everybody using Rust was also hanging out on IRC or on the mailing list.

Well, that turned out to be a touch on the conservative side. These days, Rust has gotten big enough that the core project itself is just a small piece of the action. It’s just amazing to hear all the things people are using Rust for. Just looking at the conference sponsors alone, I loved meeting the Shuttle and Tauri/CrabNebula teams and I got excited about playing with both of them. I had a great time talking to the RustRover team about the possibilities for building custom diagnostics and the ways we could leverage their custom GUI to finally get past the limitations of the terminal when we present error messages. But one of my favorite parts happened on the tram ride home, when I randomly met the maintainer of PyO3. Such a cool project, and definite inspiration for work I’ve been doing lately, like duchess.

Rust teachers everywhere

Speaking of Shuttle and Tauri, both of them are interesting in a particular way: they are empowerment efforts in their own right, and so they attract people whose primary interest is not Rust itself, but rather achieving some other goal (e.g., cloud development, or building a GUI application). It’s cool to see Rust empowering people to build other empowerment apps, but it’s also a fascinating source of data. Both of those projects have started embarking on efforts to teach Rust precisely because that will help grow their userbase. The Shuttle blog has all kinds of interesting articles4; the Tauri folks told me about their efforts to build Rust articles specifically targeting JavaScript and TypeScript programmers, which required careful choice of terminology and concepts.

The whole RustFest idea seems to have really worked

At some point, RustFest morphed from a particular conference into a kind of ‘meta conference’ organization, helping others to organize and run their own events. Looking over the calendar of Rust events in Europe, I have to say, that looks like it’s worked out pretty dang well. Hats off to y’all on that: there is EuroRust, RustLab in Italy, Rust Nation in the UK, and probably a bunch more that I’m not aware of.

I should also say that meeting the conference organizers at this conference was very nice. Both the EuroRust organizers (Marco and Sarah, from Mainmatter) were great to talk to, and I finally got to meet Ernest (now organizing Rust Nation in the UK), whom I’ve talked to on and off over the years but never met in person.

I do still miss the cozy chats at Rust Belt Rust (RIP), but this new generation of Rust conferences (and their organizers) is pretty rad too. Plus I get to eat good cheese and drink beer outdoors, two things that for reasons unbeknownst to me are all too rare in the United States.

The kids are all right

One of my favorite things about being involved in the Rust project has been watching it sustain and reinvent itself over the years. This year at the conference I got to see the “new generation” of Rust maintainers and contributors. Some of them, like @davidtwco, I had met before, but they have since gone from “wanna be” Rust contributor to driving core initiatives like the diagnostic translation effort. Others (like @bjorn3, @WaffleLapkin, @Nilstrieb, and even @MaraBos) I had never had a chance to meet before. I love that working on Rust lets you interact with people from all over the world, but there’s nothing like putting a name to a face, and getting to give someone a hug or shake their hand.

But yeah, there’s that thing

So, let me say up front, due to scheduling conflicts, I wasn’t able to attend RustConf this year (or last year, as it happens). But I read Adam Chalmers’ blog post that many people were talking about, and I saw this paragraph…

Rustconf definitely felt sadder and downbeat than my previous visit. Rustconf 2019 felt jubilant. The opening keynote celebrated the many exciting things that had happened over the last year. Non-lexical lifetimes had just shipped, which removed a ton of confusing borrow checker edge cases. Async/await was just a few short months away from being stabilized, unleashing a lot of high-performance, massively-scalable software. Eliza Weisman was presenting a new async tracing library which soon took over the Rust ecosystem. Lin Clark presented about how you could actually compile Rust into this niche thing called WebAssembly and get Rust to run on the frontend – awesome! It felt like Rust had a clear vision and was rapidly achieving its goals. I was super excited to be part of this revolution in software engineering.

…and it made me feel really sad.5 Rust’s mission has always been empowerment. I’ve always loved the “can do” spirit of Rust, the way we aim high and try to push boundaries in every way we can. To me, the open source org has always been an important part of how we empower.

Developing a programming language, especially a compiled one, is often viewed as the work of “wizards”, just like systems programming. I think Rust proves that this “wizard-like” reputation has more to do with the limitations of the tools we were using than the task itself. But just like Rust has the goal of making systems programming more practical and accessible, I like to think the Rust org helps to open up language development to a wider audience. I’ve seen so many people come to Rust, full of enthusiasm but not so much experience, and use it to launch a new career.

But, if I’m honest, I’ve also seen a lot of people come into Rust full of enthusiasm and wind up burned out and frustrated. And sometimes I think that’s precisely because of our “sky’s the limit” attitude — sometimes we can get so ambitious, we set ourselves up to crash and burn.

Sometimes “thinking big” means getting nowhere

Everybody wants to “think big”. And Rust has always prided itself on taking a “holistic view” of problems — we’ve tried to pay attention to the whole project, not just generating good code, but targeting the whole experience with quality diagnostics, a build system, an easy way to manage which Rust version you want, a package ecosystem, etc. But when we look at all the stuff we’ve built, it’s easy to forget how we got there: incrementally and painfully.

I mean, in Ye Olde Days of Rust, we didn’t even have a borrow checker. Soundness was an aspiration, not a reality. And once we got one, it sucked to use, because the design was still stuck in some ‘old style’ thinking. And even once we had INHTWAMA6, the error messages were pretty confounding. And once we invented the idea of multiline errors, it wasn’t until late 2018 that we had NLL, which changed the game again. And that’s just the compiler! The story is pretty much the same for every other detail of the language. You used to have to build the compiler with a Makefile that was so complex, I wouldn’t be surprised if it were self-aware.7

When I feel burned out, one of the biggest reasons is that I’ve fallen into the trap of thinking too big, doing too much, and as a result I am spread too thin and everything seems impossible. Just look back three years ago: the async working group was driving this crazy project, the Async Vision Doc, and it seemed like we were on top of the world. We recorded all these stories of how async Rust was hard, and we were thinking about how we could solve it. Not surprisingly, we found that these stories were sometimes language problems, but just as often they were library limitations, or gaps in the tooling, or the docs. And so we set out an expansive vision, spawning out a ton of subprojects. And all the time, there was a voice in my head saying, “is this really going to work?”

Well, I’d say the answer is “no”. I mean, we made a lot of progress. We are going to stabilize async functions in traits this year, and that is awesome. We made a bunch of improvements to async usability, most notably cjgillot’s fantastic PR that improves the accuracy of send bounds and futures, preventing a whole ton of false errors (though that work wasn’t really done in coordination with the async wg effort per se, it’s just because cjgillot is out there silently making huge refactors8).

And yet, there’s a lot we didn’t do. We don’t have generators. We didn’t yet find a way to make futures smaller. We didn’t really drive to ground the conversation on structured concurrency. We also took a lot longer to do stuff than I hoped. I thought async functions in traits would ship in 2021 — it’s shipping now, but it’s 2023.

Focus, focus, focus; iterate, iterate, iterate

One lesson I take away from the async wg experience is focus, focus, focus and iterate, iterate, iterate. You can (almost) never start too small. I think we were absolutely right that “doing async right” demands addressing all of those concerns, but I think that we overestimated our ability to coordinate them up front, and as a result, things like shipping async fn in traits took longer than they needed to. We are going to get the async shiny future, but we’re going to get it one step at a time.

Also: we’re a lot bigger than we used to be

Still, sometimes I find that when I float ideas, I encounter a reflexive bit of pushback: “sounds great, who’s going to do it”. On the one hand, that’s the voice of experience, coming back from one too many Think Big plans that didn’t work out. But on the other, sometimes it feels a bit like “old school” thinking to me. Rust is not the dinky little project it used to be, where we all knew everybody. Rust is used by millions of developers and is one of the fastest growing languages today; it powers the cloud and it’s quite possibly in your kernel. In many ways, this growth hasn’t caught up with the open source org: I’d still like to see more companies hiring dedicated teams of Rust developers, or giving their employees paid time to work on Rust9. But I think that growth is coming, especially if we work harder at harnessing it, and I am very excited about what that can mean.

Nothing succeeds like success

Now I know that when we talk about burnout, we’re also talking about other kinds of drama. Maybe you think that things like ‘working iteratively’ and having more people or resources are not going to help when the problem is conflicts between people or organizations. And you’re not wrong, it’s not going to solve all conflict. But I also think that an awful lot of conflict ultimately comes out of zero-sum, scarcity-oriented thinking, or from feeling disempowered to achieve the goals you set out to do. To help with burnout, we need to do better at a number of things, including I think helping each other to practice empathy and manage conflict more productively10, but I think we also need to do better at shipping product.

Don’t be afraid to fail — you got this

One of my favorite conversations from the whole conference happened after the conference itself. I was in the midst of pitching Jack Huey on some of the organizational ideas that I’m really excited about right now, which I think can help bring the Rust project closer to being the empowering, inclusive open-source project it aspires to be. Jack wasn’t sure if they were going to work. “But”, he said, “what the heck, let’s try it! I mean, what have we got to lose? If it doesn’t work, we’ll learn something, and do something else.”11 Hell yes.

As I usually do, I’ve put my slides online. If you’re curious, take a look! If you see a typo, maybe open a PR. The speaker notes have some of the “soundtrack”, though not all of it. ↩︎
Somehow, I never made it to a RustFest. ↩︎
You can find the agenda here. It contains links to the briefing documents that we prepared in advance, along with loose notes that we took during the discussions. I expect we’ll author a blog post covering the key developments on the Inside Rust blog. ↩︎
Including one I can’t wait to read about OAuth – I tried to understand Github’s docs on OAuth and just got completely lost. ↩︎
Side note, but I think Rust 2024 is shaping up to be another hugely impactful edition. There’s a very good chance we’ll have async functions in traits, type alias impl trait, and polonius, each of which is a massive usability and expressiveness win. I’m hoping we’ll also get improved temporary lifetimes in the new edition, eliminating the “blocking bugs” identified as among the most common in real-world Rust programs. And of course the last few years have already seen let-else, scoped threads, cargo add, and a variety of other changes. Gonna be great! ↩︎
INHTWAMA was the rather awkward (and inaccurate) acronym that we gave to the idea of “aliasing xor mutation” — i.e., the key principle underlying Rust’s borrow checker. The name comes from a blog post I wrote called “Imagine never hearing the phrase aliasable, mutable again”, which @pcwalton incorrectly remembered as “Imagine never hearing the words aliasable, mutable again”, and hence shortened to INHTWAMA. I notice now though that this acronym was also frequently mutated to IMHTWAMA which just makes no sense at all. ↩︎
I learned a lot from reading Rust’s Makefile in the early days. I had no idea you could model function calls in make with macros. Brilliant. I’ve always deeply admired Graydon’s Makefile wizardry there, though it occurs to me now that I never checked the git logs – maybe it was somebody else! I’ll have to go look later. ↩︎
Side note, but more often than not, I think cjgillot’s approaches are not going to work. And so far I’m 0 for 2 on this, he’s always been right. To paraphrase Brendan Eich, “always bet on cjgillot”. ↩︎
And I have some thoughts on how we can do better at encouraging them! More on that in some later posts. ↩︎
One of the biggest lessons for me in my personal life has been realizing that not telling people when I feel upset is not necessarily being kind to them and certainly not kind to myself. It seems like avoiding conflict, but it can actually lead to much larger conflicts down the line. ↩︎
Full confession, this quote is made up out of thin air. I have no memory of what words he used. But this is what he meant! ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/10/14/eurorust-reflections/

Easing tradeoffs with profiles

Sep 30, 2023 Updated Sep 30, 2023

Rust helps you to build reliable programs. One of the ways it does that is by surfacing things to your attention that you really ought to care about. Think of the way we handle errors with Result: if some operation can fail, you can’t, ahem, fail to recognize that, because you have to account for the error case. And yet often the kinds of things you care about depend on the kind of application you are building. A classic example is memory allocation, which for many Rust apps is No Big Deal, but for others is something to be done carefully, and for still others is completely verboten. But this pattern crops up a lot. I’ve heard and like the framing of designing for “what do you have to pay attention to” – Rust currently aims for a balance that errs on the side of paying attention to more things, but tries to make them easy to manage. But this post is about a speculative idea of how we could do better than that by allowing programs to declare a profile.

Profiles declare what you want to pay attention to

The core idea is pretty simple. A profile would be declared, I think, in the Cargo.toml. Profiles would never change the semantics of your Rust code. You could always copy and paste code between Rust projects with different profiles and things would work the same. But it would adjust lint settings and errors. So if you copy code from a more lenient profile into your more stringent project, you might find that it gets warnings or errors it didn’t get before.

Primarily, this means lints

In effect, a profile would be a lot like a lint group. So if we have a profile for kernel development, this would turn on various lints that help to detect things that kernel developers really care about – unexpected memory allocation, potential panics – but other projects don’t. Much like Rust-for-linux’s existing klint project.

So why not just make it a lint group? Well, actually, maybe we should – but I thought Cargo.toml would be better because it would allow us to apply more stringent checks to what dependencies you use, which features they use, etc. For example, maybe dependencies could declare that some of their features are not well suited to certain profiles, and you would get a warning if your application winds up depending on them. I imagine you would select a profile when running cargo new.

Example: autoclone for Rc and Arc

Let’s give an example of how this might work. In Rust today, if you want to have many handles to the same value, you can use a reference counted type like Rc or Arc. But whenever you want to get a new handle to that value, you have to explicitly clone it:

let map: Rc<HashMap> = create_map();
let map2 = map.clone(); // 👈 Clone!

The idea of this clone is to call attention to the fact that custom code is executing here. This is not just a memcpy1. I’ve been grateful for this some of the time. For example, when optimizing a concurrent data structure, I really like knowing exactly when one of my reference counts is going to change. But a lot of the time, these calls to clone are just noise, and I wish I could just write let map2 = map and be done with it.
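To make the "custom code is executing here" point concrete, here is a minimal sketch of what that clone does today. The key/value types and the body of create_map are assumptions made up for illustration; the snippet above leaves them unspecified.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for the `create_map` in the snippet above.
fn create_map() -> Rc<HashMap<String, u32>> {
    let mut map = HashMap::new();
    map.insert("answer".to_string(), 42);
    Rc::new(map)
}

// Returns two handles to the same underlying map.
fn share() -> (Rc<HashMap<String, u32>>, Rc<HashMap<String, u32>>) {
    let map = create_map();
    // Not a memcpy and not a deep copy: this increments the reference
    // count and returns a second pointer to the same allocation.
    let map2 = Rc::clone(&map);
    (map, map2)
}
```

Both handles point at one allocation, so Rc::ptr_eq on the pair is true and the strong count is 2. That reference-count update is exactly the piece of "custom code" the explicit clone calls out.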

So what if we modify the compiler as follows. Today, when you move out from a variable, you effectively get an error if that is not the “last use” of the variable:

let a = v; // move out from `v` here...
...
read(&v); // 💥 ...so we get an error when we use `v`.

What if, instead, when you move out from a value and it is not the last use, we introduce an auto-clone operation? This may fail if the type is not auto-cloneable (e.g., a Vec), but for Rc, Arc, and other O(1) clone operations, it would be equivalent to v.clone(). We could designate which types can be auto-cloneable by extra marker traits, for example. This means that let a = v above would be equivalent to let a = v.clone().

Now, here comes the interesting part. When we introduce an auto-clone, we would also introduce a lint: implicit clone operation. In the higher-level profile, this lint would be allow-by-default, but in the profile for lower-level code, it would be deny-by-default, with an auto-fix to insert clone. Now when I’m editing my concurrent data structure, I still get to see the clone operations explicitly, but when I’m writing my application code, I don’t have to think about it.
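Here is a rough sketch of how the marker-trait idea could look if written by hand in today's Rust. The AutoClone trait and the auto_clone helper are hypothetical names I'm inventing for illustration. Nothing like them exists in the compiler or standard library, and the real design could differ a lot.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical marker trait: "cloning this is O(1), so the compiler
/// may insert the clone for you".
trait AutoClone: Clone {}

impl<T: ?Sized> AutoClone for Rc<T> {}
impl<T: ?Sized> AutoClone for Arc<T> {}
// Deliberately no impl for Vec<T>: cloning a vector is O(n).

/// Stands in for the clone the compiler would insert at a move that is
/// not the last use of the variable.
fn auto_clone<T: AutoClone>(value: &T) -> T {
    value.clone()
}

fn share_twice(v: Rc<String>) -> (Rc<String>, Rc<String>) {
    // Under the proposal you would write `let a = v;` here and keep using
    // `v` afterwards; this line is the desugared form.
    let a = auto_clone(&v);
    (a, v)
}
```

In the lower-level profile, the deny-by-default lint would fire at the spot where auto_clone is called here, and the auto-fix would rewrite let a = v into let a = v.clone().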

Example: dynamic dispatch with async trait

Here’s another example. Last year we spent a while exploring the ways that we can enable dynamic dispatch for traits that use async functions. We landed on a design that seemed like it hit a sweet spot. Most users could just use traits with async functions like normal, but they might get some implicit allocations. Users who cared could use other allocation strategies by being more explicit about things. (You can read about the design here.) But, as I described in my blog post The Soul of Rust, this design had a crucial flaw: although it was still possible to avoid allocation, it was no longer easy. This seemed to push Rust over the line from its current position as a systems language that can claim to be a true C alternative into a “just another higher-level language that can be made low-level if you program with care”.

But profiles seem to offer another alternative. We could go with our original design, but whenever the compiler inserted an adapter that might cause boxing to occur, it would issue a lint warning. In the higher-level profile, the warning would be allow-by-default, but in the lower-level profile, it would be deny-by-default.
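To show the kind of allocation such a lint would point at, here is a hand-written approximation of a boxing adapter. This is not the design from the linked post: the Fetch trait, the Constant type, and poll_once are names assumed for this sketch, and the trait spells out by hand the boxed future that the compiler would otherwise introduce implicitly.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written stand-in for a trait with an async fn that is usable
/// through `dyn`: the future is returned behind a box.
trait Fetch {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32> + '_>>;
}

struct Constant(u32);

impl Fetch for Constant {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32> + '_>> {
        // This heap allocation is what the lower-level profile would flag.
        Box::pin(async move { self.0 })
    }
}

/// Waker that does nothing; enough to poll a future that is ready at once.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the boxed future a single time through dynamic dispatch.
fn poll_once(f: &dyn Fetch) -> Option<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = f.fetch();
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

With the profile idea, application code would never see the Box::pin at all, while a kernel-style crate would get an error at the call through dyn and could pick a different allocation strategy.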

Example: panic effects or other capabilities

If you really want to go crazy, we can use annotations to signal various kinds of effects. For example, as one way to achieve panic safety, we might allow functions to be annotated with #[panics], signaling a function that might panic. Depending on the profile, this might require you to declare that the caller may panic (similar to how unsafe works now).

Depending how far we want to go here, we would ultimately have to integrate these kinds of checks more deeply into the type system. For example, if you have a fn-pointer, or a dyn Trait call, we would have to introduce “may panic” effects into the type system to be able to track that information (but we could be conservative and just assume calls by pointer may panic, for example). But we could likely still use profiles to control how much you as the caller choose to care.
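Since #[panics] is a hypothetical attribute, it cannot be written in compiling code today, but the distinction it would track is easy to show. In this small sketch (function names made up for illustration), the first function is the kind that would carry the annotation, and the second moves the failure into its return type instead.

```rust
/// May panic on an empty slice. Under the idea above, this is the kind of
/// function that would carry the hypothetical `#[panics]` annotation.
fn first_byte_or_panic(data: &[u8]) -> u8 {
    data[0]
}

/// Never panics: the "no first byte" case is part of the return type,
/// so callers in any profile can use it without extra declarations.
fn first_byte(data: &[u8]) -> Option<u8> {
    data.first().copied()
}
```

A stricter profile would make callers of first_byte_or_panic acknowledge the panic in some way, while a more lenient profile would let both functions be called freely.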

Changing the profile for a module or a function

Because profiles primarily address lints, we can also allow you to change the profile in a more narrow way. This could be done with lint groups (maybe each profile is a lint group), or perhaps with a #![profile] annotation.
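The closest thing available today is scoping lint levels with attributes, which gives a feel for how a per-module profile might read. This sketch uses the existing unused_must_use lint as a stand-in for a profile. The module names are assumptions, and a real #![profile] annotation does not exist.

```rust
// Stricter "profile" for this module: ignoring a Result is a hard error.
#[deny(unused_must_use)]
mod low_level {
    pub fn parse(s: &str) -> Result<u32, std::num::ParseIntError> {
        s.trim().parse()
    }

    pub fn parse_or_zero(s: &str) -> u32 {
        // Writing `parse(s);` on its own line would fail to compile here.
        parse(s).unwrap_or(0)
    }
}

// More lenient "profile": the same lint is silenced in this module.
#[allow(unused_must_use)]
mod high_level {
    pub fn warm_up() -> u32 {
        // Discarding the Result is accepted without a warning here.
        super::low_level::parse("not a number");
        super::low_level::parse_or_zero(" 42 ")
    }
}
```

Note that the code behaves identically under either lint level, which matches the principle that profiles change diagnostics and never semantics.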

Why I care: profiles could open up design space

So why am I writing about profiles? In short, I’m looking for opportunities to do the classic Rust thing of trying to have our cake and eat it too. I want Rust to be versatile, suitable for projects up and down the stack. I know that many projects contain hot spots or core bits of the code where the details matter quite a bit, and then large swaths of code where they don’t matter a jot. I’d like to have a Rust that feels closer to Swift that I can use most of the time, and then the ability to “dial up” the detail level for the code where I do care.

Conclusion: the core principles

I do want to emphasize that this idea is speculation. As far as I know, nobody else on the lang team is into this idea – most of them haven’t even heard about it!

I also am not hung up on the details. Maybe we can implement profiles with some well-named lint groups. Or maybe, as I proposed, it should go in Cargo.toml.

What I do care about are the core principles of what I am proposing:

Defining some small set of profiles for Rust applications that define the kinds of things you want to care about in that code.
- I think these should be global and not user-defined. This will allow profiles to work more smoothly across dependencies. Plus we can always allow user-defined profiles or something later if we want.
Profiles never change what code will do when it runs, but they can make code get more warnings or errors.
- You can always copy-and-paste code between applications without fear that it will behave differently (though it may not compile).
- You can always understand what Rust code will do without knowing the profile or context it is running in.
Profiles let us do more implicit things to ease ergonomics without making Rust inapplicable for other use cases.
- Looking at Aaron Turon’s classic post introducing the lang team’s Rust 2018 ergonomics initiative, profiles let users dial down the context dependence and applicability of any particular change.

Back in the early days of Rust, we debated a lot about what ought to be the rule for when clone was required. I think the current rule of “memcpy is quiet, everything else is not” is pretty decent, but it’s not ideal in a few ways. For example, an O(1) clone operation like incrementing a refcount is not the same as an O(n) operation like cloning a vector, and yet they look the same. Moreover, memcpy’ing a giant array (or Future) can be a real performance footgun (not to mention blowing up your stack), and yet we let you do that quite quietly. This is a good example of where profiles could help, I believe. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/09/30/profiles/

Polonius revisited, part 2

Sep 29, 2023 Updated Sep 29, 2023

In the previous Polonius post, we formulated the original borrow checker in a Polonius-like style. In this post, we are going to explore how we can extend that formulation to be flow-sensitive. In so doing, we will enable the original Polonius goals, but also overcome some of its shortcomings. I believe this formulation is also more amenable to efficient implementation. As I’ll cover at the end, though, I do find myself wondering if there’s still more room for improvement.

Running example

We will be working from the same Rust example as the original post, but focusing especially on the mutation in the false branch1:

let mut x = 22;
let mut y = 44;
let mut p: &'0 u32 = &x;
y += 1;
let mut q: &'1 u32 = &y; // Borrow `y` here (L1)
if something() {
    p = q;  // Store borrow into `p`
    x += 1;
} else {
    y += 1; // Mutate `y` on `false` branch
}
y += 1;
read_value(p); // May refer to `x` or `y`

There is no reason to have an error on the y += 1 line in the false branch. There is a borrow of y, but on the false branch that borrow is only stored in q, and q will never be read again. So there cannot be undefined behavior (UB).

Existing borrow checker flags an error

The existing borrow checker, however, is not that smart. It sees read_value(p) at the end and, because that line could potentially read x or y, it flags the y += 1 as an error. When expressed this way, maybe you can have some sympathy for the poor borrow checker – it’s not an unreasonable conclusion! But it’s wrong.

The core issue of the existing borrow check stems from its use of a flow insensitive subset graph. This in turn is related to how it does the type check. In Polonius today, each variable has a single type and hence a single origin (e.g., q: &'1 u32). This causes us to conflate all the possible loans that the variable may refer to throughout execution. And yet as we have seen, this information is actually flow dependent.

The borrow checker today is based on a pretty standard style of type checker applied to the MIR. Essentially there is an environment that maps each variable to a type.

Env  = { X -> Type }
Type = scalar | & 'Y T | ...

Then we have type-checking inference rules that thread this same environment everywhere. Conceptually the structure of the rules is as follows:

construct Env from local variable declarations
Env |- each basic block type checks
--------------------------
the MIR type checks

Type-checking a place then uses this Env, bottoming out in an inference rule like:

Env[X] = T
-------------
Env |- X : T

Flow-sensitive type check

The key thing that makes the borrow checker flow insensitive is that we use the same environment at all points. What if instead we had one environment per program point:

EnvAt = { Point -> Env }

Whenever we type check a statement at program point A, we will use EnvAt[A] as its environment. When program point A flows into point B, then the environment at A must be a subenvironment of the environment at B, which we write as EnvAt[A] <: EnvAt[B].

The subenvironment relationship Env1 <: Env2 holds if

for each variable X in Env2:
- X appears in Env1
- Env1[X] <: Env2[X]

There are two interesting things here. The first is that the set of variables can change over time. The idea is that once a variable goes dead, you can drop it from the environment. The second is that the type of the variable can change according to the subtyping rules.
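
To make the subenvironment rule concrete, here is a minimal sketch in Rust. It assumes a toy type language with only scalars and shared references, and the names Ty, Env, shared_ref, subtype, and subenv are all made up for illustration; none of this is how rustc or Polonius actually represents types. Instead of returning a plain yes/no, subenv returns the origin relations ('a : 'b) that must hold for Env1 <: Env2.

```rust
use std::collections::HashMap;

/// Toy type language (an assumption of this sketch): scalars and
/// shared references tagged with an origin name such as "'1".
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Scalar,
    Ref { origin: String, referent: Box<Ty> },
}

/// Env = { X -> Type }
type Env = HashMap<String, Ty>;

/// Hypothetical helper: builds the type `&'origin scalar`.
fn shared_ref(origin: &str) -> Ty {
    Ty::Ref { origin: origin.to_string(), referent: Box::new(Ty::Scalar) }
}

/// Records the `'a : 'b` relations required for `sub <: sup`.
/// Returns `None` if the two types do not even have the same shape.
fn subtype(sub: &Ty, sup: &Ty, out: &mut Vec<(String, String)>) -> Option<()> {
    match (sub, sup) {
        (Ty::Scalar, Ty::Scalar) => Some(()),
        (Ty::Ref { origin: a, referent: ra }, Ty::Ref { origin: b, referent: rb }) => {
            out.push((a.clone(), b.clone()));
            // Shared references are covariant in their referent.
            subtype(ra, rb, out)
        }
        _ => None,
    }
}

/// Env1 <: Env2: each variable in Env2 must appear in Env1 with a subtype.
/// Variables that appear only in Env1 (they went dead) are simply dropped.
fn subenv(env1: &Env, env2: &Env) -> Option<Vec<(String, String)>> {
    let mut out = Vec::new();
    for (var, sup) in env2 {
        let sub = env1.get(var)?;
        subtype(sub, sup, &mut out)?;
    }
    Some(out)
}
```

Note that the check only walks the variables of Env2, which is what lets a variable disappear from the environment once it is dead.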

You can think of flow-sensitive typing as if, for each program variable like q, we have a separate copy per program point, so q@A for point A and q@B for point B. When we flow from one point to another, we assign from q@A to q@B. Like any assignment, this would require the type of q@A to be a subtype of the type of q@B.

Flow-sensitive typing in our example

Let’s see how this idea of a flow-sensitive type check plays out for our example. First, recall the MIR for our example from the previous post:

flowchart TD
  Intro --> BB1
  Intro["let mut x: i32\nlet mut y: i32\nlet mut p: &'0 i32\nlet mut q: &'1 i32"]
  BB1["BB1:\np = &x;\ny = y + 1;\nq = &y;\nif something goto BB2 else BB3"]
  BB1 --> BB2
  BB1 --> BB3
  BB2["BB2\np = q;\nx = x + 1;\n"]
  BB3["BB3\ny = y + 1;"]
  BB2 --> BB4;
  BB3 --> BB4;
  BB4["BB4\ny = y + 1;\nread_value(p);\n"]

  classDef default text-align:left,fill-opacity:0;

One environment per program point

In the original, flow-insensitive type check, the first thing we did was to create origin variables ('0, '1) for each of the origins that appear in our types. You can see those variables in the chart above. So we effectively had an environment like

Env_flow_insensitive = {
    p: &'0 i32,
    q: &'1 i32,
}

But now we are going to have one environment per program point. There is one program point in between each MIR statement. So the point BB1_0 would be the entry to basic block BB1, and BB1_1 would be after the first statement. So we have Env_BB1_0, Env_BB1_1, etc. We are going to create distinct origin variables for each of them:

Env_BB1_0 = {
    p: &'0_BB1_0 i32,
    q: &'1_BB1_0 i32,
}

Env_BB1_1 = {
    p: &'0_BB1_1 i32,
    q: &'1_BB1_1 i32,
}

...

Type-checking the edge from BB1 to BB2

Let’s look at point BB1_3, which is the final line in BB1, which in MIR-speak is called the terminator. It is an if terminator (if something goto BB2 else BB3). To type-check it, we will take the environment on entry (Env_BB1_3) and require that it is a sub-environment of the environment on entry to the true branch (Env_BB2_0) and on entry to the false branch (Env_BB3_0).

Let’s start with the true branch. Here we have the environment Env_BB2_0:

Env_BB2_0 = {
    q: &'1_BB2_0 i32,
}

You should notice something curious here – why is there no entry for p? The reason is that the variable p is dead on entry to BB2, because its current value is about to be overridden. The type checker knows not to include dead variables in the environment.

This means that…

Env_BB1_3 <: Env_BB2_0 if the type of q at BB1_3 is a subtype of the type of q at BB2_0…
…so &'1_BB1_3 i32 <: &'1_BB2_0 i32 must hold…
…so '1_BB1_3 : '1_BB2_0 must hold.

What we just found then is that, because of the edge from BB1 to BB2, the version of '1 on exit from BB1 flows into '1 on entry to BB2.

Type-checking the p = q assignment

Let’s look at the assignment p = q. This occurs in statement BB2_0. We just saw the environment before it:

Env_BB2_0 = {
    q: &'1_BB2_0 i32,
}

For an assignment, we take the type of the left-hand side (p) from the environment after, because that is what we are storing into. The environment after is Env_BB2_1:

Env_BB2_1 = {
    p: &'0_BB2_1 i32,
}

And so to type check the statement, we get that &'1_BB2_0 i32 <: &'0_BB2_1 i32, or '1_BB2_0 : '0_BB2_1.

In addition to this relation from the assignment, we also have to make the environment Env_BB2_0 be a subenvironment of the env after Env_BB2_1. But since the sets of live variables are disjoint in this case, that doesn’t add anything to the picture.

Type-checking the edge from BB1 to BB3

As the final example, let’s look at the false edge from BB1 to BB3. On entry to BB3, the variable q is dead but p is not, so the environment looks like

Env_BB3_0 = {
    p: &'0_BB3_0 i32,
}

Following a similar process to before, we conclude that '0_BB1_3 : '0_BB3_0.

Building the flow-sensitive subset graph

We are now starting to see how we can build a flow-sensitive version of the subset graph. Instead of having one node in the graph per origin variable, we now have one node in the graph per origin variable per program point, and we create an edge N1 -> N2 between two nodes if the type check requires that N1 : N2, just as before. Basically the only difference is that we have a lot more nodes.

Putting together what we saw thus far, we can construct a subset graph for this program like the following. I’ve excluded nodes that correspond to dead variables – so for example there is no node '1_BB1_0, because '1 appears in the variable q, and q is dead at the start of the program.

flowchart TD
    subgraph "'0"
        N0_BB1_0["'0_BB1_0"]
        N0_BB1_1["'0_BB1_1"]
        N0_BB1_2["'0_BB1_2"]
        N0_BB1_3["'0_BB1_3"]
        N0_BB2_1["'0_BB2_1"]
        N0_BB3_0["'0_BB3_0"]
        N0_BB4_0["'0_BB4_0"]
        N0_BB4_1["'0_BB4_1"]
    end

    subgraph "'1"
        N1_BB1_2["'1_BB1_2"]
        N1_BB1_3["'1_BB1_3"]
        N1_BB2_0["'1_BB2_0"]
    end
    
    subgraph "Loans"
        L0["{L0} (&x)"]
        L1["{L1} (&y)"]
    end
    
    L0 --> N0_BB1_0
    L1 --> N1_BB1_2
    
    N0_BB1_0 --> N0_BB1_1 --> N0_BB1_2 --> N0_BB1_3
    N0_BB1_3 --> N0_BB3_0
    N0_BB3_0 --> N0_BB4_0 --> N0_BB4_1
    N0_BB2_1 --> N0_BB4_0

    N1_BB1_2 --> N1_BB1_3
    N1_BB1_3 --> N1_BB2_0
    
    N1_BB2_0 --> N0_BB2_1

Just as before, we can trace back from the node for a particular origin O to find all the loans contained within O. Only this time, the origin O also indicates a program point.

In particular, compare '0_BB3_0 (the data reachable from p on the false branch of the if) to '0_BB4_0 (the data reachable after the if finishes). We can see that in the first case, the origin can only reference L0, but afterwards, it could reference L1.
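
The trace-back described above can be sketched in a few lines of Rust. This is only an illustration: the edge list is copied by hand from the chart, the function name loans_in is made up, and a real implementation would not use strings for nodes. The sketch walks backwards from an origin node and collects every loan node it finds.

```rust
use std::collections::{BTreeSet, HashMap};

/// Subset-graph edges copied from the chart: `(from, to)` means `from : to`,
/// so every loan in `from` also flows into `to`.
const EDGES: &[(&str, &str)] = &[
    ("L0", "'0_BB1_0"),
    ("L1", "'1_BB1_2"),
    ("'0_BB1_0", "'0_BB1_1"),
    ("'0_BB1_1", "'0_BB1_2"),
    ("'0_BB1_2", "'0_BB1_3"),
    ("'0_BB1_3", "'0_BB3_0"),
    ("'0_BB3_0", "'0_BB4_0"),
    ("'0_BB4_0", "'0_BB4_1"),
    ("'0_BB2_1", "'0_BB4_0"),
    ("'1_BB1_2", "'1_BB1_3"),
    ("'1_BB1_3", "'1_BB2_0"),
    ("'1_BB2_0", "'0_BB2_1"),
];

/// Traces backwards from `origin` and returns the loans that can reach it.
/// (Hypothetical helper; loan nodes are recognized by their leading `L`.)
fn loans_in(origin: &str) -> BTreeSet<&'static str> {
    // Predecessor map: node -> nodes with an edge into it.
    let mut preds: HashMap<&str, Vec<&'static str>> = HashMap::new();
    for &(from, to) in EDGES {
        preds.entry(to).or_default().push(from);
    }

    let mut seen = BTreeSet::new();
    let mut stack = vec![origin];
    let mut loans = BTreeSet::new();
    while let Some(node) = stack.pop() {
        if !seen.insert(node) {
            continue; // already visited
        }
        for &pred in preds.get(node).into_iter().flatten() {
            if pred.starts_with('L') {
                loans.insert(pred);
            } else {
                stack.push(pred);
            }
        }
    }
    loans
}
```

Calling loans_in("'0_BB3_0") and loans_in("'0_BB4_0") reproduces the comparison above: the first set contains only L0, the second contains both L0 and L1.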

Active loans

Just as described in the previous post, to complete the analysis we compute the active loans. Active loans are defined in almost exactly the same way, but with one twist. A loan L is active at a program point P if there is a path from the borrow that created L to P where, for each point along the path…

there is some live variable whose type at P may reference the loan; and,
the place expression that was borrowed by L (here, x) is not reassigned at P.

See the phrase “whose type at P”? We are now taking into account the fact that the type of the variable can change along the path. In particular, it may reference distinct origins.

Implementing using dataflow

Just as in the previous post, we can compute active loans using dataflow. In particular, we gen a loan when it is issued, and we kill a loan L at a point P if (a) there are no live variables whose origins contain L or (b) the path borrowed by L is assigned at P.
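
As a rough sketch of what the transfer function for a single program point might look like, here is some Rust. Everything in it is an assumption made for illustration: the PointFacts struct and its field names are hypothetical, loans are plain strings, and the set of loans reachable from live variables is assumed to have been computed from the subset graph beforehand. Applying the kills before the gens is also just a choice made for this sketch.

```rust
use std::collections::BTreeSet;

type Loan = &'static str;

/// Facts the dataflow needs at one program point (hypothetical names).
struct PointFacts {
    /// Loans issued by a borrow at this point.
    issued: Vec<Loan>,
    /// Loans contained in the origins of some variable live at this point.
    live_loans: Vec<Loan>,
    /// Loans whose borrowed place is assigned at this point.
    overwritten: Vec<Loan>,
}

/// Computes the active loans after a point from the active loans before it.
fn transfer(active_in: &BTreeSet<Loan>, facts: &PointFacts) -> BTreeSet<Loan> {
    let mut active = active_in.clone();
    // Kill (a): no live variable has an origin containing the loan.
    active.retain(|loan| facts.live_loans.contains(loan));
    // Kill (b): the place borrowed by the loan is assigned here.
    active.retain(|loan| !facts.overwritten.contains(loan));
    // Gen: loans issued at this point become active.
    active.extend(facts.issued.iter().copied());
    active
}
```

At a join point in the control-flow graph, the incoming active sets would be unioned before applying transfer, as in any forward "may" dataflow analysis.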

Applying this to our running example

When we apply this to our running example, the unnecessary error on the false branch of the if goes away. Let’s walk through it.

Entry block

In BB1, we gen L0 and L1 at their two borrow sites, respectively. As a result, the active loans on exit from BB1 will be {L0, L1}:

flowchart TD
  Start["..."]
  BB1["BB1:
       p = &x; // Gen: L0
       y = y + 1;
       q = &y; // Gen: L1
       if something goto BB2 else BB3
  "]
  BB2["..."]
  BB3["..."]
  BB4["..."]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB3 highlight

The false branch of the if

On the false branch of the if (BB3), the only live reference is p, which will be used later on in BB4. In particular, q is dead.

In the flow insensitive version, when the borrow checker looked at the type of p, it was p: &'0 i32, and '0 had the value {L0, L1}, so the borrow checker concluded that both loans were active.

But in the flow sensitive version we are looking at now, the type of p on entry to BB3 is p: &'0_BB3_0 i32. And, consulting the subset graph shown earlier in this post, the value of '0_BB3_0 is just {L0}. So there is a kill for L1 on entry to the block. This means that the only active loan is L0, which borrows x. This in turn means that y = y + 1 is not an error.

flowchart TD
  Start["
    ...
  "]
  BB1["
      BB1:
      p = &x; // Gen: L0
      ...
      q = &y; // Gen: L1
      ...
  "]
  BB2["
      BB2:
      ...
  "]
  BB3["
      BB3:
      // Kill `L1` (no live references)
      // Active loans: {L0}
      y = y + 1;
  "]
  BB4["
      BB4:
      ...
      read_value(p); // later use of `p`
  "]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB3 highlight

The role of invariance: vec-push-ref

I didn’t highlight it before, but invariance plays a really interesting role in this analysis. Let’s see another example, a simplified version of vec-push-ref from polonius:

let v: Vec<&'v u32>;
let p: &'p mut Vec<&'vp u32>;
let x: u32;

/* P0 */ v = vec![];
/* P1 */ p = &mut v; // Loan L0
/* P2 */ x += 1; // <-- Expect NO error here.
/* P3 */ p.push(&x); // Loan L1
/* P4 */ x += 1; // <-- 💥 Expect an error here!
/* P5 */ drop(v);

What makes this interesting? We create a reference p at point P1 that points at v. We then insert a borrow of x into the reference p. After that point, the reference p is dead, but the loan L1 is still active – this is because it is also stored in v. This connection between p and v is what is key about this example.

The way that this connection is reflected in the type system is through variance. In particular, a type &mut T is invariant with respect to T. This means that when you assign one reference to another, the type that they reference must be exactly the same.

In terms of the subset graph, invariance works out to creating bidirectional edges between origins. Take a look at the resulting subset graph to see what I mean. To keep things simple, I am going to exclude nodes for p: the interesting origins here are 'v (the data in the vector v) and 'vp (the data in the vector referenced by p – which is also v).

flowchart TD
    subgraph "Loans"
      L1["L1 (&x)"]
    end
    
    subgraph "'v"
      V_P0["'v_P0"]
      V_P1["'v_P1"]
      V_P2["'v_P2"]
      V_P3["'v_P3"]
      V_P4["'v_P4"]
      V_P5["'v_P5"]
    end

    subgraph "'vp"
      VP_P1["'vp_P1"]
      VP_P2["'vp_P2"]
      VP_P3["'vp_P3"]
    end

    V_P0 --> V_P1 --> V_P2 --> V_P3 --> V_P4 --> V_P5
    
    V_P1 <---> VP_P1
    VP_P1 <---> VP_P2 <---> VP_P3
        
    L1 --> VP_P3

The key parts here are the bidirectional arrows between 'v_P1 and 'vp_P1 and between 'vp_P1 and 'vp_P3. How did those come about?

The first edge resulted from p = &mut v. The type of v (at P1) is Vec<&'v_P1 u32>, and that type had to be equal to the referent of p (Vec<&'vp_P1 u32>). Since the types must be equal, that means 'v_P1: 'vp_P1 and vice versa, hence a bidirectional arrow.
The second edge resulted from the flow from P1 to P3. The variable p is live across that edge, so its type before (&'p_P1 mut Vec<&'vp_P1 u32>) must be a subtype of its type after (&'p_P3 mut Vec<&'vp_P3 u32>). Because &mut references are invariant with respect to their referent types, this implies that 'vp_P1 and 'vp_P3 must be equal.

Putting it all together, we see that L1 can reach 'v_P4 and 'v_P5, even though it only flowed into an earlier point in the graph. That’s cool! We will get the error we expect.

On the other hand, we can also see that there is some imprecision introduced through invariance. The loan L1 is introduced at point P3, and yet it appears to flow from 'vp_P3 backwards in time to 'vp_P2, 'vp_P1, over to 'v_P1, and downward from there. If we were only looking at the subset graph, then, we would conclude that both x += 1 statements in this program are illegal, but in fact only the second one causes a problem.
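
Here is a small Rust sketch of the graph above that shows both effects at once. The helper names (relate_sub, relate_eq, vec_push_ref_graph, reachable_from) are invented for this example, and the edges are transcribed by hand from the chart. A covariant relation adds one edge; an invariant relation adds edges in both directions.

```rust
use std::collections::BTreeSet;

/// Directed subset-graph edges; `(a, b)` means `a : b`.
type Edges = Vec<(&'static str, &'static str)>;

/// Covariant position: a single edge from `a` to `b`.
fn relate_sub(edges: &mut Edges, a: &'static str, b: &'static str) {
    edges.push((a, b));
}

/// Invariant position: the origins must be equal, so add edges both ways.
fn relate_eq(edges: &mut Edges, a: &'static str, b: &'static str) {
    edges.push((a, b));
    edges.push((b, a));
}

/// The vec-push-ref graph from the chart (nodes for `'p` omitted).
fn vec_push_ref_graph() -> Edges {
    let mut edges = Edges::new();
    let v_chain = ["'v_P0", "'v_P1", "'v_P2", "'v_P3", "'v_P4", "'v_P5"];
    for pair in v_chain.windows(2) {
        relate_sub(&mut edges, pair[0], pair[1]); // `v` stays live, flows forward
    }
    relate_eq(&mut edges, "'v_P1", "'vp_P1"); // p = &mut v
    relate_eq(&mut edges, "'vp_P1", "'vp_P2"); // p live from P1 to P2
    relate_eq(&mut edges, "'vp_P2", "'vp_P3"); // p live from P2 to P3
    relate_sub(&mut edges, "L1", "'vp_P3"); // p.push(&x)
    edges
}

/// All nodes reachable from `start` by following edges forwards.
fn reachable_from(edges: &Edges, start: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(node) = stack.pop() {
        if seen.insert(node) {
            stack.extend(edges.iter().filter(|e| e.0 == node).map(|e| e.1));
        }
    }
    seen
}
```

Starting from L1, the reachable set includes 'v_P4 and 'v_P5 (the error we want) but also 'v_P2, which is exactly the backwards-in-time imprecision described above.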

Active loans to the rescue (again)

The imprecision we see here is very similar to the imprecision we saw in the original polonius. Effectively, invariance is taking away some of our flow sensitivity. Interestingly, the active loans portion of the analysis makes up for this, in the same way that it did in the previous post. In vec-push-ref, L1 will only be generated at P3, so even though it can reach 'v_P2 via the subset graph, it is not considered active at P2. But once it is generated, it is not killed, even when p goes dead, because it can flow into 'v_P4. Therefore we get the one error we expect.

Conclusion

I’m going to stop this post here. I’ve described a version of polonius where we give variables distinct types at each program point and then relate those types together to create an improved subset graph. This graph increases the precision of the active loans analysis such that we don’t get as many false errors, but it is still imprecise in some ways.

I think this formulation is interesting for a few reasons. First, the most expensive part of it is going to be the subset graph, which has a LOT of nodes and edges. But that can be compressed significantly with some simple heuristics. Moreover, the core operation we perform on that graph is reachability, and that can be implemented quite efficiently as well (do a strongly connected components computation to reduce the graph to a tree, and then you can assign pre- and post-orderings and just compare indices). So I believe it could scale in practice.
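
To illustrate the "compare indices" trick, here is a minimal Rust sketch. It assumes the strongly connected components have already been collapsed and that the result really is a tree, as described above; the TreeOrder type and its methods are hypothetical names. Each node gets a pre-order and a post-order index from one depth-first walk, and after that a reachability query is just two integer comparisons.

```rust
/// Pre- and post-order indices for every node of a rooted tree.
struct TreeOrder {
    pre: Vec<usize>,
    post: Vec<usize>,
}

impl TreeOrder {
    /// `children[n]` lists the children of node `n`; `root` must be a valid index.
    fn new(children: &[Vec<usize>], root: usize) -> Self {
        let mut order = TreeOrder {
            pre: vec![0; children.len()],
            post: vec![0; children.len()],
        };
        let mut clock = 0;
        order.visit(children, root, &mut clock);
        order
    }

    /// Depth-first walk that stamps each node on entry and on exit.
    fn visit(&mut self, children: &[Vec<usize>], node: usize, clock: &mut usize) {
        self.pre[node] = *clock;
        *clock += 1;
        for &child in &children[node] {
            self.visit(children, child, clock);
        }
        self.post[node] = *clock;
        *clock += 1;
    }

    /// True if `b` is `a` itself or a descendant of `a`, i.e. `a` reaches `b`.
    fn reaches(&self, a: usize, b: usize) -> bool {
        self.pre[a] <= self.pre[b] && self.post[b] <= self.post[a]
    }
}
```

The numbering costs one linear pass, and each query afterwards is constant time, which is why this style of check is attractive when the graph has a very large number of nodes.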

I have worked through a few more classic examples, and I may come back to them in future posts; so far this analysis seems to get the results I expect. However, I would also like to go back and compare it more deeply to the original polonius, as well as to some of the formulations that came out of academia. There is still something odd about leaning on the dataflow check. I hope to talk about some of that in follow-up posts (or perhaps on Zulip or elsewhere with some of you readers!).

If this particular example feels artificial, that’s because it is. But similar imprecision causes more common errors, most notably Problem Case #3.

https://smallcultfollowing.com/babysteps/blog/2023/09/29/polonius-part-2/

Empathy in open source: be gentle with each other

Sep 27, 2023 Updated Sep 27, 2023


Over the last few weeks I had been preparing a talk on “Inclusive Mentoring: Mentoring Across Differences” with one of my good friends at Amazon. Unfortunately, that talk got canceled because I came down with COVID when we were supposed to be presenting. But the themes we covered in the talk have been rattling in my brain ever since, and suddenly I’m seeing them everywhere. One of the big ones was about empathy — what it is, what it isn’t, and how you can practice it. Now that I’m thinking about it, I see empathy so often in open source.

What empathy is

In her book Atlas of the Heart1, Brené Brown defines empathy as

an emotional skill set that allows us to understand what someone is experiencing and to reflect back that understanding.

Empathy is not about being nice or making the other person feel good or even feel better2. Being empathetic means understanding what the other person feels and then showing them that you understand.

Understanding what the other person feels doesn’t mean you have to feel the same way. It also doesn’t mean you have to agree with them, or feel that they are “justified” in those feelings. In fact, as I’ll explain in a second, strong feelings and emotion are by design limited in their viewpoints — they are always showing us something, and showing us something real, but they are never showing us the full picture.

Usually we feel multiple, seemingly contradictory things, which can leave everything feeling like a big muddle. The goal, from what I can see, is to be able to pull those multiple feelings apart, understand them, and then – from a balanced place – decide how we are going to react to them. Hopefully in real time. Pretty damn hard, in my experience, but something we can get better at.

People are not any one thing

Some time back, Aaron Turon introduced me to Internal Family Systems through the book Self Therapy 3. It’s really had a big influence on how I think about things. The super short version of IFS is “Inside Out is real”. We are each composites of a number of independent parts which capture pieces of our personality. When we are feeling balanced and whole, we are switching between these parts all the time in reaction to what is going on around us.

But sometimes things go awry. Sometimes, one part will get very alarmed about what it perceives to be happening, and it will take complete control of you. This is called blending. While you are blended, the part is doing its best to help you in the ways that it knows: that might mean making you super anxious, so that you identify risks, or it might mean making you yell at people, so that they will go away and you don’t have to risk them letting you down. No matter which part you are blended with in the moment, though, you lose access to your whole self and your full range of capabilities. Even though the part will help you solve the immediate problem, it often does so in ways that create other problems down the line.

This concept of parts has really helped me to understand myself, but it has also helped me to understand what previously seemed like contradictory behavior in other people. The reason that people sometimes act in extreme ways, ways that seem so different from the person I know at other times, is because they’re blended — they’re not the person I know at that time, they’re just one part of that person. And probably a part that has helped them through some tough times in the past.

Empathy as “holding space”

I’ve often heard the term ‘emotional labor’ and, to be honest, I had a hard time connecting to it. But in Lama Rod Owens’s “Love and Rage”, he talks about emotional labor in terms of “the work we do to help people process their emotions” and, in particular, gives this list of examples:

This includes actively listening to others, asking how people are feeling, checking in with them, letting them vent in front of you, and not reacting to someone when they are being rude or disrespectful.

Now this list struck a chord with me. To me, the hardest part of empathy is holding space — letting someone have a reaction or a feeling without turning away. When people are reacting in an extreme way — whether it’s venting or being rude — it makes us uncomfortable, and often we’ll try to make them stop. This can take many forms. It could mean changing the topic, dismissing it (“get over it”, “I’m sure they didn’t mean it like that”), or trying to fix it (“what you need to do is…”, “let’s go kick their ass!”) For me, when people do that, it makes me feel unseen and kind of upset. Even if the other person is getting righteously angry on my behalf, I feel like suddenly the situation isn’t about me and how I want to think about things.

What does all this have to do with GitHub?

At this point you might be wondering “what do obscure therapeutic processes and Buddhist philosophy have to do with GitHub issue threads?” Take another look at Lama Rod Owens’s list of examples of emotional labor, especially the last one:

not reacting to someone when they are being rude or disrespectful

To be frank, being an open-source maintainer means taking a lot of shit4. In his insightful and widely discussed talk “The Hard Parts of Open Source”, Evan Czaplicki identified many of the “failure modes” of open source comment threads. One very memorable pattern is the “Why don’t you just…” comment, where somebody chimes in with an obvious alternative, as if you hadn’t thought of it. There is also my personal favorite, what I’ll call the “double agent” comment, where someone seems to feel that your goal is actually to ruin the project you’ve put so much effort into, and so comes in hot and angry.

My goal is always to respond to comments as if the commenter had been constructive and polite, or was my best friend. I don’t always achieve my goal, especially in forums where I have to respond quickly5. But I honestly do try. One technique is to find the key points in their comment and rephrase them, to be sure you understand, and then give your take. When I do that, I usually learn things — even when I initially thought somebody was just a blowhard, there is often a strong point underlying their argument, and it may lead me to change course if I listen to it. If nothing else, it’s always good to know the counterarguments in depth.

Empathy as a maintainer

And this brings us to the role of empathy as an open-source maintainer. As I said, these days, I see it popping up everywhere. To start, the idea of responding to someone’s comment, even one that feels rude, by identifying the key points they are trying to make feels to me like empathy, even if those points are often highly technical6. Fundamentally, empathy is all about understanding the other person and letting them know you understand, and that is what I am trying to do here.

But empathy comes into play in a more meta way as well. Trying to think how somebody feels — and why they might be feeling that way — can really help me to step back from feeling angry or injured by the tone of a comment and instead to refocus on what they are trying to communicate to me. Aaron Turon wrote a truly insightful and honest series of posts about his perspective on this called Listening and Trust. In part 3 of that series, he identified some of the key contributors to comment threads that go off the rails, what he called “momentum, urgency, and fatigue”. It’s worth reading that post, or reading it again if you already have. It’s a masterpiece of looking past the immediate reactions to understand better what’s going on, both within others and yourself.

Empathy when we surprise people

When Apple is working on a new product, they keep it absolutely top secret until they are ready – and then they tell the world, hoping for a big splash. This works for them. In open source, though, it’s an anti-pattern. The last thing you want to do is to surprise people – that’s a great way to trigger those parts we were talking about.

The difference, I think, is that open source projects are community projects – everybody feels some degree of ownership. That’s a big part of what makes open source so great! But, at the same time, when somebody starts messing with your stuff, that’s sure to get you upset. Paul Ford wrote an article identifying this feeling, which he called “Why wasn’t I consulted?”.

I find the phrase “Why wasn’t I consulted?” a pretty useful reminder for how it feels, but to be honest I’ve never liked it. The problem is that to me it feels condescending. But I totally get the way that people feel. It doesn’t always mean I think they’re right, or even justified in that feeling. But I get it, and I respect it. Heck, I feel it too!7

My personal creed these days is to be as open and transparent as I can with what I am doing and why. It’s part of why I love having this blog, since it lets me post up early ideas while I am still thinking about them. This also means I can start to get input and feedback. I don’t always listen to that feedback. A lot of times, people hate the things I am talking about, and they’re not shy about saying so – I try to take that as a signal, but just one signal of many. If people are upset, I’m probably doing something wrong, but it may not be the idea, it may be the way I am talking about it, or some particular aspect of it.

Empathy when we design our project processes

As I prepared this blog post, I re-read Aaron’s Listening and Trust, and I was struck again by how many insights he had there. One of them was that by applying empathy, and looking at our processes through the lens of how it feels to be a participant – what concerns get triggered – we can make changes so that everyone feels more included and less worn down. The key part here is that we have to look not only at how things feel for ourselves, but also at how they feel for the participants – and for those who are not yet participating! There’s a huge swath of people who do not join in on Rust discussions, and I think we’re really missing out. This kind of design isn’t easy, but it’s crucial.

Empathy as a contributor

I’ve focused a lot on the role of empathy as an open-source maintainer. But empathy absolutely comes into play as a contributor. There’s a lot said on how people behave differently when commenting on the internet versus in person, and how the tone of a text comment can so easily be misread.

The fact is, when you contribute to an open-source project, the maintainers are going to come up short. They’re going to overlook things. They may not respond promptly to your comment or PR – they’re likely going to hide their head in the sand because they’re overwhelmed.8 Or they may snap at you.

So what do you do when people let you down? I think the best thing is to speak for your feelings, but to do so in an empathetic way. If you are feeling hurt, don’t leave an angry comment. This doesn’t mean you have to silence your feelings – but just own them as your feelings. “Hey, I get that you are busy. Still, when I open a PR and nobody answers, it feels like this contribution is not wanted. If that’s true, just tell me, I can go elsewhere.”9

I bet some of you, when you read that last comment, were like “oh, heck no”. It’s scary to talk about how you feel. It takes a lot of courage. But it’s effective – and it can help the maintainer get unblended from whatever part they are in and think about things from your perspective. Maybe they will answer, “No, I really want this change, but I am just super busy right now, can you give me 3 months?” Or maybe they will say, “Actually, you’re right, I am not sure this is the right direction. I’m sorry that I didn’t say so before you put so much work into it.” Or maybe they won’t answer at all, because they’re hiding from the GitHub issue thread – but when they come back and read it much later, they’ll reflect on how that made you feel, and try to be more prompt the next time. Either way, you know that you spoke up for yourself, but did so in a way that they can hear.

Empathy for ourselves and our own parts

This brings me to my final topic. No matter what role we play in an open-source project, or in life, the most important person to have empathy for is yourself. Ironically, this is often the hardest. We usually have very high expectations for ourselves, and we don’t cut ourselves much slack. As a maintainer, this might manifest as feeling you have to respond to every comment or task, and feeling bad when you don’t keep up. As a contributor, it might be feeling crappy when people point out bugs in your PR. No matter who we are, it might be kicking ourselves and feeling shame when we overreact in a comment.

In my view, shame is basically never good. Of course I make mistakes, and I regret them. But when I feel shame about them, I am actually focusing inward, focusing on my own mistakes instead of focusing on how I can make it up to the other person or resolve my predicament. It doesn’t actually do anyone any good.

I think there are different ways to experience shame. I know how I experience it. It feels like one of my parts is kicking the crap out of itself. And that really hurts. It hurts so bad that it tends to cause other parts to rise up to try and make it stop. That might be by getting angry at others — “it’s their fault we screwed up!” — or, more common for me, it might be by feeling depressed, withdrawing, and perhaps focusing on some technical project that can make me feel good about myself.

In their classic and highly recommended blog post, My FOSS Story, Andrew Gallant talked about how they deal with an overflowing inbox full of issues, feature requests, and comments:

The solution that I’ve adopted for this phenomenon is one that I’ve used extremely effectively in my personal life: establish boundaries. Courteously but firmly setting boundaries is one of those magical life hacks that pays dividends once you figure out how to do it. If you don’t know how to do it, then I’m not sure exactly how to learn how to do it unfortunately. But setting boundaries lets you focus on what’s important to you and not what’s important to others.

It can be really easy to overextend yourself in an open-source project. This could mean, as a maintainer, feeling you have to respond to every comment, fix every bug. Overextending yourself in turn is a great way to become blended with a part, and start acting out some of those older, defensive strategies you have for dealing with stress.

Also, I’ve got bad news. You are going to screw up in some way. It might be overextending yourself10. It might be responding poorly. Or pushing for an idea that turns out to be very deeply wrong. When you do that, you have a choice. You can feel shame, or you can extend compassion and empathy to yourself. It’s ok. Mistakes happen. They are how we learn.

Once you’ve gotten past the shame, and realized that making mistakes doesn’t make you bad, you can start to think about repair. OK, so you messed up. What can you do about it? Maybe nothing is needed. Or maybe you need to go and undo some of what you did. Or maybe you have to go and tell some people that what they are doing is not ok. Either way, compassion and empathy for yourself is how you will get there.

On the limits of my own experience

Before I go, I want to take a moment to acknowledge the limits of my own experience. I am a cis, white male, and I think in this post it shows. When I encounter antipathy, it tends to be targeted at individual things I have done or ideas I am espousing. At most, it might come about because of the role I am playing. I don’t encounter conscious or unconscious bias on the basis of my race, gender, sexual orientation, or any other such thing. This gives me a lot of luxury. For example, for the most part, I can take a rude comment and I can usually find an underlying technical point to focus on in my response. This is not true for all maintainers. In writing this post, I thought a lot about how the dynamics of open source seem almost perfectly designed11 to be exclusive to people who are not from groups deemed “high status” by society.

Rust has a pretty uneven track record here. There are projects that do better. Improving our processes to take better account of how they feel for participants is definitely a necessary step, along with other things. One thing I am convinced of: the more people that get involved in Rust – and especially the more distinct backgrounds and experiences those people have – the better it becomes. Rust is always trying to achieve 6 (previously) impossible things before breakfast, and we need all the ideas we can get.12

Be gentle with each other

If I could have just one wish, it would be this bastardized quote from the great Bill and Ted:

Be gentle with each other

We’ve talked a lot about empathy and how it comes into play, but really, in my mind, it all boils down to being gentle when somebody slips up. Note that being gentle doesn’t mean you can’t also be real and authentic about how you felt. We talked earlier about I-messages – by speaking plainly about how somebody made you feel, you can deliver a message that is both gentle and yet incredibly powerful. To me, the key is not to make assumptions about what’s going on for other people. You can never know their motivations. You can make guesses, but they’re always based on incomplete information.

Does this mean I think we should all go running around saying “when you do X, I felt like you were trying to ruin the project?” Well, not really, although I think that would be an improvement. Even better though would be to stop and think, wait, why would they be trying to ruin the project? Instead of assuming what other people are doing, tell them how they are making you feel. Maybe say, “when you do X, I feel like you are saying my use case doesn’t matter”. Or, better yet, say “when you do X, I will no longer be able to do Y, which I find really valuable”. I predict this is much more likely to lead to a constructive discussion.

It’s important to remember that the choice of words can have strong impact, too. For me, words like ruin or phrases like dumpster fire, shitshow, etc, can be quite triggering all on their own. I’m not always consistent on this. I’ve noticed that I sometimes use strong, colorful language because I think it’s funny. But I’ve also noticed that when other people do it, I can get pretty upset (“I know that code is not the best, but it’s worked for the last 3 years dang it.”).

I think you can boil all of this down to be precise and accurate when you communicate. It’s not accurate to say “you are trying to ruin the project”. You can’t know that. It is accurate to talk about what you feel and why you feel it. It’s also not accurate to say something is a dumpster fire, but it is accurate to call out shortcomings and concerns.

Anyway, I’m done giving advice. I’m no expert here, just one more person trying to learn and do the best I can. What I can say with confidence is that the things I’m talking about here have really helped me personally in approaching difficult situations in my life, and I hope that they’ll help some of you too!

I bought this book when it first came out, read a bit of it, and then thought of it more as a reference — a great book for getting clear, distinguished definitions that help to elucidate the subtleties of human emotion. But when I revisited it to prepare for this talk, I was surprised to find it was much more “front-to-back” readable than I thought, and carried a lot of hidden wisdom. ↩︎
Though I think people feeling good and better is always a consequence of having encountered someone else empathetic. ↩︎
By none other than Jay Earley, inventor of the Earley parser! This guy is my hero. ↩︎
And I say this as a cis white man, which means I don’t even have to deal with shit resulting from people’s conscious or unconscious bias. ↩︎
This is one reason I don’t personally like fast moving threads and discussions, and I often limit the venues where I will participate. I need a bit of time to sit with things and process them. ↩︎
It’s worth highlighting that the key points they are trying to make are not always technical. Re-reading Aaron Turon’s Listening and Trust posts for this series, I was reminded of glaebhoerl’s pivotal comment that articulated very well their frustration at the Rust maintainer’s sense of entitlement and superiority, and the reasons for it. As glaebhoerl identified so clearly, it wasn’t so much the technical decision that was the problem — though I think on balance it was the wrong call, it was a debatable point — as the manner of engagement. ↩︎
Like when Disney canceled Owl House without even asking me. WHAT GIVES DISNEY. ↩︎
For example, I’ve been ignoring messages in the Salsa Zulip for a bit, and feeling bad about how I just don’t have the time to focus on that project right now. I’m sorry y’all and I do still expect to come back to Salsa 2022 (which, alas, will clearly not ship in 2022 – ah well, I knew the risks when I put a year into the name). ↩︎
This structure, “when you do X, I feel Y”, is called an I-message. It’s surprisingly hard to do it right. It’s easy to make something that sounds like an I-message, but isn’t. For example, “When you closed this PR without commenting, it showed me I am not welcome here” is very different from “When you closed this PR without commenting, it made me feel like I am not welcome here”. The first one is not an I-message. It’s telling someone else how they feel. The second one is telling someone else how they made you feel. There’s a very good chance those two statements would land quite differently. ↩︎
Unless, perhaps, you are Andrew Gallant, who from what I can see is one supremely well balanced individual. :) ↩︎
This of course is what people mean when they talk about systemic racism, or at least how I understand it: it’s not that open source or most other things were designed intentionally to reinforce bias, but the structures of our society are setup so that if you don’t actively work to counteract bias, you wind up playing into it. ↩︎
I always think of Jessica Lord’s inspirational blog post Privilege, Community, and Open source, which sadly appears to be offline, but you can read it on the web-archive. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/09/27/empathy-in-open-source/

Polonius revisited, part 1

Sep 22, 2023 Updated Sep 22, 2023


lqd has been doing awesome work driving progress on polonius. He’s authoring an update for Inside Rust, but the TL;DR is that, with his latest PR, we’ve reimplemented the traditional Rust borrow checker in a more polonius-like style. We are working to iron out the last few performance hiccups and thinking about replacing the existing borrow checker with this new re-implementation, which is effectively a no-op from a user’s perspective (including from a performance perspective). This blog post walks through that work, describing how the new analysis works at a high-level. I plan to write some follow-up posts diving into how we can extend this analysis to be more precise (while hopefully remaining efficient).

What is Polonius?

Polonius is one of those long-running projects that are finally starting to move again. From an end user’s perspective, the key goal is that we want to accept functions like so-called Problem Case #3, which was originally a goal of NLL but eventually cut from the deliverable. From my perspective, though, I’m most excited about Polonius as a stepping stone towards an analysis that can support internal references and self borrows.

Polonius began its life as an alternative formulation of the borrow checker rules defined in Datalog. The key idea is to switch the way we do the analysis. Whereas NLL thinks of 'r as a lifetime consisting of a set of program points, in polonius, we call 'r an origin containing a set of loans. In other words, rather than tracking the parts of the program where a reference will be used, we track the places that the reference may have come from. For deeper coverage of Polonius, I recommend my talk at Rust Belt Rust from (egads) 2019 (slides here).

Running example

In order to explain the analyses, I’m going to use this running example. One thing you’ll note is that the lifetimes/origins in the example are written as numbers, like '0 and '1. This is because, when we start the borrow check, we haven’t computed lifetimes/origins yet – that is the job of the borrow check! So, we first go and create synthetic inference variables (just like an algebraic variable) to use as placeholders throughout the computation. Once we’re all done, we’ll have actual values we could plug in for them – in the case of polonius, those values are sets of loans (each loan is a & expression, more or less, that appears somewhere in the program).

Here is our example. It contains two loans, L0 and L1, of x and y respectively. There are also four assignments:

let mut x = 22;
let mut y = 44;
let mut p: &'0 u32 = &x; // Loan L0, borrowing `x`
y += 1;                  // (A) Mutate `y` -- is this ok?
let mut q: &'1 u32 = &y; // Loan L1, borrowing `y`
if something() {
    p = q;               // `p` now points at `y`
    x += 1;              // (B) Mutate `x` -- is this ok?
} else {
    y += 1;              // (C) Mutate `y` -- is this ok?
}
y += 1;                  // (D) Mutate `y` -- is this ok?
read_value(p);           // use `p` again here

Today in Rust, we get two errors (C and D). If you were to run this example with MiniRust, though, you would find that only D can actually cause Undefined Behavior. At point C, we mutate y, but the only variable that references y is q, and it will never be used again. The borrow checker today reports an error because it’s overly conservative. Polonius, on the other hand, gets that case correct.

Location | Existing borrow checker | Polonius | MiniRust
A | ✔️ | ✔️ | OK
B | ✔️ | ✔️ | OK
C | ❌ | ✔️ | OK
D | ❌ | ❌ | Can cause UB, if true branch is taken

Reformulating the existing borrow check à la polonius

This blog post is going to describe the existing borrow checker, but reformulated in a polonius-like style. This will make it easier to see how polonius is different in the next post. The idea of doing this reformulation came about when implementing the borrow checker in a-mir-formality 1. At first, we weren’t sure if it was equivalent, but lqd verified it experimentally by testing it against the rustc test suite, where it matches the behavior 100% (lqd is also going to test against crater).

The borrow check analysis is a combination of three things, which we will cover in turn:

flowchart TD
  ConstructMIR --> LiveVariable
  ConstructMIR --> OutlivesGraph
  LiveVariable --> LiveLoanDataflow
  OutlivesGraph --> LiveLoanDataflow
  ConstructMIR["Construct the MIR"]
  LiveVariable["Compute the live variables"]
  OutlivesGraph["Compute the outlives graph"]
  LiveLoanDataflow["Compute the active loans at a given point"]

Construct the MIR

The borrow checker these days operates on MIR 2. MIR is basically a very simplified version of Rust where each statement is broken down into rudimentary statements. Our program is already so simple that the MIR basically looks the same as the original program, except for the fact that it’s structured into a control-flow graph. The MIR would look roughly like this (simplified):

flowchart TD
  Intro --> BB1
  Intro["let mut x: i32
let mut y: i32
let mut p: &'0 i32
let mut q: &'1 i32"]
  BB1["p = &x;
y = y + 1;
q = &y;
if something goto BB2 else BB3"]
  BB1 --> BB2
  BB1 --> BB3
  BB2["p = q;
x = x + 1;
"]
  BB3["y = y + 1;"]
  BB2 --> BB4;
  BB3 --> BB4;
  BB4["y = y + 1;
read_value(p);
"]

  classDef default text-align:left,fill-opacity:0;

Note that MIR begins with the types for all the variables; control-flow constructs like if get transformed into graph nodes called basic blocks, where each basic block contains only simple, straightline statements.

Compute the live origins

The first step is to compute the set of live origins at each program point. This is precisely the same as it was described in the NLL RFC. This is very similar to the classic liveness computation that is taught in a typical compiler course, but with one key difference. We are not computing live variables but rather live origins – the idea is roughly that the live origins are equal to the origins that appear in the types of the live variables:

LiveOrigins(P) = { O | O appears in the type of some variable V live at P }

The actual computation is slightly more subtle: when variables go out of scope, we take into account the rules from RFC #1327 to figure out precisely which of their origins may be accessed by the Drop impl. But I’m going to skip over that in this post.
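As a minimal sketch of the LiveOrigins definition above (ignoring the Drop subtlety just mentioned), the computation is a union over the types of the live variables. The names `live_origins` and `example_types` are hypothetical, origins are written as plain numbers, and the set of live variables is assumed to have been computed already by an ordinary liveness pass:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// An origin such as `'0` or `'1`, identified by its number.
type Origin = u32;

/// Computes `LiveOrigins(P)`: every origin that appears in the type of
/// some variable that is live at `P`.
///
/// `origins_in_type` maps each variable to the origins in its type;
/// `live_vars` lists the variables that are live at the point of interest.
fn live_origins(
    origins_in_type: &BTreeMap<&str, Vec<Origin>>,
    live_vars: &[&str],
) -> BTreeSet<Origin> {
    live_vars
        .iter()
        .filter_map(|v| origins_in_type.get(*v))
        .flatten()
        .copied()
        .collect()
}

/// Variable types from the running example: `p: &'0 u32`, `q: &'1 u32`.
/// `x` and `y` are plain integers, so their types mention no origins.
fn example_types() -> BTreeMap<&'static str, Vec<Origin>> {
    BTreeMap::from([
        ("x", vec![]),
        ("y", vec![]),
        ("p", vec![0]),
        ("q", vec![1]),
    ])
}
```

With these facts, asking for the live origins when only `q` is live gives just `'1`, and asking when only `p` is live gives just `'0`.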

Going back to our example, I’ve added comments showing which origins would be live at various points of interest:

let mut x = 22;
let mut y = 44;
let mut p: &'0 u32 = &x;
y += 1;
let mut q: &'1 u32 = &y;
// Here both `p` and `q` may be used later,
// and so the origins in their types (`'0` and `'1`)
// are live.
if something() {
    // Here, only the variable `q` is live.
    // `p` is dead because its current value is about
    // to be overwritten. As a result, the only live
    // origin is `'1`, since it appears in `q`'s type.
    p = q;
    x += 1;
} else {
    y += 1;
}
// Here, only the variable `p` is live
// (`q` is never used again),
// and so only the origin `'0` is live.
y += 1;
read_value(p);

Compute the subset graph

The next step in borrow checking is to run a type check across the MIR. MIR is effectively a very simplified form of Rust where statements are heavily desugared and there is a lot less type inference. There is, however, a lot of lifetime inference – basically when NLL starts every lifetime is an inference variable.

For example, consider the p = q assignment in our running example:

...
let mut p: &'0 u32 = &x;
y += 1;
let mut q: &'1 u32 = &y;
if something() {
    p = q; // <-- this assignment
    ...
} else {
    ...
}
...

To type check this, we take the type of q (&'1 u32) and require that it is a subtype of the type of p (&'0 u32):

&'1 u32 <: &'0 u32

As described in the NLL RFC, this subtyping relation holds if '1: '0. In NLL, we called this an outlives relation. But in polonius, because '0 and '1 are origins representing sets of loans, we call it a subset relation. In other words, '1: '0 could be written '1 ⊆ '0, and it means that whatever loans '1 may be referencing, '0 may reference too. Whatever final values we wind up with for '0 and '1 will have to reflect this constraint.

We can view these subset relations as a graph, where '1: '0 means there is an edge '1 --⊆--> '0. In the borrow checker today, this graph is flow insensitive, meaning that there is one graph for the entire function. As a result, we are going to get a graph like this:

flowchart LR
  L0 --"⊆"--> Tick0
  L1 --"⊆"--> Tick1
  Tick1 --"⊆"--> Tick0
  
  L0["{L0}"]
  L1["{L1}"]
  Tick0["'0"]
  Tick1["'1"]

  classDef default text-align:left,fill:#ffffff;

You can see that '0, the origin that appears in p, can be reached from both loan L0 and loan L1. That means that it could store a reference to either x or y, in short. In contrast, '1 (q) can only be reached from L1, and hence can only store a reference to y.
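To make "can be reached from" concrete, here is a small sketch that propagates loans along the subset edges until nothing changes. This is an illustration under my own assumptions, not the compiler's data structures: `SubsetGraph`, `loans_in_origins`, and `example_graph` are hypothetical names, and loans L0/L1 and origins '0/'1 are encoded as the numbers 0 and 1.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Origin = u32;
type Loan = u32;

/// A flow-insensitive subset graph: one graph for the entire function.
struct SubsetGraph {
    /// `(loan, origin)`: the loan flows directly into the origin, `{L} ⊆ 'o`.
    issued: Vec<(Loan, Origin)>,
    /// `(sub, sup)`: the constraint `'sub ⊆ 'sup`.
    subset: Vec<(Origin, Origin)>,
}

impl SubsetGraph {
    /// Returns, for each origin, the set of loans that can reach it.
    fn loans_in_origins(&self) -> BTreeMap<Origin, BTreeSet<Loan>> {
        let mut result: BTreeMap<Origin, BTreeSet<Loan>> = BTreeMap::new();
        for &(loan, origin) in &self.issued {
            result.entry(origin).or_default().insert(loan);
        }
        // Propagate along subset edges until a fixed point is reached.
        let mut changed = true;
        while changed {
            changed = false;
            for &(sub, sup) in &self.subset {
                let from = result.get(&sub).cloned().unwrap_or_default();
                let to = result.entry(sup).or_default();
                for loan in from {
                    changed |= to.insert(loan);
                }
            }
        }
        result
    }
}

/// The graph from the running example: `{L0} ⊆ '0`, `{L1} ⊆ '1`, `'1 ⊆ '0`.
fn example_graph() -> SubsetGraph {
    SubsetGraph {
        issued: vec![(0, 0), (1, 1)],
        subset: vec![(1, 0)],
    }
}
```

Running `example_graph().loans_in_origins()` gives `{L0, L1}` for `'0` and `{L1}` for `'1`, matching the reading of the graph above.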

Active loans

There is one last piece to complete the borrow checker, which is computing the active loans. Active loans determine the errors that get reported. The idea is that, if there is an active loan of a place a.b.c, then accessing a.b.c may be an error, depending on the kind of loan/access.

Active loans build on the liveness analysis as well as the subset graph. The basic idea is that a loan is active at a point P if there is a path from the borrow that created the loan to P where, for each point along the path…

there is some live variable that may reference the loan
- i.e., there is a live origin O at P where L ∈ O. L ∈ O means that there is a path in the subset graph from the loan L to the origin O.
the place expression that was borrowed (here, x) is not reassigned
- this isn’t relevant to the current example, but the idea is that you can borrow the referent of a pointer, e.g., &mut *tmp. If you then later change tmp to point somewhere else, then the old loan of *tmp is no longer relevant, because it’s pointing to different data than the current value of *tmp.

Implementing using dataflow

In the compiler, we implement the above as a dataflow analysis. The value at any given point is the set of active loans. We gen a loan (add it to the value) when it is issued, and we kill a loan at a point P if either (1) the loan is not a member of the origins of any live variables; or (2) the path borrowed by the loan is overwritten.
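The gen/kill rules can be sketched as a transfer function over one statement. Everything here is a simplified, hypothetical encoding rather than what rustc does internally: `Statement`, `transfer`, `active_along_path`, `stmt`, and `example_path` are names I made up, the live origins and the loans-per-origin map are assumed to be precomputed, and only straight-line paths are handled (no joins).

```rust
use std::collections::{BTreeMap, BTreeSet};

type Origin = u32;
type Loan = u32;

/// Loans reachable from each origin in the flow-insensitive subset graph.
type LoansInOrigin = BTreeMap<Origin, BTreeSet<Loan>>;

/// Facts about one MIR statement (a hypothetical, simplified encoding).
struct Statement {
    /// Origins that are live on entry to the statement.
    live_origins: Vec<Origin>,
    /// The loan issued here, if the statement contains a `&` expression.
    issues: Option<Loan>,
    /// Loans whose borrowed path is overwritten by this statement.
    overwrites: Vec<Loan>,
}

/// Returns `(active on entry, active on exit)` for one statement.
fn transfer(
    loans_in_origin: &LoansInOrigin,
    incoming: &BTreeSet<Loan>,
    stmt: &Statement,
) -> (BTreeSet<Loan>, BTreeSet<Loan>) {
    // Kill (1): drop loans that are not a member of any live origin.
    let on_entry: BTreeSet<Loan> = incoming
        .iter()
        .copied()
        .filter(|loan| {
            stmt.live_origins.iter().any(|origin| {
                loans_in_origin
                    .get(origin)
                    .map_or(false, |loans| loans.contains(loan))
            })
        })
        .collect();
    // Kill (2): drop loans whose borrowed path is overwritten here.
    let mut on_exit: BTreeSet<Loan> = on_entry
        .iter()
        .copied()
        .filter(|loan| !stmt.overwrites.contains(loan))
        .collect();
    // Gen: a `&` expression issues a new loan.
    on_exit.extend(stmt.issues);
    (on_entry, on_exit)
}

/// Applies `transfer` along a straight-line path and returns the loans
/// that are active on entry to each statement.
fn active_along_path(loans_in_origin: &LoansInOrigin, path: &[Statement]) -> Vec<BTreeSet<Loan>> {
    let mut incoming = BTreeSet::new();
    let mut result = Vec::new();
    for stmt in path {
        let (on_entry, on_exit) = transfer(loans_in_origin, &incoming, stmt);
        result.push(on_entry);
        incoming = on_exit;
    }
    result
}

/// Shorthand for a statement that overwrites no borrowed path.
fn stmt(live: &[Origin], issues: Option<Loan>) -> Statement {
    Statement { live_origins: live.to_vec(), issues, overwrites: vec![] }
}

/// One path through the running example, ending inside the chosen branch.
fn example_path(true_branch: bool) -> Vec<Statement> {
    let mut path = vec![
        stmt(&[], Some(0)),  // p = &x      (issues L0)
        stmt(&[0], None),    // y = y + 1
        stmt(&[0], Some(1)), // q = &y      (issues L1)
    ];
    if true_branch {
        path.push(stmt(&[1], None)); // p = q       (only `q` is live)
        path.push(stmt(&[0], None)); // x = x + 1
    } else {
        path.push(stmt(&[0], None)); // y = y + 1   (only `p` is live)
    }
    path
}
```

The design choice worth noting is that the liveness kill is applied on entry to a statement, while the overwrite kill and the gen are applied on exit. That ordering is my reading of the description above, not something taken from the compiler source.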

Active loans on entry to the function

Let’s walk through our running example. To start, look at the first basic block:

flowchart TD
  Start["..."]
  BB1["// Active loans: {}
       p = &x; // Gen: L0 -- loan issued
       // Active loans: {L0}
       y = y + 1;
       q = &y; // Gen L1 -- loan issued
       // Active loans {L0, L1}
       if something goto BB2 else BB3
  "]
  BB2["..."]
  BB3["..."]
  BB4["..."]

  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4

  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB1 highlight

This block is the start of the function, so the set of active loans starts out as empty. But then we encounter two borrow expressions, &x and &y, and each of them is the gen site for a loan (L0 and L1 respectively). By the end of the block, the active loan set is {L0, L1}.

Active loans on the “true” branch

The next interesting point is the “true” branch of the if:

flowchart TD
  Start["
    ...
    let mut q: &'1 i32;
    ...
  "]
  BB1["..."]
  BB2["
      // Kill L0 -- not part of any live origin
      // Active loans {L1}
      p = q;
      x = x + 1;
  "]
  BB3["..."]
  BB4["..."]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB2 highlight

The interesting thing here is that, on entering the block, there is a kill of L0. This is because the only live reference on entry to the block is q, as p is about to be overwritten. As the type of q is &'1 i32, this means that the live origins on entry to the block are {'1}. Looking at the subset graph we saw earlier…

flowchart LR
  L0 --"⊆"--> Tick0
  L1 --"⊆"--> Tick1
  Tick1 --"⊆"--> Tick0
  
  L0["{L0}"]
  L1["{L1}"]
  Tick0["'0"]
  Tick1["'1"]

  class L1 trace
  class Tick1 trace

  classDef default text-align:left,fill:#ffffff;
  classDef trace text-align:left,fill:yellow;

…we can trace the transitive predecessors of '1 to see that it contains only {L1} (I’ve highlighted those predecessors in yellow in the graph). This means that there is no live variable whose origins contains L0, so we add a kill for L0.

No error on true branch

Because the only active loan is L1, and L1 borrowed y, the x = x + 1 statement is accepted. This is a really interesting result! It illustrates how the idea of active loans restores some flow sensitivity to the borrow check.

Why is it so interesting? Well, consider this. At this point, the variable p is live. The variable p contains the origin '0, and if we look at the subset graph, '0 contains both L0 and L1. So, based purely on the subset graph, we would expect modifying x to be an error, since it is borrowed by L0. And yet it’s not!

This is because the active loan analysis noticed that, although in theory p may reference L0, it definitely doesn’t at this point.

Active loans on the false branch

In contrast, if we look at the “false” branch of the if:

flowchart TD
  Start["
    ...
    let mut p: &'0 i32;
    ...
  "]
  BB1["..."]
  BB2["..."]
  BB3["
      // Active loans {L0, L1}
      y = y + 1;
  "]
  BB4["..."]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB3 highlight

False error on the false branch

This path is also interesting: there is only one live variable, p. If you trace the code by hand, you can see that p could only refer to L0 (x) here. And yet the analysis concludes that we have two active loans: L0 and L1. This is because it is looking at the subset graph to determine what p may reference, and that graph is flow insensitive. So, since p may reference L1 at some point in the program, and we haven’t yet seen references to L1 go completely dead, we assume that p may reference L1 here. This leads to a false error being reported when the user does y = y + 1.

Active loans on the final block

Now let’s look at the final block:

flowchart TD
  Start["
    ...
    let mut p: &'0 i32;
    ...
  "]
  BB1["..."]
  BB2["..."]
  BB3["..."]
  BB4["
        // Active loans {L0, L1}
        y = y + 1;
        read_value(p);
  "]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB4 highlight

At this point, there is one live variable (p) and hence one live origin ('0); the subset graph tells us that p may reference both L0 and L1, so the set of active loans is {L0, L1}. This is correct: depending on which path we took, p may refer to either L0 or L1, and hence we flag a (correct) error when the user attempts to modify y.
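The final block has two predecessors, so its incoming state is the union of what flows out of each branch. The sketch below shows that join plus a simple conflict check. It is a simplification under stated assumptions: `join` and `conflicting_loans` are hypothetical names, every access is treated as a write, and every loan is treated as conflicting with a write to the place it borrowed (the real check depends on the kind of loan and access).

```rust
use std::collections::{BTreeMap, BTreeSet};

type Loan = u32;

/// Joins the loans flowing out of each predecessor block (set union).
fn join(predecessors: &[BTreeSet<Loan>]) -> BTreeSet<Loan> {
    predecessors.iter().flatten().copied().collect()
}

/// Returns the active loans that conflict with a write to `place`,
/// i.e. the active loans that borrowed exactly that place.
fn conflicting_loans(
    active: &BTreeSet<Loan>,
    borrowed_place: &BTreeMap<Loan, &str>,
    place: &str,
) -> Vec<Loan> {
    active
        .iter()
        .copied()
        .filter(|loan| borrowed_place.get(loan).copied() == Some(place))
        .collect()
}
```

In the running example, the true branch contributes `{L1}` and the false branch contributes `{L0, L1}`. Their join is `{L0, L1}`, both loans are members of the live origin `'0`, and since L1 borrowed `y`, the write to `y` is flagged.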

Kills for reassignment

Our running example showed one reason that loans get killed: there are no more live references to them. This most commonly happens when you create a short-lived reference and then stop using it. But there is another way to get a kill, which happens from reassignment. Consider this example:

struct List {
    data: u32,
    next: Option<Box<List>>
}

fn print_all(mut p: &mut List) {
    loop {
        println!("{}", p.data);
        if let Some(n) = &mut p.next {
            p = n;
        } else {
            break;
        }
    }
}

I’m not going to walk through how this is borrow checked in detail here, but let me just point out what makes it interesting. In this loop, the code first borrows from p and then assigns that result to p. This means that, if you just look at the subset graph, on the next iteration around the loop, there would be an active loan of p. However, this code compiles – how does that work? The answer is that when we do p = n, we are mutating p, which means that, when we borrow from p on the next iteration, we are actually borrowing from a different node than the one we borrowed from in the first iteration. So everything is fine. The reason the borrow checker is able to conclude this is that it kills the loan of p.next when it sees that p is assigned to. This is discussed in the NLL RFC in more detail.
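A rough sketch of the reassignment kill, assuming the prefix rule as I understand it from the NLL RFC: an assignment to a path kills every loan whose borrowed path has the assigned path as a prefix. The encoding is hypothetical. Paths are written as lists of components, so `(*p).next` becomes `["p", "*", "next"]`, and `Loan` and `kill_on_assignment` are illustrative names only.

```rust
/// A loan together with the path it borrowed,
/// e.g. `["p", "*", "next"]` for `(*p).next`.
#[derive(Debug, PartialEq)]
struct Loan {
    name: &'static str,
    path: Vec<&'static str>,
}

/// Removes every loan whose borrowed path has `assigned` as a prefix.
fn kill_on_assignment(active: &mut Vec<Loan>, assigned: &[&str]) {
    active.retain(|loan| !loan.path.starts_with(assigned));
}
```

For `print_all`, the loan created by `&mut p.next` has the path `["p", "*", "next"]`. The assignment `p = n` writes to `["p"]`, which is a prefix of that path, so the loan is removed before the next iteration borrows from `p` again.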

Conclusion

That brings us to the end of part 1! In this post, we covered how you can describe the existing borrow check in a more polonius-like style. We also uncovered an interesting quirk in how the borrow checker is formulated. It uses a location insensitive alias analysis (the subset graph) but complements that with a dataflow propagation to track active loans. Together, this makes it more expressive. This wasn’t, however, the original plan with NLL. Originally, the subset graph was meant to be flow sensitive. Extending the subset graph to be flow sensitive is basically the heart of polonius. I’ve got some thoughts on how we might do that and I’ll be getting to that in later posts. I do want to say in passing though that doing all of this framing is also making me wonder – is it really necessary to combine a type check and the dataflow check? Can we frame the borrow checker (probably the more precise variants we’ll be getting to in future posts) in a more unified way? Not sure yet!

You won’t find this code in the current version of a-mir-formality; it’s since been rewritten a few times and the current version hasn’t caught up yet. ↩︎
The origin of the MIR is actually an interesting story. As documented in RFC #1211, ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/09/22/polonius-part-1/

New Layout, and now using Hugo!

Sep 19, 2023 Updated Sep 19, 2023


Some time ago I wrote about how I wanted to improve how my blog works. I recently got a spate of emails about this – thanks to all of you! And a particularly big thank you to Luna Razzaghipour, who went ahead and ported the blog over to use Hugo, cleaning up the layout a bit and preserving URLs. It’s much appreciated! If you notice something amiss (like a link that doesn’t work anymore), I’d be very grateful if you opened an issue on the babysteps github repo! Thanks!

Hugo seems fast so far, although I will say that figuring out how to use Hugo modules (so that I could preserve the atom feed…) was rather confusing! But it’s all working now (I think!). I’m still interested in playing around more with the layout, but overall I think it looks good, and I’m happy to have code coloring on the snippets. Hopefully it renders better on mobile too.

https://smallcultfollowing.com/babysteps/blog/2023/09/19/new-layout/

Stability without stressing the !@#! out

Sep 18, 2023 Updated Sep 18, 2023


One of Rust’s core principles is “stability without stagnation”. This is embodied by our use of a “release train” model, in which we issue a new release every 6 weeks. Release trains make releasing a new release a “non-event”. Feature-based releases, in contrast, are super stressful! Since they occur infrequently, people try to cram everything into that release, which inevitably makes the release late. In contrast, with a release train, it’s not so important to make any particular release – if you miss one deadline, you can always catch the next one six weeks later. That’s the theory, anyway: but I’ve observed that, in practice, stabilizing a feature in Rust can still be a pretty stressful process. And the more important the feature, the more stress. This blog post talks over my theories as to why this is the case, and how we can tweak our processes (and our habits) to address it.

TL;DR

I like to write, and sometimes my posts get long. Sorry! Let me summarize for you:

Stabilization decisions in Rust are stressful because they are conflating two distinct things: “does the feature do what it is supposed to do” (semver-stability) and “is the feature ready for general use for all its intended use cases” (recommended-for-use).
Open source works incrementally: to complete the polish we want, we need users to encounter the feature; incremental milestones help us do that.
Nightly is effective for getting some kinds of feedback, but not all; in particular, production users and library authors often won’t touch it. This gives us less data to work with when making high stakes decisions, and it’s a problem.
We should modify our process to distinguish four phases
- Accepted RFC – The team agrees idea is worth implementing, but it may yet be changed or removed. Use at your own risk. (Nightly today)
- Preview – Team agrees feature is ready for use, but wishes more feedback before committing. We reserve the right to tweak the details, but will not remove functionality without some migration path or workaround. (No equivalent today)
- Stable – Team agrees feature is done. Semantics will no longer change. Implementation may lack polish and may not yet meet all its intended use cases (but should meet some). (Stable today)
- Recommended – everyone should use this, it rocks. 🎸 (No equivalent today, though some would say stable)
I have an initial proposal for how we could implement these phases for Rust, but I’m not sure on the details. The point is more to identify this as a problem and start a discussion on potential solutions, rather than to drive a particular proposal.

Context

This post is inspired by years of experience trying to stabilize features. I’ve been meaning to write it for a while, but I was influenced most recently by the discussion on the PR to stabilize async fn in trait and return-position impl trait. I’m not intending this blog post to be an argument either way on that particular discussion, although I will be explaining my POV, which certainly has bearing on the outcome.

I will zoom out though and say that I think the Rust project needs to think about the whole “feature design lifecycle”. This has been a topic for me for years – just search for “adventures in consensus” on this blog. I think in the past I’ve been a bit too ambitious in my proposals1, so I’m thinking now about how we can move more incrementally. This blog post is one such example.

Summary of Rust’s process today

Let me briefly summarize the “feature lifecycle” for Rust today. I’ll focus on language features since that’s what I know best: this material is also published on the “How do I propose a change to the language” page for the lang-team, which I suspect most people don’t know exists2.

The path is roughly like this:

Author an RFC that outlines the problem to be solved and the key aspects of your solution. The RFC doesn’t have to have everything figured out, especially when it comes to the implementation – but it should describe most everything that a user of the language would have to know. The RFC can include “unresolved questions” that lay out corner cases or things where we need more experience to figure out the right answer.
- Generally speaking, to avoid undue maintenance burden, we don’t allow code to land until there is an accepted RFC. There is an exception though for experienced Rust contributors, who can create an experimental feature gate to do some initial hacking. That’s sometimes useful to prove out designs.3
Complete the implementation on master. This should force you to work out answers to all the unresolved questions that came up in the RFC. Often, having an implementation to work with also leads to other changes in the design. Presuming these are relatively minor, these changes are discussed and approved by the lang team on issues on the rust-lang repository.
Author a stabilization report, describing precisely what is being stabilized along with how each unresolved question was resolved.

Observation: Stabilization means different things to different people.

In a technical sense, stabilization means exactly one thing: the feature is now available on the stable release, and hence we can no longer make breaking changes to it4.

But, of course, stabilization also means that the feature is going to be encountered by users. Rust has always prided itself on holding a high bar for polish and quality, as reflected in how easy cargo is to use, our quality error messages, etc. There is always a concern when stabilizing a long-awaited feature that users are going to get excited, try it out, encounter rough edges, and conclude from this that Rust is impossible to use.

Observation: Open source works incrementally

Something I’ve come to appreciate over time is that open source is most effective if you work incrementally. If you want people to contribute or to provide meaningful feedback, you have to give them something to play with. Once you do that, the pace of progress and polish increases dramatically. It’s not magic, it’s just people “scratching their own itch” – once people have a chance to use the feature, if there is a confusing diagnostic or other similar issue, there’s a good chance that somebody will take a shot at addressing it.

In fact, speaking of diagnostics, it’s pretty hard to write a good diagnostic until you’ve thrown the feature at users. Often it’s not obvious up front what is going to be confusing. If you’ve ever watched Esteban at work, you’ll know that he scans all kinds of sources (github issues, twitter or whatever it’s called now, etc) to see the kinds of confusions that people are having and to look for ideas on how to explain them better.

Observation: Incremental progress boosts morale

The other big impact of working incrementally is for morale. If you’ve ever tried to push a big feature over the line, you’ll know that achieving milestones along the way is crucial. There’s a huge difference between trying to get everything perfect before you can ship and saying: “ok, this part is done, let’s get it in people’s hands, and then go focus on the next one”. This is both because it’s good to have the satisfaction of a job well done, and because stabilization is the only point at which we can truly end discussion. Up until stabilization is done, it’s always possible to stop and revisit old decisions.5

Observation: Working incrementally has a cost

Obviously, I am a big fan of working incrementally, but I won’t deny that it has a cost. For every person who encounters a bad diagnostic and gets inspired to open a PR, there are a lot more who will get confused. Some portion of them will walk away, concluding “Rust is too confusing”. That’s a problem.

Observation: A polished feature has a lot of moving parts

A polished feature in Rust today has a lot of moving parts…

a thoughtful design
a stable, bug free implementation
documentation in the Rust reference
quality error messages
tooling support, such as rustfmt, rustdoc, IDE, etc

…and we’d like to add more. For example, we are working on various Rust formalizations (MiniRust, a-mir-formality) and talking about upgrading the Rust reference into a normative specification.

Observation: Distinct skillsets are required to polish a feature

One interesting detail is that, often, completing a polished feature requires the work of different people with different skillsets, which in turn means the involvement of many distinct Rust teams – in fact, when it comes to development tooling, this can mean the involvement of distinct projects that aren’t even part of the Rust org!

Just looking at language features, the design, for example, belongs to the lang-team, and often completes relatively early through the RFC process. The implementation is (typically) the compiler team, but often also more specialized teams and groups, like the types team or the diagnostics working group; RFCs can sometimes languish for a long time before being implemented. Documentation meanwhile is driven by the lang-docs team (for language features, anyway). Once that is done, the rustfmt, rustdoc, and IDE vendors also have work to do incorporating the new feature.

One of the challenges to open-source development is coordinating all of these different aspects. Open source development tends to be opportunistic – you don’t have dedicated resources available, so you have to do a balancing act where you adapt the work that needs to get done to the people that are available to do it. In my experience, it’s neither top down nor bottom up, but a strange mixture of the two.6

Because of the opportunistic nature of open-source development, some parts of a feature move more quickly than others – often, the basic design gets hammered out early, but implementation can take a long time. Sadly, the reference is often the hardest thing to catch up, in part because the rather heroic Eric Huss does not implement the Clone trait. 💜

Observation: Polished features don’t stand alone

And yet, to be truly polished, features need more than docs and error-messages: they need other features! It often happens that users using feature X will find that, to complete their task, they also need feature Y. This inevitably presents a challenge to our stabilization system, which judges the stability of each feature independently.

Async functions in trait are a great example: the core feature is working great on stable, but we haven’t reached consensus on a solution to the send bound problem. For some users, like embedded users, this doesn’t matter at all. For others, like Tower, this is a pretty big problem. So, do we hold back async function in traits until both features are ready? Or do we work incrementally, releasing what is ready now and then turning to focus on what’s left?

“We seem to have been designed for each other” – Mr Collins.

Observation: Nightly is just the beginning

I can hear readers saying now, “but wait, isn’t this what Nightly is for?” And yes, in principle, the nightly release is our vehicle for enabling experimentation with in-progress features. Sometimes it works great! It can be a great way to get ahead of confusing error messages, for example, or to flush out bugs. But all too often, Nightly is a big barrier for people, particularly production Rust users or those building widely used libraries. And those are precisely the users whose feedback would be most valuable.

What’s interesting is that many production users would be willing to tolerate a certain amount of instability. Many users tell me they wouldn’t mind rebasing over small changes in the feature design7, but what they can’t tolerate is building a codebase around a feature and then having it removed entirely, or having support for major use cases dropped without some kind of workaround.

Libraries are another interesting story. Library authors tend to be more advanced than your typical Rust user. They can tolerate a lack of polish in exchange for having access to a feature that lets them build a nicer experience for their users. Generic associated types are a clear example of this. One of the big arguments in favor of stabilizing them was that they often show up in the implementation of libraries but not in the outward interfaces. As one personal example, we’ve been using them extensively in Duchess, an experimental library for Java-Rust interop, and yet you won’t find any mention of them in the docs. Do we sometimes hit confusing errors or other problems? Yes. Is the syntax annoyingly verbose? Yes, absolutely. Am I glad they are stabilized? Hell yes.

Observation: having users help us figure out what else is needed

Remember how I said that it was hard to design quality diagnostics until you had seen the ways that users got confused? Well, the same goes for designing related features. Once production users or library authors start playing with something, they find all kinds of clever things they can do with it – or, often, things they could almost do, except for this one other missing piece. In this way, holding things unstable on Nightly – which means far fewer users can touch it – holds back the whole pace of Rust development significantly.

Prior art

Ember’s feature lifecycle

The Ember and Rust projects have long had a lot of fruitful back-and-forth when it comes to governance and process, thanks in part to the fact that Yehuda Katz was deeply involved in both of them. In 2022, they adopted a revised RFC process in which each feature goes through a number of stages:8

Proposed – An open pull request on the emberjs/rfcs repo.
Exploring – An RFC deemed worth pursuing but in need of refinement.
Accepted – A fully specified RFC.
Ready for release – The implementation of the RFC is complete, including learning materials.
Released – The work is published.
Recommended – The feature/resource is recommended for general use.

This is pretty cool! One other interesting aspect for Ember is how they approach editions. Remember I talked about how features don’t stand alone? In Ember, a significant cluster of related features is called an “edition”. New editions are declared when all the pieces are in place to enable a new model for programming. This is pretty distinct from Rust’s time-based editions.

I’m not totally sure how to map Ember’s edition to Rust, but I think that the concept of an “umbrella initiative” is pretty close. For example, the async fundamentals initiative roadmap identifies a cluster of related work that together constitute “async-sync language parity” – i.e., you can truly use async operations everywhere you would like to.

One interesting aspect of Ember’s editions is that they often begin by stabilizing “primitives” – e.g., fundamental APIs that aren’t really meant for end-users, but rather for plugin authors or people in the ecosystem, who can use them to experiment with the right end-user abstractions. I’ve found in Rust that we sometimes do this, though sometimes we find it better to begin with the end-user abstraction, and expose the primitives later.

The TC39 process for ECMAScript

The TC39 committee has a nice staged process. It’s not exactly comparable to Rust, but there are a few things worth observing. First, I love the designation of a champion for a feature, and I think Rust would benefit from being more official about that in some ways. Second, I also love the explainer concept of authoring user documentation as part of the process. Third, before they stabilize, they always make the feature available to end-users, but under gates.

Java’s preview features

Ever since JEP-12, Java has included preview features in their release process. A preview feature is one that is “fully specified, fully implemented, and yet impermanent” – it’s released for feedback, but it may be removed or changed based on the result of the evaluation. The motivation is to get more feedback on the design before committing to it:

To build confidence in the correctness and completeness of a new feature – whether in the Java language, the JVM, or the Java SE API – it is desirable for the feature to enjoy a period of broad exposure after its specification and implementation are stable but before it achieves final and permanent status in the Java SE Platform.

When using preview features, users opt-in both at compilation time and at runtime. In other words, if you compile a Java file that uses preview features to a JAR, and distribute the JAR, people using the JAR must also opt-in.

Proposal

Instead of rehashing the same debate every time we go to stabilize a feature, I think we should look at our feature release process so that we have more gradations of stability:

accepted RFC – With an accepted RFC, the team has agreed that we want the feature in principle. However, the details often change during development, and the feature may even be removed. Use at your own risk.
preview – We are committed to keeping this functionality in some form, but we reserve the right to make changes. We won’t remove functionality from preview state without some kind of workaround. You can use this feature so long as you are willing to update your code when moving to a new version of the compiler. Preview features must be viral, meaning that if I build a crate using preview features, consumers must opt-in to the resulting instability somehow.
semver stable – We have committed to the technical design of this feature and people can build on it without fear of breakage between compiler revisions. The experience may lack polish and some intended use cases may not yet be possible.
recommended for use – This feature has all the documentation, error messages, and associated features that are needed for most Rust users to be successful. USE IT!

Comparison with today’s release trains. In our system today, the first two phases are both covered by “nightly” and the latter two are both covered by “stable”, but of course we don’t draw any formal distinctions. Async function in trait, for example, is clearly past the accepted RFC phase and is now in preview: the team is committed to shipping it in some form, and we don’t expect any major changes. But how would you know this, if you aren’t closely following Rust development? Generic associated types, meanwhile, are clearly semver stable rather than recommended for use – we know of many major gaps in the experience, mostly blocked on the trait system refactor initiative, but how would you know that, unless you were actively attending Rust types team meetings?

Unresolved questions

I am confident that these four phases are important, but there are a number of details of which I am not sure. Let me pose some of the questions I anticipate here.

How committed should we be to preview features?

In my proposal above, I said that the project would not remove functionality without a workaround. This is somewhat stronger than JEP-12, which indicates that preview features “will either be granted final and permanent status (with or without refinements) or be removed”. I said something somewhat stronger because I was thinking of production users. I know many such users would happily make use of preview features, and they are willing to make updates, but they don’t want to get stuck having based their codebase on something that completely goes away. I feel pretty confident that by the time we get to preview state, we should be able to say “yes, we want something like this”. I think it’s fine however if the feature gets removed in favor, say, of a procedural macro or some other solution, so long as the people using that preview feature have somewhere to go. (Naturally, my preference would be to provide as smooth a path as possible between compiler revisions; ideally, we’d issue automatable suggestions using cargo fix, similar to what we do for editions.)

How should the features be reflected in our release trains?

I don’t entirely know! I think there are a lot of different versions. I do know a few things:

Instability should be viral, whether experimental or preview: today, if I depend on a crate that uses nightly features, I must use nightly myself; this falls out from the fact that Rust doesn’t support binary distribution, but is very much intentional. The reason is that a crate cannot truly “hide” instability from its users. They can always upgrade to a new version of Rust and, if that causes the crate to stop compiling, they will perceive this as a failure of Rust’s promise, even if it is a result of the crate having used an unstable feature. We need to do the same kind of viral result for preview features.
Preview and stabilized features need to be internally consistent, but not complete or fully polished: Preview features need to meet a certain quality bar – e.g., support in rustfmt, adequate documentation – but it’s fine for them to be a subset of what we hope to do in the fullness of time. It’s also ok for them to have less-than-ideal error messages. Those things come with time.
Documentation is key: A big challenge for Rust today is that we don’t have a canonical way for people to find out the status of the things they care about. I think we should invest some effort in setting up a consistent format with bot/tooling support to make it easy to maintain. Users will understand the idea that a feature is unpolished if you can direct them to a page where they can understand the context and learn about the workarounds they need in the short term.

With that in mind, here is a possible proposal for how we might do this:

Initially, features are nightly only, as today, and require an individual feature-gate.
- Until there is an accepted RFC, we should have a mandatory warning that the team has not yet decided if the feature is worth including; we also can continue to warn for features whose implementation is very incomplete.
Preview features are usable on stable, but with opt-in:
- Every project that uses any preview features, or which depends on crates that use preview features, must include preview-features = true in their Cargo.toml.
- Every crate that directly uses preview features must additionally include the appropriate feature gates.
- Reaching preview status should require some base level of support
  - core tooling, e.g. rustfmt, rustdoc, must work
  - an explainer must be available, but Rust reference material is not required
  - a nice landing page (or Github issue with known format) that indicates how to provide feedback; this page should also cover polish or supporting features that are known to be missing (similar to the async fn fundamentals roadmap)
  - the feature must be “complete enough” to meet some of its intended use cases; it doesn’t have to meet all of its intended use cases.
- This is an FCP decision, because it commits the Rust project to supporting the use cases targeted by the preview feature (if not the details of how the feature works).
Semver stable features are usable on stable, but we make efforts to redirect users to a landing page that outlines what kind of support is still missing and how to provide feedback.
- Reaching semver stable requires an update to the Rust reference, in addition to the requirements for preview.
- The feature must be “complete enough” to meet some of its intended use cases; it doesn’t have to meet all of its intended use cases.
- This is an FCP decision, because it commits the Rust project to supporting the feature in its current form going forward.
Recommended for use features would be just as today.
- The feature must meet all of the major use cases, which may mean that other features are present.
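
To make the "viral" rule concrete, here is a minimal sketch of how the four phases and the opt-in check could be modeled. Everything in it is hypothetical: the `Stability` enum, the `CrateNode` type, and the function names are illustrations only and do not correspond to anything in cargo or rustc.

```rust
/// The four proposed phases, ordered from least to most stable.
/// (Hypothetical model, not an existing compiler or cargo type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stability {
    AcceptedRfc,
    Preview,
    SemverStable,
    Recommended,
}

/// Hypothetical stand-in for a crate in a dependency graph.
struct CrateNode {
    /// Least stable language feature the crate uses directly.
    least_stable_feature: Stability,
    /// Direct dependencies of this crate.
    deps: Vec<CrateNode>,
}

/// Instability is viral: a crate is only as stable as the least stable
/// feature used anywhere in its dependency graph.
fn effective_stability(node: &CrateNode) -> Stability {
    node.deps
        .iter()
        .map(effective_stability)
        .fold(node.least_stable_feature, Ord::min)
}

/// Anything below preview is still nightly-only.
fn needs_nightly(node: &CrateNode) -> bool {
    effective_stability(node) < Stability::Preview
}

/// Preview features anywhere in the graph require an explicit opt-in.
fn needs_preview_opt_in(node: &CrateNode) -> bool {
    effective_stability(node) == Stability::Preview
}
```

Under this model, an application that only uses recommended features itself would still need the opt-in as soon as one of its dependencies uses a preview feature.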

Other frequently asked questions and alternatives

Here are answers to a few other questions I anticipate.

Who will maintain these “landing pages”?

This is a good question! It’s easy for these to get out of date. I think part of designing this ‘preview’ process should also be investing in a standard template for the landing pages and some guidelines. My sense is that people would be happy to update landing pages as part of the stabilization process if it meant they can make progress on shipping the feature they’ve worked so hard to build! But I think we can do a lot to make it easier. Having a standard format would also mean that users can find the information they’re looking for more easily. We can then also build bots and things to help. I’ve seen that investing in bots can make a real difference.

How will we ensure polish gets done?

One concern that is commonly raised is that stabilization is the only gate we have to force polish work to get done. I agree that we should maintain a certain quality bar as features move towards being fully recommended. But I think that saying “we cannot ship something for widespread use until it is polished” misses the point that open-source is incremental. In other words, part of the way that features get polished is by releasing them for widespread use.

Definitely though the Rust project can do a better job of tracking and ensuring that we do the follow-up items. There are plenty of examples of follow-up that never gets done. But I don’t think blocking stabilization is an effective tool for that. If anything, it’s demoralizing9. We really need to strengthen our project management skills – pushing people to create better landing pages and to help identify the gaps more crisply feels like it can help, though more is needed.

Why would we stabilize a feature if we know users will hit gaps?

Most features in Rust serve a lot of purposes. Even if we know about major gaps, there are often important blocks of users who are not affected by them. For async functions in traits, the send bound problem can be a total blocker – but it’s also a non-issue for a lot of users. I would like to see us focus more on how we can alert users to the gaps they are hitting rather than denying them access to features until everything is done.

I thought you said you wanted to move incrementally? This feels like a big step.

Earlier, I said that I wanted to look for incremental ways to tweak Rust’s process, since in the past I’ve gotten too ambitious. In truth, I think this blog post is really laying out two proposals, so let me separate them out:

Part 1: Semver-stable vs recommended-for-use
- The most immediate need is to clarify what stabilization means and what exactly is the “bar”; in my opinion, that is semver stability, and I think there is plenty of precedent for that.
- But I think the risk of user confusion is very real, and we can take some simple steps to help mitigate it, such as creating good landing pages and having the compiler direct users to them when it thinks they may be encountering a gap.
  - Example: today if you try to use an async fn in a trait, you get directed to the async-trait crate. We can detect “send bound”-related failures and direct users to a github issue that explains how they can resolve it and also gives them a way to register interest or provide feedback.
- I don’t think anything is really blocking us from moving forward here immediately, though an RFC might be nice at some point to clarify terminology and help align the way we talk about this.
Part 2: Preview features
- Preview features is really a distinct concept, but I do think it’s important. For example, we could have declared async functions in traits as a ‘preview feature’ over a year ago. This would have given us a lot more data and made it accessible to a much broader pool of people. I think this would have given us a clearer picture on how important the ‘send bound’ problem is, for example, and would inform other prioritization efforts.
- Moving forward here will require an RFC and also implementation work.

Conclusion

With apologies to Jane Austen:

“All Rust features are so accomplished. They all have stable semantics and even make helpful suggestions when you go astray. I am sure I never encountered a Rust feature without being informed that it was very accomplished.”

“Your list of the common extent of accomplishments,” said Darcy, “has too much truth. The word is applied to many a feature who deserves it no otherwise than by being stabilized. But I am very far from agreeing with you in your estimation of Rust features in general. I cannot boast of knowing more than half-a-dozen, in the whole range of my acquaintance, that are really accomplished.”

“Then,” observed Elizabeth, “you must comprehend a great deal in your idea of an accomplished feature.”

“Oh! certainly,” cried his faithful assistant, “no feature can be really esteemed accomplished without strong support in the IDE, wondrous documentation, and perhaps a chapter in the Rust book.”

“All this it must possess,” added Darcy, “and to all this it must yet add something more substantial: a host of related features that address common problems our users may encounter.”

“I am no longer surprised at your knowing ONLY six accomplished features. I rather wonder now at your knowing ANY.”

To translate: I think our ‘all or nothing’ stability system is introducing unnecessary friction into Rust development. Let’s change it!

A critique which many people pointed out to me at the time. ↩︎
The whole “How do I…” section on the page has some interesting things, if you’re looking to interact with the lang team! ↩︎
The decision to limit in-tree experimentation to experienced contributors was based on our experience with the earlier initiative system, where we were more open-ended. We found that the majority of those projects never went anywhere. Most of the people who signed up to drive experiments didn’t really have the time or knowledge to move them independently, and there wasn’t enough mentoring bandwidth to help them make progress. So we decided to limit in-tree experimentation to maintainers who’ve already demonstrated staying power. ↩︎
RFC 1122 lays out the lang team’s definition of “breaking change”, which is not quite the same as “your code will always continue to compile”. For example, we sometimes change the rules of inference; we also introduce or modify the behavior of lints (which can cause code that has #[deny] to stop compiling). Finally, we reserve the right to fix soundness bugs. And, in rare cases, we will override the policy altogether, if a feature’s design is so broken, but the bar for that is quite high. ↩︎
One of the things I am proud of about the Rust project is that we are willing to stop and revisit old decisions – I think we’ve dodged a number of bullets that way. At the same time, it’s exhausting. I think there’s more to say about finding ways to enable conversation that are not as draining on the participants, and especially on the designers and maintainers, but that’s a topic for another post. ↩︎
That said, my experience is that Amazon works in a surprisingly similar way – there are top-down decisions, but there are an awful lot of bottom-up ones. I imagine this varies company to company, but I think ultimately every good manager tries to ensure that their people are working on things that are well-suited to their skills. ↩︎
Many of which could be automated via cargo fix! ↩︎
Speaking of Ember-Rust cross-pollination, Peter Wagenet, co-author of the Ember release blog post, also hacks on the Rust compiler from time to time. ↩︎
There’s nothing worse than investing months and months of work into getting something ready for stabilization, endlessly triaging issues, only to open a stabilization PR – the culmination of all that effort – and have the first few comments tell you that your work is not good enough. Oftentimes the people opening those PRs are volunteers, as well, which makes it all the worse. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/09/18/stability-without-stressing-the-out/

Higher-ranked projections (send bound problem, part 4)

Jun 12, 2023 Updated Jun 12, 2023

I recently posted a draft of an RFC about Return Type Notation to the async working group Zulip stream. In response, Josh Triplett reached out to me to raise some concerns. Talking to him gave rise to a 3rd idea for how to resolve the send bound problem. I still prefer RTN, but I think this idea is interesting and worth elaborating. I call it higher-ranked projections.

Idea part 1: Define T::Foo when T has higher-ranked bounds

Consider a trait like this…

trait Transform<In> {
    type Output;

    fn apply(&self, input: In) -> Self::Output; // `in` is a keyword, so the parameter is named `input`
}

Today, given a trait bound like T: Transform<Vec<u32>>, when you write T::Output, the compiler expands that to a fully qualified associated type <T as Transform<Vec<u32>>>::Output. This took a bit of work — the self type (T) of the trait is specified by the user, but the compiler looked at the bounds to select Vec<u32> as the value for In.
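
As a small sketch of the case that already works, the two functions below are equivalent: the first uses the `T::Output` shorthand and the second spells out the fully qualified form. The `SumAll` type and the function names are made up for illustration.

```rust
trait Transform<In> {
    type Output;

    fn apply(&self, input: In) -> Self::Output;
}

/// Hypothetical transform that adds up a vector of numbers.
struct SumAll;

impl Transform<Vec<u32>> for SumAll {
    type Output = u32;

    fn apply(&self, input: Vec<u32>) -> u32 {
        input.iter().sum()
    }
}

/// Shorthand: the compiler finds `In = Vec<u32>` from the where-clause.
fn run<T>(t: &T, data: Vec<u32>) -> T::Output
where
    T: Transform<Vec<u32>>,
{
    t.apply(data)
}

/// The same signature with the associated type written out in full.
fn run_explicit<T>(t: &T, data: Vec<u32>) -> <T as Transform<Vec<u32>>>::Output
where
    T: Transform<Vec<u32>>,
{
    t.apply(data)
}
```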

But suppose you have a higher-ranked trait bound like T: for<'a> Transform<&'a [u32]>. Then what should the compiler do for T::Output? The compiler would have to pick something like <T as Transform<&'b [u32]>>::Output, where 'b is some specific lifetime. Instead of doing that, the compiler currently gives an error.

But we don’t always need to expand T::Output to a specific type. If T::Output is appearing in a where-clause, we could expand it to a range of types. For example, consider this function, which today will not compile:

fn process<T>()
where
    T: for<'a> Transform<&'a str>,
    T::Output: Send, // ERROR: `T::Output` is not allowed
{ /* … */ }

We could interpret T::Output: Send as a higher-ranked bound, for example:

fn process<T>()
where
    T: for<'a> Transform<&'a str>,
    for<'a> <T as Transform<&'a str>>::Output: Send, // Desugared?
{ /* … */ }
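
The explicit higher-ranked form can already be written by hand today. The following is a minimal sketch of that, assuming a made-up `Len` transform and an `is_send` helper that exist only for this illustration:

```rust
trait Transform<In> {
    type Output;

    fn apply(&self, input: In) -> Self::Output;
}

/// Hypothetical transform that works for string slices of any lifetime.
struct Len;

impl<'a> Transform<&'a str> for Len {
    type Output = usize;

    fn apply(&self, input: &'a str) -> usize {
        input.len()
    }
}

/// Compiles only if `V` is `Send`; the return value is just for testing.
fn is_send<V: Send>(_value: &V) -> bool {
    true
}

fn process<T>(t: &T) -> bool
where
    T: for<'a> Transform<&'a str>,
    // The hand-written version of what `T::Output: Send` could mean.
    for<'a> <T as Transform<&'a str>>::Output: Send,
{
    let text = String::from("hello");
    let out = t.apply(text.as_str());
    // The output for this particular lifetime is covered by the bound above.
    is_send(&out)
}
```

Calling `process(&Len)` type-checks because the output type is `usize` for every choice of `'a`.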

Idea part 2: Fix the bugs on associated type chains

Right now, if you have an iterator whose items are themselves iterators, the compiler won’t let you write things like T::Item::Item…

fn foo<T: Iterator>()
where
    T::Item: Iterator,
    T::Item::Item: Send, // <-- ERROR
{ /* … */ }

…instead you have to write something horrible like <<T as Iterator>::Item as Iterator>::Item. There’s no particularly good reason for this. We should make it work better. One thing that would be useful is if we examined the bounds declared in the trait, so that e.g. if we have a trait like…

trait Factory {
    type Iterator: Iterator;
}

…and a F: Factory, then F::Iterator::Item should work.
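
For comparison, here is a small sketch of the fully qualified spelling that is accepted today. The function name `sum_nested` and the `Into<u32>` bound are only there to give the example something to do.

```rust
/// Sums the items of an iterator of iterators.
fn sum_nested<T>(outer: T) -> u32
where
    T: Iterator,
    T::Item: Iterator,
    // `T::Item::Item` is rejected, so the chain is written out in full.
    <<T as Iterator>::Item as Iterator>::Item: Into<u32>,
{
    let mut total = 0u32;
    for inner in outer {
        for item in inner {
            let value: u32 = item.into();
            total += value;
        }
    }
    total
}
```

The slightly shorter `<T::Item as Iterator>::Item` is also accepted, but neither form reads as naturally as the chain would.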

Idea part 3: Associated type for every method in a trait

As the final step, for every method in a trait, we could add an associated type that binds to the “zero-sized function type” associated with that method. So in the Iterator trait…

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

…there’d be two associated types, Item and next. Given T: Iterator, T::next would map to a function type that implements for<'a> Fn(&'a mut T) -> Option<T::Item>.
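
The zero-sized function types already exist today; what is missing is a way to name them. As a rough sketch (the helper names `fn_type_size` and `call_next` are invented for this example), the type can be captured through a generic parameter:

```rust
/// Reports the size of whatever type `F` was inferred to be.
fn fn_type_size<F>(_f: &F) -> usize {
    std::mem::size_of::<F>()
}

/// Accepts any function with the shape that `T::next` would have.
fn call_next<T, F>(f: F, iter: &mut T) -> Option<T::Item>
where
    T: Iterator,
    F: for<'a> Fn(&'a mut T) -> Option<T::Item>,
{
    f(iter)
}
```

Passing `<std::vec::IntoIter<u32> as Iterator>::next` as `f` works, and `fn_type_size` reports zero bytes for it, since the function item carries no data.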

Putting it all together

If we put this all together, we can start to put bounds in the return types of async functions. Consider our usual trait:

trait HealthCheck {
    async fn check(&mut self);
}

and then a function like

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    HC::check::Output: Send,
{
    /* … */
}

What does HC::check::Output: Send mean? Note that the Output here is the return type of the function trait, so it refers to the future that you get when you call the async function.

Regardless, by combining ideas part 1, 2, and 3, HC::check::Output can then be expanded to the following:

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    // `HC::check::Output: Send` becomes…
    for<'a> <HC::check as Fn<(&'a mut HC,)>>::Output: Send,
{
    /* … */
}

which, if you really like complex where clauses, you could further expand into a where-clause like this:

for<'a> <
    <HC as HealthCheck>::check
    as
    Fn<(&'a mut HC,)>
>::Output: Send

Comparing this approach and RTN

In many ways, this idea is very similar to RTN. Compare this example…

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    HC::check::Output: Send,
{
    /* … */
}

…to the RTN-based approach…

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    HC::check(): Send,
{
    /* … */
}

In fact, () could be a shorthand for ::Output.

Associated type bounds

Another part of RTN, and in fact the only part that we’ve implemented so far, is the ability to put bounds on function returns “inline”:

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck<check(): Send>,
    //              -------------
{
    /* … */
}

We could in principle do the same thing with ::Output notation:

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck<check::Output: Send>,
    //              -------------------
{
    /* … */
}

Pro: simpler building blocks

What I really like about this idea is that it doesn’t introduce new concepts or notation, but rather refines and extends ones that exist. We already have T::Output — all this is doing is making it work in contexts where it didn’t work before, and in a fairly logical way. We already have zero-sized function types representing every method, but now we would have a way to name them.

Con: Rust has two namespaces, and this is at odds with that

I said that we can add an associated type for every method in the trait — but what do we do if there is an associated type and a method with the same name? Something like this…

trait Foo {
    type process;
    fn process(&mut self);
}

…that would be weird, but it can certainly happen (in fact, I’ve written proc macros that generate code like this because I was too lazy to transform the name of the associated type).
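
As a quick sketch of why this is legal today, the trait below declares both and a caller uses each one. The `Counter` type is hypothetical, and unlike the trait above, the method here returns `Self::process` so that both names show up in one signature.

```rust
#[allow(non_camel_case_types)]
trait Foo {
    type process;
    fn process(&mut self) -> Self::process;
}

/// Hypothetical implementor used only for this illustration.
struct Counter(u32);

#[allow(non_camel_case_types)]
impl Foo for Counter {
    type process = u32;

    fn process(&mut self) -> Self::process {
        self.0 += 1;
        self.0
    }
}

fn demo_namespaces() -> u32 {
    let mut counter = Counter(1);
    // In type position `process` is the associated type;
    // in expression position it is the method.
    let result: <Counter as Foo>::process = <Counter as Foo>::process(&mut counter);
    result
}
```

If every method also introduced an associated type, the two meanings of `<Counter as Foo>::process` in type position would collide.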

We have some options here. We could say that we add an associated type for a method only if there isn’t an explicit associated type with the same name. We can make this shadowing illegal in Rust 2024 (but not earlier Rust editions). We could add the associated types only for async functions and RPITIT functions, which are not currently possible, and then forbid shadowing in those cases.

Still, fundamentally, this approach of making a method into an associated type is at odds with Rust’s primary two namespaces (types, values), whereas the RTN approach is working with those two namespaces.

Con: omg so verbose; and so. many. colons.

The obvious downside of the ::Output notation is that it is significantly more verbose to read and write when compared to RTN, and it puts :: and : in close proximity (admittedly an existing problem with Rust syntax). Consider:

where HC::check(): Send
// vs
where HC::check::Output: Send

RTN also works really well in associated type bound position, but ::Output works less well:

where HC: HealthCheck<check(): Send>
// vs
where HC: HealthCheck<check::Output: Send>

but…

…although it must be said that, in practice, check(): Send isn’t the only thing you have to write. For example, this example only says that the future returned by check() is Send, but in practice you actually need HC to be Send + 'static too. So you would have to write something like…

HC: HealthCheck<check(): Send> + Send + 'static

…and, of course, many traits in practice have a lot more than one method. Consider something like this trait…

trait Resource {
    async fn get(&mut self);
    async fn put(&mut self);
}

…then you would need to write…

R: Resource<get(): Send, put(): Send> + Send + 'static

…and that quickly gets tedious. We encountered this in the case studies that we did, which is why the Google folks created a crate that lets you define a trait alias like SendResource, so that R: SendResource says all the above.
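
To give a feel for what such a `SendResource` amounts to, here is a hand-written approximation that does not rely on RTN. It assumes Rust 1.75 or later (return-position `impl Trait` in traits), and it is only a sketch: it is not the output of any particular crate, and `Cache`, `is_send`, and `futures_are_send` are invented for the example.

```rust
use std::future::Future;

/// Hand-written "Send variant" of the `Resource` trait: every returned
/// future is `Send`, and so is the implementing type itself.
trait SendResource: Send + 'static {
    fn get(&mut self) -> impl Future<Output = ()> + Send;
    fn put(&mut self) -> impl Future<Output = ()> + Send;
}

/// Hypothetical implementor.
struct Cache {
    hits: u32,
}

impl SendResource for Cache {
    // An `async fn` in the impl can satisfy the `impl Future + Send`
    // signature, as long as the resulting future really is `Send`.
    async fn get(&mut self) {
        self.hits += 1;
    }

    async fn put(&mut self) {
        self.hits = 0;
    }
}

/// Compiles only if `T` is `Send`; the return value is just for testing.
fn is_send<T: Send>(_value: &T) -> bool {
    true
}

/// With the bounds baked into the trait, `R: SendResource` says it all.
fn futures_are_send<R: SendResource>(resource: &mut R) -> bool {
    let get_is_send = {
        let fut = resource.get();
        is_send(&fut)
    };
    let put_is_send = {
        let fut = resource.put();
        is_send(&fut)
    };
    get_is_send && put_is_send
}
```

The price of this approach is that the `Send` requirement is fixed in the trait definition instead of being chosen by each caller, which is exactly the flexibility that the bounds above are trying to keep.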

Con: confusion between Output

One interesting point that Yosh raised in our lang team design meeting is that people already have the potential to be confused about whether the Send bound applies to the future returned by the async function or the value you get from awaiting the future; the fact that both FnOnce and Future have an Output associated type could well play into that confusion.

One thing we discussed is how one would place bounds on the value returned from a future (versus the future itself). Under the higher-ranked projections proposal described in this blog post, this is fairly clear: you just write ...::Output::Output:

where 
    T::method::Output::Output: Send
    //         ------  ------
    //           |       |
    //           |     Describes value produced by future
    //         Describes the future itself.

For RTN, there are multiple options. One is to use ::Output:

where 
    T::method()::Output: Send,
    //       --  ------
    //       |    |
    //       |  Describes value produced by future
    //       Describes the future itself.

Another is to “double down” on the “pseudo-expression” syntax:

where 
    T::method().await: Send,
    //       -- -----
    //       |    |
    //       |  Describes value produced by future
    //       Describes the future itself.

We don’t have to settle this today, but it’s interesting to think about.

Pro: Building blocks first?

I’m torn on this point. Lately I’ve been into the idea of “stabilize the building blocks”. For a mature language like Rust, it is important to work piece by piece. Moreover, thanks to custom derive and procedural macros, people can build really powerful abstractions if they have the building blocks to work with. And it’s sometimes a lot easier to get consensus around the building blocks than the nice syntax on top1. All of this argues to me for the ::Output approach, which feels to me like more of a general purpose building block.

but…

On the other hand, the () syntax is itself a building block. But it’s a building block that’s actually nice enough to use in simple cases. We’ve often been reluctant to add new bits of syntax to Rust, and I think that’s generally good, but sometimes I look with envy at other languages that are willing to take bold steps to build designs that are aggressively awesome. I’d like us as a language community to dare to ask for more. It’s hard to argue that the ::Output syntax is aggressively awesome. The () syntax may not be aggressively awesome (that’s probably trait transformers), but it’s at least mildly awesome.

Implementation notes

Right now, the only form of RTN that we have implemented is the “associated type bound” notation, e.g., HealthCheck<check(): Send>. If we add RTN, I think we should also support use in where clauses (e.g., HC::check(): Send) and as a type for local variables (e.g., let x: HC::check() = hc.check(…)), pursuant to the “year of everywhere” philosophy, where we try to make Rust notations as uniformly applicable as possible2. That said, implementing it in those other places is significantly more complicated in the compiler.

The ::Output notation, in contrast, doesn’t read especially well as an associated type bound (HealthCheck<check::Output: Send> is kind of O_O to me). I think it works better as a standalone where clause like HC::check::Output: Send. It’s not clear how quickly we can implement that. It should be possible, imo, but it requires more investigation.

Conclusion

There isn’t one yet. My sense is that both the ::Output and the RTN approach would work. The ::Output approach feels a bit more “primitive”. It can be used with any higher-ranked trait bound, which means it covers slightly more options, although I don’t have a compelling example of where you would want it right now. In contrast, RTN feels easier to explain and more accessible to newcomers, and it respects Rust’s “two namespaces” approach. Neither feels like a one-way door: we can start with RTN and then add ::Output (in which case, () is a kind of sugar for ::Output), and we can start with ::Output and then add () as a sugar for it later.

Although not always! I think that -> impl Trait is a good example of where stabilizing the syntax first, and working through the semantics and core primitives over time, has paid off. ↩︎
Hat tip to TC for bringing up this slogan in the lang team meeting. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/06/12/higher-ranked-projections-send-bound-problem-part-4/

Giving, lending, and async closures

May 9, 2023 Updated May 9, 2023

In a previous post on async closures, I concluded that the best way to support async closures was with an async trait combinator. I’ve had a few conversations since the post and I want to share some additional thoughts. In particular, this post dives into what it would take to make async functions matchable with a type like impl FnMut() -> impl Future<Output = bool>. This takes us down some interesting roads, in particular the distinction between giving and lending traits; it turns out that the closure traits specifically are a bit of a special case in terms of what we can do backwards compatibly, due to their special syntax. Read on!

Goal

Let me cut to the chase. This article lays out a way that we could support a notation like this:

fn take_closure(x: impl FnMut() -> impl Future<Output = bool>) { }

It requires some changes to the FnMut trait which, somewhat surprisingly, are backwards compatible I believe. It also requires us to change how we interpret -> impl Trait when in a trait bound (and likely in the value of an associated type); this could be done (over an Edition if necessary) but it introduces some further questions without clear answers.

This blog post itself isn’t a real proposal, but it’s a useful ingredient to use when discussing the right shape for async closures.

Giving traits

The split between Fn and async Fn turns out to be one instance of a general pattern, which I call “giving” vs “lending” traits. In a giving trait, when you invoke its methods, you get back a value that is independent from self.

Let’s see an example. The current Iterator trait is a giving trait:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    //      ^ the lifetime of this reference
    //        does not appear in the return type;
    //        hence "giving"
}

In Iterator, each time you invoke next, you get ownership of a Self::Item value (or None). This value is not borrowed from the iterator.1 As a consumer, a giving trait is convenient, because it permits you to invoke next multiple times and keep using the return value afterwards. For example, this function compiles and works for any iterator (playground):

fn take_two_v1<T: Iterator>(t: &mut T) -> Option<(T::Item, T::Item)> {
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    // *Key point:* `i` is still live here, even though we called `next`
    // again to get `j`.
    Some((i, j))
}

Lending traits

Whereas a giving trait gives you ownership of the return value, a lending trait is one that returns a value borrowed from self. This pattern is less common, but it certainly appears from time to time. Consider the AsMut trait:

trait AsMut<T: ?Sized> {
    fn as_mut(&mut self) -> &mut T;
    //        -             -
    // Returns a reference borrowed from `self`.
}

AsMut takes an &mut self and (thanks to Rust’s elision rules) returns an &mut T borrowed from it. As a caller, this means that so long as you use the return value, the self is considered borrowed. Unlike with Iterator, therefore, you can’t invoke as_mut twice and keep using both return values (playground):

fn as_mut_two<T: AsMut<String>>(t: &mut T) {
    let i = t.as_mut(); // Borrows `t` mutably
    
    let j = t.as_mut(); // Error: second mutable borrow
                        // while the first is still live
    
    i.len();            // Use result from first borrow
}
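For contrast, here is a small sketch of a variant that does compile, because the first borrow is finished before the second one starts. The function name as_mut_one_at_a_time is made up for illustration.

```rust
/// Uses two borrows from `as_mut`, but never at the same time.
fn as_mut_one_at_a_time<T: AsMut<String>>(t: &mut T) -> usize {
    let i = t.as_mut(); // First mutable borrow of `t`
    i.push('x');
    let first_len = i.len(); // Last use of `i`: the first borrow ends here

    let j = t.as_mut(); // OK: no earlier borrow is still live
    j.push('y');
    first_len + j.len()
}
```

Any type implementing AsMut<String> works as the argument; Box<String> is one such type in the standard library.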

Lending iterators

Of course, AsMut is kind of a “trivial” lending trait. A more interesting one is lending iterators2. A lending iterator is an iterator that returns references into the iterator itself. Typically this is because the iterator has some kind of internal buffer that it uses. Until recently, there was no lending iterator trait because it wasn’t even possible to express it in Rust. But with generic associated types (GATs), that changed. It’s now possible to express the trait, although there are borrow checker limitations that block it from being practical3:

trait LendingIterator {
    type Item<'this>
    where
        Self: 'this;
    
    fn next(&mut self) -> Option<Self::Item<'_>>;
    //      ^                        ^^
    // Unlike `Iterator`, returns a value
    // potentially borrowed from `self`.
}

As the name suggests, when you use a lending iterator, it is lending values to you; you have to “give them back” (stop using them) before you can invoke next again. This gives more freedom to the iterator: it has the ability to use an internal mutable buffer, for example. But it takes some flexibility from you as the consumer. For example, the take_two function we saw earlier will not compile with LendingIterator (playground):

fn take_two_v2<T: LendingIterator>(
    t: &mut T,
) -> Option<(T::Item<'_>, T::Item<'_>)> {
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    // *Key point:* `i` is still live here, even though we called `next`
    // again to get `j`.
    Some((i, j))
}
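A consumer that gives each item back before asking for the next one is fine, though. As a minimal sketch, here is one possible lending iterator that hands out overlapping mutable windows into a slice, something the giving Iterator trait cannot express. The names WindowsMut and running_sums are hypothetical and chosen just for this example.

```rust
trait LendingIterator {
    type Item<'this>
    where
        Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical lending iterator: overlapping mutable windows into a slice.
struct WindowsMut<'s, T> {
    slice: &'s mut [T],
    start: usize,
    len: usize,
}

impl<'s, T> LendingIterator for WindowsMut<'s, T> {
    // Each window borrows from the iterator itself, hence "lending".
    type Item<'this> = &'this mut [T]
    where
        Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let start = self.start;
        let end = start.checked_add(self.len)?;
        if end > self.slice.len() {
            return None;
        }
        self.start += 1;
        Some(&mut self.slice[start..end])
    }
}

/// Turns `data` into its running sums, one two-element window at a time.
fn running_sums(data: &mut [i32]) {
    let mut windows = WindowsMut { slice: data, start: 0, len: 2 };
    // Each `window` is dropped at the end of the loop body, so the next
    // call to `next` is allowed.
    while let Some(window) = windows.next() {
        window[1] += window[0];
    }
}
```

The windows overlap, so holding two of them at once would mean two mutable references to the same element; the lending signature is what rules that out.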

An aside: Inherent or accidental complexity?

It seems kind of annoying that Iterator and LendingIterator are two distinct traits. In a GC’d language, they wouldn’t be. This is a good example of what makes using Rust more complex. On the other hand, it’s worth asking, is this inherent or accidental complexity? The answer, I think, is “it depends”.

For example, I could certainly write an Iterator in Java that makes use of an internal buffer:

class Compute
    implements Iterator<ByteBuffer>
{
    ByteBuffer shared = ByteBuffer.allocate(256);
    
    public ByteBuffer next() {
        if (mutateSharedBuffer()) {
            return shared.asReadOnlyBuffer();
        }
        return null;
    }
    
    /// Mutates `shared` and returns true if there is a new value.
    private boolean mutateSharedBuffer() {
        // ...
    }
}

Despite the fact that Java has no way to express the concept, this is most definitely a lending iterator. If I try to write a function that invokes next twice, the first value will simply not exist anymore:

Compute c = new Compute();
ByteBuffer a = c.next();
ByteBuffer b = c.next();
byte a0 = a.get(); // a has been overwritten with b..
byte b0 = b.get(); // ..so `a0 == b0` is always true.

In a case like this, Rust’s distinctions are expressing inherent complexity4. If you want to have a shared buffer that you reuse between calls, Java makes it easy to make mistakes. Rust’s ownership rules force you to copy out data that you want to keep using, preventing bugs like the one above. In Java, people eventually learn to adopt functional patterns or to clone data instead of sharing access to mutable state. But that requires time and experience, and the compiler and language aren’t helping you do so (unless you use, say, Haskell or OCaml or some purely functional language). These kinds of patterns are a good example of why Rust code winds up having that “if it compiles, it works” feeling, and how the same machinery that guarantees memory safety also prevents logical bugs.

Iterator as a special case of LendingIterator

OK, so we saw that the Iterator and LendingIterator trait, while clearly related, express an important tradeoff. The Iterator trait declares up front that each Item is independent from the iterator, but the LendingIterator declares that the Item<'_> values returned may be borrowed from the iterator. This affects what fully generic code (like our take_two function) can do.

But note a careful hedge: I said that the LendingIterator trait declares that Item<'_> values may be borrowed from the iterator. They don’t have to be. In fact, every Iterator can be viewed as a LendingIterator (as you can see in this playground), much like every FnMut (which takes an &mut self) can be viewed as a Fn (which takes an &self). Essentially an Iterator is “just” a LendingIterator that doesn’t happen to make use of the 'a argument when defining its Item<'a>.

It’s also possible to write a version of take_two that uses LendingIterator but compiles (playground)5:

fn take_two_v3<T, U>(t: &mut T) -> Option<(U, U)> 
where
    T: for<'a> LendingIterator<Item<'a> = U>
    // ^^^^^^                             ^
    // No matter which `'a` is used, result is always `U`,
    // which cannot reference `'a` (after all, `'a` is not
    // in scope when `U` is declared).
{
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    Some((i, j))
}

The key here is the where-clause. It says that T::Item<'a> is always equal to U, no matter what 'a is. In other words, the item that is produced by this iterator is never borrowed from self – if it were, then its type would include 'a somewhere, as that is the lifetime of the reference to the iterator. As a result, take_two compiles successfully. Of course, it also can’t be used with LendingIterator values that actually make use of the flexibility the trait is offering them.
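To see the two ideas working together, here is a sketch that wraps an ordinary iterator so it can be viewed as a lending one and then feeds it to take_two_v3. The Lend adapter and the first_two helper are hypothetical names introduced for this example; the iterator used is an owned one, which sidesteps the current borrow checker limitations around higher-ranked GAT bounds.

```rust
trait LendingIterator {
    type Item<'this>
    where
        Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical adapter: views any ordinary (giving) iterator as a lending one.
struct Lend<I>(I);

impl<I: Iterator> LendingIterator for Lend<I> {
    // The item type ignores `'this`, so nothing is actually lent.
    type Item<'this> = I::Item
    where
        Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        self.0.next()
    }
}

fn take_two_v3<T, U>(t: &mut T) -> Option<(U, U)>
where
    T: for<'a> LendingIterator<Item<'a> = U>,
{
    let i = t.next()?;
    let j = t.next()?;
    Some((i, j))
}

/// Takes the first two elements of an owned vector through the adapter.
fn first_two(values: Vec<i32>) -> Option<(i32, i32)> {
    let mut lent = Lend(values.into_iter());
    take_two_v3(&mut lent)
}
```

Because Lend never mentions 'this in its Item, it satisfies the higher-ranked where-clause with U equal to the underlying item type.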

Can we “unify” Iterator and LendingIterator?

The fact that every iterator is just a special case of lending iterator begs the question, can they be unified? Jack Huey, in the runup to GATs, spent a while exploring this question, and concluded that it doesn’t work. To see why, imagine that we changed Iterator so that it had type Item<'a>, instead of just type Item. It’s easy enough to imagine that existing code that says T: Iterator<Item = u32> could be reinterpreted as for<'a> T: Iterator<Item<'a> = u32>, and then it ought to continue compiling. But the scheme doesn’t quite work precisely because of examples like take_two_v1:

fn take_two_v1<T: Iterator>(t: &mut T) -> Option<(T::Item, T::Item)> {...}

This signature just says that it takes an Iterator; it doesn’t put any additional constraints on it. If we’ve modified Iterator to be a lending iterator, then you can’t take two items independently. So we would have to have some way to say “any giving iterator” vs “any lending iterator” – and if we’re going to say those two things, why not make it two distinct traits?

FnMut is a giving trait

I started off this post talking about async closures, but so far I’ve just talked about iterators. What’s the connection? Well, for starters, the distinction between sync and async closures is precisely the difference between giving and lending closures.

Sync closures (at least as defined now) are giving traits. Consider a (simplified) view of the FnMut trait as an example:

trait FnMut<A> {
    type Output;
    fn call(&mut self, args: A) -> Self::Output;
    //      ^                      ^^^^^^^^^^^^
    // The `self` reference is independent from the
    // return type.
}

FnMut returns a Self::Output, just like the giving Iterator returns Self::Item.

FnMut has special syntax

You may not be accustomed to seeing the FnMut trait as a regular trait. In fact, on stable Rust, we require you to use special syntax with FnMut. For example, you write impl FnMut(u32) -> bool as a shorthand for FnMut<(u32,), Output = bool>. This is not just for convenience, it’s also because we have planned for some time to make changes to the FnMut trait (e.g., to make it variadic, rather than having it take a tuple of argument types), and the special syntax is meant to leave room for that. Pay attention here: this special syntax turns out to have an important role.

Async closures are a lending pattern

Async closures are closures that return a future. But that future has to capture self. So that makes them a kind of lending trait. Imagine we had a LendingFnMut:

trait LendingFnMut<A> {
    type Output<'this>
    where
        Self: 'this;
    
    fn call(&mut self, args: A) -> Self::Output<'_>;
    //      ^                                  ^^^^
    // Lends data from `self` as part of return value.
}

Now we could (not saying we should) express an async closure as a kind of bound on Output:

// Imagine we want something like this...
async fn foo(x: async FnMut() -> bool) {...}

// ...that is kind of this:
async fn foo<F>(f: F)
where
    F: LendingFnMut<()>,
    for<'a> F::Output<'a>: Future<Output = bool>
{
    ...
}

What is going on here? We are saying first that f is a lending closure that takes no arguments, F: LendingFnMut<()>. Note that we are not using the special FnMut sugar here, so this constraint says nothing about the value of Output. Then, in the next where-clause, we are specifying that Output implements Future<Output = bool>. Importantly, we never say what F::Output is, just that it will implement Future. This means that it could include references to self (but it doesn’t have to).
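To make the shape of this concrete, here is a sketch that compiles on stable Rust. It is an illustration under several assumptions, not a proposal: the Appender type plays the role of a hand-written “async closure”, its future is boxed because an async block’s type cannot be named, and NoopWake, block_on and call_twice are hypothetical helpers. The consumer polls synchronously to keep the example small.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait LendingFnMut<A> {
    type Output<'this>
    where
        Self: 'this;

    fn call(&mut self, args: A) -> Self::Output<'_>;
}

/// Hypothetical hand-written "async closure" that appends to its own buffer.
struct Appender {
    buf: String,
}

impl LendingFnMut<(&'static str,)> for Appender {
    // The future borrows `self` for `'this`, so it is lent, not given.
    type Output<'this> = Pin<Box<dyn Future<Output = usize> + 'this>>
    where
        Self: 'this;

    fn call(&mut self, args: (&'static str,)) -> Self::Output<'_> {
        Box::pin(async move {
            self.buf.push_str(args.0);
            self.buf.len()
        })
    }
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Knows only that `Output<'a>` is some future; it may borrow from `f`.
fn call_twice<F>(f: &mut F) -> usize
where
    F: LendingFnMut<(&'static str,)>,
    for<'a> <F as LendingFnMut<(&'static str,)>>::Output<'a>: Future<Output = usize>,
{
    // The first future must be finished before `f` can be called again.
    block_on(f.call(("Hello",)));
    block_on(f.call(("Rustaceans",)))
}
```

Trying to create both futures first and only then drive them would be rejected, for the same reason that take_two_v2 is rejected for lending iterators.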

Note what just happened. This is effectively a “third option” for how to desugar some kind of async closures. In my previous post, I talked about using HKT and about transforming the FnMut trait into an async variant (async FnMut). But here we see that we could also have a lending variant of the trait and then bound the Output of that to implement Future.

Closure syntax gives us more room to maneuver

So, to recap things we have seen:

- Giving vs lending traits is a fundamental pattern:
  - A giving trait has a return value that never borrows from self
  - A lending trait has a return value that may borrow from self
- Giving traits are subtraits of lending traits; i.e., you can view a giving trait as a lending trait that happens not to lend.
- We can’t convert Iterator to a lending trait “in place”, because functions that are generic over T: Iterator rely on it being the giving pattern.
- Async closures are expressible using a lending variant of FnMut, but not the current trait, which is the giving version.

Given the last two points, it might seem logical that we also can’t convert FnMut “in place” to the lending version, and that therefore we have to add some kind of separate trait. In fact, though, this is not true, and the reason is the forced closure syntax. In particular, it’s not possible to write a function today that is generic over F: FnMut<A> but doesn’t specify a value for the Output associated type. When you write F: FnMut(u32), you are actually specifying F: FnMut<(u32,), Output = ()>. It is possible to write generic code that talks about F::Output, but that will always be normalizable to something else, because adding the FnMut bound always includes a value for Output.
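A small sketch of what the sugar allows today may help here. The function names run_unit, run_collect and demo are made up for illustration. In both bounds the return type is pinned down, either implicitly to () or explicitly to a type parameter.

```rust
/// `F: FnMut(u32)` is sugar that also fixes `Output = ()`.
fn run_unit<F: FnMut(u32)>(mut f: F) {
    f(1);
    f(2);
}

/// To accept another return type, the bound still has to name it, here
/// through the extra type parameter `R`.
fn run_collect<F, R>(mut f: F) -> Vec<R>
where
    F: FnMut(u32) -> R,
{
    vec![f(1), f(2)]
}

fn demo() -> (u32, Vec<bool>) {
    let mut total = 0;
    run_unit(|n| total += n); // Closure returns `()`
    let flags = run_collect(|n| n % 2 == 0); // `R` is inferred as `bool`
    (total, flags)
}
```

There is no stable way to write the bound while leaving the return type unspecified, which is the gap the argument above relies on.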

In principle, then, we could redefine the Output associated type to take a lifetime parameter and change the desugaring for F: FnMut() -> R to be for<'a> F: FnMut<(), Output<'a> = R>. We would also have to make F::Output be legal even without specifying a value for its lifetime parameter; there are a few ways we could do that.

How to interpret impl Trait in the value of an associated type

Let’s imagine that we changed the Fn* to be lending traits, then. That’s still not enough to support our original goal:

fn take_closure(x: impl FnMut() -> impl Future<Output = bool>) { }
//                                 ^^^^
// Impl trait is not supported here.

The problem is that we also have to decide how to desugar impl Trait in this position. The interpretation that we want is not entirely obvious. We could choose to desugar -> impl Future as a bound on the Output type, i.e., to this:

fn take_closure<F>(x: F) 
where
    F: FnMut<()>,
    for<'a> <F as FnMut<()>>::Output<'a>: Future<Output = bool>,
{ }

If we did this, then the Output value is permitted to capture 'a, and hence we are taking advantage of FnMut being a lending closure. This means that, when we call the closure, we have to await the resulting future before we can call again, just like we wanted.

Complications

Interpreting impl Trait this way is a bit tricky. For one thing, it seems inconsistent with how we interpret impl Trait in a parameter like impl Iterator<Item = impl Debug>. Today, that desugars to two fresh parameters <F, G> where F: Iterator<Item = G>, G: Debug. We could probably change that without breaking real world code, since if the associated type is not a GAT I don’t think it matters, but we also permit things like impl Iterator<Item = (impl Debug, impl Debug)> that cannot be expressed as bounds. RFC #2289 proposed a new syntax for these sorts of bounds, such that one would write F: Iterator<Item: Debug> to express the same thing. By analogy, one could imagine writing F: FnMut(): Future<Output = bool>, but that’s not consistent with the -> impl Future that we see elsewhere. It feels like there’s a bit of a tangle of string to sort out here if we try to go down this road, and I worry about winding up with something that is very confusing for end-users (too many subtle variations).
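For reference, the three spellings discussed above can be written side by side. This is a sketch with made-up function names; the third form uses the associated type bound syntax from RFC 2289, which I am assuming is available (it needs a recent stable compiler).

```rust
use std::fmt::Debug;

/// Nested `impl Trait`: the item type is an anonymous fresh type parameter.
fn show_sugar(items: impl Iterator<Item = impl Debug>) -> Vec<String> {
    items.map(|item| format!("{item:?}")).collect()
}

/// The desugaring described above: two named parameters.
fn show_desugared<I, D>(items: I) -> Vec<String>
where
    I: Iterator<Item = D>,
    D: Debug,
{
    items.map(|item| format!("{item:?}")).collect()
}

/// Associated type bound: a bound on `Item` rather than a fresh parameter.
fn show_bound<I>(items: I) -> Vec<String>
where
    I: Iterator<Item: Debug>,
{
    items.map(|item| format!("{item:?}")).collect()
}
```

For a non-generic associated type like Item, callers cannot tell these apart, which is why the difference only starts to matter once the associated type takes a lifetime parameter.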

Conclusion

To recap all the points made in this post:

- Giving vs lending traits is a fundamental pattern:
  - A giving trait has a return value that never borrows from self
  - A lending trait has a return value that may borrow from self
- Giving traits are subtraits of lending traits; i.e., you can view a giving trait as a lending trait that happens not to lend.
- We can’t convert Iterator to a lending trait “in place”, because functions that are generic over T: Iterator rely on it being the giving pattern.
- Async closures are expressible using a lending variant of FnMut, but not the current trait, which is the giving version.
- It is possible to modify the Fn* traits to be “lending” by changing how we desugar F: Fn, but we have to make it possible to write F::Output even when Output has a lifetime parameter (perhaps only if that parameter is statically known not to be used).
- We’d also have to interpret FnMut() -> impl Future as being a bound on a possibly lent return type, which would be somewhat inconsistent with how Foo<Bar = impl Trait> is interpreted now (which is as a fresh type).

Hat tip

Tip of the hat to Tyler Mandry – this post is basically a summary of a conversation we had.

Footnotes

There is a subtle point here. If you are iterating over, say, a &[T] value, then the Item you get back is an &T and hence borrowed. It may seem strange for me to say that you get ownership of the &T. The key point here is that the &T is borrowed from the collection you are iterating over and not from the iterator itself. In other words, from the point of view of the Iterator, it is copying out a &T reference and handing ownership of the reference to you. Owning the reference does not give you ownership of the data it refers to. ↩︎
Sometimes called “streaming” iterators. ↩︎
Not to mention that GATs remain in an “MVP” state that is rather unergonomic to use; we’re working on it! ↩︎
Of course, Rust’s notations for expressing these distinctions involve some “accidental complexity” of their own, and you might argue that the cure is worse than the disease. Fair enough. ↩︎
This example, by the way, demonstrates the unergonomic state of GAT support. I don’t love writing for<'a> all the time. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/05/09/giving-lending-and-async-closures/

Fix my blog, please

Apr 3, 2023 Updated Apr 3, 2023

It’s well known that my blog has some issues. The category links don’t work. It renders oddly on mobile. And maybe Safari, too? The Rust snippets are not colored. The RSS feed is apparently not advertised properly in the metadata. It’s published via a makefile instead of some hot-rod CI/CD script, and it uses jekyll instead of whatever the new hotness is.1 Being a programmer, you’d think I could fix this, but I am intimidated by HTML, CSS, and Github Actions. Hence this call for help: I’d like to hire someone to “tune up” the blog, a combination of fixing the underlying setup and also the visual layout. This post will be a rough set of things I have in mind, but I’m open to suggestions. If you think you’d be up for the job, read on.

Desiderata2

In short, I am looking for a rad visual designer who also can do the technical side of fixing up my jekyll and CI/CD setup.

Specific work items I have in mind:

Syntax highlighting
Make it look great on mobile and safari
Fix the category links
Add RSS feed into metadata and link it, whatever is normal
CI/CD setup so that when I push or land a PR, it deploys automatically
“Tune up” the layout, but keep the cute picture!3

Bonus points if you can make the setup easier to duplicate. Installing and upgrading Ruby is a horrible pain and I always forget whether I like rbenv or rubyenv or whatever better. Porting over to Hugo or Zola would likely be awesome, so long as links and content can be preserved. I do use some funky jekyll plugins, though I kind of forgot why. Alternatively maybe something with docker?

Current blog implementation

The blog is a jekyll blog with a custom theme. Sources are here:

Deployment is done via rsync at present.

Interested?

Send me an email with your name, some examples of past work, any recommendations etc, and the rate you charge. Thanks!

On the other hand, it has that super cute picture of my daughter (from around a decade ago, but still…). And the content, I like to think, is decent. ↩︎
I have a soft spot for wacky plurals, and “desiderata” might be my fave. I heard it first from a Dave Herman presentation to TC39 and it’s been rattling in my brain ever since, wanting to be used. ↩︎
Ooooh, I always want nice looking tables like those wizards who style github have. How come my tables are always so ugly? ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/04/03/fix-my-blog-please/

Thoughts on async closures

Mar 29, 2023 Updated Mar 29, 2023

I’ve been thinking about async closures and how they could work once we have static async fn in trait. Somewhat surprisingly to me, I found that async closures are a strong example for where async transformers could be an important tool. Let’s dive in! We’re going to start with the problem, then show why modeling async closures as “closures that return futures” would require some deep lifetime magic, and finally circle back to how async transformers can make all this “just work” in a surprisingly natural way.

Sync closures

Closures are omnipresent in combinator style APIs in Rust. For the purposes of this post, let’s dive into a really simple closure function, call_twice_sync:

fn call_twice_sync(mut op: impl FnMut(&str)) {
    op("Hello");
    op("Rustaceans");
}

As the name suggests, call_twice_sync invokes its argument twice. You might call it from synchronous code like so:

let mut buf = String::new();
call_twice_sync(|s| buf.push_str(s));

As you might expect, after this code executes, buf will have the value "HelloRustaceans". (Playground link, if you’re curious to try it out.)

Async closures as closures that return futures

Suppose we want to allow the closure to do async operations, though. That won’t work with call_twice_sync because the closure is a synchronous function:

let mut buf = String::new();
call_twice_sync(|s| buf.push_str(receive_message().await));
//                                                 ----- ERROR

Given that an async function is just a sync function that returns a future, perhaps we can model an async closure as a sync closure that returns a future? Let’s try it.

async fn call_twice_async<F>(mut op: impl FnMut(&str) -> F)
where
    F: Future<Output = ()>,
{
    op("Hello").await;
    op("Rustaceans").await;
}

This compiles. So far so good. Now let’s try using it. For now we won’t even use an await, just the same sync code we tried before:

// Hint: won't compile
async fn use_it() {
    let mut buf = String::new();
    call_twice_async(|s| async { buf.push_str(s); });
    //                   ----- Return a future
}

Wait, what’s this? Lo and behold, we get an error, and a kind of intimidating one:

error: captured variable cannot escape `FnMut` closure body
  --> src/lib.rs:13:26
   |
12 |     let mut buf = String::new();
   |         ------- variable defined here
13 |     call_twice_async(|s| async { buf.push_str(s); });
   |                        - ^^^^^^^^---^^^^^^^^^^^^^^^
   |                        | |       |
   |                        | |       variable captured here
   |                        | returns an `async` block that contains a reference to a captured variable, which then escapes the closure body
   |                        inferred to be a `FnMut` closure
   |
   = note: `FnMut` closures only have access to their captured variables while they are executing...
   = note: ...therefore, they cannot allow references to captured variables to escape

So what is this all about? The last two lines actually tell you, but to really see it you have to do a bit of desugaring.

Futures capture the data they will use

The closure tries to construct a future with an async block. This async block is going to capture a reference to all the variables it needs: in this case, s and buf. So the closure will become something like:

|s| MyAsyncBlockType { buf, s }

where MyAsyncBlockType implements Future:

struct MyAsyncBlockType<'b> {
    buf: &'b mut String,
    s: &'b str,
}

impl Future for MyAsyncBlockType<'_> {
    type Output = ();
    
    fn poll(..) { ... }
}

The key point here is that the closure is returning a struct (MyAsyncBlockType) and this struct is holding on to a reference to both buf and s so that it can use them when it is awaited.
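For the curious, the elided poll can be filled in by hand. The sketch below is an approximation of what the compiler generates, not its actual output: since this particular async block never waits on anything, a single poll does all the work. NoopWake, block_on and push_via_future are hypothetical helpers added so the future can be driven without an async runtime.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct MyAsyncBlockType<'b> {
    buf: &'b mut String,
    s: &'b str,
}

impl Future for MyAsyncBlockType<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // The struct holds only references, so it is `Unpin` and
        // `get_mut` is allowed.
        let this = self.get_mut();
        // The captured references are used here, when the future is
        // polled, which is after the closure that built it has returned.
        this.buf.push_str(this.s);
        Poll::Ready(())
    }
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn push_via_future(buf: &mut String, s: &str) {
    block_on(MyAsyncBlockType { buf, s });
}
```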

Closure signature promises to be finished

The problem is that the FnMut closure signature actually promises something different than what the body does. The signature says that it takes an &str – this means that the closure is allowed to use the string while it executes, but it cannot hold on to a reference to the string and use it later. The same is true for buf, which will be accessible through the implicit self argument of the closure. But when the closure returns the future, it is trying to create references to buf and s that outlive the closure itself! This is why the error message says:

= note: `FnMut` closures only have access to their captured variables while they are executing...
= note: ...therefore, they cannot allow references to captured variables to escape

This is a problem!
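For comparison, one way to sidestep the error on stable Rust is to make sure the returned future borrows nothing: copy the string and share the buffer through a reference-counted cell. This is a workaround sketch rather than the direction this post is exploring, and the Rc<RefCell<..>> choice, NoopWake and block_on are assumptions made for the example.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn call_twice_async<F>(mut op: impl FnMut(&str) -> F)
where
    F: Future<Output = ()>,
{
    op("Hello").await;
    op("Rustaceans").await;
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn use_it() -> String {
    let buf = Rc::new(RefCell::new(String::new()));
    let shared = Rc::clone(&buf);
    block_on(call_twice_async(move |s| {
        // Copy everything the future needs, so it owns its data and
        // holds no reference to `s` or to the closure's captures.
        let s = s.to_owned();
        let shared = Rc::clone(&shared);
        async move {
            shared.borrow_mut().push_str(&s);
        }
    }));
    let result = buf.borrow().clone();
    result
}
```

This works because the closure is now a “giving” closure, at the price of an allocation per call and runtime borrow checking.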

Add some lifetime arguments?

So maybe we can declare the fact that we hold on to the data? It turns out you almost can, but not quite, and making an async closure be “just” a sync closure that returns a future would require some rather fundamental extensions to Rust’s trait system. There are two variables to consider, buf and s. Let’s begin with the argument s.

An aside: impl Trait capture rules

Before we dive more deeply into the closure case, let’s back up and imagine a top-level function that returns a future:

fn push_buf(buf: &mut String, s: &str) -> impl Future<Output = ()> {
    async move {
        buf.push_str(s);
    }
}

If you try to compile this code, you’ll find that it does not build (playground):

error[E0700]: hidden type for `impl Future<Output = ()>` captures lifetime that does not appear in bounds
 --> src/lib.rs:4:5
  |
3 |   fn push_buf(buf: &mut String, s: &str) -> impl Future<Output = ()> {
  |                    ----------- hidden type `[async block@src/lib.rs:4:5: 6:6]` captures the anonymous lifetime defined here
4 | /     async move {
5 | |         buf.push_str(s);
6 | |     }
  | |_____^
  |
help: to declare that `impl Future<Output = ()>` captures `'_`, you can introduce a named lifetime parameter `'a`
  |
3 | fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> impl Future<Output = ()> + 'a  {
  |            ++++       ++                 ++                                  ++++

impl Trait values can only capture borrowed data if they explicitly name the lifetime. This is why the suggested fix is to use a named lifetime 'a for buf and s and declare that the Future captures it:

fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> impl Future<Output = ()> + 'a
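As a quick sanity check, here is a minimal sketch of the fixed signature in use. The `block_on` helper and its `NoopWaker` are not part of the original example; they are a hypothetical stand-in for a real executor, and they are only adequate for futures that never actually suspend.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical helper: a waker that does nothing when woken.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: polls a future in a loop until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// The fixed signature: the returned future declares that it captures `'a`,
// so it is allowed to hold on to both `buf` and `s`.
fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> impl Future<Output = ()> + 'a {
    async move {
        buf.push_str(s);
    }
}

fn main() {
    let mut buf = String::from("Hello, ");
    block_on(push_buf(&mut buf, "World"));
    println!("{}", buf);
}
```

Nothing happens to `buf` until the future is polled, which is exactly why the future has to be allowed to capture the borrow.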

If you desugar this return position impl trait into an explicit type alias impl trait, you can see the captures more clearly, as they become parameters to the type. The original (no captures) would be:

type PushBuf = impl Future<Output = ()>;
fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> PushBuf

and the fixed version would be:

type PushBuf<'a> = impl Future<Output = ()> + 'a;
fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> PushBuf<'a>

From functions to closures

OK, so we just saw how we can define a function that returns an impl Future, how that future will wind up capturing the arguments, and how that is made explicit in the return type by references to a named lifetime 'a. We could do something similar for closures, although Rust’s rather limited support for explicit closure syntax makes it awkward. I’ll use the unimplemented syntax from RFC 3216, you can see the workaround on the playground if that’s your thing:

type PushBuf<'a> = impl Future<Output = ()> + 'a;


async fn test() {
    let mut c = for<'a> |buf: &'a mut String, s: &'a str| -> PushBuf<'a> {
        async move { buf.push_str(s) }
    };
    
    let mut buf = String::new();
    c(&mut buf, "foo").await;
}

(Side note that this is an interesting case for the “currently under debate” rules around defining type alias impl trait.)

Now for the HAMMER

OK, so far so grody, but we’ve shown that indeed you could define a closure that returns a future and it seems like things would work. But now comes the problem. Let’s take a look at the call_twice_async function – i.e., instead of looking at where the closure is defined, we look at the function that takes the closure as argument. That’s where things get tricky.

Here is call_twice_async, but with the anonymous lifetime given an explicit name 'a:

fn call_twice_async<F>(op: impl for<'a> FnMut(&'a str) -> F)
where
    F: Future<Output = ()>,

Now the problem is this: we need to declare that the future which is returned (F) might capture 'a. But F is declared in an outer scope, and it can’t name 'a. In other words, right now, the return type F of the closure op must be the same each time the closure is called, but to get the semantics we want, we need the return type to include a different value for 'a each time.

If Rust had higher-kinded types (HKT), you could do something a bit wild, like this…

fn call_twice_async<F<'_>>(op: impl for<'a> FnMut(&'a str) -> F<'a>)
//                  ----- HKT
where
    for<'a> F<'a>: Future<Output = ()>,

but, of course, we don’t have HKT (and, cool as they are, I don’t think that’s a good fit for Rust right now, it would bust our complexity barrier in my opinion and then some without near enough payoff).

Short of adding HKT or some equivalent, I believe the only workaround is to use a dyn type:

fn call_twice_async(op: impl for<'a> FnMut(&'a str) -> Box<dyn Future<Output = ()> + 'a>)

This works today (and it is, for example, what moro does to resolve exactly this problem). Of course that means that the closure has to allocate a box, instead of just returning an async move. That’s a non-starter.
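Here is a rough, runnable sketch of that workaround. A few things in it are my own assumptions rather than anything shown above: the `BoxFuture` alias, the `Pin<Box<...>>` wrapper (needed so the boxed future can be awaited), the `block_on`/`NoopWaker` helpers standing in for a real executor, and the `Rc<RefCell<String>>` that replaces the captured `&mut buf`. The box only solves the problem for the argument `s`; the captured `buf` still can't be borrowed by the returned future, so the sketch shares it instead.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical alias: a boxed future whose lifetime is tied to the argument.
type BoxFuture<'a> = Pin<Box<dyn Future<Output = ()> + 'a>>;

// The return type can mention `'a` because it is written inside the `for<'a>`.
async fn call_twice_async(mut op: impl for<'a> FnMut(&'a str) -> BoxFuture<'a>) {
    op("Hello").await;
    op("World").await;
}

// Hypothetical helpers: a do-nothing waker and a busy-polling executor.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn run_demo() -> String {
    let buf = Rc::new(RefCell::new(String::new()));
    let handle = Rc::clone(&buf);
    block_on(call_twice_async(move |s| {
        // Each returned future owns its own handle to the shared buffer.
        let handle = Rc::clone(&handle);
        Box::pin(async move {
            handle.borrow_mut().push_str(s);
        })
    }));
    let result = buf.borrow().clone();
    result
}

fn main() {
    println!("{}", run_demo());
}
```

Every call to `op` allocates, which is the cost described above.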

So we’re kind of stuck. As far as I can tell, modeling async closures as “normal closures that happen to return futures” requires one of two unappealing options

extend the language with HKT, or possibly some syntactic sugar that ultimately however desugars to HKT
use Box<dyn> everywhere, giving up on zero cost futures, embedded use cases, etc.

More traits, less problems

But wait, there is another way. Instead of modeling async closures using the normal Fn traits, we could define some async closure traits. To keep our life simple, let’s just look at one, for FnMut:

trait AsyncFnMut<A> {
    type Output;
    
    async fn call(&mut self, args: A) -> Self::Output;
}

This is identical to the sync FnMut trait, except that call is an async fn. But that’s a pretty important difference. If we desugar the async fn to one using impl Trait, and then to GATs, we can start to see why:

trait AsyncFnMut<A> {
    type Output;
    type Call<'a>: Future<Output = Self::Output> + 'a;
    
    fn call(&mut self, args: A) -> Self::Call<'_>;
}

Notice the Generic Associated Type (GAT) Call. GATs are basically the Rusty way to do HKTs (if you want to go deeper, I wrote a comparison series which may help; back then we called them associated type constructors, not GATs). Essentially what has happened here is that we moved the “HKT” into the trait definition itself, instead of forcing the caller to have it.

Given this definition, when we try to write the “call twice async” function, things work out more smoothly:

async fn call_twice_async(mut op: impl AsyncFnMut(&str)) {
    op.call("Hello").await;
    op.call("World").await;
}

Try it out on the playground, though note that we don’t actually support the () sugar for arbitrary traits, so I wrote impl for<'a> AsyncFnMut<&'a str, Output = ()> instead.
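For readers without the playground at hand, this is roughly what such an experiment can look like, as a sketch under a few assumptions. It relies on `async fn` in traits, which stable Rust has supported since 1.75. The trait is renamed `AsyncFnMutSketch` (my name, to keep it visibly distinct from anything in the standard library), the `PushTo` struct is a hand-written stand-in for the closure, and `block_on`/`NoopWaker` are hypothetical helpers replacing a real executor.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The trait from the text, under a hypothetical name.
trait AsyncFnMutSketch<A> {
    type Output;

    async fn call(&mut self, args: A) -> Self::Output;
}

// Hand-written stand-in for a closure that appends to a captured buffer.
struct PushTo<'b> {
    buf: &'b mut String,
}

impl<'a, 'b> AsyncFnMutSketch<&'a str> for PushTo<'b> {
    type Output = ();

    // The returned future may borrow both `self` and `s`.
    async fn call(&mut self, s: &'a str) {
        self.buf.push_str(s);
    }
}

// Explicit form of the bound, since the `()` sugar is not available here.
async fn call_twice_async(mut op: impl for<'a> AsyncFnMutSketch<&'a str, Output = ()>) {
    op.call("Hello").await;
    op.call("World").await;
}

// Hypothetical helpers: a do-nothing waker and a busy-polling executor.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn run_demo() -> String {
    let mut buf = String::new();
    block_on(call_twice_async(PushTo { buf: &mut buf }));
    buf
}

fn main() {
    println!("{}", run_demo());
}
```

No box and no shared ownership are needed: the future returned by `call` is a different type for each borrow, and the trait definition is what makes that expressible.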

Connection to trait transformers

The translation between the normal FnMut trait and the AsyncFnMut trait was pretty automatic. The only thing we did was change the “call” function to async. So what if we had an async trait transformer, as was discussed earlier? Then we only have one “maybe async” trait, FnMut:

#[maybe(async)]
trait FnMut<A> {
    type Output;
    
    #[maybe(async)]
    fn call(&mut self, args: A) -> Self::Output;
}

Now we can write call_twice either sync or async, as we like, and the code is virtually identical. The only difference is that I write impl FnMut for sync or impl async FnMut for async:

fn call_twice_sync(mut op: impl FnMut(&str)) {
    op.call("Hello");
    op.call("World");
}

async fn call_twice_async(mut op: impl async FnMut(&str)) {
    op.call("Hello").await;
    op.call("World").await;
}

Of course, with a more general maybe-async design, we might just write this function once, but that’s a separate concern. Right now I’m only concerned with the idea of authoring traits that can be used in two modes, but not necessarily with writing code that is generic over which mode is being used.

Final note: creating the closure in a maybe-async world

When calling call_twice, we could write |s| buf.push_str(s) or async |s| buf.push_str(s) to indicate which traits it implements, but we could also infer this from context. We already do similar inference to decide the type of s for example. In fact, we could have some blanket impls, so that every F: FnMut also implements F: async FnMut; I guess this is generally true for any trait.

Conclusion

My conclusions:

Nothing in this discussion required or even suggested any changes to the underlying design of async fn in trait. Stabilizing the statically dispatched subset of async fn in trait should be forwards compatible with supporting async closures. 🎉
The “higher-kinded-ness” of async closures has to go somewhere. In stabilizing GATs, in my view, we’ve committed to the path that it should go into the trait definition (vs HKT, which would push it to the use site). The standard “def vs use site” tradeoffs apply here, I think: def sites often feel simpler and easier to understand, but are less flexible. I think that’s fine.
Async trait transformers feel like a great option here that makes async closures work just like you would expect.

https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/

Must move types

Mar 16, 2023 Updated Mar 16, 2023

Show full content

Rust has lots of mechanisms that prevent you from doing something bad. But, right now, it has NO mechanisms that force you to do something good[1]. I’ve been thinking lately about what it would mean to add “must move” types to the language. This is an idea that I’ve long resisted, because it represents a fundamental increase to complexity. But lately I’m seeing more and more problems that it would help to address, so I wanted to try and think what it might look like, so we can better decide if it’s a good idea.

Must move?

The term ‘must move’ type is not standard. I made it up. The more usual name in PL circles is a “linear” type, which means a value that must be used exactly once. The idea of a must move type T is that, if some function f has a value t of type T, then f must move t before it returns (modulo panic, which I discuss below). Moving t can mean either calling some other function that takes ownership of t, returning it, or — as we’ll see later — destructuring it via pattern matching.

Here are some examples of functions that move the value t. You can return it…

fn return_it<T>(t: T) -> T {
    t
}

…call a function that takes ownership of it…

fn send_it<T>(t: T) {
    channel.send(t); // takes ownership of `t`
}

…or maybe call a constructor function that takes ownership of it (which would usually mean you must “recursively” move the result)…

fn return_opt<T>(t: T) -> Option<T> {
    Some(t) // moves t into the option
}

Doesn’t Rust have “linear types” already?

You may have heard that Rust’s ownership and borrowing is a form of “linear types”. That’s not really true. Rust has affine types, which means a value that can be moved at most once. But we have nothing that forces you to move a value. For example, I can write the consume function in Rust today:

fn consume<T>(t: T) {
    /* look ma, no .. nothin' */
}

This function takes a value t of (almost, see below) any type T and…does nothing with it. This is not possible with linear types. If T were linear, we would have to do something with t — e.g., move it somewhere. This is why I call linear types must move.

What about the destructor?

“Hold up!”, you’re thinking, “consume doesn’t actually do nothing with t. It drops t, executing its destructor!” Good point. That’s true. But consume isn’t actually required to execute the destructor; you can always use forget to avoid it[2]:

fn consume<T>(t: T) {
    std::mem::forget(t);
}

If it weren’t possible to “forget” values, destructors would mean that Rust had a linear system, but even then, it would only be in a technical sense. In particular, destructors would be a required action, but of a limited form: they can’t, for example, take arguments. Nor can they be async.
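To see the difference between the two versions of consume in running code, here is a small sketch. The `Noisy` type and the `destructor_runs` helper are hypothetical names of mine; the counter just records how many destructors actually ran.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type whose destructor bumps a shared counter.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn consume_by_drop<T>(_t: T) {
    // `_t` goes out of scope here, so its destructor runs.
}

fn consume_by_forget<T>(t: T) {
    // The value is moved into `forget`, and its destructor never runs.
    std::mem::forget(t);
}

// Counts how many destructors ran while `consume` handled one `Noisy`.
fn destructor_runs(consume: fn(Noisy)) -> u32 {
    let counter = Rc::new(Cell::new(0));
    consume(Noisy(Rc::clone(&counter)));
    counter.get()
}

fn main() {
    println!("dropped: {} destructor run(s)", destructor_runs(consume_by_drop));
    println!("forgotten: {} destructor run(s)", destructor_runs(consume_by_forget));
}
```

Both functions are entirely safe code, which is the point: nothing in the language today obliges a function to run the destructor of a value it owns.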

What about Sized?

There is one other detail about the consume function worth mentioning. When I write fn consume<T>(t: T), that is actually shorthand for saying “any type T that is Sized”. In other words, the fully elaborated “do nothing with a value” function looks like this:

fn consume<T: Sized>(t: T) {
    std::mem::forget(t);
}

If you don’t want this default Sized bound, you write T: ?Sized. The leading ? means “maybe Sized”, i.e., now T can be any type, whether it be sized (e.g., u32) or unsized (e.g., [u32]).

This is important: a where-clause like T: Foo narrows the set of types that T can be, since now it must be a type that implements Foo. The “maybe” where-clause T: ?Sized (we don’t accept other traits here) broadens the set of types that T can be, by removing default bounds.
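A tiny sketch of the narrowing versus broadening distinction, using two hypothetical helper functions that differ only in the `?Sized` bound:

```rust
// With the implicit `T: Sized` bound, only sized types are accepted.
fn size_of_sized<T>(t: &T) -> usize {
    std::mem::size_of_val(t)
}

// `?Sized` removes the default bound, so unsized types are accepted too.
fn size_of_any<T: ?Sized>(t: &T) -> usize {
    std::mem::size_of_val(t)
}

fn main() {
    let array = [1u32, 2, 3];
    println!("{}", size_of_sized(&array)); // T = [u32; 3]
    println!("{}", size_of_any(&array[..])); // T = [u32]
    println!("{}", size_of_any("abc")); // T = str
    // `size_of_sized(&array[..])` would not compile: `[u32]` is not `Sized`.
}
```

Every call that `size_of_sized` accepts is also accepted by `size_of_any`, but not the other way around.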

So how would “must move” work?

You might imagine that we could encode “must move” types via a new kind of bound, e.g., T: MustMove. But that’s actually backwards. The problem is that “must move” types are actually a superset of ordinary types — after all, if you have an ordinary type, it’s still ok to write a function that always moves it. But it’s also ok to have a function that drops it or forgets it. In contrast, with a “must move” type, the only option is to move it. This implies that what we want is a ? bound, not a normal bound.

The notation I propose is ?Drop. The idea is that, by default, every type parameter D is assumed to be droppable, meaning that you can always choose to drop it at any point. But an M: ?Drop parameter is not necessarily droppable. You must ensure that a value of type M is moved somewhere else.

Let’s see a few examples to get the idea of it. To start, the identity function, which just returns its argument, could be declared with ?Drop:

fn identity<M: ?Drop>(m: M) -> M {
    m // OK — moving `m` to the caller
}

But the consume function could not:

fn consume<M: ?Drop>(m: M) {
    // ERROR: `M` is not moved.
}

You might think that the version of consume which calls mem::forget is sound. After all, forget is declared like so:

fn forget<T>(t: T) {
    /* compiler magic to avoid dropping */
}

Therefore, if consume were to call forget(m), wouldn’t that count as a move? The answer is yes, it would, but we still get an error. This is because forget is not declared with ?Drop, and therefore there is an implicit T: Drop where-clause:

fn consume<M: ?Drop>(m: M) {
    forget(m); // ERROR: `forget` requires `M: Drop`, which isn’t known to hold.
}

Declaring types to be ?Drop

Under this scheme, all structs and types you declare would be droppable by default. If you don’t implement Drop explicitly, the compiler adds an automatic Drop impl for you that just recursively drops your fields. But you could explicitly declare your type to be ?Drop by using a negative impl:

pub struct Guard {
    value: u32
}

impl !Drop for Guard { }

When you do this, the type becomes “must move” and any function which has a value of type Guard must move it somewhere else. You might wonder then how you ever terminate. The answer is that one way to “move” the value is to unpack it with a pattern. For example, Guard might declare a log method:

impl Guard {
    pub fn log(self, message: &str) {
        let Guard { value } = self; // moves `self`
        println!("{value} = {message}");
    }
}

This plays nicely with privacy: if your type has private fields, only functions within that module will be able to destructure it, everyone else must (eventually) discharge their obligation to move by invoking some function within your module.
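None of this can be written today, since ?Drop and impl !Drop for Guard are hypothetical. The closest approximation I know of in current Rust is a runtime check, sometimes called a “drop bomb”: the destructor panics unless the value was consumed through the sanctioned method. The sketch below is only that approximation, not the proposal itself; the `new` constructor and the `String` return value of `log` are additions of mine to make it testable.

```rust
pub struct Guard {
    value: u32,
}

impl Guard {
    pub fn new(value: u32) -> Self {
        Guard { value }
    }

    // Discharges the obligation. A type with a `Drop` impl cannot be
    // destructured by pattern, so this reads the field and then uses
    // `forget` to skip the panicking destructor.
    pub fn log(self, message: &str) -> String {
        let line = format!("{} = {}", self.value, message);
        std::mem::forget(self);
        line
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Avoid a double panic (which aborts) if we are already unwinding.
        if !std::thread::panicking() {
            panic!("Guard dropped without calling `log`");
        }
    }
}

fn main() {
    let guard = Guard::new(22);
    println!("{}", guard.log("all done"));
}
```

The important difference is when you find out: with this approximation a forgotten Guard is a panic at runtime (and mem::forget still bypasses it silently), whereas a ?Drop type would turn the same mistake into a compile error.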

Interactions between “must move” and control-flow

Must move values interact with control-flow like ?. Consider the Guard type from the previous section, and imagine I have a function like this one…

fn execute(t: Guard) -> Result<(), std::io::Error> {
    let s: String = read_file("message.txt")?;  // ERROR: `t` is not moved on error
    t.log(&s);
    Ok(())
}

This code would not compile. The problem is that the ? in read_file may return with an Err result, in which case the call to t.log would not execute! This is a good error, in the sense that it is helping us ensure that the log call to Guard is invoked, but you can imagine that it’s going to interact with other things. To fix the error, you should do something like this…

fn execute(t: Guard) -> Result<(), std::io::Error> {
    match read_file("message.txt") {
        Ok(s) => {
            t.log(&s);
            Ok(())
        }
        Err(e) => {
            t.log("error"); // now `t` is moved
            Err(e)
        }
    }
}

Of course, you could also opt to pass back the t value to the caller, making it their problem.

Conditional “must move” types

Talking about types like Option and Result — it’s clear that we are going to want to be able to have types that are conditionally must move — i.e., must move only if their type parameter is “must move”. That’s easy enough to do:

enum Option<T: ?Drop> {
    Some(T),
    None,
}

Some of the methods on Option work just fine:

impl<T: ?Drop> Option<T> {
    pub fn map<U: ?Drop>(self, op: impl FnOnce(T) -> U) -> Option<U> {
        match self {
            Some(t) => Some(op(t)),
            None => None,
        }
    }
}

Other methods would require a Drop bound, such as unwrap_or:

impl<T: ?Drop> Option<T> {
    pub fn unwrap_or(self, default: T) -> T
    where
        T: Drop,
    {
        match self {
            // OK
            None => default,

            // Without the `T: Drop` bound, we are not allowed to drop `default` here.
            Some(v) => v,
        }
    }
}

“Must move” and panic

One very interesting question is what to do in the case of panic. This is tricky! Ordinarily, a panic will unwind all stack frames, executing destructors. But what should we do for a ?Drop type that doesn’t have a destructor?

I see a few options:

Force an abort. Seems bad.
Deprecate and remove unwinding, limit to panic=abort. A more honest version of the previous one. Still seems bad, though dang would it make life easier.
Provide some kind of fallback option.

The last one is most appealing, but I’m not 100% sure how it works. It may mean that we don’t want to have the “must move” opt-in be to impl !Drop but rather to impl MustMove, or something like that, which would provide a method that is invoked in the case of panic (this method could, of course, choose to abort). The idea of fallback might also be used to permit cancellation with the ? operator or other control-flow drops (though I think we definitely want types that don’t permit cancellation in those cases).

“Must move” and trait objects

What do we do with dyn? I think the answer is that dyn Foo defaults to dyn Foo + Drop, and hence requires that the type be droppable. To create a “must move” dyn, we could permit dyn Foo + ?Drop. To make that really work out, we’d have to have self methods to consume the dyn (though today you can do that via self: Box<Self> methods).

Uses for “must move”

Contra to best practices, I suppose, I’ve purposefully kept this blog post focused on the mechanism of must move and not talked much about the motivation. This is because I’m not really trying to sell anyone on the idea, at least not yet, I just wanted to sketch some thoughts about how we might achieve it. That said, let me indicate why I am interested in “must move” types.

First, async drop: right now, you cannot have destructors in async code that perform awaits. But this means that async code is not able to manage cleanup in the same way that sync code does. Take a look at the status quo story about dropping database handles to get an idea of the kinds of problems that arise. Adding async drop itself isn’t that hard, but what’s really hard is guaranteeing that types with async drop are not dropped in sync code, as documented at length in Sabrina Jewson’s blog post. This is precisely because we currently assume that all types are droppable. The simplest way to achieve “async drop” then would be to define a trait, trait AsyncDrop { async fn async_drop(self); }, and then make the type “must move”. This will force callers to eventually invoke async_drop(x).await. We might want some syntactic sugar to handle ? more easily, but that could come later.

Second, parallel structured concurrency. As Tyler Mandry elegantly documented, if we want to mix parallel scopes and async, we need some way to have futures that cannot be forgotten. The way I think of it is like this: in sync code, when you create a local variable x on your stack, you have a guarantee from the language that its destructor will eventually run, unless you move it. In async code, you have no such guarantee, as your entire future could just be forgotten by a caller. “Must move” types (with some kind of callback for panic) give us a tool to solve this problem, by having the future type be ?Drop. This is effectively a principled way to integrate completion-style futures that must be fully polled.

Finally, “liveness conditions writ large”. As I noted in the beginning, Rust’s type system today is pretty good at letting you guarantee “safety” properties (“nothing bad happens”), but it’s much less useful for liveness properties (“something good eventually happens”). Destructors let you get close, but they can be circumvented. And yet I see liveness properties cropping up all over the place, often in the form of guards or cleanup that really ought to happen. Any time you’ve ever wanted to have a destructor that takes an argument, that applies. This comes up a lot in unsafe code, in particular. Being able to “log” those obligations via “must move” types feels like a really powerful tool that will be used in many different ways.

Parting thoughts

This post sketches out one way to get “true linear” types in Rust, which I’ve dubbed as “must move” types. I think I would call this the ?Drop approach, because the basic idea is to allow types to “opt out” from being “droppable” (in which case they must be moved). This is not the only approach we could use. One of my goals with this blog post is to start collecting ideas for different ways to add linear capabilities, so that we can compare them with one another.

I should also address the obvious “elephant in the room”. The Rust type system is already complex, and adding “must move” types will unquestionably make it more complex. I’m not sure yet whether the tradeoff is worth it: it’s hard to judge without trying the system out. I think there’s a good chance that “must move” types live “on the edges” of the type system, through things like guards and so forth that are rarely abstracted over. I think that when you are dealing with concrete types, like the Guard example, must move types won’t feel particularly complicated. It will just be a helpful lint saying “oh, by the way, you are supposed to clean this up properly”. But where pain will arise is when you are trying to build up generic functions — and of course just in the sense of making the Rust language that much bigger. Things like ?Sized definitely make the language feel more complex, even if you never have to interact with them directly.

On the other hand, “must move” types definitely add value in the form of preventing very real failure modes. I continue to feel that Rust’s goal, above all else, is “productive reliability”, and that we should double down on that strength. Put another way, I think that the complexity that comes from reasoning about “must move” types is, in large part, inherent complexity, and I feel ok about extending the language with new tools for that. We saw this with the interaction with the ? operator: no doubt it’s annoying to have to account for moves and cleanup when an error occurs, but it’s also a key part of building a robust system, and destructors don’t always cut it.

[1] Well, apart from the “must use” lint.
[2] Or create an Rc-cycle, if that’s more your speed.

https://smallcultfollowing.com/babysteps/blog/2023/03/16/must-move-types/

Temporary lifetimes

Mar 15, 2023 Updated Mar 15, 2023

Show full content

In today’s lang team design meeting, we reviewed a doc I wrote about temporary lifetimes in Rust. The current rules were established in a blog post I wrote in 2014. Almost a decade later, we’ve seen that they have some rough edges, and in particular can be a common source of bugs for people. The Rust 2024 Edition gives us a chance to address some of those rough edges. This blog post is a copy of the document that the lang team reviewed. It’s not a proposal, but it covers some of what works well and what doesn’t, and includes a few sketchy ideas towards what we could do better.

Summary

Rust’s rules on temporary lifetimes often work well but have some sharp edges. The 2024 edition offers us a chance to adjust these rules. Since those adjustments change the times when destructors run, they must be done over an edition.

Design principles

I propose the following design principles to guide our decision.

Independent from borrow checker: We need to be able to figure out when destructors run without consulting the borrow checker. This is a slight weakening of the original rules, which required that we knew when destructors would run without consulting results from name resolution or type check.
Shorter is more reliable and predictable: In general, we should prefer shorter temporary lifetimes, as that results in more reliable and predictable programs.
- Editor’s note: A number of people on the lang team questioned this point. The reasoning is as follows. First, a lot of the problems in practice come from locks that are held longer than expected. Second, problems that come from temporaries being dropped too early tend to manifest as borrow check errors. Therefore, they don’t cause reliability issues, but rather ergonomic ones.
Longer is more convenient: Extending temporary lifetimes where we can do so safely gives more convenience and is key for some patterns.
- Editor’s note: As noted in the previous bullet, our current rules sometimes give temporary lifetimes that are shorter than what the code requires, but these generally surface as borrow check errors.

Equivalences and anti-equivalences

The rules should ensure that E and (E), for any expression E, result in temporaries with the same lifetimes.

Today, the rules also ensure that E and {E}, for any expression E, result in temporaries with the same lifetimes, but this document proposes dropping that equivalence as of Rust 2024.

Current rules

When are temporaries introduced?

Temporaries are introduced when there is a borrow of a value-producing expression (often called an “rvalue”). Consider an example like &foo(); in this case, the compiler needs to produce a reference to some memory somewhere, so it stores the result of foo() into a temporary local variable and returns a reference to that.

Often the borrows are implicit. Consider a function get_data() that returns a Vec<T> and a call get_data().is_empty(); because is_empty() is declared with &self on [T], this will store the result of get_data() into a temporary, invoke deref to get a &[T], and then call is_empty.

Default temporary lifetime

Whenever a temporary is introduced, the default rule is that the temporary is dropped at the end of the innermost enclosing statement; this rule is sometimes summarized as “at the next semicolon”. But the definition of statement involves some subtlety.

Block tail expressions. Consider a Rust block:

{
    stmt[0];
    ...
    stmt[n];
    tail_expression
}

Any temporaries created in a statement stmt[i] will be dropped once that statement completes. But the tail expression is not considered a statement, so temporaries produced there are dropped at the end of the statement that encloses the block. For example, given get_data and is_empty as defined in the previous section, and a statement let x = foo({get_data().is_empty()});, the vector will be freed at the end of the let.

Conditional scopes for if and while. if and while expressions and if guards (but not match or if let) introduce a temporary scope around the condition. So any temporaries from expr in if expr { ... } would be dropped before the { ... } executes. The reasoning here is that all of these contexts produce a boolean and hence it is not possible to have a reference into the temporary that is still live. For example, given if get_data().is_empty(), the vector must be safe to drop before entering the body of the if. This is not true for a case like match get_data().last() { Some(x) => ..., None => ... }, where the x would be a reference into the vector returned by get_data().

Function scope. The tail expression of a function block (e.g., the expression E in fn foo() { E }) is not contained by any statement. In this case, we drop temporaries from E just before returning from the function, and thus fn last() -> Option<&Datum> { get_data().last() } fails the borrow check (because the temporary returned by get_data() is dropped before the function returns). Importantly, this function scope ends after local variables in the function are dropped. Therefore, this function…

fn foo() {
    let x = String::new();
    vec![].is_empty()
}

…is effectively desugared to this…

fn foo() {
    let tmp;
    {
        let x = String::new();
        { tmp = vec![]; &tmp }.is_empty()
    } // x dropped here
} // tmp dropped here
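The conditional-scope rule described above can be observed at runtime. This is a small sketch of mine, not taken from the design doc: it uses a RefCell so that a temporary that is still alive shows up as a failed try_borrow_mut rather than as a compile error. The two function names are hypothetical.

```rust
use std::cell::RefCell;

// `if`: the `Ref` temporary from the condition is dropped before the body.
fn free_inside_if(cell: &RefCell<Vec<u32>>) -> bool {
    let free = if cell.borrow().is_empty() {
        cell.try_borrow_mut().is_ok()
    } else {
        false
    };
    free
}

// `match`: the `Ref` temporary from the scrutinee lives until the match ends,
// because a pattern binding may point into it.
fn free_inside_match(cell: &RefCell<Vec<u32>>) -> bool {
    let free = match cell.borrow().last() {
        Some(_x) => cell.try_borrow_mut().is_ok(),
        None => cell.try_borrow_mut().is_ok(),
    };
    free
}

fn main() {
    let empty = RefCell::new(Vec::<u32>::new());
    let full = RefCell::new(vec![1u32, 2, 3]);
    println!("free inside if body: {}", free_inside_if(&empty));
    println!("free inside match arm: {}", free_inside_match(&full));
}
```

Note that the match keeps the temporary alive even in the None arm, where no reference into it exists; the rule is purely syntactic.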

Lifetime extension

In some cases, temporary lifetimes are extended from the innermost statement to the innermost block. The rules for this are currently defined syntactically, meaning that they do not consider types or name resolution. The intuition is that we extend the lifetime of the temporary for an expression E if it is evident that this temporary will be stored into a local variable. Consider the trivial example:

let t = &foo();

Here, foo() is a value expression, and hence &foo() needs to create a temporary so that we can have a reference. But the resulting &T is going to be stored in the local variable t. If we were to free the temporary at the next ;, this local variable would be immediately invalid. That doesn’t seem to match the user intent. Therefore, we extend the lifetime of the temporary so that it is dropped at the end of the innermost block. This is the equivalent of:

let tmp;
let t = { tmp = foo(); &tmp };

We can extend this same logic to compound expressions. Consider:

let t = (&foo(), &bar());

We will expand this to:

let tmp1;
let tmp2;
let t = { tmp1 = foo(); tmp2 = bar(); (&tmp1, &tmp2) };

The exact rules are given by a grammar in the code and also covered in the reference. Rather than define them here I’ll just give some examples. In each case, the &foo() temporary is extended:

let t = &foo();

// Aggregates containing a reference that is stored into a local:
let t = Foo { x: &foo() };
let t = (&foo(), );
let t = [&foo()];

// Patterns that create a reference, rather than `&`:
let ref t = foo();

Here are some cases where temporaries are NOT extended:

let f = some_function(&foo()); // could be `fn some_function(x: &Vec<T>) -> bool`, may not need extension

struct SomeTupleStruct<T>(T);
let f = SomeTupleStruct(&foo()); // looks like a function call
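To watch extension happen, it helps to give the temporary a destructor that records when it runs. The following is a sketch with a hypothetical `Tracer` type of my own; it contrasts an extended temporary with an ordinary one that dies at the end of its statement.

```rust
use std::cell::RefCell;

// Hypothetical type that appends its name to a log when it is dropped.
struct Tracer<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Tracer<'_> {
    fn touch(&self) {}
}

impl Drop for Tracer<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // `let t = &...`: the temporary is extended to the end of the block.
        let t = &Tracer { name: "extended", log: &log };
        // Implicit borrow for a method call: dropped at the `;`.
        Tracer { name: "statement", log: &log }.touch();
        t.touch();
        log.borrow_mut().push("end of block");
    }
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

The "statement" temporary is logged first, then the marker pushed by the last statement, and only then the extended temporary, when the inner block ends.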

Patterns that work well in the current rules

Storing temporary into a local

struct Data<'a> {
    data: &'a [u32] // use a slice to permit subslicing later
}

fn initialize() {
    let mut d = Data { data: &[1, 2, 3] };
    //                       ^^^^^^^^^^ extended temporary
    d.process();
}

impl Data<'_> {
    fn process(&mut self) {
        ...
        self.data = &self.data[1..];
        ...
    }
}

Reading values out of a lock/refcell

The current rules allow you to do atomic operations on locks/refcells conveniently, so long as they don’t return references to the data. This works great in a let statement (there are other cases below where it works less well).

let result = cell.borrow_mut().do_something();
// `cell` is not borrowed here
...

Error-prone cases with today’s rules

Today’s rules sometimes give lifetimes that are too long, resulting in bugs at runtime.

Deadlocks because of temporary lifetimes in matches

One very common problem is deadlocks (or panics, for ref-cell) when mutex locks occur in a match scrutinee:

match lock.lock().data.clone() {
    //     ------ returns a temporary guard
    
    Data { ... } => {
        lock.lock(); // deadlock
    }
    
} // <-- lock() temporary dropped here
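Here is a runnable variation on that example, sketched with std::sync::Mutex. It probes with try_lock instead of calling lock a second time, since a second lock on the same thread would never return (it may deadlock or panic). The function names are hypothetical, and the second function shows the usual fix of hoisting the scrutinee into a let.

```rust
use std::sync::Mutex;

// Returns true if the mutex is still locked inside the match arm.
fn locked_inside_arm(lock: &Mutex<Vec<u32>>) -> bool {
    let locked = match lock.lock().unwrap().len() {
        // The guard temporary from the scrutinee is still alive here.
        _ => lock.try_lock().is_err(),
    };
    locked
}

// Hoisting the scrutinee into a `let` drops the guard at the `;`.
fn locked_after_let(lock: &Mutex<Vec<u32>>) -> bool {
    let len = lock.lock().unwrap().len();
    let locked = match len {
        _ => lock.try_lock().is_err(),
    };
    locked
}

fn main() {
    let lock = Mutex::new(vec![1, 2, 3]);
    println!("still locked in arm: {}", locked_inside_arm(&lock));
    println!("still locked after let: {}", locked_after_let(&lock));
}
```

The scrutinee only produces a usize, so nothing in the arm needs the guard, yet the guard is held for the whole match anyway.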

Ergonomic problems with today’s rules

Today’s rules sometimes give lifetimes that are too short, resulting in ergonomic failures or confusing error messages.

Call parameter temporary lifetime is too short (RFC66)

Somewhat surprisingly, the following code does not compile:

fn get_data() -> Vec<u32> { vec![1, 2, 3] }

fn main() {
    let last_elem = get_data().last();
    drop(last_elem); // just a dummy use
}

This fails because the Vec returned by get_data() is stored into a temporary so that we can invoke last, which requires &self, but that temporary is dropped at the ; (as this case doesn’t fall under the lifetime extension rules).

RFC 66 proposed a rather underspecified extension to the temporary lifetime rules to cover this case; loosely speaking, the idea was to extend the lifetime extension rules to extend the lifetime of temporaries that appear in function arguments if the function’s signature is going to return a reference from that argument. So, in this case, the signature of last indicates that it returns a reference from self:

impl<T> [T] {
    fn last(&self) -> Option<&T> {...}
}

and therefore, since E.last() is being assigned to last_elem, we would extend the lifetime of any temporaries in E (the value for self). Ding Xiang Fei has been exploring how to actually implement RFC 66 and has made some progress, but it’s clear that we need to settle on the exact rules for when temporary lifetime extension should happen.

Even assuming we created some rules for RFC 66, there can be confusing cases that wouldn’t be covered. Consider this statement:

let l = get_data().last().unwrap();
drop(l); // ERROR

Here, the unwrap call has a signature fn(Option<T>) -> T, which doesn’t contain any references. Therefore, it does not extend the lifetimes of temporaries in its arguments. The argument here is the expression get_data().last(), which creates a temporary to store get_data(). This temporary is then dropped at the end of the statement, and hence l winds up pointing to dead memory.

Statement-like expressions in tail position

The original rules assumed that changing E to {E} should not change when temporaries are dropped. This has the counterintuitive behavior though that introducing a block doesn’t constrain the stack lifetime of temporaries. It is also surprising for blocks that have tail expressions that are “statement-like” (e.g., match), because these can be used as statements without a ;, and thus users may not have a clear picture of whether they are an expression producing a value or a statement.

Example. The following code does not compile:

struct Identity<A>(A);
impl<A> Drop for Identity<A> {
    fn drop(&mut self) { }
}
fn main() {
    let x = 22;
    match Identity(&x) {
        //------------ creates a temporary that can be matched
        _ => {
            println!("");
        }
    } // <-- this is considered a trailing expression by the compiler
} // <-- temporary is dropped after this block executes

Because of the way that the implicit function scope works, and the fact that this match is actually the tail expression in the function body, this is effectively desugared to something like this:

struct Identity<A>(A);
impl<A> Drop for Identity<A> {
    fn drop(&mut self) { }
}
fn main() {
    let tmp;
    {
        let x = 22;
        match {tmp = Identity(&x); tmp} {
            _ => {
                println!("");
            }
        }
    }
}
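
A sketch of the usual workaround, assuming the function is allowed to return a value so the effect can be checked (the name `run` is made up for this example): once the match is followed by a `;` and is no longer the tail expression, its temporary is dropped at the end of that statement, before `x` goes out of scope.

```rust
struct Identity<A>(A);

impl<A> Drop for Identity<A> {
    fn drop(&mut self) {}
}

/// Hypothetical variant of `main` from the example above.
fn run() -> u32 {
    let x = 22;
    match Identity(&x) {
        _ => {}
    }; // the match is now an ordinary statement: the temporary is dropped here
    x
}
```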

Lack of equivalence between if and match

The current rules distinguish temporary behavior for if/while from match/if-let. As a result, code like this compiles and executes fine:

if lock.lock().something { // grab lock, then release
    lock.lock(); // OK to grab lock again
}

but very similar code using a match gives a deadlock:

if let true = lock.lock().something {
    lock.lock(); // Deadlock
}

// or

match lock.lock().something {
    true => lock.lock(), // Deadlock
    false => (),
}

Partly as a result of this lack of equivalence, we have had a lot of trouble doing desugarings for things like let-else and if-let expressions.
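
The difference can be probed without deadlocking by using a `RefCell` and `try_borrow_mut`. This is only a sketch; the `State` struct and both function names are assumptions for illustration:

```rust
use std::cell::RefCell;

struct State {
    something: bool,
}

/// `if`: the guard from the condition is dropped before the body runs.
fn free_inside_if(cell: &RefCell<State>) -> bool {
    if cell.borrow().something {
        cell.try_borrow_mut().is_ok()
    } else {
        false
    }
}

/// `if let`: the guard from the scrutinee is still alive inside the body.
fn free_inside_if_let(cell: &RefCell<State>) -> bool {
    if let true = cell.borrow().something {
        cell.try_borrow_mut().is_ok()
    } else {
        false
    }
}
```

With `something` set to `true`, the first function should report the cell as free and the second should not.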

Named block

Tail expressions aren’t the only way to “escape” a value from a block: breaking with a named label does the same. Values that escape via break, however, don’t benefit from lifetime extension. The following example, therefore, fails to compile:

fn main() {
    let x = 'a: {
        break 'a &vec![0]; // ERROR
    };
    
    drop(x);
}

Note that a tail-expression based version does compile today:

fn main() {
    let x = { &vec![0] };
    drop(x);
}
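
Until the two forms behave the same, one way to keep the labeled-break version compiling is to hoist the value into a named local. A minimal sketch (the function name is an assumption for illustration):

```rust
/// Hypothetical rewrite of the labeled-block example.
fn first_via_labeled_block() -> i32 {
    // The vector lives in a named local that outlives the block,
    // so no temporary needs its lifetime extended.
    let v = vec![0];
    let x = 'a: {
        break 'a &v;
    };
    x[0]
}
```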

Proposed properties to focus discussion

To focus discussion, here are some named examples we can use that capture key patterns.

Examples of behaviors we would ideally preserve:

read-locked-field: let x: Event = ref_cell.borrow_mut().get_event(); releases borrow at the end of the statement (as today)
obvious aggregate construction: let x: Event = Event { x: &[1, 2, 3] } stores [1, 2, 3] in a temporary with block scope

Examples of behavior that we would like, but which we don’t have today, resulting in bugs/confusion:

match-locked-field: match data.lock().unwrap().data { ... } releases lock before match body executes
if-match-correspondence: if <expr> {}, if let true = <expr> {}, and match <expr> { true => .. } all behave the same with respect to temporaries in <expr> (unlike today)
block containment: {<expr>} must not create any temporaries that extend past the end of the block (unlike today)
tail-break-correspondence: {<expr>} and 'a: { break 'a <expr> } should be equivalent

Examples of behavior that we would like, but which we don’t have today, resulting in ergonomic pain (these cases may not be achievable without violating the previous ones):

last: let x = get_data().last(); (the canonical RFC66 example) will extend lifetime of data to end of block; also covers (some) new methods like let x: Event<'_> = Event::new(&[1, 2, 3])
last-unwrap: let x = get_data().last().unwrap(); (extended form of the above) will extend lifetime of data to end of block
tuple struct construction: let x = Event(&[1, 2, 3])

Tightest proposal

The proposal with minimal confusion would be to remove syntactic lifetime extension and tighten default lifetimes in two ways:

Tighten block tail expressions. Have temporaries in the tail expression of a block be dropped when returning from the block. This ensures block containment and tail-break-correspondence.

Tighten match scrutinees. Drop temporaries from match/if-let scrutinees before entering the body of the match. This avoids footguns and ensures match-locked-field and if-match-correspondence.

In short, temporaries would always be dropped at the innermost statement, match/if/if-let/while scrutinee, or block.

Things that no longer build

There are three cases that build today which will no longer build with this minimal proposal:

let x = &vec![] no longer builds, nor does let x = Foo { x: &[1, 2, 3] }. Both of them create temporaries that are dropped at the end of the let.
match &foo.borrow_mut().parent { Some(ref p) => .., None => ... } no longer builds, since temporary from borrow_mut() is dropped before entering the match arms.
{let x = {&vec![0]}; ...} no longer builds, as a result of tightening block tail expressions. Note however that other examples, e.g. the one from the section “statement-like expressions in tail position”, would now build successfully.

The core proposal also does nothing to address RFC66-like patterns, tuple struct construction, etc.
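
To give a feel for the cost, here is a sketch of how the first two cases could be rewritten with explicit lets. This is ordinary Rust that compiles today; the `Node` struct and the function name are assumptions for illustration:

```rust
use std::cell::RefCell;

struct Node {
    parent: Option<u32>,
}

/// Hypothetical rewrites of two patterns that would stop building.
fn explicit_lets(foo: &RefCell<Node>) -> (usize, Option<u32>) {
    // Instead of `let x = &vec![..]`: name the storage first.
    let storage = vec![1, 2, 3];
    let x = &storage;

    // Instead of `match &foo.borrow_mut().parent { .. }`: name the guard,
    // so it is clearly alive for the whole match.
    let guard = foo.borrow_mut();
    let parent = match &guard.parent {
        Some(p) => Some(*p),
        None => None,
    };
    drop(guard); // the borrow ends exactly where we say it does

    (x.len(), parent)
}
```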

Extension option A: Do What I Mean

One way to overcome the concerns of the core proposal would be to extend with more “DWIM”-like options. For example, we could extend “lifetime extension rules” to cover match expressions.

Lifetime extension for let statements, as today. To allow let x = &vec![] to build, we can restore today’s lifetime extension rules.

Pro: things like this will build

let x = Foo { 
    data: &get_data()
    //     ---------- stored in a temporary that outlives `x`
};

Con: the following example would build again, which leads to a (perhaps surprising) panic – that said, I’ve never seen a case like this in the wild, the confusion always occurs with match

use std::cell::RefCell;

struct Foo<'a> {
    data: &'a u32
}

fn main() {
    let cell = RefCell::new(22);
    let x: Foo<'_> = Foo {
        data: &*cell.borrow_mut(),
    };
    *cell.borrow_mut() += 1; // <-- panic
    drop(x);
}

Scope extension for match scrutinees. To allow match &foo.borrow_mut().parent { Some(ref x) => ... } to work, we could include scope extension rules similar to the ones used with let initializers (i.e., if we can see that a ref is taken into the temporary, then extend its lifetime, but otherwise do not).

Pro: match &foo.borrow_mut().parent { .. } works as it does today.
Con: Syntactic extension rules can be approximate, so e.g. match (foo(), bar().baz()) { (Some(ref x), y) => .. } would likely keep the temporary returned by bar(), even though it is not referenced.

RFC66-like rules. Use some heuristic rules to determine, from a function signature, when the return type includes data from the arguments. If the return type of a function f references a generic type or lifetime parameter that also appears in some argument i, and the function call f(a0, ..., ai, ..., an) appears in some position with an extended temporary lifetime, then ai will also have an extended temporary lifetime (i.e., any temporaries created in ai will persist until end of enclosing block / match expression).

Pro: Patterns like let x = E where E is get_data().last(), get_data().last().unwrap(), TupleStruct(&get_data()), or SomeStruct::new(&get_data()) would all allocate a temporary for get_data() that persists until the end of the enclosing block. This occurs because, in each case, the return type references a generic type or lifetime parameter that also appears in the argument.
Con: Complex rules imply that let x = locked_vec.lock().last() would also extend lock lifetime to end-of-block, which users may not expect.

Extension option B: “Anonymous lets” for extended temporary lifetimes

Allow expr.let as an operator that means “introduce a let to store this value inside the innermost block but before the current statement and replace this statement with a reference to it”. So for example:

let x = get_data().let.last();

would be equivalent to

let tmp = get_data();
let x = tmp.last();

Question: Do we keep some amount of implicit extension? For example, should let x = &vec![] keep compiling, or do you have to do let x = &vec![].let?

Parting notes

Editor’s note: As I wrote at the start, this was an early document to prompt discussion in a meeting (you can see notes from the meeting here). It’s not a full proposal. That said, my position when I started writing was different from where I landed. Initially I was going to propose more of a “DWIM”-approach, tweaking the rules to be tighter in some places, more flexible in others. I’m still interested in exploring that, but I am worried that the end-result will just be people having very little idea when their destructors run. For the most part, you shouldn’t have to care about that, but it is sometimes quite important. That leads me to: let’s have some simple rules that can be explained on a postcard and work “pretty well”, and some convenient way to extend lifetimes when you want it. The .let syntax is interesting but ultimately probably too confusing to play this role.

Oh, and a note on the edition: I didn’t say it explicitly, but we can make changes to temporary lifetime rules over an edition by rewriting where necessary to use explicit lets, or (if we add one) some other explicit notation. The result would be code that runs on all editions with the same semantics.

https://smallcultfollowing.com/babysteps/blog/2023/03/15/temporary-lifetimes/

To async trait or just to trait

Mar 12, 2023 Updated Mar 12, 2023

One interesting question about async fn in traits is whether or not we should label the trait itself as async. Until recently, I didn’t see any need for that. But as we discussed the question of how to enable “maybe async” code, we realized that there would be some advantages to distinguishing “async traits” (which could contain async functions) from sync traits (which could not). However, as I’ve thought about the idea more, I’m more and more of the mind that we should not take this step — at least not now. I wanted to write a blog post diving into the considerations as I see them now.

What is being proposed?

The specific proposal I am discussing is to require that traits which include async functions are declared as async traits…

// The "async trait" (vs just "trait") would be required
// to have an "async fn" (vs just a "fn").
async trait HttpEngine {
    async fn fetch(&mut self, url: Url) -> Vec<u8>;
}

…and when you reference them, you use the async keyword as well…

fn load_data<H>(h: &mut impl async HttpEngine, urls: &[Url]) {
    //                       ----- just writing `impl HttpEngine`
    //                             would be an error
    …
}

This would be a change from the support implemented in nightly today, where any trait can have async functions.

Why have “async traits” vs “normal” traits?

When authoring an async application, you’re going to define traits like HttpEngine that inherently involve async operations. In that case, having to write async trait seems like pure overhead. So why would we ever want it?

The answer is that not all traits are like HttpEngine. We can call HttpEngine an “always async” trait — it will always involve an async operation. But a lot of traits are “maybe async” — they sometimes involve async operations and sometimes not. In fact, we can probably break these down further: you have traits like Read, which involve I/O but have a sync and async equivalent, and then you have traits like Iterator, which are orthogonal from I/O.

Particularly for traits like Iterator, the current trajectory will result in two nearly identical traits in the stdlib: Iterator and AsyncIterator. These will be mostly the same apart from AsyncIterator having an async next function, and perhaps some more combinators. It’s not the end of the world, but it’s also not ideal, particularly when you consider that we likely want more “modes”, like a const Iterator, a “sendable” iterator, perhaps a fallible iterator (one that returns results), etc. This is of course the problem often referred to as the “color problem”, from Bob Nystrom’s well-known “What color is your function?” blog post, and it’s precisely what the “keyword generics” initiative is looking to solve.

Requiring an async keyword ensures consistency between “maybe” and “always” async traits…

It’s not really clear what a full solution to the “color problem” looks like. But whatever it is, it’s going to involve having traits with multiple modes. So instead of Iterator and AsyncIterator, we’ll have the base definition of Iterator and then a way to derive an async version, async Iterator. We can then call an Iterator a “maybe async” trait, because it might be sync but it might be async. We might declare a “maybe async” trait using an attribute, like this[1]:

#[maybe(async)]
trait Iterator {
    type Item;

    // Because of the #[maybe(async)] attribute,
    // the async keyword on this function means “if
    // this trait is in async mode, then this is an
    // async function”:
    async fn next(&mut self) -> Option<Self::Item>;
}

Now imagine I have a function that reads urls from some kind of input stream. This might be an async fn that takes an impl async Iterator as argument:

async fn read_urls(urls: impl async Iterator<Item = Url>) {
    //                        ----- specify async mode
    while let Some(u) = urls.next().await {
        //                          ----- needed because this is an async iterator
        …
    }
}

But now let’s say I want to combine this (async) iterator of urls and use an HttpEngine (our “always async” trait) to fetch them:

async fn fetch_urls(
    urls: impl async Iterator<Item = Url>,
    engine: impl HttpEngine,
) {
   while let Some(u) = urls.next().await {
       let data = engine.fetch(u).await;
       …
   }
}

There’s nothing wrong with this code, but it might be a bit surprising that I have to write impl async Iterator but I just write impl HttpEngine, even though both traits involve async functions. I can imagine that it would sometimes be hard to remember which traits are “always async” versus which ones are only “maybe async”.

…which also means traits can go from “always” to “maybe” async without a major version bump.

There is another tricky bit: imagine that I am authoring a library and I create an “always async” HttpEngine trait to start:

trait HttpEngine {
    async fn fetch(&mut self, url: Url) -> Vec<u8>;
}

but then later I want to issue a new version that offers a sync and an async version of HttpEngine. I can’t add a #[maybe(async)] to the trait declaration because, if I do so, then code using impl HttpEngine would suddenly be getting the sync version of the trait, whereas before they were getting the async version.

In other words, unless we force people to declare async traits up front, then changing a trait from “always async” to “maybe async” is a breaking change.

But writing async Trait for traits that are always async is annoying…

The points above are solid. But there are some flaws. The most obvious is that having to write async for every trait that uses an async function is likely to be pretty tedious. I can easily imagine that people writing async applications are going to use a lot of “always async” traits and I imagine that, each time they write impl async HttpEngine, they will think to themselves, “How many times do I have to tell the compiler this is async already?! We get it, we get it!!”

Put another way, the consistency argument (“how will I remember which traits need to be declared async?”) may not hold water in practice. I can imagine that for many applications the only “maybe async” traits are the core abstractions coming from libraries, like Iterator, and most of the other code is just “always async”. So actually it’s not that hard to remember which is which.

…and it’s not clear that traits will go from “always” to “maybe” async anyway…

But what about semver violations? Well, if my thesis above is correct, then it’s also true that there will be relatively few traits that need to go from “always async” to “maybe async”. Moreover, I imagine most libraries will know up front whether they expect to be sync or not. So maybe it’s not a big deal that this is a breaking change.

…and trait aliases would give a workaround for “always -> maybe” transitions anyway…

So, maybe it won’t happen in practice, but let’s imagine that we did define an always async HttpEngine and then later want to make the trait “maybe async”. Do we absolutely need a new major version of the crate? Not really, there is a workaround. We can define a new “maybe async” trait (let’s call it HttpFetch) and then redefine HttpEngine in terms of HttpFetch:

// This is a trait alias. It’s an unstable feature that I would like to stabilize.
// Even without a trait alias, though, you could do this with a blanket impl.
trait HttpEngine = async HttpFetch;

#[maybe(async)]
trait HttpFetch { … }

This obviously isn’t ideal: you wind up with two names for the same underlying trait. Maybe you deprecate the old one. But it’s not the end of the world.
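
The blanket-impl variant mentioned in the comment can be sketched in today’s Rust. Since there is no “maybe async” mode yet, both traits here are simply async; the `&str` URL type, the `EchoEngine` implementor, the `fetched_len` client, and the tiny `block_on` executor are all assumptions added so the sketch is self-contained:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The new, more general trait (name taken from the post).
trait HttpFetch {
    async fn fetch(&mut self, url: &str) -> Vec<u8>;
}

// The old name, kept alive as an empty subtrait...
trait HttpEngine: HttpFetch {}
// ...that every `HttpFetch` implementor gets for free.
impl<T: HttpFetch> HttpEngine for T {}

// Hypothetical implementor used only for this illustration.
struct EchoEngine;

impl HttpFetch for EchoEngine {
    async fn fetch(&mut self, url: &str) -> Vec<u8> {
        url.as_bytes().to_vec()
    }
}

// Existing client code written against the old name keeps compiling.
async fn fetched_len(engine: &mut impl HttpEngine, url: &str) -> usize {
    engine.fetch(url).await.len()
}

// Minimal busy-polling executor so the sketch needs no external crates.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Driving `fetched_len(&mut EchoEngine, "abc")` with `block_on` exercises the old name end to end. The empty-subtrait trick is only an approximation of a real alias: downstream crates cannot implement `HttpEngine` directly any more, they implement `HttpFetch` instead.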

…and requiring async composes poorly with supertraits and trait aliases…

Actually, that last example brings up an interesting point. To truly ensure consistency, it’s not enough to say that “traits with async functions must be declared async”. We also need to be careful what we permit in trait aliases and supertraits. For example, imagine we have a trait UrlIterator that has an async Iterator as a supertrait…

trait UrlIterator: async Iterator<Item = Url> { }

…now people could write functions that take an impl UrlIterator, but it will still require await when you invoke its methods. So we didn’t really achieve consistency after all. The same thing would apply with a trait alias like trait UrlIterator = async Iterator<Item = Url>.

It’s possible to imagine a requirement like “to have a supertrait that is async, the trait must be async”, but — to me — that feels non-compositional. I’d like to be able to declare a trait alias trait A = … and have the … be able to be any sort of trait bounds, whether they’re async or not. It feels funny to have the async propagate out of the ... and onto the trait alias A.

…and, while this decision is hard to reverse, it can be reversed.

So, let’s say that we were to stabilize the ability to add async functions to any trait. And then later we find that we actually want to have maybe async traits and that we wish we had required people to write async explicitly all the time, because consistency and semver. Are we stuck?

Well, not really. There are options here. For example, we might make it possible to write async (but not required) and then lint and warn when people don’t. Perhaps in another edition, we would make it mandatory. This is basically what we did with the dyn keyword. Then we could declare that changing a trait from always-async to maybe-async is not considered worthy of a major version, because people’s code that follows the lints and warnings will not be affected. If we had transitioned so that all code in the new edition required an async keyword even for “always async” traits, we could let people declare a trait to be “maybe async but only in the new edition”, which would avoid all breakage entirely.

In any case, I don’t really want to do those things. It’d be embarrassing and confusing to stabilize SAFIT and then decide that “oh, no, you have to declare traits to be async”. I’d rather we just think through the arguments now and make a call. But it’s always good to know that, just in case you’re wrong, you have options.

My (current) conclusion: YAGNI

So which way to go? I think the question hinges a lot on how common we expect “maybe async” code to be. My expectation is that, even if we do support it, “maybe async” will be fairly limited. It will mostly apply to (a) code like Iterator that is orthogonal from I/O and (b) core I/O primitives like the Read trait or the File type. If we’re especially successful, then crates like reqwest (which currently offers both a sync and async interface) would be able to unify those into one. But application code I expect to largely be written to be either sync or async.

I also think that it’ll be relatively unusual to go from “always async” to “maybe async”. Not impossible, but unusual enough that either making a new major version or using the “renaming” trick will be fine.

For this reason, I lean towards NOT requiring async trait, and instead allowing async fn to be added to any trait. I am still hopeful we’ll add “maybe async” traits as well, but I think there won’t be a big problem of “always async” traits needing to change to maybe async. (Clearly we are going to want to go from “never async” to “maybe async”, since there are lots of traits like Iterator in the stdlib, but that’s a non-issue.)

The other argument in favor is that it’s closer to what we do today. There are lots of people using #[async_trait] and I’ve never heard anyone say “it’s so weird that you can write T: HttpEngine and don’t have to write T: async HttpEngine”. At minimum, if we were going to change to requiring the “async” keyword, I would want to give that change some time to bake on nightly before we stabilized it. This could well delay stabilization significantly.

If, in contrast, you believed that lots of code was going to be “maybe async”, then I think you would probably want the async keyword to be mandatory on traits. After all, since most traits are maybe async anyway, you’re going to need to write it a lot of the time.

[1] I can feel you fixating on the #[maybe(async)] syntax. Resist the urge! There is no concrete proposal yet.

https://smallcultfollowing.com/babysteps/blog/2023/03/12/to-async-trait-or-just-to-trait/

Trait transformers (send bounds, part 3)

Mar 3, 2023 Updated Mar 3, 2023

I previously introduced the “send bound” problem, which refers to the need to add a Send bound to the future returned by an async function. This post continues my tour over the various solutions that are available. This post covers “Trait Transformers”. This proposal arose from a joint conversation with myself, Eric Holk, Yoshua Wuyts, Oli Scherer, and Tyler Mandry. It’s a variant of Eric Holk’s inferred async send bounds proposal as well as the work that Yosh/Oli have been doing in the keyword generics group. Those posts are worth reading as well, lots of good ideas there.[1]

Core idea: the trait transformer

A transformer is a way for a single trait definition to define multiple variants of that trait. For example, where T: Iterator means that T implements the Iterator trait we know and love, T: async Iterator means that T implements the async version of Iterator. Similarly, T: Send Iterator means that T implements the sendable version of Iterator (we’ll define both the “sendable version” and “async version” more precisely, don’t worry).

Transformers can be combined, so you can write T: async Send Iterator to mean “the async, sendable version”. They can also be distributed, so you can write T: async Send (Iterator + Factory) to mean the “async, sendable” version of both Iterator and Factory.

There are 3 proposed transformers:

async
const
any auto trait

The set of transformers is defined by the language and is not user extensible. This could change in the future, as transformers can be seen as a kind of trait alias.

The async transformer

The async transformer is used to choose whether functions are sync or async. It can only be applied to traits that opt-in by specifying which methods should be made into sync or async. Traits can opt-in either by declaring the async transformer to be mandatory, as follows…

async trait Fetch {
    async fn fetch(&mut self, url: Url) -> Data;
}

…or by making it optional, in which case we call it a “maybe-async” trait…

#[maybe(async)]
trait Iterator {
    type Item;
    
    #[maybe(async)]
    fn next(&mut self) -> Option<Self::Item>;
    
    fn size_hint(&self) -> (usize, Option<usize>);
}

Here, the trait Iterator is the same Iterator we’ve always had, but async Iterator refers to the “async version” of Iterator, which means that it has an async next method (but still has a sync method size_hint).

(For the time being, maybe-async traits cannot have default methods, which avoids the need to deal with “maybe-async” code. This can change in the future.)

Trait transformer as macros

You can think of a trait transformer as being like a fancy kind of macro. When you write a maybe-async trait like Iterator above, you are effectively defining a template from which the compiler can derive a family of traits. You could think of the #[maybe(async)] annotation as a macro that derives two related traits, so that…

#[maybe(async)]
trait Iterator {
    type Item;
    
    #[maybe(async)]
    fn next(&mut self) -> Option<Self::Item>;
    
    fn size_hint(&self) -> (usize, Option<usize>);
}

…would effectively expand into two traits, one with a sync next method and one with an async version…

trait Iterator { fn next(&mut self) -> Option<Self::Item>; ... }
trait AsyncIterator { async fn next(&mut self) -> Option<Self::Item>; ... }

…when you have a where-clause like T: async Iterator, then, the compiler would be transforming that to T: AsyncIterator. In fact, Oli and Yosh implemented a procedural macro crate that does more-or-less exactly this.

The idea with trait transformers though is not to literally do expansions like the ones above, but rather to build those mechanisms into the compiler. This makes them more efficient, and also paves the way for us to have code that is generic over whether or not it is async, or expand the list of modifiers. But the “macro view” is useful to have in mind.
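
To make the “macro view” concrete, here is a hand-written sketch of what the two derived traits might look like in today’s Rust. Nothing here is generated by a real macro: the names `SyncIter`, `AsyncIter`, and `Countdown`, and the small `block_on` executor, are assumptions for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hand-written stand-in for the sync variant.
trait SyncIter {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

// Hand-written stand-in for the async variant: same shape, async `next`.
trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

// Hypothetical iterator that yields n, n-1, ..., 1.
struct Countdown(u32);

impl SyncIter for Countdown {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

impl AsyncIter for Countdown {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

// Plays the role of a `T: Iterator` bound.
fn sum_sync<I: SyncIter<Item = u32>>(mut it: I) -> u32 {
    let mut total = 0;
    while let Some(n) = it.next() {
        total += n;
    }
    total
}

// Plays the role of a `T: async Iterator` bound.
async fn sum_async<I: AsyncIter<Item = u32>>(mut it: I) -> u32 {
    let mut total = 0;
    while let Some(n) = it.next().await {
        total += n;
    }
    total
}

// Minimal busy-polling executor so the sketch needs no external crates.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The duplication between the two impls is exactly what a single maybe-async definition is meant to remove.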

Always async traits

When a trait is declared like async trait Fetch, it only defines an async version, and it is an error to request the sync version like T: Fetch; you must write T: async Fetch.

Defining an async method without being always-async or maybe-async is disallowed:

trait Fetch {
    async fn fetch(&mut self, url: Url) -> Data; // ERROR
}

Forbidding traits of this kind means that traits can move from “always async” to “maybe async” without a breaking change. See the frequently asked questions for more details.

The const transformer

The const transformer works similarly to async. One can write

#[maybe(const)]
trait Compute {
    #[maybe(const)]
    fn a(&mut self);
    
    fn b(&mut self);
}

and then if you write T: const Compute it means that a must be a const fn but b need not be. Similarly one could write const trait Compute to indicate that the const transformer is mandatory.

The auto-trait transformer

Auto-traits can be used as a transformer. This is permitted on any (maybe) async trait or on traits that explicitly opt-in by defining #[maybe(Send)] variants. The default behavior of T: Send Foo for some trait Foo is that…

T must be Send
the future returned by any async method in Foo must be Send
the value returned by any RPITIT method must be Send[2]

Per these rules, given:

#[maybe(async)]
trait Iterator {
    type Item;

    #[maybe(async)]
    fn next(&mut self) -> Option<Self::Item>;
}

writing T: async Send Iterator would be equivalent to:

T: async Iterator<next(): Send> + Send

using the return type notation.
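
For intuition, the “future must be Send” half of that bound can be approximated in today’s Rust by writing the Send requirement directly into the trait. This sketch only covers that one rule; the trait `SendCounter`, the `Countdown` type, the `first_on_worker` helper, and the small `block_on` executor are assumptions for illustration:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Hand-written "sendable" variant: the returned future must be `Send`.
trait SendCounter {
    fn next(&mut self) -> impl Future<Output = Option<u32>> + Send;
}

// Hypothetical implementor that yields n, n-1, ..., 1.
struct Countdown(u32);

impl SendCounter for Countdown {
    fn next(&mut self) -> impl Future<Output = Option<u32>> + Send {
        async move {
            if self.0 == 0 {
                None
            } else {
                self.0 -= 1;
                Some(self.0 + 1)
            }
        }
    }
}

// Creates the future on the calling thread and polls it on a worker.
// Moving `fut` into the spawned closure only type-checks because the
// trait promises that the future is `Send`.
fn first_on_worker<T: SendCounter>(counter: &mut T) -> Option<u32> {
    let fut = counter.next();
    thread::scope(|s| s.spawn(move || block_on(fut)).join().unwrap())
}

// Minimal busy-polling executor so the sketch needs no external crates.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The drawback of baking Send into the trait like this is that every implementor pays for it, which is what a per-use transformer would avoid.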

The #[maybe(Send)] annotation can be applied to associated types or functions…

#[maybe(Send)]
trait IntoIterator {
    #[maybe(Send)]
    type IntoIter;
    
    type Item;
}

…in which case writing T: Send IntoIterator would expand to T: IntoIterator<IntoIter: Send> + Send.
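
The expanded bound can already be written by hand with a where-clause. In this sketch the function name is an assumption, and the extra `'static` bounds are there only because `std::thread::spawn` requires them:

```rust
use std::thread;

/// Hypothetical helper that needs both halves of the expanded bound.
fn sum_across_threads<T>(items: T) -> u32
where
    T: IntoIterator<Item = u32> + Send + 'static,
    T::IntoIter: Send + 'static,
{
    // `T: Send` lets the collection itself move to a worker thread...
    let iter = thread::spawn(move || items.into_iter()).join().unwrap();
    // ...and `T::IntoIter: Send` lets the iterator come back from that
    // worker and move on to another one.
    thread::spawn(move || iter.sum::<u32>()).join().unwrap()
}
```

Calling it with `vec![1, 2, 3]` works because both `Vec<u32>` and its `IntoIter` are Send.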

Frequently asked questions

How is this different from eholk’s Inferred Async Send Bounds?

Eric’s proposal was similar in that it permitted T: async(Send) Foo as a similar sort of “macro” to get a bound that included Send bounds on the resulting futures. In that proposal, though, the “send bounds” were tied to the use of async sugar, which means that you could no longer consider async fn to be sugar for a function returning an -> impl Future. That seemed like a bad thing, particularly since explicit -> impl Future syntax is the only way to write an async fn that doesn’t capture all of its arguments.

How is this different from the keyword generics post?

Yosh and Oli posted a keyword generics update that included notation for “maybe async” traits (they wrote ?async) along with some other things. The ideas in this post are very similar to those; the main difference is treating Send as an independent transformer, similar to the previous question.

Should the auto-trait transformer be specific to each auto-trait, or generic?

As written, the auto-trait transformer is specific to a particular auto-trait, but it might be useful to be able to be generic over multiple (e.g., if you are maybe Send, you likely want to be maybe Send-Sync too, right?). You could imagine writing #[maybe(auto)] instead of #[maybe(Send)], but that’s kind of confusing, because an “always-auto” trait (i.e., an auto trait like Send) is quite a different thing from a “maybe-auto” trait (i.e., a trait that has a “sendable version”). OTOH users can’t define their own auto traits and likely will never be able to. Unclear.

Why make auto-trait transformer be opt-in?

You can imagine letting T: Send Foo mean T: Foo + Send for all traits Foo, without requiring Foo to be declared as maybe(Send). The problem is that this would mean that customizing the Send version of a trait for the first time is a semver breaking change, and so must be done at the same time the trait is introduced. This implies that no existing trait in the ecosystem could customize its Send version. Seems bad.

Will you permit async methods without the async transformer? Why or why not?

No. The following trait…

trait Http {
    async fn fetch(&mut self); // ERROR
}

…would get an error like “cannot use async in a trait unless it is declared as async or #[maybe(async)]”. Ensuring that people write T: async Http and not just T: Http means that the trait can become “maybe async” later without breaking those clients. If plain T: Http were permitted, people would also have to remember (when writing async code) whether a trait is “maybe async” or “always async” so they know whether to write T: async Http (for maybe-async traits) or T: Http (for always-async). This way, if the trait has async methods, you write async.

Why did you label methods in a #[maybe(async)] trait as #[maybe(async)] instead of async?

In the examples, I wrote maybe(async) traits like so:

#[maybe(async)]
trait Iterator {
    type Item;

    #[maybe(async)]
    fn next(&mut self) -> Option<Self::Item>;
}

Personally, I rather prefer the idea that inside a #[maybe(async)] block, you define the trait as if it were always async…

#[maybe(async)]
trait Iterator {
    type Item;

    async fn next(&mut self) -> Self::Item;
}

…but then the async gets removed when used in a sync context. However, I changed it because I couldn’t figure out the right way to permit #[maybe(Send)] in this scenario. I can also imagine that it’s a bit confusing to write async fn when you mean “maybe async”.

Why use an annotation (#[..]) like #[maybe(async)] instead of a keyword?

I don’t know, because ?async is hard to read, and we’ve got enough keywords? I’m open to bikeshedding here.

Do we still want return type notation?

Yes, RTN is useful for giving more precise specification of which methods should return send-futures (you may not want to require that all async methods are send, for example). It’s also needed internally by the compiler anyway as the “desugaring target” for the Send transformer.

Can we allow #[maybe] on types/functions?

Maybe!3 That’s basically full-on keyword generics. This proposal is meant as a stepping stone. It doesn’t permit code or types to be generic over whether they are async/send/whatever, but it does permit us to define multiple versions of a trait. To the language, it’s effectively a kind of macro, so that (e.g.) a single trait definition #[maybe(async)] trait Iterator effectively defines two traits, Iterator and AsyncIterator, and the T: async Iterator notation is being used to select the second one. (This is only an example; I don’t mean that users would literally be able to reference an AsyncIterator trait.)
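To make the “one definition, two traits” reading concrete, here is a rough sketch in today’s stable Rust of what the two generated traits might look like if written by hand. The names Iter, AsyncIter, Countdown, and block_on are made up for illustration (the proposal does not expose any such names), next returns Option so the loops can terminate, and the tiny busy-polling executor only exists to keep the example dependency-free.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor (illustrative only).
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hand-written stand-ins for the two traits one `#[maybe(async)]`
// definition would conceptually produce.
trait Iter {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

struct Countdown(u32);

impl Iter for Countdown {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

impl AsyncIter for Countdown {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        // Reuse the synchronous logic.
        Iter::next(self)
    }
}

// `T: Iter` plays the role of `T: Iterator`.
fn sum_sync<T: Iter<Item = u32>>(mut it: T) -> u32 {
    let mut total = 0;
    while let Some(n) = it.next() {
        total += n;
    }
    total
}

// `T: AsyncIter` plays the role of `T: async Iterator`.
async fn sum_async<T: AsyncIter<Item = u32>>(mut it: T) -> u32 {
    let mut total = 0;
    while let Some(n) = it.next().await {
        total += n;
    }
    total
}

fn two_traits_demo() -> (u32, u32) {
    (sum_sync(Countdown(4)), block_on(sum_async(Countdown(4))))
}
```

Writing both traits and both impls by hand is exactly the duplication a transformer would remove.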

What order are transformers applied?

Transformers must be written according to this grammar

Trait := async? const? Path* Path

where x? means optional x, x* means zero or more x, and the traits named in Path* must be auto-traits. The transformers (if present) are applied in order, so first things are made async, then const, then sendable. (I’m not sure if both async and const make any sense?)

Can auto-trait transformers let us generalize over rc/arc?

Yosh at some point suggested that we could think of “send” or “not send” as another application of keyword generics, and that got me very excited. It’s a known problem that people have to define two versions of their structs (see e.g. the im and im-rc crates). Maybe we could permit something like

#[maybe(Send)]
struct Shared<T> {
    /* either Rc<T> or Arc<T>, depending */
}

and then permit variables of type Shared<u32> or Send Shared<u32>. The keyword generics proposals already are exploring the idea of structs whose types vary depending on whether they are async or not, so this fits in.
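For comparison, here is one way to approximate this on stable Rust today, using a “pointer family” trait with a generic associated type. This is only a sketch of a known workaround, not what the proposal would desugar to; PointerFamily, Local, Sendable, and shared_demo are hypothetical names introduced for the example.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::ops::Deref;

// Hypothetical trait that selects `Rc` or `Arc` at the type level.
trait PointerFamily {
    type Ptr<T>: Deref<Target = T> + Clone;
    fn new<T>(value: T) -> Self::Ptr<T>;
}

struct Local; // single-threaded flavor, backed by `Rc`
struct Sendable; // thread-safe flavor, backed by `Arc`

impl PointerFamily for Local {
    type Ptr<T> = Rc<T>;
    fn new<T>(value: T) -> Rc<T> {
        Rc::new(value)
    }
}

impl PointerFamily for Sendable {
    type Ptr<T> = Arc<T>;
    fn new<T>(value: T) -> Arc<T> {
        Arc::new(value)
    }
}

// One struct definition, two flavors chosen by the `P` parameter.
struct Shared<T, P: PointerFamily> {
    inner: P::Ptr<T>,
}

impl<T, P: PointerFamily> Shared<T, P> {
    fn new(value: T) -> Self {
        Shared { inner: P::new(value) }
    }

    fn get(&self) -> &T {
        &*self.inner
    }
}

fn assert_send<T: Send>(_: &T) {}

fn shared_demo() -> u32 {
    let local: Shared<u32, Local> = Shared::new(1);
    let sendable: Shared<u32, Sendable> = Shared::new(2);
    // Only the `Arc`-backed flavor is `Send`; the same call on `local`
    // would be rejected by the compiler.
    assert_send(&sendable);
    *local.get() + *sendable.get()
}
```

The extra type parameter is the cost of this workaround: every user of Shared has to thread P through their own signatures, which is the boilerplate a Send transformer on structs would aim to hide.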

Conclusion

This post covered “trait transformers” as a possible solution to the “send bounds” problem. Trait transformers are not exactly an alternative to the return type notation proposed earlier; they are more like a complement, in that they make the easy case easy, but effectively provide a convenient desugaring to uses of return type notation.

The full set of solutions thus far are…

Return type notation (RTN)
- Example: T: Fetch<fetch(): Send>
- Pros: flexible and expressive
- Cons: verbose
eholk’s inferred async send bounds
- Example: T: async(Send) Fetch
- Pros: concise
- Cons: specific to async notation, doesn’t support -> impl Future functions; requires RTN for completeness
trait transformers (this post)
- Example: T: async Send Fetch
- Pros: concise
- Cons: requires RTN for completeness

I originally planned to have part 3 of this series simply summarize those posts, in fact, but I consider Trait Transformers an evolution of those ideas, and close enough that I’m not sure separate posts are needed. ↩︎
It’s unclear if Send Foo should always convert RPITIT return values to be Send, but it is clear that we want some way to permit one to write -> impl Future in a trait and have that be Send iff async methods are Send. ↩︎
See what I did there? ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/03/03/trait-transformers-send-bounds-part-3/

Return type notation (send bounds, part 2)

Feb 13, 2023 Updated Feb 13, 2023

In the previous post, I introduced the “send bound” problem, which refers to the need to add a Send bound to the future returned by an async function. I want to start talking about some of the ideas that have been floating around for how to solve this problem. I consider this a bit of an open problem, in that I think we know a lot of the ingredients, but there is a bit of a “delicate balance” to finding the right syntax and so forth. To start with, though, I want to introduce Return Type Notation, which is an idea that Tyler Mandry and I came up with for referring to the type returned by a trait method.

Recap of the problem

If we have a trait HealthCheck that has an async function check…

trait HealthCheck {
    async fn check(&mut self, server: Server);
}

…and then a function that is going to call that method check but in a parallel task…

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{ 
    …
}

…we don’t currently have a way to say that the future returned by calling H::check() is send. The where clause H: HealthCheck + Send says that the type H must be send, but it says nothing about the future that gets returned from calling check.

Core idea: A way to name “the type returned by a function”

The core idea of return-type notation is to let you write where-clauses that apply to <H as HealthCheck>::check(..), which means “any return type you can get by calling check as defined in the impl of HealthCheck for H”. This notation is meant to be reminiscent of the fully qualified notation for associated types, e.g. <T as Iterator>::Item. Just as we usually abbreviate associated types to T::Item, you would also typically abbreviate return type notation to H::check(..). The trait name is only needed when there is ambiguity.

Here is an example of how start_health_check would look using this notation:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    H::check(..): Send, // <— return type notation

Here the where clause H::check(..): Send means “the type(s) returned when you call H::check must be Send”. Since async functions return a future, this means that future must implement Send.

More compact notation

Although it has not yet been stabilized, RFC #2289 proposed a shorthand way to write bounds on associated types; something like T: Iterator<Item: Send> means “T implements Iterator and its associated type Item implements Send”. We can apply that same sugar to return-type notations:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck<check(..): Send> + Send + 'static,
    //             ^^^^^^^^^

This is more concise, though also clearly kind of repetitive. (When I read it, I think “how many dang times do I have to write Send?” But for now we’re just trying to explore the idea, not evaluate its downsides, so let’s hold on that thought.)

Futures capture their arguments

Note that the where clause we wrote was

H::check(..): Send

and not

H::check(..): Send + 'static

Moreover, if we were to add a 'static bound, the program would not compile. Why is that? The reason is that async functions in Rust desugar to returning a future that captures all of the function’s arguments:

trait HealthCheck {
    // async fn check(&mut self, server: Server);
    fn check<'s>(&'s mut self, server: Server) -> impl Future<Output = ()> + 's;
    //           ^^^^^^^^^^^^                                                ^^
    //         The future captures `self`, so it requires the lifetime bound `'s` 
}

Because the future being returned captures self, and self has type &'s mut Self, the Future returned must capture 's. Therefore, it is not 'static, and so the where-clause H::check(..): Send + 'static doesn’t hold for all possible calls to check, since you are not required to give an argument of type &'static mut Self.
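The capture is easy to observe on stable Rust by writing the desugared signature by hand. In the sketch below, Counter, capture_demo, and the small block_on executor are illustrative names that do not come from the post; the trait is the desugared HealthCheck from above.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor (illustrative only).
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

struct Server;

trait HealthCheck {
    // Hand-written form of `async fn check(&mut self, server: Server);`
    fn check<'s>(&'s mut self, server: Server) -> impl Future<Output = ()> + 's;
}

// Hypothetical implementor that counts how often it was checked.
struct Counter {
    checks: u32,
}

impl HealthCheck for Counter {
    fn check<'s>(&'s mut self, _server: Server) -> impl Future<Output = ()> + 's {
        // The async block captures `self`, i.e. a `&'s mut Counter`.
        async move {
            self.checks += 1;
        }
    }
}

fn capture_demo() -> u32 {
    let mut counter = Counter { checks: 0 };
    {
        // `fut` holds the mutable borrow of `counter`, so it cannot be
        // `'static` and `counter` is unusable until `fut` is gone.
        let fut = counter.check(Server);
        block_on(fut);
    }
    counter.checks
}
```

Nothing happens to counter until the future is polled, and counter.checks can only be read again once the future (and with it the borrow) has been consumed.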

RTN with specific parameter types

Most of the time, you would use RTN to bound all possible return values from the function. But sometimes you might want to be more specific, and talk just about the return value for some specific argument types. As a silly example, we could have a function like

fn call_check_with_static<H>(h: &'static mut H)
where
    H: HealthCheck + 'static,
    H::check(&'static mut H, Server): 'static,

This function has a generic parameter H that is 'static and it gets a &'static mut H as argument. The where clause H::check(&'static mut H, Server): 'static then says: if I call check with the argument &'static mut H, it will return a 'static future. In contrast to the previous section, where we were talking about any possible return value from check, this where-clause is true and valid.

Desugaring RTN to associated types

To understand what RTN does, it’s best to think of the desugaring from async functions to associated types. This desugaring is exactly how Rust works internally, but we are not proposing to expose it to users directly, for reasons I’ll elaborate in a bit.

We saw earlier how an async fn desugars to a function that returns impl Future. Well, in a trait, returning impl Future can itself be desugared to a trait with a (generic) associated type:

trait HealthCheck {
    // async fn check(&mut self, server: Server);
    type Check<'t>: Future<Output = ()> + 't;
    fn check<'s>(&'s mut self, server: Server) -> Self::Check<'s>;
}

When we write a where-clause like H::check(..): Send, that is then effectively a bound on this hidden associated type Check:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    for<'a> H::Check<'a>: Send, // <— equivalent to `H::check(..): Send`
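Since generic associated types are stable, this desugared form can actually be written out by hand today. The sketch below does so under a few assumptions: the GAT carries the where Self: 't clause that stable Rust requires, the implementor names its future type explicitly (std::future::Ready here, since an async block’s type cannot be named), and Counter, run_health_check, assert_send, and block_on are illustrative names.

```rust
use std::future::{ready, Future, Ready};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor (illustrative only).
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

struct Server;

trait HealthCheck {
    // Explicit stand-in for the hidden associated type.
    type Check<'t>: Future<Output = ()> + 't
    where
        Self: 't;

    fn check<'s>(&'s mut self, server: Server) -> Self::Check<'s>;
}

struct Counter {
    checks: u32,
}

impl HealthCheck for Counter {
    // The future type must be nameable, so this impl does its work
    // eagerly and returns an already-completed future.
    type Check<'t> = Ready<()> where Self: 't;

    fn check<'s>(&'s mut self, _server: Server) -> Self::Check<'s> {
        self.checks += 1;
        ready(())
    }
}

fn assert_send<T: Send>(value: T) -> T {
    value
}

fn run_health_check<H>(mut health_check: H, server: Server) -> H
where
    H: HealthCheck + Send + 'static,
    for<'a> H::Check<'a>: Send, // plays the role of `H::check(..): Send`
{
    // Type-checks only because of the higher-ranked bound above.
    let fut = assert_send(health_check.check(server));
    block_on(fut);
    health_check
}

fn gat_demo() -> u32 {
    run_health_check(Counter { checks: 0 }, Server).checks
}
```

Having to name Ready<()> in the impl is a good illustration of why this desugaring is not something most users would want to write themselves.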

Generic methods

It is also possible to have generic async functions in traits. Imagine that instead of HealthCheck taking a specific Server type, we wanted to accept any type that implements the trait ServerTrait:

trait HealthCheckGeneric {
    async fn check_gen<S: ServerTrait>(&mut self, server: S);
}

We can still think of this trait as desugaring to a trait with an associated type:

trait HealthCheckGeneric {
    // async fn check_gen<S>(&mut self, server: S) where S: ServerTrait,
    type CheckGen<'t, S: ServerTrait>: Future<Output = ()> + 't;
    fn check_gen<'s, S: ServerTrait>(&'s mut self, server: S) -> Self::CheckGen<'s, S>;
}

But if we want to write a where-clause like H::check_gen(..): Send, this would require us to support higher-ranked trait bounds over types and not just lifetimes:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheckGeneric + Send + 'static,
    for<'a, S> H::CheckGen<'a, S>: Send, // <—
    //     ^ for all types S…

As it happens, this sort of where-clause is something the types team is working on in our new solver design. I’m going to skip over the details, as it’s kind of orthogonal to the topic of how to write Send bounds.

One final note: just as you can specify a particular value for the argument types, you should be able to use turbofish to specify the value for generic parameters. So something like H::check_gen::<MyServer>(..): Send would mean “whenever you call check_gen on H with S = MyServer, the return type is Send”.

Using RTN outside of where-clauses

So far, all the examples I’ve shown you for RTN involved a where-clause. That is the most important context, but it should be possible to write RTN types any place you write a type. For the most part, this is just fine, but using the .. notation outside of a where-clause introduces some additional complications. Think of H::check — the precise type that is returned will depend on the lifetime of the first argument. So we could have one type H::check(&'a mut H, Server) and the return value would reference the lifetime 'a, but we could also have H::check(&'b mut H, Server), and the return value would reference the lifetime 'b. The .. notation really names a range of types. For the time being, I think we would simply say that .. is not allowed outside of a where-clause, but there are ways that you could make it make sense (e.g., it might be valid only when the return type doesn’t depend on the types of the parameters).

“Frequently asked questions”

That sums up our tour of the “return-type-notation” idea. In short:

You can write bounds like <T as Trait>::method(..): Send in a where-clause to mean “the method method from the impl of Trait for T returns a value that is Send, no matter what parameters I give it”.
Like an associated type, this would more commonly be written T::method(..), with the trait automatically determined.
You could also specify precise types for the parameters and/or generic types, like T::method(U, V).

Let’s dive into some of the common questions about this idea.

Why not just expose the desugared associated type directly?

Earlier I explained how H::check(..) would work by desugaring it to an associated type. So, why not just have users talk about that associated type directly, instead of adding a new notation for “the type returned by check”? The main reason is that it would require us to expose details about this desugaring that we don’t necessarily want to expose.

The most obvious detail is “what is the name of the associated type” — I think the only clear choice is to have it have the same name as the method itself, which is slightly backwards incompatible (since one can have a trait with an associated type and a method that has the same name), but easy enough to do over an edition.

We would also have to expose what generic parameters this associated type has. This is not always so simple. For example, consider this trait:

trait Dump {
   async fn dump(&mut self, data: &impl Debug);
}

If we want to desugar this to an associated type, what generics should that type have?

trait Dump {
    type Dump<…>: Future<Output = ()> + …;
    //        ^^^ how many generics go here?
    fn dump(&mut self, data: &impl Debug) -> Self::Dump<…>;
}

This function has two sources of “implicit” generic parameters: elided lifetimes and the impl Trait argument. One desugaring would be:

trait Dump {
    type Dump<'a, 'b, D: Debug>: Future<Output = ()> + 'a + 'b;
    fn dump<'a, 'b, D: Debug>(&'a mut self, data: &'b D) -> Self::Dump<'a, 'b, D>;
}

But, in this case, we could also have a simpler desugaring that uses just one lifetime parameter (this isn’t always the case):

trait Dump {
    type Dump<'a, D: Debug>: Future<Output = ()> + 'a;
    fn dump<'a, D: Debug>(&'a mut self, data: &'a D) -> Self::Dump<'a, D>;
}

Regardless of how we expose the lifetimes, the impl Trait argument also raises interesting questions. In ordinary functions, the lang-team generally favors not including impl Trait arguments in the list of generics (i.e., they can’t be specified by turbofish, their values are inferred from the argument types), although we’ve not reached a final decision there. That seems inconsistent with exposing the type parameter D.

All in all, the appeal of the RTN is that it skips over these questions, leaving the compiler room to desugar in any of the various equivalent ways. It also means users don’t have to understand the desugaring, and can just think about the “return value of check”.

Should H::check(..): Send mean that the future is Send, or the result of the future?

Some folks have pointed out that H::check(..): Send seems like it refers to the value you get from awaiting check, and not the future itself. This is particularly true since our async function notation doesn’t write the future explicitly, unlike (say) C# or TypeScript (in those languages, an async fn must return a task or promise type). This seems true, and it will likely be a source of confusion — but it’s also consistent with how async functions work. For example:

trait Get {
    async fn get(&mut self) -> u32;
}

async fn bar<G: Get>(g: &mut G) {
    let f: impl Future<Output = u32> = g.get();
}

In this code, even though g.get() is declared to return u32, f is a future, not an integer. Writing G::get(..): Send thus talks about the future, not the integer.
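The snippet above uses impl Trait in a let binding, which stable Rust does not accept, so here is a variant that compiles today. It is a small sketch in which Fixed, expect_future, future_demo, and block_on are illustrative names; expect_future stands in for the type annotation by accepting only futures whose output is u32.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor (illustrative only).
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

trait Get {
    async fn get(&mut self) -> u32;
}

// Hypothetical implementor that always yields the stored number.
struct Fixed(u32);

impl Get for Fixed {
    async fn get(&mut self) -> u32 {
        self.0
    }
}

// Accepts only futures resolving to `u32` and hands them back unchanged.
fn expect_future<F: Future<Output = u32>>(f: F) -> F {
    f
}

fn future_demo() -> u32 {
    let mut g = Fixed(22);
    // `g.get()` is a future, not a `u32`; the integer appears only
    // once the future has been driven to completion.
    let f = expect_future(g.get());
    block_on(f)
}
```

Passing g.get() where a u32 is expected would be a type error, which is the same distinction G::get(..): Send relies on.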

Isn’t RTN kind of verbose?

Interesting fact: when I talk to people about what is confusing in Rust, the trait system ranks as high or higher than the borrow checker. If we take another look at our motivating example, I think we can start to see why:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck<check(..): Send> + Send + 'static,

That where-clause basically just says “H is safe to use from other threads”, but it requires a pretty dense bit of notation! (And, of course, also demonstrates that the borrow checker and the trait system are not independent things, since 'static can be seen as a part of both, and is certainly a common source of confusion.) Wouldn’t it be nice if we had a more compact way to say that?

Now imagine you have a trait with a lot of methods:

trait AsyncOps {
    async fn op1(self);
    async fn op2(self);
    async fn op3(self);
}

Under the current proposal, to create an AsyncOps that can be (fully) used across threads, one would write:

fn do_async_ops<A>(async_ops: A)
where
    A: AsyncOps<op1(..): Send, op2(..): Send, op3(..): Send> + Send + 'static,

You could use a trait alias (if we stabilized them) to help here, but still, this seems like a problem!

But maybe that verbosity is useful?

Indeed! RTN is a very flexible notation. To continue with the AsyncOps example, we could write a function that says “the future returned by op1 must be send, but not the others”, which would be useful for a function like so:

async fn do_op1_in_parallel(a: impl AsyncOps<op1(..): Send + 'static>) {
    //                                       ^^^^^^^^^^^^^^^^^^^^^^^
    //                                       Return value of `op1` must be Send, static
    tokio::spawn(a.op1()).await;
}

Is RTN limited to async fn in traits?

All my examples have focused on async fn in traits, but we can use RTN to name the return types of any function anywhere. For example, given a function like get:

fn get() -> impl FnOnce() -> u32 {
    move || 22
}

we could allow you to write get() to name the closure type that is returned:

fn foo() {
    let c: get() = get();
    let d: u32 = c();
}

This seems like it would be useful for things like iterator combinators, so that you can say things like “the iterator returned by calling map is Send”.

Why do we have to write ..?

OK, nobody asks this, but I do sometimes feel that writing .. just seems silly. We could say that you just write H::check(): Send to mean “for all parameters”. (In the case where the method has no parameters, then “for all parameters” is satisfied trivially.) That doesn’t change anything fundamental about the proposal but it lightens the “line noise” aspect a tad:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck<check(): Send> + Send + 'static,

It does introduce some ambiguity. Did the user mean “for all parameters” or did they forget that check() has parameters? I’m not sure how this confusion is harmful, though. The main way I can see it coming about is something like this:

check() initially has zero parameters, and the user writes check(): Send.
In a later version of the program, a parameter is added, and now the meaning of check changes to “for all parameters” (although, as we noted before, that was arguably the meaning before).

There is a shift happening here, but what harm can it do? If the check still passes, then check(T): Send is true for any T. If it doesn’t, the user gets an error and has to add an explicit type for this new parameter.

Can we really handle this in our trait solver?

As we saw when discussing generic methods, handling this feature in its full generality is a bit much for our trait solver today. But we could begin with a subset – for example, the notation can only be used in where-clauses and only for methods that are generic over lifetime parameters and not types. Tyler and I worked out a subset we believe would be readily implementable.

Conclusion

This post introduced return-type notation, an extension to the type grammar that allows you to refer to the return type of a trait method, and covered some of the pros/cons. Here is a rundown:

Pros:

Extremely flexible notation that lets us say precisely which methods must return Send types, and even lets us go into detail about which argument types they will be called with.
Avoids having to specify a desugaring to associated types precisely. For example, we don’t have to decide how to name that type, nor do we have to decide how many lifetime parameters it has, or whether impl Trait arguments become type parameters.
Can be used to refer to return values of things beyond async functions.

Cons:

New concept for users to learn — now they have associated types as well as associated return types.
Verbose even for common cases; doesn’t scale up to traits with many methods.

https://smallcultfollowing.com/babysteps/blog/2023/02/13/return-type-notation-send-bounds-part-2/

Async trait send bounds, part 1: intro

Feb 1, 2023 Updated Feb 1, 2023

Nightly Rust now has support for async functions in traits, so long as you limit yourself to static dispatch. That’s super exciting! And yet, for many users, this support won’t yet meet their needs. One of the problems we need to resolve is how users can conveniently specify when they need an async function to return a Send future. This post covers some of the background on send futures, why we don’t want to adopt the solution from the async_trait crate for the language, and the general direction we would like to go. Follow-up posts will dive into specific solutions.

Why do we care about Send bounds?

Let’s look at an example. Suppose I have an async trait that performs some kind of periodic health check on a given server:

trait HealthCheck {
    async fn check(&mut self, server: &Server) -> bool;
}

Now suppose we want to write a function that, given a HealthCheck, starts a parallel task that runs that check every second, logging failures. This might look like so:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    tokio::spawn(async move {
        while health_check.check(&server).await {
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
        emit_failure_log(&server).await;
    });
}

So far so good! So what happens if we try to compile this? You can try it yourself if you use the async_fn_in_trait feature gate; you should see a compilation error like so:

error: future cannot be sent between threads safely
   --> src/lib.rs:15:18
    |
15  |       tokio::spawn(async move {
    |  __________________^
16  | |         while health_check.check(&server).await {
17  | |             tokio::time::sleep(Duration::from_secs(1)).await;
18  | |         }
19  | |         emit_failure_log(&server).await;
20  | |     });
    | |_____^ future created by async block is not `Send`
    |
    = help: within `[async block@src/lib.rs:15:18: 20:6]`, the trait `Send` is not implemented for `impl Future<Output = bool>`

The error is saying that the future for our task cannot be sent between threads. But why not? After all, the health_check value is both Send and 'static, so we know that health_check is safe to send over to the new thread. But the problem lies elsewhere. The error has an attached note that points it out to us:

note: future is not `Send` as it awaits another future which is not `Send`
   --> src/lib.rs:16:15
    |
16  |         while health_check.check(&server).await {
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here

The problem is that the call to check is going to return a future, and that future is not known to be Send. To see this more clearly, let’s desugar the HealthCheck trait slightly:

trait HealthCheck {
    // async fn check(&mut self, server: &Server) -> bool;
    fn check(&mut self, server: &Server) -> impl Future<Output = bool>;
                                           // ^ Problem is here! This returns a future, but not necessarily a `Send` future.
}

The problem is that check returns an impl Future, but the trait doesn’t say whether this future is Send or not. The compiler therefore sees that our task is going to be awaiting a future, but that future might not be sendable between threads.
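It may help to see that the missing information is only missing in generic code. In the sketch below (written against stable Rust, where this trait form is accepted), the same trait is used with a concrete type, and the compiler can look through to the actual future and see that it is Send. AlwaysHealthy, assert_send, send_demo, and block_on are illustrative names that do not come from the example above.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor (illustrative only).
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

struct Server;

trait HealthCheck {
    // Desugared form: the trait promises a future, but not a `Send` one.
    fn check(&mut self, server: &Server) -> impl Future<Output = bool>;
}

struct AlwaysHealthy;

impl HealthCheck for AlwaysHealthy {
    async fn check(&mut self, _server: &Server) -> bool {
        true
    }
}

fn assert_send<T: Send>(value: T) -> T {
    value
}

fn send_demo() -> bool {
    let server = Server;
    let mut health_check = AlwaysHealthy;
    // Accepted: with a concrete type the compiler sees the real future,
    // which only holds `&mut AlwaysHealthy` and `&Server`.
    let fut = assert_send(health_check.check(&server));
    block_on(fut)
}

// The generic version is rejected, because nothing in `H: HealthCheck`
// says the returned future is `Send`:
//
// fn generic<H: HealthCheck>(mut h: H, s: &Server) {
//     assert_send(h.check(s));
// }
```

So the gap is purely one of expression: a generic function has no way to ask for the property that concrete callers get for free.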

What does the async-trait crate do?

Interestingly, if you rewrite the above example to use the async_trait crate, it compiles. What’s going on here? The answer is that the async_trait proc macro uses a different desugaring. Instead of creating a trait that yields -> impl Future, it creates a trait that returns a Pin<Box<dyn Future + Send>>. This means that the future can be sent between threads; it also means that the trait is dyn-safe.

This is a good answer for the async-trait crate, but it’s not a good answer for a core language construct as it loses key flexibility. We want to support async in single-threaded executors, where the Send bound is irrelevant, and we also want to support async in no-std applications, where Box isn’t available. Moreover, we want to have key interop traits (e.g., Read) that can be used for all three of those applications at the same time. An approach like the one used in async-trait cannot support a trait that works for all three of those applications at once.
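To see what that trade-off looks like in code, here is a hand-written approximation of the boxed desugaring. It is in the spirit of what the macro generates rather than its exact output (the real expansion uses its own lifetime names and bounds); BoxFuture, Probe, boxed_demo, and block_on are illustrative names, and Server is given a healthy field just so the check has something to return.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor (illustrative only).
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

struct Server {
    healthy: bool,
}

// Boxed, type-erased future that is always required to be `Send`.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait HealthCheck {
    fn check<'a>(&'a mut self, server: &'a Server) -> BoxFuture<'a, bool>;
}

// Hypothetical implementor that records how often it was called.
struct Probe {
    calls: u32,
}

impl HealthCheck for Probe {
    fn check<'a>(&'a mut self, server: &'a Server) -> BoxFuture<'a, bool> {
        // One heap allocation per call, and the `Send` requirement is
        // imposed on every implementation.
        Box::pin(async move {
            self.calls += 1;
            server.healthy
        })
    }
}

fn boxed_demo() -> (bool, u32) {
    let server = Server { healthy: true };
    let mut probe = Probe { calls: 0 };
    let ok = {
        // The boxed signature also makes the trait usable as `dyn`.
        let checker: &mut dyn HealthCheck = &mut probe;
        block_on(checker.check(&server))
    };
    (ok, probe.calls)
}
```

Both costs are visible here: every call goes through Box::pin, and an implementation whose future holds non-Send data simply cannot implement this trait.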

How would we like to solve this?

Instead of having the trait specify whether the returned future is Send (or boxed, for that matter), our preferred solution is to have the start_health_check function declare that it requires check to return a sendable future. Remember that start_health_check already included a where clause specifying that the type H was sendable across threads:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    // —————  ^^^^^^^^^^^^^^ “sendable to another disconnected thread”
    //     |
    // Implements the `HealthCheck` trait

Right now, this where clause says two independent things:

H implements HealthCheck;
values of type H can be sent to an independent task, which is really a combination of two things
- type H can be sent between threads (H: Send)
- type H contains no references to the current stack (H: 'static)

What we want is to add syntax to specify an additional condition:

H implements HealthCheck and its check method returns a Send future

In other words, we don’t want just any type that implements HealthCheck. We specifically want a type that implements HealthCheck and returns a Send future.

Note the contrast to the desugaring approach used in the async_trait crate: in that approach, we changed what it means to implement HealthCheck to always require a sendable future. In this approach, we allow the trait to be used in both ways, but allow the function to say when it needs sendability or not.

The approach of “let the function specify what it needs” is very in-line with Rust. In fact, the existing where-clause demonstrates the same pattern. We don’t say that implementing HealthCheck implies that H is Send, rather we say that the trait can be implemented by any type, but allow the function to specify that H must be both HealthCheck and Send.

Next post: Let’s talk syntax

I’m going to leave you on a cliffhanger. This blog post set up the problem we are trying to solve: for traits with async functions, we need some kind of syntax for declaring that you want an implementation that returns Send futures, and not just any implementation. In the next set of posts, I’ll walk through our proposed solution to this, and some of the other approaches we’ve considered and rejected.

Appendix: Why does the returned future have to be send anyway?

Some of you may wonder why it matters that the future returned is not Send. After all, the only thing we are actually sending between threads is health_check — the future is being created on the new thread itself, when we call check. It is a bit surprising, but this is actually highlighting an area where async tasks are different from threads (and where we might consider future language extensions).

Async is intended to support a number of different task models:

Single-threaded: all tasks run in the same OS thread. This is a great choice for embedded systems, or systems where you have lightweight processes (e.g., Fuchsia 1).
Work-dealing, sometimes called thread-per-core: tasks run in multiple threads, but once a task starts in a thread, it never moves again.
Work-stealing: tasks start in one thread, but can migrate between OS threads while they execute.

Tokio’s spawn function supports the final mode (work-stealing). The key point here is that the future can move between threads at any await point. This means that it’s possible for the future to be moved between threads while awaiting the future returned by check. Therefore, any data in this future must be Send.

This might be surprising. After all, the most common example of non-send data is something like a (non-atomic) Rc. It would be fine to create an Rc within one async task and then move that task to another thread, so long as the task is paused at the point of move. But there are other non-Send types that wouldn’t work so well. For example, you might make a type that relies on thread-local storage; such a type would not be Send because it’s only safe to use it on the thread in which it was created. If that type were moved between threads, the system could break.

In the future, it might be useful to separate out types like Rc from other non-Send types. The distinguishing characteristic is that Rc can be moved between threads so long as all possible aliases are also moved at the same time. Other types are really tied to a specific thread. There’s no example in the stdlib that comes to mind, but it seems like a valid pattern for Rust today that I would like to continue supporting. I’m not sure yet the right way to think about that!

I have finally learned how to spell this word without having to look it up! 💪 ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/02/01/async-trait-send-bounds-part-1-intro/

Rust in 2023: Growing up

Jan 20, 2023 Updated Jan 20, 2023

When I started working on Rust in 2011, my daughter was about three months old. She’s now in sixth grade, and she’s started growing rapidly. Sometimes we wake up to find that her clothes don’t quite fit anymore: the sleeves might be a little too short, or the legs come up to her ankles. Rust is experiencing something similar. We’ve been growing tremendously fast over the last few years, and any time you experience growth like that, there are bound to be a few rough patches. Things that don’t work as well as they used to. This holds both in a technical sense — there are parts of the language that don’t seem to scale up to Rust’s current size — and in a social one — some aspects of how the project runs need to change if we’re going to keep growing the way I think we should. As we head into 2023, with two years to go until the Rust 2024 edition, this is the theme I see for Rust: maturation and scaling.

TL;DR

In summary, these are (some of) the things I think are most important for Rust in 2023:

Implementing “the year of everywhere” so that you can make any function async, write impl Trait just about anywhere, and fully utilize generic associated types; planning for the Rust 2024 edition.
Beginning work on a Rust specification and integrating it into our processes.
Defining rules for unsafe code and smooth tooling to check whether you’re following them.
Supporting efforts to teach Rust in universities and elsewhere.
Improving our product planning and user feedback processes.
Refining our governance structure with specialized teams for dedicated areas, more scalable structure for broad oversight, and more intentional onboarding.

“The year of everywhere” and the 2024 edition

What do async-await, impl Trait, and generic parameters have in common? They’re all essential parts of modern Rust, that’s one thing. They’re also all, in my opinion, in a “minimum viable product” state. Each of them has some key limitations that make them less useful and more confusing than they have to be. As I wrote in “Rust 2024: The Year of Everywhere”, there are currently a lot of folks working hard to lift those limitations through a number of extensions:

Generic associated types (stabilized in October, now undergoing various improvements!)
Type alias impl trait (proposed for stabilization)
Async functions in traits and “return position impl Trait in traits” (static dispatch available on nightly, but more work is needed)
Polonius (under active discussion)

None of these features are “new”. They just take something that exists in Rust and let you use it more broadly. Nonetheless, I think they’re going to have a big impact, on experienced and new users alike. Experienced users can express more patterns more easily and avoid awkward workarounds. New users never have to experience the confusion that comes from typing something that feels like it should work, but doesn’t.

One other important point: Rust 2024 is just around the corner! Our goal is to get any edition changes landed on master this year, so that we can spend the next year doing finishing touches. This means we need to put some effort into thinking ahead and planning what we can achieve.

Towards a Rust specification

As Rust grows, there is increasing need for a specification. Mara had a recent blog post outlining some of the considerations — and especially the distinction between a specification and standardization. I don’t see the need for Rust to get involved in any standards bodies — our existing RFC and open-source process works well. But I do think that for us to continue growing out the set of people working on Rust, we need a central definition of what Rust should do, and that we need to integrate that definition into our processes more thoroughly.

In addition to long-standing docs like the Rust Reference, the last year has seen a number of notable efforts towards a Rust specification. The Ferrocene language specification is the most comprehensive, covering the grammar, name resolution, and overall functioning of the compiler. Separately, I’ve been working on a project called a-mir-formality, which aims to be a “formal model” of Rust’s type system, including the borrow checker. And Ralf Jung has MiniRust, which is targeting the rules for unsafe code.

So what would an official Rust specification look like? Mara opened RFC 3355, which lays out some basic parameters. I think there are still a lot of questions to work out. Most obviously, how can we combine the existing efforts and documents? Each of them has a different focus and — as a result — a somewhat different structure. I’m hopeful that we can create a complementary whole.

Another important question is how to integrate the specification into our project processes. We’ve already got a rule that new language features can’t be stabilized until the reference is updated, but we’ve not always followed it, and the lang docs team is always in need of support. There are hopeful signs here: both the Foundation and Ferrocene are interested in supporting this effort.

Unsafe code

In my experience, most production users of Rust don’t touch unsafe code, which is as it should be. But almost every user of Rust relies on dependencies that do, and those dependencies are often the most critical systems.

At first, the idea of unsafe code seems simple. By writing unsafe, you gain access to new capabilities, but you take responsibility for using them correctly. But the more you look at unsafe code, the more questions come up. What does it mean to use those capabilities correctly? These questions are not just academic, they have a real impact on optimizations performed by the Rust compiler, LLVM, and even the hardware.
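To make "taking responsibility" concrete, here is a minimal sketch of a safe function built on an unsafe block. The name `split_halves` is hypothetical (the standard library already provides `split_at_mut`, which does the same job); the point is only that the bounds check is what justifies the `unsafe` code that follows it.

```rust
/// Splits a slice into two non-overlapping mutable halves at `mid`.
/// `split_halves` is a hypothetical name used only for illustration.
fn split_halves(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    // This check is the author's side of the bargain: without it,
    // the unsafe block below could create out-of-bounds slices.
    assert!(mid <= len, "mid out of bounds");
    let ptr = values.as_mut_ptr();
    // SAFETY: `mid <= len`, so both ranges lie inside the original slice,
    // and they do not overlap, so the two `&mut` borrows never alias.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_halves(&mut data, 2);
    left[0] = 10;
    right[0] = 30;
    assert_eq!(data, [10, 2, 30, 4, 5]);
}
```

Callers never write `unsafe` themselves, but they rely on the author having gotten the aliasing and bounds reasoning right.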

Eventually, we want to get to a place where those who author unsafe code have clear rules to follow, as well as simple tooling to test if their code violates those rules (think cargo test --unsafe). Authors who want more assurance than dynamic testing can provide should have access to static verifiers that can prove their crate is safe, and we should start by proving the standard library is safe.

We’ve been trying for some years to build that world but it’s been ridiculously hard. Lately, though, there have been some breakthroughs. Gankra’s experiments with strict_provenance APIs have given some hope that we can define a relatively simple provenance model that will support both arbitrary unsafe code trickery and aggressive optimization, and Ralf Jung’s aforementioned MiniRust shows how a Rust operational semantics could look. More and more crates test with miri to check their unsafe code, and for those who wish to go further, the kani verifier can check unsafe code for UB (more formal methods tooling here).

I think we need a renewed focus on unsafe code in 2023. The first step is already underway: we are creating the opsem team. Led by Ralf Jung and Jakob Degen, the opsem team has the job of defining “the rules governing unsafe code in Rust”. It’s been clear for some time that this area requires dedicated focus, and I am hopeful that the opsem team will help to provide that.

I would like to see progress on dynamic verification. In particular, I think we need a tool that can handle arbitrary binaries. miri is great, but it can’t be used to test programs that call into C code. I’d like to see something more like valgrind or ubsan, where you can test your Rust project for UB even if it’s calling into other languages through FFI.

Dynamic verification is great, but it is limited by the scope of your tests. To get true reliability, we need a way for unsafe code authors to do static verification. Building static verification tools today is possible but extremely painful. The compiler’s APIs are unstable and a moving target. The stable MIR project proposes to change that by providing a stable set of APIs that tool authors can build on.

Finally, the best unsafe code is the unsafe code you don’t have to write. Unsafe code provides infinite power, but people often have simpler needs that could be made safe with enough effort. Projects like cxx demonstrate the power of this approach. For Rust the language, safe transmute is the most promising such effort, and I’d like to see more of that.

Teaching Rust in universities

More and more universities are offering classes that make use of Rust, and recently many of these educators have come together in the Rust Edu initiative to form shared teaching materials. I think this is great, and a trend we should encourage. It’s helpful for the Rust community, of course, since it means more Rust programmers. I think it’s also helpful for the students: much like learning a functional programming language, learning Rust requires incorporating different patterns and structure than other languages. I find my programs tend to be broken into smaller pieces, and the borrow checker forces me to be more thoughtful about which bits of context each function will need. Even if you wind up building your code in other languages, those new patterns will influence the way you work.

Stronger connections to teachers can also be a great source of data for improving Rust. If we understand better how people learn Rust and what they find difficult, we can use that to guide our priorities and look for ways to make it better. This might mean changing the language, but it might also mean changing the tooling or error messages. I’d like to see us set up some mechanism to feed insights from Rust educators, both in universities and at training companies like Ferrous Systems or Integer32, into the Rust teams.

One particularly exciting effort here is the research being done at Brown University1 by Will Crichton and Shriram Krishnamurthi. Will and Shriram have published an interactive version of the Rust book that includes quizzes. As a reader, these quizzes help you check that you understood the section. But they also provide feedback to the book authors on which sections are effective. And they allow for “A/B testing”, where you change the content of the book and see whether the quiz scores improve. Will and Shriram are also looking at other ways to deepen our understanding of how people learn Rust.

More insight and data into the user experience

As Rust has grown, we no longer have the obvious gaps in our user experience that there used to be (e.g., “no IDE support”). At the same time, it’s clear that the experience of Rust developers could be a lot smoother. There are a lot of great ideas for changes to make, but it’s hard to know which ones would be most effective. I would like to see a more coordinated effort to gather data on the user experience and transform it into actionable insights. Currently, the largest source of data that we have is the annual Rust survey. This is a great resource, but it only gives a very broad picture of what’s going on.

A few years back, the async working group collected “status quo” stories as part of its vision doc effort. These stories were immensely helpful in understanding the “async Rust user experience”, and they are still helping to shape the priorities of the async working group today. At the same time, that was a one-time effort, and it was focused on async specifically. I think that kind of effort could be useful in a number of areas.

I’ve already mentioned that teachers can provide one source of data. Another is simply going out and having conversations with Rust users. But I think we also need fine-grained data about the user experience. In the compiler team’s mid-year report, they noted (emphasis mine):

One more thing I want to point out: five of the ambitions checked the box in the survey that said “some of our work has reached Rust programmers, but we do not know if it has improved Rust for them.”

Right now, it’s really hard to know even basic things, like how many users are encountering compiler bugs in the wild. We have to judge that by how many comments people leave on a GitHub issue. Meanwhile, Esteban personally scours Twitter to find out which error messages are confusing to people.2 We should look into better ways to gather data here. I’m a fan of (opt-in, privacy preserving) telemetry, but I think there’s a discussion to be had here about the best approach. All I know is that there has to be a better way.

Maturing our governance

In 2015, shortly after 1.0, RFC 1068 introduced the original Rust teams: libs, lang, compiler, infra, and moderation. Each team is an independent, decision-making entity, owning one particular aspect of Rust, and operating by consensus. The “Rust core team” was given the role of knitting them together and providing a unifying vision. This structure has been a great success, but as we’ve grown, it has started to hit some limits.

The first limiting point has been bringing the teams together. The original vision was that team leads, along with others, would be part of a core team that would provide a unifying technical vision and tend to the health of the project. It’s become clear over time though that these are really different jobs. Over this year, the various Rust teams, project directors, and existing core team have come together to define a new model for project-wide governance. This effort is being driven by a dedicated working group and I am looking forward to seeing that effort come to fruition this year.

The second limiting point has been the need for more specialized teams. One example near and dear to my heart is the new types team, which is focused on the type and trait system. This team has the job of diving into the nitty gritty on proposals like Generic Associated Types or impl Trait, and then surfacing up the key details for broader-based teams like lang or compiler where necessary. The aforementioned opsem team is another example of this sort of team. I suspect we’ll be seeing more teams like this.

There continues to be a need for us to grow teams that do more than coding. The compiler team prioritization effort, under the leadership of apiraino, is a great example of a vital role that allows Rust to function but doesn’t involve landing PRs. I think there are a number of other “multiplier”-type efforts that we could use. One example would be “reporters”, i.e., people to help publish blog posts about the many things going on and spread information around the project. I am hopeful that as we get a new structure for top-level governance we can see some renewed focus and experimentation here.

Conclusion

Seven years since Rust 1.0 and we are still going strong. As Rust usage spreads, our focus is changing. Where once we had gaping holes to close, it’s now more a question of iterating to build on our success. But the more things change, the more they stay the same. Rust is still working to empower people to build reliable, performant programs. We still believe that building a supportive, productive tool for systems programming — one that brings more people into the “systems programming” tent — is also the best way to help the existing C and C++ programmers “hack without fear” and build the kind of systems they always wanted to build. So, what are you waiting for? Let’s get building!

In disclosure, AWS is a sponsor of this work. ↩︎
To be honest, Esteban will probably always do that, whatever we do. ↩︎

https://smallcultfollowing.com/babysteps/blog/2023/01/20/rust-in-2023-growing-up/

Rust 2024...the year of everywhere?

Sep 22, 2022 Updated Sep 22, 2022

I’ve been thinking about what “Rust 2024” will look like lately. I don’t really mean the edition itself — but more like, what will Rust feel like after we’ve finished up the next few years of work? I think the answer is that Rust 2024 is going to be the year of “everywhere”. Let me explain what I mean. Up until now, Rust has had a lot of nice features, but they only work sometimes. By the time 2024 rolls around, they’re going to work everywhere that you want to use them, and I think that’s going to make a big difference in how Rust feels.

Async everywhere

Let’s start with async. Right now, you can write async functions, but not in traits. You can’t write async closures. You can’t use async drop. This creates a real hurdle. You have to learn the workarounds (e.g., the async-trait crate), and in some cases, there are no proper workarounds (e.g., for async-drop).

Thanks to a recent PR by Michael Goulet, static async functions in traits almost work on nightly today! I’m confident we can work out the remaining kinks soon and start advancing the static subset (i.e., no support for dyn trait) towards stabilization.
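For a sense of what the static subset looks like, here is a small sketch. It assumes a toolchain where async functions in traits are stable (Rust 1.75 or later), and the names `Lookup`, `Fixed`, `lookup_or_zero`, and the tiny `block_on` executor are all hypothetical helpers written for this example, not part of any library.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A trait with an async method, dispatched statically.
trait Lookup {
    async fn lookup(&self, key: &str) -> Option<u32>;
}

struct Fixed(u32);

impl Lookup for Fixed {
    async fn lookup(&self, key: &str) -> Option<u32> {
        if key == "answer" { Some(self.0) } else { None }
    }
}

// Generic caller: the concrete future type is known at compile time,
// so no boxing is involved.
async fn lookup_or_zero(source: &impl Lookup, key: &str) -> u32 {
    source.lookup(key).await.unwrap_or(0)
}

// Tiny busy-polling executor so the example needs no external crate.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    assert_eq!(block_on(lookup_or_zero(&Fixed(42), "answer")), 42);
    assert_eq!(block_on(lookup_or_zero(&Fixed(42), "other")), 0);
}
```

Note that nothing here goes through `dyn Lookup`; that is exactly the part the static subset leaves out.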

The plans for dyn, meanwhile, are advancing rapidly. At this point I think we have two good options on the table and I’m hopeful we can get that nailed down and start planning what’s needed to make the implementation work.

Once async functions in traits work, the next steps for core Rust will be figuring out how to support async closures and async drop. Both of them add some additional challenges — particularly async drop, which has some complex interactions with other parts of the language, as Sabrina Jewson elaborated in a great, if dense, blog post — but we’ve started to develop a crack team of people in the async working group and I’m confident we can overcome them.

There is also library work, most notably settling on some interop traits, and defining ways to write code that is portable across allocators. I would like to see more exploration of structured concurrency1, as well, or other alternatives to select! like the stream merging pattern Yosh has been advocating for.

Finally, for extra credit, I would love to see us integrate async/await keywords into other bits of the function body, permitting you to write common patterns more easily. Yoshua Wuyts has had a really interesting series of blog posts exploring these sorts of ideas. I think that being able to do for await x in y to iterate, or (a, b).await as a form of join, or async let x = … to create a future in a really lightweight way could be great.

Impl trait everywhere

The impl Trait notation is one of Rust’s most powerful conveniences, allowing you to omit specific types and instead talk about the interface you need. Like async, however, impl Trait can only be used in inherent functions and methods, and can’t be used for return types in traits, nor can it be used in type aliases, let bindings, or any number of other places it might be useful.
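As a quick reminder of the two positions where impl Trait already works on stable, here is a minimal sketch; `describe` and `evens` are hypothetical function names.

```rust
use std::fmt::Debug;

// Argument position: accepts any type that implements `Debug`.
fn describe(value: impl Debug) -> String {
    format!("{value:?}")
}

// Return position: callers only learn that they get "some iterator of i32".
fn evens(limit: i32) -> impl Iterator<Item = i32> {
    (0..limit).filter(|n| n % 2 == 0)
}

fn main() {
    assert_eq!(describe("hi"), "\"hi\"");
    assert_eq!(evens(7).collect::<Vec<_>>(), vec![0, 2, 4, 6]);
}
```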

Thanks to Oli Scherer’s hard work over the last year, we are nearing stabilization for impl Trait in type aliases. Oli’s work has also laid the groundwork to support impl trait in let bindings, meaning that you will be able to do something like

let iter: impl Iterator<Item = i32> = (0..10);
//        ^^^^^^^^^^^^^ Declare type of `iter` to be “some iterator”.

Finally, the same PR that added support for async fns in traits also added initial support for return-position impl trait in traits. Put it all together, and we are getting very close to letting you use impl trait everywhere you might want to.

There is still at least one place where impl Trait is not accepted that I think it should be, which is nested in other positions. I’d like you to be able to write impl Fn(impl Debug), for example, to refer to “some closure that takes an argument of type impl Debug” (i.e., can be invoked multiple times with different debug types).

Generics everywhere

Generic types are a big part of how Rust libraries are built, but Rust doesn’t allow people to write generic parameters in all the places they would be useful, and limitations in the compiler prevent us from making full use of the annotations we do have.

Not being able to use generic types everywhere might seem abstract, particularly if you’re not super familiar with Rust. And indeed, for a lot of code, it’s not a big deal. But if you’re trying to write libraries, or to write one common function that will be used all over your code base, then it can quickly become a huge blocker. Moreover, given that Rust supports generic types in many places, the fact that we don’t support them in some places can be really confusing — people don’t realize that the reason their idea doesn’t work is not because the idea is wrong, it’s because the language (or, often, the compiler) is limited.

The biggest example of generics everywhere is generic associated types. Thanks to hard work by Jack Huey, Matthew Jasper, and a number of others, this feature is very close to hitting stable Rust — in fact, it is in the current beta, and should be available in 1.65. One caveat, though: the upcoming support for GATs has a number of known limitations and shortcomings, and it gives some pretty confusing errors. It’s still really useful, and a lot of people are already using it on nightly, but it’s going to require more attention before it lives up to its full potential.

You may not wind up using GATs in your code, but they will definitely be used in some of the libraries you rely on. GATs directly enable common patterns like Iterable that have heretofore been inexpressible, but we’ve also seen a lot of examples where they’re used internally to help libraries present a more unified, simpler interface to their users.
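Here is roughly what the Iterable pattern looks like with GATs, as a sketch that assumes Rust 1.65 or later. The trait definition and the `count_twice` helper are my own illustration rather than anything from the standard library.

```rust
// A collection that can hand out a borrowing iterator any number of times.
trait Iterable {
    // The item and iterator types are generic over the borrow's lifetime.
    type Item<'a>
    where
        Self: 'a;
    type Iter<'a>: Iterator<Item = Self::Item<'a>>
    where
        Self: 'a;

    fn iter<'a>(&'a self) -> Self::Iter<'a>;
}

impl<T> Iterable for Vec<T> {
    type Item<'a> = &'a T where Self: 'a;
    type Iter<'a> = std::slice::Iter<'a, T> where Self: 'a;

    fn iter<'a>(&'a self) -> Self::Iter<'a> {
        // Go through the slice explicitly so this does not call itself.
        self.as_slice().iter()
    }
}

// Generic code can now iterate over the same collection more than once.
fn count_twice<C: Iterable>(collection: &C) -> usize {
    collection.iter().count() + collection.iter().count()
}

fn main() {
    assert_eq!(count_twice(&vec![1, 2, 3]), 6);
}
```

Without the lifetime parameter on `Item` and `Iter`, there is no way to say that the iterator borrows from `self` for exactly the duration of each `iter` call.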

Beyond GATs, there are a number of other places where we could support generics, but we don’t. In the previous section, for example, I talked about being able to have a function with a parameter like impl Fn(impl Debug) — this is actually an example of a “generic closure”. That is, a closure that itself has generic arguments. Rust doesn’t support this yet, but there’s no reason we can’t.
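Until something like impl Fn(impl Debug) exists, one way to approximate a generic closure is a trait with a generic method. This is only a workaround sketch; `DebugSink`, `Collect`, and `feed` are hypothetical names.

```rust
use std::fmt::Debug;

// Stand-in for the hypothetical `impl Fn(impl Debug)`:
// a trait whose method is generic over the argument type.
trait DebugSink {
    fn accept<T: Debug>(&mut self, value: T);
}

struct Collect(Vec<String>);

impl DebugSink for Collect {
    fn accept<T: Debug>(&mut self, value: T) {
        self.0.push(format!("{value:?}"));
    }
}

// The callee invokes the "closure" with two different argument types.
fn feed(sink: &mut impl DebugSink) {
    sink.accept(42);
    sink.accept("text");
}

fn main() {
    let mut sink = Collect(Vec::new());
    feed(&mut sink);
    assert_eq!(sink.0, vec!["42".to_string(), "\"text\"".to_string()]);
}
```

The cost is visible: what would be a one-line closure becomes a struct plus a trait impl.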

Oftentimes, though, the work to realize “generics everywhere” is not so much a matter of extending the language as it is a matter of improving the compiler’s implementation. Rust’s current traits implementation works pretty well, but as you start to push the bounds of it, you find that there are lots of places where it could be smarter. A lot of the ergonomic problems in GATs arise exactly out of these areas.

One of the developments I’m most excited about in Rust is not any particular feature, it’s the formation of the new types team. The goal of this team is to revamp the compiler’s trait system implementation into something efficient and extensible, as well as building up a core set of contributors.

Making Rust feel simpler by making it more uniform

The topics in this post, of course, only scratch the surface of what’s going on in Rust right now. For example, I’m really excited about “everyday niceties” like let/else-syntax and if-let-pattern guards, or the scoped threads API that we got in 1.63. There are exciting conversations about ways to improve error messages. Cargo, the compiler, and rust-analyzer are all generally getting faster and more capable. And so on, and so on.
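For readers who have not tried them yet, here is a small sketch of two of those niceties, assuming a toolchain where both are stable (let/else in 1.65, scoped threads in 1.63). The functions `parse_port` and `parallel_sum` are hypothetical examples.

```rust
use std::thread;

fn parse_port(input: &str) -> Option<u16> {
    // let/else: bind on success, or diverge in the else block.
    let Some((_, port)) = input.split_once(':') else {
        return None;
    };
    port.parse().ok()
}

fn parallel_sum(values: &[u64]) -> u64 {
    let (left, right) = values.split_at(values.len() / 2);
    // Scoped threads may borrow `left` and `right` from this stack frame,
    // because the scope joins every thread before it returns.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    assert_eq!(parse_port("localhost:8080"), Some(8080));
    assert_eq!(parse_port("no-port-here"), None);
    assert_eq!(parallel_sum(&[1, 2, 3, 4, 5]), 15);
}
```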

The pattern of having a feature that starts working somewhere and then extending it so that it works everywhere seems, though, to be a key part of how Rust development works. It’s inspiring also because it becomes a win-win for users. Newer users find Rust easier to use and more consistent; they don’t have to learn the “edges” of where one thing works and where it doesn’t. Experienced users gain new expressiveness and unlock patterns that were either awkward or impossible before.

One challenge with this iterative development style is that sometimes it takes a long time. Async functions, impl Trait, and generic reasoning are three areas where progress has been stalled for years, for a variety of reasons. That’s all started to shift this year, though. A big part of that is the formation of new Rust teams at many companies, allowing a lot more people to have a lot more time. It’s also just the accumulation of the hard work of many people over a long time, slowly chipping away at hard problems (to get a sense for what I mean, read Jack’s blog post on NLL removal, and take a look at the full list of contributors he cited there; just assembling the list was impressive work, not to mention the actual work itself).

It may have been a long time coming, but I’m really excited about where Rust is going right now, as well as the new crop of contributors that have started to push the compiler faster and faster than it’s ever moved before. If things continue like this, Rust in 2024 is going to be pretty damn great.

Oh, my beloved moro! I will return to thee! ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/09/22/rust-2024-the-year-of-everywhere/

Dyn async traits, part 9: call-site selection

Sep 21, 2022 Updated Sep 21, 2022

After my last post on dyn async traits, some folks pointed out that I was overlooking a seemingly obvious possibility. Why not have the choice of how to manage the future be made at the call site? It’s true, I had largely dismissed that alternative, but it’s worth consideration. This post is going to explore what it would take to get call-site-based dispatch working, and what the ergonomics might look like. I think it’s actually fairly appealing, though it has some limitations.

If we added support for unsized return values…

The idea is to build on the mechanisms proposed in RFC 2884. With that RFC, you would be able to have functions that returned a dyn Future:

fn return_dyn() -> dyn Future<Output = ()> {
    async move { }
}

Normally, when you call a function, we can allocate space on the stack to store the return value. But when you call return_dyn, we don’t know how much space we need at compile time, so we can’t do that1. This means you can’t just write let x = return_dyn(). Instead, you have to choose how to allocate that memory. Using the APIs proposed in RFC 2884, the most common option would be to store it on the heap. A new method, Box::new_with, would be added to Box; it acts like new, but it takes a closure, and the closure can return values of any type, including dyn values:

let result = Box::new_with(|| return_dyn());
// result has type `Box<dyn Future<Output = ()>>`

Invoking new_with would be ergonomically unpleasant, so we could also add a .box operator. Rust has had an unstable box operator since forever; this might finally provide enough motivation to make it worth adding:

let result = return_dyn().box;
// result has type `Box<dyn Future<Output = ()>>`

Of course, you wouldn’t have to use Box. Assuming we have sufficient APIs available, people can write their own methods, such as something to do arena allocation…

let arena = Arena::new();
let result = arena.new_with(|| return_dyn());

…or perhaps a hypothetical maybe_box, which would use a buffer if that’s big enough, and use box otherwise:

let mut big_buf = [0; 1024];
let result = maybe_box(&mut big_buf, || return_dyn()).await;

If we add postfix macros, then we might even support something like return_dyn.maybe_box!(&mut big_buf), though I’m not sure if the current proposal would support that or not.
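For comparison, stable Rust already lets the caller pick where a future lives, but only when the function returns a statically sized impl Future rather than a dyn Future. The sketch below shows that analog; `return_impl` and the tiny `block_on` executor are hypothetical helpers, and none of this uses the APIs proposed in RFC 2884.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The callee returns a sized (but unnamed) future type.
fn return_impl() -> impl Future<Output = u32> {
    async move { 7 }
}

// Tiny busy-polling executor so the example needs no external crate.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    // Call-site choice 1: put the future on the heap and erase its type.
    let on_heap: Pin<Box<dyn Future<Output = u32>>> = Box::pin(return_impl());
    assert_eq!(block_on(on_heap), 7);

    // Call-site choice 2: keep the future on the caller's stack.
    let on_stack = pin!(return_impl());
    assert_eq!(block_on(on_stack), 7);
}
```

What unsized return values would add is the ability to make the same choice when the callee's future type is not known at compile time.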

What are unsized return values?

This idea of returning dyn Future is sometimes called “unsized return values”, as functions can now return values of “unsized” type (i.e., types whose size is not statically known). They’ve been proposed in RFC 2884 by Olivier Faure, and I believe there were some earlier RFCs as well. The .box operator, meanwhile, has been a part of “nightly Rust” since approximately forever, though it’s currently written in prefix form, i.e., box foo2.

The primary motivation for both unsized-return-values and .box has historically been efficiency: they permit in-place initialization in cases where it is not possible today. For example, if I write Box::new([0; 1024]) today, I am technically allocating a [0; 1024] buffer on the stack and then copying it into the box:

// First evaluate the argument, creating the temporary:
let temp: [u8; 1024] = ...;

// Then invoke `Box::new`, which allocates a Box...
let ptr: *mut [u8; 1024] = allocate_memory();

// ...and copies the memory in.
unsafe { std::ptr::write(ptr, temp) };

The optimizer may be able to fix that, but it’s not trivial. If you look at the order of operations, it requires making the allocation happen before the arguments are allocated. LLVM considers calls to known allocators to be “side-effect free”, but promoting them is still risky, since it means that more memory is allocated earlier, which can lead to memory exhaustion. The point isn’t so much to look at exactly what optimizations LLVM will do in practice, so much as to say that it is not trivial to optimize away the temporary: it requires some thoughtful heuristics.
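One workaround people use on stable today is to go through `vec!`, which allocates the buffer on the heap and fills it there, and then convert the result into a boxed array. This is a sketch of that workaround, not of the proposed feature; `zeroed_on_heap` is a hypothetical name.

```rust
use std::convert::TryFrom;

fn zeroed_on_heap() -> Box<[u8; 1024]> {
    // `vec!` allocates on the heap and initializes the buffer in place,
    // so no 1024-byte temporary has to be built first and copied in.
    let boxed_slice: Box<[u8]> = vec![0u8; 1024].into_boxed_slice();
    // The length is exactly 1024, so this conversion cannot fail;
    // it reuses the existing allocation.
    Box::<[u8; 1024]>::try_from(boxed_slice).expect("length is 1024")
}

fn main() {
    let buffer = zeroed_on_heap();
    assert_eq!(buffer.len(), 1024);
    assert!(buffer.iter().all(|&byte| byte == 0));
}
```

It works, but it is exactly the kind of detour that in-place initialization is meant to make unnecessary.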

How would unsized return values work?

This merits a blog post of its own, and I won’t dive into details. For our purposes here, the key point is that somehow when the callee goes to return its final value, it can use whatever strategy the caller prefers to get a return point, and write the return value directly in there. RFC 2884 proposes one solution based on generators, but I would want to spend time thinking through all the alternatives before we settled on something.

Using dynamic return types for async fn in traits

So, the question is, can we use dyn return types to help with async functions in traits? Continuing with my example from my previous post, if you have an AsyncIterator trait…

trait AsyncIterator {
    type Item;
    
    async fn next(&mut self) -> Option<Self::Item>;
}

…the idea is that calling next on a dyn AsyncIterator type would yield dyn Future<Output = Option<Self::Item>>. Therefore, one could write code like this:

async fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().box.await;
    //       ^^^^
}

The expression di.next() by itself yields a dyn Future. This type is not sized and so it won’t compile on its own. Adding .box produces a Box<dyn Future<..>>, which you can then await.3

Compared to the Boxing adapter I discussed before, this is relatively straightforward to explain. I’m not entirely sure which is more convenient to use in practice: it depends how many dyn values you create and how many methods you call on them. Certainly you can work around the problem of having to write .box at each call-site via wrapper types or helper methods that do it for you.

Complication: dyn AsyncIterator does not implement AsyncIterator

There is one complication. Today in Rust, every dyn Trait type also implements Trait. But can dyn AsyncIterator implement AsyncIterator? In fact, it cannot! The problem is that the AsyncIterator trait defines next as returning impl Future<..>, which is actually shorthand for impl Future<..> + Sized, but we said that next would return dyn Future<..>, which is ?Sized. So the dyn AsyncIterator type doesn’t meet the bounds the trait requires. Hmm.

But…does dyn AsyncIterator have to implement AsyncIterator?

There is no “hard and fixed” reason that dyn Trait types have to implement Trait, and there are a few good reasons not to do it. The alternative to dyn safety is a design like this: you can always create a dyn Trait value for any Trait, but you may not be able to use all of its members. For example, given a dyn Iterator, you could call next, but you couldn’t call generic methods like map. In fact, we’ve kind of got this design in practice, thanks to the where Self: Sized hack that lets us exclude methods from being used on dyn values.
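Here is the where Self: Sized hack in a small, self-contained form. The `Shape` trait and its methods are hypothetical; the pattern itself is ordinary stable Rust.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Generic methods cannot go in a vtable, so this one opts out of
    // `dyn Shape`. The trait as a whole stays usable as a trait object.
    fn scaled_area<N: Into<f64>>(&self, factor: N) -> f64
    where
        Self: Sized,
    {
        self.area() * factor.into()
    }
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Works through dynamic dispatch, using only the dyn-callable method.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}

fn main() {
    // On a concrete type, every method is available.
    assert_eq!(Square(2.0).scaled_area(3_u8), 12.0);

    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(1.0)), Box::new(Square(2.0))];
    assert_eq!(total_area(&shapes), 5.0);
    // `shapes[0].scaled_area(3_u8)` would be rejected by the compiler.
}
```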

Why did we adopt object safety in the first place? If you look back at RFC 255, the primary motivation for this rule was ergonomics: clearer rules and better error messages. Although I argued for RFC 255 at the time, I don’t think these motivations have aged so well. Right now, for example, if you have a trait with a generic method, you get an error when you try to create a dyn Trait value, telling you that you cannot create a dyn Trait from a trait with a generic method. But it may well be clearer to get an error at the point where you try to call that generic method telling you that you cannot call generic methods through dyn Trait.

Another motivation for having dyn Trait implement Trait was that one could write a generic function with T: Trait and have it work equally well for object types. That capability is useful, but because you have to write T: ?Sized to take advantage of it, it only really works if you plan carefully. In practice what I’ve found works much better is to implement Trait for &dyn Trait.
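Implementing the trait for a reference to the trait object looks like this in practice. It is a minimal sketch with hypothetical names (`Greet`, `English`, `greet_twice`).

```rust
trait Greet {
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

// Forwarding impl: a reference to a trait object is itself a `Greet`.
impl<'a> Greet for &'a dyn Greet {
    fn greet(&self) -> String {
        // Deref twice to reach the `dyn Greet` and dispatch through its vtable.
        (**self).greet()
    }
}

// A plain `T: Greet` bound: no `?Sized` planning required.
fn greet_twice<T: Greet>(greeter: T) -> String {
    format!("{} {}", greeter.greet(), greeter.greet())
}

fn main() {
    let object: &dyn Greet = &English;
    assert_eq!(greet_twice(object), "hello hello");
    assert_eq!(greet_twice(English), "hello hello");
}
```

Because `&dyn Greet` is a sized type, it slots into any existing generic function without that function having been written with trait objects in mind.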

What would it mean to remove the rule that dyn AsyncIterator: AsyncIterator?

I think the new system would be something like this…

You can always4 create a dyn Foo value. The dyn Foo type would define inherent methods based on the trait Foo that use dynamic dispatch, but with some changes:
- Async functions and other methods defined with -> impl Trait return -> dyn Trait instead.
- Generic methods, methods referencing Self, and other such cases are excluded. These cannot be handled with virtual dispatch.
If Foo is object safe using today’s rules, dyn Foo: Foo holds. Otherwise, it does not.5
- On a related but orthogonal note, I would like to make a dyn keyword required to declare dyn safety.

Implications of removing that rule

This implies that dyn AsyncIterator (or any trait with async functions/RPITIT6) will not implement AsyncIterator. So if I write this function…

async fn use_any<I>(x: &mut I) -> Option<I::Item>
where
    I: ?Sized + AsyncIterator,
{
    x.next().await
}

…I cannot use it with I = dyn AsyncIterator. You can see why: it calls next and assumes the result is Sized (as promised by the trait), so it doesn’t add any kind of .box directive (and it shouldn’t have to).

What you can do is implement a wrapper type that encapsulates the boxing:

struct BoxingAsyncIterator<'i, I> {
    iter: &'i mut dyn AsyncIterator<Item = I>
}

impl<'i, I> AsyncIterator for BoxingAsyncIterator<'i, I> {
    type Item = I;
    
    async fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().box.await
    }
}

…and then you can call use_any(BoxingAsyncIterator::new(ai)).7
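The wrapper idea can be approximated in today’s Rust, under some loudly stated assumptions. Since .box and inherent methods on dyn AsyncIterator are proposals rather than existing features, this sketch introduces a hypothetical hand-written trait, DynAsyncIterator, whose next_boxed method stands in for calling next through dyn with .box. The async fn is written in its desugared -> impl Future form, and block_on, Countdown, and collect_all are illustrative helpers, not part of any proposal.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// `async fn next(&mut self)` written in its desugared form.
trait AsyncIterator {
    type Item;
    fn next(&mut self) -> impl Future<Output = Option<Self::Item>>;
}

type BoxedNext<'a, T> = Pin<Box<dyn Future<Output = Option<T>> + 'a>>;

// Hypothetical dyn-compatible twin; stands in for `dyn AsyncIterator`.
trait DynAsyncIterator {
    type Item;
    fn next_boxed(&mut self) -> BoxedNext<'_, Self::Item>;
}

impl<T: AsyncIterator> DynAsyncIterator for T {
    type Item = T::Item;
    fn next_boxed(&mut self) -> BoxedNext<'_, T::Item> {
        Box::pin(self.next()) // plays the role of `.box`
    }
}

// The wrapper: holds a trait object, yet implements the static trait.
struct BoxingAsyncIterator<'i, I> {
    iter: &'i mut dyn DynAsyncIterator<Item = I>,
}

impl<'i, I> BoxingAsyncIterator<'i, I> {
    fn new(iter: &'i mut dyn DynAsyncIterator<Item = I>) -> Self {
        BoxingAsyncIterator { iter }
    }
}

impl<'i, I> AsyncIterator for BoxingAsyncIterator<'i, I> {
    type Item = I;
    fn next(&mut self) -> impl Future<Output = Option<I>> {
        self.iter.next_boxed() // one heap allocation per call
    }
}

// Example iterator whose futures are always immediately ready.
struct Countdown(u32);

impl AsyncIterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> impl Future<Output = Option<u32>> {
        let out = if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        };
        std::future::ready(out)
    }
}

// Tiny busy-polling executor, just enough to drive the example.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Generic consumer in the spirit of `use_any`: it only knows `I: AsyncIterator`.
fn collect_all<I: AsyncIterator>(mut iter: I) -> Vec<I::Item> {
    let mut out = Vec::new();
    while let Some(item) = block_on(iter.next()) {
        out.push(item);
    }
    out
}
```

collect_all accepts both a bare Countdown (static dispatch, no boxing) and a BoxingAsyncIterator wrapped around a trait object (dynamic dispatch, boxing hidden inside the wrapper).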

Limitation: what if you wanted to do stack allocation?

One of the goals with the previous proposal was to allow you to write code that used dyn AsyncIterator which worked equally well in std and no-std environments. I would say that goal was partially achieved. The core idea was that the caller would choose the strategy by which the future got allocated, and so it could opt to use inline allocation (and thus be no-std compatible) or use boxing (and thus be simple).

In this proposal, the call-site has to choose. You might think then that you could just choose to use stack allocation at the call-site and thus be no-std compatible. But how does one choose stack allocation? It’s actually quite tricky! Part of the problem is that async stack frames are stored in structs, and thus we cannot support something like alloca (at least not for values that will be live across an await, which includes any future that is awaited8). In fact, even outside of async, using alloca is quite hard! The problem is that a stack is, well, a stack. Ideally, you would do the allocation just before your callee returns, since that’s when you know how much memory you need. But at that time, your callee is still using the stack, so your allocation is in the wrong spot.9 I personally think we should just rule out the idea of using alloca to do stack allocation.

If we can’t use alloca, what can we do? We have a few choices. In the very beginning, I talked about the idea of a maybe_box function that would take a buffer and use it only for really large values. That’s kind of nifty, but it still relies on a box fallback, so it doesn’t really work for no-std.10 Might be a nice alternative to stackfuture though!11

You can also achieve inlining by writing wrapper types (something tmandry and I prototyped some time back), but the challenge then is that your callee doesn’t accept a &mut dyn AsyncIterator, it accepts something like &mut DynAsyncIter, where DynAsyncIter is a struct that you defined to do the wrapping.

All told, I think the answer in reality would be: If you want to be used in a no-std environment, you don’t use dyn in your public interfaces. Just use impl AsyncIterator. You can use hacks like the wrapper types internally if you really want dynamic dispatch.

Question: How much room is there for the compiler to get clever?

One other concern I had in thinking about this proposal was that it seemed like it was overspecified. That is, the vast majority of call-sites in this proposal will be written with .box, which thus specifies that they should allocate a box to store the result. But what about ideas like caching the box across invocations, or “best effort” stack allocation? Where do they fit in? From what I can tell, those optimizations are still possible, so long as the Box which would be allocated doesn’t escape the function (which was the same condition we had before).

The way to think of it: by writing foo().box.await, the user told us to use the boxing allocator to box the return value of foo. But we can then see that this result is passed to await, which takes ownership and later frees it. We can thus decide to substitute a different allocator, perhaps one that reuses the box across invocations, or tries to use stack memory; this is fine so long as we modify the freeing code to match. Doing this relies on knowing that the allocated value is immediately returned to us and that it never leaves our control.

Conclusion

To sum up, I think for most users this design would work like so…

You can use dyn with traits that have async functions, but you have to write .box every time you call a method.
You get to use .box in other places too, and we gain at least some support for unsized return values.12
If you want to write code that is sometimes using dyn and sometimes using static dispatch, you’ll have to write some awkward wrapper types.13
If you are writing no-std code, use impl Trait, not dyn Trait; if you must use dyn, it’ll require wrapper types.

Initially, I dismissed call-site allocation because it violated dyn Trait: Trait and it didn’t allow code to be written with dyn that could work in both std and no-std. But I think that violating dyn Trait: Trait may actually be good, and I’m not sure how important that latter constraint truly is. Furthermore, I think that Boxing::new and the various “dyn adapters” are probably going to be pretty confusing for users, but writing .box on a call-site is relatively easy to explain (“we don’t know what future you need, so you have to box it”). So now it seems a lot more appealing to me, and I’m grateful to Olivier Faure for bringing it up again.

One possible extension would be to permit users to specify the type of each returned future in some way. As I was finishing up this post, I saw that matthieum posted an intriguing idea in this direction on the internals thread. In general, I do see a need for some kind of “trait adapters”, such that you can take a base trait like Iterator and “adapt” it in various ways, e.g. producing a version that uses async methods, or which is const-safe. This has some pretty heavy overlap with the whole keyword generics initiative too. I think it’s a good extension to think about, but it wouldn’t be part of the “MVP” that we ship first.

Thoughts?

Please leave comments in this internals thread, thanks!

Appendix A: the Output associated type

Here is an interesting thing! The FnOnce trait, implemented by all callable things, defines its associated type Output as Sized! We have to change this if we want to allow unsized return values.

In theory, this could be a big backwards compatibility hazard. Code that writes F::Output can assume, based on the trait, that the return value is sized – so if we remove that bound, the code will no longer build!

Fortunately, I think this is ok. We’ve deliberately restricted the fn types so you can only use them with the () notation, e.g., where F: FnOnce() or where F: FnOnce() -> (). Both of these forms expand to something which explicitly specifies Output, like F: FnOnce<(), Output = ()>. What this means is that even if you write really generic code…

fn foo<F, R>(f: F)
where
    F: FnOnce<Output = R>
{
    let value: F::Output = f();
    ...
}

…when you write F::Output, that is actually normalized to R, and the type R has its own (implicit) Sized bound.
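The same normalization can be observed on stable Rust with the parenthesized notation. In this small sketch (call_once is a made-up helper name), F::Output is usable as the type of a local variable and as an argument to size_of, both of which require a sized type, because it is known to be equal to R.

```rust
use std::mem::size_of;

fn call_once<F, R>(f: F) -> R
where
    F: FnOnce() -> R, // sugar that also fixes `Output = R`
{
    // `F::Output` normalizes to `R`, and `R` carries its own implicit
    // `Sized` bound, so it can be stored in a local and measured.
    let value: F::Output = f();
    let _bytes = size_of::<F::Output>();
    value
}
```

Nothing in this function relies on the Sized bound declared on FnOnce::Output itself; the bound on R is enough.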

(There was actually a recent unsoundness related to this bound, closed by this PR, and we discussed exactly this forwards compatibility question on Zulip.)

Footnotes

I can hear you now: “but what about alloca!” I’ll get there. ↩︎
The box foo operator supported by the compiler has no current path to stabilization. There were earlier plans (see RFC 809 and RFC 1228), but we ultimately abandoned those efforts. Part of the problem, in fact, was that the precedence of box foo made for bad ergonomics: foo.box works much better. ↩︎
If you try to await a Box<dyn Future> today, you get an error that it needs to be pinned. I think we can solve that by implementing IntoFuture for Box<dyn Future> and having that convert it to Pin<Box<dyn Future>>. ↩︎
Or almost always? I may be overlooking some edge cases. ↩︎
Internally in the compiler, this would require modifying the definition of MIR to make “dyn dispatch” more first-class. ↩︎
Don’t know what RPITIT stands for?! “Return position impl trait in traits!” Get with the program! ↩︎
This is basically what the “magical” Boxing::new would have done for you in the older proposal. ↩︎
Brief explanation of why async and alloca don’t mix here. ↩︎
I was told Ada compilers will allocate the memory at the top of the stack, copy it over to the start of the function’s area, and then pop what’s left. Theoretically possible! ↩︎
You could imagine a version that aborted the code if the size is wrong, too, which would make it no-std safe, but not in a reliable way (aborts == yuck). ↩︎
Conceivably you could set the size to size_of(SomeOtherType) to automatically determine how much space is needed. ↩︎
I say at least some because I suspect many details of the more general case would remain unstable until we gain more experience. ↩︎
You have to write awkward wrapper types for now, anyway. I’m intrigued by ideas about how we could make that more automatic, but I think it’s way out of scope here. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/09/21/dyn-async-traits-part-9-callee-site-selection/

What I meant by the "soul of Rust"

Sep 19, 2022 Updated Sep 19, 2022

Re-reading my previous post, I felt I should clarify why I called it the “soul of Rust”. The soul of Rust, to my mind, is definitely not being explicit about allocation. Rather, it’s about the struggle between a few key values, especially productivity and versatility1 in tension with transparency. Rust’s goal has always been to feel like a high-level language but with the performance and control of a low-level one. Oftentimes, we are able to find a “third way” that removes the tradeoff, solving both goals pretty well. But finding those “third ways” takes time, and sometimes we just have to accept a certain hit to one value or another for the time being to make progress. It’s exactly at these times, when we have to make a difficult call, that questions about the “soul of Rust” start to come into play. I’ve been thinking about this a lot, so I thought I would write a post that expands on the role of transparency in Rust, and some of the tensions that arise around it.

Why do we value transparency?

From the draft Rustacean Principles:

🔧 Transparent: “you can predict and control low-level details”

The C language, famously, maps quite closely to how machines typically operate. So much so that people have sometimes called it “portable assembly”.2 Both C++ and Rust are trying to carry on that tradition, but to add on higher levels of abstraction. Inevitably, this leads to tension. Operator overloading, for example, makes figuring out what a + b does more difficult.3

Transparency gives you control

Transparency doesn’t automatically give high performance, but it does give control. This helps when crafting your system, since you can set it up to do what you want, but it also helps when analyzing its performance or debugging. There’s nothing more frustrating than staring at code for hours and hours only to realize that the source of your problem isn’t anywhere in the code you can see: it lies in some invisible interaction that wasn’t made explicit.

Transparency can cost performance

The flip-side of transparency is overspecification. The more directly your program maps to assembly, the less room the compiler and runtime have to do clever things, which can lead to lower performance. In Rust, we are always looking for places where we can be less transparent in order to gain performance — but only up to a point. One example is struct layout: the Rust compiler retains the freedom to reorder fields in a struct, enabling us to make more compact data structures. That’s less transparent than C, but usually not in a way that you care about. (And, of course, if you want to specify the order of your fields, we offer the #[repr] attribute.)
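To make the struct layout example concrete, here is a small sketch comparing a #[repr(C)] struct with the same fields under the default representation. The struct and function names are made up, and the byte counts assume a target where u32 is 4-byte aligned.

```rust
use std::mem::size_of;

// Fields are laid out in declaration order, with C-style padding:
// 1 byte + 3 padding, 4 bytes, 1 byte + 3 padding = 12 bytes
// (assuming `u32` has 4-byte alignment).
#[allow(dead_code)]
#[repr(C)]
struct DeclaredOrder {
    a: u8,
    b: u32,
    c: u8,
}

// Default representation: the compiler is free to reorder the fields,
// for example placing `b` first so the two `u8`s share one padded word.
#[allow(dead_code)]
struct CompilerChosen {
    a: u8,
    b: u32,
    c: u8,
}

// Returns (size with declared order, size with compiler-chosen order).
fn layout_sizes() -> (usize, usize) {
    (size_of::<DeclaredOrder>(), size_of::<CompilerChosen>())
}
```

Current compilers typically report 8 bytes for CompilerChosen, but the default layout is deliberately unspecified, which is exactly the loss of transparency being traded for compactness.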

Transparency hurts versatility and productivity

The bigger price of transparency, though, is versatility. It forces everyone to care about low-level details that may not actually matter to the problem at hand4. Relevant to dyn async trait, most async Rust systems, for example, perform allocations left and right. The fact that a particular call to an async function might invoke Box::new is unlikely to be a performance problem. For those users, selecting a Boxing adapter adds to the overall complexity they have to manage for very little gain. If you’re working on a project where you don’t need peak performance, that’s going to make Rust less appealing than other languages. I’m not saying that’s bad, but it’s a fact.

A zero-sum situation…

At this moment in the design of async traits, we are struggling with a core question here of “how versatile can Rust be”. Right now, it feels like a “zero sum situation”. We can add in something like Boxing::new to preserve transparency, but it’s going to cost us some in versatility — hopefully not too much.

…for now?

I do wonder, though, if there’s a “third way” waiting somewhere. I hinted at this a bit in the previous post. At the moment, I don’t know what that third way is, and I think that requiring an explicit adapter is the most practical way forward. But it seems to me that it’s not a perfect sweet spot yet, and I am hopeful we’ll be able to subsume it into something more general.

Some ingredients that might lead to a ‘third way’:

With-clauses or capabilities: I am intrigued by the idea of with-clauses and the general idea of scoped capabilities. We might be able to think about the “default adapter” as something that gets specified via a with-clause?
Const evaluation: One of the niftier uses for const evaluation is for “meta-programming” that customizes how Rust is compiled. For example, we could potentially let you write a const fn that creates the vtable data structure for a given trait.
Profiles and portability: Can we find a better way to identify the kinds of transparency that you want, perhaps via some kind of ‘profiles’? I feel we already have ‘de facto’ profiles right now, but we don’t recognize them. “No std” is a clear example, but another would be the set of operating systems or architectures that you try to support. Recognizing that different users have different needs, and giving people a way to choose which one fits them best, might allow us to be more supportive of all our users, but then again, it might make Rust “modal” and more confusing.

Comments?

Please leave comments in this internals thread. Thanks!

Footnotes

I didn’t write about versatility in my original post: instead I focused on the hit to productivity. But as I think about it now, versatility is really what’s at play here — versatility really meant that Rust was useful for high-level things and low-level things, and I think that requiring an explicit dyn adaptor is unquestionably a hit against being high-level. Interestingly, I put versatility after transparency in the list, meaning that it was lower priority, and that seems to back up the decision to have some kind of explicit adaptor. ↩︎
At this point, some folks point out all the myriad subtleties and details that are actually hidden in C code. Hush you. ↩︎
I remember a colleague at a past job discovering that somebody had overloaded the -> operator in our codebase. They sent out an angry email, “When does it stop? Must I examine every dot and squiggle in the code?” (NB: Rust supports overloading the deref operator.) ↩︎
Put another way, being transparent about one thing can make other things more obscure (“can’t see the forest for the trees”). ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/09/19/what-i-meant-by-the-soul-of-rust/

Dyn async traits, part 8: the soul of Rust

Sep 18, 2022 Updated Sep 18, 2022

In the last few months, Tyler Mandry and I have been circulating a “User’s Guide from the Future” that describes our current proposed design for async functions in traits. In this blog post, I want to deep dive on one aspect of that proposal: how to handle dynamic dispatch. My goal here is to explore the space a bit and also to address one particularly tricky topic: how explicit do we have to be about the possibility of allocation? This is a tricky topic, and one that gets at that core question: what is the soul of Rust?

The running example trait

Throughout this blog post, I am going to focus exclusively on this example trait, AsyncIterator:

trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

And we’re particularly focused on the scenario where we are invoking next via dynamic dispatch:

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut ai); // <— coercion from `&mut AI` to `&mut dyn AsyncIterator`
}

fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().await; // <— this call right here!
}

Even though I’m focusing the blog post on this particular snippet of code, everything I’m talking about is applicable to any trait with methods that return impl Trait (async functions themselves being a shorthand for a function that returns impl Future).

The basic challenge that we have to face is this:

The caller function, use_dyn, doesn’t know what impl is behind the dyn, so it needs to allocate a fixed amount of space that works for everybody. It also needs some kind of vtable so it knows what poll method to call.
The callee, AI::next, needs to be able to package up the future for its next function in some way to fit the caller’s expectations.

The first blog post in this series 1 explains the problem in more detail.

A brief tour through the options

One of the challenges here is that there are many, many ways to make this work, and none of them is “obviously best”. What follows is, I think, an exhaustive list of the various ways one might handle the situation. If anybody has an idea that doesn’t fit into this list, I’d love to hear it.

Box it. The most obvious strategy is to have the callee box the future type, effectively returning a Box<dyn Future>, and have the caller invoke the poll method via virtual dispatch. This is what the async-trait crate does (although it also boxes for static dispatch, which we don’t have to do).

Box it with some custom allocator. You might want to box the future with a custom allocator.

Box it and cache box in the caller. For most applications, boxing itself is not a performance problem, unless it occurs repeatedly in a tight loop. Mathias Einwag pointed out that if you have some code that is repeatedly calling next on the same object, you could have that caller cache the box in between calls, and have the callee reuse it. This way you only have to actually allocate once.

Inline it into the iterator. Another option is to store all the state needed by the function in the AsyncIter type itself. This is actually what the existing Stream trait does, if you think about it: instead of returning a future, it offers a poll_next method, so that the implementor of Stream effectively is the future, and the caller doesn’t have to store any state. Tyler and I worked out a more general way to do inlining that doesn’t require user intervention, where you basically wrap the AsyncIterator type in another type W that has a field big enough to store the next future. When you call next, this wrapper W stores the future into that field and then returns a pointer to the field, so that the caller only has to poll that pointer. One problem with inlining things into the iterator is that it only works well for &mut self methods, since in that case there can be at most one active future at a time. With &self methods, you could have any number of active futures.

Box it and cache box in the callee. Instead of inlining the entire future into the AsyncIterator type, you could inline just one pointer-word slot, so that you can cache and reuse the Box that next returns. The upside of this strategy is that the cached box moves with the iterator and can potentially be reused across callers. The downside is that once the caller has finished, the cached box lives on until the object itself is destroyed.

Have caller allocate maximal space. Another strategy is to have the caller allocate a big chunk of space on the stack, one that should be big enough for every callee. If you know the callees your code will have to handle, and the futures for those callees are close enough in size, this strategy works well. Eric Holk recently released the stackfuture crate that can help automate it. One problem with this strategy is that the caller has to know the size of all its callees.

Have caller allocate some space, and fall back to boxing for large callees. If you don’t know the sizes of all your callees, or those sizes have a wide distribution, another strategy might be to have the caller allocate some amount of stack space (say, 128 bytes) and then have the callee invoke Box if that space is not enough.

Alloca on the caller side. You might think you can store the size of the future to be returned in the vtable and then have the caller “alloca” that space — i.e., bump the stack pointer by some dynamic amount. Interestingly, this doesn’t work with Rust’s async model. Async tasks require that the size of the stack frame is known up front.

Side stack. Similar to the previous suggestion, you could imagine having the async runtimes provide some kind of “dynamic side stack” for each task.2 We could then allocate the right amount of space on this stack. This is probably the most efficient option, but it assumes that the runtime is able to provide a dynamic stack. Runtimes like embassy wouldn’t be able to do this. Moreover, we don’t have any sort of protocol for this sort of thing right now. Introducing a side-stack also starts to “eat away” at some of the appeal of Rust’s async model, which is designed to allocate the “perfect size stack” up front and avoid the need to allocate a “big stack per task”.3
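To illustrate the “inline it into the iterator” option, here is a minimal sketch of a poll_next-style trait in the shape the Stream trait uses. Because the method returns Poll rather than a future, the implementor holds all the state and the trait works with dyn directly, with no allocation. PollIterator, Counter, NoopWake, and drain are made-up names, and the example values are always immediately ready.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stream-shaped trait: no future is returned, so nothing needs to be
// boxed or sized by the caller.
trait PollIterator {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// All state needed between calls lives inside the iterator itself.
struct Counter {
    next: u32,
    end: u32,
}

impl PollIterator for Counter {
    type Item = u32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.next == self.end {
            return Poll::Ready(None);
        }
        let value = self.next;
        self.next += 1;
        Poll::Ready(Some(value))
    }
}

// Waker that does nothing; enough for values that are always ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Dynamic dispatch with no boxing of futures.
fn drain(iter: &mut (dyn PollIterator<Item = u32> + Unpin)) -> Vec<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    // A real caller would hand `Poll::Pending` back to its executor;
    // this sketch simply stops at the first non-ready result.
    while let Poll::Ready(Some(value)) = Pin::new(&mut *iter).poll_next(&mut cx) {
        out.push(value);
    }
    out
}
```

The cost of this style is the one noted above: the implementor has to write the state machine by hand, and only one call can be in flight at a time.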

Can async functions used with dyn be “normal”?

One of my initial goals for async functions in traits was that they should feel “as natural as possible”. In particular, I wanted you to be able to use them with dynamic dispatch in just the same way as you would a synchronous function. In other words, I wanted this code to compile, and I would want it to work even if use_dyn were put into another crate (and therefore were compiled with no idea of who is calling it):

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut ai);
}

fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().await;
}

My hope was that we could make this code work just as it is by selecting some kind of default strategy that works most of the time, and then provide ways for you to pick other strategies for code where the default strategy is not a good fit. The problem though is that there is no single default strategy that seems “obvious and right almost all of the time”…

Strategy | Downside
Box it (with default allocator) | requires allocation, not especially efficient
Box it with cache on caller side | requires allocation
Inline it into the iterator | adds space to AI, doesn’t work for &self
Box it with cache on callee side | requires allocation, adds space to AI, doesn’t work for &self
Allocate maximal space | can’t necessarily use that across crates, requires extensive interprocedural analysis
Allocate some space, fallback | uses allocator, requires extensive interprocedural analysis or else random guesswork
Alloca on the caller side | incompatible with async Rust
Side-stack | requires cooperation from runtime and allocation

The soul of Rust

This is where we get to the “soul of Rust”. Looking at the above table, the strategy that seems the closest to “obviously correct” is “box it”. It works fine with separate compilation, fits great with Rust’s async model, and it matches what people are doing today in practice. I’ve spoken with a fair number of people who use async Rust in production, and virtually all of them agreed that “box by default, but let me control it” would work great in practice.

And yet, when we floated the idea of using this as the default, Josh Triplett objected strenuously, and I think for good reason. Josh’s core concern was that this would be crossing a line for Rust. Until now, there is no way to allocate heap memory without some kind of explicit operation (though that operation could be a function call). But if we wanted to make “box it” the default strategy, then you’d be able to write “innocent looking” Rust code that nonetheless is invoking Box::new. In particular, it would be invoking Box::new each time that next is called, to box up the future. But that is very unclear from reading over make_dyn and use_dyn.

As an example of where this might matter, it might be that you are writing some sensitive systems code where allocation is something you always do with great care. It doesn’t mean the code is no-std, it may have access to an allocator, but you still would like to know exactly where you will be doing allocations. Today, you can audit the code by hand, scanning for “obvious” allocation points like Box::new or vec![]. Under this proposal, while it would still be possible, the presence of an allocation in the code is much less obvious. The allocation is “injected” as part of the vtable construction process. To figure out that this will happen, you have to know Rust’s rules quite well, and you also have to know the signature of the callee (because in this case, the vtable is built as part of an implicit coercion). In short, scanning for allocation went from being relatively obvious to requiring a PhD in Rustology. Hmm.

On the other hand, if scanning for allocations is what is important, we could address that in many ways. We could add an “allow by default” lint to flag the points where the “default vtable” is constructed, and you could enable it in your project. This way the compiler would warn you about the possible future allocation. In fact, even today, scanning for allocations is actually much harder than I made it out to be: you can easily see if your function allocates, but you can’t easily see what its callees do. You have to read deeply into all of your dependencies and, if there are function pointers or dyn Trait values, figure out what code is potentially being called. With compiler/language support, we could make that whole process much more first-class and better.

In a way, though, the technical arguments are beside the point. “Rust makes allocations explicit” is widely seen as a key attribute of Rust’s design. In making this change, we would be tweaking that rule to be something like “Rust makes allocations explicit most of the time”. This would be harder for users to understand, and it would introduce doubt as to whether Rust really intends to be the kind of language that can replace C and C++4.

Looking to the Rustacean design principles for guidance

Some time back, Josh and I drew up a draft set of design principles for Rust. It’s interesting to look back on them and see what they have to say about this question:

⚙️ Reliable: “if it compiles, it works”
🐎 Performant: “idiomatic code runs efficiently”
🥰 Supportive: “the language, tools, and community are here to help”
🧩 Productive: “a little effort does a lot of work”
🔧 Transparent: “you can predict and control low-level details”
🤸 Versatile: “you can do anything with Rust”

Boxing by default, to my mind, scores as follows:

🐎 Performant: meh. The real goal with performant is that the cleanest code also runs the fastest. Boxing on every dynamic call doesn’t meet this goal, but something like “boxing with caller-side caching” or “have caller allocate space and fall back to boxing” very well might.
🧩 Productive: yes! Virtually every production user of async Rust that I’ve talked to has agreed that having code box by default (but giving the option to do something else for tight loops) would be a great sweet spot for Rust.
🔧 Transparent: no. As I wrote before, understanding when a call may box now requires a PhD in Rustology, so this definitely fails on transparency.

(The other principles are not affected in any notable way, I don’t think.)

What the “user’s guide from the future” suggests

These considerations led Tyler and me to a different design. In the “User’s Guide From the Future” document from before, you’ll see that it does not accept the running example just as is. Instead, if you were to compile the example code we’ve been using thus far, you’d get an error:

error[E0277]: the type `AI` cannot be converted to a
              `dyn AsyncIterator` without an adapter
 --> src/lib.rs:3:23
  |
3 |     use_dyn(&mut ai);
  |                  ^^ adapter required to convert to `dyn AsyncIterator`
  |
  = help: consider introducing the `Boxing` adapter,
    which will box the futures returned by each async fn
3 |     use_dyn(&mut Boxing::new(ai));
                     ++++++++++++  +

As the error suggests, in order to get the boxing behavior, you have to opt-in via a type that we called Boxing5:

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut Boxing::new(ai));
    //          ^^^^^^^^^^^
}

fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().await;
}

Under this design, you can only create a &mut dyn AsyncIterator when the caller can verify that the next method returns a type from which a dyn* can be constructed. If that’s not the case, and it’s usually not, you can use the Boxing::new adapter to create a Boxing<AI>. Via some kind of compiler magic that ahem we haven’t fully worked out yet6, you could coerce a Boxing<AI> into a dyn AsyncIterator.

The details of the Boxing type need more work7, but the basic idea remains the same: require users to make some explicit opt-in to the default vtable strategy, which may indeed perform allocation.

How does Boxing rank on the design principles?

To my mind, adding the Boxing adapter ranks as follows…

🐎 Performant: meh. This is roughly the same as before. We’ll come back to this.
🥰 Supportive: yes! The error message guides you to exactly what you need to do, and hopefully links to a well-written explanation that can help you learn about why this is required.
🧩 Productive: meh. Having to add a Boxing::new call each time you create a dyn AsyncIterator is not great, but also on par with other Rust papercuts.
🔧 Transparent: yes! It is easy to see that boxing may occur in the future now.

This design is now transparent. It’s also less productive than before, but we’ve tried to make up for it with supportiveness. “Rust isn’t always easy, but it’s always helpful.”

Improving performance with a more complex ABI

One thing that bugs me about the “box by default” strategy is that the performance is only “meh”. I like stories like Iterator, where you write nice code and you get tight loops. It bothers me that writing “nice” async code yields a naive, middling efficiency story.

That said, I think this is something we could fix in the future, and I think we could fix it backwards compatibly. The idea would be to extend our ABI when doing virtual calls so that the caller has the option to provide some “scratch space” for the callee. For example, we could then do things like analyze the binary to get a good guess as to how much stack space is needed (either by doing dataflow or just by looking at all implementations of AsyncIterator). We could then have the caller reserve stack space for the future and pass a pointer into the callee — the callee would still have the option of allocating, if for example, there wasn’t enough stack space, but it could make use of the space in the common case.

Interestingly, I think that if we did this, we would also be putting some pressure on Rust’s “transparency” story again. While Rust leans heavily on optimizations to get performance, we’ve generally restricted ourselves to simple, local ones like inlining; we don’t require interprocedural dataflow in particular, although of course it helps (and LLVM does it). But getting a good estimate of how much stack space to reserve for potential callees would violate that rule (we’d also need some simple escape analysis, as I describe in Appendix A). All of this adds up to a bit of ‘performance unpredictability’. Still, I don’t see this as a big problem, particularly since the fallback is just to use Box::new, and as we’ve said, for most users that is perfectly adequate.

Picking another strategy, such as inlining

Of course, maybe you don’t want to use Boxing. It would also be possible to construct other kinds of adapters, and they would work in a similar fashion. For example, an inlining adapter might look like:

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut InlineAsyncIterator::new(ai));
    //           ^^^^^^^^^^^^^^^^^^^^^^^^
}

The InlineAsyncIterator<AI> type would add the extra space to store the future, so that when the next method is called, it writes the future into its own fields and then returns it to the caller. Similarly, a cached box adapter might be &mut CachedAsyncIterator::new(ai), only it would use a field to cache the resulting Box.

You may have noticed that the inline/cached adapters include the name of the trait. That’s because they aren’t relying on compiler magic like Boxing, but are instead intended to be authored by end-users, and we don’t yet have a way to be generic over any trait definition. (The proposal as we wrote it uses macros to generate an adapter type for any trait you wish to adapt.) This is something I’d love to address in the future. You can read more about how adapters work here.

Conclusion

OK, so let’s put it all together into a coherent design proposal:

You cannot coerce from an arbitrary type AI into a dyn AsyncIterator. Instead, you must select an adapter:
- Typically you want Boxing, which has a decent performance profile and “just works”.
- But users can write their own adapters to implement other strategies, such as InlineAsyncIterator or CachedAsyncIterator.
From an implementation perspective:
- When invoked via dynamic dispatch, async functions return a dyn* Future. The caller can invoke poll via virtual dispatch and invoke the (virtual) drop function when it’s ready to dispose of the future.
- The vtable created for Boxing<AI> will allocate a box to store the future AI::next() and use that to create the dyn* Future.
- The vtable for other adapters can use whatever strategy they want. InlineAsyncIterator<AI>, for example, stores the AI::next() future into a field in the wrapper, takes a raw pointer to that field, and creates a dyn* Future from this raw pointer.
Possible future extension for better performance:8
- We modify the ABI for async trait functions (or any trait function using return-position impl trait) to allow the caller to optionally provide stack space. The Boxing adapter, if such stack space is available, will use it to avoid boxing when it can. This would have to be coupled with some compiler analysis to figure out how much stack space to pre-allocate.
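
To give a feel for what a Boxing-style adapter does, here is a rough approximation in today's Rust. This is a sketch, not the proposal itself: `dyn*` is not available on stable, so `Pin<Box<dyn Future>>` stands in for `dyn* Future`, and the adapter is an ordinary wrapper rather than a compiler-generated vtable. The `AsyncIterator` trait below is a local stand-in, and `DynAsyncIterator`, `Countdown`, `poll_once`, and the body of `use_dyn` are hypothetical names made up for illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Statically dispatched trait: each impl names its own future type.
trait AsyncIterator {
    type Item;
    type Next<'a>: Future<Output = Option<Self::Item>> + 'a
    where
        Self: 'a;
    fn next(&mut self) -> Self::Next<'_>;
}

/// Object-safe counterpart (hypothetical), standing in for `dyn AsyncIterator`.
trait DynAsyncIterator {
    type Item;
    fn next_dyn(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

/// The adapter: every call to `next_dyn` boxes the future from `AI::next()`.
struct Boxing<AI>(AI);

impl<AI: AsyncIterator> DynAsyncIterator for Boxing<AI> {
    type Item = AI::Item;
    fn next_dyn(&mut self) -> Pin<Box<dyn Future<Output = Option<AI::Item>> + '_>> {
        Box::pin(self.0.next())
    }
}

/// A trivial async iterator whose futures are immediately ready.
struct Countdown(u32);

impl AsyncIterator for Countdown {
    type Item = u32;
    type Next<'a> = std::future::Ready<Option<u32>>;
    fn next(&mut self) -> Self::Next<'_> {
        let out = if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        };
        std::future::ready(out)
    }
}

/// A waker that does nothing, so we can poll without a runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Consumer that only sees the dynamically dispatched interface.
fn use_dyn(it: &mut dyn DynAsyncIterator<Item = u32>) -> Vec<u32> {
    let mut out = Vec::new();
    loop {
        let mut fut = it.next_dyn();
        match poll_once(fut.as_mut()) {
            Poll::Ready(Some(v)) => out.push(v),
            Poll::Ready(None) => break,
            Poll::Pending => unreachable!("`Ready` futures never return Pending"),
        }
    }
    out
}
```

Calling `use_dyn(&mut Boxing(Countdown(3)))` drives the iterator through the boxed futures; a different adapter would implement `DynAsyncIterator` with a different storage strategy while `use_dyn` stays unchanged.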

This lets us express virtually any pattern. It’s even possible to express side-stacks, if the runtime provides a suitable adapter (e.g., TokioSideStackAdapter::new(ai)), though if side-stacks become popular I would rather consider a more standard means to expose them.

The main downsides to this proposal are:

Users have to write Boxing::new, which is a productivity and learnability hit, but it avoids a big hit to transparency. Is that the right call? I’m still not entirely sure, though my heart increasingly says yes. It’s also something we could revisit in the future (e.g., by adding a default adapter).
If we opt to modify the ABI, we’re adding some complexity there, but in exchange for potentially quite a lot of performance. I would expect us not to do this initially, but to explore it as an extension in the future once we have more data about how important it is.

There is one pattern that we can’t express: “have caller allocate maximal space”. This pattern guarantees that heap allocation is not needed; the best we can do is a heuristic that tries to avoid heap allocation, since we have to consider public functions on crate boundaries and the like. To offer a guarantee, the argument type needs to change from &mut dyn AsyncIterator (which accepts any async iterator) to something narrower. This would also support futures that escape the stack frame (see Appendix A below). It seems likely that these details don’t matter, and that either inline futures or heuristics would suffice, but if not, a crate like stackfuture remains an option.

Comments?

Please leave comments in this internals thread. Thanks!

Appendix A: futures that escape the stack frame

In all of this discussion, I’ve been assuming that the async call was followed closely by an await. But what happens if the future is not awaited, but instead is moved into the heap or other locations?

fn foo(x: &mut dyn AsyncIterator<Item = u32>) -> impl Future<Output = Option<u32>> + '_ {
    x.next()
}

For boxing, this kind of code doesn’t pose any problem at all. But if we had allocated space on the stack to store the future, examples like this would be a problem. So long as the scratch space is optional, with a fallback to boxing, this is no problem. We can do an escape analysis and avoid the use of scratch space for examples like this.

Footnotes

Written in Sep 2020, egads! ↩︎
I was intrigued to learn that this is what Ada does, and that Ada features like returning dynamically sized types are built on this model. I’m not sure how SPARK and other Ada subsets that target embedded spaces manage that; I’d like to learn more about it. ↩︎
Of course, without a side stack, we are left using mechanisms like Box::new to cover cases like dynamic dispatch or recursive functions. This becomes a kind of pessimistically sized segmented stack, where we allocate for each little piece of extra state that we need. A side stack might be an appealing middle ground, but because of cases like embassy, it can’t be the only option. ↩︎
Ironically, C++ itself inserts implicit heap allocations to help with coroutines! ↩︎
Suggestions for a better name very welcome. ↩︎
Pay no attention to the compiler author behind the curtain. 🪄 🌈 Avert your eyes! ↩︎
e.g., if you look closely at the User’s Guide from the Future, you’ll see that it writes Boxing::new(&mut ai), and not &mut Boxing::new(ai). I go back and forth on this one. ↩︎
I should clarify that, while Tyler and I have discussed this, I don’t know how he feels about it. I wouldn’t call it ‘part of the proposal’ exactly, more like an extension I am interested in. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/09/18/dyn-async-traits-part-8-the-soul-of-rust/

Come contribute to Salsa 2022!

Aug 18, 2022 Updated Aug 18, 2022

Have you heard of the Salsa project? Salsa is a library for incremental computation – it’s used by rust-analyzer, for example, to stay responsive as you type into your IDE (we have also discussed using it in rustc, though more work is needed there). We are in the midst of a big push right now to develop and release Salsa 2022, a major new revision to the API that will make Salsa far more natural to use. I’m writing this blog post both to advertise that ongoing work and to put out a call for contribution. Salsa doesn’t yet have a large group of maintainers, and I would like to fix that. If you’ve been looking for an open source project to try and get involved in, maybe take a look at our Salsa 2022 tracking issue and see if there is an issue you’d like to tackle?

So wait, what does Salsa do?

Salsa is designed to help you build programs that respond to rapidly changing inputs. The prototypical example is a compiler, especially an IDE. You’d like to be able to do things like “jump to definition” and keep those results up-to-date even as the user is actively typing. Salsa can help you build programs that manage that.

The key way that Salsa achieves reuse is through memoization. The idea is that you define a function that does some specific computation, let’s say it has the job of parsing the input and creating the Abstract Syntax Tree (AST):

fn parse_program(input: &str) -> AST { }

Then later I have other functions that might take parts of that AST and operate on them, such as type-checking:

fn type_check(function: &AstFunction) { }

In a setup like this, I would like to have it so that when my base input changes, I do have to re-parse but I don’t necessarily have to run the type checker. For example, if the only change to my program was to add a comment, then maybe my AST is not affected, and so I don’t need to run the type checker again. Or perhaps the AST contains many functions, and only one of them changed, so while I have to type check that function, I don’t want to type check the others. Salsa can help you manage this sort of thing automatically.
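
The "added a comment, so nothing downstream reruns" case can be sketched by hand in a few lines. To be clear, this is not Salsa's API or implementation; it is a hand-rolled illustration of the idea, and the `Memo` type, the line-based toy parser, and the run counter are all assumptions made up for the example.

```rust
/// Toy AST: just the list of "functions" (one per non-comment line).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Ast {
    functions: Vec<String>,
}

/// Toy parser: blank lines and `//` comment lines do not reach the AST.
fn parse_program(input: &str) -> Ast {
    let functions = input
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with("//"))
        .map(String::from)
        .collect();
    Ast { functions }
}

/// Remembers the last AST and skips type checking when it is unchanged.
#[derive(Default)]
struct Memo {
    last_ast: Option<Ast>,
    type_check_runs: usize,
}

impl Memo {
    fn update(&mut self, input: &str) {
        // The input changed, so we always re-parse...
        let ast = parse_program(input);
        // ...but if the AST is the same as before, we stop here.
        if self.last_ast.as_ref() == Some(&ast) {
            return;
        }
        self.type_check_runs += 1; // stand-in for the real type checker
        self.last_ast = Some(ast);
    }
}
```

Feeding the same program with an extra comment line leaves `type_check_runs` untouched, while changing a function bumps it.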

What is Salsa 2022 and how is it different?

The original salsa system was modeled very closely on the rustc query system. As such, it required you to structure your program entirely in terms of functions and queries that called one another. All data was passed through return values. This is a very powerful and flexible system, but it can also be kind of mind-bending sometimes to figure out how to “close the loop”, particularly if you wanted to get effective re-use, or do lazy computation.

Just looking at the parse_program function we saw before, it was defined to return a complete AST:

fn parse_program(input: &str) -> AST { }

But that AST has, internally, a lot of structure. For example, perhaps an AST looks like a set of functions:

struct Ast {
    functions: Vec<AstFunction>
}

struct AstFunction {
    name: Name,
    body: AstFunctionBody,
}

struct AstFunctionBody {
    ...
}

Under the old Salsa, changes were tracked at a pretty coarse-grained level. So if your input changed, and the content of any function body changed, then your entire AST was considered to have changed. If you were naive about it, this would mean that everything would have to be type-checked again. In order to get good reuse, you had to change the structure of your program pretty dramatically from the “natural structure” that you started with.

Enter: tracked structs

The newer Salsa introduces tracked structs, which makes this a lot easier. The idea is that you can label a struct as tracked, and now its fields become managed by the database:

#[salsa::tracked]
struct AstFunction {
    name: Name,
    body: AstFunctionBody,
}

When a struct is declared as tracked, then we also track accesses to its fields. This means that if the parser produces the same set of functions, then its output is considered not to have changed, even if the function bodies are different. When the type checker reads the function body, we’ll track that read independently. So if just one function has changed, only that function will be type checked again.
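
Here is a hand-rolled sketch of that finer-grained behavior: each function body is remembered separately, so only functions whose body actually changed get checked again. Again, this is not how Salsa is implemented and it does not use the `#[salsa::tracked]` machinery; the `Checker` type and its fields are hypothetical, and the bodies are plain strings for simplicity.

```rust
use std::collections::HashMap;

struct AstFunction {
    name: String,
    body: String,
}

/// Tracks, per function name, the body that was last type checked.
#[derive(Default)]
struct Checker {
    checked_bodies: HashMap<String, String>,
    /// Names of functions we actually type checked, in order.
    runs: Vec<String>,
}

impl Checker {
    fn type_check_all(&mut self, functions: &[AstFunction]) {
        for function in functions {
            // Same body as last time: reuse the old result.
            if self.checked_bodies.get(&function.name) == Some(&function.body) {
                continue;
            }
            self.runs.push(function.name.clone()); // stand-in for real checking
            self.checked_bodies
                .insert(function.name.clone(), function.body.clone());
        }
    }
}
```

If the parser produces functions `a` and `b` and later only `b`'s body differs, the second pass records a single run, for `b`.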

Goal: relatively natural

The goal of Salsa 2022 is that you should be able to convert a program to use Salsa without dramatically restructuring it. It should still feel quite similar to the ’natural structure’ that you would have used if you didn’t care about incremental reuse.

Using techniques like tracked structs, you can keep the pattern of a compiler as a kind of “big function” that passes the input through many phases, while still getting pretty good re-use:

fn typical_compiler(input: &str) -> Result {
    let ast = parse_ast(input);
    for function in &ast.functions {
        type_check(function);
    }
    ...
}

Salsa 2022 also has other nice features, such as accumulators for managing diagnostics and built-in interning.

If you’d like to learn more about how Salsa works, check out the overview page or read through the (WIP) tutorial, which covers the design of a complete compiler and interpreter.

How to get involved

As I mentioned, the purpose of this blog post is to serve as a call for contribution. Salsa is a cool project but it doesn’t have a lot of active maintainers, and we are actively looking to recruit new people.

The Salsa 2022 tracking issue contains a list of possible items to work on. Many of those items have mentoring instructions, just search for things tagged with good first issue. There is also documentation of salsa’s internal structure on the main web page that can help you navigate the code base. Finally, we have a Zulip instance where we hang out and chat (the #good-first-issue stream is a good place to ask for help!)

https://smallcultfollowing.com/babysteps/blog/2022/08/18/come-contribute-to-salsa-2022/

Many modes: a GATs pattern

Jun 27, 2022 Updated Jun 27, 2022

As some of you may know, on May 4th Jack Huey opened a PR to stabilize an initial version of generic associated types. The current version is at best an MVP: the compiler support is limited, resulting in unnecessary errors, and the syntax is limited, making code that uses GATs much more verbose than I’d like. Nonetheless, I’m super excited, since GATs unlock a lot of interesting use cases, and we can continue to smooth out the rough edges over time. However, folks on the thread have raised some strong concerns about GAT stabilization, including asking whether GATs are worth including in the language at all. The fear is that they make Rust the language too complex, and that it would be better to just use them as an internal building block for other, more accessible features (like async functions and return position impl trait in traits, or RPITIT). In response to this concern, a number of people have posted about how they are using GATs. I recently took some time to deep dive into these comments and to write about some of the patterns that I found there, including a pattern I am calling the “many modes” pattern, which comes from the chumsky parser combinator library. I posted about this pattern on the thread, but I thought I would cross-post my write-up here to the blog as well, because I think it’s of general interest.

General thoughts from reading the examples

I’ve been going through the (many, many) examples that people have posted where they are relying on GATs and looking at them in a bit more detail. A few interesting things jumped out at me as I read through the examples:

Many of the use-cases involve GATs with type parameters. There has been some discussion of stabilizing “lifetime-only” GATs, but I don’t think that makes sense from any angle. It’s more complex for the implementation and, I think, more confusing for the user. But also, given that the “workaround” for not having GATs tends to be higher-ranked trait bounds (HRTB), and given that those only work for lifetimes, it means we’re losing one of the primary benefits of GATs in practice (note that I do expect to get HRTB for types in the near-ish future).
GATs allowed libraries to better hide details from their clients. This is precisely because they could make a trait hierarchy that more directly captured the “spirit” of the trait, resulting in bounds like M: Mode instead of higher-ranked trait bounds (in some cases, the HRTB would have to be over types, like for<X> M: Mode<X>, which isn’t even legal in Rust…yet).

As I read, I felt this fit a pattern that I’ve experienced many times but hadn’t given a name to: when traits are being used to describe a situation that they don’t quite fit, the result is an explosion of where-clauses on the clients. Sometimes you can hide these via supertraits or something, but those complex bounds are still visible in rustdoc, still leak out in error messages, and don’t generally “stay hidden” as well as you’d like. You’ll see this come up here when I talk about how you would model this pattern in Rust today, but it’s a common theme across all examples. Issue #95 on the RustAudio crate for example says, “The first [solution] would be to make PortType generic over a 'a lifetime…however, this has a cascading effect, which would force all downstream users of port types to specify their lifetimes”. Pythonesque made a simpler point here, “Without GATs, I ended up having to make an Hkt trait that had to be implemented for every type, define its projections, and then make everything heavily parametric and generic over the various conversions.”

The “many modes” pattern (chumsky)

The first example I looked at closely was the chumsky parsing library. This is leveraging a pattern that I would call the “many modes” pattern. The idea is that you have some “core function” but you want to execute this function in many different modes. Ideally, you’d like to define the modes independently from the function, and you’d like to be able to add more modes later without having to change the function at all. (If you’re familiar with Haskell, monads are an example of this pattern; the monad specifies the “mode” in which some simple sequential function is executed.)

chumsky is a parser combinator library, so the “core function” is a parse function, defined in the Parser trait. Each Parser trait impl contains a function that indicates how to parse some particular construct in the grammar. Normally, this parser function builds up a data structure representing the parsed data. But sometimes you don’t need the full results of the parse: sometimes you might just like to know if the parse succeeds or fails, without building the parsed version. Thus, the “many modes” pattern: we’d like to be able to define our parser and then execute it against one of two modes, emit or check. The emit mode will build the data structure, but check will just check if the parse succeeds.

In the past, chumsky only had one mode, so they always built the data structure. This could take significant time and memory. Adding the “check” mode lets them skip that, which is a significant performance win. Moreover, the modes are encapsulated within the library traits, and aren’t visible to end-users. Nice!

How did chumsky model modes with GATs?

Chumsky added a Mode trait, encapsulated as part of their internals module. Instead of directly constructing the results from parsing, the Parser impls invoke methods on Mode with closures. This allows the mode to decide which parts of the parsing to execute and which to skip. So, in check mode, the Mode would decide not to execute the closure that builds the output data structure, for example.

Using this approach, the Parser trait does indeed have several ’entrypoint’ methods, but they are all defaulted and just invoke a common implementation method called go:

pub trait Parser<'a, I: Input + ?Sized, E: Error<I::Token> = (), S: 'a = ()> {
    type Output;
    
    fn parse(&self, input: &'a I) -> Result<Self::Output, E> ... {
        self.go::<Emit>(...)
    }

    fn check(&self, input: &'a I) -> Result<(), E> ... {
        self.go::<Check>(...)
    }
    
    #[doc(hidden)]
    fn go<M: Mode>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E>
    where
        Self: Sized;
}

Implementations of Parser just specify the go method. Note that the impls are, presumably, either contained within chumsky or generated by chumsky proc-macros, so the go method doesn’t need to be documented. However, even if go were documented, the trait bounds certainly look quite reasonable. (The type of inp is a bit…imposing, admittedly.)

So how is the Mode trait defined? Just to focus on the GAT, the trait looks like this:

pub trait Mode {
    type Output<T>;
    ...
}

Here, the T represents the result type of “some parser parsed in this mode”. GATs thus allow us to define a Mode that is independent from any particular Parser. There are two impls of Mode (also internal to chumsky):

Check, defined like struct Check; impl Mode for Check { type Output<T> = (); ... }. In other words, no matter what parser you use, Check just builds a () result (success or failure is propagated independently of the mode).
Emit, defined like struct Emit; impl Mode for Emit { type Output<T> = T; ... }. In Emit mode, the output is exactly what the parser generated.

Note that you could, in theory, produce other modes. For example, a Count mode that not only computes success/failure but counts the number of nodes parsed, or perhaps a mode that computes hashes of the resulting parsed value. Moreover, you could add these modes (and the defaulted methods in Parser) without breaking any clients.
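
To see the pieces working together, here is a stripped-down, self-contained version of the pattern. It is a sketch in the spirit of chumsky, not chumsky's actual code: the `Parser` trait is heavily simplified (plain `&str` input, `()` as the error type), the `bind` method and the `Run` and `Pair` combinators are assumptions invented for this example, and only the `Output<T>` GAT and the shape of `combine` come from the library.

```rust
/// A mode decides what "the output of a parser" means.
trait Mode {
    type Output<T>;

    /// Produce an output from a closure, or skip running it entirely.
    fn bind<T, F: FnOnce() -> T>(f: F) -> Self::Output<T>;

    /// Merge the outputs of two parsers.
    fn combine<T, U, V, F: FnOnce(T, U) -> V>(
        x: Self::Output<T>,
        y: Self::Output<U>,
        f: F,
    ) -> Self::Output<V>;
}

struct Emit;
struct Check;

impl Mode for Emit {
    type Output<T> = T;
    fn bind<T, F: FnOnce() -> T>(f: F) -> Self::Output<T> {
        f()
    }
    fn combine<T, U, V, F: FnOnce(T, U) -> V>(
        x: Self::Output<T>,
        y: Self::Output<U>,
        f: F,
    ) -> Self::Output<V> {
        f(x, y)
    }
}

impl Mode for Check {
    type Output<T> = ();
    // The closures are never called, so no output is ever built.
    fn bind<T, F: FnOnce() -> T>(_f: F) -> Self::Output<T> {}
    fn combine<T, U, V, F: FnOnce(T, U) -> V>(
        _x: Self::Output<T>,
        _y: Self::Output<U>,
        _f: F,
    ) -> Self::Output<V> {
    }
}

trait Parser {
    type Output;

    /// The one method implementors write; it is generic over the mode.
    fn go<M: Mode>(&self, input: &mut &str) -> Result<M::Output<Self::Output>, ()>;

    fn parse(&self, mut input: &str) -> Result<Self::Output, ()> {
        self.go::<Emit>(&mut input)
    }

    fn check(&self, mut input: &str) -> Result<(), ()> {
        self.go::<Check>(&mut input)
    }
}

/// One or more characters satisfying a predicate, collected into a `String`.
struct Run(fn(&char) -> bool);

impl Parser for Run {
    type Output = String;
    fn go<M: Mode>(&self, input: &mut &str) -> Result<M::Output<Self::Output>, ()> {
        let s: &str = *input;
        let len: usize = s
            .chars()
            .take_while(|c| (self.0)(c))
            .map(char::len_utf8)
            .sum();
        if len == 0 {
            return Err(());
        }
        let (head, rest) = s.split_at(len);
        *input = rest;
        // Only `Emit` runs the closure and allocates the `String`.
        Ok(M::bind(|| head.to_string()))
    }
}

/// Runs two parsers in sequence and pairs up their outputs.
struct Pair<A, B>(A, B);

impl<A: Parser, B: Parser> Parser for Pair<A, B> {
    type Output = (A::Output, B::Output);
    fn go<M: Mode>(&self, input: &mut &str) -> Result<M::Output<Self::Output>, ()> {
        let a = self.0.go::<M>(input)?;
        let b = self.1.go::<M>(input)?;
        Ok(M::combine(a, b, |a: A::Output, b: B::Output| (a, b)))
    }
}
```

With `Pair(Run(char::is_ascii_digit), Run(char::is_ascii_alphabetic))`, calling `parse` on `"42xy!"` builds both strings, while `check` walks the same input without allocating anything. Note that each combinator is written once, and that its only mode-related bound is `M: Mode`.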

How could you model this today?

I was trying to think how one might model this problem with traits today. All the options I came up with had significant downsides.

Multiple functions on the trait, or multiple traits. One obvious option would be to use multiple functions in the parse trait, or multiple traits:

// Multiple functions
trait Parser { fn parse(); fn check(); }

// Multiple traits
trait Parser: Checker { fn parse(); }
trait Checker { fn check(); }

Both of these approaches mean that defining a new combinator requires writing the same logic twice, once for parse and once for check, but with small variations, which is both annoying and a great opportunity for bugs. It also means that if chumsky ever wanted to define a new mode, they would have to modify every implementation of Parser (a breaking change, to boot).

Mode with a type parameter. You could try defining the mode trait with a type parameter, like so…

trait ModeFor<T> {
    type Output;
    ...
}

The go function would then look like

fn go<M: ModeFor<Self::Output>>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E>
where
    Self: Sized;

In practice, though, this doesn’t really work, for a number of reasons. One of them is that the Mode trait includes methods like combine, which take the output of many parsers, not just one, and combine them together. Good luck writing that constraint with ModeFor. But even ignoring that, lacking HRTB, the signature of go itself is incomplete. The problem is that, given some impl of Parser for some parser type MyParser, MyParser only knows that M is a valid mode for its particular output. But maybe MyParser plans to (internally) use some other parser combinators that produce different kinds of results. Will the mode M still apply to those? We don’t know. We’d have to be able to write a HRTB like for<O> Mode<O>, which Rust doesn’t support yet:

fn go<M: for<O> Mode<O>>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E>
where
    Self: Sized;

But even if Rust did support it, you can see that the Mode<T> trait doesn’t capture the user’s intent as closely as the Mode trait from Chumsky did. The Mode trait was defined independently from all parsers, which is what we wanted. The Mode<T> trait is defined relative to some specific parser, and then it falls to the go function to say “oh, I want this to be a mode for all parsers” using a HRTB.

Using just HRTB (which, again, Rust doesn’t have), you could define another trait…

trait Mode: for<O> ModeFor<O> {}

trait ModeFor<O> {}

…which would allow us to write M: Mode on go again, but it’s hard to argue this is simpler than the original GAT variety. This extra ModeFor trait has a “code smell” to it; it’s hard to understand why it is there. Whereas before, you implemented the Mode trait in just the way you think about it, with a single impl that applies to all parsers…

impl Mode for Check {
    type Output<T> = ();
    ...
}

…you now write an impl of ModeFor, where one “instance” of the impl applies to only one parser (which has output type O). It feels indirect:

impl<O> ModeFor<O> for Check {
    type Output = ();
    ...
}

How could you model this with RPITIT?

It’s also been proposed that we should keep GATs, but only as an implementation detail for things like return position impl Trait in traits (RPITIT) or async functions. This implies that we could model the “many modes” pattern with RPITIT. If you look at the Mode trait, though, you’ll see that this simply doesn’t work. Consider the combine method, which takes the results from two parsers and combines them to form a new result:

fn combine<T, U, V, F: FnOnce(T, U) -> V>(
    x: Self::Output<T>,
    y: Self::Output<U>,
    f: F,
) -> Self::Output<V>;

How could we write this in terms of a function that returns impl Trait?

Other patterns

In this post, I went through the chumsky pattern in detail. I’ve not had time to dive quite as deep into other examples, but I’ve been reading through them and trying to extract out patterns. Here are a few patterns I extracted so far:

The “generic scopes” pattern (smithay, playground):
- In the Smithay API, if you have some variable r: R where R: Renderer, you can invoke r.render(|my_frame| ...). This will invoke your callback with some frame my_frame that you can then modify. The thing is that the type of my_frame depends on the type of renderer that you have; moreover, frames often include thread-local data and so should only be accessible during that callback.
- I called this the “generic scopes” pattern because, at least from a types POV, it is kind of a generic version of APIs like std::thread::scope. The scope function also uses a callback to give limited access to a variable (the “thread scope”), but in the case of std::thread::scope, the type of that scope is hard-coded to be std::thread::Scope, whereas here, we want the specific type to depend on the renderer.
- Thanks to GATs, you can express that pretty cleanly, so that the only bound you need is R: Renderer. As with “many modes”, if you tried to express it using features today, you can get part of the way there, but the bounds will be complex and involve HRTB.
The “pointer types” pattern:
- I didn’t dig deep enough into Pythonesque’s hypotheticals, but this comment seemed to be describing a desire to talk about “pointer types” in the abstract, which is definitely a common need; looking at the commits from Veloren that pythonesque also cited, this might be a kind of “pointer types” pattern, but I think I might also call it “many modes”.
The “iterable” pattern:
- In this pattern, you would like a way to say where C: Iterable, meaning that C is a collection with an iter method which fits the signature fn iter(&self) -> impl Iterator<Item = &T>. This is distinct from IntoIterator because it takes &self and thus we can iterate over the same collection many times and concurrently.
- The most common workaround is to return a Box<dyn> (as in graphene) or a collection (as in metamolecular). Neither is zero-cost, which can be a problem in tight loops, as commented here. You can also use HRTB (as rustc does), which is complex and leaky.
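
For the “iterable” pattern in particular, the GAT version is short enough to show in full. This is a minimal sketch, not taken from any of the crates mentioned above: the trait follows the commonly used GAT formulation with an `Item<'a>` associated type, and `count_twice` is a hypothetical helper that exists only to show a `C: Iterable` bound in use.

```rust
/// A collection that can be iterated by reference, any number of times.
trait Iterable {
    /// The item type, which may borrow from the collection.
    type Item<'a>
    where
        Self: 'a;

    /// The iterator type, which also borrows from the collection.
    type Iter<'a>: Iterator<Item = Self::Item<'a>>
    where
        Self: 'a;

    fn iter(&self) -> Self::Iter<'_>;
}

impl<T> Iterable for Vec<T> {
    type Item<'a> = &'a T where Self: 'a;
    type Iter<'a> = std::slice::Iter<'a, T> where Self: 'a;

    fn iter(&self) -> Self::Iter<'_> {
        // Call the slice method explicitly to avoid recursing into this trait.
        <[T]>::iter(self)
    }
}

/// Iterates the same collection twice; the only bound is `C: Iterable`.
fn count_twice<C: Iterable>(collection: &C) -> usize {
    collection.iter().count() + collection.iter().count()
}
```

No boxing and no intermediate collection are involved: `Vec<T>` hands back the same `std::slice::Iter` it would return from its inherent `iter` method.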

Did I miss something?

Maybe you see a way to express the “many modes” pattern (or one of the other patterns I cited) in Rust today that works well? Let me know by commenting on the thread.

(Since posting this, it occurs to me that one could probably use procedural macros to achieve some similar goals, though I think this approach would also have significant downsides.)

https://smallcultfollowing.com/babysteps/blog/2022/06/27/many-modes-a-gats-pattern/

What it feels like when Rust saves your bacon

Jun 15, 2022 Updated Jun 15, 2022

You’ve probably heard that the Rust type checker can be a great “co-pilot”, helping you to avoid subtle bugs that would have been a royal pain in the !@#!$! to debug. This is truly awesome! But what you may not realize is how it feels in the moment when this happens. The answer typically is: really, really frustrating! Usually, you are trying to get some code to compile and you find you just can’t do it.

As you come to learn Rust better, and especially to gain a bit of a deeper understanding of what is happening when your code runs, you can start to see when you are getting a type-check error because you have a typo versus because you are trying to do something fundamentally flawed.

A couple of days back, I had a moment where the compiler caught a really subtle bug that would’ve been horrible had it been allowed to compile. I thought it would be fun to narrate a bit how it played out, and also take the moment to explain a bit more about temporaries in Rust (a common source of confusion, in my observations).

Code available in this repository

All the code for this blog post is available in a github repository.

Setting the scene: lowering the AST

In the compiler, we first represent Rust programs using an Abstract Syntax Tree (AST). I’ve prepared a standalone example that shows roughly how the code looks today (of course the real thing is a lot more complex). The AST in particular is found in the ast module containing various data structures that map closely to Rust syntax. So for example we have a Ty type that represents Rust types:

pub enum Ty {
    ImplTrait(TraitRef),
    NamedType(String, Vec<Ty>),
    // ...
}

pub struct Lifetime {
    // ...
}

The impl Trait notation references a TraitRef, which stores the Trait part of things:

pub struct TraitRef {
    pub trait_name: String,
    pub parameters: Parameters,
}

pub enum Parameters {
    AngleBracket(Vec<Parameter>),
    Parenthesized(Vec<Ty>),
}

pub enum Parameter {
    Ty(Ty),
    Lifetime(Lifetime),
}

Note that the parameters of the trait come in two varieties, angle-bracket (e.g., impl PartialEq<T> or impl MyTrait<'a, U>) and parenthesized (e.g., impl FnOnce(String, u32)). These two are slightly different – parenthesized parameters, for example, only accept types, whereas angle-bracket accept types or lifetimes.

After parsing, this AST gets translated to something called High-level Intermediate Representation (HIR) through a process called lowering. The snippet doesn’t include the HIR, but it includes a number of methods like lower_ty that take as input an AST type and produce the HIR type:

impl Context {
    fn lower_ty(&mut self, ty: &ast::Ty) -> hir::Ty {
        match ty {
            // ... lots of stuff here
            // A type like `impl Trait`
            ast::Ty::ImplTrait(trait_ref) => {
                do_something_with(trait_ref);
            }

            // A type like `Vec<T>`, where `Vec` is the name and
            // `[T]` are the `parameters`
            ast::Ty::NamedType(name, parameters) => {
                for parameter in parameters {
                    self.lower_ty(parameter);
                }
            }
        }
        // ...
    }
}

Each method is defined on this Context type that carries some common state, and the methods tend to call one another. For example, lower_signature invokes lower_ty on all of the input (argument) types and on the output (return) type:

impl Context {
    fn lower_signature(&mut self, sig: &ast::Signature) -> hir::Signature {
        for input in &sig.inputs {
            self.lower_ty(input);
        }

        self.lower_ty(&sig.output);

        ...
    }
}

Our story begins

Santiago Pastorino is working on a refactoring to make it easier to support returning impl Trait values from trait functions. As part of that, he needs to collect all the impl Trait types that appear in the function arguments. The challenge is that these types can appear anywhere, and not just at the top level. In other words, you might have fn foo(x: impl Debug), but you might also have fn foo(x: Box<(impl Debug, impl Debug)>). Therefore, we decided it would make sense to add a vector to Context and have lower_ty collect the impl Trait types into it. That way, we can find the complete set.

To do this, we started by adding the vector into this Context. We’ll store the TraitRef from each impl Trait type:

struct Context<'ast> {
    saved_impl_trait_types: Vec<&'ast ast::TraitRef>,
    // ...
}

To do this, we had to add a new lifetime parameter, 'ast, which is meant to represent the lifetime of the AST structure itself. In other words, saved_impl_trait_types stores references into the AST. Of course, once we did this, the compiler got upset and we had to go modify the impl block that references Context:

impl<'ast> Context<'ast> {
    ...
}

Now we can modify the lower_ty to push the trait ref into the vector:

impl<'ast> Context<'ast> {
    fn lower_ty(&mut self, ty: &ast::Ty) {
        match ty {
            ...
            
            ast::Ty::ImplTrait(trait_ref) => {
                // 👇 push the trait ref into the vector 👇
                self.saved_impl_trait_types.push(trait_ref);
                do_something();
            }

            ast::Ty::NamedType(name, parameters) => {
                ... // just like before
            }
            
            ...
        }
    }
}

At this point, the compiler gives us an error:

error[E0621]: explicit lifetime required in the type of `ty`
   --> examples/b.rs:125:42
    |
119 |     fn lower_ty(&mut self, ty: &ast::Ty) -> hir::Ty {
    |                                -------- help: add explicit lifetime `'ast` to the type of `ty`: `&'ast ast::Ty`
...
125 |                 self.impl_trait_tys.push(trait_ref);
    |                                          ^^^^^^^^^ lifetime `'ast` required

Pretty nice error, actually! It’s pointing out that we are pushing into this vector which needs references into “the AST”, but we haven’t declared in our signature that the ast::Ty must actually come from “the AST”. OK, let’s fix this:

impl<'ast> Context<'ast> {
    fn lower_ty(&mut self, ty: &'ast ast::Ty) {
        // had to add 'ast here 👆, just like the error message said
        ...
    }
}
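
To make the pattern concrete, here is a minimal, self-contained sketch of the same idea. The Ty and TraitRef types below are hypothetical stand-ins (the real ast module is much larger), and collect_impl_trait_names is a helper invented purely for illustration. The point is only that, once lower_ty takes &'ast Ty, references found at any nesting depth can be stored in the Context:

```rust
// Hypothetical miniature AST; not the real `ast` module from the post.
struct TraitRef {
    name: String,
}

enum Ty {
    ImplTrait(TraitRef),
    Named(String, Vec<Ty>),
}

struct Context<'ast> {
    // References into "the AST", so they carry the `'ast` lifetime.
    saved_impl_trait_types: Vec<&'ast TraitRef>,
}

impl<'ast> Context<'ast> {
    // `ty` must be `&'ast Ty` so that the references we store are known
    // to live as long as the AST itself.
    fn lower_ty(&mut self, ty: &'ast Ty) {
        match ty {
            Ty::ImplTrait(trait_ref) => self.saved_impl_trait_types.push(trait_ref),
            Ty::Named(_, parameters) => {
                // `impl Trait` can appear at any depth, so recurse.
                for parameter in parameters {
                    self.lower_ty(parameter);
                }
            }
        }
    }
}

// Illustrative helper: returns the trait names of every `impl Trait` in `ty`.
fn collect_impl_trait_names(ty: &Ty) -> Vec<&str> {
    let mut cx = Context { saved_impl_trait_types: Vec::new() };
    cx.lower_ty(ty);
    cx.saved_impl_trait_types
        .into_iter()
        .map(|trait_ref| trait_ref.name.as_str())
        .collect()
}
```

Feeding this a nested type modeled on Box<(impl Debug, impl Display)> yields both trait names, in the order they were visited.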

Propagating lifetimes everywhere

Of course, now we start getting errors in the functions that call lower_ty. For example, lower_signature says:

error[E0621]: explicit lifetime required in the type of `sig`
  --> examples/b.rs:71:18
   |
65 |     fn lower_signature(&mut self, sig: &ast::Signature) -> hir::Signature {
   |                                        --------------- help: add explicit lifetime `'ast` to the type of `sig`: `&'ast ast::Signature`
...
71 |             self.lower_ty(input);
   |                  ^^^^^^^^ lifetime `'ast` required

The fix is the same. We tell the compiler that the ast::Signature is part of “the AST”, and that implies that the ast::Ty values owned by the ast::Signature are also part of “the AST”:

impl<'ast> Context<'ast> {
    fn lower_signature(&mut self, sig: &'ast ast::Signature) -> hir::Signature {
        //        had to add 'ast here 👆, just like the error message said
        ...
    }
}

Great. This continues for a bit. But then… we hit this error:

error[E0597]: `parameters` does not live long enough
  --> examples/b.rs:92:53
   |
58 | impl<'ast> Context<'ast> {
   |      ---- lifetime `'ast` defined here
...
92 |                 self.lower_angle_bracket_parameters(&parameters);
   |                 ------------------------------------^^^^^^^^^^^-
   |                 |                                   |
   |                 |                                   borrowed value does not live long enough
   |                 argument requires that `parameters` is borrowed for `'ast`
93 |             }
   |             - `parameters` dropped here while still borrowed

What’s this about?

Uh oh…

Jumping to that line, we see this function lower_trait_ref:

impl<'ast> Context<'ast> {
    // ...
    fn lower_trait_ref(&mut self, trait_ref: &'ast ast::TraitRef) -> hir::TraitRef {
        match &trait_ref.parameters {
            ast::Parameters::AngleBracket(parameters) => {
                self.lower_angle_bracket_parameters(&parameters);
            }
            ast::Parameters::Parenthesized(types) => {
                let parameters: Vec<_> = types.iter().cloned().map(ast::Parameter::Ty).collect();
                self.lower_angle_bracket_parameters(&parameters); // 👈 error is on this line
                
            }
        }

        hir::TraitRef
    }
    // ...
}

So what’s this about? Well, the purpose of this code is a bit clever. As we saw before, Rust has two syntaxes for trait-refs, you can use parentheses like FnOnce(u32), in which case you only have types, or you can use angle brackets like Foo<'a, u32>, in which case you could have either lifetimes or types. So this code is normalizing to the angle-bracket notation, which is more general, and then using the same lowering helper function.

Wait! Right there! That was the moment!

What?

That was the moment that Rust saved you a world of pain!

It was? It just kind of seemed like an annoying and, I will say, kind of confusing compilation error. What the heck is going on? The problem here is that parameters is a local variable. It is going to be freed as soon as lower_trait_ref returns. But it could happen that lower_trait_ref calls lower_ty, which takes a reference to the type and stores it into the saved_impl_trait_types vector. Then, later, some code would try to use that reference, and access freed memory. That would sometimes work, but often not – and if you forgot to test with parenthesized trait refs, the code would work fine forever, so you’d never even notice.

How to fix it

Maybe you’re wondering: great, Rust saved me a world of pain, but how do I fix it? Do I just have to copy the lower_angle_bracket_parameters and have two copies? ‘Cause that’s kind of unfortunate.

Well, there are a variety of ways you might fix it. One of them is to use an arena, like the typed-arena crate. An arena is a memory pool. Instead of storing the temporary Vec<Parameter> vector on the stack, we’ll put it in an arena, and that way it will live for the entire time that we are lowering things. Example C in the repo takes this approach. It starts by adding the arena field to the Context:

struct Context<'ast> {
    impl_trait_tys: Vec<&'ast ast::TraitRef>,

    // Holds temporary AST nodes that we create during lowering;
    // this can be dropped once lowering is complete.
    arena: &'ast typed_arena::Arena<Vec<ast::Parameter>>,
}

This actually makes a subtle change to the meaning of 'ast. It used to be that the only things with 'ast lifetime were “the AST” itself, so having that lifetime implied being a part of the AST. But now that same lifetime is being used to tag the arena, too, so if we have &'ast Foo it means the data is owned by either the arena or the AST itself.

Side note: despite the name lifetimes, which I now rather regret, more and more I tend to think of lifetimes like 'ast in terms of “who owns the data”, which you can see in my description in the previous paragraph. You could instead think of 'ast as a span of time (a “lifetime”), in which case it refers to the time that the Context type is valid, really, which must be a subset of the time that the arena is valid and the time that the AST itself is valid, since Context stores references to data owned by both of those.

Now we can rewrite lower_trait_ref to call self.arena.alloc():

impl<'ast> Context<'ast> {
    fn lower_trait_ref(&mut self, trait_ref: &'ast ast::TraitRef) -> hir::TraitRef {
        match &trait_ref.parameters {
            // ...
            ast::Parameters::Parenthesized(types) => {
                let parameters: Vec<_> = types.iter().cloned().map(ast::Parameter::Ty).collect();
                let parameters = self.arena.alloc(parameters); // 👈 added this line!
                self.lower_angle_bracket_parameters(parameters);
            }
        }
        // ...
    }
}

Now the parameters variable is not stored on the stack but allocated in the arena; the arena has 'ast lifetime, so that’s fine, and everything works!

Calling the lowering code and creating the context

Now that we have added the arena, creating the context will look a bit different. It’ll look something like:

let arena = TypedArena::new();
let mut context = Context::new(&arena); // `mut` because lower_signature takes `&mut self`
let hir_signature = context.lower_signature(&signature);

The nice thing about this is that, once we are done with lowering, the context will be dropped and all those temporary nodes will be freed.

Another way to fix it

The other obvious option is to avoid lifetimes altogether and just “clone all the things”. Given that the AST is immutable once constructed, you can just clone them into the vector:

struct Context {
    impl_trait_tys: Vec<ast::TraitRef>, // just clone it!
}

If that clone is too expensive (possible), then use Rc<ast::TraitRef> or Arc<ast::TraitRef> (this will require deep-ish changes to the AST to put all the things into Rc or Arc that might need to be individually referenced). At this point you’ve got a feeling a lot like garbage collection (if less ergonomic).
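
As a rough illustration of the Rc variant, here is a minimal sketch. The Ty and TraitRef types and the lower_temporary helper are hypothetical, not taken from the example repo. Because the Context holds a reference-counted handle instead of a borrowed reference, it no longer needs a lifetime parameter, and a temporary AST node can be dropped without invalidating anything:

```rust
use std::rc::Rc;

// Hypothetical miniature AST in which trait refs are reference-counted.
struct TraitRef {
    name: String,
}

enum Ty {
    ImplTrait(Rc<TraitRef>),
    Named(String, Vec<Ty>),
}

// No lifetime parameter: the context co-owns the trait refs it saves.
struct Context {
    impl_trait_tys: Vec<Rc<TraitRef>>,
}

impl Context {
    fn lower_ty(&mut self, ty: &Ty) {
        match ty {
            // Cloning an `Rc` only bumps a counter; the `TraitRef` is shared.
            Ty::ImplTrait(trait_ref) => self.impl_trait_tys.push(Rc::clone(trait_ref)),
            Ty::Named(_, parameters) => {
                for parameter in parameters {
                    self.lower_ty(parameter);
                }
            }
        }
    }
}

// Lowers a *temporary* node, like the parenthesized-parameters case.
fn lower_temporary(cx: &mut Context) {
    let temp = Ty::ImplTrait(Rc::new(TraitRef { name: "FnOnce".to_string() }));
    cx.lower_ty(&temp);
    // `temp` is dropped here, but the `Rc` held by `cx` keeps the
    // `TraitRef` alive.
}
```

The cost is the one described above: every node that might be individually referenced has to live behind an Rc (or Arc), and the reference counts are maintained at runtime.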

Yet another way

The way I tend to write compilers these days is to use “indices as pointers”. In this approach, all the data in the AST is stored in vectors, and references between things use indices, kind of like I described here.
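
A minimal sketch of what that can look like, under the assumption of a single vector of types. The names Ast, TyId, and add are invented for this illustration. Since a TyId is just a Copy integer, the Context can store as many of them as it likes without any lifetime parameter:

```rust
// Hypothetical index type: a "pointer" into `Ast::tys`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TyId(usize);

enum Ty {
    ImplTrait { trait_name: String },
    // Children are referenced by index, not by `Box` or `&`.
    Named { name: String, parameters: Vec<TyId> },
}

#[derive(Default)]
struct Ast {
    tys: Vec<Ty>,
}

impl Ast {
    // Stores a node and returns the index that now identifies it.
    fn add(&mut self, ty: Ty) -> TyId {
        let id = TyId(self.tys.len());
        self.tys.push(ty);
        id
    }
}

struct Context {
    impl_trait_tys: Vec<TyId>,
}

impl Context {
    fn lower_ty(&mut self, ast: &Ast, id: TyId) {
        match &ast.tys[id.0] {
            Ty::ImplTrait { .. } => self.impl_trait_tys.push(id),
            Ty::Named { parameters, .. } => {
                for &parameter in parameters {
                    self.lower_ty(ast, parameter);
                }
            }
        }
    }
}
```

In this style, temporary nodes are simply pushed onto the same vector as everything else. The trade-off is that an index is only meaningful relative to the Ast it came from, which the compiler does not check.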

Conclusion

Compilation errors are pretty frustrating, but they may also be a sign that the compiler is protecting us from ourselves. In this case, when we embarked on this refactoring, I was totally sure it was going to work fine, because I didn’t realize we ever created “temporary AST” nodes, so I assumed that all the data was owned by the original AST. In a language like C or C++, it would have been very easy to have a bug here, and it would have been a horrible pain to find. With Rust, that’s not a problem.

Of course, not everything is great. For me, doing these kinds of lifetime transformations is old hat. But for many people it’s pretty non-obvious how to start when the compiler is giving you error messages. When people come to me for help, the first thing I try to do is to suss out: what are the ownership relationships, and where do we expect these references to be coming from? There are also various heuristics that I use to decide: do we need a new lifetime parameter? Can we re-use an existing one? I’ll try to write up more stories like this to clarify that side of things. Honestly, my main point here was that I was just so grateful that Rust prevented us from spending hours and hours debugging a subtle crash!

Looking forward a bit, I see a lot of potential to improve things about our notation and terminology. I think we should be able to make cases like this one much slicker, hopefully without requiring named lifetime parameters and so forth, or as many edits. But I admit I don’t yet know how to do it! :) My plan for now is to keep an eye out for the tricks I am using and the kinds of analysis I am doing in my head and write out blog posts like this one to capture those narratives. I encourage those of you who know Rust well (or who don’t!) to do the same.

Appendix: why not have Context own the TypedArena?

You may have noticed that using the arena had a kind of annoying consequence: people who called Context::new now had to create and supply an arena:

let arena = TypedArena::new();
let mut context = Context::new(&arena); // `mut` because lower_signature takes `&mut self`
let hir_signature = context.lower_signature(&signature);

This is because Context<'ast> stores a &'ast TypedArena<_>, and so the caller must create the arena. If we modified Context to own the arena, then the API could be better. So why didn’t I do that? To see why, check out example D (which doesn’t build). In that example, the Context looks like…

struct Context<'ast> {
    impl_trait_tys: Vec<&'ast ast::TraitRef>,

    // Holds temporary AST nodes that we create during lowering;
    // this can be dropped once lowering is complete.
    arena: typed_arena::Arena<Vec<ast::Parameter>>,
}

You then have to change the signatures of each function to take an &'ast mut self:

impl<'ast> Context<'ast> {
    fn lower_signature(&'ast mut self, sig: &'ast ast::Signature) -> hir::Signature {...}
}

This is saying: the 'ast parameter might refer to data owned by self, or maybe by sig. Seems sensible, but if you try to build Example D, you get lots of errors. Here is one of the most interesting to me:

error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> examples/d.rs:98:17
   |
62 | impl<'ast> Context<'ast> {
   |      ---- lifetime `'ast` defined here
...
97 |                 let parameters = self.arena.alloc(parameters);
   |                                  ----------------------------
   |                                  |
   |                                  immutable borrow occurs here
   |                                  argument requires that `self.arena` is borrowed for `'ast`
98 |                 self.lower_angle_bracket_parameters(parameters);
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

What is this all about? This is actually pretty subtle! This is saying that parameters was allocated from self.arena. That means that parameters will be valid as long as self.arena is valid.

But self is an &mut Context, which means it can mutate any of the fields of the Context. When we call self.lower_angle_bracket_parameters(), it’s entirely possible that lower_angle_bracket_parameters could mutate the arena:

fn lower_angle_bracket_parameters(&'ast mut self, parameters: &'ast [ast::Parameter]) {
    self.arena = TypedArena::new(); // what if we did this?
    // ...
}

Of course, the code doesn’t do that now, but what if it did? The answer is that the parameters would be freed, because the arena that owns them is freed, and so we’d have a dangling reference. D’oh!

All things considered, I’d like to make it possible for Context to own the arena, but right now it’s pretty challenging. This is a good example of code patterns we could enable, but it’ll require language extensions.

https://smallcultfollowing.com/babysteps/blog/2022/06/15/what-it-feels-like-when-rust-saves-your-bacon/

Async cancellation: a case study of pub-sub in mini-redis

Jun 13, 2022 Updated Jun 13, 2022


Lately I’ve been diving deep into tokio’s mini-redis example. The mini-redis example is a great one to look at because it’s a realistic piece of quality async Rust code that is both self-contained and very well documented. Digging into mini-redis, I found that it exemplifies the best and worst of async Rust. On the one hand, the code itself is clean, efficient, and high-level. On the other hand, it relies on a number of subtle async conventions that can easily be done wrong – worse, if you do them wrong, you won’t get a compilation error, and your code will “mostly work”, breaking only in unpredictable timing conditions that are unlikely to occur in unit tests. Just the kind of thing Rust tries to avoid! This isn’t the fault of mini-redis – to my knowledge, there aren’t great alternative patterns available in async Rust today (I go through some of the alternatives in this post, and their downsides).

Context: evaluating moro

We’ve heard from many users that async Rust has a number of pitfalls where things can break in subtle ways. In the Async Vision Doc, for example, the Barbara battles buffered streams and solving a deadlock stories discuss challenges with FuturesUnordered (wrapped in the buffered combinator); the Barbara gets burned by select and Alan tries to cache requests, which doesn’t always happen stories talk about cancellation hazards and the select! or race combinators.

In response to these stories, I created an experimental project called moro that explores structured concurrency in Rust. I’ve not yet blogged about moro, and that’s intentional. I’ve been holding off until I gain more confidence in moro’s APIs. In the meantime, various people (including myself) have been porting different bits of code to moro to get a better sense for what works and what doesn’t. GusWynn, for example, started changing bits of the materialize.io codebase to use moro and to have a safer alternative to cancellation. I’ve been poking at mini-redis, and I’ve also been working with some folks within AWS with some internal codebases.

What I’ve found so far is that moro absolutely helps, but it’s not enough. Therefore, instead of the triumphant blog post I had hoped for, I’m writing this one, which does a kind of deep-dive into the patterns that mini-redis uses: both how they work well when done right, but also how they are tedious and error-prone. I’ll be posting some follow-up blog posts that explore some of the ways that moro can help.

What is mini-redis?

If you’ve not seen it, mini-redis is a really cool bit of example code from the tokio project. It implements a “miniature” version of the redis in-memory data store, focusing on the key-value and pub-sub aspects of redis. Specifically, clients can connect to mini-redis and issue a subset of the redis commands. In this post, I’m going to focus on the “pub-sub” aspect of redis, in which clients can publish messages to a topic which are then broadcast to everyone who has subscribed to that topic. Whenever a client publishes a message, it receives in response the number of other clients that are currently subscribed to that topic.

Here is an example workflow involving two clients. Client 1 is subscribing to things, and Client 2 is publishing messages.

sequenceDiagram
    Client1 ->> Server: subscribe `A`
    Client2 ->> Server: publish `foo` to `A`
    Server -->> Client2: 1 client is subscribed to `A`
    Server -->> Client1: `foo` was published to `A`
    Client1 ->> Server: subscribe `B`
    Client2 ->> Server: publish `bar` to `B`
    Server -->> Client2: 1 client is subscribed to `B`
    Server -->> Client1: `bar` was published to `B`
    Client1 ->> Server: unsubscribe A
    Client2 ->> Server: publish `baz` to `A`
    Server -->> Client2: 0 clients are subscribed to `A`

Core data structures

To implement this, the redis server maintains a struct State that is shared across all active clients. Since it is shared across all clients, it is maintained in a Mutex (source):

struct Shared {
    /// The shared state is guarded by a mutex. […]
    state: Mutex<State>,
    …
}

Within this State struct, there is a pub_sub field (source):

pub_sub: HashMap<String, broadcast::Sender<Bytes>>,

The pub_sub field stores a big hashmap. The key is the topic and the value is the broadcast::Sender, which is the “sender half” of a tokio broadcast channel. Whenever a client issues a publish command, it ultimately calls Db::publish, which winds up invoking send on this broadcast channel:

pub(crate) fn publish(&self, key: &str, value: Bytes) -> usize {
    let state = self.shared.state.lock().unwrap();
    state
        .pub_sub
        .get(key)
        // On a successful message send on the broadcast channel, the number
        // of subscribers is returned. An error indicates there are no
        // receivers, in which case, `0` should be returned.
        .map(|tx| tx.send(value).unwrap_or(0))
        // If there is no entry for the channel key, then there are no
        // subscribers. In this case, return `0`.
        .unwrap_or(0)
}
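
To get a feel for the shape of this data structure without pulling in tokio, here is a rough standard-library analogue. It is only a sketch: it swaps the tokio broadcast channel for one std::sync::mpsc sender per subscriber, and the Db, State, and subscribe names here are simplified stand-ins, not the mini-redis definitions. What it keeps is the idea of a mutex-guarded map from topic to senders, with publish returning the number of subscribers reached:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Default)]
struct State {
    // Topic name -> one sender per subscriber (stand-in for a broadcast channel).
    pub_sub: HashMap<String, Vec<Sender<String>>>,
}

#[derive(Default)]
struct Db {
    state: Mutex<State>,
}

impl Db {
    // Registers a new subscriber and hands back its receiving half.
    fn subscribe(&self, key: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        let mut state = self.state.lock().unwrap();
        state.pub_sub.entry(key.to_string()).or_default().push(tx);
        rx
    }

    // Sends `value` to every live subscriber of `key` and returns how many
    // received it; returns 0 when nobody is subscribed.
    fn publish(&self, key: &str, value: &str) -> usize {
        let mut state = self.state.lock().unwrap();
        match state.pub_sub.get_mut(key) {
            Some(senders) => {
                // `send` fails once the receiver is dropped, so failed
                // senders are pruned here.
                senders.retain(|tx| tx.send(value.to_string()).is_ok());
                senders.len()
            }
            None => 0,
        }
    }
}
```

A real broadcast channel avoids the per-subscriber sender list, but the locking pattern and the "count of receivers" return value are the same idea.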

The subscriber loop

We just saw how, when clients publish data to a channel, that winds up invoking send on a broadcast channel. But how do the clients who are subscribed to that channel receive those messages? The answer lies in the Subscribe command.

The idea is that the server has a set subscriptions of subscribed channels for the client (source):

let mut subscriptions = StreamMap::new();

This is implemented using a tokio StreamMap, which is a neato data structure that takes multiple streams which each yield up values of type V, gives each of them a key K, and combines them into one stream that yields up (K, V) pairs. In this case, the streams are the “receiver half” of those broadcast channels, and the keys are the channel names.

When it receives a subscribe command, then, the server wants to do the following:

Add the receivers for each subscribed channel into subscriptions.
Loop:
- If a message is published to subscriptions, then send it to the client.
- If the client subscribes to new channels, add those to subscriptions and send an acknowledgement to client.
- If the client unsubscribes from some channels, remove them from subscriptions and send an acknowledgement to client.
- If the client terminates, end the loop and close the connection.

“Show me the state”

Learning to write Rust code is basically an exercise in asking “show me the state”: the key to making Rust code work is knowing what data is going to be modified and when. In this case, there are a few key pieces of state…

The set subscriptions of “broadcast receivers” from each subscribed stream
- There is also a set self.channels of “pending channel names” that ought to be subscribed to, though this is kind of an implementation detail and not essential.
The connection connection used to communicate with the client (a TCP socket)

And there are three concurrent tasks going on, each of which access that same state…

Looking for published messages from subscriptions and forwarding to connection (reads subscriptions, writes to connection)
Reading client commands from connection and then either…
- subscribing to new channels (writes to subscriptions) and sending a confirmation (writes to connection);
- or unsubscribing from channels (writes to subscriptions) and sending a confirmation (writes to connection).
Watching for termination and then cancelling everything (drops the broadcast handles in subscriptions).

You can start to see that this is going to be a challenge. There are three conceptual tasks, but they each need mutable access to the same data:

flowchart LR
    forward["Forward published messages to client"]
    client["Process subscribe/unsubscribe messages from client"]
    terminate["Watch for termination"]
    
    subscriptions[("subscriptions:\nHandles from\nsubscribed channels")]
    connection[("connection:\nTCP stream\nto/from\nclient")]
    
    forward -- reads --> subscriptions
    forward -- writes --> connection
    
    client -- reads --> connection
    client -- writes --> subscriptions
    
    terminate -- drops --> subscriptions
    
    style forward fill:oldlace
    style client fill:oldlace
    style terminate fill:oldlace
    
    style subscriptions fill:pink
    style connection fill:pink

If you tried to do this with normal threads, it just plain wouldn’t work…

let mut subscriptions = vec![]; // close enough to a StreamMap for now
std::thread::scope(|s| {
   s.spawn(|| subscriptions.push("key1"));
   s.spawn(|| subscriptions.push("key2"));
});

If you try this on the playground, you’ll see it gets an error because both closures are trying to access the same mutable state. No good. So how does it work in mini-redis?

Enter select!, our dark knight

Mini-redis is able to juggle these three tasks through careful use of the select! macro. This is pretty cool, but also pretty error-prone: as we’ll see, there are a number of subtle points in the way that select! is being used here, and it’s easy to write the code wrong and have surprising bugs. At the same time, it’s pretty neat that we can use select! in this way, and it raises the question of whether we can find safer patterns to achieve the same thing. I think right now you can find safer ones, but they are less efficient, which isn’t really living up to Rust’s promise (though it might be a good idea). I’ll cover that in a follow-up post; for now I just want to focus on explaining what mini-redis is doing and the pros and cons of this approach.

The main loop looks like this (source):

let mut subscriptions = StreamMap::new();
loop {
    …
    select! {
        Some((channel_name, msg)) = subscriptions.next() => ...
        //                          -------------------- future 1
        res = dst.read_frame() => ...
        //    ---------------- future 2
        _ = shutdown.recv() => ...
        //  --------------- future 3
    }
}

select! is kind of like a match statement. It takes multiple futures (underlined in the code above) and continues executing them until one of them completes. Since the select! is in a loop, and in this case each of the futures is producing a series of events, this setup effectively runs the three futures concurrently, processing events as they arrive:

subscriptions.next() – the future waiting for the next message to arrive on the StreamMap
dst.read_frame() – the async method read_frame is defined on the connection, dst. It reads data from the client, parses it into a complete command, and returns that command. We’ll dive into this function in a bit – it turns out that it is written in a very careful way to account for cancellation.
shutdown.recv() – the mini-redis server signals a global shutdown by threading a tokio channel to every connection; when a message is sent to that channel, all the loops clean up and stop.

How select! works

So, select! runs multiple futures concurrently until one of them completes. In practice, this means that it iterates down the futures, one after the other. Each future gets awoken and runs until it either yields (meaning, awaits on something that isn’t ready yet) or completes. If the future yields, then select! goes to the next future and tries that one.

Once a future completes, though, the select! gets ready to complete. It begins by dropping all the other futures that were selected. This means that they immediately stop executing at whatever await point they reached, running any destructors for things on the stack. As I described in a previous blog post, in practice this feels a lot like a panic! that is injected at the await point. And, just like any other case of recovering from an exception, it requires that code is written carefully to avoid introducing bugs – tomaka describes one such example in his blog post. These bugs are what gives async cancellation in Rust a reputation for being difficult.

Cancellation and mini-redis

Let’s talk through what cancellation means for mini-redis. As we saw, the select! here is effectively running two distinct tasks (as well as waiting for shutdown):

Waiting on subscriptions.next() for a message to arrive from subscribed channels, so it can be forwarded to the client.
Waiting on dst.read_frame() for the next command from the client, so that we can modify the set of subscribed channels.

We’ll see that mini-redis is coded carefully so that, whichever of these events occurs first, everything keeps working correctly. We’ll also see that this setup is fragile – it would be easy to introduce subtle bugs, and the compiler would not help you find them.

Take a look back at the sample subscription workflow at the start of this post. After Client1 has subscribed to A, the server is effectively waiting for Client1 to send further messages, or for other clients to publish.

The code that checks for further messages from Client1 is an async function called read_frame. It has to read the raw bytes sent by the client and assemble them into a “frame” (a single command). The read_frame in mini-redis is written in a particular way:

It loops and, for each iteration…
- tries to parse a complete frame from self.buffer,
- if self.buffer doesn’t contain a complete frame, then it reads more data from the stream into the buffer.

In pseudocode, it looks like (source):

impl Connection {
    async fn read_frame(&mut self) -> Result<Option<Frame>> {
        loop {
            if let Some(f) = parse_frame(&self.buffer) {
                return Ok(Some(f));
            }
            
            read_more_data_into_buffer(&mut self.buffer).await;
        }
    }
}

The key idea is that the function buffers up data until it can read an entire frame (i.e., successfully complete) and then it removes that entire frame at once. It never removes part of a frame from the buffer. This ensures that if the read_frame function is canceled while awaiting more data, nothing gets lost.
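
Here is a small standard-library simulation of that property. Everything in it is a stand-in invented for this sketch: the incoming queue plays the role of the TCP stream, YieldOnce models "no data yet", frames are newline-terminated strings, and poll_once polls a future by hand with a no-op waker. It polls read_frame until it has consumed a partial frame, drops the future (which is what select! does to the branch that loses), and then calls read_frame again:

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Returns `Pending` exactly once, modeling "the socket has no data yet".
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

struct Connection {
    incoming: VecDeque<&'static [u8]>, // stand-in for the TCP stream
    buffer: Vec<u8>,                   // lives in `self`, not on the stack
}

impl Connection {
    async fn read_frame(&mut self) -> Option<String> {
        loop {
            // Only remove bytes once a *complete* frame is available.
            if let Some(end) = self.buffer.iter().position(|&b| b == b'\n') {
                let frame: Vec<u8> = self.buffer.drain(..=end).collect();
                return Some(String::from_utf8_lossy(&frame[..end]).into_owned());
            }
            YieldOnce(false).await; // the only point where we can be canceled
            let chunk = self.incoming.pop_front()?;
            self.buffer.extend_from_slice(chunk);
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future a single time using a waker that does nothing.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

fn cancel_then_retry() -> Option<String> {
    let mut conn = Connection {
        incoming: VecDeque::from(vec![&b"SUB"[..], &b"SCRIBE A\n"[..]]),
        buffer: Vec::new(),
    };
    {
        let mut fut = Box::pin(conn.read_frame());
        assert!(poll_once(fut.as_mut()).is_pending()); // waiting for data
        assert!(poll_once(fut.as_mut()).is_pending()); // read "SUB", waiting again
    } // future dropped here: canceled with half a frame already read
    let mut fut = Box::pin(conn.read_frame());
    loop {
        if let Poll::Ready(frame) = poll_once(fut.as_mut()) {
            return frame;
        }
    }
}
```

Because the partial bytes sit in conn.buffer and not inside the dropped future, the second call picks up where the first left off and returns the whole frame, "SUBSCRIBE A". If buffer were instead a local variable inside read_frame, the same sequence would return only "SCRIBE A".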

Ways to write a broken read_frame

There are many ways to write a version of read_frame that is NOT cancel-safe. For example, instead of storing the buffer in self, one could put the buffer on the stack:

impl Connection {
    async fn read_frame(&mut self) -> Result<Option<Frame>> {
        let mut buffer = vec![];
        
        loop {
            if let Some(f) = parse_frame(&buffer) {
                return Ok(Some(f));
            }
            
            read_more_data_into_buffer(&mut buffer).await;
            //                                      -----
            //                If future is canceled here,
            //                buffer is lost.
        }
    }
}

This setup is broken because, if the future is canceled when awaiting more data, the buffered data is lost.

Alternatively, read_frame could intersperse reading from the stream and parsing the frame itself:

impl Connection {
    async fn read_frame(&mut self) -> Result<Option<Frame>> {
        // No buffer at all here: each piece is read and parsed as it arrives.
        let command_name = self.read_command_name().await;
        match command_name {
            "subscribe" => self.parse_subscribe_command().await,
            "unsubscribe" => self.parse_unsubscribe_command().await,
            "publish" => self.parse_publish_command().await,
            ...
        }
    }
}

The problem here is similar: if we are canceled while awaiting one of the parse_foo_command futures, then we will forget the fact that we read the command_name already.

Comparison with JavaScript

It is interesting to compare Rust’s Future model with JavaScript’s Promise model. In JavaScript, when an async function is called, it implicitly creates a new task. This task has “independent life”, and it keeps executing even if nobody ever awaits it. In Rust, invoking an async fn returns a Future, but that is inert. A Future only executes when some task awaits it. (You can create a task by invoking a suitable spawn method on your runtime, and then it will execute on its own.)

There are really good reasons for Rust’s model: in particular, it is a zero-cost abstraction (or very close to it). In JavaScript, if you have one async function, and you factor out a helper function, you just went from one task to two tasks, meaning twice as much load on the scheduler. In Rust, if you have an async fn and you factor out a helper, you still have one task; you also still allocate basically the same amount of stack space. This is a good example of the “performant” (“idiomatic code runs efficiently”) Rust design principle in action.

However, at least as we’ve currently set things up, the Rust model does have some sharp edges. We’ve seen three ways to write read_frame, and only one of them works. Interestingly, all three of them would work in JavaScript, because in the JS model, an async function always starts a task and hence maintains its context.

I would argue that this represents a serious problem for Rust, because it represents a failure to maintain the “reliability” principle (“if it compiles, it works”), which ought to come first and foremost for us. The result is that async Rust feels a bit more like C or C++, where performant and versatile take top rank, and one has to have a lot of experience to know how to avoid sharp edges.

Now, I am not arguing Rust should adopt the “Promises” model – I think the Future model is better. But I think we need to tweak something to recover that reliability.

Comparison with threads

It’s interesting to compare mini-redis written with async Rust against a mini-redis implemented with threads. It turns out that the threaded version would also be challenging, but in different ways. To start, let’s write up some pseudocode for what we are trying to do:

let mut subscriptions = StreamMap::new();

spawn(async move {
    while let Some((channel_name, msg)) = subscriptions.next().await {
        connection.send_message(channel_name, msg);
    }
});

spawn(async move {
    while let Some(frame) = connection.read_frame().await {
        match frame {
            Subscribe(new_channel) => subscribe(&mut connection, new_channel),
            Unsubscribe(channel) => unsubscribe(&mut connection, channel),
            _ => ...,
        }
    }
});

Here we have spawned two threads, one of which is waiting for new messages from the subscriptions, and one of which is processing incoming client messages (which may involve adding channels to the subscriptions map).

There are two problems here. First, you may have noticed I didn’t handle server shutdown! That turns out to be kind of a pain in this setup, because tearing down those spawned tasks is harder than you might think. For simplicity, I’m going to skip that for the rest of the post – it turns out that moro’s APIs solve this problem in a really nice way by allowing shutdown to be imposed externally without any deep changes.

Second, those two threads are both accessing subscriptions and connection in a mutable way, which the Rust compiler will not accept. This is a key problem. Rust’s type system works really well when you can break down your data such that every task accesses distinct data (i.e., “spatially disjoint”), either because each task owns the data or because they have &mut references to different parts of it. We have a much harder time dealing with multiple tasks accessing the same data but at different points in time (i.e., “temporally disjoint”).

Use an arc-mutex?

The main way to manage multiple tasks sharing access to the same data is with some kind of interior mutability, typically an Arc<Mutex<T>>. One problem with this is that it fails Rust’s performant design principle (“idiomatic code runs efficiently”), because there is runtime overhead (even if it is minimal in practice, it doesn’t feel good). Another problem with Arc<Mutex<T>> is that it hits on a lot of Rust’s ergonomic weak points, failing our “supportive” principle (“the language, tools, and community are here to help”):

You have to allocate the arcs and clone references explicitly, which is annoying;
You have to invoke methods like lock, get back lock guards, and understand how destructors and lock guards interact;
In async code in particular, thanks to #57478, the compiler doesn’t understand very well when a lock guard has been dropped, resulting in annoying compiler errors – though Eric Holk is close to landing a fix for this one! 🎉
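The first two points in that list show up even in a tiny threaded example. The sketch below uses only the standard library; the function name count_with_arc_mutex is made up for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn count_with_arc_mutex(workers: usize) -> usize {
    // Explicit allocation of the shared, locked state.
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();

    for _ in 0..workers {
        // Explicit clone of the reference for each thread.
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // `lock` hands back a guard; the data is only reachable through it.
            let mut guard = counter.lock().unwrap();
            *guard += 1;
            // The guard's destructor runs here and releases the lock.
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the value out in its own statement so the temporary guard is
    // dropped before `counter` itself goes out of scope.
    let total = *counter.lock().unwrap();
    total
}
```

Note the last two lines: where the guard gets dropped is part of what you have to keep in your head.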

Of course, people who remember the “bad old days” of async Rust before async-await are very familiar with this dynamic. In fact, one of the big selling points of adding async await sugar into Rust was getting rid of the need to use arc-mutex.

Deeper problems

But the ergonomic pitfalls of Arc<Mutex> are only the beginning. It’s also just really hard to get Arc<Mutex> to actually work for this setup. To see what I mean, let’s dive a bit deeper into the state for mini-redis. There are two main bits of state we have to think about:

the tcp-stream to the client
the StreamMap of active connections

Managing access to the tcp-stream for the client is actually relatively easy. For one thing, tokio streams support a split operation, so it is possible to take the stream and split out the “sending half” (for sending messages to the client) and the “receiving half” (for receiving messages from the client). All the active threads can send data to the client, so they all need the sending half, and presumably it’ll have to be wrapped in an (async aware) mutex. But only one active thread needs the receiving half, so it can own that, and avoid any locks.

Managing access to the StreamMap of active connections, though, is quite a bit more difficult. Imagine we were to put that StreamMap itself into an Arc<Mutex>, so that both tasks can access it. Now one of the tasks is going to be waiting for new messages to arrive. It’s going to look something like this:

let mut subscriptions = Arc::new(Mutex::new(StreamMap::new()));

spawn(async move {
    while let Some((channel_name, msg)) = subscriptions.lock().unwrap().next().await {
        connection.send_message(channel_name, msg);
    }
});

However, this code won’t compile (thankfully!). The problem is that we are acquiring a lock but we are trying to hold onto that lock while we await, which means we might switch to other tasks with the lock being held. This can easily lead to deadlock if those other tasks try to acquire the lock, since the tokio scheduler and the O/S scheduler are not cooperating with one another.

An alternative would be to use an async-aware mutex like tokio::sync::Mutex, but that is also not great: we can still wind up with a deadlock, but for another reason. The server is now prevented from adding a new subscription to the list until the lock is released, which means that if Client1 is trying to subscribe to a new channel, it has to wait for some other client to send a message to an existing channel to do so (because that is when the lock is released). Not great.

Actually, this whole saga is covered under another async vision doc “status quo” story, Alan thinks he needs async locks.

A third alternative: actors

Recognizing the problems with locks, Alice Ryhl some time ago wrote a nice blog post, “Actors with Tokio”, that explains how to set up actors. This pattern actually helps to address both our problems around mutable state. The idea is to move the connections array so that it belongs solely to one actor. Instead of directly modifying connections, the other tasks will communicate with this actor by exchanging messages.

So basically there could be two actors, or even three:

Actor A, which owns the connections (list of subscribed streams). It receives messages that are either publishing new messages to the streams or messages that say “add this stream” to the list.
Actor B, which owns the “read half” of the client’s TCP stream. It reads bytes and parses new frames, then sends out requests to the other actors in response. For example, when a subscribe message comes in, it can send a message to Actor A saying “subscribe the client to this channel”.
Actor C, which owns the “write half” of the client’s TCP stream. Both actors A and B will send messages to it when there are things to be sent to client.

To see how this would be implemented, take a look at Alice’s post. The TL;DR is that you would model connections between actors as tokio channels. Each actor is either spawned or otherwise set up to run independently. You still wind up using select!, but you only use it to receive messages from multiple channels at once. This doesn’t present any cancellation hazards because the channel code is carefully written to avoid them.
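To give a feel for the shape of “Actor A”, here is a minimal sketch. It deliberately swaps tokio tasks and tokio channels for std threads and std::sync::mpsc so that it stays self-contained; the Command enum and spawn_subscription_actor are hypothetical names, not taken from Alice’s post or from mini-redis.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

// Hypothetical message type: everything other tasks may ask of the actor.
enum Command {
    Subscribe(String),
    Unsubscribe(String),
    // Ask the actor to report its current channels over a reply channel.
    List(mpsc::Sender<Vec<String>>),
}

fn spawn_subscription_actor() -> (mpsc::Sender<Command>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The set is owned by this thread alone: no Arc, no Mutex.
        let mut subscriptions = HashSet::new();
        // The loop ends once every sender has been dropped.
        for command in rx {
            match command {
                Command::Subscribe(name) => {
                    subscriptions.insert(name);
                }
                Command::Unsubscribe(name) => {
                    subscriptions.remove(&name);
                }
                Command::List(reply) => {
                    let mut names: Vec<String> = subscriptions.iter().cloned().collect();
                    names.sort();
                    // The requester may have gone away; ignore that case.
                    let _ = reply.send(names);
                }
            }
        }
    });
    (tx, handle)
}
```

Shutdown here falls out of dropping the last sender, which is one simple way to handle the coordination problem; a real server would likely need something more explicit.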

This setup works fine, and is even elegant in its own way, but it’s also not living up to Rust’s concept of performant or the goal of “zero-cost abstractions” (ZCA). In particular, the idea with ZCA is that it is supposed to give you a model that says “if you wrote this by hand, you couldn’t do any better”. But if you wrote a mini-redis server in C, by hand, you probably wouldn’t adopt actors. In some sense, this is just adopting something much closer to the Promise model. (Plus, the most obvious way to implement actors in tokio is largely to use tokio::spawn, which definitely adds overhead, or to use FuturesUnordered, which can be a bit subtle as well – moro does address these problems by adding a nice API here.)

(The other challenge with actors implemented this way is coordinating shutdown, though it can certainly be done: you just have to remember to thread the shutdown handler around everywhere.)

Cancellation as the “dark knight”: looking again at select!

Taking a step back, we’ve now seen that trying to use distinct tasks introduces this interesting problem that we have shared data being accessed by all the tasks. That either pushes us to locks (broken) or actors (works), but either way, it raises the question: why wasn’t this a problem with select!? After all, select! is still combining various logical tasks, and those tasks are still touching the same variables, so why is the compiler ok with it?

The answer is closely tied to cancellation: the select! setup works because

the things running concurrently are not touching overlapping state:
- one of them is looking at subscriptions (waiting for a message);
- another is looking at connection;
- and the last one is receiving the termination message.
and once we decide which one of these paths to take, we cancel all the others.

This last part is key: if we receive an incoming message from the client, for example, we drop the future that was looking at subscriptions, canceling it. That means subscriptions is no longer in use, so we can push new subscriptions into it, or remove things from it.
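The borrow-checker side of this can be seen without any runtime at all. In the sketch below (names are illustrative), an async block borrows subscriptions mutably, much like the select! arm that waits on it. Dropping the future without polling it is the “cancellation”, and it ends the borrow.

```rust
fn cancel_then_mutate() -> Vec<String> {
    let mut subscriptions = vec!["news".to_string()];

    // This future holds a mutable borrow of `subscriptions` for as long as it exists.
    let waiting = async {
        subscriptions.push("never runs".to_string());
    };

    // "Cancel" it: futures are lazy, so dropping an unpolled future means
    // its body never executes, and the borrow is released.
    drop(waiting);

    // Now we are free to modify the data again.
    subscriptions.push("sports".to_string());
    subscriptions
}
```

If the drop were moved below the second push, the compiler would reject the function, because the future would still be holding its borrow.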

So, cancellation is both what enables the mini-redis example to be performant and a zero-cost abstraction, but it is also the cause of our reliability hazards. That’s a pickle!

Conclusions

We’ve seen a lot of information, so let me try to sum it all up for you:

Fine-grained cancellation in select! is what enables async Rust to be a zero-cost abstraction and to avoid the need to create either locks or actors all over the place.
Fine-grained cancellation in select! is the root cause for a LOT of reliability problems.

You’ll note that I wrote fine-grained cancellation. What I mean by that is specifically things like how select! will cancel the other futures. This is very different from coarse-grained cancellation like having the entire server shutdown, for which I think structured concurrency solves the problem very well.

So what can we do about fine-grained cancellation? Well, the answer depends.

In the short term, I value reliability above all, so I think adopting an actor-like pattern is a good idea. This setup can be a nice architecture for a lot of reasons2, and while I’ve described it as “not performant”, that assumes you are running a really high-scale server that has to handle a ton of load. For most applications, it will perform very well indeed.

I think it makes sense to be very judicious in what you select!! In the context of Materialize, GusWynn was experimenting with a Selectable trait for precisely this reason; that trait just permits selecting from a few sources, like channels. It’d be nice to support some convenient way of declaring that an async fn is cancel-safe, e.g. only allowing it to be used in select! if it is tagged with #[cancel_safe]. (This might be something one could author as a proc macro.)

But in the longer term, I’m interested if we can come up with a mechanism that will allow the compiler to get smarter. For example, I think it’d be cool if we could share one &mut across two async fn that are running concurrently, so long as that &mut is not borrowed across an await point. I have thoughts on that but…not for this post.

My experience is that being forced to get a clear picture on this is part of what makes Rust code reliable in practice. ↩︎
It’d be fun to take a look at Reactive Design Patterns and examine how many of them apply to Rust. I enjoyed that book a lot. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/06/13/async-cancellation-a-case-study-of-pub-sub-in-mini-redis/

Coherence and crate-level where-clauses

Apr 17, 2022 Updated Apr 17, 2022

Rust has been wrestling with coherence more-or-less since we added methods; our current rule, the “orphan rule”, is safe but overly strict. Roughly speaking, the rule says that one can only implement foreign traits (that is, traits defined by one of your dependencies) for local types (that is, types that you define). The goal of this rule was to help foster the crates.io ecosystem — we wanted to ensure that you could grab any two crates and use them together, without worrying that they might define incompatible impls that can’t be combined. The rule has served us well in that respect, but over time we’ve seen that it can also have a kind of chilling effect, unintentionally working against successful composition of crates in the ecosystem. For this reason, I’ve come to believe that we will have to weaken the orphan rule. The purpose of this post is to write out some preliminary exploration of ways that we might do that.

So wait, how does the orphan rule protect composition?

You might be wondering how the orphan rule ensures you can compose crates from crates.io. Well, imagine that there is a crate widget that defines a struct Widget:

// crate widget
#[derive(PartialEq, Eq)]
pub struct Widget {
    pub name: String,
    pub code: u32,
}

As you can see, the crate has derived Eq, but neglected to derive Hash. Now, I am writing another crate, widget-factory that depends on widget. I’d like to store widgets in a hashset, but I can’t, because they don’t implement Hash! Today, if you want Widget to implement Hash, the only way is to open a PR against widget and wait for a new release.1 But if we didn’t have the orphan rule, we could just define Hash ourselves:

// Crate widget-factory
impl Hash for Widget {
    fn hash(&self) {
        // PSA: Don’t really define your hash functions like this omg.
        self.name.hash() ^ self.code.hash()
    }
}

Now we can define our WidgetFactory using HashSet<Widget>…

pub struct WidgetFactory {
    produced: HashSet<Widget>,
}

impl WidgetFactory {
    fn take_produced(&mut self) -> HashSet<Widget> {
        // Move the set out, leaving an empty one behind.
        std::mem::take(&mut self.produced)
    }
}

OK, so far so good, but what happens if somebody else defines a widget-delivery crate and they too wish to use a HashSet<Widget>? Well, they will also define Hash for Widget, but of course they might do it differently — maybe even very badly:

// Crate widget-delivery
impl Hash for Widget {
    fn hash(&self) {
        // PSA: You REALLY shouldn’t define your hash functions this way omg
        0
    }
}

Now the problem comes when I try to develop my widget-app crate that depends on widget-delivery and widget-factory. I now have two different impls of Hash for Widget, so which should the compiler use?

There are a bunch of answers we might give here, but most of them are bad:

We could have each crate use its own impl, in theory: but that wouldn’t work so well if the user tried to take a HashSet<Widget> from one crate and pass it to another crate.
The compiler could pick one of the two impls arbitrarily, but how do we know which one to use? In this case, one of them would give very bad performance, but it’s also possible that some code is designed to expect the exact hash algorithm it specified.
- This is even harder with associated types.
Users could tell us which impl they want, which is maybe better, but it also means that the widget-delivery crates have to be prepared that any impl they are using might be switched to another one by some other crate later on. This makes it impossible for us to inline the hash function or do other optimizations except at the very last second.

Faced with these options, we decided to just rule out orphan impls altogether. Too much hassle!
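For comparison, the workaround that the orphan rule leaves open today is a newtype: wrap the foreign type in a local one and implement the trait for the wrapper. A minimal sketch follows. The widget module stands in for the foreign widget crate (an assumption made so the example fits in one file), and HashableWidget and count_distinct are made-up names.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

mod widget {
    // Stand-in for the foreign `widget` crate: Eq is derived, Hash is not.
    #[derive(PartialEq, Eq)]
    pub struct Widget {
        pub name: String,
        pub code: u32,
    }
}

// A local type, so implementing the foreign trait `Hash` for it is allowed.
#[derive(PartialEq, Eq)]
struct HashableWidget(widget::Widget);

impl Hash for HashableWidget {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed each field to the hasher in turn.
        self.0.name.hash(state);
        self.0.code.hash(state);
    }
}

fn count_distinct(widgets: Vec<widget::Widget>) -> usize {
    let set: HashSet<HashableWidget> = widgets.into_iter().map(HashableWidget).collect();
    set.len()
}
```

This composes fine, since two crates that each define their own wrapper never conflict. The cost is that a HashSet<HashableWidget> from one crate is not the same type as the other crate’s set, so the interop problem has only been moved.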

But the orphan rules make it hard to establish a standard

The orphan rules work well at ensuring that we can link two crates together, but ironically they can also work to make actual interop much harder. Consider the async runtime situation. Right now, there are a number of async runtimes, but no convenient way to write code that works with any runtime. As a result, people writing async libraries often wind up writing directly against one specific runtime. The end result is that we cannot combine libraries that were written against different runtimes, or at least that doing so can result in surprising failures.

It would be nice if we could implement some traits that allowed for greater interop. But we don’t quite know what those traits should look like (we also lack support for async fn in traits, but that’s coming!), so it would be nice if we could introduce those traits in the crates.io ecosystem and iterate a bit there — this was indeed the original vision for the futures crate! But if we do that, in practice, then the same crate that defines the trait must also define an implementation for every runtime. The problem is that the runtimes won’t want to depend on the futures crate, as it is still unstable; and the futures crate doesn’t want to have to depend on every runtime. So we’re kind of stuck. And of course if the futures crate were to take a dependency on some specific runtime, then that runtime couldn’t later add futures as a dependency, since that would result in a cycle.

Distinguishing “I need an impl” from “I provide an impl”

At the end of the day, I think we’re going to have to lift the orphan rule, and just accept that it may be possible to create crates that cannot be linked together because they contain overlapping impls. However, we can still give people the tools to ensure that composition works smoothly.

I would like to see us distinguish (at least) two cases:

I need this type to implement this trait (which maybe it doesn’t, yet).
I am supplying an impl of a trait for a given type.

The idea would be that most crates can just declare that they need an impl without actually supplying a specific one. Any number of such crates can be combined together without a problem (assuming that they don’t put inconsistent conditions on associated types).

Then, separately, one can have a crate that actually supplies an impl of a foreign trait for a foreign type. These impls can be isolated as much as possible. The hope is that only the final binary would be responsible for actually supplying the impl itself.

Where clauses are how we express “I need an impl” today

If you think about it, expressing “I need an impl” is something that we do all the time, but we typically do it with generic types. For example, when I write a function like so…

fn clone_list<T: Clone>(v: &[T]) {
    …
}

I am saying “I need a type T and I need it to implement Clone”, but I’m not being specific about what those types are.

In fact, it’s also possible to use where-clauses to specify things about non-generic types…

fn example()
where 
    u32: Copy,
{
}

…but the compiler today is a bit inconsistent about how it treats those. The plan is to move to a model where we “trust” what the user wrote — e.g., if the user wrote where String: Copy, then the function would treat the String type as if it were Copy, even if we can’t find any Copy impl. It so happens that such a function could never be called, but that’s no reason you can’t define it2.
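Both forms can be tried on today’s compiler, as long as the non-generic where-clause is one that actually holds. A small sketch (the function bodies are made up for illustration):

```rust
// "I need a type T, and I need it to implement Clone."
fn clone_list<T: Clone>(v: &[T]) -> Vec<T> {
    v.to_vec()
}

// A where-clause about a concrete type. It holds, so this compiles today.
fn double(x: u32) -> u32
where
    u32: Copy,
{
    let y = x; // a copy, so `x` stays usable
    x + y
}
```

Swapping the clause for where String: Copy is the case the paragraph above describes: under the “trust the user” model the definition would be accepted, but no caller could ever satisfy it.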

Where clauses at the crate scope

What if we could put where clauses at the crate scope? We could use that to express impls that we need to exist without actually providing those impls. For example, the widget-factory crate from our earlier example might add a line like this into its lib.rs:

// Crate widget-factory
where Widget: Hash;

As a result, people would not be able to use that crate unless they either (a) supplied an impl of Hash for Widget or (b) repeated the where clause themselves, propagating the request up to the crates that depend on them. (Same as with any other where-clause.)

The intent would be to do the latter, propagating the dependencies up to the root crate, which could then either supply the impl itself or link in some other crate that does.

Allow crates to implement foreign traits for foreign types

The next part of the idea would be to allow crates to implement foreign traits for foreign types. I think I would convert the orphan check into a “deny by default” lint. The lint text would explain that these impls are not permitted because they may cause linker errors, but a crate could mark the impl with #[allow(orphan_impls)] to ignore that warning. Best practice would be to put orphan impls into their own crate that others can use.

Another idea: permit duplicate impls (especially those generated via derive)

Josh Triplett floated another interesting idea, which is that we could permit duplicate impls. One common example might be if the impl is defined via a derive (though we’d have to extend derive to permit one to derive on a struct definition that is not local somehow).

Conflicting where clauses

Even if you don’t supply an actual impl, it’s possible to create two crates that can’t be linked together if they contain contradictory where-clauses. For example, perhaps widget-factory defines Widget as an iterator over strings…

// Widget-factory
where Widget: Iterator<Item = String>;

…whilst widget-lib wants Widget to be an iterator over UUIDs:

// Widget-lib
where Widget: Iterator<Item = UUID>;

At the end of the day, at most one of these where-clauses can be satisfied, not both, so the two crates would not interoperate. That seems inevitable and ok.

Expressing target dependencies via where-clauses

Another idea that has been kicking around is the idea of expressing portability across target-architectures via traits and some kind of Platform type. As an example, one could imagine having code that says where Platform: NativeSimd to mean “this code requires native SIMD support”, or perhaps where Platform: Windows to mean “this must support various Windows APIs”. This is just a “kernel” of an idea, I have no idea what the real trait hierarchy would look like, but it’s quite appealing and seems to fit well with the idea of crate-level where-clauses. Essentially the idea is to allow crates to “constrain the environment that they are used in” in an explicit way.

Module-level generics

In truth, the idea of crate-level where clauses is kind of a special case of having module-level generics, which I would very much like. The idea would be to allow modules (like types, functions, etc) to declare generic parameters and where-clauses.3 These would be nameable and usable from all code within the module, and when you referenced an item from outside the module, you would have to specify their value. This is very much like how a trait-level generic gets “inherited” by the methods in the trait.
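The closest thing available today is an impl block, where the generic parameter and its bounds are declared once and then inherited by every method inside. The sketch below shows that pattern; Registry is a made-up type, and module-level generics would (under this proposal) give the same inheritance to free functions and types in a module.

```rust
use std::collections::HashMap;
use std::hash::Hash;

struct Registry<K> {
    ids: HashMap<K, usize>,
}

// `K` and its bounds are declared once here and are in scope for every method below.
impl<K: Hash + Eq> Registry<K> {
    fn new() -> Self {
        Registry { ids: HashMap::new() }
    }

    // Returns the id already assigned to `key`, or assigns the next free one.
    fn id_for(&mut self, key: K) -> usize {
        let next = self.ids.len();
        *self.ids.entry(key).or_insert(next)
    }

    fn len(&self) -> usize {
        self.ids.len()
    }
}
```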

I have wanted this for a long time because I often have modules where all the code is parameterized over some sort of “context parameter”. In the compiler, that is the lifetime ’tcx, but very often it’s some kind of generic type (e.g., Interner in salsa).

Conclusion

I discussed a few things in this post:

How coherence helps composability by ensuring that crates can be linked together, but harms composability by making it much harder to establish and use interoperability traits.
How crate-level where-clauses can allow us to express “I need someone to implement this trait” without actually providing an impl, providing for the ability to link things together.
A sketch of how crate-level where-clauses might be generalized to capture other kinds of constraints on the environment, such as conditions on the target platform, or to module-level generics, which could potentially be an ergonomic win.

Overall, I feel pretty excited about this direction. I feel like more and more things are becoming possible if we think about generalizing the trait system and making it more uniform. All of this, in my mind, builds on the work we’ve been doing to create a more precise definition of the trait system in a-mir-formality and to build up a team with expertise in how it works (see the types team RFC). I’ll write more about those in upcoming posts though! =)

You could also create a newtype and make your hashmap key off the newtype, but that’s more of a workaround, and doesn’t always work out. ↩︎
It might be nice of us to give a warning. ↩︎
Fans of ML will recognize this as “applicative functors”. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/04/17/coherence-and-crate-level-where-clauses/

Implied bounds and perfect derive

Apr 12, 2022 Updated Apr 12, 2022

There are two ergonomic features that have been discussed for quite some time in Rust land: perfect derive and expanded implied bounds. Until recently, we were a bit stuck on the best way to implement them. Recently though I’ve been working on a new formulation of the Rust trait checker that gives us a bunch of new capabilities — among them, it resolved a soundness problem that would have prevented these two features from being combined. I’m not going to describe my fix in detail in this post, though; instead, I want to ask a different question. Now that we can implement these features, should we?

Both of these features fit nicely into the less rigamarole part of the lang team Rust 2024 roadmap. That is, they allow the compiler to be smarter and require less annotation from you to figure out what code should be legal. Interestingly, as a direct result of that, they both also carry the same downside: semver hazards.

What is a semver hazard?

A semver hazard occurs when you have a change which feels innocuous but which, in fact, can break clients of your library. Whenever you try to automatically figure out some part of a crate’s public interface, you risk some kind of semver hazard. This doesn’t necessarily mean that you shouldn’t do the auto-detection: the convenience may be worth it. But it’s usually worth asking yourself if there is some way to lessen the semver hazard while still getting similar or the same benefits.

Rust has a number of semver hazards today.1 The most common example is around thread-safety. In Rust, a struct MyStruct is automatically deemed to implement the trait Send so long as all the fields of MyStruct are Send (this is why we call Send an auto trait: it is automatically implemented). This is very convenient, but an implication of it is that adding a private field to your struct whose type is not thread-safe (e.g., an Rc<T>) is potentially a breaking change: if someone was using your library and sending MyStruct to run in another thread, they would no longer be able to do so.
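A compile-time probe makes the hazard easy to see. In the sketch below, assert_send is a made-up helper and the two structs stand for “version 1” and “version 2” of the same library type.

```rust
use std::rc::Rc;

// Compiles only when `T: Send`; the return value is just for convenience.
fn assert_send<T: Send>() -> bool {
    true
}

// Version 1: every field is Send, so the struct is Send automatically.
pub struct MyStruct {
    name: String,
    count: usize,
}

// Version 2: one new private field, and the auto trait silently disappears.
pub struct MyStructV2 {
    name: String,
    cache: Rc<String>,
}

fn check() -> bool {
    // Uncommenting the next line fails to compile, because
    // `Rc<String>` cannot be sent between threads safely:
    // assert_send::<MyStructV2>();
    assert_send::<MyStruct>()
}
```

Nothing in the public signature of MyStructV2 changed, which is exactly what makes this a hazard rather than an obvious breaking change.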

What is “perfect derive”?

So what is the perfect derive feature? Currently, when you derive a trait (e.g., Clone) on a generic type, the derive just assumes that all the generic parameters must be Clone. This is sometimes necessary, but not always; the idea of perfect derive is to change how derive works so that it instead figures out exactly the bounds that are needed.

Let’s see an example. Consider this List<T> type, which creates a linked list of T elements. Suppose that List<T> can be deref’d to yield its &T value. However, lists are immutable once created, and we also want them to be cheaply cloneable, so we use Rc<T> to store the data itself:

use std::ops::Deref;
use std::rc::Rc;

#[derive(Clone)]
struct List<T> {
    data: Rc<T>,
    next: Option<Rc<List<T>>>,
}

impl<T> Deref for List<T> {
    type Target = T;

    fn deref(&self) -> &T { &self.data }
}

Currently, derive is going to generate an impl that requires T: Clone, like this…

impl<T> Clone for List<T>
where
    T: Clone,
{
    fn clone(&self) -> Self {
        List {
            data: self.data.clone(),
            next: self.next.clone(),
        }
    }
}

If you look closely at this impl, though, you will see that the T: Clone requirement is not actually necessary. This is because the only T in this struct is inside of an Rc, and hence is reference counted. Cloning the Rc only increments the reference count, it doesn’t actually create a new T.

With perfect derive, we would change the derive to generate an impl with one where clause per field, instead. The idea is that what we really need to know is that every field is cloneable (which may in turn require that T be cloneable):

impl<T> Clone for List<T> 
where
    Rc<T>: Clone, // type of the `data` field
    Option<Rc<List<T>>>: Clone, // type of the `next` field
{
    fn clone(&self) -> Self { /* as before */ }
}
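Since derive does not produce this today, the sketch below writes the per-field version by hand to show what it buys you. NotClone and clone_shares_data are made-up names; the point is that a list of a non-cloneable element type can still be cloned.

```rust
use std::rc::Rc;

struct List<T> {
    data: Rc<T>,
    next: Option<Rc<List<T>>>,
}

// Hand-written stand-in for what perfect derive would generate:
// one where-clause per field, and no `T: Clone` bound.
impl<T> Clone for List<T>
where
    Rc<T>: Clone,
    Option<Rc<List<T>>>: Clone,
{
    fn clone(&self) -> Self {
        List {
            data: self.data.clone(),
            next: self.next.clone(),
        }
    }
}

// Deliberately does not implement Clone.
struct NotClone(u32);

fn clone_shares_data() -> (u32, usize) {
    let list = List { data: Rc::new(NotClone(7)), next: None };
    let copy = list.clone();
    // Both lists point at the same allocation; only the count went up.
    (copy.data.0, Rc::strong_count(&list.data))
}
```

With the bound that derive generates today (T: Clone), the call to list.clone() would be rejected, because NotClone is not Clone.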

Making perfect derive sound was tricky, but we can do it now

This idea is quite old, but there were a few problems that have blocked us from doing it. First, it requires changing all trait matching to permit cycles (currently, cycles are only permitted for auto traits like Send). This is because checking whether List<T> is Clone would now require checking whether Option<Rc<List<T>>> is Clone. If you work that through, you’ll find that a cycle arises. I’m not going to talk much about this in this post, but it is not a trivial thing to do: if we are not careful, it would make Rust quite unsound indeed. For now, though, let’s just assume we can do it soundly.

The semver hazard with perfect derive

The other problem is that it introduces a new semver hazard: just as Rust currently commits you to being Send so long as you don’t have any non-Send types, derive would now commit List<T> to being cloneable even when T: Clone does not hold.

For example, perhaps we decide that storing an Rc<T> for each list wasn’t really necessary. Therefore, we might refactor List<T> to store T directly, like so:

#[derive(Clone)]
struct List<T> {
    data: T,
    next: Option<Rc<List<T>>>,
}

We might expect that, since we are only changing the type of a private field, this change could not cause any clients of the library to stop compiling. With perfect derive, we would be wrong.2 This change means that we now own a T directly, and so List<T>: Clone is only true if T: Clone.

Expanded implied bounds

An implied bound is a where clause that you don’t have to write explicitly. For example, if you have a struct that declares T: Ord, like this one…

struct RedBlackTree<T: Ord> { … }

impl<T: Ord> RedBlackTree<T> {
    fn insert(&mut self, value: T) { … }
}

…it would be nice if functions that worked with a red-black tree didn’t have to redeclare those same bounds:

fn insert_smaller<T>(red_black_tree: &mut RedBlackTree<T>, item1: T, item2: T) {
    // Today, this function would require `where T: Ord`:
    if item1 < item2 {
        red_black_tree.insert(item1);
    } else {
        red_black_tree.insert(item2);
    }
}

I am saying expanded implied bounds because Rust already has two notions of implied bounds: expanding supertraits (T: Ord implies T: PartialOrd, for example, which is why the fn above can contain item1 < item2) and outlives relations (an argument of type &’a T, for example, implies that T: ‘a). The most maximal version of this proposal would expand those implied bounds from supertraits and lifetimes to any where-clause at all.
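The supertrait form of implied bounds can be seen on stable Rust. In this sketch BTreeSet stands in for the hypothetical RedBlackTree, and is_same is a made-up helper. The only bound written is T: Ord, yet both < and == are available.

```rust
use std::collections::BTreeSet;

// `BTreeSet` stands in for the post's hypothetical `RedBlackTree`.
fn insert_smaller<T: Ord>(set: &mut BTreeSet<T>, item1: T, item2: T) {
    // `T: Ord` implies `T: PartialOrd`, which is what `<` needs.
    if item1 < item2 {
        set.insert(item1);
    } else {
        set.insert(item2);
    }
}

fn is_same<T: Ord>(a: &T, b: &T) -> bool {
    // `T: Ord` implies `T: Eq`, which implies `T: PartialEq`, which is what `==` needs.
    a == b
}
```

What does not work today is dropping the T: Ord bound from insert_smaller and relying on the bound declared on the collection type itself; that is the “expanded” part of the proposal.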

Implied bounds and semver

Expanding the set of implied bounds will also introduce a new semver hazard — or perhaps it would be better to say that it expands an existing semver hazard. It’s already the case that removing a supertrait from a trait is a breaking change: if the stdlib were to change trait Ord so that it no longer extended Eq, then Rust programs that just wrote T: Ord would no longer be able to assume that T: Eq, for example.

Similarly, at least with a maximal version of expanded implied bounds, removing the T: Ord from RedBlackTree<T> would potentially stop client code from compiling. Making changes like that is not that uncommon. For example, we might want to introduce new methods on RedBlackTree that work even without ordering. To do that, we would remove the T: Ord bound from the struct and just keep it on the impl:

struct RedBlackTree<T> { … }

impl<T> RedBlackTree<T> {
    fn len(&self) -> usize { /* doesn’t need to compare `T` values, so no bound */ }
}

impl<T: Ord> RedBlackTree<T> {
    fn insert(&mut self, value: T) { … }
}

But, if we had a maximal expansion of implied bounds, this could cause crates that depend on your library to stop compiling, because they would no longer be able to assume that RedBlackTree<X> being valid implies X: Ord. As a general rule, I think we want it to be clear what parts of your interface you are committing to and which you are not.

PSA: Removing bounds not always semver compliant

Interestingly, while it is true that you can remove bounds from a struct (today, at least) and be semver compliant3, this is not the case for impls. For example if I have

impl<T: Copy> MyTrait for Vec<T> { }

and I change it to impl<T> MyTrait for Vec<T>, this is effectively introducing a new blanket impl, and that is not a semver compliant change (see RFC 2451 for more details).

Summarize

So, to summarize:

Perfect derive is great, but it reveals details about your fields: sure, you can clone your List<T> for any type T now, but maybe you want the right to require T: Clone in the future?
Expanded implied bounds are great, but they prevent you from “relaxing” your requirements in the future: sure, you only ever have a RedBlackTree<T> for T: Ord now, but maybe you want to support more types in the future?
But also: the rules around semver compliance are rather subtle and quick to anger.

How can we fix these features?

I see a few options. The most obvious of course is to just accept the semver hazards. It’s not clear to me whether they will be a problem in practice, and Rust already has a number of similar hazards (e.g., adding a Box<dyn Write> makes your type no longer Send).

Another extreme alternative: crate-local implied bounds

Another option for implied bounds would be to expand implied bounds, but only on a crate-local basis. Imagine that the RedBlackTree type is declared in some crate rbtree, like so…

// The crate rbtree
struct RedBlackTree<T: Ord> { .. }
…
impl<T> RedBlackTree<T> {
    fn insert(&mut self, value: T) {
        …
    }
}

This impl, because it lives in the same crate as RedBlackTree, would be able to benefit from expanded implied bounds. Therefore, code inside the impl could assume that T: Ord. That’s nice. If I later remove the T: Ord bound from RedBlackTree, I can move it to the impl, and that’s fine.

But if I’m in some downstream crate, then I don’t benefit from implied bounds. If I were going to, say, implement some trait for RedBlackTree, I’d have to repeat T: Ord…

trait MyTrait { }

impl<T> MyTrait for rbtree::RedBlackTree<T>
where
    T: Ord, // required
{ }
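For comparison, here is a minimal sketch of how this looks in Rust today, where nothing is implied and the bound is restated on every impl, in-crate or downstream. The BTreeSet field is only a stand-in for a real tree, and new, insert, len, and describe are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeSet;

// Stand-in for the `rbtree` crate's type; a BTreeSet plays the role of the tree.
pub struct RedBlackTree<T: Ord> {
    items: BTreeSet<T>,
}

// Today the `T: Ord` bound must be repeated here. Under the crate-local
// proposal, an impl in the defining crate could leave it out.
impl<T: Ord> RedBlackTree<T> {
    pub fn new() -> Self {
        RedBlackTree { items: BTreeSet::new() }
    }

    /// Returns true if the value was not already present.
    pub fn insert(&mut self, value: T) -> bool {
        self.items.insert(value)
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

pub trait MyTrait {
    fn describe(&self) -> String;
}

// A "downstream" impl: the bound is required today and would stay
// required under the crate-local proposal.
impl<T> MyTrait for RedBlackTree<T>
where
    T: Ord,
{
    fn describe(&self) -> String {
        format!("tree with {} items", self.len())
    }
}
```

If the T: Ord bound later moves from the struct to the in-crate impl, the downstream impl above keeps compiling, which is the point of keeping the implication local.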

A middle ground: declaring “how public” your bounds are

Another variation would be to add a visibility to your bounds. The default would be that where clauses on structs are “private”, i.e., implied only within your module. But you could declare where clauses as “public”, in which case you would be committing to them as part of your semver guarantee:

struct RedBlackTree<T: pub Ord> { .. }

In principle, we could also support pub(crate) and other visibility modifiers.

Explicit perfect derive

I’ve been focused on implied bounds, but the same questions apply to perfect derive. In that case, I think the question is mildly simpler— we likely want some way to expand the perfect derive syntax to “opt in” to the perfect version (or “opt out” from it).

There have been some proposals that would allow you to be explicit about which parameters require which bounds. I’ve been a fan of those, but now that I’ve realized we can do perfect derive, I’m less sure. Maybe we just want some way to say “add the bounds all the time” (the default today) or “use perfect derive” (the new option), and that’s good enough. We could even make there be a new attribute, e.g. #[perfect_derive(…)] or #[semver_derive]. Not sure.
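To make the two options concrete, here is a rough sketch of what each one expands to, written out by hand. DerivedList, PerfectList, and NotClone are hypothetical names; the hand-written impl only approximates what a perfect derive might generate, it is not the output of any existing attribute.

```rust
use std::rc::Rc;

/// A type that deliberately does not implement `Clone`.
pub struct NotClone;

// Today's derive: expands to roughly `impl<T: Clone> Clone for DerivedList<T>`,
// even though cloning an `Rc<T>` never needs `T: Clone`.
#[derive(Clone)]
pub struct DerivedList<T> {
    pub data: Rc<T>,
}

pub struct PerfectList<T> {
    pub data: Rc<T>,
}

// "Perfect" version: the bound is on the field type, not on the parameter.
// `Rc<T>: Clone` holds for every `T`, so this impl applies to all `T`.
impl<T> Clone for PerfectList<T>
where
    Rc<T>: Clone,
{
    fn clone(&self) -> Self {
        PerfectList { data: Rc::clone(&self.data) }
    }
}
```

With these definitions, PerfectList<NotClone> can be cloned while DerivedList<NotClone> cannot, and that difference is exactly the field detail that perfect derive would expose to downstream crates.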

Conclusion

In the past, we were blocked for technical reasons from expanding implied bounds and supporting perfect derive, but I believe we have resolved those issues. So now we have to think a bit about semver and decide how explicit we want to be.

Side note that, no matter what we pick, I think it would be great to have easy tooling to help authors determine if something is a semver breaking change. This is a bit tricky because it requires reasoning about two versions of your code. I know there is rust-semverver but I’m not sure how well maintained it is. It’d be great to have a simple github action one could deploy that would warn you when reviewing PRs.

Rules regarding semver are documented here, by the way. ↩︎
Actually, you were wrong before: changing the types of private fields in Rust can already be a breaking change, as we discussed earlier (e.g., by introducing a Rc, which makes the type no longer implement Send). ↩︎
Uh, no promises — there may be some edge cases, particularly involving regions, where this is not true today. I should experiment. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/04/12/implied-bounds-and-perfect-derive/

dyn*: can we make dyn sized?

Mar 29, 2022 Updated Mar 29, 2022


Last Friday, tmandry, cramertj, and I had an exciting conversation. We were talking about the design for combining async functions in traits with dyn Trait that tmandry and I had presented to the lang team on Friday. cramertj had an insightful twist to offer on that design, and I want to talk about it here. Keep in mind that this is a piece of “hot off the presses”, in-progress design and hence may easily go nowhere – but at the same time, I’m pretty excited about it. If it works out, it could go a long way towards making dyn Trait user-friendly and accessible in Rust, which I think would be a big deal.

Background: The core problem with dyn

dyn Trait is one of Rust’s most frustrating features. On the one hand, dyn Trait values are absolutely necessary. You need to be able to build up collections of heterogeneous types that all implement some common interface in order to implement core parts of the system. But working with heterogeneous types is just fundamentally hard because you don’t know how big they are. This implies that you have to manipulate them by pointer, and that brings up questions of how to manage the memory that these pointers point at. This is where the problems begin.

Problem: no memory allocator in core

One challenge has to do with how we factor our allocation. The core crate that is required for all Rust programs, libcore, doesn’t have a concept of a memory allocator. It relies purely on stack allocation. For the most part, this works fine: you can pass ownership of objects around by copying them from one stack frame to another. But it doesn’t work if you don’t know how much stack space they occupy!1

Problem: Dyn traits can’t really be substituted for impl Trait

In Rust today, the type dyn Trait is guaranteed to implement the trait Trait, so long as Trait is dyn safe. That seems pretty cool, but in practice it’s not all that useful. Consider a simple function that operates on any kind of Debug type:

use std::fmt::Debug;

fn print_me(x: impl Debug) {
    println!("{x:?}");
}

Even though the Debug trait is dyn-safe, you can’t just change the impl above into a dyn:

fn print_me(x: dyn Debug) { .. }

The problem here is that stack-allocated parameters need to have a known size, and we don’t know how big dyn is. The common solution is to introduce some kind of pointer, e.g. a reference:

fn print_me(x: &dyn Debug) { … }

That works ok for this function, but it has a few downsides. First, we have to change existing callers of print_me — maybe we had print_me(22) before, but now they have to write print_me(&22). That’s an ergonomic hit. Second, we’ve now hardcoded that we are borrowing the dyn Debug. There are other functions where this isn’t necessarily what we wanted to do. Maybe we wanted to store that dyn Debug into a datastructure and return it — for example, this function print_me_later returns a closure that will print x when called:

fn print_me_later(x: &dyn Debug) -> impl FnOnce() + '_ {
    move || println!("{x:?}")
}

Imagine that we wanted to spawn a thread that will invoke print_me_later:

fn spawn_thread(value: usize) {
   let closure = print_me_later(&value);
   std::thread::spawn(move || closure()); // <-- Error: 'static bound not satisfied
}

This code will not compile because closure references value on the stack. But if we had written print_me_later with an impl Debug parameter, it could take ownership of its argument and everything would work fine.
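As a rough sketch of that impl Debug variant: the extra Send + 'static bounds are my assumption, added only because std::thread::spawn requires them, and returning the JoinHandle is a convenience so the caller can wait for the output.

```rust
use std::fmt::Debug;
use std::thread::{self, JoinHandle};

// Takes ownership of `x`, so the returned closure borrows nothing
// from the caller's stack frame.
fn print_me_later(x: impl Debug + Send + 'static) -> impl FnOnce() + Send + 'static {
    move || println!("{x:?}")
}

fn spawn_thread(value: usize) -> JoinHandle<()> {
    let closure = print_me_later(value); // passed by value, not `&value`
    thread::spawn(move || closure())
}
```

The price is that print_me_later is now generic and monomorphized per argument type, which is what the dyn version was trying to avoid in the first place.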

Of course, we could solve this by writing print_me_later to use Box but that’s hardcoding memory allocation. This is problematic if we want print_me_later to appear in a context, like libcore, that might not even have access to a memory allocator.

fn print_me_later(x: Box<dyn Debug>) -> impl FnOnce() {
    move || println!("{x:?}")
}

In this specific example, the Box is also kind of inefficient. After all, the value x is just a usize, and a Box is also a usize, so in theory we could just copy the integer around (the usize methods expect an &usize, after all). This is sort of a special case, but it does come up more than you would think at the lower levels of the system, where it may be worth the trouble to try and pack things into a usize — there are a number of futures, for example, that don’t really require much state.

The idea: What if the dyn were the pointer?

In the proposal for “async fns in traits” that tmandry and I put forward, we had introduced the idea of dynx Trait types. dynx Trait types were not an actual syntax that users would ever type; rather, they were an implementation detail. Effectively a dynx Future refers to a pointer to a type that implements Future. They don’t hardcode that this pointer is a Box; instead, the vtable includes a “drop” function that knows how to release the pointer’s referent (for a Box, that would free the memory).

Better idea: What if the dyn were “something of known size”?

After the lang team meeting, tmandry and I met with cramertj, who proceeded to point out to us something very insightful.2 The truth is that dynx Trait values don’t have to be a pointer to something that implemented Trait — they just have to be something pointer-sized. tmandry and I actually knew that, but what we didn’t see was how critically important this was:

First, a number of futures, in practice, consist of very little state and can be pointer-sized. For example, reading from a file descriptor only needs to store the file descriptor, which is a 32-bit integer, since the kernel stores the other state. Similarly the future for a timer or other builtin runtime primitive often just needs to store an index.
Second, a dynx Trait lets you write code that manipulates values which may be boxed without directly talking about the box. This is critical for code that wants to appear in libcore or be reusable across any possible context.
- As an example of something that would be much easier this way, the Waker struct, which lives in libcore, is effectively a hand-written dynx Waker struct.
Finally, and we’ll get to this in a bit, a lot of low-level systems code employs clever tricks where they know something about the layout of a value. For example, you might have a vector that contains values of various types, but (a) all those types have the same size and (b) they all share a common prefix. In that case, you can manipulate fields in that prefix without knowing what kind of data is contained within, and use a vtable or discriminant to do the rest.
- In Rust, this pattern is painful to encode, though you can sometimes do it with a Vec<S> where S is some struct that contains the prefix fields and an enum. Enums work ok but if you have a more open-ended set of types, you might prefer to have trait objects.
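The Waker point can be illustrated with the real std::task API. The sketch below builds a Waker whose data "pointer" is not a pointer at all but a task index, which is the pointer-sized-value trick described above. LAST_WOKEN and waker_for_task are hypothetical names for illustration; a real runtime would do something more useful in wake.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Records the id of the most recently woken task (illustration only).
static LAST_WOKEN: AtomicUsize = AtomicUsize::new(0);

// The hand-written vtable: clone, wake, wake_by_ref, drop.
static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, wake_raw, wake_raw, drop_raw);

unsafe fn clone_raw(data: *const ()) -> RawWaker {
    // The "pointer" is just an integer, so cloning copies it.
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake_raw(data: *const ()) {
    // Recover the task id that was smuggled through the data word.
    LAST_WOKEN.store(data as usize, Ordering::SeqCst);
}

unsafe fn drop_raw(_data: *const ()) {
    // Nothing was allocated, so there is nothing to free.
}

/// Builds a waker that carries a task id instead of a heap pointer.
pub fn waker_for_task(id: usize) -> Waker {
    let raw = RawWaker::new(id as *const (), &VTABLE);
    // SAFETY: the vtable functions never dereference the data word and
    // are safe to call from any thread.
    unsafe { Waker::from_raw(raw) }
}
```

Calling waker_for_task(7).wake() stores 7 in LAST_WOKEN without ever allocating, which is why this kind of code can live in libcore.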

A sketch: The dyn-star type

To give you a sense for how cool “fixed-size dyn types” could be, I’m going to start with a very simple design sketch. Imagine that we introduced a new type dyn* Trait, which represents the pair of:

a pointer-sized value of some type T that implements Trait (the * is meant to convey “pointer-sized”3)
a vtable for T: Trait; the drop method in the vtable drops the T value.

For now, don’t get too hung up on the specific syntax. There’s plenty of time to bikeshed, and I’ll talk a bit about how we might truly phase in something like dyn*. For now let’s just talk about what it would be like to use it.

Creating a dyn*

To coerce a value of type T into a dyn* Trait, two constraints must be met:

The type T must be pointer-sized or smaller.
The type T must implement Trait.
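To get a feel for what such a pair could look like, here is a hand-rolled approximation of dyn* Debug in today's Rust. Everything in it (DynStarDebug, DebugVTable, the helper functions) is hypothetical and specialized to one trait. It checks the size constraint at runtime rather than at compile time, and it requires T: 'static for simplicity, whereas the sketch in this post would allow a lifetime bound.

```rust
use std::fmt::{self, Debug};
use std::mem::{align_of, size_of, MaybeUninit};

// The vtable for `T: Debug`, including the drop function for `T`.
struct DebugVTable {
    fmt: unsafe fn(*const (), &mut fmt::Formatter<'_>) -> fmt::Result,
    drop: unsafe fn(*mut ()),
}

unsafe fn fmt_impl<T: Debug>(data: *const (), f: &mut fmt::Formatter<'_>) -> fmt::Result {
    // SAFETY: the caller guarantees `data` points at a valid `T`.
    unsafe { Debug::fmt(&*data.cast::<T>(), f) }
}

unsafe fn drop_impl<T>(data: *mut ()) {
    // SAFETY: the caller guarantees `data` points at a valid `T` that is not used again.
    unsafe { std::ptr::drop_in_place(data.cast::<T>()) }
}

/// One pointer-sized slot holding the value itself, plus a vtable.
pub struct DynStarDebug {
    data: MaybeUninit<*const ()>,
    vtable: &'static DebugVTable,
}

impl DynStarDebug {
    pub fn new<T: Debug + 'static>(value: T) -> Self {
        // Constraint 1: `T` must be pointer-sized or smaller (checked at runtime here).
        assert!(size_of::<T>() <= size_of::<*const ()>());
        assert!(align_of::<T>() <= align_of::<*const ()>());
        let mut data = MaybeUninit::<*const ()>::uninit();
        // SAFETY: the slot is large enough and suitably aligned for `T`.
        unsafe { data.as_mut_ptr().cast::<T>().write(value) };
        // Constraint 2: `T: Debug`, captured in the vtable.
        let vtable: &'static DebugVTable = &DebugVTable {
            fmt: fmt_impl::<T>,
            drop: drop_impl::<T>,
        };
        DynStarDebug { data, vtable }
    }
}

impl Debug for DynStarDebug {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // SAFETY: `data` holds a valid value of the type the vtable was built for.
        unsafe { (self.vtable.fmt)(self.data.as_ptr().cast::<()>(), f) }
    }
}

impl Drop for DynStarDebug {
    fn drop(&mut self) {
        // SAFETY: the stored value is dropped exactly once, here.
        unsafe { (self.vtable.drop)(self.data.as_mut_ptr().cast::<()>()) }
    }
}
```

Both DynStarDebug::new(22_usize) and DynStarDebug::new(Box::new(String::from("hi"))) work with this type: the first stores the integer inline, and the second stores the Box, whose destructor (run through the vtable) frees the heap memory.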

Converting an impl to a dyn*

Using dyn*, we can convert impl Trait directly to dyn* Trait. This works fine, because dyn* Trait is Sized. To be truly equivalent to impl Trait, you do actually want a lifetime bound, so that the dyn* can represent references too:

// fn print_me(x: impl Debug) {…} becomes
fn print_me(x: dyn* Debug + '_) {
    println!("{x:?}");
}

fn print_me_later(x: dyn* Debug + '_) -> impl FnOnce() + '_ {
    move || println!("{x:?}")
}

These two functions can be directly invoked on a usize (e.g., print_me_later(22) compiles). What’s more, they work on references (e.g., print_me_later(&some_type)) or boxed values (e.g., print_me_later(Box::new(some_type))).

They are also suitable for inclusion in a no-std project, as they don’t directly reference an allocator. Instead, when the dyn* is dropped, we will invoke its destructor from the vtable, which might wind up deallocating memory (but doesn’t have to).

More things are dyn* safe than dyn safe

Many things that were hard for dyn Trait values are trivial for dyn* Trait values:

By-value self methods work fine: a dyn* Trait value is sized, so you can move ownership of it just by copying its bytes.
Returning Self, as in the Clone trait, works fine.
- Similarly, the fact that trait Clone: Sized doesn’t mean that dyn* Clone can’t implement Clone, although it does imply that dyn Clone: Clone cannot hold.
Function arguments of type impl ArgTrait can be converted to dyn* ArgTrait, so long as ArgTrait is dyn*-safe
Returning an impl ArgTrait can return a dyn* ArgTrait.

In short, a large number of the barriers that make traits “not dyn-safe” don’t apply to dyn*. Not all, of course. Traits that take parameters of type Self won’t work (we don’t know that two dyn* Trait types have the same underlying type) and we also can’t support generic methods in many cases (we wouldn’t know how to monomorphize)4.
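For contrast, this is roughly the workaround people write today to get a clonable trait object, because Clone returns Self and so cannot be a supertrait of a dyn-safe trait. Shape, ShapeClone, clone_box, and Square are hypothetical names; the pattern itself is a common idiom, not something taken from this proposal.

```rust
pub trait Shape: ShapeClone {
    fn area(&self) -> f64;
}

// Helper trait: returns a boxed copy instead of `Self`, which keeps `Shape` dyn-safe.
pub trait ShapeClone {
    fn clone_box(&self) -> Box<dyn Shape>;
}

// Every sized, clonable shape gets `clone_box` for free.
impl<T: Shape + Clone + 'static> ShapeClone for T {
    fn clone_box(&self) -> Box<dyn Shape> {
        Box::new(self.clone())
    }
}

// With the helper in place, the box itself can implement `Clone`.
impl Clone for Box<dyn Shape> {
    fn clone(&self) -> Self {
        (**self).clone_box()
    }
}

#[derive(Clone)]
pub struct Square(pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}
```

Under the dyn* sketch, none of this scaffolding should be needed: since the value being cloned is the Box itself, a plain Clone bound would suffice.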

A catch: dyn* Foo requires Box<impl Foo>: Foo and friends

There is one catch from this whole setup, but I like to think of it as an opportunity. In order to create a dyn* Trait from a pointer type like Box<Widget>, you need to know that Box<Widget>: Trait, whereas creating a Box<dyn Trait> just requires knowing that Widget: Trait (this follows directly from the fact that the Box is now part of the hidden type).

At the moment, annoyingly, when you define a trait you don’t automatically get any sort of impls for “pointers to types that implement the trait”. Instead, people often define such impls by hand; for example, the Iterator trait has impls like

impl<I> Iterator for &mut I
where
    I: ?Sized + Iterator

impl<I> Iterator for Box<I>
where
    I: ?Sized + Iterator

Many people forget to define such impls, however, which can be annoying in practice (and not just when using dyn).
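A minimal sketch of what writing those forwarding impls by hand looks like for a user-defined trait. Describe, Widget, and show are hypothetical names used only for illustration.

```rust
pub trait Describe {
    fn describe(&self) -> String;
}

pub struct Widget;

impl Describe for Widget {
    fn describe(&self) -> String {
        String::from("widget")
    }
}

// Forwarding impl for shared references; `?Sized` also covers `&dyn Describe`.
impl<T: Describe + ?Sized> Describe for &T {
    fn describe(&self) -> String {
        (**self).describe()
    }
}

// Forwarding impl for boxes; `?Sized` also covers `Box<dyn Describe>`.
impl<T: Describe + ?Sized> Describe for Box<T> {
    fn describe(&self) -> String {
        (**self).describe()
    }
}

// Accepts the value, a reference to it, or a box of it.
pub fn show(x: impl Describe) -> String {
    x.describe()
}
```

Without the two forwarding impls, show(&Widget) and show(Box::new(Widget)) would both fail to compile, which is the annoyance described above.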

I’m not totally sure the best way to fix this, but I view it as an opportunity because if we can supply such impls, that would make Rust more ergonomic overall.

One interesting thing: the impls for Iterator that you see above include I: ?Sized, which makes them applicable to Box<dyn Iterator>. But with dyn* Iterator, we are starting from a Box<impl Iterator> type — in other words, the ?Sized bound is not necessary, because we are creating our “dyn” abstraction around the pointer, which is sized. (The ?Sized is not harmful, either, of course, and if we auto-generate such impls, we should include it so that they apply to old-style dyn as well as slice types like [u8].)

Another catch: “shared subsets” of traits

One of the cool things about Rust’s Trait design is that it allows you to combine “read-only” and “modifier” methods into one trait, as in this example:

trait WidgetContainer {
    fn num_components(&self);
    fn add_component(&mut self, c: WidgetComponent);
}

I can write a function that takes a &mut dyn WidgetContainer and it will be able to invoke both methods. If that function takes &dyn WidgetContainer instead, it can only invoke num_components.

If we don’t do anything else, this flexibility is going to be lost with dyn*. Imagine that we wish to create a dyn* WidgetContainer from some &impl WidgetContainer type. To do that, we would need an impl of WidgetContainer for &W, but we can’t write that code, at least not without panicking:

impl<W> WidgetContainer for &W
where
    W: WidgetContainer,
{
    fn num_components(&self) {
        W::num_components(self) // OK
    }

    fn add_component(&mut self, c: WidgetComponent) {
        W::add_component(self, c) // Error!
    }
}

This problem is not specific to dyn — imagine I have some code that just invokes num_components but which can be called with a &W or with a Rc<W> or with other such types. It’s kind of awkward for me to write a function like that now: the easiest way is to hardcode that it takes &W and then lean on deref-coercions in the caller.

One idea that tmandry and I have been kicking around is the idea of having “views” on traits. The idea would be that you could write something like T: &WidgetContainer to mean “the &self methods of WidgetContainer”. If you had this idea, then you could certainly have

impl<W> &WidgetContainer for &W
where
    W: WidgetContainer

because you would only need to define num_components (though I would hope you don’t have to write such an impl by hand).

Now, instead of taking a &dyn WidgetContainer, you would take a dyn &WidgetContainer. Similarly, instead of taking an &impl WidgetContainer, you would probably be better off taking an impl &WidgetContainer (this has some other benefits too, as it happens).
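Until something like views exists, the closest approximation in today's Rust is to split the trait by hand, with the &self methods in a supertrait. The sketch below assumes hypothetical names (WidgetContainerView, Panel, count) and has num_components return a usize so there is something to observe.

```rust
pub struct WidgetComponent;

// Plays the role of the hypothetical `&WidgetContainer` view: only `&self` methods.
pub trait WidgetContainerView {
    fn num_components(&self) -> usize;
}

// The full trait adds the `&mut self` methods on top of the view.
pub trait WidgetContainer: WidgetContainerView {
    fn add_component(&mut self, c: WidgetComponent);
}

// This impl is fine: a shared reference is enough for the read-only view.
impl<W: WidgetContainerView + ?Sized> WidgetContainerView for &W {
    fn num_components(&self) -> usize {
        (**self).num_components()
    }
}

pub struct Panel {
    components: Vec<WidgetComponent>,
}

impl Panel {
    pub fn new() -> Self {
        Panel { components: Vec::new() }
    }
}

impl WidgetContainerView for Panel {
    fn num_components(&self) -> usize {
        self.components.len()
    }
}

impl WidgetContainer for Panel {
    fn add_component(&mut self, c: WidgetComponent) {
        self.components.push(c);
    }
}

// Works with a `Panel`, a `&Panel`, or anything else that offers the view.
pub fn count(c: impl WidgetContainerView) -> usize {
    c.num_components()
}
```

The downside is that the trait author has to anticipate the split, whereas a view would be derived from the receiver types of an ordinary trait.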

A third catch: dyn safety sometimes puts constraints on impls, not just the trait itself

Rust’s current design assumes that you have a single trait definition and we can determine from that trait definition whether or not the trait ought to be dyn safe. But sometimes there are constraints around dyn safety that actually don’t affect the trait but only the impls of the trait. That kind of situation doesn’t work well with “implicit dyn safety”: if you determine that the trait is dyn-safe, you have to impose those limitations on its impls, but maybe the trait wasn’t meant to be dyn-safe.

I think overall it would be better if traits explicitly declared their intent to be dyn-safe or not. The most obvious way to do that would be with a declaration like dyn trait:

dyn trait Foo { }

As a nice side benefit, a declaration like this could also auto-generate impls like impl Foo for Box<impl Foo + ?Sized> and so forth. It would also mean that dyn-safety becomes a semver guarantee.

My main concern here is that I suspect most traits could and should be dyn-safe. I think I’d prefer if one had to opt out from dyn safety instead of opting in. I don’t know what the syntax for that would be, of course, and we’d have to deal with backwards compatibility.

Phasing things in over an edition

If we could start over again, I think I would approach dyn like this:

The syntax dyn Trait means a pointer-sized value that implements Trait. Typically a Box or & but sometimes other things.
The syntax dyn[T] Trait means “a value that is layout-compatible with T that implements Trait”; dyn Trait is thus sugar for dyn[*const ()] Trait, which we might write more compactly as dyn* Trait.
The syntax dyn[T..] Trait means “a value that starts with a prefix of T but has unknown size and implements Trait”.
The syntax dyn[..] Trait means “some unknown value of a type that implements Trait”.

Meanwhile, we would extend the grammar of a trait bound with some new capabilities:

A bound like &Trait<P…> refers to “only the &self methods from Trait”;
A bound like &mut Trait<P…> refers to “only the &self and &mut self methods from Trait”;
- Probably this wants to include Pin<&mut Self> too? I’ve not thought about that.
We probably want a way to write a bound like Rc<Trait<P…>> to mean self: Rc<Self> and friends, but I don’t know what that looks like yet. Those kinds of traits are quite unusual.

I would expect that most people would just learn dyn Trait. The use cases for the dyn[] notation are far more specialized and would come later.

Interestingly, we could phase in this syntax in Rust 2024 if we wanted. The idea would be that we move existing uses of dyn to the explicit form in prep for the new edition:

&dyn Trait, for example, would become dyn* Trait + '_
Box<dyn Trait> would become dyn* Trait (note that a 'static bound is implied today; this might be worth reconsidering, but that’s a separate question).
other uses of dyn Trait would become dyn[..] Trait

Then, in Rust 2024, we would rewrite dyn* Trait to just dyn Trait with an “edition idiom lint”.

Conclusion

Whew! This was a long post. Let me summarize what we covered:

If dyn Trait encapsulated some value of pointer size that implements Trait and not some value of unknown size:
- We could expand the set of things that are dyn safe by quite a lot without needing clever hacks:
  - methods that take by-value self: fn into_foo(self, …)
  - methods with parameters of impl Trait type (as long as Trait is dyn safe): fn foo(…, impl Trait, …)
  - methods that return impl Trait values: fn iter(&self) -> impl Iterator
  - methods that return Self types: fn clone(&self) -> Self
That would raise some problems we have to deal with, but all of them are things that would be useful anyway:
- You’d need dyn &Trait and things to “select” sets of methods.
- You’d need a more ergonomic way to ensure that Box<Trait>: Trait and so forth.
We could plausibly transition to this model for Rust 2024 by introducing two syntaxes, dyn* (pointer-sized) and dyn[..] (unknown size) and then changing what dyn means.

There are a number of details to work out, but among the most prominent are:

Should we declare dyn-safe traits explicitly? (I think yes)
- What “bridging” impls should we create when we do so? (e.g., to cover Box<impl Trait>: Trait etc)
How exactly do &Trait bounds work — do you get impls automatically? Do you have to write them?

Appendix A: Going even more crazy: dyn[T] for arbitrary prefixes

dyn* is pretty useful. But we could actually generalize it. You could imagine writing dyn[T] to mean “a value whose layout can be read as T”. What we’ve called dyn* Trait would thus be equivalent to dyn[*const ()] Trait. This more general version allows us to package up larger values; for example, you could write dyn[[usize; 2]] Trait to mean a “two-word value”.

You could even imagine writing dyn[T] where the T meant that you can safely access the underlying value as a T instance. This would give access to common fields that the implementing type must expose or other such things. Systems programming hacks often lean on clever things like this. This would be a bit tricky to reconcile with cases where the T is a type like usize that is just indicating how many bytes of data there are, since if you are going to allow the dyn[T] to be treated like a &mut T the user could go crazy overwriting values in ways that are definitely not valid. So we’d have to think hard about this to make it work, which is why I left it for an appendix.

Appendix B: The “other” big problems with dyn

I think that the designs in this post address a number of the big problems with dyn:

You can’t use it like impl
Lots of useful trait features are not dyn-safe
You have to write ?Sized on impls to make them work

But it leaves a few problems unresolved. One of the biggest to my mind is the interaction with auto traits (and lifetimes, actually). With generic parameters like T: Debug, I don’t have to talk explicitly about whether T is Send or not or whether T contains lifetimes. I can just write a generic type like struct MyWriter<W> where W: Write { w: W, ... }. Users of MyWriter know what W is, so they can determine whether or not MyWriter<Foo>: Send based on whether Foo: Send, and they also can understand that MyWriter<&'a Foo> includes references with the lifetime 'a. In contrast, if we did struct MyWriter { w: dyn* Write, ... }, that dyn* Write type is hiding the underlying data. As Rust currently stands, it implies that MyWriter is not Send and that it does not contain references. We don’t have a good way for MyWriter to declare that it is “send if the writer you gave me is send” and use dyn*. That’s an interesting problem! But orthogonal, I think, from the problems addressed in this blog post.
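The same tension already shows up with Box<dyn Write> today, so it can be sketched without dyn*. ErasedWriter, SendErasedWriter, assert_send, and check are hypothetical names; Box<dyn Write> stands in for the dyn* Write field discussed above.

```rust
use std::io::Write;

// Generic version: `MyWriter<W>` is `Send` exactly when `W` is.
pub struct MyWriter<W: Write> {
    pub w: W,
}

// Erased version: the concrete writer is hidden, so the compiler must
// assume it is not `Send`.
pub struct ErasedWriter {
    pub w: Box<dyn Write>,
}

// The only option today: commit to `Send` up front, for every writer.
pub struct SendErasedWriter {
    pub w: Box<dyn Write + Send>,
}

pub fn assert_send<T: Send>() {}

pub fn check() {
    assert_send::<MyWriter<Vec<u8>>>();
    assert_send::<SendErasedWriter>();
    // assert_send::<ErasedWriter>(); // does not compile: `dyn Write` is not `Send`
}
```

There is no way to write a single erased type that is "Send if the writer is Send"; you have to pick ErasedWriter or SendErasedWriter when you define the struct.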

But, you are thinking, what about alloca? The answer is that alloca isn’t really a good option. For one thing, it doesn’t work on all targets, but in particular it doesn’t work for async functions, which require a fixed size stack frame. It also doesn’t let you return things back up the stack, at least not easily. ↩︎
Also, cramertj apparently had this idea a long time back but we didn’t really understand it. Ah well, sometimes it goes like that — you have to reinvent something to realize how brilliant the original inventor really was. ↩︎
In truth, I also just think “dyn-star” sounds cool. I’ve always been jealous of the A* algorithm and wanted to name something in a similar way. Now’s my chance! Ha ha! ↩︎
Obviously, we would be lifting this partly to accommodate impl Trait arguments. I think we could lift this restriction in more cases but it’s going to take a bit more design. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/03/29/dyn-can-we-make-dyn-sized/

Dare to ask for more #rust2024

Feb 9, 2022 Updated Feb 9, 2022


Last year, we shipped Rust 2021 and I have found the changes to be a real improvement in usability. Even though the actual changes themselves were quite modest, the combination of precise capture closure and simpler formatting strings (println!("{x:?}") instead of println!("{:?}", x)) is making a real difference in my “day to day” life.1 Just like NLL and the new module system from Rust 2018, I’ve quickly adapted to these new conventions. When I go back to older code, with its clunky borrow checker workarounds and format strings, I die a little inside.2

As we enter 2022, I am finding my thoughts turning more and more to the next Rust edition. What do I want from Rust, and the Rust community, over the next few years? To me, the theme that keeps coming to mind is dare to ask for more. Rust has gotten quite a bit nicer to use over the last few years, but I am not satisfied. I believe that there is room for Rust to be 22x more productive3 and easy to use than it is today, and I think we can do it without meaningfully sacrificing reliability, performance, or versatility.

Daring to ask for a more ergonomic, expressive Rust

As Rust usage continues to grow, I have been able to talk to quite a number of Rust users with a wide variety of backgrounds and experience. One of the themes I like to ask about is their experience of learning Rust. In many ways, the story here is much better than I had anticipated. Most people are able to learn Rust and feel productive in 3-6 months. Moreover, once they get used to it, most people seem to really enjoy it, and they talk about how learning ownership rules influences the code they write in other languages too (for the better). They also talk about experiencing far fewer bugs in Rust than in other languages – this is true for C++4, but it’s also true for things written in Java or other languages5.

That said, it’s also quite clear that using Rust has a significant cognitive overhead. Few Rust users feel like true experts6. There are a few topics – “where clauses”, “lifetimes” – that people mention over and over as being confusing. The more I talk to people, the more I get the sense that the problem isn’t any one thing, it’s all the things. It’s having to juggle a lot of concerns all at once, and having to get everything lined up before one can even see your code run.

These interviews really validate the work we did on the ergonomics initiative and also in Rust 2021. One person I spoke to said the following:

Looking backwards, NLL and match ergonomics were major improvements in getting people to learn Rust. A lot of people suddenly found stuff way easier. NLL made a lot of things with regard to mutability much simpler. One remaining thing coming up is disjoint capture of fields in closures. That’s another example where people just didn’t understand, “why is this compiler yelling at me? This should work?”

As happy as I am with those results, I don’t think we’re done. I would like to see progress in two different dimensions:

Fundamental simplifications: These are changes like NLL or disjoint-closure-capture that just change the game in terms of what the compiler can accept. Even though these kinds of changes often make the analysis more complex, they ultimately make the language feel simpler: more of the programs that should work actually do work. Simplifications like this tend not to be particularly controversial, but they are difficult to design and implement. Often they require an edition because of small changes to language semantics in various edge cases.

One of the simplest improvements here would be landing polonius, which would fix #47680, a pattern that I see happening with some regularity. I think that there are also language extensions, like scoped contexts, some kind of view types, specialization, or some way to manage self-referential structs, that could fit in this category. That’s a bit trickier. The language grows, which is not a simplification, but it can make common patterns so much simpler that it’s a net win.

Sanding rough edges. These are changes that just make writing Rust code easier. There are fewer “i’s to dot” or “t’s to cross”. A good example is lifetime elision. You know you are hitting a rough edge when you find yourself blindly following compiler suggestions, or randomly adding an & or a * here or there to see if it will make the compiler happy.

While sanding rough edges can benefit everyone, the impact is largest for newcomers. Experienced folks have a bit of “survival bias”. They tend to know the tricks and apply them automatically. Newcomers don’t have that benefit and can waste quite a lot of time (or just give up entirely) trying to fix some simple compilation errors.

Match ergonomics was a recent change in this category: while I believe it was an improvement, it also gave rise to a number of rough edges, particularly around references to copy types (see #44619 for more discussion). I’d like to see us fix those, and also fix “rough edges” in other areas, like implied bounds.

Daring to ask for a more ergonomic, expressive async Rust

Going along with the previous bullet, I think we still have quite a bit of work to do before using Async Rust feels natural. Tyler Mandry and I recently wrote a post on the “Inside Rust” blog, Async Rust in 2022, that sketched both the way we want async Rust to feel (“just add async”) and the plan to get there.

It seems clear that highly concurrent applications are a key area where Rust shines, so it makes sense for us to continue investing heavily in this area. What’s more, those investments benefit more than just async Rust users. Many of them are fundamental extensions to Rust, like generic associated types 7 or type alias impl trait 8, which ultimately benefit everyone.

Having a truly great async Rust experience, however, is going to require more than language extensions. It’s also going to require better tooling, like tokio console, and more efforts at standardization, like the portability and interoperability effort led by nrc.

Daring to ask for a more ergonomic, expressive unsafe Rust

Strange as it sounds, part of what makes Rust as safe as it is is the fact that Rust supports unsafe code. Unsafe code allows Rust programmers to gain access to the full range of machine capabilities, which is what allows Rust to be versatile. Rust programmers can then use ownership/borrowing to encapsulate those raw capabilities in a safe interface, so that clients of that library can rely on things working correctly.

There are some flies in the unsafe ointment, though. The reality is that writing correct unsafe Rust code can be quite difficult.9 In fact, because we’ve never truly defined the set of rules that unsafe code authors have to follow, you could even say it is literally impossible, since there is no way to know if you are doing it correctly if nobody has defined what correct is.

To be clear, we do have a lot of promising work here! Stacked borrows, for example, looks to be awfully close to a viable approach for the aliasing rules. The rules are implemented in miri and a lot of folks are using that to check their unsafe code. Finally, the unsafe code guidelines effort made good progress on documenting layout guarantees and other aspects of unsafe code, though that work was never RFC’d or made normative. (The issues on that repo also contain a lot of great discussion.)

I think it’s time we paid good attention to the full experience of writing unsafe code. We need to be sure that people can write unsafe Rust abstractions that are correct. This means, yes, that we need to invest in defining the rules they have to follow. I think we also need to invest time in making correct unsafe Rust code more ergonomic to write. Unsafe Rust today often involves a lot of annotations and casts that don’t necessarily add much to the code10. There are also some core features, like method dispatch with a raw pointer, that don’t work, as well as features (like unsafe fields) that would help in ensuring unsafe guarantees are met.

Daring to ask for a richer, more interactive experience from Rust’s tooling

Tooling has a huge impact on the experience of using Rust, both as a learner and as a power user. I maintain that the hassle-free experience of rustup and cargo has done as much for Rust’s adoption as our safety guarantees – maybe more. The quality of the compiler’s error messages comes up in virtually every single conversation I have, and I’ve lost count of how many people cite clippy and rustfmt as a key part of their onboarding process for new developers. Furthermore, after many years of ridiculously hard work, Rust’s IDE support is starting to be really, really good. Major kudos to both the rust-analyzer and IntelliJ Rust teams.

And yet, because I’m greedy, I want more. I want Rust to continue its tradition of “groundbreakingly good” tooling. I want you to be able to write cargo test --debug and have your test failures show up automatically in an omniscient debugger that lets you easily determine what happened11. I want profilers that serve up an approachable analysis of where you are burning CPU or allocating memory. I want it to be trivial to “up your game” when it comes to reliability by applying best practices like analyzing and improving code coverage or using a fuzzer to produce inputs.

I’m especially interested in tooling that changes the “fundamental relationship” between the Rust programmer and their programs. The difference between fixing compilation bugs in a modern Rust IDE and using rustc is a good illustration of this. In an IDE, you have the freedom to pick and choose which errors to fix and in which order, and the IDEs are getting good enough these days that this works quite well. Feedback is swift. This can be a big win.

I think we can do more like this. I would like to see people learning how the borrow checker works by “stepping through” code that doesn’t pass the borrow check, seeing the kinds of memory safety errors that can occur if that code were to execute. Or perhaps “debugging” trait resolution failures or other complex errors in a more interactive fashion. The sky’s the limit.

Daring to ask for richer tooling for unsafe Rust

One area where improved tooling could be particularly important is around “unsafe” Rust. If we really want people to write unsafe Rust code that is correct in practice – and I do! – they are going to need help. Just as with all Rust tooling, I think we need to cover the basics, but I also think we can go beyond that. We definitely need sanitizers, for example, but rather than just detecting errors, we can connect those sanitizers to debuggers and use that error as an opportunity to teach people how stacked borrows works. We can build better testing frameworks that make things like fuzzing and property-based testing easy. And we can offer strong support for formal methods, so that libraries that want to invest the time can give higher levels of assurance (the standard library seems like a good candidate, for example).

Conclusion: we got this

As Rust sees more success, it becomes harder and harder to make changes. There’s more and more Rust code out there and continuity and stability can sometimes be more important than fixing something that’s broken. And even when you do decide to make a change, everybody has opinions about how you should be doing it differently – worse yet, sometimes they’re right.12 It can sometimes be very tempting to say, “Rust is good enough, you don’t want one language for everything anyway” and leave it at that.

For Rust 2024, I don’t want us to do that. I think Rust is awesome. But I think Rust could be awesomer. We definitely shouldn’t go about making changes “just because”, we have to respect the work we’ve done before, and we have to be realistic about the price of churn. But we should be planning and dreaming as though the current crop of Rust programmers is just the beginning – as though the vast majority of Rust programs are yet to be written (which they are).

My hope is that for RustConf 2024, people will be bragging to each other about the hardships they endured back in the day. “Oh yeah,” they’ll say, “I was writing async Rust back in the old days. You had to grab a random crate from crates.io for every little thing you want to do. You want to use an async fn in a trait? Get a crate. You want to write an iterator that can await? Get a crate. People would come to standup after 5 days of hacking and be like ‘I finally got the code to compile!’ And we walked to work uphill in the snow! Both ways! In the summer!”13

So yeah, for Rust 2024, let’s dare to ask for more.14

Footnotes

One interesting change: I’ve been writing more and more code again. This itself is making a big difference in my state of mind, too! ↩︎
Die, I tell you! DIE! ↩︎
Because it’s 2022, get it? ↩︎
I talked to a team that developed some low-level Rust code (what would’ve been written in C++) and they reported experiencing one crash in 3+ years, which originated in an FFI to a C library. That’s just amazing. ↩︎
Most commonly, if Rust has an edge over a language like Java, it is because of our stronger concurrency guarantees. But it’s not only that. It’s also that meeting the required performance bar in other languages often requires one to write code that is “rather clever”. Rust’s higher performance means that one can write simpler code instead, which then has correspondingly fewer bugs. ↩︎
The survey consistently has a peak of around 7 out of 10 in terms of how people self-identify their expertise. ↩︎
Shout out to Jack Huey, tirelessly driving that work forward! ↩︎
Shout out to Oliver Scherer, tirelessly driving that work forward! ↩︎
Armin wrote a recent article, Unsafe Rust is Too Hard, that gives some real-life examples of the kinds of challenges you can encounter. ↩︎
…besides boilerplate. ↩︎
Watch the recording of the pernos.co demo that Felix did for the Rustc Reading Club to get a sense for what is possible here! ↩︎
It’s so much easier when everybody else is wrong. ↩︎
I may have gotten a little carried away there. ↩︎
Hey, that rhymes! I’m a poet, and I didn’t even know it! ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/02/09/dare-to-ask-for-more-rust2024/

Panics vs cancellation, part 1

Jan 27, 2022 Updated Jan 27, 2022

One of the things people often complain about when doing Async Rust is cancellation. This has always been a bit confusing to me, because it seems to me that async cancellation should feel a lot like panics in practice, and people don’t complain about panics very often (though they do sometimes). This post is the start of a short series comparing panics and cancellation, seeking after the answer to the question “Why is async cancellation a pain point and what should we do about it?” This post focuses on explaining Rust’s panic philosophy and explaining why I see panics and cancellation as being quite analogous to one another.

Why panics are discouraged in Rust

Let’s go back to some pre-history. The Rust design has always included panics, but it hasn’t always included the catch_unwind function. In fact, adding that function was quite controversial. Why?

The reason is that long experience with exceptions has shown that exceptions work really well for propagating errors out, but they don’t work well for recovering from errors or handling them in a structured way. The problem is that exceptions make errors invisible, which means that programmers don’t think about them.

The only time when exceptions work well for recovery is when that recovery is done at a very coarse-grained level. If you have a “main loop” of your application and you can kind of catch the exception and restart that main loop, that can be very useful. You see this insight popping up all over the place; I think Erlang did it best, with their “let it crash” philosophy.
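To make the coarse-grained case concrete, here is a minimal sketch of a “main loop” that contains a failing unit of work with std::panic::catch_unwind. The names handle_request and run_main_loop are made up for illustration, and the sketch assumes the default panic = "unwind" setting (with panic = "abort" there is nothing to catch).

```rust
use std::panic;

/// Hypothetical unit of work: panics on one specific input.
fn handle_request(id: u32) -> u32 {
    if id == 2 {
        panic!("request {} hit a bug", id);
    }
    id * 10
}

/// Runs every request, isolating each one behind `catch_unwind`.
/// Returns the results that completed and the number of requests that panicked.
fn run_main_loop(ids: &[u32]) -> (Vec<u32>, usize) {
    let mut completed = Vec::new();
    let mut crashed = 0;
    for &id in ids {
        // The closure only reads `id`, so a panic inside it cannot leave
        // any half-updated state behind for the next iteration to trip over.
        match panic::catch_unwind(|| handle_request(id)) {
            Ok(value) => completed.push(value),
            Err(_) => crashed += 1,
        }
    }
    (completed, crashed)
}
```

Calling run_main_loop(&[1, 2, 3]) keeps going past the request that panics. This only works out because each unit of work shares nothing with the others, which is exactly the coarse-grained situation described above.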

Why exceptions are bad at fine-grained recovery

The reason that exceptions are bad at fine-grained recovery is simple. In most programs, you have some kind of invariants that you are maintaining to ensure your data is in a valid state. It’s relatively straightforward to ensure that these invariants hold at the beginning of every operation and that they hold by the end of every operation. It’s really, really hard to ensure that those invariants hold all the time. Very often, you have some code that wants to make some mutations, put your data in an inconsistent state, and then fix that inconsistency.

Unfortunately, with widespread use of exceptions, what you have is that any piece of code, at any time, might suddenly just abort. So if that function is doing mutation, it could leave the program in an inconsistent state.

Consider this simple pseudocode (inspired by tomaka’s blog post). The idea of this function is that it is going to read from some file, parse the data it reads, and then send that data over a socket:

fn copy_data(from_file: &File, to_socket: &Socket) {
    let buffer = from_file.read();
    let parsed_items = parse(buffer);
    parsed_items.send(to_socket);
}

You might think that since this function doesn’t do any explicit mutation, it would be fine to stop it at any point and re-execute it. But that’s not true: there is some implicit state, which is the cursor in the from_file. If the parse function or the send function were to throw an exception, whatever data had just been read (and maybe parsed) would be lost. The next time the function is invoked, it’s not going to go back and re-read that data, it’s just going to proceed from where it left off, and some data is lost.
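The lost-data scenario can be reproduced in a few lines. This is a sketch under some assumptions: std::io::Cursor stands in for the file, a Vec stands in for the socket, the early exit is modeled with a Result rather than an exception, and parse, copy_chunk, and run are hypothetical names.

```rust
use std::io::{Cursor, Read};

/// Hypothetical parser: rejects any chunk containing a `!` byte.
fn parse(chunk: &[u8]) -> Result<String, String> {
    if chunk.contains(&b'!') {
        return Err("bad chunk".to_string());
    }
    Ok(String::from_utf8_lossy(chunk).into_owned())
}

/// Reads up to 4 bytes, then parses them. If `parse` fails, the bytes
/// have already been consumed from `source` and nobody holds them anymore.
fn copy_chunk(source: &mut impl Read, sink: &mut Vec<String>) -> Result<(), String> {
    let mut buffer = [0u8; 4];
    let n = source.read(&mut buffer).map_err(|e| e.to_string())?;
    let item = parse(&buffer[..n])?; // early exit: the cursor has already advanced
    sink.push(item);
    Ok(())
}

/// Calls `copy_chunk` twice on the same source, as a retrying caller would.
fn run() -> (Vec<String>, usize) {
    let mut source = Cursor::new(b"ab!dwxyz".to_vec());
    let mut sink = Vec::new();
    let mut failures = 0;
    for _ in 0..2 {
        if copy_chunk(&mut source, &mut sink).is_err() {
            failures += 1;
        }
    }
    (sink, failures)
}
```

The first call fails on the chunk ab!d, and the retry picks up at wxyz: the first four bytes never reach the sink, even though no line of copy_chunk looks like it mutates anything important.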

Rust’s compromise

The initial design of Rust included the idea that panic recovery was only possible at the thread boundary. The idea was that threads own all of their state, so if a thread panicked, you would take down the thread, and with it all of the potentially corrupted state. In this way, recovery could be done with some reasonable assurance of success. There are some limits to this idea. For one thing, threads can share state. The most obvious way for that to happen is with a Mutex, but – as the copy_data example shows – you can also have problems when you are communicating (reading from a file, sending messages over a channel, etc). We have extra mechanisms to help with those cases, such as lock poisoning, but the jury is out on how well they work.1
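For readers who haven’t run into lock poisoning, here is a small sketch of what it looks like with std::sync::Mutex. The helper name poison_with_partial_update is made up; the Mutex, PoisonError, and is_poisoned behavior is the standard library’s.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns a worker that panics while holding the lock, after making
/// only the first half of a two-step update.
fn poison_with_partial_update(shared: &Arc<Mutex<Vec<u32>>>) {
    let worker_copy = Arc::clone(shared);
    let result = thread::spawn(move || {
        let mut guard = worker_copy.lock().unwrap();
        guard.push(1); // first half of the update
        panic!("worker died before pushing the second element");
    })
    .join();
    // The panic surfaces at the thread boundary as an `Err`.
    assert!(result.is_err());
}

/// Returns what a later locker sees: whether the lock is poisoned,
/// plus the data it chose to read anyway.
fn inspect_after_panic() -> (bool, Vec<u32>) {
    let shared = Arc::new(Mutex::new(Vec::new()));
    poison_with_partial_update(&shared);
    let poisoned = shared.is_poisoned();
    let data = match shared.lock() {
        Ok(guard) => guard.clone(),
        // Opting in explicitly to look at possibly inconsistent data.
        Err(err) => err.into_inner().clone(),
    };
    (poisoned, data)
}
```

After the worker panics, every later lock() call returns an Err, so the half-finished update cannot be observed by accident; into_inner is the escape hatch for callers who know the data is still usable.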

Why ? is good

All of this discussion of course begs the question, how is one supposed to handle error recovery in Rust? The answer, of course, is the ? operator. This operator desugars into a pattern match, but it has the effect of “propagating” the error to the caller of the function. If we look at the copy_data function one more time, but imagine that any potential errors were propagated using results, it would look like:

fn copy_data(from_file: &File, to_socket: &Socket) -> eyre::Result<()> {
    let buffer = from_file.read()?;
    let parsed_items = parse(buffer);
    parsed_items.send(to_socket)?;
    Ok(())
}

The nice thing about this code is that one can easily see and audit potential errors: for example, I can see that send may result in an error, and a sharp-eyed reviewer might see the potential data loss.2 Even better, I can do some sort of recovery in the case of error by opting not to forward the error but matching instead. (Note that the send methods typically pass back the message in the event of an error.)

fn copy_data(from_file: &File, to_socket: &Socket) -> eyre::Result<()> {
    let buffer = from_file.read()?;
    let parsed_items = parse(buffer);
    match parsed_items.send(to_socket) {
        Ok(()) => (),
        Err(SendError(parsed_items)) => recover_from_error(parsed_items),
    }
    Ok(())
}
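The parenthetical about send handing the message back is easy to check with the standard library’s channels. In this sketch, std::sync::mpsc stands in for the socket and send_or_keep is a hypothetical helper; SendError carrying the unsent value is real std behavior.

```rust
use std::sync::mpsc::{self, SendError};

/// Tries to send `item`. If the receiver is gone, hands the item back
/// to the caller rather than losing it.
fn send_or_keep(tx: &mpsc::Sender<String>, item: String) -> Option<String> {
    match tx.send(item) {
        Ok(()) => None,
        // `SendError` wraps the value that could not be delivered.
        Err(SendError(unsent)) => Some(unsent),
    }
}

/// Sends once while the receiver is alive and once after it is dropped.
fn demo() -> (Option<String>, Option<String>) {
    let (tx, rx) = mpsc::channel();
    let first = send_or_keep(&tx, "first".to_string());
    drop(rx); // the receiving side hangs up
    let second = send_or_keep(&tx, "second".to_string());
    (first, second)
}
```

Because the error owns the message, the caller gets to decide what recovery means: retry on another channel, write it to disk, or drop it on purpose.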

How does this connect to async cancellation?

I said that, from a user’s perspective, it seems to me that async cancellation and Rust panics should feel very similar. Let me explain.

It sometimes happens that you have spawned a future whose result is no longer needed. For example, you may be running a server that is doing work on behalf of a client, but that client may drop its connection, in which case you’d like to cancel that work.

In Rust, our cancellation story is centered around dropping. The idea is that to cancel a future, you drop it. Whenever you drop any kind of value in Rust, the value’s destructor runs which has the job of disposing of whatever resources that value owns. In the case of a future, the values that it owns are the suspended variables from the stack frame. Consider that same copy_data function we saw earlier, but ported to async Rust:

async fn copy_data(from_file: &File, to_socket: &Socket) {
    let buffer = from_file.read().await;
    let parsed_items = parse(buffer);
    parsed_items.send(to_socket).await;
}

Suppose that, at some point, we pause the program at the final line, parsed_items.send(...).await. In that case, the future would be storing the value of buffer and parsed_items. So when the future is dropped, those values will be dropped.

In effect, if you look at things from the “inside view” of the async fn, cancellation looks like the await call panicking – it unwinds the stack, running the destructors for all values. The analogy, of course, only goes so far: you can’t, for example, “catch” the unwinding from a cancellation. Also, panics arise from code that the thread executed, but cancellations are injected from the outside when the async fn’s result is no longer needed.3
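Here is a small, executor-free sketch of that “inside view”. It polls an async fn once by hand, then drops the future and observes that the suspended local’s destructor ran. Tracked and NoopWake are names I made up, and std::future::pending stands in for a send that never completes.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for `parsed_items`: records when its destructor runs.
struct Tracked(Arc<AtomicBool>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Suspends forever at its await point while holding a `Tracked` value.
async fn copy_data(dropped: Arc<AtomicBool>) {
    let _parsed_items = Tracked(dropped);
    std::future::pending::<()>().await; // stands in for a `send` that never finishes
}

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `copy_data` once, then cancels it by dropping the future.
/// Returns whether the local had been dropped (before cancel, after cancel).
fn poll_then_cancel() -> (bool, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let mut future = Box::pin(copy_data(Arc::clone(&dropped)));

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(future.as_mut().poll(&mut cx), Poll::Pending));

    let before = dropped.load(Ordering::SeqCst);
    drop(future); // cancellation: the suspended locals are destroyed here
    let after = dropped.load(Ordering::SeqCst);
    (before, after)
}
```

Nothing inside copy_data gets a chance to react: from its point of view, execution stops at the await and the destructors run, much as they would during unwinding.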

Next time

In the next post I plan to start looking at examples of async cancellation in practice, trying to pinpoint how it is used and why it seems to cause more problems than panic.

Thanks

Thanks to Aaron Turon, Yoshua Wuyts, Yehuda Katz, and others with whom I’ve deep dived on this topic over the years, and to tomaka for their blog post.

Footnotes

My take is that the concept behind lock poisoning still seems good to me, but the ergonomics of how we implemented it are bad, and make people not like it. That said, I’d like to dig more into this: I’ve been hearing from various people that – even in their limited form – panics are one of the weaker points in Rust’s reliability story, and I’m not yet sure what to think. ↩︎
My experience is that these bugs are hard to spot in review, but that the ? operator is invaluable when debugging – in that case, you are asking the question, “how could this function possibly return early?”, and having the ? operator really helps you find the answer. ↩︎
This could be a crucial difference: I think, for example, it’s the reason that Java deprecated its Thread.stop method. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/01/27/panics-vs-cancellation-part-1/

Dyn async traits, part 7: a design emerges?

Jan 7, 2022 Updated Jan 8, 2022

Hi all! Welcome to 2022! Towards the end of last year, Tyler Mandry and I were doing a lot of iteration around supporting “dyn async trait” – i.e., making traits that use async fn dyn safe – and we’re starting to feel pretty good about our design. This is the start of several blog posts talking about where we’re at. In this first post, I’m going to reiterate our goals and give a high-level outline of the design. The next few posts will dive more into the details and the next steps.

The goal: traits with async fn that work “just like normal”

It’s been a while since my last post about dyn trait, so let’s start by reviewing the overall goal: our mission is to allow async fn to be used in traits just like fn. For example, we would like to have an async version of the Iterator trait that looks roughly like this1:

trait AsyncIterator {
    type Item;
    
    async fn next(&mut self) -> Option<Self::Item>;
}

You should be able to use this AsyncIterator trait in all the ways you would use any other trait. Naturally, static dispatch and impl Trait should work:

async fn sum_static(mut v: impl AsyncIterator<Item = u32>) -> u32 {
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}

But dynamic dispatch should work too:

async fn sum_dyn(v: &mut dyn AsyncIterator<Item = u32>) -> u32 {
    //               ^^^
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}

Another goal: leave dyn cleaner than we found it

While we started out with the goal of improving async fn, we’ve also had a general interest in making dyn Trait more usable overall. There are a few reasons for this. To start, async fn is itself just sugar for a function that returns impl Trait, so making async fn in traits work is equivalent to making RPITIT (“return position impl trait in traits”) work. But also, the existing dyn Trait design contains a number of limitations that can be pretty frustrating, and so we would like a design that improves as many of those as possible. Currently, our plan lifts the following limitations, so that traits which make use of these features would still be compatible with dyn:

Return position impl Trait, so long as Trait is dyn safe.
- e.g., fn get_widgets(&self) -> impl Iterator<Item = Widget>
- As discussed above, this means that async fn works, since it desugars to a function that returns impl Future
Argument position impl Trait, so long as Trait is dyn safe.
- e.g., fn process_widgets(&mut self, items: impl Iterator<Item = Widget>).
By-value self methods.
- e.g., given fn process(self) and d: Box<dyn Trait>, able to call d.process()
- eventually this would be extended to other “box-like” smart pointers
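For the by-value self item, it helps to see what works today. This sketch is not the proposed feature, just the current workaround: a method that consumes the receiver can be called on a boxed trait object if the receiver is spelled self: Box<Self>. Job, Print, and run_boxed are hypothetical names.

```rust
/// Hypothetical trait whose method consumes the receiver.
trait Job {
    // Today, a consuming method can be called on a `Box<dyn Job>`
    // when the receiver is written as `Box<Self>` rather than plain `self`.
    fn run(self: Box<Self>) -> String;
}

struct Print(String);

impl Job for Print {
    fn run(self: Box<Self>) -> String {
        // `*self` moves the `Print` out of the box, so its field can be moved too.
        let Print(text) = *self;
        format!("printed {}", text)
    }
}

/// Dynamic dispatch on a method that takes ownership of the receiver.
fn run_boxed(job: Box<dyn Job>) -> String {
    job.run()
}
```

The cost is that the trait author has to commit to Box in the signature, and every static-dispatch caller now has to box as well. Lifting that restriction is what the last item in the list is about.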

If you put all three of those together, it represents a pretty large expansion to what dyn safety feels like in Rust. Here is an example trait that would now be dyn safe that uses all of these things together in a natural way:

trait Widget {
    async fn augment(&mut self, component: impl Into<WidgetComponent>);
    fn components(&self) -> impl Iterator<Item = WidgetComponent>;
    async fn transmit(self, factory: impl Factory);
}

Final goal: works without an allocator, too, though you have to work a bit harder

The most straightforward way to support RPITIT is to allocate a Box to store the return value. Most of the time, this is just fine. But there are use-cases where it’s not a good choice:

In a kernel, where you would like to use a custom allocator.
In a tight loop, where the performance cost of an allocation is too high.
Extreme embedded cases, where you have no allocator at all.

Therefore, we would like to ensure that it is possible to use a trait that uses async fns or RPITIT without requiring an allocator, though we think it’s ok for that to require a bit more work. Here are some alternative strategies one might want to support:

Pre-allocating stack space: when you create the dyn Trait, you reserve some space on the stack to store any futures or impl Trait that it might return.
Caching: reuse the same Box over and over to reduce the performance impact (a good allocator would do this for you, but not all systems ship with efficient allocators).
Sealed trait: you derive a wrapper enum for just the types that you need.
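To give a feel for the last strategy, here is a hand-written version of the wrapper enum, which is the kind of thing a derive could generate. It is deliberately simplified: the trait is synchronous, and Source, Countdown, Repeat, and AnySource are all invented names. The point is only that dispatch goes through a match, so no vtable and no allocation are involved.

```rust
/// Hypothetical trait that would otherwise be used as `dyn Source`.
trait Source {
    fn next_item(&mut self) -> Option<u32>;
}

struct Countdown(u32);

struct Repeat {
    value: u32,
    remaining: u32,
}

impl Source for Countdown {
    fn next_item(&mut self) -> Option<u32> {
        if self.0 == 0 {
            return None;
        }
        let current = self.0;
        self.0 -= 1;
        Some(current)
    }
}

impl Source for Repeat {
    fn next_item(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.value)
    }
}

/// The closed set of implementors, wrapped in one sized type.
enum AnySource {
    Countdown(Countdown),
    Repeat(Repeat),
}

impl Source for AnySource {
    fn next_item(&mut self) -> Option<u32> {
        // Dispatch is a plain `match`; every callee type is known statically.
        match self {
            AnySource::Countdown(s) => s.next_item(),
            AnySource::Repeat(s) => s.next_item(),
        }
    }
}

fn sum(mut source: AnySource) -> u32 {
    let mut total = 0;
    while let Some(item) = source.next_item() {
        total += item;
    }
    total
}
```

The tradeoff is the obvious one: the set of types is fixed at the point where the enum is defined, which is why this is described as a sealed trait.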

Ultimately, though, there is no limit to the number of ways that one might manage dynamic dispatch, so the goal is not to have a “built-in” set of strategies but rather allow people to develop their own using procedural macros. We can then offer the most common strategies in utility crates or perhaps even in the stdlib, while also allowing people to develop their own if they have very particular needs.

The design from 22,222 feet

I’ve drawn a little diagram to illustrate how our design works at a high-level:

[Diagram: the path of a call made through the vtable. Boxes: Caller; Argument adaptation to vtable; Vtable; Argument adaptation from vtable; Normal function found in the impl; Return value adaptation to vtable; Return type adaptation from vtable.]

Caller knows:

Types of impl Trait arguments.

Caller does not know:

Type of the callee.
Precise return type, if function returns impl Trait.

Callee knows:

Type of the callee.
Precise return type, if function returns impl Trait.

Callee does not know:

Types of impl Trait arguments.

Let’s walk through it:

To start, we have the caller, which has access to some kind of dyn trait, such as w: &mut dyn Widget, and wishes to call a method, like w.augment()
The caller looks up the function for augment in the vtable and calls it:
- But wait, augment takes an impl Into<WidgetComponent>, which means that it is a generic function. Normally, we would have a separate copy of this function for every Into type! But we must have only a single copy for the vtable! What do we do?
- The answer is that the vtable encodes a copy that expects “some kind of pointer to a dyn Into<WidgetComponent>”. This could be a Box but it could also be other kinds of pointers: I’m being hand-wavy for now, I’ll go into the details later.
- The caller therefore has the job of creating a “pointer to a dyn Into<WidgetComponent>”. It can do this because it knows the type of the value being provided; in this case, it would do it by allocating some memory space on the stack.
The vtable, meanwhile, includes a pointer to the right function to call. But it’s not a direct pointer to the function from the impl: it’s a lightweight shim that wraps that function. This shim has the job of converting from the vtable’s ABI into the standard ABI used for static dispatch.
When the function returns, meanwhile, it is giving back some kind of future. The callee knows that type, but the caller doesn’t. Therefore, the callee has the job of converting it to “some kind of pointer to a dyn Future” and returning that pointer to the caller.
- The default is to box it, but the callee can customize this to use other strategies.
The caller gets back its “pointer to a dyn Future” and is able to await that, even though it doesn’t know exactly what sort of future it is.
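The return-value half of this walkthrough can be approximated by hand in today’s Rust. The sketch below is my own stand-in, not the mechanism the design would generate: it fixes “some kind of pointer to a dyn Future” to a Pin<Box<dyn Future>>, and DynAsyncIterator, Counter, NoopWake, and block_on are invented for illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// "Some kind of pointer to a dyn Future", fixed here to a `Box`.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

/// Hand-written stand-in for a dyn-compatible form of `AsyncIterator`.
trait DynAsyncIterator {
    type Item;
    fn next(&mut self) -> BoxFuture<'_, Option<Self::Item>>;
}

struct Counter {
    current: u32,
    end: u32,
}

impl DynAsyncIterator for Counter {
    type Item = u32;

    fn next(&mut self) -> BoxFuture<'_, Option<u32>> {
        // The callee knows the concrete future type and erases it by boxing.
        Box::pin(async move {
            if self.current < self.end {
                self.current += 1;
                Some(self.current)
            } else {
                None
            }
        })
    }
}

/// The caller awaits the erased future without knowing its concrete type.
async fn sum_dyn(v: &mut dyn DynAsyncIterator<Item = u32>) -> u32 {
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}

/// A waker that does nothing; enough for futures that never wait on I/O.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls in a loop until the future is ready.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

With a Counter { current: 0, end: 3 }, block_on(sum_dyn(&mut counter)) sums the items 1, 2, and 3. Note that the boxing is baked into the trait signature here, so every caller pays for it; the design described above moves that choice to the callee’s side of the vtable instead.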

Upcoming posts

In upcoming blog posts, I’m going to expand on several things that I alluded to in my walkthrough:

“Pointer to a dyn Trait”:
- How exactly do we encode “some kind of pointer” and what does that mean?
- This is really key, because we need to be able to support
Adaptation for impl Trait arguments:
- How do we adapt to/from the vtable for arguments of generic type?
- Hint: it involves creating a dyn Trait for the argument
Adaptation for impl trait return values:
- How do we adapt to/from the vtable for return values of impl Trait type?
- Hint: it involves returning a dyn Trait, potentially boxed but not necessarily
Adaptation for by-value self:
- How do we adapt to/from the vtable for by-value self, and when are such functions callable?
Boxing and alternatives thereto:
- When you call an async fn or fn that returns impl Trait via dynamic dispatch, the default behavior is going to allocate a Box, but we’ve seen that doesn’t work for everyone. How convenient can we make it to select an alternative strategy like stack pre-allocation, and how can people create their own strategies?

We’ll also be updating the async fundamentals initiative page with more detailed design docs.

Appendix: Things I’d still like to see

I’m pretty excited about where we’re landing in this round of work, but it doesn’t get dyn where I ultimately want it to be. My ultimate goal is that people are able to use dynamic dispatch as conveniently as you use impl Trait, but I’m not entirely sure how to get there. That means being able to write function signatures that don’t talk about Box vs & or other details that you don’t have to deal with when you talk about impl Trait. It also means not having to worry so much about Send/Sync and lifetimes.

Here are some of the improvements I would like to see, if we can figure out how:

Support clone:
- Given trait Widget: Clone and w: Box<dyn Widget>, able to invoke w.clone()
- This almost works, but the fact that trait Clone: Sized makes it difficult.
Support “partially dyn safe” traits:
- Right now, dyn safe is all or nothing. This has the nice implication that dyn Foo: Foo for all types. However, it is also limiting, and many people have told me they find it confusing. Moreover, dyn Foo is not Sized, and hence while it’s cool conceptually that dyn Foo implements Foo, you can’t actually use a dyn Foo in the same way that you would use most other types.
Improve how Send interacts with returned values (e.g., RPIT, async fn in traits, etc):
- If you write dyn Foo + Send, that
Avoid having to talk about pointers so much
- When you use impl Trait, you get a really ergonomic experience today:
 - fn apply_map(map_fn: impl FnMut(u32) -> u32)
 - fn items(&self) -> impl Iterator<Item = Item> + '_
- In contrast, when you use dyn trait, you wind up having to be very explicit around lots of details, and your callers have to change as well:
 - fn apply_map(map_fn: &mut dyn FnMut(u32) -> u32)
 - fn items(&self) -> Box<dyn Iterator<Item = Item> + '_>
Make dyn trait feel more parametric:
- If I have a struct Foo<T: Trait> { t: Box<T> }, it has the nice property that it exposes the T. This means we know that Foo<T>: Send if T: Send (assuming Foo doesn’t have any fields that are not send), we know that Foo<T>: 'static if T: 'static, and so forth. This is very cool.
- In contrast, struct Foo { t: Box<dyn Trait> } bakes in a lot of details – it doesn’t permit t to contain any references, and it doesn’t let Foo be Send.
Make it sound:
- There are a few open soundness bugs around dyn trait, such as #57893, and I would like to close them. This interacts with other things in this list.
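On the “Support clone” item in the list above: the usual workaround today is a helper method that clones into a new box. This is a sketch of that pattern, not of any planned language feature; Widget here is a hypothetical trait unrelated to the earlier example, and clone_box and Button are invented names.

```rust
/// Hypothetical trait. Writing `trait Widget: Clone` would rule out
/// `dyn Widget`, so cloning goes through a boxed helper method.
trait Widget {
    fn name(&self) -> String;
    fn clone_box(&self) -> Box<dyn Widget>;
}

#[derive(Clone)]
struct Button {
    label: String,
}

impl Widget for Button {
    fn name(&self) -> String {
        format!("button:{}", self.label)
    }

    fn clone_box(&self) -> Box<dyn Widget> {
        // The concrete type is known here, so the ordinary `Clone` applies.
        Box::new(self.clone())
    }
}

// With this impl in place, `w.clone()` works on a `Box<dyn Widget>`.
impl Clone for Box<dyn Widget> {
    fn clone(&self) -> Self {
        // Deref down to the trait object so this calls `clone_box`
        // on the underlying widget rather than recursing.
        (**self).clone_box()
    }
}
```

It works, but every implementor has to write the same clone_box boilerplate, which is the kind of friction I’d like to remove.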

This has traditionally been called Stream. ↩︎

https://smallcultfollowing.com/babysteps/blog/2022/01/07/dyn-async-traits-part-7/

Rustc Reading Club, Take 2

Nov 18, 2021 Updated Nov 18, 2021

Wow! The response to the last Rustc Reading Club was overwhelming – literally! We maxed out the number of potential zoom attendees and I couldn’t even join the call! It’s clear that there’s a lot of demand here, which is great. We’ve decided to take another stab at running the Rustc Reading Club, but we’re going to try it a bit differently this time. We’re going to start by selecting a smaller group to do it a few times and see how it goes, and then decide how to scale up.

The ask

Here is what we want from you. If you are interested in the Rustc Reading Club, sign up using the form below!

Rustc reading club signup form

Start small…

As Doc Jones announced in her post, we’re going to hold our second meeting on December 2, 2021 at 12PM EST (see in your timezone). Read her post for all the details on how that’s going to work! To avoid a repeat of last time, this meeting will be invite only – we’re going to “hand select” about 10-15 people from the folks who sign up, looking for a range of experience and interests. The reason for this is that we want to try out the idea with a smaller group and see how it goes.

…and scale!

Presuming the club is a success, we would love to have more active clubs going on. My expectation is that we will have a number of rustc reading clubs of different kinds and flavors – for example, a recorded club, or a club that is held on Zulip instead of Zoom, or clubs in other languages.1 As we try out new ideas, we’ll make sure to reach out to people who signed up on the google form, so please do sign up if you are interested!

In fact, if you’re really excited, you don’t need to wait for us – just create a zoom room and invite your friends to read some code! Or leave a message in #rustc-reading-club on zulip, I bet you’d find some takers. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/11/18/rustc-reading-club-take-2/

CTCFT 2021-11-22 Agenda

Nov 15, 2021 Updated Nov 15, 2021

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place next Monday, on 2021-11-22 at 11am US Eastern Time (click to see in your time zone). Note that this is a new time: we are experimenting with rotating in an earlier time that occurs during the European workday. This post covers the agenda. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Agenda

This meeting we’ve invited some of the people working to integrate Rust into the Linux kernel to come and speak. We’ve asked them to give us a feel for how the integration works and help identify those places where the experience is rough. The expectation is that we can use this feedback as an input when deciding what work to pursue and what features to prioritize for stabilization.

(5 min) Opening remarks 👋 (nikomatsakis)
(40 min) Rust for Linux (ojeda, alex, wedsonaf)
- The Rust for Linux project is adding Rust support to the Linux kernel. While it is still the early days, there are some areas of the Rust language, library, and tooling where the Rust project might be able to help out - for instance, via stabilization of features, suggesting ways to tackle particular problems, and more. This talk will walk through the issues found, along with examples where applicable.
(5 min) Closing (nikomatsakis)

Afterwards: Social Hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

https://smallcultfollowing.com/babysteps/blog/2021/11/15/ctcft-2021-11-22-agenda/

View types for Rust

Nov 5, 2021 Updated Nov 5, 2021

I wanted to write about an idea that’s been kicking around in the back of my mind for some time. I call it view types. The basic idea is to give a way for an &mut or & reference to identify which fields it is actually going to access. The main use case for this is having “disjoint” methods that don’t interfere with one another.

This is not a proposal (yet?)

To be clear, this isn’t an RFC or a proposal, at least not yet. It’s some early stage ideas that I wanted to document. I’d love to hear reactions and thoughts, as I discuss in the conclusion.

Running example

As a running example, consider this struct WonkaShipmentManifest. It combines a vector bars of ChocolateBars and a list golden_tickets of indices for bars that should receive a ticket.

struct WonkaShipmentManifest {
    bars: Vec<ChocolateBar>,
    golden_tickets: Vec<usize>,
}

Now suppose we want to iterate over those bars and put them into their packaging. Along the way, we’ll insert a golden ticket. To start, we write a little function that checks whether a given bar should receive a golden ticket:

impl WonkaShipmentManifest {
    fn should_insert_ticket(&self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
}

Next, we write the loop that iterates over the chocolate bars and prepares them for shipment:

impl WonkaShipmentManifest {
    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        for (bar, i) in self.bars.into_iter().zip(0..) {
            let opt_ticket = if self.should_insert_ticket(i) {
                Some(GoldenTicket::new())
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}

Satisfied with our code, we sit back and fire up the compiler and, wait… what’s this?

error[E0382]: borrow of partially moved value: `self`
   --> src/lib.rs:16:33
    |
15  |         for (bar, i) in self.bars.into_iter().zip(0..) {
    |                                   ----------- `self.bars` partially moved due to this method call
16  |             let opt_ticket = if self.should_insert_ticket(i) {
    |                                 ^^^^ value borrowed here after partial move
    |

Well, the message makes sense, but it’s unnecessary! The compiler is concerned because we are borrowing self when we’ve already moved out of the field self.bars, but we know that should_insert_ticket is only going to look at self.golden_tickets, and that value is still intact. So there’s not a real conflict here.

Still, thinking on it more, you can see why the compiler is complaining. It only looks at one function at a time, so how would it know what fields should_insert_ticket is going to read? And, even if it were to look at the body of should_insert_ticket, maybe it’s reasonable to give a warning for future-proofing. Without knowing more about our plans here at Wonka Inc., it’s reasonable to assume that future code authors may modify should_insert_ticket to look at self.bars or any other field. This is part of the reason that Rust does its analysis on a per-function basis: checking each function independently gives room for other functions to change, so long as they don’t change their signature, without disturbing their callers.

What we need, then, is a way for should_insert_ticket to describe to its callers which fields it may use and which ones it won’t. Then the caller could permit invoking should_insert_ticket whenever the field self.golden_tickets is accessible, even if other fields are borrowed or have been moved.
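Until something like that exists, one workaround in today’s Rust is to skip the method and hand the helper just the field it needs. Here is a minimal sketch of that; the stand-in types (ChocolateBar, GoldenTicket, WrappedChocolateBar) and the free-function form of should_insert_ticket are simplified assumptions for illustration:

```rust
// Hypothetical stand-ins for the types in the example.
struct ChocolateBar;
struct GoldenTicket;
struct WrappedChocolateBar {
    ticket: Option<GoldenTicket>,
}

impl ChocolateBar {
    fn into_wrapped(self, ticket: Option<GoldenTicket>) -> WrappedChocolateBar {
        WrappedChocolateBar { ticket }
    }
}

struct WonkaShipmentManifest {
    bars: Vec<ChocolateBar>,
    golden_tickets: Vec<usize>,
}

// A free function that takes only the field it reads, so the signature
// itself tells the caller that `bars` is never touched.
fn should_insert_ticket(golden_tickets: &[usize], index: usize) -> bool {
    golden_tickets.contains(&index)
}

impl WonkaShipmentManifest {
    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        // `self.bars` is moved here...
        for (bar, i) in self.bars.into_iter().zip(0..) {
            // ...but borrowing the sibling field directly is still allowed,
            // because the borrow checker tracks fields within one function.
            let opt_ticket = if should_insert_ticket(&self.golden_tickets, i) {
                Some(GoldenTicket)
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}
```

This compiles, but it gives up method-call syntax and forces every caller to know which fields to pass, which is exactly the bookkeeping a view type would move into the signature.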

An idea

When I’ve thought about this problem in the past, I’ve usually imagined that the list of “fields that may be accessed” would be attached to the reference. But that’s a bit odd, because a reference type &mut T doesn’t itself have any fields. The fields come from T.

So recently I was thinking, what if we had a view type? I’ll write it {place1, ..., placeN} T for now. What it means is “an instance of T, but where only the paths place1...placeN are accessible”. Like other types, view types can be borrowed. In our example, then, &{golden_tickets} WonkaShipmentManifest would describe a reference to WonkaShipmentManifest which only gives access to the golden_tickets field.

Creating a view

We could use some syntax like {place1, ..., placeN} expr to create a view type1. This would be a place expression, which means that it refers to a specific place in memory. This means that it can be directly borrowed without creating a temporary. So I can create a view onto self that only has access to golden_tickets like so:

impl WonkaShipmentManifest {
    fn example_a(&mut self) {
        let self1 = &{golden_tickets} self;
        println!("tickets = {:#?}", self1.golden_tickets);
    }
}

Notice the distinction between &self.golden_tickets and &{golden_tickets} self. The former borrows the field directly. The latter borrows the entire struct, but only gives access to one field. What happens if you try to access another field? An error, of course:

impl WonkaShipmentManifest {
    fn example_b(&mut self) {
        let self1 = &{golden_tickets} self;
        println!("tickets = {:#?}", self1.golden_tickets);
        for bar in &self1.bars {
            //      ^^^^^^^^^^
            // Error: self1 does not have access to `bars`
        }
    }
}

Of course, when a view is active, you can still access other fields through the original path, without disturbing the borrow:

impl WonkaShipmentManifest {
    fn example_c(&mut self) {
        let self1 = &{golden_tickets} self;
        
        for bar in &mut self.bars {
            println!("tickets = {:#?}", self1.golden_tickets);
        }
    }
}

And, naturally, that access includes the ability to create multiple views at once, so long as they have disjoint paths:

impl WonkaShipmentManifest {
    fn example_d(&mut self) {
        let self1 = &{golden_tickets} self;
        let self2 = &mut {bars} self;
        
        for bar in &mut self2.bars {
            println!("tickets = {:#?}", self1.golden_tickets);
            bar.modify();
        }
    }
}

View types in methods

As example C in the previous section suggested, we can use a view type in our definition of should_insert_ticket to specify which fields it will use:

impl WonkaShipmentManifest {
    fn should_insert_ticket(&{golden_tickets} self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
}

As a result of doing this, we can successfully compile the prepare_shipment function:

impl WonkaShipmentManifest {
    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        for (bar, i) in self.bars.into_iter().zip(0..) {
            //          ^^^^^^^^^^^^^^^^^^^^^
            // Moving out of `self.bars` here....
            let opt_ticket = if self.should_insert_ticket(i) {
                //              ^^^^
                // ...does not conflict with borrowing a
                // view of `{golden_tickets}` from `self` here.
                Some(GoldenTicket::new())
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}

View types with access modes

All my examples so far were with “shared” views through & references. We could of course say that &mut {bars} WonkaShipmentManifest gives mutable access to the field bars, but it might also be nice to have an explicit mut mode, such that you write &mut {mut bars} WonkaShipmentManifest. This is more verbose, but it permits one to give away a mix of “shared” and “mut” access:

impl WonkaShipmentManifest {
    fn add_ticket(&mut {bars, mut golden_tickets} self, index: usize) {
        //              ^^^^  ^^^^^^^^^^^^^^^^^^^
        //              |     mut access to golden-tickets
        //              shared access to bars
        assert!(index < self.bars.len());
        self.golden_tickets.push(index);
    }
}

One could invoke add_ticket even if you had existing borrows to bars:

fn foo() {
    let mut manifest = WonkaShipmentManifest { bars, golden_tickets };
    let bar0 = &manifest.bars[0];
    //         ^^^^^^^^^^^^^^ shared borrow of `manifest.bars`...
    manifest.add_ticket(22);
    //      ^ borrows `self` mutably, but with view
    //        `{bars, mut golden_tickets}`
    println!("debug: {:?}", bar0);
}

View types and ownership

I’ve always shown view types with references, but combining them with ownership makes for other interesting possibilities. For example, suppose I wanted to extend GoldenTicket with some kind of unique serial_number that should never change, along with an owner field that will be mutated over time. For various reasons2, I might like to make the fields of GoldenTicket public:

pub struct GoldenTicket {
    pub serial_number: usize,
    pub owner: Option<String>,
}

impl GoldenTicket {
    pub fn new() -> Self {
        Self { .. }
    }
}

However, if I do that, then nothing stops future owners of a GoldenTicket from altering its serial_number:

let mut t = GoldenTicket::new();
t.serial_number += 1; // uh-oh!

The best answer today is to use a private field and an accessor:

pub struct GoldenTicket {
    serial_number: usize, // private: readable only through the accessor below
    pub owner: Option<String>,
}

impl GoldenTicket {
    pub fn new() -> Self {
        
    }
    
    pub fn serial_number(&self) -> usize {
        self.serial_number
    }
}

However, Rust’s design kind of discourages accessors. For one thing, the borrow checker doesn’t know which fields are used by an accessor, so if you have code like this, you will now get annoying errors (this has been the theme of this whole post, of course):

let mut t = GoldenTicket::new();
let n = &mut t.owner;
compute_new_owner(n, t.serial_number());

Furthermore, accessors can be kind of unergonomic, particularly for things that are not copy types. Returning (say) an &T from a get can be super annoying.
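For a Copy value like the serial number, the usual way around that error today is to call the accessor before taking the mutable borrow. A minimal sketch, assuming a hypothetical compute_new_owner helper, an assign_owner wrapper, and a constructor that takes the serial number (none of which are spelled out above):

```rust
struct GoldenTicket {
    serial_number: usize, // private, exposed through the accessor
    owner: Option<String>,
}

impl GoldenTicket {
    fn new(serial_number: usize) -> Self {
        Self { serial_number, owner: None }
    }

    fn serial_number(&self) -> usize {
        self.serial_number
    }
}

// Hypothetical helper: derives an owner name from the serial number.
fn compute_new_owner(owner: &mut Option<String>, serial_number: usize) {
    *owner = Some(format!("holder-of-{}", serial_number));
}

fn assign_owner(t: &mut GoldenTicket) {
    // The accessor borrows all of `t`, so call it first and keep the copy...
    let serial = t.serial_number();
    // ...and only then take the mutable borrow of the `owner` field.
    let n = &mut t.owner;
    compute_new_owner(n, serial);
}
```

Reordering works here only because usize is Copy; with a borrowed &T result, the shared borrow of the whole struct would still be live when owner is borrowed mutably.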

Using a view type, we have some interesting other options. I could define a type alias GoldenTicket that is a limited view onto the underlying data:

pub type GoldenTicket = {serial_number, mut owner} GoldenTicketData;

pub struct GoldenTicketData {
    pub serial_number: usize,
    pub owner: Option<String>,
    dummy: (),
}

Now if my constructor function only ever creates this view, we know that nobody will be able to modify the serial_number for a GoldenTicket:

impl GoldenTicket {
    pub fn new() -> GoldenTicket {
        
    }
}

Obviously, this is not ergonomic to write, but it’s interesting that it is possible.

View types vs privacy

As you may have noticed in the previous example, view types interact with traditional privacy in interesting ways. It seems like there may be room for some sort of unification, but the two are also different. Traditional privacy (pub fields and so on) is like a view type in that, if you are outside the module, you can’t access private fields. Unlike a view, though, you can call methods on the type that do access those fields. In other words, traditional privacy denies you direct access, but permits intermediated access.

View types, in contrast, are “transitive” and apply both to direct and intermediated actions. If I have a view {serial_number} GoldenTicketData, I cannot access the owner field at all, even by invoking methods on the type.

Longer places

My examples so far have only shown views onto individual fields, but there is no reason we can’t have a view onto an arbitrary place. For example, one could write:

struct Point { x: u32, y: u32 }
struct Square { upper_left: Point, lower_right: Point }

let mut s: Square = Square { upper_left: Point { x: 22, y: 44 }, lower_right: Point { x: 66, y: 88 } };
let s_x = &{upper_left.x} s;

to get a view of type &{upper_left.x} Square. Paths like s.upper_left.y and s.lower_right would then still be mutable and not considered borrowed.
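Within a single function, today’s borrow checker already reasons about nested places this way; the view type would just let that reasoning cross a function boundary. A small sketch in current Rust (the nudge function is a made-up name for illustration):

```rust
struct Point { x: u32, y: u32 }
struct Square { upper_left: Point, lower_right: Point }

fn nudge(s: &mut Square) -> u32 {
    // Shared borrow of a single nested place...
    let s_x = &s.upper_left.x;
    // ...while disjoint places stay mutable and are not considered borrowed.
    s.upper_left.y += 1;
    s.lower_right = Point { x: 0, y: 0 };
    // The borrow of `upper_left.x` is still live here.
    *s_x
}
```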

View types and named groups

There is another interaction with view types and privacy: view types name fields, but if you have private fields, you probably don’t want people outside your module typing their names, since that would prevent you from renaming them. At the same time, you might like to be able to let users refer to “groups of data” more abstractly. For example, for a WonkaShipmentManifest, I might like users to know they can iterate the bars and check if they have a golden ticket at once:

impl WonkaShipmentManifest {
    pub fn should_insert_ticket(&{golden_tickets} self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
    pub fn iter_bars_mut(&mut {bars} self) -> impl Iterator<Item = &mut Bar> {
        self.bars.iter_mut()
    }
}

But how should we express that to users without having them name fields directly? The obvious extension is to have some kind of “logical” fields that represent groups of data that can change over time. I don’t know how to declare those groups though.

Groups could be more DRY

Another reason to want named groups is to avoid repeating the names of common sets of fields over and over. It’s easy to imagine that there might be a few fields that some cluster of methods all want to access, and that repeating those names will be annoying and make the code harder to edit.

One positive thing from Rust’s current restrictions is that it has sometimes encouraged me to factor a single large type into multiple smaller ones, where the smaller ones encapsulate a group of logically related fields that are accessed together. On the other hand, I’ve also encountered situations where such refactorings feel quite arbitrary – I have groups of fields that, yes, are accessed together, but which don’t form a logical unit on their own.

As an example of both why this sort of refactoring can be good and bad at the same time, I introduced the cfg field of the MIR Builder type to resolve errors where some methods only accessed a subset of fields. On the one hand, the CFG-related data is indeed conceptually distinct from the rest. On the other, the CFG type isn’t something you would use independently of the Builder itself, and I don’t feel that writing self.cfg.foo instead of self.foo made the code particularly clearer.

View types and fields in traits

Some time back, I had a draft RFC for fields in traits. That RFC was “postponed” and moved to a repo to iterate, but I have never had the time to invest in bringing it back. It has some obvious overlap with this idea of views, and (iirc) I had at some point considered using “fields in traits” as the basis for declaring views. I think I rather like this more “structural” approach, but perhaps traits with fields might be a way to give names to groups of fields that public users can reference. Have to mull on that.

View types and disjoint closure capture

Rust 2021 introduced disjoint closure capture. The idea is that closures capture one reference per path that is referenced, subject to some caveats. One of the things I am very happy with is that this was implemented with virtually no changes to the borrow checker: we basically just tweaked how closures are desugared. Besides saving a bunch of effort on the implementation3, this means that the risk of soundness problems is not increased. This strategy does have a downside, however: closures can sometimes get bigger (though we found experimentally that they rarely do in practice, and sometimes get smaller too).

Closures that access two paths like a.foo and a.bar can get bigger because they capture those paths independently, whereas before they would have just captured a as a whole. Interestingly, using view types offers us a way to desugar those closures without introducing unsafe code. Closures could capture {foo, bar} a instead of the two fields independently. Neat!

How does this affect learning?

I’m always wary about extending “core Rust” because I don’t want to make Rust harder to learn. However, I also tend to feel that extensions like this one can have the opposite effect: I think that what throws people the most when learning Rust is trying to get a feel for what they can and cannot do. When they hit “arbitrary” restrictions like “cannot say that my helper function only uses a subset of my fields”4 that can often be the most confusing thing of all, because at first people think that they just don’t understand the system. “Surely there must be some way to do this!”

Going a bit further, one of the other challenges with Rust’s borrow checker is that so much of its reasoning is invisible and lacks explicit syntax. There is no way to “hand annotate” the value of lifetime parameters, for example, so as to explore how they work. Similarly, the borrow checker is currently tracking fine-grained state about which paths are borrowed in your program, but you have no way to talk about that logic explicitly. Adding explicit types may indeed prove helpful for learning.

But there must be some risks?

Yes, for sure. One of the best and worst things about Rust is that your public API docs force you to make decisions like “do I want &self or &mut self access for this function?” It pushes a lot of design up front (raising the risk of premature commitment) and makes things harder to change (more viscous). If it became “the norm” for people to document fine-grained information about which methods use which groups of fields, I worry that it would create more opportunities for semver-hazards, and also just make the docs harder to read.

On the other side, one of my observations is that public-facing types don’t want views that often; the main exception is that sometimes it’d be nice to have small accessors (for example, a Vec might like to document that one can read len even when iterating). Most of the time I find myself frustrated with this particular limitation of Rust, it has to do with private helper functions (similar to the initial example). In those cases, I think that the documentation is actually helpful, since it guides people who are reading and helps them know what to expect from the function.

Conclusion

This concludes our tour of “view types”, a proto-proposal. I hope you enjoyed your ride. Curious to hear what people think! I’ve opened a thread on internals for feedback. I’d love to know if you feel this would solve problems for you, but also how you think it would affect Rust learning – not to mention better syntax ideas.

I’d also be interested to read about related work. The idea here seems likely to have been invented and re-invented numerous times. What other languages, either in academia or industry, have similar mechanisms? How do they work? Educate me!

Footnotes

Yes, this is ambiguous. Think of it as my way of encouraging you to bikeshed something better. ↩︎
↩︎
Shout out to the RFC 2229 working group folks, who put in months and months and months of work on this. ↩︎
Another example is that there is no way to have a struct that has references to its own fields. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/11/05/view-types/

Rustc Reading Club

Oct 28, 2021 Updated Oct 28, 2021

Ever wanted to understand how rustc works? Me too! Doc Jones and I have been talking and we had an idea we wanted to try. Inspired by the very cool Code Reading Club, we are launching an experimental Rustc Reading Club. Doc Jones posted an announcement on her blog, so go take a look!

The way this club works is pretty simple: every other week, we’ll get together for 90 minutes and read some part of rustc (or some project related to rustc), and talk about it. Our goal is to walk away with a high-level understanding of how that code works. For more complex parts of the code, we may wind up spending multiple sessions on the same code.

We may yet tweak this, but the plan is to follow a “semi-structured” reading process:

Identify the modules in the code and their purpose.
Look at the type definitions and try to describe their high-level purpose.
Identify the most important functions and their purpose.
Dig into how a few of those functions are actually implemented.

The meetings will not be recorded, but they will be open to anyone. The first meeting of the Rustc Reading Club will be November 4th, 2021 at 12:00pm US Eastern time. Hope to see you there!

https://smallcultfollowing.com/babysteps/blog/2021/10/28/rustc-reading-club/

Dyn async traits, part 6

Oct 15, 2021 Updated Oct 15, 2021

A quick update to my last post: first, a better way to do what I was trying to do, and second, a sketch of the crate I’d like to see for experimental purposes.

An easier way to roll our own boxed dyn traits

In the previous post I covered how you could create vtables and pair them up with a data pointer to kind of “roll your own dyn”. After I published the post, though, dtolnay sent me this Rust playground link to show me a much better approach, one based on the erased-serde crate. The idea is that instead of making a “vtable struct” with a bunch of fn pointers, we create a “shadow trait” that reflects the contents of that vtable:

// erased trait:
trait ErasedAsyncIter {
    type Item;
    fn next<'me>(&'me mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + 'me>>;
}

Then the DynAsyncIter struct can just be a boxed form of this trait:

pub struct DynAsyncIter<'data, Item> {
    pointer: Box<dyn ErasedAsyncIter<Item = Item> + 'data>,
}

We define the “shim functions” by implementing ErasedAsyncIter for all T: AsyncIter:

impl<T> ErasedAsyncIter for T
where
    T: AsyncIter,
{
    type Item = T::Item;
    fn next<'me>(&'me mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + 'me>> {
        // This code allocates a box for the result
        // and coerces into a dyn:
        Box::pin(AsyncIter::next(self))
    }
}

And finally we can implement the AsyncIter trait for the dynamic type:

impl<'data, Item> AsyncIter for DynAsyncIter<'data, Item> {
    type Item = Item;

    type Next<'me>
    where
        Item: 'me,
        'data: 'me,
    = Pin<Box<dyn Future<Output = Option<Item>> + 'me>>;

    fn next(&mut self) -> Self::Next<'_> {
        self.pointer.next()
    }
}

Yay, it all works, and without any unsafe code!
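The same shadow-trait trick can be tried on stable Rust with a synchronous trait, which makes the moving parts easier to see. A minimal sketch under stated assumptions: Labeler, ErasedLabeler, DynLabeler, and Counter are made-up names, the associated Output type stands in for the future type, and the erased method gets a distinct name purely to keep method resolution obvious:

```rust
use std::fmt::Display;

// Hypothetical static-dispatch trait; `Output` plays the role that the
// future type plays in `AsyncIter`.
trait Labeler {
    type Output: Display + 'static;
    fn next_label(&mut self) -> Self::Output;
}

// Shadow ("erased") trait: same operation, but the return type is boxed,
// so the trait is dyn safe without naming a concrete `Output`.
trait ErasedLabeler {
    fn erased_next_label(&mut self) -> Box<dyn Display>;
}

// The "shim": every `Labeler` is automatically an `ErasedLabeler`.
impl<T: Labeler> ErasedLabeler for T {
    fn erased_next_label(&mut self) -> Box<dyn Display> {
        // Allocates a box for the result and coerces it into a dyn.
        Box::new(Labeler::next_label(self))
    }
}

// The hand-rolled "dyn" type is just a boxed form of the erased trait.
struct DynLabeler<'data> {
    pointer: Box<dyn ErasedLabeler + 'data>,
}

impl<'data> DynLabeler<'data> {
    fn new<T: Labeler + 'data>(value: T) -> Self {
        DynLabeler { pointer: Box::new(value) }
    }
}

// Finally, the dynamic type implements the original trait.
impl<'data> Labeler for DynLabeler<'data> {
    type Output = Box<dyn Display>;

    fn next_label(&mut self) -> Self::Output {
        self.pointer.erased_next_label()
    }
}

// A concrete implementor to erase.
struct Counter(u32);

impl Labeler for Counter {
    type Output = String;

    fn next_label(&mut self) -> String {
        self.0 += 1;
        format!("item-{}", self.0)
    }
}
```

Generic code bounded by Labeler accepts either a Counter (static dispatch) or a DynLabeler (virtual dispatch through the box), which is the property the async version is after.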

What I’d like to see

This “convert to dyn” approach isn’t really specific to async (as erased-serde shows). I’d like to see a decorator that applies it to any trait. I imagine something like:

// Generates the `DynAsyncIter` type shown above:
#[derive_dyn(DynAsyncIter)]
trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

But this ought to work with any -> impl Trait return type, too, so long as Trait is dyn safe and implemented for Box<T>. So something like this:

// Generates a `DynSillyIterTools` type along the same lines:
#[derive_dyn(DynSillyIterTools)]
trait SillyIterTools: Iterator {
    // Iterate over the iter in pairs of two items.
    fn pair_up(&mut self) -> impl Iterator<Item = (Self::Item, Self::Item)>;
}

would generate an erased trait that returns a Box<dyn Iterator<Item = (...)>>. Similarly, you could do a trick with taking any impl Foo and passing in a Box<dyn Foo>, so you can support impl Trait in argument position.

Even without impl trait, derive_dyn would create a more ergonomic dyn to play with.

I don’t really see this as a “long term solution”, but I would be interested to play with it.

Comments?

I’ve created a thread on internals if you’d like to comment on this post, or others in this series.

https://smallcultfollowing.com/babysteps/blog/2021/10/15/dyn-async-traits-part-6/

Dyn async traits, part 5

Oct 14, 2021 Updated Oct 14, 2021

If you’re willing to use nightly, you can already model async functions in traits by using GATs and impl Trait — this is what the Embassy async runtime does, and it’s also what the real-async-trait crate does. One shortcoming, though, is that your trait doesn’t support dynamic dispatch. In the previous posts of this series, I have been exploring some of the reasons for that limitation, and what kind of primitive capabilities need to be exposed in the language to overcome it. My thought was that we could try to stabilize those primitive capabilities with the plan of enabling experimentation. I am still in favor of this plan, but I realized something yesterday: using procedural macros, you can ALMOST do this experimentation today! Unfortunately, it doesn’t quite work owing to some relatively obscure rules in the Rust type system (perhaps some clever readers will find a workaround; that said, these are rules I have wanted to change for a while).

Just to be crystal clear: Nothing in this post is intended to describe an “ideal end state” for async functions in traits. I still want to get to the point where one can write async fn in a trait without any further annotation and have the trait be “fully capable” (support both static dispatch and dyn mode while adhering to the tenets of zero-cost abstractions1). But there are some significant questions there, and to find the best answers for those questions, we need to enable more exploration, which is the point of this post.

Code is on github

The code covered in this blog post has been prototyped and is available on github. See the caveat at the end of the post, though!

Design goal

To see what I mean, let’s return to my favorite trait, AsyncIter:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

The post is going to lay out how we can transform a trait declaration like the one above into a series of declarations that achieve the following:

We can use it as a generic bound (fn foo<T: AsyncIter>()), in which case we get static dispatch, full auto trait support, and all the other goodies that normally come with generic bounds in Rust.
Given a T: AsyncIter, we can coerce it into some form of DynAsyncIter that uses virtual dispatch. In this case, the type doesn’t reveal the specific T or the specific types of the futures.
- I wrote DynAsyncIter, and not dyn AsyncIter on purpose — we are going to create our own type that acts like a dyn type, but which manages the adaptations needed for async.
- For simplicity, let’s assume we want to box the resulting futures. Part of the point of this design though is that it leaves room for us to generate whatever sort of wrapping types we want.

You could write the code I’m showing here by hand, but the better route would be to package it up as a kind of decorator (e.g., #[async_trait_v2]2).

The basics: trait with a GAT

The first step is to transform the trait to have a GAT and a regular fn, in the way that we’ve seen many times:

trait AsyncIter {
    type Item;

    type Next<'me>: Future<Output = Option<Self::Item>>
    where
        Self: 'me;

    fn next(&mut self) -> Self::Next<'_>;
}

Next: define a “DynAsyncIter” struct

The next step is to manage the virtual dispatch (dyn) version of the trait. To do this, we are going to “roll our own” object by creating a struct DynAsyncIter. This struct plays the role of a Box<dyn AsyncIter> trait object. Instances of the struct can be created by calling DynAsyncIter::from with some specific iterator type; the DynAsyncIter type implements the AsyncIter trait, so once you have one you can just call next as usual:

let mut the_iter: DynAsyncIter<u32> = DynAsyncIter::from(some_iterator);
let total = sum_items(&mut the_iter).await;

async fn sum_items(iter: &mut impl AsyncIter<Item = u32>) -> u32 {
    let mut s = 0;
    // Works for any `AsyncIter`, including the dynamic `DynAsyncIter`.
    while let Some(v) = iter.next().await {
        s += v;
    }
    s
}

Struct definition

Let’s look at how this DynAsyncIter struct is defined. Since it models a Box<dyn AsyncIter> trait object, it will have one generic parameter for every ordinary associated type declared in the trait (not including the GATs we introduced for async fn return types). The struct itself has two fields: the data pointer (a box, but in raw form) and a vtable. We don’t know the type of the underlying value, so we’ll use ErasedData for that:

type ErasedData = ();

pub struct DynAsyncIter<Item> {
    data: *mut ErasedData,
    vtable: &'static DynAsyncIterVtable<Item>,
}

For the vtable, we will make a struct that contains a fn for each of the methods in the trait. Unlike the builtin vtables, we will modify the return type of these functions to be a boxed future:

struct DynAsyncIterVtable<Item> {
    drop_fn: unsafe fn(*mut ErasedData),
    next_fn: unsafe fn(&mut *mut ErasedData) -> Box<dyn Future<Output = Option<Item>> + '_>,
}

Implementing the AsyncIter trait

Next, we can implement the AsyncIter trait for the DynAsyncIter type. For each of the new GATs we introduced, we simply use a boxed future type. For the method bodies, we extract the function pointer from the vtable and call it:

impl<Item> AsyncIter for DynAsyncIter<Item> {
    type Item = Item;

    type Next<'me> = Box<dyn Future<Output = Option<Item>> + 'me>;

    fn next(&mut self) -> Self::Next<'_> {
        let next_fn = self.vtable.next_fn;
        unsafe { next_fn(&mut self.data) }
    }
}

The unsafe keyword here is asserting that the safety conditions of next_fn are met. We’ll cover that in more detail later, but in short those conditions are:

The vtable corresponds to some erased type T: AsyncIter…
…and each instance of *mut ErasedData points to a valid Box<T> for that type.

Dropping the object

Speaking of Drop, we do need to implement that as well. It too will call through the vtable:

impl<Item> Drop for DynAsyncIter<Item> {
    fn drop(&mut self) {
        let drop_fn = self.vtable.drop_fn;
        unsafe { drop_fn(self.data); }
    }
}

We need to call through the vtable because we don’t know what kind of data we have, so we can’t know how to drop it correctly.

Creating an instance of DynAsyncIter

To create one of these DynAsyncIter objects, we can implement the From trait. This allocates a box, coerces it into a raw pointer, and then combines that with the vtable:

impl<Item, T> From<T> for DynAsyncIter<Item>
where
    T: AsyncIter<Item = Item>,
{
    fn from(value: T) -> DynAsyncIter<Item> {
        let boxed_value = Box::new(value);
        DynAsyncIter {
            data: Box::into_raw(boxed_value) as *mut (),
            vtable: dyn_async_iter_vtable::<T>(), // we’ll cover this fn later
        }
    }
}

Creating the vtable shims

Now we come to the most interesting part: how do we create the vtable for one of these objects? Recall that our vtable was a struct like so:

struct DynAsyncIterVtable<Item> {
    drop_fn: unsafe fn(*mut ErasedData),
    next_fn: unsafe fn(&mut *mut ErasedData) -> Box<dyn Future<Output = Option<Item>> + '_>,
}

We are going to need to create the values for each of those fields. In an ordinary dyn, these would be pointers directly to the methods from the impl, but for us they are “wrapper functions” around the core trait functions. The role of these wrappers is to introduce some minor coercions, such as allocating a box for the resulting future, as well as to adapt from the “erased data” to the true type:

// Safety conditions:
//
// The `*mut ErasedData` is actually the raw form of a `Box<T>` 
// that is valid for 'a.
unsafe fn next_wrapper<'a, T>(
    this: &'a mut *mut ErasedData,
) -> Box<dyn Future<Output = Option<T::Item>> + 'a>
where
    T: AsyncIter,
{
    // A reference can only be cast to a raw pointer of the same pointee type,
    // so go through `*mut *mut ErasedData` before reinterpreting as `*mut Box<T>`.
    let unerased_this: &mut Box<T> =
        unsafe { &mut *(this as *mut *mut ErasedData as *mut Box<T>) };
    let future: T::Next<'_> = <T as AsyncIter>::next(unerased_this);
    Box::new(future)
}

We’ll also need a “drop” wrapper:

// Safety conditions:
//
// The `*mut ErasedData` is actually the raw form of a `Box<T>` 
// and this function is being given ownership of it.
unsafe fn drop_wrapper<T>(
    this: *mut ErasedData,
)
where
    T: AsyncIter,
{
    // `Box::from_raw` is unsafe: it relies on the safety conditions above.
    let unerased_this = unsafe { Box::from_raw(this as *mut T) };
    drop(unerased_this); // Execute destructor as normal
}

Constructing the vtable

Now that we’ve defined the wrappers, we can construct the vtable itself. Recall that the From impl called a function dyn_async_iter_vtable::<T>. That function looks like this:

fn dyn_async_iter_vtable<T>() -> &'static DynAsyncIterVtable<T::Item>
where
    T: AsyncIter,
{
    const {
        &DynAsyncIterVtable {
            drop_fn: drop_wrapper::<T>,
            next_fn: next_wrapper::<T>,
        }
    }
}

This constructs a struct with the two function pointers: this struct only contains static data, so we are allowed to return a &'static reference to it.

Done!

And now the caveat, and a plea for help

Unfortunately, this setup doesn’t work quite how I described it. There are two problems:

const functions and expressions still have a lot of limitations, especially around generics like T, and I couldn’t get them to work;
Because of the rules introduced by RFC 1214, the &'static DynAsyncIterVtable<T::Item> type requires that T::Item: 'static, which may not be true here. This condition perhaps shouldn’t be necessary, but the compiler currently enforces it.

I wound up hacking something terrible that erased the T::Item type into usize and used Box::leak to get a &'static reference, just to prove out the concept. I’m almost embarrassed to show the code, but there it is.

Anyway, I know people have done some pretty clever tricks, so I’d be curious to know if I’m missing something and there is a way to build this vtable on Rust today. Regardless, it seems like extending const and a few other things to support this case is a relatively light lift, if we wanted to do that.
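For comparison, the simpler case where the vtable struct has no type parameter can be tried on stable Rust. The following is a minimal synchronous sketch, assuming a hypothetical Speak trait in place of AsyncIter (Speak, DynSpeak, and Dog are made-up names). With no Item in the vtable type, the T::Item: 'static question never comes up, and as far as I can tell a reference to a struct literal of fn pointers is promoted to &'static, so no const block is needed. The sketch uses an inherent new rather than From, because a blanket From<T> impl overlaps with the standard library’s impl<T> From<T> for T once the dynamic type implements the trait itself:

```rust
// Hypothetical synchronous trait standing in for `AsyncIter`.
trait Speak {
    fn speak(&self) -> String;
}

type ErasedData = ();

// One fn pointer per trait method, plus the destructor.
struct SpeakVtable {
    drop_fn: unsafe fn(*mut ErasedData),
    speak_fn: unsafe fn(*const ErasedData) -> String,
}

struct DynSpeak {
    data: *mut ErasedData,         // raw form of a `Box<T>`
    vtable: &'static SpeakVtable,  // shims for that same `T`
}

// Safety: `this` must be the raw form of a `Box<T>`, and ownership of it
// is transferred to this function.
unsafe fn drop_wrapper<T: Speak>(this: *mut ErasedData) {
    drop(unsafe { Box::from_raw(this as *mut T) });
}

// Safety: `this` must point to a live `T`.
unsafe fn speak_wrapper<T: Speak>(this: *const ErasedData) -> String {
    let unerased: &T = unsafe { &*(this as *const T) };
    unerased.speak()
}

fn speak_vtable<T: Speak>() -> &'static SpeakVtable {
    // Only fn pointers inside, so the temporary is promoted to a static.
    &SpeakVtable {
        drop_fn: drop_wrapper::<T>,
        speak_fn: speak_wrapper::<T>,
    }
}

impl DynSpeak {
    // `T: 'static` because `DynSpeak` carries no lifetime of its own.
    fn new<T: Speak + 'static>(value: T) -> DynSpeak {
        DynSpeak {
            data: Box::into_raw(Box::new(value)) as *mut ErasedData,
            vtable: speak_vtable::<T>(),
        }
    }
}

impl Speak for DynSpeak {
    fn speak(&self) -> String {
        // Sound because `data` and `vtable` were built for the same `T`.
        unsafe { (self.vtable.speak_fn)(self.data) }
    }
}

impl Drop for DynSpeak {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop_fn)(self.data) }
    }
}

struct Dog;

impl Speak for Dog {
    fn speak(&self) -> String {
        "woof".to_string()
    }
}
```

This does not solve the generic Item case described above; it only shows that the wrapper-function and vtable plumbing itself is expressible today.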

Conclusion

This blog post presented a way to implement the dyn dispatch ideas I’ve been talking about using only features that currently exist and are generally en route to stabilization. That’s exciting to me, because it means that we can start to do measurements and experimentation. For example, I would really like to know the performance impact of transitioning from async-trait to a scheme that uses a combination of static dispatch and boxed dynamic dispatch as described here. I would also like to explore whether there are other ways to wrap futures (e.g., with task-local allocators or other smart pointers) that might perform better. This would help inform what kind of capabilities we ultimately need.

Looking beyond async, I’m interested in tinkering with different models for dyn in general. As an obvious example, the “always boxed” version I implemented here has some runtime cost (an allocation!) and isn’t applicable in all environments, but it would be far more ergonomic. Trait objects would be Sized and would transparently work in far more contexts. We can also prototype different kinds of vtable adaptation.

In the words of Bjarne Stroustrup, “What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.” ↩︎
Egads, I need a snazzier name than that! ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/10/14/dyn-async-traits-part-5/

CTCFT 2021-10-18 Agenda

Oct 13, 2021 Updated Oct 13, 2021

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place next Monday, on 2021-10-18 (in your time zone)! This post covers the agenda. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Agenda

The theme for this meeting is exploring ways to empower and organize contributors.

(5 min) Opening remarks 👋 (nikomatsakis)
(5 min) CTCFT update (angelonfira)
(20 min) Sprints and groups implementing the async vision doc (tmandry)
(15 min) rust-analyzer talk (TBD)
- The rust-analyzer project aims to succeed RLS as the official language server for Rust. We talk about how it differs from RLS, how it is developed, and what to expect in the future.
(10 min) Contributor survey (yaahc)
- Introducing the contributor survey, its goals, methodology, and soliciting community feedback
(5 min) Closing (nikomatsakis)

Afterwards: Social hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

https://smallcultfollowing.com/babysteps/blog/2021/10/13/ctcft-2021-10-18-agenda/

Dyn async traits, part 4

Oct 7, 2021 Updated Oct 7, 2021

In the previous post, I talked about how we could write our own impl Iterator for dyn Iterator by adding a few primitives. In this post, I want to look at what it would take to extend that to an async iterator trait. As before, I am interested in exploring the “core capabilities” that would be needed to make everything work.

Start somewhere: Just assume we want Box

In the first post of this series, we talked about how invoking an async fn through a dyn trait should have the return type of that async fn be a Box<dyn Future>, but only when calling it through a dyn type, not all the time.

Actually, that’s a slight simplification: Box<dyn Future> is certainly one type we could use, but there are other types you might want:

Box<dyn Future + Send>, to indicate that the future is sendable across threads;
Some other wrapper type besides Box.

To keep things simple, I’m just going to look at Box<dyn Future> in this post. We’ll come back to some of those extensions later.

Background: Running example

Let’s start by recalling the AsyncIter trait:

trait AsyncIter {
    type Item;

    async fn next(&mut self) -> Option<Self::Item>;
}

Remember that when we “desugared” this async fn, we introduced a new (generic) associated type for the future returned by next, called Next here:

trait AsyncIter {
    type Item;

    type Next<'me>: Future<Output = Option<Self::Item>> + 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

We were working with a struct SleepyRange that implements AsyncIter:

struct SleepyRange { … }
impl AsyncIter for SleepyRange {
    type Item = u32;
    …
}

Background: Associated types in a static vs dyn context

Using an associated type is great in a static context, because it means that when you call sleepy_range.next(), we are able to resolve the returned future type precisely. This helps us to allocate exactly as much stack as is needed and so forth.

But in a dynamic context, i.e. if you have some_iter: Box<dyn AsyncIter> and you invoke some_iter.next(), that’s a liability. The whole point of using dyn is that we don’t know exactly what implementation of AsyncIter::next we are invoking, so we can’t know exactly what future type is returned. Really, we just want to get back a Box<dyn Future<Output = Option<u32>>> — or something very similar.

How could we have a trait that boxes futures, but only when using dyn?

If we want the trait to only box futures when using dyn, there are two things we need.

First, we need to change the impl AsyncIter for dyn AsyncIter. In the compiler today, it generates an impl which is generic over the value of every associated type. But we want an impl that is generic over the value of the Item type, but which specifies the value of the Next type to be Box<dyn Future>. This way, we are effectively saying that “when you call the next method on a dyn AsyncIter, you always get a Box<dyn Future> back” (but when you call the next method on a specific type, such as a SleepyRange, you would get back a different type — the actual future type, not a boxed version). If we were to write that dyn impl in Rust code, it might look something like this:

impl<I> AsyncIter for dyn AsyncIter<Item = I> {
    type Item = I;

    type Next<'me> = Box<dyn Future<Output = Option<I>> + 'me>;
    fn next(&mut self) -> Self::Next<'_> {
        /* see below */
    }
}

The body of the next function is code that extracts the function pointer from the vtable and calls it. Something like this, relying on the APIs from [RFC 2580] along with the function associated_fn that I sketched in the previous post:

fn next(&mut self) -> Self::Next<'_> {
    type RuntimeType = ();
    let data_pointer: *mut RuntimeType = self as *mut ();
    let vtable: DynMetadata = ptr::metadata(self);
    let fn_pointer: fn(*mut RuntimeType) -> Box<dyn Future<Output = Option<I>> + '_> =
        associated_fn::<AsyncIter::next>(vtable);
    fn_pointer(data_pointer)
}

This is still the code we want. However, there is a slight wrinkle.

Constructing the vtable: Async functions need a shim to return a Box

In the next method above, the type of the function pointer that we extracted from the vtable was the following:

fn(*mut RuntimeType) -> Box<dyn Future<Output = Option<I>> + '_>

However, the signature of the function in the impl is different! It doesn’t return a Box, it returns an impl Future! Somehow we have to bridge this gap. What we need is a kind of “shim function”, something like this:

fn next_box_shim<T: AsyncIter>(this: &mut T) -> Box<dyn Future<Output = Option<T::Item>> + '_> {
    let future: impl Future<Output = Option<T::Item>> = AsyncIter::next(this);
    Box::new(future)
}

Now the vtable for SleepyRange can store next_box_shim::<SleepyRange> instead of storing <SleepyRange as AsyncIter>::next directly.
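
The shim can be approximated on stable Rust today. The following is a minimal sketch, not the compiler's mechanism: it assumes the desugared trait with an Option output and the `where Self: 'me` clause that stable generic associated types require, returns Pin<Box<dyn Future>> rather than a bare Box so the result can be polled, and uses a hypothetical Countdown type plus a toy block_on that only suits futures that complete without a real wakeup.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Desugared form of the trait, with a generic associated type for the future.
trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

// Hypothetical implementor whose future is always immediately ready.
struct Countdown(u32);

impl AsyncIter for Countdown {
    type Item = u32;
    type Next<'me> = std::future::Ready<Option<u32>>;

    fn next(&mut self) -> Self::Next<'_> {
        let item = if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        };
        std::future::ready(item)
    }
}

// The boxing shim: call the statically known `next`, then erase the future type.
fn next_box_shim<'a, T: AsyncIter>(
    this: &'a mut T,
) -> Pin<Box<dyn Future<Output = Option<T::Item>> + 'a>> {
    Box::pin(this.next())
}

// Toy executor (an assumption of this sketch): polls in a loop with a no-op waker.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn demo_shim() -> Vec<u32> {
    // Roughly what a vtable slot would hold: the shim instantiated for one type.
    let slot: for<'a> fn(&'a mut Countdown) -> Pin<Box<dyn Future<Output = Option<u32>> + 'a>> =
        next_box_shim::<Countdown>;

    let mut iter = Countdown(3);
    let mut out = Vec::new();
    while let Some(x) = block_on(slot(&mut iter)) {
        out.push(x);
    }
    out
}
```

Calling demo_shim() drives the iterator purely through the function pointer, so the caller never names the concrete future type.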

Extending the AssociatedFn trait

In my previous post, I sketched out the idea of an AssociatedFn trait that had an associated type FnPtr. If we wanted to make the construction of this sort of shim automated, we would want to change that from an associated type into its own trait. I’m imagining something like this:

trait AssociatedFn { }
trait Reify<F>: AssociatedFn {
    fn reify(self) -> F; 
}

where A: Reify<F> indicates that the associated function A can be “reified” (made into a function pointer) for a function type F. The compiler could implement this trait for the direct mapping where possible, but also for various kinds of shims and ABI transformations. For example, the AsyncIter::next method might implement Reify<fn(*mut ()) -> Box<dyn Future<..>>> to allow a “boxing shim” to be constructed and so forth.
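
To make the shape of Reify concrete, here is a small sketch in which the impls are written by hand rather than by the compiler. Everything besides the two trait definitions is an assumption for illustration: Counter, the marker type CounterNext (standing in for "the type of Counter::next_item"), and the Direct and Boxed aliases. Boxing the Option stands in for boxing a future.

```rust
struct Counter {
    cur: u32,
}

impl Counter {
    fn next_item(&mut self) -> Option<u32> {
        self.cur += 1;
        Some(self.cur)
    }
}

trait AssociatedFn {}

trait Reify<F>: AssociatedFn {
    fn reify(self) -> F;
}

// Hypothetical zero-sized marker naming the associated function.
struct CounterNext;

impl AssociatedFn for CounterNext {}

type Direct = fn(&mut Counter) -> Option<u32>;
type Boxed = fn(&mut Counter) -> Box<Option<u32>>;

// The "direct mapping": the function pointer is the method itself.
impl Reify<Direct> for CounterNext {
    fn reify(self) -> Direct {
        Counter::next_item
    }
}

// A "boxing shim": same method, but the result is wrapped in a Box.
impl Reify<Boxed> for CounterNext {
    fn reify(self) -> Boxed {
        fn shim(this: &mut Counter) -> Box<Option<u32>> {
            Box::new(this.next_item())
        }
        shim
    }
}

fn demo_reify() -> (Option<u32>, Option<u32>) {
    let mut counter = Counter { cur: 0 };
    let direct = <CounterNext as Reify<Direct>>::reify(CounterNext);
    let boxed = <CounterNext as Reify<Boxed>>::reify(CounterNext);
    let first = direct(&mut counter);
    let second = *boxed(&mut counter);
    (first, second)
}
```

The requested function type F selects which impl applies, which is the property that would let a vtable ask for the boxed signature while static callers keep the direct one.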

Other sorts of shims

There are other sorts of limitations around dyn traits that could be overcome with judicious use of shims and tweaked vtables, at least in some cases. As an example, consider this trait:

pub trait Append {
    fn append(&mut self, values: impl Iterator<Item = u32>);
}

This trait is not traditionally dyn-safe because the append function is generic and requires monomorphization for each kind of iterator — therefore, we don’t know which version to put in the vtable for Append, since we don’t yet know the types of iterators it will be applied to! But what if we just put one version, the case where the iterator type is &mut dyn Iterator<Item = u32>? We could then tweak the impl Append for dyn Append to create this &mut dyn Iterator and call the function from the vtable:

impl Append for dyn Append {
    fn append(&mut self, mut values: impl Iterator<Item = u32>) {
        // Borrow mutably: `next` takes `&mut self`, so a shared borrow won't do.
        let values_dyn: &mut dyn Iterator<Item = u32> = &mut values;
        type RuntimeType = ();
        let data_pointer: *mut RuntimeType = self as *mut ();
        let vtable: DynMetadata = ptr::metadata(self);
        let f = associated_fn::<Append::append>(vtable);
        f(data_pointer, values_dyn);
    }
}
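
The same idea can be emulated on stable Rust with a companion trait. This is a sketch under assumptions, not the vtable tweak itself: DynAppend, append_dyn, and Sink are hypothetical names, and the "one version in the vtable" is modeled by a dyn-safe method that takes &mut dyn Iterator.

```rust
trait Append {
    fn append(&mut self, values: impl Iterator<Item = u32>);
}

// Hypothetical dyn-safe companion: the single monomorphic entry point.
trait DynAppend {
    fn append_dyn(&mut self, values: &mut dyn Iterator<Item = u32>);
}

// Every (sized) `Append` type gets the dyn-safe entry point for free,
// because `&mut dyn Iterator` is itself an iterator.
impl<T: Append> DynAppend for T {
    fn append_dyn(&mut self, values: &mut dyn Iterator<Item = u32>) {
        self.append(values)
    }
}

// The hand-written "impl Trait for dyn Trait": erase the iterator type
// and forward through the dyn-safe method.
impl<'a> Append for dyn DynAppend + 'a {
    fn append(&mut self, mut values: impl Iterator<Item = u32>) {
        self.append_dyn(&mut values)
    }
}

struct Sink(Vec<u32>);

impl Append for Sink {
    fn append(&mut self, values: impl Iterator<Item = u32>) {
        self.0.extend(values);
    }
}

fn demo_append() -> Vec<u32> {
    let mut sink = Sink(Vec::new());
    {
        let erased: &mut dyn DynAppend = &mut sink;
        // The generic method is callable on the trait object.
        erased.append(vec![1u32, 2, 3].into_iter());
    }
    sink.0
}
```

The cost of this approach is one extra layer of dynamic dispatch on every call to next inside append, which is the trade the paragraph above is proposing.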

Conclusion

So where does this leave us? The core building blocks for “dyn async traits” seem to be:

The ability to customize the contents of the vtable that gets generated for a trait.
- For example, async fns need shim functions that box the output.
The ability to customize the dispatch logic (impl Foo for dyn Foo).
The ability to customize associated types like Next to be a Box<dyn>:
- This requires the ability to extract the vtable, as given by [RFC 2580].
- It also requires the ability to extract functions from the vtable (not presently supported).

I said at the outset that I was going to assume, for the purposes of this post, that we wanted to return a Box<dyn>, and I have. It seems possible to extend these core capabilities to other sorts of return types (such as other smart pointers), but it’s not entirely trivial; we’d have to define what kinds of shims the compiler can generate.

I haven’t really thought very hard about how we might allow users to specify each of those building blocks, though I sketched out some possibilities. At this point, I’m mostly trying to explore the possibilities of what kinds of capabilities may be useful or necessary to expose.

https://smallcultfollowing.com/babysteps/blog/2021/10/07/dyn-async-traits-part-4/

Dyn async traits, part 3

Oct 6, 2021 Updated Oct 6, 2021

In the previous “dyn async traits” posts, I talked about how we can think about the compiler as synthesizing an impl that performs the dynamic dispatch. In this post, I want to start exploring a theoretical future in which this impl is written manually by the Rust programmer. This is in part a thought exercise, but it’s also a possible ingredient for a future design: if we could give programmers more control over the “impl Trait for dyn Trait” impl, then we could enable a lot of use cases.

Example

For this post, async fn is kind of a distraction. Let’s just work with a simplified Iterator trait:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

As we discussed in the previous post, the compiler today generates an impl that is something like this:

impl<I> Iterator for dyn Iterator<Item = I> {
    type Item = I;
    fn next(&mut self) -> Option<I> {
        type RuntimeType = ();
        let data_pointer: *mut RuntimeType = self as *mut ();
        let vtable: DynMetadata = ptr::metadata(self);
        let fn_pointer: fn(*mut RuntimeType) -> Option<I> =
            __get_next_fn_pointer__(vtable);
        fn_pointer(data_pointer)
    }
}

This code draws on the APIs from RFC 2580, along with a healthy dash of “pseudo-code”. Let’s see what it does:

Extracting the data pointer

type RuntimeType = ();
let data_pointer: *mut RuntimeType = self as *mut ();

Here, self is a wide pointer of type &mut dyn Iterator<Item = I>. The rules for as state that casting a wide pointer to a thin pointer drops the metadata[1], so we can (ab)use that to get the data pointer. Here I just gave the pointer the type *mut RuntimeType, which is an alias for *mut (), i.e., a raw pointer to something. The type alias RuntimeType is meant to signify “whatever type of data we have at runtime”. Using () for this is a hack; the “proper” way to model it would be with an existential type. But since Rust doesn’t have those, and I’m not keen to add them if we don’t have to, we’ll just use this type alias for now.
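
The cast can be observed on stable Rust without any of the RFC 2580 APIs. This is a small sketch with hypothetical function names; the two-pointer size is how current targets lay out dyn pointers, not a documented guarantee.

```rust
use std::mem::size_of;
use std::ops::Range;

// A pointer to a `dyn` type carries data pointer plus metadata.
fn wide_is_two_pointers() -> bool {
    size_of::<*mut dyn Iterator<Item = u32>>() == 2 * size_of::<*mut ()>()
}

// Casting wide to thin keeps only the data pointer.
fn data_pointer_matches() -> bool {
    let mut range = 0..3u32;
    let original: *mut () = &mut range as *mut Range<u32> as *mut ();

    // Unsizing coercion: the pointer now also carries the vtable.
    let wide: *mut dyn Iterator<Item = u32> = &mut range;

    // The `as` cast drops the metadata.
    let thin = wide as *mut ();
    thin == original
}
```

Neither function dereferences a raw pointer, so no unsafe code is needed to see that the data half of the wide pointer is just the address of the original value.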

Extracting the vtable (or DynMetadata)

let vtable: DynMetadata = ptr::metadata(self);

The ptr::metadata function was added in RFC 2580. Its purpose is to extract the “metadata” from a wide pointer. The type of this metadata depends on the type of wide pointer you have: this is determined by the Pointee trait. For dyn types, the metadata is a DynMetadata, which just means “pointer to the vtable”. In today’s APIs, the DynMetadata is pretty limited: it lets you extract the size/alignment of the underlying RuntimeType, but it doesn’t give any access to the actual function pointers that are inside.

Extracting the function pointer from the vtable

let fn_pointer: fn(*mut RuntimeType) -> Option<I> = 
    __get_next_fn_pointer__(vtable);

Now we get to the pseudocode. Somehow, we need a way to get the fn pointer out from the vtable. At runtime, the way this works is that each method has an assigned offset within the vtable, and you basically do an array lookup; kind of like vtable.methods()[0], where methods() returns an array &[fn()] of function pointers. The problem is that there’s a lot of “dynamic typing” going on here: the signature of each one of those methods is going to be different. Moreover, we’d like some freedom to change how vtables are laid out. For example, the ongoing (and awesome!) work on dyn upcasting by Charles Lew has required modifying our vtable layout, and I expect further modification as we try to support dyn types with multiple traits, like dyn Debug + Display.

So, for now, let’s just leave this as pseudocode. Once we’ve finished walking through the example, I’ll return to this question of how we might model __get_next_fn_pointer__ in a forwards compatible way.

One thing worth pointing out: the type of fn_pointer is a fn(*mut RuntimeType) -> Option<I>. There are two interesting things going on here:

The argument has type *mut RuntimeType: using the type alias indicates that this function is known to take a single pointer (in fact, it’s a reference, but those have the same layout). This pointer is expected to point to the same runtime data that self points at; we don’t know what it is, but we know that they’re the same. This works because self paired together a pointer to some data of type RuntimeType along with a vtable of functions that expect RuntimeType references.[2]
The return type is Option<I>, where I is the item type: this is interesting because although we don’t know statically what the Self type is, we do know the Item type. In fact, we will generate a distinct copy of this impl for every kind of item. This allows us to easily pass the return value.

Calling the function

fn_pointer(data_pointer)

The final line in the code is very simple: we call the function! It returns an Option<I> and we can return that to our caller.

Returning to the pseudocode

We relied on one piece of pseudocode in that imaginary impl:

let fn_pointer: fn(*mut RuntimeType) -> Option<I> = 
    __get_next_fn_pointer__(vtable);

So how could we possibly turn __get_next_fn_pointer__ from pseudocode into real code? There are two things worth noting:

First, the name of this function already encodes the method we want (next). We probably don’t want to generate an infinite family of these “getter” functions.
Second, the signature of the function is specific to the method we want, since it returns a fn type (fn(*mut RuntimeType) -> Option<I>) that encodes the signature for next (with the self type changed, of course). This seems better than just returning a generic signature like fn() that must be cast manually by the user; less opportunity for error.

Using zero-sized fn types as the basis for an API

One way to solve these problems would be to build on the trait system. Imagine there were a type for every method, let’s call it A, and that this type implemented a trait like AssociatedFn:

trait AssociatedFn {
    // The type of the associated function, but as a `fn` pointer
    // with the self type erased. This is the type that would be
    // encoded in the vtable.
    type FnPtr;

    … // maybe other things
}

We could then define a generic “get function pointer” function like so:

fn associated_fn<A>(vtable: DynMetadata) -> A::FnPtr
where
    A: AssociatedFn

Now instead of __get_next_fn_pointer__, we can write

type NextMethodType =  /* type corresponding to the next method */;
let fn_pointer: fn(*mut RuntimeType) -> Option<I> = 
   associated_fn::<NextMethodType>(vtable);

Ah, but what is this NextMethodType? How do we get the type for the next method? Presumably we’d have to introduce some syntax, like Iterator::next.
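
Since DynMetadata exposes no function pointers, the API can only be tried out against a hand-rolled vtable. The sketch below does that. IterVtable, the lookup method, the NextMethod marker, and range_next are all assumptions of this example, and the item type is fixed to u32 to keep it short.

```rust
use std::ops::Range;

// Hand-rolled stand-in for the compiler's vtable (hypothetical layout).
struct IterVtable {
    next: unsafe fn(*mut ()) -> Option<u32>,
}

trait AssociatedFn {
    // The method's type as a `fn` pointer with the self type erased.
    type FnPtr;

    // Hypothetical hook: how to find this method inside the vtable.
    fn lookup(vtable: &IterVtable) -> Self::FnPtr;
}

// Marker standing in for "the type of the `next` method".
struct NextMethod;

impl AssociatedFn for NextMethod {
    type FnPtr = unsafe fn(*mut ()) -> Option<u32>;

    fn lookup(vtable: &IterVtable) -> Self::FnPtr {
        vtable.next
    }
}

fn associated_fn<A: AssociatedFn>(vtable: &IterVtable) -> A::FnPtr {
    A::lookup(vtable)
}

// The entry that would be stored for `Range<u32>`.
// Safety: `data` must point to a live, uniquely borrowed `Range<u32>`.
unsafe fn range_next(data: *mut ()) -> Option<u32> {
    let range = unsafe { &mut *(data as *mut Range<u32>) };
    range.next()
}

fn demo_associated_fn() -> (Option<u32>, Option<u32>) {
    let vtable = IterVtable { next: range_next };
    let mut range = 5..7u32;
    let data_pointer = &mut range as *mut Range<u32> as *mut ();

    let fn_pointer = associated_fn::<NextMethod>(&vtable);
    // Sound here because `data_pointer` really does point at a `Range<u32>`.
    unsafe { (fn_pointer(data_pointer), fn_pointer(data_pointer)) }
}
```

The caller gets back a correctly typed function pointer without any manual cast, which is the property the second bullet above asks for.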

Related concept: zero-sized fn types

This idea of a type for associated functions is very close (but not identical) to an already existing concept in Rust: zero-sized function types. As you may know, the type of a Rust function is in fact a special zero-sized type that uniquely identifies the function. There is (presently, anyway) no syntax for this type, but you can observe it by printing out the size of values (playground):

use std::mem::size_of_val;

fn foo() {}

fn main() {
    // The type of `f` is not `fn()`. It is a special, zero-sized type that uniquely
    // identifies `foo`
    let f = foo;
    println!("{}", size_of_val(&f)); // prints 0

    // This type can be coerced to `fn()`, which is a function pointer
    let g: fn() = f;
    println!("{}", size_of_val(&g)); // prints 8 on a 64-bit target
}

There are also types for functions that appear in impls. For example, you could get an instance of the type that represents the next method on vec::IntoIter<u32> like so:

let x = <vec::IntoIter<u32> as Iterator>::next;
println!("{}", std::mem::size_of_val(&x)); // prints 0

Where the zero-sized types don’t fit

The existing zero-sized types can’t be used for our “associated function” type for two reasons:

You can’t name them! We can fix this by adding syntax.
There is no zero-sized type for a trait function independent of an impl.

The latter point is subtle[3]. Before, when I talked about getting the type for a function from an impl, you’ll note that I gave a fully qualified function name, which specified the Self type precisely:

let x = <vec::IntoIter<u32> as Iterator>::next;
//       ^^^^^^^^^^^^^^^^^^ the Self type

But what we want in our impl is to write code that doesn’t know what the Self type is! So this type that exists in the Rust type system today isn’t quite what we need. But it’s very close.

Conclusion

I’m going to leave it here. Obviously, I haven’t presented any kind of final design, but we’ve seen a lot of tantalizing ingredients:

Today, the compiler generates an impl Iterator for dyn Iterator that extracts functions from a vtable and invokes them by magic.
But, using the APIs from RFC 2580, you can almost write that impl by hand. What is missing is a way to extract a function pointer from a vtable, and what makes that hard is that we need a way to identify the function we are extracting.
We have zero-sized types that represent functions today, but we don’t have a way to name them, and we don’t have zero-sized types for functions in traits, only in impls.

Of course, all of the stuff I wrote here was just about normal functions. We still need to circle back to async functions, which add a few extra wrinkles. Until next time!

Footnotes

[1] I don’t actually like these rules, which have bitten me a few times. I think we should introduce an accessor function, but I didn’t see one in RFC 2580; maybe I missed it, or it already exists. ↩︎
[2] If you used unsafe code to pair up a random pointer with an unrelated vtable, then hilarity would ensue here, as there is no runtime checking that these types line up. ↩︎
[3] And, in fact, I didn’t see it until I was writing this blog post! ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/10/06/dyn-async-traits-part-3/

Dyn async traits, part 2

Oct 1, 2021 Updated Oct 1, 2021

In the previous post, we uncovered a key challenge for dyn and async traits: the fact that, in Rust today, dyn types have to specify the values for all associated types. This post is going to dive into more background about how dyn traits work today, and in particular it will talk about where that limitation comes from.

Today: Dyn traits implement the trait

In Rust today, assuming you have a “dyn-safe” trait DoTheThing, then the type dyn DoTheThing implements DoTheThing. Consider this trait:

trait DoTheThing {
    fn do_the_thing(&self);
}

impl DoTheThing for String {
    fn do_the_thing(&self) {
        println!("{}", self);
    }
}

And now imagine some generic function that uses the trait:

fn some_generic_fn<T: ?Sized + DoTheThing>(t: &T) {
    t.do_the_thing();
}

Naturally, we can call some_generic_fn with a &String, but — because dyn DoTheThing implements DoTheThing — we can also call some_generic_fn with a &dyn DoTheThing:

fn some_nongeneric_fn(x: &dyn DoTheThing) {
    some_generic_fn(x)
}

Dyn safety, a mini retrospective

Early on in Rust, we debated whether dyn DoTheThing ought to implement the trait DoTheThing or not. This was, indeed, the origin of the term “dyn safe” (then called “object safe”). At the time, I argued in favor of the current approach: that is, creating a binary property. Either the trait was dyn safe, in which case dyn DoTheThing implements DoTheThing, or it was not, in which case dyn DoTheThing is not a legal type. I am no longer sure that was the right call.

What I liked at the time was the idea that, in this model, whenever you see a type like dyn DoTheThing, you know that you can use it like any other type that implements DoTheThing.

Unfortunately, in practice, the type dyn DoTheThing is not comparable to a type like String. Notably, dyn types are not sized, so you can’t pass them around by value or work with them like strings. You must instead always pass around some kind of pointer to them, such as a Box<dyn DoTheThing> or a &dyn DoTheThing. This is “unusual” enough that we make you opt-in to it for generic functions, by writing T: ?Sized.

What this means is that, in practice, generic functions don’t accept dyn types “automatically”, you have to design for dyn explicitly. So a lot of the benefit I envisioned didn’t come to pass.

Static versus dynamic dispatch, vtables

Let’s talk for a bit about dyn safety and where it comes from. To start, we need to explain the difference between static dispatch and virtual (dyn) dispatch. Simply put, static dispatch means that the compiler knows which function is being called, whereas dyn dispatch means that the compiler doesn’t know. In terms of the CPU itself, there isn’t much difference. With static dispatch, there is a “hard-coded” instruction that says “call the code at this address”[1]; with dynamic dispatch, there is an instruction that says “call the code whose address is in this variable”. The latter can be a bit slower, but it hardly matters in practice, particularly when the branch predictor guesses the target correctly.

When you use a dyn trait, what you actually have is a vtable. You can think of a vtable as being a kind of struct that contains a collection of function pointers, one for each method in the trait. So the vtable type for the DoTheThing trait might look like (in practice, there is a bit of extra data, but this is close enough for our purposes):

struct DoTheThingVtable {
    do_the_thing: fn(*mut ())
}

Here the do_the_thing method has a corresponding field. Note that the type of the first argument ought to be &self, but we changed it to *mut (). This is because the whole idea of the vtable is that you don’t know what the self type is, so we just changed it to “some pointer” (which is all we need to know).

When you create a vtable, you are making an instance of this struct that is tailored to some particular type. In our example, the type String implements DoTheThing, so we might create the vtable for String like so:

static Vtable_DoTheThing_String: &DoTheThingVtable = &DoTheThingVtable {
    do_the_thing: <String as DoTheThing>::do_the_thing as fn(*mut ())
    //            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //            Fully qualified reference to `do_the_thing` for strings
};

You may have heard that a &dyn DoTheThing type in Rust is a wide pointer. What that means is that, at runtime, it is actually a pair of two pointers: a data pointer and a vtable pointer for the DoTheThing trait. So &dyn DoTheThing is roughly equivalent to:

(*mut (), &'static DoTheThingVtable)

When you cast a &String to a &dyn DoTheThing, what actually happens at runtime is that the compiler takes the &String pointer, casts it to *mut (), and pairs it with the appropriate vtable. So, if you have some code like this:

let x: &String = &"Hello, Rustaceans".to_string();
let y: &dyn DoTheThing = x;

It winds up “desugared” to something like this:

let x: &String = &"Hello, Rustaceans".to_string();
let y: (*mut (), &'static DoTheThingVtable) = 
    (x as *mut (), Vtable_DoTheThing_String);

The dyn impl

We’ve seen how you create wide pointers and how the compiler represents vtables. We’ve also seen that, in Rust, dyn DoTheThing implements DoTheThing. You might wonder how that works. Conceptually, the compiler generates an impl where each method in the trait is implemented by extracting the function pointer from the vtable and calling it:

impl DoTheThing for dyn DoTheThing {
    fn do_the_thing(self: &dyn DoTheThing) {
        // Remember that `&dyn DoTheThing` is equivalent to
        // a tuple like `(*mut (), &'static DoTheThingVtable)`:
        let (data_pointer, vtable_pointer) = self;

        let function_pointer = vtable_pointer.do_the_thing;
        function_pointer(data_pointer);
    }
}

In effect, when we call a generic function like some_generic_fn with T = dyn DoTheThing, we monomorphize that call exactly like any other type. The call to do_the_thing is dispatched against the impl above, and it is that special impl that actually does the dynamic dispatch. Neat.
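
The pieces above (vtable struct, per-type vtable instance, wide pointer, dispatching method) can be assembled into something that actually compiles. This is a sketch of the concept, not of what rustc emits: WideRef, string_do_the_thing, and the static's name are hypothetical, the data pointer is *const () because the method only needs shared access, and do_the_thing returns a String instead of printing so the result can be inspected.

```rust
use std::marker::PhantomData;

trait DoTheThing {
    fn do_the_thing(&self) -> String;
}

impl DoTheThing for String {
    fn do_the_thing(&self) -> String {
        format!("thing: {}", self)
    }
}

// One function pointer per trait method, with the self type erased.
struct DoTheThingVtable {
    do_the_thing: unsafe fn(*const ()) -> String,
}

// Safety: `data` must point to a live `String`.
unsafe fn string_do_the_thing(data: *const ()) -> String {
    let this = unsafe { &*(data as *const String) };
    this.do_the_thing()
}

// The vtable instance tailored to `String`.
static VTABLE_DO_THE_THING_STRING: DoTheThingVtable = DoTheThingVtable {
    do_the_thing: string_do_the_thing,
};

// Hand-rolled equivalent of `&'a dyn DoTheThing`: data pointer plus vtable pointer.
struct WideRef<'a> {
    data_pointer: *const (),
    vtable_pointer: &'static DoTheThingVtable,
    _borrow: PhantomData<&'a ()>,
}

impl<'a> WideRef<'a> {
    // The "cast" from `&String`: erase the type, pair it with the right vtable.
    fn from_string(s: &'a String) -> Self {
        WideRef {
            data_pointer: s as *const String as *const (),
            vtable_pointer: &VTABLE_DO_THE_THING_STRING,
            _borrow: PhantomData,
        }
    }

    // The dispatch logic: fetch the function pointer and call it.
    fn do_the_thing(&self) -> String {
        let function_pointer = self.vtable_pointer.do_the_thing;
        // Sound because `from_string` is the only constructor, so the data
        // pointer and the vtable always describe the same type.
        unsafe { function_pointer(self.data_pointer) }
    }
}

fn demo_wide_ref() -> String {
    let s = String::from("hello");
    WideRef::from_string(&s).do_the_thing()
}
```

The safety argument rests entirely on the pairing done in from_string, which mirrors the point that nothing checks at runtime whether a data pointer and a vtable belong together.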

Static dispatch permits monomorphization

Now that we’ve seen how and when vtables are constructed, we can talk about the rules for dyn safety and where they come from. One of the most basic rules is that a trait is only dyn-safe if it contains no generic methods (or, more precisely, if its methods are only generic over lifetimes, not types). The reason for this rule derives directly from how a vtable works: when you construct a vtable, you need to give a single function pointer for each method in the trait (or, perhaps, a finite set of function pointers). The problem with generic methods is that there is no single function pointer for them: you need a different pointer for each type that they’re applied to. Consider this example trait, PrintPrefixed:

trait PrintPrefixed {
    fn prefix(&self) -> String;
    fn apply<T: Display>(&self, t: T);
}

impl PrintPrefixed for String {
    fn prefix(&self) -> String {
        self.clone()
    }
    fn apply<T: Display>(&self, t: T) {
        println!("{}: {}", self, t);
    }
}

What would a vtable for String as PrintPrefixed look like? Generating a function pointer for prefix is no problem, we can just use <String as PrintPrefixed>::prefix. But what about apply? We would have to include a function pointer for <String as PrintPrefixed>::apply<T>, but we don’t know yet what the T is!

In contrast, with static dispatch, we don’t have to know what T is until the point of call. In that case, we can generate just the copy we need.

Partial dyn impls

The previous point shows that a trait can have some methods that are dyn-safe and some methods that are not. In current Rust, this makes the entire trait be “not dyn safe”, and this is because there is no way for us to write a complete impl PrintPrefixed for dyn PrintPrefixed:

impl PrintPrefixed for dyn PrintPrefixed {
    fn prefix(&self) -> String {
        // For `prefix`, no problem:
        let prefix_fn = /* get prefix function pointer from vtable */;
        prefix_fn(…);
    }
    fn apply<T: Display>(&self, t: T) {
        // For `apply`, we can’t handle all `T` types, what field to fetch?
        panic!("No way to implement apply")
    }
}

Under the alternative design that was considered long ago, we could say that a dyn PrintPrefixed value is always legal, but dyn PrintPrefixed only implements the PrintPrefixed trait if all of its methods (and other items) are dyn safe. Either way, if you had a &dyn PrintPrefixed, you could call prefix. You just wouldn’t be able to use a dyn PrintPrefixed with generic code like fn foo<T: ?Sized + PrintPrefixed>.

(We’ll return to this theme in future blog posts.)

If you’re familiar with the “special case” around trait methods that require where Self: Sized, you might be able to see where it comes from now. If a method has a where Self: Sized requirement, and we have an impl for a type like dyn PrintPrefixed, then we can see that this impl could never be called, and so we can omit the method from the impl (and vtable) altogether. This is awfully similar to saying that dyn PrintPrefixed is always legal, because it means that there is only a subset of methods that can be used via virtual dispatch. The difference is that dyn PrintPrefixed: PrintPrefixed still holds, because we know that generic code won’t be able to call those “non-dyn-safe” methods on a dyn type: generic code that accepts a dyn type has to declare T: ?Sized, and so it cannot call methods that require Self: Sized.
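
Here is a short sketch of that special case applied to PrintPrefixed. It departs from the post's version in two labeled ways: apply gets a default body and returns a String (so the result can be checked), and the helper functions are hypothetical.

```rust
use std::fmt::Display;

trait PrintPrefixed {
    fn prefix(&self) -> String;

    // Generic, so it cannot go in a vtable; `where Self: Sized` removes it
    // from the dyn-callable surface and keeps the trait dyn-safe.
    fn apply<T: Display>(&self, t: T) -> String
    where
        Self: Sized,
    {
        format!("{}: {}", self.prefix(), t)
    }
}

impl PrintPrefixed for String {
    fn prefix(&self) -> String {
        self.clone()
    }
}

// Virtual dispatch: only `prefix` is available here.
// Writing `p.apply(1)` in this function would be a compile error.
fn prefix_via_dyn(p: &dyn PrintPrefixed) -> String {
    p.prefix()
}

// Static dispatch: `P` is sized, so the generic method can be monomorphized.
fn apply_via_static<P: PrintPrefixed>(p: &P, n: u32) -> String {
    p.apply(n)
}
```

With a String prefix of "id", prefix_via_dyn yields "id" and apply_via_static with 7 yields "id: 7".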

Associated types and dyn types

We began this saga by talking about associated types and dyn types. In Rust today, a dyn type is required to specify a value for each associated type in the trait. For example, consider a simplified Iterator trait:

trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

This trait is dyn safe, but if you actually have a dyn in practice, you would have to write something like dyn Iterator<Item = u32>. The impl Iterator for dyn Iterator looks like:

impl<T> Iterator for dyn Iterator<Item = T> {
    type Item = T;
    
    fn next(&mut self) -> Option<T> {
        let next_fn = /* get next function from vtable */;
        return next_fn(self);
    }
}

Now you can see why we require all the associated types to be part of the dyn type — it lets us write a complete impl (i.e., one that includes a value for each of the associated types).
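
A brief illustration using the real standard-library Iterator (the function names are just for this sketch): the Item type is part of the dyn type, so one function body serves any implementor with that item type.

```rust
// `Item = u32` must be spelled out; plain `dyn Iterator` is not a complete type.
fn sum_dyn(iter: &mut dyn Iterator<Item = u32>) -> u32 {
    let mut total = 0;
    while let Some(x) = iter.next() {
        total += x;
    }
    total
}

fn demo_sum() -> (u32, u32) {
    // Two unrelated iterator types behind the same dyn type.
    let mut a = 1..=4u32;
    let mut b = vec![10u32, 20].into_iter();
    (sum_dyn(&mut a), sum_dyn(&mut b))
}
```

Because the return type of next is known from the dyn type alone, the caller can receive the Option<u32> without knowing which impl produced it.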

Conclusion

We covered a lot of background in this post:

Static vs dynamic dispatch, vtables
The origin of dyn safety, and the possibility of “partial dyn safety”
The idea of a synthesized impl Trait for dyn Trait

[1] Modulo dynamic linking. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/10/01/dyn-async-traits-part-2/

Dyn async traits, part 1

Sep 30, 2021 Updated Sep 30, 2021

Over the last few weeks, Tyler Mandry and I have been digging hard into what it will take to implement async fn in traits. Per the new lang team initiative process, we are collecting our design thoughts in an ever-evolving website, the async fundamentals initiative. If you’re interested in the area, you should definitely poke around; you may be interested to read about the MVP that we hope to stabilize first, or the (very much WIP) evaluation doc which covers some of the challenges we are still working out. I am going to be writing a series of blog posts focusing on one particular thing that we have been talking through: the problem of dyn and async fn. This first post introduces the problem and the general goal that we are shooting for (but don’t yet know the best way to reach).

What we’re shooting for

What we want is simple. Imagine this trait, for “async iterators”:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

We would like you to be able to write a trait like that, and to implement it in the obvious way:

struct SleepyRange {
    start: u32,
    stop: u32,
}

impl AsyncIter for SleepyRange {
    type Item = u32;
    
    async fn next(&mut self) -> Option<Self::Item> {
        tokio::time::sleep(std::time::Duration::from_millis(1000)).await; // just to await something :)
        let s = self.start;
        if s < self.stop {
            self.start = s + 1;
            Some(s)
        } else {
            None
        }
    }
}

You should then be able to have a Box<dyn AsyncIter<Item = u32>> and use that in exactly the way you would use a Box<dyn Iterator<Item = u32>> (but with an await after each call to next, of course):

let mut b: Box<dyn AsyncIter<Item = u32>> = ...;
let i = b.next().await;

Desugaring to an associated type

Consider this running example:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

Here, the next method will desugar to a fn that returns some kind of future; you can think of it like a generic associated type:

trait AsyncIter {
    type Item;

    type Next<'me>: Future<Output = Option<Self::Item>> + 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

The corresponding desugaring for the impl would use type alias impl trait:

struct SleepyRange {
    start: u32,
    stop: u32,
}

// Type alias impl trait:
type SleepyRangeNext<'me> = impl Future<Output = Option<u32>> + 'me;

impl AsyncIter for SleepyRange {
    type Item = u32;
    
    type Next<'me> = SleepyRangeNext<'me>;
    fn next(&mut self) -> SleepyRangeNext<'_> {
        async move {
            tokio::time::sleep(std::time::Duration::from_millis(1000)).await;
            let s = self.start;
            ... // as above
        }
    }
}

This desugaring works quite well for standard generics (or impl Trait). Consider this function:

async fn process<T>(t: &mut T) -> u32
where
    T: AsyncIter<Item = u32>,
{
    let mut sum = 0;
    while let Some(x) = t.next().await {
        sum += x;
        if sum > 22 {
            break;
        }
    }
    sum
}

This code will work quite nicely. For example, when you call t.next(), the resulting future will be of type T::Next. After monomorphization, the compiler will be able to resolve <SleepyRange as AsyncIter>::Next to the SleepyRangeNext type, so that the future is known exactly. In fact, crates like embassy already use this desugaring, albeit manually and only on nightly.
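To make the static-dispatch case concrete, here is a minimal sketch of the associated-type desugaring written by hand so that it does not depend on type alias impl trait. It assumes a few things that are not part of the design above: the future type is std::future::Ready (so nothing actually sleeps), the where Self: 'me clause is spelled out on the associated type, and block_on is a hypothetical busy-polling helper that exists only so the example does not need an async runtime.

```rust
use std::future::{ready, Future, Ready};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncIter {
    type Item;

    // Hand-written form of the desugaring: one future type per impl.
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;

    fn next(&mut self) -> Self::Next<'_>;
}

struct SleepyRange {
    start: u32,
    stop: u32,
}

impl AsyncIter for SleepyRange {
    type Item = u32;

    // Assumption: a future that is already complete stands in for the
    // compiler-generated async block.
    type Next<'me> = Ready<Option<u32>> where Self: 'me;

    fn next(&mut self) -> Self::Next<'_> {
        let s = self.start;
        if s < self.stop {
            self.start = s + 1;
            ready(Some(s))
        } else {
            ready(None)
        }
    }
}

// Same generic function as in the text: the future type is resolved
// statically, so no boxing is involved.
async fn process<T>(t: &mut T) -> u32
where
    T: AsyncIter<Item = u32>,
{
    let mut sum = 0;
    while let Some(x) = t.next().await {
        sum += x;
        if sum > 22 {
            break;
        }
    }
    sum
}

// Hypothetical helper: polls a future in a loop with a waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Calling block_on(process(&mut range)) on a SleepyRange drives the loop to completion; after monomorphization, each call to t.next() returns the concrete Ready<Option<u32>> type.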

Associated types don’t work for dyn

Unfortunately, this desugaring causes problems when you try to use dyn values. Today, when you have dyn AsyncIter, you must specify the values for all associated types defined in AsyncIter. So that means that instead of dyn AsyncIter<Item = u32>, you would have to write something like

for<'me> dyn AsyncIter<
    Item = u32, 
    Next<'me> = SleepyRangeNext<'me>,
>

This is clearly a non-starter from an ergonomic perspective, but it has an even more pernicious problem. The whole point of a dyn trait is to have a value where we don’t know what the underlying type is. But specifying the value of Next<'me> as SleepyRangeNext means that there is exactly one impl that could be in use here. This dyn value must be a SleepyRange, since no other impl has that same future.

Conclusion: For dyn AsyncIter to work, the future returned by next() must be independent of the actual impl. Furthermore, it must have a fixed size. In other words, it needs to be something like Box<dyn Future<Output = Option<u32>>>.

How the async-trait crate solves this problem

You may have used the async-trait crate. It resolves this problem by not using an associated type, but instead desugaring to Box<dyn Future> types:

trait AsyncIter {
    type Item;

    fn next<'me>(&'me mut self) -> Box<dyn Future<Output = Option<Self::Item>> + Send + 'me>;
}
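As a rough illustration of the boxed style, here is a hand-written sketch rather than the crate's actual macro output. It wraps the box in Pin (which the crate's generated signatures also do, since a boxed future has to be pinned before it can be polled), and it assumes a hypothetical BoxFuture alias and a hypothetical busy-polling block_on helper so that it runs without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical alias for a boxed, pinned, Send future.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait AsyncIter {
    type Item;

    // Every impl returns the same boxed type, so the trait is usable as `dyn`.
    fn next(&mut self) -> BoxFuture<'_, Option<Self::Item>>;
}

struct SleepyRange {
    start: u32,
    stop: u32,
}

impl AsyncIter for SleepyRange {
    type Item = u32;

    fn next(&mut self) -> BoxFuture<'_, Option<u32>> {
        // The async block is allocated on the heap on every call,
        // even when the caller uses static dispatch.
        Box::pin(async move {
            let s = self.start;
            if s < self.stop {
                self.start = s + 1;
                Some(s)
            } else {
                None
            }
        })
    }
}

// Hypothetical helper: polls a future in a loop with a waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

With this shape, a Box<dyn AsyncIter<Item = u32>> can be created from a SleepyRange and next() can be called on it directly, because the return type no longer depends on the impl.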

This has a few disadvantages:

It forces a Box all the time, even when you are using AsyncIter with static dispatch.
The type as given above says that the resulting future must be Send. For other async fn, we use auto traits to analyze automatically whether the resulting future is Send (it is Send if it can be, in other words; we don’t declare up front whether it must be).

Conclusion: Ideally we want Box when using dyn, but not otherwise

So far we’ve seen:

If we desugar async fn to an associated type, it works well for generic cases, because we can resolve the future to precisely the right type.
But it doesn’t work well for dyn trait, because the rules of Rust require that we specify the value of the associated type exactly. For dyn traits, we really want the returned future to be something like Box<dyn Future>.
- Using Box does mean a slight performance penalty relative to static dispatch, because we must allocate the future dynamically.

What we would ideally want is to only pay the price of Box when using dyn:

When you use AsyncIter in generic types, you get the desugaring shown above, with no boxing and static dispatch.
But when you create a dyn AsyncIter, the future type becomes Box<dyn Future<Output = Option<u32>>>.
- (And perhaps you can choose another “smart pointer” type besides Box, but I’ll ignore that for now and come back to it later.)
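One way to approximate this split by hand today is sketched below. This is not the design the initiative is proposing, only an illustration of the goal: the statically dispatched trait keeps its unboxed future, and a separate object-safe companion trait boxes the future only for callers that go through dyn. The names DynAsyncIter, next_boxed, LocalBoxFuture, and block_on are all hypothetical, and std::future::Ready stands in for a compiler-generated future.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Static-dispatch trait: one unboxed future type per impl.
trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

// Hypothetical alias for a boxed, pinned future.
type LocalBoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

// Hypothetical object-safe companion: the future type is fixed, so `dyn` works.
trait DynAsyncIter {
    type Item;
    fn next_boxed(&mut self) -> LocalBoxFuture<'_, Option<Self::Item>>;
}

// Blanket impl: the Box is only paid for by callers that use `next_boxed`.
impl<T: AsyncIter> DynAsyncIter for T {
    type Item = <T as AsyncIter>::Item;

    fn next_boxed(&mut self) -> LocalBoxFuture<'_, Option<Self::Item>> {
        Box::pin(self.next())
    }
}

struct SleepyRange {
    start: u32,
    stop: u32,
}

impl AsyncIter for SleepyRange {
    type Item = u32;
    type Next<'me> = Ready<Option<u32>> where Self: 'me;

    fn next(&mut self) -> Self::Next<'_> {
        let s = self.start;
        if s < self.stop {
            self.start = s + 1;
            ready(Some(s))
        } else {
            ready(None)
        }
    }
}

// Hypothetical helper: polls a future in a loop with a waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Generic code keeps calling next() and never allocates, while a Box<dyn DynAsyncIter<Item = u32>> goes through next_boxed(). The obvious drawback of this sketch is that the two call sites use different method names, which is part of what a language-level solution would need to smooth over.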

In upcoming posts, I will dig into some of the ways that we might achieve this.

https://smallcultfollowing.com/babysteps/blog/2021/09/30/dyn-async-traits-part-1/

Rustacean Principles, continued

Sep 16, 2021 Updated Sep 16, 2021

RustConf is always a good time for reflecting on the project. For me, the last week has been particularly “reflective”. Since announcing the Rustacean Principles, I’ve been having a number of conversations with members of the community about how they can be improved. I wanted to write a post summarizing some of the feedback I’ve gotten.

The principles are a work-in-progress

Sparking conversation about the principles was exactly what I was hoping for when I posted the previous blog post. The principles have mostly been the product of Josh and I iterating, and hence reflect our experiences. While the two of us have been involved in quite a few parts of the project, for the document to truly serve its purpose, it needs input from the community as a whole.

Unfortunately, for many people, the way I presented the principles made it seem like I was trying to unveil a fait accompli, rather than seeking input on a work-in-progress. I hope this post makes the intention more clear!

The principles as a continuation of Rust’s traditions

Rust has a long tradition of articulating its values. This is why we have a Code of Conduct. This is why we wrote blog posts like Fearless Concurrency, Stability as a Deliverable and Rust Once, Run Anywhere. Looking past the “engineering side” of Rust, aturon’s classic blog posts on listening and trust (part 1, part 2, part 3) did a great job of talking about what it is like to be on a Rust team. And who could forget the whole “fireflowers” debate?1

My goal with the Rustacean Principles is to help coalesce the existing wisdom found in those classic Rust blog posts into a more concise form. To that end, I took initial inspiration from how AWS uses tenets, although by this point the principles have evolved into a somewhat different form. I like the way tenets use short, crisp statements that identify important concepts, and I like the way assigning a priority ordering helps establish which should have priority. (That said, one of Rust’s oldest values is synthesis: we try to find ways to resolve constraints that are in tension by having our cake and eating it too.)

Given all of this backdrop, I was pretty enthused by a suggestion that I heard from Jacob Finkelman. He suggested adapting the principles to incorporate more of the “classic Rust catchphrases”, such as the “no new rationale” rule described in the first blog post from aturon’s series. A similar idea is to incorporate the lessons from RFCs, both successful and unsuccessful (this is what I was going for in the case studies section, but that clearly needs to be expanded).

The overall goal: Empowerment

My original intention was to structure the principles as a cascading series of ideas:

Rust’s top-level goal: Empowerment
- Principles: Dissecting empowerment into its constituent pieces – reliable, performant, etc – and analyzing the importance of those pieces relative to one another.
  - Mechanisms: Specific rules that we use, like type safety, that engender the principles (reliability, performance, etc.). These mechanisms often work in favor of one principle, but can work against others.

wycats suggested that the site could do a better job of clarifying that empowerment is the top-level, overriding goal, and I agree. I’m going to try and tweak the site to make it clearer.

A goal, not a minimum bar

The principles in “How to Rustacean” were meant to be aspirational: a target to be reaching for. We’re all human: nobody does everything right all the time. But, as Matklad describes, the principles could be understood as setting up a kind of minimum bar – to be a team member, one has to show up, follow through, trust and delegate, all while bringing joy? This could be really stressful for people.

The goal for the “How to Rustacean” section is to be a way to lift people up by giving them clear guidance for how to succeed; it helps us to answer people when they ask “what should I do to get onto the lang/compiler/whatever team”. The internals thread had a number of good ideas for how to help it serve this intended purpose without stressing people out, such as cuviper’s suggestion to use fictional characters like Ferris in examples, passcod’s suggestion of discussing inclusion, or Matklad’s proposal to add something to the effect of “You don’t have to be perfect” to the list. Iteration needed!

Scope of the principles

Some people have wondered why the principles are framed in a rather general way, one that applies to all of Rust, instead of being specific to the lang team. It’s a fair question! In fact, they didn’t start this way. They started their life as a rather narrow set of “design tenets for async” that appeared in the async vision doc. But as those evolved, I found that they were starting to sound like design goals for Rust as a whole, not specifically for async.

Trying to describe Rust as a “coherent whole” makes a lot of sense to me. After all, the experience of using Rust is shaped by all of its facets: the language, the libraries, the tooling, the community, even its internal infrastructure (which contributes to that feeling of reliability by ensuring that the releases are available and high quality). Every part has its own role to play, but they are all working towards the same goal of empowering Rust’s users.2

There is an interesting question about the long-term trajectory for this work. In my mind, the principles remain something of an experiment. Presuming that they prove to be useful, I think that they would make a nice RFC.

What about “easy”?

One final bit of feedback I heard from Carl Lerche is surprise that the principles don’t include the word “easy”. This is not an accident. I felt that “easy to use” was too subjective to be actionable, and that the goals of productive and supportive were more precise. However, I do think that for people to feel empowered, it’s important for them not to feel mentally overloaded, and Rust can definitely have the problem of carrying a high mental load sometimes.

I’m not sure of the best way to tweak the “Rust empowers by being…” section to reflect this, but the answer may lie with the Cognitive Dimensions of Notation. I was introduced to these by Felienne Hermans’s excellent book The Programmer’s Brain; I quite enjoyed this journal article as well.

The idea of the CDN is to try and elaborate on the ways that tools can be easier or harder to use for a particular task. For example, Rust would likely do well on the “error prone” dimension, in that when you make changes, the compiler generally helps ensure they are correct. But Rust does tend to have a high “viscosity”, because making local changes tends to be difficult: adding a lifetime, for example, can require updating data structures all over the code in an annoying cascade.

It’s important though to keep in mind that the CDN will vary from task to task. There are many kinds of changes one can make in Rust with very low viscosity, such as adding a new dependency. On the other hand, there are also cases where Rust can be error prone, such as mixing async runtimes.

Conclusion

In retrospect, I wish I had introduced the concept of the Rustacean Principles in a different way. But the subsequent conversations have been really great, and I’m pretty excited by all the ideas on how to improve them. I want to encourage folks again to come over to the internals thread with their thoughts and suggestions.

Love that web page, brson. ↩︎
One interesting question: I do think that some tools may vary the prioritization of different aspects of Rust. For example, a tool for formal verification is obviously aimed at users that particularly value reliability, but other tools may have different audiences. I’m not sure yet the best way to capture that, it may well be that each tool can have its own take on the way that it particularly empowers. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/09/16/rustacean-principles-continued/

CTCFT 2021-09-20 Agenda

Sep 15, 2021 Updated Sep 15, 2021

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place next Monday, on 2021-09-20 (in your time zone)! This post covers the agenda. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Agenda

Announcements
Interest group panel discussion

We’re going to try something a bit different this time! The agenda is going to focus on Rust interest groups and domain working groups, those brave explorers who are trying to put Rust to use on all kinds of interesting domains. Rather than having fixed presentations, we’re going to have a panel discussion with representatives from a number of Rust interest groups and domain groups, led by AngelOnFira. The idea is to open a channel for communication about how to have more active communication and feedback between interest groups and the Rust teams (in both directions).

Afterwards: Social hour

https://smallcultfollowing.com/babysteps/blog/2021/09/15/ctcft-2021-09-20-agenda/

Rustacean Principles

Sep 8, 2021 Updated Sep 8, 2021

As the web site says, Rust is a language empowering everyone to build reliable and efficient software. I think it’s precisely this feeling of empowerment that people love about Rust. As wycats put it recently to me, Rust makes it “feel like things are possible that otherwise feel out of reach”. But what exactly makes Rust feel that way? If we can describe it, then we can use that description to help us improve Rust, and to guide us as we design extensions to Rust.

Besides the language itself, Rust is also an open-source community, one that prides itself on our ability to do collaborative design. But what do we do which makes us able to work well together? If we can describe that, then we can use those descriptions to help ourselves improve, and to instruct new people on how to better work within the community.

This blog post describes a project I and others have been working on called the Rustacean principles. This project is an attempt to enumerate the (heretofore implicit) principles that govern both Rust’s design and the way our community operates. The principles are still in draft form; for the time being, they live in the nikomatsakis/rustacean-principles repository.

How the principles got started

The Rustacean Principles were suggested by Shane during a discussion about how we can grow the Rust organization while keeping it true to itself. Shane pointed out that, at AWS, mechanisms like tenets and the leadership principles are used to communicate and preserve shared values.1 The goal at AWS, as in the Rust org, is to have teams that operate independently but which still wind up “itching in the same direction”, as aturon so memorably put it.

Since that initial conversation, the principles have undergone quite some iteration. The initial effort, which I presented at the CTCFT on 2021-06-21, was quite closely modeled on AWS tenets. After a number of in-depth conversations with both joshtriplett and aturon, though, I wound up evolving the structure quite a bit to what you see today. I expect them to continue evolving, particularly the section on what it means to be a team member, which has received less attention.

Rust empowers by being…

The principles are broken into two main sections. The first describes Rust’s particular way of empowering people. This description comes in the form of a list of properties that we are shooting for:

Rust empowers by being…
- ⚙️ Reliable: “if it compiles, it works”
- 🐎 Performant: “idiomatic code runs efficiently”
- 🥰 Supportive: “the language, tools, and community are here to help”
- 🧩 Productive: “a little effort does a lot of work”
- 🔧 Transparent: “you can predict and control low-level details”
- 🤸 Versatile: “you can do anything with Rust”

These properties are frequently in tension with one another. Our challenge as designers is to find ways to satisfy all of these properties at once. In some cases, though, we may be forced to decide between slightly penalizing one goal or another. In that case, we tend to give the edge to those goals that come earlier in the list over those that come later. Still, while the ordering is important, it’s important to emphasize that for Rust to be successful we need to achieve all of these feelings at once.

Each of the properties has a page that describes it in more detail. The page also describes some specific mechanisms that we use to achieve this property. These mechanisms take the form of more concrete rules that we apply to Rust’s design. For example, the page for reliability discusses type safety, consider all cases, and several other mechanisms. The discussion gives concrete examples of the tradeoffs at play and some of the techniques we have used to mitigate them.

One thing: these principles are meant to describe more than just the language. For instance, the great error messages are one example of Rust being supportive, and Cargo’s lock files and dependency system are geared towards making Rust feel reliable.

How to Rustacean

Rust has been an open source project since its inception, and over time we have evolved and refined the way that we operate. One key concept for Rust is the governance teams, whose members are responsible for decisions regarding Rust’s design and maintenance. We definitely have a notion of what it means “to Rustacean” – there are specific behaviors that we are looking for. But it has historically been really challenging to define them, and in turn to help people to achieve them (or to recognize when we ourselves are falling short!). The next section of this site, How to Rustacean, is a first attempt at drafting just such a list. You can think of it like a companion to the Code of Conduct: whereas the CoC describes the bare minimum expected of any Rust participant, the How to Rustacean section describes what it means to excel.

How to Rustacean
- 💖 Be kind and considerate
- ✨ Bring joy to the user
- 👋 Show up
- 🔭 Recognize others’ knowledge
- 🔁 Start somewhere
- ✅ Follow through
- 🤝 Pay it forward
- 🎁 Trust and delegate

This section of the site has undergone less iteration than the “Rust empowerment” section. The idea is that each of these principles has a dedicated page that elaborates on the principle and gives examples of it in action. The example of Raising an objection about a design (from Show up) is the most developed and a good one to look at to get the idea. One interesting bit is the “goldilocks” structure2, which indicates what it means to “show up” too little but also what it means to “show up” too much.

How the principles can be used

For the principles to be a success, they need to be more than words on a website. I would like to see them become something that we actively reference all the time as we go about our work in the Rust org.

As an example, we were recently wrestling with a minor point about the semantics of closures in Rust 2021. The details aren’t that important (you can read them here, if you like), but the decision ultimately came down to a question of whether to adapt the rules so that they are smarter, but more complex. I think it would have been quite useful to refer to these principles in that discussion: ultimately, I think we chose to (slightly) favor productivity at the expense of transparency, which aligns well with the ordering on the site. Further, as I noted in my conclusion, I would personally like to see some form of explicit capture clause for closures, which would give users a way to ensure total transparency in those cases where it is most important.

The How to Rustacean section can be used in a number of ways. One thing would be cheering on examples of where someone is doing a great job: Mara’s issue celebrating all the contributions to the 2021 Edition is a great instance of paying it forward, for example, and I would love it if we had a precise vocabulary for calling that out.

Another time these principles can be used is when looking for new candidates for team membership. When considering a candidate, we can look to see whether we can give concrete examples of times they have exhibited these qualities. We can also use the principles to give feedback to people about where they need to improve. I’d like to be able to tell people who are interested in joining a Rust team, “Well, I’ve noticed you do a great job of showing up, but your designs tend to get mired in complexity. I think you should work on start somewhere.”

“Hard conversations” where you tell someone what they can do better are something that managers do (or try to do…) in companies, but which often get sidestepped or avoided in an open source context. I don’t claim to be an expert, but I’ve found that having structure can help to take away the “sting” and make it easier for people to hear and learn from the feedback.3

What comes next

I think at this point the principles have evolved enough that it makes sense to get more widespread feedback. I’m interested in hearing from people who are active in the Rust community about whether they reflect what you love about Rust (and, if not, what might be changed). I also plan to try and use them to guide both design discussions and questions of team membership, and I encourage others in the Rust teams to do the same. If we find that they are useful, then I’d like to see them turned into an RFC and ultimately living on forge or somewhere more central.

Questions?

I’ve opened an internals thread for discussion.

Footnotes

One of the first things that our team did at Amazon was to draft its own tenets; the discussion helped us to clarify what we were setting out to do and how we planned to do it. ↩︎
Hat tip to Marc Brooker, who suggested the “Goldilocks” structure, based on how the Leadership Principles are presented in the AWS wiki. ↩︎
Speaking of which, one glance at my queue of assigned PRs makes it clear that I need to work on my follow through. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/09/08/rustacean-principles/

Next CTCFT Meeting: 2021-09-20

Aug 30, 2021 Updated Aug 30, 2021

Hold the date! The next Cross Team Collaboration Fun Times meeting will be 2021-09-20. We’ll be using the “Asia-friendly” time slot of 21:00 EST.

What will the talks be about?

A detailed agenda will be announced in a few weeks. Current thinking however is to center the agenda on Rust interest groups and domain working groups, those brave explorers who are trying to put Rust to use on all kinds of interesting domains, such as game development, cryptography, machine learning, formal verification, and embedded development. If you run an interest group and I didn’t list your group here, perhaps you want to get in touch! We’ll be talking about how these groups operate and how we can do a better job of connecting interest groups with the Rust org.

Will there be a social hour?

Absolutely! The social hour has been an increasingly popular feature of the CTCFT meeting. It will take place after the meeting (22:00 EST).

How can I get this on my calendar?

The CTCFT meetings are announced on this google calendar.

Wait, what about August?

Perceptive readers will note that there was no CTCFT meeting in August. That’s because I and many others were on vacation. =)

https://smallcultfollowing.com/babysteps/blog/2021/08/30/next-ctcft-meeting-2021-09-20/

CTCFT 2021-07-19 Agenda

Jul 12, 2021 Updated Jul 12, 2021

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place one week from today, on 2021-07-19 (in your time zone)! What follows are the abstracts for the talks we have planned. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Mentoring

Presented by: doc-jones

The Rust project has a number of mechanisms for getting people involved in the project, but most are oriented around 1:1 engagement. Doc has been investigating some of the ways that other projects engage contributors, such as Python’s mentored sprints. She will discuss how some of those projects run things and share some ideas about how that might be applied in the Rust project.

Lang team initiative process

Presented by: joshtriplett

The lang team recently established a new process we call initiatives. This is a refinement of the RFC process to include more explicit staging. Josh will talk about the new process, what motivated it, and how we’re trying to build more sustainable processes.

Driving discussions via postmortem analysis

Presented by: TBD

Innovation means taking risks, and risky behavior sometimes leads to process failures. An example of a recent process failure was the Rust 1.52.0 release, and subsequent 1.52.1 patch release that followed a few days later. Every failure presents an opportunity to learn from our mistakes and correct our processes going forward. In response to the 1.52.0 event, the compiler team recently went through a “course correction” postmortem process inspired by the “Correction of Error” reviews that pnkfelix has observed at Amazon. This talk describes the structure of a formal postmortem, and discusses how other Rust teams might deploy similar postmortem activities for themselves.

Afterwards: Social hour

https://smallcultfollowing.com/babysteps/blog/2021/07/12/ctcft-2021-07-19-agenda/

CTCFT Social Hour

Jun 18, 2021 Updated Jun 18, 2021

Hey everyone! At the CTCFT meeting this Monday (2021-06-21), we’re going to try a “social hour”. The idea is really simple: for the hour after the meeting, we will create breakout rooms in Zoom with different themes. You can join any breakout room you like and hang out.

The themes for the breakout rooms will be based on suggestions. If you have an idea for a room you’d like to try, you can post it in a dedicated topic on the #ctcft Zulip stream. Or, if you see somebody else has posted an idea that you like, then add a 👍 emoji. We’ll create the final breakout list based on what we see there.

The breakout rooms can be as casual or focused as you like. For example, we will have some default rooms for hanging out – please make suggestions for icebreaker topics on Zulip! We also plan to have some rooms where people are chatting while doing Rust work: for example, yaahc suggested a room for folks who want to write mentoring instructions.

Also: a reminder that there is a CTCFT Calendar that you can subscribe to to be reminded of future meetings. If you like, I can add you to the invite, just ask on Zulip or Discord.

See you there!

https://smallcultfollowing.com/babysteps/blog/2021/06/18/ctcft-social-hour/

CTCFT 2021-06-21 Agenda

Jun 14, 2021 Updated Jun 14, 2021

The second “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place one week from today, on 2021-06-21 (in your time zone)! This post describes the main agenda items for the meeting; you’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Afterwards: Social hour

Turbowish and Tokio console

Presented by: pnkfelix and Eliza (hawkw)

Rust programs are known for being performant and correct – but what about when that’s not true? Unfortunately, the state of the art for Rust tooling today can often be a bit difficult. This is particularly true for Async Rust, where users need insights into the state of the async runtime so that they can resolve deadlocks and tune performance. This talk discusses what top-notch debugging and tooling for Rust might look like. One particularly exciting project in this area is tokio-console, which lets users visualize the state of projects built on the tokio library.

Guiding principles for Rust

Presented by: nikomatsakis

As Rust grows, we need to ensure that it retains a coherent design. Establishing a set of “guiding principles” is one mechanism for doing that. Each principle captures a goal that Rust aims to achieve, such as ensuring correctness, or efficiency. The principles give us a shared vocabulary to use when discussing designs, and they are ordered so as to give guidance in resolving tradeoffs. This talk will walk through a draft set of guiding principles for Rust that nikomatsakis has been working on, along with examples of how those principles are enacted through Rust’s language, library, and tooling.

https://smallcultfollowing.com/babysteps/blog/2021/06/14/ctcft-2021-06-21-agenda/

Edition: the song

May 26, 2021 Updated May 26, 2021

You may have heard that the Rust 2021 Edition is coming. Along with my daughter Daphne, I have recorded a little song in honor of the occasion! The full lyrics are below – if you feel inspired, please make your own version!1 Enjoy!

Video

Watch the movie embedded here, or watch it on YouTube:

Lyrics

(Spoken)
Breaking changes where no code breaks.
Sounds impossible, no?
But in the Rust language, you might say that we like to do impossible things.
It isn’t easy.
You may ask, how do we manage such a thing?
That I can tell you in one word… Edition!

(Chorus)
Edition, edition… edition!

(Lang)
Who day and night
Is searching for a change
Whatever they can do
So Rust’s easier for you
Who sometimes finds
They have to tweak the rules
And change a thing or two in Rust?

(All)
The lang team, the lang team… edition!
The lang team, the lang team… edition!

(Libs)
Who designs the traits that we use each day?
All the time, in every way?
Who updates the prelude so that we can call
The methods that we want no sweat

(All)
The libs team, the libs team… edition!
The libs team, the libs team… edition!

(Users)
Three years ago I changed my code
to Rust twenty eighteen
Some dependencies did not
But they… kept working.

(All)
The users, the users… edition!
The users, the users… edition!

(Tooling)
And who does all this work
To patch and tweak and fix
Migrating all our code
Each edition to the next

(All)
The tooling, the tooling… edition!
The tooling, the tooling… edition!

(Spoken)
And here in Rust, we’ve always had our little slogans.
For instance, abstraction… without overhead.
Concurrency… without data races.
Stability… without stagnation.
Hack… without fear.
But we couldn’t do all of those things…
not without…
Edition!

Footnotes

OMG, that would be amazing. I’ll update the post with any such links I find. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/05/26/edition-the-song/

CTCFTFTW

May 14, 2021 Updated May 14, 2021

This Monday I am starting something new: a monthly meeting called the “Cross Team Collaboration Fun Times” (CTCFT)1. Check out our nifty logo2:

Logo

The meeting is a mechanism to help keep the members of the Rust teams in sync and in touch with one another. The idea is to focus on topics of broad interest (more than two teams):

Status updates on far-reaching projects that could affect multiple teams;
Experience reports about people trying new things (sometimes succeeding, sometimes not);
“Rough draft” proposals that are ready to be brought before a wider audience.

The meeting will focus on things that could either offer insights that might affect the work you’re doing, or where the presenter would like to pose questions to the Rust teams and get feedback.

I announced the meeting some time back to all@rust-lang.org, but I wanted to make a broader announcement as well. This meeting is open for anyone to come and observe. This is by design. Even though the meeting is primarily meant as a forum for the members of the Rust teams, it can be hard to define the borders of a community like ours. I’m hoping we’ll get people who work on major Rust libraries in the ecosystem, for example, or who work on the various Rust teams that have come into being.

The first meeting is scheduled for 2021-05-17 at 15:00 Eastern and you will find the agenda on the CTCFT website, along with links to the slides (still a work-in-progress as of this writing!). There is also a twitter account @RustCTCFT and a Google calendar that you can subscribe to.

I realize the limitations of a synchronous meeting. Due to the reality of time zones and a volunteer project, for example, we’ll never be able to get all of Rust’s global community to attend at once. I’ve designed the meeting to work well even if you can’t attend: the goal is to have a place to start conversations, not to finish them. Agendas are announced well in advance and the meetings are recorded. We’re also rotating times – the next meeting on 2021-06-21 takes place at 21:00 Eastern time, for example.3

Hope to see you there!

Footnotes

In keeping with Rust’s long-standing tradition of ridiculous acronyms. ↩︎
Thanks to @Xfactor521! 🙏 ↩︎
The agenda is still TBD. I’ll tweet when we get it lined up. We’re not announcing that far in advance! 😂 ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/05/14/ctcftftw/

[AiC] Vision Docs!

May 1, 2021 Updated May 1, 2021


The Async Vision Doc effort has been going now for about 6 weeks. It’s been a fun ride, and I’ve learned a lot. It seems like a good time to take a step back and start talking a bit about the vision doc structure and the process. In this post, I’m going to focus on the role that I see vision docs playing in Rust’s planning and decision making, particularly as compared to RFCs.

Vision docs frame RFCs

If you look at a description of the design process for a new Rust feature, it usually starts with “write an RFC”. After all, before we start work on something, we begin with an RFC that both motivates and details the idea. We then proceed to implementation and stabilization.

But the RFC process isn’t really the beginning. The process really begins with identifying some sort of problem1 – something that doesn’t work, or which doesn’t work as well as it could. The next step is imagining what you would like it to be like, and then thinking about how you could make that future into reality.

We’ve always done this sort of “framing” when we work on RFCs. In fact, RFCs are often just one small piece of a larger picture. Think about something like impl Trait, which began with an intentionally conservative step (RFC #1522) and has been gradually extended. Async Rust started the same way; in that case, though, even the first RFC was split into two, which together described a complete first step (RFC #2394 and RFC #2592).

The role of a vision doc is to take that implicit framing and make it explicit. Vision docs capture both the problem and the end-state that we hope to reach, and they describe the first steps we plan to take towards that end-state.

The “shiny future” of vision docs

There are many efforts within the Rust project that could benefit from vision docs. Think of long-running efforts like const generics or library-ification. There is a future we are trying to make real, but it doesn’t really exist in written form.

I can say that when the lang team is asked to approve an RFC relating to some incremental change in a long-running effort, it’s very difficult for me to do. I need to be able to put that RFC into context. What is the latest plan we are working towards? How does this RFC take us closer? Sometimes there are parts of that plan that I have doubts about – does this RFC lock us in, or does it keep our options open? Having a vision doc that I could return to and evolve over time would be a tremendous boon.

I’m also excited about the potential for ‘interlocking’ vision docs. While working on the Async Vision Doc, for example, I’ve found myself wanting to write examples that describe error handling. It’d be really cool if I could pop over to the Error Handling Project Group 4, take a look at their vision doc, and then make use of what I see there in my own examples. It might even help me to identify a conflict before it happens.

Start with the “status quo”

A key part of the vision doc is that it starts by documenting the “status quo”. It’s all too easy to take the “status quo” for granted – to assume that everybody understands how things play out today.

When we started writing “status quo” stories, it was really hard to focus on the “status quo”. It’s really tempting to jump straight to ideas for how to fix things. It took discipline to force ourselves to just focus on describing and understanding the current state.

I’m really glad we did, though. If you haven’t done so already, take a moment to browse through the status quo section of the doc (you may find the metanarrative helpful to get an overview5). Reading those stories has given me a much deeper understanding of how Async is working in practice, both at a technical level and in terms of its impact on people. This is true even when presenting highly technical context. Consider stories like Barbara builds an async executor or Barbara carefully dismisses embedded future. For me, stories like these have more resonance than just seeing a list of the technical obstacles one must overcome. They also help us talk about the various “dead-ends” that might otherwise get forgotten.

Those kinds of dead-ends are especially important for people new to Rust, of course, who are likely to just give up and learn something else if the going gets too rough. In working on Rust, we’ve always found that focusing on accessibility and the needs of new users is a great way to identify things that – once fixed – wind up helping everyone. It’s interesting to think how long we put off doing NLL. After all, metajack filed #6393 in 2013, and I remember people raising it with me earlier. But those of us who were experienced in Rust knew the workarounds, so it never seemed pressing, and hence NLL got put off until 2018.6 But now it’s clearly one of the most impactful changes we’ve made to Rust for users at all levels.

Brainstorming the “shiny future”

A few weeks back, we started writing “shiny future” stories (in addition to “status quo”). The “shiny future” stories are the point where we try to imagine what Rust could be like in a few years.

Ironically, although in the beginning the “shiny future” was all we could think about, getting a lot of “shiny future” stories up and posted has been rather difficult. It turns out to be hard to figure out what the future should look like!7

Writing “shiny future” stories sounds a bit like an RFC, but it’s actually quite different:

The focus is on the end user experience, not the details of how it works.
We want to think a bit past what we know how to do. The goal is to “shake off” the limits of incremental improvement and look for ways to really improve things in a big way.
We’re not making commitments. This is a brainstorming session, so it’s fine to have multiple contradictory shiny futures.

In a way, it’s like writing just the “guide section” of an RFC, except that it’s not written as a manual but in narrative form.

Collaborative writing sessions

To try and make the writing process more fun, we started running collaborative Vision Doc Writing Sessions. We were focused purely on status quo stories at the time. The idea was simple – find people who had used Rust and get them to talk about their experiences. At the end of the session, we would have a “nearly complete” outline of a story that we could hand off to someone to finish.8

The sessions work particularly well when you are telling the story of people who were actually in the session. Then you can simply ask them questions to find out what happened. How did you start? What happened next? How did you feel then? Did you try anything else in between? If you’re working from blog posts, you sometimes have to take guesses and try to imagine what might have happened.9

One thing to watch out for: I’ve noticed people tend to jump steps when they narrate. They’ll say something like “so then I decided to use FuturesUnordered”, but it’s interesting to find out how they made that decision. How did they learn about FuturesUnordered? Those details will be important later, because if you develop some superior alternative, you have to be sure people will find it.

Shifting to the “shiny future”

Applying the “collaborative writing session” idea to the shiny future has been more difficult. If you get a bunch of people in one session, they may not agree on what the future should be like.

Part of the trick is that, with shiny future, you often want to go for breadth rather than depth. It’s not just about writing one story, it’s about exploring the design space. That leads to a different style of writing session, but you wind up with a scattershot set of ideas, not with a ’nearly complete’ story, and it’s hard to hand those off.

I’ve got a few ideas of things I would like to try when it comes to future writing sessions. One of them is that I would like to work directly with various luminaries from the Async Rust world to make sure their point-of-view is represented in the doc.

Another idea is to try and encourage more “end-to-end” stories that weave together the “most important” substories and give a sense of prioritization. After all, we know that there are subtle footguns in the model as is and we also know that integrating into external event loops is tricky. Ideally, we’d fix both. But which is a bigger obstacle to Async Rust users? In fact, I imagine that there is no single answer. The answer will depend on what people are doing with Async Rust.

After brainstorming: Consolidating the doc and building a roadmap

The brainstorming period is scheduled to end mid-May. At that point comes the next phase, which is when we try to sort out all the contradictory shiny future stories into one coherent picture. I envision this process being led by the async working group leads (tmandry and me), but it’s going to require a lot of consensus building as well.

In addition to building up the shiny future, part of this process will be deciding on a concrete roadmap. The roadmap will describe the specific first steps we will take towards this shiny future. The roadmap items will correspond to particular designs and work items. And here, with those specific work items, is where we get to RFCs: when those work items call for new stdlib APIs or extensions to the language, we will write RFCs that specify them. But those RFCs will be able to reference the vision doc to explain their motivation in more depth.

Living document: adjusting the “shiny future” as we go

There is one thing I want to emphasize: the “shiny future” stories we write today will be wrong. As we work on those first steps that appear in the roadmap, we are going to learn things. We’re going to realize that the experience we wanted to build is not possible – or perhaps that it’s not even desirable! That’s fine. We’ll adjust the vision doc periodically as we go. We’ll figure out the process for that when the time comes, but I imagine it may be a similar – but foreshortened – version of the one we have used to draft the initial version.

Conclusion

Ack! It’s probably pretty obvious that I’m excited about the potential for vision docs. I’ve got a lot of things I want to say about them, but this post is getting pretty long. There are a lot of interesting questions to poke at, most of which I don’t know the answers to yet. Some of the things on my mind: what are the best roles for the characters and should we tweak how they are defined10? Can we come up with good heuristics for which character to use for which story? How are the “consolidation” and “iteration / living document” phases going to work? When is the appropriate time to write a vision doc – right away, or should you wait until you’ve done enough work to have a clearer picture of what the future looks like? Are there lighter-weight versions of the process? We’re going to figure these things out as we go, and I will write some follow-up posts talking about them.

Footnotes

Not problem, opportunity! ↩︎
And – heck – we’re still working towards Polonius! ↩︎
Not my actual reason. I don’t know my actual reason, it just seems right. ↩︎
Shout out to the error handling group, they’re doing great stuff! ↩︎
Did I mention we have 34 stories so far (and more in open PRs)? So cool. Keep ’em coming! ↩︎
To be fair, it was also because designing and implementing NLL was really, really hard.2 ↩︎
Who knew? ↩︎
Big, big shout-out to all those folks who have participated, and especially those brave souls who authored stories. ↩︎
One thing that’s great, though, is that after you post the story, you can ping people and ask them if you got it right. =) ↩︎
I feel pretty strongly that four characters is the right number (it worked for Marvel, it will work for us!)3, but I’m not sure if we got their setup right in other respects. ↩︎

https://smallcultfollowing.com/babysteps/blog/2021/05/01/aic-vision-docs/