
https://sabrinajewson.org/blog/feed.xml

“Truly Hygienic” Let Statements in Rust

Remon is a responsible library developer. She cares about stability, flexibility and correctness, using whichever tools are presently accessible to achieve those goals. Her authored libraries feature automated testing and extensive documentation; she allots design decisions rationale; she knows her language features and traditions and how to apply them to best effect.

And, somewhere to be discovered bound in the tangle of .rs files, there is Remon herself, tranquil and yet focused, meticulously crafting, polishing, studying and crafting again, a component she foresees will ease the life of her users, provide ergonomics inaccessible by traditional methods, and bring to life the great gift of syntax without glue added to the cogs of the build process – a declarative macro.

Refined and learned code-witch she is, Remon is keenly aware of Rust Cultures and Traditions, and so in keeping, would do nothing but summon a monstrous (documented, without doubt, but monstrous nonetheless) tornado of dollar signs and brackets, one whose gales would surely lift up and send flying a meek blog post as this one. Have sympathy! I cannot handle that – I must admit I have not even implemented Send, so the results could verge on disastrous. But a trained magician knows better than to create a beast they cannot tame, and so for this chronicle it is simplified to a wisp of its wild self – one where you must excuse the apparent folly of its existence – as follows:

macro_rules! oh_my {
	() => {
		let Ok(x) = read_input() else { return Err(Error) };
		$crate::process(x);
	};
}

Remon is a responsible library developer, and understands that all humans will make mistakes – and so she has solicited the services of a good friend, Wolfie, to comment on this slice of code.

Well, Wolfie says, this macro is a very impressive feat, and shall surely ease the lives of our users, provide ergonomics inaccessible by traditional methods, and bring to life the great gift of syntax without glue added to the cogs of the build process. But I do have one concern – the let in this macro is not hygienic.

Now, Remon has read her literature, and knows that Rust macros are hygienic with regard to locals – they are guaranteed not to interfere with variables of the caller’s scope unless the variable’s name is explicitly passed in.
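A minimal sketch of what that guarantee looks like in practice, using two hypothetical macros (`shadow_attempt!` and `rebind!` are illustrative names, not part of Remon’s library): a local declared inside a macro body stays invisible to the caller, while a local whose name is passed in does not.

```rust
// Hypothetical macro: declares its own local named `value`.
macro_rules! shadow_attempt {
	() => {
		let value = 1;
		let _ = value;
	};
}

// Hypothetical macro: the caller supplies the name, so the new
// binding is visible at the call site.
macro_rules! rebind {
	($name:ident) => {
		let $name = 1;
	};
}

fn hygienic() -> i32 {
	let value = 42;
	shadow_attempt!();
	value // still the caller's own `value`
}

#[allow(unused_variables)]
fn explicit() -> i32 {
	let value = 42;
	rebind!(value);
	value // now refers to the binding the macro introduced
}
```

Here `hygienic()` evaluates to the caller’s 42, whereas `explicit()` evaluates to the 1 bound by the macro.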

Is that so?, asks Remon. You and I both know that Rust macros use mixed-site hygiene. But I trust your experience as a developer and respect you as a person, so I will approach this incongruence with curiosity rather than dismissal. Thus I must ask you: Whatever do you mean?

Wolfie thinks for a second, and concludes this point best communicated through the medium of code. So he quickly types out a demo of a certain way of use causing bugs:

const x: &str = "26ad109e6f4d14e5cc2c2ccb1f5fb497abcaa223";
oh_my!();

And upon entering input that is not the latest commit hash of the greatest Rust library of all time, Remon is dismayed and ashamed to discover that the code, incorrectly, results in an error. But it’s at least not hard to discover why: in the line containing let Ok(x) =, x is an identifier pattern, which means it can either refer to a constant if the constant is in scope, or create a new variable otherwise. Of course, the macro expects the latter to happen, but since constants are items, and thus unlike variables are unhygienic, if there is a constant x at the call site, it will be used instead. So our pattern becomes equivalent to Ok("26ad109…"), which will of course reject any value that is not the latest commit hash of the greatest Rust library of all time, resulting in silent bugs.
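The same effect can be reproduced without any macro at all. The sketch below is an illustration only: `read_input` here is a hypothetical stand-in that simply echoes its argument, and the constant holds a placeholder string rather than a commit hash.

```rust
// Hypothetical stand-in for the post's `read_input`; it just echoes its argument.
fn read_input(s: &str) -> Result<&str, ()> {
	Ok(s)
}

fn with_const(input: &str) -> Result<&str, &'static str> {
	#[allow(non_upper_case_globals)]
	const x: &str = "expected";
	// `x` resolves to the constant above, so this pattern compares
	// against "expected" instead of binding a new variable.
	let Ok(x) = read_input(input) else { return Err("rejected") };
	Ok(x)
}

fn without_const(input: &str) -> Result<&str, &'static str> {
	// No constant named `x` is in scope, so `x` is a fresh binding.
	let Ok(x) = read_input(input) else { return Err("rejected") };
	Ok(x)
}
```

With these definitions, `without_const("anything")` succeeds, while `with_const("anything")` takes the else branch; only `with_const("expected")` gets through.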

Okay, thinks Remon. I know of a way to fix this: the pattern IDENT @ PATTERN will unambiguously have IDENT bound as a variable, never to be treated as a constant. Since there are no other restrictions to be placed on the data, our PATTERN can simply be a wildcard – _. So that’s what she does:

macro_rules! oh_my {
	() => {
		let Ok(x @ _) = read_input() else { return Err(Error) };
		$crate::process(x);
	};
}

But Wolfie is still not pleased, and Remon is still surprised, because now there is a compilation error.

error[E0530]: let bindings cannot shadow constants
 --> src/main.rs:3:10
  |
3 |         let Ok(x @ _) = read_input() else { return Err(Error) };
  |                ^ cannot be named the same as a constant
...
8 |     const x: &str = "TODO";
  |     ---------------------- the constant `x` is defined here
9 |     oh_my!();
  |     -------- in this macro invocation
  |

This is of course not as bad as buggy behaviour, but Wolfie knows that Remon is a responsible library developer who cares about flexibility and correctness, and it is unpredictable that the macro would suddenly start failing just because of some constants that happen to be there at the call site.

Remon has never seen this error before, but remains undeterred. After all, there is one more trick up her sleeve: although let bindings cannot shadow constants, those two do not account for every member of the value namespace. Functions are a member just as well. And functions, unlike consts, have the property that they can be shadowed – and by virtue of being items, they may themselves shadow consts as well (if introduced in a smaller scope).

So, she introduces that new scope into her macro, and inside it, defines a dummy function. As it happens, functions are never valid in patterns, and so the x @ _ trick is no longer needed.

macro_rules! oh_my {
	() => {{
		#[allow(dead_code)]
		fn x() {}
		let Ok(x) = read_input() else { return Err(Error) };
		$crate::process(x);
	}};
}

And despite Wolfie’s attempts to break it, this iteration remains hygienic even in the presence of strange environments.
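For readers who would like to poke at the name resolution in isolation, here is a macro-free sketch of the same trick. It assumes a hypothetical `read_input` that echoes its argument and a placeholder constant; the inner block plays the role of the extra scope the macro introduces.

```rust
// Hypothetical stand-in for the post's `read_input`; it just echoes its argument.
fn read_input(s: &str) -> Result<&str, ()> {
	Ok(s)
}

fn shadowed(input: &str) -> Result<String, &'static str> {
	// Plays the part of a constant that happens to exist at the call site.
	#[allow(non_upper_case_globals, dead_code)]
	const x: &str = "expected";
	{
		// The dummy function shadows the constant inside this block.
		// Functions may in turn be shadowed by `let`, so `x` below
		// becomes an ordinary variable binding.
		#[allow(dead_code)]
		fn x() {}
		let Ok(x) = read_input(input) else { return Err("rejected") };
		Ok(x.to_owned())
	}
}
```

Calling `shadowed("anything")` now succeeds even though a lowercase constant `x` is in scope outside the block.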

But Remon isn’t satisfied. Because now, being the responsible library developer she is, whenever she uses this trick, she must document it. And she has to introduce a shadowing helper function for every single identifier used in the macro – something that is very easy to forget, negating the benefit of using this trick in the first place. It increases her codebase’s size, in an already-complex macro, for a gain that seems marginal at best.

And so, against her instincts to be fully correct, Remon turns to Wolfie and says, plainly, No. With the incantation of a git reset, she erases these changes from history, choosing instead to live in the ignorant bliss of very-slightly-unhygienic declarative macros.

After all, who names constants in lowercase anyway?

https://sabrinajewson.org/blog/truly-hygienic-let
Why the “Null” Lifetime Does Not Exist

This post originated from an interesting conversation had on the Rust community Discord the other day, in which a user asks:

Does 'static have an opposite? Zero lifetime that’s shorter than anything?

Details of the question are not relevant, but intuitively the question does make sense. After all, Rust already has 'static, representing a lifetime that is longer than or equal to all other lifetimes — so why wouldn’t there be a counterpart, maybe called 'empty or 'null, that is shorter than every other lifetime? To the type theorists out there, if 'static is our bottom type (as it can become any lifetime), wouldn’t there also hypothetically be some top type that any lifetime can become?

The short answer is “not in any way that allows you to construct a value with the lifetime”. Rust’s type system, as this post will show, is designed with the assumption that given a valid lifetime, you can always make a shorter one, so any hypothetical 'null lifetime would have to be non-constructible in the first place.

But why is that? Well, that takes a bit of explaining, but first I’d like to take a bit of a detour into the world of self-referential types…

An Overlong Interlude Where I Am Increasingly Pedantic About Self-Referential Types

(This is going somewhere, I promise.)

You may often hear it be said that Rust does not support self-referential types. This is typically in response to a beginner attempting code like the following and thoroughly confusing themselves:

struct OhNo {
	base_string: String,
	parts: Vec<&'??? str>,
}

Where parts is supposed to borrow from base_string. The beginner of course expects this to be possible but has no clue what lifetime to write in there.

Inevitably, you, the seasoned Rustacean, will put on a grave expression and slowly shake your head in frank resignation. Then, bearing the bad news like a parent informing their child of the truth about Santa Claus, you say:

Rust does not support self-referential types.

And the beginner’s dreams are crushed, for no matter how much they may argue against it, they will eventually have to accept the tragic truth such an abstraction is simply not possible.

But we’re programmers here, and we like things to be precise. And if this interaction should occur in any technical space, it is quite expected that some other user be rather pleased with themselves by chiming in: But what about async?

Well, they’re not wrong. So what about it?

If you’ve ever worked with async before, you will know that futures declared with async blocks have the ability to borrow values across .await points. For example:

let future = async {
	let data = 5;
	let r = &data;
	something_else().await; // Point A
	println!("{r}");
};

The above future, once it suspends for the first time at point A, has to store all of the data required for resumption in its type. This means that we have to store both data and r in the future: r needs to be kept because we directly access it, and data needs to be kept since otherwise r’s reference would dangle into thin air. But r also needs to reference data, which means one part of the type needs to reference another, meeting exactly the definition of a self-referential type!

Okay then, you say, let’s refine the statement. async blocks can indeed be self-referential, but that’s not particularly useful because we can’t extract any data from them beyond the very limited Future interface. So we restrict our claim:

Rust does not support self-referential data structures.

This is better but still not quite true, because you can simply make a static item that depends on itself:

struct SelfRef {
	this: &'static Self,
}
static SELF_REF: SelfRef = SelfRef { this: &SELF_REF };

Well that’s just being pedantic now. But fine, let’s adjust the claim:

Rust does not support non-static self-referential data structures.

Except, with a little interior mutability trickery with Cell and Option to get around the acyclic nature of runtime execution, this same trick also works on the stack:

struct SelfRef<'this> {
	this: Cell<Option<&'this Self>>,
}

fn main() {
	let self_ref = SelfRef {
		this: Cell::new(None),
	};
	self_ref.this.set(Some(&self_ref));
}

use std::cell::Cell;

Try it yourself — although it might be surprising, this compiles just fine, and does indeed result in a struct that technically references itself.

Edit (2023-07-22): After this blog post was published, Daniel Henry-Mantilla helpfully pointed out that you don’t even need interior mutability to make a stack self-referential struct like this, so long as you’re willing to sacrifice having a literal self-reference (&Self) for a reference to an earlier field. Specifically, the following code just works:

struct SelfRef<'this> {
	a: i32,
	b: &'this i32,
}
fn main() {
	let mut self_ref = SelfRef { a: 37, b: &0 };
	self_ref.b = &self_ref.a;
}

The resulting struct exhibits the same behaviour we talk about later in this section, but it’s worth putting in this example as a more “pure” demonstration of the same effect. Thanks, Yandros!

So, did we do it? Have we solved the years-long problem of self-referential types?

Well, of course not, because this approach comes with one huge problem that is a deal-breaker for almost all real-life situations: the resulting value cannot be moved or uniquely borrowed for the rest of its lifetime. Even if we do something as trivial and innocuous as dropping it, we start to see the issue.

struct SelfRef<'this> {
	this: Cell<Option<&'this Self>>,
}

fn main() {
	let self_ref = SelfRef {
		this: Cell::new(None),
	};
	self_ref.this.set(Some(&self_ref));
+	drop(self_ref);
}

use std::cell::Cell;

error[E0505]: cannot move out of `self_ref` because it is borrowed
  --> src/main.rs:10:10
   |
6  |     let self_ref = SelfRef {
   |         -------- binding `self_ref` declared here
...
9  |     self_ref.this.set(Some(&self_ref));
   |                            --------- borrow of `self_ref` occurs here
10 |     drop(self_ref);
   |          ^^^^^^^^
   |          |
   |          move out of `self_ref` occurs here
   |          borrow later used here

For more information about this error, try `rustc --explain E0505`.

And with unique borrowing we get a similar error:

-	let self_ref = SelfRef {
+	let mut self_ref = SelfRef {
		this: Cell::new(None),
	};
-	drop(self_ref);
+	&mut self_ref.this;

error[E0502]: cannot borrow `self_ref.this` as mutable because it is also borrowed as immutable
  --> src/main.rs:10:5
   |
9  |     self_ref.this.set(Some(&self_ref));
   |                            --------- immutable borrow occurs here
10 |     &mut self_ref.this;
   |     ^^^^^^^^^^^^^^^^^^
   |     |
   |     mutable borrow occurs here
   |     immutable borrow later used here

For more information about this error, try `rustc --explain E0502`.

Of course, this makes perfect sense. If we were allowed to uniquely borrow the self_ref value we could trivially use that to produce a &mut and & to the same location, which is a textbook case of UB!

let mut self_ref = SelfRef { this: Cell::new(None) };
self_ref.this.set(Some(&self_ref));
let reference_1: &SelfRef<'_> = self_ref.this.get().unwrap();
let reference_2: &mut SelfRef<'_> = &mut self_ref;
// Oops, UB!

These examples are quite abstract, but they show that you are barred from doing basically anything useful with the value, including returning it from functions or setting any of its fields without interior mutability. Well what did I expect, we just can’t have nice things.

At least we can improve our claim:

Rust does not support movable self-referential data structures.

Surely we’re done now? Well, for the purposes of the main point of the post we are, but since I’ve started this game I feel only obliged to indulge in this pedantry to its natural terminus. So yes, let’s continue…

Our next counterexample is that C supports movable self-referential data structures. So if Rust can’t do this, does this mean Rust is inherently less powerful than C? Well no, of course not, we were just only considering safe code up until now. You can do anything that C can with a little unsafe, so let’s add that qualifier:

Rust does not support safe movable self-referential data structures.

But then we can’t ignore one of Rust’s most powerful features, the wrapping of unsafe code with safe code. That is to say, one can create safe abstractions over what the C code would do to enable this kind of thing with safe code, through dependencies that use unsafe.

As it turns out, this kind of thing is easier said than done. The original attempts at these, owning_ref and rental, are now both unsound and unmaintained; yoke is also unsound in two separate ways (1, 2; although neither are as of today considered exploitable) and only ouroboros has managed to fix all the issues. But it is at least possible, so we can arrive at our final (really final this time, I promise) true statement:

Rust does not natively support safe movable self-referential data structures.

Well isn’t that a mouthful?

Always A Shorter Lifetime

Let’s go back to the code example from before, where Rust prevented us from causing UB with our stack-based self-referential type.

let mut self_ref = SelfRef { this: Cell::new(None) };
self_ref.this.set(Some(&self_ref));
let reference_1: &SelfRef<'_> = self_ref.this.get().unwrap();
let reference_2: &mut SelfRef<'_> = &mut self_ref;
//                                  ^^^^^^^^^^^^^ Compiler error!

This is weird, isn’t it? Because suppose we delete the second line — then it all compiles just fine, that’s just basic Rust borrowing rules. So what is up with that line? How can one usage of a value, involving only that value, get it into this weird twilight state where you can normally borrow but not uniquely borrow or move no matter what you do?

We know that calling any normal function on this value would not put it in that twilight state:

fn uwu(_: &SelfRef<'_>) {}

let mut self_ref = SelfRef { this: Cell::new(None) };
uwu(&self_ref);
let reference_1: &SelfRef<'_> = self_ref.this.get().unwrap();
let reference_2: &mut SelfRef<'_> = &mut self_ref; // Compiles just fine!

So this might lead you to believe that this is some special case in the Rust compiler, that it detected we were building a self-referential type and intervened personally to protect us. But one of the beauties of the borrow checker is that it’s not, and we can show that if we first desugar the lifetimes of uwu, and then try to actually construct the self-referential type within it:

fn uwu<'a, 'b>(self_ref: &'a SelfRef<'b>) {
	self_ref.this.set(Some(&self_ref));
}

error: lifetime may not live long enough
 --> src/main.rs:4:2
  |
3 | fn uwu<'a, 'b>(self_ref: &'a SelfRef<'b>) {
  |        --  -- lifetime `'b` defined here
  |        |
  |        lifetime `'a` defined here
4 |     self_ref.this.set(Some(&self_ref));
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'a` must outlive `'b`
  |
  = help: consider adding the following bound: `'a: 'b`

The fact that an error occurred at all first tells us that there is some material difference between using our magic line we had and calling the uwu function as we’ve currently defined it. The clue to this difference can be found in the compiler help message:

consider adding the following bound: 'a: 'b

So we can do that, and recompile:

fn uwu<'a: 'b, 'b>(self_ref: &'a SelfRef<'b>) {
	self_ref.this.set(Some(&self_ref));
}

let mut self_ref = SelfRef { this: Cell::new(None) };
uwu(&self_ref);
let reference_1: &SelfRef<'_> = self_ref.this.get().unwrap();
let reference_2: &mut SelfRef<'_> = &mut self_ref;
//                                  ^^^^^^^^^^^^^ error: cannot borrow `self_ref` as mutable
//                                                because it is also borrowed as immutable

The same error as before! This means we’ve perfectly been able to extract the underlying “borrowing behaviour” behind the line self_ref.this.set(Some(&self_ref)) into a function, which gives us clues as to what’s really going on here. Since we now have the right signature, we can even delete the body of uwu and observe that the error remains the same:

fn uwu<'a: 'b, 'b>(_: &'a SelfRef<'b>) {}

Recall that : in lifetimes means “outlives” or “lives at least as long as”. Therefore, the generic parameter section <'a: 'b, 'b> of uwu tells us that it operates on the lifetimes

  • 'a, which is the same length as or longer than 'b;
  • and 'b, which can be any lifetime.

You can also use pure logic to reach the conclusion that a function accepting &'a SelfRef<'b> where 'a: 'b is enough to construct a self-referential type, and thus is also enough to prevent any future moves or unique borrows: the reference type held inside the SelfRef<'b> is a &'b Self, but Self in this context is SelfRef<'b>, so therefore procuring a &'b SelfRef<'b> is sufficient to fill that field in. If we then have some &'a SelfRef<'b> where 'a: 'b, as it’s always valid to treat objects as living shorter than they actually do, it can be implicitly converted into the &'b SelfRef<'b> as desired.
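That chain of reasoning can be spelled out as two small steps in code. This is only a sketch: `shorten`, `tie` and `is_self_referential` are hypothetical helper names introduced here for illustration.

```rust
use std::cell::Cell;

struct SelfRef<'this> {
	this: Cell<Option<&'this Self>>,
}

// Step 1: a reference that lives at least as long as 'b can be
// treated as one living exactly as long as 'b.
fn shorten<'a: 'b, 'b>(r: &'a SelfRef<'b>) -> &'b SelfRef<'b> {
	r
}

// Step 2: a `&'b SelfRef<'b>` is exactly what the `this` field wants.
fn tie<'b>(r: &'b SelfRef<'b>) {
	r.this.set(Some(r));
}

fn is_self_referential() -> bool {
	let self_ref = SelfRef { this: Cell::new(None) };
	tie(shorten(&self_ref));
	// Shared borrows are still permitted, so we can check that the
	// stored reference really points back at the value itself.
	let inner = self_ref.this.get().unwrap();
	std::ptr::eq(inner, &self_ref)
}
```

As with the earlier examples, `self_ref` can no longer be moved or uniquely borrowed once `tie` has been called on it.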

So what was all this about? Well really, it was just a long and roundabout way to demonstrate to you a theorem, in the mathematical sense, that holds in Rust:

When you have some type with an invariant lifetime parameter T<'b> and you borrow it with the lifetime 'a such that 'a outlives 'b (producing &'a T<'b>), one is prevented from moving the value thereafter.

(you might notice the presence of the qualifier “invariant” there; this is another thing I won’t go into because it’s not that relevant right now, but it is necessary for the theorem to hold).

We can then take the contrapositive of this theorem, giving us the corollary:

If one is able to move some type with an invariant lifetime parameter T<'b> after borrowing it, then the lifetime which it was borrowed for is strictly shorter than 'b (as if it outlived 'b, one would not have been able to move it).

You might be able to see where this is going now. Take the below code, which compiles:

// `Cell` is used to make T invariant in `'b`
type T<'b> = Cell<&'b ()>;
fn owo<'b>(mut value: T<'b>) {
	let reference = &value;
	drop(value);
}
use std::cell::Cell;

Here we have a function owo, accepting some T<'b>, borrowing it, and then moving it. This satisfies all the conditions to apply the theorem above, which tells us that the duration reference borrowed value for must be a lifetime that is strictly shorter than 'b.

But as 'b was a lifetime parameter to the function owo, we know that it could have been any lifetime — it’s not constrained in any way. This gives the final result for this section:

Given any lifetime parameter 'b, it must be possible to construct a reference whose lifetime is required to live strictly shorter than 'b in order for Rust to be sound.

Or, in other words,

There is always a shorter lifetime.

And this is the reason why the 'null lifetime doesn’t exist, at least in its naïve form. Because if it did exist, and if you could pass it to functions, those functions could always use the trick outlined above to construct a lifetime that must be shorter. This leaves us with only two possibilities:

  1. 'null is not actually shorter than every other lifetime, defeating its purpose;
  2. Rust is unsound.

Of course, this doesn’t rule out a hypothetical “opposite of 'static” existing entirely; merely, it proves that it must not be allowed to actually construct a variable with this lifetime. dtolnay’s 2017 proposal for the 'void lifetime (which to my knowledge was unfortunately never pursued after that initial thread) is an example of the way in which 'static could have an opposite: it can be useful in traits as he shows, but it can never actually be constructed because it’s so short that any value containing a &'void would live longer than 'void, and thus would be disallowed.

This is quite counterintuitive, as after all if one can never construct a 'static reference to a stack value, surely one would always be able to construct a 'void reference to a stack value — but as you’ve seen, it’s the only way for Rust’s borrow checker to still be sound.

https://sabrinajewson.org/blog/null-lifetime
Modular Errors in Rust

It is thankfully common wisdom nowadays that documentation must be placed as near as possible to the code it documents, and should be fine-grained to a minimal unit of describability (the thing being documented). The practice provides numerous benefits to the codebase and project as a whole:

  1. When editing the source code, contributors are less likely to forget to update the documentation as well, ensuring it is kept up-to-date and accurate.
  2. When reading the source code, reviewers can easily jump back and forth between the docs and the code it documents, helping them understand it and allowing them to contrast the expected with actual behaviour.
  3. The codebase becomes more modular. Individual parts can be extracted into different crates or projects if necessary, and strong abstraction boundaries make the code easier to understand in small pieces.

But you probably already knew this; after all, Rust made the excellent design choice of making it by far the easiest method of writing documentation at all. And you probably also know that these same principles apply to tests: when unit tests are kept next to their minimum unit of checkability, you get the same benefits of convenient updating, assisted understanding and modularity. And most Rust projects do use unit tests in this way (when they can, for often there are limitations that prevent it from working), which again we can thank the tooling for.

But that’s all old news. What I’m here to convince you of today is that this principle applies additionally to error types: that is, error types should be located near to their unit of fallibility. To illustrate this point, I will follow the initial development and later API improvement of a hypothetical Rust library.

Case Study: A Blocks.txt Parser

Suppose you’re a library author, and you’re working on a crate to implement the parsing of Blocks.txt in the Unicode Character Database. If you’re not familiar with this file, it defines the list of so-called Unicode blocks, which are non-overlapping contiguous categories that Unicode characters can be sorted into. It looks a bit like this:

0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions

This file tells you that, for example, the character “½”, U+00BD, is in the block “Latin-1 Supplement” because 0x0080 ≤ 0x00BD ≤ 0x00FF. Every character has an associated block; characters which have not yet been assigned a block in the file above are considered to be in the special pseudo-block No_Block.
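To make the lookup rule concrete, here is a small sketch of it over the five sample lines shown above. Both `block_of` and `sample_blocks` are hypothetical helpers written for this illustration; they are not part of the crate developed below.

```rust
use std::ops::RangeInclusive;

// Hypothetical helper: find the block whose inclusive range contains
// the character, falling back to the pseudo-block `No_Block`.
fn block_of<'a>(blocks: &[(RangeInclusive<u32>, &'a str)], c: char) -> &'a str {
	let code_point = u32::from(c);
	blocks
		.iter()
		.find(|(range, _)| range.contains(&code_point))
		.map(|(_, name)| *name)
		.unwrap_or("No_Block")
}

// Only the five sample lines quoted above, not the full Blocks.txt.
fn sample_blocks() -> Vec<(RangeInclusive<u32>, &'static str)> {
	vec![
		(0x0000..=0x007F, "Basic Latin"),
		(0x0080..=0x00FF, "Latin-1 Supplement"),
		(0x0100..=0x017F, "Latin Extended-A"),
		(0x0180..=0x024F, "Latin Extended-B"),
		(0x0250..=0x02AF, "IPA Extensions"),
	]
}
```

With this truncated table, `block_of(&sample_blocks(), '½')` yields "Latin-1 Supplement", matching the arithmetic above.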

So let’s get started on a Rust parser. The specification for the format is given by section 4.2 of Unicode Annex #44, but the format is so trivial you could almost guess it. Upon seeing this task, a typical Rustacean may write code like this:

//! This crate provides tools for working with Unicode blocks and its data files.

pub struct Blocks {
	ranges: Vec<(RangeInclusive<u32>, String)>,
}

impl Blocks {
	pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, Error> {
		Self::from_str(&fs::read_to_string(path)?)
	}

	pub fn download(agent: &ureq::Agent) -> Result<Self, Error> {
		let response = agent.get(LATEST_URL).call()?;
		Self::from_str(&response.into_string()?)
	}
}

impl FromStr for Blocks {
	type Err = Error;
	fn from_str(s: &str) -> Result<Self, Self::Err> {
		let ranges = s
			.lines()
			.map(|line| line.split_once('#').map(|(l, _)| l).unwrap_or(line))
			.filter(|line| !line.is_empty())
			.map(|line| {
				let (range, name) = line.split_once(';').ok_or(Error::NoSemicolon)?;
				let (range, name) = (range.trim(), name.trim());
				let (start, end) = range.split_once("..").ok_or(Error::NoDotDot)?;
				let start = u32::from_str_radix(start, 16)?;
				let end = u32::from_str_radix(end, 16)?;
				Ok((start..=end, name.to_owned()))
			})
			.collect::<Result<Vec<_>, Error>>()?;
		Ok(Self { ranges })
	}
}

Now we need to define an error type, so let’s just follow the “big #[non_exhaustive] enum” convention and bash out some boilerplate that gets the job done:

/// An error in this library.
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
	NoSemicolon,
	NoDotDot,
	ParseInt(ParseIntError),
	Io(io::Error),
	Ureq(Box<ureq::Error>),
}

impl From<ParseIntError> for Error {
	fn from(error: ParseIntError) -> Self {
		Self::ParseInt(error)
	}
}

impl From<io::Error> for Error {
	fn from(error: io::Error) -> Self {
		Self::Io(error)
	}
}

impl From<ureq::Error> for Error {
	fn from(error: ureq::Error) -> Self {
		Self::Ureq(Box::new(error))
	}
}

impl Display for Error {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		match self {
			Self::NoSemicolon => f.write_str("no semicolon"),
			Self::NoDotDot => f.write_str("no `..` in character range"),
			Self::ParseInt(e) => Display::fmt(e, f),
			Self::Io(e) => Display::fmt(e, f),
			Self::Ureq(e) => Display::fmt(e, f),
		}
	}
}

impl std::error::Error for Error {}

Lastly, a couple other bits and imports go at the end:

pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";

use std::cmp;
use std::fmt;
use std::fmt::Display;
use std::fmt::Formatter;
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::ops::RangeInclusive;
use std::path::Path;
use std::str::FromStr;

And we’re done.

There are a few small things to note with this code just before we move on:

  1. I omitted documentation, since it’s not relevant to the real example; in actual code, all the public items would be documented. Similarly, unit tests are omitted.
  2. In a real library, one would not hard-depend on ureq and std and would use feature-flags instead, but again I omitted that for this example.
  3. You might have noticed I put my imports each on separate lines at the bottom; I do have my reasons for this, but that’s best saved for another day ;)
  4. Blocks implements FromStr, but not TryFrom<&str>. This is actually intentional, because despite being nearly identical traits signature-wise they mean two very different things: FromStr implies parsing from a string whereas TryFrom<&str> is for when your data type is a subset of all strings. In our case, FromStr is the correct one to use.
  5. The Display implementation of Error formats error messages like no semicolon in lowercase and without a full stop at the end — this is in accordance with conventions established by the Standard Library (“Error messages are typically concise lowercase sentences without trailing punctuation”). A common pitfall of both new and experienced Rustaceans is using incorrect casing for error messages.
  6. Another common pitfall is naming things like what we’ve named Error::Io as Error::IoError instead. Simply: you don’t need the Error suffix, it says it in the name already!
  7. One could use the thiserror crate to shorten the code by using a #[derive(Error)]. Personally, I would never use this for a library crate since it’s really not that many lines of code saved for a whole extra dependency, but you might want to know about it.
  8. The Ureq variant of the Error enum is boxed because ureq::Error is actually very large and Clippy complains about it.

So there we have it: our perfect little library, let’s go off and publish it to crates.io.

What we’ve written so far, with regard to error handling, is what I’d say most libraries on crates.io do. It’s by far the most common way of handling errors: just stick everything in a big enum of “different ways things can go wrong in the library” and don’t think about it after that. But unfortunately, while it is common it is not exactly good, for a few reasons the rest of this post will be covering.

Problem 1: Backtraces

Suppose you then decide to use your library in a CLI application; and as per usual advice and your own experience, you decide to use anyhow to handle the errors in it. So you write out all your code and it looks a little like this:

fn main() -> anyhow::Result<()> {
	init_something()?;
	let blocks = Blocks::from_file("Blocks.txt")?;
	init_something_else()?;
	// Run the main code…
	Ok(())
}

use unicode_blocks::Blocks;

Looks good, so you go ahead and run it — only, you’re rather abruptly met with:

Error: invalid digit found in string

Um, okay. That doesn’t help us very much at all. What went wrong here?

Well, much pain and many dbg! statements later, you discover that the culprit is that somehow, on line 223 of Blocks.txt you replaced a 0 with an O. Oops!

--- Blocks.txt
+++ Blocks.txt
@@ -222,3 +222,3 @@
 10800..1083F; Cypriot Syllabary
-10840..1O85F; Imperial Aramaic
+10840..1085F; Imperial Aramaic
 10860..1087F; Palmyrene

And then you run it again and it works fine.

But it didn’t have to be this hard. The error message could have displayed something more useful, and maybe this is just a pipe dream, but I’ve seen anyhow emit this sort of thing before:

Error: error reading `Blocks.txt`

Caused by:
	0: invalid Blocks.txt data on line 223
	1: one end of range is not a valid hexadecimal integer
	2: invalid digit found in string

That’s so much more helpful — you wouldn’t ever have had to suspect init_something and init_something_else as potential causes of the error, or even search Blocks.txt for mistakes, it completely guides you to exactly where it went wrong!

Oh well, you say to yourself, at least this time it was decently obvious where the source of the error came from; at least I wasn’t getting a file not found error from TcpListener::bind (the natural conclusion to this kind of “flat”-style error handling). But wouldn’t it be nice if all errors came with backtrace and context tracking built-in?

Problem 2: Inextensibility

At least one of the things in the above image looks feasible to fix though: adding line numbers as context to the error messages. All we have to do is return to our Error enum and add more fields to the NoSemicolon, NoDotDot and ParseInt variants:

pub enum Error {
	NoSemicolon { line: usize },
	NoDotDot { line: usize },
	ParseInt { line: usize, source: ParseIntError },
	Io(io::Error),
	Ureq(Box<ureq::Error>),
}

Except… we can’t do that without breaking backward compatibility, because while the enum itself is #[non_exhaustive] the individual variants aren’t, meaning you’ve fixed them to forever have the fields they do currently (without breaking changes).

Problem 3: Error Matching

Okay, so back to the application. You’ve now realized that you still want to call Blocks::from_file("Blocks.txt"), but if it fails with a “file not found” error you actually want to download the file automatically instead of exiting the program entirely. We have to match on the Result for that:

let blocks = match Blocks::from_file("Blocks.txt") {
	Ok(blocks) => blocks,
	Err(unicode_blocks::Error::Io(e)) if e.kind() == io::ErrorKind::NotFound => {
		// download and retry…
	}
};

Great! But the compiler is yelling that the match arms aren’t exhaustive. Not too hard to fix, let’s look at the cases we need to deal with:

  • NoSemicolon, NoDotDot, ParseInt: Those are pretty obvious, they look like parsing errors, so we can just propagate them.
  • Io: Other I/O errors than “file not found” can also safely be propagated.
  • Ureq: Ummm…? Wait, is this function doing HTTP requests? Let me check the source code again… [please stand by…] oh okay, so it’s not. Then I could add an unreachable!() here which would be correct and indicates semantics nicely; on the other hand, nowhere is it written in the documentation of the API that it won’t ever return this, so maybe I should just propagate it anyway?
  • Oh, and I forgot, we added #[non_exhaustive] to enum Error so there’s always the possibility of it returning a variant that doesn’t exist yet. Well, I guess we can just propagate it anyway.

So, this situation isn’t ideal. The library doesn’t document anywhere what errors a given function can return, so users are often left shooting in the dark. From personal experience, there have been many times I have seen an error variant which was appropriate for me to catch, then I had to spend ages digging around in the source code to find out whether it was actually generated or not — and even an answer to that doesn’t constitute an API guarantee that it will or won’t be in future.

Another issue with the code that we’ve written is that it’s entirely non-obvious that our match arm refers specifically to the Blocks.txt file not being found. The arm itself just says “check if an I/O not found error occurred”, but in theory, and especially for more complex functions, an I/O not found error could mean one of several different things that the user can no longer differentiate between because they were all put together in a single Io variant.

Problem 4: Privacy and Stability

One very common mistake libraries make with this style of big-enum error is accidentally exposing dependencies intended to be private in their public API through error types. In our example code, suppose std::fs and io::Error weren’t part of the standard library but were rather types from an external library that was on version 0.4. Now, when they bump their version to 0.5 I also have to make a breaking change to update it to the newer version, because I exposed the io::Error type in my public API through the Error enum, even though I never expose my usage of the library anywhere else (it’s covered up by the opaque interface of from_file). The same issue occurs if I tried to switch out my usage of that library for a different one; it also forbids me from ever releasing 1.0 until the dependency library also reaches 1.0 as per the C-STABLE API requirement.

This is hard to fix with this approach to errors, because enum data is hardcoded to always use inherited visibility, meaning that if the enum is public, all the fields of its variants are too. Private fields are also useful in errors in general, for reasons other than stability: private fields are just generally a nice feature to have on types.

Problem 5: Non-Modularity

And lastly, touching back on what I mentioned at the beginning of this article: this approach to error handling is non-modular. I couldn’t easily take a component alone, like the parser, and extract it to a different crate, because I’d have to change many APIs or otherwise hack around it. Every API is interconnected with each other through the underlying error type, tying the crate together in a big knot that makes it difficult to untangle and remove stuff.

This kind of non-modularity also makes the codebase more difficult to understand: one is forced, to a greater degree, to learn the entire codebase at once to work on it, rather than learn it piece by piece, a far preferable way of learning.

Guidelines for Good Errors

So the current error type we have has problems. But how do we fix them? And this is where we bring in that principle from the start:

Error types should be located near to their unit of fallibility.

The key phrase here is “unit of fallibility”. What are the units of fallibility in our library? Well, it’s certainly not the library itself — the library is just a way of interacting with Unicode blocks, and it’s not like that can particularly fail. The only libraries that would have the entire library as a unit of fallibility are those whose only purpose is to perform a single operation (they typically have an API surface of no more than two functions, maybe a Params builder type, and nothing more).

This tells us that the unicode_blocks::Error type is inherently misguided. Rather, the units of fallibility in our case are the operations we do, like downloading, reading a file, and parsing.

Now, things get a little subjective at this point on deciding what counts as two separate units or the same unit. In general, you should ask yourself the following two questions:

  1. Do they have different ways in which they can fail?
  2. Should they show different error messages if they fail?

If the answer to either of those questions is “yes”, then they should normally be separate error types.

For us, this means we actually want three separate error types:

  1. FromFileError, for errors in Blocks::from_file;
  2. DownloadError, for errors in Blocks::download;
  3. ParseError, for errors in from_str.

Leveraging the .source() method

Earlier, we said we wanted our error messages (printed with anyhow) to look good, like this:

Error: error reading `Blocks.txt`

Caused by:
	0: invalid Blocks.txt data on line 223
	1: one end of range is not a valid hexadecimal integer
	2: invalid digit found in string

So how do we get anyhow to print this? It turns out what the library calls internally is the Error::source() method, a default-implemented method of the Error trait that tells you the cause of an error. What we see in the above graphic depicts:

  1. an error type (we know to be FromFileError) whose Display implementation prints “error reading Blocks.txt”, and whose source is…
  2. …another error type, whose Display implementation prints “invalid Blocks.txt data on line 223”, and whose source is…
  3. …another error type, whose Display implementation prints “one end of range is not a valid hexadecimal integer”, and whose source is…
  4. …another error type (we know to be ParseIntError) whose Display implementation prints “invalid digit found in string” and whose source is None.

That might seem like a lot of layers, but they all map very nicely to our code: layer 1 is a FromFileError, layer 2 has to be our ParseError, layer 3 has to be something contained within the ParseError, and layer 4 is ParseIntError.
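To make the chain-walking concrete, here is a minimal sketch of a reporter that follows source() until it returns None. I’m not claiming this is how anyhow is implemented: render_chain and the Layer type are hypothetical names made up for the illustration, and the layout simply imitates the report shown above.

```rust
use std::error::Error;
use std::fmt::{self, Display, Formatter, Write as _};

/// Renders an error followed by everything reachable through `source()`.
/// The layout imitates the report shown earlier; it is not anyhow's own code.
fn render_chain(err: &dyn Error) -> String {
	let mut out = format!("Error: {err}");
	let mut source = err.source();
	if source.is_some() {
		out.push_str("\n\nCaused by:");
	}
	let mut index = 0;
	while let Some(cause) = source {
		// Writing to a `String` cannot fail, so the unwrap never panics.
		write!(out, "\n\t{index}: {cause}").unwrap();
		source = cause.source();
		index += 1;
	}
	out
}

/// Hypothetical stand-in for one layer of an error chain: a fixed message
/// plus an optional underlying cause.
#[derive(Debug)]
struct Layer {
	message: &'static str,
	source: Option<Box<dyn Error + 'static>>,
}

impl Display for Layer {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		f.write_str(self.message)
	}
}

impl Error for Layer {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		self.source.as_deref()
	}
}

/// Builds the four layers described above, with a real `ParseIntError`
/// at the bottom of the chain.
fn sample_error() -> Layer {
	let parse_int = u32::from_str_radix("1O85F", 16).unwrap_err();
	let range = Layer {
		message: "one end of range is not a valid hexadecimal integer",
		source: Some(Box::new(parse_int)),
	};
	let line = Layer {
		message: "invalid Blocks.txt data on line 223",
		source: Some(Box::new(range)),
	};
	Layer {
		message: "error reading `Blocks.txt`",
		source: Some(Box::new(line)),
	}
}
```

Calling render_chain(&sample_error()) should yield the same four layers as the report above, which is the point: each layer only needs a good Display and an honest source().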

This leads us to a much nicer structure for the error types in the from_file API.

#[derive(Debug)]
#[non_exhaustive]
pub struct FromFileError {
	pub path: Box<Path>,
	pub kind: FromFileErrorKind,
}

impl Display for FromFileError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "error reading `{}`", self.path.display())
	}
}

impl Error for FromFileError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match &self.kind {
			FromFileErrorKind::ReadFile(e) => Some(e),
			FromFileErrorKind::Parse(e) => Some(e),
		}
	}
}

#[derive(Debug)]
pub enum FromFileErrorKind {
	ReadFile(io::Error),
	Parse(ParseError),
}

This error:

  • has very good backtraces, as it implements Display and source() well;
  • is extensible, as the struct is attributed with #[non_exhaustive];
  • supports precise error matching, as we’ve now automatically given the public API guarantee that we won’t produce HTTP errors from our function, so our users needn’t worry about dealing with that case;
  • makes it clear where the io::Errors can come from, because the variant is named ReadFile instead of simply Io;
  • would easily be able to adjust to support hiding io::Error from the public API surface simply by making kind and FromFileErrorKind private;
  • is entirely modular, being conceptually contained within the from_file logic portion of the code, so it can be extracted, learnt independently, et cetera.
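The matching benefit can be sketched from the application’s side. This is only an illustration under a few assumptions: ParseError is cut down to a stand-in with a line number, the Display and Error implementations are left out for brevity, and should_download and read_failure are hypothetical helpers that are not part of the library.

```rust
use std::io;
use std::path::Path;

// Stand-in for the article's `ParseError`, reduced to the line number.
#[derive(Debug)]
pub struct ParseError {
	pub line: usize,
}

#[derive(Debug)]
#[non_exhaustive]
pub struct FromFileError {
	pub path: Box<Path>,
	pub kind: FromFileErrorKind,
}

#[derive(Debug)]
pub enum FromFileErrorKind {
	ReadFile(io::Error),
	Parse(ParseError),
}

/// Hypothetical application-side policy: fall back to downloading only
/// when the file itself was missing. There is no HTTP variant to worry
/// about, and `ReadFile` says which I/O operation failed.
pub fn should_download(err: &FromFileError) -> bool {
	match &err.kind {
		FromFileErrorKind::ReadFile(e) => e.kind() == io::ErrorKind::NotFound,
		FromFileErrorKind::Parse(_) => false,
	}
}

/// Hypothetical helper that fabricates a read failure for the demo.
pub fn read_failure(kind: io::ErrorKind) -> FromFileError {
	FromFileError {
		path: Path::new("Blocks.txt").into(),
		kind: FromFileErrorKind::ReadFile(io::Error::from(kind)),
	}
}
```

Because FromFileErrorKind is not #[non_exhaustive] here, the match needs no catch-all arm; if the library later adds a variant, the compiler points the application at exactly this spot.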

ParseError can be defined in a somewhat similar fashion, also with the above benefits.

#[derive(Debug)]
#[non_exhaustive]
pub struct ParseError {
	pub line: usize,
	pub kind: ParseErrorKind,
}

impl Display for ParseError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "invalid Blocks.txt data on line {}", self.line + 1)
	}
}

impl Error for ParseError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		Some(&self.kind)
	}
}

#[derive(Debug)]
pub enum ParseErrorKind {
	#[non_exhaustive]
	NoSemicolon,
	#[non_exhaustive]
	NoDotDot,
	#[non_exhaustive]
	ParseInt { source: ParseIntError },
}

impl Display for ParseErrorKind {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		match *self {
			Self::NoSemicolon => f.write_str("no semicolon"),
			Self::NoDotDot => f.write_str("no `..` in range"),
			Self::ParseInt { .. } => {
				f.write_str("one end of range is not a valid hexadecimal integer")
			}
		}
	}
}

impl Error for ParseErrorKind {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match self {
			Self::ParseInt { source } => Some(source),
			_ => None,
		}
	}
}

Note that the enum variants themselves are #[non_exhaustive], so that they can be extended in future with more information.

There is a slight deviation from FromFileError’s design here, in that its corresponding *Kind type actually implements Display and Error in and of itself instead of simply existing as a data holder for other error types. The logic is that while we could make separate unit structs for NoSemicolon, NoDotDot and ParseInt, it just isn’t very necessary here (where on the other hand io::Error is an external type and ParseError is required to be a distinct type because of FromStr). However, sometimes it is still better to make unit structs: it depends on the use case.
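For comparison, here is a rough sketch of what the separate-structs alternative might look like. It is an assumption about one possible shape rather than a recommendation: InvalidRangeEnd is a name I made up for the ParseInt case, and NoDotDot is left out because it would look just like NoSemicolon.

```rust
use std::error::Error;
use std::fmt::{self, Display, Formatter};
use std::num::ParseIntError;

#[derive(Debug)]
#[non_exhaustive]
pub struct NoSemicolon;

impl Display for NoSemicolon {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		f.write_str("no semicolon")
	}
}

impl Error for NoSemicolon {}

// Hypothetical name for the error that wraps the `ParseIntError`.
#[derive(Debug)]
#[non_exhaustive]
pub struct InvalidRangeEnd {
	pub source: ParseIntError,
}

impl Display for InvalidRangeEnd {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		f.write_str("one end of range is not a valid hexadecimal integer")
	}
}

impl Error for InvalidRangeEnd {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		Some(&self.source)
	}
}

// With separate types, the kind goes back to being a plain data holder,
// just like `FromFileErrorKind`.
#[derive(Debug)]
pub enum ParseErrorKind {
	NoSemicolon(NoSemicolon),
	ParseInt(InvalidRangeEnd),
}

#[derive(Debug)]
#[non_exhaustive]
pub struct ParseError {
	pub line: usize,
	pub kind: ParseErrorKind,
}

impl Display for ParseError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "invalid Blocks.txt data on line {}", self.line + 1)
	}
}

impl Error for ParseError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match &self.kind {
			ParseErrorKind::NoSemicolon(e) => Some(e),
			ParseErrorKind::ParseInt(e) => Some(e),
		}
	}
}
```

The source chain comes out the same as with the enum-based design; what changes is that each failure becomes a nameable type, at the cost of noticeably more code.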

Finally, DownloadError showcases a similar pattern (although it’s not that interesting at this point):

#[derive(Debug)]
#[non_exhaustive]
pub struct DownloadError {
	pub kind: DownloadErrorKind,
}

impl Display for DownloadError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "failed to download Blocks.txt from the Unicode website")
	}
}

impl Error for DownloadError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match &self.kind {
			DownloadErrorKind::Request(e) => Some(e),
			DownloadErrorKind::ReadBody(e) => Some(e),
			DownloadErrorKind::Parse(e) => Some(e),
		}
	}
}

#[derive(Debug)]
pub enum DownloadErrorKind {
	Request(Box<ureq::Error>),
	ReadBody(io::Error),
	Parse(ParseError),
}

Note that we could have merged DownloadErrorKind and DownloadError into a single type; I chose not to here in favour of extensibility, because it seems quite possible that one would want to add more fields to DownloadError in future. But for some cases it definitely makes sense.

Constructing the error types

If you try to implement the functions that return these error types, you’ll quickly run into something rather annoying: they require quite a bit of boilerplate to use. For example, the body of from_file now looks like this:

pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> {
	let path = path.as_ref();
	(|| {
		let s = fs::read_to_string(path).map_err(FromFileErrorKind::ReadFile)?;
		Self::from_str(&s).map_err(FromFileErrorKind::Parse)
	})()
	.map_err(|kind| FromFileError {
		path: path.into(),
		kind,
	})
}

Yeah, not the prettiest. Unfortunately, I don’t think there’s much we can actually do here; once we get try blocks it’ll definitely be nicer, but it seems to be an unavoidable cost of many good error-handling schemes.
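If the immediately-invoked closure bothers you, one stylistic alternative on stable Rust is a nested function that returns only the *Kind type, with the outer function attaching the context. The sketch below uses a deliberately tiny, hypothetical example (read_number, ReadNumberError and ReadNumberErrorKind are invented names, and the Display and Error implementations are omitted); whether it reads better than the closure is a matter of taste.

```rust
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::path::Path;

// Hypothetical error types, shaped like `FromFileError` and its kind.
#[derive(Debug)]
pub struct ReadNumberError {
	pub path: Box<Path>,
	pub kind: ReadNumberErrorKind,
}

#[derive(Debug)]
pub enum ReadNumberErrorKind {
	ReadFile(io::Error),
	Parse(ParseIntError),
}

/// Reads a file expected to contain a single unsigned integer.
pub fn read_number<P: AsRef<Path>>(path: P) -> Result<u32, ReadNumberError> {
	// The fallible steps live in a nested function that only produces the
	// kind, so `?` works without an immediately-invoked closure.
	fn inner(path: &Path) -> Result<u32, ReadNumberErrorKind> {
		let s = fs::read_to_string(path).map_err(ReadNumberErrorKind::ReadFile)?;
		s.trim().parse::<u32>().map_err(ReadNumberErrorKind::Parse)
	}

	// The outer function adds the context shared by every failure.
	let path = path.as_ref();
	inner(path).map_err(|kind| ReadNumberError {
		path: path.into(),
		kind,
	})
}
```

As a side effect, the generic outer function stays very small, so less code is duplicated for each path type it is called with.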

On From

One thing notably omitted from the definitions of the new error types was implementations of From for inner types. There is no real problem with them; one just has to be careful that each implementation (a) works with extensibility and (b) actually makes sense. For example, taking FromFileErrorKind:

#[derive(Debug)]
pub enum FromFileErrorKind {
	ReadFile(io::Error),
	Parse(ParseError),
}

While it does make sense to implement From<ParseError>, because Parse is literally the name of one of the variants of FromFileErrorKind, it does not make sense to implement From<io::Error> because such an implementation would implicitly add meaning that one failed during the process of reading the file from disk (as the variant is named ReadFile instead of Io). Constraining the meaning of “any I/O error” to “an error reading the file from the disk” is helpful but should not be done implicitly, thus rendering From inappropriate.
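In code, that distinction might look like the following sketch. ParseError is again a cut-down stand-in, and count_entries and load are hypothetical helpers invented for the illustration; the point is only that the parse error converts through From while the I/O error is mapped by hand.

```rust
use std::io;

// Stand-in for the article's `ParseError`, reduced to the line number.
#[derive(Debug)]
pub struct ParseError {
	pub line: usize,
}

#[derive(Debug)]
pub enum FromFileErrorKind {
	ReadFile(io::Error),
	Parse(ParseError),
}

// This conversion adds no meaning beyond what the variant name already says.
impl From<ParseError> for FromFileErrorKind {
	fn from(e: ParseError) -> Self {
		Self::Parse(e)
	}
}

/// Hypothetical parser stand-in: counts lines, failing on the first line
/// that has no semicolon.
pub fn count_entries(s: &str) -> Result<usize, ParseError> {
	for (line, text) in s.lines().enumerate() {
		if !text.contains(';') {
			return Err(ParseError { line });
		}
	}
	Ok(s.lines().count())
}

/// Hypothetical loader taking the result of an earlier file read.
pub fn load(contents: io::Result<String>) -> Result<usize, FromFileErrorKind> {
	// Mapped explicitly: choosing `ReadFile` is a statement about *which*
	// I/O operation failed, so it should not happen behind a bare `?`.
	let s = contents.map_err(FromFileErrorKind::ReadFile)?;
	// Converted implicitly through the `From<ParseError>` impl above.
	Ok(count_entries(&s)?)
}
```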

On “nearness”

One part of my principle of errors I haven’t yet touched on is the aspect of “nearness”; that errors should, as well as having an appropriate associated unit of fallibility, be sufficiently near to it. The fact is, with Rust’s current design you can’t put them as close as I’d like without sacrificing documentation quality. That is, while you’d ideally write something like:

impl Blocks {
	pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> { /* … */ }
}

pub struct FromFileError { /* … */ }

impl Blocks {
	pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> { /* … */ }
}

pub struct DownloadError { /* … */ }

This just makes your rustdoc look bad, since the impl blocks are needlessly separated. So usually I end up writing something more like:

impl Blocks {
	pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> { /* … */ }
	pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> { /* … */ }
}
pub struct FromFileError { /* … */ }
pub struct DownloadError { /* … */ }

It’s unfortunate, but I don’t think it’s terrible; you still get most of the benefits of nearness.

The only thing to make sure of is that they stay in the same module; this same concept of “nearness” is a similar reason why one should be extremely wary of any module named “errors”, which is of equal organizational value to having a drawer labelled “medium-sized and flat”.

Verbosity

Possibly the biggest objection to this style of error is the sheer number of lines of code required to implement it; error types aren’t a trivial number of lines, and making a new error type for every function can easily hugely increase the number of lines a library needs. This is definitely a valid criticism; I also find it tiresome to write the same things over and over again. But let me also offer an alternate perspective: rather than seeing it as simply a more verbose way to do the same thing, see it as due treatment for an oft ignored area.

Traditionally, errors are treated as something to be pushed to the side as soon as possible to get on with “real” logic. But the art of resilient, reliable and user-friendly systems considers all outcomes, not just the successful one. As a success story, look no further than the Rust compiler itself; I don’t think it would be an exaggeration to say that Rust enjoys the current popularity it does because of how good its error messages are, and how much effort was put into them.

Conclusion

This post is not here to give you a structure that you should follow for your errors. The structure I used as an example in this post had one specific use case, and filled it appropriately. If you find you can apply the same structure to your own code and it works well, then great! But really, what this post is for is to get people to start caring about errors, putting actual thought into their designs, and learning how to elegantly pull off the ever-present balancing act between the five goals of good backtraces, extensibility, inspectability (matching), stability and modularity.

If there’s one thing I wish for you to take away, it’s that error handling is hard, but it’s worth it to learn. Because I’m tired of having to deal with lazy kitchen-sink-type errors.

The final code

//! This crate provides types for UCD’s `Blocks.txt`.

pub struct Blocks {
	ranges: Vec<(RangeInclusive<u32>, String)>,
}

impl Blocks {
	pub fn block_of(&self, c: char) -> &str {
		self.ranges
			.binary_search_by(|(range, _)| {
				if *range.end() < u32::from(c) {
					cmp::Ordering::Less
				} else if u32::from(c) < *range.start() {
					cmp::Ordering::Greater
				} else {
					cmp::Ordering::Equal
				}
			})
			.map(|i| &*self.ranges[i].1)
			.unwrap_or("No_Block")
	}
	pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self, FromFileError> {
		let path = path.as_ref();
		(|| {
			Self::from_str(&fs::read_to_string(path).map_err(FromFileErrorKind::ReadFile)?)
				.map_err(FromFileErrorKind::Parse)
		})()
		.map_err(|kind| FromFileError {
			path: path.into(),
			kind,
		})
	}
	pub fn download(agent: &ureq::Agent) -> Result<Self, DownloadError> {
		(|| {
			let response = agent
				.get(LATEST_URL)
				.call()
				.map_err(|e| DownloadErrorKind::Request(Box::new(e)))?;
			Self::from_str(
				&response
					.into_string()
					.map_err(DownloadErrorKind::ReadBody)?,
			)
			.map_err(DownloadErrorKind::Parse)
		})()
		.map_err(|kind| DownloadError { kind })
	}
}

impl FromStr for Blocks {
	type Err = ParseError;
	fn from_str(s: &str) -> Result<Self, Self::Err> {
		let ranges = s
			.lines()
			.enumerate()
			.map(|(i, line)| {
				(
					i,
					line.split_once('#').map(|(line, _)| line).unwrap_or(line),
				)
			})
			.filter(|(_, line)| !line.is_empty())
			.map(|(i, line)| {
				(|| {
					let (range, name) = line.split_once(';').ok_or(ParseErrorKind::NoSemicolon)?;
					let (range, name) = (range.trim(), name.trim());
					let (start, end) = range.split_once("..").ok_or(ParseErrorKind::NoDotDot)?;
					let start = u32::from_str_radix(start, 16)
						.map_err(|source| ParseErrorKind::ParseInt { source })?;
					let end = u32::from_str_radix(end, 16)
						.map_err(|source| ParseErrorKind::ParseInt { source })?;
					Ok((start..=end, name.to_owned()))
				})()
				.map_err(|kind| ParseError { line: i, kind })
			})
			.collect::<Result<Vec<_>, ParseError>>()?;
		Ok(Self { ranges })
	}
}

#[derive(Debug)]
#[non_exhaustive]
pub struct DownloadError {
	pub kind: DownloadErrorKind,
}

impl Display for DownloadError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "failed to download Blocks.txt from the Unicode website")
	}
}

impl Error for DownloadError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match &self.kind {
			DownloadErrorKind::Request(e) => Some(e),
			DownloadErrorKind::ReadBody(e) => Some(e),
			DownloadErrorKind::Parse(e) => Some(e),
		}
	}
}

#[derive(Debug)]
pub enum DownloadErrorKind {
	Request(Box<ureq::Error>),
	ReadBody(io::Error),
	Parse(ParseError),
}

#[derive(Debug)]
#[non_exhaustive]
pub struct FromFileError {
	pub path: Box<Path>,
	pub kind: FromFileErrorKind,
}

impl Display for FromFileError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "error reading `{}`", self.path.display())
	}
}

impl Error for FromFileError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match &self.kind {
			FromFileErrorKind::ReadFile(e) => Some(e),
			FromFileErrorKind::Parse(e) => Some(e),
		}
	}
}

#[derive(Debug)]
pub enum FromFileErrorKind {
	ReadFile(io::Error),
	Parse(ParseError),
}

#[derive(Debug)]
#[non_exhaustive]
pub struct ParseError {
	pub line: usize,
	pub kind: ParseErrorKind,
}

impl Display for ParseError {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		write!(f, "invalid Blocks.txt data on line {}", self.line + 1)
	}
}

impl Error for ParseError {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		Some(&self.kind)
	}
}

#[derive(Debug)]
pub enum ParseErrorKind {
	#[non_exhaustive]
	NoSemicolon,
	#[non_exhaustive]
	NoDotDot,
	#[non_exhaustive]
	ParseInt { source: ParseIntError },
}

impl Display for ParseErrorKind {
	fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
		match *self {
			Self::NoSemicolon => f.write_str("no semicolon"),
			Self::NoDotDot => f.write_str("no `..` in range"),
			Self::ParseInt { .. } => {
				write!(f, "one end of range is not a valid hexadecimal integer")
			}
		}
	}
}

impl Error for ParseErrorKind {
	fn source(&self) -> Option<&(dyn Error + 'static)> {
		match self {
			Self::ParseInt { source } => Some(source),
			_ => None,
		}
	}
}

#[cfg(test)]
mod tests {
	#[test]
	fn real_unicode() {
		let data = include_str!("../Blocks.txt").parse::<Blocks>().unwrap();
		assert_eq!(data.block_of('\u{0080}'), "Latin-1 Supplement");
		assert_eq!(data.block_of('½'), "Latin-1 Supplement");
		assert_eq!(data.block_of('\u{00FF}'), "Latin-1 Supplement");
		assert_eq!(data.block_of('\u{EFFFF}'), "No_Block");
	}

	use crate::Blocks;
}

pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";

use std::cmp;
use std::error::Error;
use std::fmt;
use std::fmt::Display;
use std::fmt::Formatter;
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::ops::RangeInclusive;
use std::path::Path;
use std::str::FromStr;
https://sabrinajewson.org/blog/errors
The Better Alternative to Lifetime GATs

Update (2022-05-30): danielhenrymantilla recently released a crate, nougat, which provides a proc macro that allows you to use the technique presented in this article with the same syntax as regular GATs. I encourage you to check it out!

Where real GATs fall short

GATs are an unstable feature of Rust, likely to be stabilized in the next few versions, that allow you to add generic parameters on associated types in traits. The motivating example for this feature is the “lending iterator” trait, which allows you to define an iterator for which only one of its items can exist at any given time. With lifetime GATs, its signature would look something like this:

pub trait LendingIterator {
	type Item<'this>
	where
		Self: 'this;
	fn next(&mut self) -> Option<Self::Item<'_>>;
}

and it would allow you to implement iterators you otherwise wouldn’t have been able to, like WindowsMut (since the slices it returns overlap, a regular iterator won’t work):

use ::core::mem;

pub fn windows_mut<T, const WINDOW_SIZE: usize>(
	slice: &mut [T],
) -> WindowsMut<'_, T, WINDOW_SIZE> {
	assert_ne!(WINDOW_SIZE, 0);
	WindowsMut { slice, first: true }
}

pub struct WindowsMut<'a, T, const WINDOW_SIZE: usize> {
	slice: &'a mut [T],
	first: bool,
}

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	type Item<'this> = &'this mut [T; WINDOW_SIZE] where 'a: 'this;

	fn next(&mut self) -> Option<Self::Item<'_>> {
		if !self.first {
			self.slice = &mut mem::take(&mut self.slice)[1..];
		}
		self.first = false;

		Some(self.slice.get_mut(..WINDOW_SIZE)?.try_into().unwrap())
	}
}

Great! That’s our LendingIterator trait, done and dusted, and we’ve proven that it works. End of article.

Well, before we go let’s just try one last thing: actually consuming the WindowsMut iterator. There’s no need to really because I’m sure it’ll work, but we’ll do it anyway for the learning experience, right?

So first we’ll define a function that prints each element of a lending iterator. This is pretty simple, we just have to use HRTBs to write the trait bound and a while let loop for the actual consumption.

fn print_items<I>(mut iter: I)
where
	I: LendingIterator,
	for<'a> I::Item<'a>: Debug,
{
	while let Some(item) = iter.next() {
		println!("{item:?}");
	}
}

All good so far, this compiles fine. Now we’ll actually call it with an iterator:

print_items::<WindowsMut<'_, _, 2>>(windows_mut(&mut [1, 2, 3]));

This should obviously compile since &mut [i32; 2] is definitely Debug. So we can just run cargo run and see the ou–

error[E0716]: temporary value dropped while borrowed
  --> src/main.rs:45:58
   |
45 |     print_items::<WindowsMut<'_, _, 2>>(windows_mut(&mut [1, 2, 3]));
   |     -----------------------------------------------------^^^^^^^^^--
   |     |                                                    |
   |     |                                                    creates a temporary which is freed while still in use
   |     argument requires that borrow lasts for `'static`
46 | }
   | - temporary value is freed at the end of this statement

oh.

oh no.

What went wrong?

Clearly, something’s not right here. rustc is telling us that for some reason, our borrow of the array [1, 2, 3] is required to live for 'static — but we haven’t written any 'static bounds anywhere, so this doesn’t really make much sense. We’ll have to put ourselves in the mindset of the compiler for a bit so that we can try to figure out what’s happening.

First of all, we create an iterator of WindowsMut<'0, i32, 2>, where '0 is the name of some local lifetime (notably, this lifetime is necessarily shorter than 'static). Then we pass this iterator type into the function print_items, in doing so setting its I generic parameter to the aforementioned type WindowsMut<'0, i32, 2>.

So now we just need to make sure that the trait bounds hold. Substituting I for its actual type in the where clause of print_items, we get this bound that needs to be checked:

where
	for<'a> <WindowsMut<'0, i32, 2> as LendingIterator>::Item<'a>: Debug,

The for<'a> syntax means that we must verify that any lifetime can be substituted in the right hand side and the trait bound must still pass. A good edge case to check here is 'static, since we know that if that check fails the overall bound will definitely fail. So we end up with this:

where
	<WindowsMut<'0, i32, 2> as LendingIterator>::Item<'static>: Debug,

Or in other words, the associated item type of WindowsMut must implement Debug when fed the lifetime 'static. Let’s hop back to the implementation of LendingIterator for WindowsMut to see if that actually holds. As a quick refresher, the relevant bit of code is here:

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	type Item<'this> = &'this mut [T; WINDOW_SIZE] where 'a: 'this;
	/* ... */
}

Uhh…that’s a bit complex. Let’s replace the generic types with our concrete ones to simplify it.

impl LendingIterator for WindowsMut<'0, i32, 2> {
	type Item<'static> = &'static mut [i32; 2]
	where
		'0: 'static;
}

And now we can finally see what’s going wrong. As we established earlier, '0 is the local lifetime of [1, 2, 3] and is therefore definitely a shorter lifetime than 'static. This means that there is absolutely no way that the bound '0: 'static will hold, making <WindowsMut<'0, i32, 2> as LendingIterator>::Item<'static> an invalid type altogether. So of course the compiler can’t verify that it implements Debug — it doesn’t even exist at all! This was what the compiler was really trying to tell us earlier, even if it was a bit obtuse about it.

The ultimate conclusion of all this is that HRTBs basically can’t be used with lifetime GATs at all. for<'a> just doesn’t express the right requirement — we don’t want to require the bound for any lifetime, we only really want to require it for lifetimes shorter than '0. Ideally, we would be able to write in a where clause there, so the bounds of print_items could become:

fn print_items<I>(mut iter: I)
where
	I: LendingIterator,
	for<'a where I: 'a> I::Item<'a>: Debug,

This would mean that 'static can’t be selected as the lifetime chosen for the HRTB since WindowsMut<'0, i32, 2> is definitely not 'static, so our above proof-by-contradiction would no longer work and the compiler would accept our correct code without problem.

But unfortunately it doesn’t look like we’ll be getting this feature any time soon. At the time of writing I do not know of any RFC or formal suggestion for this feature (other than one rust-lang/rust issue) so it’ll be a long time before it actually arrives on stable should we get it at all. Until then, we’re stuck with a hard limitation every time you use lifetime GATs: you can’t place trait bounds on GATs or require them to be a specific type unless the trait implementor is 'static.

This makes real GATs practically unusable for most use cases. I’m still happy they’re being stabilized, but they likely won’t see wide adoption in APIs until this problem is solved.
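To be fair to real GATs, the limitation concerns bounds in generic code; as far as I can tell, consuming a concrete lending iterator directly is unaffected. The sketch below assumes a compiler with stable GATs (Rust 1.65 or later) and reuses the WindowsMut code from above, except that the impl repeats the trait’s where Self: 'this clause. prefix_sums is a hypothetical helper added for the demonstration.

```rust
use std::convert::TryInto; // already in the 2021 prelude; harmless there
use std::mem;

pub trait LendingIterator {
	type Item<'this>
	where
		Self: 'this;
	fn next(&mut self) -> Option<Self::Item<'_>>;
}

pub struct WindowsMut<'a, T, const WINDOW_SIZE: usize> {
	slice: &'a mut [T],
	first: bool,
}

pub fn windows_mut<T, const WINDOW_SIZE: usize>(
	slice: &mut [T],
) -> WindowsMut<'_, T, WINDOW_SIZE> {
	assert_ne!(WINDOW_SIZE, 0);
	WindowsMut { slice, first: true }
}

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	type Item<'this> = &'this mut [T; WINDOW_SIZE] where Self: 'this;

	fn next(&mut self) -> Option<Self::Item<'_>> {
		// Every call after the first slides the window along by one.
		if !self.first {
			self.slice = &mut mem::take(&mut self.slice)[1..];
		}
		self.first = false;

		Some(self.slice.get_mut(..WINDOW_SIZE)?.try_into().unwrap())
	}
}

/// Hypothetical helper: turns a slice into its running totals in place,
/// which relies on the mutable windows overlapping.
pub fn prefix_sums(values: &mut [i32]) {
	let mut windows = windows_mut::<_, 2>(values);
	// No HRTB in sight: the item type is concrete, so nothing has to be
	// proven "for all lifetimes".
	while let Some([previous, current]) = windows.next() {
		*current += *previous;
	}
}
```

So the trouble starts when you want to abstract over lending iterators and say something about their items, which is of course what most APIs want to do.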

So, what can we do in the meantime?

Workaround 1: dyn Trait as a HKT

As first shared in this gist by @jix, one workaround is to use dyn Trait as a form of HKT, because dyn Trait accepts an HRTB in its type, and supports changing associated types based on the HRTB’s lifetime.

To implement the design in our code, first we modify the LendingIterator trait to look like this:

pub trait GivesItem<'a> {
	type Item;
}

pub trait LendingIterator {
	type Item: ?Sized + for<'this> GivesItem<'this>;
	fn next(&mut self) -> Option<<Self::Item as GivesItem<'_>>::Item>;
}

The magic comes in the implementation of LendingIterator for specific types. For WindowsMut it looks like this:

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	type Item = dyn for<'this> GivesItem<
		'this,
		Item = &'this mut [T; WINDOW_SIZE],
	>;

	/* ... */
}

As you can see, the Item type is set to a dyn Trait with an HRTB, where the dyn Trait’s associated type depends on the input HRTB lifetime. So even though type Item is only a single type, it actually acts like a function from a lifetime to a type, just like a real GAT.

We can then modify the signature of print_items like so:

fn print_items<I>(mut iter: I)
where
	I: LendingIterator,
	for<'a> <I::Item as GivesItem<'a>>::Item: Debug,

And lo and behold, it works!

[1, 2]
[2, 3]

However, this approach runs into some nasty limitations rather quickly. Let’s say that we have now defined a mapping operation on lending iterators:

pub fn map<I, F>(iter: I, mapper: F) -> Map<I, F>
where
	I: LendingIterator,
	F: for<'a> Mapper<'a, <I::Item as GivesItem<'a>>::Item>,
{
	Map { iter, mapper }
}

pub struct Map<I, F> {
	iter: I,
	mapper: F,
}

impl<I, F> LendingIterator for Map<I, F>
where
	I: LendingIterator,
	F: for<'a> Mapper<'a, <I::Item as GivesItem<'a>>::Item>,
{
	type Item = dyn for<'this> GivesItem<
		'this,
		Item = <F as Mapper<'this, <I::Item as GivesItem<'this>>::Item>>::Output,
	>;

	fn next(&mut self) -> Option<<Self::Item as GivesItem<'_>>::Item> {
		self.iter.next().map(&mut self.mapper)
	}
}

// Trait helper to allow the lifetime of a mapping function's output to depend
// on its input. Without this, `map` on an iterator would always force lending
// iterators to become non-lending which we don't really want.
pub trait Mapper<'a, I>: FnMut(I) -> <Self as Mapper<'a, I>>::Output {
	type Output;
}

impl<'a, I, F, O> Mapper<'a, I> for F
where
	F: FnMut(I) -> O,
{
	type Output = O;
}

and then decide to use a mapped iterator instead of the normal one:

let mut array = [1, 2, 3];
let iter = windows_mut::<_, 2>(&mut array);

fn mapper(input: &mut [i32; 2]) -> &mut i32 {
	&mut input[0]
}
let mapped = map(iter, mapper);

print_items::<Map<_, _>>(mapped);

This works fine, printing the desired result of 1 followed by 2.

But if we suddenly decide that the code in print_items should be inlined, we’re in for a not-so-fun little surprise:

let mut mapped = map(iter, mapper);

while let Some(item) = mapped.next() {
	println!("{item:?}");
}

error[E0308]: mismatched types
  --> src/main.rs:97:35
   |
97 |     while let Some(item) = mapped.next() {
   |                                   ^^^^ one type is more general than the other
   |
   = note: expected associated type `<(dyn for<'this> GivesItem<'this, for<'this> Item = &'this mut [i32; 2]> + 'static) as GivesItem<'_>>::Item`
              found associated type `<(dyn for<'this> GivesItem<'this, for<'this> Item = &'this mut [i32; 2]> + 'static) as GivesItem<'this>>::Item`

To be honest, I have absolutely no idea what this error message is saying — but I’m pretty sure it’s just nonsense because the generic version works fine.

This isn’t the worst problem in the world — it’s inconvenient but it can probably always be worked around. That said, it is still possible to improve the ergonomics.

Workaround 2: HRTB supertrait

Let’s try a different approach then. We’ll start again from the real GAT version, but this time with explicit lifetimes (you’ll see why in a minute):

pub trait LendingIterator {
	type Item<'this> where Self: 'this;
	fn next<'this>(&'this mut self) -> Option<Self::Item<'this>>;
}

You’ll notice that all items of the trait use the 'this lifetime. So we can eliminate the use of GATs by raising that lifetime up one level, to become a generic parameter of the whole trait instead of each item on the trait.

pub trait LendingIterator<'this>
// This where bound is raised from the GAT
where
	Self: 'this,
{
	type Item;
	fn next(&'this mut self) -> Option<Self::Item>;
}

This way, for<'a> LendingIterator<'a> becomes an identical trait to the old LendingIterator trait — given a specific lifetime, we get both a next function and Item associated type.

However, there are a few problems with a trait declared this way:

  1. fn next(&'this mut self) is verbose and doesn’t allow eliding the lifetimes.
  2. The trait bound for<'a> LendingIterator<'a> is long and inconvenient to spell out.
  3. Some functions like for_each need Self to implement for<'a> LendingIterator<'a> in order for their signature to work. But it’s hard to express that within a trait LendingIterator<'this> where the HRTB is not already present.

To solve them we can split the trait into two, moving the parts that can have generic parameters (functions) into an outer lifetime-less subtrait and the parts that can’t have generic parameters (types) into an inner lifetimed supertrait:

pub trait LendingIteratorLifetime<'this>
where
	Self: 'this,
{
	type Item;
}

pub trait LendingIterator: for<'this> LendingIteratorLifetime<'this> {
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item>;
}

Now we can finally get to reimplementing WindowsMut:

impl<'this, 'a, T, const WINDOW_SIZE: usize> LendingIteratorLifetime<'this>
	for WindowsMut<'a, T, WINDOW_SIZE>
where
	Self: 'this,
{
	type Item = &'this mut [T; WINDOW_SIZE];
}

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item> {
		if !self.first {
			self.slice = &mut mem::take(&mut self.slice)[1..];
		}
		self.first = false;

		Some(self.slice.get_mut(..WINDOW_SIZE)?.try_into().unwrap())
	}
}

Let’s try it out then! Just run cargo build and…

error[E0477]: the type `WindowsMut<'a, T, WINDOW_SIZE>` does not fulfill the required lifetime
  --> src/main.rs:41:39
   |
41 | impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
   |                                       ^^^^^^^^^^^^^^^

Right — I should know better than to expect things to work first try at this point.

That error’s extremely unhelpful, but there is actually a legitimate explanation for what’s happening here. Once again putting on our compiler hats, one of our jobs when checking a trait implementation is to check whether the supertraits hold. In this case that means we have to satisfy this trait bound:

WindowsMut<'a, T, WINDOW_SIZE>: for<'this> LendingIteratorLifetime<'this>

Like before, a good edge case to check for with HRTB bounds is whether substituting in 'static holds. In other words, a necessary condition for the above bound to be satisfied is that this bound is also satisfied:

WindowsMut<'a, T, WINDOW_SIZE>: LendingIteratorLifetime<'static>

So let’s check that. Jumping to the implementation of LendingIteratorLifetime for WindowsMut, we see this:

impl<'this, 'a, T, const WINDOW_SIZE: usize> LendingIteratorLifetime<'this>
	for WindowsMut<'a, T, WINDOW_SIZE>
where
	Self: 'this,

and substituting in 'static for 'this:

impl<'a, T, const WINDOW_SIZE: usize> LendingIteratorLifetime<'static>
	for WindowsMut<'a, T, WINDOW_SIZE>
where
	Self: 'static,

…ah. Self: 'static. That’s probably a problem.

Indeed, if we add a where Self: 'static to the LendingIterator implementation it does compile:

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
where
	Self: 'static,

But that’s definitely not something we want to do — it would mean that WindowsMut would only work on empty slices, global variables and leaked variables.
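The 'static substitution check can be reproduced in isolation. The sketch below uses a hypothetical Lifetimed trait as a stand-in for LendingIteratorLifetime, together with two made-up types: one implemented for every lifetime unconditionally, and one whose implementation carries the same Self: 'a bound as WindowsMut.

```rust
// Hypothetical stand-in for `LendingIteratorLifetime<'this>`.
trait Lifetimed<'a> {
	fn describe() -> &'static str;
}

// Implemented for every lifetime, with no extra conditions.
struct Owned;
impl<'a> Lifetimed<'a> for Owned {
	fn describe() -> &'static str {
		"implemented for every lifetime"
	}
}

// Implemented only when `Self: 'a`, like the `WindowsMut` impl above.
#[allow(dead_code)]
struct Borrowed<'b>(&'b i32);
impl<'a, 'b> Lifetimed<'a> for Borrowed<'b>
where
	Self: 'a,
{
	fn describe() -> &'static str {
		"implemented when Self: 'a"
	}
}

fn requires_static<T: Lifetimed<'static>>() -> &'static str {
	T::describe()
}

fn requires_all<T: for<'a> Lifetimed<'a>>() -> &'static str {
	// The HRTB covers every lifetime, so it covers 'static in particular.
	requires_static::<T>()
}

fn main() {
	println!("{}", requires_all::<Owned>());
	// Fine: `Borrowed<'static>: 'static` holds.
	println!("{}", requires_static::<Borrowed<'static>>());
	// A `Borrowed<'b>` with a shorter `'b` satisfies neither function:
	// the 'static case needs `Borrowed<'b>: 'static`, which is false.
}
```

The call inside requires_all compiles only because the higher-ranked bound includes the 'static case, which is exactly why a type with a non-'static borrow cannot satisfy it.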

This is a very similar problem to the one we faced before with the GAT version: ideally, we’d be able to specify a where clause within the for<'a> bound so that only lifetimes shorter than Self could be substituted in, excluding lifetimes like 'static for non-'static Selfs. The signature could look something like this:

pub trait LendingIterator
where
	Self: for<'this where Self: 'this> LendingIteratorLifetime<'this>,

But just as before, where clauses in HRTBs unfortunately don’t exist yet, so it looks like this is just another dead end. What a shame.

HRTB implicit bounds

Having failed thoroughly in your mission to bring reliable and stable lifetime GATs to the Rust ecosystem, you quit programming altogether out of shame and vow to live out the rest of your days as a lowly potato farmer in the countryside. With nothing but a small amount of savings and a dream, you move in to a run-down stone farmhouse in Scotland where you can live onwards peacefully and undisturbed.

Many years pass. You have grown accustomed to nature: you have seen plants grow, wither and die before your eyes more times than smallvec has had CVEs, and the seasons are now no more than a blur — day, night, summer, winter all morphing into one another and passing faster than the blink of an eye. You sleep deeply and peacefully every night, safe and comfortable in the knowledge that you’ll never have to deal with wall of text linker errors ever again. You have become so familiar with the pathways and routes around your home that you can walk them in your sleep. Every single nook and cranny of the place down to the most minute detail is etched deep into your brain: the position of each plant, the location of every nest, the size and shape of each pebble.

So it is no surprise that on one chilly March morning, you immediately notice the abnormal presence of a thin white object sticking out from under a bush. Drawing closer, it appears to be a piece of paper, slightly damp from absorbing the cold morning dew. You pick it up, and as you stare at the mysterious sigils printed on the page, slowly — very slowly — a vague memory begins to come back to you. That’s right, it’s “Rust”. And this “Rust” on the page appears to form a very short program:

fn example<T>(value: T)
where
	for<'a> &'a T: Debug,
{
	eprintln!("{:?}", &value);
}

let array = [1, 2, 3];
example(&array);

As you make your way back to the farmhouse, mysterious piece of paper in hand, you ponder about what it could mean. Of course, there’s no way it would compile, you know that much: for<'a> would be able to select 'static as its lifetime, meaning &'static T would need to implement Debug, which is obviously not true for the &'array [i32; 3] shown (as &'static &'array [i32; 3] can’t even exist, let alone be Debug).

So why would someone go to the effort of printing out code that doesn’t even work and, what’s more, placing it all the way out on your farm? It is this that you wonder about while you dig out your old laptop from deep inside storage. It hasn’t been touched for five years, so it’s gotten a little dusty, but you press the power button and the screen bursts into colour and life, exactly as it used to do all those years ago.

Tentatively, you open a text editor, and begin copying out the contents of that paper inside it. Now, how do I build it again? Shipment? Freight? Haul? No, it was something different…ah, cargo, that was it. Into the shell you type out the words you haven’t seen for so, so long:

cargo run

You take a deep breath, and then press the enter key. The fan whirrs as the CPU starts into life. For a short moment that feels like an eon, Cargo displays “Building” — but eventually it finishes, and as it does, one line of text rolls down the screen:

[1, 2, 3]

Wait, what? Do that again.

You take a deep breath, and then press the enter key. The fan whirrs as the CPU starts into life. For a short moment that feels like an eon, Cargo displays “Building” — but eventually it finishes, and as it does, one line of text rolls down the screen:

[1, 2, 3]

So it wasn’t just a fluke. But that makes no sense at all: by all the rules we knew, there is no way that code should’ve compiled. So what’s happening here?

The answer is that while for<'a> does not support explicit where clauses, it actually can, sometimes, have an implied where clause; in this case, it’s for<'a where T: 'a>. But this only occurs in specific scenarios: in particular, when there is an implicit bound in the type or trait bound the HRTB is applied to, that implicit bound gets forwarded to the implicit where clause of the HRTB.

An implicit bound is a trait bound that is present, but not stated explicitly by a colon in the generics or where clause. As you can infer from the example above, &'a T contains an implicit bound for T: 'a — this is a really simple rule to prevent nonsense types like &'static &'short_lifetime i32 (a reference that outlives borrowed contents). It’s this rule that causes for<'a> &'a T to act like it’s actually for<'a where T: 'a> &'a T, enabling that code to run and successfully print [1, 2, 3].
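The same implied bound is what makes the common for<'a> &'a C: IntoIterator pattern usable with collections that borrow local data. As a small sketch (count_twice is a made-up helper, not something from this post):

```rust
// Hypothetical helper: iterates over the same collection twice by reference.
fn count_twice<C>(collection: C) -> (usize, usize)
where
	// Read as `for<'a where C: 'a>` thanks to the implicit bound in `&'a C`.
	for<'a> &'a C: IntoIterator,
{
	let first = IntoIterator::into_iter(&collection).count();
	let second = IntoIterator::into_iter(&collection).count();
	(first, second)
}

fn main() {
	let sentence = String::from("lending iterators are fun");
	// `Vec<&str>` borrows from `sentence`, so it is not `'static`.
	let words: Vec<&str> = sentence.split(' ').collect();
	println!("{:?}", count_twice(words));
}
```

If the HRTB really ranged over every lifetime including 'static, the Vec<&str> here could not satisfy it, since it borrows from a local String.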

Implicit bounds can appear on structs too. For example, take this struct:

#[derive(Debug)]
struct Reference<'a, T>(&'a T);

Because &'a T has an implicit bound of T: 'a, the struct Reference also has an implicit bound of T: 'a. You can prove this because this code compiles:

fn example<T>(value: T)
where
	for<'a /* where T: 'a */> Reference<'a, T>: Debug,
{
	dbg!(Reference(&value));
}

let array = [1, 2, 3];
example(&array);

However, as soon as you try to upgrade the implicit bound to an explicit one you will notice it no longer compiles:

#[derive(Debug)]
struct Reference<'a, T: 'a>(&'a T);

fn example<T>(value: T)
where
	for<'a> Reference<'a, T>: Debug,
{
	dbg!(Reference(&value));
}

let array = [1, 2, 3];
example(&array);

error[E0597]: `array` does not live long enough
  --> src/main.rs:15:13
   |
15 |     example(&array);
   |     --------^^^^^^-
   |     |       |
   |     |       borrowed value does not live long enough
   |     argument requires that `array` is borrowed for `'static`
16 | }
   | - `array` dropped here while still borrowed

Implicit bounds in HRTBs are…a very weird feature of Rust. I’m still not sure whether they are intended to exist or are just an obscure side-effect of the current implementation. But either way, this is an incredibly useful feature for us. If we can somehow leverage this to apply it in our supertrait HRTB of LendingIterator, then we can maybe get it to actually work without the 'static bound! Thanks, mysterious piece of paper.

Workaround 3: The better GATs

Armed with our new knowledge of implied bounds, all we have to do is get it to work in conjunction with that for<'a> LendingIteratorLifetime<'a> supertrait. One way to achieve this is to introduce a new dummy type parameter to LendingIteratorLifetime, so that HRTBs can make use of it to apply their own implicit bounds:

pub trait LendingIteratorLifetime<'this, ExtraParam> {
	type Item;
}

pub trait LendingIterator
where
	Self: for<'this /* where Self: 'this */>
		LendingIteratorLifetime<'this, &'this Self>,
{
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_, &Self>>::Item>;
}

This works, but it’s a pain to have to write out &'this Self every time you want to use the trait. Ergonomics can be improved slightly by using a default type parameter:

// Give every usage of this trait an implicit `where Self: 'this` bound
pub trait LendingIteratorLifetime<'this, ImplicitBounds = &'this Self> {
	type Item;
}

pub trait LendingIterator
where
	Self: for<'this /* where Self: 'this */> LendingIteratorLifetime<'this>,
{
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item>;
}

There is still one slight improvement we can make to reduce the chance the API is accidentally misused by setting the ImplicitBounds parameter to something other than &'this Self, and that is using a sealed type and trait. This leads to my current recommended definition for this trait:

pub trait LendingIteratorLifetime<'this, ImplicitBounds: Sealed = Bounds<&'this Self>> {
	type Item;
}

mod sealed {
	pub trait Sealed: Sized {}
	pub struct Bounds<T>(T);
	impl<T> Sealed for Bounds<T> {}
}
use sealed::{Bounds, Sealed};

pub trait LendingIterator: for<'this> LendingIteratorLifetime<'this> {
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item>;
}

New trait in hand, we can rewrite our type WindowsMut to use it:

impl<'this, 'a, T, const WINDOW_SIZE: usize> LendingIteratorLifetime<'this>
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	type Item = &'this mut [T; WINDOW_SIZE];
}

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item> {
		if !self.first {
			self.slice = &mut mem::take(&mut self.slice)[1..];
		}
		self.first = false;

		Some(self.slice.get_mut(..WINDOW_SIZE)?.try_into().unwrap())
	}
}

as well as Map (the Mapper trait is still needed):

impl<'this, I, F> LendingIteratorLifetime<'this> for Map<I, F>
where
	I: LendingIterator,
	F: for<'a> Mapper<'a, <I as LendingIteratorLifetime<'a>>::Item>,
{
	type Item = <F as Mapper<
		'this,
		<I as LendingIteratorLifetime<'this>>::Item,
	>>::Output;
}

impl<I, F> LendingIterator for Map<I, F>
where
	I: LendingIterator,
	F: for<'a> Mapper<'a, <I as LendingIteratorLifetime<'a>>::Item>,
{
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item> {
		self.iter.next().map(&mut self.mapper)
	}
}

and unlike both real GATs and workaround 1, this works with both consuming the concrete type directly and through the generic print_items function. Perfect!
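Here is a self-contained sketch of workaround 3 being used both directly on the concrete type and through a generic function. As before, the WindowsMut layout and the windows_mut constructor are assumed, and prefix_sums and debug_items are hypothetical helpers added only for this example.

```rust
use std::convert::TryInto; // already in the prelude from the 2021 edition
use std::fmt::Debug;
use std::mem;

mod sealed {
	pub trait Sealed: Sized {}
	pub struct Bounds<T>(T);
	impl<T> Sealed for Bounds<T> {}
}
use sealed::{Bounds, Sealed};

pub trait LendingIteratorLifetime<'this, ImplicitBounds: Sealed = Bounds<&'this Self>> {
	type Item;
}

pub trait LendingIterator: for<'this> LendingIteratorLifetime<'this> {
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item>;
}

// Assumed layout and constructor.
pub struct WindowsMut<'a, T, const WINDOW_SIZE: usize> {
	slice: &'a mut [T],
	first: bool,
}

pub fn windows_mut<T, const WINDOW_SIZE: usize>(slice: &mut [T]) -> WindowsMut<'_, T, WINDOW_SIZE> {
	WindowsMut { slice, first: true }
}

impl<'this, 'a, T, const WINDOW_SIZE: usize> LendingIteratorLifetime<'this>
	for WindowsMut<'a, T, WINDOW_SIZE>
{
	type Item = &'this mut [T; WINDOW_SIZE];
}

impl<'a, T, const WINDOW_SIZE: usize> LendingIterator for WindowsMut<'a, T, WINDOW_SIZE> {
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item> {
		if !self.first {
			self.slice = &mut mem::take(&mut self.slice)[1..];
		}
		self.first = false;

		Some(self.slice.get_mut(..WINDOW_SIZE)?.try_into().unwrap())
	}
}

// Direct use of the concrete type: every window is a fresh mutable borrow,
// so each one can be modified before the next is requested.
fn prefix_sums(values: &mut [i32]) {
	let mut windows = windows_mut::<_, 2>(values);
	while let Some(window) = windows.next() {
		window[1] += window[0];
	}
}

// Generic use, in the style of `print_items`.
fn debug_items<I>(mut iter: I) -> Vec<String>
where
	I: LendingIterator,
	for<'a> <I as LendingIteratorLifetime<'a>>::Item: Debug,
{
	let mut lines = Vec::new();
	while let Some(item) = iter.next() {
		lines.push(format!("{item:?}"));
	}
	lines
}

fn main() {
	let mut values = [1, 2, 3, 4];
	prefix_sums(&mut values);
	println!("{values:?}");

	let mut array = [1, 2, 3];
	for line in debug_items(windows_mut::<_, 2>(&mut array)) {
		println!("{line}");
	}
}
```

The prefix-sum loop is something a regular Iterator over mutable windows could not offer, because its overlapping items would all have to be alive at once.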

Dyn safety

The main disadvantage of workaround 3 in comparison to workaround 1 is that it is not dyn-safe. If you try to use it as a trait object, rustc helpfully tells you this:

note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
   --> src/main.rs:14:28
    |
14  | pub trait LendingIterator: for<'this> LendingIteratorLifetime<'this> {
    |           ---------------  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ...because it uses `Self` as a type parameter
    |           |
    |           this trait cannot be made into an object...

When it says “because it uses Self as a type parameter” it’s actually referring to the hidden Bounds<&'this Self> default parameter we inserted. As a result, making LendingIterator directly work with dyn is simply not possible.

But that is not to say that dynamic dispatch is altogether impossible: all we have to do is define a helper trait for it! And as long as that helper trait uses workaround 1, it will be perfectly object-safe. This does lead to slightly worse ergonomics when using trait objects (due to that compiler bug with concrete types) but there really isn’t much we can do about that.

So let’s start by bringing back our old definition of LendingIterator, but this time under the name ErasedLendingIterator:

pub trait LendingIteratorGats<'a> {
	type Item;
}

pub trait ErasedLendingIterator {
	type Gats: ?Sized + for<'this> LendingIteratorGats<'this>;
	fn erased_next(&mut self) -> Option<<Self::Gats as LendingIteratorGats<'_>>::Item>;
}

Next, we add a blanket implementation of this trait for all LendingIterators:

impl<I: ?Sized + LendingIterator> ErasedLendingIterator for I {
	type Gats = dyn for<'this> LendingIteratorGats<
		'this,
		Item = <I as LendingIteratorLifetime<'this>>::Item,
	>;

	fn erased_next(&mut self) -> Option<<Self::Gats as LendingIteratorGats<'_>>::Item> {
		self.next()
	}
}

Finally, we implement the regular LendingIterator trait on all the trait objects we own:

impl<'this, Gats> LendingIteratorLifetime<'this>
	for dyn '_ + ErasedLendingIterator<Gats = Gats>
where
	Gats: ?Sized + for<'a> LendingIteratorGats<'a>,
{
	type Item = <Gats as LendingIteratorGats<'this>>::Item;
}

impl<Gats> LendingIterator
	for dyn '_ + ErasedLendingIterator<Gats = Gats>
where
	Gats: ?Sized + for<'a> LendingIteratorGats<'a>,
{
	fn next(&mut self) -> Option<<Self as LendingIteratorLifetime<'_>>::Item> {
		self.erased_next()
	}
}

// omitted implementations for all the permutations of auto traits. in a real
// implementation, you'd probably use a macro to generate all 32 versions
// (since there are 5 auto traits)

This is fairly standard boilerplate for defining an object-safe version of a non-object-safe trait, so I won’t explain it in great detail here.

Great, let’s try it out! Here, we can use it to create an iterator over either windows of size 2 or windows of size 3.

let mut array = [1, 2, 3, 4];

fn unsize<const N: usize>(array: &mut [i32; N]) -> &mut [i32] {
	array
}

type Gats = dyn for<'a> LendingIteratorGats<'a, Item = &'a mut [i32]>;
type Erased<'iter> = dyn 'iter + ErasedLendingIterator<Gats = Gats>;

let mut iter: Box<Erased<'_>> = if true {
	Box::new(map(windows_mut::<_, 2>(&mut array), unsize))
} else {
	Box::new(map(windows_mut::<_, 3>(&mut array), unsize))
};

while let Some(item) = iter.next() {
	println!("{item:?}");
}

and cargo build it…

error: implementation of `LendingIteratorLifetime` is not general enough
   --> src/main.rs:166:3
    |
166 |         Box::new(map(windows_mut::<_, 2>(&mut array), unsize))
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `LendingIteratorLifetime` is not general enough
    |
    = note: `Map<WindowsMut<'_, i32, 2_usize>, for<'r> fn(&'r mut [i32; 2]) -> &'r mut [i32] {unsize::<2_usize>}>` must implement `LendingIteratorLifetime<'0>`, for any lifetime `'0`...
    = note: ...but it actually implements `LendingIteratorLifetime<'1>`, for some specific lifetime `'1`

…ah. Another cryptic error.

I believe what’s happening here is the same ergonomics issue as faced with workaround 1: there’s some compiler bug which makes this not work with concrete types.

So that means all we have to do to fix it is to move it into a generic function! And indeed this version does compile:

fn box_erase<'iter, I>(iter: I) -> Box<Erased<'iter>>
where
	I: 'iter + LendingIterator,
	I: for<'a> LendingIteratorLifetime<'a, Item = &'a mut [i32]>,
{
	Box::new(iter)
}

let mut iter: Box<Erased<'_>> = if true {
	box_erase(map(windows_mut::<_, 2>(&mut array), unsize))
} else {
	box_erase(map(windows_mut::<_, 3>(&mut array), unsize))
};

But we can do better than that, because generics are only one way to erase a value’s concrete type: you can also do it via return-position impl Trait.

fn funnel_opaque<'iter, I>(iter: I)
	-> impl 'iter + ErasedLendingIterator<Gats = Gats>
where
	I: 'iter + LendingIterator,
	I: for<'a> LendingIteratorLifetime<'a, Item = &'a mut [i32]>,
{
	iter
}

let mut iter: Box<Erased<'_>> = if false {
	Box::new(funnel_opaque(map(windows_mut::<_, 2>(&mut array), unsize)))
} else {
	Box::new(funnel_opaque(map(windows_mut::<_, 3>(&mut array), unsize)))
};

And this also works.

If you want to, you can generalize funnel_opaque further so that it works with any &'a mut T type instead of just &'a mut [i32]:

type Gats<T> = dyn for<'a> LendingIteratorGats<'a, Item = &'a mut T>;
type Erased<'iter, T> = dyn 'iter + ErasedLendingIterator<Gats = Gats<T>>;

fn funnel_opaque<'iter, I, T>(iter: I)
	-> impl 'iter + ErasedLendingIterator<Gats = Gats<T>>
where
	T: ?Sized,
	I: 'iter + LendingIterator,
	I: for<'a> LendingIteratorLifetime<'a, Item = &'a mut T>,
{
	iter
}

let mut iter: Box<Erased<'_, [i32]>> = if false {
	Box::new(funnel_opaque(map(windows_mut::<_, 2>(&mut array), unsize)))
} else {
	Box::new(funnel_opaque(map(windows_mut::<_, 3>(&mut array), unsize)))
};

But unfortunately you can’t generalize it completely to any LendingIterator, because you just run into that compiler bug again.

Conclusion

So there we have it - this technique is, to my knowledge, the best way to use lifetime GATs in Rust. Even once real GATs become stabilized, I predict it’ll likely still be useful for a long time to come, so you might want to familiarize yourself with it.

https://sabrinajewson.org/blog/the-better-alternative-to-lifetime-gats
Async destructors, async genericity and completion futures

The main focus of this article will be on attempting to design a system to support asynchronous destructors in the Rust programming language, figuring out their exact semantics and resolving any issues encountered along the way. As a side effect, it also designs a language feature called “async genericity”, which enables supporting blocking and asynchronous code with the same codebase, as well as a system for adding completion-guaranteed futures to the language.

Why async destructors?

Async destructors, at a high level, would allow types to run code with .awaits inside it when they are dropped. This enables cleanup code to actually perform I/O, giving much more freedom in the extent to which resources can be properly cleaned up. One notable use case for this is implementing the TLS protocol, in which:

Each party MUST send a "close_notify" alert before closing its write
side of the connection, unless it has already sent some error alert.

(RFC 8446). In order to make sure that this requirement is consistently fulfilled, TLS implementations should be able to send this alert when the TlsStream type is dropped - and if all I/O is done asynchronously, this requires asynchronous destructors.

Currently, this kind of cleanup is generally managed by methods like poll_shutdown and poll_close: asynchronous functions that can optionally be called by the user if they want the type to be cleanly disposed of. However, this approach has several limitations:

  • There is no way to statically guarantee that the method isn’t called twice; that’s up to the user.
  • There is no way to statically guarantee that the method is called at all - it can be very easy to forget.
  • Calling it at the lifecycle end of each value is cumbersome boilerplate, and would ideally not be necessary.
  • It only works on types that actually implement AsyncWrite. If your type is not actually a byte stream, too bad.

Clearly we need a better solution than this. So let’s look at some practical examples to work out what features we’d need to improve the situation.

Async drop after future cancellation

Let’s start simple, with this trivial function:

async fn wait_then_drop_stream(_stream: TlsStream) {
	time::sleep(Duration::from_secs(10)).await;
}

It’s an asynchronous function that takes ownership of a TlsStream, sleeps for 10 seconds, then implicitly drops it at the end. The most obvious characteristic we want of this function is that the TLS stream should perform graceful close_notify shutdown after the 10 seconds. However there’s also a slightly more subtle but equally important one: because in Rust every future is implicitly made cancellable at .await points, the same graceful shutdown should also happen if the future is cancelled. For example, suppose the function is used like this:

let handle = task::spawn(wait_then_drop_stream(some_tls_stream));
time::sleep(Duration::from_secs(5)).await;
handle.cancel();

Just because we cancel the task overall doesn’t mean we suddenly want to sidestep the regular graceful shutdown and have the TLS stream finish in an unclean manner - in fact, we almost never want that. So somehow we need a way to register async operations to occur after a future is cancelled, in order to support running the graceful shutdown code in there. How do we do that?

As it turns out, with async destructors in the language that becomes quite easy: since future cancellation is signalled to the future by calling its destructor, the future can simply have an async destructor itself and run the cleanup code in there. The precise semantics of this would work in a very similar way to how synchronous destruction works today: drop each of the local variables in reverse order (and this critically includes the _stream variable).
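The synchronous half of this already works today and can be observed with nothing but the standard library. In the sketch below, FakeStream is a hypothetical stand-in for TlsStream whose ordinary (non-async) destructor sets a flag, Forever is a made-up future that never completes, and the future is polled by hand instead of being spawned on a runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for `TlsStream`: records when it has been dropped.
struct FakeStream {
	closed: Arc<AtomicBool>,
}

impl Drop for FakeStream {
	fn drop(&mut self) {
		self.closed.store(true, Ordering::SeqCst);
	}
}

// A future that never completes, standing in for the ten-second sleep.
struct Forever;

impl Future for Forever {
	type Output = ();
	fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
		Poll::Pending
	}
}

async fn wait_then_drop_stream(_stream: FakeStream) {
	Forever.await;
}

// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
	fn wake(self: Arc<Self>) {}
}

// Returns whether the stream was closed (while suspended, after cancellation).
fn poll_once_then_cancel() -> (bool, bool) {
	let closed = Arc::new(AtomicBool::new(false));
	let stream = FakeStream { closed: Arc::clone(&closed) };
	let mut future = Box::pin(wait_then_drop_stream(stream));

	let waker = Waker::from(Arc::new(NoopWaker));
	let mut cx = Context::from_waker(&waker);
	assert!(future.as_mut().poll(&mut cx).is_pending());
	let closed_while_suspended = closed.load(Ordering::SeqCst);

	// Cancelling a future means dropping it, which drops `_stream` too.
	drop(future);
	let closed_after_cancel = closed.load(Ordering::SeqCst);

	(closed_while_suspended, closed_after_cancel)
}

fn main() {
	println!("{:?}", poll_once_then_cancel());
}
```

What is missing today is only the ability for that destructor to .await; the point at which it runs is already the right one.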

Hidden awaits

A second question we have to answer is what happens when async destruction itself is cancelled - for example, you might be in the middle of dropping a TLS stream, but at the same time your task suddenly gets aborted. To demonstrate this problem, take a look at this function:

async fn assign_stream(target: &mut TlsStream, source: TlsStream) {
	*target = source; // Async destructor is implicitly called!
	println!("1");
	async { println!("2") }.await;
	println!("3");
	yield_now().await;
	println!("4");
}

It assigns the source TLS stream to the target TLS stream (dropping the old target stream in the process), then prints out the numbers 1 to 4. Under normal circumstances, this task would just run from top to bottom and always print out every number; but when cancellation gets involved, things become more complicated. If cancellation were to happen during the assignment of source to target, the language now has to decide what to do with the rest of the code - should it run it to the end? Should it immediately exit? Should it run only some of it?
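To see where that hidden drop sits, here is a synchronous analogue using ordinary destructors. FakeStream and the shared event log are hypothetical helpers for this sketch; with async destructors, the first log entry is where the implicit suspend point would be.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical stand-in for `TlsStream` that logs its own destruction.
struct FakeStream {
	name: &'static str,
	log: Log,
}

impl Drop for FakeStream {
	fn drop(&mut self) {
		self.log.borrow_mut().push(format!("drop {}", self.name));
	}
}

fn assign_stream(target: &mut FakeStream, source: FakeStream, log: &Log) {
	*target = source; // The old value of `*target` is dropped right here.
	log.borrow_mut().push("1".to_string());
}

fn run() -> Vec<String> {
	let log: Log = Rc::new(RefCell::new(Vec::new()));
	let mut target = FakeStream { name: "old target", log: Rc::clone(&log) };
	let source = FakeStream { name: "source", log: Rc::clone(&log) };

	assign_stream(&mut target, source, &log);
	log.borrow_mut().push("2".to_string());

	// `target` now holds what used to be `source`.
	drop(target);

	let events = log.borrow().clone();
	events
}

fn main() {
	for event in run() {
		println!("{event}");
	}
}
```

The destructor of the old target runs before "1" is logged, so any suspension inside it would happen on a line that contains no visible .await.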

There are three main categories of option worth talking about here: “abort now” designs, “never abort” designs and “delayed abort” designs. Each one has both advantages and drawbacks, which are explored in detail below.

“Abort now” designs

Under these designs, none of the four prints in the code above are guaranteed to run - if the assignment is aborted, it will exit the future as soon as possible while performing the minimum amount of cleanup (i.e. just running destructors and nothing else).

There are three variants of this design, differing slightly in when they require .await to be specified:

  1. Sometimes await: Under this design, = is kept to never require an .await and async function calls are kept to always require an .await. This mostly keeps things the same way as they are: no special new syntax is introduced, and no major breaking changes are made.

    To get a feel for how this looks, here is a non-trivial “real world” async function implemented using it:

    async fn handle_stream(mut stream: TlsStream) -> Result<()> {
    	loop {
    		match read_message(&mut stream).await? {
    			Message::Redirect(address) => {
    				stream = connect(address).await?;
    				// The below line isn't guaranteed to run even if
    				// redirection succeeded, since the future could be
    				// cancelled during the drop of the old `TlsStream`.
    				log::info!("Redirected");
    			}
    			Message::Exit => break,
    		}
    	}
    }
    

    It does introduce a footgun as it will no longer be obvious at which points control flow can exit a function. It can also be considered inconsistent as some suspend points require an .await while others don’t, despite the fact that there is no meaningful semantic difference between the two kinds.

  2. Never await: To resolve that inconsistency, this design removes .awaits altogether, making all cancellation points completely invisible. Adapting our example from before, it would look like:

    async fn handle_stream(mut stream: TlsStream) -> Result<()> {
    	loop {
    		match read_message(&mut stream)? {
    			Message::Redirect(address) => {
    				stream = connect(address)?;
    				log::info!("Redirected");
    			}
    			Message::Exit => break,
    		}
    	}
    }
    

    Aside from the technical issues of removing .await (is it done recursively? does it make implementing Future a breaking change? are async blocks made redundant? et cetera) and the backwards compatibility/churn issue, this has the same footgun issue as the previous option but turned up to the extreme - it would now be basically impossible to carefully manage where cancellations can occur and most users would end up having to treat cancellation more as a pthread_kill than a helpful control flow construct.

  3. Always await: On the flip side, this design makes .awaits mandatory everywhere. Assignments to a value with an asynchronous destructor must be done with a new =.await operator instead of plain =, and values cannot implicitly fall out of scope but must instead be explicitly dropped by the user. Once again returning to the handle_stream example:

    async fn handle_stream(mut stream: TlsStream) -> Result<()> {
    	loop {
    		match read_message(&mut stream).await? {
    			Message::Redirect(address) => {
    				stream =.await connect(address).await?;
    				log::info!("Redirected");
    			}
    			Message::Exit => break,
    		}
    	}
    	drop(stream).await;
    }
    

    This is the only option of the three to definitively avoid the “implicit cancel” footgun, but it’s still not ideal as it ends up introducing new weird-looking syntax and makes writing async code pretty verbose.

All three of these variants end up with pretty significant drawbacks - fundamentally, the “abort now” approach is pretty incompatible with the current async syntax and model. So if aborting is so tricky to support, what if we could sidestep the problem by avoiding it altogether?

“Never abort” designs

This design category eliminates implicit cancellation entirely from the language. Futures would, much like synchronous functions, run linearly from top to bottom without the possibility of caller-induced early exit (of course, panics can still cause early exit to happen). This means that all of 1, 2, 3 and 4 are guaranteed to be printed in the assign_stream function shown at the start of this section, since at no point is code execution ever allowed to stop. This approach has been proposed by Carl Lerche previously, if you want to read more about it.

Much like the “abort now” category, it has three sub-designs, “always await”, “sometimes await” and “never await”, depending on where .await is deemed to be necessary. Many of the same arguments listed above apply, although there is no longer the issue of the footgun caused by potential cancellation points being implicit, so it is mostly a question of weighing up consistency, breakage and new syntax.

This is another highly consistent approach, however it comes with the major downside of throwing away the very useful tool that is implicit cancellation contexts. While it is definitely possible for cancellation to be implemented as a library feature (see CancellationToken and StopToken) and I want that to be an option for use cases that need it, most of the time having an implicit context is far more useful since it is less verbose and requires much less boilerplate to make use of. I would hate to see otherwise infallible functions become fallible, or an enormous migration effort to add cancellation token parameters to every function.
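To make that boilerplate cost concrete, here is a minimal sketch of library-level cancellation using only std. CancelFlag, Cancelled and sum_until_cancelled are hypothetical names invented for illustration; they are not the API of the CancellationToken or StopToken types mentioned above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a library cancellation token.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
	fn cancel(&self) {
		self.0.store(true, Ordering::SeqCst);
	}
	fn is_cancelled(&self) -> bool {
		self.0.load(Ordering::SeqCst)
	}
}

/// Error that an otherwise infallible function now has to return.
#[derive(Debug, PartialEq)]
struct Cancelled;

/// Summing a slice cannot fail on its own, but threading the token through
/// forces both an extra parameter and a `Result` return type.
fn sum_until_cancelled(items: &[u32], token: &CancelFlag) -> Result<u32, Cancelled> {
	let mut total = 0;
	for item in items {
		// Every cancellation point has to be written out by hand.
		if token.is_cancelled() {
			return Err(Cancelled);
		}
		total += item;
	}
	Ok(total)
}
```

Every caller of sum_until_cancelled now has to obtain a token from somewhere and handle the Err case, which is the migration cost described above.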

One argument Carl Lerche used to support his point was an example code snippet in which future cancellation combined with select! turned out to be a footgun. But as Yoshua Wuyts argued in Futures Concurrency III, the primary problem in code like that is the confusing semantics of select! and not the cancellation behaviour of futures. Ultimately, I do not believe cancellation to be problematic enough to warrant removing it from the language. Although this approach’s consistency and its parallel with blocking code is nice, cancellation is still useful and there are ways to combine it with async destructors that don’t introduce footguns.

Note that even with the other options, adding async destructors to the language would make it trivial to create a combinator that executes futures in a “no-cancellation” mode if such semantics are desired - see appendix D for more.

“Delayed abort” designs

Unlike the previous two designs, these approaches try to fully embrace the syntactical difference between assigning and falling out of scope, which don’t require an .await, and calling an async function, which does. When the caller attempts to cancel the future during one of the former operations, the future will actually continue to run for a short while afterwards until it is able to reach one of the latter operations and properly exit.

This immediately solves the main set of problems that plagued the “abort now” designs without going to the extreme that never-abort did: there is no footgun as cancellation points are never implicitly introduced, no new syntax is added and no major breaking changes are made, and there is now a definite reason why = doesn’t need .await but calling functions does.

However, it is not perfect. It effectively introduces two different kinds of suspend point which behave pretty differently, an inconsistency not present with “abort now” and “never abort” designs. Additionally, it means that if you call a wrapper function around the = operator or call drop manually, it has subtly different semantics from using the built-in language behaviour since it changes what kind of suspend point it is. This is probably unexpected and unintuitive for most users.

There are three variations of this design, depending on when the code stops running:

  1. Abort before first await: Code will continue to run after cancellation of an operation like = until the next point at which .await occurs, at which point the outer future will promptly exit without even polling the inner future once. In the assign_stream example, that means that 1 is guaranteed to be printed, but everything after that isn’t.
  2. Abort after first await: As with the previous one, but the future will be polled once (only to have its result discarded and the outer future to exit). In our example, that means 1 and 2 are guaranteed to be printed, but not anything beyond that.
  3. Abort at first suspend: The outer future will abort the first time a future which it .awaits returns Poll::Pending when it is polled. In the example code, this will force all of 1, 2 and 3 to be printed, but not 4 since yield_now() causes a suspend point to occur. This is the most similar to how future cancellation works today, because cancellation cannot currently appear to happen without a suspend point (it still can’t with the above proposals, but it appears to because async {}.await potentially exits control flow). From the future’s perspective, this behaves exactly as if the caller had just waited and then attempted cancellation later on.
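The claim that cancellation today can only be observed at a suspend point can be checked by polling a future by hand. The sketch below is a stand-in I wrote for this purpose, not the assign_stream example itself: YieldNow is a hypothetical single-shot yield, NoopWake is a do-nothing waker, and steps are recorded in a Vec rather than printed.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
	fn wake(self: Arc<Self>) {}
}

/// Hypothetical `yield_now`: returns `Pending` once, then `Ready`.
struct YieldNow(bool);
impl Future for YieldNow {
	type Output = ();
	fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
		if self.0 {
			return Poll::Ready(());
		}
		self.0 = true;
		cx.waker().wake_by_ref();
		Poll::Pending
	}
}

/// Polls a four-step future once, then drops (cancels) it, and returns
/// the steps that ran.
fn steps_before_cancel() -> Vec<u32> {
	let log = RefCell::new(Vec::new());
	{
		let task = pin!(async {
			log.borrow_mut().push(1);
			async {}.await; // completes immediately: not a suspend point
			log.borrow_mut().push(2);
			log.borrow_mut().push(3);
			YieldNow(false).await; // returns Pending: the only place to stop
			log.borrow_mut().push(4);
		});
		let waker = Waker::from(Arc::new(NoopWake));
		let mut cx = Context::from_waker(&waker);
		assert!(task.poll(&mut cx).is_pending());
		// `task` is dropped at the end of this block, cancelling it.
	}
	log.into_inner()
}
```

Steps 1 to 3 are recorded and step 4 is not, which is the same observable outcome that abort-at-first-suspend would keep.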

Although they might seem very similar, with the first two approaches an extremely subtle but very important paradigm shift is made: .await changes its meaning from being a “might suspend” operator to a “might halt” or “might abort” operator, since async {}.await; is now able to cause computation to suddenly stop. This is a small difference, but ends up very problematic as we now have to answer a whole host of new questions:

  • If .await is just about cancellation, should we allow omitting it to call async functions while forbidding cancellation?
  • Should we allow calling synchronous functions with .await to introduce cancellation points around them?
  • Should we introduce plain await; statements to introduce those cancellation points, equivalent to async {}.await;?

Phrased another way, we open ourselves up to this table existing whose empty boxes will come across as obvious holes:

                    | Caller can’t cancel | Caller can cancel
Callee can’t cancel | foo()               | ?
Callee can cancel   | ?                   | foo().await

I don’t think that’s a situation we want to be in. The third approach avoids the whole situation altogether by tying abort opportunities to suspend points, removing the need for the second column in that table and thus closing those holes.

Additionally, the third variant is less of a breaking change because code that previously relied on the immediately-completing parts of an async operation not being able to abort won’t have to adjust their expectations. Technically it’s still non-breaking either way because no existing code uses asynchronous destructors, but it allows programmers to keep their mental model which is important too.

For all these reasons, I am in favour of using a delayed abort design with abort-at-first-suspend: it would require little migration effort, it avoids footguns and I don’t think it is too surprising for users. The rest of this post will be written assuming that design is chosen.

Async drop in a sync function

Perhaps the hardest problem any async drop design has to face is what happens when a type with an async destructor gets dropped in a synchronous context. Consider this code:

fn sync_drop_stream(_stream: TlsStream) {}

The synchronous function declared takes a TLS stream as a parameter. It must do something with the stream it has been given since it has ownership and there’s no return value to pass it back to the caller, but it can’t use a regular asynchronous drop because it is a synchronous function. So what can it do? In withoutboats’ post on this subject they hypothesized two options:

  1. Call its non-async destructor, like every other type.
  2. Introduce some kind of executor to the runtime (probably just block_on) to call as part of the drop glue.

To me, both solutions seem pretty bad. Solution 2 is obviously unworkable for the reasons Boats outlined, but I believe solution 1 is far more of a footgun than it appears. Many, many functions from the standard library become essentially off-limits, so not only do you lose their ergonomics in well-written code, it would also be very easy to create bug-ridden code, simply by calling any function like Option::insert on a TLS stream.

My alternative solution is to forbid that code from compiling entirely. For a type to be dropped in a synchronous context it must implement a certain trait, and this just wouldn’t be implemented for TlsStream and similar types. Therefore, barring use of an explicit close_unclean method on TlsStream, it becomes totally impossible to cause an unclean TLS close from anywhere, eliminating an entire category of bugs.

This approach is not without its difficulties - in fact, it has more of them than the others and lots of this article will be simply dedicated to figuring them out. But ultimately, I do believe it to be a better solution for the sake of those stronger static guarantees.

Panic checks

I mentioned that this design would forbid at compile time async drop types being dropped in a synchronous context. So, seems easy right? Just detect when the compiler would run the destructor for each value and error out if it’s invalid.

// Error
fn bad(stream: TlsStream) {
	println!("{:?}", stream.protocol_version());
	// Implicitly dropped here: error!
}
// OK
fn good(stream: TlsStream) -> TlsStream {
	println!("{:?}", stream.protocol_version());
	stream
}

Except…it’s not so simple. Because at nearly every point in a program, it is possible for the thread to panic, and if that happens unwinding might start to occur and if that happens you need to drop all the local variables in scope but you can only do that if they have a synchronous destructor! So really the compiler ought to forbid any usage of values with an asynchronous destructor in a synchronous context since panics can always happen and mess things up.

// Error
fn bad(stream: TlsStream) -> TlsStream { stream }

But that doesn’t work either. The usage of types with an asynchronous destructor in a synchronous context is absolutely necessary in many circumstances, for example TlsStream::close_unclean which takes self or block_on which takes a future. What the compiler actually needs to enforce is then slightly more relaxed: While a value that cannot be synchronously dropped is held in scope, no operations that might panic can occur. “Operations that might panic” here includes calling any function or triggering any operator overload. It only doesn’t include simple things like constructing a struct or tuple, accessing a type’s field (without overloaded Deref), matching, returning, or any other built-in and trivial operation.

// Error
fn bad(stream: TlsStream) -> TlsStream {
	println!("{:?}", stream.protocol_version());
	stream
}
// OK
fn good(stream: TlsStream) -> TlsStream { stream }

This rule is quite limited, but actually provides all the tools necessary for dealing with this situation. It is particularly effective when combined with ManuallyDrop: because ManuallyDrop skips running the destructor of a type, it is always able to be synchronously dropped even if the type inside isn’t. So as long as the first might-panic operation you do upon obtaining one of these values is calling ManuallyDrop::new on it, the compiler will allow you to do anything you like since the burden has effectively been shifted to you to drop the value if you want. What’s more, ManuallyDrop::new itself doesn’t have to be implemented with any compiler magic - since all it does is execute a struct expression and return it, it passes the panic check just fine.
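The ManuallyDrop half of this works in today’s Rust, so it can be sketched directly. Stream below is a hypothetical stand-in for TlsStream with an ordinary synchronous destructor that counts its runs; the panic check itself is part of the proposal and is of course not enforced by current compilers.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `Stream`'s destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a type such as `TlsStream`.
struct Stream {
	version: u32,
}

impl Drop for Stream {
	fn drop(&mut self) {
		DROPS.fetch_add(1, Ordering::SeqCst);
	}
}

/// Wraps the stream first, performs a might-panic operation, then hands
/// the stream back without its destructor ever running in this function.
fn inspect(stream: Stream) -> (String, Stream) {
	// Only a struct expression runs here, so nothing can panic yet.
	let guarded = ManuallyDrop::new(stream);
	// `format!` allocates and may panic; if it did, `guarded` would be
	// leaked rather than dropped.
	let description = format!("version {}", guarded.version);
	(description, ManuallyDrop::into_inner(guarded))
}
```

The trade-off is visible in the comments: a panic between the wrap and the unwrap leaks the value instead of dropping it, which is exactly the burden that has been shifted to the caller.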

Unwinding in async

Now that we’ve looked at what unwinding looks like in a synchronous context, let’s see what it looks like in an asynchronous one. It should be easier because this time we’re actually allowed to await on each value’s destruction.

async fn unwinds(_stream: TlsStream) {
	panic!();
}

Sticking with the principle of forbidding ungraceful TLS stream shutdown entirely, it makes sense for the future to catch this panic and then asynchronously drop everything in scope like it usually would, before eventually propagating the panic to the caller.

For parity with synchronous code, while performing these asynchronous drops std::thread::panicking would return true and similarly panicking again would result in an abort. Actually storing the in-flight panic in the future is easy: simply store an optional pointer that is the Box<dyn Any + Send> returned by catch_unwind, ready to be passed to resume_unwind later.
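The catch-store-resume sequence can be approximated in synchronous code today. This is only a sketch under my own assumptions: run_with_deferred_panic is a hypothetical helper, the cleanup closure stands in for awaiting the async destructors, and unlike the proposal std::thread::panicking would return false while cleanup runs.

```rust
use std::any::Any;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

/// Runs `body`; if it panics, holds the payload while `cleanup` runs
/// and only then lets the panic continue to the caller.
fn run_with_deferred_panic(body: impl FnOnce(), cleanup: impl FnOnce()) {
	// `None` while everything is fine, `Some(payload)` once a panic is in flight.
	let in_flight: Option<Box<dyn Any + Send>> =
		catch_unwind(AssertUnwindSafe(body)).err();

	// Stand-in for awaiting the asynchronous destructors of locals.
	cleanup();

	if let Some(payload) = in_flight {
		resume_unwind(payload);
	}
}
```

Note the AssertUnwindSafe wrapper: the sketch has to silence unwind safety checks to compile at all, which is a small preview of the issue discussed next.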

Unfortunately, those functions aren’t available in no_std environments yet so for now the compiler will probably have to use a workaround like aborting or leaking the values - or maybe implementing async destructors could be forbidden entirely on #![no_std]. If that issue is ever resolved it would be possible to improve the handling to something more useful.

There is one big issue with this approach however, and that is unwind safety. Unwind safety is the idea that panics in code can cause shared data structures to enter a logically invalid state, so whenever you are given the opportunity to observe the world after a panic it should be checked that you know that that might happen. This is regulated by two traits, UnwindSafe and RefUnwindSafe, which provide the necessary infrastructure to check all of this at compile time.

Implemented simply, this proposal would trivially break that concept:

#[derive(Clone, Copy, PartialEq, Eq)]
enum State { Valid, Invalid }

let state = Cell::new(State::Valid);

let task = pin!(async {
	let stream = some_tls_stream;
	state.set(State::Invalid);
	panic!();
	state.set(State::Valid);
});
let _ = task.poll(&mut cx);

// Now the task is panicking and polling the TLS stream...

// But we can observe the invalid state!
assert_eq!(state.get(), State::Invalid);

So what do we do? Well, we have a few options:

  1. Require that all local variables in async contexts are UnwindSafe. This would prevent the above code from compiling because &Cell<T> is !UnwindSafe.
  2. Have compiler-generated async {} types only implement Future when Self: UnwindSafe. This is mostly the same as the first option, it just causes an error later in compilation.
  3. Ignore unwind safety entirely - it’s already kind of useless because std::thread::spawn doesn’t require F: UnwindSafe and that can already be used to witness broken invariants. The system as a whole is definitely one of the more confusing and less understood parts of std, and it usually just amounts to slapping AssertUnwindSafe on everything until rustc is happy while not actually considering the implications.
  4. Have async panics always cause synchronous drops of locals. This would force a sync drop option on types where it might not even make logical sense to have one, and async panic handling would permanently be done suboptimally.

Personally, I’m quite in favour of option 3 - ignoring unwind safety entirely. I can’t think of a time where it has actually been useful for me or prevented a bug, but of course your mileage may vary (I know rust-analyzer has been saved by unwind safety at least once). I’m also open to option 1, although it could end up being quite a pain.
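As a small illustration of the point about std::thread::spawn in option 3, the sketch below observes a broken invariant after a panic without any UnwindSafe bound or AssertUnwindSafe wrapper appearing. invariant_after_panic is a hypothetical function of mine, and the atomic it uses would pass an unwind safety check anyway; the point is only that spawn never asks.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Returns whether the shared invariant still holds after a worker panics.
fn invariant_after_panic() -> bool {
	// `true` means the shared state is logically valid.
	let valid = Arc::new(AtomicBool::new(true));
	let worker_view = Arc::clone(&valid);

	// `thread::spawn` only asks for `Send + 'static`; there is no
	// `UnwindSafe` bound on the closure.
	let outcome = thread::spawn(move || {
		worker_view.store(false, Ordering::SeqCst); // temporarily break the invariant
		panic!("worker failed before restoring the invariant");
	})
	.join();

	assert!(outcome.is_err()); // the panic surfaces as an `Err` from `join`
	valid.load(Ordering::SeqCst)
}
```

The function returns false: the caller survives the panic and sees the state the worker never got to repair.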

poll_drop_ready

In the now-closed RFC 2958, withoutboats proposed the following design for implementing asynchronous destructors:

trait Drop {
	fn drop(&mut self);

	fn poll_drop_ready(&mut self, cx: &mut Context<'_>) -> Poll<()> {
		Poll::Ready(())
	}
}

Under this design, dropping a type would be a simple matter of forwarding to poll_drop_ready inside the future’s poll function until it returns Poll::Ready(()) and execution can continue. Types would need to hold all state they need to use for destruction inside the type itself.

But this design comes with one major drawback that I haven’t seen mentioned so far: it breaks Vec’s three-pointer layout guarantee. The problem is that Vec, when destroyed, needs to drop each of its elements in order. So with an approach like poll_drop_ready, it would need to keep track of how many elements it has destroyed so far within the Vec itself, since it isn’t allowed to introduce any new external state during destruction. It can’t use any existing fields to do this - ptr, len and capacity are all necessary to keep around - therefore the only other option is adding a new field, but Rust already guarantees that Vec will never do that.
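The layout constraint is easy to check. In this sketch vec_is_three_words is a hypothetical helper, and VecWithDropCursor is a made-up struct showing the extra word a poll_drop_ready-style Vec would have to carry; it is not a proposal for how Vec should be changed.

```rust
use std::mem::size_of;

/// Returns `true` if `Vec<T>` occupies exactly three pointer-sized words.
fn vec_is_three_words<T>() -> bool {
	size_of::<Vec<T>>() == 3 * size_of::<usize>()
}

/// Hypothetical layout a `poll_drop_ready`-style `Vec` would need: the usual
/// three words plus a cursor remembering how many elements are already dropped.
#[allow(dead_code)]
struct VecWithDropCursor<T> {
	inner: Vec<T>,
	dropped_so_far: usize,
}
```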

It’s not like there aren’t potential solutions to this, like hardcoding Vec’s async drop code into the language or only making it four usizes for async-drop types. But both of those are a hack, and to me appear to just be working around a more fundamental problem with the design.

So how do we avoid this? Well, we have to allow types to hold state - new state - in their asynchronous destructors. Such a design was rejected by withoutboats for two reasons:

  1. The resulting future can be unexpectedly !Send.
  2. It doesn’t play well with trait objects.

I don’t believe the first problem to be particularly bad, as if a type’s asynchronous destructor ends up being !Send that simply forms part of the type’s public API, similarly to how the type itself being Send is. And in generic contexts, since Send implementations leak all over the place anyway the Sendness of destructors can too: it would be up to the user to provide a type with a Send destructor if they want the resulting future to be Send.

Trait objects definitely pose a larger challenge - since the new state is of variable size, it’s not possible to stack-allocate it anywhere like we usually would with non-type-erased types. But this isn’t a problem that needs to be immediately solved: it’s possible to just forbid dyn trait objects with asynchronous destructors for now, and potentially fill in this gap later. Since users can always create user-space workarounds for this feature, it’s not urgent to attempt to stabilize a solution immediately. Additionally because it’s a problem shared with all async traits, not just async destructors, if a general solution is found for those it would end up working for this too.

Function implicit bounds

Now we need to begin to consider how async drop works in generic code. In particular, when will a generic parameter enforce that a type does or does not support synchronous drop?

Within the current edition, it is essential that backward compatibility is maintained. Therefore, we can’t suddenly force T: ?Drop on any existing function or implementation, synchronous or asynchronous, since they could very well be relying on synchronous drop support. If asynchronous drop is to be supported at all by an API, it must explicitly opt in to it (more on this later). All generic parameters and associated types without that opt-in would default to requiring a synchronous drop in every context.

To illustrate how this would work, here is an implementation of FromIterator for Option annotated with the implicit bounds:

impl<A, V> FromIterator<Option<A>> for Option<V>
where
	// A: Drop,
	// V: Drop,
	V: FromIterator<A>,
{
	fn from_iter<I>(iter: I) -> Self
	where
		I: IntoIterator<Item = Option<A>>,
		// I: Drop,
		// No `I::IntoIter: Drop` bound is implied here since
		// that's provided by the IntoIterator trait already.
	{
		iter.into_iter().scan((), |_, item| item).collect()
	}
}

As a side note, I’m using T: Drop syntax to mean “supports synchronous drop”. Unfortunately, that is counterintuitively not what T: Drop currently means, nor does it mean “the type needs_drop”; instead, it is satisfied only if there is a literal impl Drop block for the type, making the bound entirely useless in any actual code. But let’s ignore that and assume the more sensible meaning for now.
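For anyone who hasn’t run into this quirk, here is a short sketch of what T: Drop means today. Guard, has_literal_drop_impl and drop_bound_demo are hypothetical names; the remark about String reflects the standard library as I understand it at the time of writing, where String has no impl Drop block of its own.

```rust
use std::mem::needs_drop;

/// Has a literal `impl Drop` block.
struct Guard;
impl Drop for Guard {
	fn drop(&mut self) {}
}

/// Today this bound accepts only types with their own `impl Drop` block.
#[allow(drop_bounds)]
fn has_literal_drop_impl<T: Drop>() -> bool {
	true
}

/// Checks the two notions against each other.
fn drop_bound_demo() -> (bool, bool) {
	// `Guard` satisfies `T: Drop` because of its explicit impl.
	let guard_ok = has_literal_drop_impl::<Guard>();
	// `String` owns a heap buffer, so dropping it matters...
	let string_needs_drop = needs_drop::<String>();
	// ...yet `has_literal_drop_impl::<String>()` would fail to compile,
	// because the `impl Drop` lives on the `Vec<u8>` inside it.
	(guard_ok, string_needs_drop)
}
```

rustc even ships a drop_bounds lint warning against this bound, which is why the sketch has to allow it explicitly.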

We get a lot more freedom when considering the next edition, and we can start relaxing the defaults of those bounds to something more commonly useful. As long as the standard library provides an adequate set of utilities for dealing with async drop types migrating should be painless.

Let’s look at a few simple examples to try and work out what these defaults should actually be.

fn sync_drops_a_value<T>(v: T) {}
fn sync_takes_a_ref<T>(v: &T) {}
fn sync_drops_a_clone<T: Clone>(v: &T) { v.clone(); }
async fn async_drops_a_value<T>(v: T) {}

sync_drops_a_value and sync_drops_a_clone should probably compile as-is and not work with async drop types. Similarly, async_drops_a_value should obviously work with async drop types, because of course async destructors would be supported in an asynchronous context. At first glance it looks like sync_takes_a_ref can follow suit - after all, it’s not trying to drop anything - but in practice it can’t, because the compiler shouldn’t have to look into its function body to determine whether it actually does something like sync_drops_a_clone does or not. While that situation is unfortunate, it is not all bad, because as it turns out the extra restriction does not matter in most cases: users can often add an extra reference to the type to bridge the gap.

fn takes_a_ref<T /* implied to require not-async-drop */>(val: &T) { /* ... */ }

let stream: TlsStream = /* ... */;
takes_a_ref(&stream); // doesn't work, since TlsStream is async-drop
takes_a_ref(&&stream); // does work, since &TlsStream is not async-drop

Normally, a double reference functions totally equivalently to a single one, so this shouldn’t be too big a problem. And as older APIs gradually migrate to new syntax it becomes less and less of one.

So past the next edition all synchronous functions would implicitly bound each generic parameter by T: Drop and all asynchronous functions would use the async equivalent. While this doesn’t cover the desired behaviour 100% of the time, it covers the majority of cases and that’s all that’s needed for a default - explicit bounds can be used wherever necessary.

Inherent functions follow much the same idea. Consider this example:

struct Wrapper<T>(T);

impl<T> Wrapper<T> {
	fn some_sync_method(self) {}
	fn ref_method(&self) {}
	async fn some_async_method(self) {}
}

With all the implicit bounds made explicit, it would look like this:

struct Wrapper<T>(T);

impl<T> Wrapper<T> {
	fn some_sync_method(self) where T: Drop {}
	fn ref_method(&self) where T: Drop {}
	async fn some_async_method(self) where T: AsyncDrop {}
}

There is one small addition though: because of the frequency of wanting to define several synchronous methods that don’t care about drop, one can specify relaxed bounds on the impl block itself and have it apply to every function inside of it. This would be useful for defining many of the Option methods:

impl<T: ?Drop> Option<T> {
	pub fn is_some(&self) -> bool { /* ... */ }
	pub fn is_none(&self) -> bool { /* ... */ }
	pub fn as_ref(&self) -> Option<&T> { /* ... */ }
	pub fn as_mut(&mut self) -> Option<&mut T> { /* ... */ }
	// et cetera
}

The choice of exact syntax for this is discussed more later.

Drop supertrait

The following code compiles today:

pub trait Foo {
	fn consumes_self(self) {}
}

If any declared trait didn’t imply Drop as a supertrait, then we would have a breaking change as there would no longer be a guarantee that self can be dropped like that. Ultimately, I would like to follow in the path of Sized and have Foo: Drop never implied so that the above code would need an explicit where Self: Drop bound, but until then that code must desugar like so:

pub trait Foo: Drop {
	fn consumes_self(self) {}
}

And everything can compile again.

It’s also possible that we could introduce some more complex rules about this in the current edition, like “the supertrait is only implied if there are any default methods”; but they would only help in a small number of cases and it would be easier to just convince users to use the next edition.

Async genericity

With the current suggestions taken alone, although async drop will be supported it would be rather inconvenient since almost no existing standard library APIs would support it. Just to show how difficult it would be to use, here are some functions that wouldn’t work with async drop types:

  • Option::insert, since it can drop the old value in the Option.
  • Many HashMap functions: insert, entry, etc since they call methods of user-supplied generics which can always panic.
  • Vec::push, since it’s synchronous and can panic if the Vec’s length exceeds isize::MAX.
  • Box::new, since it’s possible that allocation will be allowed to panic.
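The first bullet is easy to demonstrate with today’s synchronous destructors. Tracked is a hypothetical type whose Drop impl flips a flag, standing in for a resource that would really want an asynchronous shutdown.

```rust
use std::cell::Cell;

/// Hypothetical resource whose destructor records that it ran.
struct Tracked<'a> {
	dropped: &'a Cell<bool>,
}

impl Drop for Tracked<'_> {
	fn drop(&mut self) {
		self.dropped.set(true);
	}
}

/// Returns whether the first value had been dropped right after `insert`,
/// while the replacement value was still alive.
fn insert_drops_old_value() -> bool {
	let first = Cell::new(false);
	let second = Cell::new(false);

	let mut slot = Some(Tracked { dropped: &first });
	// A synchronous call that silently runs the old value's destructor.
	let _new = slot.insert(Tracked { dropped: &second });

	first.get() && !second.get()
}
```

Nothing at the call site hints that a destructor runs inside insert, which is why such functions would have to reject async drop types.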

One potential option is to introduce _async variants of each of these functions that are async fns. When dealing with async-drop types, you’d call vec.push_async(item).await; instead of vec.push(item); and Box::new_async(value).await instead of Box::new(value). However this would nearly double the API surface of the standard library and lead to a large amount of code duplication. This is obviously undesirable, so what can we do about it?

One potential path forward is a feature known as async overloading, previously proposed by Yoshua Wuyts. The idea is that synchronous functions can be overloaded by asynchronous ones, allowing Vec::push_async and Vec::push to effectively share the same namespace, and have the correct function be chosen based on context.

While this does solve the first problem of the doubled API surface quite neatly, it does not however solve the second problem of code duplication - one would still have to write two copies of nearly-identical code for an async and sync implementation of the same algorithm. And it comes with its own problems too, such as needing a good way to force one particular overload to be chosen of multiple possibilities.

My alternative idea is what I will refer to as async genericity. Unlike async overloading which has two separate functions with different bodies, under async genericity the async and sync equivalents of one function share a body that works for both. The compiler can then monomorphize this into two separate functions, just like it does for generic parameters. The correct version will be chosen at call site depending on the traits the given generic parameters implement. It is, to some extent, colourless async.

Inspiration from const

I’d like to take inspiration from the work on const fn which faces a similar problem to the one we’re facing now: how can one function be written that works for multiple modes (async/sync, const/non const)? A simple example of that is drop:

const fn drop<T: ~const Drop>(_x: T) {}

This function can be treated as “expanding” into two separate functions:

const fn drop_const<T: const Drop>(_x: T) {}
fn drop_non_const<T>(_x: T) {}

Where the correct one will be chosen at call site depending on whether T can be dropped in const contexts. const Drop is a compiler-generated Drop subtrait which has all the same methods as Drop, but converted to const fns. This const modifier can actually be applied to any trait to automatically make it const: const Iterator, const Add et cetera. You can read more about this in its pre-RFC, I won’t go into the details here.

I will use this as a starting point for the async generics design. It might look something like this:

~async fn drop<T>(_x: T) {}

The T: ~async Drop bound is implied, like how T: async Drop would be implied in normal async fns. It “expands” to:

async fn drop_async<T>(_x: T) {}
fn drop_sync<T>(_x: T) {}

In cases where there are multiple generic parameters, like for example:

~async fn drop_pair<A, B>(_: A, _: B) {}

The synchronous version is only possible when all parameters implement the synchronous version of the trait.

// `A: async Drop, B: async Drop`
async fn drop_pair_async<A, B>(_: A, _: B) {}

// `A: Drop, B: Drop`
fn drop_pair_sync<A, B>(_: A, _: B) {}

If the function is being called where A: Drop but B: async Drop, the async version will be selected since A: Drop implies A: async Drop already.

If an ~async fn is declared with no generic parameters that have an ~async bound, then it’s actually totally equivalent to a synchronous function and should probably be warned against by rustc.

One important aspect to note is that async is somewhat the opposite of const. While a non- const function can always be substituted for a const one, the inverse is true of async: an async function can always be substituted for a sync one but not the other way around. This means that while const Trait is a subtrait of Trait (fewer types implement it than just Trait), async Trait is a supertrait of Trait (more types implement it than just Trait). Or in other words, const Trait: Trait: async Trait.
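The direction of that relationship can be modelled in current Rust with a blanket impl. Everything here is an assumption made for illustration: Close and AsyncClose are hypothetical hand-written traits standing in for Trait and the compiler-generated async Trait, and block_on is a deliberately naive spinning executor rather than anything you would ship.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical synchronous trait, standing in for `Trait`.
trait Close {
	fn close(self) -> &'static str;
}

/// Hypothetical asynchronous counterpart, standing in for `async Trait`.
trait AsyncClose {
	async fn close_async(self) -> &'static str;
}

/// Every synchronous implementor is usable asynchronously, which is what
/// makes `async Trait` the larger set of the two.
impl<T: Close> AsyncClose for T {
	async fn close_async(self) -> &'static str {
		self.close()
	}
}

struct File;
impl Close for File {
	fn close(self) -> &'static str {
		"closed synchronously"
	}
}

struct NoopWake;
impl Wake for NoopWake {
	fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls in a busy loop until the future completes.
fn block_on<F: Future>(future: F) -> F::Output {
	let mut future = pin!(future);
	let waker = Waker::from(Arc::new(NoopWake));
	let mut cx = Context::from_waker(&waker);
	loop {
		if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
			return output;
		}
	}
}
```

File only implements the synchronous trait, yet File.close_async() is callable; the reverse direction has no such blanket impl, matching Trait: async Trait.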

Another important impact of this system is that, unlike with const, upgrading an implementation from async Trait to Trait is a breaking change since the methods will now by default be synchronous instead of asynchronous, so you’ll get errors wherever you previously were using .await. Of course, the actual number of use cases is universally increased, not reduced (passing it to a function that accepts async Trait still works, and the methods will still require .await there) but direct callers will need to modify their code to have it build. However this should not be a large problem since it’s generally well known up front whether something will need async or not.

Another option would be to have async Trait and Trait be treated as two entirely separate traits, with no inherent connection between the two. This has the advantage of preventing mistakes like using std::fs::File in an asynchronous function at compile time (since std::fs::File would not implement async Read), but overall I do not think that to be worth it:

  1. Users can end up making the mistake anyway, just by calling a concrete blocking function like .metadata() on a Path or std::thread::sleep. It would only help prevent a small number of cases.

  2. It is not always a mistake; sometimes it is useful to run blocking code in an asynchronous context, if for example one wants to mix asynchronous and blocking function calls on a blocking worker thread.

  3. Sometimes whether an operation will actually block is only known dynamically, for example reading from a TCP stream - if the stream is in non-blocking mode (which is explicitly a supported use case by the standard library) it should be fine to call it from async code.

  4. By default types like Vec<u8> (whose Write implementation is neither asynchronous nor blocking, and thus can be used in both contexts) would end up being exclusively synchronous. To support both, it would have to write out boilerplate code to implement both async Trait and Trait separately, or we’d have to introduce another new piece of syntax to share an implementation.

    It gets worse when considering Drop - every non-generic type implementing that trait would have to migrate to this new syntax to even be usable at all in asynchronous contexts (or we could special-case Drop to have shared implementations, but I can’t think of a strong reason why Drop should be treated so differently from everything else).

  5. Having the traits be separate rather increases the complexity of the system overall.

Relaxed drop bounds

We introduced implicit default Drop bounds in a previous section; now that we have some actual syntax for async drop (async Drop) the question is how those bounds can be relaxed for functions that allow it.

I’d first like to introduce a new concept in this section: the ?Drop bound. This bound can be considered the initial one before implicit bounds are added, and it imposes absolutely no requirements on to what extent the type supports being dropped. There would not be any situation in which this bound is necessary over async Drop, since the least “droppable” a type can be is async Drop - applying it only takes abilities away from the implementor while giving none to the caller. But it is still important to have because it avoids panic-check-passing synchronous functions that don’t care at all about async (mem::replace, any::type_name, Option::map etc) from having to write async in their signature to be general. It would feel rather strange for them to declare <T: async Drop> or something when they actually don’t drop the type asynchronously at all. It also enables future extensions into more kinds of drop which may be useful.

All functions have a stronger default bound for generic parameters than ?Drop, and that can be relaxed to ?Drop in much the same way as the other implied bound in Rust, Sized: by adding ?Drop as a trait bound in the parameter list or in the where clause. Like with Sized it only accepts the simple cases, so ?Drop cannot be used as a supertrait (it is the default anyway) or as a bound on types other than a literal type parameter. There is a slight inconsistency here in that ?Drop is used even when the implied bound isn’t actually Drop, because it could be in reality async Drop; so in a way it should really be ?async Drop if the outer function is async and only ?Drop if the outer function is sync. But since ?Drop is shorter, more consistent and unambiguous anyway there’s no strong reason not to use it.
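Since ?Drop is modelled on ?Sized, it may help to recall how the existing relaxation behaves. The two functions below are hypothetical examples of mine using only current, stable Rust; ?Drop itself does not exist yet.

```rust
use std::mem::{size_of, size_of_val};

/// Implicit `T: Sized` bound: only sized types are accepted.
fn size_of_sized<T>(_value: &T) -> usize {
	size_of::<T>()
}

/// `?Sized` removes the default bound, so unsized types such as `str`
/// and `[u8]` are accepted as well.
fn size_of_any<T: ?Sized>(value: &T) -> usize {
	size_of_val(value)
}
```

size_of_any("abc") compiles because T can be str, whereas size_of_sized("abc") would have to pick T = &str instead; a ?Drop bound would widen the accepted set in the same way.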

When relaxing bounds to something weaker than the default but stronger than ?Drop, (particularly, setting them to async Drop in a synchronous function) the most obvious option is to support the trait name directly - use T: async Drop to support T not implementing any of the Drop subtraits (Drop, const Drop), but requiring it to implement async Drop. However this approach ends up being quite problematic because unlike ?Drop whose unique syntax excuses it from only supporting a few special cases, async Drop is also a trait like any other and so must be supported in the general case like any other.

What this means is that having T: async Drop implicitly also relax a Drop bound breaks down in more complex cases (such as when it’s implied through a supertrait, or transitively via a bound in the where clause applied to another type) leading to inconsistent behaviour and confusing semantics.

Instead, Rust should take the consistent approach of allowing (but potentially warning against) bounds like T: async Drop on a synchronous function, but not giving them any effect unless they’re also paired with ?Drop. Since Drop implies async Drop, adding async Drop in a synchronous function is a tautology and only by taking away the initial Drop bound does it have a meaning.

The only problem with this approach is its verbosity: T: ?Drop + async Drop is quite the mouthful to express one concept. It’s possible that Rust could introduce some syntax sugar to make it shorter, the only difficulty is what the actual syntax of that would be while remaining clear and unambiguous. I’m very much open to suggestions here.

Synchronous opt-out

While blindly turning every method in the trait const works most of the time for const Traits, it doesn’t end up working so well for async Traits. In particular, there are quite a few methods that would benefit from always being synchronous whether the outer trait is considered asynchronous or not, for example:

  • Iterator::size_hint and ExactSizeIterator::len: These methods should be O(1) and not perform I/O, so there’s no reason to have them be async.
  • Iterator::{step_by, chain, zip, map, filter, enumerate, ...}: These functions just construct a type and return it, no asynchronicity here.
  • Read::{by_ref, bytes, chain, take}: More trivial functions that just construct a type.
  • BufRead::consume: Any I/O done by the BufRead should occur in fill_buf and all consume should do is move around a couple numbers. Hence, it should be always synchronous.

So evidently trait definitions need to be able to control what their async form would look like. Having any kind of default chosen by the Rust compiler would be a bad idea, because even without thinking about async code, just by writing a single trait you’d have already chosen and stabilized an async API. Plus, it’s not like many traits need to have async equivalents - it’s mostly just Iterator, I/O traits, functions and Drop that matter. Therefore I think it is best to have async Trait support be an opt-in by the trait declarer.

The syntax to declare one of these traits can be something along the lines of trait ~async Foo, ~async trait Foo, or async trait Foo - I don’t have a strong preference and will use the first for now. In order to declare the methods of these traits as being conditionally async, the same ~async syntax can actually be borrowed over from generic async functions - Self will just be treated as another generic parameter with an ~async Trait bound. This produces a nice parallel between functions and traits, as demonstrated below:

// What you write
~async fn f<T: ~async Trait>() { /* ... */ }

trait ~async Trait { ~async fn f(); }

// What it "expands" to
async fn f_async<T: async Trait>() { /* ... */ }
fn f_sync<T: Trait>() { /* ... */ }

trait async Trait { async fn f(); }
trait Trait { fn f(); }

And since those functions are actually just regular ~async functions, they also interact with generic parameters:

trait ~async Trait {
	~async fn f<T: ~async Read>(val: T);
}

// What it "expands" to
trait async Trait {
	async fn f_async<T: async Read>(val: T);
}
trait Trait {
	async fn f_async<T: async Read>(val: T);
	fn f_sync<T: Read>(val: T);
}

// A synchronous implementation
impl Trait for () {
	~async fn f<T: ~async Read>(val: T) {}
}
// An asynchronous implementation
impl async Trait for u32 {
	async fn f<T: async Read>(val: T) {}
}
// A generic implementation
impl<U: ~async Trait> ~async Trait for &U {
	~async fn f<T: ~async Read>(val: T) {}
}

Just like with regular ~async functions, the synchronous version only exists when all generic parameters (here, both T and Self) implement the trait synchronously.

The last thing to note is that associated types in ~async Traits would have the implicit bound ~async Drop: when the trait is an async Trait they’re allowed to be async Drop but when it’s a synchronous Trait they are required to be Drop. This should follow the rules that users will want most of the time.

To conclude, I’ll leave you with an annotated snippet of how the Iterator trait might look with added async support:

pub trait ~async Iterator {
	type Item;

	~async fn next(&mut self) -> Option<Self::Item>;

	fn size_hint(&self) -> (usize, Option<usize>) {
		(0, None)
	}

	~async fn fold<B, F>(mut self, init: B, mut f: F) -> B
	where
		Self: Sized,
		// `fold` always drops `Self` at the end so this bound is required.
		Self: ~async Drop,
		F: ~async FnMut(B, Self::Item) -> B,
		// We can't relax B's bound because it's dropped in the event that
		// `self.next()` panics.
	{
		let mut accum = init;
		// `.await` is required in both cases because it could be a cancellation
		// point.
		while let Some(x) = self.next().await {
			accum = f(accum, x).await;
		}
		accum
	}

	fn map<B, F>(self, f: F) -> Map<Self, F>
	where
		Self: Sized,
		// Even a synchronous iterator's `map` accepts an `async FnMut` here,
		// without the tilde. This is because every `FnMut` is also an
		// `async FnMut`, so `async FnMut` is the strictly more general bound.
		// The tilde is only necessary when the function effectively needs to
		// specialize on the synchronous case to not be async, but that's not
		// necessary here since `map` isn't ever async anyway.
		F: async FnMut(Self::Item) -> B,
		// The default bounds are overly restrictive, so we relax them.
		F: ?Drop,
		B: ?Drop,
	{
		Map::new(self, f)
	}

	// et cetera
}

Compared to the current design of adding a new Stream/AsyncIterator trait, this has the following advantages:

  • We don’t have to decide between async vs sync callbacks for functions like fold (currently futures-util and tokio-stream disagree about this).
  • We don’t have two separate functions .map and .then for sync and async respectively.
  • .map with an async function can be called on a synchronous iterator, automatically turning it into an async one.
  • There’s no need for additional conversion functions like .into_stream() or .into_async_iter().
  • Existing iterators like slice::Iter will automatically implement the new async Iterator trait.

Async traits and backwards compatibility

If you look closely at my definition of Iterator above you’ll notice that it’s actually not backward compatible with the current definition of Iterator. The problem is that today, people can override functions like fold that are less powerful than the ~async version. For example:

impl Iterator for Example {
	type Item = ();

	fn next(&mut self) -> Option<Self::Item> { Some(()) }

	fn fold<B, F>(mut self, mut accum: B, mut f: F) -> B
	where
		F: FnMut(B, Self::Item) -> B,
	{
		loop { accum = f(accum, ()) }
	}
}

Under my definition of Iterator, that code would instead need to be rewritten like this:

impl Iterator for Example {
	type Item = ();

	fn next(&mut self) -> Option<Self::Item> { Some(()) }

	~async fn fold<B, F>(mut self, mut accum: B, mut f: F) -> B
	where
		F: ~async FnMut(B, Self::Item) -> B,
	{
		loop { accum = f(accum, ()).await }
	}
}

The iterator itself is still not async, but this change would additionally allow calling fold with an asynchronous callback even if the underlying iterator is still synchronous.

Unfortunately, we can’t just make the first version stop compiling due to Rust’s backward compatibility guarantees. And even an edition won’t be able to fix this, since the issue is greater than just a syntactical one.

I don’t think there is a reasonable way to somehow fix fold itself - its signature is effectively set in stone at this point. But we can add a where Self: Iterator<Item = Self::Item> bound to it and then have the generic version be under a new name, fold_async. Since fold_async would be strictly more general than fold, the default implementation of fold can just forward to it. So the definition of Iterator would actually look more like this:

pub trait ~async Iterator {
	type Item;

	~async fn next(&mut self) -> Option<Self::Item>;

	fn fold<B, F>(mut self, init: B, f: F) -> B
	where
		Self: Iterator<Item = Self::Item> + Sized + Drop,
		F: FnMut(B, Self::Item) -> B,
	{
		self.fold_async(init, f)
	}

	~async fn fold_async<B, F>(mut self, init: B, mut f: F) -> B
	where
		Self: Sized + ~async Drop,
		F: ~async FnMut(B, Self::Item) -> B,
	{
		let mut accum = init;
		while let Some(x) = self.next().await {
			accum = f(accum, x).await;
		}
		accum
	}

	// et cetera
}

Even though it looks very similar to not having async genericity at all, it is still better than without because:

  1. Overriding fold_async also effectively overrides fold - they’re able to share an implementation.
  2. Async and sync iterators share definitions of fold and fold_async.

This makes the feature still worth it in my opinion, even if we have to insert some hacks into Iterator to avoid breaking compatibility.
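The forwarding structure can be tried out in today's Rust, with the caveat that stable Rust cannot express ~async, so both methods below have the same bounds and only the "override one, get both" behaviour is shown. Source, fold_plain, fold_general, Countdown and Skipper are hypothetical names for this sketch, not anything from the standard library.

```rust
// Hypothetical stand-in for `Iterator`; none of these names come from std.
trait Source {
    fn next_item(&mut self) -> Option<u32>;

    // The "old" entry point keeps its signature and simply forwards.
    fn fold_plain<B, F>(self, init: B, f: F) -> B
    where
        Self: Sized,
        F: FnMut(B, u32) -> B,
    {
        self.fold_general(init, f)
    }

    // The "new" entry point holds the real default implementation.
    fn fold_general<B, F>(mut self, init: B, mut f: F) -> B
    where
        Self: Sized,
        F: FnMut(B, u32) -> B,
    {
        let mut accum = init;
        while let Some(x) = self.next_item() {
            accum = f(accum, x);
        }
        accum
    }
}

// Uses both default methods: yields n, n-1, ..., 1.
struct Countdown(u32);

impl Source for Countdown {
    fn next_item(&mut self) -> Option<u32> {
        if self.0 == 0 {
            return None;
        }
        self.0 -= 1;
        Some(self.0 + 1)
    }
}

// Overrides only `fold_general`; `fold_plain` picks the override up too.
struct Skipper;

impl Source for Skipper {
    fn next_item(&mut self) -> Option<u32> {
        Some(1)
    }

    fn fold_general<B, F>(self, init: B, _f: F) -> B
    where
        F: FnMut(B, u32) -> B,
    {
        // Returns the initial value without consuming any items.
        init
    }
}
```

Skipper never touches fold_plain, yet calling fold_plain on it runs the overridden body, which is the first advantage listed above.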

Unfortunately fold isn’t the only method that would need this treatment; potentially many others would too. By my count, this includes (in the standard library alone): chain, zip, map, for_each, filter, filter_map, skip_while, take_while, map_while, scan, flat_map, flatten, inspect, collect, partition, try_fold, try_for_each, reduce, all, any, find, find_map, position, rposition, sum, product, cmp, partial_cmp, eq, ne, lt, le, gt, ge, DoubleEndedIterator::try_rfold, DoubleEndedIterator::rfold, DoubleEndedIterator::rfind and Read::chain. If async Clone or async Ord become things, the list would grow longer.

It is a bit of a shame that functions like map and Read::chain have to have async versions though, since it’s not like anyone overrides map anyway. But because it’s technically possible, Rust has already promised not to break that code and so now can’t relax the signature of that function. Although who knows, maybe if we got a low % regression Crater run it would convince people that it’s acceptable breakage and the list could be shortened to the much more manageable for_each, partition, try_fold, try_for_each, reduce, all, any, find, find_map, position, rposition, cmp, partial_cmp, eq, ne, lt, le, gt, ge, DoubleEndedIterator::try_rfold, DoubleEndedIterator::rfold and DoubleEndedIterator::rfind. I would definitely rather do this, because frankly if you override map then you deserve what you get.

Out of the group, collect, sum and product are an especially interesting three because their _async versions (and their normal versions if we accept the technically breaking change) can’t use the standard FromIterator, Product and Sum traits since those traits are currently hardcoded to work for synchronous iterators only. So we would instead have to make new *Async versions of those traits with blanket implementations of the old versions:

// Not sure how useful `~async` is here; it would only be needed for collections
// that actually perform async work themselves while collecting as opposed to
// just potentially-asynchronously receiving the items and then synchronously
// collecting them.
//
// This is not true of any existing `FromIterator` or `FromStream`
// implementation currently, but there may still be use cases - who knows.
pub trait ~async FromAsyncIterator<A>: Sized {
	~async fn from_async_iter<I: ~async IntoIterator<Item = A>>(iter: I) -> Self;
}
// The method's type parameter is named `I` so that it doesn't clash with the
// impl's own `T`.
impl<T: FromAsyncIterator<A>, A> FromIterator<A> for T {
	fn from_iter<I: IntoIterator<Item = A>>(iter: I) -> Self {
		Self::from_async_iter(iter)
	}
}

With similar code for both Sum and Product. Unlike Iterator::fold, since from_iter, sum and product aren’t default-implemented methods we can’t just add a new from_async_iter function to the FromIterator trait itself; an entirely new trait is needed.
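The "new trait plus blanket implementation of the old one" shape can be sketched in stable Rust using two local traits, since a blanket impl of the real FromIterator is not something user code can write. FromGeneralIter, FromPlainIter and Total are hypothetical names, and because ~async does not exist the two traits here differ only in name.

```rust
// Stand-in for the proposed, more general trait.
trait FromGeneralIter<A>: Sized {
    fn from_general_iter<I: IntoIterator<Item = A>>(iter: I) -> Self;
}

// Stand-in for the existing trait that old code already relies on.
trait FromPlainIter<A>: Sized {
    fn from_plain_iter<I: IntoIterator<Item = A>>(iter: I) -> Self;
}

// Every implementor of the new trait gets the old one for free.
impl<T: FromGeneralIter<A>, A> FromPlainIter<A> for T {
    fn from_plain_iter<I: IntoIterator<Item = A>>(iter: I) -> Self {
        T::from_general_iter(iter)
    }
}

// A collection that only implements the new trait.
struct Total(u64);

impl FromGeneralIter<u64> for Total {
    fn from_general_iter<I: IntoIterator<Item = u64>>(iter: I) -> Self {
        Total(iter.into_iter().sum())
    }
}
```

Total can then be built through either trait, even though it only implements the new one.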

Trait impl implicit bounds

Before, I talked about how inside an inherent impl block, implicit Drop bounds to generics of the outer type would apply individually to each of the methods depending on its asynchronicity, and the block itself would enforce no bounds on the type. Unfortunately, we don’t have that luxury when considering trait implementations: either the trait is implemented or it’s not and we can’t apply our own bounds to individual items.

However, we do know whether the trait overall should be considered asynchronous or not - whether it’s being implemented as async Trait or Trait. So we can just forward that property as the default kind of Drop bound, and it should be what users want most of the time. Of course, for the (hopefully) rare case that it’s not desired they can always override it. The most obvious time that crops up is when implementing a trait that isn’t an async Trait but still has async methods (i.e. an async trait with no synchronous equivalent) - then the drop bounds would end up overly restrictive:

trait ExampleTrait {
	async fn foo<V>(&self, value: V);
}

struct Wrapper<T>(T);

impl<T> ExampleTrait for Wrapper<T>
where
	// overly-restrictive implied bound: `T: Drop`
{
	async fn foo<V>(&self, value: V)
	where
		// implied bound: `V: async Drop` (since it's declared
		// on the function and not on the impl block)
	{
		todo!()
	}
}

But with any luck this kind of code won’t be too common, since users should ideally be writing most code as generic-over-async anyway.

An interesting side effect of the above rule is in code like below:

struct Wrapper<T>(T);

impl<T /* implied Drop bound */> Drop for Wrapper<T> {
	fn drop(&mut self) {
		println!("I am being dropped");
	}
}

Although it is not obvious, this code wouldn’t compile because the Drop implementation of a type has more restrictive trait bounds than the type itself, and that isn’t allowed. But since it looks like this code should compile, I find it acceptable to introduce a special case and simply have the compiler forward that implicit T: Drop bound to the type itself, but only when a Drop implementation specifically is present.

Either way, that type does not work with async Drop types and the fix is like so:

struct Wrapper<T>(T);

impl<T: ?Drop> Drop for Wrapper<T> {
	fn drop(&mut self) {
		println!("I am being dropped");
	}
}
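
The rule this section leans on, that a Drop implementation may not carry stricter bounds than its type, already exists in today's Rust and can be seen with an ordinary trait such as Debug standing in for the proposed implicit bound. Noisy and Quiet are made-up types for this sketch.

```rust
use std::fmt::Debug;

// The `Debug` bound lives on the struct, so the `Drop` impl may repeat it.
struct Noisy<T: Debug>(T);

impl<T: Debug> Drop for Noisy<T> {
    fn drop(&mut self) {
        println!("dropping {:?}", self.0);
    }
}

// No bound on the struct, so the `Drop` impl must not add one either.
struct Quiet<T>(T);

impl<T> Drop for Quiet<T> {
    fn drop(&mut self) {
        println!("dropping a Quiet value");
    }
}

// Rejected by today's compiler (error E0367), because the impl would be
// stricter than the type it is implemented for:
//
// impl<T: Debug> Drop for Quiet<T> { /* ... */ }
```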

Async closures

Supporting async genericity with closures (as required for functions like Option::map and Iterator::fold) requires async {Fn, FnMut, FnOnce} to exist as traits. It seems that this is a bit useless since we already have functions that return futures, but as it turns out there is an actual benefit to having separate async function traits, particularly when working with closures: it makes the lifetimes a lot easier to manage, since the returned futures will be able to borrow the closure and parameters - something impossible with the current design.

However in order for the async Fn-traits to be useful, they must be actually implemented by the relevant functions and closures. Currently, people support asynchronous callbacks by having closures that return futures (|| async {}) - and async fns are desugared to functions of this form too. It wouldn’t be a good idea to attempt to change the behaviour of the former since that would need a hacky compiler special case for closures returning futures only, but thankfully we have reserved a bit of syntax that would be perfect for this use case: async closures (async || {}). If they were to evaluate to closure types implementing async Fn instead of Fn, they could be passed into async-generic functions like Option::map without a problem.

// Gives an `Option<T>`, since the async `map` is used.
let output = some_option.map(async |value| process(value).await).await;

// Gives an `Option<impl Future<Output = T>>`, since the sync `map` is used.
let output = some_option.map(|value| async { process(value).await });

The less good side of this addition is with async fns: we would have to choose between keeping the current system of desugaring to a simple -> impl Future function, and implementing the async Fn traits. The former is backwards compatible and more transparent (since those functions can be replicated entirely in userspace), but the latter has better interoperability with async generic functions. I am inclined to choose the latter design, but it’s an unfortunate decision to have to make.

Note that it wouldn’t be possible to implement both async Fn and Fn, because implementing Fn already implies implementing async Fn as an async function that never awaits; we would end up with conflicting implementations of async Fn, one that asynchronously evaluates to T and one that immediately evaluates to impl Future<Output = T>. To avoid that compile error we would have to choose one and discard the other.
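
The closure-returning-a-future form discussed above already works on stable Rust, and a small sketch makes the `Option<impl Future<Output = T>>` result concrete. The block_on helper and NoopWake type are a minimal busy-polling executor written only for this example (an assumption, not part of any runtime), and process is a placeholder.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough for futures that never truly suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor for this sketch: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Placeholder for some asynchronous work.
async fn process(value: u32) -> u32 {
    value * 2
}

fn map_then_await(some_option: Option<u32>) -> Option<u32> {
    // The synchronous `map` is used, so this is an `Option` of a future.
    let pending = some_option.map(|value| async move { process(value).await });
    // Nothing has run yet; the future only makes progress once it is polled.
    pending.map(|fut| block_on(fut))
}
```

The caller has to drive the inner future separately, which is the extra step the async-generic map would fold into a single `.await`.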

Conclusion

In this post we sketched out a potential design for async drop, figuring out many details and intricacies along the way. The resulting proposal is unfortunately not a small one, however it does have much general usefulness outside of async destructors (~async in particular would be excellent to have for so much code) and lots of it is necessary if we are to minimize footguns.

As a summary of everything we’ve explored thus far:

  1. We figured out the desired edge case semantics of async drop during cancellation, panics and assignments, in synchronous functions and with generics.
  2. We explored a system for async destructors based on destructor futures instead of poll_drop_ready.
  3. We explored a mechanism for supporting code that is generic over whether it is async or not.
  4. We hypothesized what is best to apply as the default generic drop bounds in functions, as well as how to relax and strengthen them if necessary.
  5. We considered how async genericity would impact functions and closures.

This post doesn’t attempt to provide a final design for async drop - there are still many open questions (e.g. UnwindSafe, ?Drop syntax, #![no_std] support) and likely unknown unknowns. But it does attempt to properly explore one particular design to evaluate its complexity, feasibility and usefulness. Out of all possible options, I think it to be quite a promising one and definitely possible to implement in some form.

Many thanks to Yoshua Wuyts for proofreading this for me!

Appendix A: Completion futures

Completion futures are a concept for a special type of future that is guaranteed at compile-time to not be prematurely dropped or leaked, in contrast to regular futures which can be stopped without warning at any time. It doesn’t sound like much, but completion futures are actually incredibly useful:

  • They enable spawn and spawn_blocking functions that don’t restrict the future’s lifetime to 'static.
  • They enable creating zero-cost wrappers around completion-based APIs like io_uring, IOCP and libusb.
  • They enable better interoperability with C++ futures, which have this guarantee by default.

I have previously written a library for this but it was very limited because it fundamentally needed to rely on unsafe, infecting just about every use of it with unsafe as well which was really not ideal. But it turns out that with an async destructor design like the one proposed by this post, it is much easier to support them in an even more powerful way and with minimal unsafe.

The solution is to add a single new trait to the core library:

pub unsafe auto trait Leak {}

As an auto trait, it would be implemented for every single type other than a special core::marker::PhantomNoLeak marker and any type transitively containing that. What Leak represents is the ability to safely leak an instance of the type, via mem::forget, reference cycles or anything similar. If a type opts out of implementing it, it is guaranteed that from creation, its Drop or async Drop implementation will be run if the value’s lifetime is to end.
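
To see why such an opt-out is needed, note that in today's Rust any value can be leaked from safe code, so a destructor is never guaranteed to run. A small sketch, with Guard and drops_observed as made-up names:

```rust
use std::cell::Cell;
use std::mem;

// Counts how many times its destructor has run.
struct Guard<'a>(&'a Cell<u32>);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the destructor count after a normal drop and after a `forget`.
fn drops_observed() -> (u32, u32) {
    let dropped = Cell::new(0);

    // A normal drop runs the destructor.
    drop(Guard(&dropped));
    let after_drop = dropped.get();

    // `mem::forget` is a safe function, and it skips the destructor entirely.
    mem::forget(Guard(&dropped));
    let after_forget = dropped.get();

    (after_drop, after_forget)
}
```

The count stays at one after the forget call; a !Leak type would be exactly a type for which that second call is rejected at compile time.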

The standard library would have all the “leaky” APIs like Arc, Rc, ManuallyDrop and MaybeUninit require that Leak be implemented on the inner type, to avoid safe code being able to circumvent the restriction. Other than that, most other APIs would support both Leak and !Leak types, since they will run the destructor of inner values.

And this is all we need to support completion futures. An io_uring I/O operation future can be implemented by submitting the operation on creation and waiting for it to complete on drop, and the !Leak guarantee means that the use-after-free issue io_uring libraries currently have to work around is eliminated.

This is a very powerful feature, even more so than my old unsafe-based implementation. Because it guarantees not leaking from creation and not just from the first poll, scoped tasks don’t even need a special scope to be defined (à la Crossbeam). Instead, an API like this just works:

pub async fn spawn<'a, R, F>(f: F) -> JoinHandle<'a, R>
where
	F: Future<Output = R> + Send + 'a,
	R: Send,
{ /* ... */ }

It also has impacts on synchronous code, because thread::spawn gets to be extended in a similar way:

pub fn spawn_scoped<'a, R, F>(f: F) -> JoinHandle<'a, R>
where
	F: FnOnce() -> R + Send + 'a,
	R: Send,
{ /* ... */ }

This would allow you to write code that borrows from the stack without problems:

let message = "Hello World".to_owned();

// Synchronous code
let thread_1 = thread::spawn_scoped(|| println!("{message}"));
let thread_2 = thread::spawn_scoped(|| println!("{message}"));
thread_1.join().unwrap();
thread_2.join().unwrap();

// Asynchronous code
let task_1 = task::spawn(async { println!("{message}") }).await;
let task_2 = task::spawn(async { println!("{message}") }).await;
task_1.await.unwrap();
task_2.await.unwrap();

Neat, right?

As with many things it needs an edition boundary to implement fully: In the current edition, every generic parameter has to still imply T: Leak but in future editions that can be relaxed to T: ?Leak, allowing the small subset of APIs that can leak values (Arc, Rc, mem::forget, ManuallyDrop, etc) to declare so in their signature and the majority of APIs to have the less restrictive bound by default.
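
For comparison with the spawn_scoped example above, the standard library today provides the synchronous half through std::thread::scope, which needs the explicit scope that the Leak-based design would make unnecessary. The function name below is made up for the sketch.

```rust
use std::thread;

// Borrows `message` from the stack in two threads and returns the total
// number of bytes they saw.
fn print_from_two_threads() -> usize {
    let message = "Hello World".to_owned();

    // All threads spawned on `scope` are joined before `scope` returns,
    // which is what makes borrowing `message` sound.
    thread::scope(|scope| {
        let thread_1 = scope.spawn(|| {
            println!("{message}");
            message.len()
        });
        let thread_2 = scope.spawn(|| {
            println!("{message}");
            message.len()
        });
        thread_1.join().unwrap() + thread_2.join().unwrap()
    })
}
```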

Appendix B: Weakly async functions

With the current design, there ends up being a large number of functions with the specific property that they need to be async fns if a type they deal with is async Drop, for the sole reason that they are able to panic while they have that type in scope. I listed a few at the start of the async genericity section, including HashMap::{insert, entry}, Vec::push and Box::new, but there’s one particularly relevant one here which is task::spawn (as seen in various runtimes: tokio, async-std, glommio, smol).

Across all those runtimes, task::spawn has the ability to panic before it spawns the future, which commonly can happen if the runtime is not running, but can also theoretically happen if allocation fails or there’s some other random system error. The problem is that just because of this one small edge case (and their presumed desire to support async Drop futures), task::spawn is forced to be a full async fn even though in itself it doesn’t do any async work.

This is especially bad for task::spawn as a function because it can easily trip up those who are migrating code. For example, while before this code would run the task in parallel with other_work():

let task = task::spawn(some_future);
other_work().await;
task.await;

With the changes applied it would instead run other_work() and wait for it to complete, and then spawn the task and not even wait for it to finish! (Unless of course dropping a task handle would be changed to implicitly join the task, which may be a better design overall - but the point still stands because it doesn’t run in parallel as people would expect.)

The fixed version would look like this:

let task = task::spawn(some_future).await;
other_work().await;
task.await;

But given that the old version doesn’t even fail to compile, it’s not an ideal situation to be in. Additionally, it does just look weird having a future that resolves to…another future.
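
The footgun rests on the fact that Rust futures are lazy: calling an async fn does nothing until the returned future is polled. The sketch below shows that with a flag; spawn_like stands in for an async task::spawn, and block_on is a minimal busy-polling executor written only for this example.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough for futures that never truly suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor for this sketch: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Stand-in for an `async fn spawn`: records that the "spawn" happened.
async fn spawn_like(started: &Cell<bool>) {
    started.set(true);
}

// Returns whether the body had run before and after the future was driven.
fn started_before_and_after_await() -> (bool, bool) {
    let started = Cell::new(false);

    // Calling the function only creates the future.
    let fut = spawn_like(&started);
    let before = started.get();

    // Driving it to completion is what actually runs the body.
    block_on(fut);
    let after = started.get();

    (before, after)
}
```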

My proposed solution to this problem is to add a new type of function to the language called “weakly async functions” which are in between asynchronous functions and synchronous functions. Let’s denote it here with [async] fn, but the syntax is obviously up for bikeshedding. The idea is this:

  • [async] fns either complete synchronously or panic asynchronously.
  • Because they must complete synchronously, they cannot be cancelled and thus they don’t need to be .awaited - that can be made implicit.
  • Because they panic asynchronously, they bypass the panic check and are allowed to own types with asynchronous destructors across potential panic points (but are not allowed to drop them unless via a panic).
  • They are allowed to call regular fns and other [async] fns, but not async fns.
  • They cannot be called from within synchronous functions.
  • They are not allowed to recurse, just like async fns.
  • It is not a breaking change to convert from an [async] fn to a regular fn.

This way, task::spawn (and a bunch of other functions like Box::new, Box::pin, Vec::push, Result::unwrap etc) would avoid requiring .awaits when being called with async Drop types. This solves the above footgun while also contributing to the succinctness of code. task::spawn would be defined something like this:

pub [async] fn spawn<O, F>(future: F) -> JoinHandle<O>
where
	F: Future<Output = O> + Send + ?Drop + async Drop + 'static,
	O: Send,
{ /* ... */ }

And in asynchronous contexts would be callable with just task::spawn(future), no await necessary.

When inside generic code, [async] would be treated as another state that ~async fns can be in, meaning there are actually three forms those functions can take. There would additionally be ~[async] fns for functions that can be either fns or [async] fns, but not async fns.

You’d also need a special kind of bound to represent “Drop when the function is synchronous and async Drop when the function is async, but also async Drop when the function is [async], since this function does not drop a value of this type unless it panics”. For now I will use the incredibly verbose form ~[async] async Drop to represent this, but if this feature is actually added a better and more bikeshedded syntax will probably have to be chosen.

This is the feature that allows us to define Vec::push generically:

impl<T> Vec<T> {
	~[async] fn push(&mut self, item: T)
	where
		T: ?Drop + ~[async] async Drop,
	{
		/* ... */
	}
}

// "Expanded" version
impl<T> Vec<T> {
	fn push_sync(&mut self, item: T)
	where
		T: Drop,
	{
		/* ... */
	}
	[async] fn push_weak_async(&mut self, item: T)
	where
		T: ?Drop + async Drop,
	{
		/* ... */
	}
}

Remember that this function can drop item and so can’t be fully synchronous, but also doesn’t drop item unless it’s panicking and so shouldn’t be made fully async either. As such it uses the in-between, supporting async Drop (and therefore also [async] Drop) when it is an [async] fn and Drop when it is a fn.

Unlike completion futures, I’m not so certain whether this is a good idea or not, or whether there aren’t any other simpler alternatives. But I do definitely think there is a problem here that does need to be addressed somehow, and to me this seems the best way to do it.

Appendix C: Linear types

I feel that I have to mention linear types at least once, given how much discourse there has been about them. A linear type is defined as “a type that must be used exactly once”. It turns out this definition is slightly vague, because it can refer to two things:

  1. Types which do not have any kind of Drop implementation and must be handled explicitly, but can be leaked with functions like mem::forget.
  2. Types which do have destructors and so can implicitly fall out of scope, but can’t be leaked with functions like mem::forget (so they are guaranteed to be able to run code before falling out of scope).

The former is a more common definition of linear types, and allows for types to force their users to be more explicit about what happens to them when they’re destroyed. I don’t have a proposal for this, but simply by coincidence the proposed ?Drop bound feature does orient itself towards supporting linear types of this sort in future and although personally I do not think they will be worth adding, their viability has been increased as a side-effect.

The latter definition is what is implemented by the above completion futures proposal. In a way it’s not true linear types, but it’s the only one that gives the practical benefits of things like zero-cost io_uring and scoped tasks. It is also a lot less difficult to integrate into existing Rust code, which tends to rely quite heavily on destructors existing but not so much on values being safely leakable.

Appendix D: Uncancellable futures

I previously argued against Carl Lerche’s suggestion to make all async functions uncancellable in favour of defining consistent semantics for .await rather than removing it. However, these kinds of functions are not totally off the table; such a feature can still definitely exist, first of all as a userspace combinator:

pub async fn must_complete<F: Future>(fut: F) -> F::Output {
	MustComplete(fut).await
}

#[pin_project(PinnedDrop)]
struct MustComplete<F: Future>(#[pin] F);

impl<F: Future + ?Drop + async Drop> Future for MustComplete<F> {
	type Output = F::Output;

	fn poll(self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
		self.project().0.poll(cx)
	}
}

#[pinned_drop]
impl<F: Future> async PinnedDrop for MustComplete<F> {
	async fn drop(self: Pin<&mut Self>) {
		self.project().0.await;
	}
}

Usable like so:

must_complete(async {
	some_very_important_work().await;
	that_must_not_be_interrupted().await;
})
.await;

It could also exist as a language feature, which would additionally allow removing .await if that is desired. Either way, the effect is the same: this proposal easily enables writing futures that are guaranteed to not have cancellation points. Personally I do not think this use case is common enough to warrant a language feature, but it is still definitely worth considering.

https://sabrinajewson.org/blog/async-drop
Building this site

Since I’ve spent the past few days working on creating this website, I thought I’d make good use of the effort by documenting my experiences here.

I got the idea to create a website from a desire to have a place to write blog posts. Initially I had plans on just making GitHub gists and sharing them on Reddit or something, but I (thankfully) decided against that since a site allows for much more flexibility.

To avoid having to maintain a web server myself, I’m just using GitHub Pages to host it (but on a custom domain to make the URL shorter). I also decided against using a static site generator since it’s a lot more fun to build it myself.

The build system

I’m not going to write the HTML for this manually, so I needed to decide on a build system to use. I briefly considered existing options like Make, Gulp or cargo-make but eventually decided to write my own thing in Rust.

The requirements I had for it were this:

  • Does the minimum amount of work possible between rebuilds.
  • Has a “watch mode” that can be enabled with zero extra configuration to watch the directory for changes and automatically rebuild when they happen.

After some thinking and a few failed attempts, I had devised quite a neat solution: the Asset trait. The core API is this:

trait Asset {
    type Output;
    fn modified(&self) -> Modified;
    fn generate(&self) -> Self::Output;
}

enum Modified {
    Never,
    At(SystemTime),
}

Each Asset represents a resource in the generation process: a text file being read, a JSON file being parsed, an HTML file being generated, an image being transformed, et cetera. It has two main capabilities: calling generate to do the (potentially expensive) work to actually produce the value, and calling modified to cheaply compute the time at which that value was last modified.

The Modified enum is mostly just a SystemTime but also has a special variant Never to represent a time before all SystemTimes, which is used for when getting the modification time fails (e.g. a deleted/non-existent file) or when the asset’s value is a constant.
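
One way to get the "Never comes before every SystemTime" behaviour is to lean on derived ordering, where earlier variants compare as smaller. The following is a minimal sketch under that assumption: the post does not show which traits Modified implements, and the latest helper is hypothetical.

```rust
use std::time::SystemTime;

// Assumption: these derives are not shown in the post. Derived `Ord` compares
// variants in declaration order first, so `Never` sorts before every `At(_)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Modified {
    Never,
    At(SystemTime),
}

// Hypothetical helper: the most recent of several modification times,
// or `Never` if there are none at all.
fn latest(times: impl IntoIterator<Item = Modified>) -> Modified {
    times.into_iter().max().unwrap_or(Modified::Never)
}
```

With an ordering like this, comparisons such as "is the input at least as new as the output?" can be written with plain >= on Modified values.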

The three most basic implementors of this trait are Constant, Dynamic and FsPath, representing a constant value, a dynamic but immutable value (typically command-line arguments) and a value sourced from a filesystem path’s modification time respectively. Their implementations are pretty much as you’d expect:

struct Constant<T>(T);
impl<T: Clone> Asset for Constant<T> {
    type Output = T;
    fn modified(&self) -> Modified { Modified::Never }
    fn generate(&self) -> Self::Output { self.0.clone() }
}

struct Dynamic<T> {
    created: SystemTime,
    value: T,
}
impl<T> Dynamic<T> {
    fn new(value: T) -> Self {
        let created = SystemTime::now();
        Self { created, value }
    }
}
impl<T: Clone> Asset for Dynamic<T> {
    type Output = T;
    fn modified(&self) -> Modified { Modified::At(self.created) }
    fn generate(&self) -> Self::Output { self.value.clone() }
}

struct FsPath<P>(P);
impl<P: AsRef<Path>> Asset for FsPath<P> {
    type Output = ();
    fn modified(&self) -> Modified {
        fs::symlink_metadata(&self.0)
            .and_then(|metadata| metadata.modified())
            .map_or(Modified::Never, Modified::At)
    }
    fn generate(&self) -> Self::Output {}
}

FsPath is intentionally agnostic over how the actual path is read, allowing you to use many different functions depending on the actual nature of the path (whether it’s a binary file, text file, JSON file, directory, et cetera).
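
As an illustration of that agnosticism, here is a sketch of how a text file asset might be layered on top of FsPath. The map method and Map struct are an assumed shape for the .map combinator that appears later in the post (its definition is never shown), and text_file is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Modified {
    Never,
    At(SystemTime),
}

trait Asset {
    type Output;
    fn modified(&self) -> Modified;
    fn generate(&self) -> Self::Output;

    // Assumed shape of `.map`: transform the output, keep the modification time.
    fn map<F, O>(self, f: F) -> Map<Self, F>
    where
        Self: Sized,
        F: Fn(Self::Output) -> O,
    {
        Map { asset: self, f }
    }
}

struct Map<A, F> {
    asset: A,
    f: F,
}

impl<A: Asset, F: Fn(A::Output) -> O, O> Asset for Map<A, F> {
    type Output = O;
    fn modified(&self) -> Modified {
        self.asset.modified()
    }
    fn generate(&self) -> O {
        (self.f)(self.asset.generate())
    }
}

// Same as the `FsPath` shown above: it only tracks the modification time.
struct FsPath<P>(P);

impl<P: AsRef<Path>> Asset for FsPath<P> {
    type Output = ();
    fn modified(&self) -> Modified {
        fs::symlink_metadata(&self.0)
            .and_then(|metadata| metadata.modified())
            .map_or(Modified::Never, Modified::At)
    }
    fn generate(&self) -> Self::Output {}
}

// Hypothetical helper: the *reading* is chosen by the caller, here as text.
fn text_file(path: PathBuf) -> impl Asset<Output = io::Result<String>> {
    FsPath(path.clone()).map(move |()| fs::read_to_string(&path))
}
```

Swapping fs::read_to_string for fs::read or a JSON parser would give a binary or JSON asset without touching FsPath itself.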

With these base types there are then many combinators you can apply. One basic one is all, which combines multiple Assets into one for when a resulting asset is generated from more than one input file (such as this HTML file, which is generated from the source markdown and a template). It works on all kinds of containers of multiple assets including tuples and vectors. Example usage looks like:

asset::all((foo_asset, bar_asset))
	.map(|(foo_value, bar_value)| /* use both `foo_value` and `bar_value` */)

Its modified implementation takes the latest modification time of all the inner assets, and its generate implementation just forwards to the generation code of each one then packages them all up together in a tuple. However, you might notice a problem here: with the code above, if bar is changed but foo isn’t then both foo and bar are regenerated even if only bar actually needs to be.
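
To make that problem concrete, here is a minimal sketch of the pair case. All is a hypothetical stand-in for asset::all restricted to two assets, Counted is a test-only asset invented for this example, and the derives on Modified are assumed.

```rust
use std::cell::Cell;
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Modified {
    Never,
    At(SystemTime),
}

trait Asset {
    type Output;
    fn modified(&self) -> Modified;
    fn generate(&self) -> Self::Output;
}

// Hypothetical stand-in for `asset::all` on a pair; the real combinator is
// described as also handling longer tuples and vectors.
struct All<A, B>(A, B);

impl<A: Asset, B: Asset> Asset for All<A, B> {
    type Output = (A::Output, B::Output);
    // The pair is as new as its most recently modified member.
    fn modified(&self) -> Modified {
        self.0.modified().max(self.1.modified())
    }
    // Generating the pair regenerates both members, every time.
    fn generate(&self) -> Self::Output {
        (self.0.generate(), self.1.generate())
    }
}

// Test-only asset (hypothetical) that counts how often it is generated.
struct Counted<'a> {
    modified: Modified,
    calls: &'a Cell<u32>,
}

impl Asset for Counted<'_> {
    type Output = u32;
    fn modified(&self) -> Modified {
        self.modified
    }
    fn generate(&self) -> u32 {
        self.calls.set(self.calls.get() + 1);
        self.calls.get()
    }
}
```

Calling generate twice on All(foo, bar) bumps both counters twice, regardless of which of the two inputs actually changed in between.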

This is where another combinator comes in: Cache. It provides an in-memory cache of the output’s value (as long as it is Clone), allowing cases like the above to simply use the cached value of foo instead of regenerating it from scratch.

struct Cache<A: Asset> {
    asset: A,
    cached: Cell<Option<(Modified, A::Output)>>,
}
impl<A: Asset> Asset for Cache<A>
where
    A::Output: Clone,
{
    type Output = A::Output;
    fn modified(&self) -> Modified {
        self.asset.modified()
    }
    fn generate(&self) -> Self::Output {
        let inner_modified = self.asset.modified();
        let (last_modified, output) = self
            .cached
            .take()
            .filter(|&(last_modified, _)| last_modified >= inner_modified)
            .unwrap_or_else(|| (inner_modified, self.asset.generate()));
        self.cached.set(Some((last_modified, output.clone())));
        output
    }
}

In the code snippet above, the generate function of Cache first tries to reuse the cached value, and only regenerates the asset if the inner asset has been modified since that value was cached.

Another place where Cache is useful is when an asset is shared between multiple output assets (like how my “blog post template” asset is shared with every blog post), and Cache can be applied to avoid regenerating the shared asset every time.
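
The effect can be checked with a small experiment. The sketch below repeats the Cache logic from above (with assumed derives on Modified) and wraps a hypothetical Template asset that counts its generations and lets its modification time be bumped by hand.

```rust
use std::cell::Cell;
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Modified {
    Never,
    At(SystemTime),
}

trait Asset {
    type Output;
    fn modified(&self) -> Modified;
    fn generate(&self) -> Self::Output;
}

// Same logic as the `Cache` shown above.
struct Cache<A: Asset> {
    asset: A,
    cached: Cell<Option<(Modified, A::Output)>>,
}

impl<A: Asset> Asset for Cache<A>
where
    A::Output: Clone,
{
    type Output = A::Output;
    fn modified(&self) -> Modified {
        self.asset.modified()
    }
    fn generate(&self) -> Self::Output {
        let inner_modified = self.asset.modified();
        let (last_modified, output) = self
            .cached
            .take()
            // Keep the cached value only if it is at least as new as the asset.
            .filter(|(last_modified, _)| *last_modified >= inner_modified)
            .unwrap_or_else(|| (inner_modified, self.asset.generate()));
        self.cached.set(Some((last_modified, output.clone())));
        output
    }
}

// Hypothetical shared asset: counts generations, and its modification time
// can be changed to simulate an edit.
struct Template<'a> {
    modified: Cell<Modified>,
    generations: &'a Cell<u32>,
}

impl Asset for Template<'_> {
    type Output = String;
    fn modified(&self) -> Modified {
        self.modified.get()
    }
    fn generate(&self) -> String {
        self.generations.set(self.generations.get() + 1);
        format!("template v{}", self.generations.get())
    }
}
```

Generating the cache repeatedly runs Template::generate only once; it runs again only after the template's modification time moves forward.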

The last combinator I will talk about here is called ModifiesPath, and it is perhaps the most important one. You can apply it to an asset that, as a side effect, makes changes to a path on the filesystem, and it allows that asset to avoid rerunning itself when the asset is older than the path it modifies.

struct ModifiesPath<A, P> {
    asset: A,
    path: P,
}
impl<A: Asset<Output = ()>, P: AsRef<Path>> Asset for ModifiesPath<A, P> {
    type Output = ();
    fn modified(&self) -> Modified {
        fs::symlink_metadata(&self.path)
            .and_then(|metadata| metadata.modified())
            .map_or(Modified::Never, Modified::At)
    }
    fn generate(&self) -> Self::Output {
        let output_modified = self.modified();
        if self.asset.modified() >= output_modified
            || *EXE_MODIFIED >= output_modified
        {
            self.asset.generate();
        }
    }
}

static EXE_MODIFIED: Lazy<Modified> = Lazy::new(|| {
    let time = env::current_exe()
        .and_then(fs::symlink_metadata)
        .and_then(|metadata| metadata.modified())
        .unwrap_or_else(|_| SystemTime::now());
    Modified::At(time)
});

It is this combinator that allows the Make-like behaviour of comparing ages of input and output files and only rebuilding when necessary.

The other thing ModifiesPath does is take into account the age of the executable it is running in, forcing a rebuild if the executable itself has been changed since the output was last generated. This is very useful during development to avoid situations where you need to manually remove the destination directory to force assets to be rebuilt.
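
The decision rule can be pulled out into a pure function to see how the cases fall out. This is a sketch only: needs_rebuild is a hypothetical name mirroring the condition in ModifiesPath::generate, and the derives on Modified are assumed.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Modified {
    Never,
    At(SystemTime),
}

// Hypothetical free function with the same condition as `ModifiesPath::generate`:
// rebuild when the input or the executable is at least as new as the output.
fn needs_rebuild(input: Modified, output: Modified, exe: Modified) -> bool {
    input >= output || exe >= output
}
```

Because a missing output file reports Modified::Never, and everything compares as at least as new as Never, a fresh checkout with no output files rebuilds everything without any special casing.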

The combination of all these features forms a very powerful build system implemented in simple Rust code. For example, suppose I wanted to make a build script that copies over source.txt to destination.txt. That would look like this:

fn main() {
	let asset = source_to_dest();
	asset.generate();
}

fn source_to_dest() -> impl Asset<Output = ()> {
	asset::FsPath::new("source.txt")
		.map(|()| {
			let res = fs::copy("source.txt", "destination.txt");
			if let Err(e) = res {
				log::error!("error copying files: {e}");
			}
		})
		.modifies_path("destination.txt")
}

And just like that, we have automatic tracking of dependencies done for free. Now suppose I wanted to add a “watch” mode that waits for changes to source.txt to happen and copies it over again. Absolutely no changes to the source_to_dest function are needed, all we have to do is layer some code using notify on top of that:

fn main() {
	let asset = source_to_dest();
	asset.generate();

	let (events_sender, events) = crossbeam_channel::bounded(16);

	let mut watcher = notify::recommended_watcher(events_sender).unwrap();
	watcher.watch(".".as_ref(), notify::RecursiveMode::Recursive).unwrap();

	loop {
		let _ = events.recv().unwrap().unwrap();
		asset.generate();
	}
}

And there we are, everything is handled automatically from that point onward. Due to the in-memory caching and on-disk comparison that assets usually perform it ends up being pretty efficient, doing close to the minimum amount of work necessary between rebuilds. It could theoretically be improved if the contents of the notify::Events were actually paid attention to instead of having to repeatedly call fs::symlink_metadata a bunch, but I haven’t had a need to implement that just yet.

So there it is, a powerful and flexible build system implemented and configured from just Rust code. I haven’t bothered to release it as a crate at all - if someone asks me to I might but I don’t know if it would be useful to anyone else, or if something like this already exists in the ecosystem. But I’m sharing it because I think it’s quite a neat solution to this particular problem.

The Markdown renderer

The heart of this ad-hoc site generator is really the Markdown renderer. It’s what converts the Markdown files that I write the posts in into the HTML being rendered right now by your web browser. So it’s fitting for us to start there.

A markdown renderer consists of two main parts: the first stage that parses the source strings into a more code-friendly format, and the second stage that generates the HTML from the abstract Rust representation produced by the parser.

I don’t enjoy writing parsers, so I decided to shell out to an external crate for that. I chose pulldown_cmark because it is widely used, has a flexible API and supports a bunch of features that I really like (CommonMark + tables + smart quotes + heading IDs).

While pulldown_cmark does come with its own HTML generator and I could’ve just used that and called it a day, there are a bunch of features and additions I would like to implement that would be far easier if I could control generation myself rather than trying to modify the HTML AST after-the-fact.

So, taking inspiration from pulldown_cmark’s HTML renderer, I resolved to write my own. It works by walking once through all the events emitted by pulldown_cmark’s Parser struct and keeping track of state along the way in a gigantic Renderer type. Once the tree walk is finished, it runs a bit of finalization before dumping its relevant fields in the resulting Markdown struct:

struct Markdown {
    title: String,
    body: String,
    summary: String,
    outline: String,
}

Looking at these fields, you can probably tell why I didn’t just use the default HTML generator - there’s a lot of custom functionality in there not provided by plain pulldown_cmark. title contains the title of the page, body contains the body HTML (but excluding the title), summary contains the un-HTML-ified first paragraph of the content (this is used to put in each page’s <meta name="description"> tags) and outline is the automatically generated table of contents you can see at the top of this page.

Another reason I wanted to write my own HTML renderer is to enable syntax highlighting - by default pulldown_cmark puts all code into plain <pre> and <code> elements, but I wanted to transform it with build-time syntax highlighting instead to enable the pretty colours you can see in the code I write.

I chose the syntect crate to do the highlighting, since it’s widely used and has the features I need. It turned out to be pretty simple to add this functionality; I just embed the syntax definitions in the source code and load them in a lazy static, then use a ClassedHTMLGenerator to produce the actual HTML. The themes are handled separately: an Asset loads them at runtime, converts them to CSS and concatenates the result with the CSS file for blog posts.

And that’s pretty much all there is to it: a single pure function that goes from markdown source to rendered HTML, to be later inserted into whichever document needs it. Actually, speaking of inserting it into documents, how does that work?

Templating

Unfortunately it’s not enough to just take rendered HTML, stick it in a .html file and call it a day. I need to add an HTML skeleton around it to add the document title, favicon, metadata and sitewide navigation links you can see on this page.

Initially, I had written my own custom templater for this. It was barely even a templater really, being so ridiculously minimal: just ~100 lines of code that replaced \{variable_name} with its contents. But as the project continued to grow I realized that I needed a better solution than that, so I decided to switch to a full-fledged templating library.
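
For a sense of scale, a templater of that kind can be sketched in a couple of dozen lines. This is not the site's original code: the render function is hypothetical, the placeholder syntax is assumed to be {variable_name}, and leaving unknown names untouched is a choice made for this example.

```rust
use std::collections::HashMap;

// Hypothetical minimal templater: replaces each `{name}` with `vars["name"]`.
// Unknown names and unclosed braces are copied through unchanged.
fn render(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy everything before the opening brace verbatim.
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        match after_open.find('}') {
            Some(close) => {
                let name = &after_open[..close];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after_open[close + 1..];
            }
            None => {
                // No closing brace: emit the remainder as-is and stop.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Anything beyond plain substitution (loops over blog posts, shared fragments, conditionals) is where a sketch like this runs out, which matches the reason given above for moving to a full templating library.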

I chose handlebars for this, not for any particular reason, but I wanted to try it out since I’ve only used Tera before. I used my Asset system to create an asset that loads all the common “fragment” templates from an include/ directory as well as individual Assets for each template per page, then I combined them together and rendered it all to produce the final pages.

The template system turned out to be pretty powerful and definitely worth the extra dependency. I’m able to generate pretty much everything automatically, like my list of blog posts whose content is sourced from the Markdown files only. Additionally, all the HTML boilerplate used repeatedly in every page can be abstracted to a common file which turned out to be very useful for code reuse.

Minification

To reduce page load times, I decided to minify all my HTML and CSS before writing each asset to its final file. I know that there exist minifiers for this in native Rust, but realistically all the state-of-the-art ones are in JavaScript. I ended up choosing html-minifier-terser and clean-css for this, which both seem to be well-maintained and have small output sizes.

Initially I planned on achieving maximum efficiency by using both projects as a library and starting up a single long-running Node process that I communicate with via IPC, to avoid the inefficiencies of starting up a whole new Node instance each time I wanted to minify something. But that plan ended up falling apart rather quickly, since I totally lack experience with Node and just couldn’t figure out how to get it to work. Maybe it’s just me but Node’s readable stream interface seems a million times more complicated and hard to use than Rust’s AsyncRead - it has four (!) separate ways of using the API of which none are as simple as just read_exact. And it doesn’t help that I despise writing JavaScript altogether - TypeScript makes it somewhat better but in comparison to Rust it’s just painful.

So with that plan scrapped, it was just a matter of calling into their CLIs each time (which luckily both libraries have). To avoid global dependencies I created a local npm package that uses both packages as a dependency. Then I could simply have a std::process::Command run npx html-minifier-terser or npx cleancss in that package’s directory and pipe through my files to have them minified.
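
Piping a file through a CLI with std::process::Command might look roughly like the sketch below. The pipe_through helper is hypothetical and not taken from the site's code; it only assumes a program that reads stdin and writes its result to stdout.

```rust
use std::io::{self, Write};
use std::path::Path;
use std::process::{Command, Stdio};
use std::thread;

// Hypothetical helper: run `program args...` inside `dir`, feed it `input`
// on stdin and return whatever it wrote to stdout.
fn pipe_through(program: &str, args: &[&str], dir: &Path, input: Vec<u8>) -> io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .args(args)
        .current_dir(dir)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write on a separate thread so a large input cannot deadlock against the
    // child filling its stdout pipe. Dropping `stdin` when the closure ends
    // closes the pipe, which signals end-of-input to the child.
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let writer = thread::spawn(move || stdin.write_all(&input));

    let output = child.wait_with_output()?;
    writer.join().expect("writer thread panicked")?;

    if !output.status.success() {
        let message = format!("{program} exited with {}", output.status);
        return Err(io::Error::new(io::ErrorKind::Other, message));
    }
    Ok(output.stdout)
}
```

With the setup described above, program would be npx, the first argument html-minifier-terser or cleancss, and dir the local npm package's directory; any minifier options would be appended to args.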

The one issue I encountered is that unlike Cargo, npm doesn’t automatically install required dependencies before trying to run code. This means that in order to successfully build my website from a freshly cloned repository, you would’ve had to manually cd to the package directory and run npm install beforehand - obviously not ideal.

My first solution to this was just to run that command first thing whenever the building binary starts up. But since npm install is slow, it ends up slowing down the whole building process quite a significant amount since I have to wait for it each time. What I really needed was a way to only run the command when it hasn’t been run yet, or when the package.json changes. Luckily, my whole Asset system is just perfect for that - I could simply define an asset that runs npm install with package.json specified as its input file and package-lock.json as the output one (since npm install always updates its modification date). It ended up just being a couple lines of code:

fn asset() -> impl Asset<Output = ()> {
    asset::FsPath::new("./builder/js/package.json")
        .map(|()| log_errors(npm_install()))
        .modifies_path("./builder/js/package-lock.json")
}

And now I have the best of both worlds: fast building as well as automatic package setup.

Adding a dark theme

One specific goal I had for this site was to allow it to work in both light and dark modes, depending on the user’s chosen prefers-color-scheme setting. I was mildly dreading having to write out two large stylesheets with a different colour palette for each mode, but as it turns out modern browsers have a built-in way to change the default color scheme based on the user’s current prefers-color-scheme value. All I had to do was add one <meta> to my <head>:

<meta name="color-scheme" content="dark light">

And everything magically worked first try - if prefers-color-scheme was dark, the page would show a black background with consistently white text and if it was light it would show a white background with consistently black text. You can try it out now - if you open developer tools and press ctrl+shift+p, you should be able to enable the “emulate CSS prefers-color-scheme: dark/light” option and see how the website changes. And all that’s done entirely by the browser’s default styles. Who knew it was so easy?

The only time I did have to mess with prefers-color-scheme media queries was for the code blocks. That was easy though, I just wrote out the dark theme CSS then wrapped the light version in @media (prefers-color-scheme: light) {.

Adding the favicon

The favicon of this site is automatically generated by the build script from a single .png file in source control. I use the image crate to read in this source image, then resize it to generate two files:

  • favicon.ico, which contains 16x16, 32x32 and 64x64 versions of the icon all packed into a single .ico file.
  • apple-touch-icon.png, which has been resized to 180x180 as is suitable for an apple touch icon.

The paths of these files are then passed in to the templates, which include them in <link> tags in the head:

<link rel="icon" href="/{{icons.favicon}}">
<link rel="apple-touch-icon" href="/{{icons.apple_touch_icon}}">

I’m especially proud of this part of the code because the entire thing is implemented in <100 lines of logic and is far, far more convenient than manually using a site like RealFaviconGenerator to generate each of the files. The only downside of it is that image is ridiculously slow in debug mode, so I end up running the build process in --release all the time 😄.

A live-reloading dev server

For a long time I was previewing the website by just opening the file in the browser as a file:// URL. But this had several disadvantages:

  1. Paths like /favicon.ico would be resolved relative to the filesystem root, rather than the website root.
  2. index.html wasn’t automatically added to the end of paths if they pointed to directories and that file existed. I’d instead see a screen showing a file listing of the directory and have to click index.html manually each time.
  3. .html wasn’t automatically added to the end of paths like /blog/foo, making my links broken.
  4. 404 links did not show my custom 404.html page.
  5. I didn’t get live reloading.

At some point I switched to python -m http.server and that solved issues (1) and (2) but not the others. So eventually I’d had enough and decided to write my own server with all these features, in Rust.

Since the server doesn’t need to be particularly complex, I decided to just use plain hyper - no higher-level framework or anything. And it doesn’t need performance, so I’m only using Tokio’s current thread runtime instead of the heavier multi-threaded scheduler.

The server’s main job is to take a request path and map it to a path on the filesystem, which it does just by splitting on / and reconstructing a PathBuf. I also have some extra logic to solve problems (2) and (3) - adding index.html and .html to paths as a fallback if the requested path doesn’t exist. I guess the MIME type to serve based on file extension, which works fine for me, and also set Cache-Control: no-cache to avoid having the browser cache the pages.
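
The path mapping and fallbacks described above could be sketched as follows. Both functions are hypothetical, the rejection of . and .. segments is an added safety assumption rather than something the post describes, and percent-decoding of the request path is left out.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: turn a request path such as `/blog/foo` into the files
// to try, in order. Returns `None` for paths that could escape `root`.
fn candidates(root: &Path, request_path: &str) -> Option<Vec<PathBuf>> {
    let mut path = root.to_path_buf();
    let mut has_segments = false;
    for segment in request_path.split('/').filter(|s| !s.is_empty()) {
        // Refuse `.`, `..` and backslashes instead of trying to resolve them.
        if segment == "." || segment == ".." || segment.contains('\\') {
            return None;
        }
        path.push(segment);
        has_segments = true;
    }

    // 1. the path as requested, 2. with `.html` appended, 3. its `index.html`.
    let mut tries = vec![path.clone()];
    if has_segments {
        let mut with_html = path.clone().into_os_string();
        with_html.push(".html");
        tries.push(PathBuf::from(with_html));
    }
    tries.push(path.join("index.html"));
    Some(tries)
}

// Guess a MIME type from the file extension; the table here is illustrative.
fn mime_for(path: &Path) -> &'static str {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("html") => "text/html; charset=utf-8",
        Some("css") => "text/css",
        Some("ico") => "image/x-icon",
        Some("png") => "image/png",
        _ => "application/octet-stream",
    }
}
```

A server using this would serve the first candidate that is a regular file, and fall back to the custom 404.html page when none of them is.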

To achieve live reloading, two things need to be coordinated. First, the server has to expose an endpoint that allows the browser to wait for a change to happen to any of the files it’s viewing - I do this via a /watch endpoint that accepts a list of paths to watch in its query parameters (decoded with form_urlencoded) and gives back an SSE stream that sends an empty event once something happens. Internally this is implemented with a Tokio broadcast channel of notify::Events, and a spawned task that subscribes to the channel and checks whether any of the events apply to it, sending an SSE event if so. Secondly, the client needs to produce a list of all the files it depends on and then send that in the SSE request to the server, reloading once it receives any data over that connection.

I do all that by passing in a boolean property live_reload to the templates, and only enable it when the server is running (this is easy since the server and build process share the same binary). The page will build up a set of dependencies in a URLSearchParams object then send off the request like so:

{{#if live_reload}}
<script>
	const source = new EventSource(`/watch?${params}`);
	source.addEventListener("message", () => location.reload());
</script>
{{/if}}

And just like that, we have live reloading. Whenever I edit one of the source files like the one I’m writing, a whole chain of automated events is set off, culminating in the reload of the page I’m viewing in-browser:

  1. The notify watcher sees the event and regenerates the main asset.
  2. The main asset generates the “blog posts” asset.
  3. The “blog posts” asset generates the asset for this blog post.
  4. This asset compares the dates of its input and output files, and upon seeing that the input file is newer than the output file decides to regenerate itself.
  5. The updated blog post HTML is written out to the dist/ directory.
  6. The notify watcher sees the event and passes it over to the server’s broadcast channel.
  7. The task spawned to manage the connection to the site receives the event from the channel, and upon checking what paths it affects decides that the web page should reload.
  8. The task sends an SSE event to the website which it then receives.
  9. The website reloads, sending a new request to the server and receiving the updated blog post HTML.

Conclusion

Overall, I am extremely pleased with how this whole project has turned out. I now have my own personal website, designed exactly the way I like it and able to support exactly the workflow that I like, with almost everything completely automated with the power of code.

Do I recommend it if you want to start your own website? Not really, unless you’d do this sort of programming project anyway. All in all it took about a week to set up, and I was working on it for several hours each day. I imagine that using an existing static site generator is a thousand times easier and faster and produces just as good output. But it was an extremely fun project for me to do, so I can definitely recommend it in that sense.

If you want to check out the actual code it’s on GitHub and contains all the things I talked about here, as well as some more mundane stuff I left out of the article for brevity. Its file structure is organized into three main folders:

  • builder, which contains the Rust source code of the crate that builds the site and runs the server.
  • template, which contains Handlebars templates, CSS, code themes and various other dynamic configuration parts related to the site.
  • src, which contains the source Markdown of the posts I’ve written as well as the favicon of the website.

Anyway, I really hope you enjoyed reading this post and maybe learnt something you found interesting. See you next time!

https://sabrinajewson.org/blog/building-this-site