Aaron Turon — GeistHaus

Sep 13, 2020 Updated Sep 13, 2020

Grief has been in my life for two years. It took me by surprise, and it changed everything.

There are some Big Words that we reserve for Big Circumstances. “Grief”, we see as a sacred process surrounding a Big Loss. “Trauma”, we see as a wound from a Big Event, like a war or imminent physical threat. In therapy, I’m learning about the urge to draw lines, making binaries saying “that’s grief, that’s trauma, that’s not”. I’m learning that while there is a place for binaries, they often lead us astray, keeping us pinned to an unhelpful story, or preventing us from accepting a helpful one.

It was easier for me to admit to grief than to trauma, because the evidence was unmistakable, even if the causes were less familiar.

My first bout of grief, like most that followed, came on the heels of an intensely positive moment. I felt safe, loved, held. I was a little high. And suddenly, without warning, a shocking sentence formed on my lips, a total non-sequitur: “there’s a hole where my dad should be”. What followed was one of the most animalistic moments I have ever experienced. I sobbed, screamed, moved around on all fours, and felt as if I were vomiting. It was completely instinctual. Throughout, a parade of memories marched in my mind, memories of childhood and beyond, supplemented by new feelings and new understanding.

For the first time, I let myself see and feel my experience of emotional neglect. I saw my years of longing, my failed attempts at connection. I felt what I had unconsciously held at bay throughout my life: the pain of not being understood, loved, and attended to in the ways I needed at the time. After three decades, I let go of the story of my father’s perfection, the impossibility of him hurting me. I let myself see and feel my wounds, and I grieved.

I think we fear grief because it seems immensely painful. But these days I welcome grief, because it is me admitting to the pain that is already there, and allowing that pain to move through me and be released. It is a letting go of resistance to reality, feeling the feelings that are appropriate to a new admission. Grief is reconciliation to a world that differs from the story we have clung to. The bigger the story, the bigger the grief.

When someone important to us dies, we grieve the story we’ve clung to that tells us they will always be in our life, the story of our plans and hopes and dreams with them. We’ve organized ourselves around that story, made it load bearing in the meaning of our life. That is as it should be. And in proportion, when we grieve we adjust our assumptions, and feel the pain of doing so.

If grief is relinquishing load-bearing assumptions, trauma is experiencing something beyond our ability to cope – an overwhelming experience that cannot be metabolized head-on. We have to shut it down somehow, which often means telling ourselves a story that denies part of what happened, because the associated feelings are just too much.

As a child, abuse and neglect are overwhelming, especially at the hands of the adults we’re programmed to see as our perfect caretakers. We invent a story that maintains their perfection, either denying what we experienced, or finding a way to take the blame for it – anything to keep the horror of the truth at bay. But the mind is not entirely fooled, and the exiled horror and pain remain, locked away behind a story that requires increasing distortion to cling to, lest some of the pain leak out.

After that first session of grief around my childhood, something even more surprising happened. I found that my typical social anxiety just… disappeared. I stopped shrinking from strangers; I didn’t feel I was an imposition. I ceased thinking that I needed to know all the rules and expectations; I was simply present, in myself, and confident that that was enough. While that magical period lasted for only a couple of weeks at first, it provided a glimpse into what healing could mean for me. It showed me that so much of what I had taken for granted about myself was, in fact, changeable. And in the two years since then, this kind of change has begun to take deeper root – always punctuated by the release of grief.

I have C-PTSD. The “C” stands for “complex”, which means that the traumas I experienced were not isolated events (like witnessing horrible violence), but rather ongoing situations. The coping strategies are likewise complex. I wove stories hewing to key binaries, like the “perfection” of my father. I began to assume there was something innately undesirable about me, that I needed to carefully attend to the expectations of everyone around me and make sure I followed their rules, lest I be seen as a burden. Social anxiety was an inevitable result – especially around strangers, where I would do the work of discovering rules and expectations in real time. There was no room for my actual self. I was too busy protecting myself from further trauma.

PTSD has to do with exactly this: the felt, ongoing threat that past unprocessed trauma could resurface. With “simple” PTSD, trauma recurs via explicit flashbacks, where something in the present sends you completely back to a specific event in the past. With C-PTSD, the flashbacks are more subtle; they are termed “emotional flashbacks”, meaning that you begin feeling the way you did in the past, even though you have a cognitive grasp of the present. Either way, you re-experience what you’ve worked so hard to contain, which often leads to a redoubling of your coping strategies.

With C-PTSD, it’s possible to live in a state of perpetual flashback, where every present experience is colored by unresolved past trauma and the ways you tried to deal with it. You can become a servant of your coping strategies. Gradually, the color and meaning and feeling of your life fade. Everything becomes a disconnected game of self-protection against the trauma you’ve never fully faced.

Grief is a way through. Grieving is a process of overcoming denial and other old coping strategies, and making your way toward acceptance of what really happened. It metabolizes those experiences that were originally too overwhelming to handle. When you begin to see and feel the fuller truth of what happened, you are released from your strategies and the beliefs they encompassed. Radical change is possible. Anxiety can melt away. All because you can see the past for what it is, and the present for what it is, too.

I’ve now grieved aspects of every major relationship of my life. When I slowly realize that I’ve been dwelling in flashback, I know that eventually grief will break through. Each time around the cycle, I feel freer for longer, letting go of another layer of rigid belief and behavior meant to keep the trauma away. I’ve been able to feel and process a bit more trauma, integrating it into a more whole life story. And I find myself fully in my story in the present, empowered to take it in new directions, at last.

If you want to learn more about trauma, C-PTSD, and grief, I recommend Pete Walker’s “Complex PTSD: From Surviving to Thriving”, a book that helped me understand my experiences and empowered me with tools to navigate them.

http://aturon.github.io/personal/2020/09/13/grief-trauma-healing

Back in the saddle

Jun 25, 2019 Updated Jun 25, 2019

Hi y’all! After some time away on mental health leave (which you can read more about here if you’re curious), I’m fully back to work at the Mozilla Rust team. I won’t be returning to Lang Team or Async work, but will instead be working full-time on compiler engineering. The first goal is to work with Zoxc and others to help ship parallel rustc. After that, there’s tons to do on the trait system. I’m very excited about focusing on some actual Rust programming.

I wanted to take this opportunity to thank Mozilla, which has been very accommodating and understanding with my need to take some time, and especially to thank my dear friend Niko, who quietly and competently took on the workload around this transition.

http://aturon.github.io/tech/2019/06/25/back-in-the-saddle

Survivor skills

Jun 24, 2019 Updated Jun 24, 2019

I usually start with some pivotal moment and work outward from there. Leaving my religion. Getting on antidepressants. Starting therapy. My partner coming out as trans. Starting therapy again. It doesn’t really matter; the story of my mental illness and the story of my life are the same story.

So let’s start in Indiana, on the worst day of my life—the day before my first of a dozen interviews for faculty positions at all the top universities. I had no intention of taking an offer from Purdue; it was a practice run. So why was I so panicked that not even sleeping pills could help me rest? I was hot shit on the market that year. This was supposed to be a slam dunk, but I was imploding.

My clearest memory from that day was me, standing out in the snow, waiting for my partner to pick up the phone. I vividly recalled time spent watching my infant daughter: there but not really there, work and worries forever stealing my attention. I knew what it would look like if I stayed on that path. I said the words “I don’t want this” into the phone, and cancelled the rest of my interviews.

Exactly five years later, I did it again, stepping down from a leadership role, for the exact same reason. I had tried a lot of things, but something was still wrong.

As a branch of medicine, psychiatry starts with symptoms, diagnostics, and disorders. I had some mix of Generalized and Social Anxiety Disorders, with occasional depressive episodes. My primary emotion was dread: a floating sense that something, somewhere, was wrong and it was all going to fall apart. The “generalized” part meant that the dread would latch on to whatever I could rationalize it with. If I don’t make this deadline, my career will fail. Any day now they’ll see I’m a fraud and I’ll be fired. At some level I knew these worries were overblown, but I couldn’t seem to put them aside. No matter where I was, there was always a tension, a nagging feeling that it would be dangerous to fully relax into the moment.

For those five years I tackled my disorders head-on. If something promised to reduce anxiety, I did it. Yoga; two hour walks; jogging; meditation; dropping caffeine; cutting out sugar; probiotics; sleep tests; a slew of acronyms like SSRIs, CBT, THC, CBD, and more. All of these helped, for a while—and sometimes they helped a lot. I remember Christmas a couple of years ago, when my antidepressant kicked in, and I felt that tension truly lift, for the first time in many years. But somehow it always snuck back in.

I was searching for that one weird trick that would suddenly cure me. But healing is messy and non-linear, and it doesn’t happen overnight.

In the months following my second implosion, I came to think about mental health differently. Rather than focusing on symptoms, I focused on areas of rigidity. What are the things I won’t let myself see or think or feel? What kinds of compulsive behaviors does that lead to? What assumptions do I habitually make, and how do they affect my life? My therapist encouraged me to see psychological flexibility as a measure of mental health.

Here’s the rub: these areas of rigidity are pesky fuckers, invisible at first, so reinforced into habit that you’re barely aware of them.

When I go into someplace like a coffee shop, I often feel like I’m imposing on everyone there. I am hyper-aware of the people around me. Where does the line end? Am I going to the right place? Is someone going to get frustrated that I didn’t move at the right time? Oh shit, the person in front of me didn’t see that it’s their turn, should I do something? OK, it’s my turn, I need to appear friendly but make clear that I want the minimum amount of interaction. Where am I supposed to wait?

These thoughts, feelings, and behaviors are so automatic that I barely register them. But I am nevertheless ruled by them. They steal my attention and energy. They inject anxiety. And they push me toward avoidance and denial, to the point that I could find myself preferring an uncomfortable bladder (“I don’t have to pee that bad”) to the discomfort of asking someone where the bathroom is, without even realizing it. And the list of my rigid habits is long and growing, as I build the skills to discover them. I started to wake up only when I realized that I saw my life as one big to-do list, an unending gauntlet of obligations in which I had less and less agency over time. I was very ill.

My deepest area of rigidity, the habit that most shaped my life? Success. In particular: achieving whatever the authority figures around me seemed to want, even if they didn’t articulate it themselves.

Rigidity is often born from survival. Maybe you survived an environment where your needs weren’t being met; you found your way to the best substitutes you could. Maybe you survived an acute trauma, like a threat to your physical being. Maybe the trauma was repeated, like being hit or yelled at as a child. Whatever the origin, your mind and your body worked overtime to protect you, pushing you into patterns, aversions, and strong feelings that guided your behavior toward survival. That survival often meant avoiding the full feeling of what you were surviving.

In other words, the habits are hard to see, but their origin is even harder: it’s the very thing that your mind is trying to protect you from experiencing.

A book that’s been very important for me, The Drama of the Gifted Child, portrays a common story, one that I share, and how it shows up in therapy:

In the very first interview they will let the listener know that they have had understanding parents, or at least one such, and if they ever lacked understanding, they felt that the fault lay with them and with their inability to express themselves appropriately. They recount their earliest memories without any sympathy for the child they once were, and this is the more striking since these patients not only have a pronounced introspective ability, but are also able to empathize well with other people. . . In general, there is a complete absence of real emotional understanding or serious appreciation of their own childhood vicissitudes, and no conception of their true needs—beyond the need for achievement.

The portrait ends with a haunting line:

The internalization of the original drama has been so complete that the illusion of a good childhood can be maintained.

It took me thirty-three years and a lot of self-compassion and curiosity to uncover the truth of my own childhood, which I always saw as safe, comfortable, supported, and loving. Even now, as I write these words, I feel the pull to make excuses, to second-guess myself, to say “it wasn’t so bad”. The adults in my life were doing the best they could, and I didn’t know how to tell them what I needed. They only hit me when I was little, and it was just spanking, not beating. It never occurred to me that the reason the violence stopped was that I had figured out what I needed to do to survive, that I had learned how to please my parents, to hide the parts of myself that would get me in trouble.

I learned an even deeper lesson. Love for myself as I was—all of my humanity, all of my inner experience—was not available to me. But there was a substitute: attention and praise for what I could achieve, which started very young. I was given an IQ test in first grade, and enrolled in a gifted program that bused kids from all over the county into a special classroom once a week. I was told I was different, that I had special worth that separated me from my peers, because certain intellectual activities came easily to me. This message was repeated over and over throughout my life, all the way to the brink of becoming a professor. And because I was singled out, pulled away from my childhood into more “worthwhile” activities, I missed out on a lot of basic skills that didn’t come so easily to me.

The story almost tells itself from here. Humans have a need to be seen and loved as they are. I was lost in a wilderness that couldn’t meet those needs, not in a healthy way. I sampled the nuts and berries I could find, and some made me sick (getting yelled at or spanked). Through trial and error, I discovered what would keep me alive, even if ultimately malnourished. By the time I left the wilderness of my childhood, I was so habituated to the survival skills that I developed that I didn’t realize it was possible to live any other way.

For me, success was never a matter of ambition; it was a means of survival. It was the only way I knew how to be in the world, and I got really fucking good at it. But achievement is not a well-balanced diet. Like junk food, it just leaves you wanting more. And let me tell you, society is more than happy to present an unending ladder of success.

I’ve been explicitly working on my mental health for more than five years, but I feel like it’s only in these last months that I’ve really started to heal. That healing has come from going beyond mindful awareness of my thoughts and habits to uncovering and making sense of their origins. Of course I’m an anxious wreck! I’ve been throwing all of my energy into what I learned, very young, was the “best” way to meet my needs—into things that could never fully meet my needs. By surveying my life—by facing my childhood for what it was, by mourning for the lost years, by letting myself feel the terrible isolation and abandonment and longing for love I experienced in my childhood, by raging at my parents and their inability to meet or even comprehend my basic need to be seen and loved regardless of what I did—I have begun to loosen the rigidity, to let go of the need for success, to open myself to new experiences, to fail, to discover my self, and to be seen and loved for it.

I started down this road with four words: “I don’t want this”. But I’m finding my way with another four:

I am a survivor.

http://aturon.github.io/personal/2019/06/24/survivor-skills

Version selection in Cargo

Jul 25, 2018 Updated Jul 25, 2018

When there are multiple ways to resolve dependencies, Cargo generally chooses the newest possible version. The goal of this post is to explain why Cargo works this way, and how that rationale relates to several recent discussions, including:

Whether we should support a “minimal version selection” option, and if so how it should relate to CI and the existing ecosystem.
Whether we should state a minimal Rust version in Cargo.toml in a way that affects dependency resolution.
The work on vgo, a new package management tool for Go that explicitly opts for selecting the oldest possible version.

Version selection goals

No one likes spending time futzing with their dependencies instead of writing code on top of them. For Cargo (and, I think, most package managers) that translates to the following design goals:

Reproducibility. After building, it should be easy to perform an identical build again, even on a different machine, so that debugging can proceed from a firm foundation.
Control. Users should have control over when and how dependencies are upgraded, so that surgical fixes can be applied.
Compatibility. It should be easy to find versions of direct dependencies that work together.
Maintainability. The support burden should be minimized and evenly distributed, rather than falling entirely on the upstream or downstream sides.

The “dependency hell” experience comes down to one or more of these goals being unfulfilled. But some of the goals are directly at odds! For example, if we want to give clients fine-grained control over version selection and make it easy to find compatible sets of versions of libraries, we’ll be asking for a higher maintenance burden across the ecosystem. That’s because ensuring compatibility generally requires testing and bug-fixing, and the more combinations of versions that arise, the more testing and fixing that’s needed.

The role of the package manager is thus to provide mechanisms, defaults, and best practices that push the ecosystem toward a good balance across these goals. It’s an imperfect science, involving some social engineering and guesswork; there’s not a clear best way to go about it.

Rationale for maximal version resolution

Most of the time, there are many, many valid ways to resolve a dependency graph. Even in the simplest case of having a single dependency, if you use the typical version constraints, multiple matches will be possible:

winapi = "0.3.0"

This version constraint asks for any version compatible with 0.3.0, which for Cargo means any 0.3.X version (currently, the latest is 0.3.5). How do we decide which of the compatible versions to go with? Most package managers, including Cargo, take the maximum (newest) version possible. The new vgo package manager from Go is a notable exception in taking the minimum version.
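
To make the matching rule concrete, here is a minimal sketch in Rust. It is not Cargo's actual resolver: versions are simplified to `(major, minor, patch)` tuples, pre-release tags are ignored, and the names `Version`, `caret_matches`, and `compatible` are hypothetical helpers invented for illustration.

```rust
/// A simplified semantic version: (major, minor, patch), no pre-release tags.
type Version = (u64, u64, u64);

/// Does `v` satisfy the default (caret) requirement written as `req`?
/// `v` must be at least `req` and must agree with it on the leftmost
/// non-zero component.
fn caret_matches(req: Version, v: Version) -> bool {
    if v < req {
        return false;
    }
    if req.0 > 0 {
        v.0 == req.0
    } else if req.1 > 0 {
        v.0 == 0 && v.1 == req.1
    } else {
        v.0 == 0 && v.1 == 0 && v.2 == req.2
    }
}

/// All published versions compatible with `req`, oldest first.
/// The last element is what maximum selection would pick;
/// the first is what minimum selection would pick.
fn compatible(req: Version, published: &[Version]) -> Vec<Version> {
    let mut matches: Vec<Version> = published
        .iter()
        .copied()
        .filter(|&v| caret_matches(req, v))
        .collect();
    matches.sort();
    matches
}
```

Given an illustrative registry containing 0.2.8, 0.3.0, 0.3.2, 0.3.5, and 0.4.0, calling `compatible((0, 3, 0), &published)` keeps only the three 0.3.X entries. Maximum selection takes the last of them and minimum selection takes the first.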

Let’s examine this question in light of the design goals:

Reproducibility:
- If we select the maximum version, dependency resolution will produce different results as new versions of crates are published. Thus, to achieve reproducible builds, a separate mechanism is needed to record the state of the world at the time of the build: the lockfile.
- If we select the minimum possible version, dependency resolution will give the same result even if new versions are published, so no lockfile is needed to achieve reproducibility.
Control: the version selection strategy doesn’t have direct bearing on the question of control, because users are always free to use more restrictive constraints, like “=0.3.1”.
Compatibility: since we’re talking about different valid resolutions of dependencies, we’re already in a situation where the dependency graph can be resolved. But whether the resulting code actually compiles and works together is another question! All else being equal, what will make compatibility most likely is if the specific combination of versions has been actively tested and debugged.
- If we select the maximum version, then at any given point in time, the current maximum versions of crates will be actively tested against each other (due to CI), and hence likely to work. Put differently, there’s an ecosystem-wide agreement on which versions to test compatibility with each other: the latest versions.
- If we select the minimum version, the version we actually get will depend on what minimum versions happen to appear in any transitive dependencies. In other words, it’s the minimum version that can satisfy our particular dependency graph. Thus, unlike the situation with maximum version selection, there is no ecosystem-wide agreement on which versions will be used together in CI and elsewhere; the versions chosen will vary across projects.
Maintainability:
- If we select the maximum version, downstream users are likelier to get the latest bugfixes; for apps, lockfiles help protect against the opposite problem of trading known bugs for unknown ones. Furthermore, as mentioned above, the ecosystem-wide agreement on the “frontier” of versions to test with each other means that bug reports against old versions (and expectations of backports) are less likely. Active maintenance is focused on the latest releases across the board.
- If we select the minimum version, there is a greater chance of already-fixed bugs biting users. Furthermore, as mentioned above, the version combination a project ends up with is more likely to be unique to that project, and hence to have seen less testing overall. Active maintenance is spread across a larger range of versions.
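
The reproducibility point can be sketched as a toy model. Everything here is an assumption made for illustration: a single hypothetical crate, its 0.3.x releases reduced to bare patch numbers, and invented names (`Strategy`, `select`, `select_with_lock`). It is not how Cargo stores or reads a lockfile.

```rust
/// Strategy for choosing among compatible versions.
#[derive(Clone, Copy)]
enum Strategy {
    Newest,
    Oldest,
}

/// Pick one patch release from a hypothetical crate's 0.3.x series.
/// `published` holds the patch numbers currently on the registry.
fn select(published: &[u32], strategy: Strategy) -> Option<u32> {
    match strategy {
        Strategy::Newest => published.iter().copied().max(),
        Strategy::Oldest => published.iter().copied().min(),
    }
}

/// With a lockfile, a previously recorded choice is reused as long as it
/// is still available; otherwise we fall back to a fresh resolution.
fn select_with_lock(published: &[u32], lock: Option<u32>, strategy: Strategy) -> Option<u32> {
    match lock {
        Some(v) if published.contains(&v) => Some(v),
        _ => select(published, strategy),
    }
}
```

If the registry grows from patches `[0, 1, 2]` to `[0, 1, 2, 3]`, `Strategy::Newest` moves from 2 to 3 while `Strategy::Oldest` stays at 0. Passing the earlier result as `lock` keeps the newest-first build at 2, which is the role the lockfile plays.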

The Cargo team believes that, on balance, maximum version selection provides a better experience across the ecosystem, with the primary cost being one of conceptual and implementation complexity: the lockfile.

It’s important to note, though, that while the maximum version approach tends to focus the ecosystem on the latest versions of crates, there are still plenty of circumstances where other version combinations arise:

For an app with an existing lockfile, the versions will be held steady regardless of new publications. However, at the time the lockfile was produced, the versions selected were the latest ones available, and hence were receiving active testing and maintenance at the time. Similarly, when dependencies are subsequently adjusted, Cargo will “unlock” the affected dependencies and again choose the maximum version.
Bounded constraints like = and <= prevent Cargo from choosing the newest version. These constraints are rare, especially for libraries, but they relate to toolchain version requirements, as we’ll see next.

Toolchain requirements

The Rust community has had recurring discussions about what kinds of constraints libraries should impose on the compiler toolchain they use, and how those constraints should be expressed:

On the one hand, library authors would like to use the newest Rust features.
On the other hand, doing so means their clients must be using an up-to-date toolchain. This can be a hardship in situations where it’s hard to change the toolchain, due to deep integrations or other constraints.

Today, the most widely-used crates in the Rust ecosystem have adopted an extremely conservative stance, effectively retaining compatibility with the oldest version of Rust possible, in some cases with a three-year-old toolchain. For a language as young as Rust, that’s pretty painful.

Proposals for addressing this problem fall into basically two camps:

Shared policy: rather than have core libraries each be “as compatible as possible”, instead set a clear, ecosystem-wide policy on what level of compatibility is expected. This was first proposed in 2016, and has been revived as part of the current long-term support (LTS) proposal. The key idea is that it is not considered a breaking change to update the compiler version required, as long as the new requirement is within the compatibility policy. For LTS-level compatibility, that means that the crate is always free to depend on the latest LTS toolchain.
Stated toolchain. There have been several RFCs proposing to specify toolchain requirements as part of Cargo.toml, and have those requirements affect dependency resolution; the latest such RFC is currently open. In this model, crates could freely bump the minimum compiler version needed, and Cargo would only resolve to a version of the crate that supports the compiler toolchain being used.
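
As a rough sketch of the stated toolchain model, resolution can be pictured as a filter applied before the usual "newest wins" rule. The `Release` struct, its `min_rustc` field, and `newest_supported` are hypothetical names for illustration, not part of any RFC or of Cargo. Ordinary version requirements are left out to keep the example small.

```rust
/// A published release of a hypothetical crate: its version and the oldest
/// compiler it supports, both as (major, minor, patch) tuples.
struct Release {
    version: (u64, u64, u64),
    min_rustc: (u64, u64, u64),
}

/// Newest release whose stated toolchain requirement is met by `toolchain`.
/// Returns `None` when no release supports a compiler that old.
fn newest_supported(
    releases: &[Release],
    toolchain: (u64, u64, u64),
) -> Option<(u64, u64, u64)> {
    releases
        .iter()
        .filter(|r| r.min_rustc <= toolchain)
        .map(|r| r.version)
        .max()
}
```

With releases 1.0.0 and 1.1.0 requiring compiler 1.20.0 and release 1.2.0 requiring 1.27.0 (all made-up numbers), a user on 1.27.0 gets 1.2.0, while a user on 1.24.0 quietly gets 1.1.0.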

Let’s again analyze this situation in light of the design goals we started with (except for reproducibility, which isn’t relevant here):

Control:
- In the shared policy approach, control is very limited. Library authors don’t choose arbitrary toolchain versions, but instead commit to compatibility with a release channel (LTS, stable, nightly).
- In the stated toolchain approach, everyone has a lot of control. Library authors can set their toolchain requirements in any way they like, for any library release they like. Consumers can likewise choose any toolchain to work with, and Cargo will look for a compatible dependency resolution.
Compatibility: Note first that Rust toolchains are regularly tested against the entire crates.io ecosystem, so unlike with version selection above, there’s less concern here of finding a combination that “resolves but fails to compile/work”. The concern is more about finding a resolution at all.
- In the shared policy approach, similar to the “maximum version selection” we saw before, there’s an ecosystem-wide agreement about what versions to test on and be compatible with: the latest LTS toolchain. If we assume that the majority of users are able to stay at least on or above the most recent LTS, then toolchain compatibility is a non-issue, at least for core crates.
- In the stated toolchain approach, the toolchain being used to compile effectively imposes an =-style version constraint. That means that we are somewhat less likely to get the latest versions of all our (transitive) dependencies, since some of them may require newer toolchain versions; we will of course get the latest compatible versions. It’s hard to say for certain, but this seems likely to create a larger set of crate version combinations than we see today, and thereby diffuse the testing for compatibility.
Maintainability: Here the maintenance burden is largely on library authors.
- In the shared policy approach, library authors are often “stuck” on an old (LTS) version of the toolchain, though not as outdated as with many crates today; that imposes a maintenance cost. On the other hand, there’s a much greater chance that their clients are using the latest version of the library due to this generous toolchain compatibility, which helps with maintenance (since bug reports tend to be targeted toward the current release).
- In the stated toolchain approach, the tradeoffs are exactly the reverse: it’s easy to upgrade the toolchain requirement at will, but the cost is that doing so effectively creates an LTS version of the library, because users stuck on old toolchains will also be stuck on old library versions, and hence file bug reports (and request backports) for them.

There’s not a clear winner here! And there are a lot of other, emergent factors to consider as well:

Rust’s rapid release process is based on the idea that most developers will keep their toolchain up to date, since each incremental update is small (as opposed to “big bang” updates on a much slower cadence). There is some risk that the stated toolchain approach will reduce incentives toward upgrading.
Even if crates can state toolchain requirements, there’s still the question, for core crates, of what requirements are appropriate. Bumping the requirement won’t break clients right away, but it will cause problems if those clients want to update to gain new library features (but stay with the old toolchain). In other words, it seems possible that the benefits of the stated toolchain approach are illusory, and that in practice critical crates will stick with very conservative toolchain requirements.

For me, that last point is a clincher: I think that forming a good shared policy is going to be needed regardless, and that doing so will address most of the toolchain requirement issues we have today. I similarly think that it’s quite valuable to retain the true maximum version selection that we have today, rather than constrain it by a toolchain filter.

In the long run, it could even make sense to combine the two approaches, allowing crates to state their toolchain requirements (and have that influence resolution), but encourage core crates to state “LTS” as their requirement.

Checking the minimal resolution

Finally, I wanted to address an interesting aspect of the current approach to version resolution: most Cargo.toml files do not give an accurate lower bound on their dependencies! Going back to the winapi example, if the stated dependency is “0.3.0”, because we will resolve to the maximum version, we can freely rely on a feature that only appeared in 0.3.2.

Simple minimal version selection wouldn’t immediately address the issue, because one of our other dependencies could itself have a dependency like winapi = "0.3.2", which would mean we’d compile against a newer version than what we stated. To get a truly precise lower-bound, we have to (1) resolve to minimal versions and (2) check those versions against all the ones stated in the root Cargo.toml. There’s been some work to add such capabilities to Cargo, but there’s an open question: do we care?
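
The two steps can be sketched for a single crate under simplifying assumptions: every requirement falls in the same compatible series, and every stated lower bound names a version that was actually published. The function names are hypothetical and this is not the capability being added to Cargo.

```rust
type Version = (u64, u64, u64);

/// Step 1: minimal resolution for one crate within a single compatible
/// series. The smallest version that satisfies every stated lower bound
/// is the greatest of those bounds.
fn minimal_resolution(lower_bounds: &[Version]) -> Option<Version> {
    lower_bounds.iter().copied().max()
}

/// Step 2: the root's stated bound is only exercised by a minimal build
/// when the minimal resolution lands exactly on it.
fn root_bound_is_exercised(root: Version, transitive: &[Version]) -> bool {
    let mut all = transitive.to_vec();
    all.push(root);
    minimal_resolution(&all) == Some(root)
}
```

In the winapi example, a root bound of 0.3.0 combined with a transitive bound of 0.3.2 resolves to 0.3.2, so `root_bound_is_exercised((0, 3, 0), &[(0, 3, 2)])` is `false`. A minimal build would pass without ever compiling against the version the root claims to support.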

The lack of lower-bound precision hasn’t been a problem for Cargo so far because, in general, we eagerly resolve to the maximum version; any requirement on newer library features will thus be automatically fulfilled.

However, there are at least two ways this could become more of a problem in the future:

If we adopt the stated toolchain approach above, we end up imposing more =-style constraints, which in turn can prevent us from choosing the globally-maximum version of crates. The effect could be that everything passes CI just fine, but a user with an older toolchain gets a crate resolution that fails to compile (rather than a resolution saying “you need a newer toolchain”). Notably, the lower-bound precision issue applies to the stated toolchain as well.
It’s possible that we will eventually have workflows that depend on the accuracy of lower bounds in Cargo.toml. At the moment, however, this is purely speculative; the Cargo team does not have any ready examples.

If we do decide to care, one way to improve accuracy is to document, as part of CI best practices, that a build with --minimal-versions should be performed in CI in addition to the normal build. We could likewise build that test into crate publication.

Wrapping up

While we didn’t reach crystal-clear conclusions on the current open questions, the main goal here was to lay out more explicitly a way of thinking about the design space. As with the Ergonomics Initiative post from last year, I’m hopeful that this framing can help give us some shared vocabulary for grappling with the current and future design questions in Cargo.

For the particular questions examined here, I’d very much appreciate comments on:

The minimum version RFC.
The LTS RFC.
The CI best practices issue for --minimal-versions.

http://aturon.github.io/tech/2018/07/25/cargo-version-selection

aturon.log: listening and trust, part 3

Jun 18, 2018 Updated Jun 18, 2018

In this third post in the listening and trust series, I’m going to talk through one of the most intense discussions the Rust community has had: the module system changes that were part of last year’s ergonomics initiative.

The saga, summarized

The modules saga demonstrates both payoffs and pathologies of the RFC process, playing out over a dozen different threads reaching 1,400+ comments in total.

It was, in the end, a success – at least as gauged by the collective enthusiasm for the final result, compared to the starting point. Yet it left wounds that have not entirely healed, which is part of why I want to talk about it here.

The 2017 roadmap focused on productivity and learnability, and the Lang Team took a look across the language for areas of improvement. Modules were a well-known stumbling block for many users, though many others (including most of the Lang Team) found them simple and easy to grasp. So the first order of business was working to understand what people found confusing or difficult about modules, which led to a couple of initial threads:

@withoutboats’s post, The Rust module system is too confusing.
An internals writeup of a Lang Team discussion about the privacy aspects of the module system.

These early threads produced some important insights into how different people experienced the module system. They also highlighted the level of controversy to expect around any discussion involving such a fundamental change, even one that was far from a complete proposal.

A few months later, a subset of the Lang Team and some others spent a few hundred person-hours delving into both the problem and solution space. We worked through about a dozen different designs before finally reaching a mix of ideas that seemed plausible enough to present to the community. I did so in an initial blog post, which also took a stab at a “comprehensive” analysis of the problems. The post generated an enormous amount of discussion, and a week later I closed its thread in favor of a new one with a revised proposal, which @withoutboats revised further. There were also a handful of other threads with additional proposals, or that drilled into specific aspects in greater detail.

One of the problems the original proposal called out was “path confusion”. The revised proposal summarized some of the feedback as saying:

Many on the thread cited this as the core problematic issue with the module system; I’ve collected some data about confusion around Rust modules which also supports that to a degree.

and suggested an approach that gave more weight to solving those problems.

After reaching what seemed to be a rough consensus on the internals thread around the third design, @withoutboats wrote up a complete proposal as an initial RFC. A similar story played out, with that initial RFC garnering quite a bit of feedback in multiple directions, and ultimately being closed in favor of a second, and then a third (and final) RFC.

The RFC that was ultimately accepted bears almost no resemblance to any of the initial design sketches. It ultimately took the “path confusion” issue as the problem to address, and oriented the design more completely around that issue than any of the earlier proposals did. (Discussion of some aspects of the design is ongoing; there will soon be a 2018 Edition Preview where we’ll be looking for further feedback.)

With that basic background in place, I want to examine some of the social dynamics that played out along the way, from the context of listening and trust.

Momentum, urgency, and fatigue

I think that, collectively, we all remember the modules discussion as intense. But it’s interesting to dig into the ways it was intense.

In my memory, the discussion was heated. But it turns out that memory was faulty: when I went back and re-read all of these threads, I was shocked by the relative lack of heat! Granted, there were a few outlier comments, but I came away convinced that the sense of intensity was not primarily about the discussion being charged.

What I noticed, instead, is a recurring mention of the length and velocity of the comment threads involved. Threads were accruing hundreds of comments per week, and there was a sense of high stakes (the Lang Team is considering changing the module system!), so many people felt compelled to get involved, at least at the beginning. And the only way to do so was to participate in those threads, thus compounding the effect.

While the modules discussion was an extreme case, the issue of comment thread velocity is a familiar one in Rust. A high velocity thread often seems heated and “controversial”, even if the discussion is respectful and chock full of insights. I think this is part of why feelings about the modules discussion are so complex, and why it seems an exemplar of both the best and the worst of the RFC process.

I personally don’t see high comment velocity as a root problem, but an issue that relates to deeper dynamics:

Momentum. A comment thread has a kind of “momentum” of sentiment that can be hard to shift, and also hard to gauge as the thread gets long. If an RFC has an initial batch of negative (or positive) comments, it can be difficult to recover, in part because these are the first sentiments everyone sees.
Urgency. Because comment threads are a major input into the decision-making process, there’s a sense of urgency to participate and keep the discussion “on course” from your personal perspective. This urgency is compounded when a thread is fast-moving or lengthy, or when a proposal originates from a team member (and thus is reasonably seen as having a higher chance of landing).
Fatigue. Many people participate early on in an RFC thread, only to ultimately step away because the thread has too much traffic to keep up with or influence. There’s also sometimes a feeling of a topic getting discussed “to death”; many felt that way toward the end of the modules saga.

Some of these social dynamics are inevitable with a project as large and open as Rust. But the net effect is a bit like the one I talked about in the first post in the series: the sense that you need to be “in the room when it happens”, and that it takes a lot of time and stamina to do so. In this case it’s not about the moment of decision per se, but rather the struggle to set the direction of the comment thread. I often hear from people who are intimidated by the RFC process precisely because of the huge comment threads. Not to mention, of course, the enormous amount of work needed to fully participate in those threads.

These issues are particularly pronounced for early-stage discussions. “Brainstorming” can get overwhelmed either by a deluge of ideas, or (much worse) strong attempts to kill an idea before it has any chance to take root.

But I don’t think comments themselves are the problem; the whole process is, after all, a request for comments! I think the problematic dynamics stem instead from two core process problems:

A lack of clarity about the “stage” of any given discussion. A thread brainstorming on a new way to approach Ok-wrapping should not need to recapitulate fundamental disagreements on whether Ok-wrapping is desirable.
Too much emphasis on “the thread”, rather than on standalone artifacts. We don’t have a good process or culture around reflecting the discussion into the RFC itself, and while we do sometimes make “summary comments” to help manage discussion, they tend to get lost in the noise. The RFC thread takes on a primary, high stakes role instead.

I believe that if we adjust our process to address these two issues head-on, it will go far in further eliminating the requirement to be “in the room” at the right time, and the negative effects that come with it.

Wielding power; changing minds

In my last post, I mentioned a sentiment I’ve often seen, one sometimes made in reference to the modules discussion:

Luckily enough of us yelled to stop the terrifying original proposal from happening; the moment we stop speaking up, Those People will start pushing in that direction again.

I understand where this sentiment comes from; the RFC that landed indeed bore little resemblance to the starting point, due in part to pushback. But there are two distinct ways to understand why RFCs change, and what it means:

Wielding power. There were some particular aspects of the early modules proposals, like removing the need for mod statements, that garnered a strong negative reaction from a number of people. It took a lot of iterations, but ultimately that part of the proposal was dropped. It would be reasonable to see this as part of the community asserting itself, being loud enough about strong preferences to make it clear that certain changes would be intolerable. The result could be a capitulation, or compromise, in which the original proposers relent and “take what they can get”.
Changing minds. On the other hand, a number of folks who were positive about the original proposal were even more positive about the final one. For them, the final proposal wasn’t a compromise at all, but rather the best option.

I’m personally in the latter camp: I’m much happier with the RFC that landed than with the original proposal, and I wrote both of them! But I think both of the above elements were at play. Let me explain.

It was not until very late in the process that we stopped proposing to drop mod statements, and there’s no question that, had it not been for the persistence of a few people, it wouldn’t’ve happened. Power was, indeed, wielded. But the reason for changing the proposal wasn’t simply “well, we can’t get this through, so let’s scale back”. Rather, those lengthy comment threads and disagreements forced us all to dig deeper into the problem space. And we eventually learned that our original analysis of the core problem was just wrong, that the “path confusion” issue that I initially treated as secondary was actually the problem.

This is part of why I don’t want to put blame on comments themselves. Our willingness to dig deep and long to find new insights and more nuanced designs is a big part of what’s made Rust the language it is, and why I love working on it so much. Sometimes talking something “to death” is exactly what’s needed to uncover the right set of ideas.

I don’t think it’s the job of the Rust Teams to seek a compromise solution, which is a recipe for design by committee; we need a strong, coherent final design. I think we should be convinced that the solution is within striking distance of the best for our plural community. And thus the role of the RFC process is precisely to facilitate deep digging, to explore ideas, tradeoffs and constraints and look for the option that genuinely seems best.

Lived experience; active listening

A final dynamic that showed up throughout the modules discussion: reports of “lived experience”.

Going back to mod statements, several people talked about the role they play in their personal workflow, whether due to their IDE, their lack of IDE, their habits with respect to temporary files, or even the latency of their file system.

No one can be wrong about their own lived experience. And lived experience can bring issues to life in a way that pure empathy and speculation can’t. Thus, a big benefit of the RFC process is the crowdsourcing of lived experiences that it provides.

Unsurprisingly, one’s own lived experience always looms large. But our job in building a language is to account for the experience it provides for a large and diverse set of current and future users, many of whom are not well-represented on RFC threads. Thus accounts of lived experiences are data points, at best proxies for the experiences of similar users, but ultimately information that needs to be weighed in a global design space.

The work of an RFC thread is in part to employ active listening to turn lived experience into design constraints. In cartoon form this might look as follows:

A: “I am against this RFC.”
B: Why?
A: “I don’t want to get rid of mod statements, I think they’re very important.”
B: Can you say more about what role they play for you?
A: “They make it easy to temporarily remove modules while refactoring.”
B: OK, so you have as a design constraint that workflows for refactoring should remain ergonomic?
A: “Yes.”

The last post talked about the feelings that inevitably come into play in any discussion you care about. They’re strongest when they touch on lived experience. They should not be hidden away, but part of the emotional labor of the RFC process is recognizing such feelings as emerging from our personal experience, and working introspectively to dig out the actual constraints they represent – and then weighing those against the constraints of other present and future Rust users. We can help each other to do so, as in the cartoonish dialogue above. But even better if we can each do some of that work privately first, and come to the thread not with a flat “I am against this RFC” but rather “I’m concerned about refactoring workflows; here’s what my personal one looks like…”

Last week a few folks from the Rust Core Team got to spend a few hours talking about the RFC process in person – something we’ve done many times before, but that led in a more conclusive direction this time around. Niko is writing up some of the ideas, which are partly aimed at the problems raised in this post.

Ultimately, though, we can’t solely rely on process improvements; we need to do the work of reflecting on, writing down, and improving our design culture as well. I plan at least one more post in the Listening and Trust series, and then on to other, broader topics.

http://aturon.github.io/tech/2018/06/18/listening-part-3

aturon.log: listening and trust, part 2

Jun 2, 2018 Updated Jun 2, 2018

In the previous post in this series, I recounted an early lesson for the Rust Core Team about working in the open. In this post, I want to talk about the delicate interplay between listening and trust when doing design in the open.

I honestly despise being subtle or “nice”. The fact is, people need to know what my position on things are. And I can’t just say “please don’t do that”, because people won’t listen. I say “On the internet, nobody can hear you being subtle”, and I mean it.

That’s Linus Torvalds on talking and listening in OSS. There is, of course, a long and continuing battle in the OSS world around codes of conduct, and Linus is often cited in these debates (by both sides). Given that the Rust community is firmly in the pro-CoC camp, it’s tempting to think that what Linus is describing here is simply not relevant in the Rust world.

But notice that Linus talks about two things here: being subtle, and being nice. The “being nice” part is indeed covered by codes of conduct, and is by and large not an issue for the Rust community. But the “being subtle” part is, well, more subtle.

To be concrete, saying “This idea is insane” or “An idiotic unreadable mess” is obviously not being nice, and the CoC draws a clear line here. But what about “I very strongly object” or “Doing this would ruin what I love most about Rust”? These aren’t personal attacks, and they’re often given along with detailed technical critiques. Moreover, they accurately describe the feelings being experienced by the author! Yet, I think such un-nuanced statements are often counterproductive to the design approach at the heart of Rust.

I’m going to spend the rest of this post unpacking that sentiment.

Pluralism and positive sums

In the run-up to 1.0, the Rust community went through a process of articulating the value propositions of the language, and — relatedly! — the design values for the project. We developed a pattern of slogans that summarized our understanding at that point:

Memory safety without garbage collection
Abstraction without overhead
Concurrency without data races
Stability without stagnation

and ultimately: Hack without fear.

The common thread here is reconciling oppositions. Not just finding a balance in a tradeoff, but finding ways to reduce or eliminate the tradeoff itself. In our 2016 RustConf keynote, Niko and I talked about this as the Rust community “knowing how to have our cake and eat it too”, as part of our challenge to the community to take another such step:

In short, productivity should be a core value of Rust, and we should work creatively to improve it while retaining Rust’s other core values. By the end of 2017, we want to have earned the slogan: Rust: fast, reliable, productive—pick three.

Of course, such reconciliations are not always possible, and certainly aren’t easy. It’s an aspiration, not an edict. But Rust’s culture and design process are engineered to produce such outcomes, by embracing pluralism and positive-sum thinking:

Pluralism is about who we target: Rust seeks to simultaneously appeal to die-hard C++ programmers and to empower dyed-in-the-wool JS devs, and to reach several other varied audiences. That’s uncomfortable! These audiences are very different, they have divergent needs and priorities, and the usual adage applies: if you try to please everyone, you won’t please anyone. But…
Positive-sum thinking is how we embrace pluralism while retaining a coherent vision and set of values for the language. A zero-sum view would assume that apparent oppositions are fundamental, e.g., that appealing to the JS crowd inherently hurts the C++ one. A positive-sum view starts by seeing different perspectives and priorities as legitimate and worthwhile, with a faith that by respecting each other in this way, we can find strictly better solutions than had we optimized solely for one perspective.

I can’t tell you the number of times I’ve experienced positive-sum outcomes when working with the Rust community. Times when I’ve ended up with a design much better than the one I started with, and got there because I thought it was important to listen to people with different priorities.

But there’s a lot of nuance here. Rust does not seek to be a language for everyone, but the audiences and use cases it does target are nevertheless diverse. And pluralism happens at the level of community and goals, not at the level of the actual design. We don’t embrace “there’s more than one way to do it” as a goal for our designs, nor do we “take the average” between opposed priorities (and please no one). Ultimately, we have to make hard decisions.

It’s the formal Rust teams, the people who make the final decisions, who are tasked to take in and care about a plurality of perspectives, but ultimately put forth a singular, coherent vision. They are the keepers of the vision, the counterbalance to the process of exploration and give-and-take.

Fear and power

Second, [we must] “defend” the language many times, but failing once has dire consequences. No matter how good the defenders are, they are going to let something slip from time to time.

(from Fortifying the process against feature bloat)

Many times, the language team hasn’t had a chance to even read the thread before it spirals out of control like this one, because every little bit of discussion makes you feel like you’re losing the fight.

(comment from @rpjohnst)

The idea that discussions can be “purely technical”, i.e. devoid of emotional content, is bogus. If we care at any level about what we’re discussing, then our emotions are going to play a role, and more likely than not, they will spill over onto the thread.

People care about Rust. It resonates with their values and experiences, in specific and highly personal ways. Because of that context, seeing a proposal that appears at odds with those values and experiences can be distressing. And that feeling is only heightened when you also feel you have limited power. Someone else is making the decision, there seems to be growing momentum around it, and so you reach for the only tool you have: raising your voice as loud as you can.

And so we come back to Linus’s issue of “subtle” communication. His recommendation is to amplify these feelings, to yell loud to make sure you’re heard. “I’m against every idea in this proposal”. “This feature will ruin Rust”. “Rust is heading in the wrong direction”.

These feelings are real and legitimate. But embracing and amplifying them works directly against the principles of plurality and positive-sum thinking. Escalation encourages a zero-sum environment, an us-versus-them battle, completely at odds with the positive-sum thinking that has led to Rust’s best innovations. And it’s a vicious cycle: if everyone is yelling, truly listening becomes very painful, and you “grow a thicker skin” in part by learning to not take other people’s feelings so seriously… which means they need to yell louder…

Humility and trust

Those that do argue for the proposal you hate often don’t have a strong opinion one way or the other yet—they may bring up counterpoints just to have them on the table, or to explore the design space. And, you should note, they often do wind up agreeing with you!

(comment from @rpjohnst)

Fear and creativity don’t mix. Working in a positive-sum, pluralistic way requires significant vulnerability and emotional labor:

Humility, in order to genuinely question the instinct that your values, ideas and opinions are the Right Ones.
Empathy, in order to genuinely “put on” someone else’s perspective, needs and values as if they were your own.
Introspection, in order to reach a deeper understanding of your own impulses and values.

We look for these skills when selecting people to join the Rust teams, and we expect them to do this kind of work when exploring a design space. But this is delicate work, and we do it best when the work is shared by the whole community, not just team members. And, in particular, “unsubtle” shouting driven by fear makes this work so, so much harder.

This is why I feel distraught when I see accusations of bad faith, of people having an “agenda” and the “listening” done in the RFC process being a charade to avoid revolt. Or the sense of “luckily enough of us yelled to stop the terrifying original proposal from happening; the moment we stop speaking up, Those People will start pushing in that direction again”. All of these sentiments indicate a rising distrust, a zero-sum power-focused framing, with a dose of tribalism to boot.

What we need is to work against the vicious circle of escalation by creating a virtuous circle instead, based on humility and trust. If we can trust each other to listen and take concerns seriously, we free ourselves to be uncertain about those concerns, and open to possibilities that superficially work against them. In other words, we free ourselves to communicate and explore with subtlety and nuance. Trust and humility go hand-in-hand. And they are the key to finding positive-sum outcomes.

A code of conduct is not enough. Being “nice” is not enough. We need to take a leap of faith and embrace humility and trust in our discussions. It is my strong belief that doing so will lead to strictly better ideas and decisions, enabling us to find positive-sum outcomes. But I also think it’s vital for keeping our plural community whole and inclusive.

In the next post in this series, I’ll present some concrete case studies from Rust’s past and present, examining how the discussions functioned and what we might learn from them.

http://aturon.github.io/tech/2018/06/02/listening-part-2

aturon.log: listening and trust, part 1

May 25, 2018 Updated May 25, 2018

For me, most weeks working on Rust are fun — exhilarating, even. But, just like with anything else, some weeks are hard.

As this week draws to a close, I feel troubled. On the one hand, things are looking strong for the 2018 Edition (which I want to write more about soon). But on the other hand, this week I locked two RFC threads, flagged a bunch of comments for moderation, and generally absorbed a lot of emotion from a lot of different quarters of the community. There’s a sense of simmering distrust.

I worry sometimes about becoming a victim of our own success: if our community grows more quickly than we can establish shared values/norms/culture, we could so easily descend into acrimony and tribalism. I’ve seen other language communities go through very painful periods, and I’m eager to try to steer Rust’s community around them if we can.

I’m a strong believer in the fundamental importance of listening for building trust. But I’ve realized that talking is also important, and that Rust’s leadership needs to do a better job broadcasting about the people and process side of the project. This post is the beginning of an ongoing series; posts like this will form a “leadership diary”, focusing on my highly personal perspective as a leader — not on technical issues but rather on how the project runs.

This week saw several controversies:

An RFC to “undo” impl Trait in argument position, a feature that recently shipped in stable Rust.
Outcry about keyword reservations for the 2018 Edition.
Heated discussion on numerous threads about the role and importance of emoji reactions in GitHub.

These may seem unrelated, but I think they all boil down to the same core issue: listening and trust.

When I first started working on Rust in mid-2014, the RFC process had just been put into place, and we were collectively grappling with how to make it work. At that time, there was a weekly video meeting, comprised mostly of Mozilla staff, in which RFC decisions were made (amongst other things). You can see the history of this meeting here, all the way up to the point it was shut down, just after Rust 1.0.

Looking back, it’s hard for me to believe that things used to operate this way. And although the process is very different now, I sometimes think those early, closed-door, Mozilla-centric meetings were a kind of “original sin” that laid seeds of distrust that we’re still working through today.

The Great int Debate, and the No New Rationale rule

A critical turning point came at the end of 2014, stemming from a rather innocuous-seeming issue: what to call the types that eventually became isize and usize.

At the time, the types were called int and uint, but these names had been debated on the issue tracker for over a year. As the time for Rust 1.0 drew near, finalizing these names was one of the countless “small issues” that needed to be settled for good. Seeing this as a relatively minor issue, project leaders read the comment history, discussed the matter in — you guessed it! — a closed-door meeting, and then posted an extensive writeup, which included a very important sentence:

We (the core team) have been reading these threads and have also done a lot of internal experimentation, and we believe we’ve come to a final decision on the fate of integers in Rust.

The result was… explosive. And rightfully so! I am forever indebted to glaebhoerl, who articulated the problem with painful clarity:

Importantly though: There was almost zero participation from members of the core team in the public discussion thread. That’s what I most think is not right. When anyone else has an opinion on an RFC that they want to express, whether in support or opposition, what they have to do is to lay out their reasoning as a comment in the discussion thread. Then other people can read, be swayed by it, or not, respond to it, and a productive discussion may ensue. Why is it a good idea for members of the core team to be entitled to skip this, to keep their reasoning and discussions to themselves, and only reveal it together with their final decision?

This moment crystallized the dysfunction in the early days of RFCs. I’m proud to say that the core team ultimately responded by going back to square one and fully engaging, and in the end, the decision was reversed.

But more important than that: the experience led to numerous shifts in the process. The most direct was codifying what I call the “No New Rationale” rule:

No New Rationale: decisions must be made only on the basis of rationale already debated in public (to a steady state)

Here’s what we say about this in the RFC process README (emphasis mine):

At some point, a member of the subteam will propose a “motion for final comment period” (FCP), along with a disposition for the RFC (merge, close, or postpone).
- This step is taken when enough of the tradeoffs have been discussed that the subteam is in a position to make a decision. That does not require consensus amongst all participants in the RFC thread (which is usually impossible). However, the argument supporting the disposition on the RFC needs to have already been clearly articulated, and there should not be a strong consensus against that position outside of the subteam. Subteam members use their best judgment in taking this step, and the FCP itself ensures there is ample time and notification for stakeholders to push back if it is made prematurely.

The “FCP” process, which involves consent of all subteam members, plays out entirely on the RFC thread, and is mediated by our beloved @rfcbot. And it’s specifically designed to signal that the team believes the discussion has reached a steady state, and give participants ample time to object if they disagree (or believe that some commentary hasn’t been sufficiently addressed).

In addition, all major project decisions must go through the RFC process.

The unifying theme here is a steady move away from “being in the room when it happens” to a fully inclusive process, and it’s something we’re always working to improve.

So with all of that, why am I troubled? Because I’m seeing increasing signs of distrust, “us vs them” thinking, and people feeling like they have to yell in order to be listened to. And I’m also seeing a lot of divergent understanding of how the RFC/decision-making process is supposed to work.

The Rust community prides itself on being a friendly and welcoming place, but it’s going to take constant, explicit work to keep it that way — and part of that work is being forthright about the cases where things have gotten less than friendly, pausing and working together to figure out why.

In the next post on this topic, I plan to focus on the kinds of breakdown I’ve been seeing, and some of my hypotheses about the underlying causes.

http://aturon.github.io/tech/2018/05/25/listening-part-1

Borrowing in async code

Apr 24, 2018 Updated Apr 24, 2018

The networking working group is pushing hard on async/await notation for Rust, and @withoutboats in particular wrote a fantastic blog series working through the design space (final post here).

I wanted to talk a little bit about some of the implications of async/await, which may not have been entirely clear. In particular, async/await is not just about avoiding combinators; it completely changes the game for borrowing.

The core issue is that, while the Future trait does not itself impose a 'static bound, in practice futures have to be 'static because they are tossed onto executors like thread pools and hence not tied to any particular stack frame. Today, what that means is that futures-based APIs have to be careful not to hold on to borrows, and instead take ownership of whatever they need. That in turn leads to all kinds of unidiomatic patterns, including threading through ownership and widespread use of Rc and RefCell.

Idioms in the standard library

To see what I mean, it’s helpful to work through an example. Let’s take the read method from the standard library:

fn read(&mut self, buf: &mut [u8]) -> Result<usize, io::Error>

This method takes a mutable reference to both an I/O object and a buffer to read into, then does the read synchronously. That lets you write idiomatic code like the following:

let mut buf = [0; 1024];
let mut cursor = 0;

while cursor < 1024 {
    cursor += socket.read(&mut buf[cursor..])?;
}

This is perfectly ordinary code, in which we repeatedly take mutable borrows within a loop.
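For readers who want to run the loop, here is a minimal sketch wrapped in a helper. The name `fill` is hypothetical, and any `Read` implementor (for example an in-memory `Cursor`) stands in for `socket`. The early exit on a zero-length read is an addition so the loop terminates at end of input.

```rust
use std::io::{self, Read};

/// Fill `buf` by repeatedly taking a fresh mutable borrow of its unfilled tail.
/// Returns the number of bytes actually read.
fn fill<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut cursor = 0;
    while cursor < buf.len() {
        // Each iteration borrows `reader` and a sub-slice of `buf` only for
        // the duration of the call.
        let n = reader.read(&mut buf[cursor..])?;
        if n == 0 {
            break; // end of input: stop instead of spinning forever
        }
        cursor += n;
    }
    Ok(cursor)
}
```

Calling `fill(&mut reader, &mut buf)` leaves both the reader and the buffer in the caller's hands afterwards, which is exactly what the ownership-passing style has to emulate by hand.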

Idioms in futures today

If we wanted to translate the above to an asynchronous setting using futures, we’d need to use a futures-based analog to the read method. That exists today with roughly the following signature:

fn read<T: AsMut<[u8]>>(self, buf: T) ->
    impl Future<Item = (Self, T, usize), Error = (Self, T, io::Error)>

That signature looks rather different! The reason is that we want the returned future to be 'static, so we have to pass in (and return) ownership of both the I/O object and the buffer.

Not only is the signature more complicated: it’s also unwieldy to use, even if we employ async/await notation:

struct Buf {
    // box this up so we're not moving it around
    data: Box<[u8; 1024]>,

    cursor: usize,
}

impl AsMut<[u8]> for Buf {
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.data[self.cursor..]
    }
}

let mut buf = Buf {
    data: Box::new([0; 1024]),
    cursor: 0,
};

while buf.cursor < 1024 {
    match await!(socket.read(buf)) {
        Ok((new_socket, new_buf, n)) => {
            socket = new_socket;
            buf = new_buf;
            buf.cursor += n;
        }
        Err((new_socket, new_buf, e)) => {
            socket = new_socket;
            buf = new_buf;
            Err(e)?
        }
    }
}

While we could take steps to make this particular example easier, the fact is that requiring you to always move values in and out of async code prevents you from following the usual Rust idioms for borrowing.
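The snippet above depends on futures 0.1 and the proposed `await!` macro, so it is hard to try out directly. As a rough stand-in, the sketch below reproduces the same ownership-threading shape in synchronous code on stable Rust. `read_owned` and `fill_owned` are hypothetical names, and the synchronous setting is an assumption made only so the example runs.

```rust
use std::io::{self, Read};

/// Ownership-passing read: takes the reader and buffer by value and hands
/// both back to the caller, on success and on failure alike.
fn read_owned<R: Read, B: AsMut<[u8]>>(
    mut reader: R,
    mut buf: B,
) -> Result<(R, B, usize), (R, B, io::Error)> {
    match reader.read(buf.as_mut()) {
        Ok(n) => Ok((reader, buf, n)),
        Err(e) => Err((reader, buf, e)),
    }
}

struct Buf {
    // boxed so that moving `Buf` around moves a pointer, not 1 KiB
    data: Box<[u8; 1024]>,
    cursor: usize,
}

impl AsMut<[u8]> for Buf {
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.data[self.cursor..]
    }
}

/// Fill a `Buf`, threading ownership of reader and buffer through every call.
fn fill_owned<R: Read>(mut reader: R) -> io::Result<(R, Buf)> {
    let mut buf = Buf { data: Box::new([0; 1024]), cursor: 0 };
    while buf.cursor < 1024 {
        match read_owned(reader, buf) {
            Ok((r, b, n)) => {
                // Re-bind both values before they can be used again.
                reader = r;
                buf = b;
                if n == 0 {
                    break; // end of input
                }
                buf.cursor += n;
            }
            Err((_r, _b, e)) => return Err(e),
        }
    }
    Ok((reader, buf))
}
```

Even without futures in the picture, every call site has to unpack and re-bind the values it just handed over, and the buffer needs a wrapper type to remember its cursor.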

Borrowing in async code

You might wonder: why can’t we just use the following signature instead?

fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> impl Future<Item = usize, Error = io::Error> + 'a

And indeed, you can write and implement such a function; you just can’t effectively use it. The problem is that the future you get back contains borrowed values, which today will prevent it from being used in most futures-based code, due to there being a 'static requirement to ultimately execute futures.

This is where the async/await plan comes in: you can await a future with borrowed data, while still being 'static overall! This is what it means to support “borrowing across yield points”, as explained in @withoutboats’s post.

In particular, using this borrowing version of read, we can write:

async {
    let mut socket = /* .. */;
    let mut buf = [0; 1024];
    let mut cursor = 0;

    while cursor < 1024 {
        cursor += await!(socket.read(&mut buf[cursor..]))?;
    };

    buf
}

and the type of the async block will be:

impl Future<Item = [u8; 1024], Error = io::Error> + 'static

Despite the fact that we borrow internally within the async block, the block as a whole produces a 'static future which we can spawn onto a thread pool or other executor.

In other words, the async/await proposal allows you to write fully idiomatic Rust code that runs asynchronously. That applies even to signatures; the borrowing version of async read will ultimately look as follows:

async fn read(&mut self, buf: &mut [u8]) -> Result<usize, io::Error>

This signature is exactly the same as for the synchronous version, just with an async on the front.
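To make the claim concrete, here is a sketch written against the async/await that eventually stabilized (the postfix `.await` and `std::future::Future`), rather than the `await!` macro and futures 0.1 types used in this post. `MockSocket`, `YieldOnce`, `require_static`, and the busy-polling `block_on` are all hypothetical helpers invented for illustration. A real program would use an actual I/O type and a real executor.

```rust
use std::future::Future;
use std::io;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Returns `Pending` once, then `Ready`, to force a real suspension point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical in-memory stand-in for an async I/O object.
struct MockSocket {
    data: Vec<u8>,
    pos: usize,
}

impl MockSocket {
    /// Same shape as the synchronous signature, with `async` on the front.
    async fn read(&mut self, buf: &mut [u8]) -> Result<usize, io::Error> {
        YieldOnce(false).await; // both borrows are held across this yield
        let n = buf.len().min(self.data.len() - self.pos).min(16);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Compiles only if the future owns everything it needs.
fn require_static<F: Future + 'static>(fut: F) -> F {
    fut
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor that polls in a loop; fine for futures that never truly wait.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn read_all() -> Result<[u8; 64], io::Error> {
    let fut = require_static(async {
        let mut socket = MockSocket { data: vec![1; 64], pos: 0 };
        let mut buf = [0u8; 64];
        let mut cursor = 0;
        while cursor < buf.len() {
            // Borrows of `socket` and `buf` live inside the async block.
            cursor += socket.read(&mut buf[cursor..]).await?;
        }
        Ok::<_, io::Error>(buf)
    });
    block_on(fut)
}
```

The `require_static` call is the interesting part: the block borrows `socket` and `buf` across suspension points internally, yet the block as a whole still satisfies a `'static` bound because it owns both values.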

The implications

The bottom line is that async/await isn’t just about not having to use combinators like and_then. It also fundamentally changes API design in the async world, allowing us to use borrowing in the idiomatic style. Those who have written much futures-based code in Rust will be able to tell you just how big a deal this is.

Right now the networking WG is focused on landing async/await itself (which will probably happen soon), and providing a migration path for the futures crate. Once those basics are in place, though, we’ll be able to revisit APIs throughout the async stack and make them more idiomatic. With luck, we’ll have a very strong story in place for Rust 2018.

If you’re interested in getting involved in this effort, please check out the Net WG gitter and repo!

http://aturon.github.io/tech/2018/04/24/async-borrowing

Futures 0.2 is here!

Apr 6, 2018 Updated Apr 6, 2018

As of this morning, the futures crate version 0.2.0 is now available on crates.io! You can get the full low-down on the changes from my earlier post; here I’ll review the overall roadmap and what this release means.

Our goal is to ship async/await in Rust 2018 (roughly by mid-September), and to ship futures 1.0 this year. All told, this work will provide a stable and ergonomic foundation for async programming in Rust. But it will take a few steps to get there!

Today’s release

The 0.2.0 release today marks an important snapshot of our progress so far:

It completely revamps the task/executor system — the most hairy and confusing part of futures 0.1.
It sets up the crate to more easily allow for iteration.
It makes a large number of long-standing API tweaks that required breakage.

It has also been fully integrated into both Tokio and Hyper under experimental feature flags.

What’s ahead

Concurrent with the 0.2.0 release, we’ve posted two RFCs to rust-lang covering, respectively, the language and library additions needed to support async/await notation.

On the library side, the RFC proposes two significant changes to futures-core compared to the freshly-minted 0.2 release:

The use of pinned types to enable borrowing within futures.
Removing the associated Error types (and adjusting combinators accordingly).

It’s not currently possible to make these changes, because of rustc limitations, but these are expected to be addressed quite soon.

The overall plan is to:

Immediately begin work on a 0.3 branch that fully matches the library RFC.
Publish the 0.3 version, initially as nightly-only, as soon as the rustc limitations around pinning are lifted.
Publish a 0.3.x version that works on the stable channel, as soon as pinning is stable.
Publish a 0.3.x version that simply re-exports the core APIs from std, once they are available.

In other words, the 0.3 release will be forward-compatible with the std version of futures-core APIs.

TL;DR: experimentation and feedback on this 0.2 snapshot is very welcome, but we anticipate a 0.3 release relatively soon. That release will set a stable foundation for futures-core, after which we can focus on iterating the rest of the stack to take full advantage of async/await!

http://aturon.github.io/tech/2018/04/06/futures2

Cargo, Xargo, and Rustup

Apr 6, 2018 Updated Apr 6, 2018

Another topic of discussion at the Berlin Rust All Hands was the long-term story around Cargo, Xargo, and Rustup. The latter two tools are both involved in managing your Rust toolchain, with Xargo allowing you to build custom stds and Rustup managing pre-built artifacts for mainstream targets. Xargo is most commonly used for cross-compiling to less common platforms, but can also be used to customize the standard library on mainstream platforms.

The tools today are a bit of a muddle: Xargo acts as a CLI wrapper around Cargo, while Rustup is a completely separate tool, despite the fact that they handle some similar responsibilities. Moreover, Rustup cannot manage targets set up by Xargo. And there’s long been a desire for toolchain requirements to be expressed directly within Cargo.toml, so that cargo build is all that is ever required to build a Rust package.

Given all that context, we’ve talked at various junctures about simply integrating all of the above functionality into Cargo. We had another such discussion at the All Hands, which I’ll summarize below. Note: as always, this summary covers preliminary thoughts; all changes will go through the usual RFC process.

Xargo integration

There is widespread agreement that Xargo’s functionality should instead be expressed directly within Cargo, along the rough lines of the std-aware Cargo RFC. In particular, we strongly want the ability to enrich std with feature flags and enable crates to customize those flags.

There are a lot of open design questions here; in particular, there is not consensus on the details in the std-aware RFC. There are a lot of moving parts, since the ability to specify details for the standard library is closely related to specifying details about the compiler version and release channel as well.

The upshot, though, is that once we have this integration the concept of “target” within Rustup can disappear entirely (though, of course, the target must be specified at some point in the build process). Rather than manually installing a list of targets per toolchain, Cargo will automatically set up targets as needed, either building them or downloading cached binaries when available.

Rustup integration

In general, folks didn’t see a lot of value in having toolchain management handled by a separate command from Cargo, especially given the desire for crates to specify toolchain dependency information. Hence, while there will probably always be a need to have a rustup binary, we want to explore exposing its functionality through Cargo instead.

Here, again, the hard work is in the details. We probably want some combination of:

Toolchain declarations within Cargo.toml, as with the Xargo integration.
Automatic toolchain component installation.
New Cargo subcommands for manual toolchain adjustments.
Additional settings in .cargo/config.

All told, the hope is that we can replace some of Rustup’s unique aspects (like its override system) with uses of standard Cargo concepts (like Cargo.toml and .cargo/config), and generally speaking to make toolchain management “disappear”, instead driving it on demand as part of the normal project workflows.

One other insight: since rustup the tool will likely stay around in some form, we can retain its CLI for niche cases, meaning that the Cargo integration only needs to cover the common case, and can thus likely make some simplifications.

The plan

The most pressing issue to address for the Rust 2018 release is the needs of the Embedded WG, where Xargo is currently commonly used to set up embedded targets. It turns out, though, that by raising a handful of those targets to Tier 1 status (which has other benefits besides), the large majority of these cases will be covered just by using Rustup.

On the whole, the Rust community is entering another “impl period” like state: for the next several months, the focus will be on executing all of the plans already laid for Rust 2018, rather than on brand new design. So the relevant teams plan to pick back up these integration questions in the last quarter of the year, after the 2018 edition has shipped.

http://aturon.github.io/tech/2018/04/06/rustup-xargo

Sound and ergonomic specialization for Rust

Apr 5, 2018 Updated Apr 5, 2018

Specialization holds the dubious honor of being among the oldest post-1.0 features remaining in unstable limbo. That’s for good reason, though: until recently, we did not know how to make it sound.

There’s a long history here, but I’ll pick up from the immediately previous episode: Niko’s blog post on “max-min” specialization. While that post showed, for the first time, an “obviously sound” approach to specialization, it came at a severe ergonomic cost for the ecosystem. This post proposes a twist on Niko’s idea that avoids its downsides.

Restating the problem

First, let’s reintroduce our nemesis: lifetime dispatch. I wrote at length about this problem before, but the core issue is that for specialization to be sound, we need type checking and code generation (“trans”) to agree on what it does. But there are two big things that happen between those compiler phases:

Monomorphization, which instantiates all generics with actual types.
Lifetime erasure, which destroys all lifetime information.

Lifetime erasure means that we must prevent lifetime-dependent specializations like the following:

trait Bad {
    fn bad(&self);
}

impl<T> Bad for T {
    default fn bad(&self) {
        println!("generic");
    }
}

// Specialization cannot work: trans doesn't know if T: 'static
impl<T: 'static> Bad for T {
    fn bad(&self) {
        println!("specialized");
    }
}

fn main() {
    "test".bad() // what do we see?
}

In a case like this, the type checker has more information than trans does, and hence will use the more specialized impl. Trans, by contrast, has lost the information that "test" is 'static, and hence cannot use the specialized impl. This kind of disagreement can easily cause soundness problems.

Unfortunately, it’s not so easy to fix, partly because “lifetime dependence” can happen in very subtle, indirect ways, and partly because monomorphization causes information mismatches in the opposite direction, where trans knows more than the type checker.

Bottom line, we’ve searched long and hard in this space and come up short – until fairly recently.

Niko’s max-min proposal

In Niko’s latest blog post, he proposes an ingenious strategy for ensuring soundness: we simply bake in the needed requirements for any traits we wish to specialize on.

In particular, we want a specialization to occur only when the relevant impl is “always applicable” (i.e. regardless of how type and lifetime parameters are instantiated). This always-applicable test, in particular, ensures that the difference in knowledge produced by monomorphization and lifetime erasure cannot matter.

The blog post is fairly long and contains several extensions, but basically boils down to the following: an impl is always applicable if:

it is fully generic with respect to lifetimes (no repetitions, use of 'static, or constraints),
it doesn’t repeat any generic type parameters, and
the only trait bounds that appear are for “always applicable traits”.

Specialization is only allowed when the more specialized impl is always applicable.

An “always applicable trait” is one that is marked with a special attribute, #[specialization_predicate], which means that all impls of the trait must be always applicable. In other words, it forces the “always applicable” property to apply recursively to all the traits involved in an impl.

Now, this works quite well when specializing a blanket impl with an impl for a concrete type:

impl<T> SomeTrait for T { /* default fns */ }
impl SomeTrait for SomeType { /* specialized fns */ }

That’s because in this case, we aren’t adding any extra trait bounds nor type parameters.

However, the proposal has a major downside, and one that it seems was not well understood by the broader community: specialization based on traits like TrustedLen requires those traits to be specially-marked, and doing so is a breaking change!

In particular, suppose we want to do the following:

impl<T: Iterator> SomeTrait for T { .. }
impl<T: Iterator + TrustedLen> SomeTrait for T { .. }

According to the definition above, this is only allowed if TrustedLen is marked with #[specialization_predicate]. But nothing prevents there from being impls like the following today:

impl TrustedLen for MyType<'static> { .. }

Adding the #[specialization_predicate] attribute would make such impls illegal, breaking downstream code. And more generally, both adding and removing the attribute are breaking changes, forcing all trait authors to make a difficult up-front decision, and meaning that none of the existing traits in the standard library could be used as a bound in a specializing impl.

A new idea

Last week at the Rust All Hands in Berlin, I talked to some members of the Libs Team about the max-min proposal and it became clear that they’d missed the above implications – and they were left quite dejected. “So does specialization solve any of the original use cases?”

Naturally, that got me thinking whether we could do better, and I think we can – basically by slightly repackaging Niko’s insight.

The key idea: Rather than a per-trait attribute, we provide an explicit specialization modality for trait bounds. That is, you write something like specialize(T: TrustedLen) in a where clause. This specialization mode is more selective about which impls it considers: it effectively drops any impls that constrain lifetimes or repeat generic parameters. Trait bounds, however, are fine; they are just interpreted within the specialize mode as well, recursively. Thus, an impl is “always applicable” if:

it is fully generic with respect to lifetimes (no repetitions, use of 'static, or constraints),
it doesn’t repeat any generic type parameters, and
the only trait bounds that appear are (recursively) within the specialize mode.

This is easiest to see by example:

trait Foo {
    fn foo(&self);
}

impl<T> Foo for T {
    default fn foo(&self) {
        println!("generic");
    }
}

// The compiler refuses this specialization: it is not always applicable
impl<T: 'static> Foo for T {
    fn foo(&self) {
        println!("specialized");
    }
}

trait SomeTrait {}
impl SomeTrait for i32 {}
impl SomeTrait for &'static str {}

// The compiler refuses this specialization: it is not always applicable
impl<T: SomeTrait> Foo for T {
    fn foo(&self) {
        println!("specialized");
    }
}

// The compiler **accepts** this specialization, because `specialize(T: SomeTrait)`
// filters the applicable impls to only the "always applicable" ones.
impl<T> Foo for T
    where specialize(T: SomeTrait)
{
    fn foo(&self) {
        println!("specialized");
    }
}

fn main() {
    true.foo(); // prints "generic"
    0i32.foo(); // prints "specialized"
    "hello".foo(); // prints "generic", because the `&'static str` impl for `SomeTrait` is ignored
}

Interestingly, this design is almost latent in the original RFC! The new mechanism here, though, is an explicit filtering of impls via specialize. This explicit filtering is helpful not just for soundness, but to remind the programmer that specialization is not considering all impls, but rather a filtered set.

Moreover, it should be possible both within the type checker and in trans to detect cases where the “naive” (unfiltered) specialization algorithm would have produced a different result, and produce a warning in such cases.

I believe that this approach is as “obviously” sound as Niko’s proposal; it could even be understood as a kind of sugar over his proposal. And given our experiences in this area, I’ve long believed that the only acceptable solution would have to be “obviously” sound: no clever tricks. The specialize modality has a very natural interpretation in Chalk, where we are already juggling other modalities related to crate-local reasoning.

Finally, it’s worth saying that the particular mechanism here is orthogonal to the many other design questions around specialization, including things like “intersection impls”, as well as the other extensions mentioned in Niko’s previous post.

While I’m doubtful that specialization will make it for the Rust 2018 release, I think that with luck it could stabilize this year.

http://aturon.github.io/tech/2018/04/05/sound-specialization

Custom tasks in Cargo

Apr 5, 2018 Updated Apr 5, 2018

One of the big requests from the Domain Working Groups for Rust 2018 is a richer feature set for framework- or domain-specific workflows in Cargo. At the simplest level, that might look like project templates – the ability to direct cargo new to start with a custom template defined in crates.io. That’s already enough to get you cooking with frameworks like QuiCLI, which today involve a fixed set of initial scaffolding that you can fill in.

More ambitiously, though, working within a particular framework or domain may require special workflows after initial project creation. For example, a web framework might want to provide workflows for making database changes or adding new resources.

At the Rust All Hands in Berlin last week, the Cargo team and other stakeholders talked about these desires and cooked up a simple but compelling plan to address them.

Cargo tasks

The core idea is extremely simple. We add a [tasks] section to Cargo.toml, with entries resembling normal dependencies. However, the binaries provided by those packages are automatically available from the Cargo CLI via the task subcommand.

Suppose for example that we have the following in Cargo.toml:

[tasks]
rust-on-rails = "0.1"

If the rust-on-rails crate provides server and console bins, then you’d be able to type:

> cargo task server
> cargo task console

at the CLI to invoke those binaries.

Ultimately, we may want to avoid the need for writing task, but this raises questions about conflicts with built-in and installed custom commands that we didn’t want to get into.

Anyway… that’s it! A very simple but powerful idea.

Metapackages

In subsequently discussing these ideas with @wycats, he (as always) raised a very astute point: in some package managers, the existence of project templates has made it easy to set up leaky abstractions. For example, if we do have a rust-on-rails crate, it would probably provide a Cargo template that would include several sections of Cargo.toml – at the very least, both [dependencies] and [tasks]. But that’s not really what we want; conceptually, these are all part of the same framework, and should be versioned together, requiring only a single entry to bring into your project.

Incidentally, the same is already true of things like custom derives and build scripts, where to use what is conceptually a single package requires multiple bits of setup.

A while back I proposed metapackages as a way of grouping and versioning a set of dependencies. But in my chat with @wycats, we had the insight that metapackages could more generally be a way of abstracting a chunk of Cargo.toml, including not just normal dependencies, but also tasks, build scripts, and more.

In this brave new world, a single dependency entry in Cargo.toml is generally all that is ever needed to bring in a conceptual package.

Open question: what might this mean for things like metabuild?

Today’s custom subcommands?

One open question: if we provide [tasks], how should we think about today’s custom subcommands (generally set up via cargo install)?

One possibility would be to allow for a [tasks] section in .cargo/config, basically using the same mechanism for all workflow customization. But this raises questions about conflicting names, global lockfiles, and more. More thought and design is needed.

Prior art?

Before finalizing any design here, we should do a survey of existing package managers, many of which offer similar functionality and have learned painful lessons.

The plan

The most immediate step along these lines is to write and implement an RFC for Cargo templates, which @withoutboats plans to do.

After that, I’m hoping to pair up with @ag_dubs to dig into the ideas in this post and put together an RFC. In the meantime, though, please let me know if you have thoughts or pointers to prior art!

http://aturon.github.io/tech/2018/04/05/workflows

Putting bors on a PIP

Mar 19, 2018 Updated Mar 19, 2018

We have a problem: the average queue of ready-to-test PRs to the main Rust repo has been steadily growing for a year. And at the same time, the likelihood of merge conflicts is also growing, as we include more submodules and Cargo dependencies that require updates to Cargo.lock.

This problem could threaten our ability to produce Rust 2018 on time, because as the year progresses, there will be an increasing amount of contention for the PR queue. My goal in this post is to avoid this outcome, without reliving the Rust 1.0-era experience of Alex working full time to land massive rollups.

In particular, the Rust All Hands is coming up next week, and I think it’s a great opportunity to dive into these issues, so after chatting for a while with Alex I wanted to set out some ideas.

Goals and problems

There are two major bors experiences we care about:

Small PRs from early contributors. We want these to land very quickly to provide a good contribution experience.
Major PRs. We want to avoid requiring lots of rebasing, or having too long a delay before landing.

Let’s say “bors time” is the amount of time a PR is in mergeable and r+’ed state but not yet landed. Quantitatively, the above goals probably map to:

Low average bors time
Low maximum bors time
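As a small illustration of the two metrics, the sketch below computes both from a list of per-PR bors times. The function name `bors_time_stats` and the choice of hours as the unit are assumptions for the example. The numbers in the usage note are made up and not measured data.

```rust
/// Given the bors time (in hours) of each landed PR, return
/// `(average, maximum)`, or `None` if there are no PRs.
fn bors_time_stats(hours: &[f64]) -> Option<(f64, f64)> {
    if hours.is_empty() {
        return None;
    }
    let total: f64 = hours.iter().sum();
    let max = hours.iter().cloned().fold(f64::MIN, f64::max);
    Some((total / hours.len() as f64, max))
}
```

For instance, `bors_time_stats(&[2.0, 4.0, 30.0])` yields an average of 12 hours and a maximum of 30. Landing more small PRs quickly pulls the average down while the one long-waiting PR keeps the maximum high, so the two goals can move independently.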

Rollups today

Today, rollups generally group together a large number of small PRs; we then attempt to land those rollups aggressively. The result is improved average bors time, but often at the cost of worsening maximum bors time. That’s because of a few factors:

Rollups generally prioritize small PRs over old PRs. That is, the “normal” order for bors is to attempt the oldest ready-to-test PR. But when doing a rollup, we cherry pick throughout the queue, and then give maximum priority to the rollup.
Rollups generally bounce on the first couple of attempts, which is “wasted” time during which an older PR might have been able to merge.

In addition, rollups tend to cause a need for rebasing, which for major PRs introduces significant extra latency: the author has to get around to doing a rebase, then get back in the queue, with a non-trivial chance of being pre-empted by another rollup that requires further rebasing.

Overall queue length

The steady state of the queue has been worsening over time. The average queue length has roughly quadrupled over the last year, from 5 ready PRs to 20.

The longer the queue, the worse the situation is for major PRs, because the effects of rollups and rebase requirements are multiplied by the standing queue length.

Some ideas

I believe that, because of rollups, our current average bors time remains tolerable. But our maximum bors time has gotten quite bad as the steady-state queue size has continued to grow. We need to rebalance.

Reduce absolute cycle time

The most direct action is to reduce absolute cycle time by improving the build system. That will help across the board, and there is usually low-hanging fruit to be had on whatever our current-slowest build scenarios are.

I’m not qualified to say much more here, but I’m hopeful that, during the All Hands, the Infra Team can work together to come up with plans or guidance on this front.

Reduce failures

Failed builds are often very expensive, since failures often occur late in the build cycle, meaning that we lose ~2.5 hours of serialized work.

Spurious failures

Currently, spurious failures largely come down to timeouts; reducing absolute cycle time will help.

More radically, we could consider storing artifacts at each build stage, allowing us to retry a build without going through the cycle from scratch. But that would amount to a complete reworking of the build system, and probably isn’t plausible for the Rust 2018 timeline.

Legit failures

But there are also “legit” failures — and there’s potentially a lot we could do to help there. We effectively have a two-stage CI system today:

Stage 1: automatic PR testing. Today this is a single Travis build, hence a small slice of our overall test suite. Stage 1 testing is almost always complete by the time a reviewer looks at a PR.
Stage 2: serialized, full PR testing via bors.

Stage 1 testing is parallelizable and masks queue length because it’s generally dominated by the time it takes to get a review. We can decrease the likelihood of legit failures by testing more build scenarios in stage 1. For example, we could gate stage 1 on Windows as well as Linux. And we could include more of the test suite in stage 1. Generally, we have some amount of free capacity to work with here, and we can always throw in additional builders to get more.

We should also consider strongly gating on stage 1 passing before ever attempting stage 2 testing on a PR.

More generally, are there ways we can better take advantage of the two-stage system we have today, and the way in which stage 1 is “masked” by review latency?

Being more strategic with rollups

Right now rollups generally gather together a number of “easy” PRs. However, this comes at the cost of “hard” PRs, because rollups skip them in the queue, thus forcing a rebase. And rollups themselves tend to bounce on the first few tries, essentially blocking the build queue for some period.

Here are some ways we could be smarter about rollups:

Prioritize a rollup PR as if it is as old as the oldest PR it contains. That is, in bors’s normal queue ordering, a rollup would not make it possible to “jump the queue”. Rather, it would be running at the same time as its first PR normally would, but we’d be trying to “get more” out of the run.
Make it possible to pre-test rollup PRs with a greater number of build scenarios, without blocking the queue. We could do this by expanding the set of stage 1 tests (mentioned above) in general, or by having a separate “try”-style build command for rollups that tests a larger subset, but still much smaller than the full build (i.e. “Stage 1.5”). We should make it possible to aggressively run this larger suite of tests on a rollup build, before going to the full test suite. The core idea is that a rollup should ~never bounce.
Fix test failures within a rollup, rather than removing PRs from the rollup. This can be done by pushing changes back to the original PR branches. Basically, once we’ve invested in including a PR in a rollup, we should “see it through”.
Consider including “bigger” PRs in a rollup, and seeing them through as above.
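The first idea in the list, ordering a rollup by the age of its oldest member, can be sketched as follows. This is not how bors is implemented. The `Entry` type, the timestamp representation (a plain `u64` for when a PR became ready), and `order_queue` are all hypothetical.

```rust
/// A queue entry: either a single PR or a rollup of several.
/// Each PR is identified by number and the time it became ready to test.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Single { pr: u32, ready_at: u64 },
    Rollup { prs: Vec<(u32, u64)> },
}

impl Entry {
    /// A rollup counts as being as old as the oldest PR it contains,
    /// so it takes that PR's place in line rather than jumping ahead.
    fn effective_ready_at(&self) -> u64 {
        match self {
            Entry::Single { ready_at, .. } => *ready_at,
            Entry::Rollup { prs } => prs
                .iter()
                .map(|&(_, ready_at)| ready_at)
                .min()
                .unwrap_or(u64::MAX), // an empty rollup goes last
        }
    }
}

/// Oldest-first ordering with no special priority for rollups.
fn order_queue(mut queue: Vec<Entry>) -> Vec<Entry> {
    queue.sort_by_key(|entry| entry.effective_ready_at());
    queue
}
```

Under this ordering a rollup only runs first if it contains the PR that would have run first anyway, so older major PRs are never pre-empted by a freshly assembled rollup.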

Data to gather

There’s a bunch of data that would be great to have before the All Hands to help guide discussion:

How often are we making rollups?
How many PRs are included in the average rollup?
How often do rollups bounce?
- What % spurious?
How often do non-rollups bounce?
- What % spurious?
For non-spurious failures, what are the major cases we’re missing in stage 1 that we catch in stage 2? E.g., Windows? A particular part of the test suite?

What else?

Are there other near term steps we could take to head off problems with the queue length? (While long-term improvements are important too, the immediate risk is that we will struggle to produce Rust 2018 over the next few months).

http://aturon.github.io/tech/2018/03/19/bors

Futures 0.2 is nearing release

Feb 27, 2018 Updated Feb 27, 2018

On behalf of the futures-rs team, I’m very happy to announce that the master branch is now at 0.2: we have a release candidate! Barring any surprises, we expect to publish to crates.io in the next week or two.

You can peruse the 0.2 API via the hosted crate docs, or dive right in to the master branch. Note that Tokio is not currently compatible with Futures 0.2; see below for more detail.

What’s Futures 0.2 about?

The Futures 0.2 release is all about putting us on a road toward 1.0 this year. To that end, it:

Makes numerous long-desired API improvements, many of which are breaking changes.
Positions the crate for significant iteration this year by (temporarily!) breaking it into a number of independently-versioned subcrates.

The full details are in the three RFCs, but we’ll review the high level changes here.

API improvement: explicit task contexts

The heart of the futures library is its task system. But historically, that task system was almost invisible: information about the current task in 0.1 was provided via implicit context (a thread-local variable):

// The 0.1 API for task contexts:
fn current() -> Task;

While this implicit context had some ergonomic benefits, it was also a major stumbling block for learning futures, and meant that you needed to carefully read documentation to know whether a given function could only be used “in a task context”.

In Futures 0.2, we instead deal with task contexts via an explicit argument:

// Futures in 0.2
trait Future {
    type Item;
    type Error;

    fn poll(&mut self, cx: &mut task::Context) -> Poll<Self::Item, Self::Error>;
}

While we believe the ergonomic hit here is minor, we were also encouraged by an ingenious construction from @seanmonstar showing how to recover the ergonomics of the 0.1 API.
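
To make the shape of the change concrete, here is a toy model of explicit context passing. It is only a sketch: `Context`, `Poll`, `Future`, `Countdown`, and `run` below are simplified stand-ins defined in the example (with no `Error` type and no real wakeups), not the futures 0.2 types.

```rust
// Toy stand-ins, NOT the real futures 0.2 types.
struct Context {
    polls: u32, // hypothetical bookkeeping carried by the context
}

enum Poll<T> {
    Ready(T),
    Pending,
}

trait Future {
    type Item;
    // The context is an ordinary argument, so the signature itself
    // tells you that a task context is required.
    fn poll(&mut self, cx: &mut Context) -> Poll<Self::Item>;
}

/// Becomes ready after being polled `self.0` extra times.
struct Countdown(u32);

impl Future for Countdown {
    type Item = &'static str;

    fn poll(&mut self, cx: &mut Context) -> Poll<Self::Item> {
        cx.polls += 1;
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}

/// Busy-polls a future to completion; returns its value and the poll count.
fn run<F: Future>(mut fut: F) -> (F::Item, u32) {
    let mut cx = Context { polls: 0 };
    loop {
        if let Poll::Ready(value) = fut.poll(&mut cx) {
            return (value, cx.polls);
        }
    }
}

fn main() {
    let (value, polls) = run(Countdown(2));
    println!("{value} after {polls} polls");
}
```

A real executor would park until woken rather than spin, but the point here is only that nothing is looked up from a thread-local.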

As a happy by-product, the APIs for working with task-local data are now substantially more pleasant, giving direct mutable access to the stored data:

// The 0.2 API for working with task-local data
impl<T> LocalKey<T> {
    fn get_mut<'a>(&'static self, cx: &'a mut Context) -> &'a mut T
}
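
One way to picture why an explicit context gives direct mutable access: if the context owns the task's storage, a `&mut` borrow of the context can hand out a `&mut` to the data. The sketch below keys storage by type instead of by a static `LocalKey`, and `Context`, `local_mut`, `RequestCount`, and `record_request` are hypothetical names invented for the example.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Toy context that owns per-task storage, keyed by the stored type.
#[derive(Default)]
struct Context {
    locals: HashMap<TypeId, Box<dyn Any>>,
}

impl Context {
    /// Mutable access to this task's value of type `T`,
    /// created with `T::default()` on first use.
    fn local_mut<T: Any + Default>(&mut self) -> &mut T {
        self.locals
            .entry(TypeId::of::<T>())
            .or_insert_with(|| Box::new(T::default()) as Box<dyn Any>)
            .downcast_mut::<T>()
            .expect("entry was inserted with this exact type")
    }
}

#[derive(Default)]
struct RequestCount(u32);

/// Bumps and returns the per-task request counter.
fn record_request(cx: &mut Context) -> u32 {
    let count = cx.local_mut::<RequestCount>();
    count.0 += 1;
    count.0
}

fn main() {
    let mut task_a = Context::default();
    let mut task_b = Context::default();
    record_request(&mut task_a);
    println!("a = {}", record_request(&mut task_a));
    println!("b = {}", record_request(&mut task_b));
}
```

Because the borrow checker ties the returned reference to the `&mut Context`, no closure or `RefCell` is needed to mutate the stored value.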

API improvement: overhauled executors

Executors in 0.2 are vastly simplified compared to 0.1, while supporting a wider range of functionality.

First, we codify that futures are always run in the context of an executor on which they can spawn additional tasks:

// An API on `task::Context`:
impl Context {
    fn spawn<F>(&mut self, f: F) where
        F: Future<Item = (), Error = Never> + 'static + Send;
}

Baking in an executor as part of all task contexts makes it much easier to coordinate execution choices.
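
A small sketch of what "spawning through the context" means in practice. To keep it short, tasks here are boxed closures rather than futures, and `Task`, `Context`, and `run_to_completion` are hypothetical names; the only idea taken from the text is that every running task can reach its executor via the context.

```rust
use std::collections::VecDeque;

// A toy task: a boxed closure that runs once and receives the context.
type Task = Box<dyn FnOnce(&mut Context)>;

struct Context {
    queue: VecDeque<Task>, // the executor's run queue
    log: Vec<String>,      // records what ran, for demonstration
}

impl Context {
    /// Any running task can schedule more work on the same executor.
    fn spawn<F>(&mut self, f: F)
    where
        F: FnOnce(&mut Context) + 'static,
    {
        self.queue.push_back(Box::new(f));
    }
}

/// Runs the root task and everything it (transitively) spawns.
fn run_to_completion<F>(root: F) -> Vec<String>
where
    F: FnOnce(&mut Context) + 'static,
{
    let mut cx = Context {
        queue: VecDeque::new(),
        log: Vec::new(),
    };
    cx.spawn(root);
    while let Some(task) = cx.queue.pop_front() {
        task(&mut cx);
    }
    cx.log
}

fn main() {
    let log = run_to_completion(|cx| {
        cx.log.push("parent".to_string());
        cx.spawn(|cx| cx.log.push("child".to_string()));
    });
    println!("{:?}", log);
}
```

The child never names an executor; it inherits whatever the parent is running on, which is the coordination benefit described above.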

The “out of the box” ways of executing futures change as well:

The new ThreadPool executor replaces CpuPool as a general purpose task executor, and provides a streamlined set of APIs for getting things running. It provides “M:N” task scheduling.
The new LocalPool executor provides single-threaded (“M:1”) task scheduling, which is appropriate for mostly I/O-bound tasks. Since it is single threaded, it supports non-Send tasks. This executor is ultimately intended to replace the old built-in executor in Tokio.
The wait methods, which block on futures (and friends), have been replaced with a new top-level block_on function designed to be harder to misuse.

A common theme with the built-in executors is removing footguns from the previous design, either by detecting problematic situations and panicking, or by structuring APIs in a more natural and intuitive way.

Finally, there are a host of simplifications to the way you implement executors. The numerous traits and types of 0.1 now boil down to just two key constructs: the Wake trait, which itself has been simplified, and the Context type, which is very simple to construct, and is all you need to execute a task.
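
For a feel of how little is needed, here is a minimal `block_on` written against today's standard library (`std::task::Wake`, `Waker`, and `Context`), which is a later design than the 0.2 traits discussed in this post. `ThreadWaker` is a name made up for the example; treat this as a sketch, not the futures crate's implementation.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes a task by unparking the thread that is blocked on it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Runs a single future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until `wake` unparks us, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let answer = block_on(async { 6 * 7 });
    println!("{answer}");
}
```

The whole executor is one trait impl and one context: a way to wake, and a value to pass into `poll`.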

API improvement: core I/O interfaces

The Futures crate will now ship with an async equivalent to std::io, namely AsyncRead and AsyncWrite traits and numerous conveniences for working with them.

These traits previously lived in the tokio-io crate, but they are in no way specific to Tokio as a backing source of I/O. This new setup provides all the core I/O interfaces at the futures level, with the intent that libraries can use them to be event loop agnostic. (Note, however, that codec support will remain in Tokio).

The traits are also updated in several ways:

They no longer inherit from Read and Write, eliminating a major source of confusion; instead, there are specific adapters that allow you to pass async I/O objects into sync APIs.
The vectored I/O operations are now based on the more foundational iovec library, which allows AsyncRead and AsyncWrite to be object safe, and to decouple from more opinionated buffering stories. Use of e.g. the bytes crate can be layered on top.

API improvements: top-to-bottom cleanup

In addition to the highlights above, a whole host of APIs received minor tweaks, including renamings, generalizations, adjustments for consistency, and so on. With the 0.2 release, we’re clearing out a long backlog of such requests.

The API documentation has also been completely reworked.

Supporting further design iteration

While we’re making a bunch of improvements in this release, there are still some known issues and places where changes are expected (see below for some more detail). Our goal this year is to iterate the crate to a 1.0 state, but we want to minimize ecosystem pain while doing so.

Starting with 0.2, the main futures crate is now a facade that simply re-exports from a number of separate crates. This allows us to decouple the key public APIs–Future, Stream, and the task system–from the myriad other APIs that work with them, versioning them independently. These core APIs are provided by the futures-core crate.

The upshot is that most of the async ecosystem can happily interoperate as long as they agree on a futures-core version; the rest of the futures APIs can usually be used with independent versions without harm. Since futures-core also contains the most stable of the futures APIs, we expect this to cut down on ecosystem coordination pain as we continue to iterate on the peripheral APIs.

To take advantage of this split, libraries are encouraged to use the futures-* crates directly, rather than the facade.
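
The facade arrangement can be sketched inside a single file, with modules standing in for crates. The names `core_apis`, `util_apis`, `facade`, `Stream`, and `Counter` are hypothetical and much simpler than the real futures-* crates.

```rust
// Stand-in for a small, stable core crate.
mod core_apis {
    pub trait Stream {
        type Item;
        fn next_item(&mut self) -> Option<Self::Item>;
    }
}

// Stand-in for a faster-moving utilities crate that depends only on the core.
mod util_apis {
    use super::core_apis::Stream;

    pub struct Counter {
        pub next: u32,
        pub end: u32,
    }

    impl Stream for Counter {
        type Item = u32;
        fn next_item(&mut self) -> Option<u32> {
            if self.next < self.end {
                let current = self.next;
                self.next += 1;
                Some(current)
            } else {
                None
            }
        }
    }
}

// The facade defines nothing itself; it only re-exports.
mod facade {
    pub use super::core_apis::Stream;
    pub use super::util_apis::Counter;
}

// A "library" written directly against the core.
fn sum_all<S: core_apis::Stream<Item = u32>>(mut s: S) -> u32 {
    let mut total = 0;
    while let Some(x) = s.next_item() {
        total += x;
    }
    total
}

// An "application" written against the facade; it names the very same trait.
fn first_item<S: facade::Stream>(s: &mut S) -> Option<S::Item> {
    s.next_item()
}

fn main() {
    let mut counter = facade::Counter { next: 1, end: 4 };
    println!("first = {:?}", first_item(&mut counter));
    println!("rest sums to {}", sum_all(counter));
}
```

Because a re-export is the same item under another path, code using the facade and code using the core directly interoperate without any conversion.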

Ultimately, when we reach 1.0, the expectation is that all of these APIs will be re-incorporated into a single futures crate, and the facade will be no more.

More detail about this split is available in the RFC.

When will it be published?

TL;DR: most likely within a couple of weeks.

While we’ve been discussing and vetting the 0.2 changes publicly for some time, it’s important to get some real usage prior to publication. We’ve made substantial progress porting parts of Fuchsia to the new release, and expect to have a complete port soon. We will also be coordinating with the Tokio team, which intends to release a 0.2 to integrate with Futures 0.2.

If you are a Futures user, you are strongly encouraged to look at the docs and, if possible, try porting some code. Please open issues or reach out on #futures if you run into problems!

What’s the road to 1.0?

Concurrent with the release of Futures 0.2, we plan to release an updated version of futures-await that provides async/await notation with full borrowing support, due to @withoutboats’s great work in that area.

Beyond 0.2, there are several areas where further iteration is needed:

The initial support for borrowing with async/await will depend on unstable features, and thus will be provided by an external “shim” so that the core futures crate can continue to work on stable Rust. Once the ingredients are stabilized, we will need to update futures-core to remove the shim.
Borrowing support will also entail changes to combinators and possibly core traits (particularly Sink); we will need to work through the full set of ramifications.
We plan to investigate removing Error from Future and friends, which could clear up some longstanding issues with the combinators.
We plan to hone our backpressure story with Sink and bounded channels.

Changes in these areas will go through the RFC process.

Note that some of these changes affect futures-core, meaning that there’s likely to be at least one more disruptive bump before we hit 1.0.

We are also working on a book, Asynchronous Programming in Rust, that will provide comprehensive explanations of the library and how to use it, including exercises and case studies.

How to get involved

The Futures team wants to grow, and as part of 0.2 we’ve been pushing toward Rust-style governance to make it easier to get involved.

At the moment, the most valuable help is review of the 0.2 release candidate. You can report feedback via issues or the #futures IRC channel.

If you’re interested in any of the topics listed above for post-0.2 iteration, or otherwise see areas to improve, please reach out on the tracker or channel, or consider writing an RFC!

http://aturon.github.io/tech/2018/02/27/futures-0-2-RC

Closing out an incredible week in Rust

Feb 9, 2018 Updated Feb 9, 2018

This week has been so amazing that I just had to write about it. Here’s a quick list of some of what went down in one week:

Breakthrough #1: @withoutboats and @eddyb tag-teamed to develop a safe, library-based foundation for borrowing in async blocks. It’s suddenly seeming plausible to ship async/await notation with borrowing as part of Rust Epoch 2018.
Breakthrough #2: @nikomatsakis had a eureka moment and figured out a path to make specialization sound, while still supporting its most important use cases (blog post forthcoming!). Again, this suddenly puts specialization on the map for Rust Epoch 2018. Update: the post is here!
Breakthrough #3: @seanmonstar came up with a brilliant way to make “context arguments” more ergonomic, which lets us make a long-desired change to the futures crate without regressing ergonomics.
Tokio reform: @carllerche shipped the newly reformed Tokio crate, with a plan for intercepting ongoing work with futures and laying a more stable foundation for async I/O in 2018.
Futures 0.2: @cramertj, @alexcrichton and I have completed and merged an RFC for futures 0.2, and the 0.2 branch made a ton of progress.
Domain working groups: we now have an all-star lineup for leading the 2018 Domain Working Groups:
- Networking services: @withoutboats and @cramertj
- WebAssembly: @fitzgen
- CLI apps: @killercup
- Embedded: @japaric
Libs Team restructuring: we finalized a revamp of the Libs Team, which will break out:
- a subgroup to manage std led by @alexcrichton,
- a subgroup working on discoverability led by myself, and
- a subgroup supporting ecosystem work led by @kodraus
A vision for portability in Rust: I finally wrote up the vision we’ve been working toward for a uniform way of handling portability concerns in Rust.

These are just the items that loomed large for me personally; one of the great things about how Rust has grown last year is that it has taken on an increasing set of leaders and teams doing great work independently. It’s now simply impossible to drink from the full firehose. But even a sip from the firehose, like the list above, can blow you away.

http://aturon.github.io/tech/2018/02/09/amazing-week

A vision for portability in Rust

Feb 6, 2018 Updated Feb 6, 2018

TL;DR: This post proposes to deprecate the std facade, instead having a unified std that uses target- and capability-based cfgs to control API availability. Leave comments on internals!

Portability is extremely important for Rust, in two distinct (and sometimes competing!) ways:

Rust should be usable in almost any environment, and ideally much of the ecosystem would be as well.
Rust should be low-friction when writing for “mainstream” platforms (32- and 64-bit machines running Windows, Linux, or macOS).

An example of the tension between these two goals is handling allocation:

Some targets for Rust do not support allocation natively, so Rust must at least have a “mode” in which no allocation is assumed.
For “mainstream” applications and platforms, we want to assume not only that allocation is available, but that running out of memory is a catastrophic failure. Those assumptions are reasonable for a huge amount of software, and making them greatly reduces the friction to writing Rust code.

We’ve been slowly evolving a set of answers to this kind of question, and part of the point of this blog post is to step back and try to give a unifying vision for how to approach portability issues in Rust.

But first, let’s take stock of where we are today.

The status quo

The facade

Rust’s standard library is actually made up of three “rings” of increasing assumptions:

core: assume “nothing” about the target platform.
alloc: assume that allocation is available.
std: assume that “mainstream” OS facilities are available.

In particular, std is partly a “facade” crate that re-exports almost all of the functionality from core and alloc. This factoring allows crates that target core to be seamlessly used with crates that target std, and led to the no_std flag. So far, only core and std are stable.
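
A quick way to see the re-export relationship: the same item is reachable through both `core::` and `std::` paths. In the sketch below, `checksum` and `describe` are hypothetical functions. A real `no_std` library would also put `#![no_std]` at its crate root; that attribute is left out here only so the example stays a runnable program.

```rust
// Written only against `core`: no allocation, no OS services.
fn checksum(bytes: &[u8]) -> core::option::Option<u8> {
    let mut iter = bytes.iter();
    let first = *iter.next()?;
    core::option::Option::Some(iter.fold(first, |acc, b| acc ^ b))
}

// Written against `std`: it allocates a `String`.
fn describe(bytes: &[u8]) -> String {
    // `std::option::Option` is a re-export of `core::option::Option`,
    // so the value passes between the two "worlds" unchanged.
    let result: std::option::Option<u8> = checksum(bytes);
    match result {
        Some(sum) => format!("checksum = {sum:#04x}"),
        None => String::from("empty input"),
    }
}

fn main() {
    println!("{}", describe(&[0x0f, 0xf0]));
    println!("{}", describe(&[]));
}
```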

Problems with the facade

While the three-layer division may seem very clean, in practice things turn out to be far more complicated:

core does not in fact assume “nothing”: some core types like i128 and AtomicU8 are available within core, but not available on all platforms Rust targets. Thus, on some of these platforms, these definitions are simply missing (i.e. have cfg applied).
For non-mainstream OSes, often only a portion of std functionality is available. The remaining pieces are either cfg-ed out, return errors, or panic if you try to use them.
Because the crates are separated, there are some trait coherence issues, which std uses special magic to overcome.
Libraries have to specifically opt in to no_std and rewrite to use core rather than std. While it’s relatively rare for a library to just happen to be no_std compatible, it’s still a bit of a papercut.

The root issue here is that the three-layer arrangement is based on a particular division of environment capabilities, and that reality is not so simple.
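
The atomics case shows what fine-grained, capability-based availability looks like from the user's side. This sketch uses the `target_has_atomic` cfg (stable in current Rust, well after this post was written) to pick an implementation; the `counter` module, `Counter`, and `LOCK_FREE` are hypothetical names.

```rust
// Chosen when the target has native 64-bit atomics.
#[cfg(target_has_atomic = "64")]
mod counter {
    use std::sync::atomic::{AtomicU64, Ordering};

    pub struct Counter(AtomicU64);

    impl Counter {
        pub fn new() -> Self {
            Counter(AtomicU64::new(0))
        }
        /// Increments and returns the new value.
        pub fn bump(&self) -> u64 {
            self.0.fetch_add(1, Ordering::Relaxed) + 1
        }
    }

    pub const LOCK_FREE: bool = true;
}

// Fallback for targets where `AtomicU64` is cfg-ed out.
#[cfg(not(target_has_atomic = "64"))]
mod counter {
    use std::sync::Mutex;

    pub struct Counter(Mutex<u64>);

    impl Counter {
        pub fn new() -> Self {
            Counter(Mutex::new(0))
        }
        /// Increments and returns the new value.
        pub fn bump(&self) -> u64 {
            let mut guard = self.0.lock().unwrap();
            *guard += 1;
            *guard
        }
    }

    pub const LOCK_FREE: bool = false;
}

fn main() {
    let c = counter::Counter::new();
    c.bump();
    println!("count = {}, lock-free = {}", c.bump(), counter::LOCK_FREE);
}
```

Note that the split follows a CPU capability, not one of the three facade layers, which is exactly the mismatch described above.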

Environment-specific extensions

Today we provide access to low-level or OS-specific services via the std::os module. APIs in this module are largely traits that extend the cross-platform APIs, and in particular can expose their OS-level representation. The fact that these APIs require explicitly importing from std::os provides a small “speed bump” for venturing out of guaranteed mainstream platform portability.
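
For reference, this is roughly what the extension-trait pattern looks like in use. `std::os::unix::fs::PermissionsExt` and its `mode` method are real std APIs; the wrapper function `unix_mode` and its non-Unix fallback are assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the Unix permission bits of `path`, or `None` on non-Unix targets.
#[cfg(unix)]
fn unix_mode(path: &Path) -> io::Result<Option<u32>> {
    // The explicit import is the "speed bump": without it, `mode` is not in scope.
    use std::os::unix::fs::PermissionsExt;

    let perms = fs::metadata(path)?.permissions();
    Ok(Some(perms.mode() & 0o777))
}

#[cfg(not(unix))]
fn unix_mode(path: &Path) -> io::Result<Option<u32>> {
    fs::metadata(path)?; // still report missing files as errors
    Ok(None)
}

fn main() -> io::Result<()> {
    match unix_mode(Path::new("."))? {
        Some(mode) => println!("mode = {mode:o}"),
        None => println!("no Unix mode on this target"),
    }
    Ok(())
}
```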

Problems with environment-specific extensions

The std::os module has submodules that correspond to a hierarchy of OS types. But it’s not at all clear how to use the module hierarchy to organize features like fixed-size atomic types, where the types available vary in a fine-grained way based on the CPU family; SIMD is even worse. And even the OS story is ultimately not such a simple hierarchy.

The “speed bump” for using std::os is minimal and easy to miss; it’s just an import that looks the same as any other.
Platform-specific APIs don’t live in their “natural location”. The majority of std::os works through extension traits to enhance the functionality of standard primitives, rather than providing inherent methods directly on the relevant types.

The vision

Rather than today’s assortment of approaches to portability, I propose the following consolidated story:

There is just std.
All APIs in std live in their “natural” location.
APIs not supported by a target are cfg-ed off for that target.
There are capability-based cfg flags.
You can use the portability lint to check for compatibility with arbitrary platform assumptions.

In short, I propose that we move away from the facade, the std::os model, and runtime failure, and instead embrace target- and capability-based cfgs as the sole way of expressing portability information.
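
To illustrate "natural location" plus cfg, here is a sketch of a platform-specific API written as an inherent method that simply does not exist on unsupported targets. `Handle`, `open`, `is_valid`, `as_raw_fd`, and `describe` are hypothetical stand-ins, not a proposal for actual std signatures.

```rust
// Hypothetical stand-in for a std type that wraps an OS resource.
struct Handle {
    raw: i32,
}

impl Handle {
    fn open(raw: i32) -> Self {
        Handle { raw }
    }

    // Cross-platform API: always available.
    fn is_valid(&self) -> bool {
        self.raw >= 0
    }

    // Platform-specific API in its natural location: an inherent method,
    // cfg-ed off on targets that do not support it (no extension trait,
    // no special import).
    #[cfg(unix)]
    fn as_raw_fd(&self) -> i32 {
        self.raw
    }
}

// Callers that rely on the platform-specific API carry the same cfg.
#[cfg(unix)]
fn describe(h: &Handle) -> String {
    format!("fd {}", h.as_raw_fd())
}

#[cfg(not(unix))]
fn describe(h: &Handle) -> String {
    format!("valid: {}", h.is_valid())
}

fn main() {
    let h = Handle::open(3);
    println!("{} (valid: {})", describe(&h), h.is_valid());
}
```

On a target without the capability, a use of `as_raw_fd` outside a matching cfg fails to compile, rather than failing at runtime.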

The portability lint makes it possible to compile and test on one target while checking that you are not accidentally making assumptions based on that target. For example, by default Rust code will be checked for “mainstream” portability, so that even if you’re compiling on Windows, any use of a Windows-specific API will be linted against. If you want to be compatible with today’s “no_std” ecosystem, you can tune the knob to check that you are–but you won’t have to change from std to core. The RFC has full details.

To make this all work, we will need to give careful design to the set of cfg flags and their interrelations.

And to fully gain from abandoning the facade (i.e., to remove the special magic used in std today), we would need to use an epoch boundary to fully remove libcore.

As part of this effort:

std itself should likely be refactored to make maintaining the external cfg information as easy as possible, and to create a sharper division between public APIs and internal, platform-specific implementation.
We would need to reconceptualize “pluggability” into std. For example, today no_std allows you to define certain primitives, like panic handling, which are normally defined by std. We would need a way to instead swap out std’s default definition. Some related issues have come up in the wasm world, where ideally we would let you plug in your own JS imports to define things like printing to stdout.

Call to action

The vision above is deliberately sketchy. The fact of the matter is that the Rust project has never had a group of people tasked with thinking about portability and platform support from a holistic design perspective–and as we continue to expand Rust, we really need that.

In particular, we need help:

Implementing the portability lint.
Fleshing out a unified std design.
Designing a clean, coherent cfg hierarchy.
Refactoring std to make portability cleaner and easier.
Designing a more general “pluggability” story.
Ensuring that we provide top-notch support for platform capabilities.

I propose that the Rust project spin up a dedicated Portability Working Group devoted to this work. The group will need a strong leader who can take a holistic, design-focused view of things. If you’re interested in leading or participating in such a group, please leave a comment on the internals thread!

http://aturon.github.io/tech/2018/02/06/portability-vision

Retooling the Rust Libs Team for 2018

Jan 16, 2018 Updated Jan 16, 2018

The Libs Team met today to discuss a weighty topic: what is its mission as a team, and are we set up to achieve it?

As team lead, I took the liberty of proposing a mission statement:

To improve the quality of the crate ecosystem, as a product.

Working backwards:

“as a product” means that we need to focus on the end-to-end experience people have with the ecosystem. It’s not enough to have great libraries if no one can find them. It can be a problem to have too many libraries. Docs count for a lot!
“the crate ecosystem” means that the Libs Team needs to look far beyond std and help look after the library ecosystem as a whole. The Libz Blitz was one of our first major attempts on this front.
“improve the quality” means that we don’t own or oversee the ecosystem, but that we work together with library authors to improve the experience. What quality means, and what aspects to prioritize, is of course also important to nail down.

That’s a lofty goal! Let’s take a look at how we’ve approached it in the past, and then talk about the future.

Please comment on the internals post!

The Libs Team circa 2017

Last year, the Libs Team split its focus between two main topics:

Overseeing std.
The Libz Blitz.

For std, the work involved shepherding and deciding on RFCs and jointly reviewing PRs that impact the stable API surface. Despite the fact that std is not substantially growing, the workload here is sizable!

For the Blitz, the work involved leading crate evaluations, doing API walks in synchronous meetings, and working with crate authors to help push through changes. As by-products, the team also worked on the API guidelines and, to a lesser extent, the Cookbook.

These efforts definitely made a positive impact on our goals, but collectively the team feels that there’s more we could be doing, and that a rebalancing of priorities is in order–partly drawing on lessons from our 2017 work.

Retooling the team in 2018

Growing

One clear lesson from the Libz Blitz and the impl Period is that there are a lot of people out there who are excited to help improve Rust’s ecosystem, but we lack the infrastructure and leadership bandwidth to direct this energy effectively.

So the Libs Team needs to grow its leadership, and grow to accommodate people eager to pitch in. Today we announced two additions to the team, which is a good step.

However, a limiting factor is the current “monolithic” structure of the team, which means that every member is expected to participate in all activities, including signing off on std changes. To remove this bottleneck, we are considering a “working group” model, in which team members cluster into smaller working groups that tackle particular topics, where each member participates only in the groups they have time/interest for. Example groups might be: std, SIMD, networking, API guidelines, cookbook. To some degree these groups exist informally now, but we want to be more systematic about them, explicitly delegating decision-making power and designating a lead for each group.

The working group model should allow us to drastically increase the number of people involved in the team, while at the same time making us more agile by moving day-to-day decision-making to smaller, more focused groups.

We’re working with the Core Team to flesh out these ideas, in part because several other subteams are pursuing similar thoughts; expect an RFC on this topic soon!

Areas of focus

With the above changes, the Libs Team should be able to devote much more of its focus to the broader crates ecosystem, and not just std. But where should that focus go?

What follows are some preliminary thoughts, with the main goal of stirring up discussion.

Let’s go back to the question of “product quality” for the ecosystem. I’d break that down as follows:

Crate availability. Does there exist a crate for your needs?
Crate discoverability. Can you find that crate?
Crate quality. Is the crate good? How can you tell?
Crate interoperability. Does the crate fit well into the rest of the ecosystem?

Last year, the Libs Team’s focus was clearly crate quality. Now we want to retool to hit on all of these topics.

Availability

Where are the gaps in the ecosystem? That’s not just missing crates, but crates that are missing important features in their domain. In the past, the Libs Team has sometimes tried to look at availability issues by examining the entire ecosystem and comparing to ecosystems for other languages–an approach that’s never panned out.

I think instead we should spin up working groups devoted to particular topics/goals. For example, we could have a SIMD working group with the mandate to produce a stable SIMD API and the power to make decisions on related RFCs. But working groups could also be more broad, e.g. by bringing together people interested in “networking” in general. The theory is that these domain experts, by talking more regularly, can come to better understand the gaps and turn them into contribution opportunities. They can also, of course, work to improve the quality of the crates in their domain.

It’s important to note, though, that working groups should be spun up only when we have committed leadership for keeping up momentum and organization of the group. That comes back to team growth.

Discoverability

In the “distant” past (circa 2016), we floated ideas like the Rust Platform, that involved “blessing” crates and tools that would then, in some sense, be “shipped” as part of the Rust distribution. Part of the goal was to improve discoverability by officially curating these crates. But in discussion with the broader community, it became clear that this approach just has too many downsides; it takes the oxygen out of the room for crate iteration and competition, amongst other things.

Instead, in 2017, the crates.io team put a lot of work into improving discoverability within crates.io. The Libs Team also intended to turn the Cookbook into a central point of discoverability, but that work hasn’t fully panned out.

I don’t think the work here is finished. As I said in my #Rust 2018 post, I think this year we should focus on shipping a new iteration of Rust as a product, and that should include a more polished discoverability story. As such, I think we should have a working group dedicated purely to improving the process of finding and evaluating crates. (There are lots of specific ideas about further improvements, but those are out of scope for this post.)

Quality

The Libs Team put a lot of its focus in 2017 on crate quality. As KodrAus put it, this happened both strategically and tactically:

Strategically: by creating resources like the API guidelines, we started to give library authors much more guidance on how to create a high quality crate.
Tactically: through the Libz Blitz, we directly impacted the quality of specific crates.

Both of these efforts were shaped by the Libz Blitz, which purposefully targeted nearly-stable crates in an attempt to help clear up remaining design questions and polish toward a 1.0.

These kinds of quality improvements are one of the highest-leverage activities the Libs Team can take on, so we want to expand our efforts here. Some ideas and open questions include:

Supplementing the API guidelines with more “long form” material, e.g. by writing detailed “design evaluation” documents that explain all the design choices made in a particular crate.
Surfacing pockets of the ecosystem that lack uniformity, such as the current situation around -sys crates, and working to produce a set of consensus conventions.
Improving maintenance of vital crates (e.g. libc, rand, cc) by bringing on more contributors.
Doing deeper dives into particular domains that need more design work; dhardy took on such work with the rand crate, and there are several other areas that need more than a Blitz-style treatment to get to 1.0-level libraries.

I’m sure there are other avenues to explore, and I’d love to hear your ideas! It’ll also take some work to figure out how to map these to working groups we can plausibly staff.

Interoperability

One important aspect of looking at the ecosystem as a whole is making sure that crates work well together. For example, there’s currently an issue with error-chain that is preventing smooth interop with failure. The Libs Team should be working to surface and help solve this kind of issue. Probably this is best done by working toward another useful goal: building and documenting mid-sized sample applications that plug together various Rust libraries.

Cross-cutting concerns

Finally, a general point: to fully achieve its mission, the Libs Team needs to have much more contact with the ecosystem in general; the team should understand what libraries are becoming important in which areas, and spend time checking them out and helping contribute. There are a lot of ways we could do that, but most fundamentally this means bringing more folks working in particular sub-ecosystems into the Libs Team working groups. Thoughts on how we might structure such an effort are welcome, particularly from crate authors!

Wrapping up

This post was essentially a brain-dump of my current thinking about how to take the Libs Team to the next level. I’m eager to hear from you about the problems you see with the ecosystem, the ways you can envision the Libs Team helping, and best of all, the ways you’d like to be involved.

http://aturon.github.io/tech/2018/01/16/libs-mission

Rust in 2018: a people perspective

Jan 9, 2018 Updated Jan 9, 2018

The call for #Rust2018 blog posts has generated a fantastic set of responses so far, and there’s already an emerging consensus around much of the technical focus for the year. Since I largely agree with what others have said on that front, I want to focus my post on the people side of things: what kind of impact do we want to make on people, both contributors and customers, in 2018?

Tell our story with a new product

Rust, like major web browsers, ships a new version every six weeks. There are a ton of advantages to this rapid release process, but two major people-related downsides:

Contributor downsides. Part of the appeal of rapid-release is that “there’s always another train on the way”; there’s no need to sprint to land something in a given release, because another one will follow soon. But that also makes it hard to “rally the troops”, bringing together the whole community to drive a cohesive set of goals all the way to completion.
Customer downsides. There’s a kind of “frog boil” effect from rapid releases; Rust is always improving, a little bit at a time, so it can be hard to fully grasp the large shifts that accumulate, especially if you’re not following development closely.

In 2017, we merged an RFC introducing epochs. While the discussion focused on the technical mechanics, to me the thrust of the idea is to supplement our release cycle with periodic “product” releases. These are sometimes called “marketing releases”, but I think marketing is just one (important!) part of the story:

When thinking about Rust as a product, we shift our focus away from the details of particular features, and instead think about the end-to-end experience of Rust. How do people hear about Rust, and what draws them in? What are their first impressions of Rust? What are the major wins and pain points at every point from novice to expert? This product focus helps us prioritize our efforts on what will make the biggest impact on people using, or thinking about using, Rust.
Part of the end-to-end experience is coherence: making sure that the set of features available on stable work well together, without major gaps; that these features are well documented; that the features are fully supported by tools like IDEs; that the compiler provides top-notch error messages for the full set of features; that there is likewise a coherent set of libraries available in the ecosystem that works well with the set of currently-stable features.
Aiming for a major product release gives us an opportunity, as a community, to come together and do something big that goes well beyond the usual six week cycle. We’ve seen this effect leading up to the Rust 1.0 release, and also with the more recent impl period. A product release gives us focus and drive.
Finally, there’s the marketing aspect, which comes in several layers. Releasing “Rust 2018” gives us a chance to say to the world that “Rust has taken a major step since 1.0; it’s time to take another look”. That, of course, ties directly back to the end-to-end experience; we’ll want to have a polished web site, installation process, etc. For existing Rust users who are not deeply involved in Rust’s development, the product release gives a way to understand Rust’s evolution as an overarching narrative. We can explain how the features stabilized since the previous epoch all fit together to establish new idioms.

I believe that 2018 should be an “epoch year”, in which we focus on shipping a quality product, for all of the above reasons. That’s going to require a lot of discipline, and a steady stream of stabilizations, but I think we’re up to the challenge!

Empower new technical leaders

I’m really proud of the work we did in 2017 to grow Rust’s formal teams, including creating several new subteams and expanding all of the existing ones. But we’re still suffering from a deficit of technical leadership. That’s in part because technical leadership is a hard job with mostly intangible effects; it’s largely about enabling other people to do the on-the-ground technical work, by working to reach consensus on constraints and high-level design. It requires enormous empathy, being able to understand the goals of a wide variety of people and thread the needle between them. And it often doesn’t involve landing reams of code with your avatar attached to them.

One of my personal lessons from 2017 is that we need to Think Big when it comes to Rust’s teams. Rust is a staggeringly large project with a huge and talented community, and we need its leadership structure to fully reflect that if we are to reach our full potential.

Concretely, this might look like:

Continuing to grow and subdivide the teams, so that we have more people in total involved in leadership and decision-making, but each individual has a narrower focus relative to today. This echoes Niko’s points about impl period working groups, which was one such attempt. Splitting up the “tools” team into “dev tools”, “cargo”, and “infrastructure” in 2017 was another such example.
By the end of the year, having no single person leading more than one subteam. That would hopefully reflect a greater degree of importance and accountability around team leadership.
Improving the RFC process to make it more manageable for team members and the broader community alike. There are some strawman ideas on this front floating around, which will hopefully get written up soon.

In general, the point is that there’s potential for much greater parallelism within the Rust community than we have today, but to unlock that parallelism we need to grow our leadership capacity.

Engage corporations as users and sponsors

Rust’s adoption approach so far has been relatively “bottom up”: create a product with some strong potential business value, and focus largely on getting engineers “on the ground” to see that business value and work toward adopting it in their organization. Those engineers usually find small ways to use Rust, to prove it out in their organization, before getting more ambitious. It’s a great strategy.

However, as we seek to push further into larger, more conservative organizations, we need to supplement this bottom-up approach with a top-down one: make Rust appealing to CTOs. The primary way to do this is by making Rust look boring and safe as a technology choice. And we do that by showcasing our successes (we’ve commissioned some white papers to do this), being clear about where Rust makes sense and where it doesn’t, and projecting maturity, stability, and sustainability.

In 2018, we should take all of these efforts to the next level. We should have a polished web site that works for both engineers and CTOs, offering white papers and directing companies to sources of training, consulting, and support. And we should have a large and growing list of sponsors, like Mozilla, Buoyant, Tilde, and many others, who are funding Rust work in one way or another, either by paying their staff to contribute to Rust OSS, or by helping pay for project infrastructure, conferences, and the like.

Finally, we should do more to get production users engaged, at some level, in the RFC process. When we’ve talked to production users, the feeling is usually “we’re too busy writing Rust code and we trust you to get it right”. But the result is that RFC threads don’t present a picture fully inclusive of the production context.

Connect Rust’s global community

While we talk about “the Rust community”, the reality is that there are many Rust communities, separated by geography and by language. We need to do much more to connect these communities, again in terms of both contributors and customers. What are the important use-cases for Rust in India? What are the interests of volunteers in Brazil? What are the opportunities for Rust in China? How can we support each other, and communicate our respective values and needs?

Connecting these communities is not a small task, and I’m not sure what the right goal is for 2018. But I would love to see a greater awareness of and focus on this issue.

Serve intermediate Rustaceans

In 2017, we put a strong emphasis on early-stage productivity and learning curve, which has consistently been the top issue raised by the Rust survey. But last year, we also heard an increasing plea for more “intermediate” level materials, focused on people who have learned the basic mechanics of the language and have written some code, but are looking for help on how to be effective as a Rust programmer. How should you organize a library? An app? When should you use traits? What about trait objects versus generics? And what libraries and tools should you be highly familiar with?

We’ve made some strides on this front in 2017 with efforts like the API guidelines and the cookbook. But there’s still a lot more to do in terms of (1) surfacing crates and tools that “every working Rustacean should know” and (2) fleshing out more guidance for how to wield Rust effectively, after you understand the features it provides.

Treat each other with empathy

One of the most amazing things about Rust is that it brings together grizzled C++ hackers, Raspberry Pi hobbyists, JS pros, and more into a single community. Rust offers something to all of us, even though our goals, values, and interests sometimes diverge. Its ambition is to democratize robust, high performance code, to make systems programming better and more accessible to everyone.

The secret sauce has been a certain unwillingness to compromise: to find a way to take this diverse set of goals, backgrounds and contexts, and serve them all simultaneously by digging deep and thinking creatively. To keep that up, it’s vital that we don’t descend into tribalism or us-versus-them thinking, but instead respect each other’s constraints, and trust that our constraints will be heard and respected as well.

http://aturon.github.io/tech/2018/01/09/rust-2018

Revisiting Rust’s modules, part 2

Aug 2, 2017 Updated Aug 2, 2017

It’s been a week since my last post on Rust’s module system. Unsurprisingly, the strawman proposal in that post garnered a lot of commentary–174 comments in one week!–with sentiments ranging from

“Now this is a proposal I can get behind”

to

“I’ve rarely hated anything as much as I hate the module system proposal”

and everything in between :-)

The discussion has raised a number of very interesting points; thanks to everyone who has participated so far! I won’t try to give a comprehensive summary here. What I want to do instead is focus on one particular critique of the earlier proposal, and present a quite different strawman design that embraces a different set of priorities.

For ease of discussion:

I’ll call the strawman in my last post the “directories-as-modules” proposal.
I’ll call the strawman in this post the “use-universally” proposal.

A critique of the directories-as-modules proposal

There were a number of concerns about the directories-as-modules proposal (including its fairly radical nature), but the one that struck me was that the proposal was very heavily weighted toward a particular subset of the problems the original post raised, and didn’t help much with some of the others.

To recap briefly: the original post talked about obstacles both for learning the module system, and for using it at scale. It ultimately focused a lot on the issue of how much we have to employ pub use (aka the “facade pattern”) when setting things up today, and I think the proposal clearly streamlines that story. (There are also variants like “inline” aka “anonymous” modules that bring in just part of the proposal.)

On the other hand, the proposal didn’t do much to help with issues around “path confusion”:

The fact that use declarations work with absolute paths while other items do not is confusing, and even experienced Rust programmers (myself included) often confuse the two. To make matters worse, the top-level namespace contains all of the external crates, but also the contents of the current crate. Unless, of course, you’re writing an external test or binary. And finally, when you’re working at the top level, the absolute/relative distinction doesn’t matter, which means that you can have the wrong mental model and only find it when trying to expand out into submodules.

Many on the thread cited this as the core problematic issue with the module system; I’ve collected some data about confusion around Rust modules which also supports that to a degree.

My goal in this post is to float a quite different proposal that emphasizes these issues, de-emphasizes the facading issues, and overall is more conservative. Similarly to last time, the idea here is to present a coherent, plausible “spike” with ideas that could be useful, and seek feedback on the broad direction without getting too bogged down in the fine details.

One other bit of framing

Before giving the proposal, though, I want to record one other insight I’ve had along the way, in terms of where people sometimes go wrong when learning the module system.

Coming from other languages, there’s often an expectation that adding a .rs file to the source tree, or a dependency to Cargo.toml, should be all that’s needed to set up the naming hierarchy. From that perspective, you’d expect to be able to use use to pull items out of any of these. Instead, you sometimes can, but need to write the correct incantation (extern crate or mod) in the right place first. It requires a shift in mental model. And the fact that use is much more common than mod can make this all the more confusing.

@kornel put together a really great chart comparing module systems that makes this point quite strongly.

Part of the reason I’m labeling this proposal as “use-universally” is that it sets up use declarations as the only thing you need to write in your Rust source to bring items into scope. The items that are available, by contrast, are determined by Cargo (or another build system), together with your file system. This is one aspect that mirrors the earlier proposal, part of which is now an RFC.

The basic ingredients

Here’s a quick summary of the proposal:

Start with today’s module system.
Deprecate extern crate, along the lines of the in-progress RFC.
Deprecate mod foo; and instead determine module structure from the file system.
- However, unlike the previous proposal, this determination is the same as today, i.e. files are modules, and directories are used to introduce nested modules.
Improve use for greater clarity around paths, which I’ll explain below.
Modules are pub(crate) unless they are pub used (so pub mod foo; becomes pub use foo; – note that this is using relative paths, as I’ll explain next).

The meat is in making two adjustments for use declarations:

Introduce a from <crate_name> use <path>; form for importing items from external crates.
Change use <path>; to treat the path as relative to the current module (i.e. as if it started with self::).
- A leading :: takes you to the root of the current crate, but is not a way to reference items from other crates.

(Similar adjustments are needed for referencing paths in function signatures etc., which I’ll elide here.)
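For comparison, here is a minimal sketch of how relative and crate-rooted paths look in Rust as it exists today (2018 edition or later), rather than in the proposed syntax. The `from <crate_name> use <path>;` form is not valid Rust, so the external import below is an ordinary `use std::...`. The module names `errors`, `coherence` and `solve` echo the example later in this post; the function `goal_sizes` and all function bodies are made up purely for illustration.

```rust
// An ordinary import from an external crate. The proposal would spell this
// `from std use collections::HashMap;`, which is NOT valid Rust.
use std::collections::HashMap;

pub mod errors {
    pub type Result<T> = std::result::Result<T, String>;
}

pub mod coherence {
    // Crate-rooted path: always resolved from the root of the current crate.
    // This plays the role of the proposal's leading `::`.
    use crate::errors::Result;

    // Relative path: resolved from the current module.
    pub use self::solve::Solver;

    pub mod solve {
        // Also relative: `super::super` climbs from `solve` to the crate root.
        use super::super::errors::Result;

        pub struct Solver;

        impl Solver {
            // Hypothetical behavior: the "solution" is just the goal's length.
            pub fn solve(&self, goal: &str) -> Result<usize> {
                if goal.is_empty() {
                    Err("empty goal".to_string())
                } else {
                    Ok(goal.len())
                }
            }
        }
    }

    pub fn check(goal: &str) -> Result<usize> {
        Solver.solve(goal)
    }
}

// Collects the goals that solve successfully, keyed by goal text.
pub fn goal_sizes(goals: &[&str]) -> HashMap<String, usize> {
    goals
        .iter()
        .filter_map(|g| coherence::check(g).ok().map(|n| (g.to_string(), n)))
        .collect()
}
```

The point of the sketch is only that the two kinds of path are visibly different at the import site; it says nothing about how the proposal itself would be implemented.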

This is, of course, a breaking change. However, it has some properties that make it a reasonable fit for the checkpoint model:

It’s trivial to write a rustfix tool that mechanically switches today’s use declarations to this new setup, and likewise deals with mod and extern crate.
We could introduce and stabilize the from/use syntax, then deprecate use of absolute paths in use (without a leading ::), and employ rustfix at that point – all before a new checkpoint is needed.

Of course, the full migration story needs to be significantly fleshed out, but this is just meant to sketch plausibility.

What does it look like?

Before talking about the rationale, I want to show an example for clarity. First, the parts that don’t change.

Here’s a Cargo.toml excerpt:

[dependencies]
petgraph = "0.4.5"

A directory structure excerpt:

src/
  lib.rs
  coherence/
    mod.rs
    solve.rs

Code in today’s module system:

In lib.rs:

extern crate petgraph;
pub mod coherence;

In mod.rs:

use petgraph::prelude::*;

use errors::Result;
use ir::{Program, ItemId};

mod solve;

pub use self::solve::Solver;

In solve.rs:

use std::sync::Arc;
use itertools::Itertools;

use errors::*;
use ir::*;

Code in the proposed system:

In lib.rs:

pub use coherence; // note relative path; this makes `coherence` pub

In mod.rs:

from petgraph use prelude::*;

use ::errors::Result;
use ::ir::{Program, ItemId};

pub use solve::Solver; // note use of relative path

In solve.rs:

from std use sync::Arc;
from itertools use Itertools;

use ::errors::*;
use ::ir::*;

Rationale

Each piece of this proposal has a rationale, but in some cases they’re tied together:

Introducing from/use. This form provides a much clearer distinction between imports from external crates and those from the local crate, which can be helpful when exploring a codebase. Splitting out this form also means we eliminate the very confusing issue that extern crates are “mounted” in the current crate’s module hierarchy, usually at root. (In this analogy, the from form is more like addressing an entirely separate volume.) Incidentally, grepping for this declaration will tell you which external crates are in use.
Changing use to take paths relative to the current module. There are two main reasons to do this.
- If submodules are always in scope for their parent module, things like function signatures feel like they are taking relative paths. (In actuality, they are resolving names based on what’s in scope). In any case, making paths everywhere relative to the current module reduces confusion.
- We want to use pub use to export submodules publicly, but with absolute paths this would be pub use self::my_submodule which is awkward and confusing; people are almost certain to forget self much of the time.
- Note that there are often arguments that use-like mechanisms should employ absolute paths by default because that’s the common case. However, for Rust I think that’s at least partly based on the current use for pulling in items from external crates, and would be more evenly split in this new setup.
Using pub use for exporting modules. If the module hierarchy is determined from the file system, we need some way to say whether a module is public. While we could say this in the module itself, doing so is syntactically awkward, and also means that a module’s exports are spread over multiple files. At the same time, pub use still exists as a form you need to use for re-exporting items, and it provides a reasonable mental model when using it to export your child module.
The general privacy setup. A basic premise is that the visibility of a module name is not terribly important by itself; what really matters is the visibility of items within the module. Thus we simplify matters by making all modules have at least crate visibility, though this does mean that marking an item pub in a module means it, in reality, has at least pub(crate) visibility (and perhaps more, if it’s exported in a public module). This is arguably a good thing; today, the fact that you can write pub but the actual visibility is determined by a complex nest of re-exports and module visibilities can make it quite hard to reason about unfamiliar code. As has been argued on the thread, the vast majority of the time you only need visibility at one of three levels: the current module, the crate, or the world. This proposal makes those cases all easy to express, and requires a more explicit pub(super) etc. to get other privacy granularities.
- TL;DR: writing pub on an item means pub(crate) unless (re)exported in a public module (which itself is done via re-exporting).
Deprecating mod/extern crate. This was already explained above. There’s already been some discussion around the downsides (and ways to mitigate them), so I’m not going to spend time on that here.
- Note, however, that one of the alternatives below may help further mitigate these concerns.
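To make the privacy point concrete, here is a small sketch in today’s Rust of how the word pub on an item can mean different things depending on the surrounding module and re-exports. All names (`internal`, `helper`, `exported`, `crate_only`, `public_entry`) are hypothetical.

```rust
// A private module: nothing in it is reachable from outside the crate
// unless it is re-exported somewhere public.
mod internal {
    // Written `pub`, but effectively visible only within the crate, because
    // `internal` is private and nothing re-exports this function.
    pub fn helper(x: u32) -> u32 {
        x + 1
    }

    // Also written `pub`. This one IS part of the public API, but only
    // because of the `pub use` further down.
    pub fn exported(x: u32) -> u32 {
        helper(x) * 2
    }

    // States the intent directly: visible within the crate, never outside.
    pub(crate) fn crate_only(x: u32) -> u32 {
        x * 10
    }
}

// The re-export is what decides that `exported` is visible to the world.
pub use self::internal::exported;

pub fn public_entry(x: u32) -> u32 {
    internal::helper(x) + internal::crate_only(x)
}
```

Reading `internal` alone, `helper` and `exported` look identical; you have to find the re-export to learn that only one of them is public.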

Alternatives

This design pulls together choices I believe cohere well, but there are many possible variations that are also quite plausible. These can be broken down into largely orthogonal knobs. I’ll take a brief look at each, and the tradeoffs as I see them.

Knob: from/use ordering

The from/use syntax follows precedent from Python, but we could instead use the use/from ordering from JS.

Possible benefits of use/from:

Makes it easier to read at a glance, when the item name makes obvious what the crate is.
Avoids “jagged edges” of imported names.
Arguably more “natural” reading (as a sentence).

Possible benefits of from/use:

More natural for autocomplete in IDEs.
Gives you the crate name first when reading left-to-right (better if you often need that information to understand the import)

It’s interesting to consider the choices when it comes to multi-line imports:

from std use {
    io::{self, Read, Write},
    collections::{HashMap, HashSet},
    rc::Rc,
};

// versus
use {
    io::{self, Read, Write},
    collections::{HashMap, HashSet},
    rc::Rc,
} from std;

There are of course plenty of other possible syntactic choices, but these are relatively intuitive and descend from very commonly-used languages.
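Neither ordering above is valid Rust. As a point of reference only, Rust as it exists today accepts nested groups in a plain use declaration, which gives a similar multi-line shape with the crate name first. The function `word_lengths` is a made-up example whose only job is to exercise every imported name.

```rust
use std::{
    collections::{HashMap, HashSet},
    io::{self, Read, Write},
    rc::Rc,
};

// Uses every imported name once so the example compiles without warnings.
pub fn word_lengths(input: &str) -> io::Result<Rc<HashMap<String, usize>>> {
    // Round-trip the text through in-memory `Write` and `Read` implementations.
    let mut buffer: Vec<u8> = Vec::new();
    buffer.write_all(input.as_bytes())?;

    let mut text = String::new();
    io::Cursor::new(buffer).read_to_string(&mut text)?;

    // Deduplicate the words, then record the length of each one.
    let unique: HashSet<&str> = text.split_whitespace().collect();
    let lengths: HashMap<String, usize> = unique
        .into_iter()
        .map(|w| (w.to_string(), w.len()))
        .collect();
    Ok(Rc::new(lengths))
}
```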

Knob: pub use foo vs pub mod foo

Rather than using re-exports to make a module public, we could say that the file system determines module structure, but you use pub mod foo; to make a child module foo public.

The main advantage would be that it’s more plausible to continue to make use take absolute paths, which reduces breakage. On the other hand, it seems to double down on some aspects of “path confusion”, and doesn’t achieve the unification around use that the main proposal does.

Knob: absolute vs relative paths

We could keep other elements of this proposal, but have use continue to use absolute paths. (We could then, for example, only allow you to reference external crates that were brought in through extern crate in use, but ones implied from Cargo.toml would go through from/use, potentially making the whole system backwards compatible).

If we go that route, then to make a module public we’d most likely wind up with one of the following:

pub use self::my_submodule;
pub mod my_submodule;

And again, as above, some path confusion issues remain.

Knob: include on use

Rather than determining the module hierarchy from the file system immediately, we could follow many other languages which add modules to the name hierarchy only if they are in some way referenced (e.g. via use); only at that point would we examine the file system for resolution.

Such an approach makes the Rust source somewhat more independent of the precise state of the file system, and may thereby address some of the concerns people have raised about previous proposals.

A downside, though: sometimes modules contain nothing but impl blocks, in which case they are not naturally referenced elsewhere. You’d have to explicitly use such modules, and forgetting to do so could lead to some head-scratching errors. (That said, we could generate a warning if the directory contains unused .rs files).
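As an illustration of the impl-only case, here is a sketch in today’s Rust, with an inline module standing in for a separate file. The names `Point`, `display_impl` and `describe` are hypothetical.

```rust
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// This module defines no named items at all, only an `impl` block, so no
// other code ever has a reason to mention `display_impl` in a path.
mod display_impl {
    use super::Point;
    use std::fmt;

    impl fmt::Display for Point {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "({}, {})", self.x, self.y)
        }
    }
}

pub fn describe(p: &Point) -> String {
    // Works because the module above is part of the crate, even though it is
    // never named here.
    p.to_string()
}
```

Today the `mod` declaration is what pulls the impl into the crate. Under an include-on-use scheme, a file like this would presumably need an explicit use somewhere, or `describe` would fail to compile for lack of a `Display` impl.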

Knob: useing submodules

The proposal assumes that submodules are always in scope for their parents. We could instead require you to use them before referring to them. I can’t see a lot of advantage to doing that, though.

Extensions

Finally, while the proposal as-is only marginally helps with facades (by removing the need for self:: that’s currently common when facading), it’s compatible with future extensions that do more.

For example, we could draw from earlier proposals involving “anonymous modules” (aka “inline modules”) – say, files beginning with a leading _ – which do not affect the module hierarchy, and where all non-private items are automatically re-exported by the parent module. This has some of the flavor of the previous proposal, but with a more opt-in form.
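The closest approximation in today’s Rust is probably a private module combined with a glob re-export. The sketch below uses inline modules as stand-ins for hypothetical `_circle.rs` and `_rect.rs` files; the names and the exact semantics of the extension are assumptions, since the idea is only outlined above.

```rust
pub mod shapes {
    // Stand-in for a hypothetical `_circle.rs` file: the module exists only to
    // hold code, and the glob re-export below hides it from the hierarchy.
    mod circle {
        pub struct Circle {
            pub radius: f64,
        }

        impl Circle {
            pub fn area(&self) -> f64 {
                std::f64::consts::PI * square(self.radius)
            }
        }

        // Private helper: the glob re-export does not pick this up.
        fn square(x: f64) -> f64 {
            x * x
        }
    }

    // Stand-in for a hypothetical `_rect.rs` file.
    mod rect {
        pub struct Rect {
            pub width: f64,
            pub height: f64,
        }

        impl Rect {
            pub fn area(&self) -> f64 {
                self.width * self.height
            }
        }
    }

    // What the extension would do automatically for every non-private item.
    pub use self::circle::*;
    pub use self::rect::*;
}
```

Callers then write `shapes::Circle` and `shapes::Rect` and never see the inner module names.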

Wrapping up

Just like last time around, please take this proposal as charting out one more plausible point in the design space, and see whether there are big-picture aspects to like or dislike, or ideas that might have promise. I’m looking forward to your feedback!

http://aturon.github.io/tech/2017/08/02/modules-part-2

Revisiting Rust’s modules

Jul 26, 2017 Updated Jul 26, 2017

As part of the Ergonomics Initiative, I, @withoutboats and several others on the Rust language team have been taking a hard look at Rust’s module system; you can see some earlier thoughts here and discussion here.

There are two related perspectives for improvement here: learnability and productivity.

Modules are not a place that Rust was trying to innovate at 1.0, but they are nevertheless often reported as one of the major stumbling blocks to learning Rust. We should fix that.
Even for seasoned Rustaceans, the module system has several deficiencies, as we’ll dig into below. Ideally, we can solve these problems while also making modules easier to learn.

This post is going to explore some of the known problems, give a few insights, and then explore the design space afresh. It does not contain a specific favored proposal, but rather a collection of ideas with various tradeoffs.

I want to say at the outset that, for this post, I’m going to completely ignore backwards-compatibility. Not for lack of importance, but rather because I think it’s a useful exercise to explore the full design space in an unconstrained way, and then separately to see how best to fit those lessons back into today’s Rust.

Learnability issues

It’s hard to nail down the precise blockers to learnability, but here are a few of the obstacles we’ve heard repeatedly in feedback from a variety of venues:

Too many declaration forms. Module-related declarations include extern crate, mod foo;, use, pub use, mod { } and more, and each one has somewhat subtle effects on what is in scope where. For someone just starting out, this array of choices can be bewildering and stand in the way of writing “actual code” to feel out the language.
Path confusion. The fact that use declarations work with absolute paths while other items do not is confusing, and even experienced Rust programmers (myself included) often confuse the two. To make matters worse, the top-level namespace contains all of the external crates, but also the contents of the current crate. Unless, of course, you’re writing an external test or binary. And finally, when you’re working at the top level, the absolute/relative distinction doesn’t matter, which means that you can have the wrong mental model and only find it when trying to expand out into submodules.
Filesystem organization. The foo.rs versus foo/mod.rs distinction, together with mod foo;, can be an intimidating amount of machinery just to incorporate a file into your project.
Privacy. A module’s private items are always visible to its submodules. But private items within its submodules aren’t visible to each other. Moreover, it’s an error to expose a private item in a public interface, but it’s common to define public items within a private module and re-export them elsewhere. Learning the ropes of the privacy system is not easy, and even experienced Rust programmers sometimes grate against it.
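The privacy rules in the last item can be seen in a few lines of today’s Rust. This is a minimal sketch; the module and function names are hypothetical.

```rust
mod parent {
    // Private to `parent`, and therefore also visible inside its submodules.
    fn secret() -> u32 {
        42
    }

    pub mod child_a {
        // Private to `child_a`: the sibling `child_b` cannot name it.
        fn local() -> u32 {
            1
        }

        pub fn uses_parent_secret() -> u32 {
            // Allowed: a submodule can see its ancestors' private items.
            super::secret() + local()
        }
    }

    pub mod child_b {
        pub fn uses_sibling() -> u32 {
            // Calling `super::child_a::local()` here would be rejected by the
            // compiler, because `local` is private to `child_a`.
            super::child_a::uses_parent_secret() + 1
        }
    }
}
```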

It can be hard, when more experienced with Rust, to empathize with these concerns—we suffer from the “Curse of Knowledge” here. But it’s important to recognize that all of these distinctions that are hard to learn in the first place also impose a small, but non-trivial mental tax even when you know them well. So the goal is not to make things easier for newcomers at the expense of those with more experience, but rather to make things easier for everyone.

Productivity issues

Once you’ve gotten the hang of the module system, there are still annoyances, ranging in importance from code readability concerns to minor papercuts.

Who can see this item? And how? It’s pretty common to find items within modules that are marked pub, but are not in fact visible through the module defining them—or even visible outside the crate at all! This tends to happen when you want to organize code within the file system differently from the API hierarchy you expose to the rest of the crate or to the outside world. It generally means you have to look at several files to figure out how to access an item (or even whether you can).
pub use abuse. More generally, re-exports are ubiquitous in idiomatic Rust code. The result is that the “apparent” module hierarchy (as seen from the file system) often tells you very little about the actual module hierarchy, as seen from inside or outside the crate. This can make it difficult to jump into a new code base, or back into one you haven’t worked on in a while.
Repetition. The module system often requires two steps to do something, when a single step would suffice to convey all the necessary information:
- When you add a dependency to Cargo.toml, you also need to add an extern crate declaration.
- When you add a new .rs file, you also need to write a corresponding mod declaration.
- When you have a file that exists solely for organization and you add a pub item to it, you also have to pub use that item elsewhere in the hierarchy.

These issues may not seem like a big deal at first, but at least in my experience, after thinking deeply about modules and surfacing these problems, I find myself noticing them all the time.

What is a module system, anyway?

With the critique of today’s module system out of the way, I want to talk a bit about the core concerns of a module system, at least from Rust’s perspective:

Bringing names into scope, including from external crates
Defining the crate’s internal namespace hierarchy
Defining the crate’s external namespace hierarchy
Determining how code is arranged in the file system
Visibility (aka privacy)

Things seem to work most smoothly when these concerns are closely aligned. Conversely, the places where the module system becomes hard to work with and reason about tend to be misalignments.

An example of misalignment: facades in futures

Let’s take a concrete example from the futures crate. Futures, like iterators, have a large number of methods that produce “adapters”, i.e. concrete types that are also futures:

trait Future {
    type Item;
    type Error;
    fn poll(&mut self) -> Poll<Self::Item, Self::Error>;

    fn map<F, U>(self, f: F) -> Map<Self, F>;
    fn then<F, B>(self, f: F) -> Then<Self, B, F>;
    // etc
}

Each of these concrete types (Map, Then and so on) involves a page or so of code, often with some helper functions. Thus, there was a strong desire to define each in a separate file, with the helper functions private to that file.

However, in Rust each file is a distinct module, and it was not desirable to have a large number of submodules each defining a single type. So, instead, the future module has code like this:

mod and_then;
mod flatten;
mod flatten_stream;
mod fuse;
mod into_stream;
mod join;
mod map;
mod map_err;
mod from_err;
mod or_else;
mod select;
mod select2;
mod then;
mod either;

pub use self::and_then::AndThen;
pub use self::flatten::Flatten;
pub use self::flatten_stream::FlattenStream;
pub use self::fuse::Fuse;
pub use self::into_stream::IntoStream;
pub use self::join::{Join, Join3, Join4, Join5};
pub use self::map::Map;
pub use self::map_err::MapErr;
pub use self::from_err::FromErr;
pub use self::or_else::OrElse;
pub use self::select::{Select, SelectNext};
pub use self::select2::Select2;
pub use self::then::Then;
pub use self::either::Either;

This kind of setup is known generally as the facade pattern, and it’s pretty ubiquitous in Rust code.

The facade boilerplate is needed to deal with a misalignment: each adapter is defined in its own file with its own privacy boundary, but we don’t actually want that to entail a distinct module for each (in the internal or external namespace hierarchy). That means we have to do two things:

Make the modules private, despite that they contain public items
Manually re-export each of the public items at a higher level

When first trying to navigate the futures codebase, you have to read the future module to understand how its submodules are being used, due to these re-exports. For the futures crate, this is a relatively small annoyance. But it can be a real source of confusion for crates that have more of a mixture of submodules, some of which are significant for the namespace hierarchy, others of which are hidden away.
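Here is a self-contained miniature of the same pattern, using inline modules in place of separate files so that it compiles on its own. The `adapters` module and its toy `Double` and `Negate` types are hypothetical and far simpler than the real futures adapters.

```rust
pub mod adapters {
    // Each adapter gets its own private module so that its helper stays
    // private to it. In a real crate these would be `mod double;` and
    // `mod negate;`, each backed by its own file.
    mod double {
        pub struct Double(pub i32);

        impl Double {
            pub fn get(&self) -> i32 {
                helper(self.0)
            }
        }

        // Both submodules define a private `helper`; the names do not clash.
        fn helper(x: i32) -> i32 {
            x * 2
        }
    }

    mod negate {
        pub struct Negate(pub i32);

        impl Negate {
            pub fn get(&self) -> i32 {
                helper(self.0)
            }
        }

        fn helper(x: i32) -> i32 {
            -x
        }
    }

    // The facade: callers see `adapters::Double`, never
    // `adapters::double::Double`.
    pub use self::double::Double;
    pub use self::negate::Negate;
}
```

Every new adapter costs one `mod` line and one `pub use` line in the parent, which is exactly the boilerplate described above.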

Another common confusion: items defined as pub within a private module which are not, in fact, exported from the crate, but which may be re-exported in another crate-internal module. In this case, pub(crate) would better convey intent, but today’s module system makes pub the path of least resistance. That means, in turn, that an item definition alone doesn’t tell you the full visibility story (though it does give you an upper bound on visibility); in general you have to crawl through the rest of the code to figure out where the item is ultimately visible.

Expressiveness and the common case

Rust’s module system, through things like the facade pattern, gives you a lot of expressiveness: you’re not forced to keep the various concerns of the module system in alignment, and are thus free to craft the organization that you deem best.

The concern isn’t so much having this freedom, but rather how often you must wield it. How often does the facade pattern show up in your code? How often do you use re-exports? How often does the directory structure of your crate bear little resemblance to the intended module hierarchy?

I spent some time surveying some of the most popular and most respected crates to get a qualitative feel for this question, including: futures, regex, rayon, log, openssl, flate2, bytes, irc, clap, url, serde, chalk, std and chrono. Virtually every crate had something “unique” about its organization, and almost all of them used the facade pattern somewhere. In general, it was impossible to predict anything about the public API surface just by looking at the file system organization; you have to trace re-exports.

In short, in the vast majority of cases the module system necessitated boilerplate and a disconnect between its various concerns, impairing both write- and read-ability.

Increasing alignment

The question I want to pose now is: can we make the module system work more smoothly for the common case, decreasing boilerplate and increasing predictability/readability? This is a more narrow question than “how do we lower the learning curve”, but I believe that a good answer will help learnability as well.

A basic strategy is to try to make the various uses of facades more “first class”, i.e. expressed in a more explicit and clear way, rather than encoded via a particular pattern of usage. Let’s take a deeper look at the ways in which facades are commonly used in the examples mentioned above:

To allow breaking code into files, with file-private definitions. This is the futures example discussed above: you’re forced to create submodules in order to split things into files, but you try to “hide” the submodules as much as possible using a facade, and other than the facade definition you never refer to them by name.
To allow for cfg-specific implementations. For example, the standard library uses a facade-like pattern to have two side-by-side implementations of its core system primitives, for cfg(unix) and cfg(windows). This is set up so that there is no visible impact on the module hierarchy, but there is an impact on the filesystem hierarchy.
For crate-internal organization, where you do want a module hierarchy (for privacy or namespacing purposes), but you don’t want to reveal it to the outside world (or, in some cases, even to other modules in the crate).
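The cfg-specific use of facades can be sketched as follows. This is not how the standard library is actually laid out; the module name `imp` and its contents are assumptions for illustration, and the second branch uses cfg(not(unix)) rather than cfg(windows) so that the sketch compiles on any target.

```rust
// Two side-by-side implementations with the same interface; exactly one of
// them is compiled for any given target.
#[cfg(unix)]
mod imp {
    pub const PATH_SEPARATOR: char = '/';

    pub fn family() -> &'static str {
        "unix"
    }
}

#[cfg(not(unix))]
mod imp {
    pub const PATH_SEPARATOR: char = '\\';

    pub fn family() -> &'static str {
        "other"
    }
}

// The facade: the rest of the crate never mentions `imp`.
pub use self::imp::{family, PATH_SEPARATOR};

pub fn join(a: &str, b: &str) -> String {
    format!("{}{}{}", a, PATH_SEPARATOR, b)
}
```

In a real crate the two `imp` modules would typically live in separate files, which is where the impact on the filesystem hierarchy comes from.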

How can we make these use cases more explicit, clear, and streamlined?

Proposal: directories determine modules

The central idea in this post is to make intent more explicit in the file system than we do today, while streamlining common facade patterns. Here’s one way we might do it:

Deprecate mod foo; declarations, instead determining module structure directly from directory structure.
Directories (not files!) determine the module hierarchy.
- A directory with a leading _ gives you a pub(crate) module.
- All other directories give you pub modules.
The .rs files in a directory collectively determine the contents of the corresponding module.
- Private items are private to the file in which they are defined.
- Items with pub(self) or greater visibility are, in particular, visible to sibling files that are part of the module’s definition.

A basic example

Let’s start with an example just showing the mechanics. First, the directory structure:

src/
  foo/
    these.rs
    names.rs
    do_not_matter.rs
    _infer/
      instantiate.rs
      unify.rs
  bar/
    mod.rs // this is fine, but has no special status
    impls.rs
    tests.rs
  baz.rs

From the directory structure alone, we know the precise module structure (modulo any inline modules; more on that later):

pub mod foo {
  /* scoped contents of `these.rs`, `names.rs`, `do_not_matter.rs` */

  pub(crate) mod infer {
    /* scoped contents of `instantiate.rs` and `unify.rs` */
  }
}

pub mod bar {
  /* scoped contents of `mod.rs`, `impls.rs` and `tests.rs` */
}

/* scoped contents of `baz.rs` */

What do I mean by “scoped contents”? To reiterate from above, fully private definitions are private to that file, while pub(self) means private to the current module. To illustrate, imagine we have the following for instantiate.rs:

// private to this file
struct Instantiator { ... }
fn some_helper() { ... }

// private to this module, i.e. visible to all files within `_infer`, i.e.
// `instantiate.rs` and `unify.rs`
pub(self) fn instantiate<T: Fold>(table: &mut InfTable, arg: &T) { ... }

and then, in unify.rs:

// private to this file
struct Unifier<'a> { ... }

// note: this private definition does *not* clash with the private definition
// in `instantiate.rs`
fn some_helper() { ... }

// visible to the whole crate at `foo::infer::UnificationResult`
pub(crate) struct UnificationResult { ... }

pub(crate) fn unify<T: Zip>(table: &mut InfTable, a: &T, b: &T) -> UnificationResult {
  /* may invoke `instantiate` */
}

Thus, visibility annotations give you fine-grained control ranging from current file (private) to world-public (pub) and every module in the hierarchy in between.
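For comparison, today's Rust already covers the module-level end of this spectrum; the proposal would add the file-level tier beneath it. The following is a minimal sketch in current, stable Rust using inline modules. The names `outer`, `inner`, and the functions inside them are hypothetical and chosen only for illustration.

```rust
mod outer {
    pub mod inner {
        // No annotation: private to `inner` (the same as `pub(self)` today).
        fn private_to_inner() -> u32 {
            1
        }

        // Visible in the parent module `outer`, but no further up.
        pub(super) fn visible_to_outer() -> u32 {
            private_to_inner() + 1
        }

        // Visible anywhere in the current crate, but not to other crates.
        pub(crate) fn visible_to_crate() -> u32 {
            visible_to_outer() + 1
        }
    }

    // `outer` can call the `pub(super)` item defined in its child module.
    pub fn from_outer() -> u32 {
        inner::visible_to_outer()
    }
}

fn main() {
    println!("{}", outer::from_outer());
    println!("{}", outer::inner::visible_to_crate());
}
```

Calling `outer::inner::visible_to_outer()` from `main` would be rejected, since `main` sits outside `outer`.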

Breaking code into files without module structure: the futures example

Having seen the basics, let’s put this proposal to use in expressing the futures example described above:

src/
  future/
    mod.rs
    and_then.rs
    flatten.rs
    fuse.rs
    // etc

These files would have exactly the same contents as today, except that we would be able to delete most of mod.rs. That is, none of the following boilerplate is needed:

// these can go!
mod and_then;
mod flatten;
mod fuse;
// etc

// these too!
pub use self::and_then::AndThen;
pub use self::flatten::Flatten;
pub use self::fuse::Fuse;
// etc

Thus, this proposal works particularly smoothly when you want to break a module into multiple files, with potentially file-private items—because that’s exactly how modules work in the proposal! No facade necessary, and the intended (flat) module structure is made clear and explicit via the file system structure.

Platform-specific implementations

What if you want to provide distinct implementations by platform? Again, you no longer need a facade:

src/
  foo/
    unix.rs
    windows.rs

Here unix.rs starts with #![cfg(unix)], and similarly for windows.rs. Both files are considered part of the foo module’s definition, but depending on the platform one of the files will appear to be empty. (Today this pattern is implemented using submodules tagged with cfg, together with re-exports.)
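To make the comparison concrete, here is a rough sketch of how the pattern tends to look today, written with inline modules so it fits in one file. The module name `sys` and the function `family` are hypothetical, and the `other` fallback is added only so the sketch builds on every target.

```rust
mod sys {
    // One private submodule per platform; only one survives `cfg` filtering.
    #[cfg(unix)]
    mod unix {
        pub fn family() -> &'static str {
            "unix"
        }
    }

    #[cfg(windows)]
    mod windows {
        pub fn family() -> &'static str {
            "windows"
        }
    }

    #[cfg(not(any(unix, windows)))]
    mod other {
        pub fn family() -> &'static str {
            "other"
        }
    }

    // The facade: re-export whichever implementation was compiled in.
    #[cfg(unix)]
    pub use self::unix::family;
    #[cfg(windows)]
    pub use self::windows::family;
    #[cfg(not(any(unix, windows)))]
    pub use self::other::family;
}

fn main() {
    println!("platform family: {}", sys::family());
}
```

Callers only ever write `sys::family()`; the submodule names exist purely to carry the `cfg` attributes, which is the boilerplate the proposal aims to remove.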

Internal module structure

The final mis-alignment was in cases where you want a module hierarchy internally, but want to expose some collapsed version of it externally. This is the one place where you still need to use pub use:

// excerpted from `clap`

src/
  _app/
    help.rs
    macros.rs
    mod.rs
    parser.rs
    usage.rs
  _args/
    arg.rs
    arg_matcher.rs
    arg_matches.rs
    macros.rs
    settings.rs
  errors.rs
  fmt.rs
  suggestions.rs
  lib.rs

This directory structure is excerpted from clap, which currently has app and args subdirectories but does not export any submodules; these modules are used purely for internal organization and namespacing.

In _app/mod.rs you might have a definition like:

// note that this is `pub`!
pub struct App<'a, 'b> { ... }

The reader can immediately see something interesting happening: this item is defined within an “internal” (pub(crate)) module, since _app begins with an underscore. But it has a larger, world-public visibility. This is an indication that the item will be re-exported somewhere else (and in fact, we could lint against this not being the case).

Accordingly, in lib.rs, we might have:

pub use app::App;

In short, re-exports are still needed, but the directory structure and item visibility give the reader a strong, localized indication of what’s going on.
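The same shape can be sketched in today's Rust with an inline module, which may help when comparing the two designs. This is only an illustration: the `name` field and the `new` constructor are hypothetical, and the real `App` in clap looks different.

```rust
// Crate-internal module: it is not `pub`, so other crates cannot name `app`.
mod app {
    // `pub` here exceeds the visibility of the enclosing module,
    // which hints that the item is re-exported elsewhere.
    pub struct App {
        pub name: String,
    }

    impl App {
        pub fn new(name: &str) -> App {
            App { name: name.to_string() }
        }
    }
}

// The re-export: `App` becomes reachable from the crate root.
pub use self::app::App;

fn main() {
    let app = App::new("demo");
    println!("{}", app.name);
}
```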

Fine details

I’m glossing over a lot of fine details here, including:

How do you provide module docs? One appealing possibility: via a README.md file, which would have several benefits — most importantly, moving the often very large module-level docs out of band.
Similarly, what’s the story for module-level attributes in general?
What about inline modules?
Backward-compatibility concerns?
And many more.

For the moment, I’m going to ask that we avoid getting bogged down in these questions (which are ultimately important), so that we can focus first on whether the broad direction here is a good one.

Tradeoffs

Speaking of evaluation: there are some tradeoffs we can see even at this level of detail.

Primary Upsides:

Learning the basics of the module system is really easy: each directory defines a module name in the module hierarchy; the .rs files within that directory collectively define the contents of that module. The end.
The file system organization gives you a very clear, explicit view into the module structure and programmer intent. Compared to dropping into a random crate’s source code today (an exercise I performed repeatedly), I believe this approach will make it much easier to understand a crate’s overall structure with a quick run of tree.
Fewer imports are needed, because module-visible items defined in sibling files are automatically in scope (but see the downside below).
Most of the common uses of facades (breaking into files/privacy boundaries, platform-specific modules) no longer require any facading, or indeed any boilerplate at all.
Cases where you want some crate-internal module namespacing are expressed in a natural, obvious way (via the _ prefix), and one that makes it easier for readers to see that a given item will be re-exported elsewhere.

Primary downsides:

Bringing module-visible items into scope from sibling files means that one may have to search in multiple files to discover the definition of some item. In contrast, today every item you can mention in a file is brought into scope somewhere in that file—assuming you don’t use globs.
- On the other hand, this proposal eliminates boilerplate use declarations for definitions that are conceptually part of the same module. And in particular, the fact that such declarations would often be relative (e.g. use self::item;) may help mitigate confusion around the absolute/relative path issue.
- A variant of this proposal would not bring these items into scope by default, but instead allow you to do so via use self::item. However, having to use self here is awkward, and the declaration is not helpful: it only serves to tell you that another file in the directory defines item, which is something you can already determine if the binding isn’t located in the current file. (In contrast, today the required import also gives you a hint as to which file to look at.)
Determining module structure from file system structure is problematic for some, due either to stashing stray .rs files in the project directory, or due to potentially laggy network file access.
- If this proves to be a problem in practice, we could provide an optional way to specify the desired file list. But in the vast majority of cases today the file system and module hierarchy are aligned.
- Some have also argued that leveraging the file system is too “implicit”, but I don’t think that argument holds water; the file system arrangement itself is a perfectly “explicit” way of providing information, and there’s no particular reason to distinguish that from mod statements in code. I rather see it as the current setup forcing repetition of information. (I would also urge a focus on concrete instances of reasoning about code in judging this kind of question.)

Wrapping up

There’s a lot more to say about modules, and this proposal is just one variant of probably a dozen that the language team has been exploring. But I wanted to take the time to at least spike out one plausible option, and see what people think. As I asked above: I strongly urge people to focus only on the big-picture question of whether this avenue is appealing at all, and not get too bogged down in finer details until later in the process.

A bit of editorializing

What I like about this proposal is that it’s dirt simple: the correspondence between file system and module hierarchies is very easy to describe, and today’s patterns fall out naturally, usually with significant boilerplate reduction. I think there’s a very real chance that, with this proposal, people will view Rust’s module system as easy to learn. Finally, and most subjectively, compared to some of the other ideas we’ve been exploring, there’s a certain elegance to this setup; nothing feels bolted on, and the examples drawn from real-world code have a quite pleasing expression.

I do worry about the sibling scoping question. I know I, for one, often track down bindings by searching purely within the current file. With this proposal, I’d have to change that workflow to grepping within the current directory, or using tags more consistently, etc. Yet, I suspect that in the end, these other workflows are an improvement: e.g., tags allow a more direct jump to definition regardless of where that definition lives, whereas my current workflow often requires following a chain of imports.

In any case, I think this potential workflow shift is more than made up for by the greater clarity about module structure, which makes it much easier to find your way around a project in the first place.

http://aturon.github.io/tech/2017/07/26/revisiting-rusts-modules

Shipping specialization: a story of soundness

Jul 8, 2017 Updated Jul 8, 2017

Rust’s impl specialization is a major language feature that appeared after Rust 1.0, but has yet to be stabilized, despite strong demand.

Historically, there have been three big blockers to stabilization:

The interplay between specialization rules and coherence, which I resolved in an earlier blog post.
The precise ways in which specialization employs negative reasoning, which will be resolved by incorporating ideas from Chalk into the compiler.
The soundness of specialization’s interactions with lifetimes. The RFC talks about this issue and proposes a way to address it, but it has never been implemented, and early attempts to implement it in Chalk have revealed serious problems.

I’ve been wrestling, together with nmatsakis, withoutboats and others, with these soundness issues.

Spoiler alert: we have not fully solved them yet. But we see a viable way to ship a sound, useful subset of specialization in the meantime. Feel free to jump to “A modest proposal” if you just want to hear about that.

This blog post is an attempt to write up what we’ve learned so far, with the hopes that it will clarify that thinking, and maybe open the door to you cracking the nut!

The problem

In stable Rust, it is not possible for lifetimes to influence runtime behavior. This is partly an architectural issue, and partly a design issue:

Architecture: the compiler erases lifetime information prior to monomorphization and code generation, meaning that the generated code simply has no way to depend on lifetimes. That could be changed, but we’d have to work hard to avoid code blowup by generating separate copies of code for each lifetime it was used within, assuming that the behavior didn’t change.
Design: lifetime inference generally chooses the smallest lifetime that fits the constraints at any given moment. That means that you can have a piece of data that is valid for the 'static lifetime, yet is viewed as having a shorter lifetime. Having runtime behavior depend on these choices seems bound to result in confusion and bugs.
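Both points can be glimpsed on stable Rust. The sketch below asks the compiler for the name of a type instantiated once from a string literal and once from a short-lived borrow. The helpers `name_of` and `names` are hypothetical, and the output of `std::any::type_name` is documented as best-effort diagnostic text, so treat this as an illustration rather than a guarantee.

```rust
use std::any::type_name;

// Returns the compiler's diagnostic name for `T`; the argument only pins down `T`.
fn name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn names() -> (&'static str, &'static str) {
    // Valid for the whole program.
    let literal: &'static str = "static data";

    // Valid only while `owned` is alive.
    let owned = String::from("heap data");
    let borrowed: &str = owned.as_str();

    (name_of(&literal), name_of(&borrowed))
}

fn main() {
    let (from_static, from_local) = names();
    println!("{} / {}", from_static, from_local);
}
```

Both calls are expected to report the same name: lifetimes do not produce distinct instantiations of `name_of`, so generated code has nothing to distinguish the two cases by.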

Unfortunately, specialization makes the story more difficult:

trait Bad1 {
    fn bad1(&self);
}

impl<T> Bad1 for T {
    default fn bad1(&self) {
        println!("generic");
    }
}

// Specialization cannot work: trans doesn't know if T: 'static
impl<T: 'static> Bad1 for T {
    fn bad1(&self) {
        println!("specialized");
    }
}

fn main() {
    "test".bad1()
}

What does this program print? Since the string literal "test" has type &'static str, you might expect the second, specialized impl to be used (and hence to get specialized as the output). But, as explained above, from the perspective of trans this type will look like &'erased str, making it impossible to know whether the more specialized impl can safely be used.

Here’s another, less obvious example:

trait Bad2<U> {}

impl<T, U> Bad2<U> for T {}

// Specialization cannot work: trans doesn't know if two refs have equal lifetimes
impl<'a, T, U> Bad2<&'a U> for &'a T {}

Here, the second impl is requiring that two lifetimes are the same, and once more for trans we can’t tell whether the impl safely applies.

On the other hand, simply naming a lifetime that must exist, without constraining it, is fine:

trait Good {}

impl<T> Good for T {}

// Fine: specializes based on being *any* reference, regardless of lifetime
impl<'a, T> Good for &'a T {}

In addition, it’s in principle okay for lifetime constraints to show up as long as they don’t influence specialization:

trait MustBeStatic {}

impl<T: 'static> MustBeStatic for T {}

// Potentially fine: *all* impls impose the 'static requirement; the dispatch is
// happening purely based on `Clone`
impl<T: 'static + Clone> MustBeStatic for T {}

Why does this lead to unsoundness?

So far, it might seem like we can just be conservative in trans, which could lead to confusing behavior but is otherwise alright.

Sadly, it’s not, at least given the original design of specialization:

trait Bomb {
    type Assoc: Default;
}

impl<T> Bomb for T {
    default type Assoc = ();
}

impl Bomb for &'static str {
    type Assoc = String;
}

fn build<T: Bomb>(t: T) -> T::Assoc {
    T::Assoc::default()
}

fn main() {
    let s: String = build("Uh oh");
    drop(s) // typeck and trans disagree about the type of `s`
}

The problem here: specialization as originally designed will allow the typechecker to conclude that T::Assoc is String if it knows that T is &'static str. That’s because the impl for &'static str does not use the default keyword when defining its associated type, meaning that no further specialization is allowed (so the type checker knows everything there is to know).

But trans, of course, sees &'erased str instead, and so cannot safely use the specialized impl. That means that trans will make the call to build return (), but the rest of the code assumed that a String was returned.

Oops.

(Spoiler alert: the “as originally designed” bit above is a give-away of where we’re ultimately going to end up…)

Some “solutions” that don’t work

Before giving my proposed way forward, let me explain why some of the solutions that are probably coming to mind don’t work out.

Can’t we just rule out “bad” specializations?

It’s very tempting to blame the specialized impls for Bad1 and Bad2 above, since they clearly impose lifetime constraints. Maybe we could just make it an error to do so.

Unfortunately, the trait system is very powerful, and you can “hide” lifetime constraints within other trait impls that don’t involve specialization. Worse still: the problem can arise from two independent crates, each of which is doing something seemingly reasonable.

////////////////////////////////////////////////////////////////////////////////
// Crate marker
////////////////////////////////////////////////////////////////////////////////

trait Marker {}
impl Marker for u32 {}

////////////////////////////////////////////////////////////////////////////////
// Crate foo
////////////////////////////////////////////////////////////////////////////////

extern crate marker;

trait Foo {
    fn foo(&self);
}

impl<T> Foo for T {
    default fn foo(&self) {
        println!("Default impl");
    }
}

impl<T: marker::Marker> Foo for T {
    fn foo(&self) {
        println!("Marker impl");
    }
}

////////////////////////////////////////////////////////////////////////////////
// Crate bar
////////////////////////////////////////////////////////////////////////////////

extern crate marker;

pub struct Bar<T>(T);
impl<T: 'static> marker::Marker for Bar<T> {}

////////////////////////////////////////////////////////////////////////////////
// Crate client
////////////////////////////////////////////////////////////////////////////////

extern crate foo;
extern crate bar;

fn main() {
    // prints: Marker impl
    0u32.foo();

    // prints: ???
    // the relevant specialization depends on the 'static lifetime
    bar::Bar("Activate the marker!").foo();
}

The problem here is that all of the crates in isolation look perfectly innocent. The code in marker, bar and client is accepted today. It’s only when these crates are plugged together that a problem arises – you end up with a specialization based on a 'static lifetime. And the client crate may not even be aware of the existence of the marker crate.

If we make this kind of situation a hard error, we could easily end up with a scenario in which plugging together otherwise-unrelated crates is impossible. Or where a minor version bump in one dependency could irrevocably break your code.

Can we make a knob: “lifetime-dependent” vs “specializable”?

Thinking more about the previous example, you might imagine the problem is that the Marker trait ends up being used in two incompatible ways:

It’s used in a specialization, the second Foo impl.
It’s used in impls that constrain lifetimes (the Bar impl).

It’s the combination of these things that gets us into trouble. And each one arises from a different crate. So you might be tempted to add an attribute, say #[lifetime_sensitive], which allows for impls that constrain lifetimes but prevents use in specialization.

In other words, the Marker trait could say, in advance, whether the Foo impls or the Bar impl are acceptable.

There are several downsides to this idea, but the real death-knell is that “constraining lifetimes” is a surprisingly easy thing to do. To wit:

trait Sneaky {
    fn sneaky(self);
}

impl<T> Sneaky for T {
    default fn sneaky(self) {
        println!("generic");
    }
}

impl<T> Sneaky for (T, T) {
    fn sneaky(self) {
        println!("specialized");
    }
}

fn main() {
    // what does this print?
    ("hello", "world").sneaky()
}

Here we have a specialized impl that doesn’t mention any lifetimes or any other traits; it just talks about the type (T, T). The problem is that it’s asking for the two tuple components to have the same type, which means that if a lifetime appears, it must be the same in both.

Once more, when we go to trans the main function, we’ll be invoking sneaky on the type (&'erased str, &'erased str), and we can’t tell for sure whether the more specialized impl applies.
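The lifetime equation hiding inside (T, T) is easy to observe on stable Rust, without any specialization. In this sketch, `same_type_pair` and `joined` are hypothetical helpers; the point is only that type check accepts a pair built from a 'static reference and a short-lived one by viewing both at a common lifetime.

```rust
// Accepts only pairs whose two components have the *same* type `T`.
fn same_type_pair<T>(pair: (T, T)) -> (T, T) {
    pair
}

fn joined() -> String {
    let owned = String::from("world");

    // "hello" is a `&'static str`, while `owned.as_str()` is valid only as
    // long as `owned`. The call still type checks: the 'static reference is
    // viewed at the shorter lifetime so that both components share one `T`.
    let (a, b) = same_type_pair(("hello", owned.as_str()));

    format!("{} {}", a, b)
}

fn main() {
    println!("{}", joined());
}
```

So whether a pair "has the form (T, T)" can depend on which lifetimes inference settles on, which is exactly the information that is no longer available after erasure.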

But saying that you can never repeat a type within a specialization would be very restrictive. And there’s always the worry that we’ve missed other sneaky ways to constrain lifetimes…

Can we make trans smarter?

At this point it becomes tempting to start blaming trans. After all, if we tracked lifetime information all the way through, wouldn’t that solve everything?

It would solve some things: it would make specialization sound. But at a high cost.

As explained at the outset, tracking information through trans would involve a massive overhaul of the compiler, and we’d have to be very smart about coalescing code with different lifetimes but identical behavior. There’s no guarantee we could do this without making the compiler significantly slower and/or creating more code bloat.

More fundamentally, though, it would lead to highly unpredictable behavior:

trait Print {
    fn print(self);
}

impl<'a> Print for &'a str {
    default fn print(self) {
        println!("Arbitrary str: {}", self);
    }
}

impl Print for &'static str {
    fn print(self) {
        println!("'static str: {}", self);
    }
}

fn print_str(s: &str) {
    s.print()
}

fn main() {
    let s = "hello, world!";
    s.print();
    print_str(s);
}

Does this program print 'static str: hello, world! twice?

No! Because the call to print_str will reborrow the string slice at a shorter lifetime, and so trans will monomorphize it differently.

Making program behavior sensitive to the exact rules around lifetime inference and reborrowing seems extremely risky.

A modest proposal

Hopefully the above gives you some taste of the challenge here. Later in this post we’ll look at some more promising, clever solutions. But none of them have worked out completely, so I want to pause here and propose an incremental step forward.

First off, we add a new feature gate, assoc_specialization, which is needed whenever you use default type in an impl. We then focus on stabilizing just the core specialization feature, i.e. without being able to specialize associated types. That immediately means we can stop worrying about making type checking and trans agree, since type checking will essentially no longer care about specialization.

Many uses of specialization, including most of the original motivating examples, do not need to be able to specialize associated types.

With that out of the way, we still have work to do at the trans level. In particular, we must ensure that trans is conservative when it comes to lifetime constraints. The proposal here is twofold:

Any time a specialized impl imposes any lifetime constraints not present in the more general impl, trans uses the more general impl instead.
However, in these cases, we trigger an error-by-default lint to warn that a possibly-applicable specialization is not being used. (This needs to be a lint, not a hard error, because the relevant impls aren’t always under your crate’s control.)
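As a way of pinning down these two rules, here is a toy model in plain Rust. It is not how rustc represents impls or performs selection: `ImplCandidate`, `candidate`, and `select` are hypothetical, lifetime constraints are modeled as plain strings, and every candidate is assumed to match the type once lifetimes are ignored.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for an impl; not a rustc data structure.
struct ImplCandidate {
    name: &'static str,
    /// Lifetime constraints the impl imposes, e.g. "T: 'static".
    lifetime_constraints: BTreeSet<&'static str>,
}

fn candidate(name: &'static str, constraints: &[&'static str]) -> ImplCandidate {
    ImplCandidate {
        name,
        lifetime_constraints: constraints.iter().copied().collect(),
    }
}

/// `chain` is ordered from most general to most specialized and must be
/// non-empty. Returns the chosen impl's name and whether the lint fires.
fn select(chain: &[ImplCandidate]) -> (&'static str, bool) {
    let mut chosen = &chain[0];
    let mut lint = false;
    for next in &chain[1..] {
        if next.lifetime_constraints.is_subset(&chosen.lifetime_constraints) {
            // No new lifetime constraints: safe to use the specialization.
            chosen = next;
        } else {
            // New lifetime constraints: stay with the more general impl
            // and report that a possibly-applicable impl was skipped.
            lint = true;
        }
    }
    (chosen.name, lint)
}

fn main() {
    let bad1 = [
        candidate("impl<T> Bad1 for T", &[]),
        candidate("impl<T: 'static> Bad1 for T", &["T: 'static"]),
    ];
    println!("{:?}", select(&bad1));
}
```

In this model the `Bad1` chain falls back to the generic impl with the lint raised, while a chain in which every impl already requires `T: 'static` selects the most specialized impl without any lint.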

Let’s revisit some of the earlier examples in this light:

// Specialization cannot work: trans doesn't know if T: 'static:
trait Bad1 {
    fn bad1(&self);
}

impl<T> Bad1 for T {
    default fn bad1(&self) {
        println!("generic");
    }
}

impl<T: 'static> Bad1 for T {
    fn bad1(&self) {
        println!("specialized");
    }
}

fn main() {
    // prints `generic`, but also generates a warning
    "test".bad1()
}

For this example, trans would pick the generic implementation, but issue a warning that a specialization might have applied. You could imagine going further and detecting simple cases like this where a given impl will never be used (as in the second impl of Bad1) and issuing errors. But as explained above, we cannot catch them all.

On the other hand, consider this case:

trait MustBeStatic {}
impl<T: 'static> MustBeStatic for T {}
impl<T: 'static + Clone> MustBeStatic for T {}

Here, both impls impose 'static constraints, so the second impl doesn’t impose any new lifetime constraints, and trans can choose it.

To make this work, in trans, when we query the trait system we replace each instance of 'erased with a distinct, fresh lifetime variable, which is a simple way to encode that anything we deduce in the query must be valid for all sets of unerased lifetimes. The Chalk approach will make this quite easy to do.

Even for the cases we’re covering, though, it’s possible to do better (we’ll see more on that later). That means we might want to “improve” the behavior of trans after stabilizing the core of specialization. Fortunately, we should be able to statically detect all cases where the behavior of trans would change, and issue a different warning that the behavior will improve. That gives us leverage to use something like epochs to make trans smarter over time, while still shipping some version of specialization relatively soon.

The only alternative seems to be to continue to pursue increasingly clever solutions before shipping anything—which is a worrying approach to take when it comes to soundness. Better, in my opinion, to ship a sound 80% of the feature now, with some rough edges, and improve it over time.

Going deeper

Before I close out this post, I want to write out some of the further explorations we’ve done, and what we’ve learned.

Here’s an interesting example:

trait Special {
    fn special(&self);
}

impl<T> Special for T {
    default fn special(&self) {
        println!("generic");
    }
}

impl<T> Special for (T, T) {
    fn special(&self) {
        println!("specialized");
    }
}

fn pair<T: Clone>(t: T) {
    (t.clone(), t).special()
}

fn main() {
    pair("hi");
}

Using the strategy outlined above, trans will go from (&'erased str, &'erased str) to (&'a str, &'b str) and hence use the generic implementation (and issue a lint that the more specific impl is being ignored). However, type check could deduce that the more specialized impl always applies when invoking special in pair, and you could imagine communicating that information down to trans.

What’s going on here is that type check sees things before monomorphization, and trans sees them afterward. In this particular case, that ends up making trans more conservative, since it can’t tell that two appearances of 'erased always come from the same, single lifetime.

The story changes if we add one layer of “indirection” around trait dispatch:

trait Special {
    fn special(&self);
}

impl<T> Special for T {
    default fn special(&self) {
        println!("generic");
    }
}

impl<T> Special for (T, T) {
    fn special(&self) {
        println!("specialized");
    }
}

fn use_special<T: Special>(t: T) {
    t.special()
}

fn pair<T: Clone>(t: T) {
    use_special((t.clone(), t))
}

fn main() {
    pair("hi");
}

Now at type checking time, the actual use of special occurs in a context where we don’t know that we’ll always be using the more specialized version.

Why harp on this point? Well, for one, it’s the main issue in allowing for sound specialization of associated types. We can see this in a variant of the Bomb example:

trait Bomb {
    type Assoc: Default;
}

impl<T> Bomb for T {
    default type Assoc = ();
}

impl<T> Bomb for (T, T) {
    type Assoc = String;
}

fn build<T: Bomb>(t: T) -> <(T, T) as Bomb>::Assoc {
    <(T, T) as Bomb>::Assoc::default()
}

fn main() {
    let s: String = build("Uh oh");
    drop(s) // typeck and trans disagree about the type of `s`
}

Here, again, type check knows that the relevant uses of Bomb all involve types of the form (T, T) and therefore can use the specialized version, and that could be communicated to trans. But, once more, adding a layer of indirection makes that much harder:

trait Bomb {
    type Assoc: Default;
}

impl<T> Bomb for T {
    default type Assoc = ();
}

impl<T> Bomb for (T, T) {
    type Assoc = String;
}

fn indirect<T: Bomb>() -> T::Assoc {
    T::Assoc::default()
}

fn build<T: Bomb>(t: T) -> <(T, T) as Bomb>::Assoc {
    indirect::<(T, T)>()
}

fn main() {
    let s: String = build("Uh oh");
    drop(s) // typeck and trans disagree about the type of `s`
}

The problem is that type check can no longer tell trans to use the specialized impl in the call to Assoc::default, but it is still assuming that the specialized impl is used externally (i.e., in the build function).

To sum up, there are two inter-related places where type check and trans differ:

Lifetime erasure
Monomorphization

We can partly deal with the first of these by introducing fresh lifetime variables for each lifetime that appears in type check, just as we do for trans—basically asking for the trait system to only find answers that would apply for arbitrary lifetime choices.

The monomorphization issue, though, appears much harder to cope with. One possible avenue is to track impl choices in a way that crosses functions, in other words allowing the knowledge from build that the specialized impl of Bomb can be used to be used when monomorphizing and generating code for indirect. Niko tells me that, in ancient times, the compiler used to do something much like this—and that it was incredibly painful and complicated.

In any case, taking these further steps would appear to require substantial additional work, and it seems hard to achieve confidence in their soundness. So dropping associated type specialization for now, where it’s relatively easy to argue for soundness, seems like the right step to take (@arielb1, here’s where you prove me wrong).

http://aturon.github.io/tech/2017/07/08/lifetime-dispatch

Negative reasoning in Chalk

Apr 24, 2017 Updated Apr 24, 2017

I’ve had the pleasure in recent weeks of working on Chalk, the project that Niko’s been blogging about.

The project has a few goals:

Recast Rust’s trait system explicitly in terms of logic programming, by “lowering” Rust code into a kind of logic program we can then execute queries against.
Provide a prototype for an implementation based on these principles in rustc.
Provide an executable, highly readable specification for the trait system.

We expect many benefits from this work. It will consolidate our existing, somewhat ad hoc implementation into something far more principled and expressive, which should behave better in corner cases, and be much easier to extend. For example, the current implementation already supports associated type constructors.

It also makes it much easier to gain confidence in what the trait system is doing, because we can understand it in relatively simple logical terms.

Open problems in paradise

All that said, Chalk isn’t finished, and it’s currently missing some of the core pieces of the real trait system.

I’ve been trying to puzzle out a tangle of related such open problems for Chalk. In particular, I want to work out how to:

Give a very precise and principled meaning for the Yes, No, and Maybe results you can receive.
Account for the various “mode switches” we employ in today’s trait system, which control the degree of negative reasoning permitted.
Account for rustc’s precedence rules that e.g. give more weight to where clauses than to blanket impls when it comes to type inference.
Support coherence checking, which requires (constrained) negative reasoning.
Leverage the orphan rules for reasoning.
Incorporate specialization soundly (ruling out lifetime dispatch).

The theme that ties all of these topics together is negative reasoning, i.e. the ability to conclude definitively that something is not true. For the trait system, that usually means that a type definitively does not implement a trait. And what we’ve learned over time is that relying on this kind of reasoning can make your code brittle to changes in other crates: new impls are added all the time, and can invalidate these kinds of negative conclusions. We’ve carefully designed the existing trait system to strike the right balance between the ability to reason negatively and the ability of other crates to evolve, but the current implementation feels ad hoc and incomplete. The challenge is putting all of this on much firmer footing, by understanding it in terms of explicit logic programming, and keeping the underlying logic grounded in well-understood logical principles. (And that, by the way, would be a huge win, since we’ve often been quite fearful about negative reasoning in rustc, since it’s so easy to do it incorrectly.)

It turns out that Prolog has similar concerns about negation. In particular, the natural way of implementing negation in a Prolog engine is through failure: not P means you tried but failed to prove P given the facts currently present in the Prolog program. For this to be valid as logical negation, we have to view the program under a “closed world assumption”: the facts that follow from the program’s clauses, and only those facts, are true.
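To make negation as failure concrete, here is a minimal Rust sketch. The `Program` type and its string-pair facts are hypothetical stand-ins invented for illustration; they are not how Prolog or Chalk actually represent programs.

```rust
use std::collections::HashSet;

/// A toy "program": the set of (type, trait) facts known to hold.
/// Hypothetical model, for illustration only.
struct Program {
    facts: HashSet<(&'static str, &'static str)>,
}

impl Program {
    fn new(facts: &[(&'static str, &'static str)]) -> Self {
        Program { facts: facts.iter().copied().collect() }
    }

    /// Try to prove `ty: trait_name` from the facts present in the program.
    fn prove(&self, ty: &'static str, trait_name: &'static str) -> bool {
        self.facts.contains(&(ty, trait_name))
    }

    /// Negation as failure: `not P` holds exactly when proving `P` fails.
    /// This is only valid as logical negation under a closed world assumption.
    fn negation_as_failure(&self, ty: &'static str, trait_name: &'static str) -> bool {
        !self.prove(ty, trait_name)
    }
}
```

With only the fact `u8: Display` present, asking `negation_as_failure("()", "Error")` succeeds, but that answer can only be trusted if nobody is able to add the missing fact later.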

To understand the rest of this post, you’ll want to have read at least the first of Niko’s series.

Negative reasoning in Rust today

To get more clarity about the negative reasoning issues, let’s look at the various places they come into play in the current trait system.

The current system has two distinct “mode switches”:

Intercrate mode, which forces the trait system to account for the possibility that (1) downstream crates using this crate can introduce new types and trait impls that we can’t know about and (2) upstream crates could be changed to introduce new trait impls.
User-facing projection mode, which forces the trait system to account for the possibility that upstream crates could be changed to introduce new specializations (and thus alter the definition of an associated type).

We’ll see what this means concretely in a moment, but one observation right off the bat: these switches are not used orthogonally today. In particular, there is no code today that uses intercrate mode without also using user-facing projection mode.

Let’s go through the three major areas of the compiler that use the trait system and see how they employ these modes, and what the implications are.

Overlap checking and intercrate mode

Part of trait coherence is checking impls for overlap. Consider the following:

trait MyTrait { .. }
impl<T: Error> MyTrait for T { .. }
impl MyTrait for () { .. }

Do these two impls overlap? It depends on whether (): Error – or more precisely, whether we can definitively conclude not { (): Error }. If we are allowed to conclude that not { (): Error }, then we can conclude that the two impls don’t overlap.

Should we be able to draw such a conclusion? On the one hand, currently () does not implement the Error trait (both are defined in std), and hence the two impls here do not overlap. However, if std was ever changed so that () implemented Error, these impls would overlap and could not be allowed. In other words, std adding such an impl would irrevocably break this code! And we’d like for std to be able to add trait implementations without requiring a new major version of Rust.

Part of the rebalancing coherence RFC was a decision that these kinds of negative conclusions can only be drawn about type/trait combinations that are fully under the current crate’s control. In other words, it connects negative reasoning to the orphan rule, which says which impls a crate is allowed to provide. (There is also a mechanism, called fundamental, to promise that certain impls won’t be provided in the future, but we’ll ignore that for now.) By limiting negative reasoning in this way, we can “future proof” crates against changes their dependencies will likely make, such as introducing impls. While such changes can still cause type inference ambiguities, they can never cause irrevocable breakage.

To illustrate where we do allow negative reasoning for overlap checking, consider the following variant:

trait MyTrait { .. }
impl<T: Error> MyTrait for T { .. }

struct MyStruct { .. } // does not implement Error
impl MyTrait for MyStruct { .. }

Here, we allow the trait system to conclude that not { MyStruct: Error }, because whether or not MyStruct: Error is entirely under this crate’s control, so there is no risk of an innocent upstream change breaking this crate.

Here’s a more subtle case:

trait MyTrait { .. }
trait Aux { .. } // no impls in this crate

impl<T: Error> MyTrait for T { .. }

struct MyStruct<U> { .. } // no impl for Error in this crate
impl<T> MyTrait for MyStruct<T> { .. }
impl<T: Aux> Error for MyStruct<T> { .. }

This example has a lot going on. The key point is that the current crate defines an Aux trait, but does not implement it for any types. Hence, there is no T you could mention in this crate such that T: Aux, and hence no type T such that MyStruct<T>: Error. Can we thus conclude that for all T, not { MyStruct<T>: Error }? No! Because a downstream crate using this one could define a new type Foo and implement Aux for it, and then suddenly MyStruct<Foo> would have two applicable impls of MyTrait.

For that reason, we consider not only the way that existing, upstream crates could provide new impls over time, but also consider that downstream crates can introduce new types, and new trait impls for them, that we will never be able to know about here.

All of the above restrictions on negative reasoning are part of intercrate mode, which is only used by overlap checking.

Type checking and user-facing projection mode

Another case of negative reasoning arises through specialization. Consider:

// Crate A

trait Foo { type T; }
impl<T> Foo for T {
    default type T = bool;
}

// Crate B

use crate_a::Foo;

fn main() {
    let x: <bool as Foo>::T = true;
}

Should this compile? More specifically, is it valid for crate B to conclude that the associated type T for bool’s implementation of Foo is bool?

It would be sound to make that assumption, since we know that crate A is the only crate that can implement Foo for bool (due to the orphan rules), and chose not to specialize the impl. However, in the future, crate A could be modified with an additional impl:

impl Foo for bool {
    type T = ();
}

And that change would break crate B. So this is again a question of what changes an upstream crate should be able to make in a minor revision.

Currently, we tilt things in favor of crate A being able to add such an impl, and thus do not allow the original example to compile. This is again a form of constraining negative reasoning: we do not allow crate B to conclude that there is not a more specialized impl that applies, because there could be one in the future.

Interestingly, in the current implementation you could not write fn main even in crate A, where all of the relevant impls are under the crate’s direct control. I consider this a bug.

In any case, this negative reasoning restriction is called “user-facing projection mode” (as opposed to “trans-facing”, which we’ll see below). It’s turned on during both type checking and overlap checking.

Why type checking does not turn on intercrate mode

Today, type checking and overlap checking differ in one (big) way with respect to negative reasoning: type checking does not turn on intercrate mode. Why?

Consider the following, quite contrived example:

trait Foo<T> {
    fn make() -> T;
}
trait Bar<T> {}

#[derive(Debug)]
struct Local;

impl<T> Foo<T> for () where (): Bar<T> {
    fn make() -> T { panic!() }
}

impl Foo<Local> for () {
    fn make() -> Local { Local }
}

fn main() {
    println!("{:?}", <()>::make());
}

This code compiles and prints “Local”. That’s because, from what this crate can see, no type implements Bar<T>, so only the second impl of Foo is viable. That conclusion is then fed into type inference, which decides to interpret <()>::make() as <() as Foo<Local>>::make().

This is all kosher because, for soundness, the only thing that matters about type inference is that, in the end, we get something that typechecks. And we’ve made a deliberate decision to not make type inference future-proofed against changes in other crates, since that creates serious ergonomic problems (see the linked post, particularly the section on conditional impls).

Trans

On the other hand, when it comes time to actually generate code, we are no longer interested in future-proofing (which has already been taken care of in the static checking described above), and instead expect to get a clear-cut answer to all questions we ask of the trait system–in part because all of the questions will involve fully monomorphized types. In particular, we need to allow default associated types to be revealed, so that we can generate code.

Thus, when using the trait system within trans, we allow full-blown negative reasoning.

Modal logic

So, putting together all of the above: the trait system engages in various forms of negative reasoning, but at different times this reasoning is restricted in different ways. Only the trans point of view correlates directly with Prolog’s “negation as failure”/closed world view. The question now is, can we understand the restricted forms of negative reasoning in logical terms as well?

It turns out that there’s a very satisfying answer: use a modal logic.

Modal logic makes truth relative to a possible world, rather than being an absolute thing. In one world, the sky is blue; in another world, it’s red. That’s the story for “facts”. But the basic rules of logic apply no matter what world you’re talking about: 1+1 = 2 in every possible world.

An important aspect of modal logics is modalities, which are basically ways of qualifying a statement by what world(s) you are talking about. The statement I just made above is an example of a modal statement: 1+1 = 2 in every possible world. There are lots of possible modalities, like “in every future world” or “in some possible world” and so on.

There’s a lot more to say about modal logic, but this post is going to tell the story in a Rust-centric way. You can find more background here.

Possible worlds in Rust

What does all this mean for Rust? We can give a rational reconstruction of what the compiler currently does via modal logic, and use it to guide the development of Chalk, resolving a number of open questions along the way.

First off, a “possible world” for us will be a full crate graph, with a particular crate being considered “the current crate” (the one actively being compiled). In the simplest case, there’s just one crate, like in the following two examples:

// Program A
struct MyType;
trait MyTrait {}
impl MyTrait for MyType {}

and

// Program B
struct MyType;
trait MyTrait {}

Both crates define MyType and MyTrait, but in the first one MyType: MyTrait, while in the second one, not { MyType: MyTrait }. The facts on the ground depend on the crate you’re compiling. And when you’re asking Chalk a question, that question is normally grounded in the particular crate graph you’ve lowered to Chalk’s logic. In other words, statements made directly about the current world are interpreted in, well, a “closed world” way: we know precisely what the world is, and can give firm answers on that basis. That’s the appropriate interpretation for trans, as we saw above.

Second–and this is really the key idea–we add a modality to Chalk to make statements about all worlds compatible with the one we’re currently in. This is how we capture the idea of “future proofed” reasoning of the kind we want in type and coherence checking. A world is compatible with the current world if:

The current crate is unchanged.
All dependencies of the current crate still exist, but may be extended in semver-compatible ways (i.e., ways that would only require a minor version bump).
Downstream crates (that use the current crate) can come, go, and otherwise change in arbitrary ways.
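The three conditions above can be sketched as a small check. This is a toy model under loud assumptions: a `World` here tracks only impls, grouped into upstream, current, and downstream crates, and "semver-compatible extension" is approximated as "only adds impls". The names `World`, `impls`, and `is_compatible` are hypothetical, and the orphan rules are not modeled.

```rust
use std::collections::BTreeSet;

/// An impl fact `ty: trait_name`, e.g. ("CrateAType", "Foo").
type Impl = (&'static str, &'static str);

/// A toy world that records nothing but impls (hypothetical model).
struct World {
    /// Impls provided by crates the current crate depends on.
    upstream: BTreeSet<Impl>,
    /// Impls provided by the current crate.
    current: BTreeSet<Impl>,
    /// Impls provided by crates that depend on the current crate.
    downstream: BTreeSet<Impl>,
}

/// Small helper for building impl sets.
fn impls(list: &[Impl]) -> BTreeSet<Impl> {
    list.iter().copied().collect()
}

/// Is `next` compatible with `now`?
fn is_compatible(now: &World, next: &World) -> bool {
    // 1. The current crate is unchanged.
    let current_unchanged = now.current == next.current;
    // 2. Upstream crates may gain impls but never lose them.
    let upstream_extended = now.upstream.is_subset(&next.upstream);
    // 3. Downstream crates may change arbitrarily, so `downstream`
    //    is deliberately not inspected.
    current_unchanged && upstream_extended
}
```

Note that the relation is not symmetric: a world with an extra upstream impl is compatible with the one lacking it, but not the other way around.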

Let’s see an example. Suppose we have a two crate dependency graph:

// WORLD 1

////////////////////////////////////////////////////////////
// crate A
////////////////////////////////////////////////////////////

trait Foo {}
struct CrateAType;

////////////////////////////////////////////////////////////
// crate B -- the current crate
////////////////////////////////////////////////////////////

extern crate crate_a;
use crate_a::*;

struct CrateBType;

Here’s a compatible world, one that extends crate A in a minor-bump kind of way:

// WORLD 2

////////////////////////////////////////////////////////////
// crate A
////////////////////////////////////////////////////////////

trait Foo {}
struct CrateAType;
impl Foo for CrateAType {}   // <- changed

////////////////////////////////////////////////////////////
// crate B -- the current crate
////////////////////////////////////////////////////////////

extern crate crate_a;
use crate_a::*;

struct CrateBType;

Now, here’s an interesting thing: in world 1, not { CrateAType: Foo }. But in this new, compatible world 2, CrateAType: Foo! The facts on the ground have changed in a meaningful way.

Could we go the other way around? That is, if we’re currently talking about world 2, is world 1 considered compatible? No, because removing a trait impl is not a semver-compatible change. What that means in practice is that, when jumping to a compatible world, you can go from not { Foo: Bar } to Foo: Bar, but not the other way around.

Here’s a world that is incompatible with world 1:

// WORLD 3

////////////////////////////////////////////////////////////
// crate A
////////////////////////////////////////////////////////////

trait Foo {}
struct CrateAType;

////////////////////////////////////////////////////////////
// crate B -- the current crate
////////////////////////////////////////////////////////////

extern crate crate_a;
use crate_a::*;

struct CrateBType;
impl Foo for CrateBType {}   // <- changed

The change here is very similar to the change in world 2, but the key difference is which crate was changed: crate B, the “current crate”, is different in this world, and that makes it incompatible with world 1. This is how we get a distinction for the “local” crate, which we control and therefore don’t need to future-proof against. So if not { CrateBType: Foo } and crate B is the current crate, we know that in any compatible world, not { CrateBType: Foo } will still be true.

Now let’s talk about the other kind of change allowed to the world: arbitrary changes to downstream crates. In world 1, we didn’t have any crates downstream from crate B. Here’s a world that does:

// WORLD 4

////////////////////////////////////////////////////////////
// crate A
////////////////////////////////////////////////////////////

trait Foo {}
struct CrateAType;

////////////////////////////////////////////////////////////
// crate B -- the current crate
////////////////////////////////////////////////////////////

extern crate crate_a;
use crate_a::*;

struct CrateBType;

////////////////////////////////////////////////////////////
// crate C -- a downstream crate
////////////////////////////////////////////////////////////

extern crate crate_a;
extern crate crate_b;
use crate_a::*;

struct CrateCType;
impl Foo for CrateCType {}

So, in world 1 we could conclude not { exists<T> { T: Foo } }. But in world 4 here, we have CrateCType: Foo. That’s the kind of change that we assume can happen during coherence checking but, today, not during type checking. As we’ll see later, though, this single notion of compatible worlds will end up sufficing for both.

Finally, let’s look at one more world, this time involving downstream crates:

// WORLD 5

////////////////////////////////////////////////////////////
// crate A
////////////////////////////////////////////////////////////

trait Foo {}
struct CrateAType;

////////////////////////////////////////////////////////////
// crate B -- the current crate
////////////////////////////////////////////////////////////

extern crate crate_a;
use crate_a::*;

struct CrateBType;

////////////////////////////////////////////////////////////
// crate C -- a downstream crate
////////////////////////////////////////////////////////////

extern crate crate_a;
extern crate crate_b;
use crate_a::*;
use crate_b::*;

impl Foo for CrateBType {}    // <- Note the type here

This world is not just incompatible with world 1–it’s not even a possible world! That’s because crate C violates the orphan rule by providing an impl for a type and trait it does not define (Foo for CrateBType). In other words, when we consider “all compatible worlds”, we take into account the orphan rules when doing so. And that’s why, starting from world 1, we know that in all compatible worlds, not { CrateBType: Foo }.

The compat modality

The discussion for Rust so far has focused on the underlying meaning of worlds. But we want to “surface” that meaning through a modality that we can use when making statements or asking questions in Chalk. We’ll do this via the compat modality. (For modal logic aficionados, this is basically the “box” modality, where the reachable worlds are the “compatible” ones, described above.)

The basic idea is that if we pose a query Q, that’s understood in terms of the current world, but if we ask compat { Q }, we’re asking if Q is true in all compatible worlds.

Revisiting our example above, if world 1 is the current world, here are some facts we’ll be able to deduce:

not { CrateAType: Foo }
not { CrateBType: Foo }
not { exists<T> { T: Foo } }
compat { not { CrateBType: Foo } } – in every compatible world, we still know that CrateBType does not implement Foo

And here are some statements that will not hold:

compat { not { CrateAType: Foo } }, because of examples like world 2.
compat { CrateAType: Foo }, because of world 1 itself.
compat { not { exists<T> { T: Foo } } }, for similar reasons

Note that for a given query Q, we might not be able to show compat { Q } or compat { not { Q } }, because some compatible worlds satisfy Q, and some don’t. So, within the compat modality, you don’t get the law of the excluded middle.
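Here is a rough sketch of compat read as a “box” over an explicitly listed set of worlds. Real compatible worlds are infinite in number, so the finite slice below is purely an assumption for illustration, and `World`, `world`, and `compat` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// A fact `ty: trait_name`.
type Fact = (&'static str, &'static str);

/// A toy world is just the set of impl facts true in it (hypothetical model).
type World = BTreeSet<Fact>;

/// Helper for building a world from a list of facts.
fn world(facts: &[Fact]) -> World {
    facts.iter().copied().collect()
}

/// `compat { q }`: `q` must hold in every compatible world.
/// `worlds` is a finite sample that must include the current world.
fn compat(worlds: &[World], q: impl Fn(&World) -> bool) -> bool {
    worlds.iter().all(|w| q(w))
}
```

Taking worlds 1, 2, and 4 as the sample, neither `CrateAType: Foo` nor its negation holds under `compat`, which is exactly the failure of the excluded middle described above.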

There’s more to say about this modality and how it’s implemented, but we can already put some cards on the table: when we’re type or coherence checking, the queries we pose to Chalk will be placed within a compat modality, which essentially “future proofs” their conclusions. For trans, we’ll pose queries directly about the current world.

In other words, having the compat modality means we can decide whether to make a “closed world assumption” or not, depending on what we’re trying to do.

Not taking Yes or No for an answer

To finish telling the story around modalities in Chalk, as well as to fully capture the current trait system’s behavior, we need to talk a little bit about what kinds of answers you can get when you pose a query to Chalk.

Traditional logic programming gives you two kinds of answers: Yes (with some information about how the query was resolved) and No. So for example, take the following Rust program:

trait Foo: Display {
    fn new() -> Self;
}
impl Foo for u8 { ... }
impl Foo for bool { ... }

If you ask exists<T> { T: Foo } in a traditional Prolog engine, you’ll get something like “Yes, T = u8”; if you ask again, you’ll get “Yes, T = bool”, and if you ask a final time, you’ll get “No”.

That’s not quite what we want for Rust. There, “existential” questions come up primarily when we’re in the middle of type inference and we don’t know what a particular type is yet. Think about a program using the Foo trait like so:

fn main() {
    println!("{}", Foo::new());
}

When we go to type check this function, we don’t immediately know what the type returned by Foo::new() is going to be, or which impl to use. While it’s true that there do exist types we could use, we don’t want to pick one at random for type inference. Instead, we want an error asking the programmer to clarify which type they wanted to use.

On the other hand, when there’s only one choice of type given other constraints, we allow type inference to assume the programmer must have meant that type:

struct Foo;

trait Convert<T> {
    fn convert(self) -> T;
}

impl Convert<bool> for Foo {
    fn convert(self) -> bool { true }
}

fn main() {
    println!("{}", Foo.convert()) // prints `true`
}

Similarly, when it comes time to generate code, we expect there to be a unique choice of impl to draw each method call from!

These considerations have led both rustc and Chalk to adopt a kind of three-way answer system: Yes, No, and Maybe. The precise meaning of these outcomes has been a bit muddy, but part of what I want to advocate for is the following setup:

Yes: in the current world, there is a unique way of choosing the existentials (inference variables) to make the query true; here’s what it is.
No: the query does not hold in the current world.
Maybe: the query may or may not hold in the current world; optionally, here’s a suggestion of what to choose for the existentials if you get stuck.
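As a sketch of this three-way reading, consider answering `exists<T> { T: Trait }` against a flat list of impls. The `Answer` enum and `solve_exists` function are hypothetical and are not Chalk’s actual types; types and traits are modeled as plain strings.

```rust
/// A three-way answer, following the reading proposed above.
#[derive(Debug, PartialEq)]
enum Answer {
    /// There is a unique choice for the inference variable; here it is.
    Yes(&'static str),
    /// The query does not hold in the current world.
    No,
    /// The query may or may not hold; optionally, a fallback suggestion.
    Maybe(Option<&'static str>),
}

/// Answer `exists<T> { T: trait_name }` against a list of (type, trait) impls.
fn solve_exists(impls: &[(&'static str, &'static str)], trait_name: &str) -> Answer {
    let candidates: Vec<&'static str> = impls
        .iter()
        .filter(|(_, tr)| *tr == trait_name)
        .map(|(ty, _)| *ty)
        .collect();
    match candidates.as_slice() {
        // Nothing implements the trait.
        [] => Answer::No,
        // Exactly one type works, so inference may commit to it.
        [only] => Answer::Yes(*only),
        // Several types work; refuse to pick one at random.
        _ => Answer::Maybe(None),
    }
}
```

With impls of `Foo` for both `u8` and `bool`, the query yields `Maybe(None)` rather than enumerating solutions one at a time the way a traditional Prolog engine would.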

What is this business about getting stuck? In general, when we’re type checking a function body, we don’t always know the type of everything at the time we encounter it. Take, for example, the following:

fn main() {
    let mut opt = None;
    opt = Some(true);
}

When we are typechecking the first line, we know that opt will have type Option<?T>, but we don’t know what ?T is; it’s an inference variable. Later on in checking, as we encounter further constraints, we’ll learn that ?T must be bool. By the time we finish typechecking a function body, all inference variables must be so resolved; otherwise, we wouldn’t know how to generate the code!

This inference process is interleaved with querying the trait system, as in the Convert example above. So in general we need the trait system to feed back information about unknown types. But for the case of Maybe, the trait system is saying that there might be multiple ways of implementing the trait, and the suggested types being returned should only be used as a “fallback” if type checking otherwise can’t make any progress.

Let’s take a look at a couple of ways that this version of Maybe helps.

Leveraging Maybe for where clause precedence

In the current trait system, where clauses are given precedence over other impls when it comes to type inference:

pub trait Foo<T> { fn foo(&self) -> T; }
impl<T> Foo<()> for T { fn foo(&self) { } }
impl Foo<bool> for bool { fn foo(&self) -> bool { *self } }

pub fn foo<T>(t: T) where T: Foo<bool> {
   println!("{:?}", <T as Foo<_>>::foo(&t));
}
fn main() { foo(false); }

The program prints false. What’s happening here is that the call to foo within println! does not provide enough information by itself to know whether we want the Foo<bool> impl or the Foo<()> impl, both of which apply. In other words, there’s not a unique way to resolve the type. However, the current trait system assumes that if you have an explicit where clause, it should take precedence over impls, and hence influence type inference.

It wasn’t initially clear how this would carry over to Chalk, where we’re trying to take a “pure logic” stance on things, and hence would prefer not to bake in various notions of precedence and so on.

However, with the reading of Maybe given above, we can yield a Maybe answer here and recommend to type inference that it choose bool if it gets stuck, but we’ve made clear that this is a sort of “extra-logical” step.
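One way this could look, as a sketch only: resolve the inference variable `?U` in `T: Foo<?U>` from two candidate lists, and let a lone where clause supply the fallback. The `resolve` function and the string-based candidates are assumptions made for illustration, not Chalk’s design.

```rust
#[derive(Debug, PartialEq)]
enum Answer {
    Yes(&'static str),
    No,
    Maybe(Option<&'static str>),
}

/// Resolve `?U` in `T: Foo<?U>`. Each slice lists the choices of `?U`
/// offered by the where clauses in scope and by the applicable impls.
fn resolve(where_clauses: &[&'static str], impls: &[&'static str]) -> Answer {
    let mut all: Vec<&'static str> =
        where_clauses.iter().chain(impls.iter()).copied().collect();
    all.sort();
    all.dedup();
    match (all.as_slice(), where_clauses) {
        // No candidate at all.
        ([], _) => Answer::No,
        // Logically unique: a real Yes.
        ([only], _) => Answer::Yes(*only),
        // Ambiguous, but a single where clause gives an extra-logical hint.
        (_, [preferred]) => Answer::Maybe(Some(*preferred)),
        // Ambiguous with nothing to recommend.
        _ => Answer::Maybe(None),
    }
}
```

For the program above, the where clause offers `bool` and the impls offer `()` and `bool`, so the result is `Maybe(Some("bool"))`: the logic stays honest about the ambiguity, and the preference is carried only as a suggestion.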

Leveraging Maybe for type checking under compat

Similarly, recall that in the current trait system, there are two different mode switches, but so far we’ve only talked about a single compat modality to add to Chalk.

The key, again, is to leverage Maybe. In particular, we can have both type checking and coherence checking make queries to Chalk within the compat modality. But if they get a Maybe answer back, they will interpret it differently:

Coherence, which is trying to be conservative, will consider a Maybe to mean “Yes, these could potentially overlap”, and hence produce an error.
Type checking, as explained above, will take the fallbacks suggested by Maybe under advisement, and if it gets stuck, will apply them and see whether it can make further progress.
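The two interpretations can be sketched as a pair of small functions. Both function names, and the `stuck` flag standing in for “inference cannot make progress”, are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Answer {
    Yes(&'static str),
    No,
    Maybe(Option<&'static str>),
}

/// Coherence is conservative: anything short of a definite No
/// is treated as a potential overlap, and hence an error.
fn coherence_reports_overlap(answer: &Answer) -> bool {
    !matches!(answer, Answer::No)
}

/// Type checking commits on Yes, and applies a Maybe suggestion
/// only as a fallback once it is otherwise stuck.
fn type_check_choice(answer: &Answer, stuck: bool) -> Option<&'static str> {
    match answer {
        Answer::Yes(ty) => Some(*ty),
        Answer::Maybe(Some(ty)) if stuck => Some(*ty),
        _ => None,
    }
}
```

The same `Maybe(Some("Local"))` answer thus produces an overlap error in coherence, while in type checking it is only used if nothing else pins down the type.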

Here again is the example that distinguished the modes that type checking and coherence used:

trait Foo<T> {
    fn make() -> T;
}
trait Bar<T> {}

#[derive(Debug)]
struct Local;

impl<T> Foo<T> for () where (): Bar<T> {
    fn make() -> T { panic!() }
}

impl Foo<Local> for () {
    fn make() -> Local { Local }
}

fn main() {
    println!("{:?}", <()>::make());
}

The key point was: can you deduce that not { exists<T> { (): Bar<T> } }, and hence that only the second impl of Foo could possibly apply?

In the system proposed by this post, we’d follow a chain of events like the following:

The type checker asks: exists<T> { compat { (): Foo<T> } }
- We check the first impl, and end up asking: compat { (): Bar<?T> }
  - We return Maybe, since we’re within compat and there are indeed some compatible worlds for which (): Bar<?T> for some ?T; but we have no idea what that ?T should be.
- We check the second impl, and get Yes with ?T = Local.
- Since there were multiple possibilities, we don’t have a unique answer; but only one of the possibilities gave us a suggestion for the inference variables. So we return Maybe with ?T = Local
The type checker takes under advisement that ?T should be unified with Local if nothing else constrains it.
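The step that merges the per-impl results could be sketched as follows. The `combine` function and its exact merging rules are my own assumption about one reasonable reading of the chain of events above, not an existing Chalk API.

```rust
#[derive(Debug, PartialEq)]
enum Answer {
    Yes(&'static str),
    No,
    Maybe(Option<&'static str>),
}

/// Merge the answers obtained from each candidate impl.
fn combine(candidates: &[Answer]) -> Answer {
    // Impls that definitely do not apply play no further role.
    let viable: Vec<&Answer> = candidates.iter().filter(|a| **a != Answer::No).collect();
    // Collect every suggestion for the inference variable.
    let suggestions: Vec<&'static str> = viable
        .iter()
        .filter_map(|a| match a {
            Answer::Yes(ty) | Answer::Maybe(Some(ty)) => Some(*ty),
            _ => None,
        })
        .collect();
    match (viable.as_slice(), suggestions.as_slice()) {
        // No impl can apply.
        ([], _) => Answer::No,
        // Exactly one impl applies, and it applies definitely.
        ([Answer::Yes(ty)], _) => Answer::Yes(*ty),
        // Several possibilities, but only one offered a suggestion.
        (_, [only]) => Answer::Maybe(Some(*only)),
        // Several possibilities and no single suggestion.
        _ => Answer::Maybe(None),
    }
}
```

Feeding in `Maybe(None)` from the first impl and `Yes("Local")` from the second gives `Maybe(Some("Local"))`, matching the walkthrough.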

In other words, unlike with the current trait implementation, we don’t have to pretend that we actually get a unique answer here; we can work within the future-proofed compat modality, and get back a Maybe answer with the suggestion we wanted.

It’s quite nice that all of the static checking takes place under the “future proofing” of the compat modality, whereas trans talks only about the world as it is, under a closed world assumption.

Implementing not and compat in Chalk

Before we close out this post, it’s worth being a bit more concrete about how not and compat would be implemented in Chalk (neither exists today).

Negation

For not { Q }, we basically follow Prolog-style negation-as-failure:

Attempt to solve Q, then dispatch on the answer we got:
- If we got Yes, return No
- If we got No,
  - If there are no existential variables within Q, return Yes
  - Otherwise, return Maybe, with no type suggestions
- If we got Maybe, return Maybe, with no type suggestions
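The dispatch above translates almost directly into code. In this sketch the `Answer` enum carries no type suggestions, since negation never produces any, and `negate` and its `q_has_existentials` flag are hypothetical names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Answer {
    Yes,
    No,
    Maybe,
}

/// Compute the answer for `not { Q }` from the answer for `Q`.
fn negate(answer_for_q: Answer, q_has_existentials: bool) -> Answer {
    match answer_for_q {
        // Q holds, so its negation does not.
        Answer::Yes => Answer::No,
        // Q failed and mentions no existential variables: definite Yes.
        Answer::No if !q_has_existentials => Answer::Yes,
        // Q failed, but existential variables remain: stay agnostic.
        Answer::No => Answer::Maybe,
        // Uncertainty about Q is uncertainty about its negation.
        Answer::Maybe => Answer::Maybe,
    }
}
```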

As you can see, with negation we don’t get information about existential variables, just as in traditional Prolog.

The compat modality

The implementation of compat { Q } is a bit more complex. First of all, it’s important to realize that Chalk already operates on an explicit “world”, namely the program you’ve lowered to it. When you ask questions, it will use this world as one source of facts (together with where clauses, basically). So the question is: how do we tweak this setup to capture the idea of “Evaluate this in any world compatible with the current one”?

We certainly can’t literally construct every such world, as there are an infinite number of them. But fortunately, we don’t have to. The role of the world in Chalk, as I said above, is to provide a core source of facts. To model “some arbitrary compatible world”, we just need to capture the various facts that might be true in such a world. This can be done in a “lazy” kind of way: in the compat modality, whenever we consult the current world to see whether a particular fact holds and find that it does not, we ask whether it’s a fact that could be made true in some compatible world, and if so yield Maybe (with no suggested types).

Let’s look at this concretely, revisiting world 1:

// WORLD 1

////////////////////////////////////////////////////////////
// crate A
////////////////////////////////////////////////////////////

trait Foo {}
struct CrateAType;

////////////////////////////////////////////////////////////
// crate B -- the current crate
////////////////////////////////////////////////////////////

extern crate crate_a;
use crate_a::*;

struct CrateBType;

If we ask CrateAType: Foo, we’ll get No. But if we ask compat { CrateAType: Foo }, then Chalk should switch into “compatible world” mode, so that when it consults the world about whether CrateAType: Foo, it will determine that such an impl could be added by crate A, and hence return Maybe. But compat { CrateBType: Foo } will return No, because we know the current crate controls the existence of such an impl. And hence, compat { not { CrateBType: Foo } } returns Yes. (Other than turning on “compatible world” mode, the compat modality just re-invokes the solver and returns whatever was found.)
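A minimal sketch of this lazy lookup follows. It makes a strong simplifying assumption: a fact counts as “under the current crate’s control” exactly when its type is defined in the current crate, which ignores blanket impls, fundamental types, and the rest of the orphan rules. `World`, `Mode`, `holds`, and `negate_ground` are all hypothetical names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Answer {
    Yes,
    No,
    Maybe,
}

#[derive(Clone, Copy)]
enum Mode {
    /// Closed world: only the facts of the lowered program count.
    CurrentWorld,
    /// "Compatible world" mode, switched on by `compat { .. }`.
    Compat,
}

struct World {
    /// Impl facts `(type, trait)` present in the current crate graph.
    impls: BTreeSet<(&'static str, &'static str)>,
    /// Types defined by the current crate (simplified notion of control).
    local_types: BTreeSet<&'static str>,
}

impl World {
    /// Look up the ground fact `ty: trait_name`.
    fn holds(&self, ty: &'static str, trait_name: &'static str, mode: Mode) -> Answer {
        // Impls cannot be removed compatibly, so existing facts stay true.
        if self.impls.contains(&(ty, trait_name)) {
            return Answer::Yes;
        }
        match mode {
            Mode::CurrentWorld => Answer::No,
            // The current crate controls this fact: it stays false.
            Mode::Compat if self.local_types.contains(&ty) => Answer::No,
            // Some compatible world could add the impl.
            Mode::Compat => Answer::Maybe,
        }
    }
}

/// Negation for ground queries (no existential variables).
fn negate_ground(answer: Answer) -> Answer {
    match answer {
        Answer::Yes => Answer::No,
        Answer::No => Answer::Yes,
        Answer::Maybe => Answer::Maybe,
    }
}
```

For world 1, `holds("CrateAType", "Foo", Mode::Compat)` gives `Maybe`, while negating `holds("CrateBType", "Foo", Mode::Compat)` gives `Yes`, in line with the answers described above.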

This strategy has much in common with the current rustc implementation, but now we have explicit negation (which does the right thing both inside and outside the modality), and we can get away with just this single modality, versus rustc’s two different “global switches”. Moreover, we’ve rationally reconstructed the behavior by connecting it to modal logic, which puts us on better footing for exploring extensions (like negative trait impls and so on).

A word about associated types: as Niko’s latest post discusses, when we lower impls we separately lower the fact that the impl exists from the various projections it provides. For compat, we only need to handle Type: Trait kinds of facts; the “applicative fallback” for associated type projection takes care of the rest.

Altogether, we avoid actually constructing some particular compatible world, and avoid having to guess meaningful facts about it; we just say “Maybe” to any question that could have a different answer in some compatible world.

What’s next

Putting together everything proposed in this post, we’ve achieved quite a bit:

A clear meaning for Yes/No/Maybe.
A story for the various “modes” in today’s trait system, which means we can support type checking, coherence checking, and trans.
A story for integrating the orphan rules into Chalk.
A story for integrating the where clause precedence rules into Chalk.
A clear treatment of negative reasoning in general, which will allow us to much more confidently employ it in the future.

What’s missing to achieve parity with rustc is specialization. It turns out that the modal foundation laid here provides most of what we need for specialization. However, there are some additional concerns around “lifetime dispatch”, which render rustc’s implementation of specialization unsound. The Chalk implementation should provide a testbed for finally solving those issues. I plan to have a follow-up post about that in the near future.

The design presented here is also just an “on paper” design. I’ll be working to implement it over the coming weeks.

http://aturon.github.io/tech/2017/04/24/negative-chalk

Specialization, coherence, and API evolution

Feb 6, 2017 Updated Feb 6, 2017


Specialization has been available in nightly Rust for over a year, and we’ve recently been thinking about the steps needed to stabilize it.

There are a couple of implementation issues that are currently blocked on an overhaul of the trait system which should be coming in the next couple of months.

What I want to talk about, though, is some deeper design questions that need to be resolved prior to stabilization, ones involving potential changes to the core specialization rules. This is a story that begins with a bold hope, runs headlong into a tragic discovery, and ultimately ends up close to where it started.

A New Hope

A rite of passage for learning Rust’s trait system is first encountering a coherence error. Coherence is a vital but frustrating property; it guarantees that there is always a single, unambiguous impl of a trait that is used for any given type.

It’s not too hard to see why coherence is vital. Imagine working with a HashMap in which multiple implementations of Hash applied to the key type. If different pieces of code ended up using different impls, map operations would return totally bogus results, and it would be very difficult to track down why.

What makes coherence frustrating is that, to enforce it, we must limit the kinds of impls you can write in different crates. We do this through a pair of rules:

The orphan rule, which *very roughly* says that you can write an impl only if either your crate defined the trait or defined one of the types the impl is for.
The overlap rule, which says that a given trait cannot have two impls that both apply to a single type (which would introduce ambiguity about which impl to use), unless one is a specialization of the other.

These rules work closely together; in particular, the orphan rule ensures that sibling crates can’t accidentally define overlapping impls for a parent trait, since their impls must each involve crate-local types.

Prior to Rust 1.0, we iterated quite a bit on these rules, and arrived at the current design in the Rebalancing coherence RFC. That RFC was predicated on a core assumption:

The problem is that due to coherence, the ability to define impls is a zero-sum game: every impl that is legal to add in a child crate is also an impl that a parent crate cannot add without fear of breaking downstream crates.

However, with specialization, it seems this assumption may no longer hold! The point of specialization is to allow for impls to overlap, and then to select the “most specific” impl. That means, in particular, that it’s feasible for a parent and child crate to safely define overlapping impls. Niko wrote a blog post proposing to leverage specialization in just this way.

The fundamental attribute

The existing orphan rule is based on an idea of “fundamental” types. It’s easiest to understand how it works through example:

//// Parent crate ////
trait ParentTrait {
    fn foo(&self);
}

//// Child crate ////
struct ChildType { .. }
impl ParentTrait for Box<ChildType> { .. }

This example is permitted today, and works fine as written. However, it means that adding the following blanket impl to the parent crate is a breaking change (assuming that the new impl is not specializable):

// Add to parent crate
impl<T> ParentTrait for Box<T> {
    // note: not specializable, since we didn't write `default`
    fn foo(&self) { .. }
}

This change would introduce an overlap with the child crate’s impl, which prevents it from compiling.

To avoid all such new parent crate impls being breaking changes, the Rebalancing coherence RFC introduced a restriction on child crates: roughly speaking, their impls of parent traits must either directly reference a type defined in the child crate, or reference it within a “fundamental” type constructor (&, &mut, Box). In other words:

// These child crate impls are allowed:
impl ParentTrait for ChildType { .. }
impl ParentTrait for Box<ChildType> { .. }
impl<'a> ParentTrait for &'a ChildType { .. }

// ... but these impls are NOT allowed:
impl ParentTrait for Vec<ChildType> { .. }
impl ParentTrait for (ChildType, ChildType) { .. }

The idea is to strike a balance between the impls that child crates can have, and the ones that parent crates can add over time. If a parent crate adds a blanket impl involving a fundamental type constructor, that’s a breaking change (since it could overlap with a child crate). But if it adds one for e.g. Vec<T>, that’s fine, because no child crate could have an impl involving that type.
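
The restriction can be tried out on stable Rust by treating the standard library as the parent crate. In this sketch std's FromStr stands in for ParentTrait (my substitution, not part of the original example), and ChildType is given a single field so there is something to parse.

```rust
use std::str::FromStr;

// `FromStr` (from std) plays the role of `ParentTrait`; this crate is the child.
#[derive(Debug, PartialEq)]
struct ChildType(u32);

// Allowed: the local type appears directly.
impl FromStr for ChildType {
    type Err = std::num::ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<u32>().map(ChildType)
    }
}

// Allowed: `Box` is a fundamental type constructor, so `Box<ChildType>`
// is treated as a local type.
impl FromStr for Box<ChildType> {
    type Err = std::num::ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<ChildType>().map(Box::new)
    }
}

// NOT allowed: `Vec` is not fundamental, so uncommenting the next line is
// rejected by the orphan rule (error E0117).
// impl FromStr for Vec<ChildType> { /* ... */ }

/// Parses through the impl on `Box<ChildType>`.
fn parse_boxed(s: &str) -> Option<Box<ChildType>> {
    s.parse::<Box<ChildType>>().ok()
}
```

Calling parse_boxed("7") goes through the impl on Box<ChildType>, which the compiler accepts only because Box is fundamental.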

This “fundamental” restriction is an arbitrary line drawn as part of the “zero-sum game” of writing non-overlapping impls. It’s applied using an unstable attribute, #[fundamental], which is difficult to understand and has had no clear path toward stabilization.

A positive-sum game?

But wait. When we introduced specialization, we relaxed the overlap rule to allow for overlap, as long as one impl specialized the other. Since the orphan rule is ultimately about preventing overlap from arising between multiple crates, maybe there’s a way to leverage specialization there as well?

If we revisit the above example, but make the new parent crate impl specializable (by using default), it no longer breaks the child crate. In other words, the following impls can all safely coexist:

//// Parent crate ////
trait ParentTrait {
    fn foo(&self);
}

impl<T> ParentTrait for Box<T> {
    default fn foo(&self) { .. }
}

//// Child crate ////
struct ChildType { .. }

// Now specializes the blanket impl from the parent crate
impl ParentTrait for Box<ChildType> { .. }

Is this always the case? In other words, if you add a new impl in the parent crate and mark it specializable, are you guaranteed not to break any child crates? That would allow us to get rid of fundamental, allowing both parent and child crates to add more kinds of impls than they can today, without breakage!

To make this idea work, you’d need to expand the specialization rules a bit, as Niko explains in his post. But those details won’t be too relevant to the core point of this post.

The Trait System Strikes Back

Halfway toward writing up an RFC with the above ideas, trying to prove to myself that they worked, I started to get worried. And then I started trying to prove that they didn’t work. And it turns out they don’t.

The crux of the problem is that the trait system is just too powerful; as we’ve learned over and over again, the trait system can be used to draw sneaky connections that are hard to defend against. Let me show you what I mean.

Imagine we have three crates, arranged in the following way:

//// Crate A ////

trait A {}

// Line we want to add:
// impl<T> A for T {}


//// Crate B ////

trait B {
    type Out;
}

impl<T> B for T where T: A {
    // Note: not specializable
    type Out = ();
}


//// Crate C ////

struct C;

impl B for C {
    type Out = bool;
}

Here, we have two traits in action, linked by the impl in crate B. The problem is that, because the traits are connected in this way, adding the impl in crate A for trait A creates overlap for trait B. And crucially, it’s not enough to require that the impl we added be specializable; the problem is that the existing impl in crate B, which crate A doesn’t even know about, is not specializable.

Like all of our problems around API evolution, this problem boils down to negative reasoning. In particular, crate C is initially allowed to write its impl because it knows, locally, that C does not implement A (and thus the blanket impl in crate B doesn’t apply). The problem is that crate C isn’t the only crate that gets to decide whether its type implements A. So it’s a zero-sum game after all, and something like fundamental is needed to adjudicate between different crates.
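
The negative reasoning can be observed in a single file. The sketch below collapses the three crates into one compilation unit, which is a simplification since real crates are checked separately, and adds a hypothetical helper out_for_c that only type-checks while crate C's impl is the one in force.

```rust
//// Crate A ////
trait A {}

// Line we want to add. Uncommenting it makes `C: A` hold, so the two impls
// of `B` below overlap and the build fails (error E0119).
// impl<T> A for T {}

//// Crate B ////
trait B {
    type Out;
}

impl<T> B for T where T: A {
    // Note: not specializable
    type Out = ();
}

//// Crate C ////
struct C;

// Accepted only because the compiler can see that `C` does not implement `A`.
impl B for C {
    type Out = bool;
}

/// Compiles only if `<C as B>::Out` is `bool`.
fn out_for_c() -> <C as B>::Out {
    true
}
```

Nothing in crate C's own source changes when crate A gains the blanket impl, yet the meaning of its impl B for C depends on that impl being absent.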

You might ask: is it really essential to make specialization opt-in? And indeed, if all impls were specializable, the idea would’ve worked out.

However, it’s not a tenable option. Associated types, in particular, need to opt in to specialization, and these days every method has an implicit associated type.

Return of the Subset Rule

So where does that lead us?

In the short term, it means that we should stick with the subset rule as-is. (I didn’t go into detail here, but Niko, Withoutboats and I had been playing with some changes to support the idea of getting rid of fundamental, which would also be breaking changes to the specialization rule; we’re backing off on that).

In the long term, there are several extensions we’d like to explore for specialization. While these extensions won’t let us relax the orphan rule, they will allow for more kinds of overlap, thereby making specialization usable in more contexts.

Intersection impls, which allow arbitrary partial overlap between impls as long as there’s an additional impl that covers precisely the area of overlap (thereby avoiding any ambiguity).
Type structure precedence, which considers more specific *type structure* before considering where clauses. While the idea was originally motivated by relaxing the orphan rules, which we can’t do, it is still a potentially useful expansion of specialization.
Child trumps parent, in which a child crate impl always specializes any parent crate impl it overlaps with. This rule is particularly simple and is usually what you want when such overlap arises.

The good news is that:

The current subset rule is forwards-compatible with *all* of these extensions.
Moreover, these extensions are compatible with each other!

I think it’s probably worth ultimately landing all three extensions under separate feature gates, to gain experience and determine which ones are most useful. They are all pretty straightforward to implement.

In the meantime, though, we can press forward with stabilization of today’s specialization, as soon as the remaining implementation issues are resolved.

http://aturon.github.io/tech/2017/02/06/specialization-and-coherence

Designing futures for Rust

Sep 7, 2016 Updated Sep 7, 2016

I recently wrote about the importance of asynchronous I/O in Rust and the aims of the new futures library. This post deepens the story by explaining the core design of that library. If you’re looking for more on the use of the library, you’ll have to wait; we’re very actively working on the Tokio stack and will have more to say once that’s settled down a bit.

To recap, the aim is robust and ergonomic async I/O with no performance penalty:

Robust: the library should have a strong story for error handling, cancellation, timeouts, backpressure, and other typical concerns for writing robust servers. This being Rust, we’ll also of course guarantee thread safety.
Ergonomic: the library should make writing asynchronous code as painless as possible—ideally, as easy as writing synchronous code, but with greater expressivity. While the latter will require async/await to fully achieve, the futures library provides a high-level way of expressing and combining asynchronous computation, similar to Rust’s successful Iterator API.
Zero cost: code written using the library should compile down to something equivalent to (or better than) “hand-rolled” server implementations, which would typically use manual state machines and careful memory management.

Achieving these goals requires a mix of existing techniques in Rust, and some new ideas about how to build a futures library; this post will cover both. In a nutshell:

Leverage Rust’s traits and closures for ergonomics and cost-avoidance. Traits and closures in Rust do not require heap allocation or dynamic dispatch—facts we take heavy advantage of. We also use the trait system to package up the futures API in a simple and convenient way.
Design the core Future abstraction to be demand-driven, rather than callback-oriented. (In async I/O terms, follow the “readiness” style rather than the “completion” style.) That means that composing futures together does not involve creating intermediate callbacks. As we’ll see, the approach also has benefits for backpressure and cancellation.
Provide a task abstraction, similar to a green thread, that drives a future to completion. Housing futures within a task is what enables the library code to compile down to the traditional model, i.e., with big state machines that can serve as a callback for a large number of underlying events.

Let’s dive in!

Background: traits in Rust

We’ll start with a quick review of traits in Rust. If you want more reading on these topics, you might check out the longer overview of traits.

To understand how the futures design works, you need to have a basic grasp on Rust’s traits. I won’t attempt a complete introduction here, but I’ll try to hit the most relevant highlights for making sense of what’s going on.

Traits provide Rust’s sole notion of interface, meaning that a trait is an abstraction that can apply to many concrete types. For example, here’s a simplified trait for hashing:

trait Hash {
    fn hash(&self) -> u64;
}

This trait stipulates that the type implementing it must provide a hash method, which borrows self and produces a u64. To implement the trait, you have to give a concrete definition for the method, like the following simple-minded one:

impl Hash for bool {
    fn hash(&self) -> u64 {
        if *self { 0 } else { 1 }
    }
}

impl Hash for i32 { ... } // etc

Once these implementations are in place, you can make calls like true.hash() to invoke the method directly. But often the methods are called via generics, which is where traits truly act as an abstraction:

fn print_hash<T: Hash>(t: &T) {
    println!("The hash is {}", t.hash())
}

The print_hash function is generic over an unknown type T, but requires that T implements the Hash trait. That means we can use it with bool and i32 values:

print_hash(&true);   // instantiates T = bool
print_hash(&12);     // instantiates T = i32

Generics are compiled away, resulting in static dispatch. That is, as with C++ templates, the compiler will generate two copies of the print_hash method to handle the above code, one for each concrete argument type. That in turn means that the internal call to t.hash()—the point where the abstraction is actually used—has zero cost: it will be compiled to a direct, static call to the relevant implementation:

// The compiled code:
__print_hash_bool(&true);  // invoke specialized bool version directly
__print_hash_i32(&12);     // invoke specialized i32 version directly

Compiling down to non-generic code is essential for making an abstraction like futures work without overhead: most of the time, that non-generic code will also be inlined, letting the compiler produce and optimize large blocks of code that resemble what you might have written in a low-level, “hand-rolled” style.

Closures in Rust work the same way—in fact, they’re just traits. That means, in particular, that creating a closure does not entail heap allocation, and calling a closure can be statically-dispatched, just like the hash method above.

Finally, traits can also be used as “objects”, which cause the trait methods to be dynamically dispatched (so the compiler doesn’t immediately know what implementation a call will use). The benefit to trait objects is for heterogeneous collections, where you need to group together a number of objects which may have different underlying types but all implement the same trait. Trait objects must always be behind a pointer, which in practice usually requires heap allocation.
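
To put the three flavors side by side, here is a small sketch. The trait is renamed SimpleHash so it cannot be confused with std's Hash, and the helper functions are hypothetical names chosen for illustration.

```rust
trait SimpleHash {
    fn hash(&self) -> u64;
}

impl SimpleHash for bool {
    fn hash(&self) -> u64 {
        if *self { 0 } else { 1 }
    }
}

impl SimpleHash for i32 {
    fn hash(&self) -> u64 {
        *self as u64
    }
}

// Static dispatch: one copy of this function is generated per concrete `T`.
fn hash_static<T: SimpleHash>(t: &T) -> u64 {
    t.hash()
}

// Static dispatch through a closure: `F` is a concrete, unboxed closure type.
fn apply<F: Fn(u64) -> u64>(f: F, x: u64) -> u64 {
    f(x)
}

// Dynamic dispatch: each element is a boxed trait object carrying a vtable.
fn hash_all(items: &[Box<dyn SimpleHash>]) -> Vec<u64> {
    items.iter().map(|item| item.hash()).collect()
}

fn dispatch_demo() -> (u64, u64, Vec<u64>) {
    let offset: u64 = 10;
    // A heterogeneous collection: a `bool` and an `i32` behind the same trait.
    let items: Vec<Box<dyn SimpleHash>> = vec![Box::new(true), Box::new(12_i32)];
    (
        hash_static(&12_i32),
        apply(move |x| x + offset, 5),
        hash_all(&items),
    )
}
```

Only the Vec of boxed trait objects allocates. The generic function and the closure are resolved at compile time.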

Defining futures

Now, let’s turn to futures. The earlier post gave an informal definition of a future:

In essence, a future represents a value that might not be ready yet. Usually, the future becomes complete (the value is ready) due to an event happening somewhere else.

Clearly, we’ll want futures to be some kind of trait, since there will be many different kinds of “values that aren’t ready yet” (e.g. data on a socket, the return value from an RPC call, etc.). But how do we represent the “not ready yet” part?

False start: the callback (aka completion-based) approach

There’s a very standard way to describe futures, which we found in every existing futures implementation we inspected: as a function that subscribes a callback for notification that the future is complete.

Note: In the async I/O world, this kind of interface is sometimes referred to as completion-based, because events are signaled on completion of operations; Windows’s IOCP is based on this model.

In Rust terms, the callback model leads to a trait like the following:

trait Future {
    // The type of value produced by the future
    type Item;

    // Tell the future to invoke the given callback on completion
    fn schedule<F>(self, f: F) where F: FnOnce(Self::Item);
}

The FnOnce here is a trait for closures that will be invoked at most once. Because schedule is using generics, it will statically dispatch any calls to that closure.

Unfortunately, this approach nevertheless forces allocation at almost every point of future composition, and often imposes dynamic dispatch, despite our best efforts to avoid such overhead.

To see why, let’s consider a basic way of combining two futures:

fn join<F, G>(f: F, g: G) -> impl Future<Item = (F::Item, G::Item)>
    where F: Future, G: Future

This function takes two futures, f and g, and returns a new future that yields a pair with results from both. The joined future completes only when both of the underlying futures complete, but allows the underlying futures to execute concurrently until then.

How would we implement join using the above definition of Future? The joined future will be given a single callback both_done which expects a pair. But the underlying futures each want their own callbacks f_done and g_done, taking just their own results. Clearly, we need some kind of sharing here: we need to construct f_done and g_done so that either can invoke both_done, and make sure to include appropriate synchronization as well. Given the type signatures involved, there’s simply no way to do this without allocating (in Rust, we’d use an Arc here).
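
To make the cost concrete, here is one way join could be written against the callback-style trait. This is a sketch rather than code from any existing library: the trait is renamed CallbackFuture, and Done, Shared and Join are hypothetical helper types.

```rust
use std::sync::{Arc, Mutex};

// The callback-style trait from the text, under a different name.
trait CallbackFuture {
    type Item;
    fn schedule<F>(self, f: F) where F: FnOnce(Self::Item);
}

/// A future that is already complete; it runs the callback immediately.
struct Done<T>(T);

impl<T> CallbackFuture for Done<T> {
    type Item = T;
    fn schedule<F>(self, f: F) where F: FnOnce(T) {
        f(self.0)
    }
}

/// State shared by the two inner callbacks: both results plus the outer callback.
struct Shared<A, B, C> {
    first: Option<A>,
    second: Option<B>,
    both_done: Option<C>,
}

impl<A, B, C: FnOnce((A, B))> Shared<A, B, C> {
    /// Fires `both_done` once both results have arrived.
    fn try_finish(&mut self) {
        if self.first.is_some() && self.second.is_some() {
            let callback = self.both_done.take().expect("callback already used");
            callback((self.first.take().unwrap(), self.second.take().unwrap()));
        }
    }
}

struct Join<F, G>(F, G);

impl<F: CallbackFuture, G: CallbackFuture> CallbackFuture for Join<F, G> {
    type Item = (F::Item, G::Item);

    fn schedule<C>(self, both_done: C) where C: FnOnce(Self::Item) {
        // The heap allocation: `f_done` and `g_done` must share one piece of state.
        let shared = Arc::new(Mutex::new(Shared {
            first: None,
            second: None,
            both_done: Some(both_done),
        }));
        let for_f = Arc::clone(&shared);
        // f_done
        self.0.schedule(move |a| {
            let mut state = for_f.lock().unwrap();
            state.first = Some(a);
            state.try_finish();
        });
        // g_done
        self.1.schedule(move |b| {
            let mut state = shared.lock().unwrap();
            state.second = Some(b);
            state.try_finish();
        });
    }
}

fn join_demo() -> Option<(u32, &'static str)> {
    let mut result = None;
    Join(Done(1), Done("one")).schedule(|pair| result = Some(pair));
    result
}
```

Every call to schedule on a Join allocates a fresh Arc and takes a lock in each inner callback, even in this trivial case where both futures are already complete.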

This kind of problem was repeated in many of the future combinators.

Another problem is that event sources like event loops need to invoke callbacks of arbitrary, different types—a case of the heterogeneity mentioned above. As a concrete example, when a socket is ready for reading, that event will need to be dispatched to some callback, and in general you’ll need a mix of different futures to be in play with different sockets. To make this work, you end up needing to heap-allocate callbacks for the event loop at every point the future wants to listen for an event, and dynamically dispatch notifications to those callbacks.

TL;DR, we were unable to make the “standard” future abstraction provide zero-cost composition of futures, and we know of no “standard” implementation that does so.

What worked: the demand-driven (aka readiness-based) approach

After much soul-searching, we arrived at a new “demand-driven” definition of futures. Here’s a simplified version that ignores the error handling of the real trait:

// A *simplified* version of the trait, without error-handling
trait Future {
    // The type of value produced on success
    type Item;

    // Polls the future, resolving to a value if possible
    fn poll(&mut self) -> Async<Self::Item>;
}

enum Async<T> {
    /// Represents that a value is immediately ready.
    Ready(T),

    /// Represents that a value is not ready yet, but may be so later.
    NotReady,
}

The API shift here is straightforward: rather than the future proactively invoking a callback on completion, an external party must poll the future to drive it to completion. The future can signal that it’s not yet ready and must be polled again at some later point by returning Async::NotReady (an abstraction of EWOULDBLOCK).
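
As a small illustration of the polling contract, the sketch below repeats the simplified trait and adds a hypothetical Countdown future and a deliberately naive driver. Neither is part of the library.

```rust
enum Async<T> {
    Ready(T),
    NotReady,
}

trait Future {
    type Item;
    fn poll(&mut self) -> Async<Self::Item>;
}

/// Hypothetical future that reports `NotReady` a fixed number of times
/// before completing.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Item = &'static str;

    fn poll(&mut self) -> Async<Self::Item> {
        if self.remaining == 0 {
            Async::Ready("done")
        } else {
            self.remaining -= 1;
            Async::NotReady
        }
    }
}

/// Naive driver: polls in a loop until the future resolves, counting the polls.
/// A real executor would wait for a wakeup instead of spinning.
fn poll_until_ready<F: Future>(mut future: F) -> (F::Item, u32) {
    let mut polls = 0;
    loop {
        polls += 1;
        if let Async::Ready(value) = future.poll() {
            return (value, polls);
        }
    }
}
```

With remaining set to 2, the driver sees NotReady twice and gets the value on the third poll. The busy loop is the part a real design has to replace.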

Note: In the async I/O world, this kind of interface is sometimes referred to as readiness-based, because events are signaled based on “readiness” of operations (e.g. bytes on a socket being ready) followed by an attempt to complete an operation; Linux’s epoll is based on this model. (This model can also express completion, by treating the completion of an operation as the signal that the future is ready for polling.)

By eliminating all the intermediate callbacks, we’ve addressed some of the key problems of the previous version of the trait. But we’ve introduced a new one: after NotReady is returned, who polls the future, and when do they do so?

Let’s take a concrete example. If a future is attempting to read bytes from a socket, that socket may not be ready for reading, in which case the future can return NotReady. Somehow, we must arrange for the future to later be “woken up” (by calling poll) once the socket becomes ready. That kind of wakeup is the job of the event loop. But now we need some way to connect the signal at the event loop back to continuing to poll the future.

The solution forms the other main component of the design: tasks.

The cornerstone: tasks

A task is a future that is being executed. That future is almost always made up of a chain of other futures, as in the example from the original post:

id_rpc(&my_server).and_then(|id| {
    get_row(id)
}).map(|row| {
    json::encode(row)
}).and_then(|encoded| {
    write_string(my_socket, encoded)
})

The key point is that there’s a difference between functions like and_then, map and join, which combine futures into bigger futures, and functions that execute futures, like:

The wait method, which simply runs the future as a task pinned to the current thread, blocking that thread until a result is produced and returned.
The spawn method on a thread pool, which launches a future as an independent task on the pool.

These execution functions create a task that contains the future and is responsible for polling it. In the case of wait, polling takes place immediately; for spawn, polling happens once the task is scheduled onto a worker thread.

However polling begins, if any of the interior futures produced a NotReady result, it can grind the whole task to a halt—the task may need to wait for some event to occur before it can continue. In synchronous I/O, this is where a thread would block. Tasks provide an equivalent to this model: the task “blocks” by yielding back to its executor, after installing itself as a callback for the events it’s waiting on.

Returning to the example of reading from a socket, on a NotReady result the task can be added to the event loop’s dispatch table, so that it will be woken up when the socket becomes ready, at which point it will re-poll its future. Crucially, though, the task instance stays fixed for the lifetime of the future it is executing—so no allocation is needed to create or install this callback.

Completing the analogy with threads, tasks provide a park/unpark API for “blocking” and wakeup:

/// Returns a handle to the current task to call unpark at a later date.
fn park() -> Task;

impl Task {
    /// Indicate that the task should attempt to poll its future in a timely fashion.
    fn unpark(&self);
}

Blocking a future is a matter of using park to get a handle to its task, putting the resulting Task in some wakeup queue for the event of interest, and returning NotReady. When the event of interest occurs, the Task handle can be used to wake back up the task, e.g. by rescheduling it for execution on a thread pool. The precise mechanics of park/unpark vary by task executor.
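
The flow can be sketched with a single-threaded toy. Everything here is an assumption made for illustration: the run queue, the Event type, and the fact that the task handle is passed to poll explicitly rather than obtained through park().

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

enum Async<T> {
    Ready(T),
    NotReady,
}

/// Handle to a task; `unpark` puts the task back on the executor's run queue.
#[derive(Clone)]
struct Task {
    id: usize,
    run_queue: Rc<RefCell<VecDeque<usize>>>,
}

impl Task {
    fn unpark(&self) {
        self.run_queue.borrow_mut().push_back(self.id);
    }
}

/// Toy event source standing in for "the socket became readable".
#[derive(Default)]
struct Event {
    fired: bool,
    waiters: Vec<Task>,
}

impl Event {
    fn fire(&mut self) {
        self.fired = true;
        for task in self.waiters.drain(..) {
            task.unpark();
        }
    }
}

/// Future that completes once the event has fired.
struct WaitFor {
    event: Rc<RefCell<Event>>,
}

impl WaitFor {
    // The handle is passed in explicitly here; the library obtains it via `park()`.
    fn poll(&mut self, current: &Task) -> Async<&'static str> {
        let mut event = self.event.borrow_mut();
        if event.fired {
            Async::Ready("readable")
        } else {
            event.waiters.push(current.clone()); // "block": queue the handle
            Async::NotReady
        }
    }
}

fn park_unpark_demo() -> (bool, Option<usize>, bool) {
    let run_queue = Rc::new(RefCell::new(VecDeque::new()));
    let event = Rc::new(RefCell::new(Event::default()));
    let task = Task { id: 7, run_queue: Rc::clone(&run_queue) };
    let mut future = WaitFor { event: Rc::clone(&event) };

    let blocked = matches!(future.poll(&task), Async::NotReady);
    event.borrow_mut().fire(); // the event loop observes the event
    let woken = run_queue.borrow_mut().pop_front(); // the executor picks the task up
    let finished = matches!(future.poll(&task), Async::Ready("readable"));
    (blocked, woken, finished)
}
```

The first poll parks the handle and returns NotReady, firing the event pushes task 7 back onto the run queue, and the second poll completes.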

In a way, the task model is an instance of “green” (aka lightweight) threading: we schedule a potentially large number of asynchronous tasks onto a much smaller number of real OS threads, and most of those tasks are blocked on some event most of the time. There’s an essential difference from Rust’s old green threading model, however: tasks do not require their own stack. In fact, all of the data needed by a task is contained within its future. That means we can neatly sidestep problems of dynamic stack growth and stack swapping, giving us truly lightweight tasks without any runtime system implications.

Perhaps surprisingly, the future within a task compiles down to a state machine, so that every time the task wakes up to continue polling, it continues execution from the current state—working just like hand-rolled code based on mio. This point is most easily seen by example, so let’s revisit join.

Example: join in the demand-driven model

To implement the join function, we’ll introduce a new concrete type, Join, that tracks the necessary state:

fn join<F: Future, G: Future>(f: F, g: G) -> Join<F, G> {
    Join::BothRunning(f, g)
}

enum Join<F: Future, G: Future> {
    BothRunning(F, G),
    FirstDone(F::Item, G),
    SecondDone(F, G::Item),
    Done,
}

impl<F, G> Future for Join<F, G> where F: Future, G: Future {
    type Item = (F::Item, G::Item);

    fn poll(&mut self) -> Async<Self::Item> {
        // navigate the state machine
    }
}

The first thing to notice is that Join is an enum, whose variants represent states in the “join state machine”:

BothRunning: the two underlying futures are both still executing.
FirstDone: the first future has yielded a value, but the second is still executing.
SecondDone: the second future has yielded a value, but the first is still executing.
Done: both futures completed, and their values have been returned.

Enums in Rust are represented without requiring any pointers or heap allocation; instead, the size of the enum is the size of the largest variant. That’s exactly what we want—that size represents the “high water mark” of this little state machine.

The poll method here will attempt to make progress through the state machine by polling the underlying futures as appropriate.
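
One possible way to fill in that poll method is sketched below, using the simplified trait. It is not the library's actual implementation: the use of mem::replace and the hypothetical Countdown test future are my choices.

```rust
use std::mem;

enum Async<T> {
    Ready(T),
    NotReady,
}

trait Future {
    type Item;
    fn poll(&mut self) -> Async<Self::Item>;
}

enum Join<F: Future, G: Future> {
    BothRunning(F, G),
    FirstDone(F::Item, G),
    SecondDone(F, G::Item),
    Done,
}

impl<F: Future, G: Future> Future for Join<F, G> {
    type Item = (F::Item, G::Item);

    fn poll(&mut self) -> Async<Self::Item> {
        // Take the current state by value, leaving `Done` behind as a placeholder.
        match mem::replace(self, Join::Done) {
            Join::BothRunning(mut f, mut g) => match (f.poll(), g.poll()) {
                (Async::Ready(a), Async::Ready(b)) => Async::Ready((a, b)),
                (Async::Ready(a), Async::NotReady) => {
                    *self = Join::FirstDone(a, g);
                    Async::NotReady
                }
                (Async::NotReady, Async::Ready(b)) => {
                    *self = Join::SecondDone(f, b);
                    Async::NotReady
                }
                (Async::NotReady, Async::NotReady) => {
                    *self = Join::BothRunning(f, g);
                    Async::NotReady
                }
            },
            Join::FirstDone(a, mut g) => match g.poll() {
                Async::Ready(b) => Async::Ready((a, b)),
                Async::NotReady => {
                    *self = Join::FirstDone(a, g);
                    Async::NotReady
                }
            },
            Join::SecondDone(mut f, b) => match f.poll() {
                Async::Ready(a) => Async::Ready((a, b)),
                Async::NotReady => {
                    *self = Join::SecondDone(f, b);
                    Async::NotReady
                }
            },
            Join::Done => panic!("Join polled after completion"),
        }
    }
}

/// Hypothetical future: `NotReady` for `remaining` polls, then yields `value`.
struct Countdown {
    remaining: u32,
    value: u32,
}

impl Future for Countdown {
    type Item = u32;

    fn poll(&mut self) -> Async<u32> {
        if self.remaining == 0 {
            Async::Ready(self.value)
        } else {
            self.remaining -= 1;
            Async::NotReady
        }
    }
}

fn join_demo() -> ((u32, u32), u32) {
    let first = Countdown { remaining: 1, value: 10 };
    let second = Countdown { remaining: 3, value: 20 };
    let mut joined = Join::BothRunning(first, second);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Async::Ready(pair) = joined.poll() {
            return (pair, polls);
        }
    }
}
```

In the demo the first future finishes on the second poll, moving the machine to FirstDone, and the pair is produced on the fourth poll. No allocation or locking happens along the way.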

Recall that the aim of join is to allow its two futures to proceed concurrently, racing to finish. For example, the two futures might each represent subtasks running in parallel on a thread pool. When those subtasks are still running, polling their futures will return NotReady, effectively “blocking” the Join future, while stashing a handle to the ambient Task for waking it back up when they finish. The two subtasks can then race to wake up the Task, but that’s fine: the unpark method for waking a task is threadsafe, and guarantees that the task will poll its future at least once after any unpark call. Thus, synchronization is handled once and for all at the task level, without requiring combinators like join to allocate or handle synchronization themselves.

You may have noticed that poll takes &mut self, which means that a given future cannot be polled concurrently—the future has unique access to its contents while polling. The unpark synchronization guarantees it.

One final point. Combinators like join embody “small” state machines, but because some of those states involve additional futures, they allow additional state machines to be nested. In other words, polling one of the underlying futures for join may involve stepping through its state machine, before taking steps in the Join state machine. The fact that the use of the Future trait does not entail heap allocation or dynamic dispatch is key to making this work efficiently.

In general, the “big” future being run by a task—made up of a large chain of futures connected by combinators—embodies a “big” nested state machine in just this way. Once more, Rust’s enum representation means that the space required is the size of the state in the “big” machine with the largest footprint. The space for this “big” future is allocated in one shot by the task, either on the stack (for the wait executor) or on the heap (for spawn). After all, the data has to live somewhere—but the key is to avoid constant allocations as the state machine progresses, by instead making space for the entire thing up front.

Futures at scale

We’ve seen the basics of demand-driven futures, but there are a number of concerns about robustness that we also want to cover. It turns out that these concerns are addressed naturally by the demand-driven model. Let’s take a look at a few of the most important.

Cancellation

Futures are often used to represent substantial work that is running concurrently. Sometimes it will become clear that this work is no longer needed, perhaps because a timeout occurred, or the client closed a connection, or the needed answer was found in some other way.

In situations like these, you want some form of cancellation: the ability to tell a future to stop executing because you’re no longer interested in its result.

In the demand-driven model, cancellation largely “falls out”. All you have to do is stop polling the future, instead “dropping” it (Rust’s term for destroying the data). And doing so is usually a natural consequence of nested state machines like Join. Futures whose computation requires some special effort to cancel (such as canceling an RPC call) can provide this logic as part of their Drop implementation.
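
A minimal sketch of cancellation by drop, assuming a hypothetical RpcCall type whose Drop impl merely records that cancellation happened:

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical in-flight RPC; dropping it before completion "cancels" the call.
struct RpcCall {
    cancelled: Rc<Cell<bool>>,
}

impl Drop for RpcCall {
    fn drop(&mut self) {
        // A real implementation might send a cancel message to the server here.
        self.cancelled.set(true);
    }
}

/// Stand-in for a combinator state such as `Join::BothRunning(f, g)`.
struct Pair<F, G>(F, G);

fn cancel_demo() -> (bool, bool) {
    let flag = Rc::new(Cell::new(false));
    let composed = Pair(RpcCall { cancelled: Rc::clone(&flag) }, "other future");
    let before = flag.get();
    // Stop polling and drop the composed value: the nested RpcCall is dropped too.
    drop(composed);
    (before, flag.get())
}
```

Dropping the outer value is enough. The cleanup of the nested future runs without the combinator containing any cancellation logic of its own.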

Backpressure

Another essential aspect of at-scale use of futures (and their close relative, streams) is backpressure: the ability of an overloaded component in one part of a system to slow down input from other components. For example, if a server has a backlog of database transactions for servicing outstanding requests, it should slow down taking new requests.

Like cancellation, backpressure largely falls out of our model for futures and streams. That’s because tasks can be indefinitely “blocked” by a future/stream returning NotReady, and notified to continue polling at a later time. For the example of database transactions, if enqueuing a transaction is itself represented as a future, the database service can return NotReady to slow down requests. Often, such NotReady results cascade backward through a system, e.g. allowing backpressure to flow from the database service back to a particular client connection then back to the overall connection manager. Such cascades are a natural consequence of the demand-driven model.
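
The database example might look like the following sketch. The Database type, its capacity field and the poll_enqueue method are hypothetical names, not library APIs.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Async<T> {
    Ready(T),
    NotReady,
}

/// Hypothetical database service with a bounded backlog of transactions.
struct Database {
    backlog: VecDeque<u32>,
    capacity: usize,
}

impl Database {
    /// Readiness-style enqueue: `NotReady` tells the caller to back off and retry.
    fn poll_enqueue(&mut self, txn: u32) -> Async<()> {
        if self.backlog.len() >= self.capacity {
            Async::NotReady
        } else {
            self.backlog.push_back(txn);
            Async::Ready(())
        }
    }

    /// Completes the oldest transaction, freeing a slot.
    fn complete_one(&mut self) -> Option<u32> {
        self.backlog.pop_front()
    }
}

/// A connection handler passes the database's `NotReady` on to its own caller.
fn poll_request(db: &mut Database, txn: u32) -> Async<&'static str> {
    match db.poll_enqueue(txn) {
        Async::Ready(()) => Async::Ready("accepted"),
        Async::NotReady => Async::NotReady,
    }
}

fn backpressure_demo() -> Vec<Async<&'static str>> {
    let mut db = Database { backlog: VecDeque::new(), capacity: 1 };
    let first = poll_request(&mut db, 1);
    let second = poll_request(&mut db, 2); // backlog is full
    db.complete_one();
    let retry = poll_request(&mut db, 2);
    vec![first, second, retry]
}
```

The second request is pushed back with NotReady while the backlog is full and is accepted once a slot frees up. A real service would also arrange for the waiting task to be unparked at that point.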

Communicating the cause of a wakeup

If you’re familiar with interfaces like epoll, you may have noticed something missing from the park/unpark model: it provides no way for a task to know why it was woken up.

That can be a problem for certain kinds of futures that involve polling a large number of other futures concurrently: you don’t want to have to re-poll everything to discover which sub-future is actually able to make progress.

To deal with this problem, the library offers a kind of “epoll for everyone”: the ability to associate “unpark events” with a given Task handle. That is, there may be various handles to the same task floating around, all of which can be used to wake the task up, but each of which carries different unpark events. When woken, the future within the task can inspect these unpark events to determine what happened. See the docs for more detail.

Wrapping up

We’ve now seen the core design principles behind the Rust futures and streams library. To recap, it boils down to a few key ideas:

Encapsulate running futures into tasks, which serve as a single, permanent “callback” for the future.
Implement futures in a demand-driven, rather than callback-oriented, style.
Use Rust’s trait system to allow composed futures to flatten into big state machines.

Together, these ideas yield a robust, ergonomic, zero cost futures library.

As I mentioned at the outset of the post, we’re very actively working on the layers above the basic futures library—layers that incorporate particular I/O models (like mio) and also provide yet-higher-level tools for building servers. These layers are part of the Tokio project, and you can read more about the overall vision in my earlier post. As those APIs stabilize, expect to see more posts describing them!

http://aturon.github.io/tech/2016/09/07/futures-design

Expanding the Tokio project

Aug 26, 2016 Updated Aug 26, 2016

If you’ve been following Rust in the last month, you’ve probably seen the announcements of the Futures library and the Tokio framework that sits on top of it. There’s been some confusion about how these projects fit together, and what the overall story is shaping up to be.

Today, we’re happy to announce the formation of the Tokio Core Team, as well as an overall plan for the two projects and how they fit together. The team consists of Carl Lerche, Alex Crichton, and myself; more on that below.

An early vision of the I/O stack

There are three primary levels of abstraction in Tokio:

At the highest level is a service, which is where you write a server application. Following the Finagle model, a service is a simple thing: it’s a function from requests to futures of responses. This simple model is incredibly powerful: it separates the implementation of request processing from the implementation of the underlying protocol, and makes it possible to factor out an ecosystem of middleware. All of this seamlessly supports async I/O via futures. Middleware runs the gamut from connection pooling to retry/timeout logic to logging to load balancing, all of which can be written independently from any particular service or protocol. Read Your Server as a Function for the inspiration.

The tokio-service crate provides core trait definitions for services. Servers that can process particular request/response types (like HTTP) are offered as standalone crates. Building an http server is just a matter of writing a function from http requests to futures of http responses.
In the middle are protocols, like HTTP or Mux. Here, too, there is a lot of complexity worth factoring out, both at the transport layer and in the protocol “dispatch” layer. The tokio-proto crate provides re-usable components for building new protocol implementations. We expect there to be a similar kind of “middleware” ecosystem at these lower levels.
At the lowest level is the event loop, which is where we bridge the OS’s I/O facilities into the world of futures. The tokio-core crate provides a generic event loop for doing async I/O with futures. If you want complete control, that’s the entry point for you; it’s particularly useful for cases that don’t fit so nicely into the service model, such as proxy servers.
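To make the "function from requests to futures of responses" idea concrete, here is a minimal sketch. It is not the tokio-service API: the `Service` trait, the `Echo` service, the `Counted` middleware, and the `block_on` helper are all hypothetical names, and the sketch assumes the standard library's `std::future::Future` rather than the futures library discussed in this post.

```rust
use std::future::{ready, Future, Ready};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical trait: a service maps a request to a future of a response.
trait Service {
    type Request;
    type Response;
    type Future: Future<Output = Self::Response>;
    fn call(&self, req: Self::Request) -> Self::Future;
}

/// A trivial service that echoes its request back.
struct Echo;

impl Service for Echo {
    type Request = String;
    type Response = String;
    type Future = Ready<String>;
    fn call(&self, req: String) -> Self::Future {
        ready(format!("echo: {}", req))
    }
}

/// Middleware that counts calls. It knows nothing about the wrapped
/// service's protocol or request type.
struct Counted<S> {
    inner: S,
    calls: AtomicUsize,
}

impl<S: Service> Service for Counted<S> {
    type Request = S::Request;
    type Response = S::Response;
    type Future = S::Future;
    fn call(&self, req: S::Request) -> S::Future {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.inner.call(req)
    }
}

/// Waker that does nothing; enough for a busy-polling toy executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (assumption: fine for futures that finish quickly).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(val) = fut.as_mut().poll(&mut cx) {
            return val;
        }
    }
}
```

Because `Counted` is generic over any `S: Service`, the same wrapper could sit in front of an HTTP service or a Redis service unchanged, which is the property that makes a middleware ecosystem possible.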

In short, we want the Tokio project to be a “one stop shop” for doing futures-based I/O, whether at the highest level of prebuilt protocols, or the lowest level of the core event loop.

In our view, the lowest layers should strive to be zero cost and fully general, allowing them to be used in a large number of contexts. As you go up the stack, getting closer to an actual application, things tend to get more specific and opinionated, and may impose some cost. Futures themselves are a zero-cost and very general abstraction in Rust, and the tokio-core crate imposes very little cost. Particular protocol implementations and middleware, on the other hand, can be more opinionated.

We’ll have a lot more to say about all of these layers (and the ones beneath them, like futures) in the coming weeks on our various blogs. Stay tuned!

A note on project management

We’re following a Rust-like model, starting with a core team that reaches major decisions through a consensus process. At the moment, this process is fairly informal: it plays out on the issue tracker, PRs, and Gitter channels. But as the library begins to mature, we plan to move toward an RFC-like process for major changes. We are eager for the Tokio project to truly be a Rust community project. It’s going to have a lot of stakeholders, and we want to make sure those stakeholders have a voice just as we do in the Rust project itself.

As for the core futures library, it remains separate from the Tokio project, in part because we imagine it heading toward ownership by the rust-lang org in the relatively near future. (That’s a possible eventual path for Tokio as well, but the road will be much longer.)

Jumping in

Tokio is an ambitious project, and it’s going to take a strong community to really get it off the ground. Many from the Rust community have already jumped in to contribute, even in these extremely early days; that’s helped us get some of our early-stage integrations going, including curl, tls and redis. We’re also working with Sean McArthur to get a Tokio-integrated Hyper off the ground. If you’re interested in any of this, any other integrations, or the core libraries, we’d love to hear from you!

If you’re coming to RustConf, we’ll see you there, either at the Tokio hack night or at the talk about futures at RustConf itself. Come say hello, and join in the fun!

http://aturon.github.io/tech/2016/08/26/tokio

Zero-cost futures in Rust

Aug 11, 2016 Updated Aug 11, 2016

One of the key gaps in Rust’s ecosystem has been a strong story for fast and productive asynchronous I/O. We have solid foundations, like the mio library, but they’re very low level: you have to wire up state machines and juggle callbacks directly.

We’ve wanted something higher level, with better ergonomics, but also better composability, supporting an ecosystem of asynchronous abstractions that all work together. This story might sound familiar: it’s the same goal that’s led to the introduction of futures (aka promises) in many languages, with some supporting async/await sugar on top.

A major tenet of Rust is the ability to build zero-cost abstractions, and that leads to one additional goal for our async I/O story: ideally, an abstraction like futures should compile down to something equivalent to the state-machine-and-callback-juggling code we’re writing today (with no additional runtime overhead).

Over the past couple of months, Alex Crichton and I have developed a zero-cost futures library for Rust, one that we believe achieves these goals. (Thanks to Carl Lerche, Yehuda Katz, and Nicholas Matsakis for insights along the way.)

Today, we’re excited to kick off a blog series about the new library. This post gives the highlights, a few key ideas, and some preliminary benchmarks. Follow-up posts will showcase how Rust’s features come together in the design of this zero-cost abstraction. And there’s already a tutorial to get you going.

Why async I/O?

Before delving into futures, it’ll be helpful to talk a bit about the past.

Let’s start with a simple piece of I/O you might want to perform: reading a certain number of bytes from a socket. Rust provides a function, read_exact, to do this:

// reads 256 bytes into `my_vec`
socket.read_exact(&mut my_vec[..256]);

Quick quiz: what happens if we haven’t received enough bytes from the socket yet?

In today’s Rust, the answer is that the current thread blocks, sleeping until more bytes are available. But that wasn’t always the case.

Early on, Rust had a “green threading” model, not unlike Go’s. You could spin up a large number of lightweight tasks, which were then scheduled onto real OS threads (sometimes called “M:N threading”). In the green threading model, a function like read_exact blocks the current task, but not the underlying OS thread; instead, the task scheduler switches to another task. That’s great, because you can scale up to a very large number of tasks, most of which are blocked, while using only a small number of OS threads.

The problem is that green threads were at odds with Rust’s ambitions to be a true C replacement, with no imposed runtime system or FFI costs: we were unable to find an implementation strategy that didn’t impose serious global costs. You can read more in the RFC that removed green threading.

So if we want to handle a large number of simultaneous connections, many of which are waiting for I/O, but we want to keep the number of OS threads to a minimum, what else can we do?

Asynchronous I/O is the answer – and in fact, it’s used to implement green threading as well.

In a nutshell, with async I/O you can attempt an I/O operation without blocking. If it can’t complete immediately, you can retry at some later point. To make this work, the OS provides tools like epoll, allowing you to query which of a large set of I/O objects are ready for reading or writing – which is essentially the API that mio provides.
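The "attempt, and retry later if it can't complete" pattern can be sketched with the standard library alone. In the sketch below, `try_read` and `StallingSource` are hypothetical names; `StallingSource` is a stand-in for a non-blocking socket that reports `WouldBlock` a fixed number of times before yielding data.

```rust
use std::io::{self, Cursor, ErrorKind, Read};

/// Attempts one read without blocking.
/// Returns `Ok(None)` when the source is not ready yet.
fn try_read<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<Option<usize>> {
    match src.read(buf) {
        Ok(n) => Ok(Some(n)),
        // A non-blocking source signals "not ready" with WouldBlock.
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}

/// Hypothetical stand-in for a non-blocking socket: it reports
/// WouldBlock `stalls` times, then serves its data.
struct StallingSource {
    stalls: u32,
    data: Cursor<Vec<u8>>,
}

impl Read for StallingSource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.stalls > 0 {
            self.stalls -= 1;
            return Err(ErrorKind::WouldBlock.into());
        }
        self.data.read(buf)
    }
}
```

In a real program the caller would not retry in a loop; it would register interest with something like epoll and retry only once the OS reports the object as ready.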

The problem is that there’s a lot of painful work tracking all of the I/O events you’re interested in, and dispatching those to the right callbacks (not to mention programming in a purely callback-driven way). That’s one of the key problems that futures solve.

Futures

So what is a future?

In essence, a future represents a value that might not be ready yet. Usually, the future becomes complete (the value is ready) due to an event happening somewhere else. While we’ve been looking at this from the perspective of basic I/O, you can use a future to represent a wide range of events, e.g.:

A database query that’s executing in a thread pool. When the query finishes, the future is completed, and its value is the result of the query.
An RPC invocation to a server. When the server replies, the future is completed, and its value is the server’s response.
A timeout. When time is up, the future is completed, and its value is just () (the “unit” value in Rust).
A long-running CPU-intensive task, running on a thread pool. When the task finishes, the future is completed, and its value is the return value of the task.
Reading bytes from a socket. When the bytes are ready, the future is completed – and depending on the buffering strategy, the bytes might be returned directly, or written as a side-effect into some existing buffer.

And so on. The point is that futures are applicable to asynchronous events of all shapes and sizes. The asynchrony is reflected in the fact that you get a future right away, without blocking, even though the value the future represents will become ready only at some unknown time in the… future.

In Rust, we represent futures as a trait (i.e., an interface), roughly:

trait Future {
    type Item;
    // ... lots more elided ...
}

The Item type says what kind of value the future will yield once it’s complete.

Going back to our earlier list of examples, we can write several functions producing different futures (using impl syntax):

// Lookup a row in a table by the given id, yielding the row when finished
fn get_row(id: i32) -> impl Future<Item = Row>;

// Makes an RPC call that will yield an i32
fn id_rpc(server: &RpcServer) -> impl Future<Item = i32>;

// Writes an entire string to a TcpStream, yielding back the stream when finished
fn write_string(socket: TcpStream, data: String) -> impl Future<Item = TcpStream>;

All of these functions will return their future immediately, whether or not the event the future represents is complete; the functions are non-blocking.

Things really start getting interesting with futures when you combine them. There are endless ways of doing so, e.g.:

Sequential composition: f.and_then(|val| some_new_future(val)). Gives you a future that executes the future f, takes the val it produces to build another future some_new_future(val), and then executes that future.
Mapping: f.map(|val| some_new_value(val)). Gives you a future that executes the future f and yields the result of some_new_value(val).
Joining: f.join(g). Gives you a future that executes the futures f and g in parallel, and completes when both of them are complete, returning both of their values.
Selecting: f.select(g). Gives you a future that executes the futures f and g in parallel, and completes when one of them is complete, returning its value and the other future. (Want to add a timeout to any future? Just do a select of that future and a timeout future!)
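To give a feel for how such combinators can be built, here is a toy model of `map` and `and_then`. It is a sketch under simplifying assumptions, not the library's implementation: `ToyFuture`, `Delayed`, `Map`, and `AndThen` are hypothetical names, and `poll` simply returns `Some(value)` once the value is ready, with no error handling or wakeups.

```rust
/// Toy future: `poll` returns `Some(value)` once the value is ready.
trait ToyFuture {
    type Item;
    fn poll(&mut self) -> Option<Self::Item>;

    /// Transforms the eventual value with `f`.
    fn map<F, U>(self, f: F) -> Map<Self, F>
    where
        Self: Sized,
        F: FnOnce(Self::Item) -> U,
    {
        Map { fut: self, f: Some(f) }
    }

    /// Runs `self`, then the future built from its value.
    fn and_then<F, B>(self, f: F) -> AndThen<Self, B, F>
    where
        Self: Sized,
        B: ToyFuture,
        F: FnOnce(Self::Item) -> B,
    {
        AndThen::First(self, Some(f))
    }
}

/// A future that becomes ready after `delay` unsuccessful polls.
struct Delayed<T> {
    delay: u32,
    value: Option<T>,
}

impl<T> ToyFuture for Delayed<T> {
    type Item = T;
    fn poll(&mut self) -> Option<T> {
        if self.delay > 0 {
            self.delay -= 1;
            None
        } else {
            self.value.take()
        }
    }
}

struct Map<A, F> {
    fut: A,
    f: Option<F>,
}

impl<A, F, U> ToyFuture for Map<A, F>
where
    A: ToyFuture,
    F: FnOnce(A::Item) -> U,
{
    type Item = U;
    fn poll(&mut self) -> Option<U> {
        let val = self.fut.poll()?;
        let f = self.f.take().expect("polled after completion");
        Some(f(val))
    }
}

/// Sequential composition as a two-state machine, stored inline.
enum AndThen<A, B, F> {
    First(A, Option<F>),
    Second(B),
}

impl<A, B, F> ToyFuture for AndThen<A, B, F>
where
    A: ToyFuture,
    B: ToyFuture,
    F: FnOnce(A::Item) -> B,
{
    type Item = B::Item;
    fn poll(&mut self) -> Option<B::Item> {
        if let AndThen::First(a, f) = self {
            let val = a.poll()?;
            let f = f.take().expect("polled after completion");
            // Transition: replace the first future with the second.
            *self = AndThen::Second(f(val));
        }
        match self {
            AndThen::Second(b) => b.poll(),
            AndThen::First(..) => None,
        }
    }
}
```

Each combinator returns a plain struct or enum that owns its inputs, so a chain of calls nests into a single value describing the whole computation, with no heap allocation in this toy version.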

As a simple example using the futures above, we might write something like:

id_rpc(&my_server).and_then(|id| {
    get_row(id)
}).map(|row| {
    json::encode(row)
}).and_then(|encoded| {
    write_string(my_socket, encoded)
})

See this code for a more fleshed out example.

This is non-blocking code that moves through several states: first we do an RPC call to acquire an ID; then we look up the corresponding row; then we encode it to json; then we write it to a socket. Under the hood, this code will compile down to an actual state machine which progresses via callbacks (with no overhead), but we get to write it in a style that’s not far from simple blocking code. (Rustaceans will note that this story is very similar to Iterator in the standard library.) Ergonomic, high-level code that compiles to state-machine-and-callbacks: that’s what we were after!

It’s also worth considering that each of the futures being used here might come from a different library. The futures abstraction allows them to all be combined seamlessly together.

Streams

But wait – there’s more! As you keep pushing on the future “combinators”, you’re able to not just reach parity with simple blocking code, but to do things that can be tricky or painful to write otherwise. To see an example, we’ll need one more concept: streams.

Futures are all about a single value that will eventually be produced, but many event sources naturally produce a stream of values over time. For example, incoming TCP connections or incoming requests on a socket are both naturally streams.

The futures library includes a Stream trait as well, which is very similar to futures, but set up to produce a sequence of values over time. It has a set of combinators, some of which work with futures. For example, if s is a stream, you can write:

s.and_then(|val| some_future(val))

This code will give you a new stream that works by first pulling a value val from s, then computing some_future(val) from it, then executing that future and yielding its value – then doing it all over again to produce the next value in the stream.

Let’s see a real example:

// Given an `input` I/O object create a stream of requests
let requests = ParseStream::new(input);

// For each request, run our service's `process` function to handle the request
// and generate a response
let responses = requests.and_then(|req| service.process(req));

// Create a new future that'll write out each response to an `output` I/O object
StreamWriter::new(responses, output)

Here, we’ve written the core of a simple server by operating on streams. It’s not rocket science, but it is a bit exciting to be manipulating values like responses that represent the entirety of what the server is producing.

Let’s make things more interesting. Assume the protocol is pipelined, i.e., that the client can send additional requests on the socket before hearing back from the ones being processed. We want to actually process the requests sequentially, but there’s an opportunity for some parallelism here: we could read and parse a few requests ahead, while the current request is being processed. Doing so is as easy as inserting one more combinator in the right place:

let requests = ParseStream::new(input);
let responses = requests.map(|req| service.process(req)).buffered(32); // <--
StreamWriter::new(responses, output)

The buffered combinator takes a stream of futures and buffers it by some fixed amount. Buffering the stream means that it will eagerly pull out more than the requested number of items, and stash the resulting futures in a buffer for later processing. In this case, that means that we will read and parse up to 32 extra requests in parallel, while running process on the current one.
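The buffering idea can be modelled in a few lines. The sketch below is a toy, not the library's `buffered`: `ToyFuture`, `Delayed`, `Slot`, `Next`, and `Buffered` are hypothetical names, the stream of futures is modelled as an ordinary iterator, and readiness is modelled by counting polls.

```rust
use std::collections::VecDeque;
use std::iter::Fuse;

/// Toy future: `poll` returns `Some(value)` once the value is ready.
trait ToyFuture {
    type Item;
    fn poll(&mut self) -> Option<Self::Item>;
}

/// A future that becomes ready after `delay` unsuccessful polls.
struct Delayed<T> {
    delay: u32,
    value: Option<T>,
}

impl<T> ToyFuture for Delayed<T> {
    type Item = T;
    fn poll(&mut self) -> Option<T> {
        if self.delay > 0 {
            self.delay -= 1;
            None
        } else {
            self.value.take()
        }
    }
}

/// One buffered slot: a future still running, or its finished value.
enum Slot<F: ToyFuture> {
    Running(F),
    Done(F::Item),
}

/// Result of asking the buffered stream for its next item.
#[derive(Debug, PartialEq)]
enum Next<T> {
    Item(T),
    NotReady,
    Finished,
}

struct Buffered<I, F: ToyFuture> {
    source: Fuse<I>,
    slots: VecDeque<Slot<F>>,
    limit: usize,
}

impl<I, F> Buffered<I, F>
where
    I: Iterator<Item = F>,
    F: ToyFuture,
{
    fn new(source: I, limit: usize) -> Self {
        Buffered { source: source.fuse(), slots: VecDeque::new(), limit }
    }

    fn poll_next(&mut self) -> Next<F::Item> {
        // Eagerly pull futures from the source until the buffer is full.
        while self.slots.len() < self.limit {
            match self.source.next() {
                Some(fut) => self.slots.push_back(Slot::Running(fut)),
                None => break,
            }
        }
        // Give every in-flight future a chance to make progress.
        for slot in self.slots.iter_mut() {
            if let Slot::Running(fut) = slot {
                if let Some(val) = fut.poll() {
                    *slot = Slot::Done(val);
                }
            }
        }
        // Yield results strictly in the order the futures arrived.
        match self.slots.pop_front() {
            Some(Slot::Done(val)) => Next::Item(val),
            Some(running) => {
                self.slots.push_front(running);
                Next::NotReady
            }
            None => Next::Finished,
        }
    }
}
```

Note that a later future may finish before an earlier one, but its value is held back until everything ahead of it has been yielded, so the output order matches the input order.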

These are relatively simple examples of using futures and streams, but hopefully they convey some sense of how the combinators can empower you to do very high-level async programming.

Zero cost?

I’ve claimed a few times that our futures library provides a zero-cost abstraction, in that it compiles to something very close to the state machine code you’d write by hand. To make that a bit more concrete:

None of the future combinators impose any allocation. When we do things like chain uses of and_then, not only are we not allocating, we are in fact building up a big enum that represents the state machine. (There is one allocation needed per “task”, which usually works out to one per connection.)
When an event arrives, only one dynamic dispatch is required.
There are essentially no imposed synchronization costs; if you want to associate data that lives on your event loop and access it in a single-threaded way from futures, we give you the tools to do so.

And so on. Later blog posts will get into the details of these claims and show how we leverage Rust to get to zero cost.

But the proof is in the pudding. We wrote a simple HTTP server framework, minihttp, which supports pipelining and TLS. This server uses futures at every level of its implementation, from reading bytes off a socket to processing streams of requests. Besides being a pleasant way to write the server, this provides a pretty strong stress test for the overhead of the futures abstraction.

To get a basic assessment of that overhead, we then implemented the TechEmpower “plaintext” benchmark. This microbenchmark tests a “hello world” HTTP server by throwing a huge number of concurrent and pipelined requests at it. Since the “work” that the server is doing to process the requests is trivial, the performance is largely a reflection of the basic overhead of the server framework (and in our case, the futures framework).

TechEmpower is used to compare a very large number of web frameworks across many different languages. We compared minihttp to a few of the top contenders:

rapidoid, a Java framework, which was the top performer in the last round of official benchmarks.
Go, an implementation that uses Go’s standard library’s HTTP support.
fasthttp, a competitor to Go’s standard library.
node.js.

Here are the results, in number of “Hello world!”s served per second on an 8 core Linux machine:

It seems safe to say that futures are not imposing significant overhead.

Update: to provide some extra evidence, we’ve added a comparison of minihttp against a directly-coded state machine version in Rust (see “raw mio” in the link). The two are within 0.3% of each other.

The future

Thus concludes our whirlwind introduction to zero-cost futures in Rust. We’ll see more details about the design in the posts to come.

At this point, the library is quite usable, and pretty thoroughly documented; it comes with a tutorial and plenty of examples, including:

a simple TCP echo server;
an efficient SOCKSv5 proxy server;
minihttp, a highly-efficient HTTP server that supports TLS and uses Hyper’s parser;
an example use of minihttp for TLS connections,

as well as a variety of integrations, e.g. a futures-based interface to curl. We’re actively working with several people in the Rust community to integrate with their work; if you’re interested, please reach out to Alex or myself!

If you want to do low-level I/O programming with futures, you can use futures-mio to do so on top of mio. We think this is an exciting direction to take async I/O programming in general in Rust, and follow-up posts will go into more detail on the mechanics.

Alternatively, if you just want to speak HTTP, you can work on top of minihttp by providing a service: a function that takes an HTTP request, and returns a future of an HTTP response. This kind of RPC/service abstraction opens the door to writing a lot of reusable “middleware” for servers, and has gotten a lot of traction in Twitter’s Finagle library for Scala; it’s also being used in Facebook’s Wangle library. In the Rust world, there’s already a library called Tokio in the works that builds a general service abstraction on our futures library, and could serve a role similar to Finagle.

There’s an enormous amount of work ahead:

First off, we’re eager to hear feedback on the core future and stream abstractions, and there are some specific design details for some combinators we’re unsure about.
Second, while we’ve built a number of future abstractions around basic I/O concepts, there’s definitely more room to explore, and we’d appreciate help exploring it.
More broadly, there are endless futures “bindings” for various libraries (both in C and in Rust) to write; if you’ve got a library you’d like futures bindings for, we’re excited to help!
Thinking more long term, an obvious eventual step would be to explore async/await notation on top of futures, perhaps in the same way as proposed in JavaScript. But we want to gain more experience using futures directly as a library, first, before considering such a step.

Whatever your interests might be, we’d love to hear from you – we’re acrichto and aturon on Rust’s IRC channels. Come say hi!

http://aturon.github.io/tech/2016/08/11/futures

The Rust Platform

Jul 27, 2016 Updated Jul 27, 2016

A programming language is much more than its compiler and standard library. It’s a community. Tools. Documentation. An ecosystem. All of these elements affect how a language feels, its productivity, and its applicability.

Rust is a very young language – barely a year past 1.0 – and building out and maturing the full complement of ecosystem and tooling is crucial to its success. That building is happening, but sometimes at an explosive rate that makes it hard to track what’s going on, to find the right library for a task, or to choose between several options that show up on a crates.io search. It can be hard to coordinate versions of important libraries that all work well together. We also lack tools to push toward maturity in a community-wide way, or to incentivize work toward a common quality standard.

On the other hand, the core parts of Rust get a tremendous amount of focus. But we have tended to be pretty conservative in what is considered “core”: today, essentially it’s rustc, cargo, libstd/libcore, and a couple of other crates. The standard library also takes a deliberately minimalistic approach, to avoid the well-known pitfalls of large standard libraries that are versioned with the compiler and quickly stagnate, while the real action happens in the broader ecosystem (“std is where code goes to die”).

In short, there are batteries out there, but we’re failing to include them (or even tell you where to shop for them).

Can we provide a “batteries included” experience for Rust that doesn’t lead to stagnation, one that instead works directly with and through the ecosystem, focusing attention, driving compatibility, and reaching for maturity?

I think we can, and I want to lay out a plan that’s emerged after discussion with many on the core and subteams.

What is “The Rust Platform”?

I want to say right off the bat that the ideas here draw significant inspiration from the Haskell Platform, which is working toward similar goals for Haskell.

The basic idea of the Rust Platform is simple:

Distribute a wide range of artifacts in a single “Rust Platform Package”, including:
- The compiler, Cargo, rust-lang crates (e.g. std, libc), docs
- Libraries drawn from the wider ecosystem (going beyond rust-lang crates)
- Tools drawn from the wider ecosystem (e.g. rustfmt, NDKs, editor plugins, lints)
- Cross-compilation targets
Periodically curate the ecosystem, determining consensus choices for what artifacts, and at what versions, to distribute.

In general, rustup is intended to be the primary mechanism for distribution; it’s expected that it will soon replace the guts of our official installers, becoming the primary way to acquire Rust and related artifacts.

As you’d expect, the real meat here is in the details. It’s probably unclear what it even means to “distribute” a library, given Cargo’s approach to dependency management. Read on!

Library mechanics

Cargo metapackages

The most novel part of the proposal is the idea of curating and distributing crates. The goal is to provide an experience that feels much like std, but provides much greater agility, avoiding the typical pitfalls of large standard libraries.

The key to making sense of library “packaging” for Rust is the idea of a metapackage for Cargo, which aggregates together a number of library dependencies as a single name and version. Concretely, this would look like:

[dependencies]
rust-platform = "2.7"

which is effectively then shorthand for something like:

[dependencies]
mio = "1.2"
regex = "2.0"
log = "1.1"
serde = "3.0"

Metapackages give technical meaning to curation: we can provide assurance that the crates within a metapackage will all play well together, at the versions stated.

With the platform metapackage, we can talk coherently about the “Rust Platform 2.0 Series” as a chapter in Rust’s evolution. After all, core libraries play a major role in shaping the idioms of a language at a given point of time. Evolution in these core libraries can have an effect on the experience of the language rivaling changes to the language itself.

With those basics out of the way, let’s look at the ways that the platform is, and is not, like a bigger std.

Stability without stagnation

The fact that std is effectively coupled with rustc means that upgrading the compiler entails upgrading the standard library, like it or not. That means that the two need to provide essentially the same backwards-compatibility guarantees. TL;DR, it’s simply not feasible to do a new, major version of std with breaking changes. Moreover, std is forcibly tied to the Rust release schedule, meaning that new versions arrive every six weeks, period. Given these constraints, we’ve chosen to take a minimalist route with std, to avoid accumulating a mass of deprecated APIs over time.

With the platform metapackage, things are quite different. On the one hand, we can provide an experience that feels a lot like std (see below for more on that). But it doesn’t suffer from the deficits of std. Why? It all comes down to versioning:

Stability: Doing a rustup to the latest platform will never break your existing code, for one simple reason: existing Cargo.toml files will be pinned to a prior version of the platform metapackage, which is fundamentally just a collection of normal dependencies. So you can upgrade the compiler and toolchain, but be using an old version of the platform metapackage in perpetuity. In short, the metapackage version is orthogonal to the toolchain version.
Without stagnation: Because of the versioning orthogonality, we can be more free to make breaking changes to the platform libraries. That could come in the form of upgrading to a new major version of one of the platform crates, or even dropping a crate altogether. These changes are never forced on users.

But we can do even better. In practice, while code will continue working with an old metapackage version, people are going to want to upgrade. We can smooth that process by allowing metapackage dependencies to be overridden if they appear explicitly in the Cargo.toml file. So, for example, if you say:

[dependencies]
rust-platform = "2.7"
regex = "3.0"

you’re getting the versions stipulated by platform 2.7 in general, but specifying a different version of regex.

There are lots of uses for this kind of override. It can allow you to track progress of a given platform library more aggressively (not just every six weeks), or to try out a new, experimental major version. Or you can use it to hold a single dependency back at an older version, so that you can otherwise transition to a new version of the platform.
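The override rule can be modelled as a simple merge. The sketch below is a toy model of the proposal, not anything Cargo implements: `resolve` is a hypothetical function, and the crate versions are the illustrative ones used above.

```rust
use std::collections::BTreeMap;

/// Toy model of the proposed rule: start from the versions the platform
/// metapackage stipulates, then let explicit entries in Cargo.toml win.
fn resolve(
    platform: &[(&str, &str)],
    explicit: &[(&str, &str)],
) -> BTreeMap<String, String> {
    let mut deps = BTreeMap::new();
    for (name, version) in platform {
        deps.insert(name.to_string(), version.to_string());
    }
    // Explicit dependencies override whatever the platform chose.
    for (name, version) in explicit {
        deps.insert(name.to_string(), version.to_string());
    }
    deps
}
```

With the platform list from the earlier example and an explicit `regex = "3.0"`, the result keeps the platform's versions of mio, log, and serde and replaces only the regex entry.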

Approaching std ergonomics

There are several steps we can take, above and beyond the idea of a metapackage, to make the experience of using the Rust Platform libraries approximate using std itself.

cargo new. A simple step: have cargo new automatically insert a dependency on the current toolchain’s version of the platform.
Global coherence. When we assemble a version of the platform, we can do integration testing against the whole thing, making sure that the libraries not only compile together, but work together. Moreover, libraries in the platform can assume the inclusion of other libraries in the platform, meaning that example code and documentation can cross-reference between libraries, with the precise APIs that will be shipped.
Precompilation. If we implement metapackages naively, then the first time you compile something that depends on the platform, you’re going to be compiling some large number of crates that you’re not yet using. There are a few ways we could solve this, but certainly one option would be to provide binary distribution of the libraries through rustup – much like we already do for std.
No extern crate. Getting a bit more aggressive, we might drop the need for extern crate when using platform crates, giving a truly std-like feel. (In general, extern crate is already redundant with Cargo.toml for most use cases, so we might want to take this step broadly, anyway.)

Versioning and release cadence

I’ve already alluded to “major versions” of the platform in a few senses. Here’s what I’m thinking in more detail:

First off, rustc itself is separately versioned. Conceivably, the Rust Platform 5.0 ships with rustc 1.89. In other words, a new major version of the platform does not imply breaking changes to the language or standard library. As discussed above, the metapackage approach makes it possible to release new major versions without forcibly breaking any existing code; people can upgrade their platform dependency orthogonally from the compiler, at their own pace, in a fine-grained way.

With that out of the way, here’s a plausible versioning scheme and cadence:

A new minor version of the platform is released every six weeks, essentially subsuming the existing release process. New minor releases should only include minor version upgrades of libraries and tools (or expansions to include new libs/tools).
A new major version of the platform is released roughly every 18-24 months. This is the opportunity to move to new major versions of platform libraries or to drop existing libraries. It also gives us a way to recognize major shifts in the way you write Rust code, for example by moving to a new set of libraries that depend on a major new language feature (say, specialization or HKT).

More broadly, I see major version releases as a way to lay out a narrative arc for Rust, recognizing major new chapters in its development. That’s helpful internally, because it provides medium-term focus toward shipping The Next Iteration of Rust, which we as a community can rally around. It’s also helpful externally, because people less immediately involved in Rust’s development will have a much easier way to understand the accumulation of major changes that make up each major release. These ideas are closely tied to the recent Roadmap proposal, providing a clear “north star” toward which quarterly plans can head.

Two-level curation

So far I’ve focused on artifacts that officially ship as part of the platform. Curating at that level is going to be a lot of work, and we’ll want to be quite selective about what’s included. (For reference, the Haskell Platform has about 35 libraries packaged.)

But there are some additional opportunities for curation. What I’d love to see is a kind of two-level scheme. Imagine that, somewhere on the Rust home page, we have a listing of major areas of libraries and tools. Think: “Parsing”, “Networking”, “Serialization”, “Debugging”. Under each of these categories, we have a very small number of immediate links to libraries that are part of the official platform. But we also have a “see more” link that provides a more comprehensive list.

That leads to two tiers of curation:

Tier one: shown on front page; shipped with the platform; highly curated and reviewed; driven by community consensus; integration tested and cross-referenced with the rest of the platform.
Tier two: shown in “see more”; lightly curated, according to a clearly stated set of objective criteria. Things like: platform compatibility; CI; documentation; API conventions; versioned at 1.0 or above.

By providing two tiers, we release some of the pressure around being in the platform proper, and we provide valuable base-level quality curation and standardization across the ecosystem. The second tier gives us a way to motivate the ecosystem toward common quality and consistency goals: anyone is welcome to get their crate on a “see more” page, but they have to meet a minimum bar first.

The rust-lang crates

One small note: our previous attempt at a kind of “extended std” was the rust-lang crates concept. These crates are “owned” by the Rust community, and governed by the RFC process, much like std. They’re also held to similar quality standards.

Ultimately, it’s proved pretty heavyweight to require full RFCs and central control over central crates, and so the set of rust-lang crates has grown slowly. The platform model is more of a “federated” approach, providing decentralized ownership and evolution, while periodically trying to pull together a coherent global story.

However, I expect the rust-lang crates to stick around, and for the set to slowly grow over time; there is definitely scope for some very important crates to be completely “owned by the community”. These crates would automatically be part of the platform, having been approved via the RFC process already.

Open questions

The biggest open question here is: how does curation work? Obviously, it can’t run entirely through the libs team; that doesn’t scale, and the team doesn’t have the needed domain expertise anyway.

What I envision is something that fits into the Roadmap planning proposal. In a given quarter, we set out as an initiative to curate crates in a few areas – let’s say, networking and parsing. During that quarter, the libs team works closely with the portion of the community actively working in that space, acting as API consultants and reviewers, and helping shepherd consensus toward a reasonable selection. There are a lot of details to sort out, but working in an incremental way (a sort of quarterly round-robin between areas) seems like a good balance between focus and coverage.

It’s also not entirely clear what will need to go into each minor release. Hopefully it can be kept relatively minimal (e.g., with library/tool maintainers largely driving the version choice for a given minor release).

Wrap-up

Although the mechanics are not all that earth-shattering, I think that introducing the Rust Platform could have a massive impact on how the Rust community works, and on what life as a Rust user feels like. It tells a clear story about Rust’s evolution, and lets us rally around that story as we hammer out the work needed to bring it to life. I’m eager to hear what you think!

http://aturon.github.io/tech/2016/07/27/rust-platform

Refining Rust's RFCs

Jul 5, 2016 Updated Jul 5, 2016

At the heart of Rust’s open development is the RFC process. Every major change to the language, compiler, core libraries, tooling, and policy goes through an RFC writeup and consensus-building process. The process served us incredibly well in clarifying our technical direction on the road to 1.0, and has continued to be highly active since then, with on average about 2 RFCs merged every week.

But it’s not all roses. There’s been a growing sense among both Rust leadership and the broader community that the RFC process needs some further refinement as we continue to grow the community. I want to lay out my view of the problems and sketch some possible solutions, based on extensive discussion and brainstorming with many others on the team.

Each idea operates at a different scale (from big-picture to low-level mechanics), but they are intended to fit together into a whole; each one supports the others. Ultimately, these should result in a single RFC, but in the meantime I’ll start a discuss thread for each proposal.

There is a clear common theme to all of the problems I want to raise: communication. We need to find ways to better scale up lines of communication around the RFC process, and for Rust core development in general. There is also a cross-cutting concern: a need to increase our focus on mentoring and the path to team membership. @wycats has a great saying about measuring the health of the team structure:

Being a very active contributor who is not yet on a subteam should feel very close to actually being on that subteam.

Shooting for such a state of affairs has many benefits, not least of which is increasing the scalability of our community.

Proposal: Roadmap

Discuss link.

The problem

Lack of clear rallying points. One thing that made the run-up to the 1.0 release so exhilarating was the way the release focused our effort: there was a big overarching goal we were all working toward, which led to a number of fairly clear-cut subgoals that everyone could pitch in on.

Since then, though, we’ve never had quite as clear of a “north star”. We’ve communicated some very high-level plans, and had success rallying efforts around self-contained projects like MIR. But we don’t have a systematic way of rallying our efforts around important goals on a regular basis. This gap is a shame, because there are many people eager to contribute, who we should be directing toward common, important goals with good mentoring opportunities. Likewise, there are lots of people who could provide useful perspective on goals, or even provide leadership on initiatives, who don’t have an outlet today.

Relatedly, it can be difficult to contribute at the RFC level. Is the problem you want to solve a priority for the relevant team or wider community? When it comes to the core language, there is only so much design work that can be in flight at once (since it all needs to fit together), so greater clarity on priorities and motivations is essential.

The proposal

Idea: publish a roadmap on a regular cadence, e.g. every two release cycles (12 weeks).

The roadmap would contain, at a minimum, a small set of “major initiatives” for that period. An initiative might cover any phase of development, e.g.:

Early investigation: For example, building out NDK support in rustup or exploring implications of various memory model choices.
Design: For example, working out a revised design for rand or const generics.
Implementation: For example, the MIR initiative or rustbuild.
Documentation: For example, focused effort on updating API docs in a portion of the standard library.
Community: For example, launching RustBridge.

And potentially many other categories as well.

Initiatives are intended to be a primary rallying point for the community, and thus should share some basic traits:

Clear scope: an initiative should have clear-cut goals that can actually be finished. So, an open-ended goal like “MIR” doesn’t fly, but “Get MIR-trans working on all of crates.io” does.
Timeboxed: relatedly, an initiative should realistically last at most, say, 24 weeks (two roadmaps).
Commitment: There should be some level of commitment from multiple people to actually work on the initiative. In particular, the initiative should list some primary points of contact, and ideally mentors.

Each initiative would have a dedicated status page with this information, links to issues or other materials, and potentially a FAQ. We’ve often found that there are recurring questions (“When is MIR going to be turned on by default?”) about big, ongoing work. The roadmap and status pages give us a highly visible, central and curated place to put this information.

The roadmap should be set via an open consensus process in which anyone can propose or influence initiatives. The initiatives should fit criteria like those listed above, and should also fit into an overall vision for Rust’s evolution over a longer period.

Details to be worked out:

Cadence
Can initiatives be added mid-stream?
Full guidelines for initiatives; how many should be in flight at once? Needs to be a small number to make this practical and useful (it’s a form of curation/rallying).
What is the process for deciding on the initiatives?
Do we divvy things up by subteam? That would make the discussion easier, but doesn’t allow for cross-cutting initiatives very easily.
Can we find less boring terms than “Roadmap” and “Initiative”?
Can we also include the “feature pipeline” and other long-running concerns into a roadmap somehow?

Proposal: RFC staging

Discuss link.

The problem

RFCs are hard to keep up with in part because reading a full design – and all the commentary around it – can be a lot of work, and there tend to be a large number of active RFCs in flight at any time. RFC discussions are often hard to follow, due to the overwhelming number of comments, sometimes stretching over multiple forums. Naturally, this problem is exacerbated by “controversial” RFCs, which is where we most need broad input and careful discussion. It can also be hard to track RFCs that are in some sense “competing” (offering alternative proposals for a common problem), or to correlate discussion between the discuss forum and github.

It’s also problematic to start off with a full proposal. What we really want is to get the community on the same page first about the importance of the problem being solved, and then to proceed to the design phase, perhaps considering multiple competing designs.

Finally, RFCs are sometimes closed as “postponed”, but ideally that should not simply terminate the discussion; instead, the discussion should simply continue elsewhere, or somehow be marked as being at a different stage.

The proposal

Idea: introduce stages into the RFC process, including one for reaching consensus on motivation prior to considering a design.

Idea: move the focus away from an RFC PR as the primary venue for RFC discussion.

Put differently, the idea is to orient the RFC process around problems first, and solutions second.

The rough phases I have in mind are:

Problem consensus
RFC drafting
RFC PR(s)
FCP
RFC merged

Concretely, what this would look like is having some venue for tracking the problems we might want to solve, perhaps a revamped version of the RFC issue tracker. Whatever this venue is, it would track the progression through all of the phases. Let’s call this venue the “Problem Tracker”.

Phase 1: Problem consensus. The initial discussion is essentially about reaching consensus on the motivation section of an RFC, which should include examples and make a compelling case that solving the problem is important enough to warrant expending energy and potential complexity. The subteam would sign off on that motivation, at which point there is some level of commitment to solve the problem. That puts the focus where it should be – solving problems – and should make it much easier for subteam members to engage early on in the RFC lifecycle.
Phase 2: RFC drafting. This phase can proceed in parallel with the previous one. During this phase, people sketch designs and work toward one or more full RFC drafts. Brainstorming and discussion on specific drafts would happen within dedicated “pre-RFC” discuss posts, which are linked from the Problem Tracker. In particular, newly-opened RFC PRs today often get an avalanche of comments and early revisions, making it very hard to join the discussion even a week later. Pushing early feedback to our forum instead will make the eventual RFC PR discussion more focused and easier to participate in.
Phase 3: RFC PR(s). At some point, a shepherd (see below) can determine that an RFC draft is of sufficiently high quality and steady state that a PR should be opened, at which point discussion proceeds as it does today. Multiple RFC PRs might be open for the same basic problem – and indeed, this is a good way to take the “Alternatives” section more seriously. All open PRs would be linked from the Problem Tracker.
Phases 4 and 5 work just as today.

One interesting aspect of this phasing: it’s possible to approve a Motivation section, then get all the way to RFC PR, only to close out the PR for one reason or another. In such cases, it should be possible to go back to the Problem Tracker and build a new, alternative RFC draft with the same Motivation section.

Note that, in this regime, you don’t ever open an RFC PR out of hand – it must go through the earlier phases, including the pre-RFC discuss post. While this may feel like more process, I think that globally it will make the whole thing more efficient, by weeding out poorly motivated RFCs earlier, by focusing attention on the problem, by producing higher quality RFC PRs, and (as we’ll see) by decentralizing the process a bit more. In addition, it makes it easier to cope with the problem of “Does this need an RFC?”

As part of this proposal, I think we should “reboot” the notion of a shepherd. The idea would be to create a broader network of people around a subteam who are empowered to help move the RFC process along in various ways, but aren’t necessarily responsible for the final decision. So, for example, we would have a larger set of “lang shepherds” who help lang RFCs progress. The powers and responsibilities of shepherds would include:

“Calling to question” – that is, proposing that the subteam move to make a decision on problem consensus or moving to FCP.
Working with the community to help brainstorm, draft, and revise pre-RFCs.
Moving from the pre-RFC phase to the RFC PR phase.
Acting as the “scribe” for the RFC process, by keeping the Problem Tracker up to date. In particular, the subteams currently attempt to provide “summary” comments for contentious RFCs, to help people track the discussion. This proposal would give those comments more formal status, as something that would go directly on the Problem Tracker, and that any shepherd could provide at any point.

All subteam members can act as shepherds as well.

In general, I envision the Problem Tracker as the go-to place to see where things stand for a given problem/set of proposals, including summarization of discussion and pros/cons for the proposals. The shepherds would play a special role in establishing that official record.

I think these changes make the RFC process both more accessible and more scalable. More accessible because it’s easier to get involved and get quick feedback in lightweight ways (before writing up an entire design). More scalable because of increased parallelism, and because the big decision points happen at either an easier stage (establishing motivation) or with many fewer proposals in flight (the RFC PR stage).

Details to be worked out:

What happens to current RFC PRs? Are they grandfathered in, or moved into this new process?
Where does the “problem tracker” live?
What are good guidelines around an initial “motivation”?
How and where can we keep an “official record” of the progression of a problem, including links to (and summaries of) pre-RFC and RFC PR threads?

Proposal: Async decisions

Discuss link.

The problem

There is room for improvement around the way that the subteams themselves work. Today, subteams reach decisions on RFCs and other issues in (bi)weekly meetings. There are at least two problems with doing so. First, since the meetings have a limited duration, we often run out of time without finishing the active business, introducing delays; similarly, because of the high amount of RFC activity, the subteams often operate in “reactive” mode, more than actively leading.

Another issue is that meetings provide, in some sense, the “wrong defaults” for making decisions. We have to be careful to ensure that all the rationale for a decision is present in the online discussion thread, and that any new rationale that came up during a meeting means that the decision is delayed, to give the full community a chance to respond. The point is that, while we work hard to provide this transparency, it requires that extra work. At the same time, there is often good discussion in meetings wherein the subteam members build up a set of shared values – thereby missing the opportunity to argue for those values to the wider community. Finding a way to move decision-making to a more public, asynchronous system seems ideal, though meetings do have the benefit of providing a steady cadence to ensure that business is getting done.

The proposal

Idea: move away from video meetings for decision-making, instead reaching decisions entirely in the associated comment threads.

By moving the decision-making process fully online, we make it transparent by default. That is not to say that subteam members – or anyone else – will never have private conversation, of course. Just that this particular bit of business is better conducted online.

The key to making this work is automation. Right now, the meetings provide a convenient “forcing function” to ensure that decisions are being reached in a somewhat timely fashion. To ensure that we still make steady progress, we need a dashboard for every subteam member, showing them precisely what outstanding items they need to weigh in on – and that list needs to be kept manageably short.

We’ll need a dashboard tool that can pick up on special text from subteam members for:

Calling an RFC/issue into FCP
- “process: fcp”
Approving/disapproving FCP
- “process: fcp r+”
- “process: fcp r-”
Extending FCP
- “process: fcp extend” (for one more week by default; possibly give parameter?)
Approving stabilization/RFC merging
- “process: r+” (ideally followed up by some commentary)
Weakly objecting
- Just leave a comment, followed by a “process: r+” once you are satisfied that the objection is addressed or that it’s OK not to address it.
Strongly objecting (i.e. blocking acceptance)
- “process: r-” (followed up with objection)
Abstaining (possibly?)
- “process: ack”

The dashboard tool would track the current status of RFCs/issues facing a decision, and would track the various timelines involved, e.g. that RFC FCP lasts for one week.
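To give a feel for how little machinery the text-recognition part needs, here is a minimal sketch of a parser for the commands listed above. The names `ProcessCommand` and `parse_command` are hypothetical, and the matching rules (one command per line, whitespace-insensitive, first recognized command wins) are assumptions of mine rather than part of the proposal.

```rust
/// Commands a subteam member can issue in a comment.
/// The variant names are hypothetical; only the quoted strings come from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessCommand {
    CallFcp,       // "process: fcp"
    ApproveFcp,    // "process: fcp r+"
    DisapproveFcp, // "process: fcp r-"
    ExtendFcp,     // "process: fcp extend"
    Approve,       // "process: r+"
    Block,         // "process: r-"
    Abstain,       // "process: ack"
}

/// Scan a comment line by line and return the first recognized command.
/// Assumption: a command sits on its own line, possibly followed by commentary lines.
fn parse_command(comment: &str) -> Option<ProcessCommand> {
    comment.lines().find_map(|line| {
        // Only lines that begin with the "process:" marker are considered.
        let rest = line.trim().strip_prefix("process:")?;
        // Splitting on whitespace makes the match insensitive to extra spaces.
        let words: Vec<&str> = rest.split_whitespace().collect();
        match words.as_slice() {
            ["fcp"] => Some(ProcessCommand::CallFcp),
            ["fcp", "r+"] => Some(ProcessCommand::ApproveFcp),
            ["fcp", "r-"] => Some(ProcessCommand::DisapproveFcp),
            ["fcp", "extend"] => Some(ProcessCommand::ExtendFcp),
            ["r+"] => Some(ProcessCommand::Approve),
            ["r-"] => Some(ProcessCommand::Block),
            ["ack"] => Some(ProcessCommand::Abstain),
            _ => None,
        }
    })
}
```

A real tool would also need to check that the commenter is a subteam member and to track the timelines; this sketch covers only the text recognition.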

We can and should continue to hold video subteam meetings (they’re high bandwidth!), but for more forward-looking purposes: discussing specific early-stage RFCs, brainstorming, and prioritization. We can explore recording these meetings, and potentially opening them up to additional stakeholders who are not part of the subteam.

Details to be worked out:

A plausible story for automation that retains the consensus process and is likely to keep things moving.
Can the automation itself be responsible for moving to FCP/merging? Or at least provide a pushbutton way for doing so?

http://aturon.github.io/tech/2016/07/05/rfc-refinement

Resurrecting impl Trait

Sep 28, 2015 Updated Sep 28, 2015

TL;DR: since before Rust 1.0, we’ve wanted to be able to return an unboxed closure or avoid spelling out huge iterator types. This blog post revives the old impl Trait proposal, and discusses the broad tradeoffs between two different ways of carrying it out.

Heads up: I’m going to gloss over some details in this post, in the interest of getting across the high-level situation as I see it. Of course, any actual proposal will need to address the questions that I skip over.

Update: I removed the “elision” terminology, which was more confusing than helpful. I also now mention some implementation issues for the return type inference proposal. And I’ve toned down my preference in the wrapup; I’m becoming less certain :)

The original proposal

This post is about a topic near-and-dear to me – my first Rust RFC! – which is known as the “impl Trait” proposal. The RFC termed these “unboxed abstract types”, and it’s easiest to start with the motivation given there:

In today’s Rust, you can write a function signature like
fn consume_iter_static<I: Iterator<u8>>(iter: I)
fn consume_iter_dynamic(iter: Box<Iterator<u8>>)
In both cases, the function does not depend on the exact type of the argument. The type is held “abstract”, and is assumed only to satisfy a trait bound.

In the _static version using generics, each use of the function is specialized to a concrete, statically-known type, giving static dispatch, inline layout, and other performance wins.

In the _dynamic version using trait objects, the concrete argument type is only known at runtime using a vtable.

On the other hand, while you can write
fn produce_iter_dynamic() -> Box<Iterator<u8>>
you cannot write something like
fn produce_iter_static() -> Iterator<u8>
That is, in today’s Rust, abstract return types can only be written using trait objects, which can be a significant performance penalty. This RFC proposes “unboxed abstract types” as a way of achieving signatures like produce_iter_static. Like generics, unboxed abstract types guarantee static dispatch and inline data layout.

Here are some problems that unboxed abstract types solve or mitigate:
Returning unboxed closures. The ongoing work on unboxed closures expresses closures using traits. Sugar for closures generates an anonymous type implementing a closure trait. Without unboxed abstract types, there is no way to use this sugar while returning the resulting closure unboxed, because there is no way to write the name of the generated type.

Leaky APIs. Functions can easily leak implementation details in their return type, when the API should really only promise a trait bound. For example, a function returning Rev<Splits<'a, u8>> is revealing exactly how the iterator is constructed, when the function should only promise that it returns some type implementing Iterator<u8>. Using newtypes/structs with private fields helps, but is extra work. Unboxed abstract types make it as easy to promise only a trait bound as it is to return a concrete type.
Complex types. Use of iterators in particular can lead to huge types:
Chain<Map<'a, (int, u8), u16, Enumerate<Filter<'a, u8, vec::MoveItems<u8>>>>, SkipWhile<'a, u16, Map<'a, &u16, u16, slice::Items<u16>>>>
Even when using newtypes to hide the details, the type still has to be written out, which can be very painful. Unboxed abstract types only require writing the trait bound.
Documentation. In today’s Rust, reading the documentation for the Iterator trait is needlessly difficult. Many of the methods return new iterators, but currently each one returns a different type (Chain, Zip, Map, Filter, etc), and it requires drilling down into each of these types to determine what kind of iterator they produce.
In short, unboxed abstract types make it easy for a function signature to promise nothing more than a trait bound, and do not generally require the function’s author to write down the concrete type implementing the bound.

So, the RFC began with the framing that there was a kind of “gap” in the expressiveness matrix: we can choose between static and dynamic dispatch for inputs, but not for outputs.
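The three expressible corners of that matrix can be sketched as follows. Note the assumptions: the RFC excerpt uses pre-1.0 syntax (`Iterator<u8>`, `Box<Iterator<u8>>`), whereas this sketch uses the associated-type form `Iterator<Item = u8>` and the `dyn` keyword from later Rust so that it compiles. The function bodies (summing the items) are mine, added only to make the example complete.

```rust
// Static dispatch for an input: monomorphized for each concrete iterator type.
fn consume_iter_static<I: Iterator<Item = u8>>(iter: I) -> u32 {
    iter.map(u32::from).sum()
}

// Dynamic dispatch for an input: one compiled body, calls go through a vtable.
fn consume_iter_dynamic(iter: Box<dyn Iterator<Item = u8>>) -> u32 {
    iter.map(u32::from).sum()
}

// Dynamic dispatch for an output: the concrete adapter type is hidden,
// at the cost of a heap allocation and virtual calls.
fn produce_iter_dynamic() -> Box<dyn Iterator<Item = u8>> {
    Box::new((0..10u8).rev().map(|x| x * 2).skip(2))
}
```

The missing fourth corner, a statically dispatched abstract output, is exactly what `produce_iter_static` in the RFC excerpt is asking for.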

The RFC went on to propose the impl Trait notation as a way of solving these problems:

The basic idea is to allow code like the following:
pub fn produce_iter_static() -> impl Iterator<u8> {
    (0..10u8).rev().map(|x| x * 2).skip(2)
}
where impl Iterator<u8> should be understood as “some type T such that T: Iterator<u8>”. Notice that the function author does not have to write down any concrete iterator type, nor does the function’s signature reveal those details to its clients. But the type promises that there exists some concrete type.

The point here is to avoid writing a return type like

iter::Skip<iter::Map<'static,u8,u8,iter::Rev<iter::Range<u8>>>>

and instead give only the relevant information: some trait(s) that are implemented for the return type.

For a variety of reasons the RFC was closed and the feature has not shipped. But part of the impetus for returning to this topic now is that the illustrious @eddyb has a working implementation of a subset of the RFC! Ideally, the Rust community can come to a consensus around a design, and we can adapt and land this implementation.

Design questions

As it turns out, though, there are a lot of complex issues and decisions at play here, and as usual, multiple interesting points in the design space. Some of these were brought up in the RFC itself, others brought up on thread, and others haven’t really been discussed. But they all have to be tackled.

First I’ll go quickly through the main questions, then talk about design priorities, and finally present two possible designs.

Is impl Trait a type?

Can impl Trait appear everywhere a type can?

If not, where can impl Trait be used? Only return types? What about arguments, struct definitions, type aliases, etc? In each case, what should the semantics be?

The RFC gave answers to many of these questions, although I think today I would answer some of them differently.

Is impl Trait “sealed”?

As @Ericson2314 astutely remarked on thread:

This RFC is trying to serve up type inference and type abstraction as one feature, when they are orthogonal.

Inference-wise, we want to introduce meta-variables/unknowns where we are not allowed to today.
fn foo() -> _ { ... };
type BigLongIterator = _;
Abstraction-wise, we want to give ourselves more leeway to change our libraries without breaking client code.
mod nsa {
    // Works with any T!
    abs type Iter<T>: Iterator<Item=T> = SnoopWhile<...T...>;
    ...
}

Here “type inference” means something akin to leaving off a type annotation that’s required today (like the return type of a function), without any change to semantics. By contrast, “type abstraction” means hiding some information about a type from clients, similarly to what we often do with newtypes today. The original proposal coupled these two features together.

This is going to turn out to be a central question for this blog post. Should these two aspects of the feature be treated separately or coupled? Are both needed? What are the tradeoffs?

How do you deal with Clone or Iterator adapters?

It’s pretty common that a trait is conditionally implemented:

impl<T: Clone> Clone for Vec<T> { ... }

But that poses a problem for impl Trait, which requires an unconditional statement about which traits are implemented. This is especially painful for things like the iterator adapters, which are often Clone if the original iterator is, DoubleEndedIterator if the original iterator is, etc.
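To make the pattern concrete, here is a minimal sketch of a hand-written adapter whose extra capabilities depend on the iterator it wraps. `Doubled` is a hypothetical adapter invented for this illustration; the standard adapters follow the same shape.

```rust
/// A hypothetical adapter that doubles each item of the wrapped iterator.
struct Doubled<I>(I);

// Unconditional: Doubled<I> is always an iterator over u8.
impl<I: Iterator<Item = u8>> Iterator for Doubled<I> {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        self.0.next().map(|x| x.wrapping_mul(2))
    }
}

// Conditional: Doubled<I> is Clone only if the wrapped iterator is.
impl<I: Clone> Clone for Doubled<I> {
    fn clone(&self) -> Self {
        Doubled(self.0.clone())
    }
}

// Conditional: Doubled<I> can be walked backwards only if the wrapped iterator can.
impl<I: DoubleEndedIterator<Item = u8>> DoubleEndedIterator for Doubled<I> {
    fn next_back(&mut self) -> Option<u8> {
        self.0.next_back().map(|x| x.wrapping_mul(2))
    }
}
```

A signature that promises only “some type implementing Iterator” has no obvious way to say “and also Clone, but only when the input was”.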

Do marker traits (Send, Sync, …) have to be mentioned?

When you use the newtype pattern today, you have to explicitly forward most traits, but certain traits like Send and Sync will automatically be implemented for the new type if they were for the old type. Should impl Trait work similarly, implicitly carrying the markers?

This is not just a question of ergonomics, though the ergonomic issue here is significant! There’s also an extensibility problem: new libraries can add new “OIBIT”-style marker traits which are supposed to automatically apply to types, but forcing those markers to be explicitly opted in to for impl Trait means they often won’t apply. We’ve already seen significant problems along these lines with trait objects today.
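The newtype behavior described above can be sketched like this. `Meters` and `assert_send_sync` are hypothetical names used only for illustration.

```rust
use std::fmt;

/// A hypothetical newtype around f64.
struct Meters(f64);

// Ordinary traits have to be forwarded by hand.
impl fmt::Display for Meters {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} m", self.0)
    }
}

// Compiles only if T is both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn check_markers() {
    // No impl was written for these: Send and Sync come along automatically
    // because the wrapped f64 is Send and Sync.
    assert_send_sync::<Meters>();
}
```

The question for impl Trait is whether the markers should come along in the same implicit way, or whether they have to be spelled out in the signature.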

Design constraints

I’m going to be a bit opinionated here and lay out some design desires.

Hard constraints:

must be possible to return an unboxed closure and store it in a struct
must be possible to return a compound iterator without giving the type explicitly
must cope with multiple such types appearing as components of a return type (e.g., returning a pair of different unboxed closures)
must be able to assert that at least some traits are satisfied
must be able to deal with conditional trait implementations

Strong desires:

minimal signature verbosity
compatible with adding new OIBITs
simple semantics/explanation of the feature, especially if it looks like a type

Nice to haves:

type abstraction (the “hiding” that @Ericson2314 was talking about)
more ergonomic newtypes (where you don’t have to forward trait impls explicitly)
applicable to struct definitions, not just function signatures

Option 1: return type inference

I’ll start with the simpler design: attack only the type inference aspect of the original proposal, without actually hiding any details about a type from clients.

The simplest way to do this would be to allow wildcards to leave off types in return position:

fn foo() -> _ {
    (1..10u8).rev().skip(2)
}

fn bar() -> (_, _) {
    (|| println!("first closure"),
     || println!("second closure"))
}

The idea here is that the actual return type is fully concrete – clients of the API know exactly what it is, and can take advantage of public inherent methods or arbitrary traits.

But a pure wildcard proposal is a pretty drastic step away from our policy of explicitness for signatures and type definitions. In particular, it doesn’t lead to a very informative signature for clients of the API.

A more palatable choice would be something closer to impl Trait, like:

fn foo() -> ~Iterator<Item = u8> {
    (1..10u8).rev().skip(2)
}

fn bar() -> (~FnOnce(), ~FnOnce()) {
    (|| println!("first closure"),
     || println!("second closure"))
}

The idea is that these trait bounds don’t say everything about the concrete type, but they give some trait bounds that must hold of the concrete type. (So ~FnOnce() means “an elided type with interface roughly FnOnce()”.) Usually, there is one “primary” trait for a given return type, though of course you can list as many as you like using +.

If I have my druthers, this feature would also be usable in argument position:

fn map<U>(self, f: ~FnOnce(T) -> U) -> Option<U>

To be clear about the “roughly” here: in the foo example, the return type also implements Clone and ExactSizeIterator – and client code can rely on those facts, despite them not being written down.
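Neither `-> _` nor `~Trait` is real syntax, so as a rough stand-in the sketch below writes out by hand the concrete type that inference would pick for foo. It shows what clients could rely on under this option; the module paths are those of today's standard library.

```rust
use std::iter::{Rev, Skip};
use std::ops::Range;

// The fully concrete return type that `-> ~Iterator<Item = u8>` would stand for.
fn foo() -> Skip<Rev<Range<u8>>> {
    (1..10u8).rev().skip(2)
}

fn client() -> (usize, Vec<u8>) {
    let iter = foo();
    // ExactSizeIterator is visible to the client even though no signature would mention it...
    let remaining = iter.len();
    // ...and so is Clone.
    let copy = iter.clone();
    (remaining, copy.collect())
}
```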

On the one hand, this approach is uncomfortably implicit (since bounds can be left off), and it may leak information about the type that we do not intend.

There are also some implementation concerns – the typechecker will need to check function definitions in a particular order to discover concrete types, and must ensure that return type inference isn’t used in a cycle between functions. Note, however, that type inference continues to be purely local.

On the other hand:

It’s dead simple from the programmer’s perspective. There are no thorny questions about type equality, scoping of type abstractions, or what ~ means in various contexts. It’s just an extension of inference.
It behaves exactly like associated types today.
It accounts for conditional trait implementations easily, since those will automatically be known about the return type whenever they are applicable.
It accounts for marker traits and “OIBITs” without any fuss.
This kind of “leakage” is already prevalent – and important! – in Rust today. For example, when you define an abstract type, you give a trait bound which must be fulfilled. But when a client has narrowed to a particular impl, everything about the associated type is revealed:
```
trait Assoc { type Output: Clone; }
impl Assoc for u8 { type Output = u8; }

// we know that <u8 as Assoc>::Output == u8! We're only limited to the bound when writing
// fully generic code.
```
The type leakage is, in general, very unlikely to be relied upon. For example, to observe the particulars of an iterator adapter type, you’d have to do something like assign it to a suitably-typed mutable variable:
```
let iter: Chain<Map<'a, (int, u8), u16, Enumerate<Filter<'a, u8, vec::MoveItems<u8>>>>, SkipWhile<'a, u16, Map<'a, &u16, u16, slice::Items<u16>>>>;
iter = some_function();
```

This design addresses all of the hard constraints and strong desires – but none of the nice-to-haves.

Key point: the strong simplicity here is a major selling point, given that the pain we’re trying to solve here is one of the places where Rust is considered to be particularly complicated. (See this reddit post for example.)

Option 2: type abstraction

On the other end of the spectrum, we could try to address all of the use cases outlined, including type abstraction.

I’m going to give one particular strawman proposal and syntax here, and only at a high level – it’s not a fully fleshed out spec, but should give some idea of the possible direction.

The basic idea is to introduce a “type abstraction operator” @ that is used to “seal” a concrete type to a particular interface:

pub type File = FileDesc@(Read + Write + Seek + Debug);

You should read this as “at”, meaning that you are viewing a type “at” some specific bounds.

This definition is roughly equivalent to:

pub struct File(FileDesc);

impl Read for File {
    // forward to FileDesc's Read impl
}

impl Write for File {
    // forward to FileDesc's Write impl
}

impl Seek for File {
    // forward to FileDesc's Seek impl
}

impl Debug for File {
    // forward to FileDesc's Debug impl
}

Note, however, that there is a scope in which the equivalence File = FileDesc is known. Within that scope (“inside the abstraction boundary”), File is a simple type alias for FileDesc. Outside that scope, File is an opaque type that is only known to implement the four traits given. This is akin to what you get with privacy, except that you don’t have to explicitly project using .0 or construct using File(SomeFileDesc).

The obvious scoping rules for the abstraction would be the current privacy rules (i.e., literally the same as what you get with a newtype).

There are some tricky questions here that need to be answered in a complete design, which I don’t try to answer here:

Aside from type definitions, where else can @ be used? We’ll explore one other location – function signatures – in this post, but struct/enum definitions are another interesting possibility.
How should these type definitions interact with coherence? Can you implement traits for File? Inherent methods? What if they conflict with traits/methods on FileDesc?
How do you deal with bounds where the type isn’t in Self position? For example, there is also an impl of Read and Write for &File that should be exported.
What are the rules for equality around these types?

For now, I want to focus on the original motivation: avoiding having to fully name a type, while providing an interface to it.

Integrating return type inference

The other part of the @ proposal is that, when used in function signatures, you can leave off the type before the @ (i.e., the concrete type being abstracted):

fn foo() -> @Iterator<Item = u8> {
    (1..10u8).rev().skip(2)
}

Unlike the ~ proposal above, this hides everything about the return type except for the stated trait bound. So clients here don’t know that the iterator is also Clone.
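As a point of reference, stable Rust later gained return-position impl Trait, which behaves much like the elided @ form sketched here: callers can rely only on the stated bound (auto traits such as Send still leak through). The following is a sketch using that feature, not the @ syntax itself.

```rust
// Callers know only that this is an `Iterator<Item = u8>`; the concrete
// adapter type, and the fact that it is also `Clone`, stay hidden.
fn foo() -> impl Iterator<Item = u8> {
    (1..10u8).rev().skip(2)
}
```

Calling foo().collect::<Vec<u8>>() compiles, but foo().clone() does not, even though the underlying adapter type is Clone.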

Scaling up: marker and conditional traits

To make this kind of “sealing” work, we’d have to deal with two additional thorny problems: marker traits and conditional traits.

Marker traits like Send and Sync are often “defaulted” (via .. impls, AKA OIBITs). When you follow the newtype pattern, these “defaulted” traits come along for the ride, whether you ask for them or not – they leak through. They are also often conditional (e.g., one type is Send if some other types are Send). It’s probably simplest to say that @ has newtype-like semantics and marker traits leak through. (Leaking is also important because OIBIT-style traits can be defined in downstream crates about which you have no knowledge.)

Leaking, of course, makes @Trait and Box<Trait> different forms of type abstraction, but OIBITs are a huge pain point for trait objects today, so that’s likely a worthwhile difference.
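The leaking behavior of newtypes is easy to observe in today's Rust. In this small sketch, the type names and the assert_send helper are hypothetical and exist only for illustration.

```rust
use std::rc::Rc;

// A newtype over a `Send` type is itself `Send`, without asking for it.
struct Meters(u32);

// A newtype over a non-`Send` type is not `Send`.
struct SharedName(Rc<String>);

// Compiles only when `T: Send`, so it acts as a compile-time check.
fn assert_send<T: Send>() {}

fn check() {
    assert_send::<Meters>();
    // assert_send::<SharedName>(); // would not compile: `Rc<String>` is not `Send`
}
```

Neither struct mentions Send anywhere; the marker trait is decided entirely by the wrapped type.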

A more difficult issue is truly conditional traits, like:

impl<T: Clone> Clone for Vec<T> { ... }

To deal with this situation, we’d need conditional bounds like:

@(Iterator<Item = T> +
  I: Clone => Self: Clone +
  I: DoubleEndedIterator => Self: DoubleEndedIterator)

That’s, obviously, pretty verbose. Fortunately, in many cases there are groups of conditional bounds that tend to go together (see the iterator adapters, for example). You could imagine capturing these groups into aliases, so that you could say something like:

trait IterAdapter<T, I> = Iterator<Item = T> +
  I: Clone => Self: Clone +
  I: DoubleEndedIterator => Self: DoubleEndedIterator;

... @IterAdapter<T, I>

These aliases still have documentation advantages over the current adapter API, since you’d reuse the same alias over and over. By contrast, today each adapter introduces a separate newtype which must be examined separately to find its API.
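For contrast, this is roughly what one adapter looks like today, where each conditional bound becomes a separate conditional impl on a dedicated newtype. Passthrough is a hypothetical adapter invented for this sketch.

```rust
// Hypothetical adapter that passes items through unchanged.
pub struct Passthrough<I>(I);

impl<I: Iterator> Iterator for Passthrough<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}

// Corresponds to `I: Clone => Self: Clone`.
impl<I: Clone> Clone for Passthrough<I> {
    fn clone(&self) -> Self {
        Passthrough(self.0.clone())
    }
}

// Corresponds to `I: DoubleEndedIterator => Self: DoubleEndedIterator`.
impl<I: DoubleEndedIterator> DoubleEndedIterator for Passthrough<I> {
    fn next_back(&mut self) -> Option<I::Item> {
        self.0.next_back()
    }
}
```

A reader has to find all three impls to learn the adapter's API, which is the documentation cost the alias idea is meant to reduce.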

Benefits/drawbacks

So in all, it seems feasible to introduce a type abstraction feature, @, along with an elided form for function signatures, and have reasonably concise signatures.

Some benefits:

The design feels a bit more “principled” than the pure type inference design: except for OIBIT traits, the entire interface to a type must be written explicitly, so there’s no accidental leakage and everything is fully documented.
The use in type gives a lighter weight form of newtypes that doesn’t require manually forwarding trait impls (akin to “generalized newtype deriving” from the Haskell world). However, these types would likely not function as complete newtypes from the perspective of impl coherence – it probably doesn’t make sense to impl new traits for them, for example. So they don’t solve the whole problem.

Some drawbacks:

Complexity. This variant is way more complicated than pure type inference. And it’s not clear that type abstraction is a feature that Rust really needs, given that we already have privacy and the newtype pattern. We could provide “newtype deriving” in a much simpler way to address the pain points there.
Verbosity. Even with aliases, the signatures involved here tend to be much more complicated. Of course, that’s part of the point: this proposal is trying to be explicit about signatures.
A somewhat deeper change. This proposal means, for example, that type can no longer be understood as a straight-up alias, since repeated uses of @ create distinct abstract types.

Wrapup

We absolutely need to expand Rust in this area; I stand by the design constraints listed here. But we managed to ship a relatively slim Rust 1.0, and I’d like to fight to keep the language as small and concise as we can manage.

In that light, I’m leaning somewhat toward return type inference here, despite its break from full signature explicitness. But I remain concerned about the fact that the bound is not actually all that meaningful.

http://aturon.github.io/tech/2015/09/28/impl-trait

Specialize to reuse

Sep 18, 2015 Updated Sep 18, 2015

TL;DR: specialization supports clean, inheritance-like patterns out of the box. This post explains how, and discusses the interaction with the “virtual structs” saga.

Table of contents

Overview
Specialization as proposed
- A small addendum
Ending 1: the trait-based approach
- Thin pointers
- Incorporating fields
Ending 2: the enum-based approach
Getting opinionated

Overview

I’ve been working for a while with Niko Matsakis and Nick Cameron on another round of design for handling type hierarchies like those found in the DOM, in GUI frameworks, and even the compiler’s AST. The Rust community has gone through many iterations of design in this space, having identified the following goals for a type hierarchy design:

cheap field access from internal methods;
cheap dynamic dispatch of methods;
cheap downcasting;
thin pointers;
sharing of fields and methods between definitions;
safe, i.e., doesn’t require a bunch of transmutes or other unsafe code to be usable;
syntactically lightweight or implicit upcasting;
calling functions through smart pointers, e.g. fn foo(JSRef<T>, ...);
static dispatch of methods.

Two important constraints are missing from this prior list, one technical and one philosophical:

reusable constructor code at every level of the hierarchy;
fits well into the language, either by smoothly extending existing features, or by adding orthogonal concepts.

In the design I’ve been pursuing, impl specialization plays a key role. That’s appealing because specialization is something we’ve long wanted for other reasons, and is a natural deepening of our trait system. But it’s not quite enough, by itself, to meet all of the constraints above.

I’m going to start by recapping the specialization design. Then I’ll explore two competing avenues for building on specialization to meet the design constraints, choose-your-own-adventure style. At the end, I’ll give my current opinions on where we ought to go.

Specialization as proposed

Specialization allows overlapping trait (and inherent) impls, so long as there is always a “most specific” impl that applies to a given concrete type. The more general impl uses default to signal which items can be specialized – sort of the opposite of final in Java, or a bit like virtual in C++.

One of the major intended uses is to support true zero-cost abstraction, by allowing you to customize the impl for specific cases to match performance of non-abstract impls, while maintaining the abstraction for clients. To quote from the RFC:

Traits today can provide static dispatch in Rust, but they can still impose an abstraction tax. For example, consider the Extend trait:
pub trait Extend<A> {
    fn extend<T>(&mut self, iterable: T) where T: IntoIterator<Item=A>;
}
Collections that implement the trait are able to insert data from arbitrary iterators. Today, that means that the implementation can assume nothing about the argument iterable that it’s given except that it can be transformed into an iterator. That means the code must work by repeatedly calling next and inserting elements one at a time.

But in specific cases, like extending a vector with a slice, a much more efficient implementation is possible – and the optimizer isn’t always capable of producing it automatically. In such cases, specialization can be used to get the best of both worlds: retaining the abstraction of extend while providing custom code for specific cases.

The design in this RFC relies on multiple, overlapping trait impls, so to take advantage for Extend we need to refactor a bit:
pub trait Extend<A, T: IntoIterator<Item=A>> {
    fn extend(&mut self, iterable: T);
}

// The generic implementation
impl<A, T> Extend<A, T> for Vec<A> where T: IntoIterator<Item=A> {
    // the `default` qualifier allows this method to be specialized below
    default fn extend(&mut self, iterable: T) {
        ... // implementation using push (like today's extend)
    }
}

// A specialized implementation for slices
impl<'a, A> Extend<A, &'a [A]> for Vec<A> {
    fn extend(&mut self, iterable: &'a [A]) {
        ... // implementation using ptr::write (like push_all)
    }
}

Because the generic impl uses default for its implementation of extend, it’s permitted to give a more specialized impl block that overrides it. (The block is more specialized because it applies to a subset of the types the generic one applies to.)

A specialized impl doesn’t have to provide all the items for a trait; whatever it doesn’t provide is automatically inherited from the generic impl it’s specializing. And conversely, it can only override items marked default.
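Since specialization is not available on stable Rust, the two code paths can only be shown as separate functions. This sketch illustrates what the generic impl and the slice impl would each do; the function names are hypothetical, and choosing between them is left to the caller rather than to the compiler.

```rust
// Generic path: all we know is that `iterable` yields `A`s,
// so elements are pushed one at a time.
fn extend_generic<A, T>(v: &mut Vec<A>, iterable: T)
where
    T: IntoIterator<Item = A>,
{
    for item in iterable {
        v.push(item);
    }
}

// Slice path: the length is known up front, so the vector can
// reserve once and copy the elements in bulk.
fn extend_from_slice_path<A: Clone>(v: &mut Vec<A>, items: &[A]) {
    v.extend_from_slice(items);
}
```

With specialization, both bodies would sit behind the single extend method, and the more specific impl would be picked automatically for slice arguments.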

Going a bit farther, it’s possible to specialize not just impls for a trait, but also defaults for a trait. This is done via partial impl blocks, which are impls that provide some, but not all, of the items required by a trait. Again quoting the RFC:

For example, consider a design for overloading + and +=, such that they are always overloaded together:
trait Add<Rhs=Self> {
    type Output;
    fn add(self, rhs: Rhs) -> Self::Output;
    fn add_assign(&mut self, rhs: Rhs);
}
In this case, there’s no natural way to provide a default implementation of add_assign, since we do not want to restrict the Add trait to Clone data.

The specialization design in this RFC also allows for partial implementations, which can provide specialized defaults without actually providing a full trait implementation:
partial impl<T: Clone, Rhs> Add<Rhs> for T {
    // the `default` qualifier allows further specialization
    default fn add_assign(&mut self, rhs: Rhs) {
        let tmp = self.clone() + rhs;
        *self = tmp;
    }
}
This partial impl does not mean that Add is implemented for all Clone data, but just that when you do impl Add and Self: Clone, you can leave off add_assign:
#[derive(Copy, Clone)]
struct Complex {
    // ...
}

impl Add<Complex> for Complex {
    type Output = Complex;
    fn add(self, rhs: Complex) -> Complex {
        // ...
    }
    // no fn add_assign necessary
}

A small addendum

The specialization RFC touches on, but doesn’t actually specify, a way to “access” the implementation you’re overriding (akin to super in the OO world).

For the sake of this post, I’ll assume we’ve added such a mechanism by way of a UFCS-like use of default:

trait Foo {
    fn foo(&self);
}

impl<T> Foo for T {
    default fn foo(&self) {
        // do some complicated stuff
    }
}

impl<T: Debug> Foo for T {
    fn foo(&self) {
        println!("About to `foo` on {:?}", self);
        default::foo(self);
    }
}

The idea is that the default:: prefix accesses the generic impl that’s being overridden, and works like UFCS (in that methods become functions taking self explicitly).

The details here aren’t so important, as long as specialization supports some mechanism like this (which was always the intent).
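Without specialization, a common way to approximate this in today's Rust is to put the generic behavior in a free function that both the trait's default method and any override can call. This is only a sketch of that workaround; default_foo, Plain, and Loud are hypothetical names, and foo returns a String here so the result can be inspected.

```rust
// Plays the role of the generic impl's body.
fn default_foo<T: ?Sized>(_this: &T) -> String {
    "complicated stuff".to_string()
}

trait Foo {
    // The defaulted method simply calls the shared function.
    fn foo(&self) -> String {
        default_foo(self)
    }
}

struct Plain;

#[derive(Debug)]
struct Loud(u8);

// Takes the default as-is.
impl Foo for Plain {}

// Overrides, then calls "up" to the default, like `default::foo(self)`.
impl Foo for Loud {
    fn foo(&self) -> String {
        format!("about to foo on {:?}; {}", self, default_foo(self))
    }
}
```

The difference from the proposed default:: prefix is that the override must be written per type, rather than once for every T: Debug.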

Ending 1: the trait-based approach

It should already be clear that specialization has a connection to inheritance, because items left off of a specialized impl are inherited from the impl (partial or otherwise) it is specializing. As it turns out, that’s already enough to code up something like traditional type hierarchies in OO languages. You can get pretty far!

This general approach is, in some ways, inspired by eddyb and Kimundi’s proposal from last time around, but using specialization rather than a targeted feature for default refinement. And of course many of the fine points of the design here have been explored by others in the community as well.

Here’s an example, using a lightly simplified extract from Servo’s DOM implementation.

// Node ////////////////////////////////////////////////////////////////////////

trait Node {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        AttrValue::String(value)
    }
    // additional virtual methods for Node
}

// non-virtual methods for Node
impl Node {
    fn is_parent_of(&self, child: &Node) -> bool {
        // ...
    }
    // additional methods here
}

// Element /////////////////////////////////////////////////////////////////////

trait Element: Node {
    fn as_activatable(&self) -> Option<&Activatable> {
        None
    }
    // additional Element methods
}

// non-virtual methods for Element
impl Element {
    fn nearest_activable_element(&self) -> Option<&Element> {
        // ...
    }
}

partial impl<T: Element> Node for T {
    default fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("id") => AttrValue::from_atomic(value),
            &atom!("class") => AttrValue::from_serialized_tokenlist(value),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

// Activatable /////////////////////////////////////////////////////////////////

trait Activatable: Element {
    // moar methods
}

partial impl<T: Activatable> Element for T {
    fn as_activatable(&self) -> Option<&Activatable> {
        Some(self)
    }
}

// HtmlAnchorElement ///////////////////////////////////////////////////////////

struct HtmlAnchorElement {
    rel_list: DomTokenList,
    // moar fields
}

impl Node for HtmlAnchorElement {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("rel") => AttrValue::from_serialized_tokenlist(value),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

impl Element for HtmlAnchorElement {
    // ...
}

impl Activatable for HtmlAnchorElement {
    // ...
}

// HtmlImageElement ////////////////////////////////////////////////////////////

struct HtmlImageElement {
    url: Option<Url>,
    image: Option<Arc<Image>>,
    // moar fields
}

impl Node for HtmlImageElement {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("name") => AttrValue::from_atomic(value),
            &atom!("width") | &atom!("height") |
            &atom!("hspace") | &atom!("vspace") => AttrValue::from_u32(value, 0),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

impl Element for HtmlImageElement {
    // ...
}

This example is following a basic pattern:

“Abstract base classes” turn into traits.

Their virtual methods become methods in the traits (e.g. parse_plain_attribute).
- They will be virtually dispatched when using trait objects, and statically dispatched when used directly on a type implementing the trait (as usual). In practice that means that after the first virtual call, additional calls on self are statically-dispatched.
Their non-virtual methods become inherent methods for the trait, DST style (e.g. is_parent_of). These are always statically-dispatched. (Note: it may be preferable to write these as extension traits with blanket impls, to gain further static dispatch through monomorphization, at a cost in code size.)
Default implementations of methods can be done via … defaulted methods!

“Concrete classes” turn into structs.

The structs implement all of the traits for the abstract base classes above them.
In this case, “overriding” generally just means supplying an impl when a default was available.

Methods can be overridden at any point in the hierarchy.

This is done via a blanket (partial) impl, like partial impl<T: Element> Node for T.
When one abstract base class overrides its parent, it generally uses partial impl. This is because there are usually still some “abstract” aka “pure virtual” methods for the parent (which are just required methods on the trait).
If further overriding should be allowed, these (partial) impls should use default.
- For example, the as_activatable method is overridden for T: Activatable with a “final” version that cannot be further overridden.

And that’s it. This entire vision of OO-ish programming rests on using the existing system of dynamic dispatch through traits, and gets reuse/inheritance via specialization.
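To make the pattern concrete, here is a cut-down version that compiles on stable Rust. Because blanket partial impls are not available, the Element-level override is written as a helper function that each struct calls by hand. The simplified AttrValue, the string-typed attribute names, and the activate method are assumptions made for this sketch.

```rust
#[derive(Debug, PartialEq)]
enum AttrValue {
    String(String),
    TokenList(Vec<String>),
}

fn token_list(value: &str) -> AttrValue {
    AttrValue::TokenList(value.split_whitespace().map(String::from).collect())
}

// "Abstract base classes" become traits with defaulted (virtual) methods.
trait Node {
    fn parse_plain_attribute(&self, _name: &str, value: String) -> AttrValue {
        AttrValue::String(value)
    }
}

trait Element: Node {
    fn as_activatable(&self) -> Option<&dyn Activatable> {
        None
    }
}

trait Activatable: Element {
    fn activate(&self) -> &'static str;
}

// Stand-in for `partial impl<T: Element> Node for T`.
fn element_parse_plain_attribute(name: &str, value: String) -> AttrValue {
    match name {
        "class" => token_list(&value),
        _ => AttrValue::String(value),
    }
}

// "Concrete classes" become structs implementing every trait above them.
struct HtmlAnchorElement;
struct HtmlImageElement;

impl Node for HtmlAnchorElement {
    fn parse_plain_attribute(&self, name: &str, value: String) -> AttrValue {
        match name {
            "rel" => token_list(&value),
            _ => element_parse_plain_attribute(name, value),
        }
    }
}

impl Element for HtmlAnchorElement {
    fn as_activatable(&self) -> Option<&dyn Activatable> {
        Some(self)
    }
}

impl Activatable for HtmlAnchorElement {
    fn activate(&self) -> &'static str {
        "follow link"
    }
}

impl Node for HtmlImageElement {
    fn parse_plain_attribute(&self, name: &str, value: String) -> AttrValue {
        element_parse_plain_attribute(name, value)
    }
}

impl Element for HtmlImageElement {}
```

Through a Box<dyn Element>, as_activatable returns Some for the anchor and None for the image. The manual calls to element_parse_plain_attribute are what specialization would make implicit.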

Let’s take stock of the design constraints:

cheap field access from internal methods;
- No: traits have no way to talk about fields directly, and accessors require virtual dispatch (much more expensive than a fixed offset).
cheap dynamic dispatch of methods;
- Yes: covered via the trait system.
cheap downcasting;
- Sort of: see the as_activatable pattern. But not as fast as a type tag check. (The latter can be encoded if we have fields, however.)
thin pointers;
- No: trait objects use fat pointers.
sharing of fields and methods between definitions;
- Yes for methods (via specialization), No for fields.
safe, i.e., doesn’t require a bunch of transmutes or other unsafe code to be usable;
- Yes.
syntactically lightweight or implicit upcasting;
- Yes, once we have it for super-traits in general (an already-slated, highly desired, simple feature).
calling functions through smart pointers, e.g. fn foo(JSRef<T>, ...);
- Yes, via existing coercion/Deref mechanisms.
static dispatch of methods;
- Yes, as discussed above.
reusable constructor code at every level of the hierarchy;
- No, because fields are not addressed.
fits well into the language, either by smoothly extending existing features, or by adding orthogonal concepts.
- Yes: we didn’t have to add anything beyond specialization (which we’re taking for granted here).

So, we essentially met all but two requirements: thin pointers, and field access/inheritance. What’s the simplest way we could accommodate those?

Thin pointers

Recall that today, trait objects are “fat pointers”: a pointer to some data, and a pointer to a vtable containing methods for operating on that data.

This representation is not an arbitrary choice. It goes hand-in-hand with an important aspect of traits: you can implement a new trait for an already-defined type. This makes traits quite unlike interfaces in languages like Java, C#, or Scala, which have to be applied when you define a type. Traits are flexible bits of glue that can be applied after the fact. But the tradeoff is that the vtable cannot be part of the Self type, at least not if you want separate compilation. After all, the crate defining a struct simply does not know what traits might eventually be applied.

On the other hand, these fat pointers have a drawback: space overhead. In particular, if you have a dense graph of trait objects that are pointing to each other, each pointer’s size is doubled, but the information being stored is often redundant (and the relevant traits are often implemented up front). For serious object graphs, this is a non-starter.
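The size difference can be checked directly. This sketch compares a reference to a concrete type with a reference to a trait object; the Doit trait and MyType are hypothetical. The exact layout is an implementation detail of the compiler, though this ratio holds on current rustc.

```rust
use std::mem::size_of;

trait Doit {
    fn doit(&self);
}

struct MyType(u64);

impl Doit for MyType {
    fn doit(&self) {}
}

// Returns (size of a thin reference, size of a trait-object reference).
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&MyType>(), size_of::<&dyn Doit>())
}
```

On a 64-bit target this reports 8 and 16 bytes, so every edge in a graph of trait objects costs twice as much as a plain pointer.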

In Rust, when faced with representation tradeoffs, we have a simple tool: the repr attribute. So we can address the desire for thin pointers to traits by introducing #[repr(thin)]:

#[repr(thin)]
trait MyThinTrait {
    fn doit(&self);
}

struct MyType {
    // ...
}

impl MyThinTrait for MyType {
    fn doit(&self) { /* ... */ }
}

fn take_thin(t: &MyThinTrait) {
    t.doit()
}

Applying #[repr(thin)] to a trait like MyThinTrait has a few implications:

The representation of types like &MyThinTrait (used in take_thin) is a single pointer, which points to a vtable followed by the data.
You can only implement a thin trait for types you define in your crate.
If you implement multiple thin traits for a given type, they must form a hierarchy. That ensures consistent layout of the vtable.

Thin traits are a useful representation for Rust to offer regardless of any notion of “inheritance”.

Incorporating fields

Now all that’s left to handle is fields. To get maximal performance, we need some way for a trait to require that Self provides specific fields at statically-known locations (which is true for genuine inheritance hierarchies as one gets in languages like C++). In particular, accessing such fields through a trait is no more expensive than accessing them directly through a known struct: you just load the offset. But unlike a full struct definition, this only requires the existence of a specific field at a specific location; it doesn’t constrain other fields that might be available.

(Note: it may also be desirable to support fields in traits at dynamic offsets, which is still faster than a full dynamic dispatch, but that’s distinct from the requirements set out at the beginning of the post.)

In addition, it would be ideal if field definitions only have to be mentioned once, not repeated in every trait and struct definition that is talking about them.

There are likely a lot of ways we could accomplish these goals, but I’ll highlight a few promising ones, in order of increasing complexity: struct composition, struct inheritance, and trait fields.

Struct composition

There’s a very simple way, in today’s Rust, for one struct to contain all of the information of another struct: simply include the other struct as a field!

struct NodeFields {
    event_target: EventTarget,
    parent_node: Arc<Node>, // Note: this refers to the Node *trait*! see below.
    first_child: Arc<Node>,
    last_child: Arc<Node>,
    next_sibling: Arc<Node>,
    prev_sibling: Arc<Node>,
    // ...
}

struct ElementFields {
    node_fields: NodeFields,
    local_name: Atom,
    namespace: Namespace,
    // ...
}

In this example, Element “inherits” the fields of Node simply by embedding them. This approach also neatly solves the problem of reusing constructors across the hierarchy: a Node here is already a partly-constructed Element.

So the remaining question is how to gain access to these fields in a trait. Here’s one possibility:

trait Node: NodeFields { /* ... */ }
trait Element: Node + ElementFields { /* ... */ }

The idea is that a trait can list any number of super-structs (including indirectly through super-traits), so long as those structs form a composition hierarchy: they must form a chain where each struct contains a leading field whose type is the next struct (like with ElementFields and NodeFields above). The chain then ends in the Self type implementing the trait.

So, to complete the DOM example, we’d have:

struct HtmlAnchorElement {
    element_fields: ElementFields, // internally, contains a leading NodeFields
    rel_list: DomTokenList,
    // moar fields
}

impl Node for HtmlAnchorElement { /* ... */ }
impl Element for HtmlAnchorElement { /* ... */ }

When the trait is in scope, it allows direct access to the fields as if they had been flattened into the struct:

fn parent_node_via_element(e: &Element) -> Arc<Node> {
    e.parent_node.clone()
}

(A more verbose alternative might be some UFCS-style <self as NodeFields>::parent_node.clone().)

Of course, this proposal assumes that structs beginning with the same leading field always lay that field out in the same way – in particular, this rules out reordering of the field. If we don’t want to make such a guarantee, we could limit use of struct bounds to thin traits. Since thin traits are implemented for structs in the same crate defining those structs, they can impose additional representation constraints.
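In today's Rust, the closest approximation is composition plus accessor methods on the traits. The sketch below uses simplified, hypothetical field types; note that the accessors are virtual calls when used through a trait object, which is exactly the cost the struct-bound proposal is meant to remove.

```rust
struct NodeFields {
    child_count: usize,
}

struct ElementFields {
    node_fields: NodeFields, // leading field: the "parent" struct
    local_name: String,
}

struct HtmlAnchorElement {
    element_fields: ElementFields, // internally, contains a leading NodeFields
    rel_list: Vec<String>,
}

// Accessor methods stand in for the proposed `trait Node: NodeFields` bound.
trait Node {
    fn node_fields(&self) -> &NodeFields;
    fn child_count(&self) -> usize {
        self.node_fields().child_count
    }
}

trait Element: Node {
    fn element_fields(&self) -> &ElementFields;
    fn local_name(&self) -> &str {
        &self.element_fields().local_name
    }
}

// Constructors are reused at each level: a NodeFields is a partly built element.
impl NodeFields {
    fn new() -> Self {
        NodeFields { child_count: 0 }
    }
}

impl ElementFields {
    fn new(local_name: &str) -> Self {
        ElementFields { node_fields: NodeFields::new(), local_name: local_name.to_string() }
    }
}

impl HtmlAnchorElement {
    fn new() -> Self {
        HtmlAnchorElement { element_fields: ElementFields::new("a"), rel_list: Vec::new() }
    }
}

impl Node for HtmlAnchorElement {
    fn node_fields(&self) -> &NodeFields {
        &self.element_fields.node_fields
    }
}

impl Element for HtmlAnchorElement {
    fn element_fields(&self) -> &ElementFields {
        &self.element_fields
    }
}
```

With struct bounds on the traits, node_fields and element_fields would disappear, and e.local_name would be a load at a fixed offset.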

Struct inheritance

The above struct composition approach is, in some ways, pretty simple. It builds directly on current patterns for building hierarchies of structs. But it is perhaps too targeted and narrow.

A more expansive alternative is explicit struct inheritance. The basic idea here is very simple:

struct NodeFields {
    event_target: EventTarget,
    parent_node: Arc<Node>,
    // ...
}

// implicitly contains all of NodeFields fields
struct ElementFields: NodeFields {
    local_name: Atom,
    namespace: Namespace,
    // ...
}

The initializer syntax for child structs would then permit either providing all fields explicitly, or extending from an instance of the parent structure:

// Style 1: all fields
let e1: ElementFields = ElementFields {
    event_target: /* ... */,
    parent_node: /* ... */,
    // more NodeFields fields
    local_name: /* ... */,
    namespace: /* ... */,
    // more ElementFields fields
};

// Style 2: using parent struct
let n: NodeFields = NodeFields {
    event_target: /* ... */,
    parent_node: /* ... */,
    // more NodeFields fields
};
let e2: ElementFields = ElementFields {
    local_name: /* ... */,
    namespace: /* ... */,
    // more ElementFields fields
    ..n
};

This covers the need for constructors at each level of the hierarchy. But what about hooking into traits?

With struct inheritance, we can treat structs in general as something you can write in a bound, e.g.:

fn take_node_descendant<T: NodeFields>(t: &T) -> Arc<Node> {
    t.parent_node.clone()
}

// same as writing `trait Node where Self: NodeFields`
trait Node: NodeFields { /* ... */ }

In all cases, using a struct as a bound like T: NodeFields means that T must inherit from NodeFields. For traits, that would immediately give you access to the fields, and would impose the fixed static offset requirement (giving maximal performance when accessing those fields). That is, given the definition of Node above, if we have n: &Node, we could write n.parent_node and that would compile as if we had n: NodeFields instead. Because inheritance is explicit at the point of struct definition, we don’t have the layout worries we had with struct composition; we can lay out child structs so that their parents are always consistent prefixes.

It’s plausible that struct inheritance is generally useful outside of OO-like hierarchies; certainly it avoids long chains like my_struct.parent1.parent2.actual_field one gets when using struct composition at scale, although the previous proposal does that as well.

While it’s not necessary to address our goals here, you could also imagine adding coercions from &ElementFields to &NodeFields.

Trait fields

Finally, we could imagine instead adding fields directly to traits:

trait Node {
    parent_node: Arc<Node>,
    // ...
}

This raises a few questions:

What do you have to say in an impl?
Can the fields be hooked up arbitrarily to Self, or must they form a prefix?
Can you avoid writing out copies of the field definitions in every struct in the hierarchy?

There are many ways we might answer these questions, and I won’t try to fully explore the space here. But a simple option is to say that the fields must form a prefix of the fields of Self, and you don’t have to say anything in an impl. Furthermore, if you write:

struct HtmlAnchorElement: Element {
    rel_list: DomTokenList,
    // moar fields
}

you automatically include the fields mentioned in the Element trait (including those from its super-trait Node) as leading fields in the struct.

If we furthermore want to provide reusable constructors at every level of the hierarchy, we have to also include some way of naming these intermediate structs and using them in initializers, e.g.:

let n: Node::struct = Node::struct {
    event_target: /* ... */,
    parent_node: /* ... */,
    // more Node fields
};
let e: Element::struct = Element::struct {
    local_name: /* ... */,
    namespace: /* ... */,
    // more Element fields
    ..n
};
let h: HtmlAnchorElement = HtmlAnchorElement {
    rel_list: /* ... */,
    // more HtmlAnchorElement fields
    ..e
};

The main advantage of this approach over the previous ones is that you avoid the need to explicitly name structs corresponding to “abstract base classes” (like NodeFields and ElementFields); instead, these are implicit via the ::struct associated type. But there remain questions about using this syntax more flexibly, for mapping trait fields in more arbitrary ways to struct fields, without giving up performance in the fixed-offset case.

A downside is the question of visibility: currently all items in a trait are considered public, but this is not necessarily desirable for fields, so we’d likely need some way to express visibility choices. Fitting that into the existing syntax is not going to be easy. Whereas with struct composition or inheritance, it just “falls out” of the struct definitions.

Ending 2: the enum-based approach

Whew! So all of the proposals we just saw took traits as the sole source of dynamic dispatch and tried to close remaining gaps through slight enrichments of traits and structs.

A radically different approach takes the perspective that Rust already has two forms of dynamic dispatch today – traits and match expressions – corresponding to “open” (extensible) and closed sets of types respectively.

Niko Matsakis outlined a substantial expansion to enums in his recent post, which is part of the original design he, Nick Cameron, and I had been working on. The missing piece in that post is how to tie it together with specialization and thereby get something more like inheritance.

I won’t recap the whole proposal here (it’s worth a read in full!), but in short it makes enums into full-blown hierarchies, defining types at every level (and structs at the leaves):

enum Node {
  // where this node is positioned after layout
  position: Rectangle,
  ...
}

enum Element: Node {
  ...
}

struct TextElement: Element {
  ...
}

struct ParagraphElement: Element {
  ...
}

...

Given such a hierarchy, you can write:

fn takes_any_node<T: Node>(n: &T) { ... }

That is, you can use an enum as a bound, which stands for “any type under this point in the hierarchy”, but will be statically resolved via monomorphization.

This immediately opens the door to specialization for reuse:

trait ParsePlainAttribute {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue;
}

impl<T: Node> ParsePlainAttribute for T {
    default fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        AttrValue::String(value)
    }
}

impl<T: Element> ParsePlainAttribute for T {
    default fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("id") => AttrValue::from_atomic(value),
            &atom!("class") => AttrValue::from_serialized_tokenlist(value),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

impl ParsePlainAttribute for HtmlAnchorElement {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("rel") => AttrValue::from_serialized_tokenlist(value),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

This pattern of specialization, trying to match the example at the beginning of the post, almost works: if you call parse_plain_attribute on an HtmlAnchorElement, you’ll get the correct behavior.

But if you’ve upcasted to an Element or Node, the behavior will revert to the defaults for those types! That’s because you’re not getting dynamic dispatch here, which for enums we said should go through match.

The solution is to instead write the code as follows:

impl ParsePlainAttribute for Node {
    default fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        AttrValue::String(value)
    }
}

impl ParsePlainAttribute for Element {
    default fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("id") => AttrValue::from_atomic(value),
            &atom!("class") => AttrValue::from_serialized_tokenlist(value),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

impl ParsePlainAttribute for HtmlAnchorElement {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match name {
            &atom!("rel") => AttrValue::from_serialized_tokenlist(value),
            _ => default::parse_plain_attribute(self, name, value),
        }
    }
}

Note that we’re now using specialization without any explicit blanket impls. In this proposal, when the Self type is an enum, it’s as if you had written a blanket impl like the ones above, except that when the function is invoked on the enum type, it will use a match to dispatch to the most specialized implementation. That is, it’s as if we’d written the following for Node:

// Fully generic *default* impl
impl<T: Node> ParsePlainAttribute for T {
    default fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        AttrValue::String(value)
    }
}

// Specialized impl for the `Node` type itself
impl ParsePlainAttribute for Node {
    fn parse_plain_attribute(&self, name: &Atom, value: DomString) -> AttrValue {
        match *self {
            // NOTE: `this` has a different type in each arm!
            ref this@Node::HtmlAnchorElement => this.parse_plain_attribute(name, value),
            ref this@Node::HtmlImageElement => this.parse_plain_attribute(name, value),
            // ...
        }
    }
}

And of course:

It’s nicer to write the impls directly against the enum type than using an explicit blanket;
This is also almost certainly the behavior you wanted anyway: you always get the most specific impl, whether dynamically or statically dispatched.

But the desugaring only triggers when Self is a direct enum type.
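For readers who want to see the match-based dispatch in code that compiles today, here is a minimal sketch in stable Rust. It does not use specialization or enum hierarchies at all: the enum wraps ordinary structs, a provided trait method stands in for the default fn, and the impl for Node is written out by hand in the shape the sugar would generate. AttrValue is simplified, attribute names are plain strings instead of Atom, and every type here is a hypothetical stand-in rather than Servo code.

```rust
// Hypothetical stand-ins for the DOM types in the post; not Servo code.
#[derive(Debug, PartialEq)]
pub enum AttrValue {
    String(String),
    TokenList(Vec<String>),
}

pub struct HtmlAnchorElement;
pub struct HtmlImageElement;

// A closed hierarchy modelled as a plain enum wrapping variant structs.
pub enum Node {
    Anchor(HtmlAnchorElement),
    Image(HtmlImageElement),
}

pub trait ParsePlainAttribute {
    // The provided method plays the role of the `default fn` in the post.
    fn parse_plain_attribute(&self, _name: &str, value: &str) -> AttrValue {
        AttrValue::String(value.to_string())
    }
}

// The most specific behavior, for anchors only.
impl ParsePlainAttribute for HtmlAnchorElement {
    fn parse_plain_attribute(&self, name: &str, value: &str) -> AttrValue {
        match name {
            "rel" => AttrValue::TokenList(
                value.split_whitespace().map(str::to_string).collect(),
            ),
            _ => AttrValue::String(value.to_string()),
        }
    }
}

// Images simply inherit the provided (default) method.
impl ParsePlainAttribute for HtmlImageElement {}

// Hand-written version of the dispatch the proposed sugar would generate:
// calling through the enum still reaches the most specific impl.
impl ParsePlainAttribute for Node {
    fn parse_plain_attribute(&self, name: &str, value: &str) -> AttrValue {
        match self {
            Node::Anchor(this) => this.parse_plain_attribute(name, value),
            Node::Image(this) => this.parse_plain_attribute(name, value),
        }
    }
}
```

Calling parse_plain_attribute on a Node::Anchor value goes through the match and ends up in the anchor impl, which is the "upcast but still specialized" behavior described above.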

Getting opinionated

If you’ve stuck with me until this point, first of all: thanks! That was a long haul.

So, what should we do? To recap, we have two major routes, both of which start with specialization.

The first takes a minimalistic approach, adding a couple of minor (and independently motivated) features to traits and structs to get to our goal. There are a few options for dealing with fields, in particular.
The second combines specialization with another major set of enhancements to enums, which are also independently motivated, but are much more complex. It then adds a bit of sugar on top to tie the two together.

The trait approach allows for open-ended hierarchies (extensible by downstream crates), while the enum approach is closed. On the flip side, that means that downcasting can be somewhat more awkward (and is a code smell) for the trait approach, while it’s completely natural (just a match) for the enum approach.

Initially, I was very excited about the latter, enum-centric approach, because the work on enum hierarchies seems like such a natural extension to Rust. But I am worried about a few things. First of all, getting the proposal that Niko laid out to work will require substantially reworking our type inference to account for subtyping. Making this happen backwards-compatibly with our existing enums is not going to be easy, and may not even be possible. It also makes subtyping much more important in Rust, which is likely to be a complexity multiplier as it interacts with every other aspect of the type system. There are also various dark syntactic corners that come out of the partial unification of enums and structs (e.g., how do you specify visibility of shared fields in an in-line enum?) Finally, the key bit of sugar at the apex that ties enum hierarchies and specialization together is a bit subtle, and only covers the most obvious cases of Self.

In short, I see a lot of known risks, and worry about unknown risks, with the enum hierarchy route.

In contrast, the struct/trait-centric proposals feel much less risky; they don’t introduce fundamental new complexity that interacts with the rest of the type system, and they have a smaller overall footprint. I also like the way that specialization is used very explicitly to get inheritance, rather than through a layer of sugar on top.

And of the struct proposals, I think that struct inheritance strikes the best balance between minimalism, flexibility, and “fit” with the existing language.

The main downside of my preferred struct/trait approach is that there’s some amount of boilerplate: “abstract base classes” like Node turn into a Node trait and a NodeFields struct. That detail could easily be hidden behind a macro, but it might also argue in favor of the more complex traits-with-fields variant.
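To make that boilerplate concrete, here is a rough sketch of the split in today's Rust. The names (NodeFields, fields, child_count, TextNode) are illustrative assumptions, and the sketch uses plain composition plus an accessor method rather than any of the proposed language features.

```rust
// Illustrative names only; this shows the shape of the boilerplate,
// not any of the proposed language features.
pub struct NodeFields {
    pub children: Vec<u32>, // stand-in for real child links
}

pub trait Node {
    // Every concrete node type has to expose the shared fields by hand.
    fn fields(&self) -> &NodeFields;

    // Behavior written once, against the shared fields.
    fn child_count(&self) -> usize {
        self.fields().children.len()
    }
}

pub struct TextNode {
    pub node: NodeFields, // the "base class" state, embedded manually
    pub text: String,
}

impl Node for TextNode {
    fn fields(&self) -> &NodeFields {
        &self.node
    }
}
```

The accessor impl is the part that has to be repeated for every node type, which is the repetition a macro (or fields in traits) would remove.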

(I should mention: it’s possible that in the long run, we’ll want subtyping and not just coercions with any approach we take. But still, enum hierarchies want this to happen for our existing enums in a way that may require breakage.)

If we do go with a struct/trait approach, I think we should carefully check that the design is compatible with a future expansion of enums along the lines Niko laid out in his post, since we may still want enum hierarchies in the long run. I’ve given this a fair amount of thought, and likewise think that struct inheritance is the best fit. In particular, it could replace the somewhat magical proposal for “common fields” (which included an associated MyEnum::struct type).

http://aturon.github.io/tech/2015/09/18/reuse

Lock-freedom without garbage collection

Aug 27, 2015 Updated Aug 27, 2015

TL;DR

It’s widespread folklore that one advantage of garbage collection is the ease of building high-performance lock-free data structures. Manual memory management for these data structures is not easy, and a GC makes it trivial.

This post shows that, using Rust, it’s possible to build a memory management API for concurrent data structures that:

Makes it as easy to implement lock-free data structures as a GC does;
Statically safeguards against misuse of the memory management scheme;
Has overhead competitive with (and more predictable than) GC.

In the benchmarks I show below, Rust is able to easily beat a Java lock-free queue implementation, with an implementation that’s as easy to write.

I’ve implemented “epoch-based memory reclamation” in a new library called Crossbeam, which is ready to use for your own data structures today. This post covers some background on lock-free data structures, the epoch algorithm, and the entire Rust API.

Contents

Benchmarks
Lock-free data structures
- Treiber’s stack
- The problem
Epoch-based reclamation
The Rust API
The road ahead

Benchmarks

Before looking in depth at the API design and usage for epoch reclamation, let’s cut right to the chase: performance.

To test the overhead of my Crossbeam implementation relative to a full GC, I implemented a basic lock-free queue (a vanilla Michael-Scott queue) on top of it, and built the same queue in Scala. In general, JVM-based languages are a good test case for the “good GC” path toward lock-free data structures.

In addition to these implementations, I compared against:

A more efficient “segmented” queue that allocates nodes with multiple slots. I wrote this queue in Rust, on top of Crossbeam.
A Rust single-threaded queue protected by a mutex.
The java.util.concurrent queue implementation (ConcurrentLinkedQueue), which is a tuned variant of the Michael-Scott queue.

I tested these queues in two ways:

A multi-producer, single-consumer (MPSC) scenario in which two threads repeatedly send messages and one thread receives them, both in a tight loop.
A multi-producer, multi-consumer (MPMC) scenario in which two threads send and two threads receive in a tight loop.

Benchmarks like these are fairly typical for measuring the scalability of a lock-free data structure under “contention” – multiple threads competing to make concurrent updates simultaneously. There are many variations that should be benchmarked when building a production queue implementation; the goal here is just to gauge the ballpark overhead of the memory management scheme.

For the MPSC test, I also compared against the algorithm used in Rust’s built-in channels, which is optimized for this scenario (and hence doesn’t support MPMC).
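As a rough illustration of the shape of the MPSC test, here is a small harness built on the standard library's channels. It is not the benchmark code behind the numbers in this post: the function name mpsc_bench, the message type, and the way time is averaged are all assumptions made for the sketch.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Instant;

// Hypothetical harness: two producers, one consumer, tight loops.
// Returns (messages received, average nanoseconds per message).
pub fn mpsc_bench(per_producer: u64) -> (u64, f64) {
    let (tx, rx) = channel::<u64>();
    let start = Instant::now();

    let producers: Vec<_> = (0..2)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_producer {
                    tx.send(i).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver sees the channel close
    // once both producers have finished.
    drop(tx);

    // Single consumer: drain until every sender is gone.
    let mut received = 0u64;
    for _ in rx.iter() {
        received += 1;
    }
    for p in producers {
        p.join().unwrap();
    }

    let nanos = start.elapsed().as_nanos() as f64;
    (received, nanos / received.max(1) as f64)
}
```

A real benchmark would also warm up, repeat runs, and pin down the build settings; this only shows the producer and consumer structure.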

The machine is a 4-core 2.6GHz Intel Core i7 with 16GB RAM.

Here are the results, given in nanoseconds per message (lower is better):

Analysis

The main takeaway is that the Crossbeam implementation – which has not been tuned – is competitive in all cases. It’s possible to do better on both the Rust and JVM sides by using more clever or specialized queues, but these results show at least that the overhead of epochs is reasonable.

Notice that the Java/Scala versions fare much better in the MPMC test than they do in the MPSC test. Why is that?

The answer is simple: garbage collection. In the MPSC test, the producers tend to overrun the consumer over time, meaning that the amount of data in the queue slowly grows. That in turn increases the cost of each garbage collection, which involves walking over the live data set.

In the epoch scheme, by contrast, the cost of managing garbage is relatively fixed: it’s proportional to the number of threads, not the amount of live data. This turns out to yield both better and more consistent/predictable performance.

Finally, one comparison I did not include on the chart (because it would dwarf the others) was using a Mutex around a deque in Rust. For the MPMC test, performance was around 3040ns/operation, over 20x slower than the Crossbeam implementation. This is a vivid demonstration of why lock-free data structures are important – so let’s start by diving into what those are.

Lock-free data structures

When you want to use (and mutate) a data structure from many concurrent threads, you need synchronization. The simplest solution is a global lock – in Rust, wrapping the entire data structure in a Mutex and calling it a day.

Problem is, that kind of “coarse-grained” synchronization means that multiple threads always need to coordinate when accessing a data structure, even if they were accessing disjoint pieces of it. It also means that even when a thread is only trying to read, it must write, by updating the lock state – and since the lock is a global point of communication, these writes lead to a large amount of cache invalidation traffic. Even if you use a lot of locks at a finer grain, there are other hazards like deadlock and priority inversion, and you often still leave performance on the table.
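For reference, the coarse-grained baseline looks roughly like the sketch below: one Mutex around a VecDeque. The type and function names (LockedQueue, run_producers) are made up for the illustration. Note that even len, a pure read, has to take the lock.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// Coarse-grained queue: a single lock guards the whole structure.
pub struct LockedQueue<T> {
    inner: Mutex<VecDeque<T>>,
}

impl<T> LockedQueue<T> {
    pub fn new() -> Self {
        LockedQueue { inner: Mutex::new(VecDeque::new()) }
    }

    pub fn push(&self, t: T) {
        self.inner.lock().unwrap().push_back(t);
    }

    pub fn pop(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_front()
    }

    // Even a read-only query must acquire (and so write to) the lock.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}

// Two producer threads push `per_thread` items each; returns the final length.
pub fn run_producers(per_thread: usize) -> usize {
    let q = Arc::new(LockedQueue::<usize>::new());
    let handles: Vec<_> = (0..2usize)
        .map(|id| {
            let q = Arc::clone(&q);
            thread::spawn(move || {
                for i in 0..per_thread {
                    q.push(id * per_thread + i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    q.len()
}
```

Every operation from every thread serializes on the same lock, which is the contention point the rest of the post tries to avoid.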

A more radical alternative is lock-free data structures, which use atomic operations to make direct changes to the data structure without further synchronization. They are often faster, more scalable, and more robust than lock-based designs.

I won’t try to give a full tutorial to lock-free programming in this post, but a key point is that, if you don’t have global synchronization, it’s very difficult to tell when you can free memory. Many published algorithms basically assume a garbage collector or some other means of reclaiming memory. So before lock-free concurrency can really take off in Rust, we need a story for memory reclamation – and that’s what this blog post is all about.

Treiber’s stack

To make things more concrete, let’s look at the “Hello world” of lock-free data structures: Treiber’s stack. The stack is represented as a singly-linked list, with all modifications happening on the head pointer:

use std::ptr::{self, null_mut};
use std::sync::atomic::AtomicPtr;
use std::sync::atomic::Ordering::{Relaxed, Release, Acquire};

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

struct Node<T> {
    data: T,
    next: *mut Node<T>,
}

impl<T> Stack<T> {
    pub fn new() -> Stack<T> {
        Stack {
            head: AtomicPtr::new(null_mut()),
        }
    }
}

It’s easiest to start with popping. To pop, you just loop, taking a snapshot of the head and doing a compare-and-swap replacing the snapshot with its next pointer:

Note that compare_and_swap atomically changes the value of an AtomicPtr from an old value to a new value, if the old value matched. Also, for this post you can safely ignore the Acquire, Release and Relaxed labels if you’re not familiar with them.

impl<T> Stack<T> {
    pub fn pop(&self) -> Option<T> {
        loop {
            // take a snapshot
            let head = self.head.load(Acquire);

            // we observed the stack empty
            if head == null_mut() {
                return None
            } else {
                let next = unsafe { (*head).next };

                // if snapshot is still good, update from `head` to `next`
                if self.head.compare_and_swap(head, next, Release) == head {

                    // extract out the data from the now-unlinked node
                    // **NOTE**: leaks the node!
                    return Some(unsafe { ptr::read(&(*head).data) })
                }
            }
        }
    }
}

The ptr::read function is Rust’s way of extracting ownership of data without static or dynamic tracking. Here we are using the atomicity of compare_and_swap to guarantee that only one thread will call ptr::read – and as we’ll see, this implementation never frees Nodes, so the destructor on data is never invoked. Those two facts together make our use of ptr::read safe.

Pushing is similar:

impl<T> Stack<T> {
    pub fn push(&self, t: T) {
        // allocate the node, and immediately turn it into a *mut pointer
        let n = Box::into_raw(Box::new(Node {
            data: t,
            next: null_mut(),
        }));
        loop {
            // snapshot current head
            let head = self.head.load(Relaxed);

            // update `next` pointer with snapshot
            unsafe { (*n).next = head; }

            // if snapshot is still good, link in new node
            if self.head.compare_and_swap(head, n, Release) == head {
                break
            }
        }
    }
}
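A note for readers on newer toolchains: compare_and_swap has since been deprecated in the standard library in favor of compare_exchange, which takes separate success and failure orderings and returns a Result. The sketch below is the same leaky stack restated with that call; the choice of Relaxed as the failure ordering is an assumption of this sketch, and the logic is otherwise unchanged.

```rust
use std::ptr::{self, null_mut};
use std::sync::atomic::AtomicPtr;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

struct Node<T> {
    data: T,
    next: *mut Node<T>,
}

impl<T> Stack<T> {
    pub fn new() -> Stack<T> {
        Stack { head: AtomicPtr::new(null_mut()) }
    }

    pub fn push(&self, t: T) {
        let n = Box::into_raw(Box::new(Node { data: t, next: null_mut() }));
        loop {
            // snapshot current head and point the new node at it
            let head = self.head.load(Relaxed);
            unsafe { (*n).next = head; }

            // on success the node is published; on failure we retry
            if self.head.compare_exchange(head, n, Release, Relaxed).is_ok() {
                break;
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        loop {
            let head = self.head.load(Acquire);
            if head.is_null() {
                return None;
            }
            let next = unsafe { (*head).next };

            if self.head.compare_exchange(head, next, Release, Relaxed).is_ok() {
                // still leaks the node, exactly like the version above
                return Some(unsafe { ptr::read(&(*head).data) });
            }
        }
    }
}
```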

The problem

If we had coded the above in a language with a GC, we’d be done. But as written in Rust, it leaks memory. In particular, the pop implementation doesn’t attempt to free the node pointer after it has removed it from the stack.

What would go wrong if we did just that?

// extract out the data from the now-unlinked node
let ret = Some(unsafe { ptr::read(&(*head).data) });

// free the node
unsafe { mem::drop(Box::from_raw(head)); }

return ret

The problem is that other threads could also be running pop at the same time. Those threads could have a snapshot of the current head; nothing would prevent them from reading (*head).next on that snapshot just after we deallocate the node they’re pointing to – a use-after-free bug in the making!

So that’s the crux. We want to use lock-free algorithms, but many follow a similar pattern to the stack above, leaving us with no clear point where it’s safe to deallocate a node. What now?

Epoch-based reclamation

There are a few non-GC-based ways of managing memory for lock-free code, but they all come down to the same core observations:

There are two sources of reachability at play – the data structure, and the snapshots in threads accessing it. Before we delete a node, we need to know that it cannot be reached in either of these ways.
Once a node has been unlinked from the data structure, no new snapshots reaching it will be created.

One of the most elegant and promising reclamation schemes is Keir Fraser’s epoch-based reclamation, which was described in very loose terms in his PhD thesis.

The basic idea is to stash away nodes that have been unlinked from the data structure (the first source of reachability) until they can be safely deleted. Before we can delete a stashed node, we need to know that all threads that were accessing the data structure at the time have finished the operation they were performing. By observation 2 above, that will imply that there are no longer any snapshots left (since no new ones could have been created in the meantime). The hard part is doing all of this without much synchronization. Otherwise, we lose the benefit that lock-freedom was supposed to bring in the first place!

The epoch scheme works by having:

A global epoch counter (taking on values 0, 1, and 2);
A global list of garbage for each epoch;
An “active” flag for each thread;
An epoch counter for each thread.

The epochs are used to discover when garbage can safely be freed, because no thread can reach it. Unlike traditional GC, this does not require walking through live data; it’s purely a matter of checking epoch counts.

When a thread wants to perform an operation on the data structure, it first sets its “active” flag, and then updates its local epoch to match the global one. If the thread removes a node from the data structure, it adds that node to the garbage list for the current global epoch. (Note: it’s very important that the garbage go into the current global epoch, not the previous local snapshot.) When it completes its operation, it clears the “active” flag.

To try to collect the garbage (which can be done at any point), a thread walks over the flags for all participating threads, and checks whether all active threads are in the current epoch. If so, it can attempt to increment the global epoch (modulo 3). If the increment succeeds, the garbage from two epochs ago can be freed.

Why do we need three epochs? Because “garbage collection” is done concurrently, it’s possible for threads to be in one of two epochs at any time (the “old” one, and the “new” one). But because we check that all active threads are in the old epoch before incrementing it, we are guaranteed that no active threads are in the third epoch.
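The bookkeeping is easier to follow in code. Below is a deliberately single-threaded model of the scheme: plain fields instead of atomics, string labels instead of real nodes, and made-up names (Epochs, Participant, try_collect). It is a sketch of the counting rules only, not of how Crossbeam implements them.

```rust
// Single-threaded model of the epoch bookkeeping. A real implementation
// would use atomics for everything shared between threads.
pub struct Participant {
    active: bool,
    epoch: usize,
}

pub struct Epochs {
    global: usize,                   // cycles through 0, 1, 2
    garbage: [Vec<&'static str>; 3], // one garbage list per epoch
    threads: Vec<Participant>,
}

impl Epochs {
    pub fn new(n_threads: usize) -> Epochs {
        Epochs {
            global: 0,
            garbage: [Vec::new(), Vec::new(), Vec::new()],
            threads: (0..n_threads)
                .map(|_| Participant { active: false, epoch: 0 })
                .collect(),
        }
    }

    // Start an operation: set the flag, then adopt the global epoch.
    pub fn pin(&mut self, t: usize) {
        self.threads[t].active = true;
        self.threads[t].epoch = self.global;
    }

    // Finish an operation.
    pub fn unpin(&mut self, t: usize) {
        self.threads[t].active = false;
    }

    // Unlinked nodes go into the list for the *current global* epoch.
    pub fn unlinked(&mut self, node: &'static str) {
        self.garbage[self.global].push(node);
    }

    // Advance only if every active thread has seen the current epoch.
    // On success, hand back the garbage from two epochs ago.
    pub fn try_collect(&mut self) -> Option<Vec<&'static str>> {
        let g = self.global;
        if self.threads.iter().any(|p| p.active && p.epoch != g) {
            return None;
        }
        self.global = (g + 1) % 3;
        // Two epochs before the new one is (new + 1) mod 3.
        let freed = (self.global + 1) % 3;
        Some(std::mem::take(&mut self.garbage[freed]))
    }
}
```

In this model, a node unlinked while the global epoch is 0 is only returned by the collection that moves the global epoch from 1 to 2, and that collection cannot happen while any thread that pinned in epoch 0 is still active.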

This scheme is carefully designed so that most of the time, threads touch data that is already in cache or is (usually) thread-local. Only doing “GC” involves changing the global epoch or reading the epochs of other threads. The epoch approach is also algorithm-agnostic, easy to use, and its performance is competitive with other approaches.

It also turns out to be a great match for Rust’s ownership system.

The Rust API

We want the Rust API to reflect the basic principles of epoch-based reclamation:

When operating on a shared data structure, a thread must always be in its “active” state.
When a thread is active, all data read out of the data structure will remain allocated until the thread becomes inactive.

We’ll leverage Rust’s ownership system – in particular, ownership-based resource management (aka RAII) – to capture these constraints directly in the type signatures of an epoch API. This will in turn help ensure we use epoch management correctly.

Guard

To operate on a lock-free data structure, you first acquire a guard, which is an owned value that represents your thread being active:

pub struct Guard { ... }
pub fn pin() -> Guard;

The pin function marks the thread as active, loads the global epoch, and may try to perform GC (detailed a bit later in the post). The destructor for Guard, on the other hand, exits epoch management by marking the thread inactive.

Since the Guard represents “being active”, a borrow &'a Guard guarantees that the thread is active for the entire lifetime 'a – exactly what we need to bound the lifetime of the snapshots taken in a lock-free algorithm.
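To show just the RAII part, here is a toy guard built on a thread-local flag. It is an assumption-laden sketch: the flag ACTIVE and the helper is_active are invented for the example, there is no epoch handling, and nested pins are not supported. It only demonstrates that "being active" lasts exactly as long as the Guard value does.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread "active" flag; the real per-thread state
    // also tracks an epoch and garbage.
    static ACTIVE: Cell<bool> = Cell::new(false);
}

pub struct Guard {
    _private: (), // prevents construction outside of `pin`
}

// Mark the current thread active and hand back proof of that fact.
pub fn pin() -> Guard {
    ACTIVE.with(|a| a.set(true));
    Guard { _private: () }
}

impl Drop for Guard {
    // Leaving scope marks the thread inactive again.
    // (This toy does not handle nested pins.)
    fn drop(&mut self) {
        ACTIVE.with(|a| a.set(false));
    }
}

// Helper for the example only.
pub fn is_active() -> bool {
    ACTIVE.with(|a| a.get())
}
```

Because the flag is cleared in Drop, any value whose lifetime is tied to a borrow of the guard can only exist while the thread is still marked active.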

To put the Guard to use, Crossbeam provides a set of three pointer types meant to work together:

Owned<T>, akin to Box<T>, which points to uniquely-owned data that has not yet been published in a concurrent data structure.
Shared<'a, T>, akin to &'a T, which points to shared data that may or may not be reachable from a data structure, but is guaranteed not to be freed during lifetime 'a.
Atomic<T>, akin to std::sync::atomic::AtomicPtr, which provides atomic updates to a pointer using the Owned and Shared types, and connects them to a Guard.

We’ll look at each of these in turn.

Owned and Shared pointers

The Owned pointer has an interface nearly identical to Box:

pub struct Owned<T> { ... }

impl<T> Owned<T> {
    pub fn new(t: T) -> Owned<T>;
}

impl<T> Deref for Owned<T> {
    type Target = T;
    ...
}
impl<T> DerefMut for Owned<T> { ... }

The Shared<'a, T> pointer is similar to &'a T – it is Copy – but it dereferences to a &'a T. This is a somewhat hacky way of conveying that the lifetime of the pointer it provides is in fact 'a.

pub struct Shared<'a, T: 'a> { ... }

impl<'a, T> Copy for Shared<'a, T> { ... }
impl<'a, T> Clone for Shared<'a, T> { ... }

impl<'a, T> Deref for Shared<'a, T> {
    type Target = &'a T;
    ...
}

Unlike Owned, there is no way to create a Shared pointer directly. Instead, Shared pointers are acquired by reading from an Atomic, as we’ll see next.

Atomic

The heart of the library is Atomic, which provides atomic access to a (nullable) pointer, and connects all the other types of the library together:

pub struct Atomic<T> { ... }

impl<T> Atomic<T> {
    /// Create a new, null atomic pointer.
    pub fn null() -> Atomic<T>;
}

We’ll look at operations one at a time, since the signatures are somewhat subtle.

First, loading from an Atomic:

impl<T> Atomic<T> {
    pub fn load<'a>(&self, ord: Ordering, _: &'a Guard) -> Option<Shared<'a, T>>;
}

In order to perform the load, we must pass in a borrow of a Guard. As explained above, this is a way of guaranteeing that the thread is active for the entire lifetime 'a. In return, you get an optional Shared pointer back (None if the Atomic is currently null), with lifetime tied to the guard.

It’s interesting to compare this to the standard library’s AtomicPtr interface, where load returns a *mut T. Due to the use of epochs, we’re able to guarantee safe dereferencing of the pointer within 'a, whereas with AtomicPtr all bets are off.
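Here is a toy model of how such a signature can be built on top of AtomicPtr. It mirrors the signatures shown above, but the internals are assumptions: Guard is an empty stand-in, from_box is a constructor invented for the example, and nothing is ever freed, which is the only reason the unsafe dereference is sound here.

```rust
use std::ops::Deref;
use std::ptr::null_mut;
use std::sync::atomic::{AtomicPtr, Ordering};

// Empty stand-in for the real guard.
pub struct Guard {
    _private: (),
}

pub fn pin() -> Guard {
    Guard { _private: () }
}

// A snapshot whose lifetime is tied to a borrow of a guard.
pub struct Shared<'a, T: 'a> {
    data: &'a T,
}

impl<'a, T> Clone for Shared<'a, T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<'a, T> Copy for Shared<'a, T> {}

impl<'a, T> Deref for Shared<'a, T> {
    type Target = &'a T;
    fn deref(&self) -> &&'a T {
        &self.data
    }
}

pub struct Atomic<T> {
    ptr: AtomicPtr<T>,
}

impl<T> Atomic<T> {
    pub fn null() -> Atomic<T> {
        Atomic { ptr: AtomicPtr::new(null_mut()) }
    }

    // Invented for this example; leaks the box on purpose.
    pub fn from_box(b: Box<T>) -> Atomic<T> {
        Atomic { ptr: AtomicPtr::new(Box::into_raw(b)) }
    }

    pub fn load<'a>(&self, ord: Ordering, _: &'a Guard) -> Option<Shared<'a, T>> {
        let p = self.ptr.load(ord);
        // Sound in this toy only because nothing is ever freed; a real
        // library relies on the epoch scheme to justify this step.
        unsafe { p.as_ref() }.map(|data| Shared { data })
    }
}

// Usage: the snapshot cannot outlive `guard`.
pub fn demo() -> u32 {
    let a = Atomic::from_box(Box::new(7u32));
    let guard = pin();
    let snap = a.load(Ordering::Acquire, &guard).unwrap();
    **snap
}
```

If you try to return snap itself from demo, the borrow checker rejects it because guard is dropped at the end of the function; that is the static check the guard parameter buys.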

Storing

Storing is a bit more complicated because of the multiple pointer types in play.

If we simply want to write an Owned pointer or a null value, we do not even need the thread to be active. We are just transferring ownership into the data structure, and don’t need any assurance about the lifetimes of pointers:

impl<T> Atomic<T> {
    pub fn store(&self, val: Option<Owned<T>>, ord: Ordering);
}

Sometimes, though, we want to transfer ownership into the data structure and immediately acquire a shared pointer to the transferred data – for example, because we want to add additional links to the same node in the data structure. In that case, we’ll need to tie the lifetime to a guard:

impl<T> Atomic<T> {
    pub fn store_and_ref<'a>(&self,
                             val: Owned<T>,
                             ord: Ordering,
                             _: &'a Guard)
                             -> Shared<'a, T>;
}

Note that the runtime representation of val and the return value is exactly the same – we’re passing a pointer in, and getting the same pointer out. But the ownership situation from Rust’s perspective changes radically in this step.

Finally, we can store a shared pointer back into the data structure:

impl<T> Atomic<T> {
    pub fn store_shared(&self, val: Option<Shared<T>>, ord: Ordering);
}

This operation does not require a guard, because we’re not learning any new information about the lifetime of a pointer.

CAS

Next we have a similar family of compare-and-set operations. The simplest case is swapping a Shared pointer with a fresh Owned one:

impl<T> Atomic<T> {
    pub fn cas(&self,
               old: Option<Shared<T>>,
               new: Option<Owned<T>>,
               ord: Ordering)
               -> Result<(), Option<Owned<T>>>;
}

As with store, this operation does not require a guard; it produces no new lifetime information. The Result indicates whether the CAS succeeded; if not, ownership of the new pointer is returned to the caller.
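The "give ownership back on failure" shape can be illustrated with standard library types alone. In this sketch Box stands in for Owned and a raw pointer stands in for Shared; the free function cas and the Relaxed failure ordering are assumptions of the example, not the library's code.

```rust
use std::ptr::null_mut;
use std::sync::atomic::{AtomicPtr, Ordering};

// Toy CAS: `Box<T>` plays the role of `Owned<T>`, `*mut T` of `Shared<T>`.
pub fn cas<T>(
    slot: &AtomicPtr<T>,
    old: *mut T,
    new: Box<T>,
    ord: Ordering,
) -> Result<(), Box<T>> {
    let new_ptr = Box::into_raw(new);
    match slot.compare_exchange(old, new_ptr, ord, Ordering::Relaxed) {
        // Ownership has moved into the data structure.
        Ok(_) => Ok(()),
        // The slot never saw `new_ptr`, so we still own it uniquely
        // and can safely rebuild the box for the caller.
        Err(_) => Err(unsafe { Box::from_raw(new_ptr) }),
    }
}

pub fn demo() -> (bool, Option<u32>) {
    let slot: AtomicPtr<u32> = AtomicPtr::new(null_mut());

    // First CAS expects null and finds null: it succeeds.
    let first = cas(&slot, null_mut(), Box::new(1), Ordering::Release).is_ok();

    // Second CAS still expects null, so it fails and returns the box.
    let second = cas(&slot, null_mut(), Box::new(2), Ordering::Release)
        .err()
        .map(|b| *b);

    // Clean up the value installed by the first CAS.
    let installed = slot.swap(null_mut(), Ordering::Acquire);
    if !installed.is_null() {
        unsafe { drop(Box::from_raw(installed)); }
    }

    (first, second)
}
```

Returning the Owned value in the error case is what lets a retry loop reuse the same allocation instead of allocating a fresh node on every failed attempt.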

We then have an analog to store_and_ref:

impl<T> Atomic<T> {
    pub fn cas_and_ref<'a>(&self,
                           old: Option<Shared<T>>,
                           new: Owned<T>,
                           ord: Ordering,
                           _: &'a Guard)
                           -> Result<Shared<'a, T>, Owned<T>>;
}

In this case, on a successful CAS we acquire a Shared pointer to the data we inserted.

Finally, we can replace one Shared pointer with another:

impl<T> Atomic<T> {
    pub fn cas_shared(&self,
                      old: Option<Shared<T>>,
                      new: Option<Shared<T>>,
                      ord: Ordering)
                      -> bool;
}

The boolean return value is true when the CAS is successful.

Freeing memory

Of course, all of the above machinery is in service of the ultimate goal: actually freeing memory that is no longer reachable. When a node has been de-linked from the data structure, the thread that delinked it can inform its Guard that the memory should be reclaimed:

impl Guard {
    pub unsafe fn unlinked<T>(&self, val: Shared<T>);
}

This operation adds the Shared pointer to the appropriate garbage list, allowing it to be freed two epochs later.

The operation is unsafe because it is asserting that:

the Shared pointer is not reachable from the data structure,
no other thread will call unlinked on it.

Crucially, though, other threads may continue to reference this Shared pointer; the epoch system will ensure that no threads are doing so by the time the pointer is actually freed.

There is no particular connection between the lifetime of the Shared pointer here and the Guard; if we have a reachable Shared pointer, we know that the guard it came from is active.

Treiber’s stack on epochs

Without further ado, here is the code for Treiber’s stack using the Crossbeam epoch API:

use std::sync::atomic::Ordering::{Acquire, Release, Relaxed};
use std::ptr;

use crossbeam::mem::epoch::{self, Atomic, Owned};

pub struct TreiberStack<T> {
    head: Atomic<Node<T>>,
}

struct Node<T> {
    data: T,
    next: Atomic<Node<T>>,
}

impl<T> TreiberStack<T> {
    pub fn new() -> TreiberStack<T> {
        TreiberStack {
            head: Atomic::null()
        }
    }

    pub fn push(&self, t: T) {
        // allocate the node via Owned
        let mut n = Owned::new(Node {
            data: t,
            next: Atomic::null(),
        });

        // become active
        let guard = epoch::pin();

        loop {
            // snapshot current head
            let head = self.head.load(Relaxed, &guard);

            // update `next` pointer with snapshot
            n.next.store_shared(head, Relaxed);

            // if snapshot is still good, link in the new node
            match self.head.cas_and_ref(head, n, Release, &guard) {
                Ok(_) => return,
                Err(owned) => n = owned,
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        // become active
        let guard = epoch::pin();

        loop {
            // take a snapshot
            match self.head.load(Acquire, &guard) {
                // the stack is non-empty
                Some(head) => {
                    // read through the snapshot, *safely*!
                    let next = head.next.load(Relaxed, &guard);

                    // if snapshot is still good, update from `head` to `next`
                    if self.head.cas_shared(Some(head), next, Release) {
                        unsafe {
                            // mark the node as unlinked
                            guard.unlinked(head);

                            // extract out the data from the now-unlinked node
                            return Some(ptr::read(&(*head).data))
                        }
                    }
                }

                // we observed the stack empty
                None => return None
            }
        }
    }
}

Some observations:

The basic logic of the algorithm is identical to the version that relies on a GC, except that we explicitly flag the popped node as “unlinked”. In general, it’s possible to take lock-free algorithms “off the shelf” (the ones on the shelf generally assume a GC) and code them up directly against Crossbeam in this way.
After we take a snapshot, we can dereference it without using unsafe, because the guard guarantees its liveness.
The use of ptr::read here is justified by our use of compare-and-swap to ensure that only one thread calls it, and the fact that the epoch reclamation scheme does not run destructors, but merely deallocates memory.

The last point about deallocation deserves a bit more comment, so let’s wrap up the API description by talking about garbage.

Managing garbage

The design in Crossbeam treats epoch management as a service shared by all data structures: there is a single static for global epoch state, and a single thread-local for the per-thread state. This makes the epoch API very simple to use, since there’s no per-data structure setup. It also means the (rather trivial) space usage is tied to the number of threads using epochs, not the number of data structures.

One difference in Crossbeam’s implementation from the existing literature on epochs is that each thread keeps local garbage lists. That is, when a thread marks a node as “unlinked” that node is added to some thread-local data, rather than immediately to a global garbage list (which would require additional synchronization).

Each time you call epoch::pin(), the current thread will check whether its local garbage has surpassed a collection threshold, and if so, it will attempt a collection. Likewise, whenever you call epoch::pin(), if the global epoch has advanced past the previous snapshot, the current thread can collect some of its garbage. Besides avoiding global synchronization around the garbage lists, this new scheme spreads out the work of actually freeing memory among all the threads accessing a data structure.

Because GC can only occur if all active threads are on the current epoch, it’s not always possible to collect. But in practice, the garbage on a given thread rarely exceeds the threshold.

There’s one catch, though: because GC can fail, if a thread is exiting, it needs to do something with its garbage. So the Crossbeam implementation also has global garbage lists, which are used as a last-ditch place to throw garbage when a thread exits. These global garbage lists are collected by the thread that successfully increments the global epoch.

Finally, what does it mean to “collect” the garbage? As mentioned above, the library only deallocates the memory; it does not run destructors.

Conceptually, the framework splits up the destruction of an object into two pieces: destroying/moving out interior data, and deallocating the object containing it. The former should happen at the same time as invoking unlinked – that’s the point where there is a unique thread that owns the object in every sense except the ability to actually deallocate it. The latter happens at some unknown later point, when the object is known to no longer be referenced. This does impose an obligation on the user: access through a snapshot should only read data that will be valid until deallocation. But this is basically always the case for lock-free data structures, which tend to have a clear split between data relevant to the container (i.e., Atomic fields), and the actual data contained (like the data field in Node).

Splitting up the tear down of an object this way means that destructors run synchronously, at predictable times, alleviating one of the pain points of GC, and allowing the framework to be used with non-'static (and non-Send) data.
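The two-phase teardown can be demonstrated in isolation. The sketch below uses ManuallyDrop to model "deallocate without running destructors"; that is a choice made for this example, not a description of Crossbeam's internals, and Payload, DROPS, and demo are names invented for the illustration.

```rust
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

// Counts how many times a payload destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

pub struct Payload(pub u32);

impl Drop for Payload {
    fn drop(&mut self) {
        DROPS.fetch_add(1, SeqCst);
    }
}

struct Node<T> {
    // `ManuallyDrop` means freeing the node never runs `T`'s destructor.
    data: ManuallyDrop<T>,
}

// Returns (value read, drops after phase 1, drops after phase 2),
// with drop counts measured relative to the start of the call.
pub fn demo() -> (u32, usize, usize) {
    let before = DROPS.load(SeqCst);
    let node = Box::into_raw(Box::new(Node {
        data: ManuallyDrop::new(Payload(5)),
    }));

    // Phase 1, at "unlinked" time: move the payload out of the node.
    let payload = unsafe { ManuallyDrop::into_inner(ptr::read(&(*node).data)) };
    let value = payload.0;
    drop(payload); // the destructor runs here, synchronously
    let after_phase1 = DROPS.load(SeqCst) - before;

    // Phase 2, some time later: deallocate the node's memory only.
    unsafe { drop(Box::from_raw(node)); }
    let after_phase2 = DROPS.load(SeqCst) - before;

    (value, after_phase1, after_phase2)
}
```

The payload's destructor runs exactly once, at the point the owning thread chooses, and freeing the node later does not run it again.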

The road ahead

Crossbeam is still in its infancy. The work here is laying the foundation for exploring a wide range of lock-free data structures in Rust, and I hope for Crossbeam to eventually play a role similar to java.util.concurrent for Rust – including a lock-free hashmap, work-stealing deques, and a lightweight task engine. If you’re interested in this work, I’d love to have help!

http://aturon.github.io/tech/2015/08/27/epoch