xd009642 — GeistHaus

Apr 1, 2026 Updated Apr 1, 2026

I’ve been inspired to write something for April Cools Club, and what fits better from my normal content than my experience rice farming in rural Japan!

Show full content

I’ve been inspired to write something for April Cools Club, and what fits better from my normal content than my experience rice farming in rural Japan!

For those who aren’t aware, in 2025 I spent January-July in Japan staying with my wife’s family. During that time we helped out on the family rice farm near Shuzenji in the Shizuoka prefecture. I unfortunately had to leave before the full harvest process was done but I’ll take you as far as I got and also try and share other insights I gleaned.

Unfortunately, while I thought I took a lot of photos it seems I’m missing things I would have liked to have captured for this. Where applicable there’ll be other sources and at least one video linked for more information.

Quick Farm Intro

The farm is primarily a rice farm, there’s no animals (ignoring the koi fish that were in the garden pond). There is a portion of a bamboo forest and space to plant non-rice crops so we also grow or harvest for consumption:

Bamboo roots
Mushrooms
Potatoes
Pumpkins/gourds
Carrots
Warabi - an edible bracken. Not so much planted it just grows everywhere
Whatever other vegetables they decide to plant

And looking out from the driveway this is the view at the end of winter before everything starts growing and spring properly kicks in:

the_view

Obviously, me and my wife aren’t around all the time. When we’re not my brother-in-law and mother-in-law work on the farm part time often 1-2 days per week each.

Preparation

At the start of spring we come back to the fields. They’ve been left fallow over the winter and the dead rice plants from last year cover them. They’re currently dry, we’ll flood them later on once they’ve been prepared. Because it’s hard and spiky we have cordless strimmers with metal blades to cut through this and not take forever.

I don’t have a picture of a rice field before clearing but here is one of the fields partially cleared:

partially_cleared_field

As part of clearing and getting ready we also dig the drainage ditches along the side of the fields. When the fields were drained last year the soft mud flows back into the ditches and hardens again so they need to be redug.

The field will also be ploughed to break up the soil and loosen it, and we’ll remove large rocks we find. After ploughing we can level the field to flatten it. With the field level, the rice will be at equal depths and the planting process is more consistent.

This might be the first time someone’s prepared a rice field wearing a Rust London t-shirt. It’s definitely my first time driving a tractor!

ploughing

But before we get that far we have to prepare the route for the water to get into the field. This work is only actually needed for one of our fields, the others have a fairly direct route from the river to the field. But for one field we have to clear a few hundred metres of a channel that goes along the edge of a bamboo forest clearing the dead bamboo and other natural detritus.

I don’t have any pictures of this, but imagine all the joys of clearing out hundreds of meters of ditches among dense vegetation in high humidity.

I do have a picture after ploughing with the drainage ditch for the field dug next to the river that will supply our water:

post-ploughing

One last thing we might do before flooding is drive metal rods into the perimeter of the field as part of building a fence. This doesn’t have to be done for every field, just the ones that border the bamboo forest where the wild boar and deer might sneak through at night and eat the rice.

Water Water Everywhere

Rice fields are typically placed near rivers, before planting we have to flood and level the fields. We’ll go down into the river, and place a wooden board by a drainage pipe at the edge redirecting water down that pipe and into a channel next to the field. We can then open a hole and let water go from that channel into the field. Water can then drain out of the edge of the field when it gets full, continuing on into other fields and eventually back into the river. The water will rejoin the river in part so that farming doesn’t dry out rivers and ensure the longevity of the environment.

For the field with the more onerous ditch clearing that water flows under the field and eventually back into the river. For that there’s an ad-hoc construction of some old drain and bamboo to move the water across into the field:

adhoc-piping

And the water entering that field:

water-coming-in

After flooding, a tractor with a flat rear blade will be moved over the field a few times to level it. When the planting vehicle goes over the field little arms pluck off some rice and stick it down. If the soil is too far from the arm you end up with loose rice floating around on top of the water. Obviously, we don’t want to waste rice like this so levelling is an important step.

One thing to note, with a rice field the deeper soil is compacted and firm, it shouldn’t be able to drain by the water going into the water table and disappearing. However, our field with the tricky water intake did suffer from a minor sinkhole as water was able to go down and rejoin the stream that flows under the field. This resulted in some work to dig down and fill the area letting water out with rocks and harder mud and compacting it with the bucket of a digger. After this work was done the field held it’s water and we were able to think about planting.

After poking around to figure out why water was draining I managed to get this picture of the hole that started to open up. I guess that’s a sinkhole of sorts.

sinkhole

An interesting fact is that rice doesn’t actually need standing water to grow. The water helps by stopping weeds growing around the rice taking resources and protects the rice from certain pests that would eat it.

For some further watching, this video shows a more advanced but very similar for a different farm. The main difference is that they don’t need to manually go to the inlet gates to open/close them and instead have some more modern gates controlled via mobile phone.

Planting Day

It’s planting day, turning up I can see the neighbours have already planted and here you can see our ready but empty field next to their freshly planted field:

comparison

But here we go, everything we’ve been working toward. The previous process has taken from mid-February and now it’s early May. We go off and buy seed trays of rice to load into the Rice Transplanter. Below you can see a picture of the planting process:

planting

An arm will move along the bottom of the rice and pull off a clump of rice and then plunge it into the ground. It will keep moving back and forth doing this at regular intervals. It the motion of it working is reminiscent of a typewriter at work.

After it’s done there’s some leftover rice, and there might be gaps where things weren’t perfectly level. We go out into the field wearing jika-tabi. These are boots with a split between the big toe and the other toes. It’s meant to help our feet not get stuck in the wet mud. Grabbing rice in small bunches we pull them from the seed tray and plant them about an inch deep into the mud and compact some mud around it.

Fun language note, my wife asked me if I saw the tabi once and I thought she meant a tabby cat. I wasn’t aware of the name of the footwear.

Now the rice is in the field we’re at risk of attack. Wild boar and deer just love to snack on our hard work - this means it’s time to put up the electric fence. This is fairly simple drive the poles into the ground at regular intervals, then feed the wire along it wrapping it round the clips making sure it’s moderately taut. Also check for any breaks in the wire and if so get a bit of electrical tape and fix it.

After wiring we place a box which is just a solar panel and battery on a timer next to the fence and try to hammer it into the ground or prop it up securely enough with rocks where the dirt is too shallow.

We’ll have to come back every week or so to cut the grass that sprouts up on the edge of the field. If we don’t it will ground the fence and drain the battery and our rice will fall victim to the local wildlife.

After planting our fields look like this:

planted

Draining Time

When the rice gets older - around waist height the field is drained. Some sort of narrow plough is moved between the rows pushing the mud up around the rice to hold it up and then the water intake is closed and the field left to drain and dry out. Then the rice will continue to grow until it’s harvest time.

ready-to-drain

Unfortunately, I left Japan a couple of weeks after draining and I haven’t experienced the final stage of harvest yet. I have this picture I was sent of the rice near harvest time but the final stages will have to remain shrouded in mystery for now. I’m not ready for spoilers when I may learn this in future firsthand:

The rice

Attacked!

A spectre has been looming over this post. The wild boar. I got an update one day it seems a baby boar managed to squeeze under an unelectrified part of the fence and help itself to an all you can eat buffet:

Boar damage

Luckily someone came to the farm the day before and the day after it happened and it was closed up before the boars started visiting nightly. But it seems important to remain vigilant of your defences. I’ve still not seen a boar in the wild even going through the nearby forests - they’re nocturnal and rather dangerous so I’m glad of that!

Local Fauna

In rice fields you can see a lot of interesting wildlife. Frogs and salamanders help protect the crop by eating bugs that might feed on the rice. You also might see snakes nearby that feed on them as well.

salamander

frog

When clearing grass once I saw a snake dart out from under a pickup truck we’d had parked up for a few hours as I walked past. I then looked at the grass I was going to cut and saw it hunkered down in the grass but obscured enough to not get a picture and not wanting to disturb it I moved on. After all I don’t know how dangerous it might be.

I asked my brother-in-law about the snakes later on when he came to the truck to get a drink and asked if they’re dangerous. He asked if it’s brown or “blue” (aoi 青い) - it was brown. Also blue here isn’t blue but green, historically ao used to mean the entire blue-green spectrum so for some older terms (often things like animal colours), aoi is still used instead of the more modern word for green (midori 緑). Anyway, his response to my answer is how I first heard the Japanese word 有毒な (yūdokuna) - or venomous in English. Not speaking any English, he further translated it by grabbing his throat and miming frothing at the mouth.

There are also black kites flying around, they’ve been known to swoop down and snatch up kittens and there are warning signs in some more populated places about keeping close to small pets. I’ve seen them circling in the heat but it’s hard to get a good photograph of birds with a normal smartphone camera. But I have my best capture of one:

bird

Economics of Rice Farming

When I was in Japan there was a rice price crisis (try saying that three times fast). With a 95% increase in price, it actually became cheaper to fly to South Korea, fill a suitcase with rice and fly back. Eventually, the government released part of its emergency rice supply kept in storage to tackle food shortages and mitigate against disasters. This situation is likely to occur again, and as an outsider looking at how Japan’s farming system is organised it seems unavoidable without significant reforms.

In Japan the average age of a full-time rice farmer is around 70. For younger generations they can only afford to do it part-time, 1 or 2 days a week. They also own 4-6 rice fields. There are no factory farms and large scale operations.

In this respect my wife’s family is very average. Rice farming doesn’t generate enough income to do it full time so my Mother-in-law and Brother-in-law only farm 1 or 2 days a week maximum. Without more time they’re able to just plant enough fields to account for the family rice consumption and not to sell rice.

Part of the reason of this is the Gentan system, designed to protect small-scale farmers income it prevents large scale factory farming of rice and encourages ownership of smaller farms. It has been officially abolished but it still shapes how the rice economy works. This was part of a system to discourage communism initially by encouraging ownership of business and preventing absentee landlords accumulating large tracts of land where people who work the fields would be forced into renting. It should be noted the UK’s system is like this with rich landowners accumulating more farmland for tax reasons and renting it to farmers who often struggle to make farming profitable.

Farmers also sell their crops via a centrally managed system which fixes the price. Historically, crops used for animal feed have fetched higher price than human quality rice leading to a number of farmers planting rice for themselves and then selling animal feed to make a living.

Another issue is automation of farming. Reading this account of rice farming you might think this seems very manual and it is. In America rice is aerially planted. With consistency in fields and the distribution of the rice leads to higher yields. And if you’re dealing with such small farm area that becomes more important - and things like aerial planting become less economically viable. An American farm can be roughly 100 times larger than a Japanese one.

Additionally, with rising cost of living a lot of the youth of Japan move to cities like Tokyo, Osaka and Nagoya where they can find better paid office work. Local rural economies struggle more as they lose people and income from work doesn’t scale with the costs of living. It seems unavoidable we’ll see more and more rice farms close and further impacts due to decreased output.

If you’re interested in this there’s a video about this on Asianometry.

Finally

Reading this last section it might seem to end in doom and gloom. This isn’t really how I wanted to sign off on things. Rice farming was a positive experience for me, a connection with nature, building relationships with my wife’s family and growing my Japanese skills. Doing a day of manual labour, chatting shit, then going for the onsen and some BBQ and beers is far better than grinding away at some enterprise SaaS that will probably disappear in a few years.

Farming becoming economically unviable seems to be something afflicting many countries. At some point I expect a wakeup call or transition. Either things are changed to make it viable full-time or Japan’s system of small independent farms will gradually fade away. Only time will tell, but I hope that rural communities can continue to survive and also thrive.

/2026/04/01/My-Experience-as-a-Rice-Farmer

Automatically Running Cargo On Workspace Changes

Feb 26, 2026 Updated Feb 26, 2026

Given a Rust project using workspaces with potentially a large amount of packages how can we make it quicker to run tests or other tooling when the project changes. Using build systems like Bazel and Buck2 you can use the build dependency graph and only rerun relevant things. But I’ve touched Bazel briefly when I was a C++ developer and I swore never again for my own mental health. Buck2 looks intriguing and I might look at it in the future if I have some stupid big project to work on - but for Rust projects it’s gonna be hard to beat the developer experience of just using Cargo.

Show full content

Authors disclaimer: this project isn’t a serious project I expect people to use. I had an idea and I implemented it because it felt interesting and possible with a low time commitment.

Implementation Plan

The idea is simple at it’s core. If we look at the files changed in a commit and can correlate them to packages in the workspace we can run our commands just on those packages which have changed. If we can build a dependency graph between packages in the workspace we can also run our commands on dependents of changed packages to ensure that changing a dependency doesn’t change how it’s users work.

One of the things that first drew me to this was that I could use a trie to represent workspace paths and check which package a file exists in by looking for the strongest prefix match in the trie. I hadn’t used a trie in anger before, only in University style assignments, so using one for a real project seemed like something that might be entertaining.

The next part of the plan was using Minijinja for templating to allow any command to be ran like this - on the changed packages only. I’ve been using it recently for some work projects so it was in my mind.

With this rough idea we have 4 components:

Git based identification of what’s changed
Building a dependency graph of the workspace, likely via cargo metadata
Finding each files owned package (if there is one)
Generating a command and executing it

Steps 1-3 are really just “identifying relevant packages” and 4 is creating/running command. I’ve fleshed out 1-3 more already in my head just because that’s the area my mind went to first. I know this is due to a bias on what’s more interesting to me but I felt it correlated well with the difficulty. Spoiler alert: it did correlate well.

We can also special case some commands we already know we’ll use to make it more ergonomic. Running something like dc --template "cargo test is definitely not as friendly as dc test.

Identifying Impacted Packages Finding Changes via Git

My first thought here is we don’t want to run on a package if any file in it has changed. That would be safer but we can probably ignore anything that’s not a source file, manifest or clearly generating code. A fairly simple function provides this check:

use std::path::Path; 

pub fn is_considered(path: &Path) -> bool {
    let ext = match path.extension().and_then(|e| e.to_str()) {
        Some(e) => e.to_ascii_lowercase(),
        None => return path.is_dir(),
    };
    matches!(
        ext.as_str(),
        "rs" | "c" | "cpp" | "h" | "hpp" | "cc" | "cxx" | "toml" | "pb"
    )
}

Then after that, given the path to the root repository as we might want to run on a different folder than our working directory. We can use git2 to identify the files that changed between the current commit and the parent commit.

use git2::{DiffOptions, Repository};
use std::path::PathBuf;

pub fn get_changed_source_files(root: &Path) -> anyhow::Result<Vec<PathBuf>> {
    let repo = Repository::open(root)?;

    let head = repo.head()?;
    let commit = head.peel_to_commit()?;

    // Last commit should have at least one parent (could look further back as well)
    let parent = commit.parent(0)?;

    let commit_tree = commit.tree()?;
    let parent_tree = parent.tree()?;

    let mut diff_opt = DiffOptions::new();

    let diff = repo.diff_tree_to_tree(
        Some(&parent_tree), 
        Some(&commit_tree), 
        Some(&mut diff_opt)
    )?;

    let mut considered_files = vec![];
    diff.foreach(
        &mut |delta, _| {
            if let Some(path) = delta.new_file()
                                     .path()
                                     .or_else(|| delta.old_file().path()) 
            {
                if is_considered(&root.join(path)) {
                    considered_files.push(path.to_path_buf());
                }
            }
            true
        },
        None,
        None,
        None,
    )?;

    Ok(considered_files)
}

This wasn’t too hard, the library maps well to a knowledge of git and there’s also helpful examples to get me looking in the right places.

For a more fully featured one I’d include dirty changes (or have the option to include them), as well as allowing comparisons between more than the current head and parent. But this is intended to run in CI and I’m happy running on every commit - for now.

Using a Trie to Represent Structure

A trie is a tree structure also called a prefix tree, the wikipedia is a reasonably good resource if you’ve not heard of it before. The reason I’m using a trie for prefix matching is to get ergonomic checks on if something is within a directory tree. To visualise this here’s a potential trie we might construct from a cargo workspace:

image info

Notice I say ergonomic not fast. While tries are used to speed up prefix based matching, we don’t really know if it’ll impact things at this scale. The trie is going to be constructed when the program runs and used. But depending on the number of changes and the size of the project it might always be faster to just loop through the packages and do file.starts_with(package_root) for all the packages in the workspace. If this was for a server or something long running the cost of trie creation could be amortized and we could be more sure of savings. But that’s not the case here.

When selecting a trie library I vaguely recalled a Cloudflare blog on a trie library they wrote trie-hard. In the blog they mentioned rafix_trie as the fastest existing implementation before theirs and in the readme for trie-hard it also mentions radix_trie as more robust and fully featured. Also, radix_trie was updated more recently whereas trie-hard seems to have not been touched since their blogpost. With that in mind I chose radix_trie.

Working out Workspace Packages

In my experience, whenever you write a dev-tool that works on Rust projects you will reach for cargo_metadata. This is a minimal library that takes the json structured cargo metadata output and gives you Rust types for it. From that we can we the metadata for the package, loop over the workspace members, find out what dependencies they have (and which ones are in our workspace as well).

This code once again feels straightforward to the point I’ll just paste it all in here. We construct a metadata command that points to the project root. We then iterate over the workspace members and for each member iterate over it’s dependencies keeping ones which exist within our project root. We then construct a package type with the name of the package, manifest path and the list of dependencies.

This is inserted into a trie that maps from the workspace path to the package and the trie is returned - with a result if the metadata command fails.

use cargo_metadata::MetadataCommand;
use radix_trie::Trie;
use std::path::{Path, PathBuf};

#[derive(Debug, Default, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub struct Package {
    pub name: String,
    pub manifest: PathBuf,
    pub dependencies: Vec<PathBuf>,
}

fn check_path(root: &Path, path: Option<&Path>) -> bool {
    match path {
        Some(p) => p.starts_with(root),
        None => false,
    }
}

pub fn find_packages(root: &Path) -> anyhow::Result<Trie<PathBuf, Package>> {
    let metadata = MetadataCommand::new().current_dir(root).exec()?;

    let mut packages = Trie::new();

    for package in &metadata.workspace_members {
        let package = &metadata[package];

        let dependencies = package
            .dependencies
            .iter()
            .filter(|x| check_path(root, x.path.as_ref().map(|x| x.as_std_path())))
            .map(|x| x.path.clone().unwrap().into_std_path_buf())
            .collect();

        let pack = Package {
            name: package.name.to_string(),
            manifest: package.manifest_path.clone().into_std_path_buf(),
            dependencies,
        };
        packages.insert(
            package
                .manifest_path
                .parent()
                .unwrap()
                .as_std_path()
                .to_path_buf(),
            pack,
        );
    }

    Ok(packages)
}

Generating Commands

As mentioned before I’m going to use minijinja for this. The motivation is that we have a list of packages, we’ll have a list of changed packages and probably a list of other arguments to apply to the command. Depending on the command we might have different ways of providing those lists as arguments and a Minijinja template lets us express things like loops to generate our command string.

Off the bat I created this template for a simple cargo test -p package_1 -p package_2 given a list of those two packages:

cargo test {% for pkg in packages %} -p {{ pkg }} {% endfor %}

This can easily be expanded to add a list of other arguments as well or changed from -p to -e to exclude packages. To aid in writing templates, Minijinja has a playground as well!

I’m also then using shell-words to take the command string and split it up into the command name and args so I can construct a std::process::Command. Rules on splitting arguments involves handling things like quotes and I didn’t want to roll that myself - and knowing the world there’s probably some cursed edge-cases.

With Minijinja we start by creating an environment and adding our template. We then get the template out and can see what undeclared variables are present. These will be where we insert things like our packages or args. This does rely on us having some reserved names for certain things which the user will make use of. I’ve gone for packages for the packages to include and args for the other args.

We could also not use undeclared variables and just insert all the variables available to us. But why do extra work when it’s trivial to avoid it? Minijinja uses json internally a lot, and the Minijinja value type is analogous to the serde_json::Value. This also means we can create Mininja values from types that implement Serialize and we don’t have to worry too much as long as things all look like they’d serialize into json without a care.

After that we render the expression, split using shell-words and execute the command.

use minijinja::{Environment, Value};

fn generate_command(
    template: &str,
    included_packages: &BTreeSet<&str>,
    args: &[String],
) -> anyhow::Result<Command> {
    let mut env = Environment::new();
    env.add_template("cmd", template)?;
    let expr = env.get_template("cmd")?;

    let variable_names = expr.undeclared_variables(true);
    let mut variables = HashMap::new();
    for var in variable_names.iter() {
        match var.as_str() {
            "packages" => {
                variables.insert("packages", Value::from_serialize(included_packages));
            }
            "args" => {
                variables.insert("args", Value::from_serialize(args));
            }
            s => anyhow::bail!("Unsupported variable `{}`", s),
        }
    }
    let result = expr.render(&variables)?;

    let parts = shell_words::split(result.as_str())?;
    let mut part_iter = parts.into_iter();
    let exe = part_iter.next().context("No program name")?;
    let mut cmd = Command::new(exe);

    cmd.args(part_iter)
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit());

    Ok(cmd)
}

For commands where we don’t choose packages to include but also exclude I added another allowed variable of excludes. Adding this to the code we change the function to add an argument of packages: &Trie<PathBuf, Package>. This allows us to get the difference between the set of packages and the included packages for excludes. This changes the match on variable names to be as follows (adding the arg is omitted):

match var.as_str() {
    "packages" => {
        variables.insert("packages", 
            Value::from_serialize(included_packages)
        );
    }
    "excludes" => {
        variables.insert(
            "excludes",
            Value::from_serialize(generate_exclude_list(
                packages.values(),
                included_packages,
            )),
        );
    }
    "args" => {
        variables.insert("args", Value::from_serialize(args));
    }
    s => anyhow::bail!("Unsupported variable `{}`", s),
}

With generate_exclude_list defined as follows:

fn generate_exclude_list<'a>(
    packages: impl Iterator<Item = &'a Package>,
    included_packages: &BTreeSet<&str>,
) -> BTreeSet<&'a str> {
    packages
        .filter(|x| !included_packages.contains(x.name.as_str()))
        .map(|x| x.name.as_str())
        .collect::<BTreeSet<_>>()
}

Integrating the Current Pieces

Given the root of the project, we can now:

Get the changes
Get the packages in the project

Using radix_trie if we do trie.get_ancesntor_value(&changed_file) it will give us a package if one exists. Using the initial set of changes we get the initial packages impacted. Then we can get these packages, check the dependencies for ones in the workspace. Add it to the list of packages and then repeat. Continue until no new packages are added.

An astute reader might think “couldn’t we have constructed the dependency graph and just do a graph traversal”. And yes we could. But for the size of workspaces I work with this felt like overkill. Plus I’ve only got 7 direct dependencies so far and 3 of them are >1.0

let considered_files = repository::get_changed_source_files(&root)?;
let packages = cargo::find_packages(&root)?;
let mut changed_packages = BTreeSet::new();
let mut end_package_names = BTreeSet::new();

for file in &considered_files {
    if let Some(package) = packages.get_ancestor_value(&root.join(file)) {
        changed_packages.insert(root.join(file));
        end_package_names.insert(package.name.as_str());
    }
}

let mut changed_packages_previous = 0;

while changed_packages_previous != changed_packages.len() {
    changed_packages_previous = changed_packages.len();

    for (key, val) in packages.iter() {
        if val
            .dependencies
            .iter()
            .any(|x| changed_packages.contains(x))
        {
            let path = root.join(key)
            if let Some(package) = packages.get_ancestor_value(&path) {
                changed_packages.insert(root.join(key));
                end_package_names.insert(package.name.as_str());
            }
        }
    }
}

Designing the CLI

The final step is to create the CLI, for this I’ll be using Clap. Here we’ll create an enum of commands with some predefined ones so we can do dc test, dc nextest, dc build and dc run where run is the only one without a presupplied template. We’ll pop some convenience methods on the arg types to get out the right templates, paths, and other things depending on defaults or what the user supplies.

Aside from that every command will have a --no-run option and other args will be supplied after --. Clap doesn’t seem to allow me to grab all the unexpected args as a Vec<String> or some sort of map otherwise, which makes sense but is a bit of added friction to the CLI.

dc test -- --all-features

That CLI interface defined with Clap is as follows::

const CARGO_TEST_TEMPLATE: &'static str = 
"cargo test {% for pkg in packages %} -p {{ pkg }} {% endfor %} {% for arg in args %} {{ arg }} {% endfor %}";

const CARGO_NEXTEST_TEMPLATE: &'static str = 
"cargo nextest {% for pkg in packages %} -p {{ pkg }} {% endfor %} {% for arg in args %} {{ arg }} {% endfor %}";

const CARGO_BUILD_TEMPLATE: &'static str = 
"cargo build {% for pkg in packages %} -p {{ pkg }} {% endfor %} {% for arg in args %} {{ arg }} {% endfor %}";

const CARGO_BENCH_TEMPLATE: &'static str = 
"cargo build {% for pkg in packages %} -p {{ pkg }} {% endfor %} {% for arg in args %} {{ arg }} {% endfor %}";

#[derive(Debug, Parser)]
pub enum RunCommand {
    Test(RequiredArgs),
    Nextest(RequiredArgs),
    Build(RequiredArgs),
    Bench(RequiredArgs),
    Run(Args),
}

impl RunCommand {
    pub fn required_args(&self) -> &RequiredArgs {
        match self {
            Self::Test(a) | Self::Nextest(a) 
                | Self::Build(a) | Self::Bench(a) => a,
            Self::Run(a) => &a.required,
        }
    }

    pub fn command(&self) -> Option<Cow<'_, str>> {
        match self {
            Self::Test(_) => Some(CARGO_TEST_TEMPLATE.into()),
            Self::Nextest(_) => Some(CARGO_NEXTEST_TEMPLATE.into()),
            Self::Build(_) => Some(CARGO_BUILD_TEMPLATE.into()),
            Self::Bench(_) => Some(CARGO_BENCH_TEMPLATE.into()),
            Self::Run(a) => a.command.as_ref().map(|x| x.into()),
        }
    }
}

#[derive(Debug, Parser)]
pub struct RequiredArgs {
    /// Get the project to run on, runs in current directory otherwise.
    #[arg(short, long)]
    input: Option<PathBuf>,
    /// Generate command but don't run it
    #[arg(long)]
    no_run: bool,
    /// These will be passed to the minijinja template as the args variable
    #[arg(last = true)]
    args: Vec<String>,
}

impl RequiredArgs {
    fn path(&self) -> PathBuf {
        match self.input.as_ref() {
            Some(s) => s.clone(),
            None => env::current_dir().unwrap(),
        }
    }
}

#[derive(Debug, Parser)]
pub struct Args {
    /// Run the following command. This accepts a minijinja template where
    /// `packages` is a list of packages that can be included and `excludes`
    /// is a list of packages that can be excluded. For a cargo test you can
    /// write the template 
    /// `cargo test {% for pkg in packages %} -p {{ pkg }}{% endfor %}`
    #[arg(short, long)]
    command: Option<String>,
    #[command(flatten)]
    required: RequiredArgs,
}

Missing Pieces

With this the user can add their own -p or -e to the cargo test templates for things they always want to test i.e. when things might break because of stuff not in the package workspace. I haven’t made any effort to remove conflicts in that case. There’s also probably stuff with propagating environment to the process which is easy enough to do but I don’t have a need for it.

Another thing would be applying this to multiple commits or between branches. I’ll likely look into that the first moment I feel a need for it myself - I don’t see it being too much extra just maybe a bit of faff.

Overall, I’ve made something that works on my projects and I don’t have a strong motivation to make into a more general project for a wider community. I have a feeling this is solved by other systems better and that’s fine - I don’t do a lot of this stuff looking for users more to scratch an itch.

Will I Use This?

Probably not, but maybe. Recently Github has been having “issues” as I’m sure a lot of people are aware. Seems like they’re targeting the lofty heights of one 9 of uptime. And during this at $day_job we’ve seen CI pipelines in workspaces that normally take 4 minutes take upwards of 50 minutes. Maybe this would make them take more like 20 minutes when Github Actions are having a moment. But then with degraded performance like that no bets are off. The main thing is I got to do something a bit different and have fun so I’ll chalk that up as a win and move on with my life.

/2026/02/26/automatically-running-cargo-on-workspace-changes

2025 In Review

Jan 3, 2026 Updated Jan 3, 2026

I did one last year so it’s probably time for another year in review. It’ll be mentioned more in the non-technology section but I did spend 6 months in Japan and that will be a theme in why I haven’t made as much progress in side projects as I desired.

Show full content

tarpaulin

Tarpaulin had some significant speed improvements in 2025 as well as improvements on memory usage in the llvm coverage parsing. Additionally, the llvm profraw parsing code I’ve written in llvm-profparser is being used by Ferrous Systems in a new project. They’ve helped by contributing more testing, and fixes to parts of the library which Tarpaulin doesn’t hit and it’s been a pleasure collaborating with them on something I wrote.

The focus in 2026 will be taking more of the existing issues and closing more and more of them. Hopefully, UX issues with workspaces and excluding code from results can be solved in a way that’s more intuitive to people.

Tarpaulin has seen 14 releases in 2025 (and including New Years Day). That’s 3 more releases than last year and from 12 contributors (including me and ignoring dependabot).

streaming audio API blog series

In 2024 I started a blog series on streaming audio APIs with 4 posts released and a working system which could be used as a basis for things like streaming transcription services. In 2025 I only managed to release a single post but with 2 others in a draft state.

Currently, there’s a post on implementing the production metrics for these sorts of APIs. It’s in revisions and I hope to get it out within a month or two. But first I want to add more details to the section on tokio-metrics…

After that I aim on releasing the one on batching futures for performance. I might be ambitious and aim for the Opentelemetry one but given how terrible the Otel developer experience is that one might take a long time…

wiremocket

Last year I released wiremocket to minimal interest. This was expected as not everyone likes mocking, and most people don’t like websockets. However, it’s as complete as I need for the few times I need it which means unless people raise issues or I need more it can be seen as feature complete.

My current plan is just to keep bumping up dependency versions as new versions come out and otherwise keep it in a light maintenance mode.

asyncapiv3

I did start some work on improving asyncapiv3 submitting 7 PRs in the end. My hope was to make something similar to an openapi experience for websocket APIs. I’ll have to go back into my notes for this but I believe the library should be at a place where it can be used to power something similar to utoipa in Rust.

Currently at my day job, clients consuming websocket APIs is lower priority than it used to be. This does mean my work on this has somewhat stalled.

trustfall

Last year I got into trustfall some more and wrote a post on it. I also, revived Predrag’s WIP PR for a trustfall docs site and filled in some more details (See the PR here).

There’s a tracking issue for ideas for the docs and missing holes. I hope to get back to this work and hopefully something will be up and hosted this year. Trustfall is a cool piece of technology and helping making it easier to adopt would make me personally happy!

outside of technology

I mentioned in the past year in review I’d be spending some months in Japan. All in all that was 6 months during which I did get married with a traditional Shinto ceremony. Obviously, a large landmark in life and definitely a unique experience, the kimono was like 5-7 layers and in 85% humidity it does become a feat of human endurance.

In Japan my Japanese progress has obviously leapt forwards. It’s not where I want it to be but spending most of my time around people with little to no English has sharpened up my listening comprehension and speaking skills. I also helped out on the family rice farm splitting my time between my day job and rice farming for a few months.

I did get to see a reasonable amount of Japan, a lot around the Shizuoka prefecture where I was living, Tokyo, Nagoya, Nagano, Hiroshima, Imabari, Yamanashi. I should be back again this year and hopefully I’ll have time to explore more of Japan north of Tokyo.

Reading wise I just managed to finish Hyperion by Dan Simmons, and I’m partway through How to Blowup a Pipeline by Andreas Malm. I didn’t get to any of my other reading targets too many other things going on! I did start reading more manga though, mainly classics and seinen with Beserk and Oyasumi Punpun being the ones that have taken the most of my time.

I did definitely play a lot more games though. I finished everything I planned to last year (1000xRESIST, Chasing Static, Hollowbody, Half Life 1). I also picked and completed other fun indies like Parking Garage Rally Circuit and Selaco.

Cult of the Lamb was also something I really enjoyed. But the most time I’ve put in a game last year has to be Dwarf Fortress where I’ve racked up around 100 hours in 2025.

In terms of TV/Film, I’ve mainly watched a lot of anime since getting back to the UK. All of Naruto + Naruto Shippuden, Orb, Dandadan, Spy X Family and Yuru Camp are the ones that spring to mind as standouts. For film I’ve watched a lot and some standouts (not animated) are:

Shin Kamen Rider (Dir. Hideaki Anno)
Onibaba (Dir. Kaneto Shindō)
Decision to Leave (Dir. Park Chan-wook)
Memories of Murder (Dir. Bong Joon Ho)
Pulse (Dir. Kiyoshi Kurosawa)
Weapons (Dir. Zach Cregger)
In the Mood for Love (Dir. Wong Kar-Wai)

I’ve also started trying to watch all Akira Kurosawa films: Seven Samurai, Ran and Throne of Blood are all amazing. For animated films Jin Roh - Wolf Brigade is an amazing work of animation and 100 meters might get me on a sport anime kick. Plus the Chainsaw Man movie was an amazing spectacle.

/2026/01/03/2025-In-Review

Ideas Of Crater And Coverage

Nov 15, 2025 Updated Nov 15, 2025

Recently I’ve been doing tackling some issues in my open source projects related to Ferrous System’s work to get code coverage for libcore - the Rust core library. Because of this I’ve been talking to Jyn a fair bit and this has lead to some ideation my side, about combining this work with crater. Jyn telling me this idea sounded cool is what gave me the push to write this up - so blame or thank them as necessary 😅.

Show full content

Of course libcore isn’t libstd, but having coverage for core should result in coverage for libstd potentially being possible with no extra work!

What’s Crater?

If you know what crater is feel free to skip ahead, this won’t be anything groundbreaking in terms of insights!

Crater helps detect regressions in the Rust compiler by grabbing all the projects on crates.io and github and then running cargo build/check and cargo test on them. This helps make sure that changes to the language or standard library don’t cause any breakages in users projects.

It won’t run every project, ones with flaky tests or that can’t run in the environment (a linux host) often get blacklisted. But it will run through a lot. Looking, at the crater queue I can see one of the current runs is going over 1471548 jobs which is more than the number of crates on crates.io, looking at a report I can see it’s testing multiple versions of crates so that’s likely why.

What is Coverage?

Code coverage is a means to measure how much of your code is hit by your tests. The simplest metrics are very coarse, just lines or functions, and they get more complex looking at branches, boolean subconditions or combinations of them (MC/DC coverage).

It’s a reasonable heuristic for spotting what part of your code isn’t being exercised by your tests, but it doesn’t help you determine the quality of your tests. Because of this it’s often effective when used in combination with things like mutation testing, or other testing methods. Coverage might also be coupled with formal verification tooling like kani to ensure that the verification harness isn’t overly constrained.

In the safety critical world you’ll typically have some form of requirements-based testing and your tests contain a written brief which references your requirements and design and what they’re exercising. Then you can assess both your code coverage and you requirements coverage. If you have 100% requirements coverage and only 60% code coverage something is probably wrong - either some requirements are poorly tested or your system is under-specified. Likewise if you had 60% requirements coverage and 100% code coverage you can’t have confidence that all your tests are meaningfully testing the system.

This is just a general idea of the concept and how it’s useful in engineering. When I was working in a safety-critical domain we had requirements based tests, real world data simulation tests and something akin to property based testing. With all of these in combination it’s possible to deliver incredibly robust software like flight control computers etc.

Okay Then What’s Your Idea?

Well it should be obviously at this point. But use crater to get coverage statistics of how rust crates in the ecosystem are using the standard library. Are there any parts that tests in the wild just don’t hit? Mainly just curiosity at what the data says. Having a lot of eyes (crates) on an API makes it easier to spot regressions, so what aren’t we looking at?

Also, given the limitations to what crates crater can run and the impacts of the blacklist of some crates. This could function of a benchmark of craters effectiveness as a smoke test. Maybe we could spot some areas where changes to what crater can run could have the largest gains in it’s usefulness.

Plus when it comes to assessing nightly (library) features, which ones are more or less widely tested? With source code analysis we can come up with usage but we don’t necessarily know how much that code is being ran. There’s also potential to couple source analysis and coverage analysis, an initial thought is that the ratio of what people use but don’t test could show where things are hard to write tests for.

Of course getting more data for analysis and doing something useful with that analysis is another matter entirely. But by putting this idea out maybe some people might have useful ideas, and that’s the aim of this post.

/2025/11/15/Ideas-of-Crater-and-Coverage

Making Cli Tools With Trustfall

Aug 18, 2025 Updated Aug 18, 2025

For those who aren’t aware, Trustfall is a pretty cool tool. From the repo about section it says:

Show full content

For those who aren’t aware, Trustfall is a pretty cool tool. From the repo about section it says:

A query engine for any combination of data sources. Query your files and APIs as if they were databases!

And this is pretty powerful as a concept, and I’ve read blogposts - mostly by Predrag (blog here), I’ve seen a talk or two. And as a result, I’ve been looking for a place to play with Trustfall.

Well, I’ve finally done it and here’s my writeup of the general experience and how to use Trustfall yourself. But before I get into that it would be remiss to not mention there’s already a nifty CLI tool using Trustfall and that’s cargo-semver-checks. If you haven’t seen it before check it out it’s very useful.

If you want to skip ahead here’s the link for the project I’m talking about in this post.

The Tool

Docker. It bloats up my storage, I have tons of images as part of work with dev builds and for some projects they’re around 7GB an image. It’s not great. And I’ve been suffering writing bash functions to grep all the images and remove ones with a certain substring in the name, attempts to delete anything over a size. It ends up fiddly and a bit annoying.

But each image has a bunch of data associated with it, it seems reasonable to query it and print information or delete images that match a query. Looks like I’ve got a reason to use Trustfall.

Side note I do also use podman on some machines, this will be relevant later.

Okay, the start of a plan. I’ll make a tool that can query the docker images on my system and retrieve the images that match some predicates and delete them!

Getting Started with Trustfall

Trustfall uses a GraphQL-esque language to your schema and query. First off we need to create a schema to describe the shape of our data - similar to the relational model. To help create this schema it can be helpful to think of the queries that we want to represent.

Using the following CLI command I get can get a list of all the docker images on my machine in an easy to parse format. This is newline delimited json so for ease of reading I’ll show one of the lines in formatted json!

$ docker image ls --format=json
{
  "Containers":"N/A",
  "CreatedAt":"2024-06-07 13:00:09 +0100 BST",
  "CreatedSince":"13 months ago",
  "Digest":"\u003cnone\u003e",
  "ID":"35a88802559d",
  "Repository":"ubuntu",
  "SharedSize":"N/A",
  "Size":"78.1MB",
  "Tag":"latest",
  "UniqueSize":"N/A",
  "VirtualSize":"78.05MB"
}

Looking at this data, and thinking about what I want to do I can end up with some queries:

Finding docker images within a size range (min or max)
Docker images older or younger than some date
Ones with an exact name match (not using tag)
Ones with a name that matches a regex
Ones with a name containing a substring

This is just driven off what data I can get via docker image ls --format=json. There is a possibility to get more data, but that would need another call to docker inspect <IMAGE>.

From this I’ve made this schema in the end:

schema {
  query: Query
}

type Query {
  Image: [Image!]!
}

type Image {
  repo: String,
  tag: String,
  size: Int!,
  created: String!

  # Filtering via edges (with parameters)
  size_in_range(min: Int, max: Int): [Image!]!
  created_after(timestamp: String!): [Image!]!
  created_before(timestamp: String!): [Image!]!
  has_name(name: String!): [Image!]!
  name_matches(regex: String!): [Image!]!
  name_contains(substring: String!): [Image!]!
}

Why is there size_in_range but created_after and created_before? Mainly just laziness and what looks natural in a written query. Internally they can map to the same code and could add a created_in_range.

From Schema to Code

Firstly, we want to install trustfall_stubgen. This will take our schema and generate some starting code with todo! stubs dotted around and some boilerplate we’d rather not write ourselves.

To install it:

cargo install trustfall_stubgen

Then I’ll make a temp directory and generate the code into it so I can have a look at it:

mkdir tmp
trustfall_stubgen --schema schema.graphql --target tmp/

In this there’s an adapter folder with our generated code, plus our schema is included in the folder as it neededs to be outputted as a string by one method. I tend to generate it into a separate folder instead of directly into my source folder just to avoid potentially overwriting something and not noticing. Which is definitely a bit paranoid given I’m using version control but nevertheless.

In our brand new adapter module we have:

adapter_impl.rs
edges.rs
entrypoints.rs
mod.rs
properties.rs
schema.graphql
tests.rs
vertex.rs

We don’t have to touch all of these, edges.rs, entrypoints.rs, properties.rs and vertex.rs are the only files I’ve had to edit. So one by one, let’s go through and fill in this code! I’m going to present these in an order which builds things up gradually and (in my opinion) is the clearest order to implement.

Writing Your vertex.rs

In the Trustfall model a vertex is like a table in SQL, this will contain our parsed docker image data, we don’t really have any other relations in application. It’s also an enum and this is provided in the file for us to fill in as so:

#[non_exhaustive]
#[derive(Debug, Clone, trustfall::provider::TrustfallEnumVertex)]
pub enum Vertex {
}

Now I typically don’t like putting my data definitions in here and instead define them outside of the adapter and import them in. So in my src/image.rs I add a type definition for the Image and any methods, conversions or other utilities that I need for it. That way my vertex.rs is just vertex related stuff. My image definition is as follows, taking all the useful data from the inspected image in useful form:

#[derive(Debug, Clone, Eq, Hash, PartialEq, Ord, PartialOrd)]
pub struct Image {
    pub hash: String,
    pub repository: String,
    pub tag: String,
    pub size: usize,
    pub created_at: Timestamp,
}

And then my vertex.rs I update to be:

use std::sync::Arc;

#[non_exhaustive]
#[derive(Debug, Clone, trustfall::provider::TrustfallEnumVertex)]
pub enum Vertex {
    Image(Arc<crate::Image>),
}

The eagle-eyed amongst you may realise this Image type doesn’t match the schema:

type Image {
  repo: String,
  tag: String,
  size: Int!,
  created: String!
}

It mostly does, it’s just created is a different type. This is because Trustfall doesn’t (yet) have a timestamp type and I’m also using jiff::Timestamp. Later on when going between the Trustfall queries and my types there’ll have to be some conversion between the two. There is a clear difference between Rust types that are nice to work with, and queries that are nice to work with. At times these can clash and then you get these small differences.

Writing Your entrypoints.rs

An entrypoint is where you populate your data to query, it outputs a VertexIterator. Here we want to take our data source and output our vertices. This could be calling an API, loading a file and deserializing it, calling a CLI and parsing it’s output.

Here is where podman becomes relevant, because the docker image ls --format=json output is different between the two. Docker outputs a newline delimited json objects, whereas podman outputs a json list. Additionally, the fields in each object vary (fun).

With this in mind I made an ImageOutput type and a conversion to my Image type used in my Vertex as follows:

#[derive(Deserialize)]
#[serde(untagged)]
pub enum ImageOutput {
    Podman(podman::Image),
    Docker(docker::Image),
}

impl From<ImageOutput> for Image {
    fn from(x: ImageOutput) -> Self {
        match x {
            ImageOutput::Podman(p) => p.into(),
            ImageOutput::Docker(d) => d.into(),
        }
    }
}

impl From<podman::Image> for Image {
    fn from(img: podman::Image) -> Self {
        // SNIP
    }
}

impl From<docker::Image> for Image {
    fn from(img: docker::Image) -> Self {
        // SNIP
    }
}

With that boilerplate out of the way here is my entrypoint:

pub(super) fn image<'a>(_resolve_info: &ResolveInfo) -> VertexIterator<'a, Vertex> {
    let version = Command::new("docker")
        .args(["--version"])
        .output()
        .expect("couldn't get docker version");
    let version = String::from_utf8_lossy(&version.stdout);
    let is_podman = version.contains("podman");

    let images = Command::new("docker")
        .args(["image", "ls", "--format", "json"])
        .output()
        .expect("failed to run docker");

    let images: Vec<ImageOutput> = if is_podman {
        serde_json::from_slice(&images.stdout).expect("couldn't deserialize the json output")
    } else {
        let s = String::from_utf8_lossy(&images.stdout);
        let mut v = vec![];
        for line in s.lines() {
            v.push(serde_json::from_str(line).expect("couldn't deserialize the json output"));
        }
        v
    };

    Box::new(
        images
            .into_iter()
            .map(|x| Vertex::Image(Arc::new(x.into()))),
    )
}

This is all relatively simple code, some code to detect if it’s docker or a podman alias, deserialization and then return a type that meets the requirements of VertexIterator which is a Box<dyn Iterator<Item = VertexT> + 'vertex>.

As a side, you can also just remove this file. And instead in adapter_impl.rs have the type which the trustfall::provider::Adapter is implemented for generate your Vertex. That approach might be desirable if the type that implements Adapter has a number of member variables you want to use when generating the vertices. Also, don’t worry about ResolveInfo, it’s for optimisation purposes and more complicated.

For this project I have one source and that’s the docker images on my machine. The fact there’s no fancy configuration means I can keep things simple and stick to the generated structure.

Writing Your properties.rs

Now we have a Vertex, and we have the means to get a sequence of vertices into the query engine. The only things missing to execute queries is extracting data from the vertex and running our defined queries. First, let’s get the data out. That way we can run simple queries with no filtering like print every docker image.

The starting point that trustfall_stubgen gives us is as follows:

use trustfall::{FieldValue, provider::{AsVertex, ContextIterator, ContextOutcomeIterator, ResolveInfo}};

use super::vertex::Vertex;

pub(super) fn resolve_image_property<'a, V: AsVertex<Vertex> + 'a>(
    contexts: ContextIterator<'a, V>,
    property_name: &str,
    _resolve_info: &ResolveInfo,
) -> ContextOutcomeIterator<'a, V, FieldValue> {
    match property_name {
        "created" => {
            todo!("implement property 'created' in fn `resolve_image_property()`")
        }
        "repo" => todo!("implement property 'repo' in fn `resolve_image_property()`"),
        "size" => todo!("implement property 'size' in fn `resolve_image_property()`"),
        "tag" => todo!("implement property 'tag' in fn `resolve_image_property()`"),
        _ => {
            unreachable!(
                "attempted to read unexpected property '{property_name}' on type 'Image'"
            )
        }
    }
}

Here we have a ContextIterator and we want to transform that into a ContextOutcomeIterator. These type aliases with some simplification look as follows:

pub type ContextIterator<VertexT> = Box<dyn Iterator<Item =
    DataContext<VertexT>>>;

pub type ContextOutcomeIterator<VertexT, OutcomeT> =
    Box<dyn Iterator<Item = (DataContext<VertexT>, OutcomeT)>>;

The contexts are what we want to resolve, so for each property name I’ll create a mapping from that context and property to the same context paired with the value of that property. I believe this all came together from looking at some example Trustfall adapter implementations from Predrag’s blog and talks but it was a while ago so I can’t trace back to exactly how I learned this.

Just implementing this for the created property looks as follows:

pub(super) fn resolve_image_property<'a, V: AsVertex<Vertex> + 'a>(
    contexts: ContextIterator<'a, V>,
    property_name: &str,
    _resolve_info: &ResolveInfo,
) -> ContextOutcomeIterator<'a, V, FieldValue> {
    let func = match property_name {
        "created" => |v: DataContext<V>| match v.active_vertex() {
            Some(Vertex::Image(img)) => {
                let created_at = img.created_at.to_string().to_string();
                (v, created_at)
            },
            None => (v, FieldValue::Null),
        },
        "repo" => todo!("implement property 'repo' in fn `resolve_image_property()`"),
        "size" => todo!("implement property 'size' in fn `resolve_image_property()`"),
        "tag" => todo!("implement property 'tag' in fn `resolve_image_property()`"),
        _ => {
            unreachable!(
                "attempted to read unexpected property '{property_name}' on type 'Image'"
            )
        }
    };
    Box::new(contexts.map(func))
}

Here I have to convert my jiff::Timestamp to a string and then use the Into to turn it into a FieldValue, for types with no conversion this is a bit simpler. But generally speaking this is reasonably straightforward. Get the active vertex (of which we only have one potential type), extract the property and return it.

One important thing to note, cloning v is easy and what I initially did. However, it’s not necessary and often you can avoid a potentially expensive clone by binding the second element of the tuple to a new variable so the compiler doesn’t try and take a reference to a temporary value.

Writing Your Edges.rs

Last is edges, here we have the most generated code and we implement the queries in our schemas. If you scroll right down to the bottom you’ll see a function for each query like:

pub(super) fn created_after<'a, V: AsVertex<Vertex> + 'a>(
    contexts: ContextIterator<'a, V>,
    timestamp: &str,
    _resolve_info: &ResolveEdgeInfo,
) -> ContextOutcomeIterator<'a, V, VertexIterator<'a, Vertex>> {
    resolve_neighbors_with(
        contexts,
        move |vertex| {
            let vertex = vertex
                .as_image()
                .expect("conversion failed, vertex was not a Image");
            todo!("get neighbors along edge 'created_after' for type 'Image'")
        },
    )
}

The steps here are once again not too hard, we have our vertex already extracted for us. We just need to see if it matches the query and return it in an iterator if it does.

Here we get our timestamp once again in a string so we need to do the conversion. Also, because I want to return that vertex I’ve removed the name shadowing. But below is the code:

pub(super) fn created_after<'a, V: AsVertex<Vertex> + 'a>(
    contexts: ContextIterator<'a, V>,
    timestamp: &str,
    _resolve_info: &ResolveEdgeInfo,
) -> ContextOutcomeIterator<'a, V, VertexIterator<'a, Vertex>> {
    let ts = timestamp.parse::<Timestamp>().unwrap();

    resolve_neighbors_with(contexts, move |vertex| {
        let image = vertex
            .as_image()
            .expect("conversion failed, vertex was not a Image");

        if image.created_at > ts {
            Box::new(std::iter::once(vertex.clone()))
        } else {
            Box::new(std::iter::empty())
        }
    })
}

Repeat something like this for all the queries and then we have a working Trustfall adapter!

Using the Adapter

Using this adapter if I wanted to get all the docker images created in June this year I could use the following query:

{
  Image {
    created_after(timestamp: "2025-06-01 00:00:00+00") 
    created_before(timestamp: "2025-07-01 00:00:00+00") 
    repo @output
    tag @output
    size @output
created @output
  }
}

If I want to then execute that query it looks like so (the query is in the variable query for brevity):

use adapter::*;
use trustfall::{FieldValue, execute_query};
use std::{collections::BTreeMap, sync::Arc};

let adapter = Arc::new(Adapter::new());
let args: BTreeMap<Arc<str>, FieldValue> = BTreeMap::new();

let vertices = execute_query(Adapter::schema(), adapter, query, args).unwrap();

The vertices is an iterator over BTreeMap<Arc<str>, FieldValue>. Each BTreeMap will contain the repo, tag, size and created date of a docker image.

Now for a CLI each CLI argument will map to at least one query, and you can iteratively build up a Trustfall query going over the CLI args. Currently, Trustfall is “GraphQL-like” in that it doesn’t support everything in GraphQL. This means features like query arguments aren’t yet supported and you should place the values straight in the constructed query string.

My query string construction is a fair amount of code, but not very nested and more repetitive. Here’s a snippet of it to see how I use it:

let mut query_str = "{Image{".to_string();

if let Some(created_before) = filter.created_before {
    query_str.push_str(&format!(
        "created_before(timestamp: \"{}\")\n",
        created_before
    ));
}

// SNIP

if let Some(contains) = &filter.name {
    query_str.push_str(&format!("has_name(name:\"{}\")\n", contains));
}

query_str.push_str("repo @output\ntag @output\nsize @output\ncreated @output\n");
// Mainly for ease of readability
query_str.push_str("}}");

Then once I have the list of vertices I can map it to a printout, run docker commands on them. Anything I desire! The main part here is just using the queried data and creating a nice CLI interface to interact with it.

Small Bonus UX

Before I go, on the theme of nice CLI interface, I’m using the human-size crate for working with sizes like 2GB etc. Because of this all my size based arguments accept either the number of bytes or a human size. For the snippet I’ll put below I can enter --smaller-than 2000000000 or --smaller-than 2GB which I was very happy with.

/// Common filter options for all commands
#[derive(clap::Parser, Debug)]
pub struct FilterOptions {
    /// Only include files smaller than this size in bytes
    #[arg(long, value_parser = parse_human_size)]
    pub smaller_than: Option<usize>,
}

fn parse_human_size(input: &str) -> Result<usize, String> {
    match human_size::Size::from_str(input) {
        Ok(size) => Ok(size.to_bytes() as usize),
        Err(_) => input
            .parse::<usize>()
            .map_err(|e| format!("Invalid size '{}': {}", input, e)),
    }
}

And that’s it, peruse the code if you want to see more of how it came together. This was just a weekend project so I’ll likely refine the interface a bit more as I use it more but for now it’s already useful for me.

But Wait

Fairly content with myself I wrote up the above and sent it around a bit for feedback. And Predrag dropped a bit of a surprise, most of my queries can just be @filter clauses.

Huh, I never knew they existed. Looking it up @filter isn’t part of GraphQL but is apparently common in some things built on top of GraphQL. Currently, if you want to see what a filter can do you can check the ops here: docs.rs. Then couple that with looking up some of the example queries to see how to use them, there’s a Trustfall playground and you can see some queries there.

Removing our edge queries and adding filter directives we’d change the following:

{
  Image{
    name_matches(substring: "$name_regex")
    size_in_range(min: $larger_than, max: &smaller_than)
    name @output
    size @output
    created @output
  }
}

Into this:

{
  Image{
    name @output
      @filter(op: "regex", value: ["$name_regex"])
    size @output
      @filter(op: "<", value: ["$smaller_than"])
      @filter(op: ">", value: ["$larger_than"])
    created @output
  }
}

There is some difference in how we call this with trustfall::execute_query as well. With the first one we need to replace the arguments in the query string, whereas in the second we need to use the args argument to supply our named arguments. Filter operations in Trustfall won’t work with literals preventing us from placing the values in on query generation.

Now comes the time to rework things and see how much code can be deleted. But first this will be my new Trustfall schema with unnecessary edge queries removed:

schema {
  query: Query
}

type Query {
  Image: [Image!]!
}

type Image {
  name: String,
  repo: String,
  tag: String,
  size: Int!,
  created: String!

  # Filtering via edges (with parameters)
  created_after(timestamp: String!): [Image!]!
  created_before(timestamp: String!): [Image!]!
}

I’ve added a name field which will be the string {repo}:{tag}, for convenience.

let mut query_args: BTreeMap<Arc<str>, trustfall::FieldValue> = BTreeMap::new();
let mut query_str = "{Image{".to_string();

// SNIP

let mut size_filter = String::new();
let mut name_filter = String::new();

if let Some(contains) = &filter.name {
    name_filter.push_str(r#"@filter(op: "=", value: ["$name_eq"])"#);
    name_filter.push('\n');
    query_args.insert(Arc::from("name_eq".to_string()), contains.into());
}

query_str.push_str(&format!(
    "name @output\n{name_filter}size @output\n{size_filter}created @output\n"
));

query_str.push_str("}}");
let adapter = Arc::new(Adapter::new());

let vertices = execute_query(Adapter::schema(), adapter, &query_str, query_args).unwrap();

How many lines were deleted, time to see what the commit says:

6 files changed, 55 insertions(+), 148 deletions(-)

Not as many lines removed as I would have thought before starting but the need to build up the args map added some extra lines.

Conclusion

I’ll be refining the interface of this tool to become more useful for my own needs. However, I don’t plan on publishing it or it getting usage from other people. The main hope is that this introduces some people to Trustfall and how they can start to utilise it as a query engine for their own data.

Going forward, I’m also going to work more on the Trustfall official documentation, some of which I’ve already started doing. If this has been of interest and you want to learn more check out Predrag’s site for more talks/blogposts and other things. Also, a big thanks to Predrag for taking the time to talk to me about Trustfall and answer some of my questions plus provide feedback on this post.

/2025/08/18/Making-CLI-tools-with-Trustfall

Tarpaulins Week Of Speed Pt2

May 20, 2025 Updated May 20, 2025

You might want to read the previous post. This post is going to be much the same stuff, while skipping going over things I talked about before. And with some changes in approach, and different tools than perf and cargo flamegraph!

Show full content

New Tools!

Firstly, now that polars doesn’t take 40 minutes for a tarpaulin run, I’ll be using hyperfine to get some statistics on the tarpaulin runs.

And from the comments of the last post from /u/Shnatsel:

For profiling I’ve moved away from perf + flamegraphs to https://crates.io/crates/samply, which provides the same flamegraphs plus much more, and you even get to share your results to anyone with a web browser in just two clicks. It’s also portable to mac and windows, not just linux. It has become my go-to tool for profiling and I cannot recommend it enough.

Strong praise indeed, so I’m finally giving it a try!

The First Win

Using samply I saw that HashMap::get in generate_subreport was taking most of the time in report generation now. This time I ran it on the llvm-profparsers CLI tool instead of tarpaulin on some files I had lying around from another project I was testing. At this point I was mainly seeing if samply worked.

Running samply was as simple as:

samply record target/release/cov show --instr-profile benches/data/mapping/profraws/* --object benches/data/mapping/binaries/*

Now this code does go over the region expressions and generate count values and add to a list of pending expressions. You can see the full function at this point in time here. But it’s long so I won’t paste it all in but instead explain a bit.

There is a map of region IDs to counts and as we go over the expression list, we get elements from the initial version and update values as we resolve expressions. The HashMap::get calls happen in loops as we go over the expression tree.

Initially, we do one pass through to build up the pending expressions and then loop over the pending list until we finish it. Reducing the number of times we loop over things should speed things up.

I noticed previously that the pending expressions are represented by a flattened tree, which means the deeper nodes are located at indices further into the list. This means that, potentially, we can iterate over the list fewer times if we iterate over it in reverse first. Processing those deeper nodes first will mean that the nodes above them in the tree structure will get resolved quicker.

With that in mind I did the following diff:

< for (expr_index, expr) in func.expressions.iter().enumerate()
---
> for (expr_index, expr) in func.expressions.iter().enumerate().rev()

Running the benchmark and…

Benchmarking report generation: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. 
report generation       time:   [806.43 ms 809.27 ms 812.17 ms]                              
                        change: [-79.923% -79.433% -78.921%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking subreport generation: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. 
subreport generation    time:   [117.87 ms 119.28 ms 120.84 ms]                                 
                        change: [-22.306% -21.266% -20.132%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe

This takes report generation from ~3.8s to ~0.8s. I also noticed in report generation there was a HashMap where I could reserve the additional capacity and maybe save some allocations. Running samply again I see the part of the code taking the most time is now the profraw parsing not report generation!

Oh dear glancing briefly at the report parsing I see:

let name = symtab.names.get(&data.name_ref).cloned();
let (hash, name_hash) = if symtab.contains(data.name_ref) {
    // SNIP
};

That can just be a if name.is_some() and avoid another HashMap lookup. When I compared this parsing code to llvm, my initial implementation was significantly faster. With the difficulties in the coverage mapping part of the code and then its performance issues, I didn’t really go back and see if I could speed up the profraw parsing anymore, but this should be an easy win.

Benchmarking Results #1

With three changes under our belt lets take a look at the impact. This time I’m running on:

Polars (commit 5222107)
tokio (commit 0cf95f0)
datafusion-common (commit 3e30f77)
jiff (commit 08abead)
ring (commit a041a75)

I do one warmup run and for some crates I’ve slightly adapted the tarpaulin command to either get it working, make it not unreasonably slow or include even more code/features. I’ll also be checking the change in coverage to make sure the iteration order change doesn’t cause a different in the results.

For each tarpaulin invocation I share this was ran in hyperfine as:

hyperfine --warmup 1 --export-markdown results.md "$TARPAULIN_COMMAND"

After running these I realised I could have ran both tarpaulin commands together if I used absolute paths instead of changing my local tarpaulin installation. I’ll keep that in mind for next time.

Running polars with the command cargo tarpaulin --engine llvm --skip-clean:

Command Mean [s] Min [s] Max [s] baseline 155.460 ± 1.425 152.717 157.120 iteration order 135.417 ± 3.720 129.325 139.286

Running datafusion-common with the command cargo tarpaulin --engine llvm --skip-clean -p datafusion-common:

Command Mean [s] Min [s] Max [s] baseline 10.075 ± 0.069 9.942 10.155 iteration order 9.527 ± 0.092 9.407 9.669

Running jiff with cargo tarpaulin --engine llvm --skip-clean --all-features:

Command Mean [s] Min [s] Max [s] baseline 3.241 ± 0.011 3.218 3.260 iteration order 3.063 ± 0.064 2.892 3.119

Running ring with cargo tarpaulin --engine llvm --skip-clean --all-features --release:

Command Mean [s] Min [s] Max [s] baseline 77.923 ± 0.954 77.394 80.529 iteration order 76.485 ± 0.124 76.211 76.671

Running tokio with cargo tarpaulin --engine llvm --skip-clean --all-features:

Command Mean [s] Min [s] Max [s] baseline 68.236 ± 0.563 67.080 68.992 iteration order 59.273 ± 0.477 58.751 60.484

This improvement ranges from 2% to 14% faster than the previous version. Moreover, the coverage never changed so we can be pretty sure this doesn’t introduce any issues!

Going for a Second Win

Time to generate another flamegraph and look for something to attack. This time here’s a screenshot from samply of how it looks, hopefully nothing unfamiliar:

Flamegraph

Looking at the flamegraph now for polars I can see reading the object files is a big one. I do use fs::read to read the object files and get the sections for parsing of relevant information. But I can probably avoid reading the entire file and provided those sections are before the executable code hopefully avoid reading some data.

Before I go too deeply into this lets just patch the code and see what Criterion says. Right changing this:

let binary_data = fs::read(object)?;
let object_file = object::File::parse(&*binary_data)?;

Into:

let binary_data = ReadCache::new(fs::File::open(object)?);
let object_file = object::File::parse(&binary_data)?;

And the parsing functions need a generic parameter added so that &Section<'_, '_> turns into &Section<'data, '_, R> where R: ReadRef<'data>.

An initial run with criterion confirms this assumption:

coverage mapping        time:   [23.541 ms 23.817 ms 24.115 ms]                             
                        change: [-70.666% -69.879% -69.128%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

Next step is to look at the order of the object sections and call my parsing code in that order so we roughly read things in order. That should be more cache friendly and hopefully criterion agrees…

Using objdump -h we can list the headers in an ELF file and their offsets. I don’t see a reason for LLVM to be changing the order based on architecture and most users are on Linux anyway, so we’ll stick with the Linux order.

Running it on one of the benchmark binaries with some of the output removed for readability:

$ objdump -h benches/data/mapping/binaries/test_iter-e3db028e51c53c16

benches/data/mapping/binaries/test_iter-e3db028e51c53c16:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
// SNIP
 29 __llvm_prf_cnts 0004ed10  000000000063f9e8  000000000063f9e8  0063f9e8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 30 __llvm_prf_data 00109900  000000000068e6f8  000000000068e6f8  0068e6f8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
// SNIP
 35 __llvm_covfun 0038cc95  0000000000000000  0000000000000000  0079e198  2**3
                  CONTENTS, READONLY
 36 __llvm_covmap 0000f8d0  0000000000000000  0000000000000000  00b2ae30  2**3
                  CONTENTS, READONLY
// SNIP

This is the output limited to the sections I parse. And, there is a difference with my order of parsing. I do llvm_covfun, llvm_covmap, llvm_prf_cnts and then llvm_prf_data. There’s also zero dependencies between the four for parsing so I can safely reorder them as desired.

coverage mapping        time:   [22.903 ms 23.062 ms 23.236 ms]
                        change: [-4.5429% -3.1690% -1.7429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

This seems to back up the hypothesis, although it’s a small change relatively and I wouldn’t trust Criterion that much. Micro-benchmarks tend to be rough indicators rather than sure-fire confirmations. Indeed I sometimes see consecutive Criterion runs differ by up to 10% with no changes to the code. Therefore anything below 10% I take with a grain of salt.

However, this change does make logical sense so we’ll keep it. The next step will be running this on our test projects! Hopefully, we’ll see some positive changes.

Benchmarking Results #2

For a level of consistency, I’ll include the previous table with the new results in as the final row. All commands ran are the same as previously stated.

Polars:

Command Mean [s] Min [s] Max [s] baseline 155.460 ± 1.425 152.717 157.120 iteration order 135.417 ± 3.720 129.325 139.286 file optimisation 93.891 ± 1.646 90.838 96.526

Oh wow another big win, time to see how it does in the other projects!

Datafusion-common:

Command Mean [s] Min [s] Max [s] baseline 10.075 ± 0.069 9.942 10.155 iteration order 9.527 ± 0.092 9.407 9.669 file optimisation 9.300 ± 0.154 8.988 9.491

Jiff:

Command Mean [s] Min [s] Max [s] baseline 3.241 ± 0.011 3.218 3.260 iteration order 3.063 ± 0.064 2.892 3.119 file optimisation 3.153 ± 0.021 3.112 3.178

Ring:

Command Mean [s] Min [s] Max [s] baseline 77.923 ± 0.954 77.394 80.529 iteration order 76.485 ± 0.124 76.211 76.671 file optimisation 77.825 ± 0.943 77.140 80.136

Tokio:

Command Mean [s] Min [s] Max [s] baseline 68.236 ± 0.563 67.080 68.992 iteration order 59.273 ± 0.477 58.751 60.484 file optimisation 60.592 ± 0.800 59.998 62.790

Well aside from polars and datafusion-common everything has gotten a little bit slower. This could be some noise or a sign this approach isn’t perfect. Although, if it’s only ever a small regression it should be a win overall.

Thinking about it, if I wrap the File in an std::io::BufReader I might win back some of those losses through larger less frequent file reads…

Benchmarking Results #3

Time for BufReader to hopefully save the day.

Polars:

Command Mean [s] Min [s] Max [s] baseline 155.460 ± 1.425 152.717 157.120 iteration order 135.417 ± 3.720 129.325 139.286 file optimisation 93.891 ± 1.646 90.838 96.526 BufReader 87.733 ± 0.300 87.114 88.072

Datafusion-common:

Command Mean [s] Min [s] Max [s] baseline 10.075 ± 0.069 9.942 10.155 iteration order 9.527 ± 0.092 9.407 9.669 file optimisation 9.300 ± 0.154 8.988 9.491 BufReader 9.160 ± 0.065 9.055 9.231

Jiff:

Command Mean [s] Min [s] Max [s] baseline 3.241 ± 0.011 3.218 3.260 iteration order 3.063 ± 0.064 2.892 3.119 file optimisation 3.153 ± 0.021 3.112 3.178 BufReader 3.118 ± 0.015 3.093 3.139

Ring:

Command Mean [s] Min [s] Max [s] baseline 77.923 ± 0.954 77.394 80.529 iteration order 76.485 ± 0.124 76.211 76.671 file optimisation 77.825 ± 0.943 77.140 80.136 BufReader 77.324 ± 0.108 77.213 77.505

Tokio:

Command Mean [s] Min [s] Max [s] baseline 68.236 ± 0.563 67.080 68.992 iteration order 59.273 ± 0.477 58.751 60.484 file optimisation 60.592 ± 0.800 59.998 62.790 BufReader 57.703 ± 0.396 56.901 58.145

Okay this looks better again. I could play around with memmap, but it’s not a safe OS level API and I don’t think this is worth adding unsafe into things for.

Let’s also take a look at the flamegraph just to see how things have been shaken up

Flamegraph

Here I actually highlighted the CoverageMapping::new call because whereas before it took up most of the time, now it’s so small you can’t read the symbol name without expanding on the block. I imagine for projects that generate a lot of very large test binaries this change likely impacts the most. This also includes high LoC counts in the crate plus used dependency code as that will increase the symbol table and count metadata in the ELF sections (and other object files of your own choosing).

Final Change Today

Okay, we’ve done a lot of changes with not a lot of diffs to the project and gotten some big changes in performance. Now there’s one more thing I’d like to try out from this issue.

[profile.release]
codegen-units = 1
lto = true

Link-Time Optimisation (LTO) is a technique that applies optimisations at the link stage taking into account all the crates being linked in and not just optimising crates individually. It should decrease the binary size and hopefully up the performance as the issue states. Let’s try it out:

LTO Results

Polars:

Datafusion-common:

Command Mean [s] Min [s] Max [s] baseline 10.075 ± 0.069 9.942 10.155 iteration order 9.527 ± 0.092 9.407 9.669 file optimisation 9.300 ± 0.154 8.988 9.491 BufReader 9.160 ± 0.065 9.055 9.231 LTO 8.944 ± 0.058 8.848 9.020

Jiff:

Command Mean [s] Min [s] Max [s] baseline 3.241 ± 0.011 3.218 3.260 iteration order 3.063 ± 0.064 2.892 3.119 file optimisation 3.153 ± 0.021 3.112 3.178 BufReader 3.118 ± 0.015 3.093 3.139 LTO 3.065 ± 0.027 3.033 3.109

Ring:

Tokio:

Well time for the takeaways. I guess generally, people should use samply over perf if it works out of the box. That is if you’re not using any perf features that aren’t supported (if there are any). Doing multiple runs and statistics are nice. Buffering file IO and reading the least amount of data possible does have knock on effects. Preallocate storage when possible and sometimes iteration order matters a lot. Benchmark on a variety of real life workloads as well.

There’s not some big magic takeaway here. Performance work is often just trying to get better measurements to judge your changes. Then when you have them trying to reduce allocations, instructions executed and data read/written.

Being methodical helps, noting down your results and keeping raw data around for analysis is also very helpful.

Now all this is done I should set about the process of releasing it all to the general public. Laters!

/2025/05/20/Tarpaulins-Week-of-Speed-pt2

Tarpaulins Week Of Speed

May 8, 2025 Updated May 8, 2025

This week I’ve dropped two cargo-tarpaulin releases, both of which have significant speed-ups for LLVM coverage reporting. When I say significant, the first one gave me a ~80% speedup on a benchmark, and then the next one was a 90% speedup. Meaning on this benchmark I achieved overall a 98% speedup.

Show full content

Obviously, with getting speed-ups of this significance I was doing something really stupid, that’s true for the first speed-up. For the second one, it wasn’t so much idiocy as a missed opportunity.

But before that let’s dive into some context!

The Preamble

Tarpaulin is a code coverage tool, and as part of coverage it offers two backends:

ptrace - UNIX process tracing API, like a debugger. Not fully accurate but more flexible on where you can get coverage data
LLVM instrumentation - runtime bundled into a binary which exports some files. Fully accurate, some limitations of what coverage it can report

The LLVM coverage instrumentation was slow, so how does that work?

You run your tests
It spits out a profraw file
Typically you use llvm-profdata (found in the llvm-tools rustup component) to turn a profraw file to a merged (profdata) file
Typically you use llvm-cov (also in llvm-tools) to take the executable file and your profdata and make a report

I say typically here because I always find it annoying having to install multiple tools to use one, and when tools go installing other things. This doesn’t matter as much in CI but in my own hubris I implemented the profraw parsing and mapping here.

Note I do have a feature planned to optionally use the llvm-tools

But anyway, it’s this parsing code that’s the issue. And we can tell this by the fact tarpaulin can go 6 minutes plus on some projects just after printing out Mapping coverage data to source before we see anything else.

Of course, the best way to figure things out is benchmarking on realistic input data, and we really want something that will stress tarpaulin so prompted by a users comment on a GitHub issue I’ll be comparing execution times using polars. This isn’t where the benchmark in the intro came from - that’s just the criterion benchmarks in the repo. For development, I looked for a project that a user mentioned.

After doing cargo tarpaulin --engine llvm --no-run to avoid measuring the cost of downloading dependencies and building the tests. This is the initial result:

Then for testing, I’ll run:

time cargo tarpaulin --engine llvm --skip-clean
// SNIP tarpaulin output
real	41m34.411s
user	40m1.784s
sys	1m4.368s

Looking, I can also see we generated 37 profraws so we’re dealing with at least 37 test binaries and 37 calls to merge the reports. Running polars without tarpaulin but with the coverage instrumentation we get:

real	0m3.103s
user	0m7.077s
sys	0m1.715s

So there’s clearly a big gap to close. We can’t get down to 3s but we should try to get as close as possible. Though it is worth mentioning that the instrumentation profiling does seem to balloon the compilation time A LOT.

The Idiocy

So first off let’s isolate the code that caused the issue. Then once we’ve discussed it I’ll go a bit into how I figured this out and then use that technique again to find the next issue.

while !pending_exprs.is_empty() {
    assert!(tries_left > 0);
    if index >= pending_exprs.len() {
        index = 0;
        tries_left -= 1;
    }
    let (expr_index, expr) = pending_exprs[index];
    let lhs = region_ids.get(&expr.lhs);
    let rhs = region_ids.get(&expr.rhs);
    match (lhs, rhs) {
        (Some(lhs), Some(rhs)) => {
            pending_exprs.remove(index);
            let count = match expr.kind {
                ExprKind::Subtract => lhs - rhs,
                ExprKind::Add => lhs + rhs,
            };

            let counter = Counter {
                kind: CounterType::Expression(expr.kind),
                id: expr_index as _,
            };

            region_ids.insert(counter, count);
            if let Some(expr_region) = func.regions.iter().find(|x| {
                x.count.is_expression() && x.count.id == expr_index as u64
            }) {
                let result = report
                    .files
                    .entry(paths[expr_region.file_id].clone())
                    .or_default();
                result.insert(expr_region.loc.clone(), count as _);
            }
        }
        _ => {
            index += 1;
            continue;
        }
    }
}

Here pending_exprs is a Vec<(usize, Expression)> where expression is just two counts and a binary operator to apply to them. So pretty small types. This code then goes through a list of expressions that depend on other expressions and evaluates them, stores the result and then passes through until all the expressions are resolved.

I remember roughly writing this code, and I remember my main concern was figuring out how the expressions and coverage regions worked. And that’s why I did the big dumb-dumb. And dear reader if you want to take a guess you have until the next paragraph to finish making it before I start revealing.

The entire issue is pending_exprs.remove(index);. With this line, I remove an element from potentially the start of the vector, and then have to move every element back filling in the hole. These expressions are also used to work out condition coverage. So the more boolean sub-conditions you have and the more branches in your code the more expressions you have in a function. I haven’t attempted to prove it, but I feel the number of regions should correlate with the cyclometric complexity of a function.

That aside there are two “easy” solutions for this:

Try to iterate through the list in reverse order so hopefully, you can remove from the back and significantly reduce copies
Have a way to mark a vec element as done so you can skip it in the list and do no copies

Intuitively, I feel that 1. has too much risk of copying too much and causing a performance hit. So I went for 2. To do this I make the pending expression type an Option and set it to None instead of removing it.

After this, the while statement looks like this:

let mut index = 0;
let mut cleared_expressions = 0;
let mut tries_left = pending_exprs.len() + 1;
while pending_exprs.len() != cleared_expressions {
    assert!(tries_left > 0);
    if index >= pending_exprs.len() {
        index = 0;
        tries_left -= 1;
    }
    let (expr_index, expr) = match pending_exprs[index].as_ref() {
        Some((idx, expr)) => (idx, expr),
        None => {
            index += 1;
            continue;
        }
    };
    let lhs = region_ids.get(&expr.lhs);
    let rhs = region_ids.get(&expr.rhs);
    match (lhs, rhs) {
        (Some(lhs), Some(rhs)) => {
            let count = match expr.kind {
                ExprKind::Subtract => lhs - rhs,
                ExprKind::Add => lhs + rhs,
            };

            let counter = Counter {
                kind: CounterType::Expression(expr.kind),
                id: *expr_index as _,
            };

            region_ids.insert(counter, count);
            if let Some(expr_region) = func.regions.iter().find(|x| {
                x.count.is_expression() && x.count.id == *expr_index as u64
            }) {
                let result = report
                    .files
                    .entry(paths[expr_region.file_id].clone())
                    .or_default();
                result.insert(expr_region.loc.clone(), count as _);
            }
            pending_exprs[index] = None;
            cleared_expressions += 1;
        }
        _ => {
            index += 1;
            continue;
        }
    }
}

After this change, the polars run is:

real	2m24.941s
user	1m27.586s
sys	1m2.708s

How to Find Bad Code (Flamegraphs)

Now that part is all well and good, a massive win. Pat on the back and go home. But how did I find where to fix my dumb mistake? The relevant part of the parsing code is probably only 500 lines or so. I guess I could have read it or sprinkled some print statements around. But this is serious work so I aim to be a smidgen scientific. And in enter flamegraphs.

Just add the following to Tarpaulin’s Cargo.toml so I can get function names instead of addresses and install the local version:

[profile.release]
debug = true

Next up I want to run tarpaulin with perf to get a perf.data file. Perf is a sampling profiler so no instrumentation or changes to your binary - except for the debug symbols to make life easier. I also use inferno by Jon Gjengset to generate the flamegraphs:

perf record --call-graph dwarf --  cargo tarpaulin --engine llvm --skip-clean
perf script | inferno-collapse-perf | inferno-flamegraph > perf_flamegraph.svg

If you want to understand more about perf and getting useful calls for it you probably want to look at anything Brendan Gregg has written link. Also Denis Bakhvalov’s book is great link.

Unfortunately, I lost the flamegraph I originally did. But it wasn’t of polars but a smaller project. And serializing the perf.data files afterwards is so slow I didn’t fancy a 1 hour plus wait to create a flamegraph of before the stupid issue was fixed so you’ll have to live with the after flamegraph.

After:

Flamegraph

You should be able to open this SVG in your browser, and click on blocks to zoom in on that execution stack. You can use that to dig in and see what some of the really small thin towers are. Before the trace was massively taken up by calls to Vec<T>::remove, which is why I tackled that first.

Why Not Cargo-Flamegraph?

I also tried using cargo-flamegraph. However, the flamegraph was mostly dominated by cargo and tarpaulin was only a tiny sliver so something was wrong there. But I’ve used cargo-flamegraph before successfully so if you want to see the command it was:

flamegraph -- cargo tarpaulin --engine llvm --skip-clean

I’m not sure why the cargo dominance, but I did during this have some faff in trying to stop it rebuilding the tests on the profiling runs. Skipping the build step was mainly so tarpaulin calling out to cargo didn’t dominate the perf data and to keep the size down. Large perf data files take an age to collapse and render so for my own sanity I wanted to avoid that.

Still Slow?

Well I cut a release and updated the issue and:

After updating to Tarpaulin 0.32.4, I see a performance boost of around ~25%, which is already a good improvement. I tried creating a fresh project and adding heavy dependencies to it (like polars-rs), but the performance is still quite fast. So, I don’t know how to provide a minimal example where this step takes too long to execute.

And looking at that new flamegraph we’ve got a lot of time spent in find. It’s the new dominant time. This feels like something I can fix. Going back to the code, I can see instantly where the find is:

let record = self.profile.records().iter().find(|x| {
    x.hash == Some(func.header.fn_hash) && Some(func.header.name_hash) == x.name_hash
});

That’s not great, and I don’t think we can search the records to do things like binary searching for the hashes. But there’s something easier to do…

Memoization

For previous performance work, some memoization was added for the profraw parsing, an FxHashMap going from the strings to the index in the records array. There’s already a symbol table going from hashes to names so I can look up the name then get the index from the name and get the exact record. This turns one find on a vec to two hashmap lookups and then the array access.

So adding this method to my InstrumentationProfile type:

pub fn find_record_by_hash(&self, hash: u64) -> Option<&NamedInstrProfRecord> {
    let name = self.symtab.get(hash)?;
    self.find_record_by_name(name)
}

I then change the find call to:

let record = self.profile.find_record_by_hash(func.header.name_hash);

And now the times for the outputs:

real	2m24.562s
user	1m25.707s
sys	1m3.097s

And the flamegraph:

Flamegraph

The fact you see other stages of the processing pipeline does show it’s more balanced. But given how similar the times are, the first flamegraph might just be some sort of sampling weirdness…

Some Thoughts

Now, looking at these results I don’t see much change. Which is disappointing, although the flamegraph is a lot different. Running the Criterion benchmarks for my profparser crate I do see an 80% speedup on top of the previous speedup. This indicates that either there’s something a bit screwy with the benchmark for the data is respresentative of a different domain than the polars code.

The find scans across all the function records, and remove handles the coverage region expressions. There is some interplay between the amount of functions in your project and the complexity of the coverage regions of those functions. And where do most of the complex functions live? Near the start of the records list or the end?

All these things could dramatically change how much impact these different speed-ups give different projects. The user who raised the issue has also confirmed that this lead to a massive speedup for them so mission accomplished!

The fact that this change lead to very little change in polars results but a big change in the users project does show that judging tradeoffs in this code can potentially be very tricky in future.

More Speed-up?

The LLVM profiling data does require (at least for polars), parsing around 1GB of data files. Then after parsing the extra ELF sections added expression evaluation and a lot of merging/resolution work. Looking at the profile I do see a lot of time spent reading files. From the sampling 51% of the tarpaulin running time is handling the instrumentation runs - which will be parsing, merging and extracting relevant data from the profraws and binaries. But ~30% is just reading files.

Now this can be thrown off by inaccuracies in sampling, and there is potentially room for some multi-threading to be applied smartly to speed up parsing of a mass of files. But it seems the low-hanging fruit may have all been plucked.

While, I can try things, seeing around 96% speed-ups on one project but only 25% on another with the same code being the bottleneck it feels like a more analytic approach is in order.

The next steps will be to gather better insights into the shape of the data for different projects. So things like the number of function records, the number of counters in those. The number of expressions and how many steps to resolve them. Start to analyse how these vary and how these variations impact performance. Part of this will also be creating a corpus of test projects where I can stress the code in different ways. But that’s a topic for a future post.

/2025/05/08/Tarpaulins-Week-of-Speed

Announcing Wiremocket: Wiremock For Websockets

Mar 3, 2025 Updated Mar 3, 2025

This is an announcement blogpost for wiremocket a wiremock influenced library for mocking websocket servers.

Show full content

This is an announcement blogpost for wiremocket a wiremock influenced library for mocking websocket servers.

The Motivation

At Dayjob™ we have a number of websocket APIs, this is because they offer bidirectional streaming where the client can send a stream of data and the server at the same time can stream back data. We do this because we work with audio APIs and real time audio/visual systems.

When testing these systems it’s too heavy to run multiple docker images with AI models in CI meaning for an application which interacts with multiple websocket connections we have a few options for testing:

Split out functionality to allow for testing the meat of it without the client connection
Replace the websocket client type with some trait abstraction and in tests inject an in-memory alternative
Implement a bespoke websocket server just for this projects tests
Don’t test it

Well today I’ve published a 5th option, use a mocking library to help generate a server quickly and ensure requests match what are expected.

Why Not PR Wiremock?

There’s an issue for this in wiremock opened by a colleague. I’m not saying this won’t go into wiremock eventually in some form, but it is a big change and the two domains dissimilar enough the overlap could confuse existing users.

What’s So Difficult?

Streaming. Streaming is always difficult.

Given a server we may want to match on information available when the connection is established such as:

HTTP headers
The URL path
The URL query parameters

But we may also have things like different input formats available and want to match on the messages coming over the connection.

To explain these difficulties perhaps an example will be clearer.

A Big Example

In addition, with the responses we want the ability for the input messages to impact the responses.

As an example of the capabilities let’s look at the process for testing some code using websockets where we have a few requirements:

When the client requests to /api/binary_stream
The first message is valid json
The last message before closing is valid json
All messages between these are binary data
A close frame is sent

We can set this up by implementing our own temporal matcher and using some already provided matchers. Try not to focus too much on the implemented matcher as this is where most of the complexity will lie and the docs explain a lot about how this works.

impl Match for BinaryStreamMatcher {
    fn temporal_match(&self, match_state: &mut MatchState) -> Option<bool> {
        let json = ValidJsonMatcher;
        let len = match_state.len();
        let last = match_state.last();
        if len == 1 && json.unary_match(last).unwrap() {
            match_state.keep_message(0);
            Some(true)
        } else if last.is_binary() {
            // We won't keep any binary messages!
            if len > 1 && match_state.get_message(len - 2).is_none() {
                Some(true)
            } else {
                Some(false)
            }
        } else if last.is_close() {
            if len == 1 {
                None
            } else {
                let message = match_state.get_message(len - 2);
                if let Some(message) = message {
                    json.unary_match(message)
                } else {
                    Some(false)
                }
            }
        } else if last.is_text() {
            let res = json.unary_match(last);
            match_state.keep_message(len - 1);
            res
        } else {
            None
        }
    }
}

#[tokio::test]
async fn binary_stream_matcher_passes() {
    let server = MockServer::start().await;

    // So our pretend API here will send off a json and then after that every packet will be binary
    // and then the last one a json followed by a close
    server
        .register(
            Mock::given(path("api/binary_stream"))
                .add_matcher(BinaryStreamMatcher)
                .add_matcher(CloseFrameReceivedMatcher)
                .expect(1);
        )
        .await;

    println!("connecting to: {}", server.uri());
    
    let (mut stream, response) = connect_async(format!("{}/api/binary_stream", server.uri()))
        .await
        .unwrap();

    let data: Vec<u8> = vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7];

    let val = json!({"command": "start"});
    stream.send(Message::text(val.to_string())).await.unwrap();
    stream.send(Message::Binary(data.into())).await.unwrap();
    let val = json!({"command": "stop"});
    stream.send(Message::text(val.to_string())).await.unwrap();
    stream.send(Message::Close(None)).await.unwrap();

    std::mem::drop(stream);

    // Asserts the match conditions were met.
    server.verify().await;
}

Ignoring the details of the matcher implementation, the code with-in the test should look familiar to any wiremock users. We add matchers to a mock, an expected number of calls and send off requests and assert on them.

But we can also see the extra complexity that matching on streams and looking at sequence based behaviour causes.

What’s Next?

I’ve been so deep in the sauce I haven’t yet used this in anger. So that will be my next step. However, the code I’ll be using it on is all closed source meaning whether it works well or not is going to be a “trust me bro” until people use it.

There’s also a few things I haven’t implemented that are in wiremock. Things like:

Storing all the requests received by the server during a test
Connection pooling
Fancier reports
Some ergonomic improvements (my mock construction should really use a builder)
More matchers
More out-of-the-box responders
Stare intensely at grpc and wonder if I want to make a mockery of myself

/2025/03/03/announcing-wiremocket:-wiremock-for-websockets

Streaming Audio Apis: The Axum Server

Jan 20, 2025 Updated Jan 20, 2025

All the code can be found on this branch if you look at main you’ll see future post content there!

Show full content

All the code can be found on this branch if you look at main you’ll see future post content there!

For the API in this project we’re going to be using Axum, this is largely because of my own familiarity with Axum. Although, there is a small benefit where Axum and Tonic use the same request router so multiplexing between REST and gRPC becomes easier in the future if we want to do it.

One important tip when you’re looking at Axum, unlike actix-web and other frameworks there isn’t a website with a tutorial and examples so you have to rely on the docs and Github. When you go on Github view the tag for the version you’re using as the main branch potentially has a number of breaking changes that stop the examples from working with the last released version.

At time of writing the latest released Axum is 0.8.1 so this will be added to the Cargo.toml in the dependencies section:

axum = { version = "0.8.1", features = ["tracing", "ws"] }

With that preamble out of the way let’s get started with defining our function to launch the server. Initially, we’ll just start with a simple health-check, but in future we might want to change the health status if the service needs restarting. We’ll also add a shutdown signal for graceful shutdown so if we get a SIGTERM, the server will stop receiving connections and close after the last request is finished. I’ll also pass in our model’s StreamingContext because we know we’ll need that in future!

use tokio::signal;
use axum::{extract::Extension, response::Json, routing::get, Router};

async fn health_check() -> Json<Value> {
    Json(serde_json::json!({"status": "healthy"}))
}

fn make_service_router(state: Arc<StreamingContext>) -> Router {
    Router::new()
        .route("/api/v1/health", get(health_check))
        .layer(Extension(app_state))
}

pub async fn run_axum_server(app_state: Arc<StreamingContext>) -> anyhow::Result<()> {
    let app = make_service_router(app_state);

    // run it with hyper
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
    info!("listening on {}", listener.local_addr().unwrap());
    axum::serve(listener, app)
        .with_graceful_shutdown(shutdown_signal())
        .await?;
    Ok(())
}

async fn shutdown_signal() {
    let ctrl_c = async {
        signal::ctrl_c()
            .await
            .expect("failed to install Ctrl+C handler");
    };

    #[cfg(unix)]
    let terminate = async {
        signal::unix::signal(signal::unix::SignalKind::terminate())
            .expect("failed to install signal handler")
            .recv()
            .await;
    };

    #[cfg(not(unix))]
    let terminate = std::future::pending::<()>();

    tokio::select! {
        _ = ctrl_c => {},
        _ = terminate => {},
    }
}

Make service router is a separate function for one important reason - testability. This means our tests can easily create a router and test routes etc without binding to a port.

The health check API and all of our APIs designed to be used by users will be in a versioned API hence the /api/v1 prefix to the health check. Versioned APIs are great because they allow us to make breaking changes to our API and just increase the version. We can then potentially keep a backwards compatible endpoint and give users time to migrate any code relying on our service before we remove legacy APIs.

Now we add a launch_server function in the lib.rs which the main function will call to launch everything. It will load a config file with the model settings, create the context and then launch the axum server.

pub async fn launch_server() {
    let config = fs::read("config.json")
        .await
        .expect("Couldn't read server config");
    let config = serde_json::from_slice(&config).expect("Couldn't deserialize config");
    info!(config=?config, "service config loaded");
    let ctx = Arc::new(StreamingContext::new_with_config(config));
    info!("Launching server");
    axum_server::run_axum_server(ctx)
        .await
        .expect("Failed to launch server");
    info!("Server exiting");
}

If we run our server we can now call the health check and see it all working:

$ curl localhost:8080/api/v1/health
{"status":"healthy"}

This is all pretty simple stuff so far and you can find it in the Axum getting started documentation. The next part is where we start to up the complexity because it’s time to add in the websocket handler.

Websockets

Websockets are a streaming protocol built on top of TCP that allows for bidirectional streaming. A websocket resembles a raw TCP socket more than HTTP as it allows streaming between client and server and before HTTP/2 it was the only way to do streaming where the client streams data into the server. If it’s only the server streaming SSE (Server-Sent Events), can be used with HTTP/1.1. Because of this we need to upgrade the HTTP connection to a websocket connection:

use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        Extension,
    },
    response::IntoResponse,
};

async fn ws_handler(
    ws: WebSocketUpgrade,
    vad_processing: bool,
    Extension(state): Extension<Arc<StreamingContext>>,
) -> impl IntoResponse {
    ws.on_upgrade(move |socket| {
        handle_socket(socket, vad_processing, state, metrics)
    })
}

async fn handle_socket(
    socket: WebSocket,
    vad_processing: bool,
    state: Arc<StreamingContext>,
) {
    todo!()
}

Adding this to the router we get:

fn make_service_router(state: Arc<StreamingContext>) -> Router {
    Router::new()
        .route(
            "/api/v1/simple",
            get({
                move |ws, app_state| {
                    ws_handler(ws, false, app_state, metrics_enc)
                }
            }),
        )
        .route(
            "/api/v1/segmented",
            get({
                move |ws, app_state| {
                    ws_handler(ws, true, app_state, metrics_enc)
                }
            }),
        )
        .route("/api/v1/health", get(health_check))
        .layer(Extension(app_state))
}

I’m reusing the sample function for both the simple chunked API and the VAD segmented API, this is mainly because the only difference should be the method called on the context but everything else should be reused. In future we might want to separate it for better tracking of metrics per-endpoint, but as a first implementation we always want to strive for simplicity.

Well now, take a breath because now we’re finally set to finally write the last parts of the API defined in part 2. With this addition there will be a working API that can be called and audio streams in and responses and events out. Before we start coding lets refamiliarise ourselves with the steps we want to follow:

Receive a start message from the client
Use this to set up the audio decoding and inference tasks
Receive audio and forward it into the decoding
Connect audio decoding output to inference tasks (one per channel)
Forward inference output to client
Handle stop requests and either keep connection or disconnect
Wait for another start or data to resume processing

Well waiting for a start message might be a moderately sized block of code that’s repeated. Splitting this into it’s own function used by handle_socket already makes sense. The Websocket type implements Stream and if we split it into a sending and receiving part the receiving part also implements Stream. Because of this I’ll make the function generic on Stream as it can make it easier to test in future and potentially avoids changing the type signature again. Time to implement it as follows:

use futures::stream::Stream;

async fn handle_initial_start<S, E>(receiver: &mut S) -> Option<StartMessage>
where
    S: Stream<Item = Result<Message, E>> + Unpin,
    E: Error,
{
    let mut start = None;

    while let Some(Ok(msg)) = receiver.next().await {
        if let Ok(text) = msg.into_text() {
            match serde_json::from_str::<RequestMessage>(&text) {
                Ok(RequestMessage::Start(start_msg)) => {
                    info!(start=?start, "Initialising streamer");
                    start = Some(start_msg);
                    break;
                }
                Ok(RequestMessage::Stop(_)) => {
                    warn!("Unexpected stop received as first message");
                }
                Err(e) => {
                    error!(json=%text, error=%e, "invalid json");
                }
            }
        }
    }
    start
}

I’ve popped some tracing logs here just to make the process we go through when handshaking clear. This function will ignore any invalid messages and just receive messages from the websocket until it gets a start message or the client disconnects. For invalid start messages we may in future want to return an error to the user but we’ll assume that’s unnecessary for now.

The next steps will be to setup the receiver task to forward message to the client from the inference task, and then wait for the first message. I’ll make a small function for encoding the messages just to make the chained calls look tidier as well.

fn create_websocket_message(event: ApiResponse) -> Result<Message, axum::Error> {
    let string = serde_json::to_string(&event).unwrap();
    Ok(Message::Text(string))
}

async fn handle_socket(
    socket: WebSocket,
    vad_processing: bool,
    state: Arc<StreamingContext>,
) {
    let (sender, mut receiver) = socket.split();

    let (client_sender, client_receiver) = mpsc::channel(8);
    let client_receiver = ReceiverStream::new(client_receiver);
    let recv_task = client_receiver
            .map(create_websocket_message)
            .forward(sender)
            .map(|result| {
                if let Err(e) = result {
                    error!("error sending websocket msg: {}", e);
                }
            });

    let _ = task::spawn(recv_task);

    let mut start = match handle_initial_start(&mut receiver).await {
        Some(start) => start,
        None => {
            info!("Exiting with processing any messages, no data received");
            return;
        }
    };

    'outer: loop {
        todo!("need to start processing things!");
    }
}

Here we use StreamExt::split to split the websocket into a sending and receiving part. This allows us to move sending and receiving into different tasks and do them concurrently instead of having to share one mutable Websocket instance.

The eagle-eyed among us may have realised I’ve preemptively put a label on the loop. This is because at the start of the loop we’ll be setting up the audio decoding tasks and inference tasks. Then we will loop through the stream of messages until we get a stop. After we get the stop a new start or more audio data will resume processing but we want the stop to force an end of processing of the existing audio. For that inference and decoding the audio is finished, so if they batch up in anyway they have to know the last audio is potentially a partial batch.

Setting up the inference tasks:

// Don't forget me from forwarding from inference to client
let (client_sender, client_receiver) = mpsc::channel(8);
// snip
'outer: loop {
    info!("Setting up inference loop");
    let (audio_bytes_tx, audio_bytes_rx) = mpsc::channel(8);
    let mut running_inferences = vec![];
    let mut senders = vec![];
    for channel_id in 0..start.format.channels {
        let client_sender_clone = client_sender.clone();
        let (samples_tx, samples_rx) = mpsc::channel(8);
        let context = state.clone();
        let start_cloned = start.clone();

        let inference_task = async move {
            if vad_processing {
                context
                    .segmented_runner(
                        start_cloned,
                        channel_id,
                        samples_rx,
                        client_sender_clone,
                    )
                    .await
            } else {
                context
                    .inference_runner(channel_id, samples_rx, client_sender_clone)
                    .await
            }
        };

        let handle = task::spawn(inference_task);
        running_inferences.push(handle);
        senders.push(samples_tx);
    }
    
    // Transcoding and message processing 
    todo!("The rest of the owl");

    // Clean up inference task
    for handle in running_inferences.drain(..) {
        match handle.await {
            Ok(Err(e)) => error!("Inference failed: {}", e),
            Err(e) => error!("Inference task panicked: {}", e),
            Ok(Ok(_)) => {}
        }
    }
}

The transcoding task is also fairly simple, we just want to spawn the audio decoder with our channels passed in.

let transcoding_task = task::spawn(
    decode_audio(start.format, audio_bytes_rx, senders).in_current_span()
);

In our current function the last stage is to read the messages from the websocket and send down the appropriate channel. There’s also a small check after we join the running inferences so I’ll include a bit of the code from a previous sample again:

'outer: loop {

    // Code from setting up the inference and transcoding tasks

    let mut got_messages = false;
    let mut disconnect = false;
    while let Some(Ok(msg)) = receiver.next().await {
        match msg {
            Message::Binary(audio) => {
                got_messages = true;
                if let Err(e) = audio_bytes_tx.send(audio.into()).await {
                    warn!("Transcoding channel closed, this may indicate that inference has finished: {}", e);
                    break;
                }
            }
            Message::Text(text) => match serde_json::from_str::<RequestMessage>(&text) {
                Ok(RequestMessage::Start(start_msg)) => {
                    got_messages = true;
                    info!(start=?start, "Reinitialising streamer");
                    start = start_msg;
                    break;
                }
                Ok(RequestMessage::Stop(msg)) => {
                    got_messages = true;
                    info!("Stopping current stream, {:?}", msg);
                    disconnect = msg.disconnect;
                    break;
                }
                Err(e) => {
                    error!(json=%text, error=%e, "invalid json");
                }
            },
            Message::Close(_frame) => {
                info!("Finished streaming request");
                break 'outer;
            }
            _ => {} // We don't care about ping and pong
        }
    }

    std::mem::drop(audio_bytes_tx);
    for handle in running_inferences.drain(..) {
        match handle.await {
            Ok(Err(e)) => error!("Inference failed: {}", e),
            Err(e) => error!("Inference task panicked: {}", e),
            Ok(Ok(_)) => {}
        }
    }
    if let Err(e) = transcoding_task.await.unwrap() {
        error!("Failed from transcoding task: {}", e);
    }
    if !got_messages || disconnect {
        break;
    }
}

When we receive binary data this is audio so we send into our transcoding task. If that sending has failed it means one of two things:

Transcoding failed with an error
We’ve had a single utterance request and the utterances have finished

In this case we break out of the message handling and when we await the other tasks we’ll see if anything went wrong.

For the text messages these should either be start or stop, these come when our current stream ends and a new one is starting or the connection will be closed.

If we exited the websocket message receiving without receiving anything then the client will have disconnected before we did anything and we want to exit, same with a disconnection request hence the check with a final break at the end.

Alongside this I’ve also written a small client to test this. It’s fairly simple using hound to read a WAV file and then streaming it either as fast as possible or in real-time. You can also set the configuration via the CLI. It shouldn’t be too hard to recreate with the tokio-tungstenite docs so I’ve skipped over it currently. But here’s the help text

Usage: client [OPTIONS] --input <INPUT>

Options:
  -i, --input <INPUT>            Input audio file to stream
      --chunk-size <CHUNK_SIZE>  Size of audio chunks to send to the server [default: 256]
  -a, --addr <ADDR>              Address of the streaming server (/api/v1/segmented or /api/v1/simple for vad or non-vad options) [default: ws://localhost:8080/api/v1/segmented]
      --real-time                Attempts to simulate real time streaming by adding a pause between sending proportional to sample rate
      --interim-results          Return interim results before an endpoint is detected
  -h, --help                     Print help

If you try with a voiced file you’ll see output for both endpoints. The bare minimum command given an audio called input.wav and the server running locally is:

client -i input.wav -a ws://localhost:8080/api/v1/simple
client -i input.wav -a ws://localhost:8080/api/v1/segmented

One thing to note, tungstenite has issues with proxies as per this issue. If you need proxy support for a client with your own projects you might want to look at some of the solutions in the linked issue.

With this we’ve tied together all the building blocks in the previous parts. We now have a working API and can stream audio into a model and get results back.

/2025/01/20/streaming-audio-APIs:-the-axum-server

Adding A New Fake To The Fake Crate

Jan 15, 2025 Updated Jan 15, 2025

This post is going to go through my process of PRing something to the fake crate. I’d done it before in a different part of the crate so assumed this would be easy but it took me a bit more time than expected so figured that’s a good reason to write.

Show full content

For anyone who hasn’t seen it before fake is a crate for generating fake values for the purposes of testing. It’s mentioned in zero2prod for generating fake sign-ups to the service and this is where I first heard of it. But, when generating fake API values for testing you sometimes get many more types that need faking than are currently implemented.

When this happens there are three choices:

Implement it in your crate
PR to the fake crate
Give up

Option 1 means if we need to fake this type in a different project we have to copy our implementation around which is a bit of a drag. And option 3 is never an option here (or at least for things that should be this easy). This just leaves us with option 2.

So what are we faking? This time it is base64 strings. I’m dealing with an API which makes the decision to encode some binary data as base64 in a JSON.

Where to put our code

Typically, we also don’t want to implement things in fake if there’s crates that do them (or types in crates we want to fake). I’m going to use the base64.

Starting off, going into the fake folder in the root of the repo to add to the crate:

cd fake
cargo add base64 --optional

Now lets look at the folder structure in src to see where we might put our code:

├── src
│   ├── bin
│   ├── faker
│   ├── impls
│   │   ├── base64
│   │   ├── bigdecimal
│   │   ├── bson_oid
│   │   ├── chrono
│   │   ├── chrono_tz
│   │   ├── color
│   │   ├── decimal
│   │   ├── geo
│   │   ├── glam
│   │   ├── http
│   │   ├── indexmap
│   │   ├── semver
│   │   ├── serde_json
│   │   ├── std
│   │   ├── time
│   │   ├── ulid
│   │   ├── url
│   │   ├── uuid
│   │   └── zerocopy_byteorder
│   └─── locales

Looking at where to add our code it can be hard to figure out between the faker and impl modules. But in impls we see there’s folders for a bunch of 3rd party crates. Making that the perfect place to start.

Adding a base64 module into impls I can start to implement the fake stuff.

Implementing it

The core of the fake implementations is the Dummy trait. For those who don’t want to leave this post I’ll share the trait definition (minus a method with a default impl).

pub trait Dummy<T>: Sized {
    // Required method
    fn dummy_with_rng<R: Rng + ?Sized>(config: &T, rng: &mut R) -> Self;
}

The config type T is often used in a type-state manner. Looking at the uuid fake impl we can see types for UUIDv8 etc which are provided like impl Dummy<UUIDv8> for Uuid.

There’s not that many variants in base64, looking at the engine the main things we can vary are:

The data being encoded as base64
Whether we pad the data or not.

To vary these we’ll use the Faker type to generate random data and a random boolean and then encode the base64 using these. My initial errors all boiled down to not using the cheaply constructed Faker type. But once I realised that it felt pretty simple.

In the new fake/src/impls/base64/mod.rs the initial implementation looks something like:

use crate::{Dummy, Fake, Faker};
use base64::prelude::*;

pub struct Base64;

impl Dummy<Base64> for String {
    fn dummy_with_rng<R: rand::Rng + ?Sized>(_: &Base64, rng: &mut R) -> Self {
        let data: Vec<u8> = Faker.fake_with_rng(rng);
        let padding = Faker.fake_with_rng(rng);
        let encoded = if padding {
            BASE64_STANDARD.encode(&data)
        } else {
            BASE64_STANDARD_NO_PAD.encode(&data)
        };
        encoded
    }
}

Here we can use this as follows:

let fake_base64: String = String::dummy_with_rng(&Base64, rng);

Which isn’t super ergonomic but it serves a purpose. We ideally want to implement for Dummy<Faker> because then we can use the cheaply constructed Faker type to generate base64 values.

If we do this our code can look like:

let fake_base64: String = Base64.fake();

Right, so we need to implement Dummy<Faker> for something, and we can’t implement it for a String directly because fake can already generate fake strings. This means we need some intermediate wrapper type and then all our generics should just flow together.

With this in mind I added this to the file after some fiddling:

pub struct Base64Value(pub String);

impl Dummy<Faker> for Base64Value {
    fn dummy_with_rng<R: rand::Rng + ?Sized>(config: &Faker, rng: &mut R) -> Self {
        let s = String::dummy_with_rng(&Base64, rng);
        Base64Value(s)
    }
}

A bunch of the fiddling was whether the previous code should live in this impl and the implementation on String use it or vice versa. I struggled a bit to get that working with the trait bounds and ultimately it doesn’t matter which way round and this way is the path of least resistance.

With these two in place we can look at adding a test. The majority of the fake tests are determinism tests ensuring that with the same seed we get the same faked values. There’s a number of macros to define these but if you look at the file you can just copy the pattern. Our test then looks like so:

#[cfg(feature = "base64")]
mod base64 {
    use fake::base64::*;
    use fake::{Fake, Faker};
    use rand::SeedableRng as _;

    check_determinism! { one fake_base64, Base64Value, Faker }
    check_determinism! { one fake_url_safe_base64, UrlSafeBase64Value, Faker }
}

Add the public re-exports and mod statements matching the patterns as well and we have a working implementation!

The only other thing is to look at the base64 crate and see if there’s any other configurations we should look at supporting. I saw there were engines to generate URL safe base64 so implemented them much the same and then called it a day!

This is the implementation I opened the PR with which was ultimately merged:

use crate::{Dummy, Fake, Faker};
use base64::prelude::*;

#[derive(Clone, Debug, PartialEq, Eq, Ord, PartialOrd, Hash)]
pub struct UrlSafeBase64Value(pub String);

#[derive(Clone, Debug, PartialEq, Eq, Ord, PartialOrd, Hash)]
pub struct Base64Value(pub String);

pub struct Base64;

pub struct UrlSafeBase64;

impl Dummy<Base64> for String {
    fn dummy_with_rng<R: rand::Rng + ?Sized>(_: &Base64, rng: &mut R) -> Self {
        let data: Vec<u8> = Faker.fake_with_rng(rng);
        let padding = Faker.fake_with_rng(rng);
        let encoded = if padding {
            BASE64_STANDARD.encode(&data)
        } else {
            BASE64_STANDARD_NO_PAD.encode(&data)
        };
        encoded
    }
}

impl Dummy<UrlSafeBase64> for String {
    fn dummy_with_rng<R: rand::Rng + ?Sized>(_: &UrlSafeBase64, rng: &mut R) -> Self {
        let data: Vec<u8> = Faker.fake_with_rng(rng);
        let padding = Faker.fake_with_rng(rng);
        let encoded = if padding {
            BASE64_URL_SAFE.encode(&data)
        } else {
            BASE64_URL_SAFE_NO_PAD.encode(&data)
        };
        encoded
    }
}

impl Dummy<Faker> for Base64Value {
    fn dummy_with_rng<R: rand::Rng + ?Sized>(config: &Faker, rng: &mut R) -> Self {
        let s = String::dummy_with_rng(&Base64, rng);
        Base64Value(s)
    }
}

impl Dummy<Faker> for UrlSafeBase64Value {
    fn dummy_with_rng<R: rand::Rng + ?Sized>(config: &Faker, rng: &mut R) -> Self {
        let s = String::dummy_with_rng(&UrlSafeBase64, rng);
        UrlSafeBase64Value(s)
    }
}

And here is a small test program showing it being used including with the derive feature:

use fake::{Dummy, Fake, Faker};
use fake::base64::Base64;

#[derive(Debug, Dummy)]
pub struct FakeMe {
    #[dummy(faker =  "Base64")]
    base64: String,
}


fn main() {
    let f: FakeMe = Faker.fake();
    println!("{:?}", f);

    let fake_string: String = Base64.fake();
    println!("{}", fake_string);

    let fake_string: String = String::dummy(&Base64);
    println!("{}", fake_string);
}

/2025/01/15/adding-a-new-fake-to-the-fake-crate

https://xd009642.github.io/feed

Posts