/* 🤖🛠️ */ — GeistHaus

Mar 2, 2026

this is part 1 of a two-part series on how container images and filesystems work:

what is a container image? (this post)
how does my container get a root filesystem?

intro

my first mental model of a container was: a container is like when your friend wants to send you some files to run, and you put them in a tiny computer so the files work everywhere.

that’s… kind of right, and kind of not. so… if kind of not, then how do container images work?

a container image is not a set of files. it’s an ordered list of filesystems, plus some metadata. container runtimes use union filesystems to stack these filesystems on top of each other.

it’s like a stack of overhead transparencies: you have several sheets and if you layer them, what gets projected looks like a single unified view. the bottom layers are read-only, and the top layer is writable.

why do container images use layers? why bother?

the answer is sharing. because the lower layers are read-only, we can share them between containers.

lots of containers might use the same base image. with layers, that base only needs to exist on disk once, and every container that uses it just stacks its own changes on top.

in this post, we’ll build a minimal two-layer OCI image entirely by hand — no docker, no buildkit — import it into containerd, inspect the internals, and run it. in part 2, we’ll see how a container image gets unpacked into the root filesystem of a running container, and see how the layer sharing works in practice.

table of contents

prerequisites
what is a container image?
create the filesystem for each layer
package layers as tarballs
assemble the OCI image layout
import into containerd
overlayfs, snapshots & mounts
inspect containerd state
run the container
summary
appendix

0. prerequisites

to follow along, you’ll need a linux machine with:

containerd and runc (the container runtime)
ctr (containerd’s CLI)
jq, tree, tar, gzip, sha256sum, stat
a statically-linked busybox binary (the busybox-static package on Debian and Ubuntu; other distros may name it differently)

let’s set up a working directory and a containerd namespace so we can clean up easily afterwards:

export WORKDIR=~/container-demo
export CTR_NAMESPACE=spelunking
mkdir -p "$WORKDIR"

all ctr commands in this post use --namespace $CTR_NAMESPACE (or the shorthand -n). this keeps our experiments isolated from anything else running on the machine.

1. what is a container image?

a container image is not a single blob of files. it’s a structured bundle of layer filesystems plus metadata describing how they fit together. the OCI image spec defines the format, and it looks like this:

index.json
  └─► manifest
        ├─► config            (diffIDs, cmd, env, ...)
        ├─► layer[0] blob     (base.tar.gz)
        └─► layer[1] blob     (delta.tar.gz)

let’s define those pieces:

layers are tar archives, each containing a filesystem tree. they stack in order — layer 0 is the base, layer 1 is applied on top, and so on.
the config describes how to run the image (command, environment variables, working directory) and lists the diffIDs — the sha256 hash of each layer’s uncompressed tar. diffIDs identify the layer content itself, regardless of compression.
the manifest ties config + layers together. it references each blob by its digest — the sha256 hash of the blob as stored (usually compressed). it also records the size of each blob.
the index (also called the “image index”) is the top-level entry point. it points to one or more manifests (one per platform/architecture).

everything in an OCI image is content-addressed: stored and referenced by its sha256 hash. this means you can verify integrity at every level — if a blob’s hash doesn’t match its expected digest, something is wrong.

we’re going to build all of these pieces by hand.

2. create the filesystem for each layer

our image will have two layers:

base layer: a minimal filesystem with a statically-linked busybox, a hello.txt file, and a config file
delta layer: overrides hello.txt (demonstrating layer shadowing) and adds a whiteout marker to delete the config file (demonstrating layer deletion)

here’s the equivalent Dockerfile for what we’re about to do by hand:

FROM scratch AS base
COPY build/base/ /

FROM base
COPY build/delta/ /

let’s build the directory trees:

# create build directories
mkdir -p "$WORKDIR/build/base" "$WORKDIR/build/delta"

# --- base layer ---
# add busybox (our "distro")
mkdir -p "$WORKDIR/build/base/bin"
cp /bin/busybox "$WORKDIR/build/base/bin/busybox"

# symlink the commands we need to busybox
for cmd in sh ls cat; do
  ln -sf busybox "$WORKDIR/build/base/bin/$cmd"
done

# add some files
echo "hello from base" > "$WORKDIR/build/base/hello.txt"
mkdir -p "$WORKDIR/build/base/data"
echo "base config" > "$WORKDIR/build/base/data/config.txt"

# --- delta layer ---
# shadow hello.txt with new content
echo "hello from delta" > "$WORKDIR/build/delta/hello.txt"

# whiteout marker: an empty regular file named .wh.<name> tells the
# runtime to hide data/config.txt (no mknod or sudo needed)
mkdir -p "$WORKDIR/build/delta/data"
touch "$WORKDIR/build/delta/data/.wh.config.txt"

let’s verify the layout:

tree "$WORKDIR/build"

build/
├── base/
│   ├── bin/
│   │   ├── busybox
│   │   ├── cat -> busybox
│   │   ├── ls -> busybox
│   │   └── sh -> busybox
│   ├── data/
│   │   └── config.txt        ("base config")
│   └── hello.txt              ("hello from base")
└── delta/
    ├── data/
    │   └── .wh.config.txt     (whiteout marker — deletes config.txt)
    └── hello.txt              ("hello from delta" — shadows base)

notice two important things in the delta layer:

hello.txt exists in both layers. when these layers are stacked, the delta’s version will shadow the base’s version — just like a transparency placed on top of another.
.wh.config.txt is a whiteout file. the .wh. prefix is a convention defined in the OCI spec. it tells the runtime: ‘in the merged view, pretend config.txt doesn’t exist.’ the file in the base layer isn’t actually deleted — it’s just hidden.
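
to make shadowing and whiteouts concrete, here is a rough userland sketch that flattens two layer directories into one. it is not how containerd does it (that goes through tar and the snapshotter), the apply_layer helper and the temp paths are made up for illustration, and it ignores opaque whiteouts:

```shell
set -e

demo=$(mktemp -d)
mkdir -p "$demo/base/data" "$demo/delta/data" "$demo/merged"
echo "hello from base"  > "$demo/base/hello.txt"
echo "base config"      > "$demo/base/data/config.txt"
echo "hello from delta" > "$demo/delta/hello.txt"
: > "$demo/delta/data/.wh.config.txt"    # whiteout: empty file, .wh. prefix

# apply_layer LAYER_DIR TARGET_DIR (hypothetical helper)
apply_layer() {
  # 1. copy the layer over the target; same-named files get replaced
  cp -a "$1/." "$2/"
  # 2. collect whiteout markers, then delete each marker and the file it hides
  find "$2" -name '.wh.*' > "$demo/whiteouts.list"
  while IFS= read -r marker; do
    hidden="$(dirname "$marker")/$(basename "$marker" | sed 's/^\.wh\.//')"
    rm -rf "$hidden" "$marker"
  done < "$demo/whiteouts.list"
}

# order matters: base first, delta on top
apply_layer "$demo/base"  "$demo/merged"
apply_layer "$demo/delta" "$demo/merged"

cat "$demo/merged/hello.txt"     # -> hello from delta
ls -A "$demo/merged/data"        # (prints nothing: config.txt is hidden)
```

the big difference from the real thing: this sketch destroys information by copying, while a union mount leaves every layer untouched and only merges the view.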

3. package layers as tarballs

each layer in an OCI image is a tar archive (usually gzip-compressed). two hashes matter:

files ──► tar ──► sha256 = DiffID ──► gzip ──► sha256 = Digest
                  (uncompressed)                (compressed)

the DiffID is the sha256 of the uncompressed tar. this is what goes in the image config.
the Digest is the sha256 of the compressed tar (the blob as stored). this is what the manifest uses to reference blobs.
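
a small sketch of why there are two hashes, assuming GNU tar, gzip, and sha256sum (the temp file names are made up). the same tar compressed two different ways gives two blobs with their own digests, but decompressing either one leads back to the same DiffID:

```shell
set -e
tmp=$(mktemp -d)
printf 'hello layer\n' > "$tmp/file.txt"
tar -C "$tmp" -cf "$tmp/layer.tar" file.txt

sha256_of() { sha256sum "$1" | cut -d' ' -f1; }

# DiffID: hash of the uncompressed tar
diffid="sha256:$(sha256_of "$tmp/layer.tar")"

# compress the same tar two ways; -n keeps name/timestamp out of the gzip header
gzip -n -1 -c "$tmp/layer.tar" > "$tmp/fast.tar.gz"
gzip -n -9 -c "$tmp/layer.tar" > "$tmp/best.tar.gz"
echo "digest (gzip -1): sha256:$(sha256_of "$tmp/fast.tar.gz")"
echo "digest (gzip -9): sha256:$(sha256_of "$tmp/best.tar.gz")"

# decompressing either blob gets back to the same DiffID
diffid_fast="sha256:$(gzip -dc "$tmp/fast.tar.gz" | sha256sum | cut -d' ' -f1)"
diffid_best="sha256:$(gzip -dc "$tmp/best.tar.gz" | sha256sum | cut -d' ' -f1)"
echo "DiffID: $diffid"
[ "$diffid" = "$diffid_fast" ] && [ "$diffid" = "$diffid_best" ] && echo "DiffIDs match"
```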

let’s create reproducible tarballs. we pin --sort, --mtime, and --owner/--group so the archives are deterministic, meaning the same input always produces the same hash:

# create tar archives (uncompressed)
tar -C "$WORKDIR/build/base" \
  --sort=name --mtime="2025-01-01 00:00:00 UTC" \
  --owner=0 --group=0 --numeric-owner \
  -cf "$WORKDIR/base-layer.tar" .

tar -C "$WORKDIR/build/delta" \
  --sort=name --mtime="2025-01-01 00:00:00 UTC" \
  --owner=0 --group=0 --numeric-owner \
  -cf "$WORKDIR/delta-layer.tar" .

# compute DiffIDs (sha256 of uncompressed tar)
BASE_DIFFID="sha256:$(sha256sum "$WORKDIR/base-layer.tar" | cut -d' ' -f1)"
DELTA_DIFFID="sha256:$(sha256sum "$WORKDIR/delta-layer.tar" | cut -d' ' -f1)"

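# compress (-n keeps the filename and timestamp out of the gzip header,
# so the compressed Digest is reproducible too)
gzip -nkf "$WORKDIR/base-layer.tar"
gzip -nkf "$WORKDIR/delta-layer.tar"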

# compute Digests (sha256 of compressed tar)
BASE_DIGEST="sha256:$(sha256sum "$WORKDIR/base-layer.tar.gz" | cut -d' ' -f1)"
DELTA_DIGEST="sha256:$(sha256sum "$WORKDIR/delta-layer.tar.gz" | cut -d' ' -f1)"

# record sizes (needed for the manifest)
BASE_SIZE=$(stat -c%s "$WORKDIR/base-layer.tar.gz")
DELTA_SIZE=$(stat -c%s "$WORKDIR/delta-layer.tar.gz")

echo "base  DiffID: $BASE_DIFFID"
echo "base  Digest: $BASE_DIGEST  Size: $BASE_SIZE"
echo "delta DiffID: $DELTA_DIFFID"
echo "delta Digest: $DELTA_DIGEST  Size: $DELTA_SIZE"

we now have four hashes — two DiffIDs and two Digests. we’ll use them in the next step to wire everything together.

4. assemble the OCI image layout

an OCI image on disk is just a directory tree with a specific structure. we need to:

write the oci-layout marker file
place layer blobs in blobs/sha256/
create the image config
create the manifest
create the index

# initialize the OCI layout directory
OCI_DIR="$WORKDIR/oci"
mkdir -p "$OCI_DIR/blobs/sha256"

echo '{"imageLayoutVersion": "1.0.0"}' > "$OCI_DIR/oci-layout"

place the layer blobs

blobs are stored by their digest. the filename is just the hash (without the sha256: prefix):

cp "$WORKDIR/base-layer.tar.gz" "$OCI_DIR/blobs/sha256/${BASE_DIGEST#sha256:}"
cp "$WORKDIR/delta-layer.tar.gz" "$OCI_DIR/blobs/sha256/${DELTA_DIGEST#sha256:}"

create the config

the config describes runtime settings and lists the layer DiffIDs (uncompressed hashes):

CONFIG=$(jq -n \
  --arg base_diffid "$BASE_DIFFID" \
  --arg delta_diffid "$DELTA_DIFFID" \
  '{
    architecture: "amd64",  # change to "arm64" on an arm64 machine
    os: "linux",
    rootfs: {
      type: "layers",
      diff_ids: [$base_diffid, $delta_diffid]
    },
    config: {
      Cmd: ["/bin/sh"]
    }
  }')

# store config blob by its digest
CONFIG_DIGEST="sha256:$(echo "$CONFIG" | sha256sum | cut -d' ' -f1)"
CONFIG_SIZE=$(echo "$CONFIG" | wc -c | tr -d ' ')
echo "$CONFIG" > "$OCI_DIR/blobs/sha256/${CONFIG_DIGEST#sha256:}"

create the manifest

the manifest ties the config and layer blobs together, referencing everything by digest:

MANIFEST=$(jq -n \
  --arg config_digest "$CONFIG_DIGEST" \
  --argjson config_size "$CONFIG_SIZE" \
  --arg base_digest "$BASE_DIGEST" \
  --argjson base_size "$BASE_SIZE" \
  --arg delta_digest "$DELTA_DIGEST" \
  --argjson delta_size "$DELTA_SIZE" \
  '{
    schemaVersion: 2,
    mediaType: "application/vnd.oci.image.manifest.v1+json",
    config: {
      mediaType: "application/vnd.oci.image.config.v1+json",
      digest: $config_digest,
      size: $config_size
    },
    layers: [
      {
        mediaType: "application/vnd.oci.image.layer.v1.tar+gzip",
        digest: $base_digest,
        size: $base_size
      },
      {
        mediaType: "application/vnd.oci.image.layer.v1.tar+gzip",
        digest: $delta_digest,
        size: $delta_size
      }
    ]
  }')

MANIFEST_DIGEST="sha256:$(echo "$MANIFEST" | sha256sum | cut -d' ' -f1)"
MANIFEST_SIZE=$(echo "$MANIFEST" | wc -c | tr -d ' ')
echo "$MANIFEST" > "$OCI_DIR/blobs/sha256/${MANIFEST_DIGEST#sha256:}"

create the index

the index is the top-level entry point. it points to our manifest:

jq -n \
  --arg manifest_digest "$MANIFEST_DIGEST" \
  --argjson manifest_size "$MANIFEST_SIZE" \
  '{
    schemaVersion: 2,
    manifests: [
      {
        mediaType: "application/vnd.oci.image.manifest.v1+json",
        digest: $manifest_digest,
        size: $manifest_size,
        annotations: {
          "org.opencontainers.image.ref.name": "handroll:latest"
        }
      }
    ]
  }' > "$OCI_DIR/index.json"

let’s verify the final layout:

tree "$OCI_DIR"

oci/
├── blobs/
│   └── sha256/
│       ├── <base layer digest>      (base.tar.gz)
│       ├── <delta layer digest>     (delta.tar.gz)
│       ├── <config digest>          (config JSON)
│       └── <manifest digest>        (manifest JSON)
├── index.json
└── oci-layout

everything is content-addressed. the index points to the manifest by digest, the manifest points to the config and layers by digest. you can verify any blob by hashing it and comparing to its expected digest.
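
here is one way that check could look. verify_layout is a hypothetical helper name, and the sketch builds a throwaway one-blob layout so it runs on its own. it re-hashes every file under blobs/sha256/ and compares the result with the filename:

```shell
set -e

# verify_layout OCI_DIR (hypothetical helper): re-hash every blob and compare
# the result with the filename it is stored under
verify_layout() {
  status=0
  for blob in "$1"/blobs/sha256/*; do
    [ -f "$blob" ] || continue
    expected=$(basename "$blob")
    actual=$(sha256sum "$blob" | cut -d' ' -f1)
    if [ "$expected" = "$actual" ]; then
      echo "ok       $expected"
    else
      echo "MISMATCH $expected"
      status=1
    fi
  done
  return "$status"
}

# throwaway layout with a single blob, so the sketch is self-contained
layout=$(mktemp -d)
mkdir -p "$layout/blobs/sha256"
printf 'example blob\n' > "$layout/blob.tmp"
digest=$(sha256sum "$layout/blob.tmp" | cut -d' ' -f1)
mv "$layout/blob.tmp" "$layout/blobs/sha256/$digest"

verify_layout "$layout"
```

to check the image we just built, call verify_layout "$OCI_DIR" instead. it only checks blob contents against filenames; it does not follow the index and manifest references.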

5. import into containerd

now let’s import our hand-built image into containerd. ctr images import expects a tar archive of the OCI layout directory:

# create a tarball of the OCI layout
tar -C "$OCI_DIR" -cf "$WORKDIR/handroll-image.tar" .

# import into containerd
sudo ctr -n "$CTR_NAMESPACE" images import --base-name handroll "$WORKDIR/handroll-image.tar"

# verify it's there
sudo ctr -n "$CTR_NAMESPACE" images ls

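you should see docker.io/library/handroll:latest in the output. if ctr prints a different name (for example, plain handroll:latest), use that name in the later run and cleanup commands. containerd has stored the blobs in its content store and unpacked the layers into snapshots.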

6. overlayfs, snapshots & mounts

before we inspect what containerd did, let’s build a mental model of how overlayfs works.

overlayfs is a union filesystem. it takes a stack of directories and presents them as one merged view:

┌──────────────────────────────────────┐
│          merged view (rootfs)        │  ← what the container sees
├──────────────────────────────────────┤
│       upperdir (writable layer)      │  ← runtime writes go here
├──────────────────────────────────────┤
│  lowerdir[1]: delta snapshot         │  ← hello.txt = "hello from delta"
├──────────────────────────────────────┤
│  lowerdir[0]: base snapshot          │  ← hello.txt = "hello from base"
└──────────────────────────────────────┘

lowerdirs are read-only. these are the image’s layers, unpacked from the tarballs into snapshot directories.
the upperdir is writable. when a container writes a file, it goes here. when the container is destroyed, this directory is deleted — that’s why writes inside a container don’t persist.
the merged directory is what the container actually sees: a single unified view where the kernel resolves conflicts by picking the topmost layer.
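
the "pick the topmost layer" rule can be sketched in plain shell with no mounts at all. this is only a model of the lookup order: the lookup helper and the directory names are made up, and it ignores whiteouts and directory merging, which the kernel handles:

```shell
set -e
ov=$(mktemp -d)
mkdir -p "$ov/upper" "$ov/delta" "$ov/base"
echo "hello from base"  > "$ov/base/hello.txt"
echo "base only"        > "$ov/base/base-only.txt"
echo "hello from delta" > "$ov/delta/hello.txt"

# lookup RELATIVE_PATH (hypothetical helper): walk the stack top to bottom
# and print the first layer path that contains the file
lookup() {
  for layer in "$ov/upper" "$ov/delta" "$ov/base"; do
    if [ -e "$layer/$1" ]; then
      echo "$layer/$1"
      return 0
    fi
  done
  return 1
}

cat "$(lookup hello.txt)"       # -> hello from delta
cat "$(lookup base-only.txt)"   # -> base only
```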

here’s the full chain from what we’ve built so far to a running container:

we created filesystem trees (directories with files)
we packaged them into tarballs
we assembled those tarballs + metadata into an OCI image layout
containerd imported the blobs into its content store and unpacked them into snapshot directories
when containerd starts a container, it creates an overlay mount with a writable layer on top of the read-only layers
the result: a single merged filesystem that looks “normal” to the container

7. inspect containerd state

let’s see what containerd actually did when we imported the image.

we’ll look a LOT more at all these details in part 2.

content store

the content store holds all the raw blobs — layer tarballs, config, manifest:

sudo ctr -n "$CTR_NAMESPACE" content ls

you’ll see entries matching the digests we computed earlier. containerd stored our blobs verbatim.

snapshots

the snapshotter unpacked the layer tarballs into directories, which are called snapshots. each layer gets its own snapshot, and each snapshot records its parent: the snapshot for the layer directly below it. the parent links are how containerd tracks each layer’s position in the ordered list:

sudo ctr -n "$CTR_NAMESPACE" snapshots ls

you should see two snapshots — one for each layer. let’s look at them on disk:

# find the snapshot directories
SNAPSHOTS_ROOT="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
sudo ls "$SNAPSHOTS_ROOT"

each numbered directory contains an fs/ subdirectory with the unpacked layer contents:

# inspect each snapshot's contents
for snap in $(sudo ls "$SNAPSHOTS_ROOT"); do
  echo "--- snapshot $snap ---"
  sudo ls "$SNAPSHOTS_ROOT/$snap/fs/"
  if sudo test -f "$SNAPSHOTS_ROOT/$snap/fs/hello.txt"; then
    echo "hello.txt = $(sudo cat "$SNAPSHOTS_ROOT/$snap/fs/hello.txt")"
  fi
done

the snapshots are just directories on disk with the unpacked layer contents. no overlays yet — that happens when we run a container.

8. run the container

let’s run a container from our hand-built image and explore what happens:

sudo ctr -n "$CTR_NAMESPACE" run --rm -t docker.io/library/handroll:latest demo /bin/sh

the merged view

inside the container, you see a single unified filesystem:

ls /
# -> bin/  data/  dev/  etc/  hello.txt  proc/  sys/

it looks like a “normal” filesystem. the layering is invisible.

layer shadowing

cat /hello.txt
# -> hello from delta

the delta layer’s hello.txt shadows the base layer’s version. the base version still exists on disk in its snapshot directory — it’s just hidden in the merged view.

the base layer’s other content is still visible:

ls /bin/
# -> busybox  cat  ls  sh

whiteout deletion

ls /data/
# (empty — config.txt has been "deleted" by the whiteout marker)

remember the .wh.config.txt file we created in the delta layer? containerd processed it when it unpacked the delta layer, converting the .wh. marker into an overlayfs whiteout in the delta snapshot. as a result, data/config.txt from the base layer is hidden in the merged view. the file is still physically present in the base snapshot; it’s just invisible from inside the container.

let’s prove that. from the host (in a second terminal, with SNAPSHOTS_ROOT set as in section 7), look at the base layer’s snapshot directory:

# the base snapshot still has everything
for snap in $(sudo ls "$SNAPSHOTS_ROOT"); do
  if sudo test -f "$SNAPSHOTS_ROOT/$snap/fs/data/config.txt"; then
    sudo cat "$SNAPSHOTS_ROOT/$snap/fs/hello.txt"
    # -> hello from base
    sudo cat "$SNAPSHOTS_ROOT/$snap/fs/data/config.txt"
    # -> base config
  fi
done

both files are right there on disk. the overlay just hides them from the container’s view.

inspect the overlay mount

# inside the container: we only symlinked sh, ls, and cat, so call grep through busybox
/bin/busybox grep overlay /proc/1/mountinfo

you’ll see something like:

... overlay overlay rw,lowerdir=<snapshot2>/fs:<snapshot1>/fs,upperdir=<active>/fs,workdir=<active>/work ...

this is the actual kernel mount that produces the merged view. you can see:

lowerdir lists the read-only snapshots (delta first, then base — order matters!)
upperdir is the writable directory for this container’s lifetime
workdir is used internally by overlayfs for atomic operations
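
if you want to pull those options apart in a script, something like this works for simple cases. the options string below is a made-up example shaped like the mountinfo line, get_opt is a hypothetical helper, and the sketch assumes no commas or colons inside the paths:

```shell
set -e
# hypothetical options string, shaped like the mountinfo output
opts='rw,lowerdir=/snapshots/2/fs:/snapshots/1/fs,upperdir=/snapshots/3/fs,workdir=/snapshots/3/work'

# get_opt NAME: print the value of one key=value mount option
get_opt() {
  printf '%s\n' "$opts" | tr ',' '\n' | sed -n "s/^$1=//p"
}

echo "upperdir: $(get_opt upperdir)"
echo "lowerdirs, topmost first:"
get_opt lowerdir | tr ':' '\n'
```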

summary

we’ve traced the full path from raw files to a running container:

filesystem trees
    ↓
tarball layers (tar + gzip)
    ↓
OCI image layout (blobs + config + manifest + index)
    ↓
containerd content store (blobs stored by digest)
    ↓
snapshots on disk (layers unpacked into directories)
    ↓
overlayfs mount (lowerdirs + upperdir = merged view)
    ↓
running container process

we saw that a container image is just some file trees and metadata. we saw how those are ingested & set up by containerd into a running container.

the interesting beating heart here is that this is all a story about layered filesystems. every piece of this chain exists to get a stack of directory trees merged into a single view that a process can use as its root filesystem.

in part 2, we’ll look at how all this machinery actually works. we’ll build images with shared layers, trace how containerd’s prepare-apply-commit loop unpacks them, inspect writable layers and pivot_root, and see how layer sharing saves disk space.

appendix

layers are diffs

if a file is byte-for-byte identical in two layers — same content, same metadata — build tooling won’t include it in the upper layer’s tarball. since we’re using overlay mounts, a file in a lower layer is already visible in the merged view, so there’s no reason to duplicate it.

each layer only contains what changed relative to the layers below it.

cleanup

to clean up everything we created:

# remove the container (if still running)
sudo ctr -n "$CTR_NAMESPACE" tasks kill demo 2>/dev/null
sudo ctr -n "$CTR_NAMESPACE" containers rm demo 2>/dev/null

# remove the image
sudo ctr -n "$CTR_NAMESPACE" images rm docker.io/library/handroll:latest

# remove the working directory
rm -rf "$WORKDIR"

https://anniecherkaev.com/what-is-a-container-image

how does my container get a root fs?

Mar 2, 2026

this is part 2 of a two-part series on how container images and filesystems work:

what is a container image?
how does my container get a root filesystem?

intro

in part 1, we built a two-layer OCI image by hand, imported it into containerd, and ran it. we saw that a container image is an ordered list of filesystem layers plus metadata, and that the container runtime merges them into a single view using overlayfs.

in this post, we’ll go deeper. we’ll build two images that share a base layer, then trace exactly how containerd unpacks them — the overlay mount mechanics, the prepare-apply-commit loop, writable layers, and pivot_root. by the end, you’ll understand the full chain from a downloaded image to a running container’s root filesystem.

the reason containers use these layered filesystems is sharing. lots of containers might use the same base (like ubuntu), and with layers that base only needs to exist on disk once. the ordering of the lower layers matters — it’s a list, not a set. we’ll refer to layers having “parent” layers.

the container runtime first figures out how to extract the layers and how they relate to one another.

then, when it’s ready to create a running container, it creates an overlay mount by passing in a writable layer plus all the read-only layers. this tells the kernel to treat all the layers as part of the same overlay filesystem, so when you look at it, it looks like a “normal” filesystem — the kernel stitches it together for you behind the scenes.

finally, the container runtime “pivots” the root within the container to this newly created overlay mount. because of this, when you enter a container, all you see is the unified filesystem view, and the view of the host’s filesystem is gone.

we’ll look at all of this through hands-on demos below.

table of contents

prerequisites
why a union filesystem?
containerd components
hands-on with overlay mounts
the prepare-apply-commit loop
writable layer
pivot_root
summary
appendix a: layer sharing
appendix b: volumes

0. prerequisites

same setup as part 1, plus docker: you need a linux machine with containerd, docker, ctr, jq, and tree. see the demo-instance-cdk for a preconfigured environment.

build & import two images

to demonstrate layer sharing and the unpack loop, we need two images that share a base layer. i’ll make one, and my bff dasha will make one. dasha’s is a little less fancy with only two layers (sorry dash) but we’ll use it to demonstrate how we can share layers.

annie’s image (3 layers — busybox base + 2 RUN layers):

FROM busybox

RUN echo "hello from layer 2" > /hello.txt \
  && mkdir -p /data \
  && echo "layer2 config" > /data/config.txt

RUN echo "hello from layer 3" > /hello.txt \
  && echo "i only exist in layer 3" > /layer3.txt

dasha’s image (2 layers — same busybox base + 1 RUN layer):

FROM busybox

RUN echo "hi dasha" > /hi-dasha.txt

both images share the same busybox base layer. let’s build and import them:

export WORKDIR=~/container-demo
export CTR_NAMESPACE=spelunking
mkdir -p "$WORKDIR"   # recreate it if you cleaned up after part 1

# build with docker
docker build -t annies-image -f Dockerfile.annie .
docker build -t dashas-image -f Dockerfile.dasha .

# export as tarballs
docker save annies-image -o "$WORKDIR/annies-image.tar"
docker save dashas-image -o "$WORKDIR/dashas-image.tar"

# import annie's image into containerd
# (we'll import dasha's later for the layer sharing demo)
sudo ctr -n "$CTR_NAMESPACE" images import "$WORKDIR/annies-image.tar"

# verify
sudo ctr -n "$CTR_NAMESPACE" images ls

1. why a union filesystem?

union filesystems let containers share layers.

if you have containers that share the same base layers, you can re-use them directly. this saves on network bandwidth and disk, and lets containers start up faster once they hit a node.

as far as i can tell, this was the default choice for container filesystems from the get-go. it is, however, not the only option. containerd supports pluggable “snapshotters” — you could use one that doesn’t do layering at all (like the native snapshotter, which just copies files).

the downside: because layers are shared, you can’t just untar everything into a single directory and call it done. you need machinery to track which layers exist, how they relate to each other, and how to mount them. that’s what containerd’s unpack pipeline does.

2. containerd components

containerd’s image-to-filesystem pipeline has a few key components:

┌─────────────────────────────────────────────────────────┐
│                     content store                       │
│              (raw blobs: tarballs, configs)             │
└──────────────────────┬──────────────────────────────────┘
                       │
                       ▼
              ┌──────────────────┐
              │     unpacker     │
              │                  │
              │  for each layer: │
              │  ┌─► prepare ───┐│
              │  │   apply      ││
              │  │   commit ◄───┘│
              │  └───────────────│
              └────────┬─────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────┐
│                    snapshotter                          │
│          (unpacked layer dirs on disk, chained          │
│           via parent relationships)                     │
└─────────────────────────────────────────────────────────┘

content store: where downloaded blobs live. the raw layer tarballs, image configs, and manifests, all stored by digest. this is the “what was downloaded” storage.

snapshotter: manages the “what’s on disk” storage — one directory per unpacked layer. the overlayfs snapshotter stores them under /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/. each snapshot knows its parent, forming a list that mirrors the image’s layer ordering.

unpacker + applier: the orchestration logic that reads blobs from the content store and unpacks them into snapshots. for each layer, it runs the prepare-apply-commit loop (more on this below).

at this point, no layers are mounted or merged. the snapshots are just directories on disk. the overlay mount that creates the unified view happens later, when you actually run a container.
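
one related detail from the OCI image spec: a layer’s position in the stack is identified by a ChainID, which folds in the DiffIDs of everything below it. the long sha256 names that ctr snapshots ls prints for committed layers are these ChainIDs. a minimal sketch of the formula, with a made-up chain_id helper and placeholder DiffIDs:

```shell
set -e

# chain_id DIFFID...: fold a list of DiffIDs (bottom layer first) into a ChainID.
# first layer: ChainID = DiffID
# afterwards:  ChainID = sha256 of "<parent ChainID> <this DiffID>"
chain_id() {
  chain=$1
  shift
  for diff_id in "$@"; do
    chain="sha256:$(printf '%s %s' "$chain" "$diff_id" | sha256sum | cut -d' ' -f1)"
  done
  printf '%s\n' "$chain"
}

# placeholder DiffIDs for illustration only (real ones are sha256:<64 hex chars>)
base="sha256:aaaa"
annie_l2="sha256:bbbb"
dasha_l2="sha256:cccc"

echo "base:          $(chain_id "$base")"
echo "annie layer 2: $(chain_id "$base" "$annie_l2")"
echo "dasha layer 2: $(chain_id "$base" "$dasha_l2")"
```

two images that start from the same base get the same ChainID for that base, so they can share one snapshot. as soon as the layers diverge, the ChainIDs diverge too.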

3. hands-on with overlay mounts

before we trace containerd’s unpack loop, let’s build an overlay mount from scratch — no containers, just raw linux filesystem calls.

what is overlayfs?

overlayfs is a kernel filesystem that layers directories on top of each other. you give it a stack of read-only “lower” directories and one writable “upper” directory, and it presents a “merged” directory that looks like all of them combined.

build a tiny overlay

# create the directories
OVERLAY_DIR=$(mktemp -d)
mkdir -p "$OVERLAY_DIR"/{lower1,lower2,upper,work,merged}

# populate the lower layers
echo "from lower1" > "$OVERLAY_DIR/lower1/unique-to-lower1.txt"
echo "from lower1" > "$OVERLAY_DIR/lower1/shared.txt"

echo "from lower2" > "$OVERLAY_DIR/lower2/unique-to-lower2.txt"
echo "from lower2" > "$OVERLAY_DIR/lower2/shared.txt"  # shadows lower1's version

# mount the overlay
sudo mount -t overlay overlay \
  -o "lowerdir=$OVERLAY_DIR/lower2:$OVERLAY_DIR/lower1,upperdir=$OVERLAY_DIR/upper,workdir=$OVERLAY_DIR/work" \
  "$OVERLAY_DIR/merged"

note that lowerdir lists directories from top to bottom — lower2 takes priority over lower1.

before any writes:

┌─────────────────────────────────────────────────────┐
│  merged (mount point)                               │
│    unique-to-lower1.txt = "from lower1"             │
│    unique-to-lower2.txt = "from lower2"             │
│    shared.txt           = "from lower2"             │  (lower2 shadows lower1)
├─────────────────────────────────────────────────────┤
│  upper (empty)                                      │
├─────────────────────────────────────────────────────┤
│  lower2: unique-to-lower2.txt, shared.txt           │
├─────────────────────────────────────────────────────┤
│  lower1: unique-to-lower1.txt, shared.txt           │
└─────────────────────────────────────────────────────┘

explore: reads, writes, shadowing

reading: files from both lowers are visible

the kernel checks each layer from top to bottom until it finds the file:

read unique-to-lower1.txt:            read unique-to-lower2.txt:
  upper  (miss)                         upper  (miss)
  lower2 (miss)                         lower2 (hit!) → "from lower2"
  lower1 (hit!) → "from lower1"

cat "$OVERLAY_DIR/merged/unique-to-lower1.txt"   # "from lower1"
cat "$OVERLAY_DIR/merged/unique-to-lower2.txt"   # "from lower2"

shadowing: the topmost layer wins

shared.txt exists in both lower1 and lower2. the kernel finds lower2’s copy first and stops looking:

read shared.txt:
  upper  (miss)
  lower2 (hit!) → "from lower2"
  lower1 (has it, but never reached)

cat "$OVERLAY_DIR/merged/shared.txt"   # "from lower2"

writing a new file: goes to the upper layer

new files are always created in the writable upper layer:

write new-file.txt:
  upper  ← "new file" (created here)
  lower2 (untouched)
  lower1 (untouched)

echo "new file" > "$OVERLAY_DIR/merged/new-file.txt"
ls "$OVERLAY_DIR/upper/"               # new-file.txt appears here

modifying a lower file: copy-up

when you modify a file that lives in a lower layer, the kernel copies it up to upper first, then modifies the copy. the lower original is untouched:

modify unique-to-lower1.txt:
  upper  ← "modified" (copied up, then modified)
  lower2 (untouched)
  lower1 unique-to-lower1.txt = "from lower1" (still intact!)

echo "modified" > "$OVERLAY_DIR/merged/unique-to-lower1.txt"
cat "$OVERLAY_DIR/upper/unique-to-lower1.txt"    # "modified" (copy-up happened)
cat "$OVERLAY_DIR/lower1/unique-to-lower1.txt"   # "from lower1" (unchanged!)
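
if you can’t mount an overlay right now, copy-up is easy to mimic in userland. this is only a model of the behavior: overlay_append and the paths are made up, and the kernel does this transparently on the first write:

```shell
set -e
cu=$(mktemp -d)
mkdir -p "$cu/upper" "$cu/lower"
echo "from lower" > "$cu/lower/notes.txt"

# overlay_append RELATIVE_PATH TEXT (hypothetical helper): mimic copy-up by
# copying the lower file into upper on first write, then changing only the copy
overlay_append() {
  if [ ! -e "$cu/upper/$1" ] && [ -e "$cu/lower/$1" ]; then
    cp -p "$cu/lower/$1" "$cu/upper/$1"    # the "copy-up"
  fi
  echo "$2" >> "$cu/upper/$1"
}

overlay_append notes.txt "added later"

cat "$cu/upper/notes.txt"    # -> two lines: "from lower", then "added later"
cat "$cu/lower/notes.txt"    # -> from lower
```

note that the whole file is copied even though only one line changed, which is why modifying a large file from an image layer can be surprisingly expensive the first time.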

deleting: creates a whiteout in upper

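deleting a file doesn’t remove it from the lower layer. instead, the kernel creates a whiteout in upper that hides it from the merged view. on disk, an overlayfs whiteout is a character device with device number 0/0 that has the same name as the deleted file (the .wh. prefix from part 1 is the tar-level convention; containerd translates it into this form when it unpacks a layer):

delete unique-to-lower2.txt:
  upper  ← unique-to-lower2.txt (whiteout: char device 0/0)
  lower2 unique-to-lower2.txt = "from lower2" (still intact!)
  lower1 (untouched)

rm "$OVERLAY_DIR/merged/unique-to-lower2.txt"
ls -la "$OVERLAY_DIR/upper/"           # unique-to-lower2.txt shows up as a "c" device with 0, 0
ls "$OVERLAY_DIR/merged/"              # unique-to-lower2.txt is gone from merged view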
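
on disk, the kernel’s overlayfs records a whiteout as a character device with device number 0/0 that carries the deleted file’s own name (the .wh. prefix belongs to the OCI tar format, not to the mounted filesystem). a small sketch for spotting one, assuming GNU or busybox stat; is_overlay_whiteout is a made-up helper name:

```shell
# is_overlay_whiteout PATH (hypothetical helper): true when PATH is a character
# device whose major and minor numbers are both 0
is_overlay_whiteout() {
  [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# sanity check on files every system has: neither is a whiteout
for path in /dev/null /etc/passwd; do
  if is_overlay_whiteout "$path"; then
    echo "$path: whiteout"
  else
    echo "$path: not a whiteout"
  fi
done
```

while the demo overlay is still mounted, is_overlay_whiteout "$OVERLAY_DIR/upper/unique-to-lower2.txt" should succeed after the rm above.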

after writes:

┌─────────────────────────────────────────────────────┐
│  merged (mount point)                               │
│    unique-to-lower1.txt = "modified"                │  (from upper, copy-up)
│    shared.txt           = "from lower2"             │
│    new-file.txt         = "new file"                │  (from upper)
│    (unique-to-lower2.txt is gone)                   │
├─────────────────────────────────────────────────────┤
│  upper:                                             │
│    unique-to-lower1.txt      "modified"             │
│    new-file.txt              "new file"             │
│    unique-to-lower2.txt      (whiteout: c 0,0)      │
├─────────────────────────────────────────────────────┤
│  lower2: unique-to-lower2.txt, shared.txt           │  (untouched)
├─────────────────────────────────────────────────────┤
│  lower1: unique-to-lower1.txt, shared.txt           │  (untouched)
└─────────────────────────────────────────────────────┘

key takeaways:

reads fall through: the kernel checks upper first, then lower2, then lower1
writes always go to the upper layer
modifying a lower file triggers a “copy-up” — the file is copied to upper, then modified there. the lower original is untouched
deleting creates a whiteout in upper (a 0/0 character device with the deleted file’s name). the lower file still exists, but the merged view hides it

cleanup:

sudo umount "$OVERLAY_DIR/merged"
# overlayfs leaves a root-owned directory inside work/, so remove with sudo
sudo rm -rf "$OVERLAY_DIR"

how this relates to containers

containerd’s snapshotter unpacks each image layer into its own directory under snapshots/<n>/fs/. these become the lowerdirs. when a container starts, containerd creates one more directory as the writable upper layer, then mounts everything together.

one detail: when a snapshot has no parent (for example, while the base layer is being unpacked), containerd uses a bind mount instead of an overlay mount, since there’s nothing to merge.

4. the prepare-apply-commit loop

when containerd imports an image, it unpacks each layer through a three-step loop:

for each layer in the image:

  ┌──────────┐      ┌─────────┐      ┌─────────┐
  │ prepare  │─────►│  apply  │─────►│ commit  │
  │          │      │         │      │         │
  │ create a │      │ untar   │      │ mark as │
  │ staging  │      │ layer   │      │ ready   │
  │ dir      │      │ into it │      │         │
  └──────────┘      └─────────┘      └─────────┘

let’s define each step:

prepare: the snapshotter creates a new staging directory. if this layer has a parent (i.e., it’s not the base layer), the staging directory is set up with the parent’s snapshot as its lower layer, so the apply step can see files from previous layers. this matters because layers are diffs: you might need to inherit, say, directory permissions from your parent layer.

apply: the applier untars the layer blob from the content store into the prepared directory. since the directory has visibility into parent layers (via the overlay or bind mount from prepare), the untar can handle things like file ownership inherited from parent layers.

commit: the snapshotter marks the snapshot as committed (read-only). this is a containerd state transition: the staging directory becomes a permanent, immutable snapshot that can be used as a parent for the next layer. note! this “immutable committed snapshot” is a containerd application-level concept: it means containerd will not mutate that directory anymore. that immutability isn’t enforced at the filesystem level; on disk, the directory is the same as it was in the previous step.

this loop runs once per layer, building up the snapshot chain:

layer 0 (base):
  prepare → apply busybox.tar → commit
  result: snapshot 1 (busybox files)

layer 1:
  prepare(parent=snapshot 1) → apply layer2.tar → commit
  result: snapshot 2 (layer 2 files, parent=1)

layer 2:
  prepare(parent=snapshot 2) → apply layer3.tar → commit
  result: snapshot 3 (layer 3 files, parent=2)
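
to make the loop concrete, here’s a toy version over plain directories and tarballs. everything in it is an assumption for illustration: the `staging/` and `committed/` layout, the `parent` file, and the two stand-in layers are made up. containerd tracks this state in its own metadata database, and the rename below only imitates the commit transition.

```shell
#!/bin/sh
# toy prepare -> apply -> commit loop (hypothetical layout, not containerd's)
ROOT=$(mktemp -d)
mkdir -p "$ROOT/staging" "$ROOT/committed" "$ROOT/src/a" "$ROOT/src/b"

# build two small stand-in layer tarballs (not the post's busybox layers)
echo "hello from stand-in layer a" > "$ROOT/src/a/hello.txt"
echo "hello from stand-in layer b" > "$ROOT/src/b/hello.txt"
tar -C "$ROOT/src/a" -cf "$ROOT/layer-a.tar" .
tar -C "$ROOT/src/b" -cf "$ROOT/layer-b.tar" .

PARENT=""
N=0
for blob in "$ROOT/layer-a.tar" "$ROOT/layer-b.tar"; do
  N=$((N + 1))
  # prepare: new staging dir that remembers which snapshot it sits on
  mkdir -p "$ROOT/staging/$N/fs"
  echo "$PARENT" > "$ROOT/staging/$N/parent"
  # apply: untar the layer diff into the staging dir
  tar -C "$ROOT/staging/$N/fs" -xf "$blob"
  # commit: from here on this directory is treated as read-only
  mv "$ROOT/staging/$N" "$ROOT/committed/$N"
  PARENT=$N
done
```

each committed directory holds only its own diff. snapshot 2 doesn’t contain snapshot 1’s files, it just records that snapshot 1 is its parent.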

inspect the results

after importing annie’s 3-layer image, let’s look at what containerd produced:

# list content store blobs
sudo ctr -n "$CTR_NAMESPACE" content ls

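# list snapshots: notice the parent chain in the PARENT column
sudo ctr -n "$CTR_NAMESPACE" snapshots ls

# show details for a single snapshot (substitute a key from the listing above)
sudo ctr -n "$CTR_NAMESPACE" snapshots info <snapshot-name>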

# look at the snapshot directories on disk
SNAPSHOTS_ROOT="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
sudo ls "$SNAPSHOTS_ROOT"

# each snapshot has an fs/ directory with the unpacked layer
for snap in $(sudo ls "$SNAPSHOTS_ROOT"); do
  echo "--- snapshot $snap ---"
  sudo ls "$SNAPSHOTS_ROOT/$snap/fs/"
  if sudo test -f "$SNAPSHOTS_ROOT/$snap/fs/hello.txt"; then
    echo "hello.txt = $(sudo cat "$SNAPSHOTS_ROOT/$snap/fs/hello.txt")"
  fi
done

none of these snapshots are mounted yet. they’re just directories. the mounting happens when we start a container.

5. writable layer

when containerd starts a container, it adds one more layer on top: the writable layer (also called the “active” snapshot or upper directory).

┌──────────────────────────────────────┐
│          merged view (rootfs)        │
├──────────────────────────────────────┤
│  upperdir (writable, active snapshot)│  ← container writes go here
├──────────────────────────────────────┤
│  lowerdir[2]: layer 3 snapshot       │  (read-only, committed)
├──────────────────────────────────────┤
│  lowerdir[1]: layer 2 snapshot       │  (read-only, committed)
├──────────────────────────────────────┤
│  lowerdir[0]: busybox snapshot       │  (read-only, committed)
└──────────────────────────────────────┘

let’s see this in action. before we start the container, we can preview what containerd is about to do. ctr snapshots prepare creates the writable layer on top of the committed snapshot chain, and ctr snapshots mounts shows us the exact overlay mount command the runtime will use:

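# get the top layer's chain ID (the snapshot name for the topmost committed layer)
# note: this assumes the top layer is listed last; check the PARENT column of `snapshots ls` if unsure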
TOP_SNAPSHOT=$(sudo ctr -n "$CTR_NAMESPACE" snapshots ls | tail -1 | awk '{print $1}')

# prepare a writable layer on top of the committed chain
sudo ctr -n "$CTR_NAMESPACE" snapshots prepare demo-active "$TOP_SNAPSHOT"

# see what the overlay mount will look like
sudo ctr -n "$CTR_NAMESPACE" snapshots mounts /tmp/demo-mountpoint demo-active

the mounts command prints the exact mount -t overlay invocation containerd will use — you can see the lowerdirs (the committed snapshots) and the upperdir (the new writable layer). this is exactly what happens behind the scenes when ctr run starts a container.
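
if you want to pull the layers out of that option string programmatically, something like this works. the `OPTS` value is a made-up example in the general shape of an overlay mount’s options, and the snapshot ids in it are placeholders, not output from a real host. `get_opt` is a hypothetical helper.

```shell
#!/bin/sh
# split an overlay option string into upperdir and lowerdirs
S=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
# made-up example: snapshot 4 is the writable layer, 3/2/1 are committed layers
OPTS="workdir=$S/4/work,upperdir=$S/4/fs,lowerdir=$S/3/fs:$S/2/fs:$S/1/fs"

# get_opt KEY: print the value of KEY=... from the comma-separated option string
get_opt() {
  printf '%s\n' "$OPTS" | tr ',' '\n' | sed -n "s/^$1=//p"
}

UPPER=$(get_opt upperdir)
# lowerdir entries are colon-separated, topmost layer first
LOWERS=$(get_opt lowerdir | tr ':' '\n')
TOP_LOWER=$(printf '%s\n' "$LOWERS" | head -n 1)
BASE_LOWER=$(printf '%s\n' "$LOWERS" | tail -n 1)
```

the ordering is worth remembering: in an overlay mount the leftmost lowerdir is the top of the stack and the rightmost is the base.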

let’s clean up that preview and do it for real:

sudo ctr -n "$CTR_NAMESPACE" snapshots rm demo-active

now let’s start the container:

# run a container in the background
sudo ctr -n "$CTR_NAMESPACE" run -d docker.io/library/annies-image:latest demo-annie /bin/sh -c "sleep 3600"

# find the overlay mount on the host
mount | grep overlay | grep "$CTR_NAMESPACE"
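# or look at the container process's own mount table
# (the second column of `tasks ls` is the PID)
TASK_PID=$(sudo ctr -n "$CTR_NAMESPACE" tasks ls | grep demo-annie | awk '{print $2}')
sudo grep overlay "/proc/$TASK_PID/mountinfo"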

you’ll see the mount with upperdir=<path> — that’s the writable layer. let’s write a file from inside the container and find it on the host:

# write a file inside the container
sudo ctr -n "$CTR_NAMESPACE" tasks exec --exec-id test demo-annie /bin/sh -c "echo 'written at runtime' > /runtime-file.txt"

# find it in the upper directory on the host
UPPERDIR=$(mount | grep overlay | grep "$CTR_NAMESPACE" | grep -oP 'upperdir=\K[^,]+')
sudo cat "$UPPERDIR/runtime-file.txt"
# -> written at runtime

the file only exists in the upper directory. the lower snapshots are untouched.

now kill the container:

# PID 1 in a container ignores SIGTERM unless it installs a handler, so send SIGKILL
sudo ctr -n "$CTR_NAMESPACE" tasks kill -s SIGKILL demo-annie
sudo ctr -n "$CTR_NAMESPACE" containers rm demo-annie

the writable layer is gone. we can verify — the upperdir we found earlier no longer exists:

sudo ls "$UPPERDIR" 2>&1
# -> ls: cannot access '...': No such file or directory

but the read-only snapshots are still there — they belong to the image, not the container:

SNAPSHOTS_ROOT="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
sudo ls "$SNAPSHOTS_ROOT"

# verify snapshot contents are still intact
for snap in $(sudo ls "$SNAPSHOTS_ROOT"); do
  if sudo test -f "$SNAPSHOTS_ROOT/$snap/fs/hello.txt"; then
    echo "snapshot $snap: $(sudo cat "$SNAPSHOTS_ROOT/$snap/fs/hello.txt")"
  fi
done

the committed snapshots stick around as long as the image is imported. only the writable upper layer is ephemeral — it lives and dies with the container.

that’s why when you write a file inside a container — say, in the root directory — it doesn’t persist after the container stops. the upper directory is tied to the container’s lifetime, while the lower layers are tied to the image’s lifetime.

6. pivot_root

we have our overlay mount producing a merged filesystem. but when you exec into a container, that merged view is all you see. the host’s filesystem is completely gone. what’s up with that?

the answer is pivot_root.

pivot_root is a linux syscall that swaps the root filesystem of a process’s mount namespace. the container runtime:

creates a new mount namespace for the container (via unshare or clone)
mounts the overlay at a temporary location
calls pivot_root to make the overlay mount the new /
unmounts the old root

before pivot_root:
┌─────────────────────────────┐
│  /  (host root)             │
│  ├── /home/...              │
│  ├── /var/lib/containerd/...│
│  └── /tmp/container-root/   │  ← overlay mounted here
│       ├── bin/              │
│       ├── hello.txt         │
│       └── ...               │
└─────────────────────────────┘

after pivot_root:
┌─────────────────────────────┐
│  /  (container root)        │  ← was /tmp/container-root/
│  ├── bin/                   │
│  ├── hello.txt              │
│  └── ...                    │
│  (host root is gone!)       │
└─────────────────────────────┘
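
here’s a rough sketch of those four steps using the `unshare` and `pivot_root` command-line tools from util-linux. it needs root, and it rests on a few assumptions: `enter_rootfs` is a made-up helper name, the directory you pass in holds a root filesystem with `/bin/sh`, `umount` and `rmdir` on its PATH (busybox usually provides them), and a bind mount stands in for the overlay mount. a real runtime like runc does this with syscalls rather than shell commands.

```shell
#!/bin/sh
# sketch of the pivot_root dance (needs root; only defines a function, runs nothing)
# usage, as root:  enter_rootfs /path/to/rootfs
enter_rootfs() {
  rootfs=$1
  # step 1: unshare --mount gives the child shell its own mount namespace
  unshare --mount sh -c '
    set -e
    # step 2: pivot_root needs new_root to be a mount point, so bind it onto itself
    mount --bind "$1" "$1"
    cd "$1"
    mkdir -p .old_root
    # step 3: "." becomes /, and the previous / is parked at /.old_root
    pivot_root . .old_root
    cd /
    # step 4: detach the old root so the host filesystem is no longer reachable
    umount -l /.old_root
    rmdir /.old_root
    exec /bin/sh
  ' sh "$rootfs"
}
```

because the mounts happen inside the new mount namespace, none of this is visible from the host’s mount table.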

see it in action

from inside a running container, you can verify the overlay mount is the root:

# start a container
sudo ctr -n "$CTR_NAMESPACE" run --rm -t docker.io/library/annies-image:latest demo-annie /bin/sh

# inside the container:
cat /proc/1/mountinfo | head -5

you’ll see that / is an overlay mount. the container has no visibility into the host’s filesystem — pivot_root made the overlay the entire world.

from the host, you can contrast this with the host’s view:

cat /proc/1/mountinfo | head -5

the host’s PID 1 has a completely different set of mounts. the container’s mount namespace is isolated.

but the host can still peek into the container’s root filesystem — the kernel exposes it via /proc/<pid>/root:

# from the host, find the container's PID
TASK_PID=$(sudo ctr -n "$CTR_NAMESPACE" tasks ls | grep demo-annie | awk '{print $2}')

# peek into the container's root from the host
# this is EXACTLY what we see when we exec into the container
sudo ls /proc/$TASK_PID/root/
# -> bin/  data/  dev/  etc/  hello.txt  layer3.txt  proc/  sys/

sudo cat /proc/$TASK_PID/root/hello.txt
# -> hello from layer 3

this is the same merged overlay view the container sees as /. the kernel just lets the host reach it through the proc filesystem, and the container itself has no idea.

summary

we’ve now traced the full path from a container image to a running container’s root filesystem:

registry / docker save
        │
        ▼
┌───────────────────────┐
│    content store      │   blobs stored by digest
└───────────┬───────────┘
            │
    prepare / apply / commit
    (once per layer)
            │
            ▼
┌───────────────────────┐
│     snapshotter       │   one directory per layer,
│                       │   chained via parents
└───────────┬───────────┘
            │
    overlay mount
    (lowerdirs + upperdir)
            │
            ▼
┌───────────────────────┐
│    merged rootfs      │   single unified view
└───────────┬───────────┘
            │
    pivot_root
            │
            ▼
┌───────────────────────┐
│  running container    │   overlay is now /
│  process              │
└───────────────────────┘

containers don’t have their own copy of a filesystem. this is one way containers differ from VMs: a container just has a partitioned view of the host. its filesystem is a view: a union of shared, read-only layers plus one ephemeral writable layer, pivoted to become the process’s root. the machinery exists to make sharing efficient and to make the layering invisible to the process inside.

appendix a: layer sharing

annie’s image is already imported. let’s see how many snapshots it created, then import dasha’s and watch sharing in action:

# check current state — only annie's image
sudo ctr -n "$CTR_NAMESPACE" images ls
sudo ctr -n "$CTR_NAMESPACE" snapshots ls

# now import dasha's image — she shares the same busybox base layer
sudo ctr -n "$CTR_NAMESPACE" images import "$WORKDIR/dashas-image.tar"

# check snapshots again
sudo ctr -n "$CTR_NAMESPACE" snapshots ls

you’ll notice that importing dasha’s 2-layer image created only one new snapshot, not two. the busybox base snapshot already existed from annie’s import, so containerd reused it. both images reference the same snapshot as their base layer, because the layer content (and therefore its DiffID) is identical.

annie's image:                  dasha's image:

  snapshot 3 (layer 3)
       │
  snapshot 2 (layer 2)            snapshot 4 (dasha's layer)
       │                               │
       └──────────┬────────────────────┘
                  │
            snapshot 1 (busybox base)     ← SHARED

the content store is also deduplicated — the busybox layer blob is stored only once.

SNAPSHOTS_ROOT="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
sudo ls "$SNAPSHOTS_ROOT"
# you'll see snapshots for each unique layer, not each image

this is why layer ordering matters and why the spec calls layers “diffs.” a layer isn’t a complete filesystem — it’s a delta relative to its parent. the same delta only makes sense if applied on top of the same parent chain. that’s why containerd tracks parent relationships, and why two images can share a layer only if they have the same ancestry up to that point.
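
the “same ancestry” rule falls out of how chain IDs are computed. the OCI image spec defines the chain ID of the base layer as its DiffID, and the chain ID of every later layer as the sha256 of the parent’s chain ID, a space, and the layer’s own DiffID. containerd names committed snapshots by chain ID. here’s a small sketch of that formula; the “layers” are made-up strings standing in for uncompressed layer tarballs, and `diff_id` and `chain_id` are hypothetical helper names.

```shell
#!/bin/sh
# chain IDs per the OCI image spec:
#   ChainID(L0)     = DiffID(L0)
#   ChainID(L0..Ln) = sha256( ChainID(L0..Ln-1) + " " + DiffID(Ln) )

# diff_id CONTENT: digest of a stand-in "layer" (a real DiffID hashes the uncompressed tar)
diff_id() {
  printf 'sha256:%s' "$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)"
}

# chain_id DIFFID...: fold a list of DiffIDs, bottom layer first
chain_id() {
  chain=$1
  shift
  for d in "$@"; do
    chain="sha256:$(printf '%s %s' "$chain" "$d" | sha256sum | cut -d' ' -f1)"
  done
  printf '%s' "$chain"
}

BASE=$(diff_id "busybox base")
ANNIE_L2=$(diff_id "annie layer 2")
DASHA_L2=$(diff_id "dasha layer")

ANNIE_BASE_CHAIN=$(chain_id "$BASE")
DASHA_BASE_CHAIN=$(chain_id "$BASE")               # same base, same chain ID: shared
ANNIE_TOP_CHAIN=$(chain_id "$BASE" "$ANNIE_L2")
DASHA_TOP_CHAIN=$(chain_id "$BASE" "$DASHA_L2")    # different diff: separate snapshot
REPARENTED=$(chain_id "$DASHA_L2" "$ANNIE_L2")     # same diff, different parent: not shared
```

the last line is the interesting one: annie’s layer 2 has the same DiffID in both cases, but on a different parent it gets a different chain ID, so it can’t reuse the existing snapshot.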

appendix b: volumes

if the writable layer is ephemeral, how do volume mounts persist data?

volumes work differently from the overlay. they’re bind mounts — a host directory is directly mounted into the container’s filesystem at a specific path. reads and writes go straight to the host directory. no overlay, no copy-up, no whiteout markers. so you write to the host, not to ephemeral storage.

https://anniecherkaev.com/images-to-fs

choosing learning over autopilot

Jan 11, 2026

I use ai coding tools a lot. I love them. I’m all-in on ai tools. They unlock doors that let me do things that I cannot do with my human hands alone.

But they also scare me.

As I see it, they offer me two paths:

✨ The glittering vision ✨

The glittering vision is they let me build systems in the way that the version of me who is a better engineer would build them. Experimentation, iteration and communication have become cheaper. This enables me to learn by doing at a speed that was prohibitive before. I can make better decisions about what and how to build because I can try out a version and learn where some of the sharp edges are in practice instead of guessing. I can also quickly loop in others for feedback and context. All of this leads to building a better version of the system than I would have otherwise.

☠️ The cursed vision ☠️

The cursed vision is I am lazy, and I build systems of ai slop that I do not understand. A lot of ink has been spilled about the perils and pains of ai slop, especially when working on a team that has to maintain the resulting code.

What scares me most is an existential fear that I won’t learn anything if I work in the “lazy” way. There is no substitute for experiential learning, and it accumulates over time. There are things that are very hard for me to do today, and I will feel sad if all of those things feel equally hard in a year, two years, five years. I am motivated by an emotional response to problems I find interesting, and I like problems that have to do with computers. I am afraid of drowning that desire by substituting semi-conscious drifting on autopilot for engaging with a problem.

And part of why this is scary to me is that even if my goal is to be principled, to learn, to engage, to satisfy my curiosity with understanding, it is really easy for me to coast with an llm and not notice. There are times when I am tired and I am distracted and I have a thing that I need to get done at work. I just want it done, because then I have another thing I need to do. There are a lot of reasons to be lazy.

So I think the crux here is about experiential learning:

ai tools make it so much easier to learn by doing, which can lead to much better results
but it’s also possible to use them to take a shortcut and get away without learning
- I deeply believe that the shortcut is a trap
- I also believe it is harder than it seems to notice and be honest about when I’m doing this

And so, I’ve been thinking about guidelines & guardrails: how do I approach my work to escape the curse, such that llms are a tool for understanding, rather than a replacement for thinking?

Here’s my current working model:

use ai-tooling to learn, in loops
ai-generated code is cheap and not precious; throw it away and start over several times
be very opinionated about how to break down a problem
“textbook” commits & PRs
write my final docs / pr descriptions / comments with my human hands

The rest of the blog post is a deeper look at these topics, in a way that I hope is pretty concrete and grounded.

what tasks are different with ai tooling

Things I now get to care less about:

the mechanics of figuring out how things are hooked together
the mechanics of translating pseudocode into code
figuring out what the actual code looks like

The times I’m using ai tools to disengage a problem and go fast are the times I’m only doing the things in this first category and getting away with skipping doing the things in the other two.

Things I cared about before and should still care about:

deciding which libraries are used
how the code is organized: files & function signatures
leaving comments that explain why something is set up in a way if there’s complication behind it
leaving docs explaining how things work
understanding when I need to learn something more thoroughly to get unblocked

Things I now get to care about that were expensive before:

more deeply understanding how a system works
adding better observability like nicely structured outputs for debugging
running more experiments

The times when I’m using ai tools to enhance my learning and understanding I’m doing the things in the latter two categories.

I will caveat that the appropriate amount of care and effort in an implementation depends, of course, on the problem and context. More is not always better. Moving slow can carry engineering risk and I know from experience that it’s possible for a team to mistake micromanagement for code quality.

I like to work on problems somewhere in the middle of the “how correct does this have to be” spectrum and so that’s where my intuition is tuned to. I don’t need things clean down to the bits, but how the system is built matters so care is worth the investment.

workflow

Here is a workflow I’ve been finding useful for medium-sized problems.

Get into the problem: go fast, be messy, learn and get oriented

Research & document what I want to build
1. I collab with the ai to dump background context and plans into a markdown file
  1. The doc at this stage can be rough
  2. A format that I’ve been using:
    1. What is the problem we’re solving?
    2. How does it work today?
    3. How will this change be implemented?
Build a prototype
1. The prototype can be ai slop
2. Bias towards seeing things run & interacting with them
Throw everything away. Start fresh, clean slate
1. It’s much faster to build it correctly than to fix it

Formulate a solution: figure out what the correct structure should be

Research & document based on what I know from the prototype
1. Read code, docs and readmes with my human eyes
2. Think carefully about the requirements & what causes complication in the code. Are those hard or flexible (or imagined!) requirements?
Design what I want to build, again
Now would be a good time to communicate externally if that’s appropriate for the scope. Write a one-pager for anyone who might want to provide input.
Given any feedback, design the solution one more time, and this time polish it. Think carefully & question everything. Now is the time to use my brain.
1. Important: what are the APIs? How is the code organized?
2. Important: what libraries already exist that we can use?
3. Important: what is the iterative implementation order so that the code is modular & easy to review?
Implement a skeleton, see how the code smells and adjust
Use this to compile a final draft of how to implement the feature iteratively
Commit the skeleton + the final implementation document

Implement the solution: generate the final code

Cut a new branch & have the ai tooling implement all the code based on the final spec
If it’s not a lot of code or it’s very modular, review it and commit each logical piece into its own commit / PR
If it is a lot of code, review it, and commit it as a reference implementation
1. Then, roll back to the skeleton branch, and cut a fresh branch for the first logical piece that will be its own commit / PR
2. Have the ai implement just that part, possibly guided by any ideas from seeing the full implementation
For each commit, I will review the code & I’ll have the ai review the code
I must write my own commit messages with descriptive trailers

One of the glittering things about ai tooling is that it’s faster than building systems by hand. I maintain that even with these added layers of learning before implementing, it’s still faster than what I could do before while giving me a richer understanding and a better result.

Now let me briefly break out the guidelines I mentioned in the intro and how they relate to this workflow.

learning in loops

There are a lot of ways to learn what to build and how to build it, including:

Understanding the system and integrations with surrounding systems
Understanding the problem, the requirements & existing work in the space
Understanding relationships between components, intended use-cases and control flows
Understanding implementation details, including tradeoffs and what an MVP looks like
Understanding how to exercise, observe and interact with the implementation

I’ll understand each area in a different amount of detail at different times. I’m thinking of it as learning “in loops” because I find that ai tooling lets me quickly switch between breadth and depth in an iterative way. I find that I “understand” the problem and the solution in increasing depth and detail several times before I build it, and that leads to a much better output.

I think there are two pitfalls in these learning loops: one is feeling like I’m learning when I’m actually only skimming, and the other is getting stuck, limited by what the ai summaries can provide. One intuition I’ve been trying to build is when to go read the original sources (like code, docs, readmes) myself. I have two recent experiences top-of-mind informing this:

In the first experience, a coworker and I were debugging a mysterious issue related to some file-related resource exhaustion. We both used ai tools to figure out what cli tools we had to investigate and to build a mental model of how the resource in question was supposed to work. I got stuck after getting output that seemed contradictory, and didn’t fit my mental model. My coworker got to a similar spot and then took a step out of the ai tooling to go read the docs about the resource with their human eyes. That led them to understand that the ai summary wasn’t accurate: it had missed some details that explained the confusing situation we were seeing.

This example really sticks out in my memory. I thought I was being principled rather than lazy by building my mental model of what was supposed to be happening, but I had gotten mired in building that mental model second-hand instead of reading the docs myself.

In the second experience, I was working on a problem related to integrating with a system that had a documented interface. I had the ai read & summarize the interface and then got into the problem in a way similar to the first step of the workflow I described above. I was using that to formulate an idea of what the solution should be. Then I paused to repeat the research loop but with more care: I read the interface with my human eyes, and found the ai summary was wrong! It wasn’t a big deal and I could shift my plans, but I was glad to have learned to pause and take care in validating the details of my mental model.

ai-generated code is throw-away code

I had a coworker describe working with ai coding tools like working on a sculpture. When they asked it to reposition the arm, it would accidentally bump the nose out of alignment.

The way I’m thinking about it now, it’s more like: instead of building a sculpture, I’m asking it to build me a series of sculptures.

The first one is rough-hewn and wonky, but lets me understand the shape of what I’m doing.

The next one or two are just armatures.

The next one might be a mostly functional sculpture on the latest armature; this lets me understand the shape of what I’m doing with much higher precision.

And then finally, I’ll ask for a sculpture, using the vetted armature, except we’ll build it one part at a time. When we’re done with a part, we’ll seal it so we can’t bump it out of alignment.

A year ago, I wasn’t sure if it was better to try to fix an early draft of ai generated code to be better, or to throw it out. Now I feel strongly that ai-generated code is not precious, and not worth the effort to fix it. If you know what the code needs to do and have that clearly documented in detail, it takes no time at all for the ai to flesh out the code. So throw away all the earlier versions, and focus on getting the armature correct.

Making things is all about processes and doing the right thing at the right time. If you throw a bowl and that bowl is off-center, it is a nightmare to try to make it look centered with trimming. If you want a centered bowl then you must throw it on-center. Same here, if you want code that is modular and well structured, the time to do that is before you have the ai implement the logic.

“textbook” commits and PRs

It’s much easier to review code that has been written in a way where a feature is broken up into an iteration of commits and PRs. This was true before ai tooling, and is true now.

The difference is that writing code with my hands was slow and expensive. Sometimes I’d be in the flow and I’d implement things in a way that was hard to untangle after the fact.

I believe that especially if I work in the way I’ve been describing here, ai code is cheap. This makes it much easier/cheaper for me to break apart my work into ways that are easy to commit and review.

My other guilty hesitation before ai tooling was I never liked git merge conflicts and rebasing branches. It was confusing and had the scary potential of losing work. Now, ai tooling is very good at rebasing branches, so it’s much less scary and pretty much no effort.

I also think that small, clean PRs are an external forcing function to working in a way that builds my understanding rather than lets me take shortcuts: if I generate 2.5k lines of ai slop, it will be a nightmare to break that into PRs.

i am very opinionated about how to break down a problem

I’m very opinionated in breaking down problems in two ways:

how to structure the implementation (files, functions, libraries)
how to implement iteratively to make clean commits and PRs

The only way to achieve small, modular, reviewable PRs is to be very opinionated about what to implement and in what order.

Unless you’re writing a literal prototype that will be thrown away (and you’re confident it will actually be thrown away), the most expensive part about building a system is the engineering effort that will go into maintaining it. It is, therefore, very worth-while to be opinionated about how to structure the code. I find that the ai can do an okay job at throwing code out there, but I can come up with a much better division and structure by using my human brain.

A time I got burned by not thinking about libraries & how to break down a problem was when I was trying to fix noisy errors due to a client chatting with a system that had some network blips. I asked an ai model to add rate limiting to an existing http client, which it did by implementing exponential backoff itself. This isn’t a very good solution; surely we don’t need to write that ourselves when a library can handle it. I didn’t think this one through, and was glad a coworker with their brain on caught it in code review.

i write docs & pr descriptions with my human hands

Writing can serve a few distinct purposes: one is communication, and distinct from that, one is as a method to facilitate thinking. The act of writing forces me to organize and refine my thoughts.

This is a clear smell-test for me: I must be able to write documents that explain how and why something is implemented. If I can’t, then that’s a clear sign that I don’t actually understand it; I have skipped writing as a method of thinking.

On the communication side of things, I find that the docs or READMEs that ai tooling generates often capture things that aren’t useful. I often don’t agree with their intuition; I find that if I take the effort to use my brain I produce documents that I believe are more relevant.

This isn’t to say that I don’t use ai tooling to write documents: I’ll often have ai dump information into markdown files as I’m working. I’ll often have ai tooling nicely format things like diagrams or tables. Sometimes I’ll have ai tooling take a pass at a document. I’ll often hand a document to ai tooling and ask it to validate whether everything I wrote is accurate based on the implementation.

But I do believe that if I hold myself to the standard that I write docs, commit messages, etc with my hands, I both produce higher quality documentation and force myself to be honest about understanding what I’m describing.

Conclusion

In conclusion, I find that ai coding tools give me a glittering path to understand better by doing, and using that understanding to build better systems. I also, however, think there is a curse of using these systems in a way that skips the “build the understanding” part, and that pitfall is subtler than it may seem.

I care deeply about leveraging these tools for learning and engaging, and I think it will be important in the long run. I’ve outlined the ways I’m thinking about how best to do this and avoid the curse:

use ai-tooling to learn, in loops
ai-generated code is cheap and not precious; throw it away and start over several times
be very opinionated about how to break down a problem
“textbook” commits & PRs
write my final docs / pr descriptions / comments with my human hands

https://anniecherkaev.com/choosing-learning-over-autopilot

workflows for ai coding

May 28, 2025

it’s bananas out there.

six months ago ai coding tools were take-them-or-leave-them. now they have foundationally changed my day to day work. and things are changing by the week.

approaching ai coding in mid 2025

here is how i think it’s worth approaching ai coding as of late may 2025:

use the tools
use the tools, better
improve the tools as you use them
improve your workflows for using the tools

there are lots of great guides on steps 0-2 of how to work with ai coding tools, like this one. generally i’m not trying to provide many notes on how to use these tools both because they’ll change very quickly, and because there are already many good guides. but, because i can’t help myself, i included a few brief thoughts in an appendix.

ai coding workflows

mostly though, i want to talk about a few thoughts about step 3, workflows for interacting with ai tooling.

i have an embarrassing secret which is that i was/am a productivity nerd. this means i have spent a lot of time assuaging my fears about how i’m not doing enough things by thinking about how to do more things, rather than actually doing them. in retrospect, this was a bad strategy because the fear of not doing enough things was a proxy for a deeper and scarier fear of not being enough, which, coincidentally, is not something you can fix with really good todo lists.

anyway, because of this, i have a lot of thoughts and opinions about todo lists, and how to take notes, and about attention and focus. a thing that interests me about ai tooling is how strongly ideas about how to organize personal workflows and notes and knowledge systems translate to workflows for working with ai coding tools.

tl;dr

there are 3 things that i want to talk about:

human attention

prompting tools these days generate 1-10 minute gaps in my workflows while they spit out code or text. which sucks.

even as this improves to hours, the way we work is going to have to change drastically.

waiting is really hard. task switching is really expensive. i talk about some more thoughts and things that may help, at least for my brain.

git workflows for managing lots of code changes

ai tooling generates a lot of code, which means we have to manage a lot of code changes now.

i thought i knew git pretty well, but after learning some better workflows, i now think that having better git workflows is going to be crucial for managing huge volumes of code changes. i talk about these workflows.

my feelings about the change we're experiencing

the other reason i’m writing this is it feels to me like we are standing on the precipice between an old world and a new world. software engineering, as a craft, is changing by the moment. it feels cool and exciting and significant and weird to be in the middle of it. i want to write this down so i remember what it feels like to be standing here, looking forward and looking back.

human attention span

currently my ai coding tasks run for a small handful of minutes. in the near future, they will run for an hour or several hours, and that will be a worthwhile improvement. now and in the future, this is a very different way of working than iterating on code in a flow state.

so what should we do with the downtime gaps?

presumably the answer is, and will increasingly need to be, work on multiple things at once.

the problem is that, at least in the naive implementation, it is really painful to task switch at that granularity. at least for the way my brain works. trying to swap between tasks whenever i have a minute of downtime feels like the worst kind of pulse width modulation.

wait what was i doing?        _  _   _    __     ____
task b:            _ ___    _         _                        _________    
task a:   __|  |__|     |__|  |____|   |__   |__     |__|__|__ 
time -->

pair programming, by myself

there is a blogpost that i have wanted to write for five years, called pair programming by myself. it’s about how i take notes. i have written several drafts that i’ve never published because when i read them they always sound too silly– the one sentence summary is i have found it absurdly helpful to take notes that are extremely overly detailed. i have a lot of feelings about detailed notes but i also feel ridiculous waxing on about it. so let me tell you the short version now:

in mid-2020, i suddenly stopped getting much work done, in a way that’s probably familiar to you. hello, pandemic. i had a todo list of things i probably should have been doing, but my brain was laser focused on the info about the world– were we all going to die? were our supply chains going to collapse? aside: i love connie willis, but the doomsday book was not a great choice of bookclub book in April 2020.

and eventually i had a project deadline come up, and thought i really really should get at least, you know, some work done. it was just impossibly hard to focus. and eventually i hacked it for myself with a ridiculously detailed todo list. it had things like:

- open the terminal
- cd into the project directory
- start jupyter notebook
- open the browser
- navigate to the notebook
- run the notebook, make sure it still compiles
- etc

it worked. i got my brain unstuck. it reminds me of uma thurman in kill bill “just wiggle your big toe”. i have a lot of feelings about how well this works for me, and i think this doesn’t generalize to all brains and certainly not to all tasks. my notes aren’t usually literally as detailed as the example above, but to give a sense of scale i write about two pages a day as i work. i still find it very helpful to separate out the planning from the flow state of doing.

it’s not about the notes, it’s about the way the notes enable my thinking. my notes enable me to extend my working memory and become better organized and oriented. i think they also let me stop mentally re-cataloging all the things i need to keep track of, which lets me focus. i also have a lot of feelings about this, and highly recommend reading michael nielsen’s “thought as technology”.

pair programming, stuck in navigator mode

a common division of roles in pair programming is the driver and the navigator. the driver is the person at the keyboard. the other person is the navigator; they’re responsible for having a bigger picture plan in mind and discovering the higher level strategic concerns.

i think of the process of making detailed plans as i work as pair programming by myself.

i have the feeling that as our ai coding models get increasingly better at driving, we’re going to be increasingly in navigator mode.

aside: ai coding models can and do also navigate: they work better if you ask them to plan and align on their plans before they jump into coding. so really we’ll be spending more time in some meta-navigation mode.

so then the question should be: what contexts make it easier to navigate?

pitch: consolidate navigating into a focus block

i suspect that an answer, for me and my brain, is to consolidate navigating.

specifically what i’m trying is writing out the steps of several tasks for the day at the start of the day. this lets me spend more time in breaking-up-work mode. and then when i switch to detail-oriented mode, i can stay there.

it doesn’t work all the time or for all tasks, but it’s better for my brain than trying to constantly interrupt myself to switch between two tasks

wait what was i doing?               _     _          _      __
task b:            ________    __     __     __    __     __       __
task a:   _________        |__|  |__|   |__ |  |__|  | __|  |   __| 
time -->

pitch: longer focus blocks

i think the real permanent solution will be longer focus blocks: the models have to run longer without baby-sitting. there are reports of folks already doing this! this just isn’t the case for most things that i do today.

any opportunity to improve how long a model can run uninterrupted is very worth investing in

aside on brains

when i say that this works for my brain but may not work for yours, i do really believe that. brains are different! it blew my mind to learn that people are pretty evenly split between having, and not having, an internal monologue. i have non-stop words in my head all the time, which seems correlated with why it’s so helpful to think in writing via detailed notes.

git skills

if you asked me a year ago if i was good at git, i’d say of course i’m good at git. and then one of my coworkers showed me much better workflows, and my life was much improved.

i now think these skills are pretty foundational for working with ai coding tools

ai coding tools generate a lot of code changes quickly
you have to manage all that code effectively
or else something will break and you won’t know when, where or why
- keep track of where the nearest exit is, keeping in mind it may be behind you

here are the specific recommendations, or skip ahead to the code below:

branches are free

here is how i used to code:

- write code
- finish writing code
- pick and choose pieces to put into a PR

this is how i write code now:

- create a WIP branch
- have the ai commit every time it finishes a task
- sometimes i add my own commits
	- these are mostly to remind myself that the state is known good, 
	  or known bad
- finish writing code
- cut a new PR branch, from the WIP branch
- blow away all the commits from the PR branch !
	- this seemed REALLY SCARY because i was sure i'd fuck up and 
	  delete files i cared about
	- but all the files i care about are safe&sound on the WIP branch!
- carefully split into commits, using rebase
- carefully split code review iterations across commits, using rebase

but also with a bit of this thrown in:

- WIP branch
- PR branch
- bug fixes on the PR branch
- cut a NEW PR branch
- some refactoring on the new PR branch
- cut a real PR branch for real this time
- use that to open a PR

branches are free!

to manage a proliferation of branches, i have an alias to get recent branches so i remember what i have in front of me. i provide it below but also you can just ask your robot friend to write it for you.
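a minimal sketch of one way to write that kind of helper, as a shell function instead of an alias. the name `recent` and the optional count argument are my own choices here; `--count`, `--sort` and the `%(committerdate:relative)` format field are standard `git for-each-ref` options. it prints how long ago each branch was last committed to, then a tab, then the branch name:

```shell
# list the most recently committed-to local branches, newest first.
# usage: recent [how-many]   (defaults to 10)
recent() {
  git for-each-ref \
    --sort=-committerdate \
    --count="${1:-10}" \
    --format='%(committerdate:relative)%09%(refname:short)' \
    refs/heads/
}
```

run `recent` inside any repo, or `recent 5` for a shorter list.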

commits are free

i have told the robot how commits should be structured. every time it finishes a task, it commits what happened in that task

i blow away these commits but it’s nice to have these in case it starts down a path that ends up being rotten. then you have context on how “known good” your various checkpoints are.
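if blowing away commits still sounds scary, here is a minimal sketch you can run in a throwaway directory. it assumes a local `main` branch stands in for `origin/main`, and the `demo/...` branch names and file names are made up for illustration. it shows that a plain `git reset` (mixed mode, the default) moves the branch pointer but leaves the files in the working tree, while the WIP branch keeps every checkpoint:

```shell
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo so nothing real is at risk
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base commit"

# WIP branch: lots of small checkpoint commits
git checkout -q -b demo/my-feature-wip
for i in 1 2 3; do
  echo "step $i" > "step$i.txt"
  git add "step$i.txt"
  git commit -q -m "checkpoint $i"
done

# PR branch cut from WIP, then the commits are blown away with a mixed reset
git checkout -q -b demo/my-feature-pr
git reset -q main

pr_commits="$(git rev-list --count main..demo/my-feature-pr)"
wip_commits="$(git rev-list --count main..demo/my-feature-wip)"
pending_files="$(git status --porcelain | wc -l | tr -d ' ')"

echo "PR branch commits ahead of main:  $pr_commits"
echo "WIP branch commits ahead of main: $wip_commits"
echo "files waiting to be re-committed: $pending_files"
```

the PR branch ends up with no commits of its own, but every file is still sitting in the working tree ready to be split into clean commits, and the WIP branch still has the full checkpoint history.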

commands

git checkout -b ac/my-feature-wip
# lots of commits, mostly auto-gen'd
# commit ALL the changes even temporary files

git checkout -b ac/my-feature-pr
git fetch
git reset origin/main # blows the commits away!

alias recent="git for-each-ref --sort=-committerdate refs/heads/ --format='%(refname:short)' | head -n 10"

# make some commits, probably using github desktop for careful line selection
git rebase -i origin/main
# tweak commit messages

# open a PR
# make some code review motivated change
git add <my change>
git commit -m "squash me after commit blah"
git rebase -i origin/main
# rearrange the new commit to be in the right spot, then squash it
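
git can also do the "rearrange, then squash" step for you. a minimal sketch, assuming a throwaway repo where a local `main` stands in for the remote base branch; the file and branch names are made up. `git commit --fixup` and `git rebase --autosquash` are standard git options, and setting `GIT_SEQUENCE_EDITOR=true` is just a trick to accept the generated todo list without opening an editor (drop it if you want to review the list by hand):

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base commit"

# two clean commits on the PR branch
git checkout -q -b demo/my-feature-pr
echo "parser v1" > parser.txt
git add parser.txt
git commit -q -m "add parser"
parser_commit="$(git rev-parse HEAD)"

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "add docs"

# code review asks for a change that belongs in the parser commit
echo "parser v2" > parser.txt
git add parser.txt
git commit -q --fixup "$parser_commit"   # message becomes "fixup! add parser"

# --autosquash moves the fixup next to its target and marks it to be squashed
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

commit_count="$(git rev-list --count main..HEAD)"
parser_contents="$(cat parser.txt)"
echo "commits on PR branch: $commit_count"
git log --format=%s main..HEAD
```

after the rebase the review change is folded into the "add parser" commit, so the branch is back to two commits.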

what am i missing?

do you have sick git workflows that i should know about? hit me up pls, inquiring minds want to know

conclusion

it feels to me like we’re in the middle of a laddering up on the tech tree, and it happens to be in the tech that is the craft that i practice. that feels extremely special and exciting. it’s a time to review our assumptions about what is possible.

it’s a prudent time to pay attention and understand the new tools available to us, learn them, and imagine how we can leverage them. it’s about unlocking projects and possibilities that were too expensive in effort before.

the two things i’m trying to remember right now, in my attempt to see the ocean from my place on the wave are:

go slower to go faster
change is coming, change is happening, be prepared to change again and again and again

aside

i didn’t use ai to write this blogpost. i get the heebie-jeebies from aesthetic homogeneity (looking at you, cooking blogs), and as there are more blogposts that have that corpopolished gleam to them, the more that speaks to my sense of dont tell me what to do. i have a lot of empathy for folks in creative fields who lament ai automating the fun, meaningful, and quality-inducing parts of their jobs. that sucks, i’m not stoked for that future. i want to make things that look like what i want them to look like, and i want you to make things that look like what you want them to look like

appendix: using the tools [better]

[recap] here’s a map of how to approach a relationship with ai tooling:

use the tools
use the tools, better
improve the tools as you use them
improve your workflows for using the tools

some brief notes:

0. use the tools:

- the tools work pretty well out of the box.  just try them.
    - try claude code
    - try roo plugin for vscode

1. use the tools better:

- use better prompts, ie:
	- "don't make assumptions; ask me clarifying questions"
	- "when I do X I see Y issue. list possible reasons why 
	   this might be the case, consult with me before implementing."
	- "for solution Z, how will we know if that fixes the bug? 
	write out a plan to validate and consult with me before you start."
- modes: use the right mode for the task
	- in roo: "ask", "debug", "code", "architect"
- use the right model for the task
	- larger context models do better at "ask" and "architecture" tasks
- understand what you can use the tools to do
	- some examples beyond "code", "debug" and "explain":
		- explain the history of a piece of code by looking at git
		- refactor a codebase by first adding tests to constrain 
		  the existing functionality
		- automate gathering data for a report
		- operational tasks: monitor deploys, ingest alerts and triage
		- project management
		- code review, incl explaining what a commit does, and
		  flagging complicated issues like async thread coordination

2. collaborate with the tools to improve themselves, via:

- save prompts, and tune them
	- [claude.md](https://docs.anthropic.com/en/docs/claude-code/memory), 
	   .roo/01_git_instructions.md
- turn crib sheets into saved prompts
	- you know the random notes you have in some txt file somewhere 
	on how to do something? smells like a thing you should automate
- architect first, then implement
	- "we need to implement feature X. Make a plan for how to do that, 
	write it down and ask me to review"
	- work with external systems and libraries via mcp servers & summaries
	- sessions
		- have it take notes on what it needs to do, and what it has 
		done so you can split the work across context windows
- **go slower to go faster**
	- i need this on a sticky note on my desk
	- taking the time to make improvements to the tools pays dividends, 
	and i think we're in a moment where the dividends are huge. 
	improving the tools will quickly become a non-optional step

3. improve your workflows for using the tools

- <the rest of the article>
- be prepared to change, repeatedly
- once things feel stable, coordinate with your team. [work together, be strong](https://qupqugiaq.com/)

https://anniecherkaev.com/the-times-they-are-ai-changing

project management preflight checklist

Jan 27, 2024

I think of “project management” as everything related to committing that a particular scope of work will be done by a particular date, and then increasing the odds that it actually gets done as promised.

I have strong feelings about project management because working on a project that is going poorly sucks. Having an opinion as early in the project as possible about what is (and is not) valuable to do, and how, and when, is, in my experience, the only avenue for minimizing probable future suffering.

It’s easier said than done. Projects of every kind, scale and media go off the rails all the time. My favorite fiasco is the Sydney Opera House which was a decade late and 1300% over budget.

In my opinion, the main culprits are:

there are so many different possible complications. It’s impossible to account for everything that can go wrong and how likely / impactful each complication is
social / political pressure to make overly-optimistic estimates
failure to adjust gracefully to changing realities

The goal isn’t to have every project go without a hitch, but instead to improve the odds that a project goes as well as possible given the resources and context of a given situation. This includes both building the best thing given the circumstances, and leaving everyone involved without a sour taste in their mouth.

Below are some principles that guide my thinking about project management and questions I find useful to ask along the way.

I continue to find new and exciting ways to be wrong about how long and difficult a project will be, but I think the rate at which I’m being surprised is decreasing. I’m just hoping that by the time I build a national opera house, the timeline will slip by less than a decade.

Principles

Do not build the wrong thing

A week of coding can save you an hour of planning.

Q: How do you know whether you are building the right thing?

A: Experiment and iterate

best approach: get an MVP in front of users
- this also lets you fail fast when your hypothesis turns out to be wrong
next best: find out from stakeholders what concerns they want to address. ie, product research or direction from customers
- not necessarily reliable; pay more attention to what the underlying concerns are rather than the specifics
- do not assume you know or understand what they want; keep asking questions
worst approach: form your own opinions about what is useful and build that in one go
- if you want accurate hypotheses, you must look for anti-evidence

Build incrementally in priority order

always work on the most important thing
at every point imagine that you got blind-sided by another project and no longer have time to work on this current project. Another way to say this is, imagine that instead of the timeline you thought you had, you now only have until Tuesday. Have you built the most core important thing thus far?
- don’t hang up the paintings until you’ve finished the dry-wall
this is another flavor of “do not build the wrong thing”

Front-load risk by prioritizing uncertain tasks first

if the plan hinges on a step you’re unsure will work, prove that part out first
- otherwise, if it turns out not to work, all the effort leading up to it may need to be thrown out as you pivot to another approach
if you’re not sure if the plan has steps that might not work, figure that out ASAP

Set expectations & re-negotiate as things change

not “if things change” but “as things change” because they will
if there’s a non-negotiable scope, engineers must be able to define the timeline; if there’s a non-negotiable deadline then engineers must be able to define the scope. (“you tell me what, I tell you when; you tell me when, I tell you what”)
every situation is negotiable; situations that are presented as non-negotiable are ones where someone is choosing not to negotiate
you will get new requirements after the project starts.
- it’s not a question of “if” you can add a new feature to the execution plan, but whether it is worth removing something else from the plan to make time
- time is bounded and non-expandable. You cannot add more features by working more. Hoping that there will be enough extra time is not a strategy.
if a deadline slips, involve stakeholders in renegotiations
- this builds buy-in on a new plan given the reality of the situation
keep detailed notes

Plan only when it’s worth reducing uncertainty

if you can tolerate higher uncertainty, executing is a better use of time than planning
it’s worth investing in planning if any of these things are true:
- real consequences to missing the original delivery date
- social / political pressure to commit to a tight deadline
- limited ability to renegotiate after initial timeline commitment

If you need a solid estimate, break down the project into smaller tasks

given a solid estimate and acceptable uncertainty, don’t sweat the details
- you can’t predict the future
- instead of trying to predict the future choose a comfortable cushion
- the amount of uncertainty that is acceptable depends on the situation & on the consequences of failing to meet the deadline

Optimize for maintainability

The hardest part of software engineering is maintaining the system in the future in the face of shifting use cases and unforeseen business directions. That will be more effort than writing the code. Make decisions guided by maintainability rather than ease of implementation.

we can’t predict the future, so bias towards flexibility to allow pivoting in the future
prefer data representations that generalize best
build a system that allows for the data representation to change in the future

Choose to take on tech debt intentionally

some (even many) code quality complaints are not worth addressing
categories of tech debt that must be addressed immediately for a long-lived project:
- it must be clear what the code is doing; code should be verbose
  - test cases are useful as documentation
  - abbreviated variable names are not acceptable. For those following golang’s half-century old ideology of sml vrbl names, may I recommend the marvelous modern tooling that we’ve developed since the typewriter days, such as tab completion
- code must have tests. Untested code is hard to change in the future because you don’t have confidence that your future changes won’t break something. run your tests often / don’t let them go stale / rejoice when they pass & rejoice when they fail
if you find yourself repeating sequences of steps, automate them
- especially if they are error prone

Beware of overpromising & underdelivering

it sounds obvious, but it’s surprisingly tempting to be optimistic
no one likes surprises. Reliable estimates lead to better outcomes than tight deadlines

Pre-flight checklist

Scoping:

what are we doing and why?
- who are the stakeholders?
- what concerns do they hold?
  - what is the root of the result they want?
  - sometimes specific requests obscure a simpler way to achieve the same level of functionality / satisfaction
- how will we evaluate whether what we build fulfills the intended goal
- what if we don’t do this project?
- if we do this, what will we not do instead? Is this more important than whatever is in the “wont do” pile?
what must be in the MVP?
- this is the MVP according to whom?
- for each item:
  - what if we didn’t include it?
  - is there a simpler version that we can include?
what’s missing? Things to consider:
- how will we know the implementation is correct?
- what operational metrics do we care about & how will we get them?
- what will we do when we find a bug in production?
- what are the known error conditions?
- how will we launch / rollout the project?
for each item in MVP:
- how much uncertainty is there? How possible are time-consuming surprises?
  - is it worth building any proof of concepts?
- break down large or uncertain tasks into finer tasks
  - trade-off between effort spent up-front vs taking it as it comes; how much it is worth putting effort into this depends on how important it is to give a reliable estimate
- any interactions with other systems to consider?

Sanity-checking the estimate:

how does this compare to other projects (ie a lot smaller, the same, a bit bigger)?
- does that comparison make sense?
effort audits of past projects
- what was surprising? why?

Execution:

who is working on what?
- are they around?
  - planned vacation or leave?
- do they have other commitments? Is it possible / likely that those go into crunch time?
  - if they have other project work in parallel, how painful will task-switching be?
- do they want to be working on this project? Will it be challenging & interesting to them?
what are the dependencies?
- what are the deepest dependencies? Untangle these first
what might not work? Where do we have uncertainty about how to implement?
- de-risk these first
what new work have we discovered while executing?
- how important is it that we do it immediately?

Scope creep:

does this need to be fixed / built / modified right now?
- useful to be intentional in what tech debt to take out and what to pay down

Remember that shit happens:

technical complications arise
- someone is stuck
- a large flaw or bug is revealed
  - if resolution is not cheap & obvious, explore possible remedies before committing to an approach
external pressures
- people get pre-empted, pulled to other efforts
  - requests related to compliance & security
  - urgent requests from company priorities
  - on-call requires support, ie large or complicated incidents
- code freeze, holidays & deployment / merge schedules may create complicated constraints
- chaos from above: reorgs, layoffs, abrupt changes in direction and priorities

Summary

To recap, the goal of project management is to do the best with what you’ve got, both in terms of what you build and doing right by everyone involved.

Summarizing the ideas of how to do that:

be as selective as possible about what to build
get feedback from customers / users early & often
keep everyone apprised and ready to re-prioritize and pivot as the ground shifts
optimistic timelines do not do anyone any favors, but it’s hard not to give optimistic timelines even knowing this

Appendix

All the images here are from midjourney 🤖❤️

thank yous

I don’t know how to attribute all the ideas above to the people who shaped my thinking, but I can at least thank some of them: most emphatically Rohan Ranade & Elise Jiang, as well as Zack Thomsen-Gray, Scott Moore, Tristan Ravitch, Avery Pennarun’s blog on planning and on tech debt, and Martin Nally’s talk on designing quality APIs

https://anniecherkaev.com/project-management-preflight-checklist

writing for lazy readers

Apr 24, 2023

My goal when writing a technical document is to create something that is easy to read.

I am a lazy reader. I skim things. Sometimes I read a blogpost and I can’t recall much except the topic sentence. I try to set up my technical writing so that when lazy readers like me read my writing they recall at least the main idea I’m trying to convey.

As much as I love whimsy, weird fonts, cookbooks with gilt edges and poetry books with unusual bindings, I like my technical writing to be the opposite– concise and literal. Maybe one day I will be a better writer and then I will write design documents in the form of dialogs between cute animals but for now I’m sticking to short sentences & bulleted lists.

Below is a summary of how I think about technical writing. At the bottom there’s a list of my favorite articles on technical writing.

Structure

Lazy reading

Here is the formula I learned for reading research papers:

read the abstract + intro
read all the section headers
read the conclusion
look at the graphs
optional: maybe read the paper

I try to write all kinds of documents assuming that my audience is following a similar strategy.

Writing for lazy readers

Here is the analogous formula that I follow when writing:

write a concise intro explaining why the audience should read the rest of the content
use headings and/or bolding to convey the main points at a glance
start each section with a roadmap of what that section will contain
conclude each section with a concise summary
include visuals when possible

This format helps the reader identify and pinpoint which sections they want to read.

Start with an elevator pitch

Here is the formula that I follow for an elevator pitch:

Sentence 1: what is the problem you are trying to solve
Sentence 2: what is the consequence of solving this problem (why should the reader care?)
Sentence 3: a one sentence summary of your approach

If I have thought deeply about something, it is interesting to me. The reader, however, probably won’t find the details interesting a priori. Therefore, if I start by explaining why they should care, they will be more likely to hang in there with me for the rest of the thoughts.

When you explain why the reader should care, you are telling a story. Human brains really like stories, and in particular we are motivated by the stories that we tell ourselves about the consequences of our actions.

In a technical document, you are probably telling a story. When you argue in favor or against an approach, you are telling a story. When you argue to prioritize or deprioritize a piece of work, you are telling a story.

By starting with an elevator pitch, you are offering your reader a role in the story you are telling.

The right level of detail for the expected audience

My ideal outcome as a reader is being told the information I need to know, only at the time that I need to know it. In practice this may mean being told the same concepts several times at increasing levels of detail, like iteratively zooming in on a map.

Multi-level summaries are a tool to let readers navigate in approximately this way. Another tool is to push off details into appendices & links / references / footnotes instead of including them in-line.

Format

Here is a list of the formatting standards I find most useful as a reader. See any of the other links on my blog for examples :)

Format lists as lists rather than in-line

ie, don’t do this:

The rest of this document outlines possible approaches to consider in painting the bikeshed which include doing nothing, painting it in rainbow stripes or making it glow in the dark.

do this instead:

The rest of this document outlines possible approaches to consider in painting the bikeshed which include:

doing nothing

painting it in rainbow stripes or

making it glow in the dark

Semi-structured logging; not just for machines

There may be some meta-data you want to provide in your text such as who wrote it and when.

Instead of including this kind of info in-line, extract it into an explicit structured header, ie:

author: anniecherk
date: April 24, 2023
status: bit-rotting

Short sentences, short paragraphs

ie, don’t do this:

This formula can feel hard to follow. If I have thought deeply about some technical thing, I want to tell you about the details of the thing I’ve thought deeply about because it is interesting to me. You as the audience, however, probably don’t care apriori about the thing I want to tell you about.

do this instead:

If I have thought deeply about something, it is interesting to me. The reader, however, probably won’t find the details interesting a priori.

Editing

The above example is verbatim from my first and final draft respectively. My first goal is to get the ideas out in whatever form, and only later make the ideas easier to understand through all the words.

When I edit my writing, my goals are:

refine the text so that the main ideas I want to convey are clear and front & center
rewrite to have more concise sentences, paragraphs & ideas

If the main ideas are not lit up in neon lights, I look for ways to make them more obvious including:

bulleted lists at the beginning and/or end that lay out the take-aways
using bolding or a different text color to highlight key ideas

Sources / Recommended Reading

Matt Might’s advice on how to write emails; also applicable to other types of writing: https://matt.might.net/articles/how-to-email/
Julian Shapiro’s writing handbook, especially the section on re-writing: https://www.julian.com/guide/write/rewriting
Simon Peyton Jones on how to write a great research paper; also applicable to other types of writing: https://www.microsoft.com/en-us/research/academic-program/write-great-research-paper/
Dan Luu on why you may consider learning to type faster (tl;dr, typing faster lets you iterate through ideas faster): https://danluu.com/productivity-velocity/
Andy Matuschak on how to write notes that are useful to yourself in the future: https://notes.andymatuschak.org/Evergreen_notes
I can’t find sources now for who taught me how to read research papers & give elevator talks, but sending my thanks into the universe :thanks:

https://anniecherkaev.com/thoughts-on-technical-writing

principles for keyboard layouts

Oct 15, 2022

I spent a few months iterating on a keyboard layout that I now like a lot, so I wanted to share my take-aways about the process and the principles I found useful along the way.

As you see in the top gif, I’m currently using a moonlander keyboard. This makes it extra-easy to iterate on keyboard layouts, however it’s also possible to redefine the keymapping on any keyboard, including the one built into your laptop.

Why a custom keyboard layout? The drama-filled history of dvorak should teach us that it is hard to measure and predict the benefits of layouts that intuitively seem more reasonable. To be honest, I mostly have a custom layout because I like it, and as I hope is evident in the rest of this blogpost, I enjoyed the process of thinking through which keys I type when and setting up the layout. It’s an aesthetic preference, tinged with a little intuition about ergonomics.

I like to think about objects, how they’re made, how I interact with them, and how I can make or customize or modify them. I think it’s fun to get clarity on how well or poorly an object is suited to how you use it.

For instance, here’s a thing I find absurd: I spend a lot of time typing special characters, and on a standard keyboard they’re terribly laid out.

To type special characters I either have to use my pinkies, move my hand off the home row, or both. My accuracy typing special characters is pretty bad in spite of spending a lot of time typing them, and I definitely can’t write code without looking at my keyboard.

This terrible layout isn’t a catastrophe; I can and do type all these poorly laid out special characters on a keyboard with a standard layout. It’s just that it could be so much better with some thought put into it: this is what makes this an aesthetically satisfying exercise.

To give you a visual comparison, here are three clips of hands typing this python function definition. I chose the code sample by taking the popular open source library numpy and choosing a function at random as a decent representation of standard python code. Look at all the special characters!

# Note: cross is the numpy top-level namespace, not np.linalg
def cross(x1: Array, x2: Array, /, *, axis: int = -1) -> Array:
    """
    Array API compatible wrapper for :py:func:`np.cross <numpy.cross>`.
    See its docstring for more information.
    """
    if x1.dtype not in _numeric_dtypes or x2.dtype not in _numeric_dtypes:
        raise TypeError('Only numeric dtypes are allowed in cross')
    # Note: this is different from np.cross(), which broadcasts
    if x1.shape != x2.shape:

Top-to-bottom:

1) me typing on my custom layout on my moonlander keyboard. I have two layers defined: an alphabetic layer which is laid out using the colemak layout, and a numeric + special char layer. You can tell when I am typing in the special character layer because the small LEDs at the top of my keyboard turn on blue while I am in that layer.

2) me typing on my laptop keyboard. Here I’m typing with the same colemak layout for the alphabetic characters but with the standard special char + numeric layout.

3) my partner-in-crime typing on a standard qwerty layout.

Here is what I notice when I look at these recordings:

the most drastic difference is (as expected, by design) between my layout (top video) and the standard layout (bottom video).
on my layout I am nearly constantly using my thumbs. I take this as a sign that I’ve correctly placed the most commonly typed keys in the thumb clusters.
on my layout my hands stay fairly steady on the home row (top video). Compare this to how much my hands move to reach the special characters on my laptop keyboard (middle video) and how much my partner’s hands move in the standard layout (bottom video).

Quick tour of my layout

My layout has an alphabet base layer, a special character layer, and a tiny third audio control layer. Here they are:

Layer 0 is the alphabet layer, laid out in colemak:

Layer 1 is the numeric + special key layer. I set up my special char layer by starting out with the default character layout, and riffing on it to fit the patterns I often typed:

Layer 2 is my tiny audio control layer, only the thumb cluster keys are mapped:

Principles for keyboard layouts

I only use keys I can reach from the homerow

Here is a photo of my keyboard– notice that the keys that are in far-off places have black blank keycaps on them. They are unmapped because I don’t need so many keys and I don’t like stretching:

I treat the two mapped keys in the bottom-most row on each keyboard half as an extension of the thumb cluster; I type those keys with my thumbs.

I also have a few hard-to-reach keys mapped to macros:

the fun big red thumb buttons. The left one opens my dropdown terminal, and the one on the right creates a new todo item in my todo list app of choice
the two keys in the inner-most column on each side. These are mapped to copy / paste / cut / spotlight search.

I type all these macro keys infrequently, and never mid-thought / in conjunction with other keys.

Keep it simple

As I showed above, I have 3 layers, which are really more like 2 layers and a tiny 3rd layer. On each layer, each key is only in one spot.

I don’t map any keys that I don’t need. On the standard layout these include:

the duplicate modifiers right-command, right-shift, right-option
caps lock
function keys
the function key overlays to control screen brightness & pull up the control center
the fn key

Do you ever type function keys? I do not. Why map keys you don’t use?

I find that duplicating keys and putting them in multiple spots just-in-case ends up being too clever for me. It’s easier to learn a layout where each key is exactly in one spot.

I have one duplicated key– the character - is mapped twice. Once it is conceptually a dash on the left side, placed adjacent to the underscore key, and it is mapped again conceptually as minus on the right side, adjacent to all the other arithmetic operators.

Place the most commonly used keys in the hottest spots

Here is my opinion about key positions ordered from best to worst:

thumb cluster keys, in order from outside in
home row keys, in order from inside out
keys within one key of the home row, with a slight preference for those above over those below
the two keys on the bottom row closest to the thumb cluster (I also type these with my thumbs)
all other keys are too far away
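
The ranking above suggests a simple greedy rule: sort keys by how often you type them, then hand out positions from best to worst. Here is a minimal sketch of that rule. The position labels and the counts are made-up placeholders for illustration, not measurements from any real layout, and a real layout also has to account for modifiers and conceptual grouping.

```python
def assign_keys(key_counts: dict[str, int], positions_best_first: list[str]) -> dict[str, str]:
    """Give the most frequently typed key the best remaining position."""
    if len(key_counts) > len(positions_best_first):
        raise ValueError("more keys than reachable positions")
    keys_by_frequency = sorted(key_counts, key=key_counts.get, reverse=True)
    return dict(zip(keys_by_frequency, positions_best_first))


# Hypothetical position labels, ordered best to worst as in the list above.
positions = ["thumb outer", "thumb inner", "home row index", "home row pinky", "row above home"]

# Made-up counts, only here to drive the example.
counts = {"space": 180, "e": 120, "backspace": 60, "q": 2}

layout = assign_keys(counts, positions)
for key, position in layout.items():
    print(f"{key:>10} -> {position}")
```

Raising an error when there are more keys than positions mirrors the last item in the ranking: anything that does not fit within reach simply does not get mapped.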

I looked at 3 things when thinking about my layout:

which keys do I type in combination with one another (where do I place modifiers?)
how often do I type keys
which keys are conceptually similar

Assigning the base-layer thumb cluster

The first question is which keys to put on the thumb cluster on the base layer. I chose:

space
enter
backspace

as my most frequently typed keys. I then chose my three favorite modifiers to fill the remaining slots:

command
alt
shift

This left control and layer-1 to map on the base layer. I decided it wasn’t worth bumping any alphabetic keys out of position to get a better position for these characters, so I put them both within reach of my left pinkie in the shift and caps lock position on a standard keyboard.

Additionally, I wanted to make sure that I could type all the modifiers in combination with one another. This meant that I could not put control in the place where I have right-arrow mapped, because I also hit that key with my left thumb, which would make it impossible to type control and command at the same time. I have command and shift next to each other, and when I want to type them simultaneously I press my thumb down between them so that it hits both.

Assigning the special-character-layer thumb cluster

I made a roughly ordered list of how often I type various special characters in both prose & code. It was ordered something like:

period
comma
apostrophe
double-quote
pound, equals, colon
question mark, exclamation point
underscore
semi-colon
parens, brackets, curly braces

This was flavored by my writing predominantly python code, which is why I favored pound (the comment character) and underscore (snake_case naming convention).
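
One way to build a rough list like this for your own typing is to count the special characters in text you have already written. The sketch below is an assumption about how you might do that, not how the list above was made. It tallies every character in Python's `string.punctuation` and uses the first line of the numpy sample from earlier as input.

```python
from collections import Counter
import string


def special_char_frequencies(text: str) -> list[tuple[str, int]]:
    """Count punctuation characters in text, most common first."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    return counts.most_common()


sample = "def cross(x1: Array, x2: Array, /, *, axis: int = -1) -> Array:"

for char, count in special_char_frequencies(sample):
    print(repr(char), count)
```

To get a ranking that reflects your own habits, swap `sample` for the contents of a few files you wrote yourself. Note that this only counts characters, so keys like space, enter and backspace would need a keylogger-style tool instead.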

I put my top two choices onto the outside of the thumb cluster, which is where my thumbs rest, making them the top choice key positions in the layer. I switched from using double-quotes to using single-quotes in python for strings, which let me bump double-quote down the priority list when making placements. I then mapped the next 4 top keys to the thumb cluster.

To lay out the rest of the special-character layer, I first blocked out the number characters into a numpad arrangement, same as the default layout. I often type 0’s (more frequently than any other number even maybe!) so I bumped the 0 from a hard-to-reach far-right-bottom-row location on the default layout to a much more accessible spot.

I then tried to group keys conceptually in the remaining space, again starting with the default layout and riffing on it to adjust to my uses. I order the placement of the grouped keys with the most frequent members of the grouping going on the home row, followed by the priority ordering: above home row, below home row, two below home row. Finally I filled in the remaining spaces with the remaining one-off keys.

Grouping conceptually similar keys

Here are a few examples of grouping keys together conceptually.

Brackets

I kept the parentheses, bracket & curly brace placement of moonlander’s default layout, and I added the angle brackets in the spare space beneath to group them with all the other bracket-like-characters. Taking the analogy one step further, I mapped the left/right arrow keys onto the same physical keys as the angle brackets, but on the layer below.

Arithmetic

Same story, I started with the default layout, but because I frequently type forward-slash (aka the divide symbol) while typing filepaths, I bumped it to an easier-to-type location on the home row rather than below it. I then moved the minus symbol to fill in the arithmetic column. Then I grouped the other slash, back-slash, into the position adjacent to forward-slash.

Process notes

I wrote a bit about my experience switching to colemak here, but that was more at the level of typing tutors. I also did two things in the physical world to make learning (and iterating on) my keyboard layout easier. One was labeling the keys, made easier by the specific keycaps I bought, and the other was creating a diagram which I could look at instead of at my hands.

This seems obvious in retrospect, but I was pretty hesitant to label the keys, and then struggled with how to label them in a way that worked, which is why I mention it here.

Keycap labels

I first tried to label the keys in a lot of ways that did not work. I started with sticking tiny cut out post-it notes over the keys, followed by attempting to tape those on, followed by blank keycaps that I tried to label with sharpie and nail polish. None of those worked.

The thing that did work was these keycaps. The keycap comes apart into two parts and lets you slip in a piece of paper between the layers so you can easily label the keys. They work great, and it’s an aesthetic bonus to have my keyboard annotated in my own handwriting.

I printed out a square grid, labeled the squares, cut them out, and put them in the keys. Here is my grid template. If you use it, check your printer resizing settings to make sure it prints at the correct size for your caps. It looks like this:

Key-map

The other thing I found helpful was printing out a blank key-map, filling it out, and color-coding it by which finger should hit which key to make it easier to read at a glance. I taped it to my monitor and looked at the keymap rather than my hands when I was going through typing tutors.

One advantage of setting up a simple keymap– one where keys are not repeated, and one with only two layers– is that creating this visual reference was pretty easy. I imagine it would have been much more confusing if I had more layers. I didn’t include my audio layer on this map because it was so minimal that I didn’t need a reference.

Here is the blank keymap template, and an example of what my filled in one looked like:

Appendix

Moonlander

I use a keyboard called ZSA moonlander. It’s my favorite keyboard that I’ve tried, hands down.

Things I like about the moonlander:

The UX for trying out new layouts is exceptional. ZSA has a drag-and-drop layout tool called oryx which I found very easy to use. Once you have a layout it is also very easy to flash it onto the keyboard. The toolchain rocks, and it makes it so much easier to iterate on layouts.
It has large thumb clusters. The hypothesis is that your thumbs are strong, so it makes sense to have them type many common characters.
It is fully split so I can adjust the halves to be exactly where I want them to be.
It is portable, because it is flat and is split into halves. I like to take my keyboard with me when I go on longer trips. I use a laptop stand, my mouse and my keyboard on kitchen counters to create a reasonable standing desk.

Other keyboards

In the past I’ve also used an Iris keyboard from keebio and a kinesis advantage.

I love my Iris keyboard, but it was much harder to configure so I never really got it to the point where I was using it as my go-to keyboard. It was fun to solder it together and use the open-source toolchain QMK. Keebio has great build guides, a great first-timers guide to mechanical keyboards, and great diy hardware-hacking vibes. The Iris is a lot less batteries-included + polished than the moonlander, and just a tiny bit too small for what I like. Looking at my current layout I would need 2 more keys on each side of the Iris to make everything fit. At the time I built my Iris I didn’t think to label clear keycaps so I had a really hard time learning where I had put which keys. Not seeing the keys combined with the less polished toolchain for modifying the layout made working with the Iris substantially more challenging than working with the moonlander.

I’ve also previously used a kinesis advantage. At the moment I prefer the moonlander because it is more portable and the toolchain is smoother, but I also loved the kinesis advantage for the two years that it was my go-to keyboard.

There’s a whole rabbithole of custom keyboards for those who are curious about it. People have thought about everything from shape, layout, and input controls to keycap design, switch design, toolchains, and manufacturing toolchains.

Colemak

Colemak is a layout that has alphabetic keys laid out in approximately priority order, but also geared towards keeping uncommon characters in the same position as on the standard qwerty layout to make it easier to switch. Colemak is one of the pre-installed layouts on mac & linux, and easy to install on windows, so it is low-effort to install on a new machine. I wrote about my experience with switching to colemak here.

https://anniecherkaev.com/principles-for-keyboard-layouts

colemak is for quitters

Jan 30, 2022

I spent the better part of the last month switching to typing in Colemak.

I switched as part of a broader ergonomics kick; I’ve been thinking about what habits I can change to be kinder to my body. Colemak is a keyboard layout that places the most commonly typed characters on the home row. The hypothesis is this is a more ergonomic user interface because you don’t need to move your fingers as much.

I use the moonlander keyboard, which is a split and ortholinear keyboard. Here is what Colemak looks like on my keyboard:

For reference, here is what the de facto layout, QWERTY, looks like. A major design consideration of Colemak was being easy to learn for people who know QWERTY. I drew the letters that are in the same position in both layouts in blue, and the rest of the QWERTY layout in green:

This is actually my second attempt at learning Colemak. The first was almost exactly two years ago. I spent about a week learning where all the keys were with the help of learncolemak.com before giving up. I suspect that short experience made it easier to get started this time- Colemak is for quitters!

This is my experience report: a summary of what I did to learn Colemak and how long it took.

Typing Tutors

I used four typing tutor programs, in addition to typing free-form text.

learncolemak.com is a fabulous html website that has 9 static typing tutors, each of which introduces a few letters at a time. I enjoyed it as a gentle introduction to the layout.

Epistory is a beautiful adventure game where you ride around on an origami fox, fighting big bugs by typing words that appear above them. I remember the miserable typing tutor game I was forced to play as a fifth grader. Epistory is the exact opposite of that- highly recommend! Epistory automatically adjusts the difficulty, and can start out easy enough that I could make progress when I could only type 14 wpm.

keybr.com is an excellent website that starts you out with a few letters and slowly adds one letter at a time as you master your letter set. It took me so long to earn my first additional letter that I wondered more than once if the website was broken. Nope, it just took me a lot of repetitive practice to solidly learn the home row. The feature of very slowly adding letters was very effective, especially for the letters that I found most challenging in Colemak. I found this site very zen- I had thought that typing gibberish sentences over and over would be boring but I found it to be a nice state of easy flow.

10fastfingers.com is a website that has a typing speed test on what it claims are the 200 most common words in English. It was okay. I didn’t love this site as much as the others but I think it was useful to practice typing common words and it was useful as a benchmark of my progress.

I also wrote a lot of text in Colemak; I switched cold-turkey so pretty much all the text I have written since switching has been in Colemak. One thing I practiced while writing text is when I typed a word wrong, I would delete the whole word (alt + backspace) and retype the whole word. If I typed it wrong again- and I often did, wrong in exactly the same way- I would delete that word and retype it until I did it correctly a few times in a row. Then I would delete all the extra words and continue with my writing. I didn’t measure whether or not this helped, but the hypothesis is that it let me train the correct muscle memory for typing that word.

Time spent

I tracked the amount of time I spent learning Colemak using a drop-down tracker called daily. I didn’t set up daily specifically to know how long it took to learn Colemak; I use it to create a record of how I’m spending my time. I like doing this because it lets me see how I’m really spending my time, which lets me think about what I want to do with my time going forward. It’s a nice side-effect that I got to see how long it took to learn Colemak!

Below is a graph of how much time I spent in the last three weeks learning Colemak, split into two bars. The bottom bar (orange) shows the number of minutes each day I spent using one of the typing tutors I mentioned above. The top bar (yellow) shows the number of minutes I spent writing. This is a bit of an underestimate because I also wrote text while doing other activities, like sending emails, that I didn’t count here.

I spent a total of 24 hours (!) doing typing tutors, and an additional 17 hours writing. That is almost two full days!!

I started typing at a painful 12 wpm, and now am at around 50 wpm 24 days later. For reference, I typed about 75 wpm in QWERTY before I switched. Here is the full record of my average daily typing speed, as measured by 10fastfingers.com.

Here is what that progression felt like.

For the first five days, it took my full undivided attention to type at all. That was a bit frustrating but also… kind of fun? I don’t spend a lot of time being that terrible at something, or needing that much focus. My brain felt tired, but it was satisfying.

On day 6 I was at 24 wpm, and that was about when the mental load got lighter. I could begin to type some words like “the” without thinking about it. I managed to unlock my computer but not my password manager.

By day 17 I was consistently around 40 wpm, and that felt entirely tolerable. I typed notes on a video call without missing what was being said.

On day 20 I finished all the letters on keybr.com. It was a bittersweet victory. It was the moment I had been waiting for, but it also became less obvious what I should do next to get better. A few days later I am typing at about 50 wpm and it just feels like typing normally, even though I am still about 25 wpm slower than I had been in QWERTY. I think from here I slowly get faster through the quotidian process of typing in my day to day life.

I often felt like my progress on my wpm speed had stalled out, but looking at that graph, my progress was pretty linear. It was hard to see the slow progress in the moment, but clearly it compounded. If that’s not a parable, I don’t know what is.
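
As a quick sanity check on the "pretty linear" impression, you can draw a straight line between the first and last speeds quoted in this post and see how far the in-between milestones fall from it. This is only a sketch using the handful of numbers mentioned in the text, not the full daily record behind the graph, and it assumes the 12 wpm starting speed counts as day 0.

```python
# Milestones quoted in the post: (days since switching, words per minute).
# Assumption: the 12 wpm starting speed counts as day 0.
start = (0, 12)
end = (24, 50)
milestones = [(6, 24), (17, 40)]

# Slope of the straight line through the first and last measurement.
wpm_per_day = (end[1] - start[1]) / (end[0] - start[0])


def predicted_wpm(day: int) -> float:
    """Speed on a given day if progress were perfectly linear."""
    return start[1] + wpm_per_day * day


# How far each quoted milestone sits from the straight line.
residuals = [actual - predicted_wpm(day) for day, actual in milestones]

print(f"average gain: {wpm_per_day:.2f} wpm per day")
for (day, actual), gap in zip(milestones, residuals):
    print(f"day {day}: actual {actual} wpm, straight line {predicted_wpm(day):.1f} wpm, gap {gap:+.1f}")
```

Both milestones land within 3 wpm of the straight line, which is consistent with steady progress that is hard to notice from one day to the next.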

p.s.

Here are a few after-thoughts:

From about 28 wpm I switched to focusing on accuracy rather than trying to increase my speed by hitting keys as fast as I could. Making mistakes while typing is very slow, not only because it takes time to erase and retype the letters, but it is also very distracting. I focused on accuracy by doing things like deleting and retyping words with mistakes in them, and paying extra attention to letters that keybr.com told me I was frequently getting wrong.
There were a few letter pairs that gave me a lot of trouble: r & s, d & g, p & l. In Colemak, the r is where the s is in QWERTY. I was infuriated by this for a bit and considered switching it on my keyboard, but ultimately did not. I decided that, in addition to wanting to avoid creating the stupid future where I was capable of typing only on my own bespoke keyboard, I bought the reasoning for the switch: that it makes it easier to ‘roll’ your fingers outside-in for the common bigram st and the common trigram rst.
Because I switched entirely from QWERTY to Colemak, I have become bad at typing QWERTY. There are people who are very good at both, and I suspect it wouldn’t take me much time to get back to remembering QWERTY. However, because I haven’t been training both simultaneously, learning Colemak has caused me to rapidly forget QWERTY. This is the psychological phenomenon called negative transfer of learning: same stimulus, different trained response. This made the second week of switching to Colemak particularly “fun” because I typed about 30 wpm in both Colemak and QWERTY. Interestingly, however, I type just fine in QWERTY on my phone because the muscle memory is so different!
For an excellent story of how it is surprisingly difficult to show that a rigorously designed layout is better than the barnyard mess that is QWERTY, I highly recommend reading the somewhat sad history of Dvorak. Alas, the best laid plans!
Was it worth switching? I’m not entirely convinced that Colemak is strictly better for ergonomics, but I doubt it is worse for ergonomics. I will say it is at least not the highest priority computer-setup thing to think about. Things that have a bigger impact on ergonomics include using:
- a monitor at eye-level
- an external mouse (not apple’s magic mouse) + minimizing using a mouse
- an external keyboard
- a desk to both sit and stand at
- a webcam to avoid taking meetings on a laptop
- a mechanism to remember to take frequent short breaks to stretch and re-adjust
However, it was a very good experience relearning the muscle memory. It was genuinely magical to go from having to manually think about every keystroke to beginning to move by routine to full auto-pilot in the span of a few weeks.

https://anniecherkaev.com/learning-colemak

python tooling cribsheet

Nov 1, 2021

🧝🗡️🏹🛡️ it’s dangerous to go alone! take this 🐍🛠️🐍🛠️

This page is a brain-dump of the tools & libraries I reach for in python in various situations. It’s a living document; I intend to update it as my practices drift. Perhaps you’ll find something interesting here!