https://unfoldingdiagrams.leaflet.pub/atom

atom
5 posts
Polling state
Status active
Last polled May 18, 2026 23:04 UTC
Next poll May 20, 2026 01:36 UTC
Poll interval 86400s

Posts

Agentic markdown
an interface for agent-first hypermedia

I run @claude.notjack.space, an autonomous Claude bot on ATproto. This is mostly an experiment with a long-term goal of determining what effective collaboration looks like with an AI agent that has its own persistent memory and a social media presence that exists outside the context of work tasks I give it.

This Claude instance is mostly powered by some simple timer scripts, and it has access to a few skills and scripts for using an ATproto account to interact with the broader ecosystem. So far it's going interestingly, though collaborating on Tangled is blocked until Tangled implements programmatic ingestion of PR records. Alas.

Leaping before looking

However, I've noticed one interesting recurring issue: Claude sometimes writes replies to conversations, or otherwise takes actions related to them, without reading the recent messages in those conversations. This causes Claude to, for example, reply twice to the same question, or try a second time to implement a change I asked for because it didn't notice it had already made the change and replied to me. In one instance, Claude came up with an idea for a change it wanted to make and DM'd me to ask if it sounded like a good idea, then didn't actually wait for my response before making the change.

This is less than ideal. But instead of just going "AI bad and doesn't work", I'd like to take a more systematic look at this problem: why, exactly, is this happening? And why doesn't this (usually) happen with humans?

Before answering the first question, I think we need to seriously consider the second question. An obvious answer is memory: I remember what I've done and what I've said recently. But that doesn't really work as a differentiator, because Claude also has its own session memory. It has the data about what it's done previously, it's just not paying attention to that. That suggests to me that the issue is more one of UI affordances than fundamental architecture.

When I want to respond to a DM, to do so I have to open the conversation first. That's where the UI element where I type in the message is, after all. I necessarily can't start writing my message without having the most recent prior messages of the conversation pulled directly into my field of view.

Claude's interaction with DMs doesn't work like this. It has two separate scripts: read-dms.sh and send-dm.sh, which it uses to separately read and write messages. If Claude wants to send a message, then it often reaches straight for the send-dm.sh script, which will happily send a message in complete ignorance of what's already been sent. There is no affordance to encourage reading-before-writing.

Many other terminal-based interaction patterns work the same way. Nothing forces a terminal user to do a git pull before editing files. Nothing forces a terminal user to check how often they've posted recently before posting to social media. Nothing forces a terminal user to look before doing, in general.

Can we fix that?

A first hack

When I pointed out this behavior to Claude, the first fix it tried was to modify its read-dms.sh script to save a timestamp to a file recording when it was last called, then modify send-dm.sh to abort with an error message if read-dms.sh hadn't been called recently. It's a rough cut, and it doesn't generalize well, but it does work.
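For concreteness, here's the shape of that guard as a Python sketch. The real scripts are shell scripts whose contents aren't shown here, so the stamp-file location, the ten-minute window, and the `record_read`/`require_recent_read` names are all hypothetical.

```python
import os
import tempfile
import time

# Hypothetical location of the "last read" stamp file.
STAMP_PATH = os.path.join(tempfile.gettempdir(), "last-dm-read.stamp")
# Hypothetical freshness window: a read older than this doesn't count.
MAX_AGE_SECONDS = 10 * 60


def record_read(path: str = STAMP_PATH) -> None:
    """Called at the end of read-dms: remember when DMs were last read."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def require_recent_read(path: str = STAMP_PATH,
                        max_age: float = MAX_AGE_SECONDS) -> None:
    """Called at the start of send-dm: refuse to send without a fresh read."""
    try:
        with open(path) as f:
            last_read = float(f.read().strip())
    except (FileNotFoundError, ValueError):
        raise SystemExit("Refusing to send: run read-dms.sh first.")
    if time.time() - last_read > max_age:
        raise SystemExit("Refusing to send: DMs not read recently; run read-dms.sh first.")
```

Raising `SystemExit` with a message prints it to stderr and exits with status 1, which is the "abort with an error message" behavior described above.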

But after giving the problem more thought, I have a related idea. HTML already solves this problem with hypermedia - you find related tools and information by crawling links, and those resources self-describe how to interact with them by providing embedded forms. Can we extend a similar idea to the markdown prompts sent to an agent?

Terminal-centric markdown

Well, markdown already lets you just put paths to related files, and an agent understands how to traverse such a graph by reading related files and links. So the linking part of hypermedia is already handled. We're really just missing the embedded forms.

But, if we assume the agent reading the markdown has access to a terminal, we can get those too. If we dynamically generate the markdown shown to an agent, and include code snippets of what scripts to run with their arguments prefilled, we can replace the individual agent instructions for each script with a general instruction of "use one of these entrypoint scripts, then follow the prompts to access related functionality".

Suppose we start with a top-level prompt that lists what actions the agent can take. That prompt might look like this:

Choose one of the following actions:

- Open Bluesky (`open-bluesky.sh`)
- Open Tangled (`open-tangled.sh`)
- Review your memories (`review-memories.sh`)

Then, if the agent runs open-bluesky.sh, it might see this:

You are @claude.notjack.space on Bluesky. Choose an action:

- Read your timeline (`read-timeline.sh`)
- Open your notifications **[5 new]** (`open-notifications.sh`)
- Open your DMs **[2 new]** (`open-dms.sh`)

And when running open-dms.sh:

Choose a DM conversation to open:

- **[Unread - Today]** Jacqueline (`open-dm.sh @notjack.space`)
- **[Unread - Yesterday]** Some Spammer (`open-dm.sh @somespammer.bsky.social`)
- [3 days ago] Some Other User (`open-dm.sh @someuser.bsky.social`)
- ... and 17 older conversations (`open-dms.sh --skip 3`)
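Since the menu is generated, each command can be emitted with its arguments already filled in, including the pagination link. A Python sketch of how open-dms.sh might render the list above; the `Conversation` record, the page size of three, and the assumption that conversations arrive sorted newest-first are all hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Conversation:
    display_name: str
    handle: str
    label: str    # human-readable age, e.g. "Today" or "3 days ago"
    unread: bool


def render_dm_list(convos: list[Conversation], skip: int = 0, page_size: int = 3) -> str:
    """Render one page of conversations as markdown with ready-to-run commands."""
    lines = ["Choose a DM conversation to open:", ""]
    page = convos[skip:skip + page_size]
    for c in page:
        # shlex.quote keeps an unusual handle from breaking the pasted command.
        cmd = f"open-dm.sh {shlex.quote('@' + c.handle)}"
        tag = f"**[Unread - {c.label}]**" if c.unread else f"[{c.label}]"
        lines.append(f"- {tag} {c.display_name} (`{cmd}`)")
    remaining = len(convos) - (skip + len(page))
    if remaining > 0:
        # The "next page" link is itself just another prefilled command.
        lines.append(f"- ... and {remaining} older conversations "
                     f"(`open-dms.sh --skip {skip + len(page)}`)")
    return "\n".join(lines)
```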

And finally, when running open-dm.sh @notjack.space:

Here is your recent conversation history with Jacqueline:

- Jacqueline [20 minutes ago]: whatcha doin
- You [15 minutes ago]: not much
- Jacqueline **[Unread]** [5 minutes ago]: by the way we're gonna build a catapult together

You can choose to respond with `send-dm.sh @notjack.space "YOUR RESPONSE HERE"`.

If we get rid of all references to the send-dm.sh script from the agent's other prompts and skill files, then the only way the agent would learn about the existence of the send-dm.sh script is from the output of open-dm.sh. If that script happens to require some fiddly arguments like conversation IDs that are difficult to remember, then the agent might even find it easier to use the hypermedia flow between scripts instead of one-shotting the right use of the script.
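A Python sketch of what open-dm.sh might print, using a hypothetical `Message` record. The only place the send command appears is after the history, so the agent can't discover the command without the latest messages landing in its context first.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Message:
    sender: str   # "You" for the agent's own messages
    when: str     # human-readable age, e.g. "5 minutes ago"
    text: str
    unread: bool = False


def render_conversation(name: str, handle: str, messages: list[Message]) -> str:
    """Render recent history, then (and only then) the prefilled send command."""
    lines = [f"Here is your recent conversation history with {name}:", ""]
    for m in messages:
        flag = " **[Unread]**" if m.unread else ""
        lines.append(f"- {m.sender}{flag} [{m.when}]: {m.text}")
    # The send affordance lives here, after the messages it responds to.
    target = shlex.quote("@" + handle)
    lines += ["", f'You can choose to respond with `send-dm.sh {target} "YOUR RESPONSE HERE"`.']
    return "\n".join(lines)
```

If the real send-dm.sh needed an opaque conversation ID instead of a handle, this is where it would be filled in, which is exactly the kind of fiddly argument that makes the hypermedia path easier than guessing.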

Could it work?

Maybe. It feels like the right approach to me, anyway. I'll have to give it a shot and see what happens. And I'm sure Claude will have something to say about this in my DMs later :)

https://unfoldingdiagrams.leaflet.pub/3mgiu7wf3rk2r
Sketching a planning language
The best way to design a DSL is by doodling

Every now and again, I think about Graphplan.

Graphplan is an algorithm for automated planning, which consists of solving planning problems. A planning problem specifies a starting state, a set of goal states, and a set of actions on states. This is similar to a classical search problem, but planning problems generally impose richer structure upon and allow greater complexity within states and actions.

For example, while finding the path from one node to another in a static graph is a search problem, finding a path from one node to another in a graph when the actor can change the graph is a planning problem. Planning algorithms excel in these sorts of complex environments, where taking some actions either enables or prevents taking others.

Planning problems are often described with planning languages. These are specialized domain-specific languages that define planning problems in terms of states, actions, and goals. Such a problem is then fed into an automated planning algorithm, which presents a solution in the form of a plan: a sequence of actions that transitions from the problem's start state to one of its acceptable goal states. Graphplan is a fairly basic planning algorithm that performs reasonably well on such problems, but classical planning is PSPACE-hard in general. There's only so much one can do with that.

Nevertheless, I've wanted a Graphplan implementation in Racket on and off throughout the years. Mainly I want this because planning languages are really unpleasant to use. They're often highly specialized software made by research teams, with wonky installation requirements and general "research software" usability issues.

But Racket works wonders on DSLs. So tonight I started sketching out my own planning DSL with the idea of eventually turning it into a Racket #lang. I didn't bother implementing anything, but it's fun to just write out code and imagine it doing something. Here's how expressing simple graph traversal problems might look:

// Before defining specific problems, we define the
// planning domain. This is typical in planning languages.
domain SimpleGraphTraversal {

  // This isn't OOP, this is a "class" more like in the
  // sense of set theory. This is a type of object which
  // the current state might be defined in terms of, and
  // different classes are guaranteed to have no overlap.
  class Vertex;

  // The current state is defined by a set of *facts*.
  // Predicates define what possible facts exist.
  predicate At(vertex: Vertex);
  predicate Edge(start: Vertex, end: Vertex);

  // The actions one can take are defined in terms of
  // preconditions and effects. Actions are quantified
  // over state variables, which are objects that are
  // used in predicates to refer to facts about the
  // current state.
  action Move(start: Vertex, end: Vertex) {

    // Each fact in an action's preconditions must be
    // true (that is, present in the current state) for
    // the action to be applied.
    preconditions {
      At(start);
      Edge(start, end);
    }

    // The effects of an action may introduce new facts
    // and/or retract existing facts. All other facts
    // present in the current state are assumed to be
    // left unchanged.
    effects {
      At(end);
      not At(start);
    }
  }
}
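Before searching, planners typically *ground* each action by substituting every combination of objects of the right class into its parameters. A Python sketch of grounding `Move`; the dictionary encoding of the schema and the `ground`/`substitute` helpers are hypothetical stand-ins for whatever the DSL would compile to.

```python
from itertools import product

# The lifted Move schema: typed parameters plus fact templates whose
# arguments name those parameters.
MOVE = {
    "name": "Move",
    "params": [("start", "Vertex"), ("end", "Vertex")],
    "pre": [("At", "start"), ("Edge", "start", "end")],
    "add": [("At", "end")],
    "delete": [("At", "start")],
}


def substitute(templates, binding):
    """Replace parameter names in fact templates with the bound objects."""
    return frozenset((t[0], *(binding[arg] for arg in t[1:])) for t in templates)


def ground(schema, objects):
    """Yield one concrete action per assignment of objects to parameters."""
    names = [name for name, _ in schema["params"]]
    pools = [objects[cls] for _, cls in schema["params"]]
    for values in product(*pools):
        binding = dict(zip(names, values))
        yield {
            "name": (schema["name"], *values),
            "pre": substitute(schema["pre"], binding),
            "add": substitute(schema["add"], binding),
            "delete": substitute(schema["delete"], binding),
        }


actions = list(ground(MOVE, {"Vertex": ["v1", "v2", "v3"]}))
print(len(actions))  # → 9
```

Grounding also produces nonsense like `Move(v1, v1)`; that's harmless, since no state contains `Edge(v1, v1)`, so it's never applicable.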

That defines a SimpleGraphTraversal planning domain. Planning domains can be used to solve many different instances of the same kinds of planning problems. This planning domain lets us answer basic graph search questions. To do that, we define a specific problem using the planning domain:

// We name the problem and specify what domain it uses.
// Each problem specifies what objects exist, what the
// starting state is, and what counts as a goal state.
problem Problem1(SimpleGraphTraversal) {

  objects Vertex(v1, v2, v3);

  // Our goal is to get to v3. This goal is satisfied
  // by any state that contains At(v3), regardless of
  // what other facts are in the state.
  goal { At(v3) }

  // We start at v1, and declare which vertices are
  // connected together.
  start {
    At(v1);
    Edge(v1, v2);
    Edge(v2, v3);
  }
}
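Graphplan itself builds a layered planning graph and searches it backward, which won't fit in a short example. As a much simpler stand-in, here's a breadth-first forward search over states in Python that solves Problem1. This is *not* Graphplan, and the hand-grounded action list is an assumption about what the DSL would compile to.

```python
from collections import deque
from itertools import product

VERTICES = ["v1", "v2", "v3"]

# Ground Move(start, end) for every pair of vertices as
# (name, preconditions, added facts, deleted facts).
ACTIONS = [
    (("Move", s, e),
     frozenset({("At", s), ("Edge", s, e)}),
     frozenset({("At", e)}),
     frozenset({("At", s)}))
    for s, e in product(VERTICES, repeat=2)
]


def bfs_plan(start, goal, actions):
    """Return a shortest list of action names reaching a goal state, or None."""
    start = frozenset(start)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:  # every goal fact is present
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None


start = {("At", "v1"), ("Edge", "v1", "v2"), ("Edge", "v2", "v3")}
print(bfs_plan(start, {("At", "v3")}, ACTIONS))
# → [('Move', 'v1', 'v2'), ('Move', 'v2', 'v3')]
```

Because breadth-first search expands shorter plans first, the first plan it returns is a shortest one. It also explores the whole state space in the worst case, which is exactly the blowup that smarter planners try to avoid.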

Plugged into an automated planner, the above problem should spit out the plan Move(v1, v2); Move(v2, v3). That plan, applied to the starting state, produces the following series of states:

{ At(v1); Edge(v1, v2); Edge(v2, v3); }
{ At(v2); Edge(v1, v2); Edge(v2, v3); }
{ At(v3); Edge(v1, v2); Edge(v2, v3); }
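Checking a plan is far cheaper than finding one: apply each action in order, confirming its preconditions first. A Python sketch that reproduces the three states above using the effect semantics from the domain's comments (retract some facts, add others, leave the rest alone); the tuple encoding of facts and the `move`/`replay` helpers are hypothetical.

```python
def move(start, end):
    """Ground Move(start, end) as (preconditions, added facts, retracted facts)."""
    return ({("At", start), ("Edge", start, end)}, {("At", end)}, {("At", start)})


def replay(state, plan):
    """Apply each action in order and return every intermediate state."""
    states = [frozenset(state)]
    for pre, add, delete in plan:
        current = states[-1]
        missing = pre - current
        if missing:
            raise ValueError(f"unsatisfied preconditions: {sorted(missing)}")
        # Retract deleted facts, add new ones; everything else carries over.
        states.append((current - delete) | add)
    return states


start = {("At", "v1"), ("Edge", "v1", "v2"), ("Edge", "v2", "v3")}
for s in replay(start, [move("v1", "v2"), move("v2", "v3")]):
    print(sorted(s))
```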

So far there's nothing too surprising here. This is basic graph traversal. But let's notice something interesting: the edges are part of the current state. This gives us an inkling of what planning problems can do: what if an action is allowed to change the graph?

A fun and practical example of such a problem is one where some edges are locked, and require the actor to find and pick up a key to traverse that edge. One can imagine many real-world cases of this kind of thing: locked doors, ledges that can't be climbed without a ladder, finding a boat to cross a river, etc. We can model this by giving the actor a simple inventory system along with actions to pick up items and unlock edges using items:

domain GatedGraphTraversal {

  class Vertex;
  class Item;

  predicate At(vertex: Vertex);
  predicate Edge(start: Vertex, end: Vertex);

  predicate HasItem(item: Item);
  predicate ItemAt(item: Item, vertex: Vertex);

  predicate LockedEdge(
      start: Vertex,
      end: Vertex,
      key: Item);

  // Basic movement is the same as before
  action Move(start: Vertex, end: Vertex) { ... }

  // If the actor is at the same location as an item,
  // they can pick it up and add it to their inventory.
  action PickUpItem(vertex: Vertex, item: Item) {
    preconditions {
      At(vertex);
      ItemAt(item, vertex);
    }
    effects {
      not ItemAt(item, vertex);
      HasItem(item);
    }
  }

  // Locked edges can be unlocked using the corresponding
  // key item. Unlocking replaces the locked edge with a
  // normal edge.
  action UnlockEdge(
      start: Vertex,
      end: Vertex,
      key: Item) {
    preconditions {
      At(start);
      HasItem(key);
      LockedEdge(start, end, key);
    }
    effects {
      not LockedEdge(start, end, key);
      Edge(start, end);
    }
  }
}

Our new GatedGraphTraversal planning domain introduces two new actions: PickUpItem and UnlockEdge. The former picks an item up from the current vertex and adds it to the inventory, and the latter uses key items in the inventory to turn LockedEdge facts into regular Edge facts. Let's define a new planning problem using this domain:

problem Problem2(GatedGraphTraversal) {

  objects Vertex(v1, v2, v3);
  objects Item(redKey);

  goal { At(v3) }

  start {
    At(v2); // We start at v2 this time
    ItemAt(redKey, v1); // We'll have to go back for this
    Edge(v1, v2);
    Edge(v2, v1);
    LockedEdge(v2, v3, redKey);
  }
}

Plugging Problem2 into an automated planner should spit out the following plan:

Move(v2, v1);
PickUpItem(v1, redKey);
Move(v1, v2);
UnlockEdge(v2, v3, redKey);
Move(v2, v3);
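A self-contained Python sketch that finds this plan: a plain breadth-first search (not Graphplan) with all three GatedGraphTraversal actions hand-grounded for Problem2's objects. The encoding is hypothetical. Since breadth-first search returns a shortest plan, and no shorter plan exists here, it prints exactly the five steps above.

```python
from collections import deque
from itertools import product

VERTICES = ["v1", "v2", "v3"]
ITEMS = ["redKey"]


def ground_actions():
    """Hand-ground Move, PickUpItem, and UnlockEdge as (name, pre, add, delete)."""
    actions = []
    for s, e in product(VERTICES, repeat=2):
        actions.append((("Move", s, e),
                        {("At", s), ("Edge", s, e)}, {("At", e)}, {("At", s)}))
    for v, i in product(VERTICES, ITEMS):
        actions.append((("PickUpItem", v, i),
                        {("At", v), ("ItemAt", i, v)},
                        {("HasItem", i)}, {("ItemAt", i, v)}))
    for s, e, k in product(VERTICES, VERTICES, ITEMS):
        actions.append((("UnlockEdge", s, e, k),
                        {("At", s), ("HasItem", k), ("LockedEdge", s, e, k)},
                        {("Edge", s, e)}, {("LockedEdge", s, e, k)}))
    return actions


def bfs_plan(start, goal, actions):
    """Breadth-first forward search; returns a shortest plan or None."""
    start = frozenset(start)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None


start = {("At", "v2"), ("ItemAt", "redKey", "v1"), ("Edge", "v1", "v2"),
         ("Edge", "v2", "v1"), ("LockedEdge", "v2", "v3", "redKey")}
for step in bfs_plan(start, {("At", "v3")}, ground_actions()):
    print(step)
```

Removing `ItemAt(redKey, v1)` from the start state makes the search return `None`: with no key anywhere, v3 is unreachable.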

And here's the succession of states we get by applying this plan:

// Start
{
  At(v2);
  ItemAt(redKey, v1);
  Edge(v1, v2);
  Edge(v2, v1);
  LockedEdge(v2, v3, redKey);
}

// Move(v2, v1)
{
  At(v1);
  ItemAt(redKey, v1);
  Edge(v1, v2);
  Edge(v2, v1);
  LockedEdge(v2, v3, redKey);
}

// PickUpItem(v1, redKey)
{
  At(v1);
  HasItem(redKey);
  Edge(v1, v2);
  Edge(v2, v1);
  LockedEdge(v2, v3, redKey);
}

// Move(v1, v2)
{
  At(v2);
  HasItem(redKey);
  Edge(v1, v2);
  Edge(v2, v1);
  LockedEdge(v2, v3, redKey);
}

// UnlockEdge(v2, v3, redKey)
{
  At(v2);
  HasItem(redKey);
  Edge(v1, v2);
  Edge(v2, v1);
  Edge(v2, v3);
}

// Move(v2, v3)
{
  At(v3);
  HasItem(redKey);
  Edge(v1, v2);
  Edge(v2, v1);
  Edge(v2, v3);
}

// GOAL REACHED
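Graphplan's first phase expands a layered *planning graph*: fact layer 0 is the start state, and each action layer holds the actions whose preconditions appear in the previous fact layer. A Python sketch of that expansion for Problem2, with one big simplification: it ignores delete effects and mutual-exclusion (mutex) relations, both of which real Graphplan tracks. What's left is a "relaxed" reachability check; the function names are hypothetical.

```python
from itertools import product

VERTICES, ITEMS = ["v1", "v2", "v3"], ["redKey"]


def relaxed_actions():
    """Ground Problem2's actions as (name, preconditions, added facts).

    Delete effects are deliberately dropped: that's the relaxation.
    """
    acts = [(("Move", s, e), {("At", s), ("Edge", s, e)}, {("At", e)})
            for s, e in product(VERTICES, repeat=2)]
    acts += [(("PickUpItem", v, i), {("At", v), ("ItemAt", i, v)}, {("HasItem", i)})
             for v, i in product(VERTICES, ITEMS)]
    acts += [(("UnlockEdge", s, e, k),
              {("At", s), ("HasItem", k), ("LockedEdge", s, e, k)},
              {("Edge", s, e)})
             for s, e, k in product(VERTICES, VERTICES, ITEMS)]
    return acts


def first_goal_level(start, goal, actions):
    """Return the first fact layer containing every goal fact, or None."""
    facts, level = set(start), 0
    while not goal <= facts:
        # Facts persist between layers (Graphplan's no-op actions do this job).
        nxt = set(facts)
        for _, pre, add in actions:
            if pre <= facts:
                nxt |= add
        if nxt == facts:
            return None  # graph leveled off without reaching the goal
        facts, level = nxt, level + 1
    return level


start = {("At", "v2"), ("ItemAt", "redKey", "v1"), ("Edge", "v1", "v2"),
         ("Edge", "v2", "v1"), ("LockedEdge", "v2", "v3", "redKey")}
print(first_goal_level(start, {("At", "v3")}, relaxed_actions()))  # → 4
```

The relaxed graph reaches `At(v3)` at level 4, one step short of the real five-step plan: because deletes are ignored, the actor never "leaves" v2 when walking to v1, so the trip back looks free. Graphplan's mutex bookkeeping exists to rule out shortcuts like that.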

Pretty neat. Maybe someday I'll even get around to actually implementing this.

https://unfoldingdiagrams.leaflet.pub/3mg7vd4bvxc23
Mothlamp Problems
just a little bit closer to the light and I'll have it solved

A mothlamp problem, a term I coined yesterday out of boredom, is a particular kind of engineering problem with three important qualities:

  • A certain type of nerd finds it beautiful and alluring to work on

  • A solution requires solving the hardest known problems in the universe, often requiring years (if not decades) of effort

  • Other people question the utility of solving the problem at all

You know exactly what type of problem I'm talking about. Making your own programming language. An IDE that supports structural editing. More and more advanced static type checkers. Custom build systems. I'm sure you can think of others.

Why do we do this to ourselves?

I'm not immune. I've been working on an extensible language-agnostic static analysis and refactoring tool for half a decade now. That's a mothlamp problem if I've ever seen one. My GitHub account is littered with abandoned programming language implementations, parser generator frameworks, false starts at extensible autoformatters, and who knows what else. I think I've even got an async-await implementation in there somewhere. I've got the bug, and I fly toward the light.

I'm not sure why I'm like this. Sometimes, ideas compel me. I am gripped by a vision, and I want to experience it in reality, to reach out and touch it rather than merely imagine it. Practicality be damned.

What drives this? Am I just bored?

That's certainly part of it, yes. But I think much more importantly, dreaming big is a muscle. You have to exercise it from time to time. Each time I come up with a grand vision and sink dozens to hundreds of hours into it, only to walk away unfinished, I learn a bit more about how to make a dream become real.

One day, I will start my last project. I may or may not finish it. But after it's done, either due to tragedy, falling out of love with the craft, or simply not having time left to start another one, I will put my computer and notebooks down for the last time. That's life.

But I can accept that. I'll be very good at dreaming by then.

https://unfoldingdiagrams.leaflet.pub/3mft6olldos26
How small can ATproto get, really?
Does an independent Bluesky stack really cost thousands?

2025 was a big year for Bluesky decentralization. You might have heard of several independent Bluesky alternatives, like Blacksky, Northsky, and Eurosky. You might also have heard that these are nontrivial efforts, requiring significant development work and expensive computing resources. I'd like to dig into the second of those topics.

What makes Bluesky expensive?

Well, it's not the user data itself, I can tell you that much. User accounts in ATproto are each associated with a Personal Data Server (PDS), which is little more than a JSON-returning webserver wrapped around a SQLite database. A single PDS on a Raspberry Pi can host and serve data for tens of thousands of users without issue. This is not where the costs come from.

If you're only vaguely familiar with ATproto, you might have heard about "relay servers" and the "global firehose" - big servers that vacuum up all writes to every PDS and serve them in a single massive stream. Is this where the cost comes from?

Nope. A full-network relay can also run just fine on a Raspberry Pi. It needs a somewhat beefy storage drive (a dozen or two terabytes), but that's entirely within reach of a hobbyist. I've got plenty of friends with Plex servers whose storage requirements dwarf that.

Alright, what about the client web app? Is it just the costs of serving all the read traffic to all those users?

Again, not really. As proof, check out Red Dwarf. It's a Bluesky client that works entirely client-side, with no AppView server of its own. It achieves this by 1) talking to PDS instances directly, 2) querying the Constellation microservice for links between records, and 3) using the Slingshot edge cache to fetch records more efficiently. Neither of those microservices is expensive either: their storage and compute costs are on a similar level to a full-network relay's.

This is how a lot of ATproto infrastructure works. It's radically different from ActivityPub and Mastodon, which split up the system into lots of federated instances, with each instance having more or less a complete vertical slice through the infrastructure with its own user database, web frontend, moderation infrastructure, etc. ATproto instead partitions the network into many different microservices, and encourages each microservice to have a global view of the network.

Okay, but something must be expensive here. Blacksky isn't seeking tens of thousands of dollars for the hell of it. Where is the cost coming from?

Red Dwarf actually gives us a bit of a clue here. If you try using the app for a while, you might notice something missing: the Following feed. Custom feeds work fine, but the Following feed specifically won't load. The documentation for Red Dwarf explicitly points this out as a limitation. That might seem a bit odd at first. Why that feed in particular?

It's just one timeline, Michael

The Following feed is expensive. So expensive, in fact, that according to @why.bsky.team it accounts for half of Bluesky's entire production workload, as of October 2025.

But why is it expensive?

Well, for starters, there isn't just one Following feed. There's forty million of them. Every user's Following feed is unique.

They're difficult to compute on demand too. Users might follow tens of thousands of accounts, or a single account the user follows might post hundreds of times per hour. And users notice when their timelines don't update quickly. Dealing with this scale is a difficult problem. Bluesky handles it by making the Following feed lossy - rather than giving a precise chronological feed of every post from a followed account, it sheds load by occasionally dropping posts from rapid posters or from timelines of people who follow thousands of other users. Even with that optimization, the costs are significant.
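To make the cost concrete, here's a toy Python model of *fan-out on write*: every new post is copied into each follower's timeline, so a post by an account with 100,000 followers costs 100,000 timeline writes. The load-shedding rule randomly skips some of those writes. Both thresholds, the shedding probability, and the function signature are hypothetical illustrations of the kind of shedding described above, not Bluesky's actual policy or code.

```python
import random
from collections import defaultdict

RAPID_POSTER_PER_HOUR = 100    # hypothetical threshold
HEAVY_FOLLOWING_COUNT = 5_000  # hypothetical threshold


def fan_out(author, post_id, followers, following_count, posts_last_hour,
            timelines, shed_probability=0.5, rng=random):
    """Copy post_id into each follower's timeline, shedding some writes.

    A write may be shed when the author is posting rapidly or the reader
    follows a very large number of accounts. Returns the number of writes.
    """
    rapid = posts_last_hour.get(author, 0) > RAPID_POSTER_PER_HOUR
    writes = 0
    for reader in followers.get(author, ()):
        heavy = following_count.get(reader, 0) > HEAVY_FOLLOWING_COUNT
        if (rapid or heavy) and rng.random() < shed_probability:
            continue  # shed: this reader's timeline never gets this post
        timelines[reader].append(post_id)
        writes += 1
    return writes


timelines = defaultdict(list)
followers = {"alice": ["bob", "carol", "dave"]}
following = {"bob": 200, "carol": 12_000, "dave": 50}
print(fan_out("alice", "post-1", followers, following, {"alice": 3},
              timelines, shed_probability=1.0))  # → 2 (carol's copy is shed)
```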

What about Mastodon?

I'll let you in on a little secret: ActivityPub has this problem too.

Have you ever noticed that if you log in to a Mastodon instance after being away for a while, instead of seeing your timeline you'll see this friendly fellow:

Depending on how many people you follow, how frequently they post, how long it's been, and how busy the instance you're using is, it could take a long time for your timeline to load. This is because Mastodon stops updating timelines for infrequent users. This lets a Mastodon server avoid the costs of keeping timelines up to date for users who don't use the app very much. It comes at a cost though: users who return to Mastodon after taking time away from it are much more likely to bounce off the app permanently if their returning experience starts poorly. The UX factor here is significant, and often unnoticed by power users.
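A toy Python model of that tradeoff: fan-out skips followers who haven't visited recently, and a returning user's timeline is rebuilt by pulling posts from everyone they follow. That rebuild is the slow part a returning user waits on. The 14-day cutoff, the `Server` class, and its methods are hypothetical; Mastodon's real implementation differs in detail.

```python
from collections import defaultdict

INACTIVE_AFTER = 14 * 24 * 3600  # hypothetical cutoff, in seconds


class Server:
    def __init__(self):
        self.follows = defaultdict(set)     # reader -> authors they follow
        self.followers = defaultdict(set)   # author -> readers following them
        self.posts = defaultdict(list)      # author -> [(timestamp, post_id)]
        self.timelines = defaultdict(list)  # reader -> [(timestamp, post_id)]
        self.last_seen = {}                 # reader -> time of last visit

    def follow(self, reader, author):
        self.follows[reader].add(author)
        self.followers[author].add(reader)

    def publish(self, author, post_id, now):
        self.posts[author].append((now, post_id))
        for reader in self.followers[author]:
            # Cheap path: only write into timelines of recently active readers.
            if now - self.last_seen.get(reader, float("-inf")) <= INACTIVE_AFTER:
                self.timelines[reader].append((now, post_id))

    def open_home(self, reader, now):
        """Return post IDs newest-first, rebuilding the timeline if it went stale."""
        if now - self.last_seen.get(reader, float("-inf")) > INACTIVE_AFTER:
            # Expensive path: pull every followed author's posts and merge them.
            merged = [p for a in self.follows[reader] for p in self.posts[a]]
            self.timelines[reader] = sorted(merged)
        self.last_seen[reader] = now
        return [post_id for _, post_id in reversed(self.timelines[reader])]


day = 24 * 3600
server = Server()
server.follow("bob", "alice")
server.open_home("bob", now=0)
server.publish("alice", "a1", now=10)        # bob is active: delivered
server.publish("alice", "a2", now=30 * day)  # bob went quiet: skipped
print(server.open_home("bob", now=30 * day + 1))  # → ['a2', 'a1'], via rebuild
```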

Mastodon employs various other UX tricks that help reduce the cost of producing timelines. For instance, because every user's account is tied to a specific Mastodon server, each server doesn't have to produce timelines for every user, just the users belonging to that server. Similarly, servers don't fetch posts from every other server. One server only talks to another if at least one user on the first server follows a user on the second server.
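The second trick boils down to a set computation. A Python sketch using a simplified, hypothetical `user@server` handle format: a server only needs to receive posts from remote servers that host at least one account some local user follows.

```python
def server_of(handle: str) -> str:
    """'ann@small.town' -> 'small.town' (simplified, hypothetical handle format)."""
    return handle.split("@", 1)[1]


def remote_servers_needed(local_server: str, follows: dict[str, set[str]]) -> set[str]:
    """Remote servers whose posts this server must receive, given local follows."""
    return {
        server_of(followed)
        for follower, followed_set in follows.items()
        if server_of(follower) == local_server
        for followed in followed_set
        if server_of(followed) != local_server
    }


follows = {
    "ann@small.town": {"bob@big.city", "cat@small.town"},
    "cat@small.town": {"dan@elsewhere.net"},
    "eve@big.city": {"ann@small.town"},
}
print(sorted(remote_servers_needed("small.town", follows)))  # → ['big.city', 'elsewhere.net']
```

A server with no outbound follows gets an empty set, which is why posts from servers nobody locally follows never show up in its search.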

This works, but has similar costs as the timeline updating approach: a degraded user experience. Search queries can't find posts by accounts on servers that your server doesn't have any relationship with. Moving an account between servers becomes difficult to implement. Small servers bear the brunt of the UX cost, while large servers become significantly more costly to operate. A first-time-follow between a large server and a small server can even knock the small server over due to a sudden influx of processing load.

This doesn't mean the tradeoff that Mastodon makes here is unreasonable. This is a difficult problem. The solution they've chosen makes a lot more sense given the different cultural context. The Fediverse has a much higher proportion of tech-savvy users, and many of them want to run a small semi-isolated stack of social media servers for just them and their friends. I can see why, too. It's certainly a lot more interesting than just throwing everyone into a Discord server.

Can ATproto do the same?

There's no reason in principle that the strategies chosen by Mastodon wouldn't work for ATproto. The reason they haven't been employed much is that broadly speaking, most Bluesky users want a global view of the network. We like that search works globally without much issue, that it doesn't matter what PDS you're using, and that your choice of client and host is invisible to others.

Blacksky, Northsky, and Eurosky are large efforts because they're not trying to make the equivalent of a separate Mastodon server. They're trying to make the equivalent of a separate deployment of every Mastodon instance at once. They're meant to be global, full-network views of all forty million ATproto accounts that are fully independent of Bluesky. There is no cheap way to do that without making significant sacrifices to UX, in one way or another.

But what about power users who want an ATproto-based experience more like a small Mastodon instance for them and their friends? What are they to do?

Today, those users aren't well-served. There isn't an out-of-the-box, easy-to-deploy, and lightweight stack that's completely independent of the Bluesky appview. I think there's room for improvement here. A modified version of Red Dwarf that included a server for the Following feed limited to a selected list of invited users could be an interesting approach. The invite list could even be based on a PDS: press a button and deploy a PDS you can invite people to, a Following feed limited to just the users of that PDS, and a webserver that presents a Red Dwarf-like client app with customizable theming, which can only be logged into by accounts hosted on that PDS. It could include its own moderation labeler and Ozone instance too.

I'd love to see it happen.

https://unfoldingdiagrams.leaflet.pub/3mdf4b5dnms2p
Introductions
Just getting to know you all a little better

Hello there. I'm Jacqueline, and I spend a lot of time working on the Racket programming language. You may know me from Resyntax, Rebellion, or as @notjack in the community Discord server.

I've been dipping my toes into blogging more regularly over on Whitewind, but I decided to graduate to Leaflet and formally dedicate this space solely to racketeering. I find blogging way more fun and approachable when somebody else takes care of all the infrastructure for me, but I value data portability and detest platform lock-in. ATproto-based blogging platforms like Whitewind and Leaflet seem to be the best way to have that cake and eat it too these days. "What a time to be alive!"

I'm still undecided about how much I'll use this space and what it's for, but I think covering some of my recent Resyntax work sounds like a promising future topic. You LLM coding agent freaks will get a real kick out of that too, I promise. Maybe I'll even wax poetical about the future of automation in programming.

But for now, that's all I've got for this inauguration. Until next time!

https://unfoldingdiagrams.leaflet.pub/3m5imn3dkzk2v