Paul's Leaflets

Part of leaflet.pub

My leaflets now show on my personal blog
Publish On PDS, Syndicate To Any Reader

My personal blog https://pfrazee.com now renders my Leaflet posts. Here's a couple of notes on how I did it.

If you're not in the know, Leaflet is an EZ blogging platform. I call it "EZ blogging" because it's regular blogging but with a kind of casual F-it energy. Making a blogpost is a Business Decision, while making a Leaflet? That's just EZ blogging.


Leaflet is a part of the Atmosphere, so all of my Leaflets are available as public data in my repo.


That meant it wasn't too much hassle to pull my leaflets into my personal blog. In the spirit of straight shooting: I'd give the experience a B-. I'm on the Bluesky team and it still took me two casual evenings to pull this off.

The good

  • The new still-in-beta typescript API (atproto/lex) made a lot of stuff pretty darn easy.

  • Calling lex install pub.leaflet.document to fetch the Leaflet schemas (Lexicons) and then lex build to generate the handling code felt great.

  • Writing the leaflet renderer was pretty fun.

  • It works.

The meh

  • My blog is on nextjs and does a static build, and I don't have an easy way to auto-trigger rebuilds when I post a leaflet.

  • (Please don't take this as criticism, Matthieu) the new atproto/lex API is a little daunting at times. I'm pretty sure this will be solved with documentation and stabilization.

  • If I didn't know where my RichText facet rendering code was tucked away, I would've been totally hosed.

You can find my blog's source code here but fair warning, the code is real slapped together. I use timlrx/tailwind-nextjs-starter-blog which is in many ways wonderful, but contentlayer really fought me and I found myself having to hack around it a lot.

The code

I'm kicking the tires of the still-in-beta atproto/lex API, so a fair amount of this will change in the future. I'm sharing this as a reference.

It's also "just my blog" code, so the rigor here is... low.

Installing the lexicon schemas

The way I built my schemas is a little unusual: I write to ./util because the codebase layout is bad, and I include the ".ts" import extension so I can run the code via node --experimental-strip-types.

# install the tool
$ npm install -g @atproto/lex

# fetch all the schemas I need
$ lex install pub.leaflet.document

# build the code (weird usage, don't do it this way)
$ lex build --out ./util --import-ext ".ts"

Fetching the leaflets

This bit resolves my handle (@pfrazee.com) then resolves that to my DID document, which points me to my Personal Data Server (PDS).

Then I fetch all my leaflet documents, validating them as I do (automatically).

Then I fetch all the image blobs referenced in the leaflets.

import {fileURLToPath} from 'node:url'
import * as path from 'node:path'
import * as fsp from 'node:fs/promises'
import { IdResolver } from '@atproto/identity'
import { Client } from '@atproto/lex'
import type { ListRecord, DidString, LexMap } from '@atproto/lex'
import { lexStringify } from '@atproto/lex-json'
import * as leaflet from '../util/pub/leaflet.ts'
import { toExt, enumBlobRefs } from '../util/helpers.ts'

const __dirname = path.dirname(fileURLToPath(import.meta.url))

;(async()=>{
  console.log('=================')
  console.log('Fetching leaflets')
  console.log('=================')

  // Resolve DID and PDS
  const resolver = new IdResolver()
  const did = (await resolver.handle.resolve('pfrazee.com')) as DidString
  console.log('Resolved pfrazee.com to', did)
  const pds = (await resolver.did.resolveAtprotoData(did!)).pds
  console.log('Resolved pds to', pds)

  const client = new Client(pds)

  // Fetch all leaflets
  let leaflets: ListRecord<typeof leaflet.document.$defs.main>[] = []
  let invalids: LexMap[] = []
  let cursor: string | undefined = undefined
  let i = 0
  do {
    const result = await client.list(leaflet.document, {
      repo: did!,
      limit: 50,
      reverse: true,
      cursor
    })
    cursor = result.cursor
    leaflets = leaflets.concat(result.records)
    invalids = invalids.concat(result.invalid)
  } while(cursor && (++i < 100))
  console.log('Fetched', leaflets.length, 'leaflets.', invalids.length, 'failed validation.')

  // Write the leaflets to `/data/leaflets/*.json`
  const leafletDataDir = path.join(__dirname, '..', 'data', 'leaflets')
  // Start from a clean directory: `force` ignores a missing dir,
  // and `recursive` on mkdir also creates missing parent dirs
  await fsp.rm(leafletDataDir, {recursive: true, force: true})
  await fsp.mkdir(leafletDataDir, {recursive: true})
  for (const leaflet of leaflets) {
    const rkey = leaflet.uri.split('/').pop()
    await fsp.writeFile(
      path.join(leafletDataDir, rkey + '.json'),
      lexStringify(leaflet.value),
      'utf-8'
    )
  }

  // Enumerate and fetch all missing images
  const imageDir = path.join(__dirname, '..', 'public', 'static', 'images', 'leaflets')
  // `recursive` creates missing parent dirs and is a no-op if the dir already exists
  await fsp.mkdir(imageDir, {recursive: true})
  for (const leaflet of leaflets) {
    const rkey = leaflet.uri.split('/').pop()!

    for (const blobRef of enumBlobRefs(leaflet.value)) {
       // images only
      if (!blobRef.mimeType.startsWith('image/')) {
        continue
      }

      // construct a filename
      const ext = toExt(blobRef.mimeType)
      if (!ext) {
        console.warn('Unsupported mimetype', blobRef)
        continue
      }
      const filename = `${blobRef.ref.toString()}.${ext}`.replaceAll('/', '' /* juuust in case */)
      const imagePath = path.join(imageDir, filename)

      // make sure the dir exists
      await fsp.mkdir(imageDir).catch(e => undefined)

      // check if the image exists
      if ((await fsp.stat(imagePath).catch(e => undefined))) {
        continue
      }

      // doesn't exist, fetch it
      console.log('Fetching', filename, 'for leaflet', rkey, '...')
      const blobRes = await client.getBlob(did, blobRef.ref.toString())

      // sanity
      if (blobRes.payload.encoding !== blobRef.mimeType) {
        console.error('Response mimetype does not match blobref mimetype, skipping', blobRes.payload.encoding, blobRef)
        continue
      }

      // write to disk
      await fsp.writeFile(imagePath, blobRes.payload.body)
      console.log('Fetched', blobRes.payload.body.byteLength, 'bytes')
    }
  }
})()

export {}

Here's how I enumerated those blob refs:

import type { LexMap } from '@atproto/lex'
import { isBlobRef, isLexMap } from '@atproto/lex-data'
import type { BlobRef } from '@atproto/lex-data'

export function toExt(mimeType: string): string {
  if (mimeType === 'image/png') return 'png'
  if (mimeType === 'image/jpeg') return 'jpg'
  if (mimeType === 'image/webp') return 'webp'
  return ''
}

export function* enumBlobRefs(map: LexMap): Generator<BlobRef> {
  for (let v of Object.values(map)) {
    if (isBlobRef(v)) {
      yield v
    } else if(isLexMap(v)) {
      yield* enumBlobRefs(v)
    } else if (Array.isArray(v)) {
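      for (let v2 of v) {
        // blob refs can sit directly inside an array, so check for them before recursing
        if (isBlobRef(v2)) {
          yield v2
        } else if (isLexMap(v2)) {
          yield* enumBlobRefs(v2)
        }
      }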
    }
  }
}

Rendering the leaflets

I'll refer you to the source file for this one since it's a little longer, but here's a preview.

There are almost certainly more elegant ways to write this code.

import type { linearDocument } from "@/util/pub/leaflet/pages"
import * as blocks from '@/util/pub/leaflet/blocks'
import * as facets from '@/util/pub/leaflet/richtext/facet'
import Image from '@/components/Image'
import { RichText, RichTextSegment } from "app/helpers"
import { jsonToLex, JsonValue } from '@atproto/lex-json'
import { toExt } from '@/util/helpers.ts'
import { BlueskyPostEmbed } from "./post-embed/BlueskyPostEmbed"

export function LeafletRenderer({pages}: {pages: linearDocument.Main[]}) {
  // HACK due to limitations in the contentlayer framework, we have to do this cast here
  pages = jsonToLex(pages as JsonValue)! as linearDocument.Main[]

  const blocks = pages[0].blocks
  return (
    <div className="prose max-w-none dark:prose-invert">
      {blocks.map((block, i) => (
        <LeafletBlock key={`block-${i}`} block={block} />
      ))}
    </div>
  )
}

function LeafletBlock({block}: {block: linearDocument.Block}) {
  const innerBlock = block.block
  if (blocks.header.main.$matches(innerBlock)) {
    return <LeafletBlockHeader block={innerBlock} />
  }
  if (blocks.text.main.$matches(innerBlock)) {
    return <LeafletBlockText block={innerBlock} />
  }
  if (blocks.blockquote.main.$matches(innerBlock)) {
    return <LeafletBlockQuote block={innerBlock} />
  }
  if (blocks.horizontalRule.main.$matches(innerBlock)) {
    return <hr />
  }
  if (blocks.unorderedList.main.$matches(innerBlock)) {
    return <LeafletBlockUl block={innerBlock} />
  }
  if (blocks.website.main.$matches(innerBlock)) {
    return <LeafletBlockWebsite block={innerBlock} />
  }
  if (blocks.image.main.$matches(innerBlock)) {
    return <LeafletBlockImage block={innerBlock} />
  }
  if (blocks.bskyPost.main.$matches(innerBlock)) {
    return <LeafletBlockBskyPost block={innerBlock} />
  }
  console.warn('Warning! Unhandled block in leaflet', innerBlock.$type, innerBlock)
  return <div><strong>TODO: {innerBlock.$type}</strong></div>
}

function LeafletBlockHeader({block}: {block: blocks.header.Main}) {
  const level = (block.level || 1) + 1 // add 1 because the title is the h1
  if (level === 2) {
    return <h2><RenderRichText rt={new RichText(block.plaintext, block.facets)} /></h2>
  }
  if (level === 3) {
    return <h3><RenderRichText rt={new RichText(block.plaintext, block.facets)} /></h3>
  }
  // ...
  return <h1><RenderRichText rt={new RichText(block.plaintext, block.facets)} /></h1>
}

function LeafletBlockText({block}: {block: blocks.text.Main}) {
  return <p><RenderRichText rt={new RichText(block.plaintext, block.facets)} /></p>
}

function LeafletBlockQuote({block}: {block: blocks.blockquote.Main}) {
  return <blockquote><RenderRichText rt={new RichText(block.plaintext, block.facets)} /></blockquote>
}

// etc...

Again, full source for that renderer is here.

https://pfrazee.leaflet.pub/3mbnbdt4bas2a
The politics of purely client-side apps
There's a surprisingly nuanced discussion in development about the political economy of clients and servers in the Atmosphere

You make a post on Bluesky. How does it happen?

Option 1:

  • Your client calls putRecord on the user's PDS

  • 200 OK, the record was created

  • The record goes through the relay to the Bluesky server

  • Bluesky servers index the new record

  • Tada

Option 2:

  • Your client calls createPost on the Bluesky servers

  • The Bluesky servers call putRecord on the user's PDS

  • The Bluesky servers update their indexes

  • 200 OK, the record was created

  • (The record shows up again via the relay and can be reindexed if needed)

  • Tada

Both of these are now possible in the Atmosphere, but which of these options is the "good one"? It turns out, that's a pretty nuanced question.

Option 1 - PDS proxies all traffic

Option 1 is the "PDS proxies all traffic" philosophy. In this model, the client logs into the PDS and then sends all traffic to the Atmosphere by proxying through the PDS.

This has some interesting consequences:

1. The client mutates records by directly writing them to the PDS

2. The PDS is able to intercept and modify traffic to apps

3. There's no opportunity for server-side computation within the lifetime of a transaction

Points 1 & 2 have positive political implications. The ability to write directly to a PDS means that third-party "pure clients" (no backend of their own) have a lot of freedom in how they operate. Then the ability to intercept and modify traffic means that a PDS can make decisions on behalf of their users which might be contrary to the application's decisions. These are both good balances against the power of an app.

Point 3 just sucks though. What's not obvious about Option 1's flow is that the time between "200 OK" and "Bluesky servers index the record" is indeterminate. The 200 OK ends the transaction from the client's perspective, so now the client is going to struggle to show the user the actions they just took.

Right now, the PDS takes advantage of traffic interception to modify getPostThread and inject the user's recent posts. That does work, but it means the PDS has Bluesky business logic baked in. Not only is that a conceptual violation of the PDS -- which is supposed to be generic -- but it's an option that's not available to every app.
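
As a rough illustration of that interception (a sketch, not the PDS's actual implementation): the PDS keeps track of records it accepted recently and merges them into a proxied thread response when the app server hasn't indexed them yet. The `PostView` and `ThreadResponse` shapes and the `injectRecentWrites` function are hypothetical names made up for this example, and it assumes the local writes are listed in the order they were made.

```typescript
// Hypothetical shapes for this sketch; these are not the real getPostThread response types.
interface PostView {
  uri: string
  parentUri?: string
  text: string
}

interface ThreadResponse {
  root: PostView
  replies: PostView[]
}

// Merge replies the PDS accepted locally into a thread returned by the app server.
// Only replies that belong to this thread and are missing upstream get injected.
function injectRecentWrites(
  upstream: ThreadResponse,
  recentLocalWrites: PostView[]
): ThreadResponse {
  const known = new Set<string>([
    upstream.root.uri,
    ...upstream.replies.map((r) => r.uri),
  ])
  const injected: PostView[] = []
  for (const post of recentLocalWrites) {
    const inThread = post.parentUri !== undefined && known.has(post.parentUri)
    if (inThread && !known.has(post.uri)) {
      injected.push(post)
      known.add(post.uri) // lets a later local reply to this reply be injected too
    }
  }
  // Return a new object so the upstream response is left untouched
  return { root: upstream.root, replies: [...upstream.replies, ...injected] }
}
```

Even in toy form, the function has to know what a "thread" and a "reply" are. That's the Bluesky business logic leaking into what is supposed to be a generic server.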

Option 2 - App server speaks to PDS

Option 2 is the "App server speaks to PDS" philosophy. In this model, the client logs into the app, which in turn logs into the PDS, and then the client speaks entirely to the app. The app then talks to the PDS directly to modify records.

This basically removes all 3 of the consequences in Option 1. There's no problem of ensuring actions are immediately visible to users after a transaction, but now the client isn't in communication with the PDS so the political power of the PDS is reduced.
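
To make the timing difference between the two options concrete, here's a toy in-memory model. It's a sketch under heavy assumptions: `ToyPds`, `ToyAppServer`, `deliverViaRelay`, and `option1Write` are invented stand-ins, not real atproto packages or endpoints, and the relay's indeterminate delay is modeled as an explicit function call.

```typescript
// Toy in-memory stand-ins; none of these correspond to real atproto packages.
interface PostRecord {
  uri: string
  text: string
}

class ToyPds {
  records = new Map<string, PostRecord>()
  outbox: PostRecord[] = [] // records waiting to travel through the relay

  putRecord(record: PostRecord): void {
    this.records.set(record.uri, record)
    this.outbox.push(record)
  }
}

class ToyAppServer {
  index = new Map<string, PostRecord>()

  indexRecord(record: PostRecord): void {
    // Idempotent, so seeing the same record again via the relay is harmless
    this.index.set(record.uri, record)
  }

  // Option 2: the app server writes to the PDS and updates its index before responding.
  createPost(pds: ToyPds, record: PostRecord): void {
    pds.putRecord(record)
    this.indexRecord(record)
  }
}

// The relay delivers at some later, indeterminate time; here that is an explicit call.
function deliverViaRelay(pds: ToyPds, app: ToyAppServer): void {
  for (const record of pds.outbox) app.indexRecord(record)
  pds.outbox = []
}

// Option 1: the client writes straight to the PDS and gets its "200 OK" right away.
function option1Write(pds: ToyPds, record: PostRecord): void {
  pds.putRecord(record)
}
```

With Option 1, `app.index` doesn't contain the new post until `deliverViaRelay` runs, which is the window where the client struggles to show the user what they just did. With Option 2, the post is in the index by the time `createPost` returns, and the later relay delivery just re-indexes it.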

Which should we do here?

Ultimately, the Atmosphere community is going to need to align on one of these two methods. The Bluesky app still uses Option 1, but now that OAuth is here the guidance we've been giving is Option 2. Is that the right call?

I'm really torn on this. I'm going to just dump an assortment of thoughts, some of which are contradictory.

  • Purely client-side apps are a good thing

  • It sucks when you're building purely client-side and can't do exactly what you want to do, or can't make your customizations perform well

  • Services like microcosm are pretty great for enabling those customizations

  • Option 2 is much more intuitive to me than Option 1

  • Option 2 will always perform better

  • It's currently too expensive to build full network app servers in the Atmosphere

  • The ability for the PDS to intercept and modify traffic is really good

  • The role of the PDS within the political economy of the atmosphere is still not totally clear, but acting as some kind of counterbalance to applications is a really promising idea if we could get more clarity on how that will work

My gut says we should be leaning towards Option 2 because it's clearer and because it enables app developers to do more. To handle the costs I'm inclined to think that Bluesky's servers should be available almost like a cloud service, which would drive down costs a lot and generally increase the ability for third-party apps to implement new or different behaviors. This would essentially transfer the political power of the PDS (ie to intercept and modify traffic) over to the third-party applications.

Just some thoughts.

https://pfrazee.leaflet.pub/3m5hwua4sh22v
100% cooked
Some reflections on the product design discussions from the last weekend

It's 6:30am on Monday. In ten minutes I'll be taking Kit to be sedated so they can find out what's wrong with her broken butt. Whatever it is, it was so painful that I had to force-feed her gabapentin every 12 hours over the weekend. She then had to fast in preparation for the anesthetic, so I didn't get much sleep as she woke me up every hour or two to beg for food. Not being able to explain things to cats is the worst.

[time jump - now I'm back home]

Meanwhile. On Friday I made a couple of posts to give some context about a company blogpost on anti-toxicity measures. The thread involved some back-and-forth and by the evening I decided to mute it and try to relax. Of course, one of my replies got picked up for some pretty heavy quote dunking, which I figured out Saturday morning because I saw somebody indirectly memeing about it. You know you're in trouble when something you said has become a copypasta.

Interestingly, since we don't have any kind of automated dogpile detections or warnings, muting that thread meant I didn't have a chance to cut it off by detaching the quotes or blocking anybody. As I was sitting in the vet's office on Saturday morning, waiting to hear the eventual verdict that they'll have to put Kit fully under to even diagnose the issue because it's too painful for her to be examined, I discovered that I had been 100% cooked.


The product needs to stand for something. One big thing we chose is that Bluesky should be a nicer place. This comes from our own beliefs as much as it does from feedback. X is now famous for being toxic, and we feel like that's bad for people, and I would say we're not alone: people repeatedly tell us they want to enjoy themselves online without being harassed, demeaned, or threatened. Crazy product insight, that is.

Of course, how to accomplish that is a hard question. One of the critiques we got on the coming updates that resonates most with me is that we're trying things that feel "paternalistic." I want to explore that problem a bit and share some of my own complex feelings about it.

One of the early things we implemented was community-operated moderation: aggressive blocks, mutelists, blocklists, and even labelers. We've always anchored on community tools, and yet that hasn't been enough. We still hear repeatedly that people have a bad time in the replies.

Labelers are somehow both very powerful and not powerful enough. Users can subscribe to them, send them reports, and then have content hidden by the labeler throughout the app. That makes them powerful, but because they don't apply to everyone they end up acting only as a personal filter, which then isn't powerful enough.

We also had some incredibly painful blowups with the early community labelers that broke my brain about the topic. Something is wrong with the formulation. (If anybody has wondered why labelers haven't gotten easier to discover - that's why.)

One of the followup ideas to labelers that I personally find interesting is a "personal moderator," which would be a labeler you appoint to moderate your own replies and have those decisions apply to everyone that views them. (We sometimes call this giving the labeler "jurisdiction" over the replies.) This is interesting because it might be as capable as any system we build into the application, could use automated systems and/or community decision-making, and would be fully under a person's choice.

Whether this idea would solve the challenges that led to the labeler blowups, however, is not clear to me at all.


There are two things that we have to hold in our heads at the same time: be responsive to the downsides of our product, and respect people's rights to make decisions for themselves. Taking the latter even further - be aware of the risks of making decisions for users.

This can make for some pretty challenging product design work. We have generally tried to square this by creating sophisticated tools and then selecting good defaults. When this doesn't work, it shows up as: "the UI is too complex, nobody knows about the tools available to them, and nobody uses them."

I've always been proud of the Bluesky community for promoting the use of blocks as a form of self-care, because the social dynamics around restricting access are complicated. Deleting a post, turning off replies, detaching a quote -- they're often seen as a sign of weakness or admitting defeat. This means that even when you do know about the tools, there's a negative social pressure to use them.

Probably the best thing we've implemented to deal with crappy replies is the "Followers only" setting, and maybe the smart move would be to just make that the default, but that feels more aggressive than the interventions we announced on Friday. (We are going to make some improvements to the "Who can reply" UI though.)

This doesn't mean that the interventions we've chosen are automatically correct, but it does give context about why we're inclined to systemically change replies. When we see that "Followers only replies" are successful at reducing toxic replies, our inclination is to shape the core experience of replies to anchor more closely toward your social cluster. After all, it is a social network.


Good intentions aren't everything. Some of what we do may not work. If something bombs, we'll roll it back. When something we announce doesn't resonate with people, believe me, we talk about it internally. But we stand by our core belief: that we want social media to be a less toxic place, and we will work hard to make that happen.

It's all kind of like Casablanca when you think about it, a movie about listening to feedback from your users.

https://pfrazee.leaflet.pub/3m4qqzatka22o
Aggregate effects of processing load in a distributed physics simulation
Describing general relativity as a quirk of distributed computation is satisfying to me as a distributed systems engineer.

Physics engines compute the next states of modeled objects using "ticks" of processing. With each tick, the engine iterates on the objects, collects the relevant interactions and forces, calculates their effects, and updates the state of the object.

The availability of processing time for running a tick is variable. You may accomplish 60 ticks per second, or 50, or 40. When a lot of objects and interactions are involved, you will get fewer ticks per second.

To ensure that the computation remains consistent, you calculate the time passage since the last tick and use that "time delta" to scale the changes to state.

Example: an object with a velocity of 5 meters per second would have its position modified by (V x dT). If the time since last tick was 0.2 seconds, the tick's modification to the position would be (5 m/s x 0.2 s) or 1 meter.
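
Here's the example above as a minimal sketch. The `Body1D` type and `tick` function are names made up for illustration, and the motion is reduced to one dimension with constant velocity.

```typescript
// Position in meters, velocity in meters per second (1D for simplicity)
interface Body1D {
  position: number
  velocity: number
}

// Advance one tick, scaling the change by the seconds elapsed since the last tick.
function tick(body: Body1D, dtSeconds: number): Body1D {
  return { ...body, position: body.position + body.velocity * dtSeconds }
}

// One slow tick of 0.2s...
const afterOneSlowTick = tick({ position: 0, velocity: 5 }, 0.2)

// ...covers the same ground as four fast ticks of 0.05s each.
let afterFourFastTicks: Body1D = { position: 0, velocity: 5 }
for (let i = 0; i < 4; i++) {
  afterFourFastTicks = tick(afterFourFastTicks, 0.05)
}
```

Both bodies end up about 1 meter from where they started (up to floating point rounding), which is the whole point of the time delta: the result shouldn't depend on how many ticks you managed to run.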


Let's now assume a simulation in which

  • Each object is assigned an independent process,

  • Each process has instant access to the state of all other objects within some radius, and

  • There is no global clock.

In this simulation, the objects are processed under their own tick rates. The tick rates will depend on the computational load of the object's process.

We can expect that the tick rate of each object's process will be inversely proportional to the number of nearby objects, since each available object adds computational load.

The denser cluster on the left will have lower tickrates as there are more computations to be accomplished
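
A small sketch of that relationship. The load model here is an assumption of mine rather than anything measured: every neighbor within the radius costs one unit of work per tick, plus one unit for the object itself, so the rate is `base / (1 + neighbors)`. `countNeighbors` and `tickRate` are hypothetical helper names.

```typescript
interface Point {
  x: number
  y: number
}

// Count the other objects within `radius` of objects[i].
function countNeighbors(objects: Point[], i: number, radius: number): number {
  let count = 0
  for (let j = 0; j < objects.length; j++) {
    if (j === i) continue
    const dx = objects[j].x - objects[i].x
    const dy = objects[j].y - objects[i].y
    if (Math.hypot(dx, dy) <= radius) count++
  }
  return count
}

// Assumed load model: one unit of work for the object itself plus one per visible neighbor.
function tickRate(baseTicksPerSecond: number, neighbors: number): number {
  return baseTicksPerSecond / (1 + neighbors)
}

// A tight cluster of four objects and one loner far away
const objects: Point[] = [
  { x: 0, y: 0 },
  { x: 1, y: 0 },
  { x: 0, y: 1 },
  { x: 1, y: 1 },
  { x: 100, y: 100 },
]
const clusteredRate = tickRate(60, countNeighbors(objects, 0, 2))
const isolatedRate = tickRate(60, countNeighbors(objects, 4, 2))
```

Under this assumed model the clustered object runs at a quarter of the base rate while the isolated one runs at the full rate.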

With no global clock to correct against the differences of tick rates, we can expect a skew to emerge in the calculations.

To understand this, let's look at a spring force.

A spring will exert force to maintain a target distance

Two objects connected by a spring will attempt to maintain a specific distance (x above). If the objects are further than x, they exert an attractive force toward each other. If the objects are closer than x, they exert a repellant force.

Consider what happens if blue's tickrate is lower than green's

The distributed simulation is not maintaining a consistent tickrate on each object. If blue's process is under heavier load than green's, it will advance its computation at a proportionally lower rate, e.g. 1/5th the rate of green.

Lower tickrate reduces the effects of the spring on blue

This introduces an asymmetric skew to the results of the computation. As the green object is capable of processing the effects of the spring more rapidly, the centerpoint of the two objects will shift closer to the blue object.
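
The skew can be reproduced in a few lines. This is a sketch under assumptions that aren't in the description above: motion is 1D and overdamped (each tick moves an object in proportion to the spring force, with no momentum), and every tick applies the same fixed local step because there's no global clock to rescale it. `simulateSkewedSpring` and its constants are made up for the illustration.

```typescript
interface SpringResult {
  blue: number
  green: number
  midpoint: number
}

// Blue starts at 0 and green at 10, joined by a spring that wants them 2 apart.
// Green ticks on every step; blue only ticks on every `blueTicksEvery`-th step.
function simulateSkewedSpring(steps: number, blueTicksEvery: number): SpringResult {
  const restLength = 2 // the target distance x
  const stiffness = 0.5
  const localDt = 0.1 // fixed per-tick step used by both objects
  let blue = 0
  let green = 10 // starts stretched: distance 10 > restLength

  for (let step = 1; step <= steps; step++) {
    // Both processes read the same instantaneous state at the start of the step
    const extension = green - blue - restLength
    const pull = stiffness * extension * localDt
    green -= pull // green's process handles the spring on every step
    if (step % blueTicksEvery === 0) blue += pull // blue's process is busy most of the time
  }
  return { blue, green, midpoint: (blue + green) / 2 }
}

const evenLoad = simulateSkewedSpring(1000, 1) // both tick at the same rate
const skewedLoad = simulateSkewedSpring(1000, 5) // blue ticks at 1/5th the rate of green
```

With equal tick rates the two objects meet in the middle and the midpoint stays at 5. With blue at 1/5th the rate, the spring still settles at its target length, but green did most of the moving, so the midpoint ends up well on blue's side of where it started.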

In a scenario in which a collection of objects were connected pairwise by springs, this skew would cause the objects to be drawn together in aggregate, despite the lack of interactions between the objects.

Tickrate modulation causes a net effect on the springs

The modulation of the pairwise objects' spring forces would produce a net attraction toward other objects. Other kinds of forces than springs might have other aggregate effects, but the decreased effect upon clustered objects will remain consistent: The closer you are to a cluster of objects, the more force is required to move away from the objects. As a result, in this hypothetical distributed simulation, gravity is an emergent characteristic of processing load.


Is this a meaningful idea? I don't really know. This is something that occurred to me about 10 years ago when I read a paper that suggested intentionally modulating clock rates to handle network load in multiplayer games. I figured out the net effect on the phys sim, then modified the idea to unintentionally modulate clock rates due to distributed processing.

It's appealing to model the universe as a computational system. The laws of thermodynamics make intuitive sense to me as the result of computation - a system which is iterative, deterministic, but nonreversible due to information loss. Describing general relativity as a quirk of distributed computation is then satisfying to me as a distributed systems engineer.

I brought this idea up in front of some other engineers and they told me that this was the plot of Devs? I've never seen it.

At any rate, it might be fun to try a larger scale simulation and see if the model works. I'd need to come up with a more meaningful core set of forces than a bunch of springs though.

https://pfrazee.leaflet.pub/3m3t2xqv6gc2z
Social platforms are not neutral
So they need to be interchangeable

Since I was a teenager, the obvious call-out was that cable news led American party politics. Fox emphasized their high ratings like an approval poll and politicians followed the channel's cues. Some other press orgs aspired toward more neutrality, but it's hard to know what that means these days.

For a while, social media tried to be politically neutral. Moderation is difficult as a practical task, and fighting over the decisions and policy - and risking revenue over it - is bad business.

But it was somewhat obvious to everyone that social platform neutrality was a performative fiction, even before Musk discarded the norm and gave X an overt political mission. Having done so successfully, there is no question about it now. Social platforms are at minimum opinionated, and at maximum are overtly partisan.


What's biased, and what's not? Social media is a competitive arena of attention and persuasion. The moderation rules reflect some values system, and not everybody agrees upon those values. Even the decision to reduce politics in users' feeds can be seen as a political act.


Are social platforms a product, or a political project?

You want to connect to friends and have a good time (however you define that) and the opinions of one product's leadership may end up conflicting with yours. Social platforms do need to be opinionated though. The Internet is chaos and platforms need to make judgment calls.

Using a platform feels like tacit approval of management's politics, and your own ability to organize is controlled by the platform's algorithms and policies. It is inevitably political on some level.

Even an aggressively a-political stance is biased in some way.


Another way to look at "bias" is to widen your thinking past partisanship. A platform may be wholly disinterested in left vs right, but if it promotes influencers over friendships, or thirst traps over breaking news, then it still has a bias. A platform can't be neutral!


If social platforms can't be neutral, then they need to be interchangeable. That's the only way this doesn't spin out into a propagandic nightmare.

TV channels are interchangeable. You can swap them without buying a new TV. Cell phone networks are interchangeable. You keep your phone while switching between AT&T or Verizon.

Old social platforms are not interchangeable. Your account and your friendships are locked into the platforms. This means you can't actually express a choice. You stay because everybody else stayed.

New social platforms are interchangeable.


The Internet is neutral. It's infrastructure, like the roads. It doesn't try to control where you go; it just gets you there. We depend on neutral infrastructure to live free lives.


Interchangeability is what new social platforms like Bluesky and Mastodon are doing, though with different technologies and different ideas on how to do it. When you hear people talk about decentralization or protocols, what they're talking about is making the social platforms interchangeable to solve this neutrality problem.

The new social platforms use neutral protocols so they can be both opinionated and interchangeable. Acknowledging that neutrality is impossible for a platform, we instead invested in shared infrastructure to backstop the risks of platform opinionation and the resulting bias.

Platforms choose what to recommend (with algorithms) and what to filter (with moderation). They disagree, but the new platforms do so while still connecting their users to each other as much as they can.


Competitive spaces do not operate in good faith. People play to win. This is why we regulate against monopolies and build consumer protections.

The Internet is no different, but we've never decided how to fairly govern the competitive spaces. That's what the new social platforms are trying to solve. By making the products interchangeable, they're essentially subject to user election.


The people that created Bluesky - myself included - think that social media has been trending bad for us. We all grew up as users (none of us worked on the first generation of social) and we have the same love/hate relationship that everybody else has. And we walked into this project with values shaped by that experience.

These opinions are going to shape Bluesky as a platform. There's a kind of dual mission that we embraced. On the one level, we created a neutral protocol to solve the systemic absence of neutrality and choice. On another level, we created a platform to drive an opinionated take on social. They go hand in hand: the killer app of a neutral protocol is an opinionated but interchangeable platform.

If I had to summarize those opinions, it would be this: social media doesn't have to be a bad time. It doesn't have to be adversarial, low-trust, and driven by outrage.

If that's right, and achievable, and what people want, then Bluesky as a platform has a bright future ahead of it. If that's wrong, then our users will interchange us with somebody that represents them better, as some have already chosen to do.

https://pfrazee.leaflet.pub/3m3dogbx2es2g
Sometimes I think we collectively underscope the impact of bots on society
A couple of examples that have been living rent free in my head lately
Cracker Barrel outrage

Cracker Barrel Outrage Was Almost Certainly Driven by Bots, Researchers Say

Doesn't that make more sense than lots of people caring about Cracker Barrel?

PeakMetrics grabbed a sample of 52,000 posts made on X within the first 24 hours of Cracker Barrel’s announcement that it would be modernizing its logo to an admittedly very plain and generic design. In that timeframe, it found that 44.5% of all mentions of Cracker Barrel were flagged as likely or higher bot activity.

Release the snyder cut

Exclusive: Fake Accounts Fueled the ‘Snyder Cut’ Online Army

A WarnerMedia report reveals that inauthentic users bolstered the fan-led campaign for director Zack Snyder’s 'Justice League' do-over.

Earlier this year, Rolling Stone uncovered a WarnerMedia report revealing that fans used bots and other inauthentic users to boost the campaign for the release of the director’s four-hour-long cut of Justice League. The bots were allegedly part of a social media movement that turned toxic and resulted in online attacks, cyber harassment, and death threats toward Warner Bros. executives.

The run of the mill stuff that hits us

Stolen accounts in the blue, blue skies

Using the Bluesky firehose to monitor the growth of a network of repurposed accounts

The profile changes generally happen in batches, with multiple accounts being switched to the same repeated biography in the span of a few minutes. Most of the biographies express support for the Democratic Party and/or opposition to Donald Trump and the MAGA movement, although four of the accounts use a repeated biography expressing the opposite political stance. Several of the repeated biographies mention a distaste for cryptocurrency and porn.

This nightmare

What else?

I can't prove it, but I'm real suspicious of every controversy involving Sydney Sweeney.

https://pfrazee.leaflet.pub/3m26lo3r4z222
Cool
I'm into it
https://pfrazee.leaflet.pub/3lzwbrxbfts25
Three schemes for shared-private storage

Continuing from my previous leaflet on private data, I want to explore three broad schemes for shared-private storage that I’ve seen emerge in various proposals.

As a reminder, the shared-private storage is for data which is multi-user but non-public. Some common use-cases include posts, user lists, videos, documents, DMs, and basically any other kind of artefact or experience in social or productivity software.

It’s not yet clear to me which of these schemes is the right approach. It’s not even clear to me whether we’ll need just one, all three, or something not included here. The goal is not to be comprehensive or conclusive right now; I just want to share the general ideas.

Shared “arenas”

An important nuance of shared-private storage is that the initial record being shared is not a comprehensive picture of the system. It is not like email, where the exchange consists purely of “e-mail message” objects sent back & forth. A scheme for private threads must include a solution for the original post record, the replies, likes, reposts (if allowed) and so on.

For this leaflet, I’ll use the term arenas to describe the collection of records involved in some shared-private experience. An arena might be a DMs conversation, a private thread, a group with multiple private discussion threads, and so on. It’s a non-public exchange composed of records from multiple authors.

Estimating scale

Scale considerations factor heavily when all of the behaviors of these arenas are considered. Every time you create a new private exchange with its own set of recipients, you need to create a new arena. Some ways this can become quite extensive include:

  • Private threads which are addressed only to followed users might require 1 arena per user. For example, Alice needs her own “Private Threads Arena” which includes all of her followed users, while Bob needs his own Private Threads Arena with his followed users.

  • Private threads which are addressed to an arbitrary recipient list could lead to 1 arena per thread. If Alice creates a private thread for her, Bob, and Carla, then that’s a different arena than a separate private thread for her and Bob.

This explosion of arenas means that the active resource cost of a given arena needs to be near-zero.

Conversely, arenas which are oriented toward persistent groups such as DM conversations or discussion groups are less likely to explode the number of arenas, but are more likely to involve much larger numbers of users. The sharing model needs to be prepared for these large scale private groups as well.
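
To make the counting concrete, here is a minimal sketch that assumes an arena is identified purely by its member set. The `arenaKey` and `countArenas` helpers and the `did:example:` identifiers are hypothetical and are not part of AT or of any proposal.

```typescript
// Hypothetical sketch; none of these names come from AT or from any proposal.
type Did = string;

// Derive a stable key from an arena's member set, so the same set of
// participants always maps to the same arena regardless of order.
function arenaKey(members: Did[]): string {
  return Array.from(new Set(members)).sort().join(",");
}

// Count the distinct arenas implied by a list of private threads, where
// each thread is addressed to an arbitrary recipient list.
function countArenas(threads: Did[][]): number {
  return new Set(threads.map(arenaKey)).size;
}

const exampleThreads: Did[][] = [
  ["did:example:alice", "did:example:bob", "did:example:carla"],
  ["did:example:alice", "did:example:bob"],
  ["did:example:bob", "did:example:alice"], // same members, so same arena
];

console.log(countArenas(exampleThreads)); // prints 2
```

Under this assumption every new recipient list produces a new key, which is why the per-arena cost has to stay close to zero.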

Hosted arena scheme

The “hosted-arena” scheme specifies a server which hosts shared-private data on behalf of users. Access is mediated by the host server, and can consequently be revoked.

The hosted-arena scheme enables highly dynamic access rules and simplifies coordination. However, the host canonically decides which records are a part of the arena, and so it needs to be trusted not to drop messages from participants.

The general expectation is that hosted arenas will be contacted by applications which wish to display the arena. This means, for instance, that Bluesky would contact the host for the content of the arena, much like it does for feed generators.

Hosted arenas cannot be trusted to faithfully represent user activity, and so any activity they host must be served in its original authenticated (aka signed) form. The reason for this is fairly intuitive: the incentive to misrepresent user activity is extremely high. This unfortunately means that we can’t just call out to an API to get computed views; the arena host needs to provide a bucket of signed records which are then reconstructed by a viewing application.
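
A rough sketch of the viewing side, assuming the host returns a flat bucket of records and the application has some way to verify each one. The `SignedRecord` shape, `acceptBucket`, and the toy verifier are all hypothetical; a real verifier would check a cryptographic signature against the author's signing key.

```typescript
// Hypothetical shapes; real AT records and signatures look different.
type SignedRecord = { author: string; body: string; sig: string };
type Verifier = (record: SignedRecord) => boolean;

// The viewing application keeps only the records whose signatures check
// out, and builds its view from those rather than from a computed API.
function acceptBucket(bucket: SignedRecord[], verify: Verifier): SignedRecord[] {
  return bucket.filter((record) => verify(record));
}

// Toy stand-in for signature verification. NOT cryptographic: it only
// compares against a fixed string pattern for the sake of the example.
const toyVerify: Verifier = (record) =>
  record.sig === "toy-sig:" + record.author + ":" + record.body;

const hostBucket: SignedRecord[] = [
  { author: "alice", body: "hello", sig: "toy-sig:alice:hello" },
  { author: "bob", body: "forged by host", sig: "not-bobs-signature" },
];

const acceptedRecords = acceptBucket(hostBucket, toyVerify);
console.log(acceptedRecords.map((record) => record.author)); // prints [ 'alice' ]
```

Note what this does and does not cover: the viewer can reject records the host forged, but it has no way to notice records the host silently dropped.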

Mail scheme

The “mail” scheme uses email- or activitypub-style mailing semantics. The unit of data is a message with a “send” verb with a specified list of recipients.

Mail is transactional, unlike most data in AT, and its local state is detached from its remote state. Put another way, if you edit or delete a mailed record locally, those modifications are not reflected among previous recipients. Mailed objects are immutable post-transaction. (If you’re thinking, Why not make them mutable and sync the changes? then look at the next scheme.)

Mail schemes have fixed access rules (the recipients) though “mailing list” style forwarding bots can be used to condense recipient lists into a single controlled group. Revocation of a mailed message is not possible.
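
The detachment between local and delivered state can be sketched like this. The `MailedRecord` shape and the `send` function are hypothetical illustrations, with in-memory inboxes standing in for whatever actually receives the mail.

```typescript
// Hypothetical sketch of mail semantics; not an AT API.
type MailedRecord = { id: string; body: string; recipients: string[] };
type Inboxes = Map<string, Readonly<MailedRecord>[]>;

// "Send" copies the record into each recipient's inbox. Each delivered
// copy is frozen and independent of the sender's local record.
function send(record: MailedRecord, inboxes: Inboxes): void {
  for (const recipient of record.recipients) {
    const copy: Readonly<MailedRecord> = Object.freeze({
      id: record.id,
      body: record.body,
      recipients: record.recipients.slice(),
    });
    const inbox = inboxes.get(recipient) ?? [];
    inbox.push(copy);
    inboxes.set(recipient, inbox);
  }
}

const mailInboxes: Inboxes = new Map();
const localDraft: MailedRecord = { id: "1", body: "v1", recipients: ["bob", "carla"] };

send(localDraft, mailInboxes);
localDraft.body = "v2 (edited locally)"; // a local edit after sending

const bobsCopy = (mailInboxes.get("bob") ?? [])[0];
console.log(bobsCopy.body); // prints v1
```

The recipient list is fixed at send time, and nothing in this flow gives the sender a way to reach back into an inbox, which matches the no-revocation property.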

Synced-arena scheme

The “synced-arena” scheme uses sync channels to automatically propagate records and record-updates among multiple servers. Access is established using a list of granted viewers – either in the records or in some metadata – enabling servers to act on behalf of the viewing users to sync the records.

Any synced-arena scheme is going to function almost as a private relay. It will sync records along with authenticity proofs, and it will depend on store-and-forward semantics. Since there is likely a known member set, it should be possible to use gossip-protocol semantics with vector clocks to distribute load among the participants.
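
For readers who haven't used vector clocks, here is a minimal sketch of just that piece, assuming each participant keeps a counter per known member. It leaves out the gossip transport and authenticity proofs entirely, and the function names are illustrative.

```typescript
// A vector clock maps each participant to the number of updates
// from that participant which have been seen so far.
type VectorClock = Record<string, number>;
type Order = "before" | "after" | "equal" | "concurrent";

// Record a local update made by `node`.
function tick(clock: VectorClock, node: string): VectorClock {
  return { ...clock, [node]: (clock[node] ?? 0) + 1 };
}

// Combine two clocks after syncing: take the element-wise maximum.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const merged: VectorClock = { ...a };
  for (const node of Object.keys(b)) {
    merged[node] = Math.max(merged[node] ?? 0, b[node] ?? 0);
  }
  return merged;
}

// Decide whether `a` happened before `b`, after it, or concurrently.
function compare(a: VectorClock, b: VectorClock): Order {
  let aBehind = false;
  let bBehind = false;
  for (const node of Object.keys({ ...a, ...b })) {
    const av = a[node] ?? 0;
    const bv = b[node] ?? 0;
    if (av < bv) aBehind = true;
    if (bv < av) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}

const aliceClock = tick({}, "alice"); // { alice: 1 }
const bobClock = tick({}, "bob"); // { bob: 1 }
const syncedClock = merge(aliceClock, bobClock); // { alice: 1, bob: 1 }

console.log(compare(aliceClock, bobClock)); // prints concurrent
console.log(compare(aliceClock, syncedClock)); // prints before
```

A participant that compares clocks this way can tell which peers are missing updates and only forward what they lack.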

Who implements the schemes?

Broadly speaking, these schemes could be implemented by the PDS, by the Application Server, or a hybrid.

If implemented by the PDS, these schemes would be exposed entirely as APIs on the PDS which the Applications would use. If implemented by the Apps, the PDS would likely expose some utilities to facilitate the work – particularly for signing – and then it would be up to the Apps to engage in these schemes with each other.

Both of these approaches have tradeoffs. If implemented in the PDS, it’s going to be more rigid and could end up increasing the costs of PDS operation. If implemented in the Application Servers, it’s going to be more work for app developers.

Hybrid models are interesting to consider. For instance, the mail scheme could be handled as a kind of extension to personal-private data. If a record is written with the $recipients metadata, the PDS could automatically fire off the record to the applications being used by the recipients.

For instance, if Bob is using Bluesky and Blacksky, then a private message addressed to Bob would deliver automatically to both of those apps. This would put the outbox tasks – which are fairly cheap – into the PDS’s hands, but the inbox tasks – which are costly due to spam, moderation, and aggregation – into the Apps’ hands.
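
The outbox half of that hybrid could be sketched as below. This assumes the PDS can somehow look up which apps each recipient uses, which is itself an open question; `planDeliveries`, the `appsByUser` table, and the example URI are hypothetical.

```typescript
// Hypothetical sketch of PDS outbox fan-out; not an existing AT API.
type PrivateRecord = { uri: string; $recipients: string[]; value: unknown };
type Delivery = { app: string; recipient: string; uri: string };

// For each recipient, plan one delivery per app that recipient uses.
// How the PDS would learn `appsByUser` is an assumption of this sketch.
function planDeliveries(
  record: PrivateRecord,
  appsByUser: Record<string, string[]>,
): Delivery[] {
  const deliveries: Delivery[] = [];
  for (const recipient of record.$recipients) {
    for (const app of appsByUser[recipient] ?? []) {
      deliveries.push({ app, recipient, uri: record.uri });
    }
  }
  return deliveries;
}

const privateMessage: PrivateRecord = {
  uri: "at://did:example:alice/com.example.message/1", // placeholder URI
  $recipients: ["did:example:bob"],
  value: { text: "hi Bob" },
};

const plannedDeliveries = planDeliveries(privateMessage, {
  "did:example:bob": ["Bluesky", "Blacksky"],
});

console.log(plannedDeliveries.map((delivery) => delivery.app)); // prints [ 'Bluesky', 'Blacksky' ]
```

Everything after the delivery (spam filtering, moderation, aggregation) would stay on the receiving app's side.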

Unsigned, signed, and structure-signed

Within these schemes, there are questions about whether the data is signed. Your options are no, yes, and yes within a cryptographic set structure (such as the Merkle Search Tree used in user repositories). Each of these has different properties.

Unsigned – 🚫 Replicable 🚫 Live

Don’t sign the records. If you communicate directly with the owning user’s PDS, this can work (by dint of source authority). This scheme only extends one “hop” of trust; you can’t replicate the records across multiple hosts like you can with public data in AT. You also get no guarantees about the continued accuracy of the record (liveness) since the authoring user has no way to assert it was deleted.

Signed – ✅ Replicable 🚫 Live

Sign the records as individual objects. This enables the records to be shared around multiple hops (replicable) but it means you can’t reliably assert that a signed record has been modified or deleted (liveness). It’s also hard to handle key rotations in this situation since every signed record has to be re-signed.

For what it’s worth, the hosted arena scheme interferes with liveness too because the host could drop updates.

Structure-signed – ✅ Replicable ✅ Live

Put the records into a cryptographic structure such as a Merkle Search Tree and then sign the structure root. This is the equivalent of creating new data repositories for an arena. This enables multi-hop replication, clear revocation, and low-cost key rotation. The main downside is that you have to share the entire structure within an arena.

Checkpoint

In this leaflet, I defined "arenas" as the collection of records involved in some shared-private experience. I then described three general schemes for implementing arenas: hosted, mail, and synced. I don't consider that exhaustive, but they are three common proposals. I also made some general observations on which roles are implementers of the schemes -- PDS or Application -- and on the signing models that might be used.

At this stage, I don't have any strong preference for any of these schemes. I think all three are appealing for different reasons, and perhaps for different use-cases. It's possible that all three might be needed, or that a secret fourth (or fifth!) might be out there. I do hope this helps map the possibility space a little further.

Cheers.

https://pfrazee.leaflet.pub/3lzhui2zbxk2b
Private data: developing a rubric for success
What will an effective solution look like?

There is wide interest in solving private data in AT - within the team, among users, and in the dev community. I figure it might help to share thoughts as this work progresses.

As proposals are shared, our goals should be to narrow down the knowns and unknowns, build clear requirements, and understand the tradeoffs before something gets shipped.

Background

AT is designed for large scale open applications. It accomplishes this by separating the network into two primary roles:

  • The PDS, which is a personal server for users – hosting their data and their account, and

  • The Application Servers – which aggregate data from all of the users’ servers to form applications.

Applications sign into users’ PDSes to publish records. The records then replicate out to listening apps so they can react to the changes. The flow looks roughly like this:

This is a multi-party transaction. When the user logs in via OAuth, the application is handed the URL of the PDS and a token granting access. Writes are sent out to the PDS via HTTP, and return a 200 to confirm a successful commit. Listening applications are then notified over sockets of the update.
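
As a rough illustration of the write step, the sketch below builds the HTTP request for `com.atproto.repo.createRecord`, the XRPC method used to write a record to a PDS. The `buildCreateRecordRequest` helper, the example host, and the DID are hypothetical, and the OAuth details (token type and proof headers) are deliberately left to the caller.

```typescript
type CreateRecordInput = {
  repo: string; // the account the record is written to
  collection: string; // the record type (an NSID)
  record: Record<string, unknown>;
};

// Build the HTTP request for a record write. The caller supplies the
// PDS URL and the Authorization header value obtained at login.
function buildCreateRecordRequest(
  pdsUrl: string,
  authorization: string,
  input: CreateRecordInput,
) {
  return {
    url: pdsUrl.replace(/\/+$/, "") + "/xrpc/com.atproto.repo.createRecord",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: authorization,
      },
      body: JSON.stringify(input),
    },
  };
}

const writeRequest = buildCreateRecordRequest(
  "https://pds.example.com/", // hypothetical PDS
  "<authorization header value>", // placeholder, not a real token
  {
    repo: "did:example:alice",
    collection: "app.bsky.feed.post",
    record: {
      $type: "app.bsky.feed.post",
      text: "Hello world",
      createdAt: new Date().toISOString(),
    },
  },
);

// Sending it would look like: await fetch(writeRequest.url, writeRequest.init)
console.log(writeRequest.url);
```

The listening side is a separate socket subscription and isn't shown here.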

Applications can subscribe directly to users’ PDSes to receive updates, if they want. In practice, because user data is signed, the network uses “relays” to rebroadcast a firehose of updates from a wide set of users.

The private data problem

Private data is a key requirement of AT, and it is not yet supported. The discussion is now: how do we introduce flows for personal-private data (e.g. preferences, bookmarks, drafts) and shared-private data (e.g. private posts)?

The model described above is pull-based and broadcast-oriented. It takes advantage of signed public data to widely store-and-forward user records in aggregations. This is what enables AT to operate at very high scales; broadcast is cheap. However, it has no facilities for selective sharing of data.

The PDS layer – again, the personal server of the user – is also very cheap to operate by design. Because it contains the users’ possessions (signing keys and primary data) it’s important to ensure self-hosting is affordable. We do not want to complicate PDS operation if it’s not necessary.

How smart can the PDS be? AT is oriented toward social applications, but it is generic in its design. Since a PDS may host a wide variety of application data, it’s not tenable to expect moderation or administration of application-specific behavior except in very simple forms (e.g. automated scanning). This is similar to an operating system’s relationship to the content of applications; the OS can bake in specific knowledge of certain file formats or application behaviors, but most behavior is defined within the applications.

It’s also important to consider how applications will interact with private data. Once granted access, they’re going to need to preserve the replication model that AT uses for private data so that the information can be integrated into its backend.

This sets up our initial requirements for private data in AT: we want to handle personal-private and shared-private data; we do not want to increase the operational or resource costs of the PDS; we don’t want to sacrifice generality; and we need applications to sync the private data.

Solving personal-private storage

The “personal-private” storage is unshared private data. It’s useful for state like preferences, bookmarks, and drafts. It could also store documents, notes, pictures, TODOs, and any other kind of data a user might want to keep in their personal server.

I personally believe private storage should mirror the public storage system. Some indicator in the URI should designate that it is private, and APIs to read & write should clearly distinguish between public and private storage, but otherwise private records & collections should behave the same way as their public counterparts.

With the introduction of OAuth and auth scopes, applications have a clear mechanism to gain access-grants for private data. Once this has occurred, an application can effect reads & writes on the granted records, and open a replication stream directly with the PDS to listen to updates to all granted private records. This can be multiplexed to cover all users with grants.

As granted private records are synced, applications can use similar business logic to users’ public data. The most significant difference is that relays are not available to aggregate the firehose of activity; the application needs to establish replication streams with each of its users’ PDSes.
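
One way to picture the multiplexing: group the application's grants by PDS so that each PDS needs only one replication stream covering all of its granted users. This is a sketch under that assumption; the `Grant` shape and `planStreams` are hypothetical, and no such streaming API exists yet.

```typescript
// Hypothetical sketch; the private replication stream is not a real API.
type Grant = { did: string; pds: string };

// Group granted users by their PDS. Each map entry represents one
// stream the application would open, covering every user listed.
function planStreams(grants: Grant[]): Map<string, string[]> {
  const streams = new Map<string, string[]>();
  for (const grant of grants) {
    const dids = streams.get(grant.pds) ?? [];
    dids.push(grant.did);
    streams.set(grant.pds, dids);
  }
  return streams;
}

const streamPlan = planStreams([
  { did: "did:example:alice", pds: "https://pds-one.example.com" },
  { did: "did:example:bob", pds: "https://pds-two.example.com" },
  { did: "did:example:carla", pds: "https://pds-one.example.com" },
]);

console.log(streamPlan.size); // prints 2
```

The number of streams then grows with the number of distinct PDSes rather than the number of users, though with many self-hosted PDSes those two numbers get close.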

It’s possible that I’m missing some other requirements for personal-private storage, but the task seems fairly straightforward. The access control model is not complicated, no signing model is needed, and the scaling properties are obvious.

Solving shared-private storage

The “shared-private” storage is for data which is multi-user but non-public. Some common use-cases include posts, user lists, videos, documents, DMs, and basically any other kind of artefact or experience in social or productivity software.

This part of the system is somewhat more complex. At this stage, I’m somewhat skeptical that a single approach can solve shared-private data. Here are some of the “rubrics” by which I’m judging proposals that I’ve seen so far:

How well does it handle scale?

  • Can it handle hundreds of participants? Thousands?

  • Does it introduce an excessive amount of duplicated data? (e.g. a mail system might)

  • How many connections are needed between participants, and how complex are the hand-shakes which prove access if they are needed?

What is the metadata leakage?

  • Is it apparent to outside parties that the exchange is occurring? Which outside parties?

  • What kind of data is leaked? Can the participants be enumerated from the outside? Is message existence or timing visible?

What are the security guarantees?

  • Are we attempting to provide end-to-end encryption, or a system which could support E2E?

  • How many parties are included in the communication? Like email, are there applications/providers which will have vision into most private communication? Is this avoidable in an open system?

  • If key material is leaked or access inappropriately granted, how wide is the data exposure? (This is a particular concern for any schemes which broadcast ciphertext via public records, which I find concerning.)

How many use-cases can the system handle?

  • Are there notable absences in the supported use-cases, such as private accounts or large-scale private groups?

  • Relatedly, will end-users be surprised to discover that the “private data” that protocol nerds have been celebrating doesn’t actually give them the feature they’d expect? How complete does the system need to be before we say a word about it to the wider public?

How straightforward is the end-user administration?

  • Will they be able to clearly administer access, either by the applications or some other PDS-level interface?

  • Does the sharing scheme introduce new challenges, such as inbox spam due to a push-messaging scheme?

How accessible is the developer experience and API surface?

  • Are the shared-private systems easy to understand and build with?

  • Do application developers need to learn yet another system, or does it feel like an intuitive extension to AT’s existing primitives?

Checkpoint

I'll reiterate the broad goals: we want to handle personal-private and shared-private data; we do not want to increase the operational or resource costs of the PDS; we don’t want to sacrifice generality; and we need applications to sync the private data.

Personal-private data appears to be straightforward. By creating a new private dataspace and then leveraging the new auth permission scopes, we should have a clear path forward for introducing unshared private data.

Shared-private storage is somewhat more complex. I've listed my current working rubric for solutions above, and I'm open to other requirements that we might need to consider.

It may seem daunting, but I'm very optimistic about the discussions happening in the community group and within the Bluesky team. In a follow-up post, I'll talk about some common shapes and properties I've seen in recent proposals.

Cheers.

https://pfrazee.leaflet.pub/3lzhmtognls2q
Update on Protocol Moderation
Where account takedowns happen is important

As I said recently, I’ve been compressing my descriptions of AT to two major roles. This is a simplification, but it captures the important dynamics:

As lots of applications can access your account and data, the relationship is rather more like this:

If an application decides you’ve crossed its moderation policies, it should pause and/or filter your activity in the application for the duration of the suspension. This should have an effect that’s localized to the application, which is where moderation is applied.

Separately, if a PDS decides that you’ve crossed its moderation policies, it should pause hosting your account and announce it is no longer available. This affects the account across all applications:

If the user subsequently migrates to a new PDS, the account will be restored to the applications:

An account takedown on a PDS is a big deal because it affects all applications. PDS operators should be reserving the takedown for network abuse and illegal content, if possible. Migrations should also be available post takedown, to the extent that a PDS can offer it without enabling complete resource abuse.

I was surprised to discover recently that our account suspension system has been operating entirely at the PDS level. This appears to be due to a workstream that got dropped in the early 2024 rush to open the network. What should be happening is that we apply a strong label such as `!takedown` which applies locally to our application.

This is rather important for a number of reasons:

  • The idea of a suspension on Bluesky that could affect Leaflet or Tangled is bad

  • Suspensions at the PDS level are ineffective; a PDS migration circumvents them

  • The Bluesky application server allows clients to replace the labeling system, which lowers the cost of running alternative moderation; using labels for account suspensions would better enable that

This is being fixed. Takedowns and suspensions which are not due to legal or network-abuse reasons will be handled using labels, which will be removable using the labeler header.
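
A minimal sketch of why label-based takedowns stay local to an application: the label only has an effect if the viewer accepts the labeler that issued it. The `Label` shape here is loosely modeled on AT labels (source, subject, value) but simplified, and `isTakenDown` and the example DIDs are hypothetical.

```typescript
// Simplified label shape: who issued it, what it applies to, and its value.
type Label = { src: string; uri: string; val: string };

// An account is hidden only if one of the labelers this client accepts
// has applied `!takedown` to it. Labels from other labelers are ignored.
function isTakenDown(
  accountDid: string,
  labels: Label[],
  acceptedLabelers: string[],
): boolean {
  return labels.some(
    (label) =>
      label.uri === accountDid &&
      label.val === "!takedown" &&
      acceptedLabelers.indexOf(label.src) !== -1,
  );
}

const exampleLabels: Label[] = [
  { src: "did:example:app-labeler", uri: "did:example:mallory", val: "!takedown" },
];

// A client using the application's own labeler sees the takedown.
console.log(isTakenDown("did:example:mallory", exampleLabels, ["did:example:app-labeler"])); // prints true

// A client that swaps in a different labeler does not.
console.log(isTakenDown("did:example:mallory", exampleLabels, ["did:example:other-labeler"])); // prints false
```

Nothing in this check touches the PDS, so the account keeps working in every other application.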

You’ll also be hearing some news soon from the protocol team about improving migration and supporting independent PDSes more effectively.

https://pfrazee.leaflet.pub/3lz4sgu7iec2k
We probably need to rename the AppView
It should be called an API server, App server, or backend.

I'm firing this post off as I'm late to breakfast so this will be quick.

Lately I've been trying to compress the ideas of AT a bit, because its lego-pieces modularity is wonderful but not exactly conducive to fast learning.

When I boil it down it comes to this:

The name AppView came from a database processing ETL mindset, like a materialized view in postgres. It was from very early days. It was meant to describe how applications aggregate data from the network rather than containing their own data; it felt important to distinguish the app was not the primary source of the data but rather a secondary view.

At this point, it seems better to just call it an App and then explain that the data gets stored in the PDS, like a kind of universal cloud filesystem or datastore.

We also need to talk about the confusion about frontend vs backend in all this. The end-user experience of "app" tends to be the frontend. This means frontends that use the Bluesky backend are running into some wonky challenges about what they meaningfully own, e.g. whether they fully control moderation or not.

https://pfrazee.leaflet.pub/3lyucxaykg22w
Do I have a problem, or do I have ants in the pants
It actually calms me down

Work and the internet have a habit of stressing me out.

Not always. I love both of them. Sometimes they stress me out.

Lately I've been asking myself, "Do I actually have a problem or do I have ants in the pants."

Nine times out of ten it's just ants in the pants.

I don't have a problem; I just see the risk of a problem.

I wanted something; maybe I wanted something to happen sooner, or done the way I prefer.

It's not a big deal. I just don't like it. I'm just worried about it.

Saying the phrase has a calming effect. I've been using it as a mantra.

It works for me.

https://pfrazee.leaflet.pub/3ly4ieizo2c2p