xahteiwi.eu — GeistHaus

Florian Haas Apr 18, 2026 Updated Apr 18, 2026

It’s not completely trivial to run OpenCode in rootless Podman. But it’s not impossible, either.

OpenCode is an MIT-licensed agentic coding assistant. It’s not completely trivial to run it in rootless Podman, but it can be done. Here’s how.

The OpenCode container image

Although the OpenCode documentation makes it a bit hard to find, an official Docker image for OpenCode does exist, and is available from the GitHub Container Registry (GHCR).

It runs in TUI mode, out of the box, with podman run, like so:

podman run -it ghcr.io/anomalyco/opencode

Images are released along with new OpenCode versions, so if you don’t want the latest, you might instead want to run something like:

podman run -it ghcr.io/anomalyco/opencode:1.4.5

Rootless Podman

I’ve written about how I manage services with systemd and podman in rootless mode before; see this article for details.

The remainder of this article assumes the same setup.

The objective

What I want to do is this:

Run OpenCode in a rootless Podman container that I can manage as a user-mode systemd service.
Selectively give OpenCode access to a directory containing Git repos.
Ensure that OpenCode has access to those files using my user identity, the way the OpenCode TUI would if I ran the opencode binary directly (without a container).
Define everything in a Docker Compose configuration.[1]

Obviously, the TUI itself won’t be very helpful for that purpose.

However, OpenCode also comes with a very helpful server mode, including a web-based GUI, and this we can containerize rather well.

Considerations

There are a few things one needs to know to make this work.

Default config file paths

In a default configuration, OpenCode needs access to the following files and directories, in addition to the working directory with the code I want it to modify:

~/.config/opencode/opencode.json or ~/.config/opencode/opencode.jsonc: OpenCode’s main configuration file.
~/.local/share/opencode: Directory for logs, and OpenCode’s SQLite database.
~/.local/state/opencode: Directory for lock files and other state information.
~/.cache/opencode: Cache directory.

Thus, containerized OpenCode requires that I mount these paths into my container.
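
Bind mounts generally need their source paths to exist on the host, so it can help to create them up front. What follows is a minimal sketch, assuming Python 3.9 or later on the host. The helper name ensure_opencode_paths is made up for illustration, and the directory list is simply the one given above.

```python
from pathlib import Path

# Directories from the list above, relative to the home directory.
OPENCODE_DIRS = [
    ".config/opencode",
    ".local/share/opencode",
    ".local/state/opencode",
    ".cache/opencode",
]


def ensure_opencode_paths(home: Path) -> list[Path]:
    """Create the OpenCode directories under `home` if they are missing.

    Hypothetical helper: existing directories are left untouched.
    """
    created = []
    for relative in OPENCODE_DIRS:
        path = home / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling ensure_opencode_paths(Path.home()) would prepare the real directories; passing a scratch directory instead makes it easy to try out without touching anything.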

OpenCode in server mode

When running in server mode, OpenCode by default listens on localhost only. Thus, if I want to run it in a container and port-forward its server, it’s necessary to pre-create the ~/.config/opencode/opencode.json file and populate it. Like so:

{
  "$schema": "https://opencode.ai/config.json",
  "server": {
    "port": 4096,
    "hostname": "0.0.0.0",
    "cors": [
      "http://localhost:4096"
    ]
  }
}
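
If you would rather generate that file than write it by hand, something like the following might do. This is only a sketch: the function name write_server_config is an assumption of mine, and refusing to overwrite an existing file is a design choice for the sketch, not anything OpenCode requires. The keys and values mirror the JSON above.

```python
import json
from pathlib import Path


def write_server_config(config_path: Path, port: int = 4096) -> dict:
    """Write a minimal opencode.json that makes the server listen on 0.0.0.0.

    Hypothetical helper: refuses to overwrite an existing configuration.
    """
    if config_path.exists():
        raise FileExistsError(f"{config_path} already exists, not overwriting")
    config = {
        "$schema": "https://opencode.ai/config.json",
        "server": {
            "port": port,
            "hostname": "0.0.0.0",
            "cors": [f"http://localhost:{port}"],
        },
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For the setup described here, the target would be Path.home() / ".config/opencode/opencode.json".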

Podman with PODMAN_USERNS=keep-id

When you run Podman with PODMAN_USERNS=keep-id, as I normally do, Podman:

makes sure that the user ID and name in the container match the UID and name of the host user that invokes the container,
sets that user’s home directory in the container to whatever --workdir option it was invoked with (or / if no such option was given),
injects a line to the above effect into /etc/passwd in the container.
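
One way to see these effects is to print the identity a process sees, once on the host and once inside a container, and compare. The sketch below assumes a Python interpreter is available in whichever environment you run it in; I have not checked whether the OpenCode image ships one, so treat it as something to try in any keep-id container. The function name container_identity is made up for illustration.

```python
import os
import pwd


def container_identity() -> dict:
    """Report the UID, the matching passwd entry, and the working directory."""
    uid = os.getuid()
    try:
        entry = pwd.getpwuid(uid)
        name, passwd_home = entry.pw_name, entry.pw_dir
    except KeyError:
        # No line for this UID in /etc/passwd.
        name, passwd_home = None, None
    return {
        "uid": uid,
        "name": name,
        "passwd_home": passwd_home,
        "cwd": os.getcwd(),
    }
```

With keep-id in effect, the uid and name reported inside the container should match those on the host, while passwd_home should reflect the container's working directory.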

Putting it all together

With all the above in mind, I can use this Compose configuration:

services:
  opencode:
    image: ghcr.io/anomalyco/opencode  # optionally add version suffix, like ":1.4.7"
    container_name: opencode
    volumes:
      - "~/.local/share/opencode:/home/coder/.local/share/opencode"
      - "~/.local/state/opencode:/home/coder/.local/state/opencode"
      - "~/.config/opencode/opencode.json:/home/coder/.config/opencode/opencode.json"
      - "~/.cache/opencode:/home/coder/.cache/opencode"
      - "~/coding/git:/home/coder/git"
    working_dir: "/home/coder"
    command: serve                     # enables web server mode (instead of TUI)
    ports:
      - "127.0.0.1:4096:4096"
    environment:
      - "SHELLCHECK_EXTERNAL_SOURCES=false"
    restart: unless-stopped

I can then invoke this like so:

PODMAN_USERNS=keep-id podman compose up

With this,

My host UID is mapped to the same UID in the container, meaning all file access on the host and in the container uses the same credentials.
My home directory becomes /home/coder inside the container.
My OpenCode configuration files and state directory are accessible to OpenCode in the container.
My home directory’s coding/git subdirectory (assuming that’s where all my Git checkouts live) becomes /home/coder/git in the container. If I wanted to be even more restrictive, I could mount just a single Git checkout directory into that container path.
The OpenCode web server is available on my host as http://localhost:4096.

Setting the environment variable SHELLCHECK_EXTERNAL_SOURCES=false is a workaround for the OpenCode issue 5363.

[1] The reason I want to use the Docker Compose format is simplicity and personal preference. Skipping this layer, and using Podman Quadlets instead, would also be an option.

tag:xahteiwi.eu,2026-04-18:/resources/hints-and-kinks/opencode-podman/

Reviving a near-bricked Pinebook Pro

Florian Haas Oct 5, 2025 Updated Oct 5, 2025

I’ve recently had to bring back my Pinebook Pro from a zombie state.

Recently, I put my Pinebook Pro in something of a zombie state. This was by no means the hardware’s fault, nor that of the Armbian system I run on it. Rather, I put a bad bootloader image into the SPI flash (/dev/mtd0), which upon reboot put the laptop in permanent Maskrom mode.

It took me a little while to figure this out, because from the outside Maskrom mode looks remarkably like the machine is dead: when you hit the power button, nothing appears to happen. The power LED doesn’t come on (not even in red), the display stays dark. And the PBP being a fanless device, there’s obviously no fan spin-up either.

What I needed to do instead was find another machine, and connect it (via a USB A-to-C cable) to the PBP’s USB-C port. Then I powered on the PBP. If the PBP is in fact not dead, but in Maskrom mode, the kernel log (on the good machine) will show a message like this:

$ journalctl -b --grep 2207
Oct 05 09:30:43 foobar kernel: usb 1-6: New USB device found, idVendor=2207, idProduct=330c, bcdDevice= 1.00

I was also able to use rkdeveloptool (on the good machine) to talk to the device:

$ rkdeveloptool ld
DevNo=1 Vid=0x2207,Pid=0x330c,LocationID=106    Maskrom
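
If you want to script that check, the output can be parsed. The following is a rough sketch: the regular expression is derived solely from the one sample line shown above, so other rkdeveloptool versions or device modes may well print something it does not match. The function names are made up for illustration.

```python
import re

# Pattern modelled on the single sample line above; an assumption, not a spec.
LD_LINE = re.compile(
    r"DevNo=(?P<devno>\d+)\s+"
    r"Vid=(?P<vid>0x[0-9a-fA-F]+),"
    r"Pid=(?P<pid>0x[0-9a-fA-F]+),"
    r"LocationID=(?P<location>\w+)\s+"
    r"(?P<mode>\S+)"
)


def parse_ld(output: str) -> list[dict]:
    """Return one dict per device line found in `rkdeveloptool ld` output."""
    return [match.groupdict() for match in LD_LINE.finditer(output)]


def in_maskrom(output: str) -> bool:
    """True if any listed device reports Maskrom mode."""
    return any(device["mode"] == "Maskrom" for device in parse_ld(output))
```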

Once able to access the PBP in Maskrom mode, I was nearly back in business. I now needed to follow the instructions for writing to the SPI from another machine in the Pinebook Pro wiki.

Now, I’ve found that zeroing the SPI (which should mean that the PBP boot process just ignores the SPI, and attempts to find a boot loader on the eMMC, and then on the MicroSD card) didn’t change anything for me — the PBP just kept coming back into Maskrom mode after resetting or power-cycling.

Thus, I chucked Tow-Boot into the SPI. At the time of writing, that was version 2023.07-007, downloaded from its GitHub release page.

I also had to obtain the rk3399_loader_spinor binary from the recommended source (at the time of writing, that’s https://dl.radxa.com/rockpi4/images/loader/spi/rk3399_loader_spinor_v1.15.114.bin per the wiki).

Thus, the whole process amounted to:

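Enumerating the device with rkdeveloptool ld
Downloading the SPI loader to the device with rkdeveloptool db
Writing the Tow-Boot.spi.bin file with rkdeveloptool wl
Testing the device with rkdeveloptool td
Resetting (rebooting) the device with rkdeveloptool rd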

Here is the corresponding sequence of commands.

$ rkdeveloptool ld
DevNo=1 Vid=0x2207,Pid=0x330c,LocationID=106    Maskrom
$ rkdeveloptool db rk3399_loader_spinor_v1.15.114.bin
Downloading bootloader succeeded.
$ rkdeveloptool wl 0 towboot/pine64-pinebookPro-2023.07-007/binaries/Tow-Boot.spi.bin
Write LBA from file (100%)
$ rkdeveloptool td
Test Device OK.
$ rkdeveloptool rd
Reset Device OK
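
Should you ever need to repeat this, the same sequence can be wrapped in a small script. This is a sketch under stated assumptions: the function names are made up, the subcommands are exactly the ones used above, and the wrapper defaults to a dry run that only prints what it would execute.

```python
import shlex
import subprocess


def flash_commands(loader: str, image: str) -> list[list[str]]:
    """Build the rkdeveloptool invocations used above, in order."""
    return [
        ["rkdeveloptool", "ld"],              # enumerate the device
        ["rkdeveloptool", "db", loader],      # download the SPI loader
        ["rkdeveloptool", "wl", "0", image],  # write the image at LBA 0
        ["rkdeveloptool", "td"],              # test the device
        ["rkdeveloptool", "rd"],              # reset the device
    ]


def flash(loader: str, image: str, dry_run: bool = True) -> None:
    """Print each command; run it only if dry_run is False."""
    for argv in flash_commands(loader, image):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)
```

Stopping at the first failure matters here: there is little point in writing the image if downloading the loader did not succeed.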

After this, the power LED immediately came on and the PBP happily booted Tow-Boot, which then enabled me to select the boot device and boot from either the eMMC or MicroSD card.

tag:xahteiwi.eu,2025-10-05:/resources/hints-and-kinks/pbp-maskrom/

Exploding memory usage in Django/uWSGI containers

Florian Haas Dec 7, 2024 Updated Dec 7, 2024

We recently came across an interesting problem at work while migrating from one flavor of Kubernetes to another. It’s sufficiently obscure to merit a brief write-up for reference.

When running Open edX on Kubernetes clusters, one of its Pods is the lms Pod, which runs the core of the Open edX Learning Management System (LMS).

This is a relatively complex Django application, which runs in the Pod’s sole container. Said Django application is being launched with uWSGI.

At work, we had previously run this platform on Kubernetes clusters managed with OpenStack Magnum, and were in the process of migrating to Gardener. Apart from the fact that we were upgrading to a newer Kubernetes release, this also meant that the base operating system of our Kubernetes worker nodes changed from Fedora CoreOS to Garden Linux (which is effectively a Kubernetes-optimised Debian). The virtualisation platform underpinning the Kubernetes cluster remained the same (OpenStack).

Mid-migration, we suddenly noticed that our cluster was oom-killing our lms pods. Now this shouldn’t happen, for the following reasons:

Normally, Kubernetes only kills a Pod for excessive memory usage when a memory limit is set on that Pod, which wasn’t the case.
Otherwise (that is, with no memory limit set), Pods get killed only by the “regular” kernel oom-killer, and that should only happen when the Pod is grossly misconfigured — that is, its actual memory usage far exceeds its configured memory request.

We quickly found out (via kubectl top pod) that we were dealing with the latter of these two: our lms Pod was consuming a whopping 8 GiB of memory when running on the Gardener-managed cluster — nearly 4 times the memory request of 2 GiB.

This had us scratching our heads, for on the Magnum-managed cluster it was previously running on, that same pod had typically consumed only 80-120 MiB of memory (with occasional spikes). Thus, we were dealing with baseline memory usage that had suddenly increased by two orders of magnitude.

Now to explain this memory usage jump, you’ll need this background information:

The corerouter plugin in uWSGI maintains an array of file descriptor references.
The size of this array, and with it its memory usage, is a multiple of the value set for uWSGI’s max-fd configuration option.[1]
If max-fd has not been set in the uWSGI configuration, its default is the maximum number of open file handles allowed for the process per the system-wide configuration.
Said default can be defined by the nofile ulimit, or a cgroups restriction. A cgroups restriction is also what systemd uses to implement the LimitNOFILE option, which can be set on any systemd unit.[2]
If neither the ulimit nor a cgroups restriction is in place, the fs.nr_open sysctl, if set, acts as a backstop.
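
To see which values apply to a given process, you can query them from within it. The sketch below uses Python's standard resource module and reads the sysctl from /proc, which assumes a Linux system. The function name nofile_limits is made up, and the sketch only reports the numbers: it makes no claim about which one uWSGI ends up using.

```python
import resource
from pathlib import Path


def nofile_limits() -> dict:
    """Report this process's open-file limits and the fs.nr_open sysctl."""
    # Either value may equal resource.RLIM_INFINITY if no limit is set.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        nr_open = int(Path("/proc/sys/fs/nr_open").read_text())
    except (OSError, ValueError):
        # Not on Linux, or /proc is not readable here.
        nr_open = None
    return {"soft": soft, "hard": hard, "fs.nr_open": nr_open}
```

Running this inside the affected container and on an unaffected node makes the difference between the two environments visible at a glance.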

Prior to release 256, systemd effectively bumped the default for LimitNOFILE from 1048576 (2²⁰) to infinity, which meant that rather than setting its own cgroups limit, it would rely on fs.nr_open. And that value was recently upped in some distributions to 1073741824 (2³⁰) — an increase by a factor of 2¹⁰ or 1024 over the previously applicable value.

This change was also applied on Debian (which Garden Linux is based on), and it was even discussed on the Debian mailing list — where ironically, concerns about raising this limit were pre-emptively quashed with the assertion that file descriptors are such an “extremely cheap resource” that it does not hurt to allow absurdly high numbers of them.

In the uWSGI case, however, this had the somewhat devastating effect of increasing memory usage to insane levels.
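
The arithmetic is easy to reproduce. The sketch below follows the allocation quoted in the first footnote (a pointer-sized entry times max_fd) and assumes a 64-bit platform, where a pointer takes 8 bytes. The function name is made up for illustration.

```python
POINTER_SIZE = 8  # bytes; assumption: 64-bit platform


def cr_table_bytes(max_fd: int, pointer_size: int = POINTER_SIZE) -> int:
    """Size of one table holding a pointer per possible file descriptor."""
    return pointer_size * max_fd


OLD_LIMIT = 2**20  # 1048576
NEW_LIMIT = 2**30  # 1073741824

for label, max_fd in (("old", OLD_LIMIT), ("new", NEW_LIMIT)):
    size_mib = cr_table_bytes(max_fd) / 2**20
    print(f"{label}: max_fd={max_fd} -> {size_mib:.0f} MiB")
```

For a single such table, that comes to 8 MiB under the old limit versus 8192 MiB (8 GiB) under the new one. That is in the same ballpark as the memory usage observed above, though this arithmetic alone does not prove that the one allocation accounts for all of it.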

To their credit, the Garden Linux developers identified this flaw (which, to my knowledge, was baked into their version 1592.2), and fixed it in version 1592.3. Still, to insulate ourselves from further such issues, we have opted to reconfigure our systems to run uWSGI with an explicitly defined max-fd option, set to the prior system-wide default of 1048576 (although setting it to something as low as 1024 would probably work too).

Acknowledgements

Lothar Bach, Brennan Kinney, Piotr Kucułyma, Namrata Sitlani, and Maari Tamm all contributed to the findings discussed in this article.[3]

[1] See the source, which at the time of writing reads: ucr->cr_table = uwsgi_malloc(sizeof(struct corerouter_session *) * uwsgi.max_fd);
[2] As far as I can tell, at the time of writing the table captioned “Resource limit directives” in the systemd.exec man page is outdated and incorrect as far as LimitNOFILE’s default is concerned, and also the “Don’t use” admonition seems misguided at this point.
[3] I’ve listed these individuals in alphabetical order by surname.

tag:xahteiwi.eu,2024-12-07:/resources/hints-and-kinks/max-fd/

Keynote-worthy talks (I think?)

Florian Haas Oct 4, 2024 Updated Oct 4, 2024

Maybe one of these is a good fit for your conference?

Of late, I’ve been giving a number of talks that are less focused on specific technologies than the ones I’ve given in the past.

Rather, they talk about broader issues (still mostly related to what I do for a living, though), and some conference organisers have approached me after the talk expressing regret that they didn’t put it in front of a bigger audience.

So, just in case: maybe your next conference is looking for a keynote? Here are a few talks I think might be worthy of such a thing. All are approximately 45 minutes long.

Quit Simplifying (slides, video from Config Management Camp 2024) is a talk about the ever-increasing complexity of systems, the futility of simplification, and thus the necessity to keep things as simple as possible from the get-go.

Creativity: How we lost it, why that’s bad, and how we get it back (slides, video from PyCon Italia 2023) talks about the value of creativity, its necessity in business, the myriad ways in which contemporary ways of working are detrimental to creativity, and simple and effective measures to foster creativity.

It’s Your Own Damn Fault: Why great people don’t want to work with you addresses hiring challenges in many modern organisations and the way that many of those organisations fail to attract good people by repeatedly shooting themselves in the foot. This talk I’ve thus far only given in German, and I am currently working on an English version. The slides for the German edition are in this GitHub repository.

No, We Won’t Have a Video Call For That (slides, video from FrOSCon 2020) was a talk I gave at the height of Covid lockdowns in 2020, thinking it was the last time I’d ever have to do a distributed work talk because surely everyone would have figured it out by now. Little did I know. Not only did the writeup of that talk get slashdotted by Hacker News more than a year later, but it seems that just a few years on, most companies have completely forgotten what they learned during lockdowns. So, if you think your conference would benefit from a talk about distributed work that works, maybe an updated reprise of that talk might be helpful.

Want one of these for your conference? Please feel free to contact me via one of the channels you’ll see just below my mug on the left, or at the top of your screen if you’re reading this in portrait mode.

tag:xahteiwi.eu,2024-10-04:/blog/2024/10/04/keynote-worthy-talks/

3 places to eat: Florence

Florian Haas May 27, 2024 Updated May 27, 2024

Find yourself peckish in Firenze? I can help.

A recent trip to PyCon Italia prompts me to add the Tuscan capital to this series.

Da’ Vinattieri

Via Santa Margherita, 6R, Firenze

Over-the-counter street food. Lorenzo the local tour guide recommended this as his go-to place for lampredotto, and who am I to second-guess him?

Pompi

Via Faenza, 37R, Firenze

Tiny corner café. Has an incredible selection of tiramisù, of which the salted caramel variety is my favourite (though of course, the classic recipe is exquisite too).

l’Pizzachiere

Via San Miniato, 2, San Niccolò

Casual sit-down or takeaway with delicious pizza. Public seating is available just uphill, making this an excellent source for your pizza picnic.

Special mention: Mercato centrale

Piazza del Mercato Centrale, 4

Gigantic market building on two floors, where the ground floor is the market proper (which does have prepared-food outlets as well). Technically open until 5, but many shops close earlier.

The upper floor, in contrast, is a food mecca that is open from morning until midnight with an endless variety of options.

tag:xahteiwi.eu,2024-05-27:/blog/2024/05/27/3-places-to-eat-flr/

Quit Simplifying (Config Management Camp & PyCon Italia 2024)

Florian Haas May 24, 2024 Updated May 24, 2024

A talk I gave at two different conferences, in Belgium and Italy.

This is a talk about complexity that I did twice in 2024:

First, I presented it at Config Management Camp in Ghent in February. A recording is on YouTube.

Then, I reprised it at PyCon Italia in Florence in May. That talk is also on YouTube, and I like that recording slightly better.

My slides (with full speaker notes) are available here.

tag:xahteiwi.eu,2024-05-24:/resources/presentations/quit-simplifying/

Torrtija de las señoritas Tatin

Florian Haas Apr 14, 2024 Updated Apr 14, 2024

A dessert I made up.

This is a dessert that’s an excellent use for some stale bread and an apple that’s lost its crunch.

I don’t think I’m the first person to come up with this, but I did make it up by myself so if there’s someone else to credit, I wouldn’t know whom.

The dessert described in this recipe, shown on a white plate

I call this a torrtija because it looks like a cross of a torrija (usually called “French toast” in English, though the French call it pain perdu or “lost bread”; go figure) with a tortilla española. And the whole mashup looks something like a Tarte des Demoiselles Tatin.

Ingredients

Amounts are for 4 dessert servings.

1 stale bread roll or brioche bun (pretty much anything based on wheat flour will do)
2 eggs, whole
100 ml milk
50g butter
1 tablespoon granulated sugar (optional)
1 large apple

Equipment

1 medium-size bowl
Cooking knife and board
Whisk
Small skillet

Method

Whisk the eggs and milk together in the bowl.
Cut the bread into pieces, roughly 2×2×2 cm.
Toss the bread into the bowl and mix thoroughly. Set the bowl aside for 15-20 minutes so the bread soaks in the egg wash.
Meanwhile, peel and core the apple. Cut it in half and then cut into thin slices.
When the bread is done soaking, melt the butter in the pan on medium heat. Optionally, sprinkle the sugar on the bottom of the pan.
Lay the apple slices on the bottom of the pan, covering it.
Using a large spoon or your clean bare hands, cover the apple slices with the bread/egg mix.
Put a lid on the pan, and cook on low to medium heat for about 10 minutes. The steam from the apples will cook the mixture almost through.
If necessary, gently loosen the pan’s contents from the pan using a spatula. Try not to break it up.
Cover the pan with an upside-down plate, press down on it with one hand while holding the panhandle in the other, and flip it so the plate, now right way up, ends up on the bottom. Set the plate down. You should now be able to lift off the pan.
Either serve the dessert on one communal plate for everyone to tuck in, or cut it into 4 slices and serve on small plates. Optionally, add a scoop of ice cream or a dollop of whipped cream.

Nutrition facts

No warranty of any kind on these. Values are per serving.

Calories (kcal) 204 Total fat (g) 13.5 Saturated fat (g) 7.7 Total carbohydrates (g) 16.6 Sugars (g) 10.6 Protein (g) 4.8

tag:xahteiwi.eu,2024-04-14:/blog/2024/04/14/torrtija-de-las-senoritas-tatin/

Lievito madre

Florian Haas Feb 25, 2024 Updated Feb 25, 2024

My take on Italy’s take on sourdough.

I was recently asked (on Mastodon) how I prepare my lievito madre, and how I make pizza from it. The answer takes a few more characters than my instance’s post limit, so I might as well put it here. Now, I’m not Italian, I’m just a baking nerd, so please take whatever you read here with a pinch of salt and don’t get into an argument with your nonna about it.

In case you haven’t heard, lievito madre is simply the Italian take on sourdough. The fun bit about it is that it isn’t sour at all, and you can thus use it for all sorts of baked goods including sweets.

Making lievito madre is not difficult, if you have access to wholemeal wheat or spelt flour. But it does take a bit of patience. Actually, a fair bit of patience.

Here’s how I go about it.

Bootstrapping

To bootstrap a starter, you mix 60g of wholemeal flour with 30g of water and 5g of honey.[1] Mixed, it turns into a squishy ball. Dab some olive oil on your palms and roll the ball between them, so that a thin film of oil covers the ball. Put that ball into a glass jar and close the lid.

Nourishment

What follows now is a matter of some debate. Some say that throughout the cycle of feeding the fresh starter, it should be kept refrigerated, which means the whole process takes 6-8 weeks. I think you’re perfectly fine developing your starter at room temperature, and only refrigerate it later. This tends to get you a pretty potent starter within about half that time.

At any rate, until your starter has developed some leavening power, you repeat the following process every day or two: You take 30g from your developing starter and dissolve it in 30g of water, to which you add another 5g of honey. Then, you add 60g of wholemeal flour and work it all into a ball again, covering it with another thin layer of olive oil. You discard the rest of your starter.[2]

After about a week or so, you’ll notice that the starter will approximately double in volume in about 24 hours at room temperature. At that point, your culture is well developed and is no longer at risk of collapse, and will only require feeding. If you’re going to bake mostly with white flour, you may at this stage want to also switch to feeding your starter with that. So, on your next feeding cycle, you simply replace the wholemeal with white flour.

After a few more repetitions of that, your starter will probably reliably double in volume in 24 hours at room temperature. But you’re not quite done yet, because you want your starter to be potent enough to do the same while refrigerated. So, you keep the same feeding pattern, but after every replenishment of your starter you now return it to the fridge at 6 to 8°C. Eventually, your starter will have enough leavening capacity that it will triple to quadruple in volume in 24 hours at that temperature.

Baking arithmetic

Then you can start baking. What might you want to bake? Pizza, of course.

My standard quantities for pizza dough are as follows. (These are per adult person; multiply as needed. Children under 10 count like half an adult.)

125g flour
20g activated sourdough
75ml water
4g salt

Now there’s a little arithmetic to be done here, for the purposes of preparation.

Suppose I am baking pizza for 10 people.

This means I will need 200g of sourdough, and at any time I’ll have about 120g in my jar. So the night before I make pizza, I’ll take what’s in my jar, put it in a bowl, dissolve it in 55g of water and add 55g of flour. Then, I let it sit overnight in a covered bowl until it’s nice and bubbly. Next morning, I now have 230g of activated sourdough. Of this I take 30g, mix it with 30g of water and 60g of white flour, and that’s my starter for next time. It goes into the jar, and back into the fridge. (It takes less than 24 hours to be ready for baking again.)

The remaining 200g sourdough get whisked into 750ml of tepid water, and then mixed and kneaded with 1,250g of flour and 40g of salt.

This big ball of dough again gets a thin layer of olive oil so it’s easier to get in and out of the bowl, and is then allowed to rise at room temperature for another 3-4 hours. (Maybe stretch and fold every hour or so, but I consider this less than strictly necessary.) Then split into 10 portions of about 225g each, roll them in flour, and off we go with pizza. Or calzone, if you prefer.
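
For other headcounts, the same arithmetic can be written down as a small calculation. This is a sketch: the per-adult quantities, the 120g in the jar, and the 30g kept back for the next starter are the figures from above, while the function names and the decision to never return a negative refresh amount are my own assumptions.

```python
# Per-adult quantities from the list above.
PER_ADULT = {"flour_g": 125, "sourdough_g": 20, "water_ml": 75, "salt_g": 4}


def dough_quantities(adults: int, children_under_10: int = 0) -> dict:
    """Scale the dough; a child under 10 counts as half an adult."""
    portions = adults + 0.5 * children_under_10
    return {name: amount * portions for name, amount in PER_ADULT.items()}


def refresh_amount(sourdough_needed_g: float,
                   in_jar_g: float = 120,
                   keep_back_g: float = 30) -> float:
    """Grams of water (and, equally, of flour) to add the night before."""
    shortfall = sourdough_needed_g + keep_back_g - in_jar_g
    # Assumption: if the jar already holds enough, add nothing.
    return max(shortfall, 0) / 2


quantities = dough_quantities(adults=10)
print(quantities)
print(refresh_amount(quantities["sourdough_g"]))
```

For 10 adults this reproduces the numbers above: 1,250g of flour, 200g of sourdough, 750ml of water, 40g of salt, and 55g each of water and flour for the overnight refresh.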

[1] I am told that the honey serves two purposes: one, its sweetness takes some of the edge off the otherwise tangy sourdough, so it’s not really “sour” at all anymore. Two, something something enzymes in honey. I need to nerd out with a beekeeper about this at some point.
[2] If you have a garden, put the dumped-out starter in your compost pile.

tag:xahteiwi.eu,2024-02-25:/blog/2024/02/25/lievito-madre/

My talks are good (and suitable for a big audience)

Florian Haas Feb 10, 2024 Updated Feb 10, 2024

This post will come across as cocky. But that’s okay, for once.

Something that’s happened a few times over the past few months is a conference organiser, having attended my talk, coming up to me and saying something to the effect of:

Had I known you were going to give this good a talk, I’d have put you in a keynote slot.

So, dear conference organisers, in order to spare you that afterthought, please be advised that I usually give damn good talks. The kind of talk that people will say was their personal favourite of the conference, or at least one of the top three.

This wasn’t always the case, for like any craft I had to learn it, and I can look at some recordings of early talks of mine, and I cringe. I am also no stranger to nagging doubt about the relevance of my talk to the audience, setting in 24 hours prior to my cue.

But having done it for close to 20 years now, I can say with confidence that my talks are damn good.

They’re relevant, well put together, prepared starting weeks in advance, rehearsed to the max, timed to 30-second accuracy, constantly tweaked for accessibility, they use the best presentation technology available, and occasionally they even have the audience in stitches.

What’s more, I submit many of my talks to more than one conference, a few months apart. Thus, you might be able to review the full slide deck and a video of the talk at the moment it comes across your proposal review. When that happens, you can rest assured that the talk that your conference is getting will be even better than the one you are watching, because I will update it with feedback from the first talk, will have the confidence of the talk already having worked out well once, and will still rehearse it in every free hour I have in the lead-up to your conference.

Of course, if you accept my talk, I’ll be happy in any slot, because I’m always grateful for the opportunity to speak, and chances are I’ll be enjoying your conference no matter what happens. But if you’ve already decided that my talk fits your programme, and your choice is now to put me in front of a big audience or a small one, chances are that picking the former will be to your conference’s advantage.

Having concluded this flex, an acceptable degree of humility will now return to this blog.

tag:xahteiwi.eu,2024-02-10:/blog/2024/02/10/my-talks-are-good/

3 places to eat: Vienna

Florian Haas Nov 18, 2023 Updated Nov 18, 2023

I’ve done this for several other cities. It figures I should do one for mine.

I’ve been asked to finally add an entry in this series for my hometown. Of course, these are not necessarily touristy places, nor are they stereotypically Viennese or Austrian, nor particularly upscale. They’re just good.

Gorilla Kitchen

Gußhausstraße 19

It’s not easy to find a good burrito in this city, but this place definitely has them. This is just a stone’s throw from Karlskirche and a short walk from the Karlsplatz U-Bahn station.

All burritos are made to order. They have meaty and vegan options, and 3 degrees of spice in the sauce.

Wehrgasse 8

My favourite Chinese place in Vienna, equal distance (about 7 minutes’ walk) from the Kettenbrückengasse and Pilgramgasse U-Bahn stops.

Not your typical buffet-style Chinese restaurant, but rather a lovely à la carte place. Their menu includes ingredients not typically served in more mainstream Chinese restaurants in Europe, such as lamb liver, tripe, or frog legs.

Kent

Märzstraße 39

Vienna has many, many Turkish restaurants, but this is my favourite by a large margin. You can swing by for takeaway from their charcoal grill, but it’s also a nice place for a sit-down dinner with friends.

Kent has multiple locations around the city, but for some reason the one on Märzstraße (right next to the Schweglerstraße U-Bahn station) has always been my preferred one.

tag:xahteiwi.eu,2023-11-18:/blog/2023/11/18/3-places-to-eat-vie/

3 places to eat: Stockholm

Florian Haas Nov 15, 2023 Updated Nov 15, 2023

Getting hungry in Stockholm?

Continuing the series of selecting 3 places to eat at random in a major city, here is the entry for Stockholm.

These were collected on a recent trip to PyCon Sweden, and as a result are all around Slussen on Södermalm (the island just south of the Old Town).

Omnipollos Hatt

Hökens gata 1A

Solid craft beer bar with incredible pizza, and I don’t say that lightly. Somewhat crowded even on a weeknight, but well worth it — and if you’re not feeling social, they have takeaway.

Apparently, if you happen to arrive during the week that Swedes call “summer”, you can also sit at tables curbside.

La Neta

Östgötagatan 12B

Totally random find that turned out to be a seriously terrific taquería. This is one of three locations across the city, and they also do takeout.

Excellent tacos and quesadillas, with a small selection of Mexican beers.

Tabbouli

Tavastgatan 22

Absolutely sublime Lebanese place, more upscale than the other two spots I’ve listed here. Also in three more locations around Stockholm.

Their entire menu is exquisite, though I am a particular fan of their kibbeh nayyeh and hummus kawarma. That said, they have plenty of excellent non-meat options too.

tag:xahteiwi.eu,2023-11-15:/blog/2023/11/15/3-places-to-eat-sthlm/

Are you working in a remote office?

Florian Haas Oct 28, 2023 Updated Oct 28, 2023

Does your company try to do everything companies have always done in an office, just with remotees? Not a grand idea.

In a recent Mastodon discussion, Martin Seeger accidentally coined the phrase “remote office”. About which I then mused that it describes perfectly what’s wrong in far too many organizations: that they think they can get away with all the anti-productive nonsense they’ve been doing in offices for decades, but now with a bunch of remotees.

So, dear reader who (at least frequently, if not permanently) works from home, do you surmise you might be working in a remote office? I have a few questions you might want to think about.

If you get pulled into video meetings with no advance notice and no chance to prepare, yanking you out of whatever you’re doing, like someone shouting down the hallway from a conference room something like “Alex, could you just come in here for a moment?”, you might be working in a remote office.
If managers direct-message you in the company text chat demanding your attention now, like a shoulder-tapping “management by walking around” amateur, you might be working in a remote office.
If, in the event of some important news breaking, your head honcho calls everyone into an ad-hoc all-hands video meeting rather than sitting down to write a 5-paragraph email reflecting functional literacy, you might be working in a remote office.
If it’s tolerated for people to ignore information circulated ahead of a discussion, and to come into it blissfully unprepared expecting to be brought up to speed by other participants who did prepare, you might be working in a remote office.
If everyone is stuck in meeting hell and hates it, and people at the top aren’t doing anything to fix it, you might be working in a remote office.
If it’s somehow considered acceptable for people to drop comments in a long issue tracker thread that start with “I didn’t read the whole thing but here’s my take”, you might be working in a remote office.
If you have some middle manager insisting that he1 is expected to “lead, not read”, you might be working in a remote office.
If managers insist on a daily scrum or standup or similar abomination, you might be working in a remote office.
If managers get all uneasy at the idea of not being able to monitor everyone’s being “at work” for whatever definition thereof they apply, you might be working in a remote office.
If not nearly enough people have any grasp of the importance of acknowledgment in communication, you might be working in a remote office.
If somebody extols the necessity of socializing with your work mates, you might be working in a remote office.
If all coordination at your multinational company explodes in a massive fireball twice a year around daylight saving time changes, instead of everyone just using UTC year-round, you might be working in a remote office.
If your organization’s idea of documentation is “ask Joe” or whoever the longest-serving technically capable member of staff is, you might be working in a remote office.

And in that case, here’s something else you might want to read.

I use “he” here because in 20 years I have never heard anything like this from someone who wasn’t a man. ↩

tag:xahteiwi.eu,2023-10-28:/blog/2023/10/28/remote-office/

Rootless Podman, systemd, and Docker Compose files

Florian Haas Oct 26, 2023 Updated Oct 26, 2023

How I run containers for my Home Assistant deployment

This is a summary of how I run a set of Docker (actually, Podman) containers for my Home Assistant setup on a Raspberry Pi. It works reasonably well for me, so I am sharing it here in the hope that it is useful to others.

The stage

I run my Home Assistant environment on a Raspberry Pi 4B running, currently, Ubuntu 23.04 Lunar Lobster. In total, that little machine runs five containers, of which three are related to Home Assistant:

One is for Home Assistant itself,
one is for running the Mosquitto MQTT broker,
and one is for running Grott.

For all these services, the respective developer communities do not only maintain official Docker images, but also supported or at least recommended Docker Compose configurations.

I wanted a way to make the most of those available configurations, so as not to reinvent too many wheels.

How I manage containers

I prefer my containers to run in the context of users other than root.

Per-container system users

This means that I create a dedicated user for each container. What’s important is that in order to be able to use systemd user services later, I enable lingering for each user account.

For example:

$ sudo -i
# useradd homeassistant
# adduser homeassistant bluetooth
# loginctl enable-linger homeassistant

In order to actually enable lingering for the affected users, one must apparently reboot the machine after this change.

(I’ll get back to why I add the homeassistant user to the bluetooth group in a moment.)

Podman

I also don’t very much like the daemon-driven approach from Docker proper, so I tend to prefer podman as my container manager on a small system like the Raspberry Pi.

Podman tends to not be particularly well covered in the documentation of the projects I work with, but that is not much of an issue: I can combine Podman with a compatibility layer, podman-compose, so that although I am actually using Podman, I can configure my containers with an unchanged YAML configuration originally written for Docker Compose.

Here’s how I can install the necessary packages on my Raspberry Pi:

# apt install podman podman-compose

Next, I create the necessary Docker Compose configurations in the home directory of a user created to run that container.

For example, the /home/homeassistant directory, owned by the user homeassistant, contains this docker-compose.yaml file:

# /home/homeassistant/docker-compose.yaml
---
version: '3'
services:
  homeassistant:
    container_name: homeassistant
    image: "ghcr.io/home-assistant/home-assistant:stable"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      # Replace this volume mapping with wherever
      # you want to put your Home Assistant configuration
      - /home/homeassistant/.config/homeassistant:/config
      - /run/dbus:/run/dbus:ro
    ports:
      - 8123:8123
    restart: always
    environment : {}

You can of course create a more elaborate configuration as you please.

Once this is set, I can manually fire up my container as a non-root user, using Podman, like so:

$ id
uid=1003(homeassistant) gid=1003(homeassistant) groups=1003(homeassistant),124(bluetooth)

$ podman-compose up
['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
['podman', 'network', 'exists', 'homeassistant_default']
podman create --name=homeassistant --label io.podman.compose.config-hash=123 --label io.podman.compose.project=homeassistant --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=homeassistant --label com.docker.compose.project.working_dir=/home/homeassistant --label com.docker.compose.project.config_files=docker-compose.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=homeassistant -v /home/homeassistant/.config/homeassistant:/config -v /usr/share/zoneinfo/Etc/UTC:/etc/localtime:ro -v /run/dbus:/run/dbus:ro --net homeassistant_default --network-alias homeassistant -p 8123:8123 --restart always ghcr.io/home-assistant/home-assistant:stable
[...]

Systemd

Once I am satisfied that my container comes up just fine, the next step is managing it with systemd in user mode.

To do that, I need to create a config directory for systemd:

$ mkdir -p ~/.config/systemd/user

… and create a single file in there, which I name podman-compose.service:

[Unit]
Description=Podman via podman-compose
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Environment=PODMAN_USERNS=keep-id
Restart=always
TimeoutStartSec=60
TimeoutStopSec=60
ExecStart=/usr/bin/podman-compose up --remove-orphans
ExecStop=/usr/bin/podman-compose stop
Type=simple
WorkingDirectory=%h

[Install]
WantedBy=default.target

Note that many other tutorials about running docker-compose or podman-compose from systemd recommend you set Type=oneshot instead, and add the -d option to the ExecStart command.

I think using the simple type and omitting the -d option is the better idea, because in doing so,

I see the latest log lines from the container in systemctl --user status podman-compose,
I can access the full log with journalctl --user -u podman-compose,
I get more reliable output overall from systemctl --user status podman-compose, because rather than only reflecting whether starting the container was successful, it tells me whether it is still running at the time I check.

For more details on what the various %-prefixed specifiers mean, see the relevant section in the systemd documentation.

The Environment=PODMAN_USERNS=keep-id entry is somewhat crucial in a Home Assistant configuration. This, in combination with adding the homeassistant user to the bluetooth group and bind-mounting the /run/dbus directory, enables me to use the Raspberry Pi’s Bluetooth controller from the rootless container.1 That comes in handy for Home Assistant integrations for sensor devices using BLE.

Then, running

$ export XDG_RUNTIME_DIR=/run/user/$UID
$ systemctl --user daemon-reload
$ systemctl --user start podman-compose
$ systemctl --user enable podman-compose

starts my container, and also brings it up (under the non-root user account) every time the system boots.

In summary

What’s nice about this whole approach is that for all of my container-based services the configuration is exactly identical, except for one thing that differs from service to service: the docker-compose.yaml file.

Thanks to GitHub user “Fattire” for an immensely useful GitHub comment on this subject! ↩

tag:xahteiwi.eu,2023-10-26:/resources/hints-and-kinks/rootless-podman-docker-compose/

On Responsibility

Florian Haas Oct 25, 2023 Updated Oct 25, 2023

What does it mean to be responsible for something?

You may know that I am fairly active on Mastodon (my profile is here), and one of the things I like a lot are polls. I use them largely to get an outside view on my own thoughts.

Recently, I posted one such poll. It talked about a hypothetical scenario, and the question read:

A screw-up happened. The screw-up is ultimately due to a crucial piece of information not being relayed to the right person. Multiple levels of seniority were in the loop, all of whom could have caught the omission, but nobody did. Who bears responsibility for the screw-up?

The options were:

A — Everyone, jointly.

B — The most senior person1 involved.

C — The person who would have normally been “most likely” to catch the omission.

D — Someone else. (You pledge to add a comment who that would be)

The poll ran for 24 hours, 276 people replied, and these were the results:

A solid majority of 63% chose the “everyone, jointly” option (A).
28% selected the most senior person involved (B).
3% and 5%, respectively, selected options C and D.
Some of the “D” options were sarcastic to outright cynical, amounting to, more or less, “the most convenient scapegoat” (note that the cynicism in that answer doesn’t make it wrong, though).

Now, I have a question for the 63% supermajority: have you any idea what “responsibility” means?

Have you been brainwashed by your corporate overlords that “bearing responsibility” is the same as “sharing the blame”? Or, worse, have you been tricked into the insanity that “responsibility” and “accountability” are two different things, rather than two terms for one inseparable concept? Because those are the only two ways I can think of in which dispersing it to everyone makes any sense.

Bearing responsibility is not something that starts when shit hits the proverbial fan.

Saying “I am responsible for something” in a professional context means: I’m taking it upon me that it works, and if it stops working I’m taking it upon me that it works again. Whatever that is: it can be a tiny piece of machinery, or a big, complex process involving the coordination of dozens or even hundreds of people.

And that rolls up with seniority. The higher up you are in the decision chain of a project, a department, or even a whole organization, the more responsibility you have. And making sure that the right people have the right information at the right time is a core responsibility of leadership. So if you’re in any leadership position, that is something you need to make sure works, and if it stops working, you need to fix it.

So in my not-so-humble opinion there is only one sensible answer to that question, and it’s B, and if we’re expecting otherwise, that means we are letting managers who don’t fulfill their responsibilities off the hook.

Bjoern Michaelsen had this to say in a reply:

I was tempted by A, but took B because the “jointly” suggests the responsibility is shared and reduced by the sharing. But responsibility is not a zero sum game: even if someone below you in the hierarchy is responsible, it does not reduce your own.

Higher ups need to hire the people to ensure fuckups don’t happen and design the workflows and processes that work for the people they hired.

TL;DR: responsibility can be shared, but never divided.

And that sums it up pretty nicely.

The phrasing “most senior person” caused some confusion. What I meant was “the most high-ranking person in the organization as applied to the subject at hand”, which could be an appointed project manager, or someone who by virtue of their line management position is the highest person in the food chain — but that’s a bit too long for a Mastodon post. Another respondent suggested “the person highest in the chain of command”, which I had considered but deliberately rejected, fearing that some readers might balk at the military nature of that term. ↩

tag:xahteiwi.eu,2023-10-25:/blog/2023/10/25/responsibility/

Follow-up and follow-through

Florian Haas Oct 7, 2023 Updated Oct 7, 2023

It comes with the territory.

It recently occurred to me that something I thought was a basic elementary aspect of management at all levels is apparently, in fact, unusual in many organisations. It’s the idea that you can’t simply handwave and tell people to do something and if they don’t do it then it’s their fault, but that as a manager you retain ultimate responsibility and accountability1 for whatever you assign or delegate.

This means that it’s a core part of your job description that you ensure that the people you work with actually work on, and have a chance to accomplish, the objectives you entrust them with. This basic concept in management is known as follow-up and follow-through. And I maintain that being unwilling, incompetent or incapable of follow-up and follow-through should disqualify any person from a management position.

Effectively it means that your people should — having agreed with you on realistic, achievable objectives — be able to rely on you that you’ll clear any obstacles that stand in their way.

They might need to coordinate with someone outside your immediate team, perhaps someone they haven’t worked with or even met before, and it’s your job to facilitate that collaboration, proactively — that is, ideally you should be thinking of the right person to coordinate with, before they do.
They might need approval from some higher-up or a green light from a customer, and it’s your job to secure that.
They might need something as simple as a sign-off on an expense, and it’s your job to know the applicable policy, get any approval necessary, and then give your report a simple yes or no answer.
They will rely on you for updates on the schedule. Perhaps we need to get something done faster, or maybe there’s a holdup somewhere that extends our time window.
And they will rely on you to follow up on things if they don’t go as planned.

Sometimes, when people ask me how I do that and then I ask the other person what’s keeping them from doing that, their answer gives me the impression that they feel unable to, simply because they don’t know what their people are working on, at any given time. And when I tell them that I always know exactly what each of my direct reports is working on, they assume I must be working in a single office with them. At which point I politely inform them that in fact the average distance between any two of my team members is about 2,000 kilometres, and we see each other in person about twice a year (tops), and we have exactly one video conference per week, and I detest corporate surveillance tactics like “camera on” mandates, and we don’t use chat, and then they think I’m either nuts or some sort of a wizard.

Okay. If you’re not already convinced I am nuts, then let me share a bit of alleged wizardry with you.

Use tools, tool-using species!

If you have some kind of a system2 that allows you to describe individual units of work — let’s call them “tasks”3, which contain

a description (what the task is meant to accomplish; you can also call this the “objective” if you’re being fancy),
some kind of time span (say, from a projected start date to a projected end date),
a person who is assigned the task,
a status,
any number of links to related tasks or documents,
any number of free-form comments,

then you have everything at your disposal to never ever again be clueless about your team’s work, and never ever be incapable of follow-up and follow-through.

In using such a system, it is our job to define a few ground rules.

The first thing we define is what status types apply to tasks. These ultimately define our agreed-upon, practiced, and documented workflow — that is, something that actually deserves being called a process. I have seen people build absolutely Byzantine workflows with a grotesque proliferation of statuses, but I’ve found these five to be perfectly sufficient:

Backlog: we have defined the task, we want to get to it at some point, but it is currently not being worked on. It thus is not assigned to anyone, and has no due date.
To Do (may also be named Scheduled or Selected): the task is assigned to someone, and it has a completion date, but work on the task has not started.
In Progress: the assignee is working on the task.
Done (or Completed): the assignee has completed the task.
Declined (or Rejected): we have reconsidered a task that was previously on the backlog, and have decided against pursuing it after all.

The second thing we define is how tasks can relate to each other. Again, it really doesn’t take many of these categories; I think three are enough:

Blocker: one task can be blocked by another, so task A cannot continue/proceed before task B is finished.
Cause: one task can evolve as a direct result of another. For example, a task to investigate a certain problem, assigned to person A, may lead to a task to fix that problem in a certain way, assigned to person B.
Relation: pretty much “everything else”, as in task A has something to do with task B, other than blocking or causing it. Thus, anyone looking into one of the tasks may benefit from understanding it in the context of the other.
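To make the shape of such a system concrete, here is a minimal sketch of a task record with the five statuses and three relationship types described above. It is not modelled on any particular tool: the class names, field names, and the schedule() helper are all hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    BACKLOG = "Backlog"
    TO_DO = "To Do"
    IN_PROGRESS = "In Progress"
    DONE = "Done"
    DECLINED = "Declined"


class LinkType(Enum):
    BLOCKER = "Blocker"    # this task cannot proceed before the linked one is finished
    CAUSE = "Cause"        # this task arose as a direct result of the linked one
    RELATION = "Relation"  # pretty much everything else


@dataclass
class Link:
    kind: LinkType
    target: "Task"


@dataclass
class Task:
    description: str                  # what the task is meant to accomplish
    status: Status = Status.BACKLOG   # backlog tasks have no assignee and no dates
    assignee: Optional[str] = None
    start: Optional[date] = None
    due: Optional[date] = None
    links: list[Link] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)

    def schedule(self, assignee: str, start: date, due: date) -> None:
        """Hypothetical helper: move a task from Backlog to To Do."""
        self.assignee, self.start, self.due = assignee, start, due
        self.status = Status.TO_DO


# An investigation assigned to person A leads to a fix that person B will pick up later.
investigate = Task("Investigate the problem")
investigate.schedule("person-a", date(2023, 10, 9), date(2023, 10, 13))
fix = Task("Fix the problem")
fix.links.append(Link(LinkType.CAUSE, investigate))
```

Whatever system you use will have its own names for these things; what matters is that every one of the six ingredients has a place to live.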

Now, we get to reasonably expect the following things from our reports:

They update the task status according to their actual progress on the task.
They add any relevant information related to the task either as comments, or as cross-references to other forms of documentation.
They create and maintain relevant cross-links to other current or previous tasks.
They peruse those cross-links and cross-references in the completion of current and future tasks.

That’s it. It’s not reasonable to expect consolidated status reports from individuals on a weekly basis. Neither is it to torture them with daily standups.4 We do, however, get to use the tools that our system provides for us to visualise, organise, contextualise, and comprehend the tasks that our team is working on.

A Kanban board is an excellent facility to do all that. Whatever tool you prefer or your organization mandates, it will most likely let you build a Kanban board for all the currently pending tasks on your team, which fulfills just a handful of criteria:

We need 4 columns, one for each of our statuses (excepting the Declined one).
We need the ability to filter by date, depending on whether we want a weekly, monthly, quarterly or annual overview.
We probably want to be able to filter by person.
We also want some sort of automated colour coding. For example, we can automatically colour tasks green if they have a defined start date in the future (“future tasks”). We can show those with a defined completion date in the past (“overdue tasks”) in yellow, and those with a “blocker” relationship with another open task (“blocked tasks”) in red.

And that gives us practically everything we need for follow-up and follow-through, at a glance:

Something is yellow? You can check on the comments in that thing and see if the person has run into a roadblock, or needs help, or has asked a question for which you can provide, or ask someone else to provide, an answer. You can ask (again, in a comment) if there is something you can do to facilitate, or you can suggest a different approach, or you can simply conclude that the task will take a few extra days, and bump the due date.
Something is red? You’ll want to investigate what the hold-up is, which you might be able to remove or mitigate. Or maybe you’re not, in which case you can check on green tasks and maybe ask the person to take one of those on early instead, while they wait for the blocker to be removed.
Work on something should have started by now, but is still in the “To Do” stage? Maybe you want to apply your person filter, see what else that person has on their plate, and check on that. Maybe that person is unexpectedly overloaded, or someone else is better suited for this task, or has fewer plates to juggle. Or maybe the assignee is actually hard at work on the task already, as is evident from their comments therein, but they have simply forgotten to update the status.
Or perhaps it’s Friday5 noon and someone just closed out their last task for the week? Good opportunity to drop in the comments to say thanks and wish them a lovely weekend.
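The colour coding itself is simple enough to sketch in a few lines. The following is only an illustration under a few assumptions of mine: that only open tasks get a colour, and that when several rules apply, red wins over yellow and yellow over green. The text above does not prescribe any precedence, and the Task class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Task:
    title: str
    status: str = "To Do"
    start: Optional[date] = None
    due: Optional[date] = None
    blocked_by: list["Task"] = field(default_factory=list)


def is_open(task: Task) -> bool:
    return task.status not in ("Done", "Declined")


def colour(task: Task, today: date) -> Optional[str]:
    """Return the board colour for a task, or None for no highlight."""
    if not is_open(task):
        return None  # assumption: finished or declined tasks are not highlighted
    if any(is_open(blocker) for blocker in task.blocked_by):
        return "red"  # blocked by another task that is still open
    if task.due is not None and task.due < today:
        return "yellow"  # overdue: completion date is in the past
    if task.start is not None and task.start > today:
        return "green"  # future: start date has not arrived yet
    return None


today = date(2023, 10, 7)
upstream = Task("Upstream fix", status="In Progress")
tasks = [
    Task("Blocked", due=date(2023, 10, 1), blocked_by=[upstream]),
    Task("Overdue", due=date(2023, 10, 1)),
    Task("Future", start=date(2023, 10, 16), due=date(2023, 10, 20)),
]
colours = [colour(task, today) for task in tasks]
print(colours)
```

Most tools let you express rules like these as saved filters or board settings, so you would rarely write such code yourself; the point is only that the rules are mechanical.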

But my org has more than one team!

Sure. Up to this point, for reasons of simplicity, I have excluded cross-team collaboration. But trust me, the first thing you’ll want to get going is what I described above, at the team level.

To extend this notion to the entire organization, it is furthermore helpful if we don’t have just one level of “task”, but three:

A regular “task” is something that can realistically be achieved by one person.
That person may, for their own benefit, break this task down into something we can call “subtasks”3 to make them more manageable. It may also be prudent for another person to take care of an individual subtask, however: responsibility for completing the task remains with the task assignee, and subtasks can only ever be taken on by someone on the same team as the assignee of its parent task.
When it is necessary to collaborate across teams, we collect multiple tasks into “supertasks”3, so that everyone has a consolidated view of the common objective. Obviously, the ultimate responsibility for the completion of the supertask then moves one level up, to whoever has decision authority over everyone involved. For example, if an organisation has “departments” made up of “teams” and a supertask includes work done by multiple teams in the department, then supertask responsibility rolls up to the department head (who, of course, can fulfill their follow-up and follow-through responsibility by applying the same concept described here, one level up).

And then, we only add one more colour to our Kanban board palette: blue is for tasks that have at least one relationship (of any type) with a task owned by another organisational unit. These are the ones that you, as the unit lead, monitor for anything you need to coordinate with the other unit lead. Again, facilitating coordination with other organisational units is our job, not something where our reports must fend for themselves.

In summary

As a manager at any level, follow-up and follow-through is a core element of our responsibility. In a contemporary technology company using standard tools of the trade, this is easier to do than ever before. So let’s just do it.

I argue that the idea of responsibility and accountability being two separate things that can be split between two people is a gross distortion of reality and an insult to common sense, but that’s a rant for another day. ↩
It is of secondary importance what system it is that you choose to implement the concepts I describe here. They can be applied in a number of systems — Taiga, Trello, Jira, GitHub Projects all come readily to mind, as do others. At least theoretically, you might even use sticky notes on a big sheet of brown paper tacked to a wall near your desk. ↩
I deliberately use the generic terms “task”, “subtask”, and “supertask” here. The system of your choice may use different terms. What matters is that you have a “thing” which can have zero or more children and zero or one parents. ↩↩↩
My working hypothesis is that the existence of a mandatory daily meeting is an indicator for incompetence, poor communications, or functional illiteracy, or any combination of any of these. ↩
This assumes the weekend in your culture is Saturday/Sunday. ↩

tag:xahteiwi.eu,2023-10-07:/blog/2023/10/07/follow-up-follow-through/

Creativity (PyCon Italia 2023)

Florian Haas May 25, 2023 Updated May 25, 2023

PyCon Italia 2023 was wonderful. Here’s my talk.

This year, I traveled to Florence for my first PyCon Italia. It’s a wonderful conference in a lovely city, and I hope to return.

PyCon Italia accepted my Creativity talk that I’d already done at DevOpsDays Tel Aviv the prior December, and I was much happier with it the second time around.

Also, thankfully I did not have the audio issues that plagued the Tel Aviv talk, so the recording came out rather well, too. It is available on YouTube. My slides (with full speaker notes) are available here.

tag:xahteiwi.eu,2023-05-25:/resources/presentations/pyconit-2023/

The trouble with Key Results

Florian Haas Mar 19, 2023 Updated Mar 19, 2023

Some thoughts sparked by a discussion on Mastodon.

I like Mastodon. I really do. Ever since turning my back on the birds(h)ite, I enjoy the consistent quality of the discussions I’ve been having on the Fediverse. It’s nice that I can disagree with someone, without it turning into a roiling flamefest.

I recently had one such disagreement.

Coming across a post (in German) in which someone extolled the virtues of the OKR method, I took the liberty to reply with a simple, “I once wrote something about that”, with a link to my Meaningless Metrics, Treacherous Targets article. Clearly, its metrics obsession — every Key Result must be quantifiable and measurable, otherwise it’s not a Key Result — is something that I consider highly problematic about the OKR method, for the reasons which that article outlines.

A person other than the original poster1 then stepped in and defended the method by arguing in favour of having objectives. Those by themselves are of course not something that I disagree with in the least. It’s just that without Key Results (and the fixation on metrics they bring in), it’s not OKR anymore. It’s not a novel approach either: without metrics obsession, you can trace objectives-based management/leadership back to at least 1888, when the Prussian army extended Auftragstaktik to all levels of command.

But then the person pointed out that in their mind, Objectives were really more about habit-forming than about short-term goals.2 They gave the example of an Objective being “living healthy”, and a Key Result being “exercise twice a week.”

And now that’s an interesting proposition that I want to get into in a little more detail. Because obviously living healthy is a good and sensible goal to pursue. But OKR is a great way to muck it up — just like it’s a great way to muck up most good and sensible goals one might pursue.

Let me explain.

First off, “living healthy” is a goal that an OKR practitioner employing accepted Best Practices would probably reject outright, because it is “business as usual”. Living healthy is clearly a long-term objective, not one that you should define for a year or a quarter or a month, pursue with great rigour, and then move further on down your priority list in favour of the next period’s OKRs.

It is also not an “actionable” goal, because living healthy is as much about doing things (exercising, getting enough sleep, eating healthy) as it is about refraining from things (smoking tobacco, consuming alcohol and drugs).

But, let’s think the scenario through and let’s conjure up a person, aged about 40, male, 185cm tall, slightly overweight, not always eating well. Let’s call him Frank.

Assume further that Frank is a former habitual smoker who managed to quit five years ago. And suppose Frank wanted to use the OKR method to attain an Objective of “living healthy” for a quarter, setting the following Key Results:

Do 26 hours of exercise (that’s 1 hour, twice a week, for the 13 weeks in a quarter).
Get an average of no less than 7 hours and 30 minutes of sleep per night.
Smoke zero cigarettes and refrain from consuming tobacco products in any other form.
Eat junk food no more than 5 times.
Attain a weight of 82kg at the end of the quarter.

Now, all of these Key Results sound perfectly reasonable. They are something that a person in that situation should strive for, are they not?

Now assume that — in accordance with the OKR method — we consider the Objective achieved when (and only when) all Key Results are met. And assume further — as is commonplace in organizations that use OKR — that there is some sort of reward that awaits Frank upon meeting his Objective. Assume, for example, that Frank can put his defined Key Results into an app, and if he meets his Objective the app gives him a badge that we’ll call the Goal Keeper award. Frank can share this with his friends on social media, and it gets Frank a 20% discount on his next purchase of athletic shoes, at his preferred store.3

OK, now. We are 11 weeks into the quarter, and Frank is in the following situation:

He has already completed 32 hours of exercise, because he managed to put in one extra hour of exercise, in more than half of the preceding weeks.
He has slept a little less than planned, and his average stands at 7 hours and 15 minutes per night.
He has smoked not a single cigarette and has not consumed tobacco in any form.
He has had junk food only 3 times.
Since he has dampened his junk food cravings on several occasions by eating more sugar (which does not factor into a Key Result by itself), his weight now stands at 85kg, 3kg above his goal weight.

Now, on face value, what should Frank be doing in order to live healthy (which was his original goal)? Clearly, he should probably continue his exercise regime, go to bed a little earlier, and consume a bit less sugar. Quite probably, he can lose those three extra kilograms easily in six weeks, at a healthy weight-loss of 500g/week.

But with a potential reward looming that is tied to meeting the Objective (this is where Goodhart’s Law comes in), it stands to reason that Frank will follow a different line of reasoning:

“I can lose 3 kilograms in two weeks if I crash-fast. It will drain all my energy, but I don’t need that energy for exercise anymore, and if it makes me really tired and I sleep for 9 hours a night for the next two weeks, so much the better: it’ll push my sleep average above 7.5 hours, so I’ve got that box ticked as well.”

And that is a clear example — and a depressingly common one — of a perverse incentive: Frank now has an advantage out of being less healthy at the end of the period than he otherwise could have been: at the end of the quarter, he will be two weeks out of training, he will have slept more than he needed (which has no health benefits), and his caloric balance will probably be solidly upset, so that his weight will bounce right back up once he has passed the end of quarter, collected his discount code, and ceased his crash-fast.
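Frank’s arithmetic holds up, which is exactly the problem. Here is a small sketch that checks it, using only the numbers from the example above. It assumes that every week counts equally towards the sleep average, that the sensible path means 7.5 hours of sleep per night and 500g of weight loss per week for the remaining two weeks, and that the crash-fast loses the full 3kg; the function and variable names are mine.

```python
WEEKS_IN_QUARTER = 13
WEEKS_ELAPSED = 11
WEEKS_LEFT = WEEKS_IN_QUARTER - WEEKS_ELAPSED


def final_sleep_average(average_so_far: float, hours_from_now_on: float) -> float:
    """Quarter-end nightly average, weighting each week equally."""
    total = average_so_far * WEEKS_ELAPSED + hours_from_now_on * WEEKS_LEFT
    return total / WEEKS_IN_QUARTER


def objective_met(key_results: dict[str, bool]) -> bool:
    """All-or-nothing: the Objective counts only if every Key Result is met."""
    return all(key_results.values())


# Hours per night Frank needs in the last two weeks to lift 7.25 to 7.5.
needed = (7.5 * WEEKS_IN_QUARTER - 7.25 * WEEKS_ELAPSED) / WEEKS_LEFT  # 8.875
crash_average = final_sleep_average(7.25, 9.0)  # just above 7.5

sensible = {
    "exercise": True,  # 32 hours already exceed the 26-hour target
    "sleep": final_sleep_average(7.25, 7.5) >= 7.5,
    "tobacco": True,
    "junk food": True,
    "weight": 85 - 0.5 * WEEKS_LEFT <= 82,  # healthy rate: 500g per week
}
crash = dict(sensible)
crash["sleep"] = crash_average >= 7.5
crash["weight"] = 85 - 1.5 * WEEKS_LEFT <= 82  # crash-fast: 3kg in two weeks

print(f"needed: {needed} h/night, crash average: {crash_average:.2f} h")
print("sensible path meets Objective:", objective_met(sensible))
print("crash path meets Objective:", objective_met(crash))
```

Under these assumptions, the sensible path fails two of five Key Results and earns nothing, while the crash path ticks every box.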

But it gets worse. Up until this point, we have assumed that Frank setting goals for himself did so naively, with no plans to game them — he only did so, eventually, when he felt an incentive to do so. But of course, that’s not how humans operate, at least not beyond the first iteration. The next time around, Frank will build some wiggle room into setting the goal measures in the first place.

To discuss what that means, let’s return to the tobacco consumption goal. It’s blindingly obvious what’s a “good” number of cigarettes to smoke for a former habitual smoker: zero. The tobacco inhalation itself serves no beneficial purpose at all — in contrast to eating junk food, which at least does contain energy that your body can burn for useful purposes. And even a single cigarette may re-trigger substance dependency. So, clearly, zero it is.

But, setting this goal is extremely risky in terms of attaining the overall Objective, using the OKR method: if Frank had a weak moment and said yes to a cigarette proffered by a colleague on a break during an extremely stressful workday, and thereby failed his zero cigarettes goal, and this happened two weeks into a new quarter, any motivation to stick to the other Key Results for the remaining 11 weeks evaporates. I cannot meet my Objective anyway, the reasoning goes, so why should I bother trying for the other Key Results?

And so, the “clever” thing to do in the twisted logic of the method is to set a target of no more than two or three cigarettes a quarter, rather than zero which is the obviously more healthy choice.

And now here comes an interesting twist: when you confront someone with such examples of why a particular method is inherently flawed, they quickly retreat to a position of “well no method is 100% perfect; pointing out an imperfection does not render the method invalid”. That is true in principle — except when the thing that is broken is what defines the method.

I’m also pretty certain that someone will come forward pointing out that the hypothetical example I presented here is contrived, that I am discussing something no-one in their right mind would define as an Objective, and that my Key Results are all nonsensical. The problem is, whenever I then challenge them to give me a good Objective and 3-5 good Key Results, it usually takes about five minutes to identify either a way to game them, or a perverse incentive that they managed to build in.

Again, there’s no OKR without the KR. And the fact that Key Results must be measurable, and must be 3 to 5 in number, and meeting all of them is what constitutes meeting the Objective, no exceptions — that is what defines the method. And if that is broken, then the method is broken.

I’ve heard the same from Scrum practitioners who have told me that “we do Scrum, but without sprints” — yes, sprints are one of the many things that are broken about Scrum, but if you find sprints perpetually strung back-to-back to be nonsensical (and you should!) then you must also find Scrum nonsensical, because without sprints there is no Scrum.

And I’ve come to find this sort of goalpost move increasingly annoying. You don’t get to say “let’s use X” and, when confronted with the fact that X must include Y, which is counterproductive and toxic, retreat to a position of “well then let’s use X without Y, but let’s still use X”. You can’t. Just own up to the fact that without Y, there’s no X.

I am not including the names of the people involved in the conversation, nor am I linking to the original Mastodon thread from here. This is because I really don’t want to finger-point at any one person: the arguments that were brought forward in favour of OKRs in the discussion are arguments that I have heard from almost every proponent of the method that I have talked to. ↩
OKR does not foster anything long-term or sustainable in my mind; I think it emphasises short-term gains. But that’s just my opinion. ↩
In my example I use a “reward” that has a social component, and a component that looks like a monetary benefit but really isn’t (it just lets you buy something cheaper that you might not even need). In an organization the reward might be financial, but it might also be as simple as being recognized or commended by a manager, or even just having less fear of being hit by the next round of layoffs. ↩

tag:xahteiwi.eu,2023-03-19:/blog/2023/03/19/key-results/

Making docs with MkDocs

Florian Haas Feb 23, 2023 Updated Feb 23, 2023

At work, my team and I built and launched a new documentation website, built on Material for MkDocs.

This week, work launched a new documentation web site (Cleura Docs) which my team and I had been building for several months. Since this was my first foray into any significant tech writing in about 7 years, it was a fun exercise to see what tools are now available to the community, and how the technical landscape has changed in the interim.

This post is a summary of the technical considerations that went into creating that site, and the functional decisions that we made building it.1

What we use

We had, early on, made the decision that the site would use Markdown as its primary documentation format. This is because Markdown strikes a nice balance between richness of expression, and ease of use.

reST and DocBook are probably much more appealing to the die-hard tech writer, but they are also somewhat impenetrable and difficult to grok. AsciiDoc is just as expressive as DocBook (and indeed is semantically equivalent to it), but it is also somewhat obscure and niche. Markdown, in contrast, is ubiquitous and comparatively intuitive, which makes it accessible to contributors who aren’t full-time professional writers, which is exactly what we were looking for.2

What we also wanted was a static site generator that could be kicked off from a CI run, with the ability to host the results pretty much anywhere. We currently run on GitHub Pages, but there is nothing that keeps us from running anywhere else.

So the combination of those factors quickly led to the selection of MkDocs, which I had last used in 2016 or thereabouts, and holy mackerel has it come a long way since. This is in no small part due to Material for MkDocs (also known as mkdocs-material), which is a staggeringly excellent way to render MkDocs sources, and has become something of a contemporary de-facto standard for technical documentation in the industry.

Finally, as a theoretical documentation framework we adopted Diátaxis, which is also becoming something of an industry default.

We furthermore decided that within the Diátaxis framework, we would follow this order of priorities: How-to guides would come first, followed by the necessary amount of reference material. Once those bits were considered sufficient to be useful (not “complete” — documentation is never complete), we would be ready to drop the “beta” warnings from the site and announce it publicly. Then we would start adding background explanations, and finally, tutorials.

As I write this article, the site is out of beta, we have just started on the background bits, and no tutorials exist as yet — although we do have academy.cleura.cloud which has full-blown training courses.

How we use it

A non-technical decision that we also made early on is that the documentation should be available under a Creative Commons license, and that its whole build chain should be publicly available. This is nice, because it allows me to go into the nitty-gritty of some technical details.3

So, I am next going to go into a few elements of our MkDocs configuration that we found particularly useful.

Plugins

There is an inordinate number of plugins available for MkDocs (and some, specifically, for mkdocs-material), of which we use a handful.

macros

This is a plugin that allows you to use Jinja2 expressions in your Markdown sources. It’s exceedingly useful because product and service names, and other terms that may be relevant to your technical documentation, change. Whenever that happens, you normally hear tech writers groan because they now need to dust off their grep and sed skills and embark on a massive search-and-replace effort.

With mkdocs-macros, you just define a variable under the extra key of your mkdocs.yml dictionary, and you’re off. Like so:

extra:
  support: "Service Desk"
plugins:
  - macros:
      # These settings are helpful because you want your build to fail if you're using an undefined macro.
      on_error_fail: true
      on_undefined: "strict"

And then you can do this, in your Markdown sources:

## Getting Help

For any further questions, contact our {{support}}.

Then when your support team decides they want to rename from “Service Desk” to “Service Center”, you change just one line in your configuration.

git-authors

I think it’s always valuable to credit people’s contributions individually. The git-authors plugin lets you do that quite nicely. And it even gave me the opportunity to make a tiny code contribution in the process of incorporating it into our build.

plugins:
  - git-authors:
      enabled: true
      show_email_address: false

You can take a look at any random page on the site for the inconspicuous “Authors” list at the bottom of the page.

htmlproofer

One of the things that everyone hates when perusing documentation is when it contains dead links. I think it is therefore incumbent on us documentation authors to employ a link checker, run it on every build, and not publish documentation that links to HTTP 404s. The htmlproofer plugin lets us do just that:

plugins:
  - htmlproofer:
      # We want dead links to fail the build, not just produce a warning.
      raise_error: true
      validate_external_urls: true

Note that this can add a significant amount of time to the build (up to 50 seconds, in our case), so we find it helpful to be able to disable external link checking when we run mkdocs serve locally. We can do that by adding one more line to the configuration:

plugins:
  - htmlproofer:
      enabled: !ENV [DOCS_ENABLE_HTMLPROOFER, True]
      raise_error: true
      validate_external_urls: true

Now if we don’t do anything specific, links will be checked. This is also what we use in CI runs.

However, we can also do this, which greatly facilitates work-in-progress:

export DOCS_ENABLE_HTMLPROOFER=false
mkdocs serve
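
The idea behind that `enabled` line, an environment variable with a default, can be sketched in a few lines of Python. This is only an illustration, not how MkDocs implements `!ENV`; the helper name `env_flag` and the set of strings it treats as false are assumptions made for this example.

```python
import os


def env_flag(name, default=True, environ=None):
    """Return the boolean value of an environment variable, or a default."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        # Variable not set: fall back to the default (link checking stays on).
        return default
    # Assumption: only these spellings switch the flag off.
    return raw.strip().lower() not in {"false", "no", "off", "0"}


# CI run: nothing exported, so the plugin stays enabled.
print(env_flag("DOCS_ENABLE_HTMLPROOFER", environ={}))
# Local work-in-progress: the variable is exported as "false".
print(env_flag("DOCS_ENABLE_HTMLPROOFER",
               environ={"DOCS_ENABLE_HTMLPROOFER": "false"}))
```

The point of defaulting to "on" is that forgetting to set the variable gives you the strict behaviour, never the lax one.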

Content tabs

A feature in mkdocs-material that has proven to be very useful is content tabs.

It turns out that more often than not, particularly when dealing with an infrastructure platform, there’s more than one way to do something. Then, you often end up interspersing explanatory content (which is the same regardless of the tool you use) with command examples (which are of course tool-specific). The use of content tabs makes this kind of content a breeze to write and maintain.

For example, we expose an S3-compatible object store API with Ceph radosgw, and there you can frequently do things just as well with aws s3api or s3cmd or mc. With content tabs, we are able to explain complex features in a relatively uncluttered and compact way, without losing the necessary detail.

This comes in handy even if we want to be specific about some functionality not being available in a particular tool. Consider this example from the page on S3 bucket versioning:

## Enabling bucket versioning

To enable versioning in a bucket, use one of the following commands:

=== "aws"
    ```bash
    aws --profile <region> \
      s3api put-bucket-versioning \
      --versioning-configuration Status=Enabled \
      --bucket <bucket-name>
    ```
=== "mc"
    ```bash
    mc version enable <region>/<bucket-name>
    ```
=== "s3cmd"
    This functionality is not available with the `s3cmd` command.

Git integration

Material for MkDocs has excellent integration with GitHub, GitLab, and other Git-based revision control and collaboration platforms.

We chose to use that to its fullest extent, to the point where every single page has an edit button, and things are made as easy as possible for drive-by contributors. We also wrote a guide for submitting changes, and a general contribution guide.

For people who don’t want to write a patch but do want to report a problem or bug, we implemented GitHub issue forms (currently in beta, we hope they stay) — and wrote a separate guide for using those, too.

CI and deployment automation

Our test/build/deploy pipeline runs from tox, very similar to what I’ve covered in some detail before. This means that we can ship a .githooks directory enabling documentation contributors to run the full test suite on every commit and push, that we can keep our GitHub Actions workflows rather simple and lean, and that we can switch to a different build platform (such as GitLab) quite easily if we choose.

Analytics

At work we are acutely GDPR conscious, so Google Analytics was a non-starter. Thankfully, there is a European, privacy-preserving, lightweight site analytics solution in Plausible (which I also use for my site), which you can incorporate into mkdocs-material with a very tiny theme override. Feel free to take a look at the PR if you want to do something similar.

How it’s going

Overall, feedback on the new site has been unanimously positive. This is nice, but what is even better (and highly important, in the long run) is that people evidently find it very straightforward to make contributions. Our colleagues no longer even ask how they can help out — they just do it, some of them making extremely impressive content additions even on their first PR.

So that feels very promising.

There is also an article on the Cleura blog, published a few weeks after this post, that talks about the organizational considerations that went into building the documentation platform. ↩
Lest you think I am bashing something I am clueless about, I have used all the mentioned formats for technical documentation in a professional capacity: reST (with Sphinx) for contributions to the Ceph and OpenStack docs, AsciiDoc in the context of Linux-HA, and DocBook XML for ancient versions of the Pacemaker documentation and, believe it or not, for my thesis. And Markdown, obviously, for too many things to count. ↩
You are welcome to take a peek at the GitHub repo where we maintain the documentation sources and CI infrastructure. ↩

tag:xahteiwi.eu,2023-02-23:/blog/2023/02/23/mkdocs/

Brown M&Ms, UTC, and ISO 8601

Florian Haas Feb 10, 2023 Updated Feb 10, 2023

What have brown M&Ms got to do with date/time formats and mutual respect? More than you might think.

The Van Halen “brown M&Ms” story is a classic tale of rock’n’roll lore. In 1982, Van Halen had a famous concert rider that included a requirement that was patently ludicrous at face value: backstage, the catering at a Van Halen show had to provide bowls of M&Ms chocolate candies — with all the brown ones removed. That’s right, there had to be M&Ms, but if there was a single brown one to be found, this would constitute a breach of contract on the promoter’s part.

This story is frequently trotted out as an example of crazy rock stardom. Clearly, this was an episode of fame getting to the heads of a group of kids that had suddenly hit the big time. Or was it?

David Lee Roth, the then-singer of Van Halen, explained the reason behind the “no brown M&Ms” rule in an interview nearly 30 years later.

In 1982, the Van Halen show was one of the biggest acts on the North American tour circuit. Their stage lighting rigs would look positively tame by modern standards, but at the time very few acts were touring with equipment that was as power-hungry as Van Halen’s. (Consider that at the time it wasn’t uncommon for bands to just play under the venue’s house lights, rather than travelling with several truckloads of stage equipment as big acts commonly do today.)

So the band put together very stringent electrical wiring and power distribution requirements in their promotion contract. Their rider would specify the density and spacing between outlets, amperage requirements, and fuse ratings. (Clearly, Van Halen had a vested interest in not blowing a fuse or tripping a circuit breaker mid-show, and in keeping their crew safe from electrocution by improper grounding.) This amounted to quite a compendium of documentation, and right in the middle of that binder they slipped a page specifying the catering requirements — including the now-famous “no brown M&Ms” rule. Its purpose was simply to check on whether the promoter had thoroughly read the contract.

Thus, if a band member or roadie walked into the backstage area shortly before sound check, and they found brown M&Ms, it stood to reason that the promoter hadn’t been paying close attention to that part of the contract. And that meant that they couldn’t rule out that the promoter had been sloppy with the power specs, too. So they would do a full line check to make sure that the equipment held up.

Allow me to change the subject.

I wrote the communication guidelines for my team. Eventually, since the company I work for hadn’t any such thing, they basically adopted large parts of what I wrote and made it company policy.

Two of the things that made it into the company policy, and are still standing rules on my team — and will remain so, as long as I run it — are these:

Always use the ISO 8601 YYYY-MM-DD format for any dates you specify.
Always use UTC for any time information you communicate. (Adding conversions to local timezones is OK, but not required — and UTC must always come first.)
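
As a rough sketch of what following both rules looks like in practice, here is a small Python helper that turns any timezone-aware timestamp into an ISO 8601 date plus a UTC time. The function name `to_team_format`, the exact output layout, and the sample meeting are assumptions made for this illustration; the rules themselves only ask for YYYY-MM-DD and UTC.

```python
from datetime import datetime, timedelta, timezone


def to_team_format(moment):
    """Format an aware datetime as an ISO 8601 date and a time in UTC."""
    if moment.tzinfo is None:
        raise ValueError("need a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


# Hypothetical meeting at 15:30 local time in Vienna (UTC+1 in winter).
vienna_winter = timezone(timedelta(hours=1))
meeting = datetime(2023, 2, 10, 15, 30, tzinfo=vienna_winter)
print(to_team_format(meeting))  # 2023-02-10 14:30 UTC
```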

People have called me silly and nitpicky for insisting on these things. But they serve a purpose.

You see, I work in an international team. I am from Austria and I would not specify today’s date as 2023-02-10 in regular written communications. I would put 10.2.2023. Someone who lives in Spain would use 10/2/2023. Someone from India would put 10-02-2023. And someone from the U.S. would write 2/10/2023 (as you can see this creates some difficulty establishing, unambiguously, whether we are talking about the 10th day of February, or the 2nd day of October).
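
To make that ambiguity concrete, here is a minimal Python sketch that reads the same written date under the two conventions. The variable names are made up for the example.

```python
from datetime import datetime

written = "2/10/2023"

# Read the way someone from the U.S. would (month first) ...
us_reading = datetime.strptime(written, "%m/%d/%Y").date()
# ... and the way someone from Spain would (day first).
es_reading = datetime.strptime(written, "%d/%m/%Y").date()

print(us_reading.isoformat())  # 2023-02-10
print(es_reading.isoformat())  # 2023-10-02
```

The same string yields two dates almost eight months apart, whereas the ISO 8601 form can only be read one way.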

The fact that we agree on an internationally unambiguous format that none of us would use natively simply means that we all slightly go out of our way, in order to make things easier for everyone. That is a hallmark of respectful behavior: if everyone accepts a slight inconvenience for themselves, we all act more fairly to each other.

The same goes for giving times in UTC: very few people (on a global scale) live “on UTC”: there is just a handful of countries1 that are in the UTC±0 timezone and never observe daylight saving time. Their combined population is about 140 million people, less than 2% of the world’s total. Everyone else lives in a timezone other than UTC at least for half of the year.

This means that if you’re like most people, coordinating times in UTC means having to do a little bit of mental calculation for yourself: you know which side of Greenwich you’re on, and how far removed, and whether or not it’s summer and whether or not your area observes daylight saving time. Everyone else is in the same boat. You go a little out of your way, I go a little out of mine, we respect each other, we get along.

It’s been a little while since I wrote those guidelines. “Use ISO 8601” and “use UTC” are my “no brown M&Ms” rules. And just as the removal of brown M&Ms was a proxy variable for diligent contract reading, my date/time rules have turned out to be proxy variables for respect.

There are some people who just assume that these simple rules are not for them, and that surely everyone else can adapt so they don’t have to. I grow progressively more skeptical of those people.

Burkina Faso, Côte d’Ivoire, Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, São Tomé and Príncipe, Senegal, Sierra Leone, and Togo. ↩

tag:xahteiwi.eu,2023-02-10:/blog/2023/02/10/brown-mms/

Handy Git aliases

Florian Haas Jan 6, 2023 Updated Jan 6, 2023

I keep a few aliases in my ~/.gitconfig that you might find useful, too.

I use Git on a practically daily basis, and although it comes with just about everything including the proverbial kitchen sink, there are a few bits of functionality that I only wish it had. Luckily, Git’s functionality is almost infinitely extensible via the use of aliases.

So, here are some that I define in my ~/.gitconfig file, with a brief explanation of what they’re good for:

List branches by their date of last modification

[alias]
  recent = branch --sort=-committerdate --format=\"%(committerdate:relative)%09%(refname:short)\"

I frequently have a pretty large number of topic branches that I work on, plus ones that I pull in from other people’s remotes for local review. So it’s helpful to know which branches in my checkout were most recently updated, and I can run git recent to do that.

Delete old topic branches that have been merged

[alias]
  delete-merged-branches = !git branch --merged | grep -Ev '(main|master)' | xargs -prn1 git branch -d

I create a topic branch for everything that needs to be reviewed and merged to main at some point. That means it’s not unheard of that I create dozens of them each month, and they quickly accumulate. If I did not regularly prune old topic branches, my Git checkouts would become unmanageable pretty quickly.

So, I use my git delete-merged-branches command to remove those local branches that are fully merged to main.

Find the origin of a branch point

[alias]
    oldest-ancestor = !bash -c 'diff -u <(git rev-list --first-parent \"${1:-main}\") <(git rev-list --first-parent \"${2:-HEAD}\") | sed -ne \"s/^ //p\" | head -1' -

Sometimes I create a topic branch off main, then add oodles of commits on it. At the same time, more commits land on main, and eventually I forget which commit I based my branch on.

Then, I can use git oldest-ancestor to retrace my branch point, like so:

git oldest-ancestor foo bar: find out at which commit bar branched off foo.
git oldest-ancestor foo: find out at which commit the currently checked-out branch branched off foo.
git oldest-ancestor: find out at which commit the currently checked-out branch branched off main.
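
The logic of the alias is easier to see outside the shell one-liner. The alias compares the two first-parent histories with diff and prints the first line they have in common. The Python sketch below is a simplification of that idea, not a port of the alias: it walks the branch history and returns the first commit that also appears in the base history. The function name and the commit IDs are made up for the example.

```python
def oldest_ancestor(base_history, branch_history):
    """Return the first commit of branch_history also present in base_history.

    Both arguments are lists of commit IDs, newest first, in the order that
    `git rev-list --first-parent` prints them.
    """
    base_commits = set(base_history)
    for commit in branch_history:
        if commit in base_commits:
            return commit
    return None


# Made-up commit IDs: the topic branch forked from main at "m3".
main = ["m5", "m4", "m3", "m2", "m1"]
topic = ["t2", "t1", "m3", "m2", "m1"]
print(oldest_ancestor(main, topic))  # m3
```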

I seem to recall I learned this trick from a Stack Overflow discussion, which I can’t find anymore. What I have found is a similar implementation from Lee Dohm that is MIT licensed: git-oldest-ancestor.

Fix trailing whitespace

[alias]
  fixws = !git diff-index --check --cached HEAD -- | sed /^[+-]/d | sed -r s/:[0-9]+:.*// | uniq | xargs sed -e s/[[:space:]]*$// -i

I usually want to avoid committing changes with extraneous whitespace, and if I enable the default pre-commit script that lives in a .git/hooks directory by dropping .sample off its filename, Git will even enforce this as a pre-commit rule.

So what I do is this:

I try git commit.
Git complains about trailing whitespace.
I run git fixws, and repeat my git commit command.

tag:xahteiwi.eu,2023-01-06:/resources/hints-and-kinks/git-aliases/

3 places to eat: Berlin

Florian Haas Dec 29, 2022 Updated Dec 29, 2022

Need a bite in Berlin?

My Fediverse friends recently encouraged me to share places where I liked to eat in cities I’ve travelled to. So here are 3 of my favourite places to grab a bite in Berlin. These are three picks from a list that is at least three times as long, from three different city boroughs.

Lemon Grass Scent

Schwedter Straße 12, Prenzlauer Berg

You may scoff at Asian fusion because you’ve seen too many places that are terrible at it. This place is excellent. It’s particularly lovely in the summer when you can sit at one of the curbside tables outside on a balmy evening.

Bejte Ethiopia

Zietenstraße 8, Schöneberg

Get your injera fix here! This is a lovely cozy place that I’d suggest you visit with a small group. Highly vegetarian/vegan friendly, although they have excellent meat dishes as well.

Restaurant L’Escargot

Brüsseler Straße 39, Wedding

I’ll happily recommend the eponymous snails, but this is also known to be a source of excellent coq au vin.

tag:xahteiwi.eu,2022-12-29:/blog/2022/12/29/3-places-to-eat-ber/

3 places to eat: Tel Aviv

Florian Haas Dec 19, 2022 Updated Dec 19, 2022

Find yourself hungry in Tel Aviv? I can help.

My Fediverse friends recently encouraged me to share places where I liked to eat in cities I’ve travelled to. And since I’m just back from Tel Aviv, here are 3 of my favourite places to grab a bite in that city.

Benedict

Ben Yehuda St 171, Tel Aviv

A 24/7 breakfast restaurant that is an excellent location for shakshuka, but it also makes the best Eggs Benedict I’ve found outside Auckland (this doesn’t reflect poorly on the place — Eggs Benny from Auckland are unbeatable). In my opinion, the top items on the menu are the ones with the lamb bacon and spinach.

Benedict has several locations around the city. The most conveniently placed one for touristy purposes is arguably the one on Rothschild; there’s also one in Sarona Market, but my favourite branch is somehow the small one on Ben Yehuda.

Falafel Frishman

Frishman St 42, Tel Aviv

Best falafel I’ve ever had. I prefer the full falafel platter (with generous amounts of Arab salad, hummus, and pickles) over the pita sandwich.

My local friends tell me the sabich next door is also excellent, but I haven’t tried it myself since I’m not too hot on hard-boiled eggs (sorry!).

Haj Kahil

David Razi’el St 18, Jaffa

An Arab cuisine place in Jaffa just outside the old town, on Clock Tower square. The starters are mindblowing and delicious, and the meats are absolutely divine. Bonus: you have a lovely walk back to the central Tel Aviv hotel district, via the waterfront boardwalk.

tag:xahteiwi.eu,2022-12-19:/blog/2022/12/19/3-places-to-eat-tlv/

Creativity (DevOpsDays Tel Aviv 2022)

Florian Haas Dec 17, 2022 Updated Dec 17, 2022

I did a 30-minute Spotlight Talk at DevOpsDays Tel Aviv 2022.

This year, DevOpsDays Tel Aviv accepted my talk on creativity (which I submitted as a 20-minute session), and then upgraded it to a 30-minute Spotlight Talk on the main stage to close out the first day.

This was my first DevOpsDays Tel Aviv since 2019, and the event has grown significantly since then. It is still a 2-day event that is mostly single-track save for some workshops, but it now takes up all of Pavilion 10 of the Tel Aviv Expo, including an 800-seat main stage keynote area.

The video for this talk is on YouTube, but unfortunately it has pretty bad audio issues so it’s a bit painful to listen to. My slides (with full speaker notes) are available here.

tag:xahteiwi.eu,2022-12-17:/resources/presentations/devopsdaystlv-2022/

No, you won't get a PowerPoint from me!

Florian Haas Dec 11, 2022 Updated Dec 11, 2022

Dear conference organizers, I use my preferred slide deck framework for a reason. Please don’t try to second-guess me.

Dear conference organizer,

Please do not attempt to foist a slide deck format on me. Please stop asking for a PowerPoint deck, or a Google Slides link, or a PDF. And if these are the only formats you are willing to accept, please do state that clearly and up front in the CFP, because it means I won’t be submitting a proposal to your conference.

I am not doing this to annoy you, or make your life harder. It’s just that the presentation stack I have settled on is far superior to the ones you are suggesting, and I am disinclined to compromise on quality.

I use the reveal.js presentation framework, and publish my slide decks using GitHub Pages. I provide a public URL to my slides ahead of time — always at the beginning of the talk, and also in advance if requested, in case you want to put the link in your conference programme.

This approach allows me to do things that — to the best of my knowledge — neither PowerPoint, nor Google Slides, nor a PDF allow me to do, which I consider essential in the interest of my (that is, your!) audience:

I always encourage people who sit at the back of the room, or who have vision acuity issues, to open my slides on their phone or tablet as I start the talk (I provide a full-size QR code for that purpose; here is an example). reveal.js Multiplex then allows me to advance my slides on the big screen or projector, and the slides on people’s personal devices will advance in lockstep. This enables everyone to follow along without having to crane their neck or squint their eyes.
My slides always use the reveal.js black-on-white theme (which works best even if lighting conditions in the presentation room are unhelpful), and I include the reveal.js menu plugin with a theme switcher. This enables people who cannot look at bright screens for a long time (migraine patients come to mind) to switch to the dark, white-on-black theme as they follow along on their own device, which tends to make their talk experience more pleasant. To see what this looks like, open this deck, then click the ☰ icon or hit the m key, and go to Themes → Black.
On a related note, please don’t prescribe a slide template or colour scheme either, unless your template designers have made better accessibility accommodations than I have (at which point I will be more than happy to work those into my slide deck design).
reveal.js comes with a magnificently useful and highly flexible pacing timer, which enables me in my delivery to keep time very precisely, and even allows me to make timing changes in a hurry. For example, if my 50-minute talk starts 5 minutes late because of a scheduling issue, I can make a quick change to make up that time without compromising talk quality, and without the talk getting rushed at the end. Good schedule-keeping enhances the quality of your conference, and reduces your stress level as an organizer.
The fact that my slides live on GitHub, and are automatically published with a git push command, enables me to make last-minute modifications in relation to other talks previously seen in the conference. This keeps me from sharing information that is outdated, lets me add new information from your conference that is useful, and will thus make my talk (and your conference!) more practically relevant.
Immediately after my talk is done, I re-push my slides with inline speaker notes enabled. (To see what this looks like, look at this example.) You always get my talk with my full speaker notes, meaning every single one of my talks comes with a full transcript. This is beneficial to people with attention issues, and also greatly facilitates hallway track conversations — because people can always refer back to something I said, without having to try to remember and discern it from the slides’ contents.

You get all this for the small price of opening one more browser tab on your presentation and A/V production laptop. It might add a small amount of extra work for the people who run your A/V production; I acknowledge that and ask you and them to please accept that extra burden. But I promise you it’ll be a net win for our audience.

tag:xahteiwi.eu,2022-12-11:/blog/2022/12/11/no-you-wont-get-a-powerpoint/

Bye, Birdsite

Florian Haas Nov 12, 2022 Updated Nov 12, 2022

Find me on the Fediverse.

This is just to let you know that as of today, I am no longer active on the birdsite.

I have deleted my 15,227 tweets (going back to 2015-08-25, just over 7 years) using TweetDelete, and am leaving just a single one up to let people know where I went.

At this point, I have no intention to resume my activity there. You can find me on the Fediverse, and I’d be delighted to connect with you there. My primary Fediverse account is currently @xahteiwi@mastodon.social; if I ever move to a different instance I’ll be sure to set up a redirect.

I might still occasionally re-run Fedifinder to find people that I’d been following, or that I had added to lists, and then follow their Fediverse identities. So, if you’re also moving, please consider adding your Fediverse handle (or a link to your profile) to your bio.

tag:xahteiwi.eu,2022-11-12:/blog/2022/11/12/bye-birdsite/

Nebulous Percentage Shenanigans

Florian Haas Oct 15, 2022 Updated Oct 15, 2022

I recently learned that I am a “detractor” of things I actually quite like. Here are a few related things I also learned about in the process.

Surely you’ve seen a question similar to the following in a survey or request for feedback.

How likely is it that you’ll recommend X to a colleague or friend?

In this question, X is either a product or service, or a brand, or all of a company’s products or services. Answers are given on a scale of 0 to 10, with 0 being “not at all likely” and 10 being “extremely likely”. It is frequently the last question in a survey, or even the only one.

The reason this question is so frequent is that it drives a metric that is popular with marketingfolk and management: the net promoter score, or NPS.

To work out the NPS, what you do is deduct the percentage of “detractors” of your product from that of the “promoters”. If the net result is above zero, per the lore, you are generally doing well. If you hit above 30, your product or service or brand or company is allegedly performing excellently.

What’s a promoter? What’s a detractor?

Now, what defines your promoters and detractors, according to the NPS metric? Clearly we’ll have to slot the possible responses into categories.

And here’s where you’ll notice something peculiar about the scale. It consists of discrete values going from 0 to 10 inclusive, not from 0 to 9 or 1 to 10. That means it’s an 11-point scale. What do we notice about the number 11? Exactly, it’s prime. It’s thus impossible to sort the scale items into evenly sized categories, no matter how many or few.

But okay, let’s say it’s not a law that the categories need to be evenly sized. So you might think that answering anything from 0 to 3 makes someone a “detractor”, and anything between 7 and 10 a “promoter”, with the middle ground (4 to 6) being somewhat neutral.

But that’s not what NPS uses. The actual NPS scale looks like this:

0 to 6: detractor
7 to 8: “passive”
9 to 10: promoter

Now, recall that NPS ignores the “passive” respondents altogether, and only looks at the percentage of “promoters” minus that of “detractors”. If a majority of respondents answer 7 or 8 (intuitively a solidly positive score, if you ask me), those do not factor into the result at all. Only being inclined to rather enthusiastically recommend a product or service makes you a promoter. And answering 6, clearly north of the scale’s middle mark, makes you a detractor.
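To make that arithmetic concrete, here is a minimal sketch in Python. It assumes the responses arrive as a plain list of integers from 0 to 10; the function name `nps` and the sample data are made up for illustration, and are not taken from any survey tool.

```python
def nps(responses):
    """Net promoter score for a list of integer responses on the 0-10 scale."""
    if not responses:
        raise ValueError("need at least one response")
    # Promoters answer 9 or 10, detractors answer 0 to 6.
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    # Passives (7 and 8) only show up in the denominator.
    return 100 * (promoters - detractors) / len(responses)


# Hypothetical sample: nine respondents answer 7 or 8, one answers 6.
sample = [7] * 5 + [8] * 4 + [6]
print(nps(sample))  # -10.0
```

Nine out of ten people in this invented sample gave what I would read as a solidly positive answer, and the score still comes out below zero.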

Obviously, this oddly warped scale in combination with ignoring part of the sample altogether makes the deduction of one percentage from another rather agony-inducing for any secondary school maths teacher. That’s eminently not how these things work; on its face it’s one of those results that Wolfgang Pauli would have called “not only not right, but not even wrong.”

Would you recommend a rental car company to someone who doesn’t need a rental car?

But in addition to a warped scale and off-label use of percentages, NPS also uses an inherently biased premise.

Suppose I rent a car from a fictitious company we’ll call My Local Public Transit Sucks, or MyLoPTS. And after I return my vehicle, I am asked how likely I am to recommend renting a car from MyLoPTS to “a friend,” which I would presume to mean any randomly selected one of my friends.

Now I would not recommend any rental car company to someone I know to not be in need of a rental car. And I would assume that a maximum of 10% of my friends, acquaintances, and colleagues are in need of a rental car at any one time (considering that my locally available public transport very much does not suck, so the demand for rental cars is quite low). So even if I was certain to recommend MyLoPTS to anyone needing a rental car, the correct answer for the question as asked (the likelihood of recommending MyLoPTS to a friend, regardless of circumstances) on a scale of 0 to 10 would be 1. My answer to the question as stated thus has little to no relation to how happy I am with MyLoPTS’ service.

So, it’s a misguided question with a warped scale that makes implicit assumptions and then does creative maths with the result. Why do so many people believe that this makes any sense?

Greetings from Harvard

The answer, apparently, is in a single article in the Harvard Business Review. That article will be old enough to drink in two years’ time in its USian habitat, so whether its 2003 findings are still valid in 2022 is debatable. But let’s assume for the time being that they are.

On the subject of the scale in question, here’s a quote from that article:

[We] settled on a scale where ten means “extremely likely” to recommend, five means neutral, and zero means “not at all likely.” When we examined customer referral and repurchase behaviors along this scale, we found three logical clusters. “Promoters,” the customers with the highest rates of repurchase and referral, gave ratings of nine or ten to the question. The “passively satisfied” logged a seven or an eight, and “detractors” scored from zero to six.

— Frederick F. Reichheld, The One Number You Need to Grow, Harvard Business Review (2003)

So, this confirms that the scale itself isn’t meant to be warped by default. Anything under 5 means unlikely to recommend, 5 is neutral, and anything above 5 is likely to recommend. (Since the respondents would presumably be required to select discrete values, that fact still warps the scale and places the “neutral” value off-centre — but let’s assume the creators of the scale did think of it as a continuum, which does not have this problem.)

Rather than being baked into the scale from the get-go, the categorization into “promoters” comes from an actual correlation of responses to repurchase and referral behavior. Or at least the article claims so — the research data it’s based on does not appear to be publicly available. We can only assume that similar correlations with actual customer behavior were drawn for the “passively satisfied” and “detractor” categories, though I am not quite sure how they would have identified the former, separating them from promoters. I suppose a “passively satisfied” person could perhaps have been one that did come back to make another purchase, but never made a referral? It would be interesting to see how they tracked that in 2003.

At any rate, the HBR article then asserts that NPS was a predictor of company growth when comparing to its competition: in other words, that companies with a higher NPS than their competitors also experienced higher revenue growth than them — across multiple industries (the article specifically mentions airlines, ISPs, and rental car companies).

The article also says this:

The “would recommend” question wasn’t the best predictor of growth in every case. In a few situations, it was simply irrelevant. In database software or computer systems, for instance, senior executives select vendors, and top managers typically didn’t appear on the public e-mail lists we used to sample customers. Asking users of the system whether they would recommend the system to a friend or colleague seemed a little abstract, as they had no choice in the matter. […]

Not surprisingly, “would recommend” also didn’t predict relative growth in industries dominated by monopolies and near monopolies, where consumers have little choice. For example, in the local telephone and cable TV businesses, population growth and economic expansion in the region determine growth rates, not how well customers are treated by their suppliers.

— Frederick F. Reichheld, The One Number You Need to Grow, Harvard Business Review (2003)

That sounds quite reasonable. Obviously, recommending a product to a friend or colleague doesn’t help the company selling it, if the friend or colleague has no say in the purchase decision.

Cultural bias, and Goodhart et al.

But there’s another thing that the article doesn’t say, apparently because it’s obviously implied: all companies covered in the research, and presumably the vast majority of the customers surveyed, were from the United States.

It’s rather well understood that scales are read differently by people from different cultures. So while the correlation of a certain response score with a certain behaviour is likely to work fine when you’re surveying U.S. customers of U.S. companies, it’s likely to fall apart when you’re trying to make similar predictions from answers from non-U.S. respondents, or to compare responses internationally.

Note further that the article describes NPS as a predictor of growth, meaning the underlying conditions that cause a company to have a high NPS also give it a competitive advantage and thus facilitate growth. Trying to tweak the measure itself — for example, by coaching people to respond 9 or 10 when they would intuitively select 7 or 8 — is a great example of collapsing a statistical regularity by placing pressure on it for control purposes, i.e. Goodhart’s Law.

And, of course, NPS is subject to Campbell’s Law as much as any other metric. When a score becomes the goal of a process, it both loses its value as an indicator, and distorts the process itself in undesirable ways. You could argue that this effect of NPS is a regrettable, but natural aftereffect of its enduring popularity over nearly 20 years — but no, it’s right there in the same HBR article:

Branch scores were not improving quickly enough, and a big gap continued to separate the worst- and best-performing regions. […] So the management team decided that field managers would not be eligible for promotion unless their branch or group of branches matched or exceeded the company’s average scores. That’s a pretty radical idea when you think about it: giving customers, in effect, veto power over managerial pay raises and promotions.

— Frederick F. Reichheld, The One Number You Need to Grow, Harvard Business Review (2003)

“Radical” strikes me as a rather charitable assessment for what Goodhart and Campbell (and Marilyn Strathern) would call completely messing up the measure, by making it a career-defining target.

In summary

NPS strikes me as Not Particularly Sensible.

Further reading

These are provided for additional reference only; I do not necessarily agree with all their findings and suggestions.

Kim Witten, 10 reasons why NPS is BS (and what you can do about it) (2022)
Khadeeja Safdar and Inti Pacheco, The Dubious Management Fad Sweeping Corporate America (Wall Street Journal, 2019) (paywalled)
Ron Shevlin, It’s Time To Retire The Net Promoter Score (And Here’s What To Replace It With) (Forbes, 2019)
Jared Spool, Net Promoter Score Considered Harmful (and What UX Professionals Can Do About It) (2017)

I should mention that I particularly disagree with the notion of “replacing” the NPS with something else. See my thoughts on metrics for background.

tag:xahteiwi.eu,2022-10-15:/blog/2022/10/15/nps/

Uncertainty, industrious compliance, and the illusion of control

Florian Haas Sep 29, 2022 Updated Sep 29, 2022

Why are so many managers obsessed with faux certainty in situations when objectively, there can’t be any?

If I had visited you from the future six months ago, and given you today’s headlines to read, you would probably have dismissed me as a lunatic. And you’d probably say the same to someone who dropped in on you today and read you the headlines from six months out. And if conversely you tried to sit down tonight and write the headlines for March 29, 2023, you’d probably notice on the day that your predictions were nowhere near actual events.

You have no idea what the world, or even your life, will look like just six months from now. Make it five years or ten, and you won’t have the faintest shimmer of a clue. There is no certainty about your future.

Now, there are three approaches to deal with this kind of uncertainty:

Fatalism. The idea that whatever happens will happen, you have no control over it, and therefore any attempt to wield any influence over your own life is doomed.
Control. The idea that you can achieve certainty in your life by planning everything and leaving nothing to chance.
Creativity. The idea that, although some events in your life (and the world) are beyond your influence, you will have the ideas needed to make the best of the twists and turns, good and bad, that life throws at you.

In comparing these, I’d maintain that fatalism is a terrible strategy for life, and that far superior to sticking to the illusion of control is an approach favouring creativity — “the process of having original ideas that have value,” according to Sir Ken Robinson.

Now, almost exactly the same considerations apply to the medium-term future of a corporation. Nobody really knows what the future holds. There is no certainty. The market can go either way, interest rates can explode shattering your financing, you might close on an unexpected opportunity that sends your business flying. In principle, corporations could make the same considerations that apply to people, and strive to foster creativity left, right, and centre.

I am routinely dismayed, though, by the fact that the approach heavily favoured in management is one of control, to absurd ends. I think it’s obviously ludicrous to expect an engineering manager to “plan” what their team will be working on 5 quarters from now, or a sales person to “project” product sales next July. There are close to a million things that could influence either outcome, and the person would have to do the work equivalent to that of a meteorologist predicting maritime weather patterns for container ships, weeks and months out. Almost nobody in business has the data, the time, the staff, and the budget to do that.

So managers turn to metrics and KPIs and OKRs and whatever the TLA du jour may be, in a frantic quest to achieve some degree of certainty — a process which, as others have pointed out, suffocates the creativity that’s really needed to address a constantly changing environment. The idea is that if you collect all the metrics and do all the statistics and measure everything (in other words, if you follow some “process” perfectly), you’ll succeed.

That is so obviously nonsensical that the question becomes, why the hell does anyone think that this works?

And I have a hypothesis on why that is so. I attribute it to an institution that practically every manager has gone through, indeed, one that practically everyone has gone through.

That institution is school.

I don’t know about yours, but I can say one thing about my own school education: all of my schooling can be summarized in just a few sentences. Specifically, here’s what school taught me:

Here is a set of rules.
Apply these rules, and you will succeed.
Apply the rules perfectly, and you will excel.

That applied no matter the subject. It was as true for maths as it was for foreign language education, and for writing essays in history class. Here’s how it’s done; do it this way and you’ll succeed.

As it happens, I was good at applying rules. I succeeded in creative writing even when my writing wasn’t at all creative: I could spell, I had good grammar, my writing had the expected structure, I understood punctuation. As long as my writing was error-free, I made A’s (or rather, 1’s, in the school system I inhabited). My schoolmates might have written far better stories, but if in a couple of pages they had three grammar errors and a few misplaced commas and a couple of spelling glitches, they’d make C’s (3’s).

Now, there’s nothing wrong with giving children a set of rules for solving a problem, and then rewarding them for applying those rules correctly. The problem is with the idea that applying the rules perfectly is where excellence lies. That the difference between doing something well, and doing something exceptionally well, is simply in the more perfect application of the rules.

That couldn’t be farther from the truth. What distinguishes good results from excellent ones is either that the excellent ones follow the rules very well and then add a personal creative touch, or that they deliberately bend or reject conventions and are great nonetheless.

It gets worse. The people that school rewards the most — the ones it considers the overachievers, the cream of the crop — are the ones who can most perfectly follow the rules everywhere. We call them straight-A students. I was one of them. There is absolutely no way that any one child could have straight A’s, if in school report cards we applied an understanding of excellence that included creativity. You cannot possibly be equally and exceptionally creative in maths, science, languages, history, and all the other subjects you take. Straight A’s is what you get solely from industrious compliance.

And now, in business, we are stuck between a rock and a hard place. There is a whole generation of people under 35 who have gone through this kind of schooling, but are increasingly disillusioned by it — having understood that all their rules-compliance in school and at university is not a guarantee for economic prosperity or even financial security, and further that they can scarcely expect loyalty from their company if the going gets rough, even if they follow all the rules like a straight-A student.

But those same people are being managed by upper-level managers in their late 40s and 50s who not only have achieved some degree of financial stability, but who do believe that it was their education that prepared them for it — and that as long as they continue to meticulously apply a set of rules, everything will be fine and they get to expect a promotion. And when given a new rule book — say, some new management fad imposed by a person of authority, akin to their erstwhile teacher — all they need to do is apply those rules well, and it all will work for them, too. And the more industrious compliance they apply to the rules in the book, the more likely it is that results will not only be good, but excellent.

And if that rulebook includes collecting metrics everywhere and doing misguided statistics on them, it will result in an inordinate amount of busywork that kills the creativity that organizations really need to succeed. And likely drive away the people that could contribute exactly that kind of creativity.

Eventually, this problem will rectify itself by those managers retiring. But in the interim, maybe a few of us could ditch our school approach to work and collaboration, and engage our brains to produce the original ideas that we really need?

tag:xahteiwi.eu,2022-09-29:/blog/2022/09/29/industrious-compliance/

Writing Professionally (DevOpsDays Berlin 2022)

Florian Haas Sep 18, 2022 Updated Sep 18, 2022

I couldn’t make it to DevOpsDays Berlin, because my Covid status kept me from travelling. Thus, here’s a write-up.

At DevOpsDays Berlin 2022, I was slated to give a 30-minute talk titled Writing Professionally. Unfortunately a positive Covid test just three days prior to my departure threw a wrench in those plans, and I was unable to travel or attend a conference (much less speak at one, unmasked).

So, I am converting my talk into a write-up, before any symptoms set in.

Writing Professionally (in English)

This is a talk about developing superpowers. Superpowers that will help you be better in any role, any industry, any profession — particularly the ones where you work in distributed teams, and use an asynchronous and distributed workflow.

This is about expressing yourself clearly, succinctly, and professionally in writing. I have put this together for the English language, though parts of this talk may be applicable to other languages as well.

This is about writing well in a professional context. That means we’re not talking about poetry, or creative writing, or writing a message to a friend. We’re talking about writing in the context of professional communications.

Being understandable

When you write in a professional context, your goal is the reader (or readers) understanding you.

That is really the overarching goal, and it’s important to understand that we must do everything we possibly can to make it easy for our reader to understand what we mean.

When you’re writing a novel, you can add plot twists and surprises. You can even use a literary device called an unreliable narrator, where at the very end of the novel you reveal that everything the narrator said (or the protagonist did) was a lie, a delusion, or a dream.

In professional writing, we don’t have that luxury. We must write in such a way that whoever reads our writing, understands us.

And that is the overarching goal no matter the mode we use to communicate. Writing to be understood extends to

email messages,
chat messages,
issue descriptions and comments (in whatever issue tracker you might be using; this includes Jira tickets, Trello cards, GitHub issues and the like),
collaboratively edited documents (like wiki pages or shared Google docs),
meeting notes (more on those in a bit), and of course
technical documentation.

And in writing, your No. 1 priority is clarity.

Not beauty, elegance, or cleverness. Those all have their place, and there is some room for them in professional writing as well, but you should never sacrifice clarity for any of them.

I would also say that being clear is more important than being friendly. If there is something that you need to express, but you can’t do it in a friendly manner if you’re being clear, I’d say you should be clear, rather than friendly. However, not being particularly friendly doesn’t mean being disrespectful — even if you’re expressing strong disagreement, you can do so in a respectful manner.

Clarity takes two forms

Writing clearly can mean one of two things:

Clarity of expression, that is, caring about how we write each sentence (and word, really) with a view toward maximum clarity.
Clarity of structure, that is, caring about how we design “documents” (in the broadest sense), so that they are clear and easy to understand, and are useful for communication.

Both of them are equally important, and they depend on each other: a beautifully structured document is useless if it’s full of gibberish, and splendidly clear words are useless if they end up in an unstructured wall of text.

So we’ll spend some time on both, starting with clarity of expression.

Orwell’s Six Rules

One of the simplest, most concise, and most useful rulesets for writing clearly in the English language comes from George Orwell’s 1946 essay, Politics and the English Language.

In it, Orwell suggested six rules for writers to follow:

Rule 1

Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.

What Orwell means by this is essentially, “don’t try to be clever”. If you’re using a metaphor, chances are that it’s not original, that somebody else has used it before, and that people may see it as a tedious trope.

More importantly though, your reader might simply not be familiar with the metaphor, depending on their background. As an example, if your readers are from Germany, or are medieval scholars from somewhere in Western Europe, they will probably understand what you mean by calling a salesperson a Pied Piper. If they are not, your readers will be very confused by your reference to a 13th century rat exterminator (or musician, depending on viewpoint) in your writing.

Rule 2

Never use a long word where a short one will do.

It’s remarkable how much clearer your writing gets if you ruthlessly edit it. English almost always gives you a choice between using a long word, and a shorter one with the same meaning. Always opt for the short one. Rather than utilise, write use. Rather than saying methodology, say method. Instead of I don’t mean to insinuate, you can write I don’t mean to suggest.

Rule 3

If it is possible to cut a word out, always cut it out.

When you’re using shorter words instead of longer ones, you cut syllables and letters from your writing. Naturally, you can also extend that concept to cutting whole words.

There’s a famous quote that fits well with this particular rule that comes from the French writer, poet, and aviator Antoine de Saint-Exupéry (best known to English-speaking readers for his novella The Little Prince).

« Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher. »

(Terre des hommes, 1939)

In English, that roughly translates to:

It appears that perfection is achieved not when there is nothing left to add, but when there is nothing left to take away.

(I say roughly because cut away would be a better translation than take away, but in English the former sounds unnecessarily harsh.)

It needs to be said that this quote disagrees with itself a bit, because it does contain a phrase that is perfectly okay to edit out. If he followed his own advice (and was a little more assertive), Saint-Exupéry could have written:

Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away.

But it is a good quote to go by, still. If you make it part of your mindset to send an email, or hit the Save button on an issue comment not when you can’t think of anything to add, but when you’ve tried to trim all the fat and find nothing else to trim, that will make you a better professional communicator.

Rule 4

Never use the passive where you can use the active.

Orwell’s rule No. 4 states that you should be using the active voice whenever you can. There are some exceptions to this, but in general the active voice tends to be shorter, more concise, and thus clearer than the passive voice.

Passive voice: The system was rebooted.
Active voice: We rebooted the system.

Sometimes, it’s difficult for non-native speakers to even recognize that they are using the passive voice. There’s a neat little trick to apply here: if you can take the sentence and append “by dragons” to it, then it’s passive voice.

Applying this rule, we could edit the Saint-Exupéry quote even more radically:

We achieve perfection not when we can’t add anything, but when we can’t take anything away.

Now of course, the quote becomes less and less poetic as we edit it. I’m not trying to say that Saint-Exupéry ought to be writing any differently than he did. What I’m trying to say is that the last version is probably the clearest to most people (including those for whom English is their second or third language), and that in professional communications we shouldn’t strive to be poetic, we should strive to be clear.

Rule 5

Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.

Orwell’s fifth rule emphasises simplicity and accessibility. If there is an uncommon word that means the same thing as a common one, always use the common one.

Note that this means that you can — and should — use technical terms if they have no everyday equivalent. That’s simply because using an imprecise term would mean losing clarity, when compared to using a precise term. But when we have one that we can easily replace with a common word, we should do so.

For example:

This is a favourable conceptualization.
This is a good idea.

Here, we can simply replace the uncommon terms with common ones.

Another example in the corporate world is that we can easily replace the jargon term “human resources” with “people”, and also replace “acquire” with “hire” or “buy”, depending on context.

For this project we’ll need to acquire more human resources.
For this project we’ll need to hire more people.

Rule 6

Break any of these rules sooner than say anything outright barbarous.

And finally, Orwell reminds us that while this ruleset is a good guideline to follow, it is not absolute.

Yes, sometimes the passive voice is just fine, such as when you want to make a point of not assigning blame for something that went wrong, to a particular individual or group. Sometimes you can include a witty quote unedited, even though it used jargon or a foreign word.

But having the rulebook in the back of your mind at all times is guaranteed to improve your writing.

Little things

Let’s look at a few other, small things that you can think about to make your writing better.

Parallelism

Parallelism (parallel structure) is the simple rule that if you are using a particular structure in a sentence, then you keep using that structure until the end of the sentence.

We wanted to know the time, the place, and where we were going.
We wanted to know the time, the place, and the destination.

In this example, both sentences are correct English, even though the first one may sound a little colloquial.

The second sentence is clearer, because the use of “we wanted to know the time, the place, and” primes the reader to expect another instance of the definite article “the”, followed by a noun. Finishing the sentence with “the destination” makes for a more straightforward reading experience.

Emphasis goes last

Most English speakers subconsciously read the end of a sentence as being more important than its beginning. Thus, whatever you consider the more important aspect should go at the end of the sentence for emphasis.

The drug is highly effective, but has significant side effects.
The drug has significant side effects, but is highly effective.

The first example emphasises the side effects. Thus, most readers would read it as a warning to prescribe the drug with caution. The second example emphasises the drug’s effectiveness, and could thus be seen as an encouragement to prescribe the drug for treatments.

Effectively using bullets in a sentence

In professional writing, you often have the opportunity to make a long sentence much more readable, by simply injecting bullets.

Take this example:

It is important that we listen to the customer’s needs, build a good solution that keeps us within the applicable regulatory framework, and clearly delineate responsibilities in the proposal and SLA.

This sentence is perfectly correct, and it isn’t even awkwardly structured. It is just a long, run-on sentence, which most readers will find difficult to grasp at a glance. You probably reread it once or twice.

Compare this version:

It is important that we

listen to the customer’s needs,

build a good solution that keeps us within the applicable regulatory framework, and

clearly delineate responsibilities in the proposal and SLA.

In this version, not a single word has changed. The statement could still use further editing. But even at this stage, most people will find reading this version much easier than reading the original. That is because most readers will be able to grasp the initial part of the sentence, “it is important that we”, at a single glance, and will also be able to notice at a glance that there are three bullets.

That is to say that without even reading it in full, your reader will understand that you are about to mention what’s important, and that there are three things you suggest to consider.

Writing positively

Writing positively means to write using affirmations, rather than negations. In other words, you state what is, rather than stating what is not.

The project manager may not hire outside contractors except those that have been carefully vetted for reliability.
The project manager must vet all outside contractors for reliability.

Here, we have transformed the negative structure “may not… except” into an affirmative one, and we’ve further cleaned up the writing by eliminating an example of passive voice (“carefully vetted [by dragons] for reliability”) and replacing it with active voice.

Punctuation

… does matter. Really.

People often overlook how important punctuation is to clarity. What follows is an example of changing the meaning of a sentence drastically, by inserting commas.

First, we are dealing with a perfectly normal panda:

The panda eats bamboo shoots and leaves.

With one comma, the panda becomes a picky eater who takes off after eating the bamboo shoots, rather than also eating the bamboo leaves:

The panda eats bamboo shoots, and leaves.

And with two commas, we’re suddenly dealing with a panda that’s a gun nut with an anger management problem:

The panda eats bamboo, shoots, and leaves.

Whitespace

Whitespace normally goes after punctuation, not before.

In English (in contrast to some other languages), when whitespace precedes punctuation, it’s almost always wrong. The correct sequence at the end of a sentence is full-stop (or exclamation mark, or question mark) followed by whitespace. Likewise, it’s first comma, then whitespace. First semicolon, then whitespace.

There are select exceptions to this rule: whitespace does precede the en dash (–), opening quotation marks, parentheses, brackets, and braces, and optionally the em dash (—). But whitespace preceding punctuation at the end of a sentence is always an error.1

Question marks (?)

Questions should end in a question mark, shouldn’t they?

Don’t end questions with a full-stop or an exclamation point in professional writing. That will likely come across as either passive-aggressive or confrontational. End questions, even rhetorical ones, with a question mark.

Exclamation marks (!)

Use no more than one exclamation mark per paragraph. Zero is fine, too.

Exclamation marks are something you want to use sparingly. Never use more than one in immediate succession. Limiting yourself to one per paragraph is a good rule of thumb.

Semicolons

Semicolons are great; splitting the sentence in two is usually better.
Semicolons are great. Splitting the sentence in two is usually better.

English mandates that on either side of a semicolon, you have what constitutes essentially a full sentence with a subject and verb (both obligatory) and an object (optional). Thus, they give you the option of just splitting the sentence into two — that is to say, replacing the semicolon with a full-stop.

Possessive apostrophe

Oscars: more than one Oscar
Oscar’s: of or related to Oscar

A common mistake that native speakers of German or Swedish often make is the omission of the possessive apostrophe. English doesn’t have much in the way of cases, but does have something like a genitive case. If something is of Oscar, or related to Oscar, it is Oscar’s — such as Oscar’s car, Oscar’s job, Oscar’s house. An “Oscars job” is a job for multiple people named Oscar, or a job at the Academy Awards.

It’s vs. its

Infuriatingly, the possessive apostrophe does not apply when the thing that something is related to is simply it. It would be perfectly reasonable to assume that when we refer to the car in “the car’s roof” as it, it becomes “it’s roof”, but alas, things don’t work that way.

it’s stands for “it is” (or “it has”)
its means “of it”

Yes, this is illogical and immensely confusing. English is weird.

Parentheses and brackets

Parentheses (like these) are always removable, without taking anything (of significant value) away from the statement.

You can use square brackets to shorten quotes.

To be, or not to be. That is the question. […] To die, to sleep. No more!

Angle brackets often serve as placeholders, for when the reader must replace <important thing> with <other thing>.

Hyphens and dashes

Often, people think of the three principal forms of dashes as being interchangeable. They really are not.

A hyphen, sometimes just called a dash, joins word-pairs together.
En dashes, that is, dashes roughly the width of a lowercase n, identify things like ranges or dates (March – June, pages 47–49).
Em dashes — like these, which have the width of a lowercase m — can replace parentheses.

Avoiding Germglish (and Poglish, and Ukrainglish, etc.)

As with accents, I personally think that we should celebrate the little language quirks that come from influences of our native language on English as a second (or third) language. Unfortunately, not everyone is so tolerant, and sometimes English monolinguals are particularly pesky about these things.

So, what follows are a few little things to think about when it comes to typical crossovers from our respective native languages, to English.

…, or?

Adding the suffix “…, or?” to a statement that you want to follow up with a question that asks for confirmation is very common in German and Swedish. But it is a structure that does not exist in English.

You must instead use an interrogative phrase, such as:

This is a good idea, right?
That is an excellent proposal, isn’t it?
We agree on that, don’t we?

“Disturbance”

Swedish and German native speakers often mentally translate the nouns “störning” or “Störung” into English as “disturbance”. This is a natural error because these derive from the verbs “störa” and “stören”, respectively, for which the English translation is indeed “disturb”.

However, in English “disturbance” is usually read as shorthand for “civil disturbance”, or “disturbance of the peace”, which are euphemisms for a violent riot.

Thus, to an English-speaking recipient the statement “we have a disturbance in our data center” would sound much more dramatic than the native German sender probably intended it to be.

Articles

Many Eastern European languages (such as Polish and Ukrainian) don’t use articles, and their native speakers frequently omit them when speaking English. In standard English, however, the and a are not optional.

Applicable exceptions are newspaper headlines (“Congress passes law on guns”) and controlled vocabularies (such as troops in combat using voice procedure, “gunner, sabot, tank, on my command, fire”).

But in regular spoken and written communications, English speakers use both definite and indefinite articles.

Subject

English sentences always mention the subject.

English, unlike Spanish and many Slavic languages, is not a null subject language. With very few exceptions, you can never leave out the subject in a sentence — even when it would otherwise be clear from context.

For example, in the English sentence “I am hungry” the “I” is redundant, because the form “am” uniquely identifies the verb as being first-person singular. Still, the sentence cannot be shortened to “am hungry,” because that would simply make it incorrect.

Lowercase “you”

Unlike in German and Polish, English speakers never capitalize you in the middle of a sentence, even when using a polite form of address. “Thank You for Your help” would look odd and antiquated to them.

Now, let’s move on to clarity of structure.

Five paragraphs that matter

Sometimes there are pretty complex things that you want to convey to a group of people. This is where a lot of people would be inclined to call a meeting, but in reality there’s a much better way to do that: in writing. And doing it in writing will help you do it in a much clearer, more concise, and more efficient fashion.

Whenever you need to thoroughly brief a group of people on an important matter, in writing, consider using a 5-paragraph format.

What follows is a format that is being used by many armed forces; in NATO parlance it’s called the 5-paragraph field order.

Situation
Mission
Execution
Logistics
Command and Signal

Now I’m generally not a fan of applying military thinking to civilian life, but in this case it’s actually something that can very much be applied to professional communications, with some rather minor modifications:

Situation
Objective
Plan
Logistics
Communications

Let’s break these down in a little detail:

Situation is about what position we’re in, and why we set out to do what we want to do. You can break this down into sub-points, like the customer’s situation, the situation of your own company, any extra help that is available, and the current market.
Objective is what we want to achieve.
Plan is how we want to achieve it.
Logistics is about what budget and resources are available, and how they are used.
Communications is about how you’ll be coordinating among yourselves and with others in order to achieve your goal.

What if that’s too formal?

Sometimes these headings may look overly formal. People who are not used to that format might actually find it off-putting. You can totally use more colloquial headings, to make your communication less formal. For example:

Why am I contacting you?
What do we want to achieve?
How are we going to do that?
What will we need?
How will we communicate?

You’ll quickly notice that these map precisely to the concepts of situation, objective, plan, logistics, and communications. But they sound much more casual and informal and approachable.

Updates

Sometimes you need to convey updates to your plan. Then, it’s often not necessary to redo the whole 5 paragraphs. Instead, you just leave out the bits that are unchanged, compared to your previous plan.

However, it’s always a good idea to include the following 3 paragraphs:

Current Situation
Current Objective
Updated Plan

And the reason for this is easy to explain. There are really only two reasons why you would update your plan: either because the situation is different (the circumstances have changed), or the objective has been modified. And people should know which of the two it is.

Meeting notes

It’s a very common misconception for people to think that meeting notes are for the people that were in the meeting, so they can remember what was said.

News flash: if you have people in the meeting that can’t remember what was said, maybe they didn’t actually participate and shouldn’t have been there in the first place?

So, who do we write meeting notes for? For the people that weren’t in the meeting.

Every meeting needs notes and a summary, and you need to circulate these notes not only with everyone who attended the meeting, but with everyone who has a need-to-know.

And the way you write them is so a person who wasn’t there, or may not have known of the meeting, or perhaps even wasn’t part of the organization at the time of the meeting, can understand what was said, what was discussed, and what was decided.

There is one person that I can guarantee is never in a meeting you are attending today, and that is you, but six months from now. Looking back at the notes of a meeting that you participated in, six months after the fact, is a terrible experience if the meeting notes are not good. Writing good meeting notes today means making your life, six months from now, easier.

It follows logically, from the requirement that people who were never in the meeting must be able to read the meeting notes and understand everything that they need to know about the meeting, that only the written word counts. What’s not on the written record in a meeting did not happen.

Who’s responsible for meeting notes? The meeting chair (the person that called the meeting).

Who pulls the meeting notes together? The scribe (the appointed note-taker).

Who writes the meeting notes? Everyone.

Meeting notes structure

In terms of what you want to include in your meeting notes, here’s a reasonably useful structure.

Meeting title
Date, time, attendees
Summary
Discussion points (tabular)
Action items

Section 4 would be the longest, and should indeed list everything that was discussed in the meeting. It is usually practical to organise the individual points of discussion in a table. If you keep your action items in issue tracker references, you may also include those in the table together with the corresponding items you discussed. At that point, there is no need for a separate section 5.

And now… the No.1 Awesome Writing Superpower

Here it comes, the No.1 superpower that will improve all your writing, every time. Are you ready?

Read things out loud.

For real. I am dead serious about this one. It’s pretty safe to say that nothing will improve your writing as much as getting into the habit of reading things out loud as you edit.

There are a number of reasons for this:

You can read silently at about 250 words per minute, but you can speak at only about 150. Reading things out loud deliberately slows down your reading, and gives you a better chance to spot your own errors.
Forcing yourself to recite what you wrote will make it easy to detect run-on sentences, confusing phrases, and errors in flow. You will naturally want to correct these, and it will make your writing better.
Making time to read what you write simply means your mind will stay on topic longer than when you just bash something out and immediately send or publish it. This additional time gives you the opportunity to spot more mistakes, detect errors in your thinking, and have better ideas.

Do not squander all those opportunities. Read things out loud.

Guess what I did with this article.

Frederic Hemberger has pointed out that there is one exception to the “no whitespace before punctuation at the end of a sentence” rule, which is that whitespace does precede a trailing ellipsis (…). That is entirely correct; however, I’d argue that an ellipsis at the end of a sentence is almost always bad form in professional communications. When you would use it for an open-ended enumeration, it’s clearer to instead write “and so on”, “and others”, or “etc.”; and if you want to illustrate trailing off in thought, finish that thought! ↩

tag:xahteiwi.eu,2022-09-18:/resources/presentations/devopsdays-berlin-2022/

Jammy, don’t snap at me!

Florian Haas Aug 19, 2022 Updated Aug 19, 2022

The current Ubuntu LTS release, 22.04 “Jammy Jellyfish”, tries to force a snap-installed Mozilla Firefox on you. I’m not a fan of that approach.

The current Ubuntu LTS release, 22.04 “Jammy Jellyfish”, does not install a Debian package for Mozilla Firefox anymore. Instead, Ubuntu now delivers Firefox as a snap. I’m not particularly enthralled by that idea.

Every once in a while I look at the current state of snaps. And every time I look at them, I find that they don’t solve any problems I am having at the time, but do add some. The same, incidentally, happens to be true for Wayland, which is why I still use X.org. (I want to emphasize that the foregoing is true for me; your own experience may well differ, and that’s perfectly okay.) So I have kept my systems free of snapd, and I intend to keep them that way for the foreseeable future.

However, if you upgrade an existing Ubuntu Focal or Impish system to Jammy in-place, with the customary apt dist-upgrade command, Ubuntu replaces the pre-existing Debian (.deb) package with a snap. That is to say, firefox in Ubuntu Jammy is a transitional package that would install snapd as a dependency, and then run snap install firefox. Mid-upgrade, it does pause and prompt you about this fact — but there’s no yes or no that would give you the option to bail, only an “OK” button.

What you thus want to do if you’re wired like me, prior to commencing your upgrade, is tell Ubuntu that you want to keep installing Firefox from a package. And while you’re at it, you might also politely inform your package manager that you have no desire to use snaps, at all.

To do so, first become root, and change the focal or impish references in /etc/apt/sources.list and in the files under /etc/apt/sources.list.d to jammy, as you normally would.

Then, make sure that you don’t have the snapd package installed:

# dpkg -l snapd
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
un  snapd          <none>       <none>       (no description available)

Next, mark the snapd package with hold, so that the current state of the package (un) is made permanent:

apt-mark hold snapd

Now, add the mozillateam PPA:

add-apt-repository ppa:mozillateam/ppa

Next, create a file named /etc/apt/preferences.d/mozilla-firefox, containing the following three lines:

Package: *
Pin: release o=LP-PPA-mozillateam
Pin-Priority: 1001

At this stage, your system should be set up to (a) not install the snap daemon, and (b) conduct the upgrade of the firefox package using the regular Debian package as it appears in the PPA, not the distro package that is a wrapper around snap install.

Now, proceed with:

apt dist-upgrade

Happy jamming!

tag:xahteiwi.eu,2022-08-19:/resources/hints-and-kinks/jammy-dont-snap/

Something interesting happened at work the other week...

Florian Haas Jul 16, 2022 Updated Jul 16, 2022

My company’s CEO recently put out a well-written email update instead of calling an all-hands meeting. It was excellent.

I am sure that many of you have been summoned to your fair share of quarterly, end-of-half-year, or end-of-year updates which were held by way of a company meeting. You probably didn’t like them too much, maybe even dreaded the thought of 45 minutes (or longer! 😱) of Death By PowerPoint. But perhaps you didn’t mind turning up because hey, at least it was free coffee and donuts.

When the pandemic rolled around, many organizations turned that quarterly all-hands summons into a quarterly all-hands virtual summons, in the shape of a video call. Now you had to supply your own self-funded donuts, and probably didn’t like the idea of staring into a camera for 45 minutes straight (or longer! 😖🔨). Worse still, I hear that some of you might have been subjected to virtual pre-recorded events, where you could watch an executive pontificate from a script, with no interactivity whatsoever. Something, I imagine, that is approximately as engaging as watching a wannabe investment advisor on a YouTube channel with 6 subscribers, 4 of them relatives.

Last week, our CEO Jim (who, to his credit, previously ran these events in a reasonably engaging fashion) did something different. At the end of our second quarter, and a few days before going on annual leave, he sat down and wrote an email. As in, he actually sat down and wrote an email. It had structure, it had a format, it had clear messaging, it was to the point.

What was in the message?

I’m obviously not going to go into the detailed content of the email, but here’s what stood out:

The message started with an intro that clearly settled the reader’s expectations: this is what I’m writing about; this is what you’ll be reading.
It clearly delineated that some of the items discussed were going to be somewhat negative (as in, worthy of improvement); many others were positive. It also clearly established that the negatives were coming first, and the positives thereafter.1
It used paragraphs, which were clearly separated by topic. As you read, you knew exactly where one thought concluded and the next one started.
It conveyed an obviously personal viewpoint: Jim was giving his own perspective on things, rather than writing like a detached omniscient narrator.
It was very evident that Jim hadn’t just bashed out the message and sent it off in a hurry, but that some careful re-reading and editing went into it. I have no idea if he did this by himself or asked someone else to go over it with him, but that does not matter: what matters is that he edited, not whether he was his own editor.2

What else is good about this?

Best of all, the whole thing was a remarkable exercise in efficiency. Jim put his thoughts into precisely 1,500 words.3 As I’ve pointed out elsewhere, information you can express in 1,500 concisely written words is at the top end of what you can convey in a 60-minute verbal meeting — but when put in writing, it takes a fluent English speaker only about 6 minutes to read.

How does this compare to the conventional meeting-based approach?

Had Jim prepared slides and a speech for an all-hands meeting, it would have taken him at least two hours, plus the hour of conducting the meeting. He probably spent the same total of three hours writing out, rereading, and editing his message. So the total effort he had to spend on his email was about the same he would have needed to spend on the preparation and the conduct of a meeting.
For everyone else, that is the other 98% of the company, reading the email took one-tenth of the time that participating in a meeting conveying the same information would have taken. Jim gave back 90% of the productivity that would have been spent in a meeting, to 98% of the company.
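The arithmetic behind that comparison is simple enough to spell out. The following is a small sketch in Python; the reading pace of 250 words per minute is an assumption, chosen because it reproduces the 6-minute figure mentioned above, and the function name is made up for this illustration.

```python
def reading_minutes(words, words_per_minute=250):
    """Minutes needed to read `words` silently at the given pace."""
    return words / words_per_minute


email_words = 1500      # length of the email
meeting_minutes = 60    # length of the meeting it replaces

email_minutes = reading_minutes(email_words)
time_saved = 1 - email_minutes / meeting_minutes

print(f"{email_minutes:.0f} minutes to read, "
      f"{time_saved:.0%} of the meeting time saved")
```

Plug in your own word count and meeting length to see how the numbers shift for a longer message or a shorter meeting.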

Makes sense, doesn’t it?

Yes, it eminently does. So, next time you consider summoning your whole company to an all-hands meeting or video call, try to be a little more like Jim.

This is an excellent move on two counts: first, readers naturally perceive what comes last as being emphasised. By getting the negatives out of the way first, the positives stick in people’s minds more strongly. Second, it establishes that once you get to the positives, there is no further downer coming to sucker-punch you. ↩
A trusted editor is a gift from god (if you believe in such things). I can highly recommend asking someone you enjoy working with to lend you an extra pair of eyes to go over what you write. I was mutual editors with Elena Lindqvist for two years; it was glorious. ↩
I am told this exact round-number precision was coincidence. ↩

tag:xahteiwi.eu,2022-07-16:/blog/2022/07/16/something-interesting/

Python package dependency checking in a CI pipeline with pipdeptree

Florian Haas Jun 26, 2022 Updated Jun 26, 2022

Sometimes pip behaves rather oddly when it comes to package dependency resolution. Here’s one way to catch such issues in your CI pipeline.

Recently at work we ran into rather strange-looking errors that broke some functionality we depend on.

In an application run from a CI-built container image, we were seeing pkg_resources.ContextualVersionConflict errors indicating that one of our packages could not find a matching installed version of protobuf. Specifically, that package wanted protobuf<4 installed, but the installed version of the protobuf package was 4.21.1.

This was somewhat puzzling: all Python packages in the image were installed with pip, and the packages’ requirements ought to have been in good shape.

We found another dependency that did specify protobuf<5, but taken together pip should surely resolve that into a 3.x version of protobuf, in order to satisfy both the protobuf<4 requirement from one package, and the protobuf<5 one from another?

To visualize and test such dependencies, the pipdeptree utility comes in quite handy.

So, I hacked up a couple of minimal tox testenvs:

[testenv:pipdeptree]
deps =
    pipdeptree
commands = pipdeptree -w fail

[testenv:pipdeptree-requirements]
deps =
    -rrequirements.txt
    pipdeptree
commands = pipdeptree -w fail

The first one, pipdeptree, merely installs the package being built, obeying the install_requires list in its setup.py file. This is the “minimal” installation.

The second one, pipdeptree-requirements, runs a full installation, pulling in everything needed from the requirements.txt file.

pipdeptree generates warnings on potential version conflicts between dependent packages. So, in both testenvs, we run pipdeptree in -w fail mode, which turns all warnings into errors that fail the testenv.

So now, having added tox to both our CI and our local Git hooks, we can run these checks locally and from GitHub Actions, and they should both fail and thereby expose our package dependency bug, right?

Well, here is where it got weird.

Because if I ran that locally, on my Ubuntu Focal development laptop, I got:

        - protobuf [required: >=3.15.0,<4.0.0dev, installed: 4.21.1]
      - protobuf [required: >=3.15.0,<5.0.0dev, installed: 4.21.1]

This is “bad” in the sense that it’s the wrong protobuf version, but good in that it exposes the bug that we’re trying to fix. Progress!

However, running the same thing from our GitHub Actions workflow, there’s this:

          - protobuf [required: >=3.15.0,<4.0.0dev, installed: 3.20.1]
        - protobuf [required: >=3.15.0,<5.0.0dev, installed: 3.20.1]

So here, in GitHub Actions, we see a protobuf version being installed that doesn’t break anything, but it also means that our test doesn’t expose our bug, which is a problem!

I’ll spare you the details of finding this out, but it turned out that this is actually a pip problem. pip 20.0.2 (which is the version you get when you run apt install python3-pip on Ubuntu Focal) has the dependency resolution error, which results in a protobuf package that is “too new”. If you install with pip version 21 or later, you get a protobuf that is “old enough” to make all installed packages happy.
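To make the conflict itself concrete, here is a minimal sketch in Python that checks the two installed versions from the output above against both requirement ranges. This is purely an illustration and has nothing to do with how pip resolves dependencies internally: the helper names are made up, the version parsing only handles plain dotted numbers, and the `dev` suffix on the upper bounds is dropped for simplicity.

```python
def parse(version):
    """Turn '4.21.1' into (4, 21, 1). Not a full PEP 440 parser."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper):
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# The two protobuf requirements reported by pipdeptree:
# >=3.15.0,<4 from one package and >=3.15.0,<5 from another.
REQUIREMENTS = [("3.15.0", "4.0.0"), ("3.15.0", "5.0.0")]


def satisfies_all(version):
    """True if the version fits every requirement range."""
    return all(satisfies(version, lower, upper) for lower, upper in REQUIREMENTS)


if __name__ == "__main__":
    for candidate in ("4.21.1", "3.20.1"):
        print(candidate, satisfies_all(candidate))
```

Only a 3.x release can satisfy both ranges at once. In real code you would more likely reach for the `packaging` library and its `SpecifierSet` class, which understands suffixes like `dev` properly.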

So, how do we test that?

There is a package called tox-pip-version that comes in very handy here, in that it allows you to set an environment variable, TOX_PIP_VERSION, instructing tox what pip version it should use in order to install packages into testenvs.

This you can use from a GitHub Actions jobs definition, making use of a matrix strategy:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version:
          - 3.8
          - 3.9
        pip-version:
          - 20.0.2
          - 22.0.4

    steps:
    - name: Check out code
      uses: actions/checkout@v1
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        pip install tox tox-gh-actions tox-pip-version
    - env:
        TOX_PIP_VERSION: ${{ matrix.pip-version }}
      name: Test with tox (pip ${{ matrix.pip-version }})
      run: tox

What this does is it sets up a 2×2 matrix: run with Python 3.8 and Python 3.9, and for both those Python versions run with pip 20.0.2 and 22.0.4 (these happen to be the two versions that we’re interested in).
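If the matrix expansion feels abstract, it is just a Cartesian product. The sketch below enumerates the same combinations in Python; it only illustrates the concept and is not how GitHub Actions is implemented. The function name `matrix_jobs` is made up for this example.

```python
from itertools import product

python_versions = ["3.8", "3.9"]
pip_versions = ["20.0.2", "22.0.4"]


def matrix_jobs(pythons, pips):
    """Return one dict per combination, like one job per matrix cell."""
    return [
        {"python-version": py, "pip-version": pip}
        for py, pip in product(pythons, pips)
    ]


jobs = matrix_jobs(python_versions, pip_versions)

if __name__ == "__main__":
    for job in jobs:
        print(job)
```

Adding a third Python version would grow this to six jobs, which is worth keeping in mind before extending either list.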

That way, we were able to expose the package dependency bug, and then fix it. The test now serves as a regression test, to make sure we don’t run into a similar issue again.

If you’re curious, the full PR discussion with additional context is on GitHub.

tag:xahteiwi.eu,2022-06-26:/resources/hints-and-kinks/pipdeptree-ci/

Writing Professionally (C!Conf 2022)

Florian Haas May 30, 2022 Updated May 30, 2022

I did a workshop on writing in a professional context at my employer’s annual unconference, C!Conf.

At work we have an annual1 conference, C!Conf, where we pull the whole company together for 3 days of talks and workshops. All conference sessions are conducted by employees — we don’t bring in any external consultants or (deity forbid) “motivational speakers”, so it tends to be a very productive affair.

This year I ran a 45-minute session on writing in a professional context. There is no recording, but the whole talk is available with my full speaker notes at https://fghaas.github.io/writing-professionally. It is available under a CC BY-SA license, as usual.

For obvious reasons, we didn’t have this event in 2020 and 2021, so this was actually the first we did in 3 years. But the intention is for it to be an annual occurrence. ↩

tag:xahteiwi.eu,2022-05-30:/resources/presentations/c-conf2022/

Batch-processing stereograms with StereoscoPy

Florian Haas May 15, 2022 Updated May 15, 2022

I often need to process stereograms in bulk. A Python tool named StereoscoPy is very handy in doing that.

My camera with the Loreo 40mm stereoscopic lens attached (cross-view stereo image)

I have two methods of taking stereoscopic images, both of which I use regularly:

The “left foot, right foot” method, which I can do with any camera, including the one on my smartphone (I covered this method at length in my 2021 FrOSCon talk). This is the method I prefer when doing stereograms of landscapes, buildings, statues and suchlike, and it also works very well for posed stereo portraits.
My Loreo stereoscopic lens, shown in the picture above attached to my camera.

Either way, I need to post-process my images to get cross-view stereograms like the one you’re seeing here.

In the former case the need is obvious: I start with two images and need to make them into one stereogram.

In the latter case, it’s perhaps less so: my stereo lens obviously already produces a stereogram, but it’s a wall-eyed one (which I’m not particularly good at viewing), and it has an area in the centre of the frame where the two images slightly overlap. I have found this area to be about 6% of the total width of the image. So that means that what I need to do, starting with the original stereo image, is this:

Split the original image into two halves.
Cut off 3% on the left and right of each image — on one side, that crop removes the overlap; on the other, it restores symmetry.
Swap the sides of the image: the originally left side goes right, the right side goes left.
For easier viewing, add a divider, and a border.
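The geometry of the first three steps can be sketched in a few lines of Python. This is only the arithmetic, under two assumptions that are mine: the 3% is measured against the width of each half rather than the full frame, and boxes use the common (left, upper, right, lower) pixel convention. The function name is hypothetical, and the code does not touch any image data.

```python
def cross_view_boxes(width, height, crop_fraction=0.03):
    """Return crop boxes (new_left, new_right) for a cross-view stereogram.

    Each box is (left, upper, right, lower) in pixels of the original frame.
    """
    half = width // 2
    margin = round(half * crop_fraction)

    # Step 1 and 2: split into halves, trim the margin off both edges of each
    original_left = (margin, 0, half - margin, height)
    original_right = (half + margin, 0, 2 * half - margin, height)

    # Step 3: swap sides, so the original right half ends up on the left
    return original_right, original_left


if __name__ == "__main__":
    new_left, new_right = cross_view_boxes(6000, 4000)
    print("left panel from:", new_left)
    print("right panel from:", new_right)
```

For a 6000×4000 frame this trims 90 pixels off each edge of each half, leaving two panels of 2820 pixels width.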

What comes in very handy here is a neat little tool: StereoscoPy is a small Python library and CLI that is helpful in batch-processing stereo images.

In combination with convert from ImageMagick, this enables me to batch-process a whole folder of stereo images into something that is much more suitable for general consumption than the original images that the Loreo lens produces.

#!/bin/bash

# Set the border/divider width, in pixels
BORDER=40

# Make sure the output directory exists
mkdir -p stereo

for f in *.JPG; do
    # Grab the file name, sans extension
    name="${f%.JPG}"

    # If the cross-view stereogram already exists,
    # skip to the next original image
    if [ -e "stereo/${name}-cross.jpg" ]; then
        continue
    fi

    # Split the wall-eyed stereogram foo.JPG
    # into foo-0.jpg (left) and foo-1.jpg (right)
    convert "$f" -crop 50%x100% "stereo/${name}.jpg"

    # Create a cross-view image that crops 3% off
    # each side of both halves, auto-aligns them,
    # and adds a border and divider
    StereoscoPy -x -A \
      --div "$BORDER" --border "$BORDER" \
      -C 3% 0 3% 0 \
      "stereo/${name}-0.jpg" "stereo/${name}-1.jpg" \
      "stereo/${name}-cross.jpg"

    # Remove the intermediate images
    rm "stereo/${name}-0.jpg" "stereo/${name}-1.jpg"
done

Maybe I’ll eventually get round to submitting a patch to StereoscoPy itself, so that the pre-processing step with convert is no longer necessary and the little script above becomes an actual one-liner. But for now this works okay for me.

tag:xahteiwi.eu,2022-05-15:/resources/hints-and-kinks/batch-process-stereo/

My experiment: an interim update

Florian Haas May 14, 2022 Updated May 14, 2022

An update on my small RSS/Mastodon/Twitter social media experiment

My social media experiment is still ongoing. A week without posting articles on social media, while asking people to subscribe to my RSS feed, has yielded

practically zero engagement on Twitter (except for the Drizzle article, which was reviewed by other people and then shared by my reviewers),
very much non-zero engagement on Mastodon (where some of my followers have apparently been subscribers to my RSS feed for quite a while).

This doesn’t surprise me much: evidently, the habit of subscribing to feed sources via RSS comes more naturally to citizens of the Fediverse.

One nice upside of the experiment is that I have been enjoying content written by others more. My aggregation of about 20 feeds, followed via Aggregator, gives me about one or two new-post notifications per day, and most of those I find truly enjoyable and insightful.

I will now progress to phase II of my experiment, which is to throw new posts out via RSS and Mastodon, but not on Twitter.

tag:xahteiwi.eu,2022-05-14:/blog/2022/05/14/experiment-update/

Running a solar-powered laptop

Florian Haas May 14, 2022 Updated May 15, 2022

I’m a happy Pinebook Pro user, and I frequently use it on solar power.

Pinebook Pro laptop sitting on a table, outside, connected to a solar charging panel

This piece of kit has been a conversation starter everywhere I take it out, so I figured it could use a short writeup.

In 2020 I purchased a Pinebook Pro laptop. I had wanted a low-power ARM laptop for a while, the PBP came in a tolerable size (this is a 14” screen; about the top end of acceptable screen sizes for me), and it was an absolute steal. Including shipping and import duty — my device shipped from Hong Kong — I got mine for €277 all told.1

Now if you haven’t heard of the Pinebook Pro, or for that matter of the PINE64 community, you should check out their web site. They make a bunch of really neat devices, though I can only speak to the Pinebook Pro as that’s the only one of their devices I’ve ever owned.

Obviously, the device’s claim to fame is its low power profile. Thus it should come as no surprise that its charging input voltage is a USB-typical 5V, like you know from your phone.

Now the PBP comes with a separate barrel-plug charging port, but most of the time I just charge it via its USB-C port. This I do primarily for convenience; it’s simply one fewer piece of kit to carry around. I can thus charge the PBP with a standard wall-socket USB charger, a USB power bank, or any other USB power source.2

Which is where the solar panel comes into play. Mine is a 28W charger from BigBlue. Now, please don’t mistake me for an authority on solar panels; there may be better or more efficient ones on the market — I just found this one useful and compact enough for my liking. Nominally, this panel’s maximum amperage is 4.8A, but I’ve never seen it actually generate that. Under optimal conditions where I live (at 48°N latitude), that is direct sunlight around solar noon on a cloudless day, I can get just under 3A out of the panel in total. Out of this, the maximum output of a single port is 2.4A, so that’s my maximum solar charge current for the PBP.
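To translate those currents into power, multiply by the voltage. The sketch below does that in Python, assuming the panel's USB ports deliver the USB-typical 5V mentioned earlier; actual voltage under load may well differ, and "just under 3A" is rounded to 3.0 here for simplicity. The function name is made up for this example.

```python
USB_VOLTAGE = 5.0  # volts; assumed nominal USB output


def watts(amps, volts=USB_VOLTAGE):
    """Electrical power in watts for a given current and voltage."""
    return amps * volts


single_port_max = watts(2.4)   # maximum of one port
panel_observed = watts(3.0)    # best total seen in practice (rounded up)
panel_nominal = watts(4.8)     # nominal maximum amperage of the panel

if __name__ == "__main__":
    print(f"single port: {single_port_max:.1f} W")
    print(f"observed total: {panel_observed:.1f} W")
    print(f"nominal total: {panel_nominal:.1f} W")
```

So under these assumptions, the laptop gets at most about 12W from the panel, which is the figure to compare against its power draw.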

Overall, for the PBP’s power consumption this is generally perfectly fine. I can work under a sunny or partly cloudy sky for the whole day if I want to.

I’ve also found the display contrast to be sufficient even in full sunlight. I do use a light GNOME theme for my desktop settings, but I don’t need to enable the high-contrast accessibility features. It’s not advisable to work with the whole laptop exposed to full sunlight, though, as the black device body does absorb a fair bit of heat. If you’re sitting outside with a light breeze going, that mitigates this problem.

Of course, sitting in the shade with just the panel exposed to the sun is the most preferable setup overall.

In terms of software running on the device, I never particularly warmed to the idea of running Manjaro (which the PBP ships with by default), so I run armbian with Ubuntu. I’m not a big fan of Cinnamon or XFCE either, but that’s no big issue: I just started with the Ubuntu Focal XFCE image, and then installed the vanilla-gnome-desktop metapackage and subsequently removed xfce4*.

Overall the Ubuntu aarch64 port works very well on this device with the armbian Linux kernel (currently, that’s 5.9.14), with a couple of small caveats:

Suspend support is essentially limited to suspend-to-idle. I’d really love to have suspend-to-disk support on this device (ideally in combination with encrypted swap, which by itself works fine), but neither that nor suspend-to-ram are currently reliable. Even suspend-to-idle is sometimes unreliable and requires that I restart gdm after resuming.
Some packages just behave oddly, or don’t function at all. For example, ykcs11 just won’t accept my PIN when I try to hook my Yubikey up with ssh-agent.
Most PPAs don’t build with aarch64 support. Thus, if you like to run Ubuntu with a bunch of packages that are not in Ubuntu proper, you might have a hard time with the PBP.
The PBP’s SoC maxes out at 4GiB RAM, which means you shouldn’t be using the PBP for video editing or gaming or any other RAM-intensive activities. Even the GIMP runs out of steam pretty quickly at about 3 or 4 concurrently opened images.3

So can I use this as my daily driver? Yes, with some minor drawbacks. But those I can work around fairly well.

If the PBP becomes available for order in Europe via pine64.eu, then — if you are an EU resident — shipping should be faster and you wouldn’t need to pay import duty. At the time of writing, however, the PBP can only be purchased from the main pine64.com store. ↩
The device cannot charge over the barrel port and USB-C simultaneously. ↩
Note that I can get cloud computing capacity for cheap at work, so if I need more RAM for something I can get it in a pinch — I am aware that that option is not available to everyone. ↩

tag:xahteiwi.eu,2022-05-14:/resources/hints-and-kinks/solar-powered-laptop/

Drizzle: the most influential software project you’ve (probably) never heard of

Florian Haas May 10, 2022 Updated May 11, 2022

Drizzle aimed to rewrite the MySQL database server. It instead rewrote collaborative software development.

Drizzle was an open-source project1 that, for all intents and purposes, died in 2016. Its project web site is now defunct, and the most recent snapshot from the Wayback Machine is that of September 2, 2016. In July of that year, Stewart Smith (one of the project’s core team) announced on the project mailing list that neither he nor any other core team members had time to dedicate to Drizzle anymore.

The project had been founded in 2008, and from 2012 onward it had been mostly dormant.2 So it was properly “active” for just 4 years, and then in limbo for 4 more before finally wrapping up. Chances are, you’ve probably never run a Drizzle database server in production, and quite possibly never spun one up for any purpose either.

And yet, if you’re an open source software developer, you’re probably using something, every single day, that came out of Drizzle. And that something isn’t even software.

Drizzle’s history, a very brief summary

Drizzle started as an attempt to refactor MySQL, and was originally driven by Brian Aker, together with a small team of engineers at Sun (which had then-recently acquired MySQL), in the first half of 2008. A skunk works project that flew under the radar — to put it charitably — at Sun, Drizzle was publicly announced at O’Reilly OSCON of that year. There are a couple of videos floating around from that event (from the keynotes, and from a booth presentation) that are both… well, go and see for yourself. The aforementioned Stewart Smith did a very entertaining talk at linux.conf.au some five years later that covers those events, which you can watch from the official Linux Australia mirror, or from a YouTube upload.

There’s also an interesting old blog post from MySQL co-founder Monty Widenius, written in late July of 2008, which outlines the state of affairs at the time.

Of course, in 2010 Oracle acquired Sun (and with it, the MySQL database) — and Oracle was presumably less than keen on having an in-house fork of the database technology it had just acquired. Thus, the Drizzle engineers found a new home at Rackspace, with the goal of getting Drizzle to a production-ready release. That sort of happened, and the Drizzle package even got into Debian, but after the Drizzle 7.1 release in 2012, adoption did not exactly skyrocket. Development on Drizzle stagnated and eventually petered out. The 7.2 release branch never made it out of the alpha stage.

Today, to the best of my knowledge, you can’t install a Drizzle package on any contemporary operating system. There is no official Drizzle container image on Docker Hub, no DBaaS offering based on Drizzle, nothing.

But Drizzle left a very important legacy.

What did Drizzle do differently?

In 2008, it was already common for open source software to live in public version-controlled repositories. But far from all of them used Git, like the vast majority do today: some used CVS or Subversion, some used Mercurial, and the Launchpad platform (which Drizzle lived on) used Bazaar.

But most of them did have one thing in common, which is how changes landed in the tree. You had a small group of “core committers”, who had write access to the “official” code repository. They could (and would) push changes to the codebase of their own volition and on their own authority. In smaller projects, the core committers “group” might be just one person. If someone outside the core committers group wanted to make a contribution, they had to convince a core committer to merge it.

Sometimes (though quite rarely at the time), projects had some form of scripted unit testing — typically implemented with the then-popular Hudson server, which was subsequently forked to become Jenkins. But such unit tests would be seen as merely advisory: breaking unit tests didn’t necessarily mean that a patch couldn’t land, specifically if the patch originated with a core committer. Unit tests would also not necessarily run automatically when a patch was submitted; they might instead run only if specifically kicked off by a core committer.

The Drizzle team, as Brian put it in a talk I recall attending (though not exactly when and where), “took commit rights away from everybody.” That meant that nobody could push changes directly to a central repository, and everything had to flow through CI tests. The process generally went like this:

You submitted a patch to Drizzle, implementing a new feature. Immediately after your submission, an automated process (in Hudson, later Jenkins) would run its complete suite of unit tests against the current code base, with your patch applied.
Your patch would perhaps break an existing regression test. You would immediately be notified of the failure, giving you a chance to fix the problem that your change introduced.
You submitted a new version of the patch, which would now pass the test suite.
Humans would now review your patch. They would no longer have to worry that your patch broke anything pre-existing (a common question in patch reviews in many contemporary projects), and could instead focus on the merit of your feature addition.
If your reviewers determined that your new feature should come with additional tests (and they usually should), they would recommend you implement a test for your new feature.
You would then resubmit your patch with the added testing functionality, and — assuming everyone was happy with the implementation — your reviewers would give the go-ahead to merge your patch.
At this stage of course, the rest of the codebase might have changed: some other patches might have landed before yours. So, the entire pipeline — including tests that predated your patch, the new tests your patch introduced, and the new tests that other patches might have added in the interim — would re-run with the current state of the codebase with your patch applied. If your patch broke things now, you would be asked to fix them once more.
However, if your change didn’t break anything even now, then there would be no human blocking the merge anymore: as soon as the tests passed, the thing that ran the tests (I don’t recall if in 2008 we already had the term “CI pipeline” for that thing) would merge the patch on your behalf.
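The steps above can be sketched in a few lines of Python. This is a toy model, not Drizzle’s actual tooling (which ran in Hudson and later Jenkins): the names Patch, Gate, run_checks, and try_merge are hypothetical, the codebase is modelled as a plain list of patch names, and a check is any function that takes that list and returns True or False.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Tree = List[str]                 # toy model: the codebase is a list of merged patch names
Check = Callable[[Tree], bool]   # a check inspects a tree and passes (True) or fails (False)


@dataclass
class Patch:
    name: str
    new_checks: List[Check] = field(default_factory=list)
    approved: bool = False       # flipped by human reviewers, never by the author


@dataclass
class Gate:
    trunk: Tree = field(default_factory=list)
    checks: List[Check] = field(default_factory=list)

    def run_checks(self, patch: Patch) -> bool:
        """Run all existing checks plus the patch's own against trunk + patch."""
        candidate = self.trunk + [patch.name]
        return all(check(candidate) for check in self.checks + patch.new_checks)

    def try_merge(self, patch: Patch) -> bool:
        """The only way into trunk: human approval AND green checks on current trunk."""
        if not (patch.approved and self.run_checks(patch)):
            return False
        self.trunk.append(patch.name)
        self.checks.extend(patch.new_checks)  # later patches must pass these too
        return True


gate = Gate(checks=[lambda tree: "eat-data" not in tree])
feature = Patch("new-feature", new_checks=[lambda tree: "new-feature" in tree], approved=True)
broken = Patch("eat-data", approved=True)

print(gate.try_merge(feature))  # True
print(gate.try_merge(broken))   # False
```

The point of the sketch is that there is no other code path that appends to trunk: nobody, core team included, can land a change without going through try_merge, and the checks always run against the tree as it is at merge time.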

Much of this automation was brand new innovation at the time, largely due to the work of Drizzle developer Monty Taylor — who later went on to become a highly influential engineer in other projects, which (among many other things) landed him a profile in WIRED in 2013.

The Drizzle team also was pretty diligent about what they considered “breaking things:” for example, the Drizzle test suite contained several performance benchmarks. If a patch made the server perform worse, i.e. introduced a performance regression, that would be treated the same as a functional regression. So you not only would be unable to land a patch that actually broke functionality or made the database server eat data; you would also be unable to land a patch that made the server slower.
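A performance gate of this kind might look roughly like the following. The function name and the 5% tolerance are assumptions made for illustration; how much slowdown, if any, Drizzle’s benchmarks actually tolerated is not something this sketch claims to know.

```python
def is_performance_regression(
    baseline_seconds: float,
    candidate_seconds: float,
    tolerance: float = 0.05,  # hypothetical allowance for benchmark noise
) -> bool:
    """True if the patched build is slower than the baseline beyond the tolerance."""
    return candidate_seconds > baseline_seconds * (1.0 + tolerance)


def gate_passes(functional_tests_ok: bool, baseline: float, candidate: float) -> bool:
    """A slowdown blocks the merge exactly like a failing functional test does."""
    return functional_tests_ok and not is_performance_regression(baseline, candidate)


print(gate_passes(True, baseline=10.0, candidate=10.2))  # within tolerance
print(gate_passes(True, baseline=10.0, candidate=12.0))  # slower: blocked
```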

The Drizzle team is also where, to the best of my knowledge, a coinage for this kind of approach originated: “gated commits”, or “gating” in general.

How is this relevant?

A substantial fraction of the Drizzle core team — which had moved to Rackspace in 2010 — was instrumental in launching another project that came out of that company (and NASA) that same year: OpenStack. And OpenStack took the gating approach from its humble beginnings with Drizzle to an absolutely massive scale in its hype years (2011 – 2015 or thereabouts) — so much so that it established a new default in collaborative software projects. Many other projects that launched in that timeframe (including Kubernetes and Terraform) adopted this approach as well.

Today, having automated CI testing on every submitted patch is considered par for the course in a collaborative software project. GitLab CI and GitHub Actions workflows have made these much more accessible than they used to be with Hudson and Jenkins. It’s also exceedingly common to do detailed collaborative reviews in a public forum before merging — GitHub’s PR review workflow is ever more closely approaching the Gerrit review workflow that OpenStack uses. GitHub’s auto-merge functionality (which lands patches automatically once they have passed both automated unit tests and human review) is more or less a direct copy of the automated merge found in OpenStack’s development workflow, which itself can be directly traced back to Drizzle’s review process.

And all these things are found in open source software projects across all sorts of communities. Kubernetes, Terraform, Django, CPython, Open edX — you name it, it probably uses an approach first pioneered in Drizzle.

And that’s the real lasting legacy of a project that few people even remember by name.

Who do we owe this to?

I know some of the Drizzle developers personally, though certainly not all. What follows is an incomplete list of people you can buy a meal or a drink if you run into them, and you like the way you collaboratively develop software today:

Brian Aker
Mark Atwood
Aeva Black
Patrick Crews
Eric Day
Patrick Galbraith
Andrew Hutchings
Jay Pipes
David Shrewsbury
Stewart Smith
Pádraig O’Sullivan
Monty Taylor

Acknowledgements

Stewart Smith and Mark Atwood kindly reviewed this article and provided valuable feedback on it. Thanks to both of you! All errors and omissions are of course mine, and mine alone.

Also, though I’ve been meaning to write something like this post for a while, it was ultimately a Mastodon thread by Julia Ferraioli that became my writing prompt. Thanks for that, too!

Disclaimer: I was never a part of the Drizzle project in any role, which for the purposes of this article is probably a good thing: I am not talking about personal accomplishments or failures. In other words, I have no skin in the game. This article also does not contain any information about the Drizzle project except that which was available via public channels at the time, or has become public since. ↩
The project did participate in Google Summer of Code in 2013, which is what the last tweets on the project’s Twitter account are about. But the project’s development branch had its last alpha release in September 2012. ↩

tag:xahteiwi.eu,2022-05-10:/blog/2022/05/10/drizzle/

Sweet & savoury stir fry

Florian Haas May 8, 2022 Updated May 8, 2022

Easy lunch. With meat, or vegan.

This is inspired by a dish that goes by “Mongolian beef” in parts of the U.S., but I opted for the generic title since it’s neither Mongolian, nor does it require beef. It works with any red meat, but you can also leave the meat out altogether, at which point this becomes a vegan dish.

I normally serve this with jasmine or basmati rice. I have yet to try it with udon, which I am guessing should work well too.

Ingredients

Amounts are for 4 servings.

Meat (optional):

About 400g of red meat (beef flank steak, or leg of lamb)
3 tbsp soy sauce
1 tbsp rice vinegar (or Shaoxing wine, for a finer flavor)
1 tbsp sesame oil
3-4 tbsp cornstarch or potato starch
4-5 tbsp peanut oil (for frying)

Vegetables:

2 tbsp peanut oil (for frying)
4 cloves garlic, finely chopped
1-2 red bell peppers, cut into 2cm squares
1 hot chili pepper, seeds removed (alternatively a jalapeño pepper, if you like it milder), chopped
180g (drained) bamboo shoots
5-6 scallions (spring onions), cut diagonally into 2-3cm pieces

Sauce:

2 tbsp granulated sugar
3 tbsp soy sauce
2 tbsp hoisin sauce
2 tbsp sesame seeds
1 tbsp cornstarch or potato starch
1 tbsp rice vinegar (optional)

Equipment

1 small bowl (for sauce)
1 medium-size bowl (for meat)
Cooking knife and board
Whisk
Large skillet or wok

Method

Prepare the sauce: whisk sugar, soy sauce, hoisin sauce, sesame seeds, and optionally rice vinegar together in a small bowl and set aside.
If you’re including the meat: prepare the marinade by whisking all liquid ingredients together in a medium bowl. Cut the meat into very thin strips and put them in the bowl. Marinate for at least 20 minutes.
Toss the meat in the cornstarch or potato starch.
Heat peanut oil in the skillet or wok on high heat. Fry the meat in batches until the starch turns golden brown, about 3 minutes or so per batch, keeping heat on high. Put fried meat aside in a bowl.
Vegetable fry: heat peanut oil in wok or skillet on high heat. Sauté garlic until fragrant but not brown. Add bell peppers, chili pepper, and bamboo shoots. Continue frying on high heat for about a minute.
Add the cut scallions, mix thoroughly. Cover skillet or wok with a lid, and cook vegetables for about 5 minutes in their own steam. Taste the scallions. If they’ve lost their onion punch and developed a slightly sweet taste but still retain some crunch, they’re perfect.
If going for the meat option, return the fried meat to the wok or skillet.
Pour the sauce prepared in step 1 over everything, mix thoroughly, and keep cooking for another 30-60 seconds until sauce thickens.
Serve.

Nutrition facts

No warranty of any kind on these. Values are per serving.

With meat:

Calories (kcal): 561
Total fat (g): 34.9
Saturated fat (g): 7.8
Total carbohydrates (g): 29.4
Sugars (g): 13.6
Protein (g): 32.8

Without meat:

Calories (kcal): 223
Total fat (g): 13.0
Saturated fat (g): 2.1
Total carbohydrates (g): 24.8
Sugars (g): 13.4
Protein (g): 4.2

tag:xahteiwi.eu,2022-05-08:/blog/2022/05/08/sweet-savoury-stir-fry/

Entropy, management, and xkcd 927

Florian Haas May 7, 2022 Updated May 7, 2022

As a manager, don’t try to negotiate with the laws of physics.

xkcd 927 is a modern internet classic that is frequently brought up in conversations to remind people that a proposal that they’re making will, while being intended to simplify things, actually make them more complicated.

Most people quote that strip to satirize or even ridicule the idea of introducing a “15th standard”, as if the natural order of things was simplification. Such people are frequently baffled by the amount of cruft and clutter that accumulates over time in an organization they work in, and some of them embark on a constant — perhaps career-long — quest of “streamlining,” “process optimization,” or “reducing technical debt.”

If you are one such person, please get ready for some bad news.

As far as we know, there are three fundamental theories that, combined, explain the universe as we know it: general relativity, quantum mechanics, and thermodynamics. Thermodynamics has a famous Second Law that can be stated in various ways — in one modern and simplified form, we say:

The total entropy of a system never decreases.

“Entropy,” in this context, is essentially the degree to which the system is disorderly. In effect, the Second Law states that any system can stay just as orderly as it is now, or it can become more disorderly, but it can never again become as orderly as it once was. The normal state of the world is that things keep getting more and more disorderly.

There are multiple classic examples of this: you can mix two paints in a bucket but cannot unmix them, you can open a container of gas in a vacuum chamber and the gas will disperse but never go back into the container, you can scramble and cook an egg but never return it to its original protein structure.

Sometimes the growth in entropy isn’t noticeable: you can of course pick up your cluttered desk and put everything neatly away in boxes or drawers (or the trash), and your office will look nice and clean and uncluttered thereafter. But, in the process you will have turned so much of your body’s energy into heat that the overall disorder in the “system” (consisting of the things in your office, the room, you, all the gas molecules in the air, and so forth) will have gone up quite considerably.

Now, I realize that not all laws of physics can be directly applied on a macro scale, that is, to organizations, families, or societies. For example, you’ll have to go through various mind-bends to imagine your life as a path through gazillions of Everettian many-worlds bifurcations. But I’d posit that the constant growth of entropy is indeed rather fundamental — after all, growth in entropy is one of our best definitions of the passage of time. Escaping the growth of entropy is literally just as impossible as stopping time.

What does that mean for each of us, individually? It means, bluntly speaking, that our lives get objectively and perpetually messier over time. I don’t know if you’re in a better or worse place than you were 10 or 20 years ago in your life, but I’m pretty sure that you’re in a more complicated place now. And many of us might probably want to go back to our less-complicated life from back then, but alas, backwards time travel (and hence entropy reduction, read: “a more orderly life”) is not an option.

Now as long as you’re just trying (and failing) to rewind disorder in your own life, and you live and work alone, that will probably not have a harmful effect on anyone. But it gets tricky when you’re applying the same thinking to living with a spouse, or in a family. Good luck trying to rewind your life with teenage offspring, for example, to the presumably simpler time when they were three-month-old babies that slept most of the day.

But let’s also talk about how this affects your work in a management position.

If you are a manager, it is your job to slow the growth of disorder in your part of the organization. You won’t be able to reduce disorder, and any attempt to do so pits you against a most fundamental law of physics. (Laws of physics are like terrorists: you shouldn’t attempt to negotiate with them.) However, many managers are exactly the opposite: they are entropy accelerators; they speed up the growth of disorder in the organization.

You can do better than that.1 Here are a few suggestions you can apply when dealing with your management peers, so you can act as your organization’s entropy decelerator.

Somebody wants to replace multiple existing things with one new thing (the original xkcd 927 scenario): the only circumstance under which you should agree to this is when you already know for certain that the existing things must go away, within a manageable timeframe. For example, the software solutions that your company has been buying from one vendor have had such a massive price hike that they now break the budget, or the legal ramifications of continuing to use them have become untenable. That’s when you have an option of possibly replacing two (or three) things with one. Under all other circumstances, you can hope to replace one thing with one other, at best.
Somebody wants to solve a communications issue by adding more channels to your company chat, more categories to your issue tracker, more whatever? That’s your cue to stop that dead in its tracks. Opening more lanes of communication never simplifies anything; it always makes things more complicated. Those new chat channels? Tit for tat. They want three new ones, so they must retire three. No, not two. Three.
Somebody wants to “open up team communications”, or “flatten the organization”, so that everyone’s complete graph has way more edges? That’s when you educate them about $\frac{n(n-1)}{2}$, the number of edges in a complete graph of $n$ nodes, and what quadratic growth means.
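To see what that growth looks like in practice, here is a small sketch that counts the pairwise communication paths for a few team sizes. The function name communication_paths and the team sizes are picked purely for illustration.

```python
def communication_paths(n: int) -> int:
    """Number of edges in a complete graph on n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


for team_size in (5, 10, 20, 40):
    print(f"{team_size:>3} people -> {communication_paths(team_size):>3} paths")
# 5 -> 10, 10 -> 45, 20 -> 190, 40 -> 780
```

Every time the head count doubles, the number of paths roughly quadruples.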

Do you notice how a lot of these involve saying “no” to someone, and that that may place you at odds with the well-meaning proponent? Congratulations on your realization that leadership is not a popularity contest among your management colleagues.

One word of caution though: even if you fight this good fight — and trust me, it is a good fight — you will still occasionally look back at when you started working in your organization, and realize that despite all your efforts it’s a messier place than when you started. Not just the whole organization, but maybe even your own team or whatever your little corner of the corporate world is. The part where you are responsible for your part of the mess.

This is especially true if you are just in the middle of leaving an organization (or a role therein), and are reflecting on the impact of your tenure: you might fall for the thought of “I tried really hard, but things still are messier than when I got here.” They always will be. The point is not to compare today’s degree of disorder to that when you started. The point is to compare how disorderly it is now, to how disorderly it would have been if you hadn’t been there.

Making a positive contribution to a group in a leadership role is frequently — and somewhat counter-intuitively — achieved by simply focusing on not making things worse for everyone. Canadian astronaut and former ISS commander Chris Hadfield calls this approach “aiming to be a zero” and dedicates a whole chapter in his excellent Astronaut’s Guide to Life on Earth to this idea. ↩

tag:xahteiwi.eu,2022-05-07:/blog/2022/05/07/entropy/

An experiment

Florian Haas May 6, 2022 Updated May 6, 2022

I am launching a small experiment. Would you like to help?

A little while back, I posted on my Twitter and Mastodon feeds1 asking people who kept a personal blog to post their RSS or Atom feed URLs. A surprising number of people responded — many more on Mastodon than on Twitter, incidentally, though I have about 15 times as many Twitter as Mastodon followers. You can find the Twitter thread here and the Mastodon thread here.

I now have a very decent aggregated feed, thanks to all who responded!

Now, normally, when I put a new post out, I chuck out a link to the post on those two social networks. For some time, I’d like to do something different, and if you’re inclined, you can help!

So what I’d ask you to do is subscribe to the RSS feed or the Atom feed for my site. For the next few things I post, I will not link them on Twitter or Mastodon — but if you notice them in your feed and find them share-worthy, I would very much appreciate if you posted them there.2 Feel free to tag me in them, I’m @xahteiwi on Twitter and @xahteiwi@mastodon.social on Mastodon.

I’d be really curious to see if it’s feasible in 2022 to reach blog readers with essentially just an RSS feed, and word-of-mouth.

So, if you would like to participate, then just add my feed to your reader, and keep your eyes peeled for the next few articles. Thank you!

If you’re curious about these things: I use Renato Cerqueira’s Mastodon Twitter Crossposter to mirror my Twitter feed to Mastodon, and vice versa. You can take a look at its GitHub project. ↩
I don’t want to run Google Analytics on this site out of concern for your privacy, which is why I don’t know where my traffic comes from. If you post them and tag me, I have something functionally approaching a pingback beacon. ↩

tag:xahteiwi.eu,2022-05-06:/blog/2022/05/06/experiment/

This site now lives on GitHub

Florian Haas May 5, 2022 Updated May 5, 2022

I have moved my site to GitHub Pages. Here’s what that means.

I have moved this site to GitHub. It’s still available under the same URL, of course, but it uses GitHub Pages for hosting.

Why did I do this? A few reasons:

I don’t have a comment facility on this site, and I don’t intend to add one, but I do want to give people the ability to submit corrections or suggestions. You can do that now, by creating a GitHub issue, sending me a pull request, or doing a GitHub edit (which is really just a streamlined way of sending a PR from your browser).
It gives me the option to use a GitHub Actions workflow to deploy the site fully automatically. As you may know I build this site with Pelican, and wiring up a workflow that first sets off a Pelican build and then invokes ghp-import (via tox) was a breeze. You’re welcome to take a look at the implementation if you like. (You can also look at the relevant section in the Pelican docs, of course.)
Overall, this gives me the ability to do quick edits from almost anywhere, and also gives someone else (like you!) the ability to suggest fixes, which I can then apply almost instantaneously. But please don’t expect any such things; I do maintain this site on a “when I get around to it” basis.

In short: if you’ve used this site as a regular or irregular visitor/reader, not much will change. If however you wanted to chuck in the occasional fix or correction, you can do that more easily now.

If at any point I find that GitHub Pages hosting isn’t the right thing to do after all, I’ll happily rehome the site elsewhere.

Please be advised that this is still my site, though, and I maintain editorial control of all content. If you’re sending me a PR, please do so with the understanding that I might decline to merge or publish it, for any reason at all. If that’s not for you, please use your own platform.

tag:xahteiwi.eu,2022-05-05:/blog/2022/05/05/moving-to-github/

The Review Review

Florian Haas Jan 29, 2022 Updated Jan 29, 2022

Musings on source code management, code review, testing, deployment, and collaboration culture.

I wanted to share a few thoughts on something I consider a rather important topic in our industry: code review and CI/CD tools, and how they relate.

This means that I’m talking about

source code management: where we store our code, and how we manage access to it;
code review: how we coordinate changes to our code;
testing and gating: how we make sure that those changes don’t break anything;
deployment: how we push changes and updates out to the consumers of our code.

In case it’s not obvious, that means I’m talking about a large fraction of the software engineering cycle. Not all of it; the part involving “fooling around” (creative play) is perhaps excluded — but substantially everything where people can be said to be “developing” in a software engineering organization is encompassed in these things.

And there’s a few things that follow from that:

First, whatever tools we use in order to accomplish these four things, they simultaneously influence and are influenced by our collaboration culture.

It’s ludicrous to presume that tools and culture are independent of each other, or to categorically declare that tools must be made to fit processes, not the other way around. That’s not how people work. Culture and tools always have an influence on each other.

Second, the scope of these things is continually expanding as the field evolves. To illustrate, a few years ago a CI/CD platform could get away with supporting automated unit tests and kicking off an Ansible playbook to deploy things to VMs. Today, what we expect out of a continuous deployment pipeline includes support for

a package registry (for Python packages or Node.js modules, to give just two examples),
a container image registry (for Docker/Podman/OCI containers),
a secret store,
the ability to deploy to a Kubernetes cluster.

And that’s just a few examples. I might be forgetting others.

Third, this is a classic example of where we must apply systems thinking: since substantially everything the organization does is connected to the toolchain, we cannot make changes to one part of the system without considering the consequences for the system as a whole. That is not to say that we cannot make incremental changes, just that we can’t pretend that anything in the system stands alone.

To illustrate what I mean, consider the example of an automotive engineer implementing a design change for an engine. If the design change makes the engine so much more efficient that it means a range extension by 10% then that’s excellent. But if in the process the designer has made it impossible to connect the engine to its battery (or the fuel line, if we’re talking about obsolescent technology), then installing the new engine doesn’t just not improve anything — it renders the vehicle immobile.

Responsibility

Now, what does that mean about responsibility? Who is ultimately in charge of the system consisting of source code management and review tools, and your CI/CD pipeline? The answer is hopefully a no-brainer: since everything I talk about, including your organizational culture, encompasses substantially all of your engineering organization, the responsibility rests with whoever is in charge of your engineering organization (in most companies, that’s the CTO). And if you’re a software technology company, so that your entire enterprise is substantially a software engineering organization, it’s your CEO’s or MD’s responsibility.

Of course, that person may delegate some of the tasks and details of running your source code management and code review and CI/CD platform, but responsibility stays with them.

And that responsibility requires both an understanding of the technology itself, and an understanding of how it interacts with your engineering culture. A profound understanding.

And I’d go so far as to say if you head up a software engineering organization and you don’t have a profound understanding of this toolchain and its mutual influence on your culture, you should find another job.

And if you work in a software engineering organization and the person in charge lacks precisely that profound understanding, you should also find another job, because you deserve better.

So having said all that, we can start talking about tools.

And I’m going to talk about three of them, all of which I use in some professional capacity on an at-least-weekly basis.

GitHub

The first one is the toolchain that — I think — a majority of open source developers will be most familiar with: GitHub, whose collaboration model is based on the Pull Request (PR).

Now the GitHub PR model was strongly influenced by the distributed development model of the Linux kernel. The kernel project is what Git was originally written for, so naturally it is also where the original convention for pull requests emerged.

In kernel development, during a kernel merge window, subsystem maintainers fix up a publicly accessible Git tree for Linus to pull from. They then send a message that follows a conventional format to the linux-kernel mailing list (the LKML) outlining the purpose of the changes they want merged. This email contains a summary of the changes, and then an enumeration of each commit to be merged. (There’s a git subcommand, git request-pull, to format such a message.)

The review then proceeds in an email exchange on LKML. Once Linus is happy with the change, he pulls from the subsystem maintainer’s branch and informs them that their changes have merged.

Individual subsystem maintainers replicate this model, perhaps with small modifications, for contributions to the subsystems they are responsible for.

GitHub Pull Requests (PRs)

GitHub replicates some features of the kernel’s model:

The collaboration model is generally, “fork and pull”. Individuals maintain their own forks of an upstream codebase, and then send pull requests when they are ready to review. (However, the review process then uses a web interface, rather than a mailing list — in principle, a GitHub reviewer can do a complete review within the GitHub web interface and source code browser and would never even need to check out the repository locally.)
Each PR generally consists of multiple commits, which however are expected to closely relate and serve a common purpose.
That common purpose is enumerated in a summary at the top of the pull request. GitHub calls this the PR description.
Submitters can mark a PR as a draft, with which they indicate that the PR is not ready to be merged yet. When drafts became available in 2019, they replaced an emerging convention in which PR descriptions would be prefixed by WIP (work in progress) or DNM (do not merge).

GitHub PRs can be approved, rejected or commented on by maintainers or other contributors, and an approval can be made a mandatory requirement for merging, but by default GitHub will let anyone merge the PR who has write permissions to the repository that the PR targets. This includes the possibility for a maintainer to merge the contributor’s remote branch to their own local checkout, and then push the merged branch to the target repo of the PR. Such an event will automatically close the PR and mark it as merged.

GitHub Actions

GitHub has, for a long time, allowed maintainers to require that PRs pass automated testing. However, until rather recently, it relied on them to run (or interface with) a separate testing infrastructure outside of GitHub to do that. Typical examples for this included CircleCI, or Travis, or Jenkins. It was only in 2019 that GitHub announced automated testing via GitHub Actions.

At the time of writing, GitHub Actions workflows are in widespread use for CI/CD, but it is still quite common for GitHub-hosted projects to allow maintainers to circumvent CI/CD tests and merge directly. When this happens, it often creates a rather unpleasant situation in which CI/CD testing is only run for contributions by “outsiders” or “newbies”, whereas maintainers get to break things with impunity. This means that issues are often not detected until a casual contributor sends a PR, at which point the tests break and leave the contributor confused (and sometimes lead to the change not even being considered because, well, “it makes the tests break”).

Another thing that comes bundled with GitHub (and GitHub workflow actions) is the ability to maintain your own package registry and push artifacts to it from your workflow. Interestingly, at the time of writing, GitHub’s definition of “packages” includes container images, Ruby gems, and npm modules among others, but presently does not include Python modules — although you do, of course, have the option to push your packages to PyPI from your workflow.

GitLab

The equivalent to the GitHub pull request (PR) is the GitLab merge request (MR). In principle, a GitLab MR is quite similar to a GitHub PR, albeit with a few noticeable differences:

The “fork and pull” model is less prevalent on GitLab. Instead, it is far more common for collaborators to work on one project, and then create topic branches within that project for each set of changes.
Since the project repo is shared, this facilitates collaboration on a single changeset by multiple people: if two or more people wish to collaborate on a change, they simply push additional squash or fixup commits on the topic branch. They can also agree to force-push amended commits to the topic branch, in which case the GitLab web interface will helpfully point out differences between individual versions of a commit (something that GitHub presently cannot do in a PR).
As in a GitHub PR, a GitLab MR is generally expected to include one or more commits.
Also as in a GitHub PR, an MR is expected to contain a summary that outlines its purpose.
GitLab MRs have a Draft status just like GitHub PRs do, and they were introduced about the same time in both products, but GitLab had a preceding feature called work-in-progress MRs (WIP MRs). GitLab has the handy feature that MRs are automatically marked as drafts once any commit with squash: or fixup: in the commit message ends up in the topic branch — GitLab rightfully infers that the branch still needs a squash rebase prior to merge.
GitLab MRs can be reviewed in full using the web interface alone: the review interface and the source code browser are closely integrated, just like in GitHub.

GitLab CI

CI/CD has been an intrinsic part of the GitLab review experience for years, since GitLab includes full CI integration via the .gitlab-ci.yml configuration file.

Since GitLab CI has been around for quite a while, and it has a multitude of ways to be used, it “feels” more intrinsic to the review process than GitHub Actions do, which to me still leave an impression of being bolted on. In addition, GitLab CI comes with multiple options of using the CI runner:

You can use shared runners, which GitLab operates for you. These are Docker containers that GitLab spins up on your behalf in the cloud, and which you share with other GitLab subscription customers.
You can also host your own runners. You can do that in Docker containers, in Kubernetes clusters, in virtual machines, and even on bare metal. The runners need no incoming network connectivity; they simply connect to a service on your GitLab host and then poll whether jobs wait for them.
You can also specify runners that are exclusive to a project, or to a group or subgroup of projects.

GitLab also comes with a package registry, to which you can push packages from CI pipelines. This differs from GitHub in that it supports more package formats, including a private PyPI workalike for Python packages. In addition, there’s also a separate container registry for container images.

Gerrit/Zuul

Now, it feels a bit awkward to call this one “Gerrit/Zuul” when I’ve called the others just “GitHub” and “GitLab” respectively, and tacitly included the corresponding CI integrations (GitHub Actions and GitLab CI, respectively) in them. There are a couple of reasons for that:

Zuul is a CI/CD framework that is, in principle, not tied to Gerrit, whereas GitHub Actions only apply to GitHub, and GitLab CI only to GitLab. Gerrit/Zuul is a particular combination that was largely popularized by the OpenStack community, which is why a lot of people who are or were part of that community intuitively associate Gerrit with Zuul and vice versa.
Likewise, Gerrit is not tied to a specific CI/CD framework. It’s perfectly feasible to run code reviews in Gerrit and use a different CI/CD pipeline (or even none at all).

And Gerrit/Zuul does differ quite notably from GitHub and GitLab, whose features often map quite closely to each other, and I’d like to highlight some of those differences.

Gerrit reviews

The Gerrit review process differs in a few crucial points from the one we know from GitHub and GitLab:

You don’t ask someone to pull from a branch or a fork of yours. Instead, you run git review and Gerrit will make a branch for you. Everything else flows from there.
Unlike a GitHub PR and GitLab MR, which both typically contain a series of commits to be taken as a whole, a Gerrit change is really just that: one change.
Which, of course, also means that we don’t need a separate summary for the change: the summary is the commit message.
It’s still possible to submit a series of commits in the course of a Gerrit review. However, Gerrit simply sees those as a series of changes that all depend on one another.
Dependencies between changes can also be expressed explicitly, by including appropriate keywords in commit messages. Crucially, these dependencies can cross project boundaries. That is to say, a change in one Git repository can depend on a change in another Git repository, so long as they both use the same Gerrit instance for review.
And we also have the equivalent of a Draft PR/MR; in Gerrit that’s called a work-in-progress change.

Because of this, when used in combination with CI such as Zuul, a Gerrit-reviewed project generally expects CI tests to pass on every commit, without exceptions. This is in contrast to many GitHub or GitLab managed projects, which typically only expect the head commit of the topic branch associated with a PR/MR to pass CI.

In Gerrit/Zuul managed projects, it’s also Zuul that merges the commit. This is also in contrast to projects that live in GitHub or GitLab: in those, the pipeline run results are generally advisory in nature, and a successful pipeline run must still be confirmed by a human clicking a Merge button (or running a git merge command locally, and then pushing to the repository). In addition, even a failing CI run can generally be overridden by a “core committer” who has the ability to merge the PR/MR anyway.

A Gerrit/Zuul project typically has no such shortcuts, meaning the only way to get changes into the repo is to pass both peer review, and the CI pipeline. In my experience, this tends to create a climate of leadership by example, which has a beneficial effect on both experienced developers (“seniors” in a corporate setting) and newcomers (“juniors”).

Speculative merging

There is one other property that Gerrit/Zuul has that sets it apart from other review/CI toolchains: speculative merging. This involves the parallel execution of CI jobs for interdependent changes. With speculative merging, even complex, long-running CI/CD pipelines don’t hold up the development process — and this massively enhances project scalability.

No direct repo browser integration

Notably, in Gerrit/Zuul there is no close integration with repository browsing. Gerrit does include the Gitiles plugin for the purpose, but its user experience is rudimentary at best. A popular alternative is to deploy Gerrit with Gitea, but again, that’s not built-in and your trusted Gerrit/Zuul admin has to set it up for you. In addition, while source code browsing in GitHub and GitLab is tightly integrated with project permissions, and that is also true for Gitiles, there is a certain amount of administrative duplication to make your Gerrit repository and project permissions apply to Gitea.

No built-in package registries

There’s another difference in the Gerrit/Zuul stack when compared to GitHub and GitLab, and that is its absence of built-in package registries. Zuul has ready-to-use jobs for pushing to a container registry, or to PyPI, but you do have to either push to upstream public registries, or build your own. Zuul does not come bundled with multitenant private registries the way GitHub and GitLab do.

Administrative complexity

In view of the above, there’s another thing that you might want to consider, which in my humble opinion is an important reason why the Gerrit/Zuul combination has less uptake than it deserves on its technical merits. And this may sound overly dramatic, but: people like to be in charge of their own actions, and software developers are people. And here’s an issue with Zuul: there are quite a few things a developer can do on their own in GitHub Actions or GitLab CI that they’d need to ask an admin’s help for in Zuul.

Creating a relatively standard workflow of building a private container image, pushing it to your own registry, and then rolling out that image to a Kubernetes deployment, is something you can do in GitHub or GitLab as a project owner. With Zuul, you’ll need an admin at least to set up and manage your container registry. Rerunning a pipeline, a simple click of a button or API call in GitHub or GitLab, is something you trigger via a Gerrit keyword (typically recheck) for Zuul — but only on the pipelines where your admin has defined that trigger.

So, which one’s best?

So you want to know which one of these you should choose (or advocate for)? That’s surprisingly difficult to answer, and greatly depends on your priorities. And I’ll give you this from four angles.

When it comes to scalability — the ability to adapt to massive organizational sizes, and/or rapid project growth, or an obscenely large number of projects within an organization — the Gerrit/Zuul combination wins hands down if you have a competent, responsive, and dedicated crew to manage it.
When it’s about getting started quickly — helping a project get off the ground with a good, usable, easily manageable review and fully integrated CI/CD structure — you can’t beat GitLab.
In terms of beneficial effect on your development culture, Gerrit/Zuul again probably scores best. If you have a team that’s great at reviews and commit and CI and doesn’t cut corners, or you want to build a team like that, Gerrit/Zuul can really help.
And when it’s about giving developers the lowest barrier to entry — meaning using tools that they’re most likely already familiar with — GitHub is your platform of choice.

tag:xahteiwi.eu,2022-01-29:/blog/2022/01/29/review-review/

Scaling the flat organization

Florian Haas Jan 16, 2022 Updated Jan 16, 2022

“We need a flat organizational structure in order to scale.” Or do we, really?

There’s a common trope in management that goes something like “in order to better scale the organization as we grow, we need to keep it flat.” The thrust of the argument is that as the organization grows to meet customer and demand growth (and with it, growth in head count), additional levels of corporate hierarchy stifle that growth, and should thus be avoided.

For any knowledge-driven organization this is wrong, and just how wrong it is can be proven, numerically, with simple high school level maths. And in this context, “knowledge-driven organization” encompasses any technology company, any software engineering outfit, any technology services provider — in short, any organization that makes its money off the brains of its people.

Let’s establish a few self-evident facts about knowledge-driven organizations:

The people who actually “make things happen” are the ones with no direct reports. The frontend designers, the infrastructure engineers, the backend specialists, the data analysts. Their managers (and their managers, and everyone else all the way up to the CEO) are charged with aggregating information, making decisions, removing obstacles to productivity, and perhaps providing some form of vision and guidance. But it’s individual, non-managerial contributors of all specializations that actually do things.
In doing so, engineers work best in small teams with a great degree of autonomy. They will usually benefit from close working relationships with a small group of people.
A manager’s role is thus twofold: remove any obstacles that stand in the way of the team accomplishing its goals, and act as an interface to other parts of the organization.

An example

With that in mind, let’s consider a hypothetical small company that is currently structured in teams of 5. There’s always 4 people reporting to one manager. Currently, that company is made up as follows:

the CEO/founder, Alex (1 person),
4 team leads (4 people),
4 employees on each team, all of whom report to the respective team lead (in total, 16 people).

So, 21 people in all.

Management theory calls the number of reports per manager in an organization the span of control. I don’t like that term a great deal. For one thing, at four syllables it’s a bit of a mouthful, particularly if it needs to be mentioned frequently. But more importantly, it’s not an accurate reflection of reality: in a knowledge-driven organization (like any technology or engineering company), it’s ludicrous to think that a manager “controls” their reports like puppets or robots. So, I’ll use a different term for the remainder of this article: I’m going to call the number of reports per manager the width of the company.

Also, I’ll use the term depth for the number of hierarchy levels that the company has. A sole proprietorship has a depth of zero. A company with a founder-CEO and a few employees, but no other managers, has a depth of 1. Alex’ company, with one level of management reporting to Alex, and everyone else reporting to one of those managers, currently has a depth of 2.

So we can say that Alex’ company is currently narrow and shallow — it has small teams, and few management levels.

Now, the company has just closed a major funding round and several big customer deals, putting them on a solid growth trajectory. So, Alex expects the company to double in headcount on an annual basis for the foreseeable future.

So the question is: is it better for the organization to stick to the current width, and add depth as it grows, or should Alex increase its width, so that it can accommodate more people while retaining a shallower depth?

In other words, as the company scales, should it become deeper while staying narrow, or should it grow wider while staying shallow?

Fast-forward five years

To look at that, Alex mentally fast-forwards five years under the currently assumed growth model. After five years of doubling in headcount, the company now has $21 \cdot 2^5 = 672$ employees.

In this scenario, everyone in the company still works in a 5-person team, out of which one person is the leader. So every leader has 4 people that report to them. Let’s look at one employee, Sam. Sam works in a team with Joe, Jane, Harry and Ruth, and Ruth is the team lead. Let’s say her title is simply, “Manager”.

Ruth now has at most 3 peers of her own, and reports to someone who goes by “Senior Manager,” putting her in another team of no more than 5 at her management level. That Senior Manager has at most 3 peers again, all of whom report to a Director. A Director also, together with a maximum of 3 other Directors, reports to a VP, and the 4 VPs work together under Alex, who is still the CEO.

Now, I’ll tell you that for 672 people, you’ll not nearly have filled all those 5-person teams. But try to intuitively guess, without doing the math, what organizational size this structure would accommodate. That is to say, with every person in the company being at most 5 hops away from the CEO, and everyone working in a group of 5, what’s the maximum company size this model can handle?

The answer is 1,365.

Let’s quickly break that down and see how we can plug other numbers in.

A gentle bit of maths, part I: team size and hierarchy levels

Say we take the company’s width, that is the number of people working together in any group, excluding the leader, as $x$. In our example, that’s $4$.

Then, any team’s size (which we’ll call $n$, for reasons we’ll get to in a jiffy) is of course $x+1=5$.

The number of people any Senior Manager is responsible for is $(x+1)x + 1 = x^2+x+1 = 21$ (that is, their Managers’ teams, and themselves).

The number of people any Director is responsible for is $((x+1)x+1)x+1 = x^3+x^2+x+1 = 85$.

You see where this is going. For any additional level of depth, we simply need to add another power of $x$.

And of course $1 = x^0$ — at the zeroth depth level there’s one single person: the CEO.

So we can express the number of people in an organization with a width of $x$ and a depth of $y$ as

$$x^0 + x^1 + x^2 + ... + x^y$$

or, more briefly:1

$$\sum_{i=0}^{y} x^{i}$$

And that, in turn, happens to work out to 2

$${x^{y+1}-1} \over {x-1}$$

Plug in the numbers for $x=4$ and $y=5$, and we get 1,365.
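To make it easy to plug in other numbers, here is a minimal Python sketch of both forms of the formula. The function names are made up for illustration; the arithmetic is just the sum and the closed form given above.

```python
def org_size_by_sum(width: int, depth: int) -> int:
    """Add up x^0 + x^1 + ... + x^y, one term per hierarchy level."""
    return sum(width ** level for level in range(depth + 1))


def org_size_closed_form(width: int, depth: int) -> int:
    """Closed form (x^(y+1) - 1) / (x - 1), with x = 1 handled separately."""
    if width == 1:
        # The fraction would be 0/0 here; a chain of one-on-one teams
        # holds depth + 1 people.
        return depth + 1
    # The geometric sum is always a whole number, so integer division is exact.
    return (width ** (depth + 1) - 1) // (width - 1)


print(org_size_by_sum(4, 5))       # 1365
print(org_size_closed_form(4, 5))  # 1365
print(org_size_closed_form(4, 2))  # 21, the Senior Manager's share
```

Both functions should agree for any whole-number width and depth; the sum is the more literal translation, the closed form the cheaper one.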

A gentle bit of maths, part II: communications in complete graphs

Now, what’s our scaling constraint in a knowledge organization? The number of people you need to constantly be in touch with in order to accomplish your goals.

For Sam, those people are principally Sam’s teammates, including their manager, Ruth. That’s 4 people. However, it’s not enough for Sam to understand what he is exchanging with Jane, Joe, Harry, and Ruth; it’s also imperative for him to understand what they communicate about among themselves. So, Sam needs to keep himself apprised of what Ruth told Harry, or what information Jane gave to Joe, and how Joe and Harry are coordinating their latest change (etc.).

That means that within a team, communications are a complete graph. And for a complete graph, the number of edges is given by

$${n(n-1)}\over 2$$

In our case, $n$ is our team size (including the leader), thus $x+1$ (the reports plus the leader).

So we can rewrite the complete-graph formula as:

$${{(x+1)(x+1-1)} \over 2} = {{x(x+1)} \over 2}$$

So in order for the team to be well informed of everyone’s actions at all times, a 5-person team must keep track of 10 communications links between people. That’s absolutely doable, though we must keep in mind that the number of links does not grow linearly with the number of people in direct communication with each other; it grows proportionally to the square of that number.

Sam’s manager Ruth, of course, works on two 5-person teams: Sam’s, and Ruth’s team of fellow Managers reporting to a Senior Manager. That means Ruth needs to constantly keep in touch with the people on her team (including Sam), and also understand what everyone on her team of Managers is doing. Thus, she keeps track of 20 communications links. This is also true for her Senior Manager, that Senior Manager’s Director, and that Director’s VP. It’s only at the very top that the CEO has the luxury of directly managing only 4 VPs.3
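The link counts for Sam and Ruth can be checked with a short Python sketch. The function names are illustrative only; the list of names is Sam’s team from the example, and the pairs are enumerated simply to confirm that the complete-graph formula counts what we think it counts.

```python
from itertools import combinations


def team_links(width: int) -> int:
    """Edges in a complete graph over a team of width + 1 people."""
    team_size = width + 1
    return team_size * (team_size - 1) // 2


def manager_links(width: int) -> int:
    """A manager sits on two teams of the same size at once."""
    return 2 * team_links(width)


# Enumerate every pair on Sam's team to confirm the formula.
sams_team = ["Sam", "Joe", "Jane", "Harry", "Ruth"]
pairs = list(combinations(sams_team, 2))

print(len(pairs))        # 10
print(team_links(4))     # 10
print(manager_links(4))  # 20
```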

This should be flatter! Or should it?

Now, suppose someone tells Alex that in this growth plan the organization is much too hierarchical, and the organization must thus lose some of its projected hierarchy levels — that is, reduce its depth. Of course, the only way to do that while still being able to manage the same headcount growth is to make the company wider — in other words, have more people report to one manager than previously planned.

So Alex, being a good CEO, opens some spreadsheet software and creates this handy table that simply plugs in values for $x$ and $y$, with $x$ (width) in columns and $y$ (depth) in rows.4

| depth ↓ width → | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| 2 | 7 | 13 | 21 | 31 | 43 | 57 | 73 | 91 | 111 | 133 |
| 3 | 15 | 40 | 85 | 156 | 259 | 400 | 585 | 820 | 1,111 | 1,464 |
| 4 | 31 | 121 | 341 | 781 | 1,555 | 2,801 | 4,681 | 7,381 | 11,111 | 16,105 |
| 5 | 63 | 364 | 1,365 | 3,906 | 9,331 | 19,608 | 37,449 | 66,430 | 111,111 | 177,156 |
| 6 | 127 | 1,093 | 5,461 | 19,531 | 55,987 | 137,257 | 299,593 | 597,871 | 1,111,111 | 1,948,717 |
| 7 | 255 | 3,280 | 21,845 | 97,656 | 335,923 | 960,800 | 2,396,745 | 5,380,840 | 11,111,111 | 21,435,888 |
| 8 | 511 | 9,841 | 87,381 | 488,281 | 2,015,539 | 6,725,601 | 19,173,961 | 48,427,561 | 111,111,111 | 235,794,769 |
| 9 | 1,023 | 29,524 | 349,525 | 2,441,406 | 12,093,235 | 47,079,208 | 153,391,689 | 435,848,050 | 1,111,111,111 | 2,593,742,460 |
| 10 | 2,047 | 88,573 | 1,398,101 | 12,207,031 | 72,559,411 | 329,554,457 | 1,227,133,513 | 3,922,632,451 | 11,111,111,111 | 28,531,167,061 |

For our previous five-year plan, Alex can just look up the cell matching $x=4$, $y=5$ and finds our known outcome, a maximum head count of 1,365.

Now, Alex looks at what it takes to flatten the organization by eliminating one hierarchy level, or by two.

If we want to reduce depth by 1, we simply go up one row (thus, $y=4$) and find the smallest value for $x$ that accommodates 1,365 people or more. Alex sees that that’s $x=6$, which can accommodate 1,555 people. That is, increase the width by 2: reorganize from teams of 5 to teams of 7. Alex could also pick $x=5$, that is increase the width by only 1, which would land the company at a maximum head count of 781. That is well below what the originally planned structure ($x=4$, $y=5$) can handle, but it still lands Alex north of the original growth target of 672.
If we want to reduce depth by 2, we go up two rows ($y=3$) and do the same. We end up at $x=11$, which means increasing width by 7: reorganize from teams of 5 to teams of 12. Thus, we land at a maximum of 1,464 people, slightly exceeding the headcount we’re able to accommodate if we keep growing with the current structure. We could also do $x=10$ or $x=9$, landing us at maxima well below that (1,111 or 820), but still north of 672.
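The table lookup Alex does by eye can also be expressed as a small search. This is only a sketch with hypothetical function names: for a given depth, it walks the width upward until the organization is large enough for the desired headcount.

```python
def org_size(width: int, depth: int) -> int:
    """Maximum headcount for a given width (x) and depth (y)."""
    return sum(width ** level for level in range(depth + 1))


def min_width(depth: int, headcount: int) -> int:
    """Smallest width that accommodates at least `headcount` people."""
    if depth < 1:
        raise ValueError("depth must be at least 1 for width to matter")
    width = 1
    while org_size(width, depth) < headcount:
        width += 1
    return width


print(min_width(5, 1365))  # 4
print(min_width(4, 1365))  # 6
print(min_width(3, 1365))  # 11
print(min_width(4, 672))   # 5
print(min_width(3, 672))   # 9
```

The first three calls match the full capacity of the current plan; the last two only aim for the five-year growth target of 672.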

Now what does that mean in terms of communication channels each person has to maintain?

Again, what we want to keep in mind is the number of edges in a complete graph connecting $n$ (that is, $x+1$) points. For regular employees, we know that that’s

$${x(x+1)}\over 2$$

And for any manager, who is effectively on two teams of size $x+1$ simultaneously, that’s

$$2 \cdot {{x(x+1)}\over 2} = x(x+1)$$

Which means:

If we want to reduce depth by 1 and go from $x=4$ to $x=5$, every non-manager employee now needs to be aware of 15 communications links (instead of 10), every manager, of 30 (instead of 20).
If instead we go from $x=4$ to $x=6$, every non-manager employee now needs to be aware of 21 communications links, every manager, of 42.

So that’s at least a 50% increase, or even more than a doubling, of communications complexity.

For the elimination of two hierarchy levels (a depth reduction by 2), we’ll need to move from $x=4$ to at least $x=9$, since $x=8$ tops out at 585 people and misses the growth target of 672. At $x=9$, every non-manager employee has 45 communications links on their team to deal with; every manager deals with 90.
If instead we go to $x=10$, every non-manager employee now needs to be aware of 55 communications links, every manager, of 110.
And for $x=11$, every non-manager employee now needs to be aware of 66 communications links, every manager, of 132.
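To put these increases in relative terms: both the per-employee and the per-manager link counts are multiples of $x(x+1)$, so moving from the current width $x=4$ to some new width multiplies both by the same factor. In the worked arithmetic below, $x'$ is simply a label introduced here for the new width; everything else follows from the formulas above.

```latex
$$\frac{x'(x'+1)}{x(x+1)} \quad \text{with } x = 4, \text{ so } x(x+1) = 20$$

$$x'=5:\ \frac{30}{20} = 1.5 \qquad x'=6:\ \frac{42}{20} = 2.1$$

$$x'=9:\ \frac{90}{20} = 4.5 \qquad x'=10:\ \frac{110}{20} = 5.5 \qquad x'=11:\ \frac{132}{20} = 6.6$$
```

So removing one hierarchy level costs between 1.5 and 2.1 times the communications links per person, and removing two levels costs between 4.5 and 6.6 times.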

At this point Alex realizes that making the company wide and shallow, instead of narrow and deep, is painfully expensive in communication cost.

But what about all those managers we won’t have to pay?

A well-meaning advisor interrupts Alex in the middle of planning. He interjects that Alex is missing a point, namely all the managers that the company will now no longer need, and the cost savings thus generated.

So Alex looks at the same table again (width in columns, depth in rows).

What’s handy here is that Alex can look at any one table cell, and the cell directly above it will contain the total number of managers (that is, people who have direct reports) for the same width. So,

for $x=4$, $y=5$ (our original scenario allowing the company to grow to 1,365 people), Alex would have to hire and pay a total of 341 managers.
for $x=6$, $y=4$ (the scenario that eliminates one level, and can accommodate 1,555 people), Alex’ company will need 259 managers. That’s 82 fewer managers, or a reduction by about 24%.
for $x=5$, $y=4$ (the scenario that eliminates one level, but accommodates only 781 people), Alex’ company will need 156 managers. That’s 185 fewer managers, or a reduction by about 54%.
for $x=11$, $y=3$ (the scenario that eliminates two levels, and can accommodate 1,464 people), the company will need 133 managers. That’s 208 fewer managers, or a reduction by about 61%.
for $x=10$, $y=3$ (the scenario that eliminates two levels, but accommodates only 1,111 people), the company will need 111 managers. That’s 230 fewer managers, or a reduction by about 67%.
for $x=9$, $y=3$ (the scenario that eliminates two levels, but accommodates only 820 people), the company will need 91 managers. That’s 250 fewer managers, or a reduction by about 73%.
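The “cell directly above” trick amounts to computing the organization size for one level less. A minimal Python sketch, with illustrative function names, reproduces the manager counts and the savings relative to the original plan:

```python
def org_size(width: int, depth: int) -> int:
    """Maximum headcount for a given width (x) and depth (y)."""
    return sum(width ** level for level in range(depth + 1))


def manager_count(width: int, depth: int) -> int:
    """Everyone above the bottom level has direct reports."""
    return org_size(width, depth - 1)


baseline = manager_count(4, 5)  # the original plan: 341 managers

for width, depth in [(6, 4), (5, 4), (11, 3), (10, 3), (9, 3)]:
    managers = manager_count(width, depth)
    saved = baseline - managers
    print(
        f"x={width}, y={depth}: {managers} managers, "
        f"{saved} fewer than planned ({saved / baseline:.0%})"
    )
```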

A gentle bit of maths, part III: how much of our company will be managers?

It so happens that we can generalize this. If Alex looks at our table again, but considers the number of managers proportional to the number of people in the company, a pattern quickly emerges (again, width is in columns, depth is in rows):

| depth ↓ width → | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 33.33% | 25.00% | 20.00% | 16.67% | 14.29% | 12.50% | 11.11% | 10.00% | 9.09% | 8.33% |
| 2 | 42.86% | 30.77% | 23.81% | 19.35% | 16.28% | 14.04% | 12.33% | 10.99% | 9.91% | 9.02% |
| 3 | 46.67% | 32.50% | 24.71% | 19.87% | 16.60% | 14.25% | 12.48% | 11.10% | 9.99% | 9.08% |
| 4 | 48.39% | 33.06% | 24.93% | 19.97% | 16.66% | 14.28% | 12.50% | 11.11% | 10.00% | 9.09% |
| 5 | 49.21% | 33.24% | 24.98% | 19.99% | 16.66% | 14.28% | 12.50% | 11.11% | 10.00% | 9.09% |
| 6 | 49.61% | 33.30% | 25.00% | 20.00% | 16.67% | 14.29% | 12.50% | 11.11% | 10.00% | 9.09% |
| 7 | 49.80% | 33.32% | 25.00% | 20.00% | 16.67% | 14.29% | 12.50% | 11.11% | 10.00% | 9.09% |
| 8 | 49.90% | 33.33% | 25.00% | 20.00% | 16.67% | 14.29% | 12.50% | 11.11% | 10.00% | 9.09% |
| 9 | 49.95% | 33.33% | 25.00% | 20.00% | 16.67% | 14.29% | 12.50% | 11.11% | 10.00% | 9.09% |
| 10 | 49.98% | 33.33% | 25.00% | 20.00% | 16.67% | 14.29% | 12.50% | 11.11% | 10.00% | 9.09% |

You’ll see that at a depth of 1, the share of managers is obviously $1 \over {x+1}$, but then as we increase in depth it quickly trends toward:5

$$1 \over x$$

The share of managers in Alex’ company is roughly the reciprocal of the company’s width. In other words, the share of managers is inversely proportional to width.
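
The convergence toward $1 \over x$ is easy to watch numerically. The following is only a sketch: `manager_share` is a name I made up, and it simply evaluates the closed-form ratio ${x^y-1} \over {x^{y+1}-1}$ for a fixed width of 4 and increasing depth, using exact fractions to avoid rounding surprises.

```python
from fractions import Fraction


def manager_share(x: int, y: int) -> Fraction:
    """Share of managers in a full hierarchy: (x**y - 1) / (x**(y+1) - 1), valid for x > 1."""
    return Fraction(x**y - 1, x ** (y + 1) - 1)


width = 4
for depth in (1, 2, 3, 5, 10):
    share = manager_share(width, depth)
    print(f"width={width}, depth={depth:2d}: {float(share):.2%} "
          f"(1/x = {1 / width:.2%})")
```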

In contrast, the cost of communications is directly proportional to the square of the width.

At this point Alex realizes that while there are indeed savings to be made by the elimination of management in a wide-and-shallow company, they cannot possibly balance the added communication cost.

In other words: the cost in communications inefficiency grows much faster with width, so much so that it will eat up Alex’ company’s manager payroll savings several times over.
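
To get a feel for how lopsided this is, here is a rough sketch that compares nothing but the two growth rates, relative to a width of 4. It deliberately makes no assumption about what a manager or an hour of communication actually costs, and the helper name `growth_factors` is invented for illustration.

```python
def growth_factors(width: int, baseline: int = 4) -> tuple[float, float]:
    """Return (communication cost factor, manager share factor) relative to the baseline width."""
    communication = (width / baseline) ** 2  # cost is proportional to width squared
    manager_share = baseline / width         # share of managers is roughly 1 / width
    return communication, manager_share


for width in (4, 6, 9, 11):
    communication, manager_share = growth_factors(width)
    print(f"width {width:2d}: communication cost x{communication:.2f}, "
          f"manager share x{manager_share:.2f}")
```

Going from a width of 4 to a width of 11 cuts the share of managers to a bit more than a third of what it was, while the communication cost factor grows more than sevenfold.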

In summary

The “flat” (wide) organization scales poorly. Its growth in communication cost far outpaces its savings in payroll cost. And it scales progressively worse, the “flatter” (wider) it gets.

In this capital-sigma summation formula, $i$ doesn’t mean anything other than it being a counter. The formula is pronounced, in English, as “sum of $x$ to the $i$, from $i$ equals zero to $y$” (in other words, add up all whole-number powers of $x$, from $x^0$ to $x^y$). ↩
You might notice that this expression is indeterminate for $x = 1$. Now I’d say the idea of a hierarchical company made up of one-on-one teams (every manager has one report, who in turn is the manager of one report, and so on) is extremely unrealistic. But just for completeness’ sake, we can apply a limit to show that $$\lim_{x \to 1} {{x^{y+1}-1} \over {x-1}} = {y + 1}$$ In other words, such an organization could accommodate a number of people that is equal to its depth plus 1. ↩
This is why they might also be able to appoint a CFO, CSO, CTO or whatever other C-suite functions are appropriate for the organization. So in the scenario we might end up with a handful more people than 1,365 for the C-suite and perhaps some number of staff in their offices. But for the purposes of this discussion those don’t make a big difference, so we’ll disregard them for now. ↩
I encourage you to compare the bottom rows and rightmost columns of this table to Wikipedia’s list of largest employers. ↩
If you’re curious, that is because the share of managers in relation to the total number of people in the company is $${\sum_{i=0}^{y-1} x^{i}} \over {\sum_{i=0}^{y} x^{i}}$$ That works out to be $${x^y-1} \over {x^{y+1}-1}$$ Which, for $y=1$, is $${{x-1} \over {x^2-1}} = {{x-1} \over {(x+1)\cdot(x-1)}} = {1 \over {x+1}}$$ And for larger values of $y$, both $x^y$ and $x^{y+1}$ become so large that the $-1$ part barely matters, so it’s effectively: $${{x^y-1} \over {x^{y+1}-1}} \approx {x^y \over x^{y+1}} = {1 \over x}$$ In slightly more formal terms, we can consider $1 \over x$ the limit of the expression as $y$ goes to infinity: $$\lim_{y \to \infty} {{x^y-1} \over {x^{y+1}-1}} = {1 \over x}$$ ↩

tag:xahteiwi.eu,2022-01-16:/blog/2022/01/16/flat-org-scaling/

Voice messages

Florian Haas Dec 5, 2021 Updated Dec 5, 2021

Apparently, some people think you should replace professional textual communications with voice messaging. Here’s why I think that that’s a bad idea.

As of late, I’ve noticed that when people share one of my articles on asynchronous communications on Twitter (particularly any from the Getting out of Meeting Hell series, or the one on meetings that should have been an email), there’s a reply from a brand account that likes to plug/advertise their service. That service recommends that synchronous meetings be replaced by “asynchronous meetings” based on voice messages.

I’d like to point out that I consider that an utterly terrible idea.

Let me explain why.

Voice is slow

First, voice messages suffer from the exact same drawback that meetings do: they are incredibly slow. Most of us speak at a rate of approximately 4 syllables per second.1 In English, that translates to about 120-140 words per minute. That means that as a listener, you absorb the content of a voice message at the same rate. You might make that a little more efficient by increasing playback speed, but that’s only feasible to about a 25% speed increase, which lands you around 150 words per minute.

In contrast, unless you are dyslexic (I’ll get to that in a bit) you can read at 240 words per minute.

In other words, conveying a certain amount of information by voice takes nearly twice as long as doing the same in writing. And that’s if your verbal expression is perfect, which it never is — any voice message will come with its fair share of filler words (“uh”, “um”, “y’know”) and incomplete sentences.
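
To put numbers on that, here's a back-of-the-envelope sketch. The 1,200-word message length is an arbitrary example, and 130 words per minute is simply the midpoint of the 120-140 range above (my assumption); the other two rates are the ones quoted in the text.

```python
def minutes_needed(words: int, words_per_minute: float) -> float:
    """Time in minutes to take in a message of the given length at the given rate."""
    return words / words_per_minute


MESSAGE_WORDS = 1200  # arbitrary example length, not from any measurement

RATES = {
    "listening, normal speed": 130,  # midpoint of 120-140 wpm (assumption)
    "listening, 1.25x speed": 150,
    "reading": 240,
}

for label, rate in RATES.items():
    print(f"{label:24s} {minutes_needed(MESSAGE_WORDS, rate):4.1f} min")
```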

Add to that the occasional slurred word or phrase, or idioms unfamiliar to the recipient of the message. If you come across something that’s unclear while reading, it takes you fractions of a second to re-read a sentence, and maybe a few seconds to re-read from the beginning of a paragraph. But in the listening case, it may take you upward of a minute to go back and re-listen to a passage you missed or didn’t understand. (Anyone who both reads books and listens to audiobooks will relate to this.)

Voice is more difficult to follow and retrieve

Secondly, voice messages are usually much more difficult to understand for a recipient listening in their second or third language, particularly if the other person is a native speaker using an accent that is unfamiliar to them — say, a French person listening to heavily Scots-accented English or a pronounced Australian twang. Written messages might still have their ambiguities — as an example, the word “doubt” meaning “question”, a common substitution in Indian English, frequently confuses non-Indian English speakers — but those are far fewer and easier to resolve for a reader.

Furthermore, until speech recognition is perfect and automatic transcription is thus exquisitely faithful, your voice messages aren’t searchable. You could say that they’re practically half-off-the-record. Good luck trying to come back to an important bit of information that someone conveyed via a voice message that you have only a vague recollection of. Or, worse, trying to establish the context in which a decision was made, and having to piece it together from multiple voice messages.

Voice doesn’t convey as much nuance as you think

Thirdly, the notion that you ought to be using voice messages to add “nuance” when you can’t convey such nuance in your writing strikes me as patently ludicrous.

When you need to convey emotion or feeling or nuance to a greater extent than you would be able to in writing, that is absolutely a situation in which you should meet with a person face-to-face, one-on-one, and where that doesn’t permit itself, get on a video call. At that point, when a written message won’t cut it, a voice message absolutely won’t.

One good use?

Now, there may be one good use of voice messages that I can think of: they may work for people with dyslexia, for whom consuming a lot of writing may cause cognitive overload. In that case, voice messages might be a workable alternative. If so, then that would make the option to communicate via voice message a very valid accessibility consideration. That said, I’ve talked to people who are dyslexic and who said that voice messages are not an adequate substitute for interactive verbal communication for them. That’s of course highly anecdotal, though, and shouldn’t be taken as a reason to dismiss the idea outright.

However, even if voice messages are a good thing for people with dyslexia, I think that as screen readers continuously improve, generating speech from text may be a preferable option. That’s because it retains the searchability advantage for everyone, and the efficiency advantage for non-dyslexics, while also accommodating people with dyslexia. But I want to re-emphasize that that’s just a hunch, and I may well be completely wrong. If you’re dyslexic and have thoughts on this, I’d love to hear from you! Please find me on Twitter or Mastodon.

Fun useless fact: the rate of 4 syllables per second is practically universal across spoken languages. How many words a native speaker of a particular language speaks in a minute depends largely on the average number of syllables per word in that language. ↩

tag:xahteiwi.eu,2021-12-05:/blog/2021/12/05/voice-messages/

Creativity: How we lost it, why that’s bad, and how we get it back

Florian Haas Nov 21, 2021 Updated Nov 21, 2021

A summary of how I think we ended up where we are right now, and what we can do to get moving again.

Right now, it’s easy to open a news site and come to the conclusion that the world as we know it is well and truly fucked. It doesn’t matter if you’re looking at Covid response, or inaction on climate change, or corruption, or communications surveillance: it increasingly looks like we are being governed and managed overwhelmingly by dunderheads just bumbling along, equipped with less than the most basic empathy, and lacking the mental faculties required to understand exponential growth or conditional probability or even percentages. And people — intelligent people — are seriously pondering the situation with utter befuddlement: how the hell did we get here?

The German writer Max Buddenbohm recently asked his followers a question to that effect on Twitter, which I am taking the liberty to translate:

Do you have a reflected opinion on why everything is so poorly organized, as in at its core? Historically or sociologically, what’s the real principal reason? How did this happen?

Now it’s perhaps a bit amusing and stereotypically German to complain of ſchlechte Organiſation! in the middle of a pandemic and global climate cataclysm, but the sentiment behind the question is sound: it looks as though at every twist and turn, those empowered to make any kind of decision make the wrong one, or none at all, or — the worst — aren’t even able to come up with sensible options between which to choose.

And I have a hypothesis: I think the issue at the core of why everything appears to be going down the tubes is that we have systematically drummed creativity out of people, for at least fifty years. And as a result, we collectively have no idea how we can get ourselves out of a rut.

Let me swiftly explain what I mean by “creativity.”

The wonderful Sir Ken Robinson, who left us much too soon in 2020, described creativity as “the process of having original ideas that have value.” And in arguing for the value of creativity, we don’t need to get all hippy-touchy-feely: creativity — and teaching and learning creativity — is a simple economic necessity that is also essential to our survival as a species.

If we have no idea what our world will look like ten years — or even ten months — from now, then how the hell is any “hard skill” we acquire today guaranteed to be useful in future challenges? The paramount faculty we need to acquire, train, and nurture is the ability to come up with flexible, intelligent, creative solutions to the problems we’ll encounter tomorrow.

And this we haven’t been doing. From an early age onwards, generation after generation has been schooled in “the right way” to do things. And since “the right way” exists only for things we already know, we are now ensnared in a trap called conformity that is no good at all in the present situation where so many of us are confronted with things we haven’t a clue about.

To illustrate what I mean, allow me to offer an anecdote from my own personal family experience. When my son, who is now nearly an adult, was a nine-year-old pupil in his fourth year of primary schooling, he started to be tasked with writing little stories — it would be an exaggeration to call them “essays” at that point — in school. And, given the fact that he was quite an imaginative kid, the first couple of stories he wrote were truly charming and delightful. But after a few weeks, something strange happened: his stories were getting rather bland and boring, and were hardly a reflection of his vivid imagination anymore. So as his parents, we gently queried about this mysterious phenomenon — and he was quite happy and forthcoming to explain the reason. He had observed, he informed us, that the fellow pupils of his that had got good marks and the teacher’s appreciation on the first stories they had written were the ones that had turned in the writing with the fewest spelling and grammar mistakes. And apparently he resolved then and there to henceforth only turn in stories that were composed of sentences strung together from words he already knew how to spell, using constructions that he was already familiar with — and that he was thus unlikely to stuff up. Of course that made reading the stories about as exciting as watching paint dry, but it kept the teacher happy and thus, off his back.

And I can attest that this very much goes for my own schooling as well. It doesn’t matter if we’re talking about my German classes or English classes or French classes, “correctness” always prevailed over originality or wit or creativity. I don’t mean to insinuate that clever writing should get you a free pass to absolutely butcher your spelling and grammar, but then on the other hand perfect orthography and punctuation did always make up for abject boredom, and that’s not quite right either. And this wasn’t restricted to just language classes: in maths exams there was no extra credit to be had for arriving at the correct solution of a problem in a novel or unconventional way. Nay, such brazen nonconformity would net you either a reprimand from the teacher for not showing the correct path to the solution, or at least a snide remark of the “oh you think you’re very clever don’t you, now sit down and behave” type.

My English and French and German teachers appear to have been unaware of another fact related to their institutional correctness obsession. Consider this: people who actually make a living from writing — no matter if it’s fiction or non-fiction — tend to work with editors, people whose calling it is to not only correct issues with orthography or punctuation, but also to helpfully point out plot holes and suggest the occasional rewrite of dialogue or rearrangement of chapters. Editors are highly respected by writers and instrumental to the success of a book but yet, strangely, these people’s names are normally not printed in bold letters across the book cover, and they also don’t appear in best seller lists. If my language teachers had been correct, editors should be celebrity superstars! And they should hold far greater prestige than the silly authors who only come up with the storylines but regularly struggle with the placement of a semicolon.

And the problem with the idea that errors are awful, and the fact that that idea is being hammered into our heads from an early age, is that this has a devastating effect on our creative imagination:

I don’t mean to say that being wrong is the same thing as being creative. What we do know is: if you’re not prepared to be wrong, you’ll never come up with anything original.

— Ken Robinson, “Do schools kill creativity?” (2006)

So, if you want people to be boring and dull and utterly devoid of originality, then foster a culture where being error-free is paramount.

In the business world, some tend to look down on others who have a tenuous relationship with spelling and grammar and punctuation, chiding such deficiencies as “unprofessional.” But at the same time, it’s commonly accepted to be dragged into a meeting and then forced to listen to someone drone on for an hour in a narcotic monologue consisting of the recitation of thirty-four wall-of-text slides with precisely seven bullet points each, in a ferocious assault on everyone’s attention and consciousness that gives an overdose of Valium a run for its money. How, pray tell, is it ever “professional” to steal people’s precious lifetime by boring them out of their fucking minds? And yet, this is somehow acceptable behavior in the world of business.

Let me add another bit of anecdotal first-hand experience: a few years ago when I was making most of my living as a travelling technical consultant, I was often brought in to help a team of engineers chart a path for solving a particular problem using one of the technologies I knew a little bit about. And in doing so, some decisions frequently boiled down to two choices, which I was always happy to lay out in detail: here is option A, it comes with these advantages and disadvantages, and here’s option B, it comes with those advantages and disadvantages. And I would explain to my client that it is now their business decision to determine whether the pros of A outweigh the pros of B for their business, and whether or not they would be able to live with the cons of whatever option they chose.

And inevitably, the reactions to this nearly always fell into one of two categories.

The first category was gratitude for me having laid out the options clearly and distilled the pros and cons of each, informed by my technical expertise on the matter and my understanding of their situation. And the business decision was either completely obvious to them, or they appreciated having a good basis on which to make a decision, which they resolved to make in the following days or weeks, presumably after some more empirical, experimental evaluation.

And the second category was complete confusion on a person’s face, followed by the exasperated question, “so what do you recommend?” Or, worse, “what’s the right way?”

You can probably guess which category of answers was more likely to come from managers — or “leaders”, as such people like to be called in a gross exaggeration of their capabilities.

And now imagine someone having gone through a conventional primary and secondary education, then continued on to university and from there into business or the law, or maybe the academic Ph.D. track, while in parallel having risen through the ranks of that cult of rigidity and conformism called a political party — and ultimately entering public office.

On that note, another illustrative anecdote. Not from myself or my family but still close to home: you may remember how early in the Covid-19 pandemic in February 2020, the Tyrolean ski resort of Ischgl became Central Europe’s first major infection hotspot, directly linked to at least 600 cases in Austria and more than twice as many across Europe. (At the time, 1,800 confirmed Covid cases seemed like a lot. As I write this, we have about 10 times as many. Per day.) The universal understanding of the majority of observers at the time — and today — was that the situation on the ground had been horribly botched, and that egregious mistakes were made that greatly facilitated the spread of Covid-19 across Europe.

The official in charge of public health in the provincial cabinet then went on national news a couple of weeks later to discuss the events. And, despite the repeated questions from the exasperated interviewer, voiced with increasing levels of disbelief, the official kept insisting that the authorities had “done everything right” and were not to be faulted at all for their actions — and inactions — in this public health emergency.

And I don’t even think that this was just hubris or an attempted cover-up. Rather, it’s a symptom of the exact problem I’m trying to describe. If you’re applying only what you already know to a situation that’s never been here before, you’re failing horribly at dealing with that situation. But if you’ve been conditioned that applying “the correct solution” is all you’ll ever have to do in life to succeed, you eventually end up genuinely believing that that’s enough.

And that’s how, in the greatest crisis that humanity has faced in peacetime in over a century, what we’re stuck with are leaders who, for the most part, have been so thoroughly molded by the perpetual vicious cycle of conformism that they now operate with the decisiveness and agility of a herd of mammoths deep-frozen in permafrost. They are just shockingly ill-equipped to deal with a global health crisis affecting a closely interconnected civilization. And the ones that actually used creative ways to get to power turn out to be sociopathic one-trick cronies that could not apply their skills to something useful if their life depended on it.

It looks as though systematically, those who have the power to make decisions, at all levels — in business, politics, anywhere — frequently lack the intellectual, emotional, and empathetic creative capabilities required to make those decisions. This is not to say that exceptions to this rule don’t exist — I’m lucky enough to live in a town where officials from the mayor on down have been exceptionally creative, empathetic, and successful in their Covid response, for example — but I would argue that the rule does stand.

Now, it would be entirely fair to counter my arguments with the observation that surely, in, say, the 1950s or 1930s or 1910s, schooling and education were still more rigid than they are today, and certainly did not allow for more creative freedom than they do now. And that is certainly true, but there is something that children (at least those lucky enough to go to school; I am aware that for those who spent their childhood toiling in the fields or coal mines it was an entirely different story) had during their schooling in those days that is a precious rarity today: time and space to let their mind play.

John Cleese writes in his excellent book on creativity that mental play — the ability to let your mind wander and thereby become open to developing new ideas — is a key element of the creative process. And this is by no means limited to music or literature or the arts; some of the examples he lists of mental playfulness leading to groundbreaking new ideas are from the world of science. Now, it is essential to be able to keep our mind in that state of playfulness for some time, because it takes a little while for new ideas to pop into our head. And so, Cleese observes:

The greatest killer of creativity is interruption. It pulls your mind away from what you want to be thinking about. […] It might be an interruption from outside, like someone coming over and talking to you, or an email popping up in your inbox. Or it may come from inside, as when you suddenly remember something you’ve forgotten to do, or worry that time is running out, or that you don’t think you’re clever enough to solve whatever problem it is you’re trying to deal with.

— John Cleese, “Creativity: A Short and Cheerful Guide” (2020)

And this is the bit that’s incredibly difficult to achieve for most people born after about 1970, and many born earlier too. That includes any child alive today, but also their parents and many of their grandparents. We are constantly dealing with outside interruptions, many of them coming from a device we carry in our pockets all day. We have to fight for our uninterrupted mental play time. A child in the 1950s might just run off to play with friends for the afternoon, and return home for supper, without a text from a parent or a Snapchat message from the school bully rudely interrupting halfway through. It’s perhaps no coincidence that some of those 1950s children ended up being 26-year-olds who could figure out powering up a disabled spacecraft while it’s on a free-return trajectory around the Moon, saving the life of a three-man crew in the process.

There’s more evidence that even in the world of engineering, allowing your mind to let go for a bit can lead to creative breakthroughs: Jim Crocker was the Ball Aerospace engineer who solved the problem of how exactly the corrective optics on the Hubble Space Telescope should be installed — obviously not a scenario that anyone accounted for in Hubble’s design. The ingenious idea that ended up saving Hubble from being a multibillion dollar boondoggle came to him in the shower.

Also, in offices in which we worked with maybe one other colleague in the room, we had stretches of time where the other person was off running an errand in town or taking a meeting in a conference room. And you could close the door and put up a “do not disturb” sign and do some uninterrupted thinking.

Around the year 2000, most of that started to change, dramatically. Open-plan offices, which of course were allegedly introduced to “facilitate cooperation,” eliminated any room for uninterrupted mental play at work. Emails replaced typed and printed memos, and office workers transitioned from workstations to laptops. Pagers and cell phones started to buzz people at home. Around 2010 we progressed to smartphones, tablets and other portable devices that had the ability to ping us out of playful thought with an audible notification at any time of day. Knowledge work became interrupt-driven: a sentence that in itself should make anyone shudder who knows anything about knowledge work at all.

And it’s no surprise that people who rose to corporate leadership after being imprinted by an interrupt-driven lifestyle — that is, people now in their 50s — think that such a thing is normal, and try to impress the same thing on their subordinates. That’s how you get to Slack-driven companies. That’s how you end up in a culture where people are proud of getting back to any email within 30 minutes (or less), and expecting the same from everyone. That’s how you end up with managers who take being signed into a chat (and thereby constantly listening for interruptions) as a measure of being “active”, so much so that they end up tracking metrics for it, and of course also making it a target they call “engagement” or some other abomination.

Organizations that do that are at risk of constantly shutting down creativity, problem-solving, and innovation. What they’re good for is developing efficient cookie-cutter techniques for optimizing solutions for yesterday’s issues. What we need today — in a global pandemic, and in the roiling climate crisis that’ll make this pandemic look like a walk in the park — is people and organizations equipped with the mindset for the issues of tomorrow. And a first imperative for making that happen is to let people think.

So is everything all doom and gloom? I’d say it isn’t, and there are indeed some things that make me hopeful. Some of those are related to a changed approach to creativity in education, some, to a changing approach to work.

For example, in my country, creativity and originality have for a few years now been accorded at least some merit in marking and grading standards. And this is at the secondary school level, traditionally one of the most rigid and unchanging branches of education where I live. Much more still is happening at the primary and preschool level, where I see much more emphasis on thinking, creative play, and innovation in my younger kids’ education than I did in my older ones’.

And then, there’s the big push towards asynchronous distributed work — where obviously the asynchronous part is the bit that matters. Sure, companies suffering from offissification still exist, but for a while, so did dodos. But an ever-increasing share of humanity is beginning to understand what it’s like when you’re no longer shackled to seventeen Slack channels you constantly need to watch, when you can take a walk in the middle of the day because you know it’s OK to push something off for an hour to clear your head and come up with an idea, and when instead of spending an hour in a meeting you can spend 5 minutes reading a memo and use the other 55 minutes for thinking. And that’s where things get interesting.

tag:xahteiwi.eu,2021-11-21:/blog/2021/11/21/creativity/

Meaningless Metrics, Treacherous Targets

Florian Haas Nov 14, 2021 Updated Nov 14, 2021

A quick introduction to Goodhart, Strathern, Campbell, Yankelovich, and McNamara.

A common feature of organizations in the software technology industry (but certainly not only in that industry) is their fixation on metrics, measurements, and quantifiers. I understand that this is frequently done and advocated for in the spirit of making management more objective, less arbitrary, more scientific, and perhaps fairer. But since they say that the road to hell is often paved with good intentions, here’s a quick summary of what we know about the undesirable side effects of such an approach.

Goodhart’s Law

British economist Charles Goodhart wrote in 1975, in an article about British monetary policy:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

— Charles Goodhart, “Problems of Monetary Management: the U.K. Experience” (1975)

That’s a mouthful of somewhat niche technical jargon, but let me try to paraphrase it like this:

You collect some data.
You crunch the numbers using statistics.
You observe a pattern.
You distill a value (a “statistical regularity”) from it.
Someone decides that that value should change: it is too high or too low.
Someone — an individual or a group — is tasked with bringing that value up or down, and then keeping it high or low, or rising or falling, or above or below a particular threshold.
That value now is no longer a useful statistical indicator.

What you probably knew as Goodhart’s Law if you’d heard about it prior to reading this article is a generalization by anthropologist Marilyn Strathern, also from the UK:1

When a measure becomes a target, it ceases to be a good measure.

— Marilyn Strathern, “‘Improving ratings’: audit in the British University system” (1997)

Why is that so? It’s because once you make the measure a target that has an influence on people (for example, meeting it gets them a bonus, failing at it gets them a demotion), you have wired them to improve the measure, and not necessarily to improve the underlying conditions that the measure originally arose from. Therefore, they might opt for gaming the measure, because that gets them to their goal (a promotion, for example) more quickly and at less effort to them.

Furthermore, even keeping the option of fudging the numbers aside: when faced with a choice between doing something that might have a negative effect on the measure and something else that might have a negative effect on something other than the measure, people will tend to choose the latter. This may lead to situations where people avoid an activity with significant inherent value, just to avoid depressing a measurement — a concept known as creaming.2

For example, a hospital may be interested in measuring individual surgeons’ intraoperative death rates: the percentage of a surgeon’s patients that die in the middle of surgery. On its face, this metric could help weed out bad surgeons. If a particular surgeon is an outlier and has way more patients dying on their operating table than their peers, it’s possible that that surgeon might be doing something wrong: they could be incompetent, or frequently intoxicated, or even be a Dr. Death type serial killer.

It gets tricky, though, when in the interest of transparency the hospital doesn’t just fire or retrain incompetent surgeons which it identifies based on such statistics, but when it “publishes” the patient mortality data. (I use quotes here because this does not necessarily mean sharing it with the general public, but perhaps sharing it with all of the surgical staff.) At that stage, an individual surgeon’s rank in the statistics will become at least a matter of pride, status, and prestige, even if it’s not otherwise rewarded in any way, nor seen as a precondition for continued employment.

This, then, will incentivize surgeons to avoid taking on risky surgeries where there is a significant chance of the patient dying mid-surgery, even though such surgeries are typically attempted in the first place to save the patient’s life, in the course of an immediate major emergency. Thus, Dr. Alpher who only ever treats torn knee ligaments might look better in the ranking than Dr. Bethe the polytrauma specialist, or Dr. Gamow the neurosurgeon who specializes in particularly challenging malignant brain tumor removal. If there is a non-negligible risk of intraoperative death for a particular brain cancer patient and such an event would be bad for Dr. Gamow’s ranking, then Dr. Gamow might have an incentive to declare that patient inoperable, and as a result the patient would certainly die, just not in surgery.3
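
Here's a toy calculation of that effect. All the risk figures are invented purely for illustration (they are not real surgical statistics), and the `Caseload` class is just a hypothetical helper. It compares the expected intraoperative death rate for Dr. Alpher, for Dr. Gamow taking every patient, and for Dr. Gamow after turning away the riskiest cases.

```python
from dataclasses import dataclass


@dataclass
class Caseload:
    surgeon: str
    risks: list[float]  # per-patient chance of intraoperative death (invented numbers)

    def expected_death_rate(self) -> float:
        """Average risk across all patients the surgeon actually operates on."""
        return sum(self.risks) / len(self.risks)


alpher = Caseload("Dr. Alpher (knee ligaments)", [0.0005] * 200)
gamow = Caseload("Dr. Gamow (brain tumors)", [0.02] * 80 + [0.15] * 20)

# Creaming: Dr. Gamow declares the 20 riskiest patients inoperable.
gamow_creamed = Caseload(
    "Dr. Gamow, declining risky cases",
    [risk for risk in gamow.risks if risk < 0.10],
)

for caseload in (alpher, gamow, gamow_creamed):
    print(f"{caseload.surgeon:34s} {caseload.expected_death_rate():.2%}")
```

In this made-up scenario Dr. Gamow's number improves substantially, yet nothing about the surgeon's skill has changed, and the 20 patients who were turned away are worse off. Dr. Alpher still tops the ranking without ever facing a comparable case.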

Campbell’s Law

Although less well known than Goodhart’s law, Campbell’s law is closely related and, in my humble opinion, just as important.

Donald T. Campbell, a U.S.-based social scientist, wrote in 1976, on the subject of standardized testing in education:

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

[…]

Achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.

— Donald T. Campbell, “Assessing the impact of planned social change” (1976)

In other words, if you conduct a one-time evaluation of student achievement across many students in multiple schools, then the fact that the test is standardized might help in achieving comparable results. However, as soon as you make the tests a repeat occurrence, and tie students’ test results to school funding allocations, teacher salaries, or even just school prestige, you’re undermining their original purpose: teachers will now spend a significant portion of their time and effort to ensure that students score well on the test, rather than build the competence that the test was originally designed to measure.

This is an example of allocating resources (teacher and student time and effort) to an activity with no inherent value (taking a standardized test) just to improve a measurement (the test score). And since the resources are finite, spending them on the activity with no inherent value (test-taking) makes less of them available to the inherently valuable activity the indicator is intended to assess (teaching and learning). This is the “corruption and distortion” Campbell talks about.

The McNamara Fallacy, and the Yankelovich Ladder

Closely related to Goodhart’s, Strathern’s and Campbell’s observations is something called the McNamara Fallacy.

Robert McNamara, U.S. Secretary of Defense during much of the Vietnam war, infamously believed that he could scientifically measure the progress of the war by quantitative indicators alone. One of his favourites was body count, the number of enemy personnel killed, in comparison to friendly casualties. The rationale appears to have been, whatever other factors (qualitative or quantitative) are in play, whichever side kills more of the other wins the war. Indeed he seems to have been inclined towards ignoring all non-quantitative indicators of how the war was going.

An anecdote told by U.S. Air Force general Edward Lansdale alleges that he (Lansdale) pointed out to McNamara in a briefing that McNamara, when assessing the progress of the war, failed to take into account the feelings of the common rural Vietnamese people. McNamara then allegedly pencilled an item saying “feelings of the Vietnamese people” onto his list of things to keep track of, pondered it for a moment, and then erased it — reasoning to Lansdale that feelings cannot be measured, thus must not be important.4

This is step 3 on a progressive scale social scientist Daniel Yankelovich described a few years later:

The first step is to measure whatever can be easily measured. This is OK as far as it goes.

The second step is to disregard that which can’t be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading.

The third step is to presume that what can’t be measured easily really isn’t important. This is blindness.

The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.

— Daniel Yankelovich, “Corporate Priorities: A continuing study of the new demands on business” (1972).

And it’s somewhat remarkable just how often businesses and organizations fall into this trap, fifty years later. They might not end up at step 4, but falling for step 2 or 3 is bad enough.

An applied example

Let’s now turn to an example from our industry. Something that’s so important, evidently, that it has given rise to a whole discipline in our field: site reliability.

Now it’s perhaps a bit amusing that although you can find myriads of articles describing what site reliability engineering (SRE) is, a definition of “site reliability” lives only in a small footnote of the Google SRE Book:

For our purposes, reliability is “The probability that [a system] will perform a required function without failure under stated conditions for a stated period of time”.

— Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Murphy, “Site Reliability Engineering: How Google Runs Production Systems” (2017)5

But, at least there is a definition, which is good. Now I think it’s reasonable to say that the following two statements about site reliability are probably true:

In keeping with SRE reflecting a holistic approach to engineering, trying to unify a multitude of considerations, site reliability is not something we can judge by a single, numerical, universal, and useful metric. You can’t measure a single “site reliability score”, and then compare hundreds of platforms based on that.6
Whatever site reliability is as a whole, it certainly includes a site’s ability to process your data and not mangle it. So, if you upload your data into a platform, you want to be able to do something useful with it.

SRE tends to rely on service level indicators (SLIs) to measure compliance with service level agreements (SLAs), manage error budgets, and generally keep track of what shape the site/platform is in.

So, let’s compare two indicators that differ greatly in their measurability.

Availability is exceptionally easy to measure for, say, a REST API. You send a request with a defined payload, you measure the time it takes to serve your request, you check the status code, you check whether the response contains what you expect, and you record a data point.
Durability is much more difficult to measure at any given point in time. Effectively, to properly take a data point for durability at the same time as getting one for availability, you’d have to read back some data you wrote, say, a year ago, and check its content against something like a known hash.7 And also write some data now, travel a year into the future, read it back at that point, travel back into the present,8 and record your data point.
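
To make the asymmetry concrete, here is a minimal Python sketch of both checks. Everything in it is an assumption for illustration: the `probe` and `verify_durability` helpers, the `fetch` and `read_back` callables (stand-ins for a real HTTP client and a real storage read), the one-second latency limit, and the choice of SHA-256 as the known hash. It is not taken from any particular monitoring tool.

```python
import hashlib
import time
from typing import Callable, Tuple


def probe(fetch: Callable[[], Tuple[int, bytes]], expected: bytes,
          max_seconds: float = 1.0) -> dict:
    """Take one availability data point.

    `fetch` is a hypothetical callable that performs the request and
    returns (status_code, body); in real use it would wrap an HTTP client.
    """
    started = time.monotonic()
    try:
        status, body = fetch()
    except Exception:
        # A request that raises counts as a failed probe.
        status, body = 0, b""
    elapsed = time.monotonic() - started
    ok = status == 200 and expected in body and elapsed <= max_seconds
    return {"ok": ok, "status": status, "seconds": elapsed}


def verify_durability(read_back: Callable[[], bytes], known_sha256: str) -> bool:
    """Take one durability data point for a single, previously written object.

    `known_sha256` must have been recorded at write time and kept outside
    the system under test.
    """
    return hashlib.sha256(read_back()).hexdigest() == known_sha256


# Stand-ins for a real service, so the sketch runs without a network.
payload = b'{"status": "ok"}'
digest_at_write_time = hashlib.sha256(payload).hexdigest()

availability_point = probe(lambda: (200, payload), expected=b'"ok"')
durability_point = verify_durability(lambda: payload, digest_at_write_time)
corrupted_point = verify_durability(lambda: payload + b" ", digest_at_write_time)
```

The probe can run every few seconds and always yields a fresh data point. The durability check only says something about one object, and only if the hash recorded at write time still exists, which is a large part of why it is so much harder to turn into a continuous indicator.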

Now before I continue I’d like to add another thought on the issue of data durability: not every platform is a storage solution. In other words, you don’t always have the option of reading your data back verbatim. Say for example you’re feeding an inordinate number of data points into a platform that ingests and aggregates them. You may not even be interested in the original data some months or years down the road, so it might be acceptable (and even necessary, as dictated by cost concerns) to discard the original data immediately after it has been processed. And that rules out the possibility (or necessity) of ever reading it back exactly as it went in. But you will be interested in the statistics that you generate based on the aggregated data.

And now suppose there is a subtle bug in the implementation of the aggregation algorithm. As in, the algorithm itself is perfectly fine, but there’s a flaw in the implementation. That, too, may render part of your data unusable or outright invalid, violating data integrity and durability.

But the tricky part here is that availability is easy to measure. Data durability isn’t. Therefore, availability lends itself to becoming a target (hello, Professor Strathern), and durability tends to be seen as difficult to measure and hence less important (hello, Secretary McNamara).

So now, if you find yourself in charge of a system that you suspect has started to corrupt a significant fraction of customer data, data which customers are pouring into it at an alarming rate, what do you do? You’re not sure whether there’s actual corruption yet. The proper thing to do, if it’s impossible to rule out or fix the data corruption problem immediately,9 is probably to stop intake, and also ensure that no requests are served that may touch potentially corrupted data — that is, shut the service down even before you’ve ascertained corruption. But if you suspect that your next bonus payout or promotion may rely on you meeting your availability goals, and you know you’re already shaving it close with your availability error budget, would you really be inclined to do that?
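
For a sense of how little room an availability error budget leaves, here is a small worked calculation in Python. The 99.9% target, the 30-day window, and the 30 minutes of downtime already spent are assumed example values, not figures from any specific SLA, and `error_budget_minutes` is a hypothetical helper.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target allows."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


budget = error_budget_minutes(0.999)   # assumed 99.9% availability target
already_spent = 30.0                   # assumed minutes of downtime so far
remaining = budget - already_spent

print(f"budget: {budget:.1f} min, remaining: {remaining:.1f} min")
```

Under these assumptions the whole window allows about 43 minutes of downtime, of which about 13 remain. Stopping intake to investigate suspected corruption would very likely use up more than that, which is the incentive problem in a nutshell.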

“You can’t manage what you don’t measure”

There’s a popular saying in management circles that takes one of the following forms:

“If you can’t measure it, you can’t manage it.”
“You can’t manage what you don’t measure.”
“You can’t manage what you can’t measure.”

Whichever variant you discuss, it is commonly attributed to either Austrian-American management thinker Peter Drucker, or to American engineer and statistician William Edwards Deming. Drucker is seen by many as highly influential in management theory; Deming developed groundbreaking sampling techniques used on the massive scale of the United States census. So either of them would be an authority on management and measurement, lending high credibility to the statement.

There’s just a small problem: neither of them appears to ever have said or written anything to that effect.

The closest that one of them, Deming, ever wrote was:

It is wrong to suppose that if you can’t measure it, you can’t manage it — a costly myth.

— W. Edwards Deming, “The New Economics for Industry, Government, Education” (1993)

In case you didn’t notice, the point this makes is the exact opposite of the popular version of the quote. It’s so wrong that it comes close to the corruption of the Seneca lament, “non vitæ sed scholæ discimus”, “we learn not for life but for school,” of which you surely learned the inverse… in school.

Metrics-obsessed managers often take the misquote for gospel. So much so that they frequently see issues where a qualitative approach is obviously necessary, and they still try to apply quantification.

My standard example of this is employee satisfaction surveys.

Ultimately, what leadership should be interested in learning from those surveys is how good people feel about working in the company. There are a number of factors that contribute to this: are they overloaded, well utilized, or bored? Are people treating each other with respect and kindness, or malice and contempt? Does everyone feel that they are doing something meaningful, or do they all hate their work and are in it solely for the money? All these things are inherently qualitative. And the company could do a great job by hiring a person trained in sociology or psychology, who sits down with people for confidential qualitative interviews, and then prepares a research report with findings and recommendations that management can act on.

But no, we have to measure. Make everyone take an online survey where they rate everything on a scale of 1 to 5. Do you know what that is? Exactly, step 2 on the Yankelovich ladder. Give that which can’t easily be measured an arbitrary quantitative value — because that’s what it is, arbitrary. People from different cultures won’t agree even on what a simple 5-step scale really means.

And depending on what version of the faux quote they adhere to, a manager may even be farther up the ladder:

If they say “you can’t manage what you don’t measure” (with the translation being “I won’t concern myself with anything for which I don’t have quantitative data”): that’s step 3, blindness, that which isn’t measured isn’t important.
If they insist that “you can’t manage what you can’t measure” (with the translation being “I won’t concern myself with anything that isn’t quantifiable”): that’s step 4 (suicide), that which isn’t measured doesn’t exist.

So, what now?

Every article and book on bad metrics ends on a positive note, giving you suggestions for “good” metrics: for example, make them hard to game, make sure they are defined by competent experts, ensure that they are in line with inherent ideas of respectability and professionalism. Honestly, I’ve yet to come across a metric that ticks all these boxes.10

So, I am aware that if you are running a platform under an existing SLA, you will be running under some metrics of questionable utility that you cannot get rid of — just because they happen to be industry standards.

However, instead of expanding metrics obsession to your entire organization by introducing ever more counterproductive metrics, I want to propose a different approach:

Whatever you measure, make the marginal cost of a measurement negligible.12 The cost of adding a new metric should be practically zero. The moment someone has to repeatedly spend time on collecting and compiling the data, they can’t spend that time on doing productive work (and Campbell says hi), so you want to avoid that.
This effectively means that all the systems you care about (machines, services, applications) should generate collectable data points, everywhere, all the time.11 And you probably won’t be collecting metrics from anything else. In other words, you are just measuring that which is easily measurable, and you stay aware that there are a lot of things you don’t measure that are just as important. You stay on step one of the Yankelovich ladder.
Now, I’d propose you make the data thus ingested available throughout your organization, in machine-readable form and using standardized APIs. You want people to actually discover things from your data.
Encourage people to use real, scientific, statistical methods to figure out statistical regularities (“indicators”). Offer statistics training to people who are interested.
Once someone identifies a statistical regularity, encourage them to form an opinion of whether it would be beneficial for it to go up or down, formulate a hypothesis on what change to your system would have the desired effect, and conduct an experiment. If the experiment has no effect, roll back the change and proceed with the next hypothesis. If it has an adverse effect, roll back and try the opposite. If it has the desired effect, keep the change. Move on to discovering the next regularity. Resist the urge to make the discovery a target. (Otherwise, Strathern will drop by.)
Constantly observe and identify things that are important, but not measurable. Apply qualitative analysis, emotion, and empathy. (Otherwise, McNamara will introduce himself.)

So, is there anything inherently wrong with measuring or measurements? Nope. But making them targets, introducing arbitrary quantifiers, and ignoring everything else is.

The reason the condensed version is called “Goodhart’s Law” and not “Strathern’s Law” is apparently due to a coinage by British researcher Keith Hoskin, who wrote a year prior to Strathern, in a paper she cited:

“Goodhart’s Law” — that every measure which becomes a target becomes a bad measure — is inexorably, if ruefully, becoming recognized as one of the overriding laws of our time.

— Keith Hoskin, “The ‘awful idea of accountability’: inscribing people into the measurement of objects” (1996)

↩
If you think that term sounds a bit odd, I’d agree. I guess it comes from the idea of milking a cow and then skimming only the cream, discarding the rest. ↩
The surgery statistics example of creaming is paraphrased from Jerry Z. Muller, “The Tyranny of Metrics” (2018). ↩
The Lansdale/McNamara anecdote is paraphrased from the Wikipedia article on the McNamara Fallacy, which in turn cites Rufus Phillips and Richard Holbrooke, “Why Vietnam Matters: An Eyewitness Account of Lessons Not Learned” (2008) as its source. ↩
It should be noted that the SRE book is itself quoting a definition of reliability found in Patrick D. T. O’Connor & Andre Kleyner, “Practical Reliability Engineering” (2012). ↩
The irony is not lost on me that by the definition quoted in the SRE book, such a score absolutely should exist if its definition of reliability were adequate: it claims to be a probability. Probabilities go from 0 to 1. That would make site reliability a dimensionless quantity between 0 and 1, end of story. But it goes without saying that such a score would be “an arbitrary quantitative value”, which would put it on step 2 of the Yankelovich ladder. ↩
That hash would have to be separately stored outside the system. If the hash is stored alongside the data whose integrity it’s meant to protect, then it only guards against unintentional data corruption, but not against deliberate manipulation. ↩
I wish to point out that the only bit that’s impossible here is the backwards time travel. The forwards time travel is fine, we all travel forwards in time all the time, just at a constant rate of one second per second. ↩
I’ve run into a few issues of suspected silent data corruption in my career and I’ve never been in the situation where a reliable fix was available immediately. ↩
In particular, pretty much any real-world metric fails the “hard to game” test. Said Lukas Grossar on Twitter: “It always amazes me that people don’t believe that slapping a KPI onto something won’t lead to people gaming that KPI. We’re engineers for God sake, making broken stuff work in our favor is basically our job description.” ↩
I’d argue that this requires strong privacy guarantees for your users/customers. Effectively, just don’t collect data that’s none of your business. ↩
Emphasis on marginal. It’s obvious that the fixed cost of building and maintaining an instrumentation platform and metric system is nonzero. But once you’ve got it set up, the cost of adding a new metric should be substantially zero. ↩

tag:xahteiwi.eu,2021-11-14:/blog/2021/11/14/meaningless-metrics-treacherous-targets/

Warnock’s Dilemma, Objections, and Acknowledgements

Florian Haas Oct 30, 2021 Updated Oct 30, 2021

“Warnock’s Dilemma” is a classic feature of distributed, asynchronous, online communications. You can’t avoid it, you can’t work around it, but you can deal with it with two simple changes to your communicative behavior.

People skeptical of distributed, asynchronous, written communications sometimes make the understandable objection that it is often difficult to interpret the reactions, specifically the absence of reactions, to written online communications.

The reasoning goes like this: if you inform someone of something in a face-to-face conversation, there is practically no way for them not to provide some sort of feedback. Even if the recipient of a verbal message doesn’t say a word, they usually exhibit some unconscious, nonverbal reaction, which can carry a whole load of information:

The person might smile, light up, and become actively engaged,
they might express surprise (pleasant or unpleasant),
they may show signs of dismay or annoyance, or even anger,
they might just stare or wander off, indicating disconnection or indifference,
or anything in between.

In online, textual communications, people obviously also exhibit all those reactions, sitting at their desk, lounging on their couch, walking with their phone — it’s just that the sender of the message usually never gets to see them.

In addition, a face-to-face conversation is a one-to-one communication mode that we direct our undivided attention to. In contrast, textual online communications are often many-to-many, and we usually get many more parallel inputs than we do when we’re speaking to a colleague or acquaintance or friend.

That means that while in a face-to-face conversation we’re always answering, or at least showing our reaction to what was said, in online textual communications we must pick and choose what to react to, and what to just absorb without providing any kind of feedback to the message sender.

Warnock’s Dilemma

In online communities this has been known since at least 2000, when Bryan Warnock formulated it as “the ostrich theory”, although it eventually was named “Warnock’s Dilemma”1 by Dave Mitchell. Writing about mailing list posts without replies, Bryan wrote:

The problem with no response is that there are five possible interpretations:

1) The post is correct, well-written information that needs no follow-up commentary. There’s nothing more to say except “Yeah, what he said.”

2) The post is complete and utter nonsense, and no one wants to waste the energy or bandwidth to even point this out.

3) No one read the post, for whatever reason.

4) No one understood the post, but won’t ask for clarification, for whatever reason.

5) No one cares about the post, for whatever reason.

— Bryan Warnock, “Re: RFCs: two proposals for change”, perl.bootstrap mailing list, 2000-08-07

In asynchronous and distributed work communications, we have much the same issue. The beautifully crafted five-paragraph briefing that you sent out this morning may have been considered manna from heaven by your recipients (if you’re a manager, your recipients are usually your direct reports), and they immediately sprang into action energized by your electrifying leadership. Or maybe nobody understood a word of the unintelligible drivel you concocted, but out of respect or courtesy they are very hesitant to point this out.

So in my humble opinion, there are two very simple things you can do as a manager to address Warnock’s Dilemma in your distributed team: making it a habit to specifically encourage objections, and establishing a culture of acknowledgements.

Encouraging objections

The habit I have developed to encourage objections is to not merely ask the recipients of a message to raise questions if they have them, but to ask them to poke holes in whatever I’ve been writing.

To that end, I have standing phrases that I use, such as:

“Do you think that sounds reasonable?”
“Did I overlook something important?”
“Can you think of a better way to do this?” (Better than my suggestion, that is.)
“I’m pretty sure I’m missing something here, can you pitch in?”
“Am I way out in left field with this?”
“How nuts of an idea is this?”

I might use a variation on one or several of these phrases at the end of an email, but also in the comments section of a wiki page (or in individual inline comments), or even in the reply thread of an issue tracker.

This serves multiple purposes:

There are many individual areas of knowledge where someone on my team is more of an expert than I am. Obviously, I want those people’s ideas on the table.
It establishes the notion that nobody’s opinions or suggestions on technical matters are sacrosanct, and we want to do the right thing in any situation, not follow the hippo.2
It encourages others to ask for feedback in the same manner, whenever they float ideas or suggestions of their own.
It establishes that there’s nothing wrong with being wrong from time to time.

Acknowledgements

So now, on to acknowledgements, that is, what to do when you have no objections on something.

Here’s a general rule that I use: all communications should be acknowledged. Yes, really. Anything my team sends me, I try to reply to with at least “ack” or “OK”, but frequently it’s something like “great, thanks!” — it costs nothing to be kind.

Likewise, for everything I send to my team I can count on getting the same kind of reply back. There’s something I need to pass on from higher up? Or just something I want everyone to know? Out goes an email, in come a few “ack” replies over the next few hours. I don’t even have to specifically ask anyone for acknowledgement anymore, it just happens.3

In this context, we deliberately use written acknowledgements — as in, somebody actually types something, even if it’s just the two letters “OK”. I find Like buttons or thumbs-up or email “read receipts” (anyone remember those?) oddly perfunctory.

This has a nice side effect, in combination with encouraging objections: someone who — while knowing that objections are always encouraged — acknowledges an idea, opinion, or plan, actually makes it clear that they are on board with it.

To his credit, Dave points out in the same message that “dilemma” is technically inappropriate as the problem described includes five choices, not two. It’s properly a pentalemma. ↩
HiPPO: the Highest-Paid Person’s Opinion. Following the hippo is when you value ideas and opinions by seniority of their originator, not by correctness or factual merit. ↩
We’ve also codified this in my team’s communications guidelines. You don’t have something like that? Write them. ↩

tag:xahteiwi.eu,2021-10-30:/blog/2021/10/30/warnock-dilemma/

This Meeting Should Have Been an Email

Florian Haas Oct 27, 2021 Updated Oct 27, 2021

If you believe that it’s just a silly joke or an overused trope, please read this.

Fellow managers, there is an ongoing trope in just about any software technology or knowledge-based organization (and probably others, too) that goes like this:

This meeting should have been an email.

It’s such a well-established meme at this point that you can buy mugs saying so. Or cross-stitched “award certificates”, or ribbons. And yet, many of you appear to dismiss it as a nerdy joke, and refuse to take the sentiment behind it seriously.

And this even though you may agree that your organization has too many meetings. Even that you are in too many meetings. But you’re convinced that sadly, sadly you can’t cancel that meeting. Or that one. Or the quarterly financials update. Or the update about the shakeup in the CTO office. Or the meeting explaining at your level what the CEO just communicated to everyone via a video message or an email of their own.

You can do it. I’m here to help.

“Email” means any structured, written communication that allows for feedback

Let’s set one thing straight to begin with.

The standing phrase is “this meeting should have been an email” because that’s catchy. But that’s not to say that you actually need to write an email message. What it really means is that to communicate whatever it is that you’re trying to get across, you use a medium that

uses written expression,
allows you to formulate complex thoughts and reasoning in writing,
allows people to comment and share feedback, in writing,
ideally allows for that feedback to subsequently be worked back into the original writing.

You’ll see that, particularly considering the fourth item, email isn’t even the best option at your disposal. Instead, you can look at the following, additional options, all of which will probably be available to you in some form:

a shared flow-text document, like a Google Doc, a collaboratively edited Office 365 Word document, or a Nextcloud Text document,
a page in your organization’s wiki, like MediaWiki or Confluence,
or even a barebones shared text editor, like Etherpad.

So all of these are good.1 All of them are better than a meeting. With near certainty at least one of them is at your disposal.

Meetings burn people’s time

Meetings are gigantic time consumers. And the productivity gains from switching to well-structured written communications are enormous.

To illustrate, allow me to offer some first-hand experience. When I’m being called to attend a meeting, my colleagues will attest to the fact that I am a meticulous note-taker. I write meetings up in our corporate wiki, and I record notes, rather than producing a verbatim transcript. But I can guarantee you that I will write down every point that the attendees make that’s worth remembering or referring back to. This includes some key points that I do record word-for-word. I’ve been in meetings with 20 attendees of 1 hour in length. My meeting notes never go over 2,000 words for such a meeting, and usually they’re more like 1,000 words. So that means that for a meeting that burns 20 person-hours just to attend (that is, not including meeting prep), what actually gets said can be summarized in 2,000 words, tops.

Now, consider that the average silent reading rate for English speakers is approximately 240 words per minute. So people can read a 2,000-word summary in under 9 minutes, a 1,000-word one in about 4. In other words, by conveying the information in writing rather than orally, you can eliminate five-sixths to fourteen-fifteenths of that useless overhead. Or put differently, replacing an hourlong meeting with a well-written briefing gives each and every person 6 to 15 times more productivity. And that’s not even counting the benefits of eliminating the meeting as a forced synchronization point.
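
The arithmetic behind those figures can be checked with a few lines of Python. The 240 words-per-minute rate, the 60-minute meeting, and the two summary lengths come from the text above; the helper name `reading_minutes` is made up for this sketch.

```python
WORDS_PER_MINUTE = 240      # average silent reading rate cited above
MEETING_MINUTES = 60        # one-hour meeting


def reading_minutes(words: int, rate: int = WORDS_PER_MINUTE) -> float:
    """Return the time needed to read `words` words, in minutes."""
    return words / rate


for words in (2000, 1000):
    minutes = reading_minutes(words)
    speedup = MEETING_MINUTES / minutes
    saved = 1 - minutes / MEETING_MINUTES
    print(f"{words} words: {minutes:.1f} min to read, "
          f"{speedup:.1f}x faster than the meeting, {saved:.0%} of the hour saved")
```

The exact ratios come out at roughly 7 and 14, so the range of 6 to 15 quoted above is a rougher rounding of the same calculation, with reading times taken as 10 and 4 minutes.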

But writing things up means more work for me!

You may argue that although you understand that putting together a well-written briefing (instead of calling a meeting) saves everyone else time, it takes up more of your time.

Let me observe this: If you’ve been convening and chairing meetings of an hour, and you haven’t been spending about as much time preparing for that meeting yourself, then I’m sorry to break it to you but you may not have been a very conscientious meeting chair all along. In fact, you may have been rather disrespectful of other people’s time, and now is a very good time for you to change.

If however you have been a conscientious meeting chair and every one-hour meeting did, in aggregate, consume about one hour of meeting prep (including scheduling, collecting information, and preparing it so you have it all ready to go), then rejoice: the onerous scheduling-and-roping-everyone-in bit is gone, so that saves up a sizable chunk of your time, and you can punch out 1,000 to 2,000 words in 30-45 minutes. So, less work for you. Admittedly, not as dramatically so for you (the writer) as for your erstwhile attendees (now readers), but still pretty substantial.

(Not to mention the fact that team productivity gains in the order-of-magnitude range, see above, should make your heart jump with joy.)

OK but how? I don’t know where to start!

I’ve written about this before, but I’d like to come back to this again: if you’re looking for guidelines on structuring your writing for what you would otherwise communicate in your meetings, look at the 5-paragraph briefing format, adapted from the NATO 5-paragraph field order. If you make it your habit to at least think about this format, chances are that your briefing will be pretty damn comprehensive:

Situation
Objective
Plan
Logistics
Communications

Let’s break these down in a little detail:

Situation is about what position we’re in, and why we set out to do what we want to do. You can break this down into sub-points, like the customer’s situation, the situation of your own company, any extra help that is available, and the current market.
Objective is what we want to achieve.
Plan is how we want to achieve it.
Logistics is about what budget and resources are available, and how we can use them.
Communications is about how we’ll be coordinating among ourselves and with others in order to achieve our goal.

Sometimes you want to give not a full briefing, but a simple update, for instance because circumstances have changed. In that case, you may only include the first three items, and the changes that apply to them.

It’s good practice to always include these three (that is, situation, objective, and plan): to you it may be clear and obvious that since the situation has changed, a slight modification of the plan (or the objective!) is necessary. To others, it might not. So just always include the current situation, the current objectives, and the current plan.

You can also apply this to a problem statement, where it’s just as useful:

This is what we’re currently dealing with, and how I see it (that’s the situation)
Here’s why it’s a problem, and why it needs to be fixed (that’s an objective)
This is my suggestion for how it could be fixed (that’s a plan)

And finally, some poetry

And as my final writing tip for improving communications and eliminating needless meetings, I want to leave you with some poetry. These lines just so happen to serve as a perfect mnemonic for professional briefings.

I keep six honest serving-men:
(They taught me all I knew)
Their names are What and Where and When
And How and Why and Who.

— Rudyard Kipling, The Elephant’s Child, 1902

Strive for all your professional writing to answer most or all of what, where, when, how, why, and who, and watch your need for meetings evaporate like morning dew in glistening sunlight.

Please note that interactive chat (like Slack) is not in this list. It fails the “formulate complex thoughts and reasoning” test. ↩

tag:xahteiwi.eu,2021-10-27:/blog/2021/10/27/this-meeting-should-have-been-an-email/

No, We Won’t Have a Video Call for That: The Companion Pieces

Florian Haas Oct 23, 2021 Updated Oct 23, 2021

The original article continues to prompt a lot of thoughts and discussions, so I’ve written a couple of follow-up pieces:

Getting out of Meeting Hell is a short series about getting from a distributed workplace that attempts to duplicate the synchronous nature of an office by sticking everyone into video meetings all the time — with usually disastrous results — to one that successfully adopts an asynchronous way of working. It has suggestions for employees, mid-level managers, and executives.
Please, make my company distributed! takes a broader view on changing organizations so that they become better suited for a distributed and asynchronous style of work, and why that’s a lot harder than most people think.

tag:xahteiwi.eu,2021-10-23:/resources/presentations/no-we-wont-have-a-video-call-for-that-the-companion-pieces/

Please, Make My Company Distributed!

Florian Haas Oct 23, 2021 Updated Oct 23, 2021

Turning an organization into one that works in a distributed and asynchronous fashion is far from trivial.

After No, We Won’t Have a Video Call for That, which covered how productive distributed teams operate, and the Getting out of Meeting Hell series, which focused on how you as an individual can get to being a member of a functional distributed team, let’s zoom out a bit.

Let’s discuss a slightly larger, organizational picture, just in case you ask yourself this question: “Why isn’t my organization distributed and asynchronous? And, as companies bleed talent right and left because competent people run for the competition that does get distributed work right, why don’t they wake up and become like that? Can someone please make my company more distributed and asynchronous?”

For additional context, let me interject the following observations from Kris Köhntopp, who posted them in German on Twitter (I am taking the liberty to translate here; emphasis mine):

What’s funny is that [what matters to being successful as a distributed company] are all learnable skills — written communications, sensible meeting prep and follow-up, correct definition of objectives and tasks, etc.

That’s a craft.

But it appears that organizations prefer to bleed their teams dry by attrition, rather than to learn, or to hire people that can help teams to write things down, professionally maintain a wiki, and teach and drive communications.

And further downthread, Kris says (again, my translation and emphasis):

This is all definitely feasible, after all there’s remote-first companies of substantial size and they work.

They must have built themselves somehow, they didn’t get to where they are by pure chance.

I agree with all of that (obviously), but things are complicated. So let’s drill into this a bit.

Building is easy. Changing is hard.

I’ve founded and bootstrapped a distributed company, and I’ve also been involved in making a localized company more distributed. Take my word for it: founding was daunting and scary and stressful, but in terms of shaping structure and communications — even if you have to figure it out as you go, as I did with my cofounders1 — it’s easy. Changing the communication structure of an established company is hard. And slow. Even if you have full support from the top.

So, if you’re in an organization that wasn’t distributed from the get-go, do not compare it to one that was.

Also, do not compare it to one that is two years ahead of yours in reshaping itself in a distributed and asynchronous manner. Also, if your company went distributed kicking and screaming at the start of the pandemic, do not compare it to one where the top leadership made a conscious decision to be more “remote friendly” or “remote first” or whatever their preferred term is, long before the pandemic hit. These organizations are ahead of yours. Their change may be happening just as slowly as yours, they just started the race earlier and are a couple of laps ahead of you.

But your company hasn’t even started the change to distributed and async work? You spent the last nearly two years in unfettered meeting hell? Your bosses think you’ll now go “back to normal”, because an office is the only “normal” they know? You know what to do.

Hiring people to help: aye, there’s the rub.

Now on to the idea of hiring people to help. There’s two ways to look at that: hiring people as employed managers to work and lead in the organization in a distributed and async fashion, or hiring people as management consultants to guide and advise the company in the distributed and async transition.

(Please note: the idea of hiring only regular employees that drive a distributed and async change “from within” — against the resistance or inertia of established management — is ludicrous, unethical, and unworkable.)

Hire distie managers?

If you make it a priority to hire managers geared toward distributed and asynchronous work, then if they’re worth their salt I can guarantee you that one of the first things that they’ll ask in their first interview is this:

“What’s my line manager’s (director’s, VP’s, etc) distributed and async work experience? How about my lateral peers’?”

And if the truthful answer is “they’re new to this,” it’s quite likely that your prospective new management colleague will politely end the conversation. Their skills are in high demand; they have plenty of options to go elsewhere. Why should they sign up for a job that’s going to come with a ton of needless friction?

And of course, that consideration applies in all management positions, all the way to the top. So for this to work, in an organization currently hellbent on localized-synchronized work (I use offissification for that state of affairs), it would need at least its entire C-suite to come around first. And it would then probably need to replace at least one-third of its current managers at all levels, to have any chance of attracting fresh blood in management.

You’ll notice that this isn’t exactly easy to do, plus it’ll probably take longer than you have the patience to put up with it.

Hire management consultants?

So that leaves the option of hiring people that are not part of the organization but are brought in to advise, ideally on all management levels. These people are called management consultants. And you should understand that they are commonly loathed by mid-level line managers. But let’s leave that aside for now. Let’s assume that you want to consider the possibility of a management consultant whose communication skills and empathy and talent are so outstanding that they are absolutely not hated by anybody.

Then, please put yourself in that consultant’s shoes. Before they get anywhere near you, they would have spent countless hours in — you guessed it — meetings with the CEO, with top-level management, possibly with department heads, to get buy-in. And then, they embark on a multi-month project where they must use the style of work currently prevalent in your company, because that’s the only way to even approach people. So they spend more time in — are you with me — meetings to convince and educate people. Oh, and they probably need to bill by the day or by the hour, in some arbitrary increment that your bean counters dictate, because otherwise they can’t get paid. Async work, on your own time and schedule, much?

So that means that under most circumstances, such a project will go one of two ways. Either it involves a person who’s actually perfectly fine with localized and synchronous work, and for this kind of project that’s probably not a person you’d want to hire. Or else they’re miserably unproductive throughout the project, and that doesn’t exactly bode well for their health or for your project. (Which means a person understanding this probably will never start such a consultancy in the first place.)

Now, I want to mention that I suppose from the perspective of a consultant, there is one way to square this circle: charge exorbitant rates. If you can make a killing working two days a month, and you actually do work just two days of meeting hell per month and take the rest of your time off to recuperate, then that might actually work. But then it’s less likely — not impossible, but less likely — that your company will retain their services. Because unlike a consultant that you bring in to chop heads, monetary gains (that is, return on investment for the consultant’s fees) are much more difficult to put in numbers when you’re “only” making everyone’s life better, and attracting better employees, increasing productivity, building better products, attracting more customers, and hence making a bigger profit are mere knock-on effects from that.2

So, how do things change, then?

It’s my rather firm belief that these things change by evolution, not on an organizational scale but on one of market and society. Many of the companies still stuck in the office mindset will not change, at least not dramatically. They will continue to bleed talent, not attract much fresh blood, and face fierce competition from companies that do better and attract good people. Some will undergo a slow and sometimes painful transition and some will succeed at it. Some will fail and go under, or become irrelevant.

But expecting dramatic, sudden change at a large organization is just not realistic. And if you’re stuck in one that’s still pretending that this’ll all blow over, your best option is probably to make a change for yourself, rather than to wait for one in your organization.

This is not to say that we invented anything, just that we had to figure out how to make things work for ourselves that already worked for others. Distributed companies existed well before my cofounders and I started a company in 2011. MySQL AB (incidentally, Kris Köhntopp’s former employer) established those practices in the 2000-2005 time frame. There is an excellent November 2011 talk from ex-MySQL CEO Mårten Mickos, in which he runs through the entire history of MySQL as an independent company, until its acquisition by Sun in 2008. Around the 27-minute mark, he starts talking about work in a distributed company. ↩
And even if there is a demonstrable expected positive monetary effect, meaning in terms of numbers it’s an absolute no-brainer, you should consider that top management are people, and people don’t always act rationally. ↩

tag:xahteiwi.eu,2021-10-23:/blog/2021/10/23/make-my-company-distributed/

Universal tox tests (from just about any CI)

Florian Haas Oct 17, 2021 Updated Oct 17, 2021

I like tox. A lot. I use it all the time. This is a quick summary on how to use it in such a way that it becomes a central anchor point that you can use from all your CI systems.

What’s tox for?

Normally tox is used to run tests for Python projects, and it’s very well suited for that. You can use it with Python libraries, Django projects, scripts you use for system automation, whatever. But you can use it just the same for code that isn’t a Python application or library itself, but a Python application just happens to come in handy for testing that code.

In this example, I’ll describe a super simple use case: using a barebones tox configuration that lints YAML configurations. Suppose you’ve got a Git repo that’s full of YAML files. And you want to make sure, for example, that all your truthy values are true or false and never yes, no, on or off. Or that your indentation is always consistent.

tox.ini

The first thing you’ll do is create tox.ini, the central tox configuration file, in the top level directory of your repository. Here’s a tiny example:

[tox]
envlist = py{3,36,39}
skipsdist = True

[testenv]
deps = yamllint
commands = yamllint {toxinidir}

That’s it. What this’ll do, when invoked as simply tox, is

create a Python 3 venv,
pip-install the latest version of yamllint,
invoke the yamllint command, which will recursively check for all .yml, .yaml, and .yamllint files in the directory where the tox.ini file itself lives.

What’s helpful here is that tox does a little bit of magic with the testenv names. tox knows that if you call a testenv py36, you want to test with Python 3.6 (more precisely, CPython 3.6). py39, that’s Python 3.9. Just py3 means whatever Python version maps to the python3 binary on your system.1
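
To make that naming convention a little more concrete, here is a minimal Python sketch of the idea. It is not tox’s actual implementation: the function names expand_envlist and interpreter_for are made up for illustration, and the sketch only handles a single brace group and plain pyXY names.

```python
import re
from typing import List


def expand_envlist(spec: str) -> List[str]:
    """Expand one brace group, e.g. 'py{3,36,39}', into testenv names."""
    match = re.fullmatch(r"([^{}]*)\{([^{}]*)\}([^{}]*)", spec)
    if match is None:
        # No brace group at all: the spec is already a single testenv name.
        return [spec]
    prefix, choices, suffix = match.groups()
    return [f"{prefix}{choice.strip()}{suffix}" for choice in choices.split(",")]


def interpreter_for(env: str) -> str:
    """Map a testenv name like 'py36' to the interpreter name it implies."""
    match = re.fullmatch(r"py(\d)(\d*)", env)
    if match is None:
        raise ValueError(f"not a pyXY-style testenv name: {env!r}")
    major, minor = match.groups()
    # 'py3' has no minor part and means "whatever python3 is on this system".
    return f"python{major}.{minor}" if minor else f"python{major}"


for env in expand_envlist("py{3,36,39}"):
    print(env, "->", interpreter_for(env))
```

So the single envlist line stands for three testenvs, and each name on its own tells tox which interpreter to look for.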

Running tox on every commit

Now the first thing you might want to do is run tox on every commit, and encourage your collaborators to do the same. You can easily do that by dropping this tiny shell script2 into your repo as a file named pre-commit in the .githooks directory:

#!/bin/sh

exec tox -e py3

Add that file to your repository as .githooks/pre-commit, and make it executable. Also, add a little note to your README explaining that, to enable the pre-commit hook, all your collaborators can simply run

git config core.hooksPath .githooks

Easy, right? And once you’ve run that command, every git commit will kick off a tox run and you’ll never commit borked YAML again.3
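
If you would rather script the hook setup than describe it in your README, something like the following Python sketch could do it. The install_hook function and its repo_root parameter are my own naming for this illustration, not anything that Git or tox provide; the hook body is the same one-liner shown above.

```python
import stat
from pathlib import Path

# The same hook as above: hand over to tox, using whatever python3 is installed.
HOOK_BODY = "#!/bin/sh\n\nexec tox -e py3\n"


def install_hook(repo_root: Path) -> Path:
    """Write .githooks/pre-commit under repo_root and make it executable."""
    hooks_dir = repo_root / ".githooks"
    hooks_dir.mkdir(exist_ok=True)
    hook = hooks_dir / "pre-commit"
    hook.write_text(HOOK_BODY)
    # Add the executable bits; Git skips hooks that are not executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

You would call install_hook(Path.cwd()) from the top of your repository. Each collaborator still needs to run the git config command shown above, once per clone.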

Now of course, using those hooks is entirely optional, and can be overridden with --no-verify. So, for those slackers that can’t be bothered to use them, you also want to check centrally. Here’s where your CI comes in.

Running tox on every GitHub PR

If you collaborate via GitHub, you can run tox on every PR, with a simple GitHub Actions workflow. To use it, you’ll need a small addition to your tox.ini file:

[tox]
envlist = py{3,36,39}
skipsdist = True

[gh-actions]
python =
    3.6: py36
    3.9: py39

[testenv]
deps = yamllint
commands = yamllint {toxinidir}

And then, you add a workflow to .github/workflows, say .github/workflows/tox.yml:

---
name: Test with tox
'on':
  - push
  - pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version:
          - '3.6'
          - '3.9'
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          submodules: true
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install tox tox-gh-actions
      - name: Test with tox
        run: tox

So that sets up your workflow so that it tests with two different Python versions that you care about, and then runs a test with each of them.

It does this via a combination of the information contained in the [gh-actions] section of tox.ini, and the matrix strategy defined in the workflow. The tox-gh-actions plugin then pulls that information together and sets up testenvs as needed.

And it runs these checks every time you push to a branch (topic branch or default branch), and also on every pull request.

Running tox from GitLab CI

So you’re either using only GitLab and not GitHub, or you’re mirroring a GitHub repo to a self-hosted GitLab and want to run your pipelines there as well? Easy. Here’s the exact same functionality for your .gitlab-ci.yml file:4

---
py36:
  image: python:3.6
  stage: build
  script:
    - pip install tox
    - tox -e py36

py39:
  image: python:3.9
  stage: build
  script:
    - pip install tox
    - tox -e py39

In GitLab CI I know of no elegant matrix syntax to map the image version to the testenv. But on the other hand there’s a bunch of things that “just happen” in a GitLab CI pipeline, which you specifically need to define in a GitHub Actions workflow definition. So overall your .gitlab-ci.yml ends up shorter than your GitHub Actions tox.yml.
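
If the repetition bothers you, one workaround is to generate the job definitions rather than write them by hand. The following is a minimal Python sketch under my own assumptions: JOB_TEMPLATE and the gitlab_ci_jobs helper are hypothetical names, and the output simply reproduces the hand-written jobs shown above.

```python
# One job per Python version, mirroring the hand-written jobs above.
JOB_TEMPLATE = """\
{env}:
  image: python:{version}
  stage: build
  script:
    - pip install tox
    - tox -e {env}
"""


def gitlab_ci_jobs(versions):
    """Render .gitlab-ci.yml content for a list of versions like '3.6'."""
    jobs = []
    for version in versions:
        # '3.6' becomes the testenv name 'py36'.
        env = "py" + version.replace(".", "")
        jobs.append(JOB_TEMPLATE.format(env=env, version=version))
    # Start with a YAML document marker and separate jobs with a blank line.
    return "---\n" + "\n".join(jobs)


print(gitlab_ci_jobs(["3.6", "3.9"]), end="")
```

Redirect the output into .gitlab-ci.yml and commit the result. Whether that is worth an extra script for two jobs is debatable, but it keeps the list of Python versions in one place.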

Running tox from Zuul

If you’re running a tox testenv from Zuul, you would use the built-in tox jobs in your pipeline, as referenced in .zuul.yaml:

---
- project:
    check:
      jobs:
        - tox-py36
        - tox-py39
    gate:
      jobs:
        - tox-py36
        - tox-py39

Here, the tox-py36 and tox-py39 jobs are both derivatives of the base tox job, which will run with CPython versions 3.6 and 3.9, and by default invoke testenvs called py36 and py39, respectively.

And now?

Now that all of your Python testing standardizes on tox, you can go to town. Add more tests, add more testenvs, more Python versions, whatever.

You might need to make minimal changes, like add one line for each new Python version you want to support, to all your CI definitions. But if your project moves from GitHub to GitLab or from GitLab to Gerrit/Zuul, or your entire company goes on a great big CI migration, then you’ll have one less thing to worry about, because your tests already run anywhere.

By the way: when you set up your tox.ini and your CI configuration files as shown in this article, then yamllint will of course also lint your YAML CI configuration files themselves. Which comes in handy; I found 4 yamllint warnings and one error while testing the examples I’ve given here.

Testing with multiple Python versions may seem less than useful when you’re dealing with just one upstream package, yamllint. I use that here as an oversimplified example. As soon as you add your own Python scripts or modules to the tox checks, you may very well be interested in multiple Python versions. ↩
If you’re being a purist, you could also invoke the tox runner from a Python script. I prefer the shell exec one-liner. ↩
In this case, for testing locally, we’re not going to care about a specific installed Python version. We’ll just make sure that the commit doesn’t obviously break anything. In my humble opinion it’s OK to catch version-specific issues in CI, but we shouldn’t feed the CI code that’s outright broken. ↩
This example assumes that you’re either using shared GitLab runners using Docker, or a self-hosted runner on Kubernetes. ↩

tag:xahteiwi.eu,2021-10-17:/resources/hints-and-kinks/universal-tox-tests/

The Review Review

Florian Haas Oct 7, 2021 Updated Oct 7, 2021

A talk I submitted to DevOpsDays Tel Aviv 2021 and DevConf.CZ 2022

This is a talk I submitted1 to DevConf.CZ 2022, which used a non-anonymized CfP process via Red Hat’s CfP website. For that conference, it was selected as the lead talk in the Modern Software Development track. I had previously submitted this talk to DevOpsDays Tel Aviv 2021, which used a non-anonymized CfP process via PaperCall. That submission was rejected.

Title

The Review Review: comparing code review, testing, staging and deployment across development collaboration platforms

Elevator Pitch

You have 300 characters to sell your talk. This is known as the “elevator pitch”. Make it as exciting and enticing as possible.

GitHub, GitLab, Gerrit — what should I choose? What’s the best review process, the best CI/CD integration, the best deployment facility? Which should I select for my startup, or consider migrating to? Which supports good collaboration practices, which bad ones? This talk gives the run-down.

Talk Format

What format is this talk best suited for?

Talk (~25-40 minutes)

Audience Level

Who is the best target audience for this talk?

Intermediate

Description

The description will be seen by reviewers during the CFP process and may eventually be seen by the attendees of the event. You should make the description of your talk as compelling and exciting as possible. Remember, you’re selling both the organizers of the events to select your talk, as well as trying to convince attendees your talk is the one they should see.

In DevOps, the process of collaborative review, testing, staging, and deployment to production constitutes a core element of the work we do. And we generally strive to make this process as effective, efficient, smooth, and transparent as possible. Achieving that partly comes from the work culture we shape and inhabit, partly from our selection of tools — and of course, work culture and work tools permanently and closely influence each other. This goes for both the tools that drive review, and the tools that drive CI/CD:

the GitHub Pull Request process in combination with GitHub Actions;
the GitLab Merge Request process in combination with GitLab CI;
the Gerrit Review process in combination with Zuul.

None of these is perfect; all of them have their advantages and disadvantages under particular circumstances. Some are meant to be used principally as a service, some are fine to self-host. Some are adamant about enforcing specific deployment practices, some follow a more relaxed approach.

This talk is a summary of the current state of affairs with all these tools, and contains recommendations on what to use under which circumstances.

Notes

Notes will only be seen by reviewers during the CFP process. This is where you should explain things such as technical requirements, why you’re the best person to speak on this subject, etc…

My team and I have worked with all tools mentioned in a professional capacity, and I believe I’ve got a very good understanding of the relative merits of the systems presented. The talk does not include a hard-and-fast recommendation for one particular tool or platform.

This is a talk that’s suitable for both in-person and on-line events.

Tags

Tag your talk to make it easier for event organizers to be able to find. Examples are “ruby, javascript, rails”.

GitHub, GitLab, Gerrit, Zuul, CI/CD, Development, DevOps

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2021-10-07:/talk-submissions/the-review-review/

On Contravictions

Florian Haas Oct 7, 2021 Updated Oct 7, 2021

A talk I submitted to DevOpsDays Tel Aviv, 2021

This is a talk I submitted1 to DevOpsDays Tel Aviv 2021, which used a non-anonymized CfP process via PaperCall. This submission was rejected.

Title

On Contravictions

Elevator Pitch

You have 300 characters to sell your talk. This is known as the “elevator pitch”. Make it as exciting and enticing as possible.

A contraviction is when a person firmly believes that two objectively mutually exclusive standpoints are simultaneously true. Being contravinced makes you extremely vulnerable to manipulation. Here’s how to spot a contraviction, and what to do about it.

Talk Format

What format is this talk best suited for?

Talk (~25-40 minutes)

Audience Level

Who is the best target audience for this talk?

All

Description

The description will be seen by reviewers during the CFP process and may eventually be seen by the attendees of the event. You should make the description of your talk as compelling and exciting as possible. Remember, you’re selling both the organizers of the events to select your talk, as well as trying to convince attendees your talk is the one they should see.

“Contraviction” is a term I use for when a person is firmly convinced of two sides of an obvious contradiction. This may sound like it would be an unusual and rare occasion, and yet, once you start looking, they are all over the place. A few examples:

At the core of Nazi ideology in Germany in the 1920s and 30s was the notion that Jews are engaged in a successful global conspiracy to subjugate all nations including the German nation, and that Jews were also, simultaneously, socially, intellectually, economically, and morally inferior to Germans.
At the core of religious extremism is the belief that God is all-forgiving and merciful, and also that as long as a portion of humanity (“infidels”) displeases God, all of humanity must suffer God’s wrath. We see this in contemporary Islamic extremism, but Catholicism in the 16th century did no better in the conquest of Latin America, nor did Western Christianity do much differently during the medieval crusades.
At the core of Trumpism is the notion that the United States of America is the greatest nation on Earth and that there is no better nation nor will there ever be, but also that America has been ruined by “the liberals” and has slipped into inferiority, so that it is necessary to “make America great again.”

In all these examples, both statements can logically be false, or one of them could theoretically be true — but if one is true, the other one must be false. And yet, people (millions of people!) hold or held both of these statements to be true, simultaneously.

But being contravinced puts people in a very vulnerable position: if someone gets you to believe both sides of a contradiction, they can logically argue anything to follow from either one side, or the opposite. Which means they can convince you of anything. And that never ends well.

This talk defines contravictions, highlights examples (even devopsy ones!) and provides suggestions on how to uncover and dismantle them. Because contravictions have the potential to poison and destroy discourse, and that’s a cultural issue we all need to deal with — among friends, family, and coworkers.

Notes

Notes will only be seen by reviewers during the CFP process. This is where you should explain things such as technical requirements, why you’re the best person to speak on this subject, etc…

Given that this topic is sensitive and emotional (yes I, an Austrian, will be talking about Nazi ideology, in Israel; consider me terrified), I’ll need to submit this on the condition that I’d only want to deliver this talk in person. If that is not possible on account of the Covid-19 situation or of travel restrictions, and I would have to rely on a streamed talk with no way of reading the room or scanning the audience for body language feedback (some more details on this topic here), then I’d rather not give the talk at all.

If this disqualifies the talk (or if you just consider it too controversial to begin with), no hard feelings at all.

Tags

Tag your talk to make it easier for event organizers to be able to find. Examples are “ruby, javascript, rails”.

Culture, Communications

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2021-10-07:/talk-submissions/devopsdaystlv-2021-contravictions/

Getting out of Meeting Hell: As a top-level executive

Florian Haas Oct 3, 2021 Updated Oct 3, 2021

A short series on transitioning to distributed work and asynchronous collaboration. This article is on what you can do if you are stuck in meeting hell as a top-level executive, such as a CEO, Managing Director, or Executive Director.

Please have a look at the introduction for background, for applicable disclaimers, and for information about the specific environments this series talks about.

This part of this series is for you if you are a chief executive officer, a managing director, executive director, or whatever else the top-level role in your organization may be. This means that you have people who report to you, but you don’t report to anyone in the day-to-day operations of the company — even though you may, of course, be answerable to your board of directors or your investors, or some oversight body.

So, you realize that

a lot of your company spends a lot of time in useless meetings,
they’re often forced to sit through 90 minutes of staring into cameras when they could instead have spent 5 minutes reading an email,
people’s productivity suffers badly because they are constantly being interrupted.

And this is badly affecting you, personally, as well. So for the benefit of yourself and everyone else in the organization, you want to change things toward being less interrupt-driven, less synchronous, more productive, and healthier.

Now, I’ve got some bad news for you.

In contrast to your employees at any level, it is much harder for you to pull the Leave option than it is for them.
In contrast to most of your mid-level managers, no matter how hard you try, you may never escape being in a lot of meetings, and being in a lot of unpleasant meetings to boot. Chances are, whenever stuff is exploding, boiling up, or otherwise going bonkers, you’ll be roped in to calm things down, make a decision, or soothe a high-profile customer who is conniptiously trying to convince you that your SLA is akin to the Constitution and that an engineer guilty of causing a violation ought to hang for treason.

But, and this is the good news to balance the bad, if you manage to pull the rest of the company out of meeting hell, life is going to get way better for you, too.

So, obviously, there’ll be a lot for you to Learn. If you’ve never led an organization that was distributed and asynchronous by default, there’s a lot to unpack, understand, and overcome when it comes to turning one that isn’t, into one that is.

But your most important role at the top of the organization is this:

Lead.

And by that I mean lead by example, and also lead by policy.

Here are a few ways you can lead by example:

Write. Particularly when you want to communicate something to the whole company. All-hands video streams? [stage whisper] Everyone hates those. Meeting invites saying “we’ll announce something important”? Just write an email saying the important thing, and then ask people to send you questions by a deadline. And publicly commit yourself to a deadline by which they can expect answers.
Take no shortcuts. If you’re setting up communications rules for everyone, live by them yourself. There’s a bad, hidden cost in skirting around them.
Make a point of always giving context when pinging someone in chat.
Quit calling people without warning — you can’t convey the context of the call without them answering. A cold call is the ultimate naked ping.
Generally, cut back on synchronous, realtime communications. They always interrupt people, and interruptions are expensive. Think about how much money you’ll lose from someone not having a brilliant idea because you pinged them about some technicality in the company chat while they were deeply immersed in a complex problem. (Also, think about how much money you might have already lost that way.)
Insist on agendas being drawn up and meeting notes being kept (and both circulated to anyone who needs the information) for every meeting you are asked to attend. Write an agenda for every meeting you call or chair. If you have a group that meets regularly, appoint a different person from the group as a scribe each time. Do not ask a person to volunteer. (If you do, chances are that one person in the group will be typecast as the scribe for all such meetings. I call such an unfortunate person a scrapegoat. You don’t want scrapegoats.)

Then, here are ways you can lead by policy:

Hire professional writers. (In the software industry, tech writers come to mind.) Not because you want those people to be scrapegoats, but because professional writers can massively move your organization forward in efficiency of written expression, document organization, and clarity of communications. Make them a hiring priority. Pay them handsomely — good tech writers can make good money freelancing; you’ll need to make them a pretty compelling offer to consider giving up some of that freedom.
Ram a stake in the ground making clear to everyone you will not tolerate corporate surveillance or invasions of privacy. Any attempt to introduce an “always on camera” policy should be grounds for reprimanding the manager that instigated it.
Insist on the procurement of tools that take distributed, asynchronous work into account. A video conferencing system that values surveillance over privacy is not one your company should throw money at. A hiring platform that requires specifying an “office location” for every role, and has no provisions for remote positions, isn’t either. Neither is a chat platform hosted by a company whose sole chance at long-term success is to pull all corporate communications into synchronous chat. Yes, these judgments require technical expertise. If this is something you don’t have because you consider yourself a “non-technical” person, I have something to read for you.

There’s another thing that you might be inclined toward doing: declaring meeting-free days, as in making it a policy that no meetings are to be scheduled on Wednesdays. I think of that as very much a stop-gap measure that’s often done out of sheer despair. Sure, you want your people to have meeting-free days, but you actually want them to have meeting days, with meetingless days being the norm, rather than the exception. I think you’re better off gradually replacing your meeting-addicted managers with ones that are accustomed to distributed and asynchronous work.

Finally, and this may be a bitter pill to swallow: getting to distributed and asynchronous is infinitely harder if you have built a “flat” organization. If every one of your managers has 20-30 direct reports, they will feel utterly overwhelmed at the thought of staying in asynchronous communication with every one of them, not to mention the fact that it’s damn near impossible for them to convince that many people at one time to adopt a new way of collaboration.

This is one of the many, many ways in which flat organizations fail to scale, but this gives you the opportunity to fix two issues at the same time. If you’ve got one manager that’s actively promoting meeting hell that is currently making life miserable for 20 people, you can split that team up into 4, and hire 3 team leads that know how to work asynchronously and distributed (or promote people who have that sort of experience and want to step up). Then, in the one remaining team with the unreformed meeting addict, things can go three ways:

The one “traditional” team sees how things go in the other teams, and they quickly coax their team lead into doing things the new way, or
they all ask to be transferred, or
(the worst-case scenario) the one meeting-addicted manager continues to annoy everyone, and they all leave. That’s bad, and a failure of judgment (that manager should probably have been let go first), but losing 4 good people, as difficult as that might be, is probably “better” than losing 20.

As a final thought, please take a moment to put yourself in other people’s shoes, and understand what options they have with being stuck in meeting hell. Whether it’s your line managers or your regular employees, they both have the option to just chuck in their notice and leave, and leave they will, if they’re sufficiently deep in meeting hell. They have plenty of opportunities.

So lead by example, and lead by policy. Do things right, but more importantly do the right things.

This article concludes the series — for now. I am guessing that people reading this will have opinions, air their grievances, and share feedback. Those usually give me good thinking material to dwell on, so I’ll probably have an additional installment based on reader feedback at some point.

tag:xahteiwi.eu,2021-10-03:/blog/2021/10/03/getting-out-of-meeting-hell-executives/

Getting out of Meeting Hell: As a mid-level manager

Florian Haas Oct 2, 2021 Updated Oct 2, 2021

A short series on transitioning to distributed work and asynchronous collaboration. This article is on what you can do if you are stuck in meeting hell as a mid-level manager.


Please have a look at the introduction for background, for applicable disclaimers, and for information about the specific environments this series talks about.

This part of this series is for you if you are a manager at any level of your organization except the very top. In other words, you have reports, but you also report to someone. And as such you may be stuck in meeting hell in two ways:

With your own team, that is to say, yourself and the people reporting to you.
With your management peers, in other words, the people that manage other teams and report to the same director, VP, or whatever other fancy titles your company might use.

And I’ll cover both of those angles, but before I do, I should note that you, of course, have one option to improve your personal situation that is also available to your reports: Leave. It’s not the only option you have, and it may not be your first option, but an option it is. And, for your personal development, taking on a management role in a company that does distributed work and asynchronous communications right might be an excellent career move.

If you are a conscientious manager and you feel a sense of responsibility and obligation to your team, and this makes you hesitate to consider the option to leave, then that reflects nobly on your character but know this: that responsibility is contingent on everyone’s employment in one organization, and it ends the day your employment contract (or theirs) terminates. That may sound harsh, but that’s the breaks of the game, and you shouldn’t let that hold you up, if leaving is the objectively most sensible option for you. There’s no use burning out over an exaggerated sense of duty.

In addition, I’d posit that you should leave your company if it employs or promotes employee surveillance, keeps voodoo “meeting engagement metrics”, or engages in otherwise harmful or toxic behavior. I’d argue that in those cases you should also make sure your direct reports know why you’re leaving, to the extent that your contractual obligations allow you to tell them.1

However, if you’ve chosen not to leave your current organization, and you want to actually do the hard work for making work better for yourself and your distributed team, here is your paramount obligation:

Learn.

If there’s anything that the coronavirus pandemic has shown about organizations making the (admittedly abrupt) transition from localized to distributed work, it’s this: it’s a challenge for managers much more than for non-manager employees. You have to make a conscious effort to learn how distributed work works, and how you manage a distributed team.

But, chances are that you won’t be limited to watching talks or listening to podcasts or reading articles (you might have read this one before you landed here). Instead, it’s not unlikely that if you’re doing software or technology development, some of the people on your team are already well versed in distributed collaboration and asynchronous communications. Yes, I am of course talking about open source contributors. It absolutely does not matter what kind of open source software some on your team have worked or helped out on, or whether it’s in any way related to the products your company builds. Everyone who’s ever landed a pull request probably has had to learn more about distributed, async communications than any office dweller who never has. Find those people. Talk to them. Learn from them.

You’ll also have peers in the industry, outside your company. Managers who’ve already done the work. They can’t do the learning for you, but they can show you a few tricks of the trade.

And I’ll give you one, straight away: in a company stuck in Meeting Hell, you change things for your team first, and don’t even think about people at or above your level. I would not advise taking your ideas up the chain without the confidence of knowing that what you are suggesting works in your team. Mid-level management is unfortunately often full of “great idea, but it’ll never work here” or “that might sound nice, but I’m afraid it doesn’t fit our culture.” You want to run silent, run deep (if you permit me a strained metaphor). When your team lands its first great big success, that’s when you casually want to drop something like “we did this on one meeting a week.”

So, how can you get there? Here are a few ideas.

Eliminate daily standups.
Limit yourself to one scheduled team meeting a week. Keep it to approximately one hour, give or take a few minutes. Schedule it near the start of the work week (meaning, in most countries, on Monday).
Have an agenda for your team meeting (and all other meetings) that is available in a collaboratively editable document, cross-referencing all open tasks. Have this agenda in a place where everyone can find it, and have it ready no later than 15 minutes prior to the meeting.
During the meeting, transform the agenda into your meeting notes. This is something a collaboratively editable document (like a wiki page or Google Doc) makes easy. Appoint a responsible note-taker or scribe, whose job it is to compile the meeting notes, edit minor glitches and errors, and then let you know the notes are ready for consumption. You may decide that everyone who is in the meeting can co-edit the notes, and this can be very beneficial to make sure that even the quiet voices are heard. But it’s the scribe’s responsibility to bring them into good shape, and it’s your responsibility as the team lead to make sure they’re truthful and accurate.
Spend the first 15-20 minutes of the weekly meeting on non-work topics. Ask something like “how was your weekend” or “how are things going in your part of the world” or anything else that you can think of that helps you feel the mood of your team. These things don’t go on the meeting record. Always accept when people don’t want to give an answer. They may have many reasons, none of which are necessarily your business. Trust that they’ll share whatever they’re comfortable sharing, even if some weeks that’s nothing.
Learn to keep each other apprised of your relative progress and status via a simple, central coordination facility, such as a virtual Kanban board.
Learn to write to be understood, and encourage your team to do the same.
Do not attempt to get back to people “immediately”, and do not expect people to get back to you promptly. A reasonable default expectation for reply times is 24 hours. If you need an answer faster, you can always send an email (or better still, a comment on a card or ticket) with a request like “if I could get this by 1600 UTC today, that’d be excellent.” Bonus points if you add why you need that information quickly, as in “… so I can put this together with Kim’s performance data and make a decision on which load balancing strategy we’ll use, which Alex needs to know by tomorrow.”
Establish a convention for marking things as urgent, when needed. The word URGENT, included in capital letters in an email subject or a chat message, usually works well. Only use this if things are on fire.2 Never use the urgency marker just by itself. (That’s a naked ping on steroids.) Also, when someone legitimately uses the marker in communication with you, honour it. When someone uses it for something other than a legitimate reason, take them to task.
Get used to referring back to written communication, such as ticket comments or the record of a meeting, weeks or months later. This is not “pulling out the receipts” or “quoting chapter and verse” at someone. It simply serves to establish that yes, we have a written record of just about everything, and there is no need to keep these things stored in our heads under a banner saying “must not forget” (the latter being a major contributor to personal stress).
Don’t attempt to hide your own mistakes and misjudgements. They’re on the record. That makes them a learning opportunity.
Gradually establish firm communication rules with your team. Many of these won’t need to be autocratically decreed, because they just make sense and people tend to intuitively agree on them (like the just-mentioned URGENT marker and the circumstances under which to use it). Then, make sure you stick to them. Stick to them yourself, and correct others who break them.3

Importantly, let your team know that you’re working on getting them all out of meeting hell. Don’t sugarcoat things if you’re struggling while swimming against the company tide, at least temporarily. If they know you’re actively striving to make their work situation better, more productive, and healthier, it may well make the difference between them leaving and staying on board.

Finally, keep an eye on corporate policy and the messages you get from company leadership. If they’re obviously stuck in 1995, you might as well go and update your CV. If the people at the top do make an effort to implement distributed and async friendly policies, however, then things may not change overnight, but there’s hope that change they will, and your effort will not be for naught.

By the way: if your non-disparagement clause is so harsh that you can’t talk about anything negative in the company, then that’s probably an awful company to begin with. ↩
Note that the convention of the URGENT marker also establishes the convention that things are not urgent by default. ↩
Failure to enforce sensible collaboration rules may result in the inadvertent creation of an asshole filter. “Oh but I can’t enforce that rule on Bob, because Bob thinks rules aren’t for him but he’s brilliant!” — No. Don’t tolerate brilliant jerks. ↩

tag:xahteiwi.eu,2021-10-02:/blog/2021/10/02/getting-out-of-meeting-hell-managers/

Getting out of Meeting Hell: As a regular employee

Florian Haas Oct 1, 2021 Updated Oct 1, 2021

A short series on transitioning to distributed work and asynchronous collaboration. This article is on what you can do if you are stuck in meeting hell as a regular employee who isn’t in a management role.


Please have a look at the introduction for background, for applicable disclaimers, and for information about the specific environments this series talks about.

So, you’re a regular employee (sometimes called an Individual Contributor or IC), who doesn’t have a team that reports to you. And you are in an organization that, although it does have remote employees or may even have switched to an all-distributed mode on account of the pandemic, does not adopt basic rules of distributed work and asynchronous communications.

You spend more time in meetings than is healthy, your productivity is severely impacted, you’re stressed out from frequent interruptions, you have trouble getting into a state of flow. You may be asked to keep your camera on for hours at a time, and you feel that this badly encroaches on your privacy or sense of personal space. Your manager or your peers like to naked-ping you or send you “hey, I need you for a minute” one-liners that disrupt your train of thought for half an hour.

And your organization (including your peers and direct manager) tolerates or condones this kind of collaboration.

If that sounds like your typical day, I have great news for you. You have at hand an excellent opportunity to improve your personal situation, be more productive, and communicate better with your peers:

Leave.

Hand in your notice. Quit. Hightail it outta there. If

you want to truly improve your personal situation and work effectively in a distributed team, and
neither your direct manager nor the top leadership of the organization takes any interest in optimizing for distributed and asynchronous collaboration,

then the way to do that for yourself is to leave your current organisation behind, and go elsewhere. There’s plenty of employment opportunities for you to pick up in this industry at this time, and there are a number of organizations that handle distributed work better than the one you’re currently in.

Sure, you could consider trying to change the system from within. And there is a small (I’d say minute, but nonzero) probability that you’ll be successful, and drive real change in your organization. However, there is an inordinately larger probability that you’ll burn out in the process — and endangering your health is never a gamble that’s worth taking.

You may think that that’s a pessimistic view. I think the opposite is true. You’re not “failing” to drive change in an organization that resists it; instead, you can succeed at pushing change in an organization that embraces it. Throwing your energy into meaningful change for the better is the definition of optimism, in my book. And if the place for that is elsewhere from where you are now, seize that opportunity!

However, note that the foregoing is all predicated on the entire company (including your direct manager) being fine with synchronous work and people having their calendar jammed with back-to-back meetings, with no apparent intent to change. If it’s just you that wants change, or perhaps only you and people at your level in other teams that you can’t reasonably join forces with, I’d argue that you just shouldn’t be risking your health. In contrast, if your manager1 and a couple of your peers are on your side — consider your manager may be in their own learning phase about distributed and async collaboration — then things are a lot less clean cut, and maybe you’d want to stay on for the ride. Particularly so if there’s an active, supportive push from the top of the organization.

Even if your manager and some of your peers are on your side, I can think of a few situations where you should still leave: like when your company has an “always on camera” policy, or doesn’t allow people to go audio-only on calls, or promotes number crunching on employee “engagement” in a meeting app. Companies doing that are, in my view, beyond salvation and will never be able to attain the levels of trust required for a distributed organization to function.

So: if you’re stuck in meeting hell with nobody on your side, as a regular employee with no reports, your best way forward is probably the way to the exit. If you’re stuck in meeting hell now but there’s a gale blowing the company into async & distributed mode, and/or you have a very thick-skinned manager that gets it, and will deflect and absorb any pressure from meeting-addicted higher-ups, you may want to hang on for the journey.

I’d posit that the only manager that really matters in this scenario is your direct line manager. That person would be your make-or-break partner in any transformation of how your team collaborates. If you and they don’t share views, then don’t expect to be able to play four-dimensional chess by forming an alliance with some other manager who you then expect to influence your manager so that everyone gets better at distributed and async work. You don’t want to trade being stuck in meeting hell for being stuck in meeting hell and corporate politics. ↩

tag:xahteiwi.eu,2021-10-01:/blog/2021/10/01/getting-out-of-meeting-hell-employees/

Getting out of Meeting Hell: What this is about

Florian Haas Sep 30, 2021 Updated Sep 30, 2021

A short series on transitioning to distributed work and asynchronous collaboration. This article is the introduction.


In 2020, I presented a talk at FrOSCon titled No, we won’t have a video call for that: Communications for distributed teams. In early 2021 I put together a full-length writeup of that talk, and published it here. And for no apparent reason that article then made it to the top of Hacker News on one day in September 2021,1 and apparently resonated with quite a few people.

And after that, I got a fair number of questions along the lines of “okay, what you talk about is a spot-on description of how I want to work, but how do I get there?” In other words, what can you do in order to transition from a team that’s stuck in meeting hell, to one that actually goes fully distributed and embraces asynchronous communications?

Note: I use the shorthand Meeting Hell for a situation in which people are forced to be in unnecessary and unproductive video meetings2 for an unhealthy fraction of their work time.

This is admittedly a mere symptom of not having adopted asynchronous and distributed ways of working, but it’s such a tell-tale sign thereof that it counts as a dead giveaway. Thus, I think it’s okay to say “I’m stuck in meeting hell” when what someone means is really “I work in an organization that’s failing badly at distributed and asynchronous work.”

So what I’m doing here is offering suggestions for getting out of that. If

you’re one of the people that has 25 meetings a week, or
you spend 60% of your work week in standups and planning and retro, or
the only time you have for doing what you actually signed up for is in overtime,

and it’s grinding you down, and you want to work differently, then this series might be for you.

Personal strategies: one size does not fit all

I think it’s very important to differentiate personal strategies for getting to distributed & async, based on your position in the company or organization. So, I’m going to look at it from three angles:

Your options if you are what some companies call an “Individual Contributor”, or IC. In other words, these are for you if you are a regular employee (or contractor), and you’re not personally responsible for other people — in other words, you have no reports.
Your options if you are at some level of management that is not top organizational leadership. That is to say, you have people that report to you, but you also report to someone.
Your options if you’re a top-level executive, meaning you’re a Chief Executive Officer or Managing Director or Executive Director or something of the sort. You have people that report to you, but you don’t directly report to anyone — even though you may be answerable to a Board of Directors or some other oversight body, of course.

These are my views, not self-evident truths

Now, I’m only going to talk about my industry (software-driven technology) because that’s the only industry I feel remotely qualified to talk about. Also, I have never worked in a company that had more than 3,000 employees, and I feel most at home in small outfits under 50. I’m reasonably confident that what I talk about is somewhat useful for companies from 3 to 3,000 people in the software industry. It may be applicable elsewhere — larger companies, other industries — but at any rate, I make no guarantees of any kind. Feel free to adopt an entirely contrarian position. And, you should also know I am not a scientist, so none of what I write is informed by rigorous empiricism.

So, are we cool with that? My opinion, my thoughts, my views — not pronouncements of absolute truth.

Let’s get started.

For context: it was at the top for like three hours on a Saturday morning. So it might have landed there just because enough people were simultaneously bored enough to give it a read… ↩
Meeting Hell might also apply to excessive in-person meetings in a shared work space, such as an office. However, I really don’t believe my industry will ever go back to defaulting to office-based work, now that it has shown for nearly two years that it can function, in principle, in a default-distributed mode. Thus, I am using the word “meeting” as synonymous with “video meeting” in this series, and I wouldn’t be surprised if this eventually became the norm.

Think of this as akin to the transition undergone by the word call: before the advent of the telephone, a call was a visit you paid to someone’s house. Then, if you metaphorically “visited” someone by telephone, that was a telephone call. Now a call is always a phone call, except when explicitly specified as a house call. ↩

tag:xahteiwi.eu,2021-09-30:/blog/2021/09/30/getting-out-of-meeting-hell/

No, We Won’t Have a Video Call for That: Reader Feedback

Florian Haas Sep 26, 2021 Updated Sep 26, 2021

Communications in distributed teams: a write-up of my talk from FrOSCon 2020, Cloud Edition.


This is a short summary of selected reactions to the original article.

Twitter, 2021-04-10

On Twitter, Michael K Johnson made an interesting point in response to this article:

My way of thinking about DMs is a little different, or maybe we think differently about confidentiality. Work goes in public, meta-work is often about relationships, and I want my reports to be 100% confident they can bring any question or concern to me.

So “encrypted email” is not the right metaphor or measuring stick in my view. DMs are a tool for saying what you aren’t comfortable saying “out loud”, but that shouldn’t be about the work itself. Never “how do I do this?” But “advice pls about how to work with fred?” — yes!

I entirely agree with the sentiments behind this; I still maintain that chat DMs are not necessarily a good approach for addressing this, for the reason that some chat systems give merely an illusion of confidentiality. If both participants in a conversation use OTR encryption over a protocol like IRC or XMPP, inadvertent disclosure to a third party is highly unlikely. Slack DMs? I wouldn’t be so sure. If your report confides in you, you don’t want them to have to worry if their message is really just between you and them.

Hacker News, 2021-09-25

On 2021-09-25, a link to this article ended up being the top post on Hacker News for a few hours. You’re very welcome to read through the 200-odd comments in that thread, but here I’d like to respond to just a couple. I’m deliberately only picking out ones where I feel like clarification on my part is necessary; as far as differences of opinion are concerned I’ll be happy to let those stand.

The article also assumes every one is a native speaker who can write quickly and clearly in a chat – in a lot of international projects this is not the case.

— comment from papaf

My native language is German, and neither at the time of presenting the original talk nor at the time of writing this update did I work with anyone who is a first-language English speaker.

I have an objection to the author’s blanket disregard for “pings” in chat - while the request could/should be worded a bit more clearly than just “ping”, IMHO they’re a valid way of requesting if an opportunity for synchronous communication is available in the (not that rare!) cases where asynchronous communication would be worthless.

— comment from PeterisP

My blanket disregard is for naked pings, not for pings in general.

tag:xahteiwi.eu,2021-09-26:/resources/presentations/no-we-wont-have-a-video-call-for-that-reader-feedback/

Rules are rules

Florian Haas Sep 16, 2021 Updated Sep 16, 2021

I have a reputation for being rigid when it comes to communication rules. Here’s why I am.


I have a reputation among my colleagues that I am very strict and rigid about the communications rules I follow for myself, and for my team. That reputation is entirely deserved, and I am indeed not particularly flexible in my strong preference for asynchronous, non-interruptive communication methods. I have discussed elsewhere, and at length, what those are and why I have that strong preference. But I haven’t really outlined why I very rarely allow myself, or others if I can help it, to deviate from it.

But Florian, the complaint usually goes, can’t you occasionally make an exception? Sometimes there’s something you could easily do to unblock someone else, if they could only quickly chat you up and ask you to jump in. You’d be in and out of there in no time.

Let’s use an analogy here.

Suppose you’re out of an indispensable food product, say milk or flour or potatoes or eggs. So you make a quick run to the grocery store, and because you don’t live in a country with proper cycling infrastructure and the store is out of walking distance, you drive.

You arrive at the grocery store parking lot, and for some unfathomable reason it’s chock full. Completely packed. Not a single spot available. Except those two spots right near the store entrance that are reserved for wheelchair users, which you are not. You don’t see a single car around with a wheelchair plaque or decal. Not even one.

Now. When you look at the situation, the objectively simplest and most practical solution is for you to park in a wheelchair spot. You know exactly how long it takes you to buy a carton of eggs; you’d be in and out of that place in two minutes. The chance that one car needing a wheelchair spot arrives in exactly that time is minute, and even then there would be another one available. The probability of two wheelchair users arriving in their cars, simultaneously, in those two minutes, is infinitesimal. There is an overwhelming probability that you will vacate the spot again, without it ever being needed by one of its intended users while you were occupying it. So, why not use it?

The answer is that if it’s OK for you to break the rule that that spot is for wheelchair users only, there is absolutely no reason why it shouldn’t also be OK for everyone else. You’re not special, the same rules apply to you as to everyone else — so if we were to decide that this rule doesn’t apply to you this very minute, then it also needn’t apply to anyone else under similar circumstances.

And then, promptly, we’re in a situation where everyone flouts the rules, and an actual wheelchair user can no longer do their grocery shopping. That’s why you, if you are not a wheelchair user, shouldn’t park there, and nobody else who isn’t should either.

And with interruptive communications — such as pinging someone in a chat when you could send them an email just the same — it’s much the same way: if you needlessly ping me and I acquiesce, drop the thing I’m doing, and focus on your interruption instead, it would be unfair of me to not do the same for somebody else. And I know if everyone does this to me, my work day is purely interrupt driven and that’s awful. The same goes for tolerating interruptive communications towards my team — or, worse, engaging in such interruptive communications myself.

Are there exceptions to this rule?

Ah but of course. Let me take you back to the grocery store. Suppose someone had a heart attack or other major health emergency that struck them down right as they were exiting the store and walking back to their car. Would anyone — including a wheelchair-using motorist that arrived just at that moment to do their shopping — complain if the ambulance parked across both wheelchair accessible spots, if that was the only practical way to get closest to the patient? I hope not. And caring for the patient and stabilising them for the trip to the hospital would surely take longer than your two-minute egg procurement dash that we discussed earlier.

Again, this has a parallel in interruptive communications in a (much less dire) regular work situation: stuff is actually on fire? Or there’s something that for some legitimate reason needs doing right now that only I or someone on my team can do, or is most comfortable with? Ping me in chat, use the back channel, give me a ring on my phone for cryinoutloud, whatever it takes to get my attention. Nobody will hold that against you, least of all me. And in this case that’s a rule that I can also easily apply generally, treating everyone around me fairly and equitably: when stuff is urgent and seriously overrides the priority of what’s currently being worked on, you get to interrupt me or, if necessary, anyone on my team. (Though I would prefer that you interrupt me specifically, and I can decide whether we really need to mobilise another person.)

Just don’t abuse that. If you do, you’ll just condition people into taking your sense of “urgent” with a big pinch of salt.

Edit, 2021-09-21:

My colleague Jean-Philippe Evrard has suggested that I refer to another, much more elaborate article on a similar subject: Siderea’s The Asshole Filter. I recommend you give it a read if you’re inclined.

tag:xahteiwi.eu,2021-09-16:/blog/2021/09/16/rules-are-rules/

How to write a decent job ad

Florian Haas Aug 27, 2021 Updated Aug 27, 2021

This is a series on writing job ads that people actually get interested in.

Over time I’ve come to accept that one of the things I’m apparently reasonably competent at is writing and publishing ads for open positions, and I’ve received questions and requests for advice from other folks who hire people. So I’m going to try and break down what I consider a decent job ad. Not a perfect one, mind you, perhaps not even a particularly good one, just a decent one that people will want to read, pass on, and maybe apply to.

A few general notes

I try to write an ad in such a way that it answers most of the questions an applicant might have about the position. And I then structure it like an imagined conversation between a potential applicant (asking questions) and me, the hiring manager (answering them). That’s why practically every subheading in the ad is a question.

The structure

In my career ads, I give answers to this list of questions:

What’s this gig about? — The ultra-concise summary of the role to be filled. One sentence.1
What will I be working on? — Details of the systems/processes/responsibilities associated with the role.
What should I know? — Prerequisite skills and knowledge.
What can I learn? — Opportunities for acquiring new skills and knowledge.
What communities would I engage with? — People and communities outside your organisation the employee would interact with.
Who would be my direct manager? — Information about yourself.
How’s work at company? — Notes on organisational culture.
What does my team look like? — Notes on team composition and culture.
What does my work week look like? — Information on how the team organises its work on a daily/weekly basis.
Where can I work from? — Information about preferences or restrictions regarding the physical location of prospective employees.
Can I work from home?2 — Information about your remote work policy.
What timezone would I work in? — Most teams have preferred times-of-day when the majority of team members is awake and working, which tends to be when most work gets done. Or, else, your team may operate 24/7 in shifts, and you’re looking to cover a particular shift.
Is travel involved? — Possibly a non-issue in the middle of a pandemic, but you might want to establish expectations for when it’s over.
What employment conditions apply? — Have standard contractual clauses that apply to everyone, like vacation policies or specific packages? Might as well list them.
When would I start? — Don’t assume that your applicants are available immediately. If people have a notice period in their current job to work with, they’ll want to know what’s the earliest and latest date you want the role filled.
What will I make? — Compensation.
How do I apply? — Details and deadlines related to the application process.

The details

I have a few more details about several of these items, which I’ll try to elaborate on as my time permits. So there should be more installments in this series, eventually. Hopefully. 🙂

Can’t condense the role description into one sentence? You’ll either have to work on your editing skills, or define the role better. ↩
If your answer to this question is “no”, I bid you good luck! ↩

tag:xahteiwi.eu,2021-08-27:/blog/2021/08/27/decent-job-ads/

Audience feedback on online conference platforms: a speaker's view

Florian Haas Aug 24, 2021 Updated Aug 24, 2021

As a speaker, I look for something very specific in online conference platforms.

We are in year 2 of the Covid-19 pandemic, and open-source conferences are still, for the most part, online-only events. (And I find myself questioning the judgment of those that put on large in-person conferences.) I have spoken at several such conferences, and I’d like to zero in on one aspect of my personal experience, as a speaker, on some of the conference platforms I worked with.

Now, I should explain one thing up front. As a speaker, the thing I care the most about is:

Is this talk useful to you?

That’s it. That’s the paramount question. I want you to take something away from the talk that you find useful. Whether that’s a technical insight, or a new angle on a problem, or even just entertainment, I want there to be something in the talk for you.

And it’s incredibly important for me to get a sense of that, as I am delivering the talk.

Now in an in-person event, that’s easy: all I need is a look across the room, and I permanently look across the room. I can tell if you’re making eye contact, or listening intently, or nodding your head, or even putting on a face that makes it clear that you’re violently disagreeing with me. Sure, if you raise your hand and ask a question, or heckle me, or laugh at a joke, that drives the point of your engagement home — but I don’t need you to become so explicitly engaged, to know that you are engaged.

And it is this kind of feedback (that you might not even realize you’re giving me!) that makes the difference between delivering a talk, and just speaking into the void. It’s also the difference between delivering a conference talk, even if it’s a pre-recorded one, and just uploading the video on YouTube. If a conference platform doesn’t give me that kind of feedback channel, the work that I put into writing, rehearsing, and recording/streaming the talk is better spent with a view toward upload to a video hosting platform, and engaging with viewers there.

So, I as a speaker am foremost interested in a single thing about the conference platform:

Is it easy for you to tell me if my talk is useful to you?

And I’m not talking about you getting into a chat and saying “this is useful.” That’s much too high of a threshold. How often do you sit in a talk and then tell the speaker, “hey, that’s useful”? Quite rarely, and only if you find the content especially actionable or insightful. Because you normally don’t have to tell me explicitly: if you’re actively listening to me, I can tell. And I know you wouldn’t be listening to me if I talked useless nonsense.

And this is one of the reasons why I like Venueless so much. Venueless has implemented an extremely low-threshold feature of showing audience engagement. You get a handful of emoji like ❤️👏🤣👍🤔 that you, as a viewer, can click on, and then they appear in the event chat stream like “emoji rain” from the top of the screen. Just one click, which also gets anonymized and lost in the crowd — this last bit is important for people who are shy or reserved, and don’t like to stick their head out. And this makes it so much easier for you to engage, than having to actually type a sentence (or even a word, or even typing an emoji) into a chat channel.

I can not tell you enough how much of a difference this makes to the speaker experience. Add to this that the event chat is not shackled to Slack or any other horribly overgrown not-even-really-chat platform anymore, and you’ve got a simple, easy-to-use, no-unnecessary-frills experience that puts you directly in touch with your audience. I loved this at PyCon AU last year.

For comparison, I also saw LoudSwarm at DjangoCon Europe. The way it was used at that conference, it seemed very tightly tied to Slack, although the audience was very nice in using emoji reactions very generously. Still, it wasn’t the same quality of feedback that Venueless provided.

Voctoconf, which I saw at FrOSCon, allegedly predates Venueless and was excellent in terms of streaming, but in terms of audience interaction it’s essentially BigBlueButton on steroids, meaning it’s about on the same level as the LoudSwarm/Slack combination I saw at DjangoCon.

tag:xahteiwi.eu,2021-08-24:/blog/2021/08/24/online-conferences-audience-feedback/

Add Depth! Stereoscopic imagery for everyone

Florian Haas Aug 21, 2021 Updated Aug 21, 2021

My talk from FrOSCon 2021, Cloud Edition.

FrOSCon 2021 was again an online event due to the COVID-19 pandemic (just like the previous year), and had a track named Woodwork instead of IT, showcasing people’s hobbies and interests they had discovered, or rediscovered, during lockdown.

So I submitted a talk that has absolutely nothing to do with what I do for work, and instead covered stereography: making, and viewing, stereoscopic images and videos.

The talk recording is available from YouTube.

And, as always, you can also review my slides, with all my speaker notes:

Rendered slides: GitHub Pages
Slide sources (CC-BY-SA): GitHub

tag:xahteiwi.eu,2021-08-21:/resources/presentations/add-depth-stereoscopic-imagery-for-everyone/

Fixing powerline flicker on your webcam feed with a udev rule

Florian Haas Jan 17, 2021 Updated Jan 17, 2021

If you are

spending a non-trivial amount of time in video calls every week (something that, at the time of writing, is true for a lot of people due to the COVID-19 pandemic), and also
having to use mains-powered artificial lighting in your office (true at the time of writing for significant portions of the Northern Hemisphere, as it’s winter there),

then you may be dealing with an unpleasant effect where your webcam feed produces a permanent horizontal flicker, caused by the electronic rolling shutter interacting with the (otherwise imperceptible) 50 or 60Hz AC powerline frequency.

The good news is that most webcams come with a facility to eliminate that effect, and on a Linux desktop it’s not difficult to permanently do so.

v4l2-ctl

The utility you want to use for this purpose is v4l2-ctl, which on Ubuntu ships with the v4l-utils package. v4l2-ctl allows you to read and set a bunch of parameters for your webcam. Here’s the set of parameters available for my Razer Kiyo, a piece of kit that I highly recommend:

$ v4l2-ctl --list-ctrls --device=/dev/video0
                     brightness 0x00980900 (int)    : min=0 max=255 step=1 default=128 value=128
                       contrast 0x00980901 (int)    : min=0 max=255 step=1 default=128 value=128
                     saturation 0x00980902 (int)    : min=0 max=255 step=1 default=128 value=128
 white_balance_temperature_auto 0x0098090c (bool)   : default=1 value=1
                           gain 0x00980913 (int)    : min=0 max=255 step=1 default=0 value=0
           power_line_frequency 0x00980918 (menu)   : min=0 max=2 default=2 value=2
      white_balance_temperature 0x0098091a (int)    : min=2000 max=7500 step=10 default=4000 value=4000 flags=inactive
                      sharpness 0x0098091b (int)    : min=0 max=255 step=1 default=128 value=128
         backlight_compensation 0x0098091c (int)    : min=0 max=1 step=1 default=0 value=0
                  exposure_auto 0x009a0901 (menu)   : min=0 max=3 default=3 value=1
              exposure_absolute 0x009a0902 (int)    : min=3 max=2047 step=1 default=127 value=127
         exposure_auto_priority 0x009a0903 (bool)   : default=0 value=1
                   pan_absolute 0x009a0908 (int)    : min=-36000 max=36000 step=3600 default=0 value=0
                  tilt_absolute 0x009a0909 (int)    : min=-36000 max=36000 step=3600 default=0 value=0
                 focus_absolute 0x009a090a (int)    : min=0 max=255 step=1 default=0 value=0 flags=inactive
                     focus_auto 0x009a090c (bool)   : default=1 value=1
                  zoom_absolute 0x009a090d (int)    : min=100 max=140 step=10 default=100 value=100

The value you’re looking for is power_line_frequency. Its default is 2 (compensating for a 60Hz powerline frequency), which means the camera should work out of the box and without any powerline flicker in the Americas and parts of Asia. I am in Europe though, where the mains frequency is 50Hz, so I need to set this to 1:

v4l2-ctl --device /dev/video0 --set-ctrl=power_line_frequency=1

However, it’s rather tedious to run that command every time I want to use the webcam.

udev

Thankfully, this process can be automated with a simple udev rule:

ACTION=="add", SUBSYSTEM=="video4linux", DRIVERS=="uvcvideo", RUN+="/usr/bin/v4l2-ctl --device=$devnode --set-ctrl=power_line_frequency=1"

This way, any camera handled by the uvcvideo driver (meaning, practically any contemporary webcam) will have its power line frequency setting configured to the 50Hz value, eliminating the banding effect from the rolling shutter.

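Chuck that line into a file in /etc/udev/rules.d, then either replug the camera or run sudo udevadm trigger --action=add --subsystem-match=video4linux (a plain udevadm trigger sends change events, which a rule matching only on the add action won’t pick up), and you should be good to go.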
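If your mains frequency differs, the only thing that changes in the rule is the control value. Here is a minimal sketch of a helper that prints the rule for either frequency. The function name powerline_rule and the rules file name in the usage note are hypothetical. The values 1 (50Hz) and 2 (60Hz) are the ones discussed above. The --device=$devnode part relies on udev's $devnode substitution, so that the setting is applied to the device node that was just added rather than to v4l2-ctl's default device.

```shell
# powerline_rule is a hypothetical helper, not part of v4l-utils or udev.
# It prints a udev rule for the given mains frequency in Hz (50 or 60).
powerline_rule() {
  case "$1" in
    50) value=1 ;;   # 50Hz, as used in Europe
    60) value=2 ;;   # 60Hz, the camera's default
    *)  echo "unsupported mains frequency: $1" >&2; return 1 ;;
  esac
  # Single quotes keep $devnode literal: udev expands it, not the shell.
  printf 'ACTION=="add", SUBSYSTEM=="video4linux", DRIVERS=="uvcvideo", RUN+="/usr/bin/v4l2-ctl --device=$devnode --set-ctrl=power_line_frequency=%s"\n' "$value"
}
```

You could then write the rule with something like powerline_rule 50 | sudo tee /etc/udev/rules.d/99-webcam-powerline.rules.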

Acknowledgments and further reading

I got the udev rule suggestion from user telcoM’s answer on this StackExchange post. The discussion thread on that post has a few additional suggestions, including some not using udev.

tag:xahteiwi.eu,2021-01-17:/resources/hints-and-kinks/webcam-rolling-shutter-udev/

Running (Almost) Anything in LXC: Applications Using Your Webcam

Florian Haas Jan 17, 2021 Updated Jan 17, 2021

One of the non-open-source applications I sometimes have to run for work purposes, and which out of principle I run in LXC containers, is Zoom. Now Zoom is of course an X application, so my previously shared considerations for those apply. It also needs to process input from my microphone …

But a thus-configured LXC container is still missing one other bit: it’ll have to process the video feed from my webcam. Here’s how to do that.

LXC Configuration

In the article on running X applications in LXC, I give the example of sharing a host directory (the one that contains the X.org server sockets). For sharing a webcam, I need to do the same for a few files.

Now, video capture devices like webcams are represented in Linux by character devices named /dev/video0, /dev/video1 and so forth. Udev manages these and (on Ubuntu platforms) creates them as owned by the user root and the group video — but it helpfully also creates POSIX ACL entries for the user currently logged in on the X console.

All I thus need to do is mount these files into the container (yes, LXC lets you “mount” individual files), like so:

lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file
lxc.mount.entry = /dev/video1 dev/video1 none bind,optional,create=file
lxc.mount.entry = /dev/video2 dev/video2 none bind,optional,create=file
lxc.mount.entry = /dev/video3 dev/video3 none bind,optional,create=file

Here, the optional bit of course means that the container will start even in case a particular file does not exist in the host at the time the container receives its lxc-start command.

That, in principle, is all there is to it.

Things to consider

Be aware that since early 2018 (in other words, in kernel 4.16 and later) the Linux kernel’s uvcvideo subsystem will create two /dev/video devices for your webcam. One of them is the actual video capture device; the second one is a metadata device node. You can easily tell which is which, with v4l2-ctl: only a video capture device will have a non-empty list of supported formats.

This is a video capture device:

$ v4l2-ctl --list-formats --device /dev/video0
ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture

    [0]: 'MJPG' (Motion-JPEG, compressed)
    [1]: 'YUYV' (YUYV 4:2:2)
    [2]: 'H264' (H.264, compressed)

And this is the metadata device; note that it lists no video codecs:

$ v4l2-ctl --list-formats --device /dev/video1
ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture
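The check described above (a capture device lists at least one format, a metadata device lists none) can be scripted. This is only a sketch: is_capture_device and list_capture_devices are hypothetical helper names, and the pattern assumes that format entries look like the [0]: 'MJPG' lines shown above, which may vary between v4l2-ctl versions.

```shell
# is_capture_device is a hypothetical helper: it reads the output of
# "v4l2-ctl --list-formats" on stdin and succeeds only if at least one
# format entry (a line like "[0]: 'MJPG' ...") is listed.
is_capture_device() {
  grep -Eq '^[[:space:]]*\[[0-9]+\]:'
}

# list_capture_devices prints every /dev/video* node that reports at
# least one format. It is defined here, but only runs when you call it.
list_capture_devices() {
  for dev in /dev/video*; do
    if v4l2-ctl --list-formats --device "$dev" 2>/dev/null | is_capture_device; then
      echo "$dev"
    fi
  done
}
```

Calling list_capture_devices on the host should then print only the nodes worth passing into a container.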

Normally, device nodes /dev/video0 and /dev/video1 will be occupied by a built-in webcam, your USB webcam will use /dev/video2 and /dev/video3, and if you have another video capture device then that will be /dev/video4 and /dev/video5.

Thus, perhaps you want your container to see only your USB webcam, and you don’t care about the metadata device. In that case, instead of the four lxc.mount.entry lines I gave above, you might use just one:

lxc.mount.entry = /dev/video2 dev/video2 none bind,optional,create=file
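Since these lines differ only in the device number, they can also be generated, which avoids copy-and-paste slips between source and target paths. A small sketch, with video_mount_entries being a hypothetical helper name:

```shell
# video_mount_entries is a hypothetical helper: for each device number
# given as an argument, print the matching lxc.mount.entry line, with
# the same number on the host side and the container side.
video_mount_entries() {
  for n in "$@"; do
    printf 'lxc.mount.entry = /dev/video%s dev/video%s none bind,optional,create=file\n' "$n" "$n"
  done
}
```

For example, video_mount_entries 2 prints the single line above, and video_mount_entries 0 1 2 3 prints one line for each of the four device nodes.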

Also, the bind mounts occur at the time you start the container. Thus, if you plug in a USB webcam while the container is already running, it won’t magically become available to the container. There are two ways to address this:

You start (or restart) your container whenever you need to use a webcam (or other video device) that you have just plugged in, or
you remove the optional keyword from your lxc.mount.entry line(s), so that the container will refuse to start unless the correct webcam is plugged in.

Note further that for the same reason, if you disconnect your USB webcam while your container is running, you can’t just plug it back in and expect it to work. In that case, udev in the host will have deleted the device node, so the bind mount in your container is now stale, and your containerized applications won’t be able to use your capture device anymore. Under those circumstances, you’ll simply have to restart your container.

tag:xahteiwi.eu,2021-01-17:/resources/hints-and-kinks/lxc-webcam/

Running (Almost) Anything in LXC: Sound

Florian Haas Jan 16, 2021 Updated Jan 16, 2021

Some of the X applications I run in LXC make sounds. Now, I find alert sounds horribly distracting so I turn them off, but for some containerized applications I want to actually play sound.

Examples include the Spotify Linux client (which I run in its own LXC container because it’s not open source), and occasionally things like the latest available Shotcut version for video editing.

You’ll notice that, on face value, that’s a pretty similar problem compared to getting containerized applications to talk to my X server. It’s just that rather than applications only being clients to my X server, I also want them to be clients to my PulseAudio daemon.

LXC (Non-)Configuration

In the article on running X applications in LXC, I give the example of sharing a host directory, which contains X.org server sockets.

In principle, I could do the same thing with the Unix socket that PulseAudio runs. However, there’s a small problem with that: the directory I would have to bind-mount into my container is /run/user/1000/pulse, and here is the difference to bind-mounting /tmp/.X11-unix: /tmp already exists in my container on system startup, and so does /run, but /run/user/1000 does not. I have experimented with making this work, and I’ll spare you the details, but it’s not as simple as it initially looks. I eventually gave up on that approach, because there is a much simpler way to do this, and it doesn’t even require any specific LXC container configuration.

The trick is to use the PulseAudio native-protocol-tcp module. When I load it into my running PulseAudio configuration, like so:

pactl load-module module-native-protocol-tcp

… then a PulseAudio sound server starts listening on a TCP socket on port 4713.

I can of course also add this line (minus its pactl prefix) to my PulseAudio configuration file, ~/.config/pulse/default.pa.

And then, all I need to do is attach to my container, export the PULSE_SERVER environment variable set to 10.0.3.1 (the host’s IPv4 address on the lxcbr0 bridge), and launch an application.

I can do this all in one go, like so (using the Spotify client as an example):

pactl load-module module-native-protocol-tcp && \
  lxc-start -n focal-spotify && \
  sleep 1 && \
  lxc-attach -n focal-spotify -- \
  sudo -Hu florian env PULSE_SERVER="10.0.3.1" spotify && \
  lxc-stop -n focal-spotify

… and as long as the application links to any PulseAudio client libraries, it will correctly parse the set PULSE_SERVER environment variable as an instruction to connect to the given IP address on its default port, and send its audio stream there.

I am then still able to control my volume, control my mix, and mute the output from my host.

Of course, you probably want to chuck that long command into a .desktop file, or wrap it in a script or function.
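If you do wrap this in a script, the bridge address does not have to be hardcoded. The following sketch looks it up from the output of the ip command. bridge_ipv4 is a hypothetical helper name, and the parsing assumes the one-line-per-address format that ip -o prints, where the fourth field is the address with its prefix length.

```shell
# bridge_ipv4 is a hypothetical helper: it reads the output of
# "ip -4 -o addr show dev <bridge>" on stdin and prints the first
# IPv4 address, without its prefix length.
bridge_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}
```

In a wrapper script, something like PULSE_SERVER=$(ip -4 -o addr show dev lxcbr0 | bridge_ipv4) would then replace the literal 10.0.3.1.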

By the way, no I don’t really know why I need that 1-second sleep between starting the container and attaching to it, but it works for me and breaks without it. I presume there is some initialization going on in the container that needs just a few tenths of a second to complete. And I can deal with waiting for my music for one more second.
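One possible alternative to a fixed delay is to poll until the container accepts commands. The sketch below is a generic retry loop. wait_until is a hypothetical helper name, and whether a successful lxc-attach of a trivial command is a sufficient readiness signal for the initialization in question is an assumption that has not been verified.

```shell
# wait_until is a hypothetical helper: run the given command up to
# $1 times, pausing $2 seconds after each failed attempt, and succeed
# as soon as the command succeeds.
wait_until() {
  tries=$1 delay=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```

In the command chain above, sleep 1 could then become wait_until 20 0.2 lxc-attach -n focal-spotify -- true, which gives up after roughly four seconds. Fractional sleep values work with GNU sleep, as found on Ubuntu.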

Things to consider

Your Ubuntu desktop will most likely run with ufw enabled. If your containerized applications are unable to connect to the PulseAudio server because your firewall blocks them, you won’t get sound. Here’s what I do:

First, I create /etc/ufw/applications.d/pulseaudio, with this content:

[pulseaudio]
title=PulseAudio Native Protocol TCP
description=PulseAudio Sound Server 
ports=4713/tcp

Then, I allow traffic incoming via the LXC bridge to connect to that server:

sudo ufw allow in on lxcbr0 to any app pulseaudio

Also do consider, of course, that once your system is set up in this way, not only will your LXC applications be able to play sound through your speakers, but they will also be able to pick up input from your microphone. So use this wisely, particularly if the application you are running does record and process sound.

Sometimes you totally want your application to record sound, though, and indeed see the video stream from your webcam, too. Zoom calls come to mind as one such example. More on this in the next installment of this series, where I’ll talk about letting your containerized app use host video input.

tag:xahteiwi.eu,2021-01-16:/resources/hints-and-kinks/lxc-sound/

Running (Almost) Anything in LXC: X applications

Florian Haas Jan 9, 2021 Updated Jan 9, 2021

I occasionally want to run X applications in an LXC container. Sometimes that’s because they’re not open source and I need to run them for work, like Zoom. Sometimes it’s an open source X application that doesn’t work splendidly well on the Ubuntu release that I …

It turns out that this isn’t particularly hard to do — if you are running X.org. To the best of my knowledge, what I am describing here cannot be expected to work, reliably, on Wayland. To me that’s no big loss, because there are several other things that I like to use (like Autokey and Plover) that won’t work on Wayland, either. So I run GNOME on X by default, anyway.

LXC Configuration

Compared to the basic LXC configuration that I have described before, there’s only one line that you’ll need to add:

lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir,ro

Now let me explain what this does. /tmp/.X11-unix is where your X display sockets will live, and I map it to the same path in the container.

If I look into this directory while I’m in an X session myself, I see one single socket file in there, named X0, which is owned by my user account that owns the session.

And since my standard configuration maps my personal user account (and only my personal user account) from the host to the container, that means that processes running as florian in the container will be able to use this socket just like processes owned by florian in the host can.

Now, what’s with the create=dir and ro options?

create=dir tells LXC to create the mount point in the container if it does not exist.
ro bars processes in the container from creating or deleting any files in the directory. You see, my X server always runs in my host OS; I only want applications running in the container to connect to it, as clients. So there’s no need for applications in the container to ever modify this directory. It also protects the socket: you’ll almost certainly be running something on your system that will sweep /tmp on system startup (systemd-tmpfiles will, for example), and if that could write to this directory, you’d lose the socket.

With all that set up, any application that runs in the container with a default $DISPLAY variable (:0) in its environment, will connect to the socket in /tmp/.X11-unix/X0 which is a direct pass-through of the X server socket in the host.
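The relationship between $DISPLAY and the socket file can be spelled out in a few lines, which is handy for checking from inside the container that the bind mount is in place. This is a sketch only: x11_socket_path is a hypothetical helper name, and it assumes a local display of the form :N or :N.S (no host name in front of the colon).

```shell
# x11_socket_path is a hypothetical helper: map a local display name
# such as ":0" or ":0.0" to its Unix socket under /tmp/.X11-unix.
x11_socket_path() {
  display=${1#:}           # strip the leading colon:  ":0.0" -> "0.0"
  display=${display%%.*}   # strip the screen number:  "0.0"  -> "0"
  printf '/tmp/.X11-unix/X%s\n' "$display"
}

# Possible use inside the container (not run here):
#   [ -S "$(x11_socket_path "$DISPLAY")" ] && echo "X socket is visible"
```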

Things to consider

Since my default configuration maps /home in the host to /home in the container, any application running in the container will happily apply the same configuration as in the host. So for example, if I start Firefox in the container, my Firefox profiles and configuration are all there. However, so are any application locks that my application creates.
Sticking with the Firefox example, I won’t be able to open a specific profile in the container that is simultaneously open in the host. I can, however, totally use two different profiles side-by-side, or the same profile sequentially in first the host, then the container or the other way round.
On a highly customized desktop your application may look different in the container than it does in the host. For example, my desktop is configured to use Cantarell as its sans-serif and Hack as its monospace font. If I neglect to install the fonts-cantarell and fonts-hack Ubuntu packages in my container, containerized X applications will instead fall back to the system default fonts. The same consideration applies for GTK themes.
I have yet to tell you about pushing sound from the container to the host, and about sharing the host’s webcam and microphone with the container. More on that in future installments in this series.

tag:xahteiwi.eu,2021-01-09:/resources/hints-and-kinks/lxc-x11/

Running (Almost) Anything in LXC: The Basics

Florian Haas Dec 28, 2020 Updated Dec 28, 2020

LXC is part of my standard Linux desktop toolbox, and I use it daily. I have done tutorials about this before, one of which you can find on YouTube (courtesy of linux.conf.au) and GitHub, but it’s about time I included this in a series of articles.

My motivations for running LXC containers are manifold, but here are some of the most important ones:

I want to keep my main system clean of anything that’s not free and open source software. There is, however, the odd bit of non-free software that I do need to or want to use — Zoom for work, for example, or the excellent Spotify Linux client for pleasure.
Even if a piece of software is open source, it sometimes does not play nicely with the version of my main system that I currently use. A recent example is the somewhat premature inclusion of pre-release versions of Calibre in Debian and Ubuntu, which means that Calibre is currently not playing too nicely on Ubuntu Focal (the current LTS at time of writing), but runs just dandy on Bionic, which I can handily run in an LXC container.
Sometimes the opposite is true as well, that is, some application comes in a version that I want to use, except it’s only bundled with a future Ubuntu (or Debian) release that I am not yet prepared to use. Or else, it’s available only on Fedora or openSUSE, which are perfectly fine desktop distributions but just not my preferred ones to use on a daily basis. In that case, LXC containers are exceedingly useful as well, and are much less hassle than building the application in question from source.

Here are my general rules for running LXC containers:

I run my containers as non-root, under my own user account. (If you are unfamiliar with this, and would like to learn more about how it works and how you need to tweak your system to enable it, please see the excellent LXC Getting Started guide.)
I use UID and GID mapping rules so that all of the container’s user accounts, including the container’s root, are mapped to subgids and subuids of my account — all except my own user account and group, with uid and gid 1000.
I bind-mount the /home directory into the container. Combined with the uid and gid passthrough of my own account, this means that florian in the container can access /home/florian in any container, just like in the host.
I run all my containers in btrfs subvolumes.
I maintain a basic container configuration for each Ubuntu release I run, and then I duplicate that configuration for a bunch of containers using snapshot cloning (lxc-copy -s), which in combination with btrfs makes the clones quite space-efficient.

For example, this is the “container specific configuration” section in ~/.local/share/lxc/focal/config, the configuration for my current base container running Ubuntu Focal:

# Container specific configuration
lxc.include = /etc/lxc/default.conf
lxc.idmap = u 0 100000 1000
lxc.idmap = g 0 100000 1000
lxc.idmap = u 1000 1000 1
lxc.idmap = g 1000 1000 1
lxc.idmap = u 1001 101001 64535
lxc.idmap = g 1001 101001 64535
lxc.mount.auto = proc sys cgroup
lxc.rootfs.path = btrfs:/home/florian/.local/share/lxc/focal/rootfs
lxc.uts.name = focal
lxc.mount.entry = /home home none bind,optional 0 0

Of this, perhaps the lxc.idmap settings merit a bit of extra explanation:

lxc.idmap = u 0 100000 1000 means “map the uid 0 (root) in the container to uid 100000 in the host, and continue up until you’ve hit 1,000 mappings”. In other words, map uids 0 through 999 (inclusive) to 100000 through 100999.
lxc.idmap = u 1000 1000 1 means “map the uid 1000 in the container to uid 1000 in the host,” (in my case, my user account named florian) “and follow this pattern for just one mapping”. In other words, make uid 1000 a pass-through.
Finally, lxc.idmap = u 1001 101001 64535 means “starting with uid 1001 in the container, map it to uid 101001 in the host and proceed until you’ve hit 64,535 mappings”.

So in total, that’s LXC-ese for “map all possible uids from 0 to 65535 in the container to host subuids shifted by 100,000, except 1000, which you shouldn’t map to any subuid”. And the same is true for gids, for the g idmaps. It’s a rather roundabout way of specifying this, but it works.
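To make the arithmetic concrete, here is a sketch that derives the three lines from a pass-through id and a subid base, and that computes which host id a given container id ends up as. idmap_lines and host_id are hypothetical helper names, and the fixed upper bound of 65535 is taken from the configuration above.

```shell
# idmap_lines is a hypothetical helper, not part of LXC. It prints the
# three lxc.idmap lines for one id kind ("u" or "g"), a pass-through
# id, and the first host subid, covering container ids 0 to 65535.
idmap_lines() {
  kind=$1 pass=$2 base=$3
  # ids below the pass-through id, shifted by $base
  echo "lxc.idmap = $kind 0 $base $pass"
  # the pass-through id itself, mapped one-to-one
  echo "lxc.idmap = $kind $pass $pass 1"
  # everything above it, up to and including 65535, shifted by $base
  echo "lxc.idmap = $kind $((pass + 1)) $((base + pass + 1)) $((65535 - pass))"
}

# host_id prints the host id that a container id is mapped to.
host_id() {
  id=$1 pass=$2 base=$3
  if [ "$id" -eq "$pass" ]; then echo "$id"; else echo $((base + id)); fi
}
```

Running idmap_lines u 1000 100000 reproduces the three u lines from the configuration, and idmap_lines g 1000 100000 the three g lines.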

Now by itself, this already gives me plenty of options for command-line applications. But since it’s my main workstation that I run this on, I usually want my applications to be wired up to my desktop GUI. More on that in the next installment of the series.

tag:xahteiwi.eu,2020-12-28:/resources/hints-and-kinks/lxc-basics/

Add depth! Stereoscopic imagery for everyone

Florian Haas Nov 6, 2020 Updated Nov 6, 2020

A talk I submitted to linux.conf.au 2021.

This is a talk I submitted1 to linux.conf.au 2021. It was, unfortunately, rejected.

If you run a conference or meetup (in-person or online) where you think this talk would be a good fit, please let me know! I’d still love to present it when the opportunity arises.

Title

Add depth! Stereoscopic imagery for everyone

Target Audience

User

Abstract

This will appear in the conference programme. Up to about 500 words. This field is rendered with the monospace font Hack with whitespace preserved

Stereoscopic imagery (photography and videography) is a fascinating way to create 3-dimensional images of landscapes, unmoving and moving objects, and of course, people.

In this talk, we’ll cover the basics of stereoscopic imagery and projection, discover how stereoscopic vision works, and how we can trick our brains into perceiving depth from two flat images.

We start with the principles of three-dimensional vision in humans: how our eyes use the combination of focus and vergence to signal two slightly different images of our surroundings to our brain, and how our brain then processes these images to give us the perception of depth. Then, we discuss the techniques available to play tricks on our brains in which two slightly (but cleverly) distinct two-dimensional images are presented to our eyes in such a way that our mind conjures up depth where there objectively is none.

These techniques come in various forms, from very high tech (such as virtual reality goggles) to very low tech (like mechanical stereoscopic viewers), but some can deal without any projection technology at all: this is called freeviewing, and for most people it is a remarkably simple and low-cost way to enjoy stunning three-dimensional imagery. We’ll cover the parallel-view and crossview freeviewing techniques.

We’ll then dive into the simple but highly effective steps of making stereoscopic images, using run-of-the-mill cameras (even cell phones), and some straightforward image processing in the GIMP.

Finally, we talk about some neat little tricks to make stereoscopic videos, with minimal cost and investment. We’ll look at how we can make 3D video with just a GoPro, or a simple drone camera — again using a free software tool, namely the OpenShot video editor, for processing.

Private Abstract

This will only be shown to organisers and reviewers. You should provide any details about your proposal that you don’t want to be public here. This field is rendered with the monospace font Hack with whitespace preserved

This talk does not cover a specific software project; the “Project URL” below is simply a Flickr album containing a set of stereoscopic images created with the technique I am describing.

The fact that LCA is an online event this year would suit this talk particularly well: when I get to the point of explaining freeviewing to attendees, I would expect novices to have some difficulty with one of the freeviewing techniques, and some, with both. The latter would have the option of simply backing up the stream and re-watching the instructions and the test images provided, which is an option that would not exist in a live talk.

Accessibility note: Regretfully, this talk will have limited accessibility for people with vision deficiencies. Specifically, the 3D effects presented will be inaccessible to people with complete loss of vision in one eye (or both), nystagmus, or strabismus. People with these conditions will still be able to learn from the techniques presented in the talk, but will likely be unable to perceive the demonstrated 3D effects themselves. People with intraocular lens (IOL) implants might also have difficulty following some of the examples in the talk.

Project URL

https://www.flickr.com/gp/77872933@N02/SSzK0w

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2020-11-06:/talk-submissions/lca-2021-stereoscopy/

I Don’t Think This Means What You Think It Means: Red Herrings in OpenStack

Florian Haas Oct 22, 2020 Updated Oct 22, 2020

My talk from the Open Infrastructure Summit, October 2020.

I had given a previous version of this talk at Open Infra Days Nordic in 2019, but unfortunately the audio in that recording came out really messed up. This version is much better.

Talk video: YouTube

You can review my slides, including all speaker notes:

Rendered slides: GitHub Pages
Slide sources (CC-BY-SA): GitHub

tag:xahteiwi.eu,2020-10-22:/resources/presentations/i-dont-think-this-means-what-you-think-it-means-red-herrings-in-openstack/

What I now know about HAproxied Django database connections, and wish I'd known sooner

Florian Haas Sep 8, 2020 Updated Sep 8, 2020

My talk from PyConline AU 2020.

I’ve been wanting to speak at PyCon.au for a long time now, and finally did for the 2020 PyConline AU edition.

This was a truly wonderful conference, in which the organisers and attendees collectively bent over backwards and put in a massive effort to run an event that was just as wonderful as an in-person event would have been.

This is a purely technical talk, and covers some very interesting aspects of dealing with Galera high availability clusters for MySQL and MariaDB database servers, connecting to them from Django via an HAProxy load balancer, and then dealing with interesting side effects of that combination.

Talk video: YouTube

And, as always, you can also review my slides, with all my speaker notes:

Rendered slides: GitHub Pages
Slide sources (CC-BY-SA): GitHub

tag:xahteiwi.eu,2020-09-08:/resources/presentations/what-i-now-know-about-haproxied-django-database-connections-and-wish-id-known-sooner/

No, We Won’t Have a Video Call for That: Slides and Recordings

Florian Haas Aug 22, 2020 Updated Aug 22, 2020

These are the slides and recordings of the original talk that ultimately became this article.

The talk recording is available from two different sources:

Video and audio in multiple formats, for viewing and download: CCC Media server
Streaming: YouTube

And, as always, you can also review my slides, with all my speaker notes:

Rendered slides: GitHub Pages
Slide sources (CC-BY-SA): GitHub

tag:xahteiwi.eu,2020-08-22:/resources/presentations/no-we-wont-have-a-video-call-for-that-slides-and-recordings/

No, We Won’t Have a Video Call for That!

Florian Haas Aug 22, 2020 Updated Aug 22, 2020

Communications in distributed teams: a write-up of my talk from FrOSCon 2020, Cloud Edition.

FrOSCon 2020 was an online event due to the COVID-19 pandemic, and gave me the opportunity to present an extended and heavily updated version of my DevOpsDays 2019 talk.

I normally make my talks available as a video, and a slide deck with full speaker notes. In this case though, I consider it fitting to write the whole thing out, so that you don’t need to watch a full length video in 45 minutes, but can read the whole thing in 15.

You’ll still find links to the recording and deck downpage, as usual.

No, we won’t have a video call for that!

Communications for distributed teams

FrOSCon 2020

This presentation is a talk presented at FrOSCon 2020 Cloud Edition. It is CC-BY-SA 4.0 licensed, see the license for details.

Hello and welcome, dear FrOSCon people: this is my talk on communications in distributed teams. My name is Florian, this is the second time I’m speaking at FrOSCon, and you probably want to know what the hell qualifies me to talk about this specific issue. So:

Why am I talking here?

So, why am I talking about that?

Or rather more precisely, why am I talking about that?

I turned 40 last year, have been in IT for about 20 years now (19 full-time), and out of that I have worked

in 4 successive companies, all of which worked out of offices, for 11 years,
in a completely distributed company, that I founded, for 6 years,
and now, for about three years, I have been running a distributed team that is a business unit of a company that has existed for 15 years and throughout that time, has only ever worked from a single office.

So I think I might have seen and become aware of some of the rather interesting challenges that come with this.

What changed since last time?

I originally wrote and presented this talk for the first time in December 2019. At the time, you probably had forgotten about SARS, had no idea what SARS-CoV-2 or COVID-19 were, and many of you were probably working from offices.

And then something like three months later, everything changed and suddenly, this talk became much more relevant to a much greater audience.

And something else happened: a lot of people suddenly started talking about working from home and distributed teams, and a lot of those people who were talking very loudly had themselves only been working with or managing distributed teams since March. And a fair amount of what you could read about the subject then, and can still read now, is complete and utter bullshit.

So there’s one point I actually didn’t make in the initial version of this talk, because I thought it was self-evident. But I have come to the conclusion that to a lot of people it is not, so to rectify this omission from last December (and with apologies for that omission to the wonderful DevOpsDays Tel Aviv crowd, who were my first audience for this talk), let me make this one thing very clear from the outset:

Effective distributed collaboration is not pretending to be in an office while staring into a webcam all day.

You will never be able to capitalize on work as a distributed team unless you kick some office habits. The key to distributed teams being effective is not that they happen to not be in the same place, as you’ll see from the remainder of this talk. So to expect success from the approach that you take the habits of an office, simply remove the element of locality, replace every face to face meeting with a video call and carry on, is ludicrous.

The good news is that if you do it right, you’ll end up with a far better team than a local one would ever be, and everyone has a chance at far better work-life balance, and you don’t waste awful amounts of time and energy and fossil fuels on your commute.

What’s in this talk?

So you’ll find a few general themes throughout this talk:

What modes we have available for communications in teams;
Why distributed teams always collaborate asynchronously, and what communication modes lend themselves to that particularly well;
Why written communication is so important in distributed teams;
And why meetings (like video calls) are a mode of communication that effective distributed teams hardly ever need to use — except for very specific reasons.

But I do want to state one thing upfront:

This is not science.

Nothing of what I am talking about is steeped in any scientific rigour. I present anecdotes, not evidence. I might be mistaking correlation for causation, or the other way round. It’s solely based on my personal experience, and the experience of others I have talked to, watched, or read. Everything I say here is subject to debate and rebuttal, or you can simply have a different opinion.

But it’s definitely not science.

Now with all of that said, let me attempt to give a definition of a distributed team, according to my understanding:

A distributed team is a professional group whose members do not rely on proximity in order to routinely collaborate productively.

Now this is clearly not an ideal definition, not least because it defines something by a negative, and an outside factor to boot: it defines a distributed team by what it does not need to exist to function. But it’s the best definition I’ve been able to come up with.

Now there’s a couple of key words in here:

Professional. I’m talking about teams that work towards a professional goal. This doesn’t necessarily mean that they all work in the same company. They could, for example, all work in different companies collaborating on a joint project, which is what frequently happens in open source software projects. But they’re not pursuing their hobby, they’re doing their jobs.
Routinely. I’m talking about teams that habitually work in a distributed fashion, not the work that goes on in an office-based team when one person is having a work-from-home day.

It is important to understand that that lack of proximity is not only spatial, it is temporal as well, because:

Working in a distributed team means working asynchronously.

If your team is distributed, this is equivalent to saying that it works in an asynchronous fashion, that is to say, that people will work on things in parallel, and a capable distributed team will have only as many synchronization points as absolutely necessary.

The reason for this is not just working in different timezones, but also the fact that everyone will have their own daily routine, and/or have their individual times when they are most productive. Which you will not attempt to synchronize. (Doing so would mean setting the entire team up for failure.)

Now, this doesn’t come for free, nor does it fall in our lap:

Being productive in a distributed team is a skill that most people must learn; it is not innate to us.

People are not born with the ability to work in a distributed team. Humans function best in groups that collaborate in close proximity to one another; it is only very recently that technology has started to enable us to override that to an extent — giving us other benefits like the ability to work from home, or the ability to hire people residing anywhere, provided they have internet connectivity.

So we can now work in teams despite being continental distances away from each other, but we do have to acquire the skills to do that. And if we fail to do so, that has a rather grave disadvantage, which is that…

Nothing has as dire an impact on productivity as poor communications.

This is a truism that applies to both distributed and non-distributed teams. Having bad communications will wreck any project, blow any budget, fail any objective. Now note that the reverse is not true: having good communications does not guarantee success. But having bad communications does guarantee failure.

And here is one thing to start with:

A capable distributed team habitually externalises information.

Information is generally far less useful when it is only stored in one person’s head, as opposed to being accessible in a shared system that everyone trusts and can use. If you take important information out of your own head and store it in a medium that allows others to easily find and contextualise it, that’s a win for everyone.

And since we’re all technology people, we typically have multiple facilities to externalise, share, and then access information at our disposal. So let’s see how those compare.

Modes of communication in distributed teams

A distributed team will habitually use multiple modes of communication, relying mostly on those that make sharing, finding, and contextualising information easy, and avoiding those that make it difficult.

In many teams, distributed or not, using chat as a default mode of communication is becoming the norm. Now with an important exception, which I’ll get to near the end of the talk, this is not a symptom of having a particularly dynamic or efficient team; it’s the opposite.

Excessively using chat isn’t being efficient. It’s being lazy.

It’s a symptom of the worst kind of laziness (not malice!): in an attempt to communicate quickly and easily, for yourself, you are really making things harder for everyone, including yourself.

               Share   Find   Contextualise
Chat           🙂      😐     🙁

This is because, while sharing information in a chat is extremely easy, it is also a “fire and forget” mode of communications. Chat makes it difficult to find information after the fact. If you’ve ever attempted to scour a busy Slack or IRC archive for a discussion on a specific topic that you only remember to have happened a “few months ago”, you’ll agree with me here.

It’s even more difficult to read a Slack discussion in context, that is to say in relation to other discussions on the same topic, days or weeks earlier or later.

Let’s compare that to other communication modes:

               Share   Find   Contextualise
Chat           🙂      😐     🙁
Email          😐      😐     😐

Email makes it easy to share information with a person or a group from the get-go, but quite difficult to loop people into an ongoing discussion after the fact. Finding information later is just as hard as with chat, and it’s marginally better at contextualizing information than chat (because you get proper threading).

               Share   Find   Contextualise
Chat           🙂      😐     🙁
Email          😐      😐     😐
Wiki           🙂      🙂     🙂
Issue tracker  🙂      🙂     🙂

A wiki and an issue tracker (provided you don’t lock them down with silly view permissions), in contrast, both make it very easy to share, find, and contextualise information.
Note that “wiki”, in this context, is shorthand for any facility that allows you to collaboratively edit long-form documents online. That can be an actual wiki like a MediaWiki, but also something like Confluence, or even shared Google Docs.
Likewise, “issue tracker” can mean RT, OTRS, Jira, Taiga, Bugzilla, whatever works for you.

               Share   Find   Contextualise
Chat           🙂      😐     🙁
Email          😐      😐     😐
Wiki           🙂      🙂     🙂
Issue tracker  🙂      🙂     🙂
Video call     😐      🙁     🙁

Video calls are even worse than chat or email, because sharing information works but doesn’t scale — you can’t reasonably have more than 5-or-so people in a video call, and sharing the recording of a full video call is just pointless.

So really, make your wiki and your issue tracker your default mode of communications, and use the others sparingly. (This isn’t meant to be a euphemism for “don’t use them”, as we’ll get to in a moment.)

Text chat

So. Let’s talk about text chat. These days, that frequently means Slack, but what I am talking about also and equally applies to IRC, Mattermost, Riot, and anything similar.

Is text chat universally useful? No. Is it universally bad? Not that, either. There is a very specific type of situation in which text chat is a good thing:

Use chat for collaboration that requires immediate, interactive mutual feedback.

Using interactive chat is a good idea for the kind of communication that requires immediate, interactive mutual feedback from two or more participants. If that is not the case, chat is not a good idea.

This means that the only thing that chat is good for is communication that is required to be synchronous, and remember, in a distributed team asynchronicity is the norm. So using interactive chat for communications needs to be an exceptional event for a distributed team; if it is instead a regular occurrence you’ll make everyone on the team miserable.

For any interaction that does not require feedback that is both immediate and interactive, email, a wiki, or an issue tracker are far superior modes of communication.

The only reason to use DMs for collaboration
is a need for immediate, interactive mutual feedback
and confidentiality.

Using chat direct messages (DMs) as the default means of communication is utterly braindead. In order for a chat DM to be useful, there is precisely one clearly delineated confluence of events that must occur:

You need immediate feedback from the other person,
you need mutual back-and-forth with the other person,
you don’t want others to follow the conversation.

I can’t emphasize enough that this combination is perfectly valid — but it is exceedingly rare. If you want just a private exchange of ideas with someone, encrypted email will do. If you want to work on something together with one person before you share it with others, restricted view permissions on a wiki page or an issue tracker ticket will work just fine.

If you don’t need confidentiality but you do need interactive and immediate feedback, chances are that you’re working on something urgent, and it is far more likely you’ll eventually need to poll other opinions, than that you won’t. So just use a shared channel from the get-go, that way it’s easier for others to follow the conversation if needed — and they might be able to point out an incorrect assumption that one of you has, before you end up chasing a red herring.
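The rules of thumb above, for choosing between a DM, a shared channel, and the asynchronous modes, can be written down as a tiny decision function. This is only a sketch in Python: the name choose_channel and its three boolean parameters are made up for illustration, and the returned strings simply paraphrase the advice in this talk.

```python
def choose_channel(immediate: bool, interactive: bool, confidential: bool) -> str:
    """Hypothetical helper: suggest a communication mode for one exchange.

    immediate    -- you need feedback right now
    interactive  -- you need mutual back-and-forth
    confidential -- you don't want others to follow the conversation
    """
    # Chat only makes sense when feedback must be both immediate and interactive.
    needs_sync = immediate and interactive

    if needs_sync and confidential:
        # The one (rare) combination that justifies a DM.
        return "chat DM"
    if needs_sync:
        # Urgent but not confidential: let others follow along.
        return "shared chat channel"
    if confidential:
        # Private, but asynchronous is fine.
        return "encrypted email, or a wiki page or ticket with restricted view permissions"
    # The default for everything else.
    return "wiki or issue tracker"


if __name__ == "__main__":
    print(choose_channel(immediate=True, interactive=True, confidential=False))
```

Note how only one of the four outcomes is a DM, and two of them are not chat at all, which matches how rarely chat should be needed in a team that works asynchronously.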

A chat ping is a shoulder tap.

“Pinging” someone in a chat (that is, mentioning their username, which usually triggers a visual or auditory notification), is exactly like walking up to a person, interrupting what they are doing, tapping them on the shoulder, and asking them a question.

No matter whether it is your intention or not, they will feel compelled to answer, relatively promptly (the only exception is when you’ve done this so often that you have conditioned your colleagues to ignore you — congratulations).

This means that you’ve broken their train of thought, yanked them out of a potentially complex task, forced them to redo what they did pre-interruption, or actually made them commit a mistake.

So pinging someone in a chat is something you should only do if you are aware of exactly this risk, and you are convinced that whatever you’re pinging about is more important. Otherwise, to be very blunt, you’ll be seen as the asshole.

Want people to hate you? Send naked pings.

A “naked ping” is the action of sending someone a message consisting only of their username and a marker like “ping”, “hi”, “hey” or similar.

14:00:02Z johndoe: florian: ping
[...]
15:56:17Z florian: johndoe: I hate you

Don’t. Just don’t.

Any person who is versed in the use of chat communications will, when subjected to this behavior, be inclined to flay you alive. Infinitely more so if it’s a DM. Do not do this.

Instead, always provide context. Always always always. Don’t say “can I ask you a question?”; instead, ask the question. If something isn’t urgent, say something like “no urgency.”

14:00:02Z johndoe: florian: can I get your eyes on PR #1422?
[...]
15:56:17Z florian: johndoe: done! 
                   (was afk for a bit – sick kiddo)
15:56:58Z johndoe: florian: np, ty

It should be self-evident why this is better than naked pings, but if to you it is not, then please read Naked Pings, courtesy of Adam Jackson and Mark McLoughlin.

Video calls

(Zoom, Hangouts, BlueJeans etc.)

Next, I’d like to talk about video calls. Doesn’t matter what technology you’re using. Could be Zoom, Google Hangouts, BlueJeans, Jitsi, whatever.

And I’d like to address this specifically, given the fact that in the current pandemic the use of video calls appears to have skyrocketed.

There’s a very good reason to use video calls: they give you the ability to pick up on nontextual and nonverbal cues from the call participants. But that’s really the only good reason to use them.

Video calls have a significant drawback: until we get reliable automatic speech recognition and transcription, they are only half-on-the-record. Hardly anyone goes to the trouble of preparing a full transcript of a meeting, and if anything, we get perhaps a summary of points discussed and action items agreed to. So even if we keep recordings of every video call we attend, it’s practically impossible to discern, after the fact, what was discussed in a meeting before decisions were made.

It is also practically impossible to find a discussion point that you only have a vague recollection of when it was discussed in a video call, whereas doing so has a much greater probability of success if a discussion took place on any archived text-based medium.

Every video call needs an agenda.

This is, of course, true for any meeting, not just those conducted by video call.

A conversation without an agenda is useless. You want people to know what to expect of the call. You also want to give people the option to prepare for the call, such as doing some research or pulling together some documentation. If you fail to circulate an agenda ahead of time, I can guarantee that the call will be ineffective, and will likely result in a repeat performance.

Until machines get intelligent enough to automatically transcribe and summarise words spoken in a meeting, write notes and a summary of every meeting you attend, and circulate them.

Just as important as an agenda to set the purpose of the meeting, is a set of notes that describes its outcome.

Effective distributed teams understand that the record of a call is what counts, not the call itself. It is not the spoken word that matters, but the written one.

From that follows this consequence:

To be useful, the write-up of a call takes more time and effort than the call itself.

If you think that video calls are any less work than chat meetings or a shared document that’s being edited together or discussed in comments, think again. The only way a video call is less work is when everyone’s lazy and the call is, therefore, useless. Every meeting needs notes and a summary, and you need to circulate these notes not only with everyone who attended the meeting, but with everyone who has a need-to-know.

Here’s the standard outline I use for meeting notes:

Meeting title
Date, time, attendees
Summary
Discussion points (tabular)
Action items

Putting an executive summary at the very top is extraordinarily helpful so people can decide if they

should familiarise themselves with what was discussed, immediately, and possibly respond if they have objections, or
only want to be aware of what was decided, or
just keep in the back of their head that a meeting happened, that notes exist, and where they can find them when they need to refer back to them.

Once you do meetings right, you no longer need most of them.

The funny thing is that once you adhere to this standard (and I repeat, having a full and detailed record is the only acceptable standard for video meetings), you’ll note that you can actually skip the meeting altogether, use just a collaboratively edited document instead of your meeting notes, and remove your unnecessary synchronization point.

Video calls for recurring team meetings

There is one thing that I do believe video calls are good for, and that is to use them for recurring meetings as an opportunity to feel the pulse of your team.

Obviously, a distributed team has few recurring meetings, because they are synchronization points, and we’ve already discussed that we strive to minimize those. So the idea of having daily standups, sprint planning meetings, and sprint retrospectives is fundamentally incompatible with distributed teams. Aside: in my humble opinion, this is also why using Scrum is a terrible idea in distributed teams — not to mention that it’s a terrible idea, period.

However, having perhaps one meeting per week (or maybe even one every two weeks) in a video call is useful precisely for the aforementioned reasons of being able to pick up on nonverbal cues like body language, posture, facial expressions, and tone. If people are stressed out or unhappy, it’ll show. If they are relaxed and productive, that will show too.

Note that these meetings, which of course do follow the same rules about agenda and notes, are not strictly necessary to get the work done. The team I run has one one-hour meeting a week, but whenever that meeting conflicts with anything we skip it and divide up our work via just the circulated coordination notes, and that works too. The meeting really serves the purpose of syncing emotionally, and picking up on nonverbal communications.

Briefing people

Whenever you need to thoroughly brief a group of people on an important matter, consider using a 5-paragraph format.

Situation
Mission
Execution
Logistics
Command and Signal

This is a format used by many armed forces; in NATO parlance it’s called the 5-paragraph field order. Now I’m generally not a fan of applying military thinking to civilian life (after all, we shouldn’t forget that the military is an institution that kills people and breaks things, and I say that as a commissioned officer in my own country’s army), but in this case it’s actually something that can very much be applied to professional communications, with some rather minor modifications:

Situation
Objective
Plan
Logistics
Communications

Let’s break these down in a little detail:

Situation is about what position we’re in, and why we set out to do what we want to do. You can break this down into sub-points, like the customer’s situation, the situation of your own company, any extra help that is available, and the current market.
Objective is what we want to achieve.
Plan is how we want to achieve it.
Logistics is about what budget and resources are available, and how they are used.
Communications is about how you’ll be coordinating among yourselves and with others in order to achieve your goal.

Note that people always have questions on what they’ve just been briefed about. They just might not think of them straight away. Give people time to think through what you’ve just briefed them on, and they will think of good questions. So always have a follow-up round at a later time (2 hours later, the following day, whatever), for which you encourage your group to come back with questions.

Also, use that same follow-up for checking how your briefing came across, by gently quizzing people with questions like

“by what date do we want to implement X?”, or
“Joe, what things will you need to coordinate with Jane on?”

This gives you valuable feedback on the quality of your briefing: if your team can’t answer these questions, chances are that you weren’t as clear as you should have been.

Pinching the firehose

Finally, I want to say a few words about what I like to call pinching the figurative firehose you might otherwise be forced to drink from:

The amount of incoming information in a distributed team can be daunting.

When you work in a distributed team, since everyone is on their own schedule and everything is asynchronous, you may be dealing with a constant incoming stream of information — from your colleagues, your reports, your manager, your customers.

There is no way to change this, so what you need to do is apply your own structure to that stream. What follows is not the way to do that, but one way, and you may find another works better for you. But you will need to define and apply some structure, otherwise you’ll feel constantly overwhelmed and run the risk of burning out.

Consider using the “4-D” approach when dealing with incoming information.

(Hat tip to David Allen)

There’s a defined approach for doing this, which I learned about from reading David Allen’s Getting Things Done. I don’t know if Allen invented the 4-D approach or whether someone came up with it before him, but that’s how I know about it.

In his book, David Allen suggests applying one of the following four actions to any incoming bit of information:

Drop means read, understand, and then archive. It’s what you use for anything that doesn’t require any action on your part.
Delegate is for things that do require action, but not from you. Make sure that it gets to the right person and is understood by them, and make a note for follow-up.
Defer means it needs doing, and it’s you who needs to do it, but it doesn’t need doing immediately. Enter it into your task list (to use a very generic term, more on this in a bit), and clear it from your inbox.
Do are the (typically very few) things that remain that need to be done by you, and immediately.
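Since we’re all technology people: the four actions boil down to three yes/no questions asked in a fixed order. Here is a minimal Python sketch of that triage. The names Action and triage and the three parameters are purely illustrative assumptions of mine; they don’t come from the book or from any tool.

```python
from enum import Enum


class Action(Enum):
    """The four possible outcomes for an incoming item (illustrative names)."""
    DROP = "drop"
    DELEGATE = "delegate"
    DEFER = "defer"
    DO = "do"


def triage(needs_action: bool, needs_me: bool, needs_doing_now: bool) -> Action:
    """Hypothetical helper: apply the 4-D questions in order."""
    if not needs_action:
        # Read, understand, archive.
        return Action.DROP
    if not needs_me:
        # Hand over to the right person, and note a follow-up.
        return Action.DELEGATE
    if not needs_doing_now:
        # Put it on the task list and clear it from the inbox.
        return Action.DEFER
    # Mine, and urgent: do it immediately.
    return Action.DO


if __name__ == "__main__":
    # An item that needs action from me, but not right away.
    print(triage(needs_action=True, needs_me=True, needs_doing_now=False))
```

The order of the questions is what matters here: an item only reaches “Do” after it has failed to qualify for the three cheaper outcomes.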

Following this approach does not mean that you’ll never be overwhelmed by the amount of information that you need to process. But it’ll greatly reduce that risk.

“Drop” rules

“Dropping” things doesn’t mean ignoring them. You still have to read and understand what’s in them, and be able to find them later. So:

Never delete things (except spam).
Only archive them in a way that keeps them retrievable in the future.
If something isn’t understandable to you, think it through and look for clarification.

“Delegate” rules

Delegation obviously requires that there is a person you can delegate to. This is not necessarily someone who reports to you; indeed, it might be someone you report to. (You might be asked to deal with something that you have no control over, but your manager does.) So:

Find the right person that can get the task done.
Preemptively send them all the information that you think they might need (and that you have access to), rather than relying on them to ask.
Ask them to acknowledge that they have received what they need.
Make a note to follow up to see if they need anything else, and follow through by seeing the task to completion.

Within your own team, you only ever delegate tasks, not responsibility.

Tasks without follow-up and follow-through are a waste of people’s time.

Do not delegate, or even define, tasks that you are not prepared to follow through on. If you handwave “everyone use encrypted email from now on,” and you’re not even prepared to make that work for your own email account, you might as well just leave it.

And if you do proclaim an objective or rule and then you find yourself unable to see it through — this happens, and is no sign of ineptitude or failure — then loudly and clearly rescind it. It’s far better for you to visibly backtrack, than to be perceived as someone whose pronouncements are safe to ignore.

“Defer” rules

Deferring simply means that because something you need to do doesn’t need doing immediately, you can do it at a time that suits your schedule.

This means that you’ll need to

add the task immediately to some sort of queue (for email, this can be a folder named “Needs Reply”),
make sure to go through that queue at a later time to prioritize (ideally, right after you’re done with your “Do” tasks, which we’ll get to in a second),
absolutely ensure that you make time to go back and actually do your prioritized tasks, at a time you consider convenient.

“Do” rules

And finally, there’ll be your “Do” tasks — stuff that you need to do, and do immediately.

Tell people that you’re doing them, because you’ll want to be uninterrupted. Update your chat status, put some blocked time in your calendar.
Make sure you’ll be uninterrupted. For email, turn off all your notifications.
Plow through all the undropped, undelegated, undeferred items in your inbox until it’s empty.

But what about the watercooler?

The entirety of this talk, up to this point, has focused on professional communications. And among people unfamiliar or inexperienced with work in a distributed team, it is often accepted that teams can communicate well “professionally.”

However, they frequently ask, “what about watercooler chats? What about the many informal discussions that happen at work while people are getting some water or coffee, or sit together over lunch? There’s always so much communication happening at work that’s informal, but is extremely beneficial to everyone.”

Office workers often don’t habitually externalise information. A distributed team that tries that won’t last a week.

Firstly, many companies where information exchange hinges on coffee or cafeteria talk simply don’t give a damn about externalising information. Sure, if 90% of your company’s knowledge is only in people’s heads, you’re dead without the lunchroom.

But if the same thing happens in a distributed team, it never gets off the ground. So, if you have a team that’s functional and productive, because it habitually externalises information, the absence of chit-chat over coffee has zero negative impact on information flow.

However, you may also be interested in the completely non-work-related talk that happens over coffee, that simply contributes to people’s relaxation and well-being.

People working in distributed teams are often introverts. Or they simply choose to have their social relationships outside of work.

I know this might shock some people, but there are plenty of people who can make a terrific contribution to your company, but who dislike the “social” aspect of work. They might thrive when being left alone, with as little small-talk as possible, and ample opportunity to socialize with their friends and family, away from work.

But if you do have people on your team that enjoy having an entirely informal conversation every once in a while, there totally is room for that even in a distributed team. All you need to do is agree on a signal that means “I’m taking a break and I’d be happy to chat with anyone who’s inclined, preferably about non-work-related things” (or whatever meaning your group agrees on).

This could be

a keyword on IRC,
a message to a specific channel, or
(if you want to get fancy) a bot that updates your group calendar when it receives a message with a particular format.

However, as a word of caution, I’ve actually done this with my team before, and it didn’t catch on — for the simple reason that we almost never took breaks that happened to overlap. But that doesn’t rule out that it works on your team, and also there’s always the remote possibility that two or more people on your team might like to schedule their breaks concurrently.

What you can also do, of course, is have a channel in which you can discuss completely random things that are not work related. And if the rule is that confidential or company-proprietary discussion topics are off-limits there, the channel might as well be public. It might even be Twitter.

The antithesis: ChatOps

I do want to mention one other thing for balance. There is a complete alternative framework for distributed teams working together, and it’s what people refer to as ChatOps.

To the best of my knowledge, the first company to run ChatOps on a large scale and talk about it publicly was GitHub, back in 2013 in a RubyFuza talk by Jesse Newland.

If a distributed team operates on a ChatOps basis, the interactive text chat is where absolutely everything happens.

Everyone lives in chat all the time, and all issues, alerts and events are piped into the chat.
Everything is discussed in the chat, and everything is also resolved in the chat.
Such a system relies on heavy use of chat bots. For example, if an alert lands in the channel, and the discussion then yields that the proper fix to the problem is to run a specific Ansible playbook, you send an in-chat bot command that kicks off that playbook, and then reports its result.

And this is of course very laudable, because it resolves a major issue with using chat, which is the classic scenario of something being discussed in a chat, someone else then going away for a bit and then coming back saying “I fixed it!”, and nobody else actually understanding what the problem was.

If you make everything explicit and in-band, it becomes easy, in principle, to go back to a previously-solved problem that reappears, and replay the resolution.

When does ChatOps make sense? Here’s a hint: It’s called ChatOps.

So can this make sense? Yes, absolutely. Under what circumstances though? I maintain that this is best suited for when your work tends to be inherently linear with respect to some dimension. For example, if your primary job is to keep a system operational versus the linear passage of time, ChatOps is an excellent approach.

And keeping complex systems operational over time is the definition of, you guessed it, ops. So ChatOps may be a very suitable communication mode for operations, but it’s highly unlikely to be efficient as a generic mode of communication across distributed teams.

And even then I posit it’s difficult to get right, since you’ll have to curb channel sprawl and threading and other things, but that’s a whole ‘nother talk and indeed a talk for another speaker, because I don’t lead an ops team.

To summarize…

So to summarize, here are my key points from this talk, in a nutshell — please make these your key takeaways.

Distributed teams are better than localized teams — not because they’re distributed, but because they’re asynchronous.
Avoid anything that makes a distributed team run synchronously.
Use less chat.
Have fewer meetings.
Write. Things. Down.

tag:xahteiwi.eu,2020-08-22:/resources/presentations/no-we-wont-have-a-video-call-for-that/

Celery to Chew On

Florian Haas May 6, 2020 Updated May 6, 2020

Asynchronous Celery tasks that manipulate a MySQL/Galera database from a Django application can produce very interesting behavior when HAProxy is involved.

Some basics

When you’re running a Django application, the following things are all pretty commonplace:

You use MySQL or MariaDB as your Django database backend.
You don’t run a single standalone MySQL/MariaDB instance, but a Galera cluster.
You run asynchronous tasks in Celery.

This way, if you have a complex operation in your application, you don’t necessarily have to handle it in your latency-critical request codepath. Instead, you can have something like this:

from celery import Task

class ComplexOperation(Task):
   """Task that does very complex things"""

   def run(self, **kwargs):
      # ... lots of interesting things
      pass

… and then from your view (or management command, or whatever), you can invoke this like so:

from .tasks import ComplexOperation

def some_path(request):
   """/some_path URL that receives a request for an asynchronous ComplexOperation"""
   # ...

   # Asynchronously process ComplexOperation
   ComplexOperation.delay(pk=request.GET['id'])

   # ...

What this means is that the code defined in ComplexOperation’s run() method can run asynchronously, while the HTTP request to /some_path can immediately return a response. You can then fetch the asynchronous task’s result in a later request, and present it to the user.

(Note that there are other ways to invoke Celery tasks; getting into those in detail is not the point of this article.)

MySQL/Galera via HAProxy

Now, let’s inject another item into the setup. Suppose your application doesn’t talk to your Galera cluster directly, but via HAProxy. That’s not exactly unheard of; in fact it’s an officially documented HA option for Galera.

If you run a Django application against an HAProxyfied Galera cluster, and you have rather long-running Celery tasks, you may see occurrences of OperationalError exceptions that map to MySQL error 2013, Lost connection to MySQL server during query.

Error 2013 means that the connection between the client and the server dropped in the middle of executing a query. This is different from error 2006, MySQL server has gone away, which means that the server has gracefully torn down the connection. 2013 is really an out-of-nowhere connection drop, which normally only occurs if your network has gone very wonky.

With HAProxy however, that service may be your culprit. An HAProxy service sets four different timeout values:

timeout connect: the time in which a backend server must accept a TCP connection, default 5s.
timeout check: the time in which a backend server must respond to a recurring health check, default 5s.
timeout server: how long the server is allowed to take before it answers a request, default 50s.
timeout client: how long the client is allowed to take before it sends the next request, default 50s.

Distilling the timeout problem

If you have access to manage.py shell for your Django application, here’s a really easy way for you to trigger an adverse effect of this default configuration. All you have to do is create an object from a model, so that it fetches data from the database, then wait a bit, then try to re-fetch. Like so:

./manage.py shell
[...]
(InteractiveConsole)
>>> from time import sleep
>>> from django.contrib.auth import get_user_model
>>> User = get_user_model()
>>> me = User.objects.get(username='florian')
>>> sleep(40)
>>> me.refresh_from_db()
>>> sleep(55)
>>> me.refresh_from_db()
Traceback (most recent call last):
[...]
OperationalError: (2013, 'Lost connection to MySQL server during query')

So what happens here?

I open a session to the database with the User.objects.get() call that populates the me object.
Then I wait 40 seconds. That’s comfortably short of the 50-second HAProxy timeout.
Now when I run me.refresh_from_db(), the session is still alive and the call completes without error. The timeout clock resets at this stage, and I could keep going like this ad infinitum, as long as I sleep() (or keep busy) for less than 50 seconds.
However, I next wait 55 seconds, causing HAProxy to terminate the connection.
And then, refresh_from_db() breaks immediately with the 2013 error.
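The sequence above can be replayed without a database, using a toy model of an idle timeout. This is only a sketch of the reasoning, not of how HAProxy works internally: the IdleTimeoutProxy class, its simulated clock, and its method names are all made up for illustration.

```python
class IdleTimeoutProxy:
    """Toy model of a proxy that drops connections idle for too long."""

    def __init__(self, timeout_s=50):
        self.timeout_s = timeout_s
        self.now = 0            # simulated clock, in seconds
        self.last_activity = 0  # time of the last successful query
        self.connected = True

    def sleep(self, seconds):
        """Advance the simulated clock without touching the connection."""
        self.now += seconds

    def query(self):
        """Return 'ok', or the MySQL error number the client would see."""
        if not self.connected:
            return 2006  # the client keeps using a connection that is gone
        if self.now - self.last_activity > self.timeout_s:
            self.connected = False
            return 2013  # the proxy dropped the idle connection
        self.last_activity = self.now  # traffic resets the idle clock
        return "ok"

    def reconnect(self):
        """Stand-in for closing the connection and letting Django reconnect."""
        self.connected = True
        self.last_activity = self.now


proxy = IdleTimeoutProxy(timeout_s=50)
proxy.query()            # initial fetch, like User.objects.get()
proxy.sleep(40)
first = proxy.query()    # within the timeout
proxy.sleep(55)
second = proxy.query()   # idle for too long
third = proxy.query()    # retry on the dead connection
proxy.reconnect()
fourth = proxy.query()
print(first, second, third, fourth)  # ok 2013 2006 ok
```

The point of the model is that only the gap since the last query matters, not the total age of the connection.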

Note that if I run refresh_from_db() again (or any other operation that touches the database), I get a different error (2006, expected at this point), but I don’t get my database connection back:

>>> me.refresh_from_db()
Traceback (most recent call last):
[...]
OperationalError: (2006, 'MySQL server has gone away')

What I have to do instead is close my connection first:

>>> from django.db import connection
>>> connection.close()

… and then, when I run anything else that requires a database query, Django will happily reconnect for me.

>>> me.refresh_from_db()

HAProxy timeouts getting in the way of your Celery tasks

Now how does this relate to a real-world application? Suppose you have a long-running Celery task with database updates or queries at the beginning and end of something complicated, like so:

from celery import Task
from model import Thing

class ComplexOperation(Task):
   """Task that does very complex things"""

   def run(self, **kwargs):
     thing = Thing.objects.get(pk=kwargs['pk'])
     do_something_really_long_and_complicated()
     thing.save()

In this case,

we retrieve data from the database into memory, populating our thing object,
then we do something very complex with it — suppose this can take on the order of minutes, in the extreme,
and finally, we take the modified data for our in-memory object, and persist it back to the database.

So far, so simple. However, now assume that while you’re executing the do_something_really_long_and_complicated() method, something bad happens to your database. Say you restarted one of your MySQL or MariaDB processes, or one of your nodes died altogether. Your database cluster is still alive, but your session, which was very much alive during the call that populated thing, is dead by the time you want to make the thing.save() call.

Depending on what actually happened, you’d see one of the following two OperationalError instances:

Either an immediate 2006, MySQL server has gone away. This is what you’d see if the MySQL server was shut down or restarted. That’s a graceful session teardown, and it’s not what I want to focus on in this article.
Or, and this is what I want to discuss further here, 2013, Lost connection to MySQL server during query. You normally don’t get this as a result of something breaking at the other end of the connection, but rather in between. In our case, that would be HAProxy. Let’s look at our code snippet with a few extra comments:

from celery import Task
from model import Thing

class ComplexOperation(Task):
   """Task that does very complex things"""

   def run(self, **kwargs):
     thing = Thing.objects.get(pk=kwargs['pk'])
     # Right here (after the query is complete) is where HAProxy starts its
     # timeout clock

     # Suppose this takes 60 seconds (10 seconds longer than the default 
     # HAProxy timeout)

     do_something_really_long_and_complicated()

     # Then by the time we get here, HAProxy has torn down the connection,
     # and we get a 2013 error.
     thing.save()

So now that we’ve identified the problem, how do we solve it? Well that depends greatly on the following questions:

Are you the developer, meaning you can fix this in code, but you can’t change much in the infrastructure?
Or are you a systems person, who can control all aspects of the infrastructure, but you don’t have leverage over the code?

If you have control over neither code nor infrastructure, you’re out of luck. If you call all the shots about both, you get to pick and choose. But here are your options.

Fixing this in code

If it’s your codebase, and you want to make it robust so it runs in any MySQL/Galera environment behind HAProxy, no matter its configuration, you have a couple of ways to do it.

Keep connections shorter

One way to do it is to keep your database connections alive for such a short time that you practically never hit the HAProxy timeouts. Thankfully, Django auto-reconnects to your database any time it needs to do something, so the only thing you need to worry about here is closing connections; reopening them is automatic. For example:

from celery import Task
from django.db import connection
from model import Thing

class ComplexOperation(Task):
   """Task that does very complex things"""

   def run(self, **kwargs):
     thing = Thing.objects.get(pk=kwargs['pk'])
     # Close connection immediately
     connection.close()

     # Suppose this takes 60 seconds.
     do_something_really_long_and_complicated()

     # Here, we just get a new connection.
     thing.save()

Catch OperationalErrors

The other option is to just wing it, and catch the errors. Here’s a deliberately overtrivialized example:

from celery import Task
from django.db import connection
from django.db.utils import OperationalError
from model import Thing

class ComplexOperation(Task):
   """Task that does very complex things"""

   def run(self, **kwargs):
     thing = Thing.objects.get(pk=kwargs['pk'])
     # Right here (after the query is complete) is where HAProxy starts its
     # timeout clock

     # Suppose this takes 60 seconds.
     do_something_really_long_and_complicated()

     # Then by the time we get here, HAProxy has torn down the connection,
     # and we get a 2013 error, which we’ll want to catch.
     try:
       thing.save()
     except OperationalError:
       # It’s now necessary to disconnect (and reconnect automatically),
       # because if we don’t then all we do is turn a 2013 into a 2006.
       connection.close()
       thing.save()

Now of course, you’d never actually implement it this way, because the one-time retry is far too trivial, so you probably want to retry up to n times, but with exponential backoff or some such — in detail, this becomes complicated really quickly.

You probably also want some logging to catch this.

In short, you probably don’t want to hand-craft this, but instead rely on something like the retry() decorator from tenacity, which can conveniently provide all those things, plus the reconnect, without cluttering your code too much.
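To make the moving parts concrete, here is a rough, stdlib-only sketch of what such a retry helper has to do: retry up to n times, back off exponentially, log each failure, and reconnect before the next attempt. It is not tenacity’s API and not meant for production. The name retry_with_reconnect and all its parameters are made up for illustration; reconnect stands in for something like connection.close, and exceptions for OperationalError.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry_with_reconnect(func, reconnect, exceptions, attempts=5,
                         base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure, reconnect, back off exponentially, retry.

    Hypothetical helper: `reconnect` is any zero-argument callable that
    discards the dead connection, `exceptions` is the exception class (or
    tuple of classes) considered retryable.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            logger.warning("attempt %d failed (%s); retrying in %.1fs",
                           attempt, exc, delay)
            reconnect()
            sleep(delay)
```

In the task above, the try/except around thing.save() would then shrink to a single call along the lines of retry_with_reconnect(thing.save, connection.close, OperationalError). A library like tenacity gives you the same behavior declaratively, with more knobs and fewer chances for subtle mistakes.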

Fixing this in infrastructure

You may be unable to control this sort of thing in your code — because, for example, it’s a codebase you’re not allowed to touch, or you’re less than comfortable with the idea of scouring or profiling your code for long-running codepaths between database queries, and sprinkling connection.close() statements around.

In that case, you can fix your HAProxy configuration instead. Again, the variables you’ll want to set are

timeout server and
timeout client.

You’ll probably want to set them to an identical value, which should be the maximum length of your database-manipulating Celery task, and then ample room to spare.

The maximum reasonable value that you can set here is that of your backend server’s wait_timeout configuration variable, which defaults to 8 hours.

Careful though: while MySQL interprets timeout settings in seconds by default, HAProxy defaults to milliseconds. You’d thus need to translate the 28800 default value for MySQL’s wait_timeout into a timeout server|client value of 28800000 for HAProxy, or else set the HAProxy timeout to a value of 28800s (or 8h, if you prefer).
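The unit conversion is easy to get wrong by a factor of 1,000, so here is the arithmetic spelled out as a small sketch. The helper name haproxy_timeout_from_mysql is made up for illustration; it only does the multiplication and formats the suffixed alternative.

```python
def haproxy_timeout_from_mysql(wait_timeout_s):
    """Convert a MySQL timeout in seconds to the two HAProxy spellings.

    Returns (bare_value_in_ms, value_with_seconds_suffix).
    """
    return wait_timeout_s * 1000, f"{wait_timeout_s}s"


ms, suffixed = haproxy_timeout_from_mysql(28800)
print(ms, suffixed)  # 28800000 28800s
```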

Background research contribution credit for this post goes to my City Network colleagues Elena Lindqvist and Phillip Dale, plus Zane Bitter for the tenacity suggestion.

Also, thanks to Murat Koç for suggesting to clarify the supported time formats in HAProxy.

tag:xahteiwi.eu,2020-05-06:/resources/hints-and-kinks/chewy-celery/

Why do they always lie?

Florian Haas Feb 4, 2020 Updated Feb 4, 2020

When politicians (and their supporters) keep lying even though their lies are easily exposed, it’s a strategy.

Recently I came across a tweet from my Irish OpenStack community friend Dave Neary, in which he wondered aloud why a picture, which was very obviously (and poorly) doctored, made its way onto Twitter. As if, so goes the reasoning, the creator of the picture was ass enough to assume that no-one would notice.

It’s so obvious […] I just don’t understand why you’d bother.

More generally, you can summarize this befuddlement by rephrasing the question as follows: “why would anyone, in a political campaign even, run with a lie that’s so easily called out?”

This assumes that lying, deception, is something you’d prefer to go undetected. And generally that’s true; human behavior works exactly that way: when we lie and deceive (and humans do this all the time for many reasons, some of them benign), the deception only works if it isn’t caught.

Why do they lie, when it’s so easy to tell?

Then, what makes humans lie and spread falsehoods, even when they’re easily detected?

I submit that to understand why, you should play a game.

The game is called The Evolution of Trust, and its creator is Nicky Case. You can play it online, it’s available in multiple languages. And it will take only about 15 minutes to complete. Go play it. No, really do.

Did you play it? No? Well it’s here. Please go play it.

Done? OK. Let’s carry on.

What does The Evolution of Trust tell us?

The game-theoretical concepts that The Evolution of Trust introduces tell you three things about its simple game of cooperation and defection (playing by the rules vs. cheating):

If communications between players are perfect, then the most successful strategy is tit-for-tat (“copycat”).
If communications are imperfect, then the most successful strategy is tit-for-two-tats (or “tit for tat with forgiveness,” or “copykitten”) — up to a certain error rate in communications.
If communications are imperfect beyond that error rate threshold, then the most successful strategy is to always cheat (“cheater”).
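If you’d rather poke at these strategies in code than in the browser, the following is a minimal sketch of such a repeated game. It is not the game’s own implementation. The payoff values are an assumption on my part, as is the way miscommunication is modeled (each move is flipped with probability error_rate), and all function names are made up.

```python
import random

# Assumed payoffs (not from the article): (points_a, points_b) for each
# pair of moves, where True means cooperate and False means cheat.
PAYOFF = {
    (True, True): (2, 2),
    (True, False): (-1, 3),
    (False, True): (3, -1),
    (False, False): (0, 0),
}


def copycat(seen):
    """Cooperate first, then repeat the opponent's last observed move."""
    return seen[-1] if seen else True


def copykitten(seen):
    """Cheat only after observing two cheats in a row."""
    return not (len(seen) >= 2 and not seen[-1] and not seen[-2])


def cheater(seen):
    """Always cheat."""
    return False


def play(strategy_a, strategy_b, rounds=10, error_rate=0.0, seed=0):
    """Play a match; each move is flipped with probability error_rate."""
    rng = random.Random(seed)
    seen_by_a, seen_by_b = [], []  # what each side saw the other do
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        if rng.random() < error_rate:
            move_a = not move_a
        if rng.random() < error_rate:
            move_b = not move_b
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b


print(play(copycat, copycat))     # (20, 20)
print(play(copycat, cheater))     # (-1, 3)
print(play(copykitten, cheater))  # (-2, 6)
```

With error_rate left at zero the matches are fully deterministic. Raising it and running whole tournaments is where the three regimes described above would have to show up, and that is exactly what the game walks you through.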

Now, real human communications are always messy, so the perfect communications scenario is out the window. We’re always dealing with imperfect communications, but we never know how high our error rate is.

And most of us are brought up with the Golden Rule and a certain measure of forgiveness. We tend to be “copykittens.”

But now put yourself in the shoes of someone who has decided they’ll take the cheater role. They’ll not play by the rules, they’ll only ever fend for themselves and their own, everyone else be damned. They’ve chosen the asshole route.

Their problem is, they can’t win. Game theory literally tells us that the forgiveness strategy is superior. Except if the cheater manages to increase the error rate beyond the threshold.

If you’ve decided you want to be an asshole, lie. It’s your only chance.

So once you’ve decided that you’ll not play by the rules, your only shot at winning is to destroy communications — for everyone.

And your best shot at that is to lie. Never tell the truth, contradict objective facts, say the stupidest, dumbest, most blatantly false things. Small lies, large lies, medium-sized lies. It does not matter if your lie is exposed; in fact, your lies must be exposed for your strategy to work.

And now you know how to spot someone in politics who has decided to break the rules. And why you shouldn’t assume they’re stupid, just because they say objectively stupid things.

tag:xahteiwi.eu,2020-02-04:/blog/2020/02/04/why-do-they-always-lie/

Paying People

Florian Haas Feb 4, 2020 Updated Feb 4, 2020

Paying people equally is straightforward, right? We all agree on that, and then once we’re talking about people living in different countries, we all disagree on what “equal” really means.

In my now almost 10ish years of first running my own company, and then managing a team in the company I sold my company to, I have very frequently struggled with what’s a “fair” way of paying people.

I’ve come to the conclusion that when it comes to hiring distributed, and thus hiring people living in different countries, there is no such thing. At least not one that everyone agrees is fair. The best thing we can hope for is an approximation of fairness.

How much do you earn?

If I ask you how much you make, you’ll either

be very offended at the invasion of your privacy, and refuse to answer, or
give me an answer straight away.

Which it is, is likely going to be determined by your culture and your upbringing. I am Austrian, I have interacted a lot with Americans, and now I work with Swedes — I can tell you, there are very different views on this.

Whether or not you choose to answer, you’ll likely have a number in your head. And again, there are multiple ways you’ll think of that.

It could be your annual salary, before income tax.
It could be your monthly salary, before income tax and public healthcare and pension payments.
It could be really weird, as in Austria: you’d think of your “monthly” salary, but in reality that’s 1/14th rather than 1/12th of your annual salary, because you’re paid a double salary in June and November.

But whichever it is, you’d probably think of a number, in your home currency.

Equal work, equal pay. Right?

I hope we generally agree that two people who do the same (or equivalent) work should be paid the same. And as long as you’re comparing two people, living in the same place, whose salary is paid out in the same currency, that’s easy.

But what about people from different countries? We can even assume that those two countries use the same currency, to facilitate the discussion. (Talking about different currencies makes things even more murky, and is perhaps a topic for another article.)

What’s the actual “worth” of the money you make?

There’s no correct answer for this. There are two possible approaches, both of which are “wrong” in a way and “right” in another. Let’s say we’re talking about two people being paid 3,000 euros a month, one living in Finland, one in Greece.

You could say that a euro is a euro. Thus, if you’ve got € 3,000 in your hand in Finland, that’s the same as having € 3,000 in Greece.
Or you could look at the question, what does that euro buy? If you’ve got € 3,000 in Finland, that will buy you things that, on average, you need to spend only about € 2,080 on in Greece.

So look at two people, one living in Finland and one in Greece. You pay them both € 3,000 for the same work. Are you paying them equally, or not?

I think you aren’t. If I had a person working for me in Finland, and I paid that person € 3,000, and another person in Greece that also made € 3,000 for the same work, I’d be massively short-changing the person in Finland.

But you can argue that € 3,000 is € 3,000 and that’s the end of the story. I tend to think that economists are on my side — that’s why the concept of purchasing power parity (PPP) exists —, but you can certainly be of the opposite opinion.1
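As a worked example of what a PPP adjustment amounts to arithmetically, here is a small sketch using only the two figures from the example above. The function name ppp_adjust is made up, and the ratio is simply the one implied by € 2,080 versus € 3,000, not an official conversion rate.

```python
def ppp_adjust(salary, price_level_ratio):
    """Scale a salary by a price level ratio (target over reference)."""
    return round(salary * price_level_ratio, 2)


# Ratio implied by the example: what costs 3,000 in Finland costs 2,080 in Greece.
greece_vs_finland = 2080 / 3000

# Finnish salary expressed in Greek purchasing power
print(ppp_adjust(3000, greece_vs_finland))      # 2080.0

# Salary in Finland that buys what 3,000 buys in Greece
print(ppp_adjust(3000, 1 / greece_vs_finland))  # 4326.92
```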

And then there’s the problem that most PPP conversion rates are per-country. And they may be way off if you compare, say, someone living in Athens to someone living in Äteritsiputeritsipuolilautatsijänkä, to stick with the Greece and Finland example.

The fact of the matter is, neither approach is perfect, and I can only choose one approach or the other. And the approach that I’ve chosen, in the distributed team that I run, is to make PPP adjustments. Others opt for a different approach. Neither of these is right, or wrong. They’re both an attempt to treat your people equally, and neither is perfect.

I will put forward one small thought though: if you are convinced that the money you earn is just that number in your currency (as in, € 3,000 is € 3,000), then I posit that you should also stick to that opinion in the face of inflation. Under that assumption, € 3,000 next year is still the same € 3,000 it was this year, regardless of whether inflation was maybe 5%.

tag:xahteiwi.eu,2020-02-04:/blog/2020/02/04/paying-people/

Salacious Salad and Omelette

Florian Haas Jan 1, 2020 Updated Jan 1, 2020

A breakfast experiment that turned out well.

Sometime last year I came across a post on /r/food that I seem to be unable to dig up. However, I recall that it had an infographic asserting that for a salad to be perfect, it had to have

something tangy,
something sweet,
something crunchy,
some protein (egg or meat),
some dairy (yoghurt or cheese).

So I goofed around preparing breakfast one Sunday morning, and out came this.

Ingredients

Amounts are per person. Multiply as needed.

Salad:

A fistful of rocket leaves (arugula).
Some grapes, the smaller the better.
A few cherry tomatoes. Best fresh off the vine in the middle of summer, even better if you can mix red, yellow, and purple varieties.
A few basil leaves.
A small amount of your favorite cold cut, say a thin slice or two of prosciutto crudo or bresaola (or Bündnerfleisch, if you want to get super fancy).
A few thin slices of Parmigiano or Grana Padano. Cutting them off the piece with a potato peeler works really well.
Pinch of black sesame seeds.

Vinaigrette:

1 teaspoon balsamic vinegar
Pinch of salt
A few turns of freshly ground black pepper
Half a teaspoon of mustard
Half a teaspoon of honey
A dash of something spicy, if you’re into that sort of thing. Sambal oelek or sriracha sauce will work just fine.
3 teaspoons olive oil

Omelette:

1 egg
Pinch of salt
Butter (for frying)

Equipment

1 medium-size bowl
2 small bowls
Whisk
Small frying pan

Method

Prepare the tomatoes: using a properly sharp kitchen knife, cut them into halves, quarters, or slices. Chuck them into a small bowl and sprinkle them rather liberally with salt and black pepper. Stack the basil leaves on top of each other and roll them up like tobacco leaves for a cigar, then cut the rolls into slices as thinly as you can. Throw the thin basil strands into the bowl, add some olive oil, and mix thoroughly using a spoon or your hands. Let the bowl sit on the countertop for five minutes or so, so that the tomatoes start oozing a bit of juice.
Make the vinaigrette: in a wide-enough bowl, put in salt, pepper, mustard, honey, and balsamic vinegar, and optionally the spicy condiment. Whisk to mix nicely, then add olive oil and whisk vigorously for 30-60 seconds. The goal is to get an opaque emulsion, with the mustard and honey acting as natural emulsifiers.
Throw the rocket leaves and grapes into the vinaigrette bowl and mix thoroughly with (cleanly washed) hands. Let sit for a few minutes so that the vinaigrette can infuse the leaves.
Crack egg into a small bowl, beat thoroughly with the whisk so the mixture is homogeneous. Add a bit of salt.
Make sure your serving plate, pieces of meat, cheese shavings, and black sesame are close by (if you’re a mise-en-place enthusiast, you’ve likely done this already) — once the omelette is ready, you want to serve things up quickly.
Heat up some butter in the small pan.
Pour the beaten egg in and cook your omelette. It will be very thin so this should take only a minute, perhaps two.
Throw the omelette on the plate (don’t fold it) and then start stacking: rocket and grapes in vinaigrette at the bottom, then tomatoes in olive oil and basil, meat, cheese shavings, and finally sesame on top.
Serve.

Nutrition facts

No warranty of any kind on these. Values are per serving.

Calories (kcal) 343 Total fat (g) 28.1 Saturated fat (g) 8.8 Total carbohydrates (g) 12.1 Sugars (g) 8.2 Protein (g) 12.9

tag:xahteiwi.eu,2020-01-01:/blog/2020/01/01/salacious-salad-and-omelette/

My 2010s

Florian Haas Dec 31, 2019 Updated Dec 31, 2019

I guess at the end of a decade, it’s as good a time as any to look back.

This is my look back at the decade we’re just finishing up.1

Some stats

Besides managing to grow 10 years older, in this decade I founded 1 company, sold 1 company, left 1 company, joined 1 company, folded 0 companies, bankrupted 0 companies, raised precisely €0 of VC, provided an income to a growing family, kept the bank accounts in the black throughout, spent 836 days (2 years, 3 months, 2 weeks and 2 days) on the road, traveled 947,000 kilometers (that’s about the length of a circumlunar free return trajectory), visited 144 cities in 32 countries (including airport stopovers), and gave about 72 talks (on average a little over one every two months) at conferences and events.

hastexo

More than half the decade (6 years and 1 month, to be precise) is occupied by my tenure at hastexo, the company I co-founded and led from inception to acquisition by City Network. Coming off an excellent stint at Linbit — where I worked for 4.5 years, and which I’m pleased to report is still around, alive, independent and kicking (a rare feat in this industry) — my co-founders Martin, Andreas and I bootstrapped a company that operationally broke even in its third month, and earned its founding cost back in six (while providing us a livelihood out of operational revenue). Though we never went through any kind of meteoric rise or exponential growth — hardly a thing in professional service companies devoid of hockey sticks — we did make good calls in gaining a foothold in the Ceph and OpenStack communities early on, and quickly established a reputation as technical experts.

We parted ways three years in, and if we follow the analogy that that sort of thing is something like a divorce, this was a particularly amicable one. All of us are still on good terms, and even occasionally have the opportunity to collaborate.

hastexo also enabled me to meet my brilliant colleagues Adolfo (with whom I still work) and Syed (who has since done a career pivot and works in a completely different part of our industry).

City Network

Making the decision to sell hastexo to City Network in 2017 is something that I’ve never regretted. I generally get along very well with the Scandinavian approach to work — something that I had learned 10 years earlier when I had the pleasure of interacting regularly with then-independent MySQL AB —, and City Network is no exception here.

Obviously an integration into an acquirer is never entirely devoid of friction,2 particularly when a fully distributed team meets a previously fully office-driven company, but our colleagues turned out to be an excellent bunch and I’m on a very good working basis with my CEO Johan.

At City Network I’ve also finally succeeded in doing something I’d always failed at in the years prior, which is to have built a gender-balanced team. Namrata and Elena are fantastic assets in a highly professional, highly functional, and generally awesome group.

People

Apart from the excellent people I get to work with on a daily basis, I have met an astonishing number of utterly amazing folks in this decade. In fact, many of them have had so much influence on me that I find it hard to believe I didn’t even know them a decade ago. It’s straight-out impossible to list them all, so I’ve representatively picked three people here.

Sage Weil of Ceph fame is possibly the strongest pairing of brilliance and humility you’ll ever encounter. How many people do you know that made fuck-you money from something they thought up in their PhD thesis, then took some of that fuck-you money as a donation to their alma mater where it endows a professorship that their PhD advisor now holds?3 It’s an absolute privilege to know this guy.

Marco Ostini is, and will always be, the face of linux.conf.au for me, and I need to mention him here for his own sake and that of the community he is a part of. When I arrived for my very first LCA in 2011, as a completely inexperienced traveler, fatigued, jet-lagged and bleary-eyed as all hell, there was this wonderful Aussietalian with a beaming smile who gave me the warmest of welcomes at half-past midnight and cheerfully gave me a lift to the accommodation. LCA 2011 was my best conference up to that point, hosted what still appears to be my most popular conference talk (largely thanks to Tim Serong’s live cartooning), and kicked off a series of LCAs for me that were always a warm fuzzy shot-up-the-arm of Aussie and Kiwi hospitality in the middle of the dark European winter. Not to mention the talks. And Pac-Man. And hugs.

Sharone Revah Zitzman has been my constant and unbroken link to the Israeli cloud, open source, and DevOps community. Sharone is basically a conference organizing committee on two feet, and an incredibly nice and welcoming person to boot. I’ve forged many friendships in the Israeli developer community that would never have happened were it not for her, and I love coming back to her neck of the woods for that reason.

Honestly, I really dislike reading old emails (whether I sent them in private or to public mailing lists) from the early 2010s, because they remind me that more than occasionally I was abrasive to the point of being an outright jerk. I hope that that has improved somewhat.

I think I did pretty well in the “immersing myself in new technology and keeping current in it” department. At the start of the decade I knew next-to-nothing about Ceph, OpenStack hadn’t even started, and Open edX wasn’t yet under the AGPL. Today I feel kinda-sorta-OK in two of those, and not-quite-an-idiot in the other, which is about as happy as I’ll ever be with my limited knowledge of anything.

I’ve also finally allowed myself to feel reasonably comfortable about what I do as a manager, even if it means deviating from conventional wisdom or established precedent.

So let’s see what the next decade holds. Happy new year, everyone!

Yes, I know. It’s arbitrary. Gregorian calendar yada yada, plus the discussion whether the decade ends at the end of 2019, or the end of 2020. I don’t care. Now is as good as any time to look back and reflect. ↩
I should really do a conference talk for startup founders on what to expect when you’re being acquired one day. ↩
That’s pretty awesome academia bragging rights for the professor, too. ↩

tag:xahteiwi.eu,2019-12-31:/blog/2019/12/31/my-2010s/

Exceptional Pan Pizza

Florian Haas Dec 30, 2019 Updated Dec 30, 2019

My go-to recipe for pan pizza. Works best in a cast-iron pan, but really any large pan will do.

I’ve found this to be the pan pizza to end all pan pizzas. It’s my standard pizza dough recipe, plus some inspiration from this video. The trick with seasoning the pan is an absolute kicker and makes the crust taste oodles better than without that seasoning.

Ingredients

Amounts are for one 26 cm pan that will likely feed 4 adults. Make several to accommodate a larger crowd, or very hungry people.

Crust:

200g plain spelt flour1
150g wholemeal spelt flour
150ml warm water2
10g fresh yeast (alternatively 7 grams dry yeast)
½ teaspoon honey (optional)
7g salt
good splash of olive oil, about 1-2 tablespoons

Sauce:

200g peeled and chopped canned tomatoes (alternatively a similar preparation from homegrown produce)
Pinch of salt
1 tablespoon of olive oil

Toppings:

100g cheese (mozzarella is canonical, but coarsely grated mild young Gouda works surprisingly well and you should give it a shot if you have it available)
spicy sausage, bell peppers, mushrooms, ham, whatever you fancy

Pan seasoning:

Splash of olive oil
Tablespoon of cornstarch
Pinch of salt
Ground black pepper
Oregano

Equipment

Required:

One large, thick-bottomed pan with a lid. Can be a cast iron skillet, or a non-stick pan, or a thick enamelled frying pan which is what I use. What’s important is that the bottom is at least 1cm thick, and that the handle can stand being under your oven broiler/grill for about 5 minutes.
Stovetop.
Oven with broiler/grill.

Optional:

Stand mixer or kitchen appliance with a dough hook. If unavailable, your hands will work just fine.

Method

Prepare a poolish: dissolve the yeast (and honey, if you like) in half of the warm water, add flour until the mixture is something like porridge. Put in a warm spot3 and let rise until the volume has doubled (15-30 minutes).
Add salt and poolish to the remainder of the flour in a bowl, knead and add enough water to make a homogeneous dough. Might take the full remaining 150ml, or less, depending on flour. Pour olive oil into bowl and work some in, leaving the sides of the bowl nicely greased. Chuck bowl back into warm place and let rise for another 15-20 minutes.
While dough rises, season the pan. Pour in olive oil, add cornstarch, and rub the mixture with your fingers over the whole inside of the pan — both the bottom and the walls. Sprinkle salt, pepper, and oregano into the pan.
Gently spread the dough into a flat round piece roughly the diameter of the bottom of the pan. You can use a rolling pin for this, but if you do, make sure to grease the rolling pin and countertop with olive oil, rather than dusting them with flour like you’re perhaps used to.
Let the dough rise one more time, about 15 minutes.
Prepare the sauce. Simply mix chopped tomatoes with a good pinch of salt and some olive oil.
Spread the sauce all across the dough, covering the whole diameter of the pan. Do not leave any uncovered crust on the perimeter. Repeat with cheese and finally, toppings.
Turn burner on medium heat, put the pan on (cover it with a lid), and cook for approximately 8-10 minutes. The trapped steam will cook the sauce and toppings on top, while the bottom of the pan bakes the crust. Preheat your broiler to high heat.
When the cheese on top has started to melt, throw the pan under the broiler/grill for about 3 minutes until the cheese gets nice patches of golden brown.
Turn pizza out on a round pizza plate. It should easily come off the bottom of the pan, though the sides might require some scraping if cheese has melted and run down the sides. Cut into slices.
Dig in.

Nutrition facts

No warranty of any kind on these. Values are per serving, counting one serving as one-quarter of the whole pizza.

Calories (kcal): 479
Total fat (g): 17.0
Saturated fat (g): 6.7
Total carbohydrates (g): 64.6
Sugars (g): 3.5
Protein (g): 21.1

In case you’ve never baked with spelt flour before: tastes about like wheat, but takes on less water. You can modify this recipe to use wheat flour, in which case you’ll need about 375ml of water. Also, while a wheat dough normally benefits from a long or slow rise, I’ve found that not to be true for spelt. ↩
If you’re from the U.S.: yes I know, we Europeans are a bit weird in that we customarily give some quantities by weight, others by volume. You’d think it’d be straightforward that we do solids by weight and liquids by volume, but it isn’t. (Some recipes specify sugar by weight, for example, others say use so-and-so-many tablespoons of sugar.) ↩
My oven has a leavening mode in which I can let a dough rise at approximately 39°C and near 100% humidity, which is glorious, but this is in no way a requirement. I’ve let dough rise in a bowl placed on the floor (we have floor heating), on the running laundry dryer, or out on the countertop. ↩

tag:xahteiwi.eu,2019-12-30:/blog/2019/12/30/exceptional-pan-pizza/

DevOpsDays Tel Aviv 2019

Florian Haas Dec 20, 2019 Updated Dec 20, 2019

I presented two talks at DevOpsDays Tel Aviv 2019: one 40-minute full-length talk, and a 5-minute Ignite.

This year, DevOpsDays Tel Aviv accepted two of my submitted talks:

No really, don’t chuck everything in Slack: Communications for distributed teams

This is a 40-minute talk that I presented after keynotes on day 2. It deals with the specific challenges that distributed teams face and solve, and has a bunch of ready-to-go suggestions to communicate better as a distributed team.

I had two surprises in this talk:

A large number of people still appear to be unfamiliar with the term naked ping, even though just about everyone is very familiar with the antipattern itself. It resulted in an “oh so that’s what that’s called!” reaction from a significant share of the audience.
I usually try to not throw shade in my talks. But if and when I do it’s usually about Scrum, which I continue to consider a patently ludicrous idea. I did mention Scrum in a negative manner in my talk, and got a rather unexpected round of mid-talk applause. (My talks generally tend to be rather matter-of-fact; mid-sentence applause is not something I’m used to.) Speaking to a crowd that skewed hard toward the software engineering profession, this always gets me thinking: engineers understand that Scrum is horrible; when will their managers catch on?

The video for this talk is forthcoming, but for now you can find my slides (with full speaker notes) on GitHub.

Five is Fine: A case for small teams

This was a 5-minute talk delivered as part of a round of “Ignite” talks. I use scare quotes because DevOpsDaysTLV uses a somewhat relaxed Ignite format: you must deliver your talk in 5 minutes, but you’re not restricted to the exact number of 20 slides, and your slides also don’t auto-advance every 15 seconds. I did not know this, so I did follow the original Ignite format, using reveal.js autoSlide to advance my slides every 15000 ms.

These talks were also recorded, and the recordings should become available relatively shortly (I will update this post when they do). As with the other talk, the slides are available on GitHub and include my notes.

I’d like to thank Rachel for suggesting that I write this talk.

Know a conference that might like these talks?

If you organize a conference that might be interested in including these talks, or you’ve attended one that you think might, please find me on Twitter and let me know. I’ll be happy to submit one of them for consideration. I could definitely expand the Ignite talk into a full standard-length talk — doing the reverse for the other one might be a bit challenging, though.

tag:xahteiwi.eu,2019-12-20:/resources/presentations/devopsdays-tel-aviv-2019/

Slidecraft updates

Florian Haas Dec 13, 2019 Updated Dec 13, 2019

I’ve been doing public talks and presentations rather frequently for the last 10-or-so years, but this year I made significant changes to my process for creating, rehearsing, and presenting talks.

What I used to do

When I started doing talks a while back, I followed an approach that many of us were at some point either taught, or adopted by emulating our peers:

I would roughly sketch an outline,
then I’d create slides (usually on a company or conference template),
then I’d add some speaker notes in bullet-point fashion,
then I’d rehearse the talk,
and finally I’d refine my content based on whether I was short or long in terms of the available time slot.

That’s a very conventional approach, and it tends to focus very much on getting the slides right.

Today I do things differently.

What I do now

I’ve come to the conclusion that what I really want my audience to focus on is what I am talking about. So now, I do this:

I still start with a rough — very rough — outline.
Next though, I write my speaker notes. All of them. Yes, that means I write out my entire talk, and this may well take days.
Then, I do a first practice run. Is there a good natural flow? Will it make sense to someone completely unfamiliar with my topic? Is the story I’m telling logical?
Then, I edit. This process of alternating rehearsal and edits continues until I’m reasonably happy with the whole talk, and I have timed it and am happy with my pacing, too.
Only then do I start creating my slides, and I usually completely disregard conference templates, for reasons I’ll get to in a moment.

Yes, that means that when I ultimately deliver the talk, I’m actually reading from my notes. Except when I’m just riffing and ad-libbing over them. Chances are, unless you know me very well, you’ll be unable to tell. That’s because I write my notes like I talk, and I pay more attention to flow and stress and rhythm than I do to grammar and exactitude. This talk, just like this, and this were all delivered from fully written speaker notes.

Fully writing out my talk has also enabled me to greatly reduce my use of fillers (like “ah” and “um”), which I used to say excessively and which would make me cringe at my talk videos.

Accessibility

Now to explain why I normally disregard conference templates: In preparing and delivering my talks, I try to put a greater emphasis on accessibility than I used to.

My slides are now all high-contrast. I default to using black text on a white background (this is a good default for when I have reason to assume the projection equipment will be less than perfect); alternatively, I use white text on a black background (for darkened rooms).
I try to accommodate people with the most common types of color blindness: I use blue — as opposed to red or green – as my highlight color, and in charts, I differentiate by both color and line stroke. I also try to refrain from using animations. As for slide transitions, I use only fade, no motion.
I publish my slides ahead of time, include a QR code at the very beginning, and I use reveal.js Multiplex so that my slides advance in unison with those on people’s phones, tablets, or laptops. This way, people with less-than-perfect vision (or simply seated in an unfortunate spot, at the back of the room) can follow along easily and at their own preferred zoom level.
The published slides include a menu with a theme switcher. This is to accommodate another group of attendees, namely photosensitive people who cannot stare at bright screens for a long time (this can trigger intensely painful migraines). A person who is photosensitive can follow along on their own device, using the dark theme.
Since my speaker notes are fully written out, this means that I can also include them in the published material, so that my notes can act as subtitles for my speaking. This can come in handy to people who are hard of hearing, or who are simply unaccustomed to my accent or manner of speech.

I’ve rolled these accessibility considerations into my opinionated, Cookiecutter-based reveal.js presentation generator.

A request

If you happen to be visually impaired, or color blind, or hard of hearing, or you work with people who are – in other words, if you can make a suggestion for me to improve the accessibility of my slides, please file an issue against my Cookiecutter and I’ll try to work that in as best I can. Thank you!

tag:xahteiwi.eu,2019-12-13:/blog/2019/12/13/slidecraft/

Ceph Erasure Code Overhead Mathematics

Florian Haas Nov 30, 2019 Updated Nov 30, 2019

In a Ceph cluster, the frequent question, “how much space utilization overhead does my EC profile cause,” can be answered with very simple algebra.

So you’re running a Ceph cluster, and you want to create pools using erasure codes, but you’re not quite sure of exactly how much extra space you’re going to save, and whether or not that’s worth the performance penalty? Here’s a simple recipe for calculating that space overhead.

Suppose a RADOS object has a size of $S$, and because it’s in an EC pool using the jerasure or isa plugin,1 Ceph splits it into $k$ equally-sized chunks. Then the size of any of its $k$ chunks will be:

$$S \over k$$

In addition, we get $m$ more parity chunks, also of size $S \over k$.

Thus, the total amount of storage taken by an object of size $S$ is:

$$k \cdot {S \over k} + m \cdot {S \over k}$$

This of course we can rearrange and reduce to

$$S + S \cdot {m \over k}$$

$$S \cdot (1 + {m \over k})$$

In other words, the overhead (that is, the additional storage taken up by the EC parity data) is

$$S \cdot {m \over k}$$

or when expressed as a proportion to $S$, simply

$$m \over k$$

As an example, an EC profile with $k = 8, m=3$ comes with a storage overhead of $3 \over 8$ or 37.5%.

One with $k=5, m=2$ has an overhead of $2 \over 5$, or 40%.

And finally, a replicated (conventional, non-EC) pool with 3 replicas can be thought of as having a degenerate EC profile with $k=1, m=2$, resulting in an overhead of $2 \over 1$, or 200%.
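
The three examples above can be reproduced with a few lines of shell. This is only a sketch of the $m \over k$ formula for jerasure- or isa-style profiles; the function name `ec_overhead_percent` is made up for illustration and is not part of any Ceph tooling.

```shell
# Hypothetical helper: storage overhead of an erasure code profile,
# expressed as a percentage of the object size (100 * m / k).
# Arguments: k (number of data chunks), m (number of parity chunks).
ec_overhead_percent() {
    k="$1"
    m="$2"
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v k="$k" -v m="$m" 'BEGIN { printf "%.1f\n", 100 * m / k }'
}

ec_overhead_percent 8 3   # prints 37.5
ec_overhead_percent 5 2   # prints 40.0
ec_overhead_percent 1 2   # prints 200.0 (3 replicas as a degenerate profile)
```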

On a parting note, you should realize that the space utilization overhead is only one factor by which you should weigh erasure code profiles against one another. The other is performance. Here, the general (deliberately oversimplified) rule is that the more chunks you define — in other words, the higher your $k$ — the higher the performance penalty you suffer, particularly on reads.2 This is due to the fact that in order to reconstruct the object and serve it to the application, your client must collect data from $k$ different OSDs and assemble it locally.3

Thanks to Lars Marowsky-Bree for reminding me that slightly different arithmetic applies to the lrc, shec, and clay plugins. ↩
Thanks to Lenz Grimmer for pointing out that the post should make this clear. ↩
If you want to know more about erasure codes and their history, not limited to their use in Ceph, Danny Abukalam did an interesting talk on the subject at OpenStack Days Nordic 2019. ↩

tag:xahteiwi.eu,2019-11-30:/resources/hints-and-kinks/ceph-ec-math/

The Little Bag Of Tricks: 10 Things You Might Not Know You Can Do With OpenStack

Florian Haas Nov 5, 2019 Updated Nov 5, 2019

My presentation from the Open Infrastructure Summit 2019 in Shanghai.

Video: YouTube
Slides: GitHub

Use the arrow keys to navigate through the presentation, hit Esc to zoom out for an overview, or just advance by hitting the spacebar.

tag:xahteiwi.eu,2019-11-05:/resources/presentations/the-little-bag-of-tricks-10-things-you-might-not-know-you-can-do-with-openstack/

Using ftrace to trace function calls from qemu-guest-agent

Florian Haas Aug 21, 2019 Updated Aug 21, 2019

When you are using functionality that is buried deep in the Linux kernel, ftrace can be extremely useful. Here are some suggestions on how to use it, using the example of tracing function calls from qemu-guest-agent.

What’s this about?

Recently I used, for the first time, libvirt’s functionality to indicate to a virtual guest that it is about to have a point-in-time copy of its disks — a snapshot — taken. In doing so, it can tell the virtual machine (VM) to freeze I/O on all its mounted filesystems.

The rationale behind this is, I hope, obvious: you want the VM to momentarily stop I/O to its virtual disks, so that you can take a snapshot when no I/O is in-flight, and the snapshot image can thus be expected to be internally consistent. The snapshot itself will only take a second or so, and the minor interruption is a small price to pay for the added consistency guarantee you get.

You might be wondering how this works and it is, indeed, a bit involved.

First, you’ll need a virtual serial console that allows the hypervisor (in the host) to communicate with the guest. This will be defined in your libvirt domain XML, and in OpenStack Nova, this automatically pops up if you are booting your instance off an image which has the hw_qemu_guest_agent=yes property set.
Then, you’ll need a daemon within the guest that listens for commands received over the serial port. This daemon is called qemu-guest-agent, or qemu-ga for short. All you’ll need for it to run is to install the package of that name, which you can do in various ways (apt-get install qemu-guest-agent being the simplest, on Ubuntu guests).
One of the many commands that said daemon supports is guest-fsfreeze-freeze. When it receives that command over the virtual serial link, the daemon will loop over your mounted filesystems, and issue the FIFREEZE ioctl on all of them. This happens in reverse order, meaning your root (/) filesystem is frozen last.
That ioctl then calls the freeze_super() kernel function, which flushes each filesystem’s superblock, blocks (“freezes”) all new I/O to the filesystem, and syncs (flushes) all I/O that is currently in flight on that filesystem.

The combined net effect of all of the above is that you get a virtual machine that is temporarily read-only, with pending I/O piling up, until you are done taking your snapshot. When that happens, there are a few more actions that happen:

The hypervisor sends the guest-fsfreeze-thaw command over the virtual serial link. Now, the daemon will loop over all your mounted filesystems again, and issue the FITHAW ioctl on them. This time, it is taking the mounts in forward order, thawing the root filesystem first.
That ioctl then calls the thaw_super() kernel function, which unblocks (“thaws”) all new I/O to the filesystem, and allows the VM to continue normal operations.

Now there’s a bit of an issue with that. All of the aforementioned kernel functions only write printk’s on error, but they don’t tell you when they succeed. So you can try a snapshot, then type dmesg in the guest, and you’ll have no way of telling whether the whole freeze/thaw dance succeeded, or was never even attempted.

But fear not, there’s a way that you can trace exactly what the kernel is doing!

tracefs, and configuring ftrace

If your guest runs any modern kernel, then chances are that it will, by default, mount a virtual tracefs filesystem to the /sys/kernel/debug/tracing mount point (although as of kernel 4.1, this is nominally an alias, with /sys/kernel/tracing being the canonical mount point). Regardless of its path, tracefs exposes the kernel’s ftrace functionality.

So the first thing you’ll tell ftrace, in your guest VM, is the process for which you’ll want to do function tracing. In our case, that’s your guest’s qemu-ga. So, you can do:

pidof qemu-ga > /sys/kernel/debug/tracing/set_ftrace_pid

Then, you’ll want to instruct ftrace to trace kernel function calls:

echo "function" > /sys/kernel/debug/tracing/current_tracer

And, you’ll want to make sure that we don’t trace only function calls from qemu-ga itself, but also from its child processes:

echo "function-fork" > /sys/kernel/debug/tracing/trace_options
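
If you find yourself repeating these three steps, they can be wrapped in a small helper that also reads the settings back so you can see they took effect. This is a rough sketch, not a polished tool: the function names `trace_setup` and `trace_show` and the `TRACEFS` variable are made up for this example, and the functions need to run as root inside the guest.

```shell
# Hypothetical helpers; TRACEFS, trace_setup and trace_show are
# illustrative names, not part of ftrace itself.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Configure function tracing for one process and its children.
# Argument: the PID to trace.
trace_setup() {
    # Restrict tracing to the given process,
    echo "$1" > "$TRACEFS/set_ftrace_pid"
    # select the function tracer,
    echo "function" > "$TRACEFS/current_tracer"
    # and follow child processes as well.
    echo "function-fork" > "$TRACEFS/trace_options"
}

# Read back the active tracer and the traced PID(s).
trace_show() {
    printf 'tracer: %s\n' "$(cat "$TRACEFS/current_tracer")"
    printf 'pid(s): %s\n' "$(cat "$TRACEFS/set_ftrace_pid")"
}

# Usage, as root inside the guest:
#   trace_setup "$(pidof qemu-ga)"
#   trace_show
```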

Let’s see what’s happening!

Now you have a guest that’s properly instrumented for tracing kernel function calls that originate with qemu-ga. So now, go ahead and take a snapshot. On OpenStack Nova, you’d do that with:

openstack server image create --name <image-name> <instance-name>
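
If your guest runs on a plain libvirt host rather than under Nova, you should be able to trigger the same freeze and thaw by hand with virsh, which gives you something to trace without taking an actual snapshot. A minimal sketch, assuming the guest agent channel is configured as described above; the wrapper names and the `VIRSH` variable are made up for this example, and the commands run on the hypervisor, not in the guest.

```shell
# Hypothetical wrappers around virsh; VIRSH can be overridden for testing.
VIRSH="${VIRSH:-virsh}"

# Ask the guest agent to freeze all mounted filesystems in a domain.
guest_freeze() {
    "$VIRSH" domfsfreeze "$1"
}

# Ask the guest agent to thaw them again.
guest_thaw() {
    "$VIRSH" domfsthaw "$1"
}

# Usage, on the hypervisor (the domain name is a placeholder):
#   guest_freeze <domain-name>
#   guest_thaw <domain-name>
```

Don’t leave a guest frozen for longer than you need to: writes in the guest pile up until the thaw.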

Then, shell back into your guest, and interrogate your trace for ioctl calls:

grep -E '(freeze|thaw)_super.*ioctl' /sys/kernel/debug/tracing/trace

And voilà:

         qemu-ga-14574 [001] ....   264.059109: freeze_super <-do_vfs_ioctl
         qemu-ga-14574 [001] ....   265.837955: thaw_super <-do_vfs_ioctl
         qemu-ga-14574 [001] ....   265.855048: thaw_super <-do_vfs_ioctl
         qemu-ga-14574 [001] ....   265.855084: thaw_super <-do_vfs_ioctl

So that’s the FIFREEZE ioctl that maps to freeze_super(), and the FITHAW ioctl that maps to thaw_super(). And that’s how you know that your guest is freezing and thawing I/O as you expect it to!

Where to go from here

Feel free to dig further into your trace file (cat or less will help), and play with other ftrace options. There’s a massive amount of things you can do with it, as the documentation explains. You’ll probably also find this blog post from Julia Evans useful for exploring ftrace.
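
Two things you might want early on are a narrower trace and a way to switch everything off again. The following is a sketch under the same assumptions as before (root in the guest, tracefs at the path used in this post); the function names and the `TRACEFS` variable are made up for illustration. Writing an empty line to set_ftrace_pid to clear it relies on the behaviour of reasonably recent kernels, so treat that line in particular as an assumption.

```shell
# Hypothetical helpers; TRACEFS and the function names are illustrative.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Record only the two functions this post is interested in, rather
# than every kernel function the traced process calls.
trace_only_freeze_thaw() {
    echo "freeze_super thaw_super" > "$TRACEFS/set_ftrace_filter"
}

# Switch tracing off and undo the settings made earlier.
trace_reset() {
    echo "nop" > "$TRACEFS/current_tracer"            # no tracer active
    echo "nofunction-fork" > "$TRACEFS/trace_options" # stop following children
    echo > "$TRACEFS/set_ftrace_filter"               # trace all functions again
    echo > "$TRACEFS/set_ftrace_pid"                  # drop the PID restriction
    echo > "$TRACEFS/trace"                           # clear the trace buffer
}
```

With the filter in place, the grep from the previous section should still match, since each trace line keeps the calling function after the arrow.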

Also, thank Steven Rostedt when you see him! He is the primary author of the ftrace framework.

tag:xahteiwi.eu,2019-08-21:/resources/hints-and-kinks/ftrace-qemu-ga/

Learn Complex Skills, From Anywhere: Combining Django, Ansible and OpenStack to teach any tech skill

Florian Haas Aug 12, 2019 Updated Aug 12, 2019

A talk I submitted to PyCon AU 2018, linux.conf.au 2019, and PyCon DE 2019.

This is a talk I submitted1 to three separate conferences:

PyCon AU 2018, via an anonymized CfP process using PaperCall. This submission was rejected.
linux.conf.au 2019, which used a non-anonymized CfP process on a custom platform that, I think, is built on Symposion. That submission was accepted, and the talk ran in the main conference programme.
PyCon DE 2019, via a non-anonymized CfP process using pretalx. This submission was rejected.

It’s the linux.conf.au submission that is reflected in this page.

Title

Learn Complex Skills, From Anywhere: Combining Django, Ansible and OpenStack to teach any tech skill

Target Audience

Community

Abstract

This will appear in the conference programme. Up to about 500 words. This field is rendered with the monospace font Hack with whitespace preserved

Professional skill-building is challenging, particularly when the skill to acquire is about distributed, scalable platform technology. In this talk, I cover an open-source skill-building platform that is 100% Python: built on Open edX and heavily involving Django, Ansible, and OpenStack.

The information technology industry is currently dealing with an interesting challenge in professional skill-building: almost every new technology developed in recent years has been complex, distributed, and built for scale: Kubernetes, Ceph, and OpenStack can serve as just a few representative examples. Loose coupling, asynchronicity, and elasticity are just some of the qualities frequently found in such systems that were entirely absent in many of the systems we built only a few years ago. As a result, people comfortable with building and operating these complex systems are hardly found in abundance, and organisations frequently struggle to adopt these technologies as a direct result of this scarcity: we are dealing with a skills gap, not a technology gap.

This means that we need novel ways to educate professionals on these technologies. We must provide professional learners with complex, distributed systems to use as realistic learning environments, and we must enable them to learn from anywhere, at any time, and at their own pace. One excellent way of doing this is to use the capabilities of the Open edX platform to integrate a learning management system with hands-on, on-demand lab environments that can be just as complex, and just as distributed, as production systems. This allows anyone interested to develop a professional skill set on novel technology at minimal cost, and without the need for costly hardware platforms for evaluation.

In this talk, I will give a rapid technical introduction to the core components of this free and open source (AGPL 3/ASL 2) all-Python platform:

edx-platform, the core learning management system (LMS) and content management system (CMS), built on Django;
edx-configuration, the automated deployment facility to roll out the Open edX platform, built on Ansible;
and finally, the Open edX XBlock extension system and its integration with OpenStack, also itself an all-Python cloud platform, in order to provide on-demand lab environments from both private and public cloud environments.

Private Abstract

This will only be shown to organisers and reviewers. You should provide any details about your proposal that you don’t want to be public here. This field is rendered with the monospace font Hack with whitespace preserved

I come from a background in technical consulting and instructor-driven professional education, and together with my team have been building and deploying Open edX based platforms as described in the talk since 2015. I believe I have a good understanding of why instructor-driven training, while desirable, is not accessible to everyone in need of keeping abreast of technology development, and that a self-paced, learn-from-anywhere alternative is needed. I am extremely grateful for the fact that we have a very well-suited platform for that purpose, and since it has a completely open-source, Python codebase, it might be of interest to LCA attendees.

I have done a talk on a similar topic at the LCA 2018 Education miniconf (video link included below). In the 2018 talk, I focused primarily on the educational aspects of self-paced, on-line training. This time, and I think more appropriately for the main conference track, I would like to dive into the nuts and bolts of the platform that drives this. As such, the talk should still be appealing to people engaged in professional education (be it as learners, tutors, or instructional designers), but will also be insightful to Python and OpenStack developers, and heavy Ansible users.

Video URL

https://www.youtube.com/watch?v=E8BhTAjMwa4

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2019-08-12:/talk-submissions/lca-2019-openedx/

Team meetings

florian Jun 6, 2019 Updated Jun 6, 2019

Distributed teams need to meet in person every once in a while. Here are some thoughts and suggestions on team meetings.

I’ve run a distributed team — nominally the same team, though through people joining and leaving I am the sole original band member at this point — for almost 8 years. And about 3 years in was the first time we were sufficiently spread and had enough money to spare to warrant flying everyone to the same place for one week per year. We have been doing that ever since. Every year, it’s an exceedingly enlightening and pleasant experience.

And this year was the first time that we did not one but two team meetings back to back: first with my core team running education services at City Network, in the Hellerup neighborhood in Copenhagen, and then as part of the complete City Network all-hands meeting on the island of Tjärö in the Blekinge archipelago.

So here is how we do these things.

Accommodation

Except for the very first team meeting we did in 2014 when we put everyone up in hotel rooms, we’ve always rented a house. I’m a believer in small team sizes — 5 people being the maximum number of direct reports a leader can realistically have —, so that means that we can still find houses where everyone has a room to themselves, and nobody has to queue for a shower.

I consider both of these things extremely important. Many, many people working in tech are introverts, even more so for people working from home in tech. Many of us find the experience of constantly being around people emotionally draining, and we need solitude to recharge. Also, privacy. Our personal lives don’t stop while we’re having a team meeting, so you might want to have a room in which you can talk about your kid’s health issue with your spouse, without concern about colleagues overhearing the conversation.

Price-wise, since this puts us in an accommodation category that can qualify as a penthouse or mansion, this won’t be significantly cheaper than hotel accommodation — but it won’t be more expensive either, and it’ll be a much nicer experience. Particularly if you also have the joy of interacting with an exceedingly pleasant, nice, and helpful person for a host.

As for who scouts the accommodation, my rule has always been this:

In case we have a person on the team who lives, or has lived, in the city we’re going to, and who thus is very familiar with the locality and surroundings, I delegate the search to them. You can never beat local personal experience.
Otherwise, I do the scouting on the web, and I usually run it by the team — after I have made the booking, but while we can still cancel or change.

Arrival

Whoever did the scouting for the accommodation travels a day early, gets the key, settles in, and reports back. This has multiple purposes:

If they are local and so is the host, of course that’ll facilitate matters a lot. Particularly if we’re in a country where that one team member speaks the language, and the rest of us don’t.
If there’s something seriously wrong with the property, they can still cry foul and we can make other arrangements while we’re not all congregated in the same spot. (This has never happened to us, but just in case.)
That person is our support backstop who can field issues and customer questions, while everyone else is in the air.
That person can also meet and collect people at the airport or train station, if getting to our accommodation is nontrivial or difficult. Also, this is particularly helpful when we have a new team member who is perhaps less travel experienced.

First day

I normally don’t schedule any formal work sessions for the first day. People will be jet-lagged and fatigued from travel, we are often re-meeting in person for the first time in a year, and there is a lot of catching up to do about family, hobbies, recent travel, and all sorts of things that are not work. Somebody might be new and we might see them for the first time ever, in person.

And then of course people might be delayed in travel, may have had flights cancelled, or may have missed connections. So that means if you’re actually planning the first day to be full of “work” sessions, you run a significant chance of your schedule getting wrecked by a flight delay. So we just don’t do that. Instead, we try to make the first day as relaxed and as enjoyable as possible, including a nice first group dinner.

And then sometimes, as happened this year, a remarkably serious and productive discussion over work issues ensues over a glass of wine in the evening. But that’s not part of the expectation.

Work sessions

We tend to have half-day work sessions these days, where we focus on one topic for 3-4 hours straight. These can get intense, and sometimes heated, but they are nearly always very, very productive.

We typically use a place like our house’s kitchen (if it’s roomy, and has a table), or patio (if it’s bearable outside), or sitting room (if it’s cozy) for work sessions. They are usually quite analog, with frequently just one person — the assigned record-keeper, often me — with a laptop open to take notes and record the discussion and its outcomes. I’ve found that on occasion, when discussing complex issues, working with a roll of brown paper and thick felt-tip markers (“sharpies” for you Americans out there) can be much more useful than with anything pushing bits.

Food

I enjoy food, and I’ve never worked with anyone who doesn’t. So we make that part enjoyable in whatever way we fancy. We might go for lunch to the neighborhood bagel store that our host recommended for their excellent pastrami. We might jump on a train to get to a street food spot, or head out for pizza or tacos or curry.

And I’m buying. These meetings are for work, my team is traveling for just that purpose, so whenever we eat together (and we practically always do), the tab is on me. What I can put on the company and what comes out of my own pocket is my job to sort out later, but we’re definitely not going Dutch.

Interesting things

Our meetings are usually in an interesting city with art, architecture, and history, and not seeing any of that would be a bit of a waste. So there’s usually maybe two to three things that we just go and do. It could be a harbor or river cruise, a visit to a castle or palace, a bicycle tour criss-crossing the city, or a museum visit. Depends a bit on the weather and a bit on individual interest.

Epilogue: going larger

So what works for a 5-person team clearly doesn’t work for a whole company, not least because you’ll be hard pressed to find a rental home with 40 bedrooms — I guess such a dwelling would be appropriately referred to as a palace, and last I checked the Queen wasn’t on Airbnb.

But you can take a page out of City Network’s playbook and do something else, which is to book an island. Yes, you read that right, immediately after our team meeting we packed up and boarded a train to join our company all-hands meeting, in which we had an island practically to ourselves.

tag:xahteiwi.eu,2019-06-06:/blog/2019/06/06/team-meetings/

Learn Ceph — For Fun, For Real, For Free!

Florian Haas May 25, 2019 Updated May 25, 2019

My lightning talk from Cephalocon 2019.

Video: YouTube

tag:xahteiwi.eu,2019-05-25:/resources/presentations/learn-ceph-for-fun-for-real-for-free/

Geographical Redundancy with rbd-mirror

Florian Haas May 21, 2019 Updated May 21, 2019

My presentation from Cephalocon 2019.

Video: YouTube
Slides (with full speaker notes): GitHub

Use the PgUp/PgDown keys to navigate through the presentation, hit Esc to zoom out for an overview, or just advance by hitting the spacebar.

tag:xahteiwi.eu,2019-05-21:/resources/presentations/geographical-redundancy-with-rbd-mirror/

I Don’t Think This Means What You Think It Means: Red Herrings in OpenStack

Florian Haas May 8, 2019 Updated May 8, 2019

A talk I submitted to OpenInfra Days Nordics 2019.

This is a talk I proposed1 for OpenInfra Days Nordics, via a non-anonymized CfP process using PaperCall.

Title

I Don’t Think This Means What You Think It Means: Red Herrings in OpenStack

Elevator Pitch

You have 300 characters to sell your talk. This is known as the “elevator pitch”. Make it as exciting and enticing as possible.

OpenStack’s complexity comes with operational challenges. And in situations where OpenStack misbehaves, it is frequently non-trivial to find the actual cause of an issue. This talk includes several examples of red herrings in OpenStack, and suggestions for spotting and avoiding them.

Talk Format

Talk (>30-45 minutes)

Audience Level

All

Description

This field supports Markdown. The description will be seen by reviewers during the CFP process and may eventually be seen by the attendees of the event.

You should make the description of your talk as compelling and exciting as possible. Remember, you’re selling both the organizers of the events to select your talk, as well as trying to convince attendees your talk is the one they should see.

When working with OpenStack, you deal with an environment that is inherently complex. As with all complex environments, things sometimes go wrong or behave unexpectedly. And when that happens, your immediate goal is to locate, pinpoint, and then troubleshoot the issue.

And then, sometimes, you go down the dead-wrong path, and end up chasing a red herring for some time, before you find the real problem. This talk contains examples of such red herrings, enabling you to recognize and avoid them.

This talk is both for those who run an OpenStack cloud, and those who consume its functionality as a service. It talks about both red herrings in OpenStack operations, and red herrings in operating applications on OpenStack.

Notes

This field supports Markdown. Notes will only be seen by reviewers during the CFP process. This is where you should explain things such as technical requirements, why you’re the best person to speak on this subject, etc…

I’ve been working on OpenStack since 2012, have consulted on lots of private and public cloud deployments using OpenStack, and I work for the operator of a multi-region global OpenStack Cloud. “I’ve seen things you people wouldn’t believe. Attack ships on fire off the shoulder of Orion…”

In addition to what I have seen, others have seen other things, which is why I am crowdsourcing the content of this talk. That being so, the talk proposal is public, and I am asking people on Twitter to send me their stories, which I will add to and mix with my own, with due attribution.

Just to give one example of what I would like to cover, see this article on my web site, which talks about how you can run into what looks like a quota issue in Neutron, but whose cause is in fact buried deep in RFC 5798.

Tags

Tag your talk to make it easier for event organizers to be able to find. Examples are “ruby, javascript, rails”.

OpenStack, Operations

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2019-05-08:/talk-submissions/oidn-2019-red-herrings/

One For All: Using Terraform to manage OpenStack and Kubernetes resources

Florian Haas May 7, 2019 Updated May 7, 2019

A workshop I submitted to OpenInfra Days Nordics 2019.

This is a workshop I proposed1 for OpenInfra Days Nordics, via a non-anonymized CfP process using PaperCall.

Title

One For All: Using Terraform to manage OpenStack and Kubernetes resources

Elevator Pitch

You have 300 characters to sell your talk. This is known as the “elevator pitch”. Make it as exciting and enticing as possible.

A hands-on introduction to Terraform in an OpenStack and Kubernetes context. Get the basics (of Terraform), then spin up a Kubernetes cluster in an OpenStack public cloud (with Terraform), and manage resources on it (with Terraform).

Talk Format

Workshop (>60 minutes)

Audience Level

Intermediate

Description

This field supports Markdown. The description will be seen by reviewers during the CFP process and may eventually be seen by the attendees of the event.

You should make the description of your talk as compelling and exciting as possible. Remember, you’re selling both the organizers of the events to select your talk, as well as trying to convince attendees your talk is the one they should see.

If you are interested in deployment automation for arbitrarily complex containerized microservice applications, this is for you!

In this workshop, you will

get to know the basics of Terraform and Terraform configurations,
spin up a Kubernetes cluster with Terraform, using the OpenStack provider and interfacing with OpenStack Magnum in a public cloud,
start managing Kubernetes resources from Terraform, using the Kubernetes provider.

You’ll walk away with a solid understanding of Terraform’s capabilities, enabling you to make an informed decision of whether Terraform is a suitable deployment automation facility for your organization’s needs.

Prior Terraform knowledge is not required.

Notes

This field supports Markdown. Notes will only be seen by reviewers during the CFP process. This is where you should explain things such as technical requirements, why you’re the best person to speak on this subject, etc…

There are no technical requirements other than internet connectivity, and a web browser (preferably on a laptop, though a reasonably-sized tablet with a modern browser should work as well).

Tags

Tag your talk to make it easier for event organizers to be able to find. Examples are “ruby, javascript, rails”.

Terraform, OpenStack, Kubernetes, Magnum

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2019-05-07:/talk-submissions/oidn-2019-terraform/

Configuring CLI output verbosity with logging and argparse

Florian Haas May 1, 2019 Updated May 1, 2019

Command-line interfaces frequently produce output whose verbosity your users may want to be able to tweak. Here’s a nifty way to do that.

In a Python command-line interface (CLI) utility, you will want to inform your users about what your program is doing. You will also want to give your users the ability to tweak how verbose that output is. Now there is a de-facto standard convention for doing that, which most CLIs, Python or otherwise, tend to adhere to:

By default, show messages only about errors and warning conditions.
Define a -v or --verbose option that makes your program also show messages that are merely informative in nature.
Optionally, allow users to repeat the -v option, making the program even more verbose (to include, for example, debug output).
Conversely, also define a -q or --quiet (alternatively -s/--silent) option that, when set, makes the program suppress warnings and show only errors — i.e. the stuff that your program shows if it exits with a nonzero exit code.
Log output that tells users about what the program is doing, as it goes along, to the standard error (stderr) stream, whereas the output related to the program’s results goes to standard output (stdout). This gives your users the ability to pipe stdout to a file or another program, and your progress or status messages won’t interfere with that.

And in Python it’s not at all difficult to do that!

argparse options

First, we’ll want to define a couple of options for our argparse.ArgumentParser object, which in the following snippet I’ve named parser. Define two options, like so:1

parser.add_argument('-v', '--verbose',
                    action='count',
                    dest='verbosity',
                    default=0,
                    help="verbose output (repeat for increased verbosity)")
parser.add_argument('-q', '--quiet',
                    action='store_const',
                    const=-1,
                    default=0,
                    dest='verbosity',
                    help="quiet output (show errors only)")

From this, we get two command-line options:

-v or --verbose, which can be repeated, sets verbosity, which defaults to 0. action='count' means that if you invoke your CLI with -v, verbosity is 1, -vv sets verbosity to 2, etc.
-q or --quiet also sets verbosity, but to a constant value, -1, via store_const.
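
To see how the two options interact, here is a minimal, self-contained sketch. The build_parser helper and the program name mycli are placeholders chosen for illustration; only the two add_argument calls come from the snippet above.

```python
import argparse


def build_parser():
    """Build a parser with the -v/--verbose and -q/--quiet options."""
    # 'mycli' is a placeholder program name (an assumption).
    parser = argparse.ArgumentParser(prog='mycli')
    parser.add_argument('-v', '--verbose',
                        action='count',
                        dest='verbosity',
                        default=0,
                        help="verbose output (repeat for increased verbosity)")
    parser.add_argument('-q', '--quiet',
                        action='store_const',
                        const=-1,
                        default=0,
                        dest='verbosity',
                        help="quiet output (show errors only)")
    return parser


parser = build_parser()

# Parse a few sample command lines and collect the resulting verbosity.
samples = {
    'no options': parser.parse_args([]).verbosity,
    '-v': parser.parse_args(['-v']).verbosity,
    '-vv': parser.parse_args(['-vv']).verbosity,
    '-q': parser.parse_args(['-q']).verbosity,
}

for flags, verbosity in samples.items():
    print(flags, verbosity)
```

Because both options write to the same destination, combining them depends on their order: -q followed by -v ends up at 0 (the count starts from -1), whereas -v followed by -q ends up at -1.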

Setting up the logging subsystem

What we’ll want to do is use the logging subsystem to send our status, progress, and error messages to stderr.

First, you can translate verbosity into a logging level understood by the logging module. Here’s a little convenience method that achieves that:

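import logging

def setup_logging(verbosity):
    # Default to WARNING (30); each -v lowers the threshold by 10.
    base_loglevel = 30
    # Cap at 2, so that -vvv and beyond behave like -vv (DEBUG).
    verbosity = min(verbosity, 2)
    loglevel = base_loglevel - (verbosity * 10)
    logging.basicConfig(level=loglevel,
                        format='%(message)s')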

Now what does this do? Python log levels go from 10 (logging.DEBUG) to 50 (logging.CRITICAL) in intervals of 10; our verbosity argument goes from -1 (-q) to 2 (-vv).2 We never want to suppress error and critical messages, and default to 30 (logging.WARNING). So we multiply verbosity by 10, and subtract that from our base loglevel of 30.

With -v, that sets our effective log level to 20 (logging.INFO); with -vv, to 10 (logging.DEBUG). And with -q (i.e. verbosity==-1), our log level becomes 40 (logging.ERROR).
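
The arithmetic can be checked with a short sketch. The helper name verbosity_to_loglevel is hypothetical; it isolates the calculation from basicConfig(), so it has no side effects on the logging subsystem.

```python
import logging


def verbosity_to_loglevel(verbosity, base_loglevel=logging.WARNING):
    """Translate a verbosity count into a numeric logging level."""
    # Anything beyond -vv is treated like -vv.
    verbosity = min(verbosity, 2)
    return base_loglevel - (verbosity * 10)


# Map each command-line variant to its effective level name.
levels = {}
for flags, verbosity in [('-q', -1), ('(none)', 0), ('-v', 1),
                         ('-vv', 2), ('-vvv', 3)]:
    level = verbosity_to_loglevel(verbosity)
    levels[flags] = logging.getLevelName(level)
    print(flags, level, levels[flags])
```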

Now we can use logging.basicConfig() to configure the logging subsystem to send unadorned log messages with the desired loglevel or above, to stderr: basicConfig(), by default, sets up a StreamHandler whose output stream is sys.stderr, so it already does what we want here. And setting format='%(message)s' strips the LEVEL:logger: prefix that basicConfig() would otherwise include in the log line (and which is helpful for log files, but not so much for CLI output).

From then on, every time your program should write an informational message to stderr, you just use logging.info(), for a debug message, logging.debug(), and so on.

Adding an environment variable

In some circumstances you might always want debug output, and invoking your CLI with -vv all the time might not be practical. (CI systems are an example — you generally want your build logs as verbose as possible.) You can make your users’ lives easier by optionally fixing up your logging subsystem with an environment variable, like so:

import logging
import os

def setup_logging(verbosity):
    # LOGLEVEL, if set, must be a numeric level such as 10 or 20.
    base_loglevel = int(os.getenv('LOGLEVEL', 30))
    verbosity = min(verbosity, 2)
    loglevel = base_loglevel - (verbosity * 10)
    logging.basicConfig(level=loglevel,
                        format='%(message)s')

This way, if you invoke your CLI with LOGLEVEL=10 in its environment, it will always use debug output.

Perhaps you’d like to make this even easier, allowing your users to also set LOGLEVEL to debug, INFO, erRoR and whatever else. That you could do like this:3

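import logging
import os

def setup_logging(verbosity):
    # Look up the level name, case-insensitively, as an attribute of
    # the logging module: "debug" becomes logging.DEBUG, and so on.
    base_loglevel = getattr(logging,
                            os.getenv('LOGLEVEL', 'WARNING').upper())
    verbosity = min(verbosity, 2)
    loglevel = base_loglevel - (verbosity * 10)
    logging.basicConfig(level=loglevel,
                        format='%(message)s')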
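
If you would like to accept both the numeric and the named form of LOGLEVEL, the lookup could be sketched as follows. The helper name resolve_base_loglevel is hypothetical, and silently falling back to the default on an unrecognized value is an assumption of this sketch; you might prefer to raise an error instead.

```python
import logging
import os


def resolve_base_loglevel(default=logging.WARNING):
    """Read LOGLEVEL from the environment as a number or a level name."""
    raw = os.getenv('LOGLEVEL')
    if raw is None:
        return default
    raw = raw.strip()
    if raw.isdigit():
        # Numeric form, such as LOGLEVEL=10.
        return int(raw)
    # Named form, such as LOGLEVEL=debug or LOGLEVEL=erRoR.
    level = getattr(logging, raw.upper(), None)
    # Accept only integer level constants like logging.INFO.
    if isinstance(level, int):
        return level
    return default
```

The result can then take the place of the hard-coded base_loglevel in setup_logging().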

Parting thought

One of the many ways in which using logging comes in handy in a CLI is in a catch-all exception handler:

if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        logging.error(str(e))
        logging.debug('', exc_info=True)
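        # Exit with the exception's errno if it has a usable one, else 1.
        # An OSError raised without an error number has errno=None, and
        # sys.exit(None) would exit with status 0, signalling success.
        sys.exit(getattr(e, 'errno', None) or 1)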

This way, unhandled exceptions will show merely the exception message by default, but if and only if debug logging is enabled, your users will also see a stack trace.
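
The same idea can be sketched as a pair of reusable functions, which makes the choice of exit code easy to test. The names exit_code_for and run_cli are hypothetical. This sketch treats a missing or None errno (as on an OSError created with just a message) as exit code 1, since passing None to sys.exit() would report success.

```python
import logging


def exit_code_for(exc):
    """Return the exception's errno if it is a nonzero int, else 1."""
    errno = getattr(exc, 'errno', None)
    if isinstance(errno, int) and errno != 0:
        return errno
    return 1


def run_cli(main):
    """Run main(), log any exception, and return a process exit code."""
    try:
        main()
    except Exception as e:
        logging.error(str(e))
        # The stack trace only appears when debug logging is enabled.
        logging.debug('', exc_info=True)
        return exit_code_for(e)
    return 0
```

Under the usual `if __name__ == '__main__':` guard, you would then call sys.exit(run_cli(main)).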

This is used here. ↩
There is, to the best of my knowledge, no way to limit the number of repeats for an argument with action='count'. Hence the construct with the min() built-in function. ↩
A variation of this is used here. ↩

tag:xahteiwi.eu,2019-05-01:/resources/hints-and-kinks/python-cli-logging-options/

Nonexisticon

Florian Haas Apr 27, 2019 Updated Apr 27, 2019

I have no experience with, and presently no plans for, running or organizing a grassroots knowledge-sharing conference. But if I did run one, and I could shape it exactly as I wanted, this might be what it would look like. Please be advised that I have no idea what I …

Let’s talk about Nonexisticon, the non-existent conference. Just to benefit your reading flow, dear reader, I am using the indicative rather than the subjunctive mood, in other words, I use phrases like “Nonexisticon is” as opposed to “Nonexisticon would be”1. I trust that you are not confused by this, as the name very clearly implies that the conference does not exist.

Unifying Theme

Nonexisticon’s theme is freely shared knowledge work. Nonexisticon brings together open-data researchers, open-source software and hardware engineers and designers, Creative Commoners, documentarians, writers, artists, educators. Basically, if you create something with your mind and you make your creation freely available for anyone to use, enhance, modify and build upon, you’re welcome at Nonexisticon.

Nonexisticon is also a mutually supportive conference, where corporate-funded or financially secure attendees can commit to supporting less-privileged ones.

Location and Reach

Nonexisticon is a regional conference. “Region” in this context means an area from which the conference attendees can reasonably travel to the conference location, using environmentally friendly transportation modes (such as high-speed rail). Attendance from outside the region is welcome, but for environmental reasons it is not encouraged.

Nonexisticon is typically held on a university campus, or other suitable facility, in a location well accessible by public transportation. Cheap accommodation is typically available on-campus.

Supporting Organization

Nonexisticon is organized by the Nonexisticon Association, a not-for-profit organization that allows only individual, not corporate membership. The organization is staffed by volunteers, that is to say, while it is authorized to reimburse members of the conference team for expenses incurred and income lost as a direct result of conference work, it does not pay salaries to its staff.

Nonexisticon registration includes membership in this organization, with full voting rights, for a period of two years.

Recurrence and Length

Any regional Nonexisticon has its own recurrence schedule, but the conference is never held in the same region more than once per year.

Each Nonexisticon has a general theme. Themes can be quite diverse and are generally broadly defined, so as to attract individuals from many disciplines and backgrounds (as long as they relate to the unifying theme of freely shared knowledge work). Examples of conference themes are “cancer research”, “the Python programming language”, “high-energy physics”, or “the arts in education.”

Conference Committee

Nonexisticon is run by a Conference Committee, which is a group of 5 individuals headed by a Conference Director. The Conference Committee serves for the run-up, duration, and wind-down of one Nonexisticon, oversees the appointment of the succeeding Committee, and acts as advisors to the incoming Committee.

Would-be Conference Committees prepare a bid for the next Nonexisticon. Bids specify the conference location, dates, venue, capacity, theme, proposed sponsorship opportunities, and budget. If a bid is uncontested, the prior Conference Committee merely assesses the bid for plausibility and compliance with formal criteria, and accepts the bid (thereby also appointing the new Conference Committee). If more than one bid exists, the outgoing Conference Committee organizes an on-line vote among the Association membership, using a preferential voting system. The winning bid then results in the appointment of the new Conference Committee.

Presentations

Nonexisticon is a single-track conference, with one presentation slot length available: 30 minutes. Nonexisticon typically runs over the course of three days, with talks scheduled between 09:00 (9am) and 18:00 (6pm) local time.

This means that after deducting conference opening and closing remarks, keynotes, and breaks, Nonexisticon has 38 talks2 in total.

Q&A time is at the speaker’s discretion; forgoing Q&A entirely is acceptable.

All presentations are open for rating by attendees for a short period, from 5 minutes before until 15 minutes after their scheduled conclusion.

Presentations are recorded by a professional A/V team, and publicly released under a permissive license.

Presentation Proposals

Nonexisticon uses an anonymized call for proposals (CFP), conducted online, using an open-source conference platform. The Conference Committee defines the format of the CFP proper, including the questions posed to submitters. The Conference Committee reviews all submitted proposals, anonymized, for formal compliance with the CFP only.

Nonexisticon attendance requires the submission of a presentation proposal. Thus, conference registration and presentation submission are one and the same process. The conference registration fee must be paid in full at the time of registration/submission.

Nonexisticon limits proposals to one per speaker/attendee. All presentations are solo; multi-presenter talks and panel discussions are not permitted.3

Underprivileged Attendee Fund

Prospective attendees unable to afford the registration fee, conference travel, or accommodation may apply, upon registration, to the Underprivileged Attendee Fund.4 If accepted (per decision by the Conference Committee), the speaker/attendee is invited to attend the conference free of charge, and their submitted presentation is included in proposal review.

The Underprivileged Attendee Fund is endowed by

corporate conference sponsorship,
donations to the Nonexisticon Association,
profits carried over from prior conferences,
donations from regular speaker-attendees who voluntarily put up double the regular application fee upon their own registration/proposal submission.

Review process and talk selection

Talks are selected by all attendees/speakers during a time-limited selection period using a modified Borda count (MBC) ranking process, as pioneered by the scientific community for allocating observation time on astronomical telescopes.

In a nutshell, every attendee is assigned a small, randomly selected set of proposals (about 10, and excluding their own submission) to review. They then rank these submissions not in an order of subjectively “best” to “worst”, but from most beneficial to the overall attendee community to least beneficial to the overall attendee community. This results in an overall preliminary ranking of submissions, which is then compared to each attendee’s individual ranking. A high degree of agreement of an individual attendee’s ranking with the overall preliminary tally results in additional points for the attendee’s own proposal; the contrary, in subtracted points. Ultimately, this produces a final, definitive ranking of all received proposals.5
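
To make the mechanics a little more concrete, here is a rough sketch of the two-step tally. It is not the exact scoring scheme from the paper: the pairwise agreement measure, the bonus weight, and all function names are assumptions made purely for illustration.

```python
from itertools import combinations


def borda_scores(rankings):
    """Average Borda points per proposal.

    rankings maps reviewer -> list of proposal ids, most beneficial
    first. Position i in an n-item ranking earns n - 1 - i points.
    """
    totals, counts = {}, {}
    for ranked in rankings.values():
        n = len(ranked)
        for position, proposal in enumerate(ranked):
            totals[proposal] = totals.get(proposal, 0) + (n - 1 - position)
            counts[proposal] = counts.get(proposal, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}


def agreement(ranked, scores):
    """Fraction of proposal pairs ordered as in the preliminary tally.

    Ranges from 0.0 to 1.0; pairs tied in the tally count as half.
    """
    pairs = list(combinations(ranked, 2))
    if not pairs:
        return 1.0
    agree = 0.0
    for higher, lower in pairs:
        if scores[higher] > scores[lower]:
            agree += 1
        elif scores[higher] == scores[lower]:
            agree += 0.5
    return agree / len(pairs)


def final_scores(rankings, own_proposal, bonus=1.0):
    """Adjust each preliminary score by its submitter's agreement.

    own_proposal maps reviewer -> the proposal they submitted. bonus
    is a hypothetical weight: full agreement adds it, full
    disagreement subtracts it.
    """
    prelim = borda_scores(rankings)
    final = dict(prelim)
    for reviewer, ranked in rankings.items():
        proposal = own_proposal[reviewer]
        if proposal in final:
            final[proposal] += bonus * (2 * agreement(ranked, prelim) - 1)
    return final


# Four attendees, each reviewing the three proposals that are not theirs.
own_proposal = {'A': 'pa', 'B': 'pb', 'C': 'pc', 'D': 'pd'}
rankings = {
    'A': ['pb', 'pc', 'pd'],
    'B': ['pa', 'pc', 'pd'],
    'C': ['pa', 'pb', 'pd'],
    'D': ['pa', 'pb', 'pc'],
}
final = final_scores(rankings, own_proposal)
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)
```

A real implementation would also need to handle the random assignment of review subsets and a tie-breaking rule, both of which this sketch leaves out.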

The entire ranking process is automated using open-source software, and both the preliminary and the final ranking result are publicized to all attendees/submitters.

The submitters of the 48 top-ranked presentations (38 plus 10 backup/waitlist presentations) are refunded their registration fee upon acceptance.

If an accepted speaker needs to withdraw their talk, the next-ranked talk automatically moves up, and the speaker’s registration is simultaneously canceled.6

Budget and Sponsoring

Nonexisticon’s budget calls for a barebones conference (infrastructure only, no catering, no childcare) to break even solely on registration fees equivalent to two-thirds of the venue capacity. In case of registrations being in excess of this threshold, Nonexisticon funds childcare, refreshments, and catered lunch, in that order.7

Nonexisticon is open to sponsoring. Sponsoring, however, does not buy presentation slots, nor does it have any bearing on keynote selection. Sponsors can choose to contribute to infrastructure, catering, childcare, the Underprivileged Attendee Fund, and social events. Of these, social events are the only category open exclusively to sponsor funding; Nonexisticon does not spend registration fee revenue on social events.

If registrations do not meet the two-thirds threshold, and the budget shortfall cannot be compensated by sponsor contributions or profits carried over from prior conferences, the conference is cancelled and registration fees refunded.

Keynotes

Nonexisticon has one keynote, which opens the conference. The Conference Committee extends the keynote speaker invitation by consensus.

The closing “keynote” is a reprise of the highest-rated presentation in the conference.

Conference run-up timeline

Nonexisticon’s attendee-visible run-up cycle is 6 months.

Assuming a Nonexisticon is scheduled to run from May 15-17, the following schedule applies:

| Date | Time to conference | Event |
| --- | --- | --- |
| Nov 15 | 6 months | Conference Committee appointed. Date, location, and sponsorship opportunities announced. |
| Dec 15 | 5 months | Registration period / CFP commences. |
| Jan 15 | 4 months | Registration period / CFP ends, also first sponsorship commitment deadline. |
| Jan 22 | 3 months, 3 weeks | Conference go/no-go call, based on registration and committed sponsorship. |
| Jan 29 | 3 months, 2 weeks | Deadline for rejection of submissions, by the conference committee, on formal grounds. Final decision on Underprivileged Attendee Fund applications. First stage of submission review process (anonymized free-form comments on talk submissions) commences. |
| Feb 15 | 3 months | First stage of submission review process ends, second stage (randomized-subset review and ranking) commences. |
| Mar 1 | 2 months, 2 weeks | Second stage of review ends, final ranking available. Selected speakers for rank 1-38 in final ranking receive notification of acceptance, as do speakers with submissions ranked 39-48 for waitlist/backup talks. |
| Mar 15 | 2 months | Final conference schedule published. Second and final deadline for sponsorships. |

Disclaimer and acknowledgments

I’d like to reiterate that I have no experience whatsoever in running or putting on a conference, since the only time I’ve contributed to them as something other than a mere speaker, I’ve sat on proposal selection committees. So take all of what I wrote above with a mountain of salt, and consider it nothing more than semi-elaborate handwaving full of glaring omissions. But if you do want to give me some feedback, even if it is simply telling me why my ideas are nuts (as opposed to just that they are), I’d be most grateful. Find me on Twitter or Mastodon.

That said, thanks to Tom Eastman for prompting me to put this in writing, to Professor Mike Merrifield for introducing me to the MBC approach, and to Brady Haran for making the Numberphile YouTube channel where I learned about it.

Dear grammar stickler, I am acutely aware that a phrase including would + infinitive is not a true subjunctive mood, but the use of a modal verb. Feel free to replace all such instances with a true subjunctive in your head. ↩
Don’t bother to check the talk arithmetic. It doesn’t matter whether it’s really 36 talks or 41 or 42. I just picked 38 as a reasonable, concrete number to work with. ↩
Disallowing multiple submissions from one person, and presentations with multiple speakers, is a necessary consequence of the talk selection process. Allowing only one submission per submitter also has the added benefit that prospective speakers can focus on one single talk and give the proposal their very best shot. ↩
Is there a better name for this? ↩
If you find this summary insufficient to explain the process but also don’t feel like plowing through the paper, here’s a video explanation, plus additional information. ↩
It is admittedly harsh to only be able to pull out of an accepted talk by pulling out of the conference altogether. I consider this a necessary evil to ensure that no attendee/submitter submit their proposal without genuine intent to present. ↩
Of these, I am most on the fence about childcare. Meaning it would probably be a good idea to always budget for child-care cost, even if that means a higher registration fee for everyone, and thus a slightly elevated risk of conference cancellation. ↩

tag:xahteiwi.eu,2019-04-27:/blog/2019/04/27/nonexisticon/

No, really, don't chuck everything in Slack: communications for distributed teams

Florian Haas Apr 24, 2019 Updated Apr 24, 2019

A talk I submitted to linux.conf.au 2019.

This is a talk I submitted1 to linux.conf.au 2019. The conference uses a non-anonymized CfP process on a custom platform that, I think, is built on Symposion.

This talk was accepted as a standby talk, meaning it was slated to fill the gap if any other talk had to be cancelled on short notice. I did prepare the full talk, though I did not end up presenting it at the conference.

I also submitted the talk to DjangoCon Europe 2019, where it was rejected.

Title

No, really, don’t chuck everything in Slack: communications for distributed teams

Target Audience

Business

Abstract

This will appear in the conference programme. Up to about 500 words. This field is rendered with the monospace font Hack with whitespace preserved

This is a personal story. It does not claim to be rooted in statistical analysis or scientific rigour, and the evidence presented is anecdotal. But it might be insightful to anyone joining, leaving, or interacting with a remote team.

From 2011 to 2017, I ran a company that had no office. Everyone worked from home, and apart from an annual one-week face to face meeting, all our communications were remote. In 2017, I sold my company and integrated my team into a company that had previously been working exclusively out of a single office. As one would expect, the integration was not without friction (they never are), but what emerged from the experience was a better understanding of the challenges that come with a mixed office/remote work environment, and some rules to address them. In this talk, I’ll cover:

Typical misconceptions that remoties have about office-workers, and vice versa
Using the right tools for the right type of communications: interactive chat, email, wiki, issue trackers, Kanban boards
Timezones, and communications around scheduling
The 5-paragraph format, a simple tool I habitually use to make sure everyone is on the same page
Follow-up and follow-through, and how to make sure neither you, nor your team, nor your boss loses sight of what needs doing

Private Abstract

This will only be shown to organisers and reviewers. You should provide any details about your proposal that you don’t want to be public here. This field is rendered with the monospace font Hack with whitespace preserved

It’s probably good for me to reiterate that this is not a scientific study. :)

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2019-04-24:/talk-submissions/lca-2019-slack/

Writing for learners: best practices for creating, developing, and maintaining self-paced learning resources

Florian Haas Apr 23, 2019 Updated Apr 23, 2019

A talk I submitted to Write The Docs Prague, 2019.

Show full content

This is a talk I submitted1 to Write The Docs Prague, September 15-17, 2019. The conference uses a non-anonymized CfP process with a simple Google Form. The CfP page is here.

Talk title

Writing for learners: best practices for creating, developing, and maintaining self-paced learning resources.

Talk abstract

More information here is better. Submitting a single paragraph won’t give us much to go on, but please no walls of text.

This presentation talks about a special kind of tech writing: creating and maintaining self-paced technical training content. This encompasses both prose for theoretical background information, and instructions for hands-on labs.

In this talk, I’ll go over

special challenges (and advantages!) of self-directed over instructor-driven training
video content, and why we don’t do it
our rules for structuring theoretical content
our approach for interactive lab instructions

This is rooted in 4 years’ experience in writing courseware used for learning complex technical topics (like OpenStack, Kubernetes, Ceph, and others) on an Open edX platform.

You’ll find the concepts discussed in this talk useful if you write courseware (of course), but I’d say they equally apply whenever you find yourself writing any content that is instructive, rather than descriptive.

Who and why

Who is this talk for? What background knowledge or experience do you expect the audience to have? What is the take away from the talk?

This talk is for anyone who, in a technical context, uses imperatives. I’ve been at least a part-time tech writer for the better part of the last 10 years, but I’ve written software documentation for only 4 of those – for the majority of the remainder, I’ve written courseware content instead.

Writing prose training content and lab instructions comes with its own unique challenges and its own (explicit or implicit) style guide. Since as courseware authors we communicate with our learners exclusively in writing, I believe I have good practices to share with other documentarians who might occasionally find themselves in the situation of writing lab instructions and test scenario descriptions, and I think there are equally many things that I can learn from other talks and speakers.

Other Information

Any other information that might be interesting for us to know about you? Give a lightning talk last year, speak at a WTD meetup, or anything else interesting? Add it here.

I’ve never spoken at any Write The Docs conference, though I have done talks and workshops at multiple instances of linux.conf.au, LinuxCon (now Open Source Summit), OpenStack Summit (now Open Infrastructure Summit), the Open edX Conference, and others.

My team and I maintain City Cloud Academy (academy.citycloud.com), which runs on Open edX.

If you’re curious why this is here, please read this. ↩

tag:xahteiwi.eu,2019-04-23:/talk-submissions/wtd-prague-2019/

Talk submissions

Florian Haas Apr 23, 2019 Updated Apr 23, 2019

I put talk submissions on this site, regardless of whether they get accepted or not. Here’s why.

Show full content

I have spoken at tech conferences for the better part of a decade, and up to this point I have only ever published my talks after I’ve actually presented them. I’ll change that from here on out, and I’ll instead record any talk that I submit to a conference, regardless of whether it ultimately gets accepted or not. I do this for several reasons.

I’d like to have a record of my talk submissions for my own reference purposes.

It’s rather remarkable how many conference talk submission systems exist that make it rather difficult to retrieve your submitted abstract a few months or years after the conference. Some event websites exist specifically for one instance of a particular conference, so they might be taken down a few months after, and unaccepted abstracts are usually not accessible publicly — so even getting them via the Internet Archive is not an option. Others record talk submissions via Google Forms, into a private Google Sheet, and don’t send an autoreply containing the full submission.

So, I use this site for having my own record of talks I submit.
I don’t want to reinforce the illusion that just because I’m an experienced speaker, all talks I submit anywhere get accepted.

I get talks rejected all the time, particularly from conferences I’d really enjoy speaking at. This is normal, and if any of you reading this are less experienced and find rejections discouraging, I want you to know that they happen to all of us.
I am curious about other people’s thoughts.

As a speaker, it is pretty hard to get good feedback on a talk submission. Very few conferences send you detailed feedback on rejected talks. And none at all, to the best of my knowledge, send you qualitative feedback on accepted ones (other than “congrats, you’re in!”).

So I figure that if I publish my submissions here, perhaps a few people might take a look and chip in some valuable thoughts. And even though this site doesn’t do comments, I am counting on Twitter and other social networks to spark some.
If you run a small conference or meetup that I don’t know about, I want to give you the opportunity to reach out to me if there’s a topic you’d like to hear me talk about.

There’s generally far more topics that I’d like to do talks on — and feel reasonably qualified to — than what I often get selected to speak about. In addition, there are just so many conferences and meetups out there that it’s impossible to keep track of all of them.

So, if you find a topic here that you’d really like at your event that I absolutely don’t have on my radar, do drop me a line.
If you find any of my talk ideas useful, go ahead and submit your own talk like it.

Like almost all content on this site, the talk submissions I record here are CC-BY-SA licensed, so as long as you include an acknowledgment, and reciprocate by sharing your own talk, please do consider yourself encouraged to build your own talk ideas from mine.

I’ll make one exception from this rule: some conferences use an anonymized talk selection process, where submissions must be devoid of any information that might be remotely likely to identify the speaker. If I submit a talk to such a conference, I’ll only put the abstract up here when the selection process is over, the schedule stands, and I have received a definitive acceptance or rejection notice. However, in case I am re-submitting a talk previously given at (or previously submitted to) a different conference, I won’t be removing that article.

You can find my (continuously updated) list of talk submissions here, and there’s also an Atom feed, here.

tag:xahteiwi.eu,2019-04-23:/blog/2019/04/23/talk-submissions/

If you’re a leader in tech, “non-technical” is not a free pass

Florian Haas Apr 21, 2019 Updated Apr 21, 2019

There’s a specific use of the term “non-technical,” and that’s applying it to oneself, as a cop-out. If you’re a leader in tech, you don’t get to do that.

Show full content

The excellent Josh Simmons recently implored people on Twitter 1 to stop using the term “non-technical” when talking about another person’s skill set. And as far as that term is often used as a put-down of others, I am completely with Josh. Belittling someone because they work in documentation, corporate leadership, marketing, middle management, PR, advocacy, legal (etc. etc.), and because you consider yourself somehow superior because you’re in a “technical” role — that has got to stop, yesterday.

However, I would like to amend his plea to also decry the use of the phrase about oneself, as a cop-out. I am talking about people in leadership roles saying “I’m not technical” or “I’m not a tech person” to allege that they have bigger fish to fry, and cannot be reasonably expected to understand technical detail.

And that has to stop yesterday, too.

Leadership roles exist that require understanding of technical detail. If you’re the CEO of a tech company, you need to understand the technology your company makes. If you’re in charge of a product, you need to understand the technology that makes up that product. And even if your company’s core business has nothing to do with technology, but you are in charge of something that does, you need to understand that technology.

Now of course it’s nigh impossible to understand every bit of technology that you need to make decisions on, in its every intricate detail. But you will be faced with decisions that do boil down to specific technology details. And then, it is incumbent on you to know as much as you need to know, to make an informed decision. This is a core element of a leadership position, and it makes up at least part of your income differential versus a person who is a subject matter expert in their field, but doesn’t manage other people.

Whenever you have an issue to decide on that you don’t understand, get someone to explain it to you. It’s perfectly fine to say “I know nothing about this bit of technology, please give me your simplest explanation that will enable me to make a decision.” But you don’t get to say “I’m not a technical person” and use that as a pass for, and perpetuation of, your self-inflicted ignorance.

(By the way, the same is obviously true in reverse. Say you’ve got the tech-person perspective and someone from legal comes up to you with a question on licensing or patents or international contract law that you know zilch about? Same thing. “Please give me your simplest explanation of this matter that will enable me to make a decision.”)

There is a recently popular spin on the “it’s OK to be non-technical” cop-out argument, which is to over-emphasize communication skills over tech skills. It starts with a truism — a person’s technical skills provide little benefit unless paired with communications proficiency. But then this is frequently flipped to claim that technical understanding can be replaced with communications skills, because the former allegedly pales in importance versus the latter.

Let me break something to you: except in the rare case of a job where someone works entirely on their own and also has no customers,2 communication skills are every person’s most important skills. Yes of course your communication skills are more important than your tech skills, because they’re literally more important than anything.

However, if you’re a great communicator but you don’t know what you’re talking about, all that makes you is a bullshit peddler. And, if you’re actually incapable of listening to experts who are able and usually very willing to explain a complex matter to you, maybe you’re not such a grand communicator after all, either.

So if you’re in a leadership position that is even remotely tech-related, and you’ve ever used “I’m not technical” as a free pass to not understand things, stop. It isn’t.

The post has since been deleted off Twitter (as have mine, incidentally), and Josh has reposted it to Mastodon. ↩
Yes I am aware that this is exceedingly rare. ↩

tag:xahteiwi.eu,2019-04-21:/blog/2019/04/21/non-technical/

Why upload filters don’t work (really simple math!)

Florian Haas Mar 25, 2019 Updated Mar 25, 2019

“I can’t figure out how upload filters should work, but I’m not a technical person — surely someone who is can sort it out!”

That is a misconception. I’ll be happy to explain, requiring — I promise! — no technical understanding of what an upload filter is, or how it …

Show full content

“I can’t figure out how upload filters should work, but I’m not a technical person — surely someone who is can sort it out!”

That is a misconception. I’ll be happy to explain, requiring — I promise! — no technical understanding of what an upload filter is, or how it works.

The current draft of the EU Directive “on copyright and related rights in the Digital Single Market”, available here (PDF in English), also known as the EU Copyright Directive, requires in its Article 17 (formerly Article 13), clause 4, that the service provider make,

in accordance with high industry standards of professional diligence, best efforts to ensure the unavailability of specific works and other subject matter for which the rightholders have provided the service providers with the relevant and necessary information.

It is obvious that any platform hosting user-generated content would thus have to intercept any such content on upload, failing which it would immediately become potentially liable for a copyright violation. This would require a technical facility commonly called an upload filter.

We don’t need to talk about how upload filters work

Now, an upload filter is immensely complex and there are tons of technical difficulties — the only time this has been attempted on a large scale is YouTube’s Content ID, and it is exceedingly unreliable and prone to overblocking. But for the purposes of this discussion, it doesn’t matter whether implementing an upload filter is difficult to do.1

Assume for a moment someone has built a magnificent upload filter. Something that operates on magic pixie dust that catches all copyright violations.

Interactions

Now, let’s call every instance of someone uploading content to the internet an “interaction”. Every tweet, every Facebook post and comment, every comment on your favorite news site, every blog post you write, every picture that you take and post to a WhatsApp group of 50 people or more, every YouTube video and comment — let’s call all of those “interactions.”

And let’s make an outrageously overblown assumption: suppose that on the internet today, 1% of such interactions infringe someone’s copyright. Again, let me reiterate that this is ludicrously high. The vast majority of internet interactions today are either completely trivial and thus irrelevant to copyright, or works of your own, or a perfectly legal fair-use way of using someone else’s work, such as when you quote a passage of a book. But purely for the sake of this discussion, let’s say it’s 1%.

So then let’s look at 10,000 interactions that completely random people make on the internet.

Those would then break down like so:

                        Total
Perfectly legal         9,900
Infringing copyright      100

Catching copyright violations. Or non-violations.

OK. Now, suppose we built a perfect upload filter, i.e. one that catches all copyright infringements. Remember, the Directive calls for “best effort to ensure the unavailability” (emphasis mine) of potentially infringing content. It does not allow providers to balance for freedom of expression or the like, so to err on the side of caution, they must strive to over- rather than underblock. So a perfect filter is one that has no false negatives — meaning if content infringes, it is always caught.

Now, suppose further that the filter mis-identifies content (meaning, flags content as infringing when it is not) with a rate of only 2%. That means it has 2% false positives. That, now, is ridiculously low for any automated screening procedure.2

So that means that out of our 10,000 interactions tracked by our “perfect” content filter, the numbers break down like this:

                        Total   Flagged as legal   Flagged as infringing
Perfectly legal         9,900              9,702                     198
Infringing copyright      100                  0                     100
Overall                10,000              9,702                     298

Congratulations, a coin toss beats your upload filter.

That leads us to the question: if you upload something and it gets flagged, how likely is it that it is actually infringing any copyright? Answer: 100 in 298. Roughly one in three. Yes, that is worse than a coin toss. And remember, this is assuming an implausibly high rate of infringements overall, and a ludicrously low false-positive and false-negative rate on your filter.

Go ahead and play with the numbers, tweak the false-negatives and false-positives, whatever. As long as what you’re looking for is exceedingly rare, automated filters detect it with poor accuracy.
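
If you would rather play with the numbers in code, here is a minimal Python sketch of the arithmetic above. The function name and its parameters are my own invention for illustration, and the only inputs are the assumptions already stated: the share of infringing interactions, the false-positive rate, and the false-negative rate (zero for the "perfect" filter).

```python
def filter_outcomes(total, infringing_rate, false_positive_rate,
                    false_negative_rate=0.0):
    """Break down what a hypothetical upload filter flags.

    All rates are fractions between 0 and 1. The names are illustrative
    and do not come from any real filtering product.
    """
    infringing = total * infringing_rate
    legal = total - infringing

    # Legal content that the filter wrongly flags as infringing.
    false_positives = legal * false_positive_rate
    # Infringing content that the filter correctly flags.
    true_positives = infringing * (1 - false_negative_rate)

    flagged = false_positives + true_positives
    return {
        'legal': legal,
        'infringing': infringing,
        'false_positives': false_positives,
        'true_positives': true_positives,
        'flagged': flagged,
        # If an upload gets flagged, how likely is it to really infringe?
        'share_of_flags_that_infringe': true_positives / flagged,
    }


if __name__ == '__main__':
    # The scenario from the text: 10,000 interactions, 1% infringing,
    # 2% false positives, no false negatives.
    result = filter_outcomes(10000, 0.01, 0.02)
    print("Flagged in total: %.0f" % result['flagged'])
    print("Share of flags that actually infringe: %.1f%%"
          % (100 * result['share_of_flags_that_infringe']))
```

Lowering infringing_rate while leaving everything else alone shows how quickly the share of legitimate blocks collapses.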

And if you leave all parameters the same, but consider a probably much more realistic infringement rate of 1 in 1000, that is, 0.1%, then things look like this:

                        Total   Flagged as legal   Flagged as infringing
Perfectly legal         9,990              9,790                     200
Infringing copyright       10                  0                      10
Overall                10,000              9,790                     210

Now there’s roughly a one-in-twenty-one chance that an upload block is legitimate (10 in 210, with the 199.8 false positives rounded up to 200). Assuming there is an appeals process, and all false positives get appealed, then that means the human going through the appeals will have to undo a block 20 times out of 21.

A cheap optimization

I’d like to propose an optimization here: any website seeking to implement a content filter should consider just using a random number generator to reject your upload, comment, tweet, or post with a certain probability that is demonstrably larger than that of an upload filter. I’d posit that this would be by far the safest, cheapest way to comply with the directive, if it becomes law.

Of course, everyone who is now in favor of this directive (including its Article 17) will hate that.

Footnotes

It’s also easy to dismiss with a “try harder” retort, which is completely disingenuous, because it’s akin to saying, doc, this patient has terminal pancreatic cancer, but you must cure her. Inoperable? Terminal? No there’s got to be a way. Sometimes there is no way, and it’s OK when an expert tells you that. ↩
I don’t believe YouTube releases numbers on its ContentID error rate, but it’s apparently pretty bad for a system that cost $100M to build. ↩

tag:xahteiwi.eu,2019-03-25:/blog/2019/03/25/upload-filter-math/

Article 17: The time to act is now.

Florian Haas Mar 21, 2019 Updated Mar 21, 2019

Next Tuesday, the European Parliament is due to vote on something that will impact your life. Yes, yours.

Show full content

On Tuesday, March 26 at half-past noon Central European Time, the European Parliament is due to vote on an issue that will definitely impact your life, no matter if you live in or outside the EU. No, it has nothing to do with Brexit. Brexit has just managed to monopolise your attention. The Tuesday vote has much more far-reaching consequences.

What’s going on here?

On Tuesday, the EP will vote on a Directive “on copyright and related rights in the Digital Single Market”, the draft of which you can look up here (PDF in English), also known as the EU Copyright Directive.

Now this thing attempts to be a 21st-century copyright law, which is laudable, but it does absolutely awful things: go take a look at this video to get the quick run-down. It’ll only take 4 minutes of your time. And as you’ll see from the video, this law will have a devastating global impact on society at large. (While completely failing to achieve its ostensible goals, mind you.)

What is apparent from the discussion around this directive is that it isn’t being pushed by exceptionally clueful people. One MEP in favor of the directive once publicly surmised that he was dealing with a concerted campaign from Google, because lots of the email in his inbox came from gmail.com addresses. The rapporteur on the directive hasn’t subscribed to any YouTube channel and thinks that there is a Memes section on Google, and when properly roasted on Twitter, someone representing his party publicly doubled down on his behalf.

In the words of Janus Kopfstein:

It’s no longer OK to not know how the internet works.

Yes, he wrote that in 2011, and addressed it to the U.S. Congress, but here we are, with our European representatives still needing that reminder 8 years on.

They even tried the oldest trick in the book: once all of Europe started screaming about Article 13, they renumbered. So what used to be Article 13 is now Article 17. Don’t get confused; this is just a ploy by people who print out emails.

Now luckily, the backers of Article 17 are opposed by a good bunch of people who are clued in, and have pledged to strike this Directive down in the EP on Tuesday.

OK. So what do I do?

You can sign a change.org petition. It already has 5 million supporters, and there will likely be many more. But supporting that petition is not enough.

You can check out your own MEPs for their status at Pledge 2019. And you can lean on them: Pledge 2019 lets you call the people whose job it is to represent you in the EP, and if they haven’t pledged to reject the Directive in its current form, you can let them know (in no uncertain terms) that you won’t be voting for them or their party in the upcoming EP elections in May.

Also, there’s Save Your Internet. The primary spin your pro-upload filter reps are trying to put on the discussion is that the legislation isn’t opposed by an appreciable number of real people, and that only astroturfing bots want this law struck down. You can write an email, send a letter (yes, a letter, as in, snail mail), or call them.

And finally, Save The Internet (yes, everyone’s, not just yours). March 23 is a day of pan-European protest against Article 17. Find a rally and go.

Using coverage with multiple parallel GitLab CI jobs

Florian Haas Mar 10, 2019 Updated Mar 10, 2019

If you ever write unit tests in Python, you are probably familiar with Ned Batchelder’s coverage tool. This article explains how you can use coverage in combination with tox and a GitLab CI pipeline, for coverage reports in your Python code.

Show full content

Running coverage from tox

Consider the following rather run-of-the-mill tox configuration (nothing very spectacular here):

[tox]
envlist = py{27,35,36,37},flake8

[coverage:run]
parallel = True
include =
  bin/*
  my_package/*.py
  tests/*.py

[testenv]
commands =
    coverage run -m unittest discover tests {posargs}
deps =
    -rrequirements/setup.txt
    -rrequirements/test.txt

[testenv:flake8]
deps = -rrequirements/flake8.txt
commands = flake8 {posargs}

In this configuration, coverage run (which, remember, replaces python) invokes test auto-discovery from the unittest module. It looks for unit tests in the tests subdirectory, runs them, and keeps track of which lines were hit and missed by your unit tests.

The only slightly unusual bit is parallel = True in the [coverage:run] section. This instructs coverage to write its results not into one file, .coverage, but into multiple, named .coverage.<hostname>.<pid>.<randomnumber> — meaning you get separate results files for each coverage run.

Subsequently, you can combine your coverage data with coverage combine, and then do whatever you like with the combined data (coverage report, coverage html, etc.).

GitLab CI

Now there’s a bit of a difficulty with GitLab CI, which is that your individual tox testenvs will all run in completely different container instances. That means that you’ll run your py27 tests in one container, py35 in another, and so forth. But you can use GitLab CI job artifacts to pass your coverage data between one stage and another.

Here’s your build stage, which stores your coverage data in short-lived artifacts:

image: python

py27:
  image: 'python:2.7'
  stage: build
  script:
    - pip install tox
    - tox -e py27,flake8
  artifacts:
    paths:
      - .coverage*
    expire_in: 5 minutes

py35:
  image: 'python:3.5'
  stage: build
  script:
    - pip install tox
    - tox -e py35,flake8
  artifacts:
    paths:
      - .coverage*
    expire_in: 5 minutes

py36:
  image: 'python:3.6'
  stage: build
  script:
    - pip install tox
    - tox -e py36,flake8
  artifacts:
    paths:
      - .coverage*
    expire_in: 5 minutes

py37:
  image: 'python:3.7'
  stage: build
  script:
    - pip install tox
    - tox -e py37,flake8
  artifacts:
    paths:
      - .coverage*
    expire_in: 5 minutes

And here’s the test stage, with a single job that

combines your coverage data,
runs coverage report and parses the output — this is what goes into the coverage column of your GitLab job report,
runs coverage html and stores the resulting htmlcov directory into an artifact that you can download from GitLab for a week.

coverage:
  stage: test
  script:
    - pip install coverage
    - python -m coverage combine
    - python -m coverage html
    - python -m coverage report
  coverage: '/TOTAL.*\s+(\d+%)$/'
  artifacts:
    paths:
      - htmlcov
    expire_in: 1 week
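
To see what the coverage pattern in that job picks out, here is a small Python sketch that applies the same expression to a made-up coverage report. The sample report and the function name are assumptions for illustration only. GitLab evaluates the pattern with its own regular expression engine, so treat Python's re module here as an approximation that happens to behave the same way for this simple expression.

```python
import re

# The expression from the "coverage:" key, without the surrounding slashes.
# MULTILINE makes "$" match at the end of each line of the job log.
COVERAGE_PATTERN = re.compile(r'TOTAL.*\s+(\d+%)$', re.MULTILINE)

# Made-up "coverage report" output; the file names and numbers are invented.
SAMPLE_REPORT = """\
Name                     Stmts   Miss  Cover
--------------------------------------------
my_package/__init__.py       4      0   100%
my_package/cli.py           96     18    81%
--------------------------------------------
TOTAL                      100     18    82%
"""


def extract_total_coverage(report_text):
    """Return the percentage from the TOTAL line, or None if there is none."""
    match = COVERAGE_PATTERN.search(report_text)
    return match.group(1) if match else None


if __name__ == '__main__':
    print(extract_total_coverage(SAMPLE_REPORT))
```

Only the TOTAL line matches, so the per-file percentages further up in the report are ignored.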

tag:xahteiwi.eu,2019-03-10:/resources/hints-and-kinks/coverage-gitlab-ci/

Building a nested CLI parser from a dictionary

Florian Haas Mar 9, 2019 Updated Mar 9, 2019

Here’s a nice way to initialize a CLI argument parser in Python, with arbitrary levels of subcommands.

Show full content

If you’ve ever built a command-line interface in Python, you are surely familiar with the argparse module, which is part of the Python standard library. It contains the ArgumentParser class, instances of which are typically invoked from the CLI’s main() method.

The canonical way of doing this is explained in considerable detail in the standard library documentation. However, the standard way is quite repetitive, and you end up invoking parser.add_argument() a lot, as you populate your parent parser and subparsers with options.

Here’s a more concise way:

# If you must run this on Python 2. You really shouldn't!
from __future__ import print_function

from argparse import ArgumentParser

import yaml
import sys

# Using YAML here only for illustrative purposes, as it's a bit
# easier to read. You probably just want to use a dictionary outright.
#
# More at the bottom of this article.
# Yes, go read the bottom of this article.
#
# Want to just blindly copy and paste this snippet? Fine, this is for you.
assert(False)

PARSER_CONFIG_YAML="""
options:
  - 'flags': ['-V', '--version']
    action: version
    help: 'show version'
    version: '0.01'
subcommands:
- foo:
    options:
      - 'flags': ['-c', '--config']
        'help': 'YAML configuration file'
        dest: config
- bar:
    options:
      - 'flags': ['-o', '--output']
        'help': 'output file'
        dest: output
- baz:
    subcommands:
      - 'spam-eggs':
          options:
            - 'flags': ['-i', '--input']
              'help': 'input file'
              dest: input
"""

class CLI():

    def __init__(self):

        def walk_config(dictionary, parser):
            """Walk a dictionary and populate an ArgumentParser."""

            if 'options' in dictionary:
                for opt in dictionary['options']:
                    args = opt.pop('flags')
                    kwargs = opt
                    parser.add_argument(*args, **kwargs)

            if 'subcommands' in dictionary:
                subs = parser.add_subparsers(dest='action')
                for subcommand in dictionary['subcommands']:
                    for cmd, opts in subcommand.items():
                        sub = subs.add_parser(cmd)
                        walk_config(opts, sub)

        config = yaml.safe_load(PARSER_CONFIG_YAML)

        parser = ArgumentParser()
        walk_config(config, parser)

        self.parser = parser

    def foo(self, config):
        print("This is the foo subcommand, "
              "invoked with '-c %s'." % config)

    def bar(self, output):
        print("This is the bar subcommand, "
              "invoked with '-o %s'." % output)

    def baz(self):
        print("This is the baz subcommand")

    def spam_eggs(self, input):
        print("This is the baz spam-eggs subcommand, "
              "invoked with '-i %s'." % input)

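    def main(self, argv=sys.argv):
        # parse_args() returns a Namespace, which has no pop() method;
        # vars() gives us a plain dictionary of the parsed options.
        opts = vars(self.parser.parse_args(argv[1:]))
        getattr(self, opts.pop('action').replace('-', '_'))(**opts)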

if __name__ == '__main__':
    CLI().main()

And now, if you want to add a new option, you add it to the top-level or the subcommand’s options list, and add it to your subcommand method.

And if you want to add a new subcommand, you just add that at the level you like, and add a method that is named like your subcommand — with any hyphens in the subcommand being replaced with underscores in the method name.

Notes

When using PyYAML, do not use versions affected by CVE-2017-18342. Really, you shouldn’t be using YAML at all for this purpose; you should just use a straight-up dictionary. If you want something just a little more readable, you might also consider JSON (for which there is a parser in the standard library), or perhaps TOML.
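
For reference, here is a minimal sketch of the straight-up dictionary variant, using only the standard library. It mirrors the structure of the YAML above with fewer subcommands. The names PARSER_CONFIG and build_parser are mine, chosen for illustration, and unlike the snippet above this one copies each option before popping its flags, so the configuration dictionary can be reused.

```python
from argparse import ArgumentParser

# Same layout as the YAML configuration, written as a plain dictionary.
PARSER_CONFIG = {
    'options': [
        {'flags': ['-V', '--version'], 'action': 'version',
         'help': 'show version', 'version': '0.01'},
    ],
    'subcommands': [
        {'foo': {'options': [
            {'flags': ['-c', '--config'],
             'help': 'configuration file', 'dest': 'config'},
        ]}},
        {'baz': {'subcommands': [
            {'spam-eggs': {'options': [
                {'flags': ['-i', '--input'],
                 'help': 'input file', 'dest': 'input'},
            ]}},
        ]}},
    ],
}


def walk_config(dictionary, parser):
    """Walk a dictionary and populate an ArgumentParser."""
    for opt in dictionary.get('options', []):
        kwargs = dict(opt)           # copy, so the config stays untouched
        flags = kwargs.pop('flags')
        parser.add_argument(*flags, **kwargs)

    if 'subcommands' in dictionary:
        subs = parser.add_subparsers(dest='action')
        for subcommand in dictionary['subcommands']:
            for cmd, opts in subcommand.items():
                walk_config(opts, subs.add_parser(cmd))


def build_parser(config=PARSER_CONFIG):
    parser = ArgumentParser()
    walk_config(config, parser)
    return parser


if __name__ == '__main__':
    # vars() turns the parsed Namespace into a dictionary.
    print(vars(build_parser().parse_args(['baz', 'spam-eggs', '-i', 'in.txt'])))
```

Because nested subparsers all write to the same action destination, the innermost subcommand name is what ends up there, which is what the method dispatch in main() relies on.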

Also, yes there are smarter ways to define your program’s version; more on that perhaps in a later post.

tag:xahteiwi.eu,2019-03-09:/resources/hints-and-kinks/python-argparse-from-dictionary/

Learn Complex Skills, From Anywhere: Combining Django, Ansible and OpenStack to teach any tech skill

Florian Haas Jan 23, 2019 Updated Jan 23, 2019

Show full content

My presentation from linux.conf.au 2019.

Video: YouTube, Linux Australia (MP4), Linux Australia (WebM)
Slides (with full speaker notes): GitHub

Use the arrow keys to navigate through the presentation, hit Esc to zoom out for an overview, or just advance by hitting the spacebar.

tag:xahteiwi.eu,2019-01-23:/resources/presentations/learn-complex-skills-from-anywhere-combining-django-ansible-and-openstack-to-teach-any-tech-skill/

1,000 routers per tenant? Think again!

Florian Haas Dec 8, 2018 Updated Dec 8, 2018

When you allow one of your OpenStack tenants a large number of routers, they may not be getting as many as you think they will.

Show full content

Neutron quotas

As with all other OpenStack services, Neutron uses a fairly extensive quota system. An OpenStack admin can give a tenant1 a quota limit on networks, routers, ports, subnets, IPv6 subnetpools, and many other object types.

Most OpenStack deployments set the default per-tenant quota at 10 routers. However, nothing stops an admin from setting a much higher router quota, including one above 255. When such a quota change has been applied to your tenant, you’re in for a surprise.

HA routers

Way back in the OpenStack Juno release, we got high-availability support for Neutron routers. This means that, assuming you have more than one network gateway node that can host them, your virtual routers will work in an automated active/backup configuration.

In effect, for every subnet that is plugged into the router (and for which the router therefore acts as the default gateway), Neutron binds the gateway address to a keepalived-backed VRRP interface. On one of the network nodes that interface is active, and on the others it’s in standby. If the active network node goes down, keepalived makes sure that the subnets’ default gateway IPs come up on another node. The keepalived configuration is completely abstracted away from the user; the Neutron L3 agent happily takes care of all of it.

In addition, HA routers also fail over when a network node is still up but has lost its own upstream network connectivity, provided another node that retains connectivity is available. This ensures connectivity for your VMs.

The catch: one HA router network per tenant

In order to enable HA routers, Neutron creates one administrative network per tenant, over which it runs VRRP traffic. In order to tell apart all the keepalived instances that it manages on that network, it assigns each an individual Virtual Router ID or VRID.

And here’s the problem: RFC 5798 defines the VRID to be an 8-bit integer, with valid values ranging from 1 to 255. That means that if you use HA routers, then setting a router quota over 255 is useless: Neutron will run out of VRIDs in the administrative network before your tenant can ever hit the quota.
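The arithmetic can be written down in a few lines. This is a sketch under one stated assumption: that every VRID in the RFC 5798 range is actually available for allocation. Neutron’s own allocation pool may be smaller. The function name `effective_ha_router_limit` is made up for this illustration.

```python
# Valid VRID range according to RFC 5798 (the field is 8 bits wide,
# but 0 is not a usable VRID).
VRID_MIN = 1
VRID_MAX = 255


def effective_ha_router_limit(router_quota: int) -> int:
    """Return the most HA routers one tenant can actually create.

    Assumption: all VRIDs in the RFC range are usable on the single
    per-tenant HA network.
    """
    usable_vrids = VRID_MAX - VRID_MIN + 1
    return min(router_quota, usable_vrids)
```

With the default quota of 10, the quota is the limiting factor. With a quota of 1,000, the result is 255, no matter what the quota says.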

And this is a hard limit; there’s really not much that Neutron can do about this — apart from starting to spin up additional administrative networks once it runs out of VRIDs in the first one, but that likely would be a pretty involved change. Thus, at least for the time being, if you want more than 255 highly-available virtual routers, you’ll have to spread them across multiple tenants.

What’s more, Neutron is not very forthcoming about this limitation: an attempt to create an HA router beyond the limit simply leads to an Unknown error from the Neutron API endpoint.

Wait, what if I really don’t need HA routers?

Well, firstly you probably do want them, really. But that aside, let’s assume for a moment that you actually don’t. Or rather, that it’s more important for you to have more than 255 routers in a single tenant, than for any of them to be highly available. So you create routers with the ha flag set to False, simple, right?

It turns out that you probably won’t be able to do that. And that’s not because you can’t change a router’s ha flag without first temporarily disabling it — that’s not going to hurt you much if you’ve already decided you don’t need HA; in such a case a brief router blip will be acceptable. Instead, it’s because (at the time of writing) the default Neutron policy restricts setting the ha flag on a router to admins only.

So if you want to be able to disable a router’s HA capability, you’ll first need to convince your cloud service provider to override the following default entries in Neutron’s policy.json:

{
    "create_router:ha": "rule:admin_only",
    "get_router:ha": "rule:admin_only",
    "update_router:ha": "rule:admin_only"
}

… and instead set them as follows:

{
    "create_router:ha": "rule:admin_or_owner",
    "get_router:ha": "rule:admin_or_owner",
    "update_router:ha": "rule:admin_or_owner"
}

If your cloud service provider deploys Neutron with OpenStack-Ansible, they can define this in the following variable:

neutron_policy_overrides:
    "create_router:ha": "rule:admin_or_owner"
    "get_router:ha": "rule:admin_or_owner"
    "update_router:ha": "rule:admin_or_owner"

Once the policy has been overridden in this manner, you should be able to create a new router with:

openstack router create --no-ha <name>

And modify an existing router’s high-availability flag with:

openstack router set --disable <name>
openstack router set --no-ha <name>
openstack router set --enable <name>
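If you need to do this for many routers, the three-step sequence can be scripted. The following is a rough sketch, not a tested tool: the function name `clear_ha` and its injectable `run` parameter are hypothetical. It assumes the `openstack` client is on your `PATH` and that your credentials are already set in the environment.

```python
import subprocess


def clear_ha(router, run=subprocess.run):
    """Disable a router, clear its HA flag, then re-enable it.

    `run` defaults to subprocess.run; it is a parameter only so that
    the command sequence can be inspected without a real cloud.
    """
    base = ["openstack", "router", "set"]
    # The HA flag can only be changed while the router is disabled.
    run(base + ["--disable", router], check=True)
    try:
        run(base + ["--no-ha", router], check=True)
    finally:
        # Re-enable the router even if the flag change was rejected,
        # so a policy error does not leave it switched off.
        run(base + ["--enable", router], check=True)
```

The `try`/`finally` matters here: if the policy still restricts the `ha` flag to admins, the middle command fails, and without the `finally` clause the router would stay disabled.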

Is my router HA, really?

In relation to what I described above, you may want to find out whether one of your routers is configured to be highly available in the first place. You’d expect to easily be able to do this with an openstack router show command:

Alas, what you see in the example above is indeed a highly-available router, so why does it clearly report its ha flag as being False?

Well, that’s another consequence of that default Neutron policy, in combination with rather unintuitive behavior by the openstack command line client. You see, this part of the aforementioned policy

{
    "get_router:ha": "rule:admin_only"
}

… means you’re not even allowed to query the ha flag if you’re not an admin, and when the openstack client is asked to display a boolean value that the user is not allowed to even read, then it always displays False.

I’m very sorry, I still can’t force myself to call a tenant a “project”, as I find that term profoundly illogical: the proper term for the concept being discussed here is multitenancy, not multiprojectcy.

tag:xahteiwi.eu,2018-12-08:/resources/hints-and-kinks/1000-routers-per-tenant-think-again/