
https://cablespaghetti.dev/feed.xml

rss
15 posts

Posts

Hosting a Fediverse instance on an original Raspberry Pi
hardware, raspberrypi, alpinelinux, linux
"Hosting a Fediverse instance site on an original 256MB Raspberry Pi using Alpine Linux diskless mode."

In my previous post, I moved my blog to the original 256MB RAM Raspberry Pi that I purchased for the princely sum of £25 in May 2012. When I'd finished that little project I couldn't help but notice that, not being a particularly successful blogger, I still had plenty of resources to spare.

I had heard of snac from following Justine Smithies, who hosts her Fediverse presence on it. I knew it was stupidly lightweight, even more so than GoToSocial, and quickly discovered that some helpful individual has even packaged it for Alpine Linux with armhf/armv6 builds available!

Whilst I do like snac and am currently using it as my primary fedi instance in place of Mastodon, I will preface this with the warning that it is very minimal, and definitely aimed at unixy sysadmin types. Do not expect the full feature set that you get with Mastodon, and compatibility with Mastodon apps is patchy at best in my experience (although I have got Mona for iOS to work quite well).

Installation

As snac is still under heavy development, I wanted the newest possible version, so although I am running the latest stable Alpine Linux, I took snac from the "edge" repository. To achieve this I just pointed /etc/apk/repositories at the edge version of the community repository:

/media/mmcblk0p1/apks
http://dl-cdn.alpinelinux.org/alpine/v3.22/main
http://dl-cdn.alpinelinux.org/alpine/edge/community

Then it's as simple as apk add snac. I also ran rc-update add snac default to make sure it starts up on boot.
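
If you'd rather script the repository change, here is a minimal sketch. It assumes you run it as root on the Pi. The function name setup_snac_repos is hypothetical and is not part of Alpine. The function writes the same three-line file shown above, with main taken from a stable release and community taken from edge, to whatever path you give it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Alpine): write an apk repositories file that
# takes "main" from a stable release but "community", where snac lives, from edge.
# Usage: setup_snac_repos FILE RELEASE
#   e.g. setup_snac_repos /etc/apk/repositories v3.22
setup_snac_repos() {
    file=$1
    release=$2
    mirror=http://dl-cdn.alpinelinux.org/alpine
    # First line: the package repository on the SD card boot media, as above
    cat > "$file" <<EOF
/media/mmcblk0p1/apks
$mirror/$release/main
$mirror/edge/community
EOF
}
```

On the Pi itself you would follow it with apk update, apk add snac and rc-update add snac default. Then run lbu commit -d so the new repositories file survives a reboot.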

Storage

"But Sam!" I hear you say, "The photo at the top of the post shows a 2.5" hard disk!"

Yes, well spotted. Lightweight or not, I don't want to store all my snac data on an SD Card. So, consistent with the theme of this series, I dug out the oldest and slowest 2.5" hard disk I could find (a 40GB IDE Fujitsu from 2004), a random USB adapter and a powered USB 2.0 hub to connect it up to my terrible server.

Raspberry Pi with a hard disk connected via USB and a powered USB hub

Note: I have since switched to btrfs as it seems to have lower memory usage. Snapshots are also a bonus for backups.

When connected it showed up in dmesg as /dev/sda, so I ran apk add xfsprogs to get the XFS utilities, used fdisk /dev/sda to make a new MBR partition table and a single "Linux" (the default) type partition on it, and ran mkfs.xfs /dev/sda1 to format the partition.

Amazingly, considering the age of the drive and the amount of time it had been sat in my pile of mostly dead hard drives, this all worked properly and I could mount -t xfs /dev/sda1 /var/lib/snac. I then added this line to /etc/fstab so it would mount on boot:

/dev/sda1       /var/lib/snac   xfs     defaults 0 2

I also ran chown snac:snac /var/lib/snac to make sure the snac user can write to it.
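
Adding the fstab line by hand is fine, but running the same instructions twice will leave a duplicate entry. Here is a minimal sketch of an idempotent alternative. The function name add_fstab_entry is hypothetical. It appends the same "defaults 0 2" line as above, but only if no uncommented entry already exists for the mount point.

```shell
#!/bin/sh
# Hypothetical helper: append an fstab entry only if the mount point is not
# already listed, so it is safe to run more than once.
# Usage: add_fstab_entry FSTAB DEVICE MOUNTPOINT FSTYPE
add_fstab_entry() {
    fstab=$1
    dev=$2
    mnt=$3
    fstype=$4
    # Field 2 of an fstab line is the mount point; commented-out lines are ignored
    if [ -f "$fstab" ] &&
        awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        echo "$fstab already has an entry for $mnt" >&2
        return 0
    fi
    # Same options as the line above: defaults, no dump, fsck pass 2
    printf '%s\t%s\t%s\tdefaults 0 2\n' "$dev" "$mnt" "$fstype" >> "$fstab"
}
```

For the disk described here, the call would be add_fstab_entry /etc/fstab /dev/sda1 /var/lib/snac xfs, followed by mount /var/lib/snac. Since /etc/fstab lives under /etc, an lbu commit -d afterwards should persist it.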

We'll get to backups later, because you are definitely going to want backups with a setup like this...

Now might be a good time to lbu commit -d to save your changes in case of screw ups.

Snac configuration

It was at this point that the snac System Manager's Manual became very useful. Usually you would have the man pages available locally, but I don't...because lightweight...

With the Alpine package, the snac init setup mentioned in the above documentation is triggered automatically when you run rc-service snac start. Please read the docs linked above, but my answers looked like this:

Network address or full path to unix socket [127.0.0.1]: 
Network port [8001]: 
Host name: cablespaghetti.dev
URL prefix: /fedi
Admin email address (optional): myemail@example.com

As you can see I used the defaults for the networking stuff, gave it my domain name and told it that snac is going to live at /fedi because this blog lives at the root. It then started up and everything was rosy. However both snac and lighttpd (spoilers...) support unix domain sockets, which in theory have less overhead, so I plan to play with that option at some stage.

You can edit the other settings such as the server description in /var/lib/snac/data/server.json and then run rc-service snac restart.

You'll want to add your user at this point, which I did by running su -s /bin/sh - snac to get a shell as the snac user, then snac adduser /var/lib/snac/data.

Reverse Proxy Configuration

I've got a lot of little tweaks and hacks in my config which aren't really relevant to this post, so I'll just mention the important bits you need to change from the default Alpine lighttpd.conf to get this working.

First of all in /etc/lighttpd/lighttpd.conf I uncommented mod_proxy and mod_openssl in addition to the mod_deflate I mentioned in my previous post.

Getting a TLS Certificate

I've previously used certbot for getting TLS certificates (when not using Kubernetes of course). However it is a Python application and I immediately ran out of ramdisk when I tried to install it. I then went searching on pkgs.alpinelinux.org and discovered acme.sh. You know the drill by now: apk add acme.sh.

I'm assuming that you read my previous post and you have a DNS record for your domain pointing at your Raspberry Pi and port 80 is open to the internet...if not...do that...

Now run:

acme.sh --issue -d example.com -w /var/www/localhost/htdocs

That will get you a cert from ZeroSSL, but there are flags if you would rather use a different CA. It's great that there are a bunch of options in this market now...I remember when it was just StartCom.

Hopefully that worked and you can install the cert into the Lighttpd directory.

acme.sh --install-cert -d example.com \
    --key-file       /etc/lighttpd/server.key  \
    --fullchain-file /etc/lighttpd/server.pem \
    --reloadcmd     "rc-service lighttpd reload"

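I also set up a cron job to try and renew the cert. In /etc/periodic/daily/acme (no .sh extension: Alpine runs these directories with BusyBox run-parts, which skips file names containing a dot) I put: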

#!/bin/sh
acme.sh --cron >>/var/log/acme.log 2>&1

and then chmod +x it.
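
The renewal job can also be created by a small function, which is handy if you rebuild the Pi from scratch. Below is a minimal sketch. The function name install_acme_cron is hypothetical. The file it writes is called acme, with no dot, because BusyBox run-parts ignores file names that contain one.

```shell
#!/bin/sh
# Hypothetical helper: install the daily renewal job into a periodic directory.
# The script is named "acme" (no dot) so that BusyBox run-parts will run it.
# Usage: install_acme_cron [DIR]   (defaults to /etc/periodic/daily)
install_acme_cron() {
    dir=${1:-/etc/periodic/daily}
    script="$dir/acme"
    mkdir -p "$dir"
    cat > "$script" <<'EOF'
#!/bin/sh
# Renew any certificates that are due and run their saved --reloadcmd
acme.sh --cron >>/var/log/acme.log 2>&1
EOF
    chmod +x "$script"
}
```

After running install_acme_cron, run-parts --test /etc/periodic/daily should list the new job if the name is acceptable. As with everything else under /etc, remember to lbu commit -d.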

Now add this snippet to your lighttpd.conf:

ssl.privkey              = "/etc/lighttpd/server.key"
ssl.pemfile              = "/etc/lighttpd/server.pem"
ssl.openssl.ssl-conf-cmd = ("MinProtocol" => "TLSv1.2")
ssl.openssl.ssl-conf-cmd += ("Options" => "-ServerPreference")
$SERVER["socket"] == ":443" {
  ssl.engine = "enable"
}

For bonus points uncomment mod_setenv and enable HSTS but please read up on what you are about to do first:

setenv.add-response-header = ( "Strict-Transport-Security" => "max-age=15768000; includeSubdomains" )
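
It helps to know what that max-age means before turning it on. Once a browser has seen the header, it will refuse plain HTTP to the site for that many seconds, and with includeSubdomains that also applies to every subdomain. A quick bit of shell arithmetic converts the value above:

```shell
#!/bin/sh
# Convert the HSTS max-age used above from seconds to days.
max_age=15768000        # seconds, from the Strict-Transport-Security header
seconds_per_day=86400
# Shell arithmetic is integer-only, so this rounds 182.5 down to 182
echo "max-age=$max_age is about $((max_age / seconds_per_day)) days"
```

That works out to exactly half of a 365-day year (182.5 days), so a mistake here will stick around for a while.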

The reverse proxy bit

Setting up a reverse proxy in Lighttpd is stupidly easy. This is my slightly dodgy regex to only proxy the relevant paths to the snac server:

$HTTP["url"] =~ "^/(fedi|.well-known/(webfinger|nodeinfo|host-meta)$|api|oauth|share|authorize_interaction)" {
    proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => "8001" ) ) )
    proxy.forwarded = ( "for"          => 1,
                        "proto"        => 1,
                        "host"         => 1
    )
}
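
Because the regex is admittedly dodgy, it is worth checking which paths it catches before reloading lighttpd. This sketch runs some sample URL paths through grep -E using the same pattern. It assumes the match is made against the path without the query string, which is what lighttpd's $HTTP["url"] condition compares against. lighttpd uses PCRE rather than POSIX extended regexes, but this particular pattern only uses features the two share. The function name is_proxied and the sample paths are hypothetical.

```shell
#!/bin/sh
# Same pattern as the $HTTP["url"] condition above
SNAC_PATH_RE='^/(fedi|.well-known/(webfinger|nodeinfo|host-meta)$|api|oauth|share|authorize_interaction)'

# Hypothetical helper: succeeds if a URL path (no query string) would go to snac
is_proxied() {
    printf '%s\n' "$1" | grep -Eq "$SNAC_PATH_RE"
}

for p in /fedi/sam /.well-known/webfinger /api/v1/instance \
         /index.html /.well-known/acme-challenge/token; do
    if is_proxied "$p"; then echo "snac   $p"; else echo "static $p"; fi
done
```

The important case is /.well-known/acme-challenge/, which must stay "static" so acme.sh's webroot renewals keep working. The unescaped dot in .well-known matches any character, so escaping it would tighten the pattern slightly.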

Now don't forget to lbu commit -d!

That's all folks

In theory you can now go to https://example.com/fedi and log in to your new shiny snac instance. I have about 700 followers and it isn't exactly fast (I mean what do you expect) but it works and is currently doing the job I need it to do.

If you're mad enough to follow this guide and something doesn't work, the blog source is on GitHub for you to open an Issue or Pull Request with any corrections.

You can also complain to me on the Fediverse/Mastodon at @sam@cablespaghetti.dev which will come to this very Raspberry Pi. If I am slow to respond you know why!

Bonus Bit: Backups

I wrote a little backup script which I run as a daily cron job. Currently my instance is very new so it works ok...I'm not 100% sure it'll still work when the data volume grows, but I suppose I'll find out!

It's in this GitHub Gist.

/hosting-a-fediverse-instance-on-an-original-raspberry-pi.html
Hosting a static site on an original Raspberry Pi
hardware, raspberrypi, alpinelinux, linux
"Hosting a static site (this blog) on an original 256MB Raspberry Pi using Alpine Linux diskless mode."

It's been a while (4ish years...) since I last posted to this blog, but I'm making another attempt at getting back in the habit of sharing my mad side-quests in a format people might find useful. Here we go!

For a long time, I've had a strange obsession with making use of the worst possible hardware to do "stuff". This is despite having a collection of semi-decent random computers sat around doing nothing.

Thanks to my friend Phill, who does not share the same hoarding tendencies as myself, I also have a collection of first generation Raspberry Pis. Two of these are super early ones from 2011 with 256MB RAM. Naturally I decided that the best way to resurrect this blog would be to move it from GitHub Pages to one of these boards.

I have attempted to make use of these boards in the past, but the limiting factor always ends up being storage. They are only capable of booting from an SD Card and your average Linux-based workload both runs like treacle and eventually kills this very sub-optimal storage medium.

The solution to this is Alpine Linux "diskless" mode; this runs the whole OS, applications and any configuration you need to persist from RAM! Yes, I only have 256MB of RAM to work with but at least whatever I can fit into this space won't be bottlenecked by terrible storage performance.

Setting up the SD Card

Alpine Linux has some great documentation on how to install on a Raspberry Pi, and is one of the few distros still supporting the 32-bit ARMv6 processor in these old Pis. However I will document the process I followed here, because I did hit a number of issues.

The first issue was that the Raspberry Pi Imager route creates an absolutely tiny partition which seems to use FAT16. A bug in dosfstools meant I found no way to increase this to a workable size under Linux, so I followed the "Manual method" mentioned on the wiki page.

With the SD card still connected to my laptop I then completely ignored the note at the top of config.txt and changed the contents to the following. This is because certain settings such as gpu_mem=16 do not work if placed in usercfg.txt and I thought I'd just make all my changes in the same place.

The most important change here is reducing the amount of memory allocated to the GPU to maximise the amount of RAM available for activities; without this change I could not get the Alpine Linux installation script to complete. I have also overclocked the snot out of the CPU (your mileage may vary) which is somehow completely stable without so much as a heatsink and running at just over 50C under load.

# do not modify this file as it will be overwritten on upgrade.  
# create and/or modify usercfg.txt instead.  
# https://www.raspberrypi.com/documentation/computers/config_txt.html  

kernel=boot/vmlinuz-rpi  
initramfs boot/initramfs-rpi  
arm_64bit=0  
gpu_mem=16  
arm_freq=1100  
core_freq=500  
sdram_freq=500  
over_voltage=8  
include usercfg.txt

Before you unmount your SD Card you need to grab fixup4cd.dat, fixup_cd.dat, start_cd.elf and start4cd.elf from github.com/raspberrypi/firmware and place them alongside the rest of the firmware files. These are the cut down firmware files which get used when you set gpu_mem=16 and are not shipped by Alpine Linux by default. I think you might be able to go without the "4" ones as I suspect they are for the Pi 4/5 but I copied them anyway.
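
Fetching four files by hand from GitHub is fiddly, so here is a minimal sketch that does it in one go. It rests on several assumptions. curl must be installed on the machine with the SD card, and the files must still sit under boot/ on the master branch of github.com/raspberrypi/firmware. It also assumes the SD card's boot partition is mounted at a path you pass in. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: download the cut-down ("cd") firmware files.
# Assumes they live under boot/ on the master branch of raspberrypi/firmware.
FIRMWARE_BASE=https://raw.githubusercontent.com/raspberrypi/firmware/master/boot
CD_FILES="start_cd.elf fixup_cd.dat start4cd.elf fixup4cd.dat"

# Print the download URL for each file
firmware_urls() {
    for f in $CD_FILES; do
        echo "$FIRMWARE_BASE/$f"
    done
}

# Usage: fetch_cd_firmware DEST   (DEST = mounted boot partition of the SD card)
fetch_cd_firmware() {
    dest=$1
    for f in $CD_FILES; do
        # -f: fail on HTTP errors instead of saving an error page, -L: follow redirects
        curl -fL -o "$dest/$f" "$FIRMWARE_BASE/$f" || return 1
    done
}
```

For example, fetch_cd_firmware /mnt/sdcard, where /mnt/sdcard is just an example mount point. If you decide to skip the Pi 4 files, drop the two "4" names from CD_FILES.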

The Install Process

The Pi should boot from the SD Card pretty quickly with a keyboard, monitor and network cable attached, although you might see some harmless clock-related error messages. You can log in as root with no password and run setup-alpine to begin the installation. Documentation for Diskless Mode is available on the wiki, but here are the rough steps I followed:

  • Set up networking, root password, timezone and no network proxy as you might expect
  • Chose the default of Busybox for NTP
  • Ignored the errors about SSL (probably clock related again) for the APK mirror, chose the defaults and used their CDN
  • I didn't bother setting up a non-root user
  • I chose the default of OpenSSH and changed the default of prohibit-password for root login to yes so I could easily SSH in later and skipped typing in my SSH key
  • For the disk settings I declined to try the boot media, selected "none" for a config store and "none" for an apk cache.

After connecting over SSH I then ran setup-lbu mmcblk0p1, mount -o remount,rw /dev/mmcblk0p1 and mkdir /media/mmcblk0p1/cache to set up the SD Card for diskless operation.

Then I fixed my SSH security sins like this:

mkdir .ssh
vi .ssh/authorized_keys # Pasting my public key
chmod 600 .ssh/authorized_keys
vi /etc/ssh/sshd_config # Comment out my change to the default of `PermitRootLogin`
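
Here is a minimal sketch of the same clean-up done non-interactively, which can help if you ever rebuild the card. Both function names are hypothetical. Both take paths as arguments, so you can try them on a scratch copy before touching /root or the real sshd_config.

```shell
#!/bin/sh
# Hypothetical helpers for the SSH clean-up above.

# Usage: add_authorized_key HOME_DIR "PUBLIC_KEY"
add_authorized_key() {
    home=$1
    key=$2
    mkdir -p "$home/.ssh"
    chmod 700 "$home/.ssh"
    printf '%s\n' "$key" >> "$home/.ssh/authorized_keys"
    # With the default StrictModes, sshd ignores key files others can write to
    chmod 600 "$home/.ssh/authorized_keys"
}

# Usage: revert_root_login [SSHD_CONFIG]
# Comments out an explicit "PermitRootLogin yes" so sshd falls back to its
# default (prohibit-password in current OpenSSH releases).
revert_root_login() {
    file=${1:-/etc/ssh/sshd_config}
    # "&" re-inserts the matched text, turning the line into "#PermitRootLogin yes"
    sed -i 's/^PermitRootLogin yes/#&/' "$file"
}
```

On the Pi, that would be add_authorized_key /root "your public key" followed by revert_root_login and rc-service sshd restart. After that you still need the lbu steps that follow.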

Now, for any of these changes to persist after a reboot, it is necessary to get familiar with the lbu tool. If you are familiar with git it won't feel completely foreign.

First I ran lbu status and/or lbu diff to see any changes that have not yet been committed. Then lbu add /root/.ssh/authorized_keys to have the tool track the state of my new additional file (this must be the absolute not relative path) and lbu commit -d to save the changes.

You can now reboot the system and hopefully pick up where you left off.

Configuring Lighttpd

There are many webservers available in the Alpine Linux package repositories, including ones like Nginx and Caddy which I have used extensively in the past. However this project isn't about doing things the normal way, it's about making use of the worst hardware possible and trying to get decent performance; so I did an apk add lighttpd.

The docs for Lighttpd on the Alpine wiki are great, so you can follow them to configure it to run on boot with rc-update add lighttpd default and start it up with rc-service lighttpd restart. You can then stick your static site in /var/www/localhost/htdocs/ and be off to the races.

Unfortunately if you expect this to work after an lbu commit -d and a reboot, I'm afraid you will be out of luck. I discovered that /var/lib/lighttpd and /var/log/lighttpd did not get created on boot, and unless you want to tell lbu to keep track of all your log and cache files (don't do this...) there isn't an out-of-the-box way to fix this. Depending on how big your static site is, you might want to lbu add /var/www/localhost/htdocs though.

What I did was put a script in /etc/local.d/lighttpd-log.start and chmod +x it:

#!/bin/sh
# Recreate lighttpd's log and state directories, which are lost on every diskless boot
mkdir -p /var/log/lighttpd /var/lib/lighttpd
chown lighttpd:lighttpd /var/log/lighttpd /var/lib/lighttpd

I am still struggling with lighttpd crashing out when it detects the clock jumping forward multiple years on boot (hardware clockless SBC owner problems), so my current solution is to ignore the problem and run rc-service lighttpd restart by hand every time I reboot. Natanael Copa shared how you can tell lighttpd to start after ntpd:

echo rc_after=ntpd >> /etc/conf.d/lighttpd

Benchmark time!

Out of curiosity I spun up Locust to see what kind of performance my new fancy ramdisk-based web server could do. The static site for testing purposes is a 17KB HTML file with an ASCII art bottle of mayonnaise (https://lube.pizza). With the default 700MHz clock speed I got around 350 requests per second, which I thought was pretty good.

After overclocking the snot out of it (see config.txt near the top of this post) I plateaued at just over 500 requests per second, but noticed that the CPU didn't seem to be working that hard.

Locust benchmark showing 525 requests per second average

At this point I had a brainwave; these old Pis only have a 100Mb NIC which I was completely saturating! What if I enable HTTP compression in Lighttpd to shrink down the response size?

To do this I just edited /etc/lighttpd/lighttpd.conf, uncommented mod_deflate in the server.modules section at the top of the file and uncommented the two default configuration lines in the mod_deflate section further down the file.

The result was 1100 requests per second! Out of a 1st gen Pi this is downright ridiculous. Admittedly the workload is an absolute best case scenario, but it does show what is possible with the right software choices on very limited hardware.

Locust benchmark showing 1100 requests per second average

I have lost the screenshot, but after using acme.sh installed with the Alpine Linux package to get a TLS certificate the server still managed over 600rps over HTTPS. I will detail that setup in a future blog.

Thanks for reading!

If you got this far, well done; you have endured my ramblings for a significant period of time. You are reading this blog on the very Raspberry Pi described in this article via the Marmite static site generator.

Sorry if you are following along with this post and find errors or omissions. The blog source is on GitHub for you to open an Issue or Pull Request with any corrections.

You can also complain to me on the Fediverse/Mastodon at @sam@cablespaghetti.dev. The snac instance hosting this profile also runs on the very same Pi in this article. Watch this space for a blog on that journey.

Bonus Bit: Service Ordering

Natanael Copa, founder of Alpine Linux, replied to my fediverse post and shared these nuggets of wisdom.

Reply from Natanael Copa sharing information about how you can automate the Alpine Linux installation process.

Reply from Natanael Copa sharing how to grab an SSH key from GitHub during the installation process and how to get lighttpd to start after ntpd.

/hosting-a-static-site-on-an-original-raspberry-pi.html
Project Home Cloud Part 3: MikroTik hAP ac² Initial Configuration with IPv6
networking, hardware, mikrotik, life
"Setting up MikroTik hAP ac² with the basic functionality to replace my ISP provided router including working IPv6."

In the last couple of posts I went over my plans to overhaul my home network and separate off the machines I have hosting web-facing services from the private network used by my family. In this post I'll outline the configuration of my new MikroTik router, as someone with a bit of networking knowledge but no previous experience with the brand. The aim here is to seamlessly replace my ISP provided router without adding any additional functionality at this point; this includes IPv6 which was the only part I found particularly difficult to get working.

Quick set

The hAP ac² (if you don't buy a used one) comes set up with an IP address of 192.168.88.1, DHCP enabled for this range and no password on the "admin" user. These devices have a lot of options, but to help you get started when you first access the "Webfig" web UI you will first be prompted to set a new password and then presented with the simplified "Quick set" interface.

For our purposes the "WISP AP" mode (selected in the top right corner) is the one we want. Here you can fill in the basic(ish) details for our Wi-Fi, Internet connection and LAN IP address.

Quick Connect UI

Unfortunately when I filled all this in, I hit a bug which made Webfig unavailable. Other people seem to have hit it previously but there didn't seem to be any misconfigured firewall rules in my case. In the end I logged in via SSH and ran system reset-configuration to take it back to factory defaults again. The next time around I didn't hit the problem and I'm not quite sure what I did differently.

When using Quick Set I just filled in the basic details for PPPoE and changed the LAN settings to match my existing router leaving Wi-Fi settings at the defaults for the moment. I disabled DHCP as I already have a Raspberry Pi with dnsmasq doing that job. Having the IP the same as my existing router means I don't have to reconfigure my DHCP server and go round my house renewing DHCP leases.

Wi-Fi Configuration

I'm no Wi-Fi expert, but Wi-Fi on the MikroTik is nothing to write home about; it is an 802.11ac 2x2 router and the small physical size limits the coverage you'll get. I might still set up the BT Smart Hub 2 I got given by my ISP as a separate access point, which has 4x4 MIMO on 5GHz and 3x3 MIMO on 2.4GHz. However for our relatively small house and modest needs it works perfectly well after playing around with the settings a bit. Sat in the lounge which is on the floor above the router in our three storey townhouse I can get a reliable 130Mb download speed. On the top floor I still get a really good signal scrolling through Twitter on my phone in bed.

I didn't configure my Wi-Fi using the Quick Set screen and so to begin with I had the default configuration of a completely open Wi-Fi network. This obviously isn't ideal and I'm not sure why it's the default, but as I had already set an admin password for the router and it was only connected to my laptop I wasn't terribly worried.

The first thing to do is to edit the default security profile and set up WPA2 encryption. I used very basic settings, but as with anything on MikroTik gear you have a lot of options if you want to use RADIUS or something more advanced. The Security Profiles are on a separate tab within the Wireless menu.

default Security Profile

Security now taken care of, I went digging into the features available to me. Having recently switched to iOS for my phone, I can no longer use the excellent WiFi Analyzer Android app to scan for the channels my neighbours are using and pick a quiet one. The "Freq Usage" and "Scanner" features are an excellent replacement for this.

Wireless Options

Here's the 5GHz band frequency usage in my house. As you can see it gives you the actual frequencies in MHz rather than the channel number as you might be used to, but this lines up with the radio configuration so I suppose it makes sense. Ideally you want four consecutive quiet channels; I picked 5500MHz as my starting channel.

Freq Usage

It's worth noting at this point that not all channels are created equal, at least not in the UK. The rules seem to change fairly frequently but most channels in the 5GHz range are "DFS" channels. These frequencies are usually the most quiet but the downside is that there will be a delay (usually 60 seconds but sometimes up to 10 minutes) while your access point scans for radar already using that frequency in your area, before it starts broadcasting itself. I've also found that my Amazon Fire TV stick only seems to support non-DFS channels, but for now I'll just live with that using 2.4GHz. You'll have to decide whether the trade offs are worth it for you.

Here is the part of my 5GHz radio configuration which I've changed from the default (MAC removed for security or something, SSID removed for embarrassment):

5GHz configuration

The "Channel Width" option is the only other setting here which is particularly interesting. I'll say again that I am not an expert, and here are the official docs. However as I understand it, the 20/40/80mhz-Ceee mode (or its eCee, eeCe and eeeC variants) is the best one available for this particular device. With Ceee, the frequency you selected is used as the primary ("C") channel and the three channels above it are used as "extension" ("e") channels to give the full 80MHz width for maximum performance. The other variants move the primary channel to a different position within the 80MHz block. So if, for example, you don't (or can't) use DFS channels and one of the normal channels is slightly quieter than the rest, you can tailor this to the frequencies available to you.
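
RouterOS shows frequencies in MHz rather than channel numbers. For the 5GHz band, the usual mapping is channel = (frequency - 5000) / 5. This small sketch converts between the two and lists the four 20MHz channels a Ceee layout would occupy. The function names are hypothetical, and 5500MHz is simply the starting frequency from the previous section.

```shell
#!/bin/sh
# Convert a 5GHz frequency in MHz to an 802.11 channel number:
# channel = (frequency - 5000) / 5
freq_to_channel() {
    echo $(( ($1 - 5000) / 5 ))
}

# With Ceee, the chosen frequency is the control channel and the next three
# 20MHz channels above it are the extension channels.
ceee_channels() {
    start=$1
    for offset in 0 20 40 60; do
        f=$((start + offset))
        echo "$f MHz -> channel $(freq_to_channel "$f")"
    done
}

ceee_channels 5500
```

Starting at 5500MHz therefore covers channels 100, 104, 108 and 112. That is why the Freq Usage screen needs to show the whole 80MHz span as quiet, not just the starting frequency.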

Port forwarding

I might be making life hard for myself by not learning the MikroTik CLI but configuring basic port forwarding made my brain hurt a little bit. Here's the basic configuration I used under IP -> Firewall -> NAT -> Add New in Webfig. There are a lot of options on this screen so I've only taken screenshots of the parts I filled in.

Port forwarding first section
Port forwarding second section

MTU tweaking for maximum performance

Where PPPoE is used there are potential problems with the MTU (Maximum Transmission Unit) of packets carried inside the PPP tunnel. The standard MTU for Ethernet is 1500 bytes, but PPPoE needs 8 bytes for its own headers, leaving only 1492 bytes per packet. This means that unless your Internet connection is configured for a higher than standard Ethernet MTU, 1500-byte packets can't actually be transmitted over the PPP tunnel, causing fragmentation or complete failure for some services like Netflix. Support for this higher than standard MTU is described in RFC 4638, and it is often called "baby jumbo frames".
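
The 8-byte overhead is a 6-byte PPPoE header plus the 2-byte PPP protocol field. It is easy to sanity check with a little shell arithmetic. This sketch uses a hypothetical pppoe_mtu function to show why 1508 is the smallest Ethernet MTU that lets the PPPoE link carry full 1500-byte packets.

```shell
#!/bin/sh
# PPPoE adds 8 bytes per packet: a 6-byte PPPoE header plus the 2-byte PPP
# protocol field, so the PPP link MTU is the Ethernet MTU minus 8.
PPPOE_OVERHEAD=8

pppoe_mtu() {
    echo $(( $1 - PPPOE_OVERHEAD ))
}

# Standard Ethernet MTU vs. an RFC 4638 "baby jumbo" Ethernet MTU
for ether_mtu in 1500 1508; do
    echo "ether MTU $ether_mtu -> PPPoE MTU $(pppoe_mtu "$ether_mtu")"
done
```

With a standard 1500-byte Ethernet MTU, the PPPoE link tops out at 1492. Raising the physical port to at least 1508 is what allows the full 1500.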

BT and many other UK ISPs seem to support RFC4638 with an MTU of 1520, and I assume the ISP provided routers are configured for this. However many 3rd party routers like the MikroTik don't come configured for this by default, so we need to make a few tweaks to our configuration.

Andrews & Arnold have a section about this on their wiki, but the information seems to be out of date for the latest MikroTik firmware. Thanks to Daryll Swer for linking to this MikroTik forum thread and generally providing guidance on this. He's got some interesting MikroTik related posts on his blog if you're interested.

True to form I've configured all of this in Webfig, in the Interfaces section under pppoe-out1 for the virtual PPPoE interface and ether1 for the actual physical port. Leaving the MTU and MRU on the PPPoE interface unset should allow it to auto-negotiate with your ISP's equipment for the best MTU/MRU they support; read the linked thread for more information on that one.

pppoe-out1 configuration
ether1 configuration

IPv6

One of the reasons I went with BT for my Internet is that they are one of the relatively few UK residential ISPs which support IPv6; their Smart Hub routers are configured for it out of the box and I've never had a problem with it. However I, like most people, am much more used to IPv4, which is why many 3rd party devices come with IPv6 disabled by default to keep things simpler. But I want my IPv6 working, so here's the configuration I ended up with.

I mostly used this really detailed blog post by Joe Robinson.

First of all I had to enable the ipv6 package, which you probably can do somewhere in Webfig but I did it over SSH with:

system package enable ipv6
system reboot

The next thing to do is set up the firewall, which unfortunately doesn't have any default configuration. I've shamelessly lifted the rules from Joe Robinson's blog post, and I'll repeat what he said there: I am not an expert and these may not be secure. That said, the easiest way to set these up is the CLI over SSH. Here are the commands:

ipv6 firewall filter add action=reject chain=input comment="Reject invalid traffic to the Router" connection-state=invalid in-interface-list=WAN reject-with=icmp-no-route
ipv6 firewall filter add action=reject chain=forward comment="Reject unsolicited traffic to the LAN" connection-state=!established,related,untracked in-interface-list=WAN reject-with=icmp-no-route
ipv6 firewall filter add action=reject chain=forward comment="Reject invalid traffic to the LAN" connection-state=invalid in-interface-list=WAN reject-with=icmp-no-route
ipv6 firewall filter add action=accept chain=input comment="Accept LAN traffic to the router" in-interface-list=LAN
ipv6 firewall filter add action=accept chain=forward comment="Accept LAN traffic" in-interface-list=LAN
ipv6 firewall filter add action=accept chain=forward comment="Accept LAN traffic" connection-state=established,related,untracked
ipv6 firewall filter add action=drop chain=forward comment="Drop everything else" log=yes

Now that you should in theory be secure, you can enable the DHCP client on the PPPoE interface for IPv6. This is under IPv6 -> DHCP Client in Webfig. The prefix hint was left blank initially and then filled in after I got the prefix from my ISP the first time around; this should encourage them to give me the same prefix next time around, although it looks like I've been given this /56 prefix for 10 years!

You'll notice here that I didn't select "Add Default Route"; this seems to be set up automatically and this box just added a duplicate route I didn't need.

DHCP Client Configuration

Then under IPv6 -> Addresses you can set up your LAN "bridge" interface to use this pool of addresses. Here the "Address" field will be ::/64 when adding the configuration; it is then filled in automatically when you save:

Assign IPv6 addresses to LAN
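
For a sense of scale, a delegated /56 contains 2^(64 - 56) = 256 separate /64 networks. The LAN bridge above only takes one of them. A one-line shell helper, hypothetically named subnets_in_prefix, does the sum for any prefix length:

```shell
#!/bin/sh
# Number of /64 subnets inside a delegated prefix: 2^(64 - prefix_length)
subnets_in_prefix() {
    echo $(( 1 << (64 - $1) ))
}

echo "/56 -> $(subnets_in_prefix 56) x /64"
echo "/64 -> $(subnets_in_prefix 64) x /64"
```

That leaves plenty of room for additional /64s later, for example on a separate VLAN.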

Then you must set up Neighbour Discovery or devices on your LAN won't get assigned addresses, this is under IPv6 -> ND:

Set up neighbour discovery

And that should hopefully be it! I've continued to use my separate DNS server for IPv6 clients running on dnsmasq, which seems to respond with IPv6 addresses by default anyway, even if it is operating over IPv4 at the moment.

The next post

I've purchased a TP-Link TL-SG105E managed switch for next to nothing off eBay which should be delivered some time this week. When that arrives I'll configure a VLAN "trunk" to my office/shed from the MikroTik so I can start separating off my web-facing homelab from the rest of my network over the single Cat6 cable I have.

/project-home-cloud-part-3-mikrotik-hap-ac-initial-configuration-with-ipv6.html
Project Home Cloud Part 2: MikroTik hAP ac² Factory Reset
networking, hardware, mikrotik, life
"Figuring out how to factory reset my used MikroTik hAP ac²"

If you read my previous post, you'll know that I'm giving my home network a bit of an overhaul brought on by the installation of a new "full fibre" Internet connection. The plan is to set things up so I can separate the machines I have hosting web-facing services from the rest of my LAN. I'm mostly doing this for security reasons but it should also give me a bit more freedom to tinker with (and break...) my "lab" network without upsetting my wife and kids.

As I mentioned at the end of my last post I've purchased a MikroTik hAP ac². This tiny little box is everything I could possibly need in a router and as a bonus it has pretty decent dual band 802.11ac Wi-Fi, so I don't need to buy an additional access point.

MikroTik Hap ac2

What I didn't mention in the last post is that I bought my MikroTik used from eBay, even though they're only about £60 new; I never like to pay full price for something, even if the full price is extremely reasonable. Unfortunately this is one of those times where I wish I had just bought a brand new unit and not been such a cheapskate. The hardware itself is perfectly fine, but it came with a password already set. I thought this wouldn't be a big deal, but factory resetting it proved to be a little harder than I'd bargained for.

After discovering the existing password, I went looking for documentation on how to factory reset the unit. I found the MikroTik wiki page for the hAP ac². I tried to follow the (pretty unclear) instructions in the "Buttons and jumpers" section to reset the configuration. My interpretation of this was to hold down the downright painful reset button, then plug in the unit and only release the button when the green light started flashing. Unfortunately after trying this a few times, all I achieved was to go from being unable to log into the "Webfig" web interface to having it completely unavailable, with only SSH still being accessible (but still a password I didn't know).

Lots of wasted time and a lot of Googling later I discovered there was another more drastic way to reset MikroTik devices; a Netinstall. This process involves running the "Netinstall" software on another machine and having the device boot up over the network from your machine to perform a fresh install of the firmware. There is a Linux version of the software but it seems to be quite new and poorly documented, so after failing with this a couple of times I borrowed my wife's Windows 10 laptop.

The proper documentation is available here but the process took me quite a while to get right, so I'll outline the steps I took here anyway:

  1. Download the Netinstall software and the latest firmware for the device
  2. Set up the NIC on the laptop with a static IP of 192.168.88.2 (as in the docs)
  3. Fire up the Netinstall software, telling Windows to give it network access on Private and Public networks when prompted
  4. Click on "Net booting" and enable the boot server with client IP set to 192.168.88.3 (also in the docs)
  5. Now here's the bit which nearly drove me mad. I presumed the device would want to network boot on one of the "LAN" ports (numbers 2 and up) so all my initial attempts were with the cable plugged into port 2. Nowhere in the docs does it mention which port to use. It turns out it will only network boot on the first port, which is the one usually used for your Internet connection.
  6. The next part is a bit fiddly too. You have to hold down the reset button, plug in the power and then keep holding until after the green light stops flashing and it hopefully shows up in the Netinstall software. The button needs to be held down very firmly to remain "clicked" in place and in the end I only managed this using a pencil rather than my fingers. I also disabled the Windows Firewall at some point during my previous failed attempts, so if you have problems maybe try this.
  7. If you've got the router to show up in the Netinstall software, congratulations, the hard part is over! Browse to the downloaded firmware file for your device, check the "Apply default config" checkbox, click "Install" and wait while the software flashes your device with the latest firmware and default configuration.
  8. When it has completely finished you can unplug the power, change your machine's network configuration back to normal, plug into port 2 again, power up the device and see the normal (password free) web interface at 192.168.88.1.
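If you'd rather do steps 2 and 8 from a Linux machine than borrow a Windows laptop, the static IP part comes down to a couple of `ip` commands. This is a minimal, hypothetical sketch, not what I actually did (I used Windows). The interface name you pass in is an assumption, so check yours with `ip link`. The /24 mask is also my assumption. NetworkManager or a DHCP client may need stopping so it doesn't fight you for the interface. The sketch doesn't cover running the Linux Netinstall tool itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper for steps 2 and 8 on Linux. The interface name passed in
# (e.g. eth0 or enp3s0) is an assumption; list yours with `ip link`.
# Set DRY_RUN=1 to print the commands instead of running them with sudo.
NETINSTALL_ADDR="192.168.88.2/24"   # static IP from step 2 (the /24 is an assumption)

run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else sudo "$@"; fi
}

netinstall_ip() {
  local action="$1" iface="$2"
  case "$action" in
    up)   # step 2: bring the NIC up with the address Netinstall expects
      run ip link set dev "$iface" up || return 1
      run ip addr add "$NETINSTALL_ADDR" dev "$iface" ;;
    down) # step 8: drop the static address before going back to normal
      run ip addr del "$NETINSTALL_ADDR" dev "$iface" ;;
    *)
      echo "usage: netinstall_ip up|down IFACE" >&2
      return 2 ;;
  esac
}
```

Usage would be `netinstall_ip up eth0` before step 3 and `netinstall_ip down eth0` once flashing has finished.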

This whole journey of discovery took me two or three hours. However, it hasn't completely put me off the router. I'm absolutely blown away by the sheer number of options this thing gives me, and my next post will outline the process of getting it set up to replicate the functionality of my (actually really good but locked down) ISP-provided BT Smart Hub, IPv6 and all.

/project-home-cloud-part-2-mikrotik-hap-ac-factory-reset.html
Project Home Cloud Part 1: The Plan
linux networking hardware kubernetes life
"The story of my love affair with old crusty hardware and the beginnings of my plan to turn my shed into my own personal cloud provider."
History

For as long as I can remember, I've loved tinkering with hardware; this often manifests in the form of installing Linux on computers which really should have been consigned to the bin a long time ago. In my teenage years I had a Pentium III machine with no case sat on a bookshelf running a Counter-Strike: Source server, a Pentium II running pfSense or something serving as a firewall (to separate my bedroom network from the rest of the house...obviously) and some old machine with a bunch of drives in USB 2.0 enclosures running as a Linux software RAID 5 doing BitTorrent seed box/file server duties. Bear in mind that this was in the mid-2000s...so this hardware was extremely dated even then.

My bedroom circa 2006

Fast forward to 2021 and I'm ten years into a career in tech, with a wife, two kids and a mortgage. The things I enjoy tinkering with in my (very limited) spare time haven't changed that much, but it's been a few years since I've had anything worth calling a homelab. It's been 3 years since I had a job involving any physical hardware, and that desire to make use of old crusty machines to build some cool infrastructure at home has come back with a vengeance.

Last summer, after a few months of working from home at the kitchen table in a relatively small house, with two very bored (and loud!) children, I was desperate to build a better set-up. We weren't in a position to move house and there was no space I could turn into a proper office, so I looked to the (~6 metre long) garden for a solution. Given the space constraints, it wasn't possible to build anything bigger than the standard-issue 2.4m garden shed, so with the help of my Dad, I insulated it, boarded it with recycled pallet wood and stuck in a couple of windows and a desk. Instant tiny office!

shed office exterior
shed office desk

What I have now

Anyway, since having a bit of space to myself again, I couldn't help digging out some of my old hardware and having a tinker. This started off with running a k3s cluster made up of a Raspberry Pi 4 and an old Core i3 laptop. However, I've also got an old HP Proliant ML110 G6 with a few drives in it waiting to go into service as a ZFS-based storage server, and I recently bought a Lenovo M72e "Tiny" with a Core i5 and upgraded it to 16GB RAM and an SSD.

Lenovo M72e Tiny

The first hurdle with all of this was the poor networking at the bottom of the garden; the shed was now insulated with foil-backed insulation and the only connectivity was via a cheap Netgear Wi-Fi bridge. Gigabit Ethernet was a must, so I ran a cable (with lots of skinned knuckles and swearing) through the wall of the house and down the garden. For the moment this was fine, because our Internet was FTTC VDSL and I could run the router off the phone extension socket in the kitchen at the back of the house. However, at some point in the next few weeks we're getting FTTP (Fibre To The Premises) installed, which will come in at the front of the house.

Needless to say my wife wasn't keen on permanently running a Cat6 cable through the middle of the kitchen so I needed a better plan. Luckily when the house was built, the builders used daisy-chained Cat5e (I think...) to put phone sockets in a number of rooms; so all I needed to do was figure out which rooms connect together and splice the cable. With a bit of bodgery and having re-learned how to crimp RJ45 jacks this has now been a success, so I'm ready to start working on installing OSes.

Routers

Now that I've got decent connectivity to my mediocre k3s cluster in the shed office, I'm in a position to start thinking of a practical purpose (or excuse) for all this infrastructure. I've been paying AWS a few dollars a month to host some WordPress sites for friends and family, so the most obvious use is to move those in-house (well...in-shed). However, WordPress being what it is, I don't really want a hacker to find themselves on my home network if they compromise one of these sites. This means it's unfortunately time to say goodbye to my (surprisingly good) ISP-provided Wi-Fi router and start looking at more capable alternatives, which will allow me to have a separate DMZ (demilitarised zone) for my web-facing infrastructure.

Based on previous experience, I started looking at hardware on which I could run pfSense or its fork OPNsense. However, anything vaguely capable with the three NICs I need would either be well over £100 or be a big, power-hungry x86 machine. I then went down a rabbit hole looking at alternatives like OpenWrt and various ARM boards, including a few based on the Raspberry Pi 4 Compute Module. Still, I couldn't find anything cheap enough which had three NICs without resorting to USB adapters.

This was when I happened upon a relatively obscure brand called MikroTik. They compete with the likes of FortiGate and Ubiquiti in the enterprise market, but then ship the same software on their consumer-level hardware. The model I ended up deciding on is the hAP ac², which is a tiny little (roughly Raspberry Pi-sized) router with dual-band 802.11ac for around £50. It seems to have the performance to deal with the 150Mb line I'm getting installed and all the features under the sun! Think VLANs, VPN support (including WireGuard), multiple Wi-Fi SSIDs, BGP...just everything! As a bonus, the inbuilt Wi-Fi should be good enough to cover my whole house, so I don't have to worry about additional access points.

MikroTik Hap ac2

The Plan

The plan is to set up the MikroTik when my new Internet connection gets installed (I don't have a separate VDSL modem to use it with my existing line), with a VLAN trunk for the two networks (DMZ and LAN) going down the single cable to my shed office. I'll then need to get a managed switch for the shed to split that out for my laptop/internal stuff and the public-facing Kubernetes cluster.

I'm then looking at PXE network booting my k3s nodes and running them on Flatcar Linux for a more "cattle not pets" approach, with the HP Proliant ML110 server with lots of drives providing persistent storage on top of ZFS. I hope to set this all up using Ansible or similar so it's reproducible and I can stick it all up on GitHub for others to copy.

Watch this space for more posts over the next few weeks as I hopefully make some progress!

/project-home-cloud-part-1-the-plan.html
EKS Managed Node Groups, the good, the bad and the config
kubernetes aws eks
"The pros and cons, and how to migrate to Amazon EKS Managed Node Groups."

Amazon EKS launched in 2018 to the relief of many who had been managing their own Kubernetes clusters on AWS. However it wasn't as fully featured as some had hoped out of the gate. One of the big improvements Amazon made was to release Managed Node Groups in 2019; this removed the need for people to manage their own Auto Scaling Groups and tasks like replacing nodes to upgrade to a new AMI version no longer required a long drawn out manual process or home grown automation.

Although they were a big step forward for EKS usability, the initial release of Managed Node Groups had some limitations which meant it wasn't suitable for everyone. Most importantly for us, it only supported On Demand instances, but users also couldn't customise the Launch Template for the nodes; this restricted us to using the official EKS Optimized AMI and only customising node labels rather than having full control over the bootstrap.sh script.

Launch Template support was released in August 2020, followed by Spot Instance support in December 2020 (although I only found out about the latter a couple of weeks ago). These two features ultimately made Managed Node Groups flexible enough for most users, even awkward ones like me.

You can read a little more about how we've got things set up in my previous post about spot instances. However I'll go into more detail on the Managed Node Group specifics in this post. The short version is that we obviously needed Spot support and to customise the taints on our ARM nodes.

The Good

The primary reason we desperately wanted to move to Managed Node Groups was the amount of time and effort it took us to replace our nodes to upgrade to a new AMI. We've been using hellofresh/eks-rolling-update for this which is a great tool, but needs to be run manually; this isn't too much of a problem with one or two clusters but for fifteen it gets very time consuming.

Whilst it didn't really benefit us as established EKS users, it shouldn't be understated just how much easier it is to get started with EKS than it was previously, thanks to Managed Node Groups.

The Bad

We have a fairly unusual set up, where we have multiple groups of nodes with different priorities; when we scale up we usually get Spot Instances, with On Demand instances only getting launched if there is no Spot capacity available. Unfortunately for this use case the naming of the Auto Scaling Groups which Managed Node Groups create under the hood is a problem; this is because the Kubernetes Cluster Autoscaler doesn't see the nice descriptive name you give your Node Group, it only sees the UUID style hash given to the ASG. I had to build a tool to generate Priority Expander configuration for the autoscaler to work around this. However if you're not using the Priority Expander it's not going to be an issue.
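The tool itself isn't reproduced here, but the core idea fits in a few lines of shell. This is a hypothetical sketch, not that tool. It asks EKS which Auto Scaling Group(s) back each named node group using `aws eks describe-nodegroup`, then prints one priority level in the Priority Expander's format. The function and node group names are made up. It assumes the AWS CLI is configured for the right account and region.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map Managed Node Group names to the ASG names the
# Cluster Autoscaler actually sees, in Priority Expander "priority: [regex]" form.

# Ask EKS which Auto Scaling Group(s) back a node group (tab-separated names).
nodegroup_asgs() {
  aws eks describe-nodegroup --cluster-name "$1" --nodegroup-name "$2" \
    --query 'nodegroup.resources.autoScalingGroups[].name' --output text
}

# priority_block CLUSTER PRIORITY NODEGROUP... -> YAML for one priority level
priority_block() {
  local cluster="$1" priority="$2" ng asg
  shift 2
  printf '%s:\n' "$priority"
  for ng in "$@"; do
    # Unquoted on purpose: word-splits the tab-separated ASG names.
    for asg in $(nodegroup_asgs "$cluster" "$ng"); do
      # Anchored so each regex only matches this exact ASG name.
      printf '  - ^%s$\n' "$asg"
    done
  done
}
```

Running something like `priority_block my-cluster 50 my-cluster-0-spot my-cluster-1-spot` for each priority level produces a block you could paste under the `priorities` key of the expander's ConfigMap. That gives the autoscaler names it can actually match.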

The Config

We found launching managed node groups with our own custom Launch Template to be a little nuanced; for example we had some cryptic error messages from Terraform when trying to use a custom user-data script without explicitly setting the AMI. Here's the Terraform we used in case it helps others who need to tweak things in more complex ways:

user-data.sh:

#!/bin/bash -xe
# Inject imageGCHighThresholdPercent value unless it has already been set.
if ! grep -q imageGCHighThresholdPercent /etc/kubernetes/kubelet/kubelet-config.json;
then
    sed -i '/"apiVersion*/a \ \ "imageGCHighThresholdPercent": 70,' /etc/kubernetes/kubelet/kubelet-config.json
fi

# Inject imageGCLowThresholdPercent value unless it has already been set.
if ! grep -q imageGCLowThresholdPercent /etc/kubernetes/kubelet/kubelet-config.json;
then
    sed -i '/"imageGCHigh*/a \ \ "imageGCLowThresholdPercent": 50,' /etc/kubernetes/kubelet/kubelet-config.json
fi

/etc/eks/bootstrap.sh --b64-cluster-ca ${cluster_auth_base64} \
--apiserver-endpoint ${endpoint} \
--kubelet-extra-args '--node-labels=lifecycle="${node_lifecycle}" --register-with-taints="${node_taint}"' \
${cluster_name}
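The `grep` guards make these edits idempotent, so running them twice should leave the file unchanged. A quick way to convince yourself of that before baking it into a Launch Template is to run the same edits against a scratch file. This sketch assumes GNU sed (as on Amazon Linux 2). The sample JSON is a trimmed, hypothetical stand-in for the real kubelet-config.json.

```bash
#!/usr/bin/env bash
# Local dry run of the kubelet-config.json edits from user-data.sh.
# Assumes GNU sed; the sample config below is a trimmed, hypothetical example.
set -euo pipefail

inject_gc_thresholds() {
  local cfg="$1"
  # Same guarded edits as user-data.sh, pointed at an arbitrary file.
  if ! grep -q imageGCHighThresholdPercent "$cfg"; then
    sed -i '/"apiVersion*/a \ \ "imageGCHighThresholdPercent": 70,' "$cfg"
  fi
  if ! grep -q imageGCLowThresholdPercent "$cfg"; then
    sed -i '/"imageGCHigh*/a \ \ "imageGCLowThresholdPercent": 50,' "$cfg"
  fi
}

cfg="$(mktemp)"
trap 'rm -f "$cfg"' EXIT
cat > "$cfg" <<'EOF'
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "clusterDomain": "cluster.local"
}
EOF

# Apply the edits twice; the grep guards should stop any duplicate keys.
inject_gc_thresholds "$cfg"
inject_gc_thresholds "$cfg"
cat "$cfg"
```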

We then configure the template_file for each set of Node Groups:

data "template_file" "userdata_spot_arm64" {
  template = file("${path.module}/user-data.sh")
  vars = {
    cluster_name        = var.cluster-name
    endpoint            = aws_eks_cluster.eks-cluster.endpoint
    cluster_auth_base64 = aws_eks_cluster.eks-cluster.certificate_authority[0].data
    node_lifecycle      = "spot"
    node_taint          = "arch=arm64:NoSchedule"
  }
}

We also need to get the AMI from Amazon's public Systems Manager paths:

data "aws_ssm_parameter" "eks-worker-ami" {
  name = "/aws/service/eks/optimized-ami/1.19/amazon-linux-2/recommended/image_id"
}

data "aws_ssm_parameter" "eks-worker-ami-arm64" {
  name = "/aws/service/eks/optimized-ami/1.19/amazon-linux-2-arm64/recommended/image_id"
}
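The two paths only differ in the Kubernetes version and an `-arm64` suffix. When bumping versions, it can be handy to check which AMI Terraform is about to pick up. Below is a small sketch, assuming the AWS CLI is configured. The function names are hypothetical, and only the amd64 and arm64 Amazon Linux 2 paths shown above are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the public EKS optimized AMI SSM parameters.

# eks_ami_param K8S_VERSION ARCH -> the SSM parameter name used above
eks_ami_param() {
  local k8s_version="$1" arch="$2" suffix
  case "$arch" in
    amd64) suffix="" ;;
    arm64) suffix="-arm64" ;;
    *) echo "unsupported arch: $arch" >&2; return 1 ;;
  esac
  echo "/aws/service/eks/optimized-ami/${k8s_version}/amazon-linux-2${suffix}/recommended/image_id"
}

# eks_ami_id K8S_VERSION ARCH -> current recommended AMI ID (needs AWS credentials)
eks_ami_id() {
  local param
  param="$(eks_ami_param "$1" "$2")" || return 1
  aws ssm get-parameter --name "$param" --query 'Parameter.Value' --output text
}
```

For example, `eks_ami_id 1.19 arm64` should print the same AMI ID that `data.aws_ssm_parameter.eks-worker-ami-arm64` resolves to.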

Then there's the aws_launch_template:

resource "aws_launch_template" "eks-cluster-node-group-worker-nodes-spot-arm64" {
  image_id               = data.aws_ssm_parameter.eks-worker-ami-arm64.value
  name                   = "${var.cluster-name}-eks-cluster-node-group-worker-nodes-spot-arm64"
  vpc_security_group_ids = [aws_security_group.eks-cluster-worker-nodes.id]
  user_data              = base64encode(data.template_file.userdata_spot_arm64.rendered)
  ebs_optimized          = true

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50
      volume_type = "gp3"
      iops        = 3000
    }
  }
  lifecycle {
    create_before_destroy = true
  }
  depends_on = [aws_eks_cluster.eks-cluster]
}

Finally the aws_eks_node_group:

resource "aws_eks_node_group" "eks-worker-nodes-spot-arm64" {
  count           = length(local.subnet-ids)
  cluster_name    = aws_eks_cluster.eks-cluster.name
  capacity_type   = "SPOT"
  node_group_name = "${var.cluster-name}-${count.index}-spot-arm64"
  node_role_arn   = aws_iam_role.eks-cluster-worker-nodes.arn
  subnet_ids      = [element(local.subnet-ids, count.index)]
  instance_types  = ["m6g.2xlarge"]

  scaling_config {
    desired_size = var.spot-arm64-min-hosts-per-az
    max_size     = var.spot-arm64-max-hosts-per-az
    min_size     = var.spot-arm64-min-hosts-per-az
  }

  lifecycle {
    ignore_changes = [scaling_config.0.desired_size]
  }

  launch_template {
    id      = aws_launch_template.eks-cluster-node-group-worker-nodes-spot-arm64.id
    version = aws_launch_template.eks-cluster-node-group-worker-nodes-spot-arm64.latest_version
  }

  tags = {
    Name        = "${var.cluster-name}-${count.index}-spot-arm64"
    Environment = var.cluster-name
  }
}

The Migration Steps

After migrating to Managed Node Groups, life obviously becomes much easier with managing nodes and Kubernetes upgrades. However the process to migrate from our old Auto Scaling Groups took a bit of thought to get right. So here it is:

  1. We were upgrading to Kubernetes 1.19 at the same time, so we followed the normal steps for that: upgrading the EKS control plane itself and ensuring kube-proxy, CoreDNS, the Cluster Autoscaler, etc. were up to date for that version, as per the AWS documentation.
  2. We applied the Terraform to create the new Managed Node Groups, without removing the old Auto Scaling Groups.
  3. When the new nodes had finished coming online, we drained the old ones with this handy one-liner:
     kubectl drain --selector '!eks.amazonaws.com/nodegroup' --delete-local-data --ignore-daemonsets
  4. We then removed the old Auto Scaling Groups from the Terraform and applied it to delete the old, now empty, nodes.
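Step 3 is easy to rush. One option is to make the drain wait until every Managed Node Group node reports Ready. This is a hedged sketch, not exactly what we ran. It relies on the `eks.amazonaws.com/nodegroup` label that Managed Node Groups put on their nodes, which is the same label the drain selector uses. The 15 minute default timeout is arbitrary, and `DRY_RUN=1` just prints the commands.

```bash
#!/usr/bin/env bash
# Sketch of the step 3 hand-off; assumes kubectl points at the right cluster.
# Set DRY_RUN=1 to print the commands instead of running them.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

drain_old_nodes() {
  local timeout="${1:-15m}"   # arbitrary default; tune for your node boot times
  # Managed Node Group nodes carry the eks.amazonaws.com/nodegroup label, so
  # wait until all of them are Ready before evicting anything.
  run kubectl wait --for=condition=Ready node -l eks.amazonaws.com/nodegroup \
    --timeout="$timeout" || return 1
  # Then drain every node WITHOUT that label, i.e. the old ASG nodes.
  # (kubectl 1.20+ renames --delete-local-data to --delete-emptydir-data.)
  run kubectl drain --selector '!eks.amazonaws.com/nodegroup' \
    --delete-local-data --ignore-daemonsets
}
```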

I hope this post helps those of you who haven't yet moved to Managed Node Groups. If they don't yet meet your requirements I'd love to hear what weird and wonderful things you're doing. I'm always happy to chat to people working on similar challenges, so get in touch on Twitter if you have any questions or just fancy a chat!

/eks-managed-node-groups-the-good-the-bad-and-the-config.html
Using AWS Spot Instances in your production EKS cluster
kubernetes aws spot
"How to use AWS Spot instances in your production EKS cluster without causing an outage."

At work we have a number of fairly large Kubernetes clusters on Amazon EKS; some with 50 or 60 "xlarge" nodes. This amount of compute on AWS can cost a fortune every month, so of course we wanted to do what we could to reduce this.

How do Spot Instances work?

Spot Instances are essentially a way for AWS to sell any spare capacity they have, after all the Reserved and On Demand customers have got their instances provisioned.

This capacity is available without committing for 1 or 3 years like Reserved instances and you get a roughly 70% to 90% discount over the On Demand price (usually about 70% for recent instance types).

The downside of this is that if an On Demand customer comes along and needs the capacity you're using, your instances will be terminated with only two minutes' warning.

But I thought they were a bad idea for prod?

Previously the "spot market" operated like an auction; you set the maximum price you were willing to pay and other people could outbid you and take your capacity. However in 2017 AWS simplified this to make using Spot Instances less intimidating and unpredictable; now the price is set by AWS rather than demand.

This means that the only way your Spot Instances will be terminated now is if an On Demand customer needs the capacity and AWS doesn't have any other capacity available.

Another change which made Spot Instances more suitable for production was the introduction of Auto Scaling Group Launch Templates with Mixed Instances Policies in 2018, and then Capacity-Optimized Allocation Strategies in 2019. The combination of these things made it possible to have a normal Auto Scaling Group request a mixture of instance types, with some Spot and some On Demand.

For example, you can request m4.xlarge, m5.xlarge and m5a.xlarge with 20% On Demand. You will then get 20% of your requested capacity as On Demand instances, and Spot instances of the selected types will make up the rest. Which Spot instances you get depends on which of those types Amazon has the most spare capacity of. This minimises the chance of your instances being terminated and maximises the chance of your ASG being able to provision instances.
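That example translates fairly directly into the mixed instances policy JSON that the AWS CLI accepts. Below is a sketch that prints it. The launch template name is a placeholder, the policy sets no On Demand base capacity, and it uses the `capacity-optimized` allocation strategy mentioned above.

```bash
#!/usr/bin/env bash
# mixed_instances_policy LAUNCH_TEMPLATE_NAME ON_DEMAND_PCT INSTANCE_TYPE...
# Prints a MixedInstancesPolicy document; the template name is a placeholder.
mixed_instances_policy() {
  local lt_name="$1" od_pct="$2" overrides="" t
  shift 2
  for t in "$@"; do
    # Comma-separate every override after the first.
    overrides+="${overrides:+, }{\"InstanceType\": \"$t\"}"
  done
  cat <<EOF
{
  "LaunchTemplate": {
    "LaunchTemplateSpecification": {"LaunchTemplateName": "${lt_name}", "Version": "\$Latest"},
    "Overrides": [${overrides}]
  },
  "InstancesDistribution": {
    "OnDemandBaseCapacity": 0,
    "OnDemandPercentageAboveBaseCapacity": ${od_pct},
    "SpotAllocationStrategy": "capacity-optimized"
  }
}
EOF
}
```

You could then pass it to something like `aws autoscaling create-auto-scaling-group ... --mixed-instances-policy "$(mixed_instances_policy my-worker-lt 20 m4.xlarge m5.xlarge m5a.xlarge)"`, with your own group name, sizes and subnets filled in.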

In December 2020 AWS added Spot Instance support to Managed Node Groups. This takes a lot of the complexity out of managing your nodes, but doesn't change much of this information. I have noted where things differ between the two approaches.

When to use Spot Instances

Spot is not for everyone; it requires your workloads to handle individual Kubernetes Pods being terminated without causing downtime. This is of course good practice anyway, but most companies have at least a few legacy apps which can only run a single instance; these are not suited to running on Spot. However, I'll talk about how you can "pin" these apps to your On Demand instances later on.

Disclaimers out of the way, we went primarily with Spot for a few reasons. Firstly, it is the cheapest possible way to get compute on AWS; even if you pay for a Reserved Instance for 3 years upfront, it's about a 60% saving (depending on region and instance type), and the more sane 1 year term is about 40%. With Spot Instances you can typically expect about a 70% saving.

Secondly, our load is quite spiky; at peak load we need about 80 instances in our main production cluster but the rest of the time it's maybe only 40 instances; so reserved instances would only make sense for about half the cluster anyway.
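To see why, here's some back-of-an-envelope arithmetic using the rough discounts above: Spot at about 70% off, a 3 year Reserved Instance at about 60% and a 1 year one at about 40%. The load shape is a hypothetical simplification: 40 nodes all day, plus another 40 for an assumed 8 peak hours. Reserved Instances are billed whether you use them or not, so in this sketch they only cover the 40-node baseline.

```bash
#!/usr/bin/env bash
# Back-of-an-envelope comparison. Prices are % of the On Demand rate, using the
# rough discounts above; the daily load shape is a hypothetical simplification.
BASE=40        # nodes needed around the clock
EXTRA=40       # extra nodes at peak (~80 in total)
PEAK_HOURS=8   # assumed length of the daily peak

# daily_cost BASE_PRICE_PCT EXTRA_PRICE_PCT -> cost in "% of an On Demand hour" units
daily_cost() {
  echo $(( BASE * 24 * $1 + EXTRA * PEAK_HOURS * $2 ))
}

# Whole-number percentage of what running everything On Demand would cost.
relative_cost() {
  echo $(( $(daily_cost "$1" "$2") * 100 / $(daily_cost 100 100) ))
}

echo "1 year RI baseline + On Demand peak: $(relative_cost 60 100)%"
echo "3 year RI baseline + On Demand peak: $(relative_cost 40 100)%"
echo "Everything on Spot:                  $(relative_cost 30 30)%"
```

With these assumed numbers it prints 70%, 55% and 30% respectively, relative to running everything On Demand.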

Preparation

Even with the perfect workload, it still takes a bit of work to get your cluster ready for running on Spot instances.

Pod Topology Spread Constraints

If your Kubernetes cluster nodes are fairly likely to disappear, it's a good idea not to run all the instances of a particular service on a single Node. Before Kubernetes v1.19, the way to do this was with Inter-Pod Affinity Rules. However, now that v1.19 is available on EKS, you can use Pod Topology Spread Constraints, which are a bit more flexible and easier to configure.
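As a rough example, here's a helper that renders a Deployment with two spread constraints. The first is a hard constraint that keeps per-node replica counts within one of each other. The second is a softer one across Availability Zones, using the `topology.kubernetes.io/zone` label. The app name, image and replica count are placeholders. Whether you want `DoNotSchedule` or `ScheduleAnyway` for each constraint depends on how many nodes you have.

```bash
#!/usr/bin/env bash
# spread_deployment APP IMAGE [REPLICAS] -> Deployment YAML on stdout.
# APP, IMAGE and REPLICAS are placeholders; pipe into `kubectl apply -f -`.
spread_deployment() {
  local app="$1" image="$2" replicas="${3:-3}"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${app}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${app}
  template:
    metadata:
      labels:
        app: ${app}
    spec:
      topologySpreadConstraints:
        # Keep replica counts on any two nodes within 1 of each other (hard rule)...
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: ${app}
        # ...and prefer, but don't insist on, an even spread across AZs.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ${app}
      containers:
        - name: ${app}
          image: ${image}
EOF
}
```

Usage would be something like `spread_deployment web nginx:alpine 4 | kubectl apply -f -`.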

AWS Node Termination Handler

If you are not using Managed Node Groups, you can run the AWS Node Termination Handler, which listens for "Spot Instance Termination Notification" events from AWS (and a number of others). These events come 2 minutes before your nodes are terminated.

When it receives one of these events it will gracefully Cordon and Drain the Node, to move the Pods to other instances before the Node is terminated.

One of the advantages of Managed Node Groups is that this is all handled for you without running additional services.
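For self-managed Auto Scaling Groups, the core idea is simple enough to sketch in shell. This is not the Node Termination Handler's code. It's a much-simplified illustration of the handler's instance metadata mode. It assumes the script runs on the node itself, that IMDSv1 is enabled (IMDSv2 needs a session token first) and that kubectl has permission to drain the node.

```bash
#!/usr/bin/env bash
# Much-simplified illustration of a Spot interruption handler; NOT the AWS Node
# Termination Handler's code. Assumes it runs on the node, IMDSv1 is enabled
# and kubectl is allowed to drain nodes.
IMDS="http://169.254.169.254/latest/meta-data"

# Prints the interruption notice JSON if AWS has scheduled one. The endpoint
# returns 404 until then, which curl -f turns into a non-zero exit status.
fetch_spot_notice() {
  curl -sf --max-time 2 "$IMDS/spot/instance-action"
}

# Drains NODE once a notice appears; returns 1 if there is nothing to do yet.
handle_spot_notice() {
  local node="$1" notice
  notice="$(fetch_spot_notice)" || return 1
  echo "spot interruption notice for ${node}: ${notice}" >&2
  # drain cordons the node first, then evicts its Pods
  # (kubectl 1.20+ calls --delete-local-data --delete-emptydir-data instead)
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
}

# Example loop, with NODE_NAME (hypothetical) set to this node's Kubernetes name:
# until handle_spot_notice "$NODE_NAME"; do sleep 5; done
```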

Setting up your Auto Scaling Groups

The Kubernetes Cluster Autoscaler is quite particular about how your Auto Scaling Groups (or Managed Node Groups) are set up. This will also impact how your Nodes are labelled when they join the cluster, so you can identify which ones are On Demand and which are Spot.

The Cluster Autoscaler only works properly if you have a separate Auto Scaling Group (or Managed Node Group) for each Availability Zone you want to use. It also needs the Nodes to be of the same CPU and RAM capacity, so it can properly estimate what will fit on new nodes it spins up.

In practice we've gone with this configuration:

  • A set of three Auto Scaling Groups set to request m5a.xlarge On Demand instances; for us-east-1a, us-east-1b and us-east-1c. These Nodes are labelled with capacity-type=on-demand when they join the cluster by adding parameters to the bootstrap.sh script in the EKS Worker Node AMI; this article covers how that works.
  • A set of three Auto Scaling Groups using a mixed instances policy (mentioned above) which request Spot Instances of m5.xlarge, m4.xlarge and m5a.xlarge in the same AZs. These are labelled with capacity-type=spot.
  • A set of three Auto Scaling Groups requesting Spot m6g.xlarge ARM instances (see my ARM cluster article).
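For the self-managed groups, the labels (and the ARM taint) come down to what gets passed to bootstrap.sh. Here's a minimal, hypothetical generator for that user-data. The cluster name is a placeholder, and the script omits other bootstrap.sh options you may want. The `arch=arm64:NoSchedule` taint is the one from my ARM cluster article.

```bash
#!/usr/bin/env bash
# node_user_data CLUSTER CAPACITY_TYPE [TAINT] -> minimal EKS user-data script.
# Hypothetical helper; the label key matches the capacity-type labels above.
node_user_data() {
  local cluster="$1" capacity_type="$2" taint="${3:-}"
  local extra="--node-labels=capacity-type=${capacity_type}"
  if [ -n "$taint" ]; then
    extra+=" --register-with-taints=${taint}"
  fi
  printf '#!/bin/bash\n'
  # --kubelet-extra-args is handed through to the kubelet by bootstrap.sh
  printf "/etc/eks/bootstrap.sh %s --kubelet-extra-args '%s'\n" "$cluster" "$extra"
}
```

The three groups would then use `node_user_data my-cluster on-demand`, `node_user_data my-cluster spot` and `node_user_data my-cluster spot arch=arm64:NoSchedule` respectively.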

The only difference when using Managed Node Groups is that the Node labels can be changed from the AWS Console rather than by editing the bootstrap.sh parameters.

We then use the Priority Expander in the Kubernetes Cluster Autoscaler to make the Spot Auto Scaling Groups higher priority than the On Demand ones, so the On Demand groups are only scaled up if capacity is not available in the Spot groups. The Helm Chart helpfully makes this really easy to configure.
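Under the hood, the Priority Expander reads a ConfigMap called `cluster-autoscaler-priority-expander`. It maps priorities (higher wins) to lists of regexes that are matched against ASG names. The autoscaler also needs to run with `--expander=priority`. The sketch below renders such a ConfigMap. It assumes your ASG names contain "spot" or "on-demand" and that the autoscaler runs in `kube-system`. The priority values are arbitrary.

```bash
#!/usr/bin/env bash
# priority_expander_configmap [SPOT_REGEX] [ON_DEMAND_REGEX] -> ConfigMap YAML.
# Assumes ASG names contain "spot"/"on-demand" and the autoscaler is in kube-system.
priority_expander_configmap() {
  local spot_regex="${1:-.*spot.*}" on_demand_regex="${2:-.*on-demand.*}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - ${on_demand_regex}
    50:
      - ${spot_regex}
EOF
}
```

Applying it is then just `priority_expander_configmap | kubectl apply -f -`.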

With Managed Node Groups, the cluster autoscaler only knows about the ASG names, not the Managed Node Group names. There is an open issue on the AWS containers-roadmap GitHub repository to make the ASG names more identifiable. Until then, I've written a little utility to work around it. There is also this open issue for the v1.19 cluster autoscaler, which is worth watching out for.

Node Affinity for apps we want to run on On Demand

Because we've split up the Spot and On Demand Instance Auto Scaling Groups and used that to label the Nodes differently in the cluster, we are now able to use those labels to schedule certain Pods to our On Demand instances. This is very useful for those legacy apps which will cause downtime if restarted. This documentation covers how Node Affinity works.

We also give these Pods a high priority through PriorityClasses to minimise the chance of them being evicted from their Node.
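Putting those two together for a single legacy Deployment might look like the sketch below. The PriorityClass name and value are made up. `capacity-type=on-demand` is the node label from the Auto Scaling Group setup above. The patch is meant for `kubectl patch`, which uses a strategic merge patch by default.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for pinning a legacy Deployment to On Demand nodes.
# "legacy-critical" and its value are made-up examples.
priority_class() {
  cat <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: legacy-critical
value: 100000
globalDefault: false
description: "Single-instance legacy apps pinned to On Demand nodes"
EOF
}

# Patch adding the priority class and a required On Demand node affinity
# to a Deployment's Pod template.
on_demand_patch() {
  cat <<'EOF'
{"spec": {"template": {"spec": {
  "priorityClassName": "legacy-critical",
  "affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {
    "nodeSelectorTerms": [{"matchExpressions": [
      {"key": "capacity-type", "operator": "In", "values": ["on-demand"]}
    ]}]
  }}}
}}}}
EOF
}
```

Usage would be `priority_class | kubectl apply -f -` followed by `kubectl patch deployment my-legacy-app -p "$(on_demand_patch)"`, where the Deployment name is a placeholder.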

Final Notes

I hope this gives you a good idea of how to use Spot instances in your EKS clusters. I expect most of it could be applied to a non-EKS cluster in AWS, and I plan on trying this with k3s in the near future.

I'm always happy to chat to people working on similar challenges, so get in touch on Twitter if you have any questions or just fancy a chat!

/using-aws-spot-instances-in-your-production-eks-cluster.html
Getting up and running with multi-arch Kubernetes clusters
kubernetes arm aws
"How to effectively add arm64 nodes into an existing amd64 Kubernetes cluster without making problems for yourself."

The world of ARM processors has been getting very interesting over the last few years. Until fairly recently, for most people, ARM CPUs were reserved for their phone or maybe a Raspberry Pi running their home DNS. However now the Raspberry Pi 4 has a pretty decent quad-core CPU and up to 8GB RAM, Apple have blown away the industry with the M1 chips and AWS have launched Graviton2 instances which depending on who you ask have 20-40% better price/performance than the Intel equivalents.

Despite all this, until recently I wasn't convinced trying to use arm64 nodes in a production Kubernetes cluster was worth the effort. However the combination of all these events has caused a lot of the software to catch up and support for arm64 in Kubernetes distributions from Amazon EKS to k3s is now excellent! It is likely that a good proportion of the container images you're using in production now support "multi-arch" and will run on both arm64 and amd64 machines.

Container Images

Let's start with the container images, as that's what you'll need to tackle first if you want to start using arm64 in your environments. Historically, if you wanted to support multiple CPU architectures for your container image, you would have to have two separate tags, e.g. myimage:1.2.3-amd64 and myimage:1.2.3-arm64. This was a pain both for people building the images and for people trying to use them; if you're deploying a Helm Chart, for example, you would have to override the image tags if you wanted to run on ARM, and there was no hope for you if you wanted to use a mixture of arm64 and amd64 nodes in a cluster and have the same pods seamlessly schedule on either (without some extra tooling).

This problem has gone away with Manifest Lists in the Image Manifest V2 spec. This allows you to specify a list of container images for a number of different architectures in a single "Manifest List". In newer container runtime versions (like Docker) this means if you do a docker pull nginx:alpine on your Raspberry Pi you'll get an image for arm64 and on your Intel laptop you'll get an amd64 image without any further effort. Previously you would have got the godforsaken "exec format error".

You might think that this means we've reached multi-arch nirvana and everything will "just work" now, but unfortunately this is not the case. If you're using an existing public image, you will need to make sure that it is using a manifest list and supports both amd64 and arm64. Some registries make this really easy, such as Docker Hub, where you'll get a nice list of architectures on the Tags tab.

However some don't make it obvious at all! The easiest way I've found so far to determine if an image is multi-arch, is to use the experimental docker manifest command in the latest Docker versions. As it says in the documentation you will have to enable experimental features in ~/.docker/config.json then you will be able to run a command like:

[~]$ docker manifest inspect nginx:alpine
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1568,
         "digest": "sha256:9a39c77d9ea3a9ddc41535f875b7610a0f121df3c2496c16f2a3a5fcb0e43e4f",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1568,
         "digest": "sha256:22d2c4a5220232818a0fe7a5d3651c846bc3e7d2ff8dbfc2f665c717f0e43a69",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
...

As you can see the nginx:alpine image is a "manifest list" rather than a plain old manifest and supports the two architectures we're after. Great! However distroless-java is still single architecture:

[~]$ docker manifest inspect gcr.io/distroless/java
{
	"schemaVersion": 2,
	"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
	"config": {
		"mediaType": "application/vnd.docker.container.image.v1+json",
		"size": 1164,
		"digest": "sha256:85cdcf63cad1cfe5373c68f78f21f0c6349fee87fbb40bc9a9dc7d560f52438b"
	},
	"layers": [
		{
			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
			"size": 643695,
			"digest": "sha256:9e4425256ce4503b2a009683be02372ee51d411e9cc3547919e064fee4970eab"
		},
...
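Eyeballing that JSON gets old quickly, so here's a rough shell check. It reads `docker manifest inspect` output and lists the architectures, or fails for a plain single-architecture manifest. It's a grep-based heuristic rather than a proper JSON parser (jq would be more robust), and it assumes the output looks like the examples above.

```bash
#!/usr/bin/env bash
# image_arches: reads `docker manifest inspect` JSON on stdin. For a manifest
# list (or OCI image index) it prints each architecture once; for a plain
# single-image manifest it warns on stderr and returns 1. Heuristic, not a parser.
image_arches() {
  local json
  json="$(cat)"
  if printf '%s\n' "$json" | grep -Eq '"mediaType": *"application/vnd\.(docker\.distribution\.manifest\.list\.v2|oci\.image\.index\.v1)\+json"'; then
    printf '%s\n' "$json" \
      | grep -Eo '"architecture": *"[^"]+"' \
      | sed -E 's/.*"([^"]+)"$/\1/' \
      | sort -u
  else
    echo "single-architecture manifest" >&2
    return 1
  fi
}
```

Usage would be `docker manifest inspect nginx:alpine | image_arches`, or `docker manifest inspect gcr.io/distroless/java | image_arches || echo "no arm64 here"`.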

I won't go into building these multi-architecture manifest lists here, as it very much depends on which tool you're using. If you're using Docker to build your images, you can use the experimental docker buildx feature. We're currently using the jib-maven-plugin for our Java-based apps, which has recently added the platforms feature. There are many other ways to do it, and GitHub Actions you can use, so it's not too hard, and you don't need an arm64 machine to build an arm64-compatible image anymore thanks to integration with QEMU.

Kubernetes

Getting arm64 nodes in your cluster is as simple as just creating an extra autoscaling group (or one per Availability Zone) in AWS or using k3sup to join your Raspberry Pi 4 to your k3s cluster. I'll try to keep this post generic to however you deploy Kubernetes, but as we're using EKS in production, here is the documentation on getting the right AMI for your architecture. I also have a k3s cluster at home with two old Intel laptops and a Raspberry Pi 4, and this guide works exactly the same on that.

Before you add arm64 nodes to your cluster you must consider whether you want to specifically exclude incompatible Pods from running on these nodes with node affinity rules, or if you want to exclude all Pods by default and specifically allow the ones you know work on arm64 with taints and tolerations. I went with the latter on our EKS clusters because the majority of our workloads aren't yet multi-arch, but on my home Raspberry Pi cluster I went with the former.

Node Affinity

The node affinity rule option requires no configuration of the nodes themselves because recent Kubernetes versions have a standard kubernetes.io/arch node label; this should be set to arm64 or amd64. However it will require a lot of effort if you have a large number of workloads which don't yet work on arm64.

[~]$ kubectl describe no myarmnode
Name:               myarmnode
Roles:              control-plane,etcd,master
Labels:             kubernetes.io/arch=arm64
                    kubernetes.io/hostname=myarmnode
                    kubernetes.io/os=linux

All you need to do is set up a node affinity rule in the PodSpec of any Pods which don't have multi-arch images like this:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
          - key: "kubernetes.io/arch"
            operator: In
            values: ["amd64"]

You could also do the opposite and use the NotIn operator with values set to ["arm64"].
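To make the In/NotIn behaviour concrete, here is a small Python model of how a node's labels are checked against nodeSelectorTerms. It is a simplified sketch under my own assumptions, not the scheduler's code: only the In and NotIn operators are modelled, terms are ORed while the expressions inside one term are ANDed, and NotIn counts as satisfied when the label is missing, as Kubernetes label selectors do. The helper names and label sets are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]
Expression = dict  # {"key": ..., "operator": "In" | "NotIn", "values": [...]}


def expression_matches(labels: Labels, expr: Expression) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    value = labels.get(expr["key"])
    if expr["operator"] == "In":
        # In: the label must exist and its value must be listed
        return value is not None and value in expr["values"]
    if expr["operator"] == "NotIn":
        # NotIn: also satisfied when the label is absent
        return value not in expr["values"]
    raise ValueError(f"operator not modelled in this sketch: {expr['operator']}")


def node_matches(labels: Labels, node_selector_terms: List[List[Expression]]) -> bool:
    """Terms are ORed; the expressions inside one term are ANDed."""
    return any(
        all(expression_matches(labels, expr) for expr in term)
        for term in node_selector_terms
    )


ARM_NODE = {"kubernetes.io/arch": "arm64", "kubernetes.io/os": "linux"}
AMD_NODE = {"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"}
ONLY_AMD64 = [[{"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]}]]
NOT_ARM64 = [[{"key": "kubernetes.io/arch", "operator": "NotIn", "values": ["arm64"]}]]

for name, labels in (("myarmnode", ARM_NODE), ("myamdnode", AMD_NODE)):
    print(name, node_matches(labels, ONLY_AMD64), node_matches(labels, NOT_ARM64))
```

Both rules reject the arm64 node and accept the amd64 one; in this model they only differ for a node with no kubernetes.io/arch label at all, which NotIn would accept and In would not.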

Taints and Tolerations

The option I went with in our EKS clusters is to set up a NoSchedule "taint" on all arm64 nodes, which we then "tolerate" on any Pods we know work on arm64. I've done this in the user-data script in the Launch Template used by our Graviton2 Auto Scaling Groups, through the --kubelet-extra-args flag of bootstrap.sh; the extra argument you need to pass to the kubelet is --register-with-taints="arch=arm64:NoSchedule". You can also just run this command after your nodes have registered with the cluster: kubectl taint no myarmnode arch=arm64:NoSchedule.

Once all your arm64 nodes are tainted, only Pods with the right tolerations will be scheduled to run on them. As with affinity rules, they are specified in the PodSpec of your Pod. In our case the toleration needed will be:

tolerations:
- key: "arch"
  operator: "Equal"
  value: "arm64"
  effect: "NoSchedule"

You might see some system-level DaemonSets like kube-proxy already have tolerations like this, which will also work:

tolerations:
- effect: NoSchedule
  operator: Exists
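The matching rules behind both of these tolerations can be sketched in Python. This is a simplified model written for illustration, not the scheduler's implementation: parse_taint understands the key=value:Effect string used with kubectl taint and --register-with-taints, a toleration with operator Exists ignores the value (and with no key matches every key), an empty effect matches any effect, and only NoSchedule and NoExecute taints are treated as blocking scheduling. All function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Parse 'key=value:Effect' (or 'key:Effect') as used by kubectl taint."""
    key_value, effect = spec.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


def tolerates(toleration: dict, taint: Taint) -> bool:
    """Simplified toleration-matches-taint check."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.effect:
        return False  # an empty effect matches every effect
    key = toleration.get("key", "")
    if key and key != taint.key:
        return False  # an empty key (only valid with Exists) matches every key
    if toleration.get("operator", "Equal") == "Exists":
        return True  # Exists ignores the taint's value
    return toleration.get("value", "") == taint.value


def schedulable(tolerations: List[dict], taints: List[Taint]) -> bool:
    """Every NoSchedule/NoExecute taint must be tolerated by at least one toleration."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


ARM64_TAINTS = [parse_taint("arch=arm64:NoSchedule")]
ARM64_TOLERATION = {"key": "arch", "operator": "Equal", "value": "arm64", "effect": "NoSchedule"}
WILDCARD_TOLERATION = {"effect": "NoSchedule", "operator": "Exists"}

print(schedulable([], ARM64_TAINTS))                     # False: plain Pods stay off arm64
print(schedulable([ARM64_TOLERATION], ARM64_TAINTS))     # True
print(schedulable([WILDCARD_TOLERATION], ARM64_TAINTS))  # True
```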

If certain DaemonSets aren't being scheduled on your arm64 nodes even though they have the right tolerations, check that their affinity rules don't exclude those nodes by the kubernetes.io/arch label.

Finding multi-arch images

The first Pods I had to make sure were running on our arm64 nodes were the DaemonSets for things like fluentd (our logging agent) and jaeger-agent (our tracing agent). Unfortunately neither of these images was multi-arch, which would have been a blocker for running our workloads on arm64 nodes. However, as is often the case in the Open Source world, somebody had already had the same problem, and a bit of searching on GitHub turned up an open Pull Request or Issue with links to unofficial images: this in the case of fluentd and this in the case of jaeger-agent. This is of course not as good as the project providing official multi-arch images (like everything except kube-state-metrics in the kube-prometheus-stack Helm Chart), but in more and more cases official multi-arch images are available, and this will only get better over time.
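Whether an image is multi-arch comes down to whether its tag points at a manifest list (or OCI image index) with an arm64 entry. Here is a hedged Python sketch of that check, assuming the JSON shape that docker manifest inspect prints for a manifest list: a manifests array whose entries carry a platform object. A single-architecture image prints a plain manifest with no manifests array, so this sketch reports no platforms for it. The SAMPLE_LIST document and its digests are made up.

```python
import json
import subprocess
from typing import Set


def fetch_manifest(image: str) -> str:
    """Run docker manifest inspect (needs the Docker CLI and registry access)."""
    result = subprocess.run(["docker", "manifest", "inspect", image],
                            capture_output=True, text=True, check=True)
    return result.stdout


def platforms(manifest_json: str) -> Set[str]:
    """Return 'os/arch[/variant]' for each entry of a manifest list or OCI index."""
    doc = json.loads(manifest_json)
    found = set()
    # A single-arch image has no "manifests" array, so nothing is reported
    for entry in doc.get("manifests", []):
        p = entry.get("platform", {})
        name = f"{p.get('os', '?')}/{p.get('architecture', '?')}"
        if p.get("variant"):
            name += f"/{p['variant']}"
        found.add(name)
    return found


def supports_arm64(manifest_json: str) -> bool:
    return any(name.split("/")[1] == "arm64" for name in platforms(manifest_json))


# Made-up manifest list with amd64 and arm64 entries
SAMPLE_LIST = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:aaaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbbb", "platform": {"architecture": "arm64", "os": "linux", "variant": "v8"}},
    ],
})
print(sorted(platforms(SAMPLE_LIST)))  # ['linux/amd64', 'linux/arm64/v8']
print(supports_arm64(SAMPLE_LIST))     # True
```

Against a real registry you would call supports_arm64(fetch_manifest("some/image:tag")); checking the parsed platform field avoids relying on the string "arm64" appearing somewhere in the output.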

Cluster autoscaler

After I'd got a number of our workloads set up to run on either arm64 or amd64 nodes, I of course wanted to run as many arm64 nodes as possible due to the lower cost. I did this using the Priority based expander for cluster-autoscaler by setting the Auto Scaling Groups for arm64 nodes to a higher priority than the rest of our groups. You could also set up preferredDuringSchedulingIgnoredDuringExecution affinity rules on arm64 compatible Pods but we found the Cluster Autoscaler configuration to be sufficient.
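As I understand the priority expander, it is configured through a cluster-autoscaler-priority-expander ConfigMap whose priorities key maps integer priorities to lists of regular expressions matched against node group names, with the highest priority winning. What follows is a rough Python model of that choice under those assumptions, not the autoscaler's code. The autoscaler first narrows the candidates to node groups that could actually fit the pending Pods (so a Pod without the arm64 toleration never sees the arm64 groups), then this function picks the best-priority groups from what is left; the real expander hands any remaining tie to a fallback strategy. Group names and priorities are hypothetical, and groups matching no pattern are simply dropped here.

```python
import re
from typing import Dict, List


def best_node_groups(priorities: Dict[int, List[str]], candidates: List[str]) -> List[str]:
    """Return the candidate node groups with the highest matching priority."""
    best_priority = None
    best: List[str] = []
    for group in candidates:
        # A group takes the highest priority whose regex list has any match
        matched = [prio for prio, patterns in priorities.items()
                   if any(re.search(p, group) for p in patterns)]
        if not matched:
            continue  # unmatched groups are ignored in this sketch
        prio = max(matched)
        if best_priority is None or prio > best_priority:
            best_priority, best = prio, [group]
        elif prio == best_priority:
            best.append(group)
    return best


# Hypothetical priorities: any arm64 group beats everything else
PRIORITIES = {50: [r".*arm64.*"], 10: [r".*"]}

print(best_node_groups(PRIORITIES, ["eks-amd64-a", "eks-arm64-a", "eks-arm64-b"]))
# ['eks-arm64-a', 'eks-arm64-b']
print(best_node_groups(PRIORITIES, ["eks-amd64-a"]))
# ['eks-amd64-a'] (e.g. the pending Pod has no arm64 toleration)
```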

Bash one-liner for good measure

This will (on Linux at least) list out all the arm64 compatible images running in your Kubernetes cluster:

kubectl get po -A -o yaml | grep 'image:' | cut -f2- -d':' | sed 's/^[[:space:]]*//g' | grep '/' | sort -u | xargs -I{} bash -c "docker manifest inspect {} | grep -q arm64 && echo {}"

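Change && echo {} to || echo {} to invert the logic and return images which won't work on arm64 nodes (or whose manifest couldn't be fetched). Swapping grep -q for grep -vq doesn't work, because grep -v succeeds as soon as any single line of the manifest lacks "arm64", which is nearly always.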

This would definitely be nicer using yq rather than parsing YAML with grep, cut and sed, but I know most people don't have yq installed.
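If neither yq nor fragile text parsing appeals, Python's standard library can do the extraction from kubectl get pods -A -o json output instead. A minimal sketch: it collects the image of every init, regular and ephemeral container in each Pod spec and de-duplicates them. Unlike the one-liner it doesn't drop images without a / in their name, and the function names and sample document are my own; each resulting image can then be checked with docker manifest inspect as in the one-liner.

```python
import json
import subprocess
from typing import List


def images_from_pod_list(pod_list_json: str) -> List[str]:
    """Unique images of every container in a kubectl get pods -o json document."""
    doc = json.loads(pod_list_json)
    images = set()
    for pod in doc.get("items", []):
        spec = pod.get("spec", {})
        for field in ("initContainers", "containers", "ephemeralContainers"):
            for container in spec.get(field) or []:
                if container.get("image"):
                    images.add(container["image"])
    return sorted(images)


def cluster_images() -> List[str]:
    """Query the current kubectl context (needs kubectl on PATH)."""
    out = subprocess.run(["kubectl", "get", "pods", "-A", "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return images_from_pod_list(out)


# Made-up pod list: the shared image appears twice but is reported once
SAMPLE = json.dumps({"items": [
    {"spec": {"containers": [{"name": "app", "image": "example.org/team/app:1.0"}]}},
    {"spec": {"initContainers": [{"name": "init", "image": "busybox:1.36"}],
              "containers": [{"name": "app", "image": "example.org/team/app:1.0"}]}},
]})
print(images_from_pod_list(SAMPLE))  # ['busybox:1.36', 'example.org/team/app:1.0']
```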

Final thoughts

It is still a bit of effort to add arm64 nodes to your Kubernetes clusters. However for us it was worth the effort for the cost savings and it's always exciting to be an early adopter. If anyone has any questions on how I've got things set up or would like to share different approaches I'm on Twitter and love to hear from other people working on similar challenges!

/getting-up-and-running-with-multi-arch-kubernetes-clusters.html
FortiClient SSL VPN Silent Install with Group Policy
scripts, networking, windows
"I've never been able to find a way to silently install the Fortinet SSLVPN client with Group Policy or otherwise. Today I had a bit of a breakthrough."

I'm a big fan of Fortinet products; we've got a Fortigate firewall at work and it has always been completely reliable and easy (for a firewall) to configure. So when I had to implement a VPN for a handful of remote workers, I initially tried to use L2TP-IPSec which is supported by the Fortigate, but certain UK ISPs block or otherwise mess with IPSec traffic so I had to find an alternative. That alternative ended up being their proprietary SSL VPN.

The client is very simple, it's been completely reliable and the setup was extremely easy. However, as more and more people have been using it, not having a way to silently roll it out has become a bit of a pain. I mostly use either Chocolatey with its Puppet module or Group Policy to push out software to Windows machines, but I couldn't find a (recent) MSI installer or a way to silently install with the EXE installer anywhere online or via their support team.

Today I had a bit of a breakthrough. I discovered that the EXE installer creates an MSI during the installation process (although it doesn't show up if you try to extract the EXE with 7-zip or similar), which I can now deploy with Group Policy.

Here's how:

  1. Download the latest installer package from Fortinet's support portal. Navigate to Download > Firmware Images. Then select Fortigate as the product and click Download. At the time of writing the latest installer can be found in /FortiGate/v5.00/5.2/5.2.4/VPN/SSLVPNTools/sslvpnclient64pkg_4.4.2317.tar.gz
  2. Open up the archive with something like 7-zip and extract SslvpnClient.exe
  3. Run SslvpnClient.exe but don't click on anything in the installer
  4. Navigate to C:\Users\username\AppData\Local\Temp and you'll find there is an SslvpnClient.msi that you can copy somewhere safe to deploy as normal with Group Policy.
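Step 4 can be scripted if you do this often. A hedged Python sketch, assuming the MSI appears directly in the current user's temp directory as SslvpnClient.msi (as described above) and that you run it while the installer from step 3 is still sitting there untouched. tempfile.gettempdir() resolves %TEMP%, which is normally C:\Users\username\AppData\Local\Temp; the function name and the destination share in the commented usage line are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

MSI_NAME = "SslvpnClient.msi"


def copy_extracted_msi(dest_dir: Path, temp_dir: Optional[Path] = None) -> Path:
    """Copy the MSI that SslvpnClient.exe unpacks into %TEMP% to dest_dir."""
    # tempfile.gettempdir() honours %TEMP%, normally C:\Users\<user>\AppData\Local\Temp
    source = Path(temp_dir or tempfile.gettempdir()) / MSI_NAME
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; is the installer still open?")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, dest_dir / MSI_NAME))


# Hypothetical deployment share:
# copy_extracted_msi(Path(r"\\server\deploy\forticlient"))
```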

I clearly should have read the messages that the installer spits out. I would have found this out much sooner! However I'm surprised this isn't documented anywhere online and their support team aren't aware of it.

As a little bonus, I found this post on the Fortinet forums. If you push out these Registry settings to HKEY_CURRENT_USER with the User Configuration > Preferences > Windows Settings > Registry part of Group Policy you can pre-configure the client and save your users some typing (and yourself some support queries).

This is the registry file that I applied to my laptop and then imported into a GPO using the Registry Wizard (easier than doing it all by hand!):

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\SOFTWARE\Fortinet]
@=""

[HKEY_CURRENT_USER\SOFTWARE\Fortinet\SslvpnClient]
@=""
"KeepConnectionAlive"="1"
"Installed"=dword:00000001
"ConnectionName"="CompanyVPN"
"ServerAddress"=""
"ServerPort"=""

[HKEY_CURRENT_USER\SOFTWARE\Fortinet\SslvpnClient\Tunnels]
@=""

[HKEY_CURRENT_USER\SOFTWARE\Fortinet\SslvpnClient\Tunnels\CompanyVPN]
@=""
"ServerCert"="1"
"Server"="vpn.company.com:443"
"Description"="My Company's VPN Setting"
/forticlient-ssl-vpn-silent-install-with-group-policy.html
Checking the existence of a folder on all domain machines
scripts, powershell
"I needed a way of finding a list of machines where the JRE install had failed or not run because the user hasn't rebooted for a while. So basically I needed to check that the Java install directory was present on all PCs on the Windows domain."

I haven't posted since March which is pretty shameful. I've been extremely busy working on some web services magic that I can't really share at this point if I value my pay check. However I had a sysadmin problem today that warranted some script writing which I thought I'd share.

I've been trying to get Oracle Java updates to deploy via Group Policy. I've previously tried a number of different approaches using Puppet, but all of them proved unreliable. I think I've got the Group Policy method cracked now and I'll share it here soon.

Anyway, I needed a way of finding a list of machines where the JRE install had failed or not run because the user hasn't rebooted for a while. So basically I needed to check that the Java install directory was present on all PCs on the Windows domain.

The script just gets a list of the computers in the relevant OU and then loops over them, printing a message if it finds 'Program Files' but not the specific Java directory. This weeds out machines that are unreachable or not running Windows (we've got a few Ubuntu boxes in our Workstations OU). I expected it to be much more work!

$Computers = Get-ADComputer -filter * -Searchbase "OU=ComputerOU,DC=Domain,DC=Local" | % {$_.Name}
Foreach($Computer in $Computers) {
  if ((Test-Path "\\$Computer\c$\Program Files") -And !(Test-Path "\\$Computer\c$\Program Files\Java\jre1.8.0_60")) {
      echo "$Computer doesn't have the latest Java"
  }
}
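For anyone without the ActiveDirectory PowerShell module to hand, the same two-test logic ports to Python. This is a sketch, not a replacement: it takes the computer names as a list instead of querying AD, builds the same \\computer\c$ admin-share paths with ntpath, and accepts an exists function so it can be tried away from a domain (on a Windows domain machine os.path.exists follows UNC paths). The machine names and the fake file system in the example are invented.

```python
import ntpath
import os
from typing import Callable, Iterable, List

JAVA_DIR = r"Program Files\Java\jre1.8.0_60"


def missing_java(computers: Iterable[str],
                 exists: Callable[[str], bool] = os.path.exists,
                 java_dir: str = JAVA_DIR) -> List[str]:
    """Names of reachable Windows machines that lack java_dir on their C$ share."""
    missing = []
    for computer in computers:
        share = "\\\\" + computer + "\\c$"  # \\computer\c$
        # No 'Program Files' means unreachable or not Windows: skip, like the script
        if not exists(ntpath.join(share, "Program Files")):
            continue
        if not exists(ntpath.join(share, java_dir)):
            missing.append(computer)
    return missing


# Offline example with a fake file system; on a domain machine you would pass
# the real computer names and rely on os.path.exists.
PRESENT = {
    r"\\PC1\c$\Program Files",
    r"\\PC1\c$\Program Files\Java\jre1.8.0_60",
    r"\\PC2\c$\Program Files",
}
for computer in missing_java(["PC1", "PC2", "UBUNTU1"], exists=PRESENT.__contains__):
    print(f"{computer} doesn't have the latest Java")
```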
/checking-the-existence-of-a-folder-on-all-domain-machines.html
Grabbing Photo URLs from Twitter
scripts, python
"Yesterday I helped out at #CodeOff2015, an event that my employer Snowflake Software runs every year at the Electronics and Computer Science department at the University of Southampton. We give students the day to write software to solve a problem that we set, they get judged at the end of the day and then the winner gets a paid summer internship with our development team. A great way to find talented developers, I'm sure you'll agree!"

Yesterday I helped out at #CodeOff2015, an event that my employer Snowflake Software runs every year at the Electronics and Computer Science department at the University of Southampton (which also happens to be where I studied). We give students the day to write software to solve a problem that we set, they get judged at the end of the day and then the winner gets a paid summer internship with our development team. A great way to find talented developers, I'm sure you'll agree!

Here are the students tucking in to the free pizza:

Students tucking in to free pizza

The morning was a bit hectic, helping everyone to get started and solving a few connectivity problems with the resources we had provided. However after they were all busy coding I thought I'd set myself a little (or so I thought) challenge, to brush up on my (very ropey) Python skills.

I had set up an internal website within ECS to host the various resources using Bootstrap and as the day went on, there were lots of photos posted under the hashtag on Twitter. I thought I'd use Twitter's API and Bootstrap's carousel/slideshow feature to show the latest photos on the website.

Initially I used Twython to get the tweets from the API and print out any 'media_url' attributes it found. However whatever I did to tweak the parameters for the Twitter search, I still only got 4 or 5 unique images, which is a fraction of what was posted.

#!/usr/bin/env python3
import sys

from twython import Twython, TwythonError

APP_KEY = ''
APP_SECRET = ''
OAUTH_TOKEN = ''
OAUTH_TOKEN_SECRET = ''

# Requires Authentication as of Twitter API v1.1
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
try:
    search_results = twitter.search(q='CodeOff2015', count=100, result_type="recent")
except TwythonError as e:
    # Without results there is nothing to loop over, so report the error and stop
    sys.exit(e)

for tweet in search_results['statuses']:
    # print(tweet['text'])
    # Print the URL of every media item attached to the tweet
    for mediaitem in tweet['entities'].get('media', []):
        if 'media_url' in mediaitem:
            print(mediaitem['media_url'])
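One thing worth checking with this kind of script: Twitter's v1.1 payload documentation described entities.media as carrying only the first photo of a tweet, with extended_entities.media listing every attached photo, and retweets repeat the same URLs. A hedged sketch of a de-duplicating extractor over the statuses list returned by twitter.search above, assuming that payload shape; the sample tweets are fabricated, and the v1.1 search endpoint may no longer be available, so treat this as illustrative.

```python
from typing import Iterable, List


def unique_media_urls(statuses: Iterable[dict]) -> List[str]:
    """Media URLs in first-seen order, without duplicates."""
    seen = set()
    urls = []
    for tweet in statuses:
        # Prefer extended_entities (all photos); fall back to entities (first photo)
        media = (tweet.get("extended_entities") or tweet.get("entities") or {}).get("media", [])
        for item in media:
            url = item.get("media_url")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Fabricated statuses: a two-photo tweet, a tweet repeating its first photo, and no media
SAMPLE_STATUSES = [
    {"entities": {"media": [{"media_url": "http://pbs.example/1.jpg"}]},
     "extended_entities": {"media": [{"media_url": "http://pbs.example/1.jpg"},
                                     {"media_url": "http://pbs.example/2.jpg"}]}},
    {"entities": {"media": [{"media_url": "http://pbs.example/1.jpg"}]}},
    {"entities": {"hashtags": []}},
]
print(unique_media_urls(SAMPLE_STATUSES))  # ['http://pbs.example/1.jpg', 'http://pbs.example/2.jpg']
```

With the script above you would call unique_media_urls(search_results['statuses']).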

I did some thinking and took a different (significantly more hacky) approach. The 'Photos' view on the Twitter website shows many more photos than I was getting through the API, so I looked into grabbing the HTML and processing it. This uses urllib3 to get the HTML and Beautiful Soup 4 to process it. For our hashtag it seems to get the most recent 12 photos, which is much better than the API.

#!/usr/bin/env python3
import urllib3
from bs4 import BeautifulSoup

# Grab the HTML of the 'Photos' search view
http = urllib3.PoolManager()
twitterreq = http.request('GET', 'https://twitter.com/search?v=stream&q=%23CodeOff2015&src=typd&mode=photos')
twitterpage = twitterreq.data

# Create a BeautifulSoup object from the HTML source code, naming the parser explicitly
twittersoup = BeautifulSoup(twitterpage, 'html.parser')

# Make a list of the link elements which have photos associated with them.
# The CSS selector matches links carrying all of these classes, in any order.
linklist = twittersoup.select('a.media.media-thumbnail.twitter-timeline-link.media-forward.is-preview')

# Iterate over the list of links
for link in linklist:
    imageurl = link.get('data-resolved-url-large')
    print(imageurl)
/grabbing-photo-urls-from-twitter.html
Getting FortiClient SSL VPN on Linux to trust a certificate
linux, networking
"I'm a Linux guy; I find it to be the most intuitive operating system for most tasks, even on a laptop. At home I use Arch Linux and at work I've recently moved from the Windows 7 workstation I've been using for the last 10 months to Ubuntu GNOME on an old laptop that was feeling unwanted."

I'm a Linux guy; I find it to be the most intuitive operating system for most tasks, even on a laptop. At home I use Arch Linux and at work I've recently moved from the Windows 7 workstation I've been using for the last 10 months to Ubuntu GNOME on an old laptop that was feeling unwanted. A couple of the developers have also made the move, although they're currently using the default Unity desktop, which I can't stand.

Anyway, as usual I digress. We use StartSSL certificates for our external services, because why pay for SSL certs when you can have them for free? These certs are trusted by pretty much everything, but for some reason the Linux version of Fortinet's SSL VPN client doesn't trust them. As there is no way to turn off certificate trust checking (which is a bad idea anyway), I couldn't connect to the VPN from either my work Ubuntu laptop or my home Arch Linux laptop. It took me a while to diagnose, as the software just hangs on 'Connecting...', but the forticlientsslvpn.log file in the helper directory within the FortiClient install directory helped.

The only documentation I could find on this problem was on the website of the University of Bamberg in Germany. As this is in German it is pretty hard to find (although straightforward to follow).

Here's what you need to do after you have installed the VPN client (this is well documented elsewhere so I'll leave it out):

  1. Make a .fctsslvpn_trustca directory in your home directory.
  2. Save your CA's root certificate in this folder in Base-64 Encoded PEM format (StartCom's is available here).
  3. Start (or restart) FortiClient and your VPN should now connect.
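Steps 1 and 2 are easy to script for several machines. A minimal Python sketch, assuming the root certificate has already been downloaded as a PEM file (if you only have a DER-encoded .cer, openssl x509 -inform DER -in ca.cer -out ca.pem converts it). It refuses anything that doesn't contain a PEM certificate header, creates ~/.fctsslvpn_trustca and copies the file in under its original name, since FortiClient's expectations about the file name aren't documented here. The function name is hypothetical, and step 3 (restarting FortiClient) is left to you.

```python
import shutil
from pathlib import Path
from typing import Optional

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def install_trusted_ca(cert_path: Path, home: Optional[Path] = None) -> Path:
    """Copy a PEM root certificate into ~/.fctsslvpn_trustca."""
    cert_path = Path(cert_path)
    if PEM_MARKER not in cert_path.read_bytes():
        raise ValueError(f"{cert_path} is not a Base-64 (PEM) certificate")
    trust_dir = Path(home or Path.home()) / ".fctsslvpn_trustca"
    trust_dir.mkdir(exist_ok=True)            # step 1
    destination = trust_dir / cert_path.name
    shutil.copyfile(cert_path, destination)   # step 2
    return destination
```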
/getting-forticlient-ssl-vpn-on-linux-to-trust-a-certificate.html
Ooh look! Shiny new Wi-Fi!
hardware, networking
"The access points powering the guest and internal Wi-Fi networks at work were getting a bit old and senile, so around 6 months ago we decided to get some money to upgrade. At the time we thought only one needed replacing, so we went out and bought a Cisco WAP321 to replace it; having prior knowledge of their reputation and not wanting to be burdened..."

The access points powering the guest and internal Wi-Fi networks at work were getting a bit old and senile, so around 6 months ago we decided to get some money to upgrade. At the time we thought only one needed replacing, so we went out and bought a Cisco WAP321 to replace it, knowing Cisco's reputation and not wanting to be burdened with another unreliable access point. However, we've since realised that we needed a second access point, so this past week we bought a Ubiquiti UniFi AP-AC, which I'll compare to the Cisco.

The Cisco

Overall the setup was painless and the web interface works in any modern browser, unlike the Cisco SGE2010 switches we have, which require Internet Explorer in Compatibility Mode. The only problem I had was unfortunately quite a major one: by default the access point has "Spanning Tree Mode" enabled. Logically you would think that as we use STP on our 3 SGE2010 switches, this would be just fine. Sadly it put the switches into an STP loop and more or less brought down the entire LAN. Not very helpful at all... This problem persisted regardless of changes to switch port configuration until I found and turned off the option on the access point. It's under Wireless > WDS Bridge on the web interface if you're having the same problem.

Cisco Hardware

The web interface, despite being very plain and Cisco-ish, provides lots of useful functionality. Features that jumped out at me include packet capture, captive portal, remote syslog, SNMP and rogue AP detection. This is in no way a complete list and I can't vouch for half of them, as I just set it up for our two networks: our internal network using WPA2 Enterprise talking back to a Microsoft Network Policy Server via RADIUS on one VLAN, and our guest network using WPA2 Personal on another VLAN. All the heavy lifting to lock down the guest network is done on our Fortigate firewall, although I could offload a lot of it to this access point if I wanted to.

Cisco Interface

Since I configured it, the WAP321 has worked faultlessly and provided great performance. Although I wish it could broadcast on both 2.4GHz and 5GHz bands simultaneously like its bigger brother the WAP371; a model which is perhaps more comparable to the Unifi access point I'll move onto shortly.

The Unifi

So, a few months later we've got problems with the Wi-Fi at the other end of the office and since buying the Cisco I've taken to reading /r/sysadmin on my OnePlus One before I drag myself out of bed in the morning. On this most excellent community I've seen numerous mentions of Ubiquiti's products and I don't think any of them have been negative; so after some further research I decided to recommend we got a UniFi AP-AC instead of another Cisco.

Although the good reviews helped, the decision was mostly based on two things. Firstly, £200 would get us another WAP321 and PoE injector from Cisco, which is only single band (at a time) and 802.11n, whereas the AP-AC is dual band and supports the new 802.11ac standard. The second major factor was that the WAP321 will only support 32 clients at a time [1] (although I'm not sure what exactly happens when you hit that), which doesn't leave us much room for expansion, whereas the UniFi supports 200+ [2]. That's a lot of extra bang for our precious IT budget buck!

Unifi Hardware

The Ubiquiti UniFi AP-AC impressed straight out of the box; unlike the Cisco it ships with a Gigabit PoE injector (instead of being £50 extra), it also comes with some very sturdy looking wall mounting brackets and all the screws you need. I even had comments when I was configuring it at my desk about just how smart it looked, with the subtle blue light that illuminates when it is up and running.

Unlike the Cisco it doesn't have a built-in web interface (although it does have an SSH interface); instead you need to install their very smart UniFi Controller application on a server. The download page only lists Mac and Windows versions, so I put it on a Windows Server 2012 R2 VM I had, but it turns out there is a Linux version too, which you can download from the UniFi Updates Blog; just make sure you've got a Java JRE installed and that ports 8080 and 8443 aren't already in use.

The first machine I installed the Windows software on had something else listening on port 8080, but instead of bringing up an error, the software just listened on localhost and of course the access point couldn't connect to it. So after working this out, I put the software on a different machine and waited patiently for a few minutes, but the access point still didn't pop up in the web interface. After a bit of Googling I found these instructions on the Spiceworks Community which solved the problem and got me to the point where I could start playing with the many configuration options. Yay!

Like the hardware, the (flash-based) web interface gives a great first impression; after logging in you are presented with an example office floorplan, invited to upload your own and then drag your access points to their location. Unfortunately, despite desperately wanting more access points so I can play with all the functionality properly, I've only got the one, so I completely skipped this tab (for now at least) and headed straight for the Settings menu to set it up to mirror the Cisco with our two networks.

There is lots of functionality which mirrors the Cisco; it also has captive portal, remote syslog, SNMP and rogue AP detection as well as numerous settings for things like locking down guest access or bandwidth limiting a group of users. It is missing packet capture, although tcpdump comes pre-installed and is accessible via SSH.

For me, other than the extra performance and capacity you get for your money, the very flashy charts and tables it gives you are the biggest selling point. You can see lots of data about each user that is currently connected to your access point(s) and also lots of metrics about usage over time.

Useful data

Shiny charts

On the face of it, this might just look like a lot of pointless graphs that you can print out and bore management with. However the way they've implemented this makes it very easy to take action based on the data you're given, such as blocking or rate limiting a user who is hogging your bandwidth. This reduces troubleshooting time and any SysAdmin will tell you that this is worth its weight in gold!

Conclusion

The Cisco WAP321 is a very reliable access point, with a lot of functionality. If I was told I could only use Cisco access points for the next 10 years, I wouldn't be particularly upset because I'm sure they would give me very few problems. However the Unifi AP-AC does everything the Cisco does, with much better performance for the same amount of money. It also gives you a very powerful centralised management interface which allows you to easily scale up to much bigger numbers of access points and clients. Unless something changes drastically over the next few months I will definitely be recommending these to anyone, regardless of the deployment size.

  1. http://www.cisco.com/c/dam/en/us/products/collateral/wireless/wap321-wireless-n-selectable-band-access-point-single-point-setup/aag_c45-717569.pdf

  2. http://dl.ubnt.com/datasheets/unifi/UniFi_AP_DS.pdf

/ooh-look-shiny-new-wi-fi.html
DiskCryptor full disk encryption
encryption
"A few months ago I was tasked with encrypting the laptops of a few of our consultants who travel a fair bit. I was going to go with Microsoft BitLocker, but as we use Windows 7 Professional we're out of luck. I also looked at using TrueCrypt, but there are a lot of question marks around it since it was abandoned by its developers..."

A few months ago I was tasked with encrypting the laptops of a few of our consultants who travel a fair bit. I was going to go with Microsoft BitLocker, but as we use Windows 7 Professional we're out of luck. I also looked at using TrueCrypt, but there are a lot of question marks around it since it was abandoned by its developers and its successor VeraCrypt wasn't around at the time I was looking. There are also a lot of enterprise paid options, but for our usage the free options are perfectly fit for purpose.

DiskCryptor is very easy to set up on an existing Windows installation, and because it uses the hardware AES encryption support built into most modern processors there is no perceivable effect on system performance. However I did find that the documentation, especially around the forgotten password reset process, was a bit lacking. So here is the documentation I wrote for our internal wiki (with some small changes), which covers the encryption process and what to do if you/one of your users forgets their password.


Encrypting a machine
  1. Go to the DiskCryptor website, then download and run the latest DiskCryptor installer (we used 1.1.846.118). Reboot the machine when prompted.
  2. After you have rebooted the machine, run DiskCryptor from the Start Menu.
  3. Select C: from the list (assuming this is your boot drive), and click on Encrypt.
  4. Continue with the default Encryption Settings and Boot Settings; these will be very secure and the best performing for most fairly modern machines.
  5. Pick a Volume Password and store this somewhere very safe such as a Lastpass Vault.
  6. The system will then encrypt your hard disk. It takes about an hour for a 256GB SSD. DO NOT CANCEL THIS OR TURN OFF YOUR MACHINE!
  7. When the encryption process has finished, select C: from the list again and click on Tools > Backup Header. Save this somewhere very safe off of your hard disk such as Dropbox, Google Drive or a backed up file server.
  8. You can now reboot the machine. You should be prompted for the password you picked in step 5.

DiskCryptor uses a US keyboard layout during boot (not when you set the password in Windows). So some special characters might be in different places.

Recovery Process when the password has been lost

To understand what you're doing you need to understand a bit more about how DiskCryptor works when it encrypts your disk.

When you click Encrypt on a disk it generates a unique encryption key which is used along with the AES cipher to encrypt and decrypt your data. This encryption key is stored in an encrypted form in the boot sector of your hard disk. When you turn on your machine the password you set is used to decrypt the encrypted encryption key. DiskCryptor can then access the encrypted data and boot Windows. If you change the password, it doesn't change the encryption key; it just decrypts it with the old password and encrypts it with the new one you've set.

So when we backed up the header after first encrypting the machine, we backed up the encryption key when it was encrypted with the initial password which we then stored somewhere super-safe. To recover access to the disk, we just have to restore this old header, reboot the machine and boot it up with the old password.
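The relationship between the password, the header and the encryption key is easier to see in code. The following is a deliberately toy Python model, not DiskCryptor's header format or cipher: a password is stretched with PBKDF2 into a key-encryption key, the random volume key is "wrapped" by XORing it with that key, and an HMAC tag detects a wrong password. XOR wrapping stands in for real AES key wrapping and must not be used to protect anything. What it does show is why changing the password leaves the volume key (and therefore the data) alone, and why an old header plus the old password still unlocks the disk.

```python
import hashlib
import hmac
import os


def _derive(password: str, salt: bytes) -> bytes:
    # 64 bytes: the first 32 wrap the volume key, the last 32 key the check tag
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000, dklen=64)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_header(volume_key: bytes, password: str) -> dict:
    """'Encrypt' the 32-byte volume key under a password (toy, not real crypto)."""
    salt = os.urandom(16)
    kek = _derive(password, salt)
    return {
        "salt": salt,
        "wrapped_key": _xor(volume_key, kek[:32]),
        "tag": hmac.new(kek[32:], volume_key, "sha256").digest(),
    }


def unlock(header: dict, password: str) -> bytes:
    """Recover the volume key, or raise ValueError for a wrong password."""
    kek = _derive(password, header["salt"])
    volume_key = _xor(header["wrapped_key"], kek[:32])
    expected = hmac.new(kek[32:], volume_key, "sha256").digest()
    if not hmac.compare_digest(expected, header["tag"]):
        raise ValueError("wrong password")
    return volume_key


def change_password(header: dict, old: str, new: str) -> dict:
    # Same volume key, re-wrapped under the new password: the data is untouched
    return make_header(unlock(header, old), new)


volume_key = os.urandom(32)                # generated once, when you click Encrypt
header = make_header(volume_key, "initial-password")
backup = dict(header)                      # Tools > Backup Header
header = change_password(header, "initial-password", "new-password")
# The user forgets "new-password"; restoring the backup brings the old one back
header = backup
print(unlock(header, "initial-password") == volume_key)  # True
```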

Because Windows won't boot to run the DiskCryptor software, you need the DiskCryptor Recovery Environment CD, which boots up into a Windows 7 environment with DiskCryptor installed. You can build your own with this guide or use this admittedly slightly dodgy looking pre-built ISO image.

  1. Boot the machine to the CD.
  2. On another machine, download the header backup and put it on a network share or USB drive. If you used a USB drive, skip the next step.
  3. Open Explorer from the Desktop, click on Network and turn on network discovery and file sharing.
    Turn on network discovery and sharing
  4. Open DiskCryptor from the Desktop, select C:, then click on Tools > Restore Header.
    Open header backup file
  5. Type the path to your network share or USB drive in the File name box (e.g. \\server\users\cablespaghetti), press Enter, then give it your domain username (in the domain\username format) and password if prompted.
    Grab the file from the network share
  6. You should then be able to select the Header Backup file.
  7. Reboot the machine and you should be able to unlock the encryption with the old password which I hope you still have saved somewhere super-safe.
Changing the password after initial encryption
  1. Run DiskCryptor from the Start Menu.
  2. Select C: from the list and click on Tools > Change Password.
  3. Give it your old password and set a new one.
  4. Click OK.

At this point the original password should still be kept somewhere super-safe. This old password is used along with the Header Backup to recover a machine when the password has been forgotten.

/diskcryptor-full-disk-encryption.html
Revenge of the flaky routers
scripts, linux, networking
"There is a DSL router at work which I'm sure is sentient. It's 100% reliable for months on end and then once in a while, always when I'm out of the office, it decides to crash and drop the Internet connection. The only solution is of course to turn it off and on again; which means visiting the office at the most inconvenient times possible."

There is a DSL router at work which I'm sure is sentient. It's 100% reliable for months on end and then once in a while, always when I'm out of the office, it decides to crash and drop the Internet connection. The only solution is of course to turn it off and on again, which means visiting the office at the most inconvenient times possible.

I would spend some money from the IT budget to get it replaced but with us moving to a leased line in the near future and it only failing every few months, it's really not worth it. Being a sysadmin, the natural solution was to write a shell script.

Disclaimer: I am not a shell script guru. I take no responsibility for any exploding routers or other hilarious events that may transpire as a result of the use of these scripts.

The first part of the script does the actual logging in and rebooting. I must admit I had no idea how to do this part, but found the solution in this nixcraft forum thread. It uses expect so make sure this is installed on your system. I saved it as reboot.sh, but call it whatever you like, just remember to change the reference to it in the next script.

#!/usr/bin/expect
set timeout 20

spawn telnet 1.3.3.7
expect "Password:"
send "supersecurepassword\r"
expect "ras> "
send "sys reboot"
send "\r"
expect "ras> "
send "exit"
send "\r"
sleep 5
exit

This second part runs in an infinite loop (the good kind, honest!), checking every 5 seconds that it can reach an IP address on the Internet (I recommend using your ISP's next hop router rather than the 8.8.8.8 in the example). If a ping fails, it checks again roughly every 2 seconds and only acts after 120 consecutive failures, which takes at least 2 minutes (usually around 3-4). It then runs the router reboot script and sleeps for 10 minutes before doing anything else.

#!/bin/bash
message="Insert super informative email message here."
# Find reboot.sh next to this script: systemd starts services with / as the working directory
script_dir="$(dirname "$(readlink -f "$0")")"

while true; do
    count=120
    while [ $count -ne 0 ] ; do
        sleep 1                                #Wait at least 1 second between checks
        ping -W 1 -c 1 8.8.8.8 >& /dev/null    #Check connectivity
        rc=$?
        if [ $rc -ne 0 ] ; then
            let count=count-1                  #If the ping failed decrement count
        else
            count=120                          #Otherwise make sure count is 120
            sleep 5                            #Wait 5 seconds before checking again
        fi
    done

    echo "No ping response from Internet connection for over 2 minutes. Rebooting router and sleeping for 10 minutes." #Log message
    echo "$message" | mail -s "Router Reboot Alert" email@address.com #Send email
    "$script_dir/reboot.sh"                    #Reboot router
    sleep 600                                  #Go to sleep for 10 minutes
done
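The counter logic is the subtle part of that loop: the count resets on any successful ping, so the reboot only fires after 120 consecutive failures. Here it is restated in Python as a hedged sketch, with the ping, reboot and sleep steps passed in as functions so the logic can be exercised without a flaky router. ping_ok assumes the Linux iputils ping flags used above, the email step is left out, and the function and parameter names (and the reboot path in the commented usage line) are my own.

```python
import subprocess
import time
from typing import Callable, Optional


def ping_ok(host: str = "8.8.8.8") -> bool:
    """One ping with a 1 second timeout (Linux iputils flags, as in the script)."""
    result = subprocess.run(["ping", "-W", "1", "-c", "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def watchdog(check: Callable[[], bool], reboot: Callable[[], None],
             max_failures: int = 120, sleep: Callable[[float], None] = time.sleep,
             max_reboots: Optional[int] = None) -> None:
    """Reboot after max_failures consecutive failed checks, then back off."""
    reboots = 0
    while max_reboots is None or reboots < max_reboots:
        remaining = max_failures
        while remaining:
            sleep(1)                      # at least 1 second between checks
            if check():
                remaining = max_failures  # any success resets the countdown
                sleep(5)
            else:
                remaining -= 1
        print("No ping response for too long. Rebooting router and sleeping for 10 minutes.")
        reboot()
        reboots += 1
        sleep(600)


# Real use (runs forever; hypothetical path):
# watchdog(ping_ok, lambda: subprocess.run(["/script/path/reboot.sh"], check=False))
```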

If you happen to be using a systemd based Linux distribution like CentOS 7, here's a bonus part for you to get the script running as a service. I saved this as /usr/lib/systemd/system/routerreboot.service then ran systemctl start routerreboot.service to start it and systemctl enable routerreboot.service to make sure it starts after a reboot of the server.

[Unit]
Description=Start router reboot script

[Service]
User=preferablynotroot
ExecStart=/script/path/script.sh
KillMode=process

[Install]
WantedBy=multi-user.target
/revenge-of-the-flaky-routers.html