rabexc blog, recent articles

Replacing and resizing a linux software raid, live

May 14, 2022 Updated May 14, 2022


Let me describe the scenario:

You have a linux software raid (raid5, in my case, created with mdadm).
On top of it, you have a few LVM volumes, and LUKS encrypted partitions.
You set this up 10 years ago: 4 disks, 2 TB each.
It has been running strong for the last 10 years, with the occasional disk replaced.
You just bought new 8 TB disks.

And now, you want to replace the old disks with the new ones, increase the size of the raid5 volume and, well, you want to do it live (with the partition in use, read write, without unmounting it, and without rebooting the machine).

All of this with consumer hardware, that DOES NOT SUPPORT ANY SORT OF HOT SWAP. Basically, no hardware raid controller, just the cheapest SATA support offered by the cheapest atom motherboard that you bought 10 years ago that happened to have enough SATA plugs.

Not for the faint of heart, but it turns out this is possible with a stock linux kernel, fairly easy to do, and worked really well for me.

All you need to do is type a few more commands from your shell, so that your incredibly cheap and naive SATA controller and linux system know what you're up to before you go around touching the wiring.

During the entire process I did have to reboot the server once: the chassis was not server class, and I did not have access to the disks without removing the case, and moving some cables around.

But that was it: one shutdown, 10 mins of moving cables around, back on, and the rest of the work was done live.

Preparation

In short:

cat /proc/mdstat
lsblk -do +VENDOR,MODEL,SERIAL
echo check > /sys/block/mdX/md/sync_action
dmesg, smartctl -a /dev/sdX

Longer explanation:

If you don't remember the disks in your array, cat /proc/mdstat to see each volume. In my case, I had to replace the disks in md5.

 # cat /proc/mdstat 
 Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
 md0 : active (auto-read-only) raid1 sdb1[0] sda1[1]
       145344 blocks [2/2] [UU]

 md1 : active raid1 sdb5[0] sda5[1]
       244043264 blocks [2/2] [UU]

 md5 : active raid5 sde[7] sdd[6] sdc[5] sdf[4]
       5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUUU]

Label the disks. Run the command lsblk -do +VENDOR,MODEL,SERIAL, print the output or copy it to your laptop, open your case. Put a label on each disk marking "sdc", "sdd", "sde", "sdf" by checking the SERIAL # (printed on labels on the disk).

 NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT VENDOR   MODEL                 SERIAL
 sda    8:0    0 232.9G  0 disk            ATA      WDC_WD2500AVVS-73M8B0 WD-WCAV94350152
 sdb    8:16   0 232.9G  0 disk            ATA      WDC_WD2500AVVS-73M8B0 WD-WCAV94283568
 sdc    8:32   0   1.8T  0 disk            ATA      WDC_WD20EARX-22PASB0  WD-WCAZAJ370736
 sdd    8:48   0   1.8T  0 disk            ATA      WDC_WD20EZRX-00D8PB0  WD-WCC4M1KD6YU5
 sde    8:64   0   1.8T  0 disk            ATA      WDC_WD20EARS-00MVWB0  WD-WMAZA3309946
 sdf    8:80   0   1.8T  0 disk            ATA      WDC_WD20EZRX-00D8PB0  WD-WMC4N0H1XYK1

Check that the raid is in good health by following the next 2 steps. If it's a raid5, you can only afford one disk down at a time. If another disk turns out to be damaged while you are replacing a disk, you will lose data.
cat /proc/mdstat one more time. Verify that all disks are up. [UUUU] means that there are 4 disks, each in up state (see the example output above). If one of the disks had failed, you would see something like [UU_U], indicating one disk down, possibly along with a recovery in progress or a degraded array. If that's the case, you must replace the damaged disk before proceeding!
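
The [UUUU] check can also be scripted. What follows is only a minimal sketch, not something I ran as part of the procedure: degraded_arrays is a made-up helper name, and it assumes the usual /proc/mdstat layout, where the member status shows up as a bracketed run of U and _ characters.

```shell
# degraded_arrays [FILE]: print the name of every md array whose status
# field (for example [UU_U]) shows at least one missing member.
# FILE defaults to /proc/mdstat; the function name is hypothetical.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }            # remember the array being described
    match($0, /\[[U_]+\]/) {               # status field such as [UUUU] or [_UUU]
      status = substr($0, RSTART, RLENGTH)
      if (index(status, "_") > 0) print name
    }
  ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no arguments prints nothing on a healthy machine; any array name it prints needs attention before you pull a disk.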

Trigger an array check. The command is echo check > /sys/block/md5/md/sync_action, with md5 being the name of your raid array. Run cat /proc/mdstat again: you should now see something like the output below, indicating that the check is in progress:

 # cat /proc/mdstat 
 Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
 md0 : active (auto-read-only) raid1 sdb1[0] sda1[1]
       145344 blocks [2/2] [UU]

 md1 : active raid1 sdb5[0] sda5[1]
       244043264 blocks [2/2] [UU]

 md5 : active raid5 sde[7] sdd[6] sdc[5] sdf[4]
       5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUUU]
       [>....................]  check =  1.7% (34292164/1953512960) finish=423.6min speed=97372K/sec

Wait. Wait. Wait, until the check is completed. I left my console up with watch -d cat /proc/mdstat.
At the end of the process, check dmesg to see if any read error was reported. Then use smartctl -a /dev/sdc, then /dev/sdd, and so on for each element of your array to see if any disk had errors.

In my case, the check was successful, dmesg was clean of errors after the array check was started, but smartctl -a /dev/sde showed that the disk - with more than 10 years of total run time - had multiple read errors and recovered from most of them. Still OK, but not in good health. About to fail.

Given that raid5 can only tolerate one disk failing, I started by replacing /dev/sde. Really, you don't want another disk to fail on you while you are waiting for your new disk to be synchronized.

Replacing each disk

It is actually easier than it sounds. In short (do at your own risk, or read the explanation below):

mdadm --manage /dev/mdX --fail /dev/sdX
mdadm --manage /dev/mdX --remove /dev/sdX
echo 1 > /sys/block/sdX/device/delete
Unplug the device (power cable first, sata cable next)
Plug the new device (sata cable first, power cable next)
Wait 10-15 seconds.

Run:

 for file in /sys/class/scsi_host/*/scan; do
   echo "- - -" > $file;
 done;

mdadm --add /dev/mdX /dev/sdX
cat /proc/mdstat
Once synchronization is done, repeat starting from 1.

Longer explanation:

Pick the disk to replace. In my case, I started with /dev/sde.
Mark it faulty, so the raid stops using it: mdadm --manage /dev/md5 --fail /dev/sde.
Tell linux you want the disk entirely out of the array: mdadm --manage /dev/md5 --remove /dev/sde.
If you plan to re-use the disk on a different machine or different array, and you are super-confident you won't have to plug it back in to recover your data, use mdadm --zero-superblock /dev/sde to remove the raid metadata, so your disk won't be detected as a raid array when plugged into another machine. I wanted to keep the old disk around in case the array sync failed for any reason, so I did not do this step.
Tell linux (and the controller!) you are about to unplug the device: echo 1 > /sys/block/sde/device/delete. This step may not be necessary if your controller supports hot swap.
Check dmesg, see a few messages notifying that the device was deleted. Check ls /dev/sd*, sde should be gone.
Physically remove the device. I personally removed the power first, and then unplugged the sata wire. If you have a hot swap/easy swap case, you can probably just remove the disk.

If you forgot to delete the device through /sys or to check dmesg before unplugging: in my case, linux noticed the disk was gone on its own within a few minutes. Repeat the dmesg and ls /dev/sd* check after unplugging, and wait until the software layer thinks the drive is gone.

Not sure if I was lucky and those two steps are optional, but I did not trust my controller.

Prepare the new disk. Make sure the disk is new, or that any raid superblock was deleted beforehand (on a separate machine, or before it was unplugged) with mdadm --zero-superblock /dev/sdX.
Connect the disk. Here, I first connected the SATA wire, and then the power. Nothing happened: judging by dmesg, the disk was not detected, although it started spinning.

Ask linux to rescan all the SATA buses, looking for new disks:

  for file in /sys/class/scsi_host/*/scan; do
    echo "- - -" > $file;
  done;

Check dmesg, and ls /dev/sde. dmesg should tell you that a new disk was found, and /dev/sde should now be magically back.
If you need a partition table on the disk, now is the time to create it. In my case, I use LVM on top of raid, so I just wanted to add the whole physical disk to the raid, no partition table.

Note that the new physical disk is 8 TB, while the old one is 2 TB, so 6 TB would get wasted. But my plan is to replace all disks in the array, and resize the array at the end, so no partitioning was necessary. If you wanted to have just one larger disk, you could partition it, put 2 TB back in the raid5 array, and use the rest for something else.
Add the new disk to the array: mdadm --add /dev/md5 /dev/sde

Check cat /proc/mdstat. You should now see the disk being synchronized:

  # cat /proc/mdstat 
  Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
  md0 : active (auto-read-only) raid1 sdb1[0] sda1[1]
        145344 blocks [2/2] [UU]

  md1 : active raid1 sdb5[0] sda5[1]
        244043264 blocks [2/2] [UU]

  md5 : active raid5 sde[7] sdd[6] sdc[5] sdf[4]
        5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
        [>....................]  recovery =  3.4% (66457728/1953512960) finish=1237.1min speed=25421K/sec

Once recovery is done (in 1237 minutes...) make sure that it succeeded: check /proc/mdstat, always, run smartctl -a on /dev/sde and any other disk involved, and check dmesg for errors.
Assuming all went well, repeat the steps to replace each disk in the array.
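
The software half of the removal (fail, remove, delete) is easy to mistype when you repeat it four times. Here is a rough sketch of how it could be wrapped; run and detach_member are names I made up, and the DRY_RUN switch is an addition of mine so the commands can be printed and reviewed before anything touches the array. It does nothing beyond the three commands described above.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# detach_member ARRAY DISK (hypothetical helper), e.g. /dev/md5 /dev/sde:
# mark the disk faulty, remove it from the array, then ask the kernel
# to drop the device so it can be unplugged.
detach_member() {
  array=$1
  disk=$2
  name=${disk#/dev/}   # /dev/sde -> sde
  run mdadm --manage "$array" --fail "$disk" &&
  run mdadm --manage "$array" --remove "$disk" &&
  run sh -c "echo 1 > /sys/block/$name/device/delete"
}
```

With DRY_RUN=1 detach_member /dev/md5 /dev/sde you only get the three commands echoed back. Drop DRY_RUN=1 (and run as root) once you are sure you picked the right disk.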

Depending on what your machine is supposed to be doing while all of this is in progress, you can adjust the minimum and maximum sync speed by changing the value of:

  /sys/block/md5/md/sync_speed_max
  /sys/block/md5/md/sync_speed_min

For example, with:

  # cat /sys/block/md5/md/sync_speed
  24873
  # echo 50000 > /sys/block/md5/md/sync_speed_max
  # echo 40000 > /sys/block/md5/md/sync_speed_min
  # cat /sys/block/md5/md/sync_speed
  26988
  # cat /sys/block/md5/md/sync_speed
  31840
  # cat /sys/block/md5/md/sync_speed
  37234
  # cat /sys/block/md5/md/sync_speed
  40024
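
To get a feel for what a given sync speed buys you, the finish estimate can be reproduced from the numbers in the progress line. This is just a back-of-the-envelope sketch: sync_eta_minutes is a made-up name, and it assumes the block counts are in 1K units and the speed is in K/sec, as printed in /proc/mdstat.

```shell
# sync_eta_minutes DONE TOTAL SPEED (hypothetical helper)
# DONE and TOTAL are the two block counts in the progress line,
# SPEED is the K/sec value. Prints whole minutes, rounded down.
sync_eta_minutes() {
  done_blocks=$1
  total_blocks=$2
  speed=$3
  echo $(( (total_blocks - done_blocks) / speed / 60 ))
}
```

With the recovery line shown earlier, sync_eta_minutes 66457728 1953512960 25421 gives 1237, in line with the finish=1237.1min reported by the kernel. At a sustained 40000 K/sec the same remaining work would take about 786 minutes.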

Repeat this same process for each disk. For a week, I'd pretty much replace 1 disk every time I came home from work. All had been replaced by the end of the week.

Resizing the raid

Resizing the raid was easier than I expected. In short:

mdadm --grow /dev/md5 -z max - to resize the array to the maximum supported by the underlying partitions / physical disk.
Wait for the resize process to be completed. I used something like: mdadm -D /dev/md5 | grep -e "Array Size" -e "Dev Size" to see the delta shrinking, and the usual cat /proc/mdstat.
pvresize /dev/md5 to instruct LVM that the physical volume is now larger, and its size can be adjusted to match it.
Once the resize is complete, vgs should show the additional space available in the volume group. Now you can resize any logical volume to have as much space as you need.
If you need instructions on how to resize a logical volume encrypted with LUKS, I wrote about it a few years ago. You can read them here; they still worked like a charm this time around.
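
As a sanity check on what the grow should yield: a raid5 array spends the equivalent of one disk on parity, so the usable space is (number of disks - 1) times the per-disk size. A tiny sketch, with raid5_usable being a name I made up:

```shell
# raid5_usable DISKS PER_DISK_SIZE (hypothetical helper)
# Usable raid5 capacity, in whatever unit PER_DISK_SIZE is given in.
raid5_usable() {
  disks=$1
  per_disk=$2
  echo $(( (disks - 1) * per_disk ))
}
```

raid5_usable 4 1953512960 gives 5860538880, which is exactly the block count /proc/mdstat reported for the old array, and raid5_usable 4 8 gives 24 for four 8 TB disks.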

Conclusions

The whole process was surprisingly painless. All worked like a charm, and by the end of it I had an array with 24 TB of total space available, with close to zero downtime.

http://rabexc.org/posts/mdadm-replace

Speeding up the Carbon X1 Trackpad

May 14, 2020 Updated May 14, 2020


Let's say you have a Carbon X1 5th gen.

Let's say your trackpoint is a TPPS/2 Elan TrackPoint (and you can check this by running xinput |grep -i TrackPoint).

Let's say you have tried various xinput or /sys/.*/acceleration or /sys/.*/speed settings. But still...

Your trackpoint is WAY TOO SLOW!!

This is what fixed it for me:

Run: mkdir -p /etc/libinput

Create a local-overrides.quirks file with:

cat > /etc/libinput/local-overrides.quirks <<END
[Trackpoint Override]
MatchUdevType=pointingstick
AttrTrackpointMultiplier=2.0
END

Log out and back in / restart your X server / wayland.
Enjoy the increased speed! Increase the 2.0 above for a faster experience, decrease it for a slower experience.

In one cuttable and pastable blob:

   mkdir -p /etc/libinput
   cat > /etc/libinput/local-overrides.quirks <<END
   [Trackpoint Override]
   MatchUdevType=pointingstick
   AttrTrackpointMultiplier=2.0
   END
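
If you expect to try several multipliers before settling on one, the same file can be written by a small function. This is only a convenience sketch: write_trackpoint_quirk is a made-up name, and the optional directory argument exists just so you can try it somewhere other than /etc/libinput first.

```shell
# write_trackpoint_quirk MULTIPLIER [DIR] (hypothetical helper)
# Writes the same three-line quirks file as above, with the given multiplier.
# DIR defaults to /etc/libinput, which requires root.
write_trackpoint_quirk() {
  multiplier=$1
  dir=${2:-/etc/libinput}
  mkdir -p "$dir" &&
  printf '%s\n' \
    '[Trackpoint Override]' \
    'MatchUdevType=pointingstick' \
    "AttrTrackpointMultiplier=$multiplier" \
    > "$dir/local-overrides.quirks"
}
```

For example, write_trackpoint_quirk 3.0 as root, then restart your session to pick up the change.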

In case this does not work, read on. I recommend you jump to the last section, and play with libinput debug-gui, or libinput quirks list /dev/input/event2 or whatever your trackpad is associated with.

Longer explanation

No luck with xinput

Normally, you tune your pointing device using xinput:

A simple xinput will show you the list of devices supported. Next to each device, you will see an id, like:

$ xinput
⎡ Virtual core pointer                          id=2    [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer                id=4    [slave  pointer  (2)]
⎜   ↳ SynPS/2 Synaptics TouchPad                id=11   [slave  pointer  (2)]
⎜   ↳ TPPS/2 Elan TrackPoint                    id=12   [slave  pointer  (2)]
⎣ Virtual core keyboard                         id=3    [master keyboard (2)]
    ↳ Virtual core XTEST keyboard               id=5    [slave  keyboard (3)]
    ↳ Power Button                              id=6    [slave  keyboard (3)]
    ↳ Video Bus                                 id=7    [slave  keyboard (3)]
    ↳ Sleep Button                              id=8    [slave  keyboard (3)]
    ↳ Integrated Camera: Integrated C           id=9    [slave  keyboard (3)]
    ↳ AT Translated Set 2 keyboard              id=10   [slave  keyboard (3)]
    ↳ ThinkPad Extra Buttons                    id=13   [slave  keyboard (3)]

Use xinput list-props 12 to see the properties of the device, for example:

$ xinput list-props 12
Device 'TPPS/2 Elan TrackPoint':
    Device Enabled (154):   1
    Coordinate Transformation Matrix (156): 1.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.000000, 0.000000, 0.000000, 1.000000
    libinput Natural Scrolling Enabled (298):   0
    libinput Natural Scrolling Enabled Default (299):   0
    libinput Scroll Methods Available (302):    0, 0, 1
    libinput Scroll Method Enabled (303):   0, 0, 1
    libinput Scroll Method Enabled Default (304):   0, 0, 1
    libinput Button Scrolling Button (316): 2
    libinput Button Scrolling Button Default (317): 2
    libinput Middle Emulation Enabled (308):    0
    libinput Middle Emulation Enabled Default (309):    0
    libinput Accel Speed (310): 1.000000
    libinput Accel Speed Default (311): 0.000000
    libinput Accel Profiles Available (318):    1, 1
    libinput Accel Profile Enabled (319):   1, 0
    libinput Accel Profile Enabled Default (320):   1, 0
    libinput Left Handed Enabled (312): 0
    libinput Left Handed Enabled Default (313): 0
    libinput Send Events Modes Available (275): 1, 0
    libinput Send Events Mode Enabled (276):    0, 0
    libinput Send Events Mode Enabled Default (277):    0, 0
    Device Node (278):  "/dev/input/event2"
    Device Product ID (279):    2, 10
    libinput Drag Lock Buttons (314):   <no items>
    libinput Horizontal Scroll Enabled (315):   1

Use the number in parentheses to change the specified parameter. You can set the acceleration speed (parameter 310) to 1.0 (the maximum: libinput accepts values from -1 to 1) with:
```
xinput set-prop 12 310 1
```
Suffer silently, as your TrackPad is still too slow.

No luck with sys parameters

Most drivers in linux have tunable parameters in /sys. Turns out that the Elan Trackpad is no exception.

cd /sys
find . -name sensitivity
./devices/platform/i8042/serio1/serio2/sensitivity

By echoing values in some of those parameters you may be able to adjust speed and sensitivity. No luck with my trackpad, despite trying different paths and values. The adjustments were too minor.

Finally worked it out with libinput

This page here brought me on the right path. In short:

xinput and friends are the right way to tune the settings on your device.
However, TrackPads vary wildly in terms of numbers they generate in response to your pressure.
libinput allows you to configure some magic numbers to address quirks in your device.

If you need anything more detailed, you should follow the instructions here.

What I had to do is:

Install the command libinput, apt-get install libinput-tools.
Look for my device with libinput list-devices.
Recompile libinput from source, a breeze following the instructions on the web site, so I could run libinput debug-gui, which is not enabled by default on Debian.
Play with the parameters, as per instructions.
Once I had the parameters right, installed them on my system.

The most important commands were:

libinput list-devices, to find which device corresponded to my trackpad (/dev/input/event2).
libinput quirks list /dev/input/event2 (this command is not documented in the help nor man page) to verify which quirks were applied correctly, and libinput quirks list --verbose /dev/input/event2 to have more insights on the paths and files loaded.
restarting X, to see the new parameters applied. As documented, libinput debug-gui is a bit slower than the real deal.

http://rabexc.org/posts/carbon-x1-trackpad

Using docker for persistent development environments

Jan 16, 2020 Updated Jan 16, 2020


When thinking about Docker and what it is designed to do, what comes to mind are disposable containers: containers that are instantiated as many times as necessary to run an application or to complete some work, and then deleted as soon as the work is done or the application needs to be restarted.

It's what you do when you use a plain Dockerfile to package an application and all its dependencies into a container at build or release time, for example, to then instantiate (and delete) that container on your production machines (kubernetes?).

It's what you do when you compose one or more containers to create a hermetic environment for your build or test system to run on, with a new container instantiated for each run of your build or test, deleted as soon as the process has completed.

Over the last year, however, I learned to love to use docker for something it was not quite designed to do: persistent development environments.

Before attacking me violently for committing such a horrible sin, let me explain the use case first.

Let's say you are a developer, working on a number of different projects at the same time. Some of those projects are quite different: there's some javascript, different versions of the tooling, there's some C code you are playing with, one of the projects uses a go backend, another two use different versions of node.

Let's also say that as most developers, you use a single laptop for all your work.

In this scenario, ideally, you'd want each project to have as few dependencies as possible on the host operating system, and have each project as isolated as possible from the other ones. Your C projects using LLVM libraries? They should not use the LLVM libraries on your system: an update may break them all. Your nodejs project? Can you keep a different version of node per project? Same thing.

Of course this is generally solved by dockerizing the build environment itself: run the whole build in a docker container, together with all the dependencies it needs. But how do you kick off this build in the first place? Manually cutting and pasting commands from a doc? A Makefile? A shell script?

Will your developer on a Mac be able to kick off this Makefile? What will the dependencies of the shell script be? Will you need yet another container to kick off a dockerized build environment? What if it's hard to build a Dockerfile at all, but you still want to hack on a linux project from your Mac or Windows machine?

This is where persistent development environments are most useful. Basically:

On your Mac/Windows/Linux box you create a container starting from your target operating system (eg, will you build/run/test your code under debian? use a debian container).
Get a shell in there, start hacking. Build, and test. Use it as if it was your own machine. Install and update dependencies, do what you like. Or if you prefer, use your graphical or favourite editor on the files directly, outside the container. Keep using this same container for as long as you have not fully automated / made portable / made hermetic / simplified your build system and full set of dependencies so they can run on your system. Stop and start the container as needed. Or just assume that working like this is good enough, and that it may be just simpler if your build and test systems only needed to worry about running on one single OS, with all developers using a specific container like described.
If you need to hack on a different project, just start a different container. If you need to make some dangerous changes to your container (eg, a system update) create a new container based on it.
Never touch your own host system. All it needs is well, docker.

If you've read my blog before, you may well remember that in the last few years, I've pretty much done the same using libvirt.

Given that this is not quite the recommended or common pattern with using docker, it is often hard to find the correct commands to use.

The practice

Let's look at a concrete use case: I'm on my work macbook. I'm developing an app that ultimately will run on linux, and whose CI/CD system runs on linux.

The build system is not quite written to run on Mac OS. Even though the intent was for the build to be portable, a number of GNU extensions were used, and it's now hard to get rid of them. Whenever I hack with this app, I would benefit from just using the same tools running on my CI/CD system.

I'll start by creating my "developer environment":

docker run -dt -v /home/me/projects:/opt/projects -p 5000-6000:5000-6000 --name project-foo debian:10 bash

This will start a docker container named project-foo, mapping my source code in /home/me/projects to /opt/projects and exposing ports 5000 to 6000 as local ports. This will allow me, for example, to use my favourite Mac editor to modify the code in /home/me/projects, see the changes in the container, start my dev app on port 5432, and access it at http://127.0.0.1:5432. I could of course use --network host instead of -p 5000-6000:5000-6000, as per my other article, to expose all ports with better performance, but unfortunately this does not work on Mac or Windows.

I can stop or start this container at any time with:

docker stop project-foo
docker start project-foo

even after a reboot of my machine, and I can see it is running with:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                              NAMES
b9ff98190412        debian:10           "bash"                   11 minutes ago      Up 22 seconds       0.0.0.0:5000-6000->5000-6000/tcp   project-foo

now, I can get as many shells as I need in this container by running:

docker exec -it project-foo bash

in this shell, I can install any software I need, in complete isolation from other projects and from the host operating system, and still use it mostly like it was local.

I can even use a graphical editor to modify the files in /home/me/projects and run a watcher (like ibazel) in the container to automatically have the project rebuilt.

Now, let's say I want to change the port mappings, or I want to instantiate another copy of my development container so I can install a new version of gcc or LLVM to see if the project still builds. All I have to do is:

docker commit project-foo project-foo

and then:

docker run -dt -v /home/me/projects:/opt/projects -p 6000-7000:6000-7000 --name project-foo2 project-foo bash

to start a project-foo2, and well, keep hacking around. Same source code, but running in two very different environments at the same time.
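
Since the commands above are the same for every project, they fit in a couple of shell functions. This is only a sketch under my own assumptions: dev_env and dev_shell are made-up names, /opt/projects and debian:10 are simply the values used above, and dev_shell relies on docker start succeeding when the container is already running.

```shell
# dev_env NAME SRC_DIR PORT_RANGE [IMAGE] (hypothetical helper)
# Create a persistent development container like project-foo above.
dev_env() {
  name=$1
  src=$2
  ports=$3
  image=${4:-debian:10}
  docker run -dt -v "$src:/opt/projects" -p "$ports:$ports" --name "$name" "$image" bash
}

# dev_shell NAME (hypothetical helper)
# Make sure the container is started, then open a shell in it.
dev_shell() {
  docker start "$1" > /dev/null &&
  docker exec -it "$1" bash
}
```

With these, dev_env project-foo /home/me/projects 5000-6000 creates the environment once, and dev_shell project-foo gets you back in after a reboot.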

And that's all for now.

http://rabexc.org/posts/docker-dev

Docker networking on Linux

Jun 1, 2019 Updated Jun 1, 2019


When you run an application under docker, you have a few different mechanisms you can choose from to provide networking connectivity.

This article digs into some of the details of two of the most common mechanisms, while trying to estimate the cost of each.

Using -p

The most common way to provide network connectivity to a docker container is to use the -p parameter to docker run. For example, by running:

docker run --rm -d -p 10000:10000 envoyproxy/envoy

you have exposed port 10000 of an envoy container on port 10000 of your host machine.

Let's see how this works. As root, from your host, run:

netstat -ntlp

and look for port 10000. You'll probably see something like:

[...]
tcp6   0  0 :::10000    :::*   LISTEN   31541/docker-proxy  
[...]

this means that port 10000 is open by a process called docker-proxy, not envoy.

As the name implies, docker-proxy is a networking proxy similar to many others: a userspace application that listens on a port, forwarding bytes and connections back and forth as necessary.

From the standpoint of the networking stack, this is significantly different from what happens with a non-dockerized application. Instead of having: network card -> kernel -> application, we now have network card -> kernel -> application (docker proxy) -> kernel -> application.

You can see the benchmarks below, but unsurprisingly, this is not only introducing a significant performance bottleneck, but it is also costing us much more CPU and memory just to get packets in and out of a container.

Faster -p

docker-proxy is unsurprisingly one of the least loved components of docker. On linux, modern versions of docker support using iptables instead of a proxy. The idea is simple: rather than a userspace application proxying the connections on behalf of your container, the kernel is configured to modify them through NAT rules and route them appropriately.

This feature, however, is not enabled by default. By looking through the history of related bugs, it seems like it can tickle bugs on older kernels, or there are some corner cases by which this does not always work correctly.

In any case, you can enable the option by creating (or editing) /etc/docker/daemon.json to have:

{
    "userland-proxy": false,
    "iptables": true
}

If not done automatically, it may also be necessary to run:

/sbin/sysctl net.ipv4.conf.docker0.route_localnet=1

--network host

Another way to provide network connectivity to your job is to not use the -p parameter at all, and instead use --network host.

While -p starts a proxy to forward connections and data back and forth between the host and your application, --network host tells docker that you want your container to share the network configuration of the host.

This means that if the docker container opens port 9000, port 9000 will be open on your host directly - with nothing in between.

The main problem with this is that with -p x:y, two containers can use the same port, as long as they get mapped to different ports on the host.

But with --network host, instead, no two containers can use the same port.

So you need to be careful to only use containers that don't have conflicting port numbers. Further, you don't want to accidentally expose ports that your containers may have opened.

There are a few tricks you can use here. For example:

With the envoy image, you can supply a different configuration file, with whatever port you like. Either by creating a derived image (recommended), or by supplying -v to override the config file (-v local/envoy.yaml:/etc/envoy/envoy.yaml) at container run time.
You can use environment variables in the Dockerfile or your scripts, to pass down a port number (and address) to bind to. For example, docker run -e PORT=9000 will provide a $PORT variable with the number 9000 in it. If you consistently use it, you can easily move docker containers to use different ports.
To avoid accidentally exposing ports, you can also bind to 127.0.0.1. Instead of having your applications listen for any connection arriving, on 0.0.0.0, it is a good habit to configure them to only accept connections on 127.0.0.1, unless you want them to be exposed. This works really well with something like envoy, where the main ports, 80 and 443, are exposed, while all other "backend ports" are bound to 127.0.0.1.
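
The last two tricks combine naturally in a container entrypoint script. A minimal sketch, where listen_address is a made-up function and BIND_ADDR is a variable name I picked for illustration (only $PORT comes from the example above):

```shell
# listen_address (hypothetical helper)
# Print the ADDRESS:PORT the application should bind to.
# Defaults to loopback, so nothing is exposed unless explicitly requested.
listen_address() {
  echo "${BIND_ADDR:-127.0.0.1}:${PORT:-9000}"
}
```

Started with docker run --network host -e PORT=9001, a script using it would bind to 127.0.0.1:9001. Adding -e BIND_ADDR=0.0.0.0 is then a deliberate choice to expose the port.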

Some benchmarks

Just to get an idea of the cost of each method, I fired up a quick iperf3 on my laptop.

The full results are below, but in short:

Plain -p (with docker-proxy), yields ~42 Gbps with 14% system idle.
Fast -p (with iptables, userland-proxy: false), yields ~54 Gbps with 49% system idle.
Host networking, yields ~70 Gbps, with 49% system idle.

Adjusting for CPU load, we have that host networking is ~2.8 times more efficient than docker-proxy, and ~1.3 times more efficient than -p using iptables rules.

It is important to note that this test was ingress heavy, while traffic is often egress heavy, and that I only used 10 connections.
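
For reference, this is how the "adjusted for CPU load" figures can be reproduced from the summary numbers: divide the throughput by the busy CPU percentage (100 minus idle), then compare. The function names are made up, and the inputs are the rounded values from the summary.

```shell
# per_cpu GBPS IDLE_PERCENT (hypothetical helper)
# Throughput per percentage point of busy CPU.
per_cpu() {
  awk -v gbps="$1" -v idle="$2" 'BEGIN { printf "%.3f\n", gbps / (100 - idle) }'
}

# efficiency_ratio GBPS_A IDLE_A GBPS_B IDLE_B (hypothetical helper)
# How many times more efficient setup A is compared to setup B.
efficiency_ratio() {
  awk -v ga="$1" -v ia="$2" -v gb="$3" -v ib="$4" \
    'BEGIN { printf "%.1f\n", (ga / (100 - ia)) / (gb / (100 - ib)) }'
}
```

per_cpu gives 0.488 for docker-proxy (42 Gbps, 14% idle), 1.059 for iptables (54 Gbps, 49% idle) and 1.373 for host networking (70 Gbps, 49% idle). efficiency_ratio 70 49 42 14 prints 2.8 and efficiency_ratio 70 49 54 49 prints 1.3.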

Plain -p (with docker-proxy)

# iperf3 -c 0 -P 10 -p 9000 -t 60
...
[SUM]  39.00-40.00  sec  4.88 GBytes  41.9 Gbits/sec    0

# mpstat -P ALL 1
...
08:02:56 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
08:02:57 PM  all    5.63    0.00   63.17    0.26    0.00   16.88    0.00    0.00    0.00   14.07
08:02:57 PM    0    7.29    0.00   52.08    1.04    0.00   13.54    0.00    0.00    0.00   26.04
08:02:57 PM    1    0.00    0.00   80.00    0.00    0.00   20.00    0.00    0.00    0.00    0.00
08:02:57 PM    2    3.03    0.00   68.69    0.00    0.00   19.19    0.00    0.00    0.00    9.09
08:02:57 PM    3   12.50    0.00   51.04    0.00    0.00   14.58    0.00    0.00    0.00   21.88

Using fast -p ("userland-proxy": false)

# iperf3 -c 0 -P 10 -p 9000 -t 60
...
[SUM]  58.00-59.00  sec  6.30 GBytes  54.1 Gbits/sec    0

# mpstat -P ALL 1
...
07:58:21 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:58:22 PM  all    1.26    0.00   39.70    0.00    0.00    9.80    0.00    0.00    0.00   49.25
07:58:22 PM    0    2.00    0.00   70.00    0.00    0.00   28.00    0.00    0.00    0.00    0.00
07:58:22 PM    1    2.00    0.00   87.00    0.00    0.00   11.00    0.00    0.00    0.00    0.00
07:58:22 PM    2    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
07:58:22 PM    3    0.00    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   98.99

Host networking

# iperf3 -c 0 -P 10 -p 9000 -t 60
...
[SUM]   9.00-10.00  sec  8.15 GBytes  70.0 Gbits/sec    0

# mpstat -P ALL 1
...
08:05:07 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
08:05:08 PM  all    1.01    0.00   42.96    0.25    0.00    6.78    0.00    0.00    0.00   48.99
08:05:08 PM    0    0.00    0.00    0.00    1.03    0.00    0.00    0.00    0.00    0.00   98.97
08:05:08 PM    1    0.00    0.00   88.00    0.00    0.00   12.00    0.00    0.00    0.00    0.00
08:05:08 PM    2    3.00    0.00   82.00    0.00    0.00   15.00    0.00    0.00    0.00    0.00
08:05:08 PM    3    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   98.02

Notes

expose vs publish

docker run supports -p (--publish) and --expose. There is conflicting information online on the exact meaning of expose and publish.

From my understanding, --expose (EXPOSE in the Dockerfile) just declares that there is a service running on a specific port. This is used by -P (to publish all ports: how does it know which ones are all ports? Thanks to EXPOSE!), and by service bindings.

It does not seem to affect container to container communication. At least on linux, each container gets an IP address, and with the default bridging network and no other specific setting, it does not seem like there are iptables rules or other configurations impeding the communication.

Getting the IP address

To get the ip address of a container, you can use:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' name

or just look at the output of:

docker inspect name

Finding out port owners with docker-proxy

Given that netstat -ntlp just shows docker-proxy, how can you know which container has the port open?

One simple way is to just run:

docker ps

and peek at the PORTS column. It will show you which ports are mapped to which container:

CONTAINER ID  IMAGE             COMMAND   CREATED       STATUS       PORTS     NAMES
614c350d87dc  envoyproxy/envoy  "envoy"   4 hours ago   Up 4 hours   9000/tcp  friendly_almeida

Given that there is a docker-proxy instance per docker container, with ps aux you can also peek at the command line to see the IP and ports a docker-proxy instance is tied to:

# ps aux
root     11747  0.0  0.0 474160  8216 ?        Sl   16:18   0:00 /usr/sbin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9000 -container-ip 172.17.0.2 -container-port 9000
root     11836  0.0  0.0 547892  6228 ?        Sl   16:18   0:00 /usr/sbin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9001 -container-ip 172.17.0.3 -container-port 9000
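
If you have many published ports, reading those command lines by eye gets tedious. As a rough sketch only (parse_docker_proxy is a hypothetical helper, not part of docker, and it assumes the flag order shown in the ps output above), the mapping can be extracted with a few lines of Python:

```python
import re

# Matches the docker-proxy flags as they appear in the ps output above.
# Assumption: the flags keep this order and format; other versions may differ.
PROXY_RE = re.compile(
    r"docker-proxy -proto (?P<proto>\S+) -host-ip (?P<host_ip>\S+) "
    r"-host-port (?P<host_port>\d+) -container-ip (?P<container_ip>\S+) "
    r"-container-port (?P<container_port>\d+)"
)


def parse_docker_proxy(ps_output):
    """Return {(proto, host_port): (container_ip, container_port)}."""
    mappings = {}
    for line in ps_output.splitlines():
        m = PROXY_RE.search(line)
        if m:
            key = (m["proto"], int(m["host_port"]))
            mappings[key] = (m["container_ip"], int(m["container_port"]))
    return mappings


# The two lines from the ps output above.
sample = (
    "root     11747  0.0  0.0 474160  8216 ?        Sl   16:18   0:00 "
    "/usr/sbin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9000 "
    "-container-ip 172.17.0.2 -container-port 9000\n"
    "root     11836  0.0  0.0 547892  6228 ?        Sl   16:18   0:00 "
    "/usr/sbin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9001 "
    "-container-ip 172.17.0.3 -container-port 9000\n"
)

for (proto, host_port), (ip, port) in parse_docker_proxy(sample).items():
    print(f"{proto} host port {host_port} -> {ip}:{port}")
```

The container IP can then be compared with what docker inspect reports for each container.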

You can also peek at what's happening in the networking layer of the container by 1) discovering the namespace id used by the container, and 2) running commands in it.

A good way to do so is to run:

docker inspect -f '{{.State.Pid}}' friendly_almeida

where friendly_almeida is the name of the container, followed by:

nsenter -t 655 -n netstat -ntlp
nsenter -t 655 -n ip a show

for example, run as root. nsenter is particularly handy as it allows you to run arbitrary commands from your host in the container of the docker app, like:

nsenter -t 655 -a ps aux

dockerd listening on the port

Even when using "userland-proxy": false, with netstat -ntlp you can see dockerd listening on the ports you pass with -p.

This was extremely confusing to me, but after a bit of research, it turns out it does so only to allocate the port, so host applications will not be able to listen on it - which is a good idea, given that iptables is configured to modify that traffic and get it delivered to the container instead.

Something even more confusing here is that dockerd will listen on the ipv6 address ::1 (which also works for ipv4), while iptables rules will be installed for ipv4 only.

If you test with ::1 instead of 127.0.0.1, you'll land on this proxy (which does nothing) instead of the real port number.

http://rabexc.org/posts/docker-networking

Resizing an encrypted filesystem with LVM on Linux

Jun 11, 2017 Updated Jun 11, 2017

I recently had to increase the size of an encrypted partition on my Debian server. I have been a long time user of LVM and dm-crypt and tried similar processes in the early days of the technology.

I was really impressed by how easy it was today, and how it all just worked without effort: no need to reboot the system, mark the filesystems read only, or unmount them.

Here are the steps I used, on a live system:

Used mount to determine the name of the cleartext partition to resize. In my case, I wanted to add more space to /opt/media, so I ran:
```
# mount |grep /opt/media
/dev/mapper/cleartext-media on /opt/media type ext4 (rw,nosuid,nodev,noexec,noatime,nodiratime,discard,errors=remount-ro,data=ordered)
```
which means that /opt/media is backed by /dev/mapper/cleartext-media
Used cryptsetup to determine the name of the encrypted Logical Volume backing the encrypted partition:
```
# cryptsetup status /dev/mapper/cleartext-media
/dev/mapper/cleartext-media is active and is in use.
  type:    LUKS1
  cipher:  [...]
  keysize: [...]
  device:  /dev/mapper/system-encrypted--media
  offset:  4096 sectors
  size:    [...] sectors
  mode:    read/write
  flags:
```
From this output, you can tell that /dev/mapper/cleartext-media is the cleartext version of the /dev/mapper/system-encrypted--media, where system is the name of the Volume Group while encrypted-media is the name of the Logical Volume.
I checked my Volume Group to determine it had enough space:
```
# vgs
VG     #PV #LV #SN Attr   VSize      VFree
[...]
system   1   6   0 wz--n- 3Tb        3Tb
[...]
```
Which means: it has 3Tb of free space. If it did not have enough space, I would have had to shrink another Logical Volume, or add a new Physical Volume to the Volume Group, with pvcreate and vgextend.

I extended the size of the Logical Volume by 1Tb:

# lvextend -L +1T /dev/system/encrypted-media
  Size of logical volume storage/encrypted-media changed from 2.00 TiB (524288 extents) to 3.00 TiB (786432 extents).
  Logical volume encrypted-media successfully resized

Used the lvs command before and after to check the size: it should show 1 additional Tb of space.

Told cryptsetup about the additional space being available, so it could show the additional space in the cleartext version of the volume:
```
# cryptsetup resize /dev/mapper/cleartext-media
```

Resized the file system on top of the dm-crypt volume, with:

# resize2fs -p /dev/mapper/cleartext-media
resize2fs 1.42.12 (29-Aug-2014)
Filesystem at /dev/mapper/cleartext-media is mounted on /opt/media; on-line resizing required
old_desc_blocks = 128, new_desc_blocks = 192

The filesystem on /dev/mapper/cleartext-media is now 805305984 (4k) blocks long.

Running the command dmesg also showed a resize was happening:

[1874383.459856] EXT4-fs (dm-14): resized to 784695296 blocks
[1874393.519284] EXT4-fs (dm-14): resized to 787283968 blocks
[1874403.551540] EXT4-fs (dm-14): resized to 790003712 blocks
[1874413.643705] EXT4-fs (dm-14): resized to 792625152 blocks
[1874423.645100] EXT4-fs (dm-14): resized to 795246592 blocks
[1874443.269091] EXT4-fs (dm-14): resized to 795869184 blocks
[1874453.300005] EXT4-fs (dm-14): resized to 798588928 blocks
[1874463.352083] EXT4-fs (dm-14): resized to 801275904 blocks
[1874473.451900] EXT4-fs (dm-14): resized to 803897344 blocks
[1874478.593916] EXT4-fs (dm-14): resized filesystem to 805305984

Profit. You have successfully resized your volume.

Of course, it is strongly recommended you back up your data before starting. Note, however, that the entire process took no more than 10 minutes, and was generally painless.
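
If you want to sanity check the numbers printed by lvextend and resize2fs, the arithmetic is simple. The sketch below assumes 4 MiB extents (the LVM default, and consistent with the lvextend output above); check vgdisplay on your system, as the extent size can be configured differently.

```python
EXTENT_BYTES = 4 * 2**20   # assumption: 4 MiB physical extents (LVM default)
FS_BLOCK_BYTES = 4096      # resize2fs reports "(4k) blocks"
TIB = 2**40
MIB = 2**20


def extents_to_tib(extents):
    """Size in TiB of a Logical Volume made of this many extents."""
    return extents * EXTENT_BYTES / TIB


def fs_blocks_to_bytes(blocks):
    """Size in bytes of a filesystem made of this many 4k blocks."""
    return blocks * FS_BLOCK_BYTES


print(extents_to_tib(524288))   # 2.0, size before lvextend
print(extents_to_tib(786432))   # 3.0, size after lvextend

# The filesystem ends up slightly smaller than the Logical Volume.
shortfall_mib = (3 * TIB - fs_blocks_to_bytes(805305984)) / MIB
print(shortfall_mib)            # 1.5
```

The filesystem being a little smaller than the Logical Volume is expected: the cleartext device does not include the space at the start of the volume reserved for the encryption metadata.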

http://rabexc.org/posts/resizing-filesystem

Debian releases - stable, testing, unstable

May 13, 2017 Updated May 13, 2017

When talking about using Debian, one of the first objections people will raise is the fact that it only has "old packages", it is not updated often enough.

This is generally not true. Or well, "true" if you stick to the "stable" release of Debian, which might not be the right version for you.

Also, people don't often realize that it's easy to use more than one "release" on a given system. For example, you can configure apt-get to install "stable" packages by default, but allow you to do a manual override to install from "testing" or "unstable", or vice-versa.

Before starting, it's worth noting that this might not be for the faint of heart: mixing and matching debian releases is generally risky and discouraged. Why is this a problem? Well, it all has to do with dependencies, backward compatibility and the fact that they may not always be correctly tracked.

For example: let's say you install a cool new gnome application from unstable. This cool and new application depends on the latest icons, which apt also correctly installs. Now, in the new icons package, some old icons have been removed. Old applications using them will either need to be upgraded, or their icons break. This is generally handled correctly by apt-get dependencies, assuming the maintainer did a really good job tracking versions. But this is hard to do, and error prone at times. Worse can happen with C libraries or different GCC versions using different ABIs, or systemic changes like the introduction of systemd or similar.

Debian (and technology) has gotten much better over the years, and I must say this will work most of the time. Still, be careful: whenever upgrading, check what is being installed, and try to make a call on whether the risk is worth it.

But let's start from the beginning. In order to make a good choice, you need to understand how this all works.

Stable, Testing, Unstable and Experimental

So, let's talk about the lifecycle of a debian package. Let's say a new version of vim or git is released upstream. What happens next?

A Debian developer will likely notice. Either because he's monitoring their development or their web site, has set up automated scripts, or a user has opened a bug like 'please update vim!'.
The Debian developer will download the new version, build it, make sure his build scripts still work, and if they do, build a .deb package.
The Debian developer will upload the package to unstable. In unstable, it will immediately become available to anyone using that release.
After several days in unstable, and once a set of criteria has been met (like no new bugs filed of a certain priority, ...), an automated system moves it to testing.
Every once in a while, a group of Debian developers will decide to release a new version of Debian, a new stable. They will thus freeze the current testing archive, fix as many critical bugs as possible, and do a new release.
After a package makes it into stable, only security fixes and certain kinds of updates can be made.

What does this all mean?

stable changes rarely, as a result of a manual process. Software is old.
testing changes continuously, as a result of an automated process. Software is relatively recent, although it has baked in unstable for a few days, and proved to be stable enough for the automated systems to move it to testing.
unstable has all the bleeding edge stuff, freshly uploaded by their maintainers. Software here is generally very new, the only exception being software that is not well maintained, where the maintainer is busy doing something else. The drawback of unstable is well, that sometimes there is broken software, or packages that don't interact well with each other.

There's also a fourth release: experimental. This is used by Debian developers to push packages that are really not that ready to be used, experiments for brave people to try.

Choosing one, or mixing them all

One little known feature of apt-get, aptitude and all related tools is that it is relatively easy to mix and match packages from any release.

First, you have to configure your /etc/apt/sources.list correctly. Mine looks like this:

deb ftp://my.debian.mirror.org/debian/packages stable         main contrib non-free
deb ftp://my.debian.mirror.org/debian/security stable/updates main contrib non-free
deb ftp://my.debian.mirror.org/debian/packages testing         main contrib non-free
deb ftp://my.debian.mirror.org/debian/security testing/updates main contrib non-free
deb ftp://my.debian.mirror.org/debian/packages unstable         main contrib non-free

deb ftp://ftp.at.debian.org/debian/ testing main contrib non-free
deb ftp://ftp.at.debian.org/debian/ stable main contrib non-free
deb ftp://ftp.at.debian.org/debian/ unstable main contrib non-free
deb ftp://ftp.at.debian.org/debian/ experimental main contrib non-free

deb-src ftp://ftp.at.debian.org/debian/ testing main contrib non-free
deb-src ftp://ftp.at.debian.org/debian/ stable main contrib non-free
deb-src ftp://ftp.at.debian.org/debian/ unstable main contrib non-free
deb-src ftp://ftp.at.debian.org/debian/ experimental main contrib non-free

deb http://security.debian.org/ testing/updates main
deb http://security.debian.org/ stable/updates main

It might look confusing at first, but really, it's not that hard: it is telling apt-get to download the package indexes for all versions of debian - stable, testing, unstable, and even experimental.

If I were to run apt-get update and apt-get dist-upgrade after just setting this file up, apt-get would update my whole system to the unstable (or experimental) version of all packages. Which is not what I want.

I want to a) be able to pick per package, and b) have a reasonable default.

So the next thing I do, is create an /etc/apt/preferences file. In this file, I generally write something like:

Package: *
Pin: release a=stable
Pin-Priority: 800

Package: *
Pin: release a=testing
Pin-Priority: 950

Package: *
Pin: release a=unstable
Pin-Priority: 700

Package: *
Pin: release a=experimental
Pin-Priority: 500

This pretty much tells apt: "please, my kind apt friend, prefer packages in testing (highest priority). If you cannot find packages there, or cannot resolve dependencies there, fallback to stable first, unstable second, and experimental last". Of course you can change the priorities to pick whichever order you prefer. Just don't go above 1000, for a reason I will explain later.

Now, let's say I run:

apt-get install vim

apt-get will look for the latest vim in testing, and install it. If it cannot find it there - or any of the dependencies are not available there - it will look for packages in stable, unstable and experimental.
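
To make the selection rule concrete, here is a deliberately simplified model of it in Python. It only looks at the pin priority of each release that ships a package. The real apt also takes into account the installed version, version numbers and dependency resolution, so treat pick_release as a hypothetical illustration, not as a description of apt internals.

```python
# Pin priorities from the /etc/apt/preferences example above.
PRIORITIES = {
    "stable": 800,
    "testing": 950,
    "unstable": 700,
    "experimental": 500,
}


def pick_release(available_in, priorities=PRIORITIES):
    """Pick the release with the highest pin priority among those that
    ship the package. Returns None if no pinned release ships it."""
    candidates = [release for release in available_in if release in priorities]
    if not candidates:
        return None
    return max(candidates, key=priorities.get)


# A package available everywhere is taken from testing.
print(pick_release(["stable", "testing", "unstable", "experimental"]))
# A package that has not reached testing yet falls back to stable.
print(pick_release(["stable", "unstable"]))
# A package only uploaded to unstable and experimental comes from unstable.
print(pick_release(["unstable", "experimental"]))
```

To see what apt actually computed on a real system, apt-cache policy vim prints the candidate version along with the priority of each source.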

But there's now one more trick I can use: I can tell apt-get manually what to do!

For example, if I want to install the latest and coolest vim, that hasn't made it to testing yet, I can run:

# apt-get install -t unstable vim

or:

# apt-get install vim/unstable

The former will install vim, and all the other packages apt-get decides to install or upgrade, from unstable. The latter will install only vim from unstable, and get all other packages based on the preferences above.

Now, what happens when you update your system? Well, apt-get will not downgrade an installed version of a package unless its priority is above 1000, or you manually specified its version.

One more thing to note is that if you try to install something that only exists in unstable or experimental, apt-get will figure it out and try to do the right thing (well, most of the time).

Marking and holding, with apt-mark

There is one more trick I use. With apt-mark, you can tell all apt tools to not touch a package.

Let's say vim is critical to your business, and you want to install and hold onto the current stable version of it.

All you have to do is:

# apt-get install -t stable vim
# apt-mark hold vim

The second command will instruct apt-get to never automatically upgrade, remove, or install the package.

You can see all the 'holds' in place with:

# apt-mark showhold

and remove it with:

# apt-mark unhold vim

Similarly, you can mark a package as manually or automatically installed with:

# apt-mark auto screen
# apt-mark manual screen

The difference is subtle, but important. A manually installed package is a package you, as a user, care about. So apt will keep it up to date, and never remove it.

An automatically installed package, instead, is one that was installed because of a dependency. If that dependency is removed, or no longer necessary itself, the package might be removed during an upgrade.

When I play with new software, or try new packages, I generally mark them as installed automatically until I decide I want to keep using them. That way, they are likely to be removed as I update the system.

Conclusions

I have been using Debian since the late 90s. One thing I love about it is apt, and the ability to have a continuous update cycle. Rather than reinstalling the system every few years, or doing a giant upgrade every so often, I much prefer to stick to testing, and frequently run apt-get dist-upgrade to update the few packages that have changed since I last ran the command.

Small, frequent, updates that give me fairly up to date software, rather than large, rare, updates that give me out of date software. This has worked well for me for production servers as well, with very minor glitches every now and then.

http://rabexc.org/posts/apt-config

Analyzing user behaviors via traffic dumps despite encryption

May 1, 2017 Updated May 1, 2017

I have always heard that it was possible to guess the content of "HTTPS" requests based on traffic patterns. For example: a large long download from youtube.com, is, well, very likely to be a video :).

But how easy is it to go from a traffic dump, captured with something like tcpdump, to the list of pages visited by a user? Mind you: I'm talking about pages here - URLs, actual content. Not just domain names.

Turns out it's not that hard. Or at least, I was able to do just that within a few tries. All you need to do is crawl web sites to build "indexes of fingerprints", and well, match them to the https traffic.

In this post, I'd like to show you how some properties of HTTPS can easily be exploited to see the pages you visited and violate your privacy - or track your behavior - without actually deciphering the traffic.

The post also describes a naive implementation of a fingerprinting and matching algorithm that I used on a few experiments to go from traffic dump, to the list of full URLs visited.

Let me get this straight from the beginning: there is no new attack described in this document, no new discovery about TLS and its implementation. It's all about playing with information that has always been available in cleartext on TLS (or SSL) streams.

If you are curious about the conclusions, you can jump there and skip all the gory details of the article.

Note that after the recent news that the US Senate voted to kill various privacy rules, one widely given piece of advice to protect your privacy was to use HTTPS.

This is great advice: you should always use HTTPS. However, it is unlikely to achieve the goal you may think it achieves. For example, your ISP will still be able to see which sites and pages you visit most of the time, even if you use HTTPS and strong encryption.

Note also that the recipes in here apply to HTTP/1.0 and HTTP/1.1 only, not to HTTP/2.

The basic ingredients, or what you need to know

When your browser makes an HTTPS request, both your request and reply are encrypted.

However, there are a number of things that are not encrypted, and easily visible to an attacker through a traffic dump obtained with tools like tcpdump or wireshark.

From this data, and I'll show you examples briefly, one can easily tell:

the IP address of the server you connected to. This is basic TCP/IP: there must be a cleartext IP address in every packet your computer sends and receives for the packet to make it to the server and back to your computer.
the number of connections opened to the remote server, and which packet belongs to which connection. Again, this is basic TCP/IP, an attacker just needs to look at the port numbers, and know how TCP works.
the domain name of the server. This is generally in clear text in the HTTPS request thanks to SNI. It is important to allow the remote host to pick the correct certificate. If you use an old browser that does not support SNI, it is still very likely that your computer made a DNS request earlier with the name in cleartext.

the size of your request, and the size of the reply. This might be a bit more surprising, but no major browser enables pipelining over the HTTP protocol by default. This means that whenever the browser requests a page, it will wait for the reply to come back before sending another request over a given TCP connection. Sure, there are multiple TCP connections, but for one connection, there's always one - and only one - request in flight, and one reply.

If an attacker can observe the traffic (again, a traffic dump, a MITM proxy, ...), then it is trivial to write a couple lines of code that will count how many bytes were sent over a connection until a response starts, count the bytes of the response until the next request starts, and so on.

You can see an example of such code here. But once you remove the empty ACKs, this is what a tcpdump of an HTTPS session looks like (with some edits for readability and anonymity):

# TCP handshake: SYN, SYN & ACK, ACK
10.11.24.36.49302 > 192.168.0.1.443: Flags [S], seq 3076221335, win 29200, length 0
192.168.0.1.443 > 10.11.24.36.49302: Flags [S.], seq 2430086164, ack 3076221336, win 28960, length 0
10.11.24.36.49302 > 192.168.0.1.443: Flags [.], ack 1, win 229, length 0

# Browser starts TLS handshake. Sends a bunch of stuff, including the
# name of the host it is connecting to. Dump obtained with "-s0 -xX" flags.
10.11.24.36.49302 > 192.168.0.1.443: Flags [P.], seq 1:290, ack 1, win 229, length 289
       [...]
       0x00f0:  0018 7777 776d 7972 616e 6f6d 7465 7374  ..www.myrandomte
       0x0100:  7369 7465 6374 2e63 6f6d ff01 0001 0000  stsite.com......

# TLS handshake continues, server returns its certificate.
192.168.0.1.443 > 10.11.24.36.49302: Flags [.], seq 1:1449, ack 290, win 939, length 1448
192.168.0.1.443 > 10.11.24.36.49302: Flags [.], seq 1449:2897, ack 290, win 939, length 1448
192.168.0.1.443 > 10.11.24.36.49302: Flags [P.], seq 2897:4097, ack 290, win 939, length 1200
192.168.0.1.443 > 10.11.24.36.49302: Flags [.], seq 4097:5545, ack 290, win 939, length 1448
192.168.0.1.443 > 10.11.24.36.49302: Flags [P.], seq 5545:5871, ack 290, win 939, length 326

# TLS handshake still, client negotiates the keys to use.
10.11.24.36.49302 > 192.168.0.1.443: Flags [P.], seq 290:365, ack 5871, win 342, length 75
10.11.24.36.49302 > 192.168.0.1.443: Flags [P.], seq 365:416, ack 5871, win 342, length 51

# TLS handshake still, server pretty much says he's happy with the keys.
192.168.0.1.443 > 10.11.24.36.49302: Flags [P.], seq 5871:6113, ack 416, win 939, length 242

# YAY! HTTP Get request encrypted over TLS
10.11.24.36.49302 > 192.168.0.1.443: Flags [P.], seq 416:533, ack 6113, win 364, length 117

# And the web page from the server
192.168.0.1.443 > 10.11.24.36.49302: Flags [P.], seq 6113:7518, ack 533, win 939, length 1405
192.168.0.1.443 > 10.11.24.36.49302: Flags [.], seq 7518:8966, ack 533, win 939, length 1448
192.168.0.1.443 > 10.11.24.36.49302: Flags [.], seq 8966:10414, ack 533, win 939, length 1448
192.168.0.1.443 > 10.11.24.36.49302: Flags [.], seq 10414:11862, ack 533, win 939, length 1448

# Until pretty much the TLS connection is closed.
192.168.0.1.443 > 10.11.24.36.49302: Flags [P.], seq 33376:34223, ack 533, win 939, length 847
10.11.24.36.49302 > 192.168.0.1.443: Flags [P.], seq 533:564, ack 34223, win 817, length 31
192.168.0.1.443 > 10.11.24.36.49302: Flags [P.], seq 34223:34254, ack 564, win 939, length 31
192.168.0.1.443 > 10.11.24.36.49302: Flags [F.], seq 34254, ack 564, win 939, length 0
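
The byte counting described before the dump can be sketched in a few lines of Python. This is a minimal illustration and not the code linked above: split_exchanges is a hypothetical helper that takes (direction, length) pairs for a single TCP connection, "c" for client to server and "s" for server to client, and groups them into (bytes sent, bytes received) exchanges. The packet list is transcribed from the dump, with the packets elided there summed up from the sequence numbers.

```python
def split_exchanges(packets):
    """Group packets of one connection into (request_bytes, response_bytes).

    A new exchange starts when the client sends data after the server
    has replied. Empty packets (pure ACK, SYN, FIN) are ignored.
    """
    exchanges = []
    request = response = 0
    for direction, length in packets:
        if length == 0:
            continue
        if direction == "c":
            if response > 0:
                # Client talks again: the previous exchange is complete.
                exchanges.append((request, response))
                request = response = 0
            request += length
        else:
            response += length
    if request or response:
        exchanges.append((request, response))
    return exchanges


packets = [
    ("c", 0), ("s", 0), ("c", 0),                       # TCP handshake
    ("c", 289),                                         # TLS hello with host name
    ("s", 1448), ("s", 1448), ("s", 1200), ("s", 1448), ("s", 326),
    ("c", 75), ("c", 51),                               # key negotiation
    ("s", 242),
    ("c", 117),                                         # encrypted GET
    ("s", 1405), ("s", 1448), ("s", 1448), ("s", 1448),
    ("s", 21514),   # packets elided in the dump (seq 11862 to 33376)
    ("s", 847),
    ("c", 31), ("s", 31),                               # connection shutdown
    ("s", 0),                                           # FIN
]

print(split_exchanges(packets))
# [(289, 5870), (126, 242), (117, 28110), (31, 31)]
```

The third pair is the interesting one: a request of 117 bytes and a reply of 28110 bytes. Those lengths are TCP payload sizes, so they include the TLS framing on top of the encrypted HTTP data.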

Now, whenever your browser fetches a remote page and tries to display it to you, there are a few other things to keep in mind:

any web page is made by tens of resources: images, stylesheets, javascript, sometimes fetched from other remote sites (eg, bootstrap, google fonts, analytics, ads, ...).

Some of those resources will be cached by your browser, some won't.

For example: how often do you visit your bank web site? or your broker? Do you think the images/css/js are in your cache? Note that some web sites will instruct your browser not to cache the content, especially if any of those resources are dynamically generated.
some web pages fill themselves up with data retrieved via javascript, ajax is pretty common.

For example, let's visit https://www.duckduckgo.com, a very privacy conscious search engine. As you type in the search box you will see suggestions underneath.

Do you know how that works? It's quite simple: for every character you type in, a bit of javascript will send an HTTPS request to the duckduckgo servers for the letter you typed, and get back a reply with autocompletion suggestions. So every letter results in a request, and a reply.
Most users "click on links" to browse the internet. Generally, they start off a search engine, or a bookmarked page, or an initial url they remember. But from then on, they keep moving around the web site by clicking links. This is important in the next few sections.

How do we put all of this together

So, where am I going with all of this? How do I glue all of this together?

Well, the basic idea is really simple: if you forget about caching, if you fetch the same HTTPS static resources twice you'll see two requests for the same page which have the same size, and receive two responses of roughly the same size, and roughly fetch the same set of resources the page is made of (eg, css, js, ...).

When I say "roughly", I actually mean "almost the same" under many circumstances, if you take into account a few glitches. Of course real life is not that simple, but we'll get to that later.

All of this in turn means that if you account for those "facts of life", maybe, and I say maybe, an attacker (or an ISP...) can:

Record your traffic. As a starting point, the size of each request and reply, some timing, and some information that can be extracted from TLS headers. Basically, a fancier, better written, and faster version of this. Note that any ISP can do this easily.
Browse the same HTTPS domains you visited with some sort of crawler, and for each page build some sort of fingerprint based on the request and response size, which other resources each page requires, and so on.
Go over the collected requests and replies, and try to match the fingerprints identified to those collected, to determine which sites you visited, and what you did on those sites.
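
In its most naive form, the matching in the last step could look like the sketch below. The index content, the URLs and the match_fingerprint helper are all made up for illustration; a real fingerprint would also include the resources each page loads, as described above.

```python
def match_fingerprint(observed, index, tolerance=0):
    """Return the URLs whose crawled sizes are within `tolerance` bytes
    of the observed (request_size, response_size) pair."""
    request, response = observed
    return sorted(
        url
        for url, (crawled_request, crawled_response) in index.items()
        if abs(crawled_request - request) <= tolerance
        and abs(crawled_response - response) <= tolerance
    )


# Hypothetical index built by a crawler: url -> (request size, response size).
index = {
    "/login": (117, 28110),
    "/account": (121, 30542),
    "/transfer": (123, 28114),
}

print(match_fingerprint((117, 28110), index))               # ['/login']
print(match_fingerprint((119, 28112), index, tolerance=4))  # ['/login', '/transfer']
```

The second call shows the trade-off: widening the tolerance makes the match robust to small variations, but also returns more candidates that need to be told apart some other way.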

Of those steps, we established that 1 is simple to do. A very fast version in C / C++ to run on an embedded device to produce a "summary" to send to some "collection and processing pipeline" would probably take less than a day of work to write by an experienced programmer.

Building a crawler, like the one in 2, also does not sound that hard to do. It's been done before, hasn't it? Note also that an attacker doesn't have to crawl the entire internet beforehand. As long as the content has not changed, he can do so whenever, based on the observed domains you browsed.

An entrepreneurial kind of attacker could also build an index and sell it as a service. He could even sell packages with different kinds of indexes based on which behaviors you want to track of your users. For example: want to know what your users are doing financially? Here's an index of popular banks and financial sites. Want to know their sexual preferences? Here's an index of popular sites for dating, forums, or with sexual content. And so on.

The hard part seems to be mostly in 3: matching the fingerprint (and of course, building it). Or is it that hard? Let's start talking about the facts of life we temporarily set aside earlier.

The facts of life

There are several problems with using the size of a request and/or the size of a reply as a fingerprint:

The size must be unique. Well, unique enough. Is it? Well, it depends on the site. A catalog, for example, where every page is exactly the same except for some description and has millions of entries won't work well. But by crawling a bunch of web sites (mostly financial institutions, but also a dating site, and an airline) I found that > 80% of pages have sizes that are different by at least 4 bytes, while only < 3% have exactly the same size. This, after manually adjusting for things like tracking images or some internal urls that really nobody would care about being able to tell apart in HTTPS traffic.

I also tried crawling under different usernames some parts of the sites that require registration. The findings are similar: although there are small variations in size due to dynamic content (things like "hello Mr. Doe", or "your total is $0.0 vs $10.0"), the sizes are highly distinctive of which parts of the web interface you are browsing.

These statistics mean that if an attacker could see the exact size of the reply, and had a reasonable way to match it, by just looking at that size he would be able to guess the page correctly > 90% of the time.

Unfortunately, this is not always the case. But if we combine this with matching the request size as well, and being smart about which resources can be navigated to at any given time, the confidence of guessing correctly increases significantly. However...
Some encryption schemes will round the sizes by adding some padding. Block ciphers (in CBC mode, for example) will, with a typical rounding of 16 bytes (128 bit blocks). Stream-like modes, such as AES in GCM mode (one of the most common), will not: they will just add a constant to the size, SIZE = LEN(CLEARTEXT) + K, with K = 16, for AES-GCM, generally.
Some standard features of the HTTP protocol will actually mess with the size. And this is probably the hardest problem to solve.
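
The effect of the encryption scheme on the observed size can be modeled with a few lines of Python. This is a simplified model that follows the two cases in the list: it ignores MACs, explicit IVs and TLS record headers, and real TLS padding rules are a bit stricter than plain rounding. All the names are hypothetical.

```python
BLOCK = 16   # bytes, 128 bit blocks
GCM_K = 16   # constant overhead assumed in the text for AES-GCM


def padded_size(cleartext_len, block=BLOCK):
    """Observed size when the cleartext is rounded up to whole blocks."""
    return -(-cleartext_len // block) * block


def gcm_size(cleartext_len, k=GCM_K):
    """Observed size with a constant overhead: LEN(CLEARTEXT) + K."""
    return cleartext_len + k


def cleartext_candidates(observed_len, scheme):
    """Range of cleartext sizes compatible with an observed size."""
    if scheme == "gcm":
        return range(observed_len - GCM_K, observed_len - GCM_K + 1)
    if scheme == "block":
        return range(observed_len - BLOCK + 1, observed_len + 1)
    raise ValueError(f"unknown scheme: {scheme}")


print(padded_size(100))                            # 112
print(gcm_size(100))                               # 116
print(list(cleartext_candidates(116, "gcm")))      # [100]
print(len(cleartext_candidates(112, "block")))     # 16
```

With a constant overhead the attacker recovers the exact cleartext size, while with block rounding there are 16 possible sizes to consider for each observation.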

So, let's go over those HTTP features that mess up my grand scheme to snoop on your HTTPS transactions.

Chunked encoding - reply size

First, we have a problem with chunked encoding. Well, sometimes we do.

Let me explain. With old HTTP/1.0, or before chunked encoding was introduced, HTTP was a simple "open connection", "send request", "get reply back", "close connection", "open new connection", "send new request", and so on.

At a certain point, someone figured out this was expensive, so they looked for a way to reuse the same connection for multiple requests. The main problem was figuring out when a reply ended. So they introduced chunked encoding. With this encoding, a reply is made of a few bytes that indicate how big the "next chunk is", followed by that many bytes of reply itself, then another few bytes for the next chunk. If the size of the chunk is 0, the reply is complete.

For example, to return the string "Hello world", a chunked reply with 2 chunks could look like: {5}Hello{6} world{0}.

The problem this poses is that the encoding is encrypted, so the number of chunks returned and the size markers themselves are not visible. Each size marker takes only a few bytes (the chunk length written in hexadecimal, plus CRLF delimiters), and the number of chunks can be a constant, or change all the time depending on the server implementation.

However, many web servers or HTTP frameworks just return the whole reply as a single chunk.

The reason is simple: HTTP requires that cookies and errors must be returned in headers, before anything else. So if you have php / node / java / ... code generating the content, and your framework or web server wants to support setting a cookie or returning an error at any time, it must buffer the whole reply, and flush the buffer only when it is sure no error or cookie can be set anymore. Once we have the whole reply in a buffer, it is cheaper to just send a single chunk.

Reverse proxies, though, don't have this problem. Quite the opposite: they have an incentive to use small buffers to reduce memory consumption and latency. Depending on the implementation and configuration, and on network or backend load conditions, you may end up with a varying number of chunks - different for each reply.

If I had to guess, most web servers end up returning a single chunk. Either way, it is easy to detect the different behaviors at crawling time, and look at how many chunks are returned on average for replies.

Each chunk adds a few bytes of framing to the size, so our algorithm will need to work on a range of sizes.
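
To get a feel for how much chunking shifts the observed size, here is a small encoder following the HTTP/1.1 framing: each chunk is its length in hexadecimal, CRLF, the data, CRLF, with a zero length chunk at the end. The {5}Hello{6} notation used earlier is a simplification of this. encode_chunked is a hypothetical helper that ignores trailers and chunk extensions.

```python
def encode_chunked(parts):
    """Encode a list of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # an empty chunk would terminate the body early
        out += b"%x\r\n" % len(part)
        out += part
        out += b"\r\n"
    out += b"0\r\n\r\n"  # final, zero length chunk
    return bytes(out)


body = b"Hello world"
one_chunk = encode_chunked([body])
two_chunks = encode_chunked([b"Hello", b" world"])

print(two_chunks)
# b'5\r\nHello\r\n6\r\n world\r\n0\r\n\r\n'
print(len(one_chunk) - len(body))    # 10 bytes of framing
print(len(two_chunks) - len(body))   # 15 bytes of framing
```

Same content, different size on the wire: this is why a crawler needs to record how a server splits its replies, and why the matching has to tolerate a range.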

Second, we have a problem with cookies: they can affect both the request and the response size.

On the request side, the browser might be passing a cookie. On the response side, the server might be setting (or changing) a cookie.

There are two common ways cookies are used:

They maintain a unique identifier, used to retrieve (and store) information related to the user.
They maintain state, often signed and encrypted, to prevent tampering, that the web site uses to determine where you stand in a complex interaction.

Either way, cookies can be observed while crawling. The first case is generally simple: your browser will get a cookie once, and keep returning it. So it's a constant added to every request size to the server, and to responses setting it.

The second case is more complex: the size of the request and reply will keep varying based on the state. A few crawling sessions could be used to narrow down a range of minimum cookie size and maximum, which would once again give us a range, and be enough for the algorithm described.
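
As a sketch of how a few crawling sessions turn into a range: given the size of a request without cookies and the cookie values seen while crawling, the expected request size falls between the smallest and the largest Cookie header observed. The helper name and the sample values are made up for illustration.

```python
def request_size_range(base_size, cookie_values):
    """Return (min, max) request size given the cookie values observed
    over several crawling sessions. base_size excludes the Cookie header."""
    header_sizes = [
        len("Cookie: ") + len(value) + len("\r\n") for value in cookie_values
    ]
    return base_size + min(header_sizes), base_size + max(header_sizes)


# Hypothetical values collected in two crawling sessions.
observed_cookies = ["sid=abc123", "sid=abc123; state=xyz"]
print(request_size_range(300, observed_cookies))   # (320, 331)
```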

Referrer

Every time you click a link on a web page, the browser has to load the new page. When loading this new page, it generally passes down a "Referrer" header indicating the URL of the page that contained the link (well, depending on the referrer policy, and a few other rules).

This means that the size of the request to load a page changes based on the URL that contained the link the user clicked on.

If we use the request size to tell pages apart, well, this is a problem.

However, there are a few important things to note that actually help the steps described above:

When crawling a web site, the crawler knows both the pages containing links, and the pages linked. It can then compute a "normalized" request size, by removing the size of the referrer header, which becomes a simple property of an arc connecting the two pages in a graph.
If the algorithm to match pages works well, it generally knows the page visited before by the user, and can guess the length of the referrer header. Note that some web pages (notably, google) set an "origin" policy, so the "Referrer" header is always set to the domain name of the original site.

The problem is thus most visible on the first HTTP transaction tracked of a user, where the origin is unknown, and the content of the referrer header can't be guessed.

The algorithm described below can however take this into account, by broadening the range of sizes considered acceptable in those cases, or by only looking at the response size.
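As a rough illustration of the normalization described above, here is a minimal sketch. It assumes plain HTTP/1.1 headers (no header compression) and that the ciphertext size tracks the plaintext size closely enough for the subtraction to make sense; the function names are made up for this example. Note that on the wire the header is spelled "Referer".

```javascript
// Size in bytes of a "Referer: <url>\r\n" header line, or 0 if no referrer is sent.
function refererHeaderSize(url) {
  if (!url) return 0;
  return new TextEncoder().encode("Referer: " + url + "\r\n").length;
}

// Request size with the referrer header removed, so that the same page
// has the same size no matter which link led to it.
function normalizedRequestSize(observedSize, refererUrl) {
  return observedSize - refererHeaderSize(refererUrl);
}

// When the origin is unknown (first transaction tracked), compute the range
// of normalized sizes over every referrer considered plausible.
function normalizedRange(observedSize, candidateReferers) {
  const sizes = candidateReferers.map(function (url) {
    return normalizedRequestSize(observedSize, url);
  });
  return { min: Math.min(...sizes), max: Math.max(...sizes) };
}
```

Passing `null` among the candidates covers the case where the browser sends no referrer at all.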

Others

There are several other parameters that can affect the request size and response size, or what is observed on the wire.

These include different browsers sending different headers, negotiating gzip encoding, language settings, caching behaviors sending headers like 'If-Modified-Since', or CORS causing OPTIONS requests.

The web server can also have dynamic pages with advertisements or banners that are changed server side, or that react to any of the client provided headers.

My experiments here were very limited: just what I could simulate easily with the browsers on my laptop.

However, I believe that with relatively modest efforts it would be possible to add heuristics to take into account those differences, by, for example, crawling multiple times simulating different browser behaviors, increasing the size ranges accepted, and so on.

Algorithm

To take into account all of the above, the algorithm I used does something like this:

Collect traffic dumps. For each connection, extract the domain name of the server visited and size of ciphertext requests, size of ciphertext replies, and timestamp of when each started, and when each ended.
If the domain name has been crawled before, use the already generated index. If not, crawl the web site and generate the index.
The index is made of 4 main data structures:
1. A graph, with each node being a web resource (a .html page, a .css, a .js, and so on with relative URL), "load arcs" indicating which other resources it loaded, and "link arcs" indicating which other pages this page may lead to. For example, a "test.html" loading a "test.css" and "test.js" file would have 3 nodes, one per resource, and 2 "load arcs" going from "test.html" to "test.css" and "test.js". If the page linked to "foo.html" and "bar.html", it would additionally have 2 "link arcs" going to those pages.
  
  Each node also has a response size and request size range tied to it. For example, if "test.css" was retrieved from 3 different pages, and one of the responses or requests (adjusted for the referer) was larger, it should keep track of the range of sizes of "test.css".
2. A response range index, a range index using the response size as the key. For example, if the measured response size is 3756 bytes, and we estimate there might be an error of 16 bytes, it should return all nodes (from the graph above) that could fit in 3740 and 3772 bytes.
3. A request range index, a range index, same as above, but using the request size as the key. Note that the request size varies based on the referrer header, the size should be adjusted based on that (normalize at crawling, and adjust at matching based on the possible origin nodes).
4. (optional) A delta range index, a range index, using the delta between the size of the base request and the size of each derived request as the key, and the list of arcs that caused that delta as a value. For example, if a "test.html" request was 372 bytes, which resulted in a "test.css" request which was 392 bytes, the arc connecting the two would be indexed under the key 20 in the delta range index.
As the crawling happens, the crawler adjusts the range index used as the key above based on the referer, cookies set, behavior with chunked encoding, or the fact that different content is returned for the same url.
Sort the requests and responses by the time they started.
For each request in turn:
1. look up the response size in the response range index, and create an array of "possible response nodes". This might already yield a single result.
2. to verify the correctness of the guess, or to further narrow down a set of possibilities, lookup the estimate of the request size in the request range index, and create an array of "possible request nodes". Then, compute the intersection between "possible response nodes" and "possible request nodes".
3. repeat from 1 for the next request. To verify the correctness of the guess, or to further narrow it down, limit the set of nodes to those connected to the set of nodes computed from the previous request (recursively).
This should lead to the exact set of nodes visited. If not at the first request response observed, after a few iterations.
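The matching loop above can be sketched roughly as follows. This is not my original implementation, just a minimal illustration: the node layout (`respMin`, `respMax`, `reqMin`, `reqMax`, `links`) is made up for the example, the range lookup is a linear scan rather than a real range index, and the narrowing only looks one step back instead of recursing.

```javascript
// Return the ids of all nodes whose [min, max] size range, widened by
// `error` bytes on each side, contains the observed size.
function lookupRange(nodes, minKey, maxKey, size, error) {
  return new Set(
    nodes
      .filter(function (n) { return n[minKey] - error <= size && size <= n[maxKey] + error; })
      .map(function (n) { return n.id; })
  );
}

function intersect(a, b) {
  return new Set([...a].filter(function (x) { return b.has(x); }));
}

// observations: [{requestSize, responseSize}], already sorted by start time.
// Returns, for each observation, the sorted list of candidate node ids.
function matchSession(nodes, observations, error) {
  const byId = new Map(nodes.map(function (n) { return [n.id, n]; }));
  const steps = [];
  let previous = null;
  for (const obs of observations) {
    const byResponse = lookupRange(nodes, "respMin", "respMax", obs.responseSize, error);
    const byRequest = lookupRange(nodes, "reqMin", "reqMax", obs.requestSize, error);
    let candidates = intersect(byResponse, byRequest);
    if (previous !== null && previous.size > 0) {
      // Keep only nodes connected (load or link arc) to a previous candidate.
      const reachable = new Set();
      for (const id of previous) {
        for (const next of byId.get(id).links) reachable.add(next);
      }
      const narrowed = intersect(candidates, reachable);
      // If nothing is connected (e.g. the user typed a new URL), keep the wider set.
      if (narrowed.size > 0) candidates = narrowed;
    }
    steps.push([...candidates].sort());
    previous = candidates;
  }
  return steps;
}

// Hypothetical crawl results, loosely based on the "test.html" example above.
const sampleNodes = [
  { id: "test.html", respMin: 3756, respMax: 3756, reqMin: 372, reqMax: 372, links: ["test.css", "foo.html"] },
  { id: "other.html", respMin: 3760, respMax: 3760, reqMin: 372, reqMax: 380, links: [] },
  { id: "test.css", respMin: 900, respMax: 900, reqMin: 392, reqMax: 392, links: [] },
  { id: "bar.html", respMin: 905, respMax: 905, reqMin: 392, reqMax: 392, links: [] },
];
const sampleSteps = matchSession(
  sampleNodes,
  [{ requestSize: 372, responseSize: 3758 }, { requestSize: 392, responseSize: 903 }],
  16
);
console.log(JSON.stringify(sampleSteps));
```

In this toy data the first observation is ambiguous between two pages, while the second is pinned to "test.css" because only "test.html" has an arc leading to a node of that size.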

Note also that if the browser request size is significantly different from that used by the crawler, the algorithm can use the delta range index instead. Eg, looking up the delta between subsequent requests. It should also be relatively straightforward to assume a + or - K error on the measures, and either increase the set of nodes, or backtrack the algorithm to add (or remove a constant) and verify if a graph can be built then.
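A minimal sketch of the delta range index, under the assumption that each arc recorded by the crawler carries the size of the base request and of the derived request (the field names are hypothetical). The point is that the delta stays the same even when a different browser adds a constant number of bytes to every request.

```javascript
// arcs: [{from, to, baseRequestSize, derivedRequestSize}]
// Returns a Map from delta (in bytes) to the list of arcs with that delta.
function buildDeltaIndex(arcs) {
  const index = new Map();
  for (const arc of arcs) {
    const delta = arc.derivedRequestSize - arc.baseRequestSize;
    if (!index.has(delta)) index.set(delta, []);
    index.get(delta).push(arc);
  }
  return index;
}

// All arcs whose delta is within +/- error bytes of the observed delta.
function lookupDelta(index, delta, error) {
  const found = [];
  for (let d = delta - error; d <= delta + error; d++) {
    found.push(...(index.get(d) || []));
  }
  return found;
}

const sampleDeltaIndex = buildDeltaIndex([
  { from: "test.html", to: "test.css", baseRequestSize: 372, derivedRequestSize: 392 },
  { from: "test.html", to: "test.js", baseRequestSize: 372, derivedRequestSize: 390 },
]);
// A browser whose requests are all 38 bytes larger still produces delta 20.
console.log(lookupDelta(sampleDeltaIndex, 430 - 410, 0).map(function (a) { return a.to; }));
```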

Some fun with autocomplete

Note that this algorithm also works well when trying to understand what a user is typing in an input form, where ajax is used for autocomplete.

Those sites are relatively simple: as the user types in more characters, more HTTPS requests are sent to the server to narrow down the list of possible responses.

Unfortunately, the same approach I described before can be used: by measuring the request size and response size, and by querying ("crawling") the web site once by myself and registering the request and response size, the algorithm could successfully guess the string that was being looked up.

I tried two web sites: on one, users were supposed to enter the source and destination airport, where whatever the user typed was the prefix of either the city or airport code. On the second web site, the autocomplete was used to fill in the address.

What I did, in both cases:

Created a list of possible prefixes, sorted from most likely to least likely. For example, I got the full list of airports worldwide, the list of addresses of a large city, and created a list of likely prefixes.
Queried the server exactly as if the browser was doing it, for the first letter only, capturing request and response size.
Matched the size of the requests and responses seen for the user to the ones I queried myself, to guess the first letter.
Repeated the process for the second letter and so on, caching the results, so that following requests would be decoded correctly.
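The steps above can be sketched as follows, assuming the sizes for each prefix were already collected by querying the site and stored in a table (a `Map` from prefix to sizes; the names and numbers here are invented for the example, not real measurements).

```javascript
// Extend every current guess by one character, keeping only the extensions
// whose recorded sizes match the observed request/response sizes.
function extendGuesses(guesses, table, observed, alphabet, error) {
  const next = [];
  for (const prefix of guesses) {
    for (const ch of alphabet) {
      const recorded = table.get(prefix + ch);
      if (!recorded) continue;
      if (Math.abs(recorded.requestSize - observed.requestSize) <= error &&
          Math.abs(recorded.responseSize - observed.responseSize) <= error) {
        next.push(prefix + ch);
      }
    }
  }
  return next;
}

// observations: one {requestSize, responseSize} per autocomplete request, in order.
function guessTyped(table, observations, alphabet, error) {
  let guesses = [""];
  for (const observed of observations) {
    guesses = extendGuesses(guesses, table, observed, alphabet, error);
  }
  return guesses;
}

// Hypothetical measurements for a few airport-code prefixes.
const sampleTable = new Map([
  ["f", { requestSize: 100, responseSize: 500 }],
  ["r", { requestSize: 100, responseSize: 520 }],
  ["fc", { requestSize: 101, responseSize: 300 }],
  ["fi", { requestSize: 101, responseSize: 310 }],
  ["ro", { requestSize: 101, responseSize: 300 }],
]);
const sampleGuess = guessTyped(
  sampleTable,
  [{ requestSize: 100, responseSize: 500 }, { requestSize: 101, responseSize: 300 }],
  "abcdefghijklmnopqrstuvwxyz",
  0
);
console.log(sampleGuess);
```

Here "ro" has the same sizes as "fc" on the second request, but it is ruled out because the first request already narrowed the first letter down to "f".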

Again, modulo a few glitches caused by "what happens in real life", it worked.

Conclusions

So, does the approach described here work well? Well, within a few experiments of tuning the algorithm, by "crawling" a small set of web sites and then analyzing the tcpdump of a random browsing session, I was able to tell with reasonable certainty that I had been looking on my bank's web site for a mortgage, and tried to sell stocks on my trading account.

I was also able to guess which destination I was buying an airplane ticket for with the autocomplete trick I described before, or which category of images I was browsing in a couple random web sites I tried.

Not all pages could be guessed due to the facts of life I explained earlier, and this was an extremely limited experiment with a very naive and rough implementation. Still, I am pretty sure that things can be improved by using more advanced data mining algorithms, heuristics, a better crawler, and in general by spending more time on this.

Which is all kind of scary, and brings up the next question.

What can I really do to protect my privacy? I don't have great advice at this point. You should keep using HTTPS as much as possible, that's still great advice. Note that your credit card number or SSN are still secure: what you type cannot be deciphered with the approach I described, an attacker can only make informed guesses. However, you should assume that an attacker knows exactly which pages you are visiting, how long you spend on them and how you got there - despite HTTPS.

The general advice is to use a VPN to increase your privacy, but I am personally not very fond of this suggestion.

On one side, it makes it very easy to tell that you are purposely trying to hide your traffic. On the other, it does not really solve the problem. The VPN provider himself could be profiting from your data: they have your credit card on file, they know the IP you are coming from, they see your traffic going out, they even know you are a person that for some reason is willing to pay to protect his own privacy, and they might just be profiling you and selling your data just like your ISP.

And once we established that even with VPNs there's nothing technically protecting your privacy, the only thing that's left is the user agreement you "signed", which I am sure you read, right? And you are sure it's better than the one you signed with your ISP?

If I was in charge of running any sort of law enforcement or spy agency, I'd probably monitor traffic from/to VPN providers more closely than that of any other random ISP.

The next suggestion would be "go for tor!", which I sometimes use. But it is often slow, and certainly does not give you the best browsing experience, especially if you disable javascript and cookies, and are seriously protecting your privacy.

Some web sites, like duck duck go (and kudos to their developers) return X-Something headers filled with random data to break the autocomplete analysis I described.

It should be trivial to create a browser extension to add random data in request headers, but preventing reply analysis requires work on the server side.
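To give an idea of the request side, here is a minimal sketch of the padding itself. It only builds a header of random length; hooking it into a real browser extension is left out, the `X-Padding` name is invented for this example, and `Math.random` is not a cryptographically secure source, so treat it as an illustration only.

```javascript
// Build a header whose value has a random length between 0 and maxBytes,
// so that the request size no longer identifies the request.
// `random` can be replaced for testing; it must return a number in [0, 1).
function randomPaddingHeader(maxBytes, random) {
  random = random || Math.random;
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  const length = Math.floor(random() * (maxBytes + 1));
  let value = "";
  for (let i = 0; i < length; i++) {
    value += alphabet[Math.floor(random() * alphabet.length)];
  }
  return { name: "X-Padding", value: value };
}
```

The padding range needs to be at least as wide as the size differences between the requests you want to hide, otherwise the ranges described earlier would still tell them apart.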

http://rabexc.org/posts/guessing-tls-pages

From PDF to interactive map

Nov 30, 2016 Updated Nov 30, 2016

Let's say you are thinking about moving to Rome in the near future.

Let's say you have family, and you want to find all daycares within 30 mins by public transport of your prospective new house.

Or maybe you want to find a house that's near a daycare, which in turn should be within 30 mins to your workplace.

In the past, I would have done this manually: find list of day cares, look at a map, check workplace, apartment, eventually find something that works.

But with a little javascript, some scripting skills, and a couple hours to spare, it turns out that this sort of problem is really easy to solve by using public APIs, and a little work.

Before I get started, and only if you are curious, you can see the outcome here http://rome.rabexc.org. The source code can also be found on github, in this repository.

This article can serve as a very quick start and brief introduction to Google Maps APIs.

Extracting the data

The first step for me was finding the data: the list of all daycares / pre-schools in Rome.

Turns out that Google maps, yelp, or the usual suspects don't have very good data about Rome: if you just search for "scuola dell'infanzia", "asilo" or similar, you will only get a handful of results.

However, it turns out that usrlazio.it, the regional entity responsible for licensing schools, maintains some relatively good lists (this one, for example).

The bad news? The lists are provided as PDF files, containing tables, that turn out to be fairly hard to parse automatically.

I was hoping that tools like pdftotext and pdftohtml from poppler-utils would produce a simple text file, one record per line, and some amount of spaces separating fields. Instead, the vertical centering of the text in cells caused the tools to be really confused, having records take multiple lines, and no real way to easily tell which line belonged to which record.

Fortunately, I found tabula-java, a simple tool written for the specific purpose of extracting tables out of PDF files. After a few attempts, running something like:

java -jar ./tabula-0.9.1-jar-with-dependencies.jar -r --pages all --guess ../data/ELENCONONPARITARIELAZIO2016_2017.pdf

gave me a nice csv file, that from manual inspection, looked mostly correct. Converting all the pdfs was then as easy as running:

for file in *.pdf; do
  # Quote the file names, so that names with spaces don't break the loop.
  java -jar ../tools/tabula-0.9.1-jar-with-dependencies.jar -r --pages all --guess "$file" > "${file%%.pdf}.csv";
done

Geolocating the schools

The next step was to turn all the .csv files, in different formats, into a uniform format with the list of schools and their geographic coordinates. So:

Parse the CSV: easy - pretty much in any language. I picked golang, just because it is one of my favourite languages lately.
Turn an address typed by a human being into coordinates: easy as well - just use the google maps geocoding APIs. I started off with this example, and modified it to suit my needs.
Generate a JSON file to consume from a web site: easy as well. Really, not much to say here. My final json file can be found here.

Within about 1 hour of work, I had the list of schools, with latitude and longitude associated.

The only tricky part, in all fairness, was obtaining an API key from Google I could use, and opening it up so it could be used from my laptop. Not that hard, though: you can manage your API keys from this console, and generate new ones by following one of the thousands "GET A KEY" links in pretty much any of the Google Maps APIs tutorials.

Drawing a map

The next step was drawing a map, and showing some points. I started with something really simple:

<!doctype html>
<html lang="en">
  <head>
    <title>Test</title>

    <style>
       /* important! how big is the map? 400px high, 100% wide */
       #map {
        height: 400px;
        width: 100%;
       }
    </style>
  </head>
  <body>
    <!-- where the map will be! -->
    <div id="map"></div>

    <script>
    // creates a map, centered on rome
    function initMap() {
      var rome = {lat: 41.9028, lng: 12.4964};
      var map = new google.maps.Map(document.getElementById('map'), {
        zoom: 12,
        center: rome
      });
    }
    </script>

    <!-- actually loads the maps APIs -->
    <script async defer src="https://maps.googleapis.com/maps/api/js?key=YOUR_KEY&libraries=places&callback=initMap"></script>
  </body>
</html>

Which pretty much displayed a simple map, centered on Rome. And moved on from there:

Add a marker on the map? Really easy:

 var coordinates = {lat, lng};
 var marker = new google.maps.Marker({
    position: coordinates,
    title: "My cool marker",
    map: map,
 });

Hide the marker?
```
marker.setMap(null);
```
Show it again?
```
marker.setMap(map);
```

Make it clickable? So a nice pop up would show details about the school?

// Named infoWindow rather than window, to avoid clashing with the
// browser's global window object.
var infoWindow = new google.maps.InfoWindow({content: "<b>Arbitrary HTML HERE</b>"});
marker.addListener('click', function() {
  infoWindow.open(map, marker);
});

Automatically close a pop up when another marker was clicked on?

var nowOpen = null;
[...]
marker.addListener('click', function() {
  if (nowOpen) nowOpen.close();
  infoWindow.open(map, marker);
  nowOpen = infoWindow;
});

This pretty much got me a static map with all the schools in less than 30 minutes of work.

Picking a start address

Now I wanted to be able to 1) pick an address, and 2) only show the schools that were reachable by public transport within a given time from that address.

I started by adding two input boxes in the HTML:

One to specify a time, in minutes.
One to select an address, a starting point.

For this second box, I wanted to have an address autocomplete that worked well. Once again, it was really easy to do:

// Find the input box autocomplete should work on.
var input = $("#address-from")[0];
// Attach the autocomplete functionality to it.
var autocomplete = new google.maps.places.Autocomplete(input);
// Limit the search to the map on the screen.
autocomplete.bindTo("bounds", map);

and then:

autocomplete.getPlace();

would just return the coordinates of the selected place.

Filtering the schools by distance

The filtering part was actually the hardest. Turns out that there is a very simple API (distancematrix) to fetch the distance between multiple addresses with Google Maps. However:

The API is limited to about 25 origins per request. I had ~700 schools.
The API is limited to a few requests per second, unless you pay for the API. I did not want to pay for a toy project.

What I ended up with works well and it is still not that hard:

Query the distance for the first 25 schools.
Wait some time. Repeat for the next 25 schools.
If an error is returned, retry the same schools after waiting some time.

Something like this turned out to work pretty well:

// List of the coordinates of the schools.
var schools = [ ... {lat, lng}, {lat, lng}, ...];
// Prepare to use distancematrix API.
var service = new google.maps.DistanceMatrixService();

// Computes the distance for the 25 schools starting at
// offset start, and then the next 25, and so on.
var filterSome = function (start) {
  if (start >= schools.length) return;

  service.getDistanceMatrix({
    origins: schools.slice(start, start + 25),
    destinations: [autocomplete.getPlace().geometry.location],
    travelMode: 'TRANSIT',
  }, function(response, status) {
    // Retry same set of schools in 1 second if an error was received.
    if (status == "OVER_QUERY_LIMIT") {
      setTimeout(filterSome, 1000, start);
      return;
    }

    $.each(response.rows, function (key, value) {
      // [...] marker.setMap(null) or marker.setMap(map) to hide/show
      // schools, update a progress bar, add some text to the school
      // description.
    });

    // Move to the next set of schools if distance was computed correctly.
    setTimeout(filterSome, 1000, start + 25);
  });
};

// Start filtering schools from the beginning.
filterSome(0);

If you look at the html of the page at rome.rabexc.org, you can see that the code is a bit more complex, but not that much.

The main reason for the difference is that I wanted to display a progress bar, as it would take about 30 seconds for the filtering to complete, and well, there's actually code to compute the time, and display or hide points accordingly.
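The part that decides which schools to show can be sketched like this. It is a simplified version, not the exact code on the site: it assumes a single destination per request (as in the snippet above), so each row has exactly one element, and it relies on the distancematrix response reporting `duration.value` in seconds. The function name is made up.

```javascript
// Given a distancematrix response for the schools starting at offset `start`,
// return which school indexes are reachable within maxMinutes.
function reachableWithin(response, start, maxMinutes) {
  return response.rows.map(function (row, i) {
    var element = row.elements[0]; // single destination: the chosen address
    // Elements with no transit route (e.g. ZERO_RESULTS) carry no duration.
    var reachable = element.status === "OK" &&
        element.duration.value <= maxMinutes * 60;
    return { index: start + i, reachable: reachable };
  });
}
```

Each result can then drive `marker.setMap(map)` or `marker.setMap(null)` for the corresponding school.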

Conclusions

Doing something like this 10 years ago would have probably been hard or unfeasible from the comfort of my home. Just finding good geolocation and route calculation APIs with public transports and related data would have been both hard and expensive.

As of today, I managed to put something together that is well usable in the time it would have taken to watch a movie, which hopefully will save me at least a few hours of planning, as we go through the list of apartments and actually find a daycare / pre-school we like.

The best part? The list from usrlazio.it is showing about 600 - 700 schools, against the 10s I could find on maps / bing / google and similar. By manually checking some of them, it seems like they are indeed pre-schools and daycares, even though the information is often buried deep in their web sites (eg, elementary school that also provides pre-school, or religious establishment, ...).

The worst part? That same list seems manually maintained. The addresses, school names, and so on contain several errors or typos. Sometimes the address is a legal address, rather than where the school actually is, and sometimes the Google maps APIs could not figure out the typos or the correct address based on the text in the .pdf.

By skimming through the points and by double checking the addresses, I would say about 5% of the records have something wrong. The pop-up, though, is now showing all the data from the .pdf, and linking to a Google search. So it is pretty easy to sift through the errors.

http://rabexc.org/posts/rome-maps

Using CLANG to generate HTML files

May 17, 2016 Updated May 17, 2016

Did you know that you can generate a nicely formatted HTML file from your source code with clang?

I just noticed this by peeking in the source code, took me a few attempts to get the command line right, but here it is:

clang -S -Xclang -emit-html test.c -o test.html

Which will create a colorful and nicely formatted version of your .c file, which you can see here.

screenshot of nicely formatted C file

The only annoyance? Not surprisingly, it will fail if the file syntax is invalid, or it can't be parsed correctly. You should probably pass all the same options as if you were compiling it for real.

http://rabexc.org/posts/clang-html

SSL Certificates, Debian, and Java

Dec 10, 2015 Updated Dec 10, 2015

Recently, I tried to run a Java application on my Debian workstation that needed to establish SSL / HTTPS connections.

But... as soon as a connection was attempted, the application failed with an ugly stack trace:

ValidatorException: No trusted certificate found
sun.security.validator.ValidatorException: No trusted certificate found
        at net.filebot.web.WebRequest.fetch(WebRequest.java:123)
          at net.filebot.web.WebRequest.fetchIfModified(WebRequest.java:101)
          at net.filebot.web.CachedResource.fetchData(CachedResource.java:28)
          at net.filebot.web.CachedResource.fetchData(CachedResource.java:11)
          at net.filebot.web.AbstractCachedResource.fetch(AbstractCachedResource.java:137)
          at net.filebot.web.AbstractCachedResource.get(AbstractCachedResource.java:82)
          at net.filebot.cli.ArgumentProcessor$DefaultScriptProvider.fetchScript(ArgumentProcessor.java:210)
          at net.filebot.cli.ScriptShell.runScript(ScriptShell.java:82)
          at net.filebot.cli.ArgumentProcessor.process(ArgumentProcessor.java:116)
          at net.filebot.Main.main(Main.java:169)
Failure (°_°)

First attempts at solving the problem were trivial: install all trusted SSL certificates on the Debian box.

apt-get install ca-certificates-java
apt-get install ca-certificates

This did not help, though: turns out that ca-certificates-java installs a script /etc/ca-certificates/update.d/jks-keystore that whenever ca-certificates is updated, re-generates the java certificates. Given that ca-certificates was already installed on my system, the newly installed script was not invoked (or it did not work properly? See below). Fail. A simple apt-get install --reinstall ca-certificates seemed to run the script, and create the file.

However, the stack trace did not go away: the application kept failing. As I know nothing about how java certs are handled in debian, I ran a simple strace of the java interpreter running my application. A long shot, but grepping for open in the strace.log file and looking for files that seemed relevant, showed that java was looking (among many other files) for /etc/ssl/certs/java/cacerts, which contained very little (and was nonexistent before installing ca-certificates-java).

By quickly poking at /etc/ca-certificates/update.d/jks-keystore, a few things looked fishy, but I did not spend time debugging it.

Rather, I tried to add the certificate manually as a first step to see if it would fix things.

I went to chromium, opened the https site the Java app was trying to access, looked at the root authority, looked for the corresponding file in /etc/ssl/certs/ (a simple ls |grep -i authorityname), and then ran the commands:

# cd /etc/ssl/certs
# openssl x509 -outform der -in DigiCert_High_Assurance_EV_Root_CA.pem -out /tmp/certificate.der
# keytool -import -alias DigiCert -keystore ./java/cacerts -file /tmp/certificate.der

When prompted for a password, it turns out the default keystore password is changeit, which also works on Debian.

Once this was done, the stack trace disappeared. Rather than debug jks-keystore, the fastest way to get the java app to run was to (sigh, sigh, sigh :-() add all root certificates to the java/cacerts file.

A simple for loop helped:

# cd /etc/ssl/certs
# for file in *.pem; do openssl x509 -outform der -in "$file" -out /tmp/certificate.der; keytool -import -alias "$file" -keystore ./java/cacerts -file /tmp/certificate.der -deststorepass changeit -noprompt; done;

which solved the problem. Now to file bugs...

http://rabexc.org/posts/certificates-not-working-java

Flying with babies, how we survived

Apr 18, 2015 Updated Apr 18, 2015

A bit more than a year ago I became the proud parent of the cutest little girl in the world.

Since we live abroad and travel often, the little one had to endure quite a few trips with us in her first year: west to east coast and back, with a road trip involving New York and Boston, a few trips to Europe and one trip to Hawaii. All spiced up with hours of driving and a few rides on trains, buses and even trams.

In this blog post I'd like to tell you about our experience flying internationally with a baby: what worked, what didn't, and the lessons we have learned.

Planning your trip... Documents and paperwork

If you are traveling internationally, your baby needs his/her own passport.

Getting a passport is easy: bring a picture, birth certificate, your ID (passport, in state driving license may be ok), your child, the other parent of your child, and all together go to the nearest passport agency. The all together part of the process is important: if your partner can't be there, you'll need more paperwork ahead of time, and baby must be with you.

Once there, you'll need to fill out a DS-11, read an oath, pay the fee (check or exact cash only in many agencies), sign some documents, and chit chat with the employee. If everything went well, you'll get your passport by mail a couple weeks later.

The hardest part for us was the picture: there are strict requirements (read online!), baby was just a few weeks old but still she had to have her eyes open with her neck straight, a normal looking expression, with a uniform background.

The shop we went to was not really equipped for very young babies, and laying her down on a white duvet only addressed a few of the problems.

She kept moving and giving us cute, funny and lovely expressions that the passport office would unfortunately not accept.

In the end, we gave up. Drove home, and took the picture ourselves: set the camera to 5 shots per second, took 100s of pictures until one was good enough, and printed it at the nearest Walgreens/CVS/Fedex/UPS/Whatever store.

Dual citizenship

If your child was born in the USA he/she will be an American citizen, and can have an American passport. If you are traveling back to your own home country, make sure your child has the right paperwork to stay and travel there: he/she may need a visa, or you may want to get the paperwork done to recognize his/her dual citizenship before traveling.

In our case, applying for dual citizenship involved collecting a fairly large list of documents, getting them translated, and mailing them to different offices. The whole process took about 5 to 6 weeks, but got us a European passport for our baby without too much hassle.

When traveling, though, remember to bring both passports and use the best passport for the country you are entering or leaving. It is not only a matter of convenience, as you may be entitled to use a different line: it may be mandated by the local laws, or you may otherwise be required to present a visa or other proof of citizenship.

For example, when we enter or leave the USA we will present her US passport, while when we enter or leave the EU we will present the EU passport.

Buying tickets for your baby

Conditions change wildly by airline, but there are normally two ways to get your baby a ticket: as a lap child, or with his/her own seat.

A ticket for a lap child is often free or only a small percentage of an adult ticket (10% is common).

Your baby must be younger than a certain age (normally 2 years), and he/she will not be entitled to a seat or a meal, you will need to keep the baby on your lap.

As soon as you indicate that you have a lap baby to the airline, they may prevent you from picking a specific seat online, as the rows with lap babies must meet certain requirements.

On some airlines, the ticket for a lap child may allow you to bring some extra luggage for free, like an extra suitcase.

If you decide to buy your baby his/her own seat instead, he/she may not be able to really sit on it unless you bring a car seat with you.

On long hauls, some airlines provide bassinets to be used for lap children after take off and as long as there is no turbulence. You may need to ask the attendants for it to be installed, although they always remembered with us.

I have yet to see an airline that allows you to reserve the bassinet online: some web sites go as far as stating that bassinets will only be offered on a first come first serve basis.

The trick is to call the airline by phone: so far the operator was always able to confirm the availability of bassinets, while blocking the right seats or the right flights for us (yes: some of the flights we were looking at had all the bassinets taken).

If you do this, make sure they don't change your seats at check in. Given that there can be only a certain number of babies per row and it is not common to have the right seats ahead of time, they may think that a change is necessary. We have blocked them at least once.

Here are a few more important things we learned:

conditions change wildly by airline. Some airlines don't even allow you to buy tickets for infants online, you have to call them by phone. Calling them by phone may be a good idea for the bassinet.
So far all airlines allowed us to gate check (or check in) both the stroller and car seat for free. By reading the fine print, some airlines require that the stroller must be more like an umbrella stroller or a single piece, although we never had any issue.
Even if your infant is traveling as a lap child and did not pay for an extra ticket, if you ask nicely at the check in desk they may be able to block one of the seats next to you. In that case you don't have to check in your car seat: you can bring it with you all the way to the airplane and use it to give your baby a little comfort on the extra seat. Watch out though that they will not offer, you have to ask, and your car seat must be approved to be used on planes (which is common).

Car seats

Your American car seat may not meet the requirements to be legally used in other countries, and vice versa. Surely it cannot be used in Europe. If you travel to a place where it is legal to use your car seat and you will need to drive, bring it with you.

Renting a car seat will generally give you a horrible car seat, and just a few days of renting will be as expensive as buying a cheap car seat yourself.

Most airlines will check in your car seat for free, but beware: even though we bagged it, out of 4 flights our car seat came out damaged once (minor, but still annoying).

If you paid for a seat for your child and your car seat is approved by the FAA (common), or you lucked out at the check in desk and they had a free seat next to you (see above), you may bring your car seat with you on the plane and use it for your child.

Packing for the emergencies

While we were quietly enjoying our first flight with our baby, proud of our smooth experience so far, we suddenly realized there was liquid poop pouring through her clothing. All of her clothing, even socks.

Changing there was no problem: we were ready for such an event, and just locked all the dirty clothes in a ziplock bag.

A few hours later, literally before boarding our last flight, we again realized that poop was pouring out everywhere. Although we had one last change of clothes with us, the flight was about to take off and we did not have the time to change her. Further, the flight lasted only 45 minutes, and for the whole duration of the flight the "keep the seat belts on" sign stayed lit.

Luckily for us, we had brought some large chux (disposable underpads) which we used to wrap the baby (a lap baby) and isolate our body from the overflowing poop.

Our suggestion is to:

always have at least 2 full changes with you in your carry on luggage.
if you have a transfer, you may miss a connecting flight and be stranded there until the day after. Worse, your checked in luggage may not be available there. With transfers, we generally bring 24 - 48 hours of food, diapers and wipes with us, spread over mine and my wife's carry on.
disposable underpads are always useful, bring a few. Having them may make a significant difference in case of overflows or explosive poops.
bring one or two ziplock bags and/or plastic bags, in case you have some clothing you need to "seal away" and carry with you home.
bring some change of clothing for you as well. If your t-shirt gets stained with vomit or poop, it is nice to have a change. Unless you enjoy smelling it for the whole duration of the flight or plan to inflict some pain to other fellow passengers.

Planning for your stay

For any trip longer than a week or 10 days, it is hard to bring enough supplies for your baby with you (food, diapers, ...).

In addition to what we bring in our carry on luggage for the flight, we generally pack at least 3 or 4 days of baby food and supplies (diapers, wipes, ...) for when we reach our destination. This gives us enough time to find local supplies of the products we need.

Talking about hotels and where to sleep, call them ahead of time and make sure they can provide a crib. If there is no crib, you may want to explore co-sleeping (a no go for us) or buy something like a portable bed.

We bought what they call a "peapod", a very compact sleeping tent for babies, which we have used a few times by now. Baby was happy, and slept without many issues.

In terms of hotels, hostels, and places to stay, we now strongly prefer locations that provide a kitchen or at least a refrigerator and microwave.

Your trip... Getting to the airport and back

Getting to and from the airport with your beloved one may be a little tricky. A few things we learned:

In many states, including California, a child does not need a car seat to ride in a cab or a shuttle. You can instead rely on cabs and buses never getting into accidents, magically.
Many cab / shuttle services provide a car seat for a little extra. Our local cab company asked for a 10 $ fee per cab ride. Note that this is 5% of the cost of a 200 $ car seat, 20 trips will buy you a new car seat.
Even when landing at the airport, some cab companies will gladly pick you up with a car seat ready, no questions asked. Just call them when you are ready and they will tell you where to go at the airport to be picked up. Often this involves going to the limo / driver corner, but no task is too challenging after flying for 15 hours with a baby. You may want to check with your cab company ahead of time, though, spending half an hour on the phone trying to find a way to get back home in case they don't provide this service is not fun.
If you drive your family to the airport, they don't have to park the car with you. It is much more convenient to drop everyone off, including luggage, at the correct departures door, and then park the car by yourself. You can then find your way back to check in with no luggage and no screaming baby. Same thing on the way back: leave everyone chilling out on a bench at the arrivals, find your way to the parking lot and then drive back to the arrivals to pick everyone up.

Within the airport

Moving around with luggage, stroller, car seat and baby at the airport may not be an easy task. We generally use a cart for as long as we can.

You then need to decide 1) what to check in, and 2) what to carry to the gate all the way with you.

If you have bought an extra seat or lucked out at check in, you'll probably want to carry your car seat all the way to the plane. In this case, bring a caddy with you, a small car seat carrier (common for travel), and check in your stroller.

If not, you probably want to check in the car seat, and carry the stroller with you.

Until we drop off our luggage or get a cart, we try to put the car seat on the stroller and have baby in a carrier or carry around the car seat by tying it to the handles of a wheeled suitcase.

Past security, many airports have family spaces. The one in the Zurich airport for example turned out to be amazing: good, nice changing tables, microwaves, sink, facilities to cook, wash and sit around a quiet table, tons of toys, extremely nice staff, and a beautiful view of the airport.

Don't forget to change your baby every chance you have. If it took you 1 hour to get to the airport and you arrived 2 hours early, chances are that before boarding your baby will have a 3 hour old diaper. Change it right there and then: doing it on the plane is much less comfortable.

Strollers and planes

You will need your stroller while waiting for boarding, once you arrive, and if you transfer.

If you are strong and willing you may get away with just a baby carrier strapped to your body. We have never been brave enough to try: we always wanted a place for her to nap or enjoy her milk without being strapped to a sweaty and overheating parent.

Consider also that planes may be late, and you may be stuck on the ground for several hours waiting for a transfer.

In any case, most airlines will gate check your stroller for free. At check in, ask to bring the stroller to the plane and tell them you want it back as soon as you land.

They will give you a label to tie to the stroller with a bar code and destination, and let you go. Once you reach the gate, after security, and once you are about to literally enter the plane, there will be a corner or a random guy with a security vest collecting all strollers and taking care of them.

On some small airplanes with steps and buses to board, we were instructed to abandon the stroller next to the airplane: although scary, it worked out.

Make sure you have the label: they seem to forget often in the confusion, and if you don't have it once you get to the plane they will send you back to the gate, where they have extra.

If you have a transfer, you may also need to change the label. Ask at the gate of your next flight.

Checking in

We gave up on online check in with babies. We have to drop our luggage anyway and always end up asking about different seating arrangements.

Don't forget to ask for a tag to attach to your stroller, and if your child is a lap child, ask if they have seats available next to you that they can block off. They will not offer, but they will gladly help if they can.

Baby food and security

We had to supplement our baby with formula, and she loves milk in general. All we had to do was pack all the food, including milk, in a separate bag to run through the X-Ray machine. We were never asked any question and never had any issues despite quantities, bottling, or packaging.

We generally empty our own water bottles before security, and fill them up again past it, although we were told it was not necessary.

In long haul flights we were worried about milk going bad: we'd usually bring a bottle or two with fresh milk to use in the first hours of the flight, and then a few small bottles with pre-measured milk powder in them. All we have to do to feed baby is add water, shake, and attach a nipple.

The only issue we ever had with security was related to my wife being swabbed and testing positive for dangerous residues. We believe it was because of the butt cream she used on our baby just before leaving, with high concentrations of zinc oxide, but don't know for sure.

Boarding

Most airlines call families ahead of time, often right after business class.

Make yourself visible and known, and if you think they are about to call everyone, you may want to just get to the front of the line. Especially on small or short flights where families are fewer and rare, they often forget to call them explicitly.

Don't get me wrong, though. I'm not advocating skipping or cutting the line just because your child entitles you to do so: a few minutes standing in line on a slow boarding may swing the mood of a toddler or a baby from smiling/happy/laughing to crying/screaming/throwing a tantrum, especially if it's the second or third flight after a long haul.

Getting in front of the line may significantly improve the travel experience not only for you, but for all the other passengers. Those extra minutes will allow you to take care of the stroller, ensure you can safely store your carry on nearby, with easy access to food and changes, and will allow you to do everything without other passengers pushing to get past you and reach their own seat. A better experience for everyone involved.

On the plane

There is not much to say here. Facilities normally have a changing table. Bring something to keep your baby entertained. Walk from time to time, use the spaces you can find to let your baby crawl or stand for a few minutes. Use a pacifier if necessary, keep food and changes nearby, make sure you have water.

After the first trip, we always used a baby carrier on the plane: handy to keep her strapped to your body in a comfortable position, even when napping, while leaving your hands free. Forget about using a laptop: there won't be enough space.

A book or phone may be fine, unless your baby decides to play with them.

If there is any passenger keen on entertaining your baby with smiles, some playful activity or silly chit chat... let them.

It is great to have some help and relief, and on long flights, you will need all of your energy for when nobody else is around.

Which reminds me... I should really thank you, you kind stranger, for all the entertainment you provided my baby and the relief you gave me, even if for a few minutes.

Conclusions

Traveling is fun, even with a little one! Every experience we had got us closer together, and every thing we did wrong taught us an important lesson.

She will probably not remember much as she grows up, but at least we can see her smile every time she hears the voices of the people she met while traveling.

We have memories of her shyly getting her feet in the ocean for the first time, or crawling on a Mediterranean beach making her first encounters with the sand and sun, or the wonder in her eyes the first time she boarded a subway in New York or when we first took off and landed.

http://rabexc.org/posts/travelling-with-babies-internationally

Getting started with VoIP

Mar 31, 2015 Updated Mar 31, 2015

I have been living in different countries for the last 10 years. Although I can get in touch with friends and family using gtalk, skype, or you name it, having an extremely cheap phone line to receive or make calls on can come in handy.

Given my love for technology, trying out VoIP (or voice over IP, or phone over the internet), seemed like the next logical step.

There are 2 things you need to do to get started with VoIP:

Find a VoIP provider, what people in the industry refer to as an ITSP (aka Internet Telephony Service Provider). A fancy name for a company with tons of phone lines and good internet connectivity willing to turn phone traffic on good old phone networks (called PSTN, or public switched telephone network) into internet packets.
Get a VoIP phone, i.e., any sort of "telephone like thingy" that is able to route and receive calls over the internet using the VoIP protocols. We will talk more about this later, but what I mean by "telephone like thingy" can take many different shapes and colors: it can be an iphone or android app installed on your smart phone, a black box with a telephone plug on one side, and an internet plug on the other side (generally referred to as ATA - Analog Telephone Adapter), or something that pretty much looks like a phone from the 60s but happens to have an ethernet plug rather than a telephone one.

Finding a VoIP provider

Let's start with finding a VoIP provider. Here is what I discovered with a few hours of research and some experiments:

Wherever you live, it is likely that your local telephone or internet provider will also provide VoIP services.

For example, Comcast (sigh...) about once a month sends me a postcard touting the qualities of their VoIP contracts and how much I can save by having a contract with them.

Give them a call and enjoy the experience, something convenient may (or may not) come out of it. Don't stop there, though: do a bit of research before signing any contract, you may end up with a much better deal.
If you google for about 15 minutes the word "VoIP provider", "ITSP", or "cloud phone" you will find plenty of VoIP internet providers. Generally you can find three classes of providers:
1. Those who specialize in outbound phone calls, often targeting the international market.
  
  They will not give you a phone number to receive calls on, or will charge you a lot for one, but they will allow you to call pretty much anywhere in the area they cover for extremely cheap rates.
  
  In my case, for example, I found a VoIP provider that would charge me half a cent a minute, including cell phones, for many countries in Europe! This is significantly cheaper than what my friends or relatives even pay over there!
  
  The main distinguishing factors for those companies are the rates: how much you will pay to call a specific country, availability of flat rates, and so on. Shop around, and you will find something that works well for you, but remember to check for the countries you call most often.
  
  Also, pay attention to what they allow you to do with the caller id, or CID: some do not allow you to set a CID. Any call you make will appear as coming from an "unknown phone number" or some other "weird crazy phone number" to the receiver. If you plan to use the phone for business, this might not sound very attractive.
2. Those who specialize in offering you a presence somewhere (... or anywhere in the world).
  
  Those companies allow you to "rent" or "buy" a phone number, even toll free numbers, for like 2 - 10 $ a month pretty much anywhere in the world, and route all incoming calls to your VoIP phone.
  
  Some of those providers just charge this fixed cost, others will add a setup fee, while others will also charge you a per-minute fare for every incoming call.
  
  Watch out, however, that if you use their services to also make outbound calls they generally charge much more than the providers I talked about earlier. And don't be fooled: 5 cents a minute compared to 0.5 is 10 times as much! Even when it is well below what your traditional provider charges you for the same call.
  
  The main distinguishing factors here are the costs, if they let you choose the phone number, what numbers they are able to broker (all area codes? 800? ...) and how fancy their system is in handling incoming calls.
  
  Talking about fanciness: one provider may charge for every voice mail left on your line, allow you to receive FAXes as well, they may provide multiple lines so you can have more than one incoming call at any given time, or provide a full fledged PBX you can program with a web interface with the ability to provide voice menus, automated answers, and so on.
  
  In my case, I found a company that provided a "basic" phone number (no voice mail, no fax, just inbound phone calls...) in the UK for free! Literally: no monthly subscription, no per minute charge, no setup fee, for as long as you keep using their number.
3. Those who sell a full package pretty much equivalent to what you would buy from a local phone company. The package includes one or more phone numbers, one or more inbound or outbound lines, and fancy services like voice mails, PBX, FAXes and so on.
  
  They generally seem to target customers who are used to the traditional market, and are fearful of taking bold moves. The prices I found here and structure of the contracts, although significantly cheaper and more flexible, are similar to traditional providers.

Something that I greatly enjoyed and is extremely important to note is that most VoIP phones allow you to use multiple providers, at once.

That's right: you can configure an account and use it only to make outbound calls, and another account (or more than one) to receive inbound calls.

For me, this was a revelation: I immediately opened an account with the cheapest outbound call provider I could find for Europe, and... got an inbound phone number for FREE in the UK.

If I was planning to use VoIP for my business I would probably look to buy my inbound phone numbers from a more reputable source, though: I would not want my phone number to disappear with a VoIP provider going out of business, or lose customers because they continuously have outages.

I would however have no qualms about having contracts with multiple outbound providers, and just use the cheapest one that works at any given time.
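To put numbers on the difference between the two classes of providers, here is a rough sketch of the arithmetic. The 0.5 and 5 cents per minute rates are the ones mentioned above; the 300 minutes a month and the absence of a monthly fee are made-up figures, plug in your own.

```shell
# monthly_cost MINUTES CENTS_PER_MINUTE MONTHLY_FEE_DOLLARS
# Prints the monthly bill, in dollars, for one provider.
monthly_cost() {
    awk -v m="$1" -v r="$2" -v f="$3" \
        'BEGIN { printf "%.2f\n", f + m * r / 100 }'
}

# 300 minutes a month is a hypothetical usage figure.
echo "outbound specialist:       $(monthly_cost 300 0.5 0) \$"
echo "inbound provider, outbound: $(monthly_cost 300 5 0) \$"
```

With these figures the bill comes to 1.50 $ versus 15.00 $ a month, which is why splitting inbound and outbound across different providers pays off.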

Talking about the telephone like thingy

Now that you have found a VoIP provider that you like with reasonable fares, let's see what you need to buy to get started.

First of all, you don't have to buy anything: VoIP is based on open protocols, and just like you have a browser on your phone or your computer to visit any web page, you can install an app to make or receive phone calls through the internet using any VoIP provider.

There are plenty of apps, some are good, some are bad. Just look on your favourite search engine or the market on your phone for "VoIP" and you will find as many as you need.

The main thing to be aware of here is that many web sites refer to VoIP apps as any application that lets you make phone calls through the internet. Many apps however use proprietary protocols and will only provide services through their one provider.

If you want to use a random provider like the ones you found in the previous section and do not want to be tied to a specific company, you need to find an app that lets you use the standard protocols. Look for keywords like "SIP" or "RTP", and read the comments.

Let's say now that you want a real phone, one of those big devices to keep on your desk that rings from time to time.

There are plenty of choices to go by:

You can buy a classic VoIP phone. You can find those in all sorts of shapes and colors: from desk phones that look like they are coming out from a movie in the 60s, to bleeding edge cordless phones. From terminals that look more like computers to devices running android.

One important thing to keep in mind is that VoIP phones require both a power plug and internet connectivity. For connectivity you need either wifi, in which case you must buy a phone that supports it, or an ethernet cable from your modem. Wifi may introduce jitter or delay for voice, so I personally prefer cable.

Each one of those phones will need to be configured to connect to your VoIP provider. This can happen through menus on the phone itself, or by connecting a computer to their ethernet plug. Just follow the instructions that come with the phone.
You can buy an ATA. This is a fancy name for a small box that has an ethernet / wifi connection on one side, for The Internet, and a telephone connection on the other.

You can then connect your existing phones to the ATA, just like you would connect your phone to any other plug.

ATAs again come in all shapes, colors, costs and features: you can spend a few tens of $ for something pretty basic, to a few thousand dollars for something to use for your business.

One thing to keep in mind about ATAs is that many do not allow you to use old modems (like for a POS, for credit card payments) or FAX machines. Both require special support.

Read the fine print before buying the equipment. For faxes, you probably want devices (and VoIP providers) that support the T.38 standard. For modems, many ATAs provide options to use a "non compressing codec", like G.711. This works, but is often brittle and unreliable.

You are better off buying a POS that connects directly via internet protocols, and/or pay for a VoIP provider that does FAX to PDF email or similar.

In my case, I went for a VoIP DECT set of phones: a small black box connecting to the internet, able to control up to 6 cordless phones using the DECT standard.

Before choosing the phone I wanted I spent a couple hours looking at the manual online, though: I wanted to ensure the box allowed me to configure multiple VoIP providers, using a specific VoIP provider for outbound calls, and ring phones differently based on the inbound call.

Although this is easy to do in terms of technology, some phones and ATAs have web interfaces or configuration files that only allow limited ability to specify what to do.

On the other end of the spectrum, you can find more expensive devices that allow you to program what to do to the letter: "if number starts with this prefix ... then use this provider", offer PBX functionalities, and so on.
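To give an idea of what such a rule looks like, here is a minimal sketch of prefix based routing written as a shell function. It is not the configuration syntax of any real phone or ATA: the account names and the prefixes are hypothetical, and the only point is the "if number starts with this prefix ... then use this provider" logic.

```shell
# pick_provider NUMBER
# Prints the name of the account to use for an outbound call.
# Account names and prefixes below are hypothetical examples.
pick_provider() {
    case "$1" in
        +39*|0039*|+44*|0044*) echo "cheap-europe" ;;  # European destinations
        +1*|001*)              echo "cheap-us" ;;      # North America
        *)                     echo "fallback" ;;      # anything else
    esac
}

pick_provider "+39061234567"
```

The first matching pattern wins, so more specific prefixes should be listed before more generic ones.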

Another alternative is to install software like Asterisk, FreeSWITCH or FreePBX on one of your servers / desktops. Have your phones connect to it, and have your server connect to VoIP providers. These packages allow you to do pretty much anything you may ever dream of.

Setting things up

There is really not much to say here. Once you subscribe with a VoIP provider, they will give you a username, password, a SIP server, port number, and maybe a STUN server.

If you know nothing about those, don't worry: connect your VoIP telephone or ATA box to a power plug, connect it to your modem with an ethernet cable, power it up, and follow the instructions.

Most often all you have to do is enter the parameters that were given to you in the corresponding boxes, and try it out.

If it doesn't work the first time, double check that you typed everything in correctly, and try again. Worst comes to worst, Google for your phone, for the VoIP providers, and you will likely find plenty of documentation.

A few words about your internet connection

If things don't work after the second shot, or if you are curious about how things are supposed to work, here is a short overview.

VoIP is normally based on two protocols:

SIP, or Session Initiation Protocol.
RTP, or Real-time Transport Protocol.

To draw a parallel with traditional phone lines, the first protocol is used to make your phone ring or someone else's phone ring, it is used to "control phones and subscribe to events on phone numbers". But once you pick up the phone or the remote end answers, an algorithm (codec) will come into play to transform your voice into tiny 0s and 1s, while the RTP protocol will be taking care of transferring those tiny 1s and 0s from one end of the internet to the other.

That's right, as the name of the protocol suggests, SIP is used to establish a session between two VoIP endpoints, while RTP is used to transfer media (your voice, sound) in real time from one phone to the other.

When you have NAT

If you are a tiny bit familiar with networks, you may have spotted one of the most common issues related to VoIP at this point: two VoIP phones need to send packets to each other, directly.

This sounds easy, but most residential internet connections do not have a public IP, which means they don't have an address that can be used by others to initiate a connection to.

Most residential connections are in fact behind NAT, or Network Address Translation, which means that many devices (or customers) are hidden behind a single IP address, which may even change over time!

To draw a parallel, it is as if you lived in a condo where mail cannot indicate a name or an apartment number, only the message it is replying to.

The most common symptom of NAT related problems is having your phone ring, but once you answer, you can't hear anything or the other party can't hear your voice. Other symptoms include ability to make outbound calls, but inability to receive inbound calls.

But fear not, VoIP has been around for long enough that in most cases you will be able to find a solution.

The most common solutions and workarounds include:

configuring a STUN server

Nothing too complicated: if you search your VoIP provider web site, you will likely find the address of a STUN server they provide for their customers. Most VoIP phones and boxes at this point allow you to enter, in their configuration, an optional STUN server to use. Just fill the box! STUN allows your phone to discover how your connectivity is set up and communicate it to your VoIP provider. If you are lucky, just configuring a STUN server will solve all your problems.
changing the NAT configuration of your modem

Generally it is the modem your ISP installed in your house that performs NAT. If you spend some time looking at its configuration you can probably instruct it to forward all the VoIP traffic to your phone. One way is to just, well, forward all incoming traffic from your modem to your phone. Another way is to discover the ports used by your phone (or well, configure it to only use some ports) and make sure that those ports, on your modem, are forwarded to your phone. The normal port used by SIP is 5060, but RTP negotiates the port numbers as part of the protocol exchange.
setting a lower keepalive interval

Most phones allow you to configure a keepalive timer. If this is disabled or the interval is too long, a NAT device may think that your connections are dead, forget the association between public port / ip and internal port / ip, closing the port you need to receive calls on. If you decrease this time to send a message once a minute or every 30 seconds, the connection will be kept open, which will help ensure that SIP will work. It does not help with RTP, though.

If everything fails or you are not comfortable changing those settings, ask your ISP or VoIP provider. For a small fee, your ISP may be able to give you a public IP address to use, upgrade your modem or change its configuration to something that better supports VoIP. Some modems and NAT devices even have explicit support for VoIP and SIP, meaning that they know how the protocols work, and will do the right thing for you without too much effort.
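Most modems only expose port forwarding through a web interface, but if your NAT device happens to be a linux box, the port forwarding workaround can be sketched with iptables. Everything specific here is an assumption: eth0 as the internet facing interface, 192.168.1.50 as the address of the phone, and 16384:16482 as the RTP port range the phone was configured to use. The function only prints the rules, so you can review them before running anything as root.

```shell
# print_voip_forwards WAN_INTERFACE PHONE_IP RTP_PORT_RANGE
# Prints (does not run) the iptables rules forwarding SIP and RTP to the phone.
print_voip_forwards() {
    wan_if=$1
    phone_ip=$2
    rtp_range=$3
    # SIP signalling, normally on UDP port 5060.
    echo "iptables -t nat -A PREROUTING -i $wan_if -p udp --dport 5060 -j DNAT --to-destination $phone_ip"
    # RTP media, on the range of UDP ports configured on the phone.
    echo "iptables -t nat -A PREROUTING -i $wan_if -p udp --dport $rtp_range -j DNAT --to-destination $phone_ip"
}

# Hypothetical values, replace with your own.
print_voip_forwards eth0 192.168.1.50 16384:16482
```

Depending on the rest of the firewall, matching rules in the FORWARD chain may also be needed to let the forwarded packets through.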

When you have a public IP

In my case, my ISP provided me with a public (and static) IP address. With no NAT in between and a well known address, setting up VoIP turned out to be a breeze.

Problems however started a few days later, when my phone kept ringing every half an hour showing calls from random numbers (like 200, 500, 2000, ...) but without anyone on the other end of the line.

With a public IP address, no NAT, and no firewall, anyone can try to connect to your phone and control it over the internet.

Not only instruct your phone to initiate an inbound call, making it ring, but also reconfigure it or steal the username and password with your VoIP providers by accessing the web interface.

If you put your phone systems on a public IP address, you should:

Make sure the control panel to configure the phone system is protected behind a username and password. Make sure to change them.

My phone even allowed me to disable the configuration interface entirely, until re-enabled by physically pushing buttons on the phone itself.
Check if the phone has any parameters to restrict inbound phone calls to allow only the VoIP providers you configured to make the phone ring.

On my Panasonic phone, this feature was called "SSAF" or "SIP Source Address Filtering".

In general, installing your phone behind a firewall or NAT device might be wise in terms of security.
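If that firewall is a linux box, the same idea as source address filtering can be sketched with iptables: accept SIP only from the addresses of your VoIP providers, and drop it from everybody else. The addresses below are placeholders from the documentation ranges, not real providers, and the function only prints the rules rather than applying them.

```shell
# print_sip_filter PHONE_IP PROVIDER_IP...
# Prints (does not run) rules accepting SIP only from the listed providers.
print_sip_filter() {
    phone_ip=$1
    shift
    for provider in "$@"; do
        echo "iptables -A FORWARD -p udp -s $provider -d $phone_ip --dport 5060 -j ACCEPT"
    done
    # Anything else trying to reach the SIP port of the phone is dropped.
    echo "iptables -A FORWARD -p udp -d $phone_ip --dport 5060 -j DROP"
}

# Hypothetical phone and provider addresses.
print_sip_filter 192.168.1.50 203.0.113.10 198.51.100.20
```

The order matters: the ACCEPT rules must come before the final DROP. This sketch only covers SIP signalling; RTP may arrive from other addresses, so check with your provider before filtering it the same way.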

Talking about security

In the modern day and age many web sites like google or github use strong encryption by default. Governments have been exposed snooping on our conversations, and privacy breaches make headlines almost daily.

Despite that, I was surprised to discover that with VoIP using encryption is no easy task.

Not because the technology does not support encryption: standards to encrypt traffic have existed for a very long time. The main issues I found are:

It is hard to find VoIP providers that allow the use of encryption. Those who do seem to target the security market, and ask for a premium price to offer you the privacy you deserve.
It is hard to find VoIP phones (hardware, ATA, ...) that support encryption and allow you to configure it easily.

The general excuse is that PSTN networks, the old telephone networks, use no encryption and provide no protection. This means that as soon as your conversation hits the old phone networks, who knows who is snooping on those wires.

Still, encryption for network communications with all its pitfalls is not that hard to deploy today, and is becoming the default for many applications.

VoIP providers in different countries are fairly common, and knowing that my traffic was at least encrypted for as far as it is easy to do so would give me some peace of mind.

As it is, my neighbors, anyone who can access the box next to my house, or any of the ISPs my traffic goes through, any country it traverses, can easily hear what I say when talking with my friends (or my bank?).

With encryption, my traffic would be protected all the way to the servers of the VoIP provider somewhere in Europe, and likely only be in clear once it hits the national phone network.

Conclusions

I have been using VoIP for a couple weeks now. Despite a few hours of phone calls across the globe, I have been charged less than a dollar.

I have a phone number people can call me on in Europe, and setting up the line cost overall less than 10 $, with no monthly fees.

The largest expense by far was buying the DECT ATA and a few cordless phones: less than 200 $, and an expense I would have made anyway even if I bought a traditional line.

Except for the encryption part of it, so far I am very happy with the outcome.

I have to thank the reddit community for all the help in getting started. They pointed me in the right direction, and provided links to fill the gaps in my knowledge of the technology.

http://rabexc.org/posts/getting-started-with-voip

The pitfalls of using ssh-agent, or how to use an agent safely

Nov 12, 2014 Updated Nov 12, 2014

In a previous article we talked about how to use ssh keys and an ssh agent.

Unfortunately for you, we promised a follow up to talk about the security implications of using such an agent. So, here we are.

If you are the impatient kind of reader, here are a few rules of thumb you should follow:

1. Never ever copy your private keys on a computer somebody else has root on. If you do, you just shared your keys with that person.

If you also use that key from that computer (why would you copy it, otherwise?), you also shared your passphrase. I generally go further and only keep my private keys on my personal laptop, and start all ssh sessions from there.

2. Never ever run an ssh-agent on a computer somebody else has root on.

Just as with the keys, I generally don't run ssh-agents anywhere but my laptop. And when I say "has root on", consider that you are both trusting that person to not abuse his privileges, and to do a good job at keeping the system safe, up to date, and without other visitors.

3. Only forward your agent connection to machines you trust.

As you will see further down in this article, forwarding an agent is equivalent to sharing your keys with anyone who managed to get root on that machine. And this is not theoretical: getting access to your keys takes at most a few lines of a shell script.

4. Make sure your keys and your agent are unloaded when you log off your machine.

If you are one of the old school guys that simply starts his agent with something like:
```
  if [ -z "$SSH_AUTH_SOCK" ] ; then
      eval `ssh-agent -s`
      ssh-add
  fi
```
in his .bashrc, don't forget that every time you open a terminal you are creating a new agent that nobody will ever kill. It will remain happily hanging there forever with all your keys ready for anyone to use.

(the snippet of code is the one suggested on various threads, including various stackoverflow answers)

To protect my agent forwarding, I personally follow a 5th rule:
Use different keys for different purposes, and keep them in different agents.

The reason for this rule is a direct consequence of the other rules, and is best explained with an example: let's say in order to connect to the servers at work you must use ssh keys. But these servers are on a private network, so you must first use agent forwarding and connect to some sort of "gateway".

Every time you connect to the gateway with agent forwarding you give the ability to anyone on that machine with root to use any and every key loaded in your agent.

If any of those people gains access to any other server at work, well, that's life. Something my employer will need to worry about. At the same time, I really don't want those people to gain access to my home server, personal github account, or to the VPS I use to backup my family photos.

What I do: one key for work, one key for home, one key for backup server, "one key per customer", or "per security domain". And I do the same for agents, as otherwise agent forwarding will expose my keys: one agent for work, one for home, and so on.

If I end up forwarding the agent to a compromised machine, the attacker will gain access only to machines within that domain.
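A minimal sketch of the one-agent-per-domain idea, assuming a made-up layout where every domain gets its own socket under ~/.ssh/agents. The directory, the function names and the domain names are illustrative assumptions, not how ssh-ident does it:

```shell
# Hypothetical layout: one agent socket per security domain,
# e.g. ~/.ssh/agents/work.sock and ~/.ssh/agents/home.sock.
agent_socket() {
  printf '%s/.ssh/agents/%s.sock\n' "$HOME" "$1"
}

# Point this shell at the agent of the given domain, starting it if needed.
use_agent() {
  sock=$(agent_socket "$1")
  mkdir -p "${sock%/*}" && chmod 700 "${sock%/*}"
  # ssh-add -l exits with status 2 when it cannot contact an agent.
  SSH_AUTH_SOCK=$sock ssh-add -l >/dev/null 2>&1
  if [ $? -eq 2 ]; then
    rm -f "$sock"                      # drop a stale socket, if any
    ssh-agent -a "$sock" >/dev/null    # -a binds the agent to this path
  fi
  export SSH_AUTH_SOCK=$sock
}
```

With something like this, `use_agent work` followed by `ssh -A` to the gateway would only ever expose the keys loaded in the work agent.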

Sounds like a giant pain to manage and use? Not really, if you use something like ssh-ident [disclaimer: I'm one of the authors].

Now that we have covered the bases, let's try to cover some of the reasons behind those recommendations...

Starting the agent

If you do a quick search on how to use an ssh-agent, most pages will tell you to start an agent by using something like:

$ eval `ssh-agent`

simple and fast, isn't it? But annoying to do every time you log in. Go back on google and search "how to automatically start ssh-agent", and you'll find many suggestions to add something like:

if [ -z "$SSH_AUTH_SOCK" ] ; then
  eval `ssh-agent -s`
  ssh-add
fi

to your .bashrc. Problem solved? Not really. Now for every console you open, you end up with a new agent. So back on google, and after some time you will find some variation of:

SSH_ENV="$HOME/.ssh/environment"

function start_agent {
    echo "Initialising new SSH agent..."
    (umask 066; /usr/bin/ssh-agent > "${SSH_ENV}")
    . "${SSH_ENV}" > /dev/null
    /usr/bin/ssh-add;
}

# Source SSH settings, if applicable

if [ -f "${SSH_ENV}" ]; then
    . "${SSH_ENV}" > /dev/null
    ps -ef | grep ${SSH_AGENT_PID} | grep ssh-agent$ > /dev/null || {
        start_agent;
    }
else
    start_agent;
fi

(from one of the most voted answers on stackoverflow)

Which in short keeps the details of the agent in a file, tries to load it, checks if that agent is still running (after a reboot or similar), and if not, it starts another one.

I personally don't like grepping for ssh-agent and checking pids, and I don't like the fact that the script above may break agent forwarding, as it does not detect any agent already available.

So I much prefer versions of the script like the one here:

ssh-add -l &>/dev/null
if [ "$?" == 2 ]; then
  test -r ~/.ssh-agent && \
    eval "$(<~/.ssh-agent)" >/dev/null

  ssh-add -l &>/dev/null
  if [ "$?" == 2 ]; then
    (umask 066; ssh-agent > ~/.ssh-agent)
    eval "$(<~/.ssh-agent)" >/dev/null
    ssh-add
  fi
fi

which just queries the agent for available keys. If no agent can be reached (ssh-add exits with status 2), it will try to load the agent config from a file, and if it still can't connect to the agent, it will start a new one. This version has the added benefit that if your window manager has an agent already running, you will use it. Easy peasy, right?

Well, there are a few problems with this approach:

Your agent will run forever! And keep your keys with it.
You have one agent for all your keys, which violates the 5th rule at the top of the document.

Let's try to solve one problem at a time, so let's try with problem #1 first.

You could:

Specify a maximum key lifetime with the -t parameter. For example, -t 3600 will keep your keys in memory for at most one hour.

But what happens after an hour of inactivity? Well, your key will disappear, and the next time you try to use ssh it will simply prompt you for your password. That's right, as the key is gone, it doesn't know there was a key in the first place. It will not tell you "look, we need to reload your key" or "ay yo, one of your keys has expired, give me your passphrase again, and I'll happily try to reload it".

This is generally taxing on my brain, as every time this happened to me, I had to reconcile the password prompt with the fact I always use the agent, and come up with "oh drat! my keys expired, let's run ssh-add again".

Annoying, isn't it? You can make it simpler with a few tweaks to ~/.ssh/config, but it still is pretty annoying.

What I ended up doing in the past was, well, never using ssh directly: instead, I used a shell script that would check if my keys were still there, and if not, magically call ssh-add first. Complicated? That is another thing that ssh-ident can do for you.

Kill the ssh-agent when you are done using it. Easy peasy, no? You could try using .bash_logout. But if you do, and your shell 'execs' another command, crashes, your ssh terminal dies, or you use screen or tmux, it won't work very reliably: your ssh-agent will not be killed.

Feeling brave? Maybe you can trap EXIT 'killall ssh-agent' or something similar. But this still has many of the same drawbacks.

The most reliable method I found was the exec support in ssh-agent which, from looking around the net, also seems to be the least mentioned.

After ssh-agent you can specify a command to run. That command will be started with the right environment variables set, and ssh-agent will keep running for as long as that command is alive.

For example, if I type something like:
```
  $ exec /usr/bin/ssh-agent /bin/bash
```
from my shell prompt, I end up in a bash that is setup correctly with the agent. As soon as that bash dies, or any process that replaced bash with exec dies, the agent exits. Simple enough, I could add it to my .bashrc, no? Watch out for loops, and well, you'll be disappointed to find out that in each and every terminal, you will end up with a different ssh-agent, needing to run ssh-add every time.

If you use a graphical interface, you can probably use this approach to load your window manager, so all your terminals will have an agent, which leads us straight into the 3rd approach...
The third method is to just rely on your distribution.

Given how many people are using ssh-agent today, many distributions just start your window manager with ssh-agent or some equivalent above.

That way, you have a nice ssh-agent tied to your session, which is killed when you log off. Some distributions even use dbus to start and manage an agent, which I have not dug into yet.

This has worked on and off for me as I upgraded laptops and changed window managers and login managers, and as various versions of the graphical interfaces I use felt entitled to replace ssh-agent with something else for the sake of annoying their users.

To this day, the method I find most reliable and comfortable is to 1) wrap ssh in a script that 2) loads agents and keys as necessary, and 3) expires them after a certain timeout.
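A rough sketch of such a wrapper, under a few assumptions: an agent is already reachable, a single key is enough, and the names SSH_WRAPPER_KEY and SSH_WRAPPER_LIFETIME are invented here for illustration. ssh-ident does considerably more than this:

```shell
# Hypothetical wrapper around ssh: reload the key, with a lifetime,
# whenever the agent is reachable but has no identities left.
ssh_with_key() {
  key=${SSH_WRAPPER_KEY:-$HOME/.ssh/id_rsa}    # assumed key location
  lifetime=${SSH_WRAPPER_LIFETIME:-3600}       # seconds, passed to ssh-add -t
  # ssh-add -l exits with status 1 when the agent holds no identities.
  ssh-add -l >/dev/null 2>&1
  if [ $? -eq 1 ]; then
    ssh-add -t "$lifetime" "$key" || return 1
  fi
  ssh "$@"
}
```

Calling `ssh_with_key some-host` instead of `ssh some-host` then asks for the passphrase only when the key has expired, which removes the surprise password prompt described above.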

Having fun with an agent

Now that we have determined that running and killing an agent is not as easy as it might seem, let's look at what someone can do with root access on a machine running your agent.

First, he may try to get your keys out of it. This is not as hard as it seems, you can find many tutorials online on how to do it.

It boils down to dumping the memory of the ssh-agent, and looking for the keys in memory.

Second, he may try to just use your agent. This literally requires no skill or tool whatsoever. Let me give you an example, let's start by loading an agent, a key, and verifying it works:

$ eval `ssh-agent`
$ ssh-add ~/.ssh/my-private-key
Enter passphrase:
$ ssh-add -l
4096 0a:3c:c9:f7:d0:7a:6d:d2:c0:13:c6:0f:15:12:39:1d my-private-key

Now let's act as another evil user who has access as root to the machine:

$ su
# ssh-add -l
Could not open a connection to your authentication agent.
# ps aux |grep bash
...
myself 32684  0.0  0.0  18028  2008 pts/5    S    17:17   0:00 /bin/bash
...
# . <(cat /proc/32684/environ |xargs -0 -i echo {} |grep SSH)
# ssh-add -l
4096 0a:3c:c9:f7:d0:7a:6d:d2:c0:13:c6:0f:15:12:39:1d my-private-key

All the attacker had to do was find the PID of one of my processes, import the right environment variables, and well, profit! The magic was a single line of shell.
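The same lookup can be written as a tiny helper, which shows how little there is to it. This is only a sketch: the function name is made up, and it assumes a Linux-style /proc where environ is a list of NUL-separated VAR=value entries:

```shell
# Print the SSH_AUTH_SOCK value found in a NUL-separated environment file,
# such as /proc/<pid>/environ on Linux.
agent_socket_from_environ() {
  tr '\0' '\n' < "$1" | sed -n 's/^SSH_AUTH_SOCK=//p'
}
```

With the PID from the example above, root would run `export SSH_AUTH_SOCK=$(agent_socket_from_environ /proc/32684/environ)` and then use ssh or ssh-add as if the agent were his own.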

Having fun with forwarding

Turns out that the same exact trick used above works with agent forwarding: find a process your victim is running, look at his environment, and well, configure yours to use his agent forwarding socket. Total time to use your keys: < 1 minute.

The only improvement here is that the attacker can't steal your keys. Also, he can only authenticate for as long as you are logged in, both of which sound like a win. But is this such an improvement?

Keep in mind that the attacker can write a 2 line shell script to, for example, scan all the hosts nearby with nmap, and automatically run ssh-copy-id to install his keys on your machine while you are logged in.

Or keep watching what you connect to, and install his key on every such host. Hard? Not really:

while :; do servers=`pgrep -u victim -a ssh |sed -ne 's/.*ssh //p'`; \
    test -z "$servers" && { sleep 1; continue; }; \
    ssh-copy-id -i ~/.my-evil-key.pub $servers; done;

will basically intercept any ssh command you run, and install the attacker's keys on your remote server.

In short: even a few minutes of access to your agent will enable an attacker to do a lot of damage, escalate the number of machines it has access to, and install backdoors to access your system at the most convenient times.

Too many keys, github, and friends

There is one more problem with the naive approach to ssh-agents. Let's say you go the route of having at least one key per customer, or per "security domain", but still use a single agent.

One thing to keep in mind is that when you try to login into a remote host, ssh will try authentication with all the keys you have loaded, one at a time, one after the other.

This works ok for as long as you have a few keys. As soon as you have many keys (say, more than 5), the remote server will kick you out even before you are able to prove your identity.

That's right: most ssh servers allow a maximum number of authentication attempts before killing your connection. Each key you have loaded counts as an attempt, and if you have more than a handful of keys, you will never be able to use your last ones.

Sites like github.com or gitorious also use your key to verify your identity. If you have a work account and home account, for example, you will always submit patches or log in as the first key you have loaded in your agent. Fancy, no?
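If you stay with a single agent, one way around both problems is to tell ssh which key to offer. The sketch below relies on the IdentitiesOnly option of OpenSSH; the function name and the key path in the usage line are made up for illustration:

```shell
# Offer only the given key to the server, ignoring whatever else the agent holds.
ssh_only_key() {
  key=$1
  shift
  ssh -o IdentitiesOnly=yes -i "$key" "$@"
}
```

For example, `ssh_only_key ~/.ssh/id_work git@github.com` would present just that one key, so it counts as a single authentication attempt and selects the matching account.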

Conclusions

I probably sound like a broken record by now, but something like ssh-ident allows you to keep different keys in different agents, easily, while loading agents and keys on demand, keep your identities separated, and easily set a timeout while reloading all keys as necessary.

It is not for everyone to use, but it has served me well so far, and addresses most of the issues discussed in this document with no effort on your side.

http://rabexc.org/posts/pitfalls-of-ssh-agents

Using an ssh-agent, or how to type your ssh password once, safely.

Oct 10, 2014 Updated Oct 10, 2014

If you work a lot on linux and use ssh often, you quickly realize that typing your password every time you connect to a remote host gets annoying.

Not only that, it is not the best solution in terms of security either:

Every time you type a password, a snooper has an extra chance to see it.
Every host you ssh to with which you use your password, well, has to know your password. Or a hash of your password. In any case, you probably have typed your password on that host once or twice in your life (even if just for passwd, for example).
If you are victim of a Man In The Middle attack, your password may get stolen. Sure, you can verify the fingerprint of every host you connect to, and disable authentication without challenge and response in your ssh config. But what if there was a way you didn't have to do that?

This is where key authentication comes into play: instead of using a password to log in a remote host, you can use a pair of keys, and well, ssh-agent.

Using ssh keys

All you have to do is:

generate a pair of keys with ssh-keygen. This will create two files: a public key (normally ending in .pub), and a private key. The private key is normally kept encrypted on disk. After all, it's well, supposed to be private. ssh-keygen will ask you to insert a password. Note that this password will be used to decrypt this file from your local disk, and never sent to anyone. And again, as the name suggests, you should never ever disclose your private key.
copy your public key into any system you need to have access to. You can use rsync, scp, type it manually, or well, use the tool provided with openssh: ssh-copy-id. Note that you could even publish your public key online: there is no (known) way to go from a public key to your private key and to get access to any of your systems. And if there was a way, well, public key encryption would be dead, and your bank account likely empty.

and ... done! That's it, really, just try it out:

# Generate and encrypt the key first.
$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/test/.ssh/id_rsa): 
Created directory '/home/test/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/test/.ssh/id_rsa.
Your public key has been saved in /home/test/.ssh/id_rsa.pub.
The key fingerprint is:
ec:38:bc:94:35:34:55:2b:9a:8d:44:d8:f0:93:09:fb test@joshua
The key's randomart image is:
+--[ RSA 2048]----+
|      o+. ...    |
|      .=.+   .   |
|      . O . .    |
|       = B .     |
|        E .      |
|     . = .       |
|      * .        |
|     . o         |
|      .          |
+-----------------+

# Copy the public key to my remote server, conveniently called
# 'name-of-remote-server'. Note that it will ask you the password
# of the remote server.
$ ssh-copy-id name-of-remote-server
The authenticity of host 'name-of-remote-server (144.144.144.144)' can't be established.
ECDSA key fingerprint is 9f:1e:ab:b6:ff:71:88:a9:98:7a:8d:f1:42:7d:8c:20.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
Password:
...

# Try now to login into the remote server. SSH will now ask you
# for your passphrase, what you used to encrypt your private key on
# disk, what you gave to ssh-keygen above.
$ ssh name-of-remote-server
...

# Let's say you have multiple keys, or you decided to store your key
# in a non standard place, and want to provide a specific one for a given
# host, you can use the -i option.
$ ssh -i /home/test/.ssh/id_rsa name-of-remote-server

So... what are the advantages of using keys? There are many:

Your passphrase never leaves your local machine. Which generally makes it harder to steal.
You don't have a password to remember for each different host. Or...
... you don't have the same password for all hosts you connect to (depending on your password management philosophies).
If somebody steals your passphrase, there's not much he can do without your private key.
If you fear somebody has seen your passphrase, you can change it easily. Once. And for all.
If there is a "man in the middle", he may be able to hijack your session. Once (and well, feast on your machine, but that's another story). If a "man in the middle" got hold of your password instead, he could enjoy your machine later, more stealthily, for longer, and may be able to use your password on other machines.
They just work. Transparently, most of the time. With git, rsync, scp, and all their friends.
You can use an agent to make your life happier and easier.

And if you're wondering what an agent is, you can go to the next section.

Your agent friend

Ok. So you have read this much of the article, and still we have not solved the problem of having to type your password every freaking time, have we?

Agent Smith from the Matrix saying "No Agent? I'm your friend"

That's where an agent comes in handy. Think of it as a safe box you have to start in the background that holds your keys, ready to be used.

You start an ssh-agent by running something like:

$ eval `ssh-agent`

in your shell. You can then feed it keys, with ssh-add like:

$ ssh-add /home/test/.ssh/id_rsa

or, if your key is in the default location, you can just:

$ ssh-add

ssh-add will ask your passphrase, and store your private key into the ssh-agent you started earlier. ssh, and all its friends (including git, rsync, scp...) will just magically use your agent friend when you try to ssh somewhere. Convenient, isn't it?

Assuming you added all the keys you need, you can now ssh to any host, as many times as you like, without ever ever having to retype your password.

Not only that, but you can exploit agent forwarding to jump from one host to another seamlessly.

Let me give you an example:

Let's say you have to connect to a server at your office.
Let's say this server is firewalled. In order to ssh there, you first need to ssh into another gateway. Sounds familiar, doesn't it? This means you end up doing:
```
 $ ssh username@my-company-gateway
 ...
 Welcome to your company gateway!
 ...
 $ ssh username@fancy-server-I-wanted-to-connect-to-to-start-with
 Password:
 ...
```

On this second ssh, what happens? Well, if you type your password, your cleartext password is visible to the gateway. Yes, it is sent encrypted, decrypted, and then through the console driver fed to the ssh process. If a keylogger was running, your password would be lost.

Worse: we are back to our original problem, we have to type our password multiple times!

We could, of course, store our private key on the company gateway and run an agent there. But that would not be a good idea, would it? Remember: your private key never leaves your private computer, you don't want to store it on a remote server.

So, here's a fancy feature of ssh and ssh-agent: agent forwarding.

Depending on your ssh configuration it may already be enabled, although stock OpenSSH ships with it disabled (ForwardAgent no). Either way, if you pass -A to the first ssh command (or the second, or the third, ...), ssh will ensure that your agent running on your local machine is usable from the remote machine as well.

For example:

$ ssh -A username@my-company-gateway
...
Welcome to your company gateway!
...
$ ssh username@fancy-server-I-wanted-to-connect-to-to-start-with
... no password asked! your key is transparently used! ...

The second ssh here, run from the company gateway, will not ask you for a password. Instead, it will detect the presence of a forwarded agent and use your private key through it.

Sounds dangerous? Well, there are some risks associated with it, which we'll discuss in another article. But here is the beauty of the agent:

Your private key never leaves your local computer. That's right. By design, the agent never ever discloses your private key, it never ever hands it over to a remote ssh or similar. Instead, ssh is designed such that when an agent is detected, the information that needs to be encrypted or verified through the agent is forwarded to the agent. That's why it is called agent forwarding, and that's why it is considered a safer option.

Configuring all of this on your machine

So, let's summarize the steps:

Generate a set of keys, with ssh-keygen.
Install your keys on remote servers, with ssh-copy-id.
Start an ssh-agent to use on your machine, with eval `ssh-agent`.
ssh-add your key, type your password once.
Profit! You can now ssh to any host that has your public key without having to enter a password, and use ssh -A to forward your agent.

Easy, isn't it? Where people generally have problems is on how and where to start the ssh-agent, and when and how to start ssh-add.

The long running advice has been to start ssh-agent from your .bashrc, and run ssh-add similarly.

In today's world, most distributions (including Debian and derivatives), just start an ssh-agent when you first login. So, you really don't have anything to do, except run ssh-add when you need your keys loaded, and be done with it.

Still many people have snippets to the extent of:

if [ -z "$SSH_AUTH_SOCK" ] ; then
    eval `ssh-agent`
    ssh-add
fi

in their .bashrc, which basically says "is there an ssh-agent already running? no? start one, and add my keys".

This is still very annoying: for each console or each session you login into, you end up with a new ssh-agent. Worse: this agent will run forever with your private keys loaded! Even long after you logged out. Nothing and nobody will ever kill your agent.

So, your four lines of .bashrc snippet soon become 10 lines (to cache agents on disk), then it breaks the first time you use NFS or any other technology to share your home directory, and then... more lines to load only some keys, some magic in .bash_logout to kill your agent, and your 4 lines of simple .bashrc get out of control.

Conclusion

I promised myself to talk about the pitfalls of using an agent and common approaches to solving the most common problems in a dedicated article. My suggestion for now?

Use the ssh-agent tied with your session, and managed by your distro, when one is available (just try ssh-add and see if it works!).
Use -t with ssh-add and ssh-agent, so your private key is kept in the agent for a limited amount of time. One hour? 5 minutes? You pick. But at the end of that time, your key is gone.
Use something like ssh-ident, to automatically maintain one or more agents, and load ssh keys on demand, so you don't even have to worry about ssh-add.

For full disclosure, I wrote ssh-ident. Surprisingly, that still doesn't prevent me from liking it.

http://rabexc.org/posts/using-ssh-agent

Recovering from a failed SSD on linux

Sep 25, 2014 Updated Sep 25, 2014

Down with the spinning disks! And hail the SSDs!

That's about what happened the last time I upgraded my laptop. SSDs were just so much faster, energy efficient, and quieter that I couldn't stand the thought of remaining loyal to the trusty spinning disks.

So... I just said goodbye to a few hundred dollars to welcome a Corsair Force GS on my laptop, and lived happily ever after.

Or so I thought. Back to the hard reality: last week my linux kernel started spewing read errors at my face, and here is a tale of what I had to do in order to bring my SSD back to life.

The Symptoms

It all started on a Friday morning with me running an apt-get install randomapp on my system.

The command failed with an error similar to:

# apt-get install random-app-whatever-it-was
...
(Reading database ... dpkg: error processing whatever.deb (--install):
dpkg: unrecoverable fatal error, aborting:
   reading files list for package 'libglib2.0-data': Input/output error
E: Sub-process /usr/bin/dpkg returned an error code (2)

where libglib2.0-data had nothing to do with what I was trying to install.

Fear pervaded, and next thing I did was run dmesg to see if the kernel had anything to say about the problem:

# dmesg
....
[1841.216697] end_request: I/O error, dev sda, sector 5246153
...

and sure enough, here it was. Trying to read the accused file surely returned Input/Output error:

# cat /var/lib/dpkg/info/libglib2.0-data.list
... Input/output error

Luckily enough, most of the system was still accessible and usable, so it couldn't be so bad after all, or could it?

One more backup

I was commuting to work when this happened, and didn't have with me anything I could use for a backup. So, I did the only thing I could reasonably do: put the laptop in suspend to RAM, back in my backpack, and hope it would survive until I got home.

Once home, I resumed it from RAM (which worked without issues), and did one more backup with rsync (to copy all the files), and dd (to have an image of the partitioning scheme and so on).

Creating an image with dd turned out to require a bit more work than expected: the default parameters of dd make it slow, make it fail on first error, and don't really show what's happening. Here is the command line I ended up using:

# dd if=/dev/sda of=./backup.img bs=104857600 conv=noerror

... where bs=... has been increased to work 100 MiB at a time, and conv=... instructs dd to ignore errors. From another prompt, I also ran:

# while :; do killall -USR1 dd; sleep 1; done;

To have dd output statistics once a second.

Don't forget to either boot from another recovery disk (USB key or similar) or remount all the file systems read-only (mount -o remount,ro / and the same for all other partitions) before backing up with dd. Otherwise you will back up a file system with changes still in memory.
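Before starting dd it is easy to double check that a file system really is read-only. The helper below is a sketch with a made-up name; it assumes a Linux /proc/mounts, where the fourth field of each line holds the comma-separated mount options:

```shell
# Succeed if MOUNTPOINT is currently mounted read-only.
# The second argument (default /proc/mounts) exists only to make testing easy.
is_readonly() {
  awk -v mp="$1" '
    $2 == mp {
      ro = 0
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
    }
    END { exit !ro }
  ' "${2:-/proc/mounts}"
}
```

Something like `is_readonly / || mount -o remount,ro /` would then flip the root file system to read-only only when it is not already.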

Assessing the damage

Once I sorted out the backup situation, it was time to recover.

But first thing I wanted to know was... what failed exactly on my system? Was this really the SSD (most likely)? Or something else? Maybe the last time I swapped disks with the laptop I did not push the plug properly, and the connection came loose? Unlikely, but let's debug.

I started by installing smartmontools. That package provides smartctl, which lets you query the drive's SMART state, which generally contains useful debugging information.

Given that dpkg was bricked and could not install anything, I just installed the package semi-manually, by:

Mounting a tmpfs on /var/cache/apt, so writes in this directory would not touch the disk:
```
 # mount -t tmpfs none /var/cache/apt/
```
Downloading the right package and version with:
```
 # apt-get install --download-only smartmontools
```

Opening the .deb manually:

 # cd /var/cache/apt/archives
 # ar xv ./smartmontools_6.2+svn3841-1.2_amd64.deb 
 x - debian-binary
 x - control.tar.gz
 x - data.tar.xz
 # tar -xJf data.tar.xz

Finally running smartctl from within that directory:

 # cd /var/cache/apt/archives/usr/sbin
 # ./smartctl -a /dev/sda

So, here finally I had smartctl. The first thing I did then was to look at the status of the drive, with -a:

# ./smartctl -a /dev/sda

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11-2-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     Corsair Force GS
Serial Number:    1234567
LU WWN Device Id: 0 000000 000000000
Firmware Version: 5.24
User Capacity:    360,080,695,296 bytes [360 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Sep 25 09:06:04 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (    0) seconds.
Offline data collection
capabilities:            (0x79) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    (  48) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x0025) SCT Status supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0033   120   120   050    Pre-fail  Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   099   099   000    Old_age   Always       -       1288h+09m+57.740s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       774
171 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       507
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       1
181 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Min/Max 13/45)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       1375
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       881
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       512
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       1342

SMART Error Log not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

By Googling around for a little bit and checking the Corsair forums, it seems like the important fields to look at are Raw_Read_Error_Rate, Retired_Block_Count, Reallocated_Event_Count and SSD_Life_Left. Note that sometimes you have to look at the 'VALUE' column, rather than the 'RAW_VALUE'. For example, SSD_Life_Left is 100% in this reading, and becomes a problem if it gets below 10%.
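To make the VALUE versus THRESH comparison concrete, here is a minimal sketch in python that parses an attribute table like the one above and lists the attributes whose normalized VALUE has dropped to or below THRESH. The function names and the sample rows are mine, for illustration only, and the parsing assumes the ten-column layout printed by smartctl above.

```python
def parse_smart_attributes(text):
    """Parse the attribute table printed by smartctl into a list of dicts."""
    attributes = []
    for line in text.splitlines():
        # The last column (RAW_VALUE) may contain spaces, so split at most 9 times.
        fields = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attributes.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": fields[9].strip(),
        })
    return attributes


def failing(attributes):
    """Return the names of attributes whose VALUE is at or below a non-zero THRESH."""
    return [a["name"] for a in attributes
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


# Three rows copied from the table above.
SAMPLE = """\
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Min/Max 13/45)
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
"""

attrs = parse_smart_attributes(SAMPLE)
print(failing(attrs))
```

On the readings above nothing is flagged, which matches what I saw by eye: the normalized values were all well above their thresholds.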

I was expecting to see damaged sectors or failed reads here, as SSDs are known to only allow a certain number of writes per cell. I was expecting to see some reallocated cells, or other errors.

However, nothing showed up here: most of the counters looked normal, and everything seemed in good enough shape.

Notice also how low the Lifetime_Writes_GiB counter looked: if we average this out, I had gone through each cell at most 2 times in a year, which should be far far below the limit of any modern SSD.

Most SMART capable disks, however, allow you to run a self-test to verify the integrity and state of the disk. And that's exactly what I did next:

# smartctl -t long /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11-2-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 48 minutes for test to complete.
Test will complete after Thu Sep 25 10:01:36 2014

Use smartctl -X to abort test.

Right after I ran this command, things started to go awry. Any read from the SSD would fail, even things as simple as:

# free
bash: /usr/bin/free: Input/output error

would return an error.

screenshot with free returning error

Running smartctl -a /dev/sda one more time would fail with parsing errors. I had to use smartctl -T permissive -a /dev/sda to get an output similar to:

Vendor:   /0:0:0:0
Product:
Capacity: 600,332,565,813,390,450 [600 PB]
Logical block size: 774843950 bytes
...
Log sense failed, IE page [scsi response fails sanity test]
Error counter logging not supported
...
... response length too short ...

which seemed to indicate that the drive was responding with garbage to the SMART commands. Note the 600 petabytes size, the response length too short message, and the empty Vendor and Product strings.

screenshot with smartctl showing inconsistent results

A reboot, however, brought back the drive to its original sorry state, with some files unreadable but most of the disk otherwise looking ok.

My conclusion was that starting a selftest was causing something in the firmware or the drive to crash, and this is roughly when I decided to file a ticket with Corsair, and ask for an RMA for the SSD.

Fixing the drive

Assessing the damage

Corsair was extremely fast at responding and providing guidance, and their suggestion was simple: upgrade the firmware, and run a secure erase before replacing the drive.

Upgrading the firmware was extremely painful: they do not seem to provide a linux upgrade utility, and the windows version... well, it requires Windows.

So, find a machine with Windows I could use, install the Corsair utilities for SSD support, connect the SSD, and upgrade the firmware (to version 5.24, in my case). A process that overall made me very uncomfortable, but was otherwise fairly simple.

Before doing a secure erase, however, I wanted to test if the new firmware provided any benefits, and sure it did.

First, linux was now failing much much faster. Instead of blocking for several seconds before spewing an error when stumbling upon a bad block, I would get an error almost immediately.

Second, running smartctl -t long /dev/sda now did not brick the drive! Running smartctl -a /dev/sda shortly after starting the test would show something like:

...
General SMART Values:
Offline data collection status:  (0x03) Offline data collection activity
                            is in progress.
                            Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                            90% of test remaining.

...

... note the 90%, and the Self-test routine in progress text. So, I let it finish. And sure enough, after a few seconds, it started reporting errors. To see the errors, I had to run something like:

# smartctl -l xselftest /dev/sda

Unfortunately I did not capture the output to paste it here. But this is what it looks like on my now healthy drive:

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11-2-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               10%      1288         -
# 2  Extended offline    Completed without error       00%      1264         -
# 3  Extended offline    Completed without error       00%      1210         -

When it found an error, it had a Status similar to detected errors, a number in LBA_of_first_error indicating the sector with an error, and a Remaining of like '90%', indicating that 90% of the drive still had to be scanned.

Being the kind of curious person I am, I wanted to know if there were any more damaged sectors in the drive. I tried a few commands, like smartctl -t select,next /dev/sda, or smartctl -t select,#of-lba-error+1, or smartctl -t select,cont. None of them failed, but none of them caused the test to resume either.

Some forums suggested using dd to overwrite the sector by computing its offset and size manually, which would force the drive to relocate it, and the test to continue. However, I believe my math was wrong: reading the sector before overwriting it succeeded, leading me to believe I had computed the wrong sector number.
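For reference, the math involved can be sketched in a few lines of python. This is not the command I ran at the time: the LBA below is a made up number, and the 512 byte logical sector size is an assumption you should check against the Sector Size reported by smartctl -i for your drive.

```python
def byte_offset(lba, sector_size=512):
    """Byte offset, from the start of the disk, of the given logical block."""
    return lba * sector_size


def dd_read_command(device, lba, sector_size=512):
    """Build a dd command line that reads exactly one sector at the given LBA."""
    return "dd if={} of=/dev/null bs={} skip={} count=1".format(
        device, sector_size, lba)


# Hypothetical LBA, as it would appear in the LBA_of_first_error column.
lba = 123456
print(byte_offset(lba))
print(dd_read_command("/dev/sda", lba))
```

The LBA reported by SMART is counted from the start of the whole disk, so the command targets the disk device (/dev/sda) rather than a partition. With bs set to the sector size, skip is simply the LBA itself. If the read fails with an I/O error, that is the damaged sector.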

I could have used a brute force kind of approach by running badblocks or similar to find the list of broken sectors, and try to overwrite them manually. But I decided there was not much point in doing so: the drive was clearly busted, I had a good backup, and doing a secure wipe as suggested by Corsair seemed the most sensible next step.

Secure wipe of the drive

Corsair suggested a secure wipe.

From my understanding of what I read around, SSDs have some sort of block mapper implemented in the firmware (or hardware) to spread the wear evenly and to keep track of which physical sector on the physical drive is mapped to the sector number used by the OS.

A secure wipe destroys and re-creates this data from scratch, basically formatting this hidden file system that may have become corrupt.

Performing a secure wipe turns out to be not as simple as it seems. I had to follow all the steps indicated in this wiki.

And beware, a secure wipe will, well, wipe all your data.

In short, here are the commands I had to run:

I had to put the laptop to sleep, and wake it up. Yep, not joking. Turns out that many BIOSes (including the one on the laptop I was using) put all the drives into frozen state (visible with hdparm -I /dev/sda) right after boot. In frozen state, there's not much you can do beside reading and writing data to the disk.

A common trick to 'unfreeze' the drive is to put the laptop to sleep, and wake it up. The BIOS will not bother re-freezing the drive. So, I ran the command:
```
  # echo -n mem > /sys/power/state
```
And then opened the lid to wake it up again.
I had to set a security password on the drive. Some forums suggested this was not necessary, as NULL was a perfectly fine password to provide when asked for one. But no, some drives require a real password being set, and my drive turned out to be one of those:
```
  # hdparm --user-master u --security-set-pass foo /dev/sda
```
Finally, issued the secure wipe command:
```
  # hdparm --user-master u --security-erase foo /dev/sda
```
Note that surprisingly this only took a few seconds to complete. Note also that at the end of the process, the password is forgotten, so you don't have to worry about unsetting it.
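Before issuing the erase it is worth double checking that the drive is really out of the frozen state. A minimal sketch in python, assuming the Security section of hdparm -I is laid out as in the sample below (one flag per line, prefixed by "not" when the flag is off). The sample text is typed from memory rather than captured from my drive, and the function name is mine.

```python
def security_state(hdparm_output):
    """Extract the enabled / locked / frozen flags from `hdparm -I` output."""
    state = {}
    for line in hdparm_output.splitlines():
        words = line.split()
        if not words:
            continue
        # A leading "not" means the flag on this line is off.
        negated = words[0] == "not"
        if negated:
            words = words[1:]
        if len(words) == 1 and words[0] in ("enabled", "locked", "frozen"):
            state[words[0]] = not negated
    return state


# Assumed layout of the Security section, here for a drive that is still frozen.
SAMPLE = """\
Security:
\tMaster password revision code = 65534
\t\tsupported
\tnot\tenabled
\tnot\tlocked
\t\tfrozen
\tnot\texpired: security count
\t\tsupported: enhanced erase
"""

print(security_state(SAMPLE))
```

If frozen is still True after the suspend and resume cycle, the erase command will be rejected, and there is no point in setting a password yet.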

Sure enough, the drive was now empty: no partitions, and no data I could tell about.

Checking the drive, and restoring the data

So, did this really fix the drive? Or did the drive just forget about all the bad sectors and unreadable data that was on it? Were all errors really gone?

Before closing my ticket with Corsair I wanted to be sure everything looked ok.

To verify the state of the drive, I tried a few things:

I ran another selftest, with smartctl -t long /dev/sda. 45 minutes later it completed without errors, which was exciting.
I restored the dd disk dump I had created earlier, and watched for errors. Restoring the dump caused each and every block to be written.

To do so, I just re-ran the dd from earlier, inverting if and of, while leaving error checking on:
```
  # dd of=/dev/sda if=./backup.img bs=104857600
```
Checked all the file systems for integrity, with e2fsck -f /dev/sda3, e2fsck -f /dev/sda5, ... each and every partition.

The finishing touches

Restoring the partitions from the dd image I took when the drive was already failing meant two things:

That some files would contain garbage, as they could not be read when the backup was taken.
That the whole disk would be marked as used by the SSD, as each and every block was written by dd. If you have paid attention to how SSDs work, you probably know that they need to be trimmed: they need to know which blocks are unused to ease the task of wear leveling and moving data around.

rsync

To solve the first problem I used rsync to compare the content of my disk with a backup I had taken a week before.

The rsync command line looked something like:

# rsync -avz --delete --progress --checksum --itemize-changes --dry-run \
             --exclude=/run/ --exclude=/proc/ --exclude=/sys/ --exclude=/dev/ \
             --exclude=/var/lock/ --exclude=/mnt/ --exclude=/var/log/ \
             /mnt/backup /

The important options here are:

--checksum Without this option, rsync will only look at a file last modified date and size. Given that dd corrupted the content, we want rsync to verify the content checksum.
--itemize-changes To ask rsync to show on screen which files it would overwrite, copy or delete, and why.
--dry-run To ask rsync to actually not do anything, just show the output as if it was actually running.

By piping the output to less and letting this command run for a few hours I obtained a nice list of the files that differed between my disk and the week old backup.

The list turned out fairly short: probably ~200 files, only a few of which I actually cared about. Most of the files were libraries or system files (apt-get, dpkg, logs): stuff I could easily re-install or restore from the backup.

Once I had the list of files I wanted to restore, it was easy: I just ran rsync again with the list of files.
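I went through the list by hand, but the output of --itemize-changes is regular enough to be processed by a script. A minimal sketch in python, assuming the usual layout where each line starts with a string of flags followed by the path, and where deletions are marked with *deleting. The sample lines and the function name are made up for illustration.

```python
def changed_paths(itemized_output):
    """Split `rsync --itemize-changes` output into transferred and deleted paths."""
    transferred, deleted = [], []
    for line in itemized_output.splitlines():
        # The flags come first, the path is everything after the first space.
        flags, _, path = line.partition(" ")
        path = path.strip()
        if not path:
            continue
        if flags == "*deleting":
            deleted.append(path)
        elif flags[0] in "<>" and flags[1:2] == "f":
            # '<' or '>' means a transfer, 'f' means a regular file.
            transferred.append(path)
    return transferred, deleted


# Made up sample of what the dry run could print.
SAMPLE = """\
sending incremental file list
>fc.t...... usr/lib/libfoo.so
.d..t...... var/cache/
*deleting   tmp/stale.lock
>f+++++++++ home/user/notes.txt
"""

transferred, deleted = changed_paths(SAMPLE)
print("\n".join(transferred))
```

The resulting list, one path per line, can be saved to a file and handed back to rsync with its --files-from option, which reads the names of the files to transfer relative to the source directory.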

Trimming

Trimming was not hard either: I just had to run the command fstrim for each partition:

# for partition in /dev/sda{1,5,6,7}; do fstrim -v $partition; done;

and let it run.

Conclusion

I ended up closing the ticket with Corsair without asking for a replacement. The drive is back in shape, and seems to be working like a charm.

The real question now is how long can I trust this drive for? Will something like this happen again? My belief is that if it happened once, it will surely happen again. But this time around, I will be prepared.

http://rabexc.org/posts/recovering-from-failed-ssd

A simple way to generate snippets in python

Jul 13, 2014 Updated Jul 13, 2014

The Problem

Let's say you want to add a search box to your web site to find words within your published content. Or let's say you want to display a list of articles published on your blog, together with snippets of what the article looks like, or a short summary.

In both cases you probably have an .html page or some content from which you want to generate a snippet, just like on this blog: if you visit http://rabexc.org, you can see all articles published recently. Not the whole article, just a short summary of each.

Or if you look to the right of this page, you can see snippets of articles in the same category as this one.

Turns out that generating those snippets in python using flask and basic python libraries is extremely easy.

So, here are a few ways to do it...

Using javascript

Before starting to talk about python, I should mention that doing this in javascript should be extremely easy and straightforward. In fact, you can find many sites that load entire articles and then magically hide portions of them using javascript.

However, this has a few major drawbacks:

1) Unless you compute the snippets server side, the browser of your user will still receive the whole article so that javascript can chop it up and display only a piece of it.

2) The content will not necessarily be search engine friendly. If you embed the whole content on the page, a search engine may return the index page rather than the article when one of your users looks up an obscure word. Worse: this word and the interesting context may end up being hidden by your javascript, leading to an overall bad experience for your users. And if you decide to go the restful way, with javascript fetching the page via some API, the search engine is unlikely to see the content at all.

Nonetheless, some sites use javascript. Here, I will only talk about how to do this server side, in python.

Before getting started, don't forget to install all dependencies: this article depends on python being installed, BeautifulSoup, and well, there will be references to werkzeug. On a Debian system, you need to:

sudo -s 
apt-get install python-bs4 python-werkzeug

or, in a distro independent way with python installed:

pip install beautifulsoup4
pip install werkzeug

Using BeautifulSoup

Without getting too fancy, a really simple way to generate a snippet in python is to extract the content of a given article, and well, display it like in the column to the right of this article.

For example:

import bs4
page = FunctionThatGetsYourHTMLPage()

# Get the text of all the elements in the page.
text = bs4.BeautifulSoup(page, "html.parser").getText(separator=" ")

# Eliminate leading or trailing whitespace, and limit the content to
# the first 150 characters.
text = text.strip()
snippet = text[0:150]

# If text was longer than this (most likely), also add '...'
if len(text) > 150:
  snippet += "..."

This code is pretty simple: it loads an HTML page with BeautifulSoup, extracts all the text in between html tags, and then limits it to 150 characters. Note that all the formatting will be lost: no bolds, italics, different fonts, and so on. But this is just what we want: we don't want this formatting in the snippet results.

This is still not the best way to do it, as:

1) It is slow: parsing the html page in python server side is not exactly the fastest thing you can do. With some measurements, it can easily add up and become one of the slowest operations on the site. For example: generating the bar to the right seemed to take several hundreds ms, while just returning the article seemed to take at most tens of ms.

2) You will see anything in between <head> tags, headers, and so on.

3) If there are lots of whitespaces, you will see those as well in your page.

So, what can we do? Well, here's a few simple things you can do to only get the parts you are interested in:

1) Skip the whitespace:

snippet = " ".join(text.split()).strip()[0:150]

2) Start from the <body>:

text = bs4.BeautifulSoup(page).find("body").getText(separator=" ")

3) ... and well, cache, cache and cache the results. Eg, have your code be greedy: don't compute the snippet every time the same snippet has to be displayed. If you use flask and/or werkzeug, you can have something like:

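import bs4
from werkzeug.utils import cached_property

class Article(object):
  # Maximum number of characters in a snippet. A cached property cannot
  # take arguments, so the limit lives on the class.
  SNIPPET_LIMIT = 150

  @cached_property
  def html(self):
    # Reads the html from file, or generates it, or ...
    raise NotImplementedError("load or generate the html here")

  @cached_property
  def snippet(self):
    text = bs4.BeautifulSoup(self.html, "html.parser").getText(separator=" ")
    text = " ".join(text.split())
    snippet = text[0:self.SNIPPET_LIMIT]

    if len(text) > self.SNIPPET_LIMIT:
      snippet += "..."

    return snippet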

Don't forget that to get any benefit from the caching above, you also need to cache Article objects (by generating them once and keeping them in a list, ...).
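One simple way to keep Article objects around is a dictionary keyed by whatever identifies the article. The sketch below is only an illustration: get_article, the _ARTICLES dictionary and the path based key are hypothetical names, the Article class is a stand-in for the real one, and it uses functools.cached_property from the standard library (python 3.8 or later) in place of the werkzeug one so that it runs on its own.

```python
import functools


class Article(object):
    """Stand-in for the Article class above; only the caching matters here."""

    def __init__(self, path):
        self.path = path

    @functools.cached_property
    def snippet(self):
        # Placeholder for the expensive BeautifulSoup parsing.
        return "snippet for " + self.path


# All the Article objects created so far, keyed by path.
_ARTICLES = {}


def get_article(path):
    """Return the one shared Article for path, creating it on first use."""
    if path not in _ARTICLES:
        _ARTICLES[path] = Article(path)
    return _ARTICLES[path]


first = get_article("posts/html-snippets-in-python")
second = get_article("posts/html-snippets-in-python")
print(first is second)
```

Because every request gets the same object back, the snippet is computed at most once per article for the life of the process. If articles can change on disk, the dictionary entry has to be dropped when that happens.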

After the changes above, generating the snippet will be literally 6 lines:

def snippet(self, limit=150):
  text = bs4.BeautifulSoup(self.html).getText(separator=" ")
  snippet = " ".join(text.split()).strip()[0:limit]
  if len(text) > limit:
    snippet += "..."
  return snippet

Maintaining formatted text

The method above with BeautifulSoup works well to extract a snippet of unformatted text from an existing .html file. If you noticed above, all you get is a string containing the text in between tags.

I could not find any really easy way to extract text while maintaining some formatting, except for whitelisting some html tags and blacklisting others. But doing so in a very generic way gets tricky: what about stylesheets? What about some javascript formatting and magic? What if I have a large table? Images? ...

For this site, I found an extremely simple and elegant solution: all articles are written in MarkDown. Using the markdown library I take a wiki style text and turn it into html.

MarkDown is simple: parsing it is much easier than parsing .html. To generate a formatted text I just take the unformatted wiki like text in MarkDown format, and break it down after a few paragraphs, using something like:

def Summarize(markdown, limit=1000):
  """Returns a string with the beginning of a markdown article.

  Args:
    markdown: string, containing the original article in markdown format.
    limit: integer, how many characters of the original article to produce
      in the summary before starting to look for a good place to stop it.

  Returns:
    string, a markdown summary of at least limit length, unless the article
    is shorter.
  """
  summary = []
  count = 0

  # Skip titles, we don't want titles in summaries.
  def ShouldLineBeSkipped(line):
    if line and line[0] == '#':
      return True
    return False

  # Create an iterator to go over all lines, except the ones to skip.
  lines = (line for line in markdown.splitlines()
           if not ShouldLineBeSkipped(line))

  # Save all lines until we reach our limit.
  for line in lines:
    summary.append(line)

    count += len(line)
    if count >= limit:
      break

  # Save lines until we find a good place to break the article.
  for line in lines:
    # Keep going until what could be the end of a paragraph.
    if not line.strip() and summary[-1] and summary[-1][-1] == ".":
      break
    summary.append(line)

  # Add an empty line, and bolded '...' at the end of the summary.
  summary.append("")
  summary.append("**[ ... ]**")

  # Finally, return the summary.
  return "\n".join(summary)

Now, to have a nicely formatted summary of an article, all I have to do is something like:

text = LoadMarkdownTextFromDisk()
markdown.markdown(Summarize(text), extensions=["codehilite", "headerid", "def_list"])

The same suggestion about caching applies here: we could extend our Article class to have a summary cached property producing the formatted summary.

Conclusions

Neither of those methods is perfect: they rely on friendly html pages and markdown for formatting. However, they are extremely simple, a fully general solution would likely be more complex, and well, they seem to work well enough for me :).

http://rabexc.org/posts/html-snippets-in-python

When cardboard boxes are better than suitcases

Jul 7, 2014 Updated Jul 7, 2014

Have you ever had to pay extra fees to carry oversize luggage? Flown somewhere, ended up buying so many things that they did not fit your suitcase anymore?

Here's a nifty trick we used in our last trip to Europe which allowed us to carry much more than we believed we could at no extra charge.

The Basics

First: make sure to read the allowances for carry on and check in baggage on your ticket. Make sure you understand them fully, and if unsure, call your airline.

In our case, me, my wife, and baby were traveling on a Swiss Airlines flight, and our ticket allowed us to bring for free:

2 Carry on bags.
3 Checked in bags.

The checked in bags were limited in weight and length:

23 Kg (~50 lbs) at most.
158 cm (~62 inches) of linear length, where linear length is the sum of the width, depth, and height of your suitcase.

In our case, we discovered that:

The large suitcases we used weighed about ~7 Kg (~15 lbs) empty. With no clothes, no items whatsoever, the suitcase itself used up ~30% of our allowance!!
The small suitcases weighed ~5 Kg empty, or about ~21% of our allowance.
None of the suitcases we had was even close to the 158 cm linear length limit. Additionally, we discovered that at least one of the large items we wanted to carry would have been well within those limits had the suitcase been a little taller and a little thinner.

The solution

Thanks to my long experience as a Tetris player, we soon realized that everything could fit within our free allowance had we had a much lighter and better shaped suitcase.

Easier said than done, though: finding just the right suitcase in terms of size in a super-light yet sturdy material, without any space used up for pockets or the internal mechanisms for wheels at a reasonable price is not an easy task.

But the solution was simple: use a cardboard box. Before you scream bloody murder, consider that a cardboard box does have some really nice properties:

It is much lighter - the box we used ended up weighing, empty, just 1.7 Kg, 7.4% of our allowance, or a gain of about 23%.
You can shape it to your will - with scissors, tape and about 30 minutes work you can get pretty much any shape you like, and with no cumbersome weird spaces to fill to allow for wheels and handles.
It is dirt cheap - you can buy a cardboard box big enough for about ~5 USD anywhere in the world, and probably get one for free at any supermarket or store if you ask nicely.
It is so cheap you don't care if it breaks - as long as your stuff is safely held together inside.
It is solid - especially if you wrap it in plastic before checking it in.
It can be carried by any airline - I looked online for a while, and could not find any airline that would refuse to carry a cardboard box, as long as it fit the required dimensions. And by experience, I can tell you I have not had any problem so far.
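For those who like to check the numbers, the weight arithmetic is simple enough to fit in a few lines of python. The weights are the ones quoted in this article, the function name is mine.

```python
ALLOWANCE_KG = 23.0


def share_of_allowance(container_kg, allowance_kg=ALLOWANCE_KG):
    """Percentage of the checked bag allowance used by the empty container."""
    return 100.0 * container_kg / allowance_kg


# Empty weights, in Kg, as measured for this trip.
containers = [("large suitcase", 7.0), ("small suitcase", 5.0), ("cardboard box", 1.7)]

for name, kg in containers:
    print("%-15s %4.1f Kg empty, %4.1f%% of allowance, %4.1f Kg left for content"
          % (name, kg, share_of_allowance(kg), ALLOWANCE_KG - kg))
```

The difference between the large suitcase and the box is 5.3 Kg of extra content per checked bag.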

It does have a few drawbacks though:

It does not have wheels or handles, which makes it harder to carry, especially if you need to use public transports. Moving within airports is fine though, as you can easily get carts.
Water and humidity may cause it to fall apart more easily. But don't fear: if you bag your clothes and don't skimp on packaging tape, even if the cardboard breaks your beloved belongings will be well protected. For additional protection (and insurance!) you can even plastic wrap the cardboard box at the airport.

The process

The first thing I did was figure out what the best box for my purpose would be. Given that I had to carry an item that was about 80 cm long, I decided that my ideal "suitcase" would be about 81 cm x 54 cm x 22 cm, or 157 cm of total linear length (with the limit being 158 cm). In inches: 31 3/4 x 21 1/4 x 8 3/4.
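If you want to play with different shapes before cutting any cardboard, the check is trivial to script. A small sketch in python, using the 158 cm limit from my ticket; the function names are mine, and your airline may use a different limit.

```python
LIMIT_CM = 158


def linear_length(width, depth, height):
    """Linear length as airlines define it: the sum of the three dimensions."""
    return width + depth + height


def fits(dimensions, limit=LIMIT_CM):
    """True if a box with these (width, depth, height) is within the limit."""
    return linear_length(*dimensions) <= limit


box_cm = (81, 54, 22)
print(linear_length(*box_cm), fits(box_cm))

# The same dimensions in inches, at 2.54 cm per inch.
print([round(cm / 2.54, 2) for cm in box_cm])
```

The inch values I quoted above are rounded to the nearest quarter inch, so they differ slightly from the exact conversion.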

So, what I did was simple:

While in a random shop, I asked if I could take a random cardboard box I saw lying around. The important bit is that the box length + width is greater than the length + width you need, and well, that it is taller than you need it to be.
I opened the box on the top and bottom sides carefully, and cut through one of the corners, along the corner, to get a nice flat cardboard surface to work on. One of the corners is usually glued, if you prefer instead you can easily pull the two sides apart.

Now I worked to get back a box of the right size:

Starting from one side of the cardboard box, and along the length, I folded at the desired width. Then again at the desired length, desired width, and desired length. The trick here is to keep the sides about 1 cm (1/2 inch) shorter, so the outer size of the box is the desired one, rather than too large due to measurement errors and/or the thickness of the material. You can use a piece of wood or a wood corner in the furniture of the room to make nice sharp and straight folds.
At the end, I cut the leftover material, and used duct tape to get back a parallelepiped.
Now I cut through the corner of the upper and lower side, so I could fold again the bottom and top. Make sure the cut is long enough so that the side of the box ends up of the right height.
Again, use the tape to seal the bottom.
Fill the box, bagging the content if you are worried about liquids / humidity, seal the top, and profit!

Really: the process will take at most 30 mins, it is much simpler done than explained, and if you are careful with your measurements, you'll get a box just of the right size.

The result

So, this is what I ended up checking in:

luggage before departure

Swiss took it without blinking an eye. This is how it arrived:

luggage after arrival

Nice and clean, without issues. And here is what it looks like once opened in comparison with my largest suitcase:

luggage comparison

Note how my expensive suitcase is somehow slightly wider and shorter, and due to this, would have not fit the object I wanted to carry with me. Note also that the box is much more spacious on the inside: the thickness of the cardboard uses less space, the corners are straight rather than rounded, and no space is used by the mechanisms for the wheels, pockets, zippers and so on.

The economics

If I look on a random web site for suitcases, like amazon or ebags.com and go for the "extra large" "checked in" size, there are a few things you can notice:

Some of the suitcases being sold would not meet the free check in airline requirements. Linear length is often > 62 inches, or 158 cm. Sometimes even by a few inches (65, 66, ...). Doubtful they will come after you, but they may charge you extra.
Most of the suitcases weigh around 6 to 7 kg, and if you look at the picture inside, you can clearly see that much of the space is taken by zippers, wheels, ... Corners are almost always rounded.
The price goes anywhere from ~90 $ to ~1500 $ for the "extra large size". Even assuming we had to pay 10 $ for a cardboard box (bagging, wrapping, cart rental, ...), if you bought cardboard boxes instead of suitcases you'd be saving for 9 to 150 trips. And note that some of the cheaper suitcases are not that sturdy and may not even last that long.

Conclusions

Using cardboard boxes instead of suitcases sounds certainly very attractive to me at this point :)

This also helps save space in my little house when not traveling (no need for a suitcase hanging around), and relieves the stress of having to choose / replace / fix the suitcase when broken. I can also dispose of a smaller box easily in exchange for a larger one (or the other way around) when traveling, while I would certainly feel bad about leaving around or changing suitcases.

The only open question is how to get wheels: a handle is easy to build with duct tape, but wheels are certainly useful when the trip involves a fair amount of public transports.

http://rabexc.org/posts/custom-boxes-instead-of-suitcases

awesome window manager, i3lock, xautolock and suspend to disk

Jun 21, 2014 Updated Nov 25, 2015

With my last laptop upgrade I started using awesome as a Window Manager.

I wasn't sure of the choice at first: I have never liked graphical interfaces, and the thought of having to write lua code to get my GUI to provide even basic functionalities wasn't very appealing to me.

However, I have largely enjoyed the process so far: even complex changes are relatively easy to make, while the customizability has improved my productivity while making the interface more enjoyable for me to use.

The switch, however, has forced me to change several things in my setup. Among others, I ended up abandoning xscreensaver for i3lock and xautolock, while changing a few things on my system to better integrate with the new environment.

In this article, you will find:

A description of how to use xautolock together with i3lock to automatically lock your screen after X minutes of inactivity and when the laptop goes to sleep via ACPI.
My own recipe to display the battery status on the top bar of Awesome. This is very similar to existing suggestions on the Awesome wiki, except there is support for displaying the status of multiple batteries at the same time. Which, for how rare this may sound, is something supported on my laptop which I regularly use (x230 with 19+ cell slice battery).
How I got NetworkManager to display properly.
Some details on what I had to do to get Suspend to disk to work.

In short: a handy list of things I had to plumb in manually to get the environment I wanted.

Using xautolock and i3lock

For those of you who are not familiar with these projects, i3lock is a very simple graphical screen lock program, part of the i3 window manager. When run, it just blanks your screen, and asks you for a password to unlock it.

xautolock instead just monitors your keyboard and mouse for activity. If both are inactive for a configurable amount of time, xautolock will run a command of your choice, like i3lock.

Before starting

... make sure you have xautolock and i3lock installed. On a Debian system, this means running:

sudo -s
apt-get install xautolock
apt-get install i3lock

Starting xautolock

To use xautolock with i3lock in awesome I started off writing a really simple shell script which I put in ~/.config/awesome/locker.sh:

#!/bin/sh

exec xautolock -detectsleep \
  -time 3 -locker "i3lock -d -c 000070" \
  -notify 30 \
  -notifier "notify-send -u critical -t 10000 -- 'LOCKING screen in 30 seconds'"

This script starts xautolock such that it will count time correctly even if the laptop goes to sleep (-detectsleep), lock the screen after 3 minutes of inactivity (-time 3) by running the command i3lock -d -c 000070 (-locker ...), but notify me that the screen is about to be locked 30 seconds early (-notify 30) by running the command notify-send -u critical ....

Let's look at the individual commands now.

notify-send is a really simple program that asks the dbus daemon on your system to notify you of an important event. To use it, you need to make sure that libnotify-bin in debian like systems is installed, with apt-get install libnotify-bin. You can then play with it by just running something like:

notify-send "hello world!"

On most window managers, this will result in a small pop up somewhere on your screen saying "hello world!". The -u critical option just hints the window manager that this is an important message, which colors it in a nice shade of red on awesome. -t 10000 tells the window manager to automatically delete the message in 10 seconds, so I don't really have to close it.

i3lock instead is what is really locking the screen. The -d option instructs i3lock to put the display to sleep, while the -c option picks the background color. For some reason, I did not like the default color, and preferred a nice shade of blue.

Now that I had this script, I had to instruct awesome to run it whenever the window manager was started. This was as simple as editing ~/.config/awesome/rc.lua and adding:

awful.util.spawn_with_shell('~/.config/awesome/locker.sh')

at the end of the script.

Locking screen on command

One thing I really wanted to have is a key binding to allow to quickly lock the screen. Adding one was easy, I just had to edit ~/.config/awesome/rc.lua one more time and add:

awful.key({ modkey, "Control" }, "l",
          function ()
              awful.util.spawn("sync")
              awful.util.spawn("xautolock -locknow")
          end),

to make window + control + l lock my screen. Note that xautolock can both spawn a new xautolock daemon and, with options like -locknow, send commands to one that is already running.

Locking screen on sleep

Also, locking the screen when the laptop is put to sleep seemed like a really good idea. On my debian system, all I had to do was add a script /etc/pm.d/sleep/lock containing:

#!/bin/sh

logger "$0 - locking screen after sleep."
xautolock -locknow > /dev/null 2>&1

Just put your laptop to sleep now, and check /var/log/syslog to make sure the script was invoked. Don't forget to chmod 0755 /etc/pm.d/sleep/lock.

Battery status

Something that bothered me on awesome was the lack of an easy way to see the battery status.

The awesome wiki has several suggestions on how to show it. My main issue though is that my laptop allows me to have up to 2 batteries connected at the same time, and really, I want to see the status of both batteries.

What I came up with is a variation of what is already suggested on the wiki:

It runs the acpi command to get the status of each battery.
It formats it nicely and displays it in the top status bar.

Let's start with the function to get the battery status:

-- Create an ACPI widget
function GetBatteryState()
  local command = "acpi -b |sed -e 's@.*\\([0-9]:\\) [^,]*,@\\1@' -e 's@remaining@@' | sed -e :a -e '$!N;s@\\n@| @;ta'"
  local fh = assert(io.popen(command, "r"))
  local text = " | " .. fh:read("*l") .. " | "
  fh:close()
  return text
end

The function GetBatteryState just returns a string with the battery state. You can try to run the command yourself to see the output, but here's what it looks like on my system:

acpi -b |sed -e 's@.*\([0-9]:\) [^,]*,@\1@' -e 's@remaining@@' | sed -e :a -e '$!N;s@\n@| @;ta'
0: 94%, 05:42:26 | 1: 76%

which should be pretty self explanatory: battery 0 has 94% charge, and will take 5 hours and 42 minutes to discharge, while battery 1 has 76% charge, and is not being used.
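If the sed pipeline is hard to follow, the same text transformation can be sketched in Python. This is only an illustration: the sample acpi -b lines are an assumption about what the tool prints on a laptop with two batteries, and format_batteries is a name made up for this example.

```python
import re

# Assumed sample of `acpi -b` output on a laptop with two batteries.
SAMPLE = """Battery 0: Discharging, 94%, 05:42:26 remaining
Battery 1: Unknown, 76%"""

def format_batteries(acpi_output):
    """Mimic the two sed invocations used in GetBatteryState."""
    parts = []
    for line in acpi_output.splitlines():
        # Keep "N:" and drop everything before it, plus the state up to the next comma.
        line = re.sub(r".*([0-9]:) [^,]*,", r"\1", line, count=1)
        # Drop the word "remaining" (first occurrence only, like sed without /g).
        line = line.replace("remaining", "", 1)
        parts.append(line)
    # The second sed joins all the lines with "| ".
    return "| ".join(parts)

print(format_batteries(SAMPLE))
```

With the assumed sample above, this prints the same string shown for my system: 0: 94%, 05:42:26 | 1: 76%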

Now we need to create a widget, a graphic element to display this text, and make sure the function above is run once per minute.

The tricky part here is that Awesome 3.4 and Awesome 3.5 have significantly different APIs, that are not compatible with each other.

So, if you are running awesome 3.4, you need to add something like:

batterywidget = widget({ type = "textbox" })
batterywidget.text = GetBatteryState()
batterywidgettimer = timer({ timeout = 60 })
batterywidgettimer:add_signal("timeout",
  function()
    batterywidget.text = GetBatteryState()
  end
)
batterywidgettimer:start()

While if you are running Awesome 3.5, you need something like:

batterywidget = wibox.widget.textbox()
batterywidget:set_text(GetBatteryState())
batterywidgettimer = timer({ timeout = 60 })
batterywidgettimer:connect_signal("timeout",
  function()
    batterywidget:set_text(GetBatteryState())
  end
)
batterywidgettimer:start()

Now there is one last thing you need to do: tell awesome to display this box somewhere on the screen.

This, once again, changes based on the version of Awesome you are using. Let's start with 3.4, where you need to have something like the following in your rc.lua:

for s = 1, screen.count() do
    -- Create a promptbox for each screen
    mypromptbox[s] = awful.widget.prompt({ layout = awful.widget.layout.horizontal.leftright })
    -- Create an imagebox widget which will contains an icon indicating which layout we're using.
    -- We need one layoutbox per screen.

    [...]

    mywibox[s].widgets = {
        {
            mylauncher,
            mytaglist[s],
            mypromptbox[s],
            layout = awful.widget.layout.horizontal.leftright
        },
        mylayoutbox[s],
        mytextclock,
        s == 1 and mysystray or nil,
        batterywidget,
        mytasklist[s],
        layout = awful.widget.layout.horizontal.rightleft
    }
end

For Awesome 3.5, instead:

for s = 1, screen.count() do
    -- Create a promptbox for each screen
    mypromptbox[s] = awful.widget.prompt()
    -- Create an imagebox widget which will contains an icon indicating which layout we're using.
    -- We need one layoutbox per screen.

    [...]

    -- Create the wibox
    mywibox[s] = awful.wibox({ position = "top", screen = s })

    -- Widgets that are aligned to the left
    local left_layout = wibox.layout.fixed.horizontal()
    left_layout:add(mylauncher)
    left_layout:add(mytaglist[s])
    left_layout:add(mypromptbox[s])

    -- Widgets that are aligned to the right
    local right_layout = wibox.layout.fixed.horizontal()
    right_layout:add(batterywidget)
    if s == 1 then right_layout:add(wibox.widget.systray()) end
    right_layout:add(mytextclock)
    right_layout:add(mylayoutbox[s])

    -- Now bring it all together (with the tasklist in the middle)
    local layout = wibox.layout.align.horizontal()
    layout:set_left(left_layout)
    layout:set_middle(mytasklist[s])
    layout:set_right(right_layout)

    mywibox[s]:set_widget(layout)
end

That is, some code that, for each configured screen, defines which widgets are to be shown. Note that in both snippets above I added my batterywidget near mytextclock (the time indication) and mysystray (where all the small icons for things like chat, network, sound... go), in a right to left layout. But really, you can put the widget wherever you like. The order is important: just move it around, restart awesome with Mod4 + Control + r (or quit it with Mod4 + Shift + q) and look at the result.

And that's it, I now had my battery indication.

Network Manager

After a few months of using nmcli, the network manager command line interface, to configure wifi I really wanted to get back to a graphical widget. A few reasons pushed me in this direction:

Really, nmcli on my system is buggy. It is easy to crash it and often hard to parse the output (talking about 0.9.8.10).
I don't use it often enough to remember the command line parameters, and every time is a new learning experience. This is because once a network is configured, NetworkManager will happily connect automatically without bothering you.
When you are in a hurry, it's just nice to have a drop down menu with options to click on.

On the downside, it seems like nm-applet, in the network-manager-gnome package, brings in lots of dependencies I really did not want on my system.

The compromise I reached was to install network-manager-gnome with --no-install-recommends, like:

sudo -s
apt-get --no-install-recommends install network-manager-gnome

Getting nm-applet to show up in awesome was a breeze: just added to my rc.lua the line:

awful.util.spawn_with_shell('nm-applet')

at the end of the file. The only annoying thing about the simple approach used here is that reloading the config file in awesome will cause multiple network manager statuses to be displayed.

Getting suspend to disk to work

Finally, I wanted suspend to disk to run at the simple press of a button. Unfortunately, it doesn't seem like my x230 has a pre-configured button for it. However, it has a ThinkVantage button that is pretty much useless in linux. So, all I had to do was tell acpid to turn ThinkVantage key presses into requests to suspend to disk. To do so, I created a file /etc/acpi/events/my-suspend with:

event=button/prog1 PROG1 00000080 00000000
action=/etc/acpi/sleep_suspendbtn.sh suspend

and ran service acpid restart. Note that I discovered what to write in the line event=button/prog1 ... by running acpi_listen, and cutting and pasting the result after event=.

http://rabexc.org/posts/awesome-xautolock-battery

Sharing directories with virtual machines and libvirt

Apr 18, 2014 Updated Apr 18, 2014

Let's say you want to make the directory /opt/test on your desktop machine visible to a virtual machine you are running with libvirt.

All you have to do is:

virsh edit myvmname, edit the XML of the VM to have something like:
```
<domain ...>
  ...

  <devices ...>
    <filesystem type='mount' accessmode='passthrough'>
      <source dir='/opt/test'/>
      <target dir='testlabel'/>
    </filesystem>
  </devices>
</domain>
```
where /opt/test is the path you want to share with the VM, and testlabel is just a mnemonic of your choice.

Make sure to set accessmode to something reasonable for your use case. According to the libvirt documentation, you can use:

mapped: files are created and accessed as the user running kvm/qemu. Extended attributes are used to store the original user credentials.
passthrough: files are created and accessed as the user within kvm/qemu.
squash: like passthrough, except failures in privileged operations are ignored (this is what QEMU itself calls security model none).

More details are provided in the next section. For now, just ensure that the new <filesystem> block is under <domain> and <devices>; the order does not matter. Also make sure to save and exit.
Now start your virtual machine, with virsh start myvmname, and get a console.

Append a few lines to /etc/modules, to make sure the right modules are loaded:

 $ sudo -s
 # cat >>/etc/modules <<EOF
 loop
 virtio
 9p
 9pnet
 9pnet_virtio
 EOF

As a root user, load those modules:
```
 # service kmod start
```

Now you should be ready to mount the file system:

 # mount testlabel /opt/test -t 9p -o trans=virtio

Et voila! If you cd /opt/test you should be able to see the files from your host physical machine.
If you want the file system to be automatically mounted at boot time, you can add something like:
```
 testlabel /opt/test            9p             trans=virtio    0       0
```
to your /etc/fstab file.
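If you would rather script the XML change from the first step than edit it by hand, a minimal Python sketch could look like the one below. It only manipulates an XML string: the trimmed DOMAIN_XML is a made-up stand-in for a real domain definition, add_shared_dir is a name invented for this example, and feeding the result back to libvirt is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML standing in for a real VM definition.
DOMAIN_XML = "<domain type='kvm'><name>myvmname</name><devices/></domain>"

def add_shared_dir(domain_xml, host_dir, label, accessmode="passthrough"):
    """Return the domain XML with a <filesystem> entry added under <devices>."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    fs = ET.SubElement(devices, "filesystem",
                       {"type": "mount", "accessmode": accessmode})
    # host_dir is the directory on the host, label is the mnemonic used by mount.
    ET.SubElement(fs, "source", {"dir": host_dir})
    ET.SubElement(fs, "target", {"dir": label})
    return ET.tostring(root, encoding="unicode")

new_xml = add_shared_dir(DOMAIN_XML, "/opt/test", "testlabel")
print(new_xml)
```

The label passed here is the same string that later goes into the mount command and into /etc/fstab.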

Issues

If you get access denied to your files in /opt/test or can't write in the directory, the problem is generally related to the accessmode you picked, and the user your VM is running as.

Don't forget that at the end of the day a Virtual Machine is just another process on your host operating system. This process is running with the privileges of a particular user, and only able to change and touch the files that the specific user is given access to.

If you run ps aux |grep kvm or ps aux |grep qemu on your host system, you will most likely see that a system VM is running as user libvirt-qemu on Debian, while it is running as yourself if it is a session VM. If you are confused about system or session VMs, you should read this article.

This means that kvm/qemu will be able to read or write files or directories either as you, or as the libvirt-qemu user. Make sure that file privileges and directories are set accordingly.

To change the uid under which system VMs are run, you need to:

edit /etc/libvirt/qemu.conf
modify the parameters user and group to have the desired value.
restart libvirt daemons, with service libvirt-bin restart and service libvirt-guests restart.
it may also be necessary to restart any VM that is still running, to pick up the new user and group.

Alternatively, you may want to change accessmode to mapped or squash.

http://rabexc.org/posts/p9-setup-in-libvirt

Changing the default connect URI in libvirt

Apr 17, 2014 Updated Apr 17, 2014

All the libvirt related commands, like virsh, virt-viewer or virt-install, take a connect URI as parameter. The connect URI can be thought of as specifying which set of virtual machines you want to control with that command, which physical machine to control, and how.

For example, I can use a command like:

virsh -c "xen+ssh://admin@corp.myoffice.net" start web-server

to start the web-server virtual machine on the xen cluster running at corp.myoffice.net, by connecting as admin via ssh to the corresponding server.

If you don't specify any connect URI to virsh (or any other libvirt related command), by default libvirt will try to start a VM running as your username on your local machine (eg, qemu:///session). This unless you are running as root, in which case libvirt will try to run the image as a system image, not tied to any specific user (eg, qemu:///system).

I generally run most of my VMs as system VMs, and systematically forget to specify which connect URI to use to commands like virsh or virt-install. What is more annoying is that some of those commands take the URI as -c while others as -C.

However, it turns out that most of those commands rely on libvirt, and that libvirt itself looks at LIBVIRT_DEFAULT_URI to pick the default connect URI.

All I had to do to have all of those commands use qemu:///system as default was to edit my .bashrc to have:

export LIBVIRT_DEFAULT_URI="qemu:///system"

Log out, log in again, and enjoy!
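To make the selection rule described in this post concrete, here is a simplified Python model of it. This is not libvirt code: default_connect_uri is a made-up helper that covers only the environment variable and the root versus non-root default, while the real library has more sources of configuration that are ignored here.

```python
import os

def default_connect_uri(environ=None, uid=None):
    """Simplified model of how the default connect URI gets picked."""
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid
    # An explicit LIBVIRT_DEFAULT_URI wins over everything else.
    uri = environ.get("LIBVIRT_DEFAULT_URI")
    if uri:
        return uri
    # Otherwise root gets system VMs, everyone else gets session VMs.
    return "qemu:///system" if uid == 0 else "qemu:///session"

# A regular user without the variable set, then with it exported.
print(default_connect_uri({}, uid=1000))
print(default_connect_uri({"LIBVIRT_DEFAULT_URI": "qemu:///system"}, uid=1000))
```

The first call gives qemu:///session and the second qemu:///system, which is exactly the change the export line buys you.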

http://rabexc.org/posts/libvirt-default-url

Making GRUB quiet

Sep 22, 2013 Updated Sep 22, 2013

While traveling, I have been asked a few times by security agents at airports to turn on my laptop, and well, show them it did work, and looked like a real computer.

Although they never searched the content and nothing bad ever happened, every time I cross the border or go through security I am worried about what might happen, especially given recent stories of people being searched and their laptops taken away for further inspection.

The fact I use full disk encryption does not help: if I was asked to boot, my choice would be to either enter the password and login, thus disclosing most of the content of the disk, or refuse and probably have my laptop taken away for further inspection.

So... for the first time in 10 years, I decided to keep Windows on my personal laptop. Even more, to leave it as the default operating system in GRUB and, well, not show GRUB at all during boot.

Not because I think it is safer this way, but just to create as few pretexts or excuses as possible for anyone to further poke at my laptop, in case I need to show it or they need to inspect it.

Getting grub out of the way was not as easy as it should have been, so this post is to document what I did.

Problems

First of all, here are the problems:

The Debian GRUB setup scripts create a menu entry in GRUB for each kernel you have installed, followed by other detected Operating Systems. This means that every time you install a new kernel, the entry number of the other Operating Systems changes (eg, Windows becomes the 3rd entry, or 4th entry, ...). Given that the default Operating System is specified by entry number, if you want to default to windows, well, it doesn't play out well.
By default, GRUB will show a menu. If you disable that menu (relatively easy), it will still show a "Loading GRUB." message followed by "Welcome to GRUB!", something like:
```
 Loading GRUB. 
 Welcome to GRUB!
```
Turns out that those messages are not configurable, as they are printed before any config file can be read by GRUB. Ubuntu and a few other vendors have provided a patched version of GRUB, but I really don't want to go down that path: don't want to keep installing my own version of GRUB or patch and recompile for each new release.

So, here's what I did...

Fixing the order of the entries

There might be better ways to provide a default that is not an integer, the name of an entry, for example. However, I really wanted windows to show up first in GRUB.

To fix the order of the menu entries, I:

Opened /boot/grub/grub.cfg, and manually copied the entry for Windows I wanted to keep. In my case, the entry was:

 menuentry "Windows 7 (loader) (on /dev/sda2)" --class windows --class os {
         insmod part_msdos
         insmod ntfs
         set root='(hd0,msdos2)'
         search --no-floppy --fs-uuid --set=root F646B41846B3D817
         chainloader +1
 }

Disabled automated discovery of operating systems. I don't care, I don't install new systems that often, and when I do, I'm well aware I have to update grub config. To do so, you need to:
```
 $ sudo -s
 # vim /etc/default/grub
 ...
 GRUB_DISABLE_OS_PROBER=true
```
eg, add GRUB_DISABLE_OS_PROBER=true to /etc/default/grub.

In /etc/grub.d, added a script 06_windows like this:

 $ sudo -s
 # cd /etc/grub.d
 # cat > 06_windows <<'EOF'
 #!/bin/sh
 exec tail -n +3 $0

 menuentry "Windows 7 (loader) (on /dev/sda2)" --class windows --class os {
         insmod part_msdos
         insmod ntfs
         set root='(hd0,msdos2)'
         search --no-floppy --fs-uuid --set=root F646B41846B3D817
         chainloader +1
 }
 EOF
 # chmod 0755 ./06_windows

Ran update-grub to get the grub configuration updated for real.
Checked the content of /boot/grub/grub.cfg manually, and rebooted to verify. Windows should be the first entry now.

Disabling the boot menu

This was relatively easy to do: just edit /etc/default/grub, and make sure you have the following lines:

GRUB_DEFAULT=0
GRUB_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT=5
GRUB_HIDDEN_TIMEOUT_QUIET=true

The first line tells grub to start Windows by default (the first boot entry). The second one tells grub to show the menu for 0 seconds by default, thus not showing it. The third one makes grub wait 5 seconds for you to press a key before the menu, and the last one hides the counter going from 5 to 0 before showing the menu.

The only hiccup I had here was that most of the documents say you have to hold shift to get into the menu, but no, for me I had to press ESC, or any other key? I still need to try :).

Don't forget to run update-grub and reboot to test this out. You should see that despite the changes, you will still have a Loading GRUB. message, and a Welcome to GRUB!, although nothing else will show up before booting Windows.

Disabling the boot messages

So, how do you get rid of the annoying:

Loading GRUB.
Welcome to GRUB!

? Most forums and online discussions will tell you to patch the GRUB source code and recompile. Those messages are printed out well before any config file can be loaded, and there are really not that many alternatives.

I really didn't want to patch, as I did not want to maintain a set of patched binaries for my own use on my own system (yes, I love keeping the system up to date! And I love playing with testing/unstable, which means frequent updates).

The idea was simple: if the messages are displayed, they must be stored somewhere. And if any equivalent of printf is used, I can replace the first character of each of those strings with a \0 to prevent them from showing up.
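To show the trick in isolation, here is a small Python sketch that blanks messages in an in-memory blob. It is not the code of the tool mentioned in this post: silence is a name made up for the example, the blob is a fake stand-in for a boot image, and the whole thing rests on the assumption that the print routine stops at the first NUL byte.

```python
def silence(blob, messages):
    """Return a copy of blob where every listed message starts with a NUL byte."""
    data = bytearray(blob)
    for message in messages:
        start = data.find(message)
        while start != -1:
            # Assumption: the print routine treats NUL as end of string,
            # so a leading NUL turns the message into an empty string.
            data[start] = 0
            start = data.find(message, start + 1)
    return bytes(data)

# Made-up stand-in for a boot image; the real files are binary GRUB images.
image = b"\x90\x90Loading GRUB.\x00\x90Welcome to GRUB!\x00\x90"
patched = silence(image, [b"Loading GRUB.", b"Welcome to GRUB!"])
print(patched)
```

Only one byte per message changes and the length stays the same, so every offset inside the image stays valid.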

This is terribly terribly hacky. But 2 hours of work to find the right files and the right process gave me exactly what I wanted: a tool that modifies a few of the grub files to remove the messages, which works like a charm.

By adding a hook in /etc/initramfs-tools or /etc/grub.d, I can just run the tool every time grub configs are changed, without having to recompile and patch the source.

I've just uploaded some code to github if you want to try it. Read the README, but it should be really straightforward to get it rolling.

Again, don't expect too much, it's not clean and beautiful, it only works.

What next?

Most distributions used an entirely different path: patching GRUB to unconditionally disable those messages. As a user, I'd rather have the choice to disable those messages or not, especially given the fact that those messages can be useful for debugging.

Unsurprisingly, the GRUB maintainers refused those patches, which are now maintained separately by each distro that includes them.

Related to grub-shusher, I will need to update it every time bootstrap.S and a few other .S files change in GRUB. This doesn't happen often, but I am sure I will eventually grow tired of maintaining it.

It would still be great to have a real, supported, solution for configuration parameters that are needed before, well, a configuration file can be read and loaded.

Here are some proposals:

It would not be hard to add some sort of watermark before each configuration variable in the .S files and binary blobs. Then we could have a tool like grub-shusher that can reliably find those watermarks and the corresponding variables, and change them directly in the binary. For example, in bootstrap.S we could have a bool to determine if messages have to be displayed or not, and the code would check that bool before displaying the messages. Before that bool definition, we can add a watermark like 0xabcd (any value that is not used throughout the binary, really) to indicate that the following bytes are a configurable bool. Something like grub-shusher would then find those watermarks, and allow changing the values. This is probably worth doing only if there are more variables than, well, just one.
Ship GRUB with variants of kernel.img, with different parameters compiled in, and let grub-install figure out which variants to install on the MBR based on user configs or command line flags. This would work only if there are a handful of variables, as the number of combinations would explode exponentially. It seems brittle, but would work.
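The first proposal can be sketched in a few lines of Python. Everything here is hypothetical: the two byte watermark is the example value from the text, the one byte bool right after it is an assumed layout, and set_flag is a name made up for this illustration.

```python
WATERMARK = b"\xab\xcd"  # example marker from the text; must not occur elsewhere

def set_flag(blob, value):
    """Set the single byte that follows the watermark to 1 or 0."""
    pos = blob.find(WATERMARK)
    if pos == -1 or pos + len(WATERMARK) >= len(blob):
        raise ValueError("watermark not found")
    if blob.find(WATERMARK, pos + 1) != -1:
        # A marker that appears twice cannot identify the variable reliably.
        raise ValueError("watermark is not unique")
    data = bytearray(blob)
    data[pos + len(WATERMARK)] = 1 if value else 0
    return bytes(data)

# Made-up binary: some code bytes, the watermark, then a "show messages" bool.
binary = b"\x01\x02\x03" + WATERMARK + b"\x01" + b"\x04\x05"
quiet = set_flag(binary, False)
print(quiet.hex())
```

The uniqueness check is the part that matters: the scheme only works if the marker value really never shows up anywhere else in the binary.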

http://rabexc.org/posts/grub-shush

I/O performance in Python

Sep 9, 2013 Updated Sep 9, 2013

The Problem

I am writing a small python script to keep track of various events and messages. It uses a flat file as an index, each record being of the same size and containing details about each message.

This file can get large, in the order of several hundreds of megabytes. The python code is trivial, given that each record is exactly the same size, but what is the fastest way to access and use that index file?

With python (or any programming language, for what it's worth), I have plenty of ways to read a file:

I can just rely on read and io.read in python having perfectly good buffering, and just read (or io.read) a record at a time.
I can read it in one go, and then operate in memory (eg, a single read in a string, followed by using offsets within the string).
I can do my own buffering, read a large chunk at a time, and then operate on each chunk as a set of records (eg, multiple read of some multiple of the size of the record, followed by using offsets within each chunk).
I can use fancier libraries that allow me to mmap or use some other crazy approach.

and even more ways to parse it. For now, I just wanted to focus on the reading part, as in my case parsing was a simple split or struct.unpack.
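For reference, parsing one fixed size record with struct.unpack could look like the sketch below. The record layout is purely hypothetical (two 32 bit fields and one 64 bit field, big endian), picked only because it adds up to 16 bytes; the field names are made up as well.

```python
import struct

# Hypothetical layout: two 32 bit fields and one 64 bit field, big endian.
RECORD = struct.Struct(">IIQ")
assert RECORD.size == 16

def parse_record(raw):
    """Turn one 16 byte record into a (message_id, flags, offset) tuple."""
    return RECORD.unpack(raw)

raw = RECORD.pack(7, 1, 4096)
print(parse_record(raw))
```

This prints (7, 1, 4096). Precompiling the format with struct.Struct avoids parsing the format string again for every record.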

The Quest

Disclaimer: most of the examples below are oversimplified and not particularly beautiful. I should have used the with construct, checked for exceptions, or preferred list operations over string operations at times. I was trying to avoid any distractions, by keeping the examples as similar as possible. Please bear with me, and keep your pythonic feelings at bay, if you can :), but do please provide suggestions if you have any!

The first question I wanted to answer was "is solving this problem even worth spending time on it? Why can't I just write the simplest code, using read, and be done with it?".

So that's where I started from:

import itertools
import sys

f = open(sys.argv[1], "rb")
for i in itertools.count():
  record = f.read(16)
  if not record:
    break

print i

The counter, printing i, is just to verify that all the records have been read correctly.

The file I've used for benchmarks is about 1.4 Gb in size:

$ ls -al ./test-index.bin
-rw-r--r-- 1 rabexc users 1493893120 Dec 16  2010 ./test-index.bin

My system has about 4 Gb of ram, less than 1 Gb of which is used in steady state:

$ free
             total       used       free     shared    buffers     cached
Mem:       4039164    2363280    1675884          0      38128    1697472
-/+ buffers/cache:     627680    3411484
Swap:      6291452        140    6291312

Given that I'm not using the system for anything else but this test, this means that after I read the file for a few times, it should live in cache, and always be accessed from memory.

So, how do I time the code? Well, a few runs of the script with time should do it:

$ time python /tmp/snippets/snippet00.py ./test-index.bin
93368320
real    0m54.960s
user    0m53.715s
sys     0m0.620s

(this is like the 3rd run, which is important to make sure that all the important bits, including the test file, are read from cache rather than actual disk, so I can benchmark the code, and not my hard drive).

This means 55 seconds were spent running the python snippet above, and 0.6 seconds were spent by the operating system to provide the content of the file.

This is quite disappointing: I was hoping for my code to take < 10 seconds. How fast can this go at best? Let's try comparing this number with how long cat takes to read the file:

$ time cat ./test-index.bin &> /dev/null
real    0m0.531s
user    0m0.016s
sys     0m0.512s

well, a plain cat is taking about half a second, and I'm sure there are better ways to read files than what cat does! It ought to be possible to speed up my python code!

Given that I don't know anything about python internals, let's start by looking at what cat is doing internally with a simple strace:

fadvise64_64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
...
read(3, "QFI\373\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\1@\0\0\0"..., 32768) = 32768
write(1, "QFI\373\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\1@\0\0\0"..., 32768) = 32768
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
read(3, "\200\0\0\0\0\6\0\0\200\0\0\0\0\214\0\0\200\0\0\0\1\21\0\0\200\0\0\0\1\226\0\0"..., 32768) = 32768
write(1, "\200\0\0\0\0\6\0\0\200\0\0\0\0\214\0\0\200\0\0\0\1\21\0\0\200\0\0\0\1\226\0\0"..., 32768) = 32768
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
read(3, "\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
write(1, "\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
...

So:

It tells the operating system it intends to read the file sequentially (fadvise).
It reads the file in chunks of 32K.
Which it immediately writes out to the output file.
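The three steps seen in the strace can be reproduced with a short Python 3 sketch. This is only an approximation of what cat does: cat_like is a made-up name, os.posix_fadvise is used only where the platform provides it, and the demo at the bottom copies a small temporary file to the null device instead of the real index file.

```python
import os
import tempfile

def cat_like(path, out_fd, chunk_size=32768):
    """Copy path to out_fd in fixed size chunks; return the bytes copied."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint sequential access where the platform supports it (Linux does).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            # os.write may write less than asked, so loop until the chunk is out.
            view = memoryview(chunk)
            while view:
                written = os.write(out_fd, view)
                view = view[written:]
            total += len(chunk)
        return total
    finally:
        os.close(fd)

# Demo on a throwaway 100000 byte file, written to the null device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 100000)
devnull = os.open(os.devnull, os.O_WRONLY)
copied = cat_like(tmp.name, devnull)
os.close(devnull)
os.unlink(tmp.name)
print(copied)
```

Note that this loop runs once per 32K chunk, not once per 16 byte record, which is what keeps the interpreter overhead small.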

The Results

Unless you are interested in spending the next 30 mins of your life learning about experiments that turned out to be of very little help, let me summarize the results first:

I now believe that python is actually quite fast at reading the file. The issue seems to be in accessing the buffer 16 bytes at a time. A simple substring using slice syntax over a large buffer costs about 300 ns; multiplied by roughly 100,000,000 records, this gets to ~30 seconds.
I could not find any good way to speed this part up. Anything I did outside the example snippets ended up paying roughly the same cost. Note also that most of the APIs take strings as input, so it is hard to use anything else.
Eliminating this cost in example snippets of code was tricky, but brought the performance to be in the same order of magnitude as that of cat.
Not sure even eliminating this cost would buy much. Sure, it will make the benchmark faster, but any sort of processing in the middle of the loop will likely end up costing something.
Different ways to actually read the file did bring some improvements. Eg, I went from ~30-40 seconds with the best approaches, to about 2 minutes with the worst ones. Given that the naive code is about 50 seconds, those 10 seconds are probably not worth any sort of extra complexity in your code.

The Experiments

Naive open with read

You have already seen this snippet of code:

import itertools
import sys

f = open(sys.argv[1], "rb")
for i in itertools.count():
  record = f.read(16)
  if not record:
    break

print i

Running this script takes about 54 seconds. Just to put this in perspective, it is about 100 times slower than cat, although it is really not doing anything with each record.

$ time python /tmp/snippets/snippet00.py ./test-index.bin
93368320
real    0m54.960s
user    0m53.715s
sys     0m0.620s

If I run strace on this script, I see that internally python is reading 4K at a time, and buffering. Good!

read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096

This means it is using buffers of about 4K in size, or 8 times smaller than cat. If I increase the buffer size to 32K, with:

f = open(sys.argv[1], "rb", 32768)

it is still taking ~48 seconds, which means that most of the time is still, well, spent by python itself, with sys time still ~0.5 seconds.

If I further increase the buffer size to 1 Mb and 10 Mb, performance does not improve.

But if I leave a large buffer, and also increase the f.read() size from 16 bytes to 32 K or 1 Mb, run time goes down to about 3 seconds.

Given that the buffer size controls how many times the read syscall is invoked, while the size of the f.read() controls how many times the loop, and the python code to perform the read is called, it seems like the cost is dominated by the python code to perform the read.
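A quick way to see how much the read size changes the number of trips through the interpreter is to count the calls. The sketch below is Python 3 and uses io.BytesIO as a stand-in for the index file, so no syscalls are involved at all; count_reads is a name made up for the example.

```python
import io

def count_reads(data, read_size):
    """Count how many f.read() calls it takes to consume data."""
    f = io.BytesIO(data)
    calls = 0
    while f.read(read_size):
        calls += 1
    return calls

data = bytes(1024 * 1024)  # 1 Mb of zeros standing in for the index file
print(count_reads(data, 16))     # one call per 16 byte record
print(count_reads(data, 32768))  # one call per 32K chunk
```

For 1 Mb of data this is 65536 calls against 32, a factor of 2048, regardless of how the underlying buffering is configured.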

This is where knowing more of the python internals would come in handy, but by peeking at the python process with a profiler, I don't see that much, except for the fact that python itself is spending lots of CPU, which doesn't sound very surprising.

 35.17%  [.] 0x000bd586
 30.86%  [.] PyEval_EvalFrameEx
  9.15%  [.] PyDict_GetItem
  6.22%  [.] _PyObject_GenericGetAttrWithDict
  2.98%  [.] PyDict_SetItem
  2.13%  [.] Py_UniversalNewlineFread
  2.06%  [.] PyCFunction_Call
  1.90%  [.] PyObject_Malloc
  1.83%  [.] PyObject_GetAttr
  1.78%  [.] PyString_FromStringAndSize
  1.76%  [.] PyObject_Free
  1.37%  [.] PyEval_RestoreThread
  1.30%  [.] PyTuple_New
  0.68%  [.] PyEval_SaveThread
  0.58%  [.] _PyArg_ParseTuple_SizeT
  0.24%  [.] getresgid@plt
  0.00%  [.] _PyModule_Clear
  0.00%  [.] PyType_ClearCache

So the question arises: does it get faster if, for example, I use io.read instead of the plain read? And what if I use os.read on file descriptors instead?

io.open and read

This is what the code looks like:

import io
import itertools
import sys

f = io.open(sys.argv[1], "rb")
for i in itertools.count():
  record = f.read(16)
  if not record:
    break
print i

and this is how fast it is:

$ time python /tmp/snippets/snippet01.py ./test-index.bin
93368320

real    0m42.741s
user    0m42.039s
sys     0m0.636s

I was not expecting io module to be any faster, really, especially given that according to strace, they are using the same buffer size. This seems to indicate that the python code path to run io.read() is indeed slightly faster? oh, well, the 10 seconds gain doesn't make it particularly exciting.

open and os.read

Let's try with a simple os.read. Given that it is the lowest level function I have available, I expect it to have the least amount of python code in it. Although I know nothing about python internals, I would expect plain read and io.read to be implemented on top of os.read.

However, calling os.read directly to get 16 bytes at a time will incur heavy syscall costs, but let's try:

import itertools
import os
import sys

f = open(sys.argv[1], "rb")
fd = f.fileno()
for i in itertools.count():
  record = os.read(fd, 16)
  if not record:
    break

print i

And here's how long it takes:

$ time python /tmp/snippets/snippet02.py ./test-index.bin
93368320

real    1m53.372s
user    0m58.292s
sys     0m54.927s

Which is roughly twice as slow as any other solution so far. Unsurprisingly, this means that buffering is important. Note, however, how the sys time has gone up significantly from previous tests, from ~half a second to about 55 seconds, while the user time is pretty much unchanged.

This probably means that we are paying roughly the same cost within python itself, plus the cost to perform many more syscalls and actually asking the kernel 16 bytes at a time of the file.

But what if I increase buffering here? And read 1 Mb at a time?

f = open(sys.argv[1], "rb")
fd = f.fileno()
for i in itertools.count():
  record = os.read(fd, 1048576)
  if not record:
    break

print i

Time goes down significantly:

$ time python ./snippet02.py /opt/vms/pool/test-index.bin 
1425

real  0m1.446s
user  0m0.012s
sys 0m1.424s

So, where is the cost? Is it in the read function codepath, or is it in working 16 bytes at a time in python?

Given that by using a plain read with 1 Mb at a time the run time went down to about 3 seconds, I expect the difference in cost between os.read and read to be about 1.5 seconds.

So, what about the 40+ seconds caused by reading 16 bytes at a time?

Let's see what happens if I os.read() 1 Mb at a time, and then work on 16 bytes at a time. Here's what the code looks like:

f = open(sys.argv[1], "rb")
fd = f.fileno()
offsets = range(0, 1048765, 16)
i = 0
while True:
  buff = os.read(fd, 1048576)
  if not buff:
    break

  for offset in offsets:
    record = buff[offset:offset + 16]
    i += 1

print i

And here are the timings:

$ time python ./snippet05.py /opt/vms/pool/test-index.bin 
93405900

real    0m38.921s
user    0m37.328s
sys     0m1.520s

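What makes these timings interesting is that if I comment out the inner loop, it takes 1.5 seconds once again. So the 37 extra seconds are spent by the inner loop! (The count printed is slightly higher than before because the offsets range ends at 1048765 rather than 1048576, and the last, shorter buffer is still walked in full, so a few of the slices are empty.)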

If I turn the body of the loop into a simple pass, the run time goes down to about 6 seconds, while if I leave only the record = buff[offset:offset + 16] as instruction, this becomes ~30 seconds.

This means that computing the substring is really what I need to make faster: at ~30 seconds for roughly 93 million slices, it costs about 300 ns each.
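One direction that might be worth measuring is to avoid creating a new string object for every record. The sketch below is written for Python 3 and is not one of the snippets timed in this post: it walks a small in-memory buffer three ways, with plain slicing, with a memoryview (whose slices are views rather than copies), and with struct.iter_unpack. The function names and the record layout (two unsigned 64 bit integers) are assumptions made only for illustration, and no claim is made here about which variant is faster.

```python
import struct

RECORD_SIZE = 16

def count_records_slicing(buff):
    """Walk the buffer creating one bytes object per record, as in the loop above."""
    count = 0
    for offset in range(0, len(buff), RECORD_SIZE):
        record = buff[offset:offset + RECORD_SIZE]  # copies 16 bytes
        count += 1
    return count

def count_records_memoryview(buff):
    """Same walk, but each slice is a view on the buffer instead of a copy."""
    view = memoryview(buff)
    count = 0
    for offset in range(0, len(view), RECORD_SIZE):
        record = view[offset:offset + RECORD_SIZE]  # no copy of the data
        count += 1
    return count

def count_records_struct(buff):
    """Let struct iterate over the records.

    Assumed layout: two little-endian unsigned 64 bit integers per record.
    The buffer length must be a multiple of 16 for iter_unpack to accept it.
    """
    count = 0
    for first, second in struct.iter_unpack("<QQ", buff):
        count += 1
    return count

buff = bytes(RECORD_SIZE * 1000)  # 1000 empty records, just for the demo
print(count_records_slicing(buff),
      count_records_memoryview(buff),
      count_records_struct(buff))
```

Wrapping each function in timeit on a buffer of realistic size would show whether any of them beats the plain slice on a given interpreter.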

Reading the whole file in one go

Here is what the code looks like:

f = open(sys.argv[1], "rb")
data = f.read()

for i in itertools.count():
  start = i * 16
  end = start + 16
  if start >= len(data):
    break
  record = data[start:end]

print i

and here is the output:

$ time python /tmp/snippets/snippet03.py test-index.bin
93368320

real    1m12.975s
user    1m11.120s
sys     0m1.720s

which is not that surprising. This is actually much slower than any other read operation. If the cost is in string processing, it may be that we are paying this cost multiple times now: once to fill data, reading one buffer at a time, and once every time I create a record object from the buffer.

In fact, if I comment out the record = data[start:end] line the time goes back to almost normal:

$ time python /tmp/snippets/snippet03.py test-index.bin
93368320

real    0m48.553s
user    0m44.427s
sys     0m1.764s

io.open and readinto

By peeking at pydoc io, there seems to be a method readinto that reads data into a bytearray instead of a string, so let's try it:

b = bytearray(16) 
f = io.open(sys.argv[1], "rb")
for i in itertools.count():
  numread = f.readinto(b)
  if not numread:
    break

print i

The first experiment doesn't really go well, this is worse than anything I've seen before:

$ time python /tmp/snippets/snippet04.py test-index.bin
93368320

real    1m40.463s
user    1m38.794s
sys     0m0.648s

By looking at strace, it seems to be doing lots of read(4096), which probably means read into a buffer, and then copy into the bytearray of 16 bytes:

read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096

Which seems to be slower than reading into a plain string? Interesting.

Let's try with a larger bytearray, like 1048576, and see what happens instead:

b = bytearray(1048576)
# b = bytearray(16)
f = io.open(sys.argv[1], "rb")
for i in itertools.count():
  numread = f.readinto(b)
  if not numread:
    break

print i

run time is now:

$ time python /tmp/snippets/snippet04.py test-index.bin
1425

real    0m3.926s
user    0m0.220s
sys     0m1.568s

4 seconds, which is comparable to what I had before with large reads.

If I look with strace, mostly out of curiosity, here's what happens:

mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7296000
read(3, "\277\267\330\5\301\224\230\374\212\335Y\354\326a\237K\217][\354\334a\257\376c\357,v\352\260\213\364"..., 1048576) = 1048576
munmap(0xb7296000, 1052672)             = 0

which looks like it's reading directly into the bytearray.

What is surprising is the mmap. As pointed out on the reddit thread, the mmap and munmap are used to allocate memory (flags indicate anonymous memory, and file descriptor is -1). This might simply be related to the malloc() call in linux glibc, that according to the man page will use mmap instead of sbrk for any allocation larger than MMAP_THRESHOLD, by default set to 128 k.

What I find suspicious here is that if you look at strace, you see a continuous pattern of mmap and munmap. This probably means a malloc followed by a free shortly after, but I would have expected a good memory allocator to "cache" allocations within the process for some time before returning memory to the operating system, although it's been a while since I've looked at the implementation of malloc.

So, without actually looking at the python code, here's my theory of the internals:

If "bytearray > buffer", read directly into bytearray.
Otherwise read into a buffer of configured size and copy the requested data into bytearray.
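To make the theory concrete, here is a toy model of it in Python 3. It is only a sketch of the guess above, not how the io module is actually implemented: the class name, the raw_reads counter and the 4096 byte default are assumptions chosen to mirror what strace showed.

```python
import io

class ToyBufferedReader:
    """Toy model of the theory above; NOT the real io implementation."""

    def __init__(self, raw, buffer_size=4096):
        self.raw = raw                  # anything with read() and readinto()
        self.buffer_size = buffer_size
        self.buffer = b""
        self.raw_reads = 0              # stands in for the read() syscalls

    def readinto(self, target):
        wanted = len(target)
        if not self.buffer and wanted > self.buffer_size:
            # Target larger than the buffer: fill it directly, no extra copy.
            self.raw_reads += 1
            return self.raw.readinto(target)
        if not self.buffer:
            # Small target: refill the internal buffer with one larger read.
            self.raw_reads += 1
            self.buffer = self.raw.read(self.buffer_size)
        # Copy the requested amount out of the internal buffer.
        chunk, self.buffer = self.buffer[:wanted], self.buffer[wanted:]
        target[:len(chunk)] = chunk
        return len(chunk)

data = bytes(64 * 1024)

small = ToyBufferedReader(io.BytesIO(data))
b = bytearray(16)
while small.readinto(b):
    pass

large = ToyBufferedReader(io.BytesIO(data))
b = bytearray(8192)
while large.readinto(b):
    pass

print("raw reads with 16 byte target:  ", small.raw_reads)
print("raw reads with 8192 byte target:", large.raw_reads)
```

In this model the 16 byte target triggers one raw read per 4096 bytes plus a copy for every record, while the 8192 byte target is filled directly by each raw read.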

Let's try with a bytearray of 8192 bytes, which is twice as much as the buffer size:

$ time python /tmp/snippets/snippet04.py test-index.bin
182360

real    0m1.256s
user    0m0.704s
sys     0m0.548s

From strace, this is reading 8k at a time, presumably into the bytearray directly, given the run time.

Let's try one more time, with buffering disabled (note the 0 in io.open):

b = bytearray(16)
f = io.open(sys.argv[1], "rb", 0)
for i in itertools.count():
  numread = f.readinto(b)
  if not numread:
    break

print i

With strace, I can now clearly see that python is reading 16 bytes at a time directly:

read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 16

However, this is far slower than I was expecting:

$ time python /tmp/snippets/snippet04.py test-index.bin
93368320

real    2m4.992s
user    1m10.476s
sys     0m54.287s

Note the sys time at 54 seconds, probably because of the repeated read calls. The 1 minute and 10 seconds of user time is probably due to the high number of iterations? I would not have expected readinto to be slower than other primitives.

With python3

Those were mostly run out of curiosity, with the default buffer size and the most naive code.

Naive open and read

$ time python3 /tmp/snippets/snippet00.py test-index.bin

real    0m32.056s
user    0m30.662s
sys     0m1.300s

io.open and read

$ time python3 /tmp/snippets/snippet01.py test-index.bin
93368320

real    0m38.533s
user    0m37.862s
sys     0m0.588s

open and os.read

$ time python3 /tmp/snippets/snippet02.py test-index.bin 
93368320

real    2m6.902s
user    1m11.432s
sys     0m55.223s

reading the file in one go

$ time python3 /tmp/snippets/snippet03.py test-index.bin
93368320

real    1m36.938s
user    1m33.706s
sys     0m1.712s

io.open and readinto

$ time python3 /tmp/snippets/snippet04.py test-index.bin
45590

real    0m0.595s
user    0m0.084s
sys     0m0.512s

http://rabexc.org/posts/io-performance-in-python

Hanging clothes in windy weather

Aug 20, 2013 Updated Aug 20, 2013

Some say that clothes smell and look better when dried with sunlight in the summer breeze.

I can certainly state that if you have to pay to use a dryer, it is much cheaper to use sunlight, and line dry. And really, it's not that much more work.

The only annoying part where we live is the wind: yes, it gets the clothes to dry faster, but it also gets them to fall off the line, and on the grass of our backyard.

If we had a normal rack to dry clothes, this would not be a big deal: humanity has long figured out how to get clothes not to fall off clothes racks. There are mechanical devices such as clothespins that can be easily purchased and used.

The tricky part in our case is that we have some sort of line going from one building to another that we use to hang clothes on. This line is exposed to the weather, gets dirty to the point it is really hard to clean, and we generally don't want to put our clothes directly on it. Instead, we use hangers.

line with clothes

This is where the wind comes into play: it's not that hard for the wind to knock off our clothes, and get them to the ground, and well, dirty.

The best solution we have found so far has been to use a rubber band. Let me explain how.

You start off with a simple hanger, like the one here:

clothes hanger

You then put a rubber band around it:

clothes hanger with rubber band

If the rubber band is too long, you make another loop:

clothes hanger with looping rubber band

Then you put the hanger on the line as usual:

clothes hanger on the line

And finally you use the rubber band to lock the hanger:

clothes hanger on the line

Once done, you can just unlock the rubber band and leave it in place, ready for the next time. With the rubber band closed, it is really hard for the wind to knock off the clothes! Victory!

http://rabexc.org/posts/hanger-elastic

Getting started with a raspberry pi, or how I had to fix it

Aug 1, 2013 Updated Aug 1, 2013

Just a few days ago I realized that the Raspberry PI I use to control my irrigation system was dead. Could not get to the web interface, pings would time out, could not ssh into it.

The Symptoms

The first thing I tried was a simple reboot. The raspberry is in a black box in my backyard, maybe the hot summer days were... too hot? I have a cron job that shuts it down if the temperature goes above 70 degrees. Or maybe the shady wireless card and its driver stopped working? I have another cron job to restart it, so this seems less likely.

So.. I reboot it by physically unplugging it, but still nothing happens. The red led on the board, next to the ethernet plug, is on, which means it is getting power. The green led next to it flashes only once. By reading online, this led can flash to report an error, or to indicate that the memory card is being read.

There is no error corresponding to one, single, flash, so I assume it means that it tried to read the flash, and somehow failed. It is supposed to be booting now, so I would expect much more activity from the memory card.

Maybe the card is corrupted, or something bad happened to the file system.

Checking the flash

Removed the memory card from the raspberry, inserted it in my laptop. First thing I do is run fsck.

Note that /dev/sdb is the memory card inserted in my laptop! On your computer, it will likely have a different name. Make sure you don't damage one of your real partitions.

Anyway, the command I use is:

# fsck -f /dev/sdb1

First partition is good, let's look at the second one:

# fsck -f /dev/sdb2

TADA! Lots of problems reported! This is annoying, the file system was corrupted.

Next step is to back it up, just in case. Although there's not much on it, and it took very little to get it running in the first place, a backup may come in handy.

Backing it up

To copy the memory card, all I had to do was:

# dd if=/dev/sdb of=/opt/backup/raspberry-20130730.img

and let it run until completion.

Checking it

Next step is to fix the file system. Can I really do it? Let's try:

# fsck -f -p /dev/sdb2

Unfortunately, this fails with something like: "fsck failed, please repair manually". Not a good sign. So, let's try once again:

# fsck -y -f /dev/sdb2

This shows several screens of errors. Bad bad sign. Let's try to mount it:

# mount /dev/sdb2 /mnt/tmp

Seems to mount cleanly now. Let's try to put it back, and reboot the system one more time. Still no luck...

Setting up a raspberry from scratch

Unfortunately, I don't have a backup of the original working memory card. So let's start from scratch, like I did when I first got it.

Downloaded latest raspbian image from: http://downloads.raspberrypi.org/

Installed it, with:

# unzip 2013-07-26-wheezy-raspbian.zip
# dd if=2013-07-26-wheezy-raspbian.img of=/dev/sdb

Next step is to configure the network on the memory card, so I can put it back in the raspberry, and finish the setup via ssh. To do so, I need to mount the memory card, and modify a few config files:
1. I need to tell my linux kernel to re-read the partition table of sdb, so it picks up the position of the partitions I just copied into /dev/sdb, with:
```
 # sfdisk -R /dev/sdb
```
  Alternatively, I could just have removed the memory card and re-inserted it. But I'm lazy, and the command is more convenient.
2. Mount the partition somewhere:
```
 # mkdir -p /mnt/raspberry
 # mount /dev/sdb2 /mnt/raspberry
```
3. Setup the wireless config. This means editing /etc/network/interfaces on a Debian based system:
```
 # vim /mnt/raspberry/etc/network/interfaces
```
  and added wpa-ssid and wpa-psk, leading to the file looking like this:
```
 auto lo

 iface lo inet loopback
 iface eth0 inet dhcp

 allow-hotplug wlan0
 iface wlan0 inet dhcp
   wpa-ssid "SSID-of-your-wireless-network"
   wpa-psk "password!"

 iface default inet dhcp
```
4. Save and umount.
```
 # sync # Just in case
 # umount /mnt/raspberry
```

Rebooting, and connecting via network

Now it is time to try it. Let's remove the memory card from the laptop, and put it back on the raspberry. Reboot, the green leds are blinking happily.

In my home server, responsible for my network, I have dhcpd running configured to assign the raspberry a static address. I do so with a block like:

subnet 10.1.40.0 netmask 255.255.255.0 {
  option domain-name-servers 10.1.40.254, 8.8.8.8, 8.8.4.4;
  option routers 10.1.40.254;
  range 10.1.40.20 10.1.40.200;

  group {
    use-host-decl-names on;
    host raspberry {
      fixed-address 10.1.40.9;
      hardware ethernet 80:1f:02:9a:9d:e6;
    }
  }

}

In /etc/dhcp/dhcpd.conf. The mac address 80:1f:02:9a:9d:e6 is the one of my raspberry. You can find it by running the command ifconfig or ip link show on the raspberry itself.

Thanks to that block, the raspberry gets assigned the address 10.1.40.9. If you don't have a similar configuration, or don't know the MAC address of your raspberry, don't despair! It is pretty easy to figure it out.

If you have a dhcp server in your network, you can just look at its logs. Around the time the raspberry is booted, you can probably see a line like:

...
Jul 28 07:13:13 yourserver dhcpd: DHCPDISCOVER from 80:1f:02:9a:9d:e6 via eth1
...

in /var/log/messages.
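If the log is long, the MAC addresses can be pulled out with a few lines of Python. This is a minimal sketch that assumes the log lines look like the DHCPDISCOVER example above; the function name and the regular expression are made up for illustration, and other dhcp servers may log in a different format.

```python
import re

# Matches the "DHCPDISCOVER from <mac>" part of the log line shown above.
DISCOVER = re.compile(
    r"DHCPDISCOVER from ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)

def macs_from_log(lines):
    """Return the distinct MAC addresses that sent a DHCPDISCOVER, in order."""
    seen = []
    for line in lines:
        match = DISCOVER.search(line)
        if match and match.group(1).lower() not in seen:
            seen.append(match.group(1).lower())
    return seen

sample = [
    "Jul 28 07:13:13 yourserver dhcpd: DHCPDISCOVER from 80:1f:02:9a:9d:e6 via eth1",
    "Jul 28 07:13:20 yourserver dhcpd: DHCPDISCOVER from 80:1f:02:9a:9d:e6 via eth1",
]
print(macs_from_log(sample))
```

On a real system the lines would come from something like open("/var/log/messages"), run as a user allowed to read it.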

Alternatively, you can run tcpdump while the raspberry is rebooted, and most likely see its mac address and assigned ip. To do so, you can use something like:

# tcpdump -v -nei wlan0 port 67 or port 68

In any case, the raspberry boots. My:

$ ping 10.1.40.9

eventually succeeds, and I can login with ssh using username pi, password raspberry, and sudo -s to become root:

$ ssh pi@10.1.40.9
Password: raspberry
$ sudo -s

Configuring the raspberry

I use the raspberry as a headless server to control my irrigation system. Unfortunately, raspbian seems to be geared more to desktop users.

Here's what I did to configure it:

Install my ssh keys both for root and pi. This is necessary only if you use ssh-agent and ssh keys.

$ ssh-copy-id pi@10.1.40.9
$ ssh pi@10.1.40.9
$ sudo -s
# cp -a ~pi/.ssh ~root/.ssh
# chown root -R ~root/.ssh

Disabled password based access. Again, do this only if you use ssh keys. If you don't though, you should make sure to change the password of user pi.
```
# passwd -l pi
# passwd -l root
```

Pruned and installed a few utilities, while updating the system to the latest version:

# apt-get install bootlogd vim mosh screen bsd-mailx postfix
# apt-get --purge remove consolekit triggerhappy
# apt-get --purge remove cups.* xserver.* x11.*

# apt-get update
# apt-get dist-upgrade
# apt-get autoclean
# apt-get autoremove

Configured the language, so it would use my language (and above all, stop apt-get and other tools from complaining about a locale not being set):
```
# apt-get install locales
# dpkg-reconfigure locales
```
Changed a few settings, in particular, set RAMTMP=yes, to have /tmp in RAM rather than writing to the SD card, and mounted /boot as read only. Both are meant to protect the file systems in case something else goes wrong with the memory card:
```
# vim /etc/default/tmpfs
...
RAMTMP=yes
...

# vim /etc/fstab
...
/dev/mmcblk0p1  /boot           vfat    defaults,ro       0       2
```
Given the corruption problem I had, I was tempted to mark the root file system as using data=journal, or even sync. Given that I had not found the root cause of the corruption, in the end I decided to do a back up and leave the setup as is :).

Installed cron jobs. I have two cron jobs on the raspberry pi:

To check the internal temperature, send me an email and shut down the device if it is too hot.
To verify that the wireless is up, and restart it if it is not. I have a tiny USB wireless dongle, an EW-7811Un, which generally works well. However, it does disconnect from time to time, especially if I reboot or poke at the access point :).

The connectivity check is this one:

$ cat  ./check-connectivity.sh
#!/bin/bash

attempts=5
# This is any machine on your network that is always on. The script tries
# to ping this machine a few times, if it fails, it restarts the wireless.
server=server

for n in `seq $attempts`; do
  logger -t "connectivity-check" "Sending ping request $n to '$server'."
  ping -c1 "$server" &>/dev/null && {
    logger -t "connectivity-check" "Server is reachable, nothing to do."
    exit 0
  }
done

logger -t "connectivity-check" "Server is unreachable, restarting wireless."
( 
  set -x
  ifdown wlan0
  rmmod 8192cu
  modprobe 8192cu
  ifup wlan0
) 2>&1 | (while read line; do logger -t "connectivity-check" "Output: $line"; done;)

While this is the temperature check:

$ cat ./check-temperature.sh
#!/bin/bash

temperature=`vcgencmd measure_temp | sed -e 's/.*=\([^.]*\).*/\1/'`
precise=`vcgencmd measure_temp | sed -e "s/.*=\([^\']*\).*/\1/"`
email=youremailaddress@whatever.com
max=70

logger -t "temperature-check" "Temperature: $precise, max: $max"
if [ "$temperature" -ge "$max" ]; then
  (echo "The temperature is currently $temperature. Greater or equal to $max."
   echo ""
   echo "SHUTTING DOWN THE SYSTEM IN 30 SECONDS") | mail $email -s 'Temperature too high - shutting down!' &>/dev/null
  sync
  sleep 30
  halt
fi
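The two sed expressions are the least obvious part of the script: they pick the number out of the vcgencmd output, which looks like temp=47.2'C. Here is the same parsing as a small Python sketch, for anyone who finds that easier to read. The function names are hypothetical, and it only handles the output format just described.

```python
import re

def parse_temperature(output):
    """Extract the temperature in Celsius from text like "temp=47.2'C"."""
    match = re.search(r"=([0-9]+(?:\.[0-9]+)?)", output)
    if match is None:
        raise ValueError("unexpected vcgencmd output: %r" % output)
    return float(match.group(1))

def too_hot(output, max_celsius=70):
    """Same test as the shell script: at or above the maximum is too hot."""
    return parse_temperature(output) >= max_celsius

print(parse_temperature("temp=47.2'C"))
print(too_hot("temp=47.2'C"), too_hot("temp=71.0'C"))
```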

To configure cron, I had to add the following lines to /etc/crontab:

* *     * * *   root    /root/utils/check-temperature.sh
* *     * * *   root    /root/utils/check-connectivity.sh

Ok, after I install the web interface of my irrigation system, everything seems to be up again.

Backing up the Raspberry

Now to the backup part. I could remove the memory card again, and do the same I did before. But I am lazy, and would like a backup I can do remotely. So, here's what I did:

Mounted the image I installed on the raspberry locally (remember? one of the first dds in this blog post). Given that the image contains a few partitions, I had to use kpartx to make them available, like this:
```
# kpartx -l 2013-07-26-wheezy-raspbian.img
loop0p1 : 0 114688 /dev/loop0 8192
loop0p2 : 0 3665920 /dev/loop0 122880
# kpartx -a 2013-07-26-wheezy-raspbian.img
# mount /dev/mapper/loop0p2 /mnt/raspberry
```
Make sure to use the loop device created by kpartx, as shown in the output.
Once mounted, used rsync to copy everything from the raspberry to /mnt/raspberry. I have a terrible memory for the options and flags to use, so I just used the ac-system-backup script, with something like:
```
 # ac-system-backup 10.1.40.9 /mnt/raspberry
```

At the end of the sync, unmounted the partitions with:

 # umount /mnt/raspberry
 # kpartx -d 2013-07-26-wheezy-raspbian.img

Finally, renamed the image as:

 # mv 2013-07-26-wheezy-raspbian.img backup-raspberry-2013-07-31.img

Next time I have to do a backup, I will first copy this image, and then run rsync.
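As a side note on the kpartx -l listing shown earlier: the last number on each line is where the partition starts inside the image, and the number after the 0 is its length, both in sectors. The sketch below turns them into byte offsets, which is what an alternative such as mount -o loop,offset=... would need. It assumes 512 byte sectors and the exact line format shown above; the function name is made up.

```python
SECTOR_SIZE = 512  # assumption: kpartx counts in 512 byte sectors

def partition_offsets(kpartx_listing):
    """Map partition name -> (byte offset, byte length) from `kpartx -l` lines."""
    result = {}
    for line in kpartx_listing.splitlines():
        if " : " not in line:
            continue
        name, rest = line.split(" : ", 1)
        fields = rest.split()
        # fields: 0, length in sectors, backing device, start sector
        length, start = int(fields[1]), int(fields[3])
        result[name.strip()] = (start * SECTOR_SIZE, length * SECTOR_SIZE)
    return result

listing = """loop0p1 : 0 114688 /dev/loop0 8192
loop0p2 : 0 3665920 /dev/loop0 122880"""

for name, (offset, size) in partition_offsets(listing).items():
    print(name, "starts at byte", offset, "and is", size, "bytes long")
```

A quick sanity check on the format: the first partition starts at sector 8192 and is 114688 sectors long, which ends exactly where the second one starts, at sector 122880.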

Conclusion

I still have not found the cause of the file system corruption. In theory, ext4 with journaling should be able to recover cleanly from most states, especially on a system that is hardly (if ever) modified, and the only write activities are for logs.

I spent some time looking at the backup, and the file system was terribly corrupted. Eg, directories turned into files, files with the wrong content, and so on. If it is a software problem, it is a nasty one :)

I checked the power supply, and it is very good, both in terms of voltage and amperage. Given that this is not the first time it has happened, I suspect the memory card or the thermal cycle causing issues.

If it happens again, I will probably replace the memory card and see.

http://rabexc.org/posts/raspberry-pi

Horizontal scrolling and you

Jul 11, 2013 Updated Jul 11, 2013

When it comes to HTML, CSS, and graphical formatting, I feel like a daft noob.

Even achieving the most basic formatting seems to take longer than it should. Giving up on reasonable compromises is often more appealing to me than figuring out the right way to achieve the goal.

Anyway, tonight I am overjoyed! I wanted to have a <pre> block, with code that:

had a horizontal scroll bar.
but only when there are lines too long.
and well, long lines did not wrap.

I first fidgeted with the white-space property, which has a nowrap value and various other ones. None of them seemed to do what I wanted; the only value that preserved white space was pre.

overflow-x: auto was easy to find. It would do the right thing except... the text was wrapping, so the scroll bar never showed up.

It took me a while to discover that word-wrap: normal would do exactly what I wanted.

So, here is the final CSS:

pre {
  word-wrap: normal;
  overflow-x: auto;
  white-space: pre;
}

And here is what it looks like rendered:

This is a really really really really really really really really really really really really really really really really really really really really really really really long line

It's amazing how happiness at times can come from very little things.

http://rabexc.org/posts/horizontal-scrolling

An unwilling dive in xfce4 internals

Jun 28, 2013 Updated Jun 28, 2013

I've always liked text consoles more than graphical ones. This at least until some time in 2005, when I realized I was spending a large chunk of my time in front of a browser, and elinks, lynx, links and friends did not seem that attractive anymore.

Nonetheless, I've kept things simple: at first I started X manually, with startx, on a need by need basis. I used ion (yes! ion) for a while, until it stopped working during some upgrade. Then I decided it was time to boot in a graphical interface, and started using slim. Despite some quirks, I've been happy since.

In terms of window managers, I really don't like personalizing or tweaking my graphical environment. I see it as a simple tool that should be zero overhead, require no maintenance, and not get in the way of what I want to do with a computer. I don't want to learn which buttons to click on, how to do transparency, which icons mean what, or where the settings I am looking for were moved to in the latest version.

So I started using xfce. Not because of a particularly well informed choice, just because it worked out of the box with a reasonably minimal interface, and was fast to load. And if all you need is a browser opened on a pane, and gnome-terminal on the other, this is a really good choice.

Today, however, it broke :( for the first time since I have installed it, and through years of unattended upgrades, I'm forced to write this post from a tiny window on the left top corner of my monitor. And to fix it, I had to find out much more than I ever wanted to know about a window manager and xfce.

So, here are the symptoms: rebooted the laptop after a long long time (usually I just put it to sleep), got to the login prompt with slim, entered my username and password, and... sadness, I get a tiny terminal in my leftmost top corner, nothing else, my mouse looks like that X I hadn't seen in a long time, on black and white background. No sign of the usual desktop, tray or windows, which I can't even resize.

I really didn't want to deal with this, had other things I wanted to do on my laptop. So, what to do? For a while, I just used this tiny xterm. But this got boring pretty quickly: if I opened the browser, I had to close it to go back to the terminal. No alt+tab, no panes, couldn't move or resize windows. I finally decided it was time to fix the issue.

Here's the things I did and unwillingly learned in the process, just in case you end up in a similarly tragic situation. Note that each debugging session is about 30 minutes long, the time it takes me to get home and get to work on public transport, minus some time to read emails or interact with other people around me.

First round...

I don't believe in reboots as the one-size-fits-all solution, but my first hope was that this was some sort of transient failure. Maybe something just went wrong during startup, so I tried the following things:

/etc/init.d/slim restart - to restart the login manager, which in turn would start xfce4 again. It did not help, no useful message on the console, no error whatsoever in the logs. Clean as a bottle of grappa in the winter.
/etc/init.d/slim stop, and startxfce4 - to start xfce4 manually. Same problem as above, but at least ruled out a problem in slim, which is a good start.

From the tiny terminal, I then started xfce4-session, which supposedly is the component that starts up xfce4. Unfortunately, it just bailed out with error:

xfce4-session: Another session manager is already running

Which at least told me xfce4-session had been started already. I could confirm with ps -C xfce4-session, but ps faux only showed my terminal as a child of xfce4-session, while from my past memories I believe I would see many more xfce components.

So.. why did xfce4-session not start anything else? I started poking around in log files, with no luck. Nothing logged at all. strace -fp `pidof xfce4-session` also showed it was just sitting there waiting for some syscall to complete.

Maybe one of the components was not started properly? So I started manually fidgeting with the various pieces of xfce4.

Running xfwm4 manually gave me the naked window manager. At least I could now resize and move windows, victory! Still no start menu, still a single panel.

xfce4-panel gave me, well, the panels, and the "start buttons" at the bottom of the screen.

At this point, my graphical interface was in good enough shape to be usable again. Good, I could stop worrying about it, and do some real work :).

Second round...

Second day, second round. Let's assume that xfce4-session is not starting everything it should. How is it configured? According to the man page, besides a few caches, it reads its configuration using xfconf.

xfce4-session reads its configuration from Xfconf. xfce4-session stores its session data into $XDG_CACHE_HOME/sessions/.

The man page also refers to a "sessions" subdirectory of $XDG_CACHE_HOME, by default in ~/.cache/, and a set of subdirectories in $XDG_CONFIG_HOME, by default in ~/.config/.
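The fallback rule for those two variables is simple enough to write down. This is a minimal Python sketch of it, assuming only the defaults just mentioned (~/.cache and ~/.config when the variable is unset or empty); the function names are hypothetical and the environ parameter exists only to make the functions easy to try out.

```python
import os

def xdg_dir(variable, default, environ=None):
    """Return the directory named by `variable`, or $HOME/<default> if unset or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(variable, "")
    home = environ.get("HOME", os.path.expanduser("~"))
    return value if value else os.path.join(home, default)

def session_cache_dir(environ=None):
    """Where xfce4-session keeps its session data, according to the man page."""
    return os.path.join(xdg_dir("XDG_CACHE_HOME", ".cache", environ), "sessions")

print(session_cache_dir())
print(xdg_dir("XDG_CONFIG_HOME", ".config"))
```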

Let's start poking at xfconf-query. A simple:

$ xfconf-query
Channels:
  thunar-volman
  xfce4-mixer
  keyboards
  xfce4-desktop
  xfwm4
  xfce4-power-manager
  xfce4-settings-manager
  xfce4-panel
  xsettings
  xfce4-keyboard-shortcuts
  thunar
  pointers
  xfce4-session

returns a list of channels, and after reading xfconf-query --help, I tried:

$ xfconf-query -R -c xfce4-session -l
/general/FailsafeSessionName
/general/SaveOnExit
/general/SessionName
/sessions/Failsafe/Client0_Command
/sessions/Failsafe/Client0_PerScreen
[...]

which gave me the list of settings for xfce4-session. To fetch a variable, I can do:

$ xfconf-query -c xfce4-session -p /sessions/Failsafe/Count
5

But nothing particularly interesting turned out here. So let's poke at the $XDG_CACHE_HOME, and $XDG_CONFIG_HOME.

$XDG_CACHE_HOME seems just a collection of cached files, ranging from chrome to duplicity, and well, xfce.

In ~/.cache/sessions, aka $XDG_CACHE_HOME/sessions, referenced earlier, I see a list of files:

Thunar-xxxx-33af-41b8-80c9-xxxx
xfce4-session-joshua:0
xfce4-session-joshua:0.bak
xfwm4-xxxx-1ecf-41cf-8d02-yyyyy
xfwm4-xxxx-1ecf-41cf-8d02-xxxxx.state

Let's look at xfce4-session-joshua:0 to start with, using cat xfce4-session-joshua:0. Turns out it's a simple text file, maybe providing settings for each program I had started during the last session of xfce4? Seems like a plausible idea (some stuff replaced by xxx and yyy):

[Session: Default]
Client0_ClientId=xxxx
Client0_Hostname=local/joshua
Client0_CloneCommand=xfwm4,--display,:0.0
Client0_DiscardCommand=rm,-rf,/home/yyy/.cache/sessions/xfwm4-xxx.state
Client0_RestartCommand=xfwm4,--display,:0.0,--sm-client-id,xxxx
Client0_CurrentDirectory=/home/yyy
Client0_Program=xfwm4
Client0_UserId=yyy
Client0_Priority=15
Client0_RestartStyleHint=2
Client1_ClientId=zzzz
Client1_Hostname=local/joshua
Client1_CloneCommand=Thunar
[...]

Note the CloneCommand line above shows a command line to run. Let's look at it in more detail:

$ grep CloneCommand ./xfce4-session-joshua\:0
Client0_CloneCommand=xfwm4,--display,:0.0
Client1_CloneCommand=Thunar
Client2_CloneCommand=xfce4-panel
Client3_CloneCommand=xfdesktop,--display,:0.0
Client4_CloneCommand=xfce4-settings-helper,--display,:0.0
Client5_CloneCommand=gnome-terminal
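Since the file looks like an INI file with comma separated command lines, the same list can be extracted with Python's configparser. This is only a sketch under that assumption: it is based on the excerpt shown above, not on any documentation of the file format, and the function name is made up.

```python
import configparser

def clone_commands(session_text):
    """Return each *_CloneCommand value as a list of arguments."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep the original case of the keys
    parser.read_string(session_text)
    commands = []
    for section in parser.sections():
        for key, value in parser[section].items():
            if key.endswith("_CloneCommand"):
                # Arguments appear to be separated by commas in the file.
                commands.append(value.split(","))
    return commands

sample = """[Session: Default]
Client0_CloneCommand=xfwm4,--display,:0.0
Client0_Program=xfwm4
Client1_CloneCommand=Thunar
"""

for command in clone_commands(sample):
    print(" ".join(command))
```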

Note that 2 out of 6 commands (xfwm4, xfce4-panel) are the ones I had to run manually to get back some of the normal features of a desktop environment. Let's try to run some of the others:

Thunar - a file manager kind of window appears. Given that I had never seen it before, I just close it. Useless, you can do the same with a shell.
xfdesktop - yay! my background (a solid reddish thing) appears. Together with 3 icons. Overall, I can do without, but it's nice to have a familiar and uniform color as a background.
xfce4-settings-helper - doesn't seem to be installed on my system. Weird.

So.. what is xfce4-settings-helper? and what happened to it?

$ apt-cache search xfce4-settings-helper

turns out empty. Let's go with apt-file:

$ apt-file search xfce4-settings-helper
xfce4-settings: /usr/bin/xfce4-settings-helper

So it should be part of xfce4-settings. Let's look at it:

$ dpkg -L xfce4-settings |grep bin
/usr/bin/xfsettingsd
/usr/bin/xfce4-settings-manager
/usr/bin/xfce4-display-settings
/usr/bin/xfce4-mime-settings
/usr/bin/xfce4-mouse-settings
/usr/bin/xfce4-settings-editor
/usr/bin/xfce4-accessibility-settings
/usr/bin/xfce4-keyboard-settings
/usr/bin/xfce4-appearance-settings

Looks like xfce4-settings-helper has been replaced by something else recently? My apt-file index is probably a few months old at this point. xfsettingsd seems useful, by the name of it. But turns out it's already running:

$ ps -C xfsettingsd
  PID TTY          TIME CMD
 4990 ?        00:00:03 xfsettingsd

and it's been running since my first attempt at fixing the system. If I run the xfce4-... commands in xfce4-settings manually, I see some sort of control panels to change the settings of keyboard, mouse, ... Not surprising :).

So, what about the other files in ~/.cache/sessions? I am not interested in Thunar.*, so let's look at the xfwm4 files:

$ cat xfwm4-2d1adf3c0-1ecf-41cf-8d02-0f70f2f2f5eb
[CLIENT] 0x2400004
  [CLIENT_ID] 2c734204f-71b3-4e34-916d-a3367d9c329f
  [CLIENT_LEADER] 0x2400001
  [WINDOW_ROLE] gnome-terminal-window-3521-307132299-1301003369
  [RES_NAME] gnome-terminal
  [RES_CLASS] Gnome-terminal
  [WM_NAME] ccontavalli@joshua: /var/log
  [WM_COMMAND] (1) "gnome-terminal"
  [GEOMETRY] (0,15,1280,785)
  [GEOMETRY-MAXIMIZED] (3,15,577,335)
  [SCREEN] 0
  [DESK] 1
  [FLAGS] 0x10300
[CLIENT] 0x1a0006a
  [CLIENT_ID] 2cde246f4-a13f-4060-bc80-638880912489
  [CLIENT_LEADER] 0x1a00001
  [WINDOW_ROLE] browser
  [RES_NAME] Navigator
  [RES_CLASS] Iceweasel
  [WM_NAME] slim xfce4 consolekit debian - Google Search - Iceweasel
  [WM_COMMAND] (1) "firefox-bin"
  [GEOMETRY] (0,15,1280,785)
  [GEOMETRY-MAXIMIZED] (0,15,1280,785)
  [SCREEN] 0
  [DESK] 0
  [FLAGS] 0x10300

This looks an awful lot like the screen I had when I last used xfce4, this is probably where xfwm4 stores my last session. Overall not very interesting.

Let's move back to exploring $XDG_CONFIG_HOME, ~/.config/. Here there seems to be a directory for each software I've used in X in the last few months. Not surprisingly, there is a xfce4 and xfce4-session subdirectory. Let's explore them.

The main directories I recognize are:

[...]
./xfce4/xfconf/xfce-perchannel-xml/xfce4-session.xml
./xfce4/xfconf/xfce-perchannel-xml/pointers.xml
./xfce4/xfconf/xfce-perchannel-xml/thunar.xml
./xfce4/xfconf/xfce-perchannel-xml/xfce4-keyboard-shortcuts.xml
./xfce4/xfconf/xfce-perchannel-xml/displays.xml
./xfce4/xfconf/xfce-perchannel-xml/xsettings.xml
[...]

Those seem to be the settings shown by xfconf earlier. Opening those files confirms that they are likely the same settings, stored in .xml.

[...]
./xfce4/panel/launcher-12533521212.rc
./xfce4/panel/systray-4.rc
./xfce4/panel/tasklist-12533520341.rc
./xfce4/panel/launcher-9
./xfce4/panel/launcher-9/13094492190.desktop
[...]

this looks an awful lot like what I have in my "menu bar" at the bottom of the screen. This is probably where xfce4 stores the buttons I have configured.

./xfce4-session turns out to be empty :(. Nothing here again. It's probably time to get to the next level, let's look at the xfce4-session source code.

The first thing I notice in main is:

/* check that no other session manager is running */
sm = g_getenv ("SESSION_MANAGER");
if (sm != NULL && strlen (sm) > 0)
  {
    g_printerr ("%s: Another session manager is already running\n", PACKAGE_NAME);
    exit (EXIT_FAILURE);
  }

/* check if running in verbose mode */
if (g_getenv ("XFSM_VERBOSE") != NULL)
  xfsm_enable_verbose ();

So by removing the SESSION_MANAGER environment variable and setting XFSM_VERBOSE, hopefully I can run xfce4-session manually and see what's happening.

Let's try:

$ unset SESSION_MANAGER
$ export XFSM_VERBOSE=foo
$ xfce4-session
xfce4-session: Another session manager is already running

Argh, there is another check further down in the code:

if (DBUS_REQUEST_NAME_REPLY_PRIMARY_OWNER != ret)
  {
    g_printerr ("%s: Another session manager is already running\n",
                PACKAGE_NAME);
    exit (EXIT_FAILURE);
  }

so, no luck. And well, I am out of time for today.

Last and final round...

I'd really like at this point to run xfce4-session under strace or ltrace, to see what's happening under the hood. Given I can't just run xfce4-session from my shell easily, let's try to exit the graphical interface, and run strace -f startxfce4 2>/tmp/log. If I am lucky, I will see something failing after the exec(... xfce4-session ...) somewhere in the middle of the trace. If not, it will be too noisy, and I will need to find a better way to trace the problem.

As soon as I exit the graphical interface, I notice on my console some messages that look like:

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfwm4": Failed to change to directory '/home/xxx' (No such file or directory)
(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfwm4": Failed to change to directory '/home/xxx' (No such file or directory)
(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfce4-panel": Failed to change to directory '/home/xxx' (No such file or directory)
(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfce4-panel": Failed to change to directory '/home/xxx' (No such file or directory)
(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfdesktop": Failed to change to directory '/home/xxx' (No such file or directory)
(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfdesktop": Failed to change to directory '/home/xxx' (No such file or directory)

YAY! This is probably the culprit: a few months ago I moved my home directory to /opt, as my root partition (where /home used to live), was full. I changed the record in passwd, and naively assumed that programs would still find their data in the new location, using wordexp() for tilde expansion, environment variables or getpwnam().

I bet nobody ever changes his home directory, or has different NFS mount points on different machines (right, who would use /home/u/user, for example, on a crowded server? and /home/user in his own desktop? surely a home directory is visible from the same path on every computer where one might access it).

Conclusion

At this point, I first tried to create a symlink from the old home location to the new one, and everything appeared to work:

ln -s /opt/home/xxx /home/xxx

To fix the problem forever, I used something like:

find ~/.config ~/.cache/sessions -type f -print0 |xargs -0 sed -i -e "s@/home/xxx@/opt/home/xxx@g"

which updated all the paths in every config and sessions file.
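
One caveat with a plain substitution of this kind: the new path contains the old one, so running it a second time would turn /opt/home/xxx into /opt/opt/home/xxx. Below is a minimal sketch of a variant that can be re-run safely. It assumes GNU grep, sed and xargs, and paths without regex special characters; the function name rewrite_home is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite OLD to NEW in every file under the given
# directories that mentions OLD.
#   rewrite_home OLD NEW DIR...
# Assumes GNU grep/sed/xargs and paths free of regex metacharacters.
rewrite_home() {
    old=$1; new=$2; shift 2
    # List (NUL separated) only the files that contain the old path.
    grep -rlZ -F -- "$old" "$@" |
        # Fold NEW back into OLD first, then expand OLD into NEW:
        # a second run leaves the files unchanged.
        xargs -0 -r sed -i -e "s@$new@$old@g" -e "s@$old@$new@g"
}
```

It would be invoked as rewrite_home /home/xxx /opt/home/xxx ~/.config ~/.cache/sessions. Taking a copy of both directories first is cheap insurance.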

Updates: as pointed out on the Debian bug I filed, I should have started looking from ~/.xsession-errors. That would have made things easier :)

http://rabexc.org/posts/an-unwilling-dive-in-xfce4-internals

How to get started with libvirt on Linux

May 14, 2013 Updated May 14, 2013

If you like hacking and have a few machines you use for development, chances are your system has become at least once in your lifetime a giant meatball of services running for who knows what reason, and your PATH is clogged with half finished scripts and tools you don't even remember what they are for.

If this never happened to you, don't worry: it will happen, one day or another.

My first approach at sorting this mess out was chroots. The idea was simple: always develop on my laptop, but create a self contained environment for each project. In each such environment, install all the needed tools, libraries, services, anything that was needed for my crazy experiments. This was fun for a while and worked quite well: I became good friends with rsync, debootstrap, mount --rbind and sometimes even pivot_root, and I was happy.

Until, well, I ran into the limitations of chroots: they can't really simulate networking, can't run two processes on port 80, can't run different kernels (or OSes), and don't really help if you need to work on something boot related or that has to do with userspace and kernel interactions.

So, time to find a better solution. Guess what it was? Virtual Machines.

At first it was only one. A good old image created from scratch I would run with qemu and a tap device. A few tens of lines of shell script to get it up as needed, and I was back in business with my hacking.

Fast forward a few years, and I have > 10 different VMs on my laptop, this shell script has grown to almost 1k lines of an unmaintainable entanglement of relatively simple commands and images to run, and I am afraid of even thinking of what to use for my next project. My own spaghetti VMs.

A few weekends ago I finally built up the courage to fix this, and well, discovered how easy it is to manage VMs with libvirt. So, here's what I learned...

This article is mainly focused on Debian, but most of the instructions should work for any derivative (Ubuntu and friends) and most Linux distributions.

Setup

You start by installing the needed tools. On a Debian system:

$ sudo -s
# apt-get install libvirt-bin virtinst

This should get a libvirtd binary running on your machine:

$ ps u -C libvirtd
USER    PID %CPU %MEM    VSZ  RSS TTY STAT START TIME COMMAND
root  11950  0.0  0.1 111928 7544 ?   Sl   Apr19 1:29 /usr/sbin/libvirtd -d

The role of libvirtd is quite important: it takes care of managing the VMs running on your host. It is the daemon that starts them up, stops them and prepares the environment that they need. You control libvirtd by using the virsh command from the shell, or virt-manager to have a graphical interface. I am generally not fond of graphical interfaces, so I will talk about virsh for the rest of the post.

First few steps with libvirt

Before anything else, you should know that libvirt and virsh not only allow you to manage VMs running on your own system, but can control VMs running on remote systems or a cluster of physical machines. Every time you use virsh you need to specify some sort of URI to tell libvirt which sets of virtual machines you want to control.

For example, let's say you want to control a XEN virtual machine running on a remote server called "myserver.com". When using virsh, you can refer to that VM by providing an URI like xen+ssh://root@myserver.com/, indicating that you want to use ssh to connect as root to the server myserver.com, and control xen virtual machines running there.

With QEMU (and KVM), which is what I use, there are two URIs you need to be aware of:

qemu://xxxx/system, to indicate all the system VMs running on server xxxx.
qemu://xxxx/session, to indicate all the VMs belonging to the user that is running the virsh command.

That's right: each user can have their own set of VMs and networks, and if allowed to do so, can control a set of system wide, global VMs. Session VMs run as the user that started them, while system VMs generally run as an unprivileged, dedicated user, libvirt-qemu on Debian systems.

If you omit xxxx, with URIs like qemu:///system, or qemu:///session, you are referring to the system and session VMs running on the machine you are running the command on, localhost.

Note that if you use virsh as root, and do not specify which sets of VMs you want to control, it will default to controlling the system VMs, the global ones. If you run virsh as a different user instead, it will default to controlling the session VMs, the ones that only belong to you.

This is a common mistake and good source of confusion when you get started. To avoid mistakes, it is a good idea to explicitly specify which VMs you want to work on with the -c option that you will see in a few minutes.
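
To make the URI shapes described above concrete, here is a small sketch that assembles them. The helper name qemu_uri is made up; the qemu+ssh form follows the same transport convention as the xen+ssh example earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the libvirt URI for QEMU/KVM VMs.
#   qemu_uri SCOPE [HOST]
# SCOPE is "system" or "session"; with HOST, the connection goes over ssh.
qemu_uri() {
    scope=${1:-system}
    host=${2:-}
    case $scope in
        system|session) ;;
        *) echo "qemu_uri: scope must be system or session" >&2; return 1 ;;
    esac
    if [ -n "$host" ]; then
        echo "qemu+ssh://$host/$scope"
    else
        echo "qemu:///$scope"
    fi
}

qemu_uri system                    # prints qemu:///system
qemu_uri session                   # prints qemu:///session
qemu_uri system root@myserver.com  # prints qemu+ssh://root@myserver.com/system
```

The result is meant to be passed to virsh, as in virsh -c "$(qemu_uri system)" list --all.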

Managing system VMs

On a Debian machine, for a user to be allowed to manage system VMs they need to be able to send commands to libvirtd. By default, libvirtd listens on a unix domain socket in /var/run/libvirt, and for a user to be able to write to that socket they need to belong to the libvirt group.

If you edit /etc/libvirt/libvirtd.conf, you can configure libvirtd to wait for commands using a variety of different mechanisms, including for example SSL encrypted TCP sockets.

Given that I only want to manage system local virtual machines, I just added my user, rabexc, to the group libvirt so I don't have to be root to manage these machines:

$ sudo usermod -a -G libvirt rabexc
# alternatively, use vigr and vigr -s
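
Keep in mind that a new group membership only applies to sessions started after the change, so you need to log out and back in before virsh picks it up. A quick way to check what the group database says is sketched below; the helper name in_group is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if USER is listed as a member of GROUP.
#   in_group USER GROUP
# "id -nG USER" reads the group database, so it reflects a usermod
# right away, even if the current login session predates it.
in_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

if in_group "$(id -un)" libvirt; then
    echo "member of libvirt: log in again if virsh still refuses"
else
    echo "not in the libvirt group"
fi
```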

Defining a network

Each VM you define will likely need some sort of network connectivity, and some sort of storage to use. Each object in libvirt, be it a network, a pool of disks to use, or a VM, is defined by an xml file.

Let's start by looking at the default network configuration, run:

$ virsh -c qemu:///system net-list
Name                 State      Autostart
-----------------------------------------

This means that there are no active virtual networks. Try one more time adding --all:

$ virsh -c qemu:///system net-list --all
Name                 State      Autostart
-----------------------------------------
default              inactive   no

and notice the default network. If you want to inspect or change the configuration of the network, you can use either net-dumpxml or net-edit, like:

$ virsh -c qemu:///system net-dumpxml default
<network>
  <name>default</name>
  <uuid>ee49713c-d1c8-e08b-b007-6401efd145fe</uuid>
  <forward mode="nat"/>
  <bridge delay="0" name="virbr0" stp="on"/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range end="192.168.122.254" start="192.168.122.2"/>
    </dhcp>
  </ip>
</network>

The output is pretty much self explanatory: 192.168.122.1 will be assigned to the virbr0 interface as the address of the gateway, virtual machines will be assigned addresses between 192.168.122.2 and 192.168.122.254 using dhcp, and their traffic will be forwarded to the outside world by using nat, that is, by hiding their IP addresses behind the address of your host.

A bridge device (virbr0) allows Virtual Machines to communicate with each other, as if they were connected to their own dedicated network. You can configure networking in many different ways, with nat, with bridging, with simple gateway forwarding, ... You can find full documentation on the parameters on the libvirt website, and change the definition by using net-edit. Other handy commands:

net-undefine default, to forever eliminate the default network.
net-define file.xml, to define a new network starting from an .xml file. I usually start from the xml of another network, by using virsh ... net-dumpxml default > file.xml, edit edit edit, and then virsh ... net-define file.xml.
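
As an illustration of what such an edited file might contain, here is a sketch that prints a NAT network definition modeled on the default one. The helper name nat_network_xml, the network name lab, the bridge virbr1 and the 192.168.123 subnet are all made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print a NAT network definition for libvirt.
#   nat_network_xml NAME BRIDGE PREFIX     (PREFIX like 192.168.123)
# The uuid is left out on purpose: libvirt generates one on net-define.
nat_network_xml() {
    cat <<EOF
<network>
  <name>$1</name>
  <forward mode="nat"/>
  <bridge name="$2" stp="on" delay="0"/>
  <ip address="$3.1" netmask="255.255.255.0">
    <dhcp>
      <range start="$3.2" end="$3.254"/>
    </dhcp>
  </ip>
</network>
EOF
}

# Example: a second network next to "default".
nat_network_xml lab virbr1 192.168.123
```

Redirecting the output to a file, say lab.xml, gives you something to feed to virsh -c qemu:///system net-define lab.xml.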

Starting and stopping networks

Once you have a network defined, you need to start it, or well, tell virsh that you want it started automatically. In our case, the commands would be:

net-start default, to start the default network.
net-destroy default, to stop the default network, with the ability of starting it again in the future.
net-autostart default, to automatically start the default network at boot.

Now... what happens exactly when we start a network? My laptop has quite a few iptables rules and various other random network configurations. So, let's try:

$ virsh -c qemu:///system net-start default
Network default started

And have a look at the system:

$ ps faux
[...]
root   1799 0.0 0.6 109688 6508 ? Sl May01 0:00 /usr/sbin/libvirtd -d
nobody 4246 0.0 0.0   4608  896 ?  S 08:35 0:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
# netstat -nulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address  Foreign Address PID/Program name
udp        0      0 192.168.0.1:53 0.0.0.0:*       4246/dnsmasq
udp        0      0 0.0.0.0:67     0.0.0.0:*       4246/dnsmasq

# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address   Foreign Address  State   PID/Program name
tcp        0      0 192.168.0.1:53  0.0.0.0:*        LISTEN  4246/dnsmasq
tcp        0      0 0.0.0.0:22      0.0.0.0:*        LISTEN  2108/sshd

libvirt started dnsmasq, which is a simple dhcp server with the ability to also provide DNS names. Note that the command line parameters seem to match what we had in the default xml file.
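
One of those parameters is easy to check by hand: --dhcp-lease-max=253 matches the number of addresses in the dhcp range of the xml. A small sketch of the arithmetic, with made-up helper names (ip_to_int, range_size):

```shell
#!/bin/sh
# Hypothetical helpers to count the addresses in an IPv4 range.

# ip_to_int A.B.C.D: print the address as a 32 bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# range_size FIRST LAST: number of addresses, both ends included.
range_size() {
    echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

range_size 192.168.122.2 192.168.122.254   # prints 253
```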

$ ip address show
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:2e:72:8b brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.86/24 brd 192.168.100.255 scope global eth0
    inet6 fe80::5054:ff:fe2e:728b/64 scope link
       valid_lft forever preferred_lft forever
4: virbr0:  mtu 1500 qdisc noqueue state DOWN
    link/ether 8a:3c:6e:11:28:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0

This shows that a new device, virbr0, has been created, and assigned 192.168.122.1 as an address.

$ sudo iptables -nvL
Chain INPUT (policy ACCEPT 565 packets, 38728 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:67

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24     state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  virbr0 *       192.168.122.0/24     0.0.0.0/0
    0     0 ACCEPT     all  --  virbr0 virbr0  0.0.0.0/0            0.0.0.0/0
    0     0 REJECT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 REJECT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT 376 packets, 124K bytes)
 pkts bytes target     prot opt in     out     source               destination

$ cat /proc/sys/net/ipv4/ip_forward
1

Firewalling rules have also been installed. In particular, the first 4 rules allow querying of dnsmasq from the virtual network. Here they are meaningless: the default policy of my INPUT chain is to accept everything. But had I had my real iptables rules running, they would have blocked that traffic, while the new rules here, inserted before my existing rules, would have allowed it.

Forwarding rules, instead, allow all replies to come back in (packets belonging to RELATED and ESTABLISHED sessions), and allow communications from the virtual network to any other network, as long as the source ip is 192.168.122/24.

Note also that ip forwarding has either been enabled, or was already enabled by default.

$ sudo iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 1 packets, 32 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 1 packets, 32 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 1 packets, 1500 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 1 packets, 1500 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  tcp  --  *      *       192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
    0     0 MASQUERADE  udp  --  *      *       192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
    0     0 MASQUERADE  all  --  *      *       192.168.122.0/24    !192.168.122.0/24

Finally, note that rules to perform NAT have been installed. Those rules are added by scripts when the network is set up. Some documentation is provided on the libvirt wiki.

If you want to, you can also add arbitrary rules to filter traffic to virtual machines, and have libvirt install and remove them automatically. The main commands mirror the network ones: nwfilter-define, nwfilter-undefine, nwfilter-edit, nwfilter-list, nwfilter-dumpxml. You can read more about firewalling on the libvirt site: http://libvirt.org/firewall.html.

Managing storage

Now that we have a network running for our VMs, we need to worry about storage. There are many ways to get some disk space, ranging from dedicated partitions or LVM volumes to simple files.

The main idea is to create a pool from which you can draw space from, and create volumes, equivalent to disks. If you are familiar with lvm, this should not sound very original. On my system, I just dedicated a directory to storing images and volumes.

You can start with:

$ virsh -c qemu:///system \
    pool-define-as devel \
    dir --target /opt/kvms/pools/devel

This creates a pool called devel in the directory /opt/kvms/pools/devel. I can see this pool with:

$ virsh -c qemu:///system pool-list --all
Name                 State      Autostart 
-----------------------------------------
devel                inactive   no

Note the --all parameter. Without it, you would only see started pools. And as before, you can mark it to be automatically started by using:

$ virsh -c qemu:///system pool-autostart devel

and start it with:

$ virsh -c qemu:///system pool-start devel

To create and manage volumes, you can use vol-create, vol-delete, vol-resize, ... all the vol* commands that virsh help shows you. Or, you can just let virsh manage the volumes for you, as we will see in a second. The one command you will find useful is vol-list, to have the list of volumes in a pool.

For example:

$ virsh -c qemu:///system vol-list devel
Name                 Path
-----------------------------------------

Shows that there are no volumes. Don't forget that the pool has to be active for most of the vol- commands to work.

Installing a virtual machine

Now you are finally ready to create a new virtual machine. The main command to use is virt-install. Let's look at a typical invocation:

virt-install -n debian-testing \
             --ram 2048 --vcpus=2 \
             --cpu=host \
             -c ./netinst/debian-6.0.7-amd64-netinst.iso \
             --os-type=linux --os-variant=debiansqueeze \
             --disk=pool=devel,size=2,format=qcow2 \
             -w network=default --graphics=vnc

and go over the command line for a minute:

-n debian-testing is just a name. From now on, and with every subsequent virsh command, this VM will be called debian-testing.
--ram 2048 --vcpus=2 should also be no surprise: give it 2 GB of RAM, and 2 CPUs.
--cpu=host means that I do not want to emulate any specific CPU, the VM should just be provided the same CPU as my physical machine. This is generally fast, but can mean trouble if you want to be able to migrate your VMs to a less capable machine. The fact is, however, that I don't care about migrating my VMs, and prefer them to be fast :).
-c ./netinst... means that the VM should be configured to have a "CD-ROM" drive, and this drive should have a disk in it with the content of the file ./netinst/debian-6.0.7-amd64-netinst.iso. This is just an installation image of debian. You need to download the install cd-rom, usb key, or ... your favourite media from the distribution web site.
--os-type, --os-variant are optional, but in theory allow libvirt to configure the VM with the optimal parameters for your operating system.

The most interesting part to me comes from:

--disk=pool=devel,size=2,format=qcow2, which asks libvirt to automatically allocate 2 GB of space from the devel pool. Do you remember? The pool we defined just a few sections ago. The format parameter indicates how to store this VM's disks. The qcow2 format is probably the most common format for KVM and QEMU, and provides a great deal of flexibility. Look at the man page for more details, you can use a variety of formats.
-w network=default means that the VM should be connected to the default network. Again, the network we created at the start of this article.
--graphics=vnc just means that you want to have a vnc window to control the VM.

Of course, you need to get a suitable installation media in advance, the file specified with -c ./netinst.... I generally use CD or USB images suitable for a network install, which means minimal system, most of it downloaded from the network. virt-install also supports fetching the image directly from an http, ftp, or nfs server, in which case you should use the -l option, and read the man page, man virt-install. Don't forget that the image type must match the cpu you specify with --cpu (eg, you will get into trouble if you download a powerpc image and try to run it on an ARM VM, as you may guess).

Converting an existing virtual machine

In my case I had many existing VMs on my system. I did not want to maintain the same network setup: in fact, the default DHCP and NAT setup with a bridge provided by libvirt was far superior to what my shell script set up before. To import the VMs, I followed a simple procedure:

Copied the image in the directory of the pool: cp my-vm.qcow2 /opt/kvms/pools/devel
Refreshed the pool, just in case: virsh -c qemu:///system pool-refresh devel

Created a new VM based on that image, by using virt-install with the --import option, for example:

virt-install --connect qemu:///system --ram 1024 \
    -n my-vm --os-type=linux --os-variant=debianwheezy \
    --disk vol=devel/my-vm.qcow2,device=disk,format=qcow2 \
    --vcpus=1 --vnc --import

Note the devel/my-vm.qcow2 indicating the pool and the volume to use, and --import, to indicate that the VM already exists.

Of course, once the import was completed I had to connect to the VM and change the network parameters to use DHCP instead of a static address.

Managing Virtual Machines

You may have noticed that once you run virt-install, your virtual machine is started. The main commands to manage virtual machines are:

virt-viewer my-vm - to have the screen of your VM opened up in a vnc client.
virsh start my-vm - to start your VM.
virsh destroy my-vm - to stop your VM violently. It is generally much better to run "shutdown" from your VM, or better...
virsh shutdown my-vm - to send your VM a "shutdown request", like if you had pressed the shutdown button on your server. Note that it is then up to the OS installed and its configuration to decide what to do. Some desktop environments, for example, will pop up a window asking you what you want to do, and not really shutdown the machine.
virt-clone --original my-vm --auto-clone - to make an exact copy of your VM.
virsh autostart my-vm - to automatically start your vm at boot.

A few other random notes

VNC console from remote machine with no libvirt tools

I had to connect to the VNC console of my virtual machines from a remote desktop that did not have virt-viewer installed, so I could not use the -c and URI parameters. A simple port forwarding got me what I wanted:

$ ssh rabexc@server -L 5905:localhost:5900
$ vncviewer :5

This forwards local port 5905 to port 5900 on the server, where the first VM running VNC listens, and asks vncviewer to connect directly to the 5th VNC console locally (5900 + 5 = 5905).
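
The port arithmetic generalizes to any pair of displays, since VNC display N lives on port 5900 + N. A tiny sketch, where the helper name vnc_forward is made up:

```shell
#!/bin/sh
# Hypothetical helper: print the ssh -L argument that maps a local VNC
# display onto a VNC display on the remote side.
#   vnc_forward LOCAL_DISPLAY REMOTE_DISPLAY
vnc_forward() {
    echo "$((5900 + $1)):localhost:$((5900 + $2))"
}

vnc_forward 5 0   # prints 5905:localhost:5900
```

So the command above could also be written as ssh rabexc@server -L "$(vnc_forward 5 0)", and the second VM on the server would be reachable with vnc_forward 6 1 and vncviewer :6.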

virsh snapshots and qcow2

The first time I used virsh snapshot-create my-vm to take a snapshot of all the volumes used by my VM I could not find where the data was stored. It turns out that qcow2 files have direct support for snapshots, which are saved internally within the same file. To see them, besides the virsh commands, you can use: qemu-img info /opt/kvms/pools/devel/my-vm.qcow2.

Moving qcow2 images around

If you created qcow2 images based on other images by using -o backing_file=... to only record the differences, moving the images around breaks them: the derived image can no longer find its original backing file. A quick fix was to use:

qemu-img rebase -u -b original_backing_file_in_new_path.img \
    derived_image.qcow2

Note that -u, unsafe, is only usable if the only thing that really changed is the path of the backing file, not its content.

Sending qemu monitor commands directly

Before switching to libvirt I was used to managing kvm / qemu VMs by using the monitor interface. Despite what the documentation claims, it is possible to send commands through this interface directly by using:

$ virsh -c qemu:///system \
    qemu-monitor-command \
    --hmp debian-testing "help"

for example. This may not always be a good idea, as you may end up confusing libvirt.

Finding the IP address of your VM

When a VM starts with the default network configuration it will be assigned an IP via DHCP by dnsmasq. This IP can change. For some reason, I was sort of expecting dnsmasq, also capable of behaving as a simple DNS server, would maintain a mapping VM name to IP, and accept DNS queries to resolve the name of the VM.

Turns out this is not the case, unless you explicitly add mappings between names and the MAC address of your VM in the network configuration. Or at least, I could not find a better way to do it.

The only reliable way to find the IP of your VM is to either provide a static mapping in the xml file, or look into /var/lib/libvirt/dnsmasq/default.leases for the MAC address of your VM, where default is the name of your network.

You can find the MAC address of your VM by looking at its xml definition, with something like:

virsh dumpxml debian-modxslt |grep "mac address"

You can find plenty of shell scripts on google to do this automatically for you.
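
For reference, a minimal sketch of such a script. It assumes the dnsmasq lease file format of one lease per line, with expiry time, MAC address, IP address, hostname and client id separated by spaces. The function name lease_ip is made up, and the default path is the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the IP address leased to a MAC address.
#   lease_ip MAC [LEASES_FILE]
# Assumes one lease per line: expiry, MAC, IP, hostname, client id.
# Fails (non zero exit, no output) if the MAC has no lease.
lease_ip() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    file=${2:-/var/lib/libvirt/dnsmasq/default.leases}
    awk -v mac="$mac" '
        tolower($2) == mac { ip = $3 }   # keep the last matching lease
        END { if (ip == "") exit 1; print ip }
    ' "$file"
}
```

Called as lease_ip 52:54:00:aa:bb:cc, with the MAC address taken from the dumpxml output, it prints the address currently leased to that VM.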

Conclusions

Switching to libvirt took me only a few hours, and I am no longer afraid of having to deal with multiple VMs on my laptop :). Creating them, cloning temporarily, or removing them has become an extremely simple task.

http://rabexc.org/posts/how-to-get-started-with-libvirt-on

Many encrypted volumes, a single passphrase?

Apr 16, 2013 Updated Apr 16, 2013

Just a few days ago I finally got a new server to replace a good old friend of mine which has been keeping my data safe since 2005. I was literally dying to get it up and running and move my data over when I realized it had been 8 years since I last set up dmcrypt on a server I only had ssh access to, and had no idea of what the best current practices are.

So, let me start first by describing the environment. Like my previous server, this new machine is set up in a datacenter somewhere in Europe. I don't have any physical access to this machine, I can only ssh into it. I don't have a serial port I can connect to over the network, I don't have IPMI, nor something like intel kvm, but I really want to keep my data encrypted.

Having a laptop or desktop with your whole disk encrypted is pretty straightforward with modern linux systems. Your distro will boot up, kernel will be started, your scripts in the initrd will detect the encrypted partition, stop the boot process, ask you for a passphrase, decrypt your disk, and happily continue with the boot process.

But when your passphrase is asked, your network is not up yet, there is no ssh access. Either you sit in front of the monitor and type your passphrase, or there is really not that much you can do from a few thousand miles away.

To have encrypted partitions you can manage remotely, you pretty much need:

A "minimal" linux system to boot. Minimal enough that you can get your network up and running, and some protocol so you can connect and type your passphrase. I'll get back to this in a few paragraphs.
Some tool or script to mount your encrypted file systems and continue the boot process once you connect and enter your password.

Sounds easy, doesn't it? I spent some time looking around to see if I could find some pre-baked solutions, like a simple package to install that would tweak my initrd and add ssh and the needed scripts, or some suggestion on how to do it in a smart way. In the end, I baked my own solution, exactly like 8 years ago.

So, here it is...

A minimal system to boot on...

Creating an initrd or a tiny partition to do the initial boot did not seem very attractive. For one, I do not want to keep the whole root encrypted: root contains only tools and scripts downloaded from the Debian repositories, configs really contain no sensitive data, and the kind of logs I care about do not end up in /var/log. Second, my experience with initrd is that it changes quite a bit over time: you need to generate a new initrd for every new kernel and set of drivers, which is tricky by itself; the tools have changed significantly over time; you need to compute (and install) the dependencies for any tool you need from the minimal root, and hook your stuff well enough with the generator so it keeps working over time. And well, it's hard to test unless you reboot your system, which is really not the time I want to have surprises.

Creating a minimal root outside of the initrd did not seem very attractive either: I did not fancy having two root partitions to keep up to date in terms of kernel, grub, updates, and so on. And again, I did not need to encrypt root.

The solution I used is pretty simple: have your root and boot in clear, boot from there as normal. From rc*.d, disable all the services that require my encrypted data (like mysql, apache, or my repositories), remove my encrypted partitions from fstab and crypttab (or mark them noauto in both). Basically, have your normal system boot as usual, with ssh but no other service that needs your encrypted data.

Once the system boots...

Once the system boots, I have a script I can run manually that, in order:

decrypts all the partitions (...)
checks that the file systems are sane (remember the fsck run at boot?)
mounts them in the right location
starts all the other services that depend on that data

Encrypted partitions...

Let's start from the encrypted partitions. I've been using LVM and dmcrypt pretty much since they existed. I don't have hybrid systems, always linux only, and I like LVM much more than managing partition tables manually.

One common solution to have multiple volumes encrypted is to create an encrypted volume with LVM, decrypt it, and then use it as a physical volume for another volume group. For example, a system volume group, system/encrypted as a logical volume, and rather than have a simple file system in there, use it as a physical volume for another volume group with multiple encrypted sub volumes.

I am not quite fond of this solution as it makes it hard to borrow space from encrypted space to clear text space and vice versa, and generally makes things more confusing.

What I tend to do is just have a single volume group, containing some encrypted and clear text logical volumes. This however means that each logical volume has to be decrypted independently, and most of the tools will ask you for a passphrase for each volume, multiple times, which is annoying. Some wikis suggest keeping keyfiles on disk, which is roughly what I do: I create an encrypted logical volume, called keys, with a strong passphrase, that contains truly random keys. To mount the other volumes I first decrypt this keys volume, use the keys it contains to open them, and then immediately unmount it.

Using this mechanism, the "decrypt all the partitions" step I was talking about becomes:

ask for passphrase
decrypt keys volume
mount it
for each encrypted volume
1. load key file in keys partition
2. decrypt the volume
umount the keys volume
... continue with checking the filesystems yadda yadda ...

As an additional requirement, I want those steps to be idempotent: if a partition is already mounted, it should be skipped. If I run the script multiple times, it should just complete the work that wasn't done before.
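
The steps above, including the idempotency requirement, can be sketched roughly as follows. This is not the code of the actual script: the variable names, the mount point and the one-key-file-per-volume layout are assumptions made for illustration, and by default the sketch only prints the commands it would run.

```shell
#!/bin/bash
# Sketch only: VG, KEYS_LV, KEYS_MNT and the "<volume>.key" naming are
# assumptions for illustration, not what ac-dmcrypt-manage really uses.
VG=${VG:-system}
KEYS_LV=${KEYS_LV:-encrypted-keys}
KEYS_MNT=${KEYS_MNT:-/mnt/keys}
DRY_RUN=${DRY_RUN:-1}   # default: print the commands, do not run them

# Print the command; execute it too unless we are in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

# A volume counts as open when its device mapper node already exists.
is_open() { [ -e "/dev/mapper/$1" ]; }

unlock_all() {
  # Open the keys volume (cryptsetup asks for the passphrase) and mount it,
  # skipping whatever was already done by a previous run.
  is_open cleartext-keys || run cryptsetup luksOpen "/dev/$VG/$KEYS_LV" cleartext-keys
  grep -qs " $KEYS_MNT " /proc/mounts || run mount -o ro /dev/mapper/cleartext-keys "$KEYS_MNT"

  # Open every volume that is not open yet, using its key file.
  local vol
  for vol in "$@"; do
    if is_open "cleartext-$vol"; then
      echo "skip $vol: already open"
      continue
    fi
    run cryptsetup luksOpen "/dev/$VG/$vol" "cleartext-$vol" --key-file "$KEYS_MNT/$vol.key"
  done

  # The keys should not stay around once the volumes are open.
  run umount "$KEYS_MNT"
  run cryptsetup luksClose cleartext-keys
}
```

Calling `unlock_all mysql www` prints the plan; setting `DRY_RUN=0` would actually execute it. Running it a second time skips the volumes that are already open.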

My solution...

Back in 2005, together with a few friends with whom I was sharing the server, we wrote a small script to maintain those volumes and implement the steps above. The script is now checked in on github, you can find it here.

Setup

To get it running, you first need to install the tools and create the volume where to store the keys:

# Install sys-scripts.
mkdir -p /opt/{scripts,conf}
git clone https://github.com/ccontavalli/sys-scripts.git /opt/scripts

# Install the tools that are needed.
apt-get install cryptsetup lvm2

# Create a volume "encrypted-keys" in group "system", this would
# be "vg0" unless you changed the default.
lvcreate -L 20M -n encrypted-keys system

# Encrypt the partition and open it.
cryptsetup luksFormat /dev/system/encrypted-keys \
    --cipher=aes-cbc-essiv:sha256 --key-size=256 --verify-passphrase
cryptsetup luksOpen /dev/system/encrypted-keys cleartext-keys

# Put a file system on that partition.
mkfs.ext4 /dev/mapper/cleartext-keys

Note that if your volume group is not called system but vg0, you will need to edit /opt/scripts/ac-dmcrypt-manage and change cfg_key_volume to look like:

cfg_key_volume=${cfg_key_volume-vg0/encrypted-keys}

or remember to always call ac-dmcrypt-manage with the volume passed, like:

cfg_key_volume=vg0/encrypted-keys ac-dmcrypt-manage ...

Creating volumes

Now you are ready to create volumes. All you have to do is something like:

/opt/scripts/ac-dmcrypt-manage create-volume system \
    mysql 20G /opt/mysql ext4

for example, and follow the prompts. You can create as many volumes as you like. If you want to mount that volume, you can then run:

/opt/scripts/ac-dmcrypt-manage start

If you want to inspect the generated keys:

/opt/scripts/ac-dmcrypt-manage mount-keys

Just remember to umount them after a while, by using umount-keys.

You can also change the mount options of your partition by editing /opt/conf/ac-fstab, which has been generated automatically by create-volume.

Managing the boot process

Let's say now you want to mark mysql as a process that cannot be started until the encrypted partitions are mounted. What you have to do is:

/opt/scripts/ac-system-boot add mysql

The script will disable mysql from the normal boot, by running something like update-rc.d mysql disable.

When the system reboots

All you have to do is ssh on the system, and then run:

/opt/scripts/ac-system-boot start
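
The idea behind those two commands can be sketched in a few lines of shell. This is a simplified illustration, not the real ac-system-boot: the list file and the function names are hypothetical, and by default the commands are only printed.

```shell
#!/bin/bash
# Sketch only: DEFERRED_LIST and both function names are hypothetical;
# this is not the actual ac-system-boot code.
DEFERRED_LIST=${DEFERRED_LIST:-/opt/conf/deferred-services}
DRY_RUN=${DRY_RUN:-1}   # default: print the commands, do not run them

run() {
  echo "+ $*"
  if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

# Remember the service and keep it out of the normal boot sequence.
defer_service() {
  grep -qxs "$1" "$DEFERRED_LIST" || echo "$1" >> "$DEFERRED_LIST"
  run update-rc.d "$1" disable
}

# Meant to be called by hand, over ssh, once the encrypted volumes are mounted.
start_deferred() {
  local svc
  # Read the list on fd 3 so the started services cannot eat our input.
  while read -r svc <&3; do
    run service "$svc" start
  done 3< "$DEFERRED_LIST"
}
```

With this, `defer_service mysql` records the service once and disables it at boot, and `start_deferred` starts everything recorded in the list.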

Conclusions...

This set of scripts has served me well for several years. I will probably stick with them until I find a better mechanism for this kind of setup. Systems like ecryptfs or encfs look like viable alternatives for home directories or private data for individual users. But from what I have read so far dmcrypt still looks like the best option to keep system partitions encrypted, on a server.

Before using ac-system-boot, we tried using runlevels. Isn't this what they were meant for? The idea was to have a minimal network runlevel, and another runlevel with the system daemons to boot once the partitions are available, and use telinit to switch between them. But between the various alternatives to SysV init that popped up in the last few years, the attention to speeding up the boot process, and various distribution scripts fiddling with rc*.d or assuming one setup or another, this did not work well.

Do you have better proposals? alternatives? let me know.

http://rabexc.org/posts/many-encrypted-volumes-single-passphrase

Cleaning up a CSS

Apr 10, 2013 Updated Apr 10, 2013

Let's say you have a CSS with a few thousand selectors and many many rules. Let's say you want to eliminate the unused rules, how do you do that?

I spent about an hour looking online for some tool that would easily clean up CSS files. I've ended up trying a few browser extensions:

CSS Remove and combine, for chrome, did not work for me. It would only parse the very first web site in my browser window, and seemed to refuse file:/// urls. I later discovered that chrome natively supports this feature: just go in developer tools (ctrl + shift + i), click the audits tab, click run, and you will find a drop down with the list of unused rules in your CSS.
Dust-me Selectors, for firefox, worked like a charm: it correctly identified all the unused selectors.

In both cases, however, the list was huge: I had thousands of unused selectors. I was really not looking forward to going through my CSS by hand, considering also that many styles had multiple selectors, and I could only remove the unused ones.

In the end, I noticed that "Dust-me" allowed exporting the list of unused selectors as a .csv file, and wrote my own script (css-tidy, just to pick an original name) to read this csv, parse the .css, and output a cleaned up version of it.

The result was pretty good, and in the end it saved me a lot of work :-), have a look at it. Note that this also works with Chrome: all you have to do is feed css-tidy with a list of selectors to eliminate.

http://rabexc.org/posts/cleaning-up-css

Getting back to use openldap

Apr 3, 2013 Updated Apr 3, 2013

While trying to get ldap torture back in shape, I had to learn again how to get slapd up and running with reasonable configs. Here are a few things I had long forgotten and learned again this morning:

The order of the statements in slapd.conf is relevant. Don't be naive, even though the config looks like a normal key value store, some keys can be repeated multiple times (like backend, or database), and can only appear before / after other statements.
My good old example slapd.conf file no longer worked with slapd. Some of it is because the setup is just different, some of it because I probably had a few errors to begin with, some of it is because a few statements moved around or are no longer valid. See the changes I had to make.
Recent versions of slapd support having configs in the database itself, or at least represented in LDIF format and within the tree. Many distros ship slapd with the new format. To convert from the old format to the new one, you can use:
```
slapd -f slapd.conf -F /etc/ldap/slapd.d
```
I had long forgotten how quiet slapd can be, even when things go wrong. Looking in /var/log/syslog might often not be enough. In fact, my database was invalid, my configs had errors, and there was very little indication that, when I started slapd, it was sitting there idle because it couldn't really start. To debug errors, I ended up running it with:
```
slapd -d Any -f slapd.conf
```
slapd will not create the initial database by itself. To do so, I had to use:
```
/usr/sbin/slapadd -f slapd.conf < base.ldiff
```
with base.ldiff being something like this.
Even if you set no password, ldapsearch with SASL authentication will likely ask you for one. It's easy to fix, though: just pass the -x parameter to go back to simple authentication, like with:
```
ldapsearch -x -H "ldap://127.0.0.1:9009/" -b dc=test,dc=it
```
Note that I had slapd run on a non standard port for experimentation purposes.
Let's say you use -h instead of -H for ldapsearch because your memory is flaky, but you specify the parameter like -H would expect:
```
ldapsearch -x -h "ldap://127.0.0.1:9009/" -b dc=test,dc=it
```

The command will silently fail. Eg, it will accept -h as "valid" parameter, but still report "unable to connect". Really, -h takes a simple hostname, like 127.0.0.1, but will not fail in a case like above. Took me a few minutes to realize the mistake.
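
For reference, the base.ldiff file mentioned above only needs to contain the root entry of the tree. The following is a minimal sketch, not my original file: it assumes the dc=test,dc=it suffix used in the ldapsearch examples and a slapd with the core schema loaded; write_base_ldif is just a helper name made up here.

```shell
#!/bin/bash
# Write a minimal LDIF file holding only the root entry of the tree.
# Assumptions: the suffix is dc=test,dc=it (as in the ldapsearch examples)
# and the core schema (dcObject, organization) is loaded by slapd.conf.
write_base_ldif() {
  cat > "$1" <<'EOF'
dn: dc=test,dc=it
objectClass: top
objectClass: dcObject
objectClass: organization
dc: test
o: test
EOF
}
```

Running `write_base_ldif base.ldiff` creates the file in the current directory.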

Let's see what the next roadblocks will be ...

http://rabexc.org/posts/getting-back-to-use-openldap

Randomizing should be easy, right? oh, well, maybe not..

Apr 3, 2013 Updated Apr 3, 2013

A simple problem...

Let's say you have a regression test or fuzz testing suite that relies on generating a random set of operations, and verifying their results (like ldap-torture).

You want this set of operations to be reproducible, so if you find a bug, you can easily get to the exact same conditions that triggered it.

There are many ways to do this, but one simple way is to use one of many pseudo random generators, one that given the same starting seed generates the same sequence of random numbers. Example?

Let's look at perl:

# Seed the random number generator.
srand($seed);

# Generate 100 random numbers.
for (my $count = 0; $count < 100; $count++) {
  print rand() . "\n";
}

Given the same $seed, the sequence of random numbers will always be the same. Not surprising, right?

Now, let's go back to our original problem: you want your test to be reproducible, but still be random. Something you can do is get rid of $seed, and just call srand(). srand will return the seed generated, that you can helpfully print on the screen and reuse if you need to. The final code would look like:

if ($seed) {
  # Use an existing seed to reproduce a failing test.
  srand($seed);
} else {
  # Let srand pick a seed to start a newly randomized test.
  $seed = srand();
}

print "TO REPRODUCE TEST, USE SEED: " . $seed . "\n";

A broken solution...

Now, where is the problem? Well, the problem is that before perl 5.14 (~2011, in case you are wondering), srand() did not return the seed it set. Just doing $seed = srand() did not work.

I was debugging a piece of code I wrote a long time ago (2004), and here's what I was doing:

...
} else {
  my $seed = int(rand(~0));
  srand($seed);
}
...

Now, looks nice, doesn't it? rand() will automatically seed to some random value the first time it is called, which means rand() will produce a reasonable value (well, reasonable for non-crypto purposes, and with reasonable versions of perl), which I can then store, use as a seed, and be done with it.

But what about the ~0 in parenthesis? Well, if I just call rand() by itself, without parameters, the returned value is a number between 0 and 1. srand() takes an integer, so something like $seed = rand(); srand($seed) would always lead to seeding the prng with 0, not good at all.

According to the man page, rand($limit) instead will return a random number between 0 and $limit. By using ~0 as a parameter I get the maximum integer that perl can represent, an integer entirely made of 1s in binary. On my laptop, this is 2^64 - 1, pretty high.

So: get a random number in the widest range I can possibly get, feed it to srand, print it on the screen, and have a reasonably ok reproducible random sequence at every run of my tool, right?

WRONG! What, why?

Well, turns out that there are (at least) 2 problems.

rand() returns floating point numbers. As you may remember, floating point numbers are made of a significand and an exponent. If you ask for too large a number, and convert it to an integer, there will be no entropy in the lower bits (they are just a result of the exponent).
... and well, srand only seems to be using the lower bits of a number to actually seed the prng.

Try yourself if you don't believe me:

my $count;
for ($count = 0; $count < 10; $count ++) {
  srand(); # This just seeds the prng to a non-great but ok value.
  $seed = int(rand(~0));
  srand($seed);
  print "SEED: $seed, NEXT: " . rand() . "\n";
}

Let's try to run this program:

$ perl /tmp/srand.pl
SEED: 1060381934447820800, NEXT: 0.559209994114102
SEED: 7074176055472357376, NEXT: 0.559209994114102
SEED: 1895145662064951296, NEXT: 0.559209994114102
SEED: 8633284823558979584, NEXT: 0.559209994114102
SEED: 18297293223351091200, NEXT: 0.559209994114102
SEED: 12078737747670532096, NEXT: 0.559209994114102
SEED: 11431093298324897792, NEXT: 0.559209994114102
SEED: 15164597111904862208, NEXT: 0.559209994114102
SEED: 11321558760259911680, NEXT: 0.559209994114102
SEED: 7997988821801172992, NEXT: 0.559209994114102

Note that despite the SEED being reasonably random and significantly different every time, the next random number generated is always the same. This means I'd get the same sequence, despite the different seeds. But are the seeds so different? Let's look at them in hex, let's add a sprintf("%x"):

$ perl /tmp/srand.pl
SEED: 871400663299457024 c17d64951010000 NEXT: 0.559209994114102
SEED: 8131900614985711616 70da508651010000 NEXT: 0.559209994114102
SEED: 2174713905224417280 1e2e22c651010000 NEXT: 0.559209994114102
SEED: 18069664157141106688 fac457d051010000 NEXT: 0.559209994114102
SEED: 16751261221931515904 e8786f5451010000 NEXT: 0.559209994114102
SEED: 13819218582325755904 bfc7ba1551010000 NEXT: 0.559209994114102
SEED: 4060176321643347968 3858a4da51010000 NEXT: 0.559209994114102
SEED: 17907160938465787904 f88303ff51010000 NEXT: 0.559209994114102
SEED: 12784784062795546624 b16cad5e51010000 NEXT: 0.559209994114102
SEED: 13701418886505758720 be2537e251010000 NEXT: 0.559209994114102

Tadah! Note that the last 32 bits of the seeds are always the same! Now, let's assume for a second that srand() is only using the last 32 bits of a number for seeding. If I shift the number right by 1, with >>1, I should have one bit of entropy, right? and two different outcomes for the next rand? Let's try:

$ perl /tmp/srand.pl
SEED: 40252668253339648, NEXT: 0.365019015110196
SEED: 7239562128531685376, NEXT: 0.865019015110196
SEED: 1399728002052423680, NEXT: 0.365019015110196
SEED: 6143080778424156160, NEXT: 0.365019015110196
SEED: 1155420768230735872, NEXT: 0.865019015110196
SEED: 1359272858982842368, NEXT: 0.365019015110196
SEED: 8077490881973747712, NEXT: 0.865019015110196
SEED: 1391389776166289408, NEXT: 0.865019015110196
SEED: 3567395640554061824, NEXT: 0.865019015110196
SEED: 5663678882486714368, NEXT: 0.365019015110196

Seems like the theory is correct: I only obtain two different values.
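
The observation about the low bits can also be double checked outside of perl, with plain shell arithmetic on the hex seeds printed earlier. A small sketch, assuming a bash with 64 bit integers; low32 is a helper name made up here, and the check only confirms that the printed seeds share their lowest 32 bits, it says nothing about perl internals.

```shell
#!/bin/bash
# Print the lowest 32 bits of a 64-bit value, in hex.
low32() { printf '%x\n' $(( $1 & 0xFFFFFFFF )); }

# Seeds copied from the hex column of the output above (only values below
# 2^63 are used here, since bash integers are signed 64-bit).
for seed in 0x70da508651010000 0x1e2e22c651010000 0x3858a4da51010000; do
  low32 "$seed"
done
```

Each iteration prints 51010000: whatever differs between the seeds lives entirely in the upper 32 bits.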

So.. I wish that whatever srand() is doing was documented somewhere, the manual page makes no mention of srand only looking at the lowest 32 bits. And I feel naive for not having thought about float conversion to int, and well, very large numbers.

In fairness, this was almost 9 years ago, haven't used perl in a while, and well, have been spoiled by integers in python, which can have arbitrary length.

http://rabexc.org/posts/randomizing-should-be-easy-right-oh

How much of a file system monger are you?

Mar 30, 2013 Updated Mar 30, 2013

Have you ever been lost in conversations or threads about one or the other file system? which one is faster? which one is slower? is that feature stable? which file system to use for this or that payload?

I was recently surprised by seeing ext4 as the default file system on a new linux installation. Yes, I know, ext4 has been around for a good while, and it does offer some pretty nifty features. But when it comes to my personal laptop and my data, well, I must confess switching to something newer always sends shivers down my back.

Better performance? Are you sure it's really that important? I'm lucky enough that most of my coding & browsing can fit in RAM. And if I have to recompile the kernel, I can wait that extra minute. Is the additional slowness actually impacting your user experience? and productivity?

Larger files? Never had to store anything that ext2 could not support. Even with a 4Gb file limit, I've only rarely had problems (no, I don't use FAT32, but when dmcrypt/ecryptfs/encfs and friends did not exist, I used for years the good old CFS, which turned out to have a 2Gb file size limit). Less fragmentation? More contiguous blocks? C'mon, how often have you had to worry about the fragmentation of your ext2 file system on your laptop?

What I generally worry about is the safety of my data. I want to be freaking sure that if I lose electric power, forget my laptop in suspend mode or my horrible wireless driver causes a kernel panic I don't lose any data. I don't want no freaking bug in the filesystem to cause any data loss or inconsistency. And of course, I want a good toolset to recover data in case the worst happens (fsck, debug.*fs, recovery tools, ...).

So, what do I do? I stick to older file systems for longer. At least, for as long as the old system is well maintained, and the new system doesn't have something I really really want (like a journal, when they started popping up).

What else? Well, talking about ext.* file system and my setup...

I use data=journal in fstab, whenever possible. For the root partition, be careful that you need to either add the option rootflags=data=journal to your grub configuration, or use something like tune2fs -o journal_data /dev/your/root/device, so that the file system is first mounted with data journaling enabled. If you are curious, the problem stems from the fact that you can't change the journaling mode when the file system is already mounted; the boot process on some distros will fail if you don't follow those steps.
I make sure barriers are enabled. Most modern disks cache your data in an internal memory to be faster. If you lose power, journal or not, that data will be lost. Journal or not, you risk corrupting data. With barrier=1 in fstab you ensure that at least the journal entries are written properly to disk. This again can slow you down, but makes corruption significantly more unlikely.
I keep the file systems I don't need to write to read only, with the hope that in case things go wrong, I will at least be able to boot, and to reduce the surface of damage.
I apply a few other tunings to reduce battery use.

So, here's what my fstab looks like:

/dev/sda1        /boot   ext2 nodiratime,noatime,ro,nodev,nosuid,noexec,sync 0       2
/dev/mapper/root  /          ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 1
/dev/mapper/opt   /opt       ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 1
/dev/mapper/media /opt/media ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 2 
/dev/mapper/vms   /opt/vms   ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 2
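
To verify that an option such as data=journal is really in effect after boot, the mount table can be checked directly. A small sketch, reading /proc/mounts (or any file in the same format); mount_opts and has_opt are helper names made up for this example.

```shell
#!/bin/bash
# Print the options of the last entry mounted on mount point $1, reading
# a file in /proc/mounts format ($2, defaults to /proc/mounts itself).
mount_opts() {
  awk -v mp="$1" '$2 == mp { o = $4 } END { if (o != "") print o }' "${2:-/proc/mounts}"
}

# Succeed if the comma separated option list $1 contains the option $2.
has_opt() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_opt "$(mount_opts /)" data=journal; then
  echo "/ is mounted with data=journal"
else
  echo "/ is NOT mounted with data=journal"
fi
```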

Note that I use LVM underneath. It was hard for me to start, I was fearing an extra layer of indirection and possible caching would have complicated things :), but encryption and snapshotting features sold me, and I've been happily using it for years.

I use a similar setup on a raspberry pi in a small appliance that I cannot properly shutdown, and have been plugging and unplugging it directly without headaches for a while (luckily? maybe).

So, what's next? I'm looking forward to a log-structured file system or a stable file system that properly supports snapshots. Something like NILFS or maybe btrfs. In the past, I had a script taking snapshots of my LVM partitions periodically and at boot, so if I broke my system with an update or accidentally removed a file I could easily go back in time.

I gave up on that as LVM snapshots turned out to be fairly buggy from kernel to kernel, not well supported by distributions (had at least one instance of initrd scripts getting confused by the presence of a persistent snapshot, and refusing to boot), and often led to more headaches than advantages, at least for personal use. I will probably give them a shot again in the near future :), but for now, I'm happy as is.

http://rabexc.org/posts/how-much-of-file-system-monger-are-you