
https://blog.010.one/feed


Posts

Don’t snipe me in space - intentional flash corruption for STM32 microcontrollers
Almost one and a half years ago I joined MOVE, the Munich Orbital Verification Experiment, a student club focusing on practical education in the area of satellites at the Technical University of Munich. MOVE has launched three CubeSats to date (First-MOVE in 2013, MOVE-II in 2018 and MOVE-IIb in 2019), and we are currently preparing two future missions. These missions require reliable software, and the ability to update in orbit.

One of the first larger projects I took part in was building a bootloader for the STM32L4R5ZI MCU, which should enable us to do reliable on-orbit software updates. This MCU has 2 MB of flash storage, which we use to store the bootloader, firmware images and additional metadata (e.g. checksums for firmware images).

Bootloader requirements and reliability

The bootloader, which is written in Rust, is the first part of our software stack that runs. It has to be extremely reliable, even in really weird situations, because a failure of the bootloader could lead to a loss of the MCU or even the entire mission (depending on the exact design of the remaining system).

Let’s first take a look at what the bootloader actually does. It manages 3 slots for operating system images, with each having around 500 KB reserved for it. Additionally, 2 redundant metadata structs are stored on different flash pages. During an update, one slot is overwritten, and then metadata is adjusted. We are resilient against power failures at any point, and as long as at least one image slot contains an operating system image, we can boot.
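The "at least one bootable slot" rule can be illustrated with a short Go sketch (Go is used here only for illustration; the real bootloader is written in Rust). The CRC-32 checksum, the flat slices and the name `pickBootable` are assumptions made for this example, not the bootloader's actual data layout.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// pickBootable returns the index of the first slot whose image matches the
// checksum recorded for it, or -1 if no slot is bootable.
// Assumption: CRC-32 stands in for whatever checksum the metadata stores.
func pickBootable(images [][]byte, checksums []uint32) int {
	for i, img := range images {
		if i >= len(checksums) || len(img) == 0 {
			continue // empty slot or no metadata entry for it
		}
		if crc32.ChecksumIEEE(img) == checksums[i] {
			return i
		}
	}
	return -1
}

func main() {
	good := []byte("firmware v2")
	// Slot 0 was damaged after its checksum was recorded, slot 1 is empty.
	images := [][]byte{[]byte("corrupted"), nil, good}
	checksums := []uint32{
		crc32.ChecksumIEEE([]byte("firmware v1")),
		0,
		crc32.ChecksumIEEE(good),
	}
	fmt.Println("booting slot", pickBootable(images, checksums))
}
```

The point of the sketch is only that a damaged or empty slot is skipped rather than treated as fatal.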

To ensure all of this works as expected, we verify some properties using Kani, and we guarantee that no panic handler ends up in the binary (this mostly requires the compiler to prove that no bounds checks can fail, so it can optimize them away and make panic unreachable). We also have hardware tests in our CI pipeline that run against the actual bootloader on the target MCU, several of which are connected to a self-hosted GitLab runner. Additionally, we use the watchdog of the MCU to reset the chip in case our code gets stuck in an endless loop.

While this gets us pretty far, there are still some situations we have not yet handled, especially regarding interrupts. We don’t actually care about most interrupts in the bootloader, so we just tell the CPU to not handle them. Easy, right?

Well, it’s not that easy. There are some situations where a non-maskable interrupt (NMI) will be triggered, and you can’t ignore them. One of them is the ECCD non-maskable interrupt (ECC detection).

Flash ECC and related interrupts

The microcontroller has 2 MB of flash storage with ECC. This means that for every 64 bits, it stores an additional 8 bits of error checking information. When reading from the flash, this information is automatically checked to detect bit flips. These can happen for a variety of reasons. In the case of satellites, radiation exposure can be a cause.

The manual states the following about what happens when you read from a block with one or more bit flips:

When one error is detected and corrected, the flag ECCC (ECC correction) is set in Flash ECC register (FLASH_ECCR). If ECCCIE is set, an interrupt is generated.

When two errors are detected, a flag ECCD (ECC detection) is set in FLASH_ECCR register. In this case, a NMI is generated.

If we have the first situation, that’s fine, because we just read and get the correct value. The second one is the problem, because it disrupts our program flow. Even worse, if this error happens when reading an operating system image, and we were to always try the same one (we have some mitigation against this), we could land in a boot loop if we don’t handle the situation.

Writing a handler for the flash ECCD NMI isn’t particularly hard using the cortex_m_rt and stm32l4 crates:

#[cortex_m_rt::exception]
unsafe fn NonMaskableInt() -> ! {
	let peripherals = unsafe { stm32l4r5::Peripherals::steal() };
	let reg_content = peripherals.FLASH.eccr.read();
	let is_flash_nmi: bool = {
		// Note: initializes our custom flash abstraction
		let flash = Flash::new(peripherals.FLASH);
		if flash.is_dualbank() {
			// In dual-bank mode, Bit 29 (ECCD2) is reserved, so only look at bit 31 (ECCD)
			reg_content.eccd().bit_is_set()
		} else {
			// Bit 31 and Bit 29 - either lower or upper 64 bits of 128 bit value
			const ECCD_ECCD2_MASK: u32 = 0xa0000000;
			reg_content.bits() & ECCD_ECCD2_MASK != 0
		}
	};

	// Address on 1MB bank + which bank it's on
	let dead_addr = reg_content.addr_ecc().bits() | ((reg_content.bk_ecc().bit() as u32) << 20);

	// Some actual logic to handle this information
	if is_flash_nmi {
		// dead_addr has problems
	}

	// Placeholder so the snippet compiles: a `-> !` handler must never return
	loop {}
}

We essentially check a few bits to know that this is actually the flash ECCD NMI, and then extract the flash address of the offending 64-bit block.

In our bootloader we can now enable a custom boot mode that ensures that if at least one image is bootable, it is booted, which will enable us to fix this problem remotely.

That’s the theory. But how can we ensure that this works, and that our code handles this situation correctly? Usually, we would just run it in our tests, and see how it’s doing. However, since this handles a specific interrupt, we somehow need to trigger it intentionally. In other words, we need to mark certain blocks of the flash to make them trigger ECCD NMIs.

Placing ECCD NMIs

The STM32L4R5, as far as I know, does not offer a feature that enables us to generate an NMI on a custom-defined flash address. But that is exactly what we need to test our interrupt handler.

So I set out to explore my favorite RM0432 reference manual a bit more and found this interesting note:

Note: The contents of the Flash memory are not guaranteed if a device reset occurs during a Flash memory operation.

This gave me hope that it might be possible to corrupt a block when triggering a reset during a write operation, so I got to writing a small program that does the following:

  • First, the program reads the flash address it should corrupt
    • If it is already corrupted, the NMI handler will be executed. I’ve written one that turns on the green LED of the chip
  • Enable the hardware watchdog to reset us after a fixed time interval
  • Spend the majority of that time interval in a loop that busy-waits
  • Just towards the end, start a write operation into the flash

Then hopefully, the watchdog would reset us exactly when the write operation happens. And that actually turned out to work sometimes. I was really happy when I first saw the green LED come on.

To verify that the code actually did what I thought, I connected GDB to the chip and read out the FLASH_ECCR register, which contains information about flash ECC interrupts:

(gdb) x/wx 0x40022018
0x40022018:     0x80006000

In the value 0x80006000, the top bit means that the interrupt is actually the ECCD interrupt. The lowest 20 bits, or the last 5 hex digits, are the address of the block that was found to have two or more errors. This was exactly the address I had configured it to damage, so it was really nice to see it work as intended.
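The same reading of the register value can be written down as a small Go sketch. It follows only the description above (top bit for ECCD, lowest 20 bits for the address); the function name `decodeECCR` is made up for this example, and the other fields of FLASH_ECCR are deliberately ignored.

```go
package main

import "fmt"

// decodeECCR splits a raw FLASH_ECCR value as described in the text:
// bit 31 is the ECCD flag, the lowest 20 bits are the failing address.
// Other fields of the register are not decoded in this sketch.
func decodeECCR(raw uint32) (eccd bool, addr uint32) {
	eccd = raw&(1<<31) != 0
	addr = raw & 0xFFFFF
	return eccd, addr
}

func main() {
	eccd, addr := decodeECCR(0x80006000)
	fmt.Printf("ECCD=%v addr=0x%05x\n", eccd, addr) // prints "ECCD=true addr=0x06000"
}
```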

However, this only worked some of the time. The right wait time shifts slightly with temperature and other factors, so a more dynamic approach that finds the correct timing was required.

Binary search over multiple resets

The approximate amount of time to wait varies a bit, but stays within a certain range. In this case “unit of time” really just means how much overhead an almost empty loop has, because that’s what I used to wait before the flash programming starts (there are honestly better ways, but this is one way, and it works).

So what I wanted to build is a binary search that keeps its state over resets. Keeping state is kind of the opposite of what a reset is intended to do, so a way to store data across resets was needed. The real time clock (RTC) of the MCU has 32 backup registers, which store 32 bits each. They are kept over multiple resets and thus enable us to keep state such as the bottom and top of the range that we are searching.

When doing a step, we first calculate the middle of the waiting range, busy-wait that number of iterations, and then initiate flash programming. Once it’s finished, the blue LED turns on. Afterwards (or hopefully during the programming operation), the watchdog reset happens. The blue LED thus indicates that we need a lower timing. If the process worked, the green LED comes on, otherwise another reset happens. If the program got into a spot where it cannot advance further (timings are a bit random after all), the red LED will come on. In that case, a manual reset can be done.
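The search itself can be simulated off-target. The following Go sketch is an illustration only, not the program from the repository: `backupRegs` stands in for two RTC backup registers, `simulatedBoard` is an invented stand-in for the hardware with a made-up window of working values, and each loop iteration plays the role of one boot after a watchdog reset.

```go
package main

import "fmt"

type outcome int

const (
	tooLow  outcome = iota // no LED: the reset came before programming started
	tooHigh                // blue LED: programming finished, so try a lower value
	hit                    // green LED on the next boot: the block is corrupted
)

// backupRegs stands in for two RTC backup registers that survive a reset.
type backupRegs struct{ lo, hi uint32 }

// step is what a single boot does: try the midpoint and narrow the range.
func step(regs *backupRegs, try func(uint32) outcome) (uint32, bool) {
	mid := regs.lo + (regs.hi-regs.lo)/2
	switch try(mid) {
	case hit:
		return mid, true
	case tooHigh:
		regs.hi = mid
	case tooLow:
		regs.lo = mid + 1
	}
	return mid, false
}

// search repeats step until a value works or the range is exhausted
// (the situation signalled by the red LED).
func search(regs *backupRegs, try func(uint32) outcome) (uint32, bool) {
	for regs.lo < regs.hi {
		if mid, ok := step(regs, try); ok {
			return mid, true
		}
	}
	return 0, false
}

// simulatedBoard pretends that only values in [7000, 7040] hit the write.
func simulatedBoard(value uint32) outcome {
	switch {
	case value < 7000:
		return tooLow
	case value > 7040:
		return tooHigh
	default:
		return hit
	}
}

func main() {
	regs := &backupRegs{lo: 0, hi: 100000}
	value, ok := search(regs, simulatedBoard)
	fmt.Println(value, ok)
}
```

On real hardware the outcome is noisy, which is why the text above mentions a manual reset when the search gets stuck; the simulation assumes a perfectly repeatable window.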

With that in mind, this is what destroying an address looks like in practice (Note the LD1-LD2 LEDs):

That’s essentially the entire thing in action. If the blue LED comes on, we have missed the point where we can interrupt, so once the watchdog triggers, we try again with a lower value. After a short pause of not seeing the LED turn on (this is where we took too little time and stopped before even programming the flash), short pulses return. At some point, we get the right timing, leading to a flash ECCD NMI, which is handled by turning on the green LED.

I uploaded the program to GitHub, so feel free to use it in your own testing.

Testing the bootloader

With this new tool under our belt, we can now intentionally affect flash addresses, especially ones on the metadata and image slot pages. Using the tool, I was able to verify that the bootloader can still boot our operating system even if all metadata pages and all but one operating system image contain a block where reading leads to an NMI.

This now gives me a reasonable peace of mind, even when the bootloader will be in space. To be honest, I will probably still have some worries for my first code in space, but at least now there is one less unknown.

Final note

If you think this kind of stuff is interesting and your company might be interested in supporting or sponsoring our student club, please reach out to me at philipp.erhardt@warr.de. Additionally, if your company has some space left on a satellite and wants to enable the next generation of builders to get hands-on experience, please also reach out. We are thankful for any support.

If you’re interested in hearing more about MOVE, satellites, or just want to stay updated on things like this, feel free to subscribe to the RSS feed of my blog or follow me on LinkedIn.

Thank you for reading!

https://blog.010.one/Dont-snipe-me-in-space-intentional-flash-corruption-for-stm32-microcontrollers
How to fix fastboot device not visible and recovery flashing being stuck on Windows 11
Tags: Android, Windows 11, fastboot, recovery
This post shows how to fix two errors I ran into while flashing a custom recovery image using fastboot on Windows 11.

I like trying out different Android-based operating systems and custom recoveries on my phone.

A custom recovery is basically a small operating system on your phone that you can boot into, allowing you to do things like flashing a new operating system or overwriting certain partitions. If you have used Magisk before, you’ve probably used a custom recovery to flash a patched boot image to root your phone.

The basic steps to installing a recovery are the following:

  • Make sure you have adb and fastboot installed on your PC
  • Put the phone in fastboot mode (usually by pressing the power button and volume down button at the same time while booting)
  • Run fastboot devices to make sure the device is visible to fastboot
    • This is where I had the first problem, a fix for Windows 11 is described below
  • Flash the recovery image
    • This is where I had a second problem: the flashing process seemed to be stuck forever. There’s a fix for that as well.

So now let’s get into installing a custom recovery.

Make the device visible to fastboot

To install a custom recovery, we use the fastboot tool. If you don’t have it installed, visit the official Android developer page and download the latest version for your operating system.

Put your device into fastboot mode and make sure it is recognized:

fastboot devices

In my case, the device didn’t show up in this list despite being in fastboot mode.

It took me ages to find the fix for that, so I decided to write this post to help others who might run into the same problem.

At first I installed the Universal ADB Drivers and made sure my adb and fastboot tools were at the latest version. However, neither of these fixed the problem.

At some point I found something interesting in the Windows 11 Update Settings: when going to Windows Update > Advanced Options > Optional Updates, there were some driver updates related to Android tools. I installed them and listed the devices again. This time, my device showed up.

Flashing the recovery image

Now it was time to flash the recovery image.

Installing a new custom recovery is rather easy if you know a bit about how to use command-line tools. When I recently installed OrangeFox, I downloaded the version for my phone (yours will very likely be different, so check your device codename etc!), unzipped the zip file and ran the following command in the folder where the recovery image was located:

fastboot flash recovery recovery.img

Fastboot was able to find my device, but the flashing process seemed to be stuck forever. After a few minutes I unplugged my phone, plugged it back in and also rebooted into fastboot mode. However, on the next attempt the flashing process was still stuck. Using different USB cables and ports didn’t help either.

What did fix the problem was the following:

  1. Make sure the device is in fastboot mode
  2. Unplug the device from the computer
  3. Now run the following command:

     fastboot flash recovery recovery.img
    
  4. This should show the < waiting for any device > message
  5. Plug the device back in and wait for the flashing process to finish
  6. Now flashing the recovery took around 2 seconds to complete

While a bit of a weird hack, in the end these were the steps that worked. I hope this helped you fix the problem as well.

https://blog.010.one/Fix-fastboot-recovery-flashing-stuck-forever
Declarative scraping for the modern web, or why your scraper breaks all the time
Web scrapers break all the time due to changes to websites. This post shows how to scrape modern sites with higher robustness.

There are certain command-line tools we all use a lot, whether it’s the GNU core utilities for quickly getting info about files, FFmpeg to convert between different image formats or youtube-dl to just download that small sound effect without having to find yet another free downloading site.

However, not all of these tools are the same. How often have you updated the GNU core utilities to try a new feature? Likely never. I have only updated FFmpeg intentionally like once, and that was when I came across a webp file for the first time. youtube-dl however? Very often.

That’s because the sites supported by it change all the time. The maintainers play the cat-and-mouse game and update the tool to fix yet another scraper that broke and prevented people from downloading yet another batch of sound effects.

The answer to why this happens is likely obvious to most readers, but stay with me for a different approach.

How web scraping works

Most web scrapers work very similarly: they download an HTML page, parse it into a tree of elements and then run queries on that parsed tree. They define stuff like “I want the inner text of the span with the class price”, or “get the attribute src of the first video tag”. These are all fine things, but they are prone to breaking. If a CSS class is renamed or an element is moved somewhere else, the scraper breaks and needs to be fixed.

It’s even worse when programs need to extract JSON data from within a page. A regex like the following works, but is also really prone to breaking:

_YT_INITIAL_DATA_RE = r'(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|ytInitialData)\s*=\s*({.+?})\s*;'

And after that regex was used to extract data, there’s still the problem that sites like YouTube deliver a very nested JSON document with at least a dozen levels of depth.

So in general, I think it is fair to say that imperatively describing how to get data from the page works fine for a while, but starts to break on most small changes, requiring updates.

SQL

Let’s talk about SQL for a bit. Yes, it has almost nothing to do with web scraping, but it has some nice properties I think we should have in web scraping.

The difference between SQL and most other languages we programmers use is that SQL is declarative. This means that we don’t tell the database system what it should do to get the data, we just tell it what kind of data we want. We define properties and conditions the result must have. The database management system must find a way to satisfy our query somehow. As users of database systems we don’t need to know or care whether the where condition was executed as an Index-Join, Hash-Join or a nested loop. We just get the data.

In web scraping we use the usual, imperative way of describing how to get different variables from the page. Sometimes we even programmatically navigate a browser just because the site is rendered exclusively using JavaScript.

I think we should do web scraping differently, a bit more like SQL.

A declarative approach

Now let’s think about how we could bring a more declarative approach to web scraping.

Modern web sites that use JavaScript for rendering their content often come with a rather large snippet of JSON data in their payload that describes what kind of page should actually be shown. We could now do the naive approach of extracting the JavaScript variable using a regex, but we also know that this is prone to breaking in the future.

Another problematic thing about this is the structure of the JSON data itself: if you want to get elements that are nested 20 levels deep, there are 20 different chances of something being renamed and breaking your scraper.

So here are basically the key points a declarative approach should solve:

  1. Stop relying on data location (e.g. “the object after var x = {...}”)
  2. Reduce dependency on internal data naming (e.g. the keys within the extracted JSON data)

And if we think about it, it actually sounds pretty easy: just write a program that finds any (largeish) JSON object in a page and then iterate over all levels in it to find what we are looking for (e.g. all objects with a title and videoId key).

If the tool is able to find any object in a page, we also don’t need to care about the position of the data anymore. And if we only rely on a minimal set of attributes the objects we’re looking for should have, then we don’t need to care if someone changes the structure of everything else.
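The "find any JSON object in a page" step can be sketched in a few lines of Go using only the standard library. This is an illustration of the idea, not the implementation of any particular tool; the function name `extractJSON` is made up, and the sketch only accepts strictly valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// extractJSON tries to decode one JSON value at every '{' or '[' in page.
// On success it skips past the decoded value, on failure it moves on by
// one byte and tries again at the next candidate.
func extractJSON(page string) []any {
	var found []any
	for i := 0; i < len(page); {
		if page[i] != '{' && page[i] != '[' {
			i++
			continue
		}
		dec := json.NewDecoder(strings.NewReader(page[i:]))
		var v any
		if err := dec.Decode(&v); err != nil {
			i++ // not valid JSON starting here
			continue
		}
		found = append(found, v)
		i += int(dec.InputOffset()) // continue after the decoded value
	}
	return found
}

func main() {
	page := `<script>var ytInitialData = {"videoId":"abc","title":"T"};</script><p>{not json}</p>`
	for _, v := range extractJSON(page) {
		fmt.Printf("%v\n", v)
	}
}
```

Because the scan never looks at what precedes the opening brace, renaming the surrounding JavaScript variable does not affect the result.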

Enter jsonx, a tool that does just that. If you have the Go toolchain installed, you can just install it from source using the following command. Alternatively, there are binaries for Linux and Windows here.

go install github.com/xarantolus/jsonextract/cmd/jsonx@latest

Now we can just tell the tool to get all objects that have a videoId, title, and channelId from a page (I also added jq for nicer formatting of the output):

$ jsonx "https://www.youtube.com/watch?v=-Oox2w5sMcA" videoId title channelId | jq
{
  "videoId": "-Oox2w5sMcA",
  "title": "Starship Animation",
  "lengthSeconds": "310",
  "channelId": "UCtI0Hodo5o5dUb67FeUjDeA",
  "isOwnerViewing": false,
  "shortDescription": "",
  "isCrawlable": true,
  "thumbnail": {
    "thumbnails": [
      {
        "url": "https://i.ytimg.com/vi/-Oox2w5sMcA/hqdefault.jpg?sqp=-oaymwEiCKgBEF5IWvKriqkDFQgBFQAAAAAYASUAAMhCPQCAokN4AQ==&rs=AOn4CLDqv77rSQ83UV-8s5rWMX8iInJcgQ",
        "width": 168,
        "height": 94
      },
      {
        "url": "https://i.ytimg.com/vi/-Oox2w5sMcA/hqdefault.jpg?sqp=-oaymwEiCMQBEG5IWvKriqkDFQgBFQAAAAAYASUAAMhCPQCAokN4AQ==&rs=AOn4CLAizx8wyIv50KOlkMRQnj8WAAgJ1w",
        "width": 196,
        "height": 110
      },
      {
        "url": "https://i.ytimg.com/vi/-Oox2w5sMcA/hqdefault.jpg?sqp=-oaymwEjCPYBEIoBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLBL7HeKYvEL8u3Glg0SLPGGZNgtSg",
        "width": 246,
        "height": 138
      },
      {
        "url": "https://i.ytimg.com/vi/-Oox2w5sMcA/hqdefault.jpg?sqp=-oaymwEjCNACELwBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLDsOBxYvamnjSZZPKkIx87_JttNIQ",
        "width": 336,
        "height": 188
      },
      {
        "url": "https://i.ytimg.com/vi/-Oox2w5sMcA/maxresdefault.jpg",
        "width": 1920,
        "height": 1080
      }
    ]
  },
  "allowRatings": true,
  "viewCount": "1421951",
  "author": "SpaceX",
  "isPrivate": false,
  "isUnpluggedCorpus": false,
  "isLiveContent": false
}

So isn’t that just nice? We just said “I want all objects with these three attributes from this page” and it just worked. No need to look into the full structure of the page or data. We just describe what we want and the tool figures out the rest.

Obviously it relies on some object in the JSON tree having these three attributes, but compared to different approaches this is a very minimal dependency. So this approach now “just works”, is simpler to use and is arguably less prone to breaking.

Drawbacks

As with any approach, this one also has its disadvantages.

First of all, it does not work with all web pages, as most pages deliver their content mostly using HTML. Pages with JSON are somewhat rare, but if the data is there, it will be easy.

The second drawback is that not all data in JavaScript snippets of pages is actually valid JSON. Just add a NaN somewhere and it’s no longer valid JSON, which would break the scraper. The jsonx tool works around this by using a JavaScript lexer to directly transform some invalid tokens to valid JSON (e.g. NaN just becomes null). So jsonx is very liberal in what it accepts, which is reminiscent of the robustness principle.

The third drawback is somewhat implementation-specific: if you feed thousands of opening brackets [ into the tool, it gets noticeably slow. That’s because as soon as it doesn’t find a matching bracket or the content between the two brackets is invalid JSON, it needs to go back to the first bracket and continue from there, possibly doing the same thing over and over (so this can become somewhat of an O(n²) complexity if I’m not mistaken). This doesn’t happen much in real pages, but a website looking to fight scrapers could use this implementation weakness.

What I want you to do

If you build a tool or app that could use this approach, you should definitely try to implement the data extraction part that just looks at everything in a page starting with [ or { in search for validish JSON data.

Also not relying on the data structure is very important. Feel free to implement logic in a programming language of your choice that parses JSON and dynamically finds only objects with certain keys, no matter the nesting. It’s actually pretty simple, you just need to do a case distinction between arrays (-> recursively iterate all objects in them), objects (-> check if they have all required keys) and primitive data types (ignore).
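The case distinction described above fits into one recursive function. Here is a minimal Go sketch on top of `encoding/json`; the names `findObjects` and `hasAll` are invented for this example. One detail is added to the description: the values of an object are searched as well, since matching objects can be nested inside other objects and not only inside arrays.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// findObjects walks a decoded JSON value and returns every object that
// contains all of the given keys, no matter how deeply it is nested.
func findObjects(v any, keys []string) []map[string]any {
	var out []map[string]any
	switch t := v.(type) {
	case []any: // arrays: look at every element
		for _, e := range t {
			out = append(out, findObjects(e, keys)...)
		}
	case map[string]any: // objects: check the keys, then look inside
		if hasAll(t, keys) {
			out = append(out, t)
		}
		for _, e := range t {
			out = append(out, findObjects(e, keys)...)
		}
	}
	// primitives (strings, numbers, booleans, null) are ignored
	return out
}

func hasAll(m map[string]any, keys []string) bool {
	for _, k := range keys {
		if _, ok := m[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	raw := `{"a":{"b":[{"videoId":"x1","title":"One","extra":1},{"videoId":"x2"}]}}`
	var doc any
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		panic(err)
	}
	for _, m := range findObjects(doc, []string{"videoId", "title"}) {
		fmt.Println(m["videoId"], m["title"]) // prints "x1 One"
	}
}
```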

And if you like the approach, you should implement it in your scraper! This makes the software we use every day more robust, which is a goal we should strive for.

Conclusion

If you found this interesting, feel free to comment by opening an issue on my blog repository or send me an e-mail.

If you’re interested in low-level Android stuff, you can read my post about the Linux multitouch protocol on Android. Alternatively if you’ve heard of or have a KNX “smart home” system, you might be interested in this other post about my KNX setup.


Side note

This is not a rant about youtube-dl. In fact, I’m a big fan and thankful that people take the time to maintain it. The examples are used to illustrate what we programmers usually do because it works and are not meant to point fingers.

https://blog.010.one/declarative-web-scraping-for-the-modern-web
How to programmatically interact with a KNX smart home system
Interacting with a KNX system isn't always easy. This article shows how to write programs that switch lights and why you might want to do that. It also shows some demos of programs I use that interact with KNX.

Note: This article is basically a guide on what I had to figure out on my own when interacting programmatically with a KNX system. Some things can be very dependent on how your setup works. I’m also not a KNX expert in any way; much of this stuff was found by “trial and error” rather than by reading somewhat outdated documentation.

Imagine this: You have an alarm clock that sets itself according to your online calendar. You go to bed without having to set or think about it. And in case an event in the morning gets cancelled, it will notice and adjust your wakeup time while you sleep. No waking up for no reason!

Then when it’s time to wake up, a very soft sound starts playing. You can’t really hear it right now, but it steadily climbs up to a normal volume. At the same time, the light in your room turns on automatically and progresses from very dim to a normal brightness within a minute. At that level of brightness, it’s impossible to go back to sleep.

That’s basically how the mornings of my last few years of school went. The alarm clock ran on a Raspberry Pi and looked at the school’s website to find out if the teachers I had in the morning couldn’t come that day.

The most interesting part of this is how the alarm is able to turn the lights on and off. This is possible thanks to the KNX system at home. Let’s get into the details.

Note that in the code examples, I will use this KNX library for the programming language Go. It is important to note that the concepts are important, not the code itself. I have also successfully used this Node.JS library in the past, so it really doesn’t matter what you use. There are of course other libraries for other programming languages that might work for you.

Connecting

The assumption is that you already have a KNX system that is set up to be able to control the lights and the shutters. As in, when you send the packets from the ETS software, you can control the lights etc.

So what we want to do consists of two steps:

  • Connect to the KNX system
  • Send messages to switch certain lights

In my setup, I want to connect to a KNX IP BAOS 772 (Bus Access and Object Server). In KNX terms, this component is called a gateway. There are multiple ways to connect to a KNX system in the Go library I mentioned, but in this case the one we need is the “group tunnel”.

So to connect, we write something like the following code:

// Connect to the gateway.
client, err := knx.NewGroupTunnel("10.0.0.7:3671", knx.TunnelConfig{
    ResendInterval:    500 * time.Millisecond,
    HeartbeatInterval: 10 * time.Second,
    ResponseTimeout:   30 * time.Second,
})
if err != nil {
    log.Fatal(err)
}
// Close upon exiting. Even if the gateway closes the connection, we still have to clean up.
defer client.Close()

This is very close to the example given by the library.

Which IP to connect to?

You might wonder which IP and port you need to connect to. The port really should be 3671. For the IP you can look into the network overview of your router (where you see all kinds of IP addresses). Now we search for a device with “BAOS” in the name. In my case, it wasn’t there. It seemed to have gotten a default name from the router. So I had to go through all unknown devices, copy their IP address (e.g. 192.168.178.41) and visit it in a browser (http://192.168.178.41). At some point, you should find an almost empty page that contains only the name of the BAOS component, like this:

[Image: The web page of the KNX BAOS just shows a description of the model, in this case 'KNX IP BAOS 772']

So in my case, the gateway address string in the code (first argument of NewGroupTunnel) should be 192.168.178.41:3671. Let’s start the program and see if it works.

Possible errors

There are a bunch of error conditions I have faced while developing my own software that I just want to tell you about here. The connection to this gateway is a bit… interesting.

Multiple connections

The first thing you should try when the connection doesn’t work is closing ETS (or at least disconnecting it from the KNX system) and anything else that is connected to the KNX system. What I found out, at least about this gateway, is that it seems to only support exactly one connection at once. When you connect from your code, you might get an error like Response timeout reached. ETS4 is a bit more descriptive with the following message (in German, followed by a translation):

Fehler beim Öffnen der Verbindung: Die Schnittstelle konnte nicht geöffnet werden. Der Tunneling-Server ist erreichbar, aber er akzeptiert keine Verbindungen mehr zu diesem Zeitpunkt

Error when opening the connection: The interface could not be opened. The tunneling server is reachable, but it no longer accepts connections at this time

So basically the solution to this is to only have one thing connect to the KNX system at a time. You can’t use your own software and ETS at the same time.

Timeout

Another thing to note is that connecting to this BAOS gateway seems to be very slow. The default timeout of 10 seconds of the Go library was often not enough in my case. Normal pings are however answered very quickly, so my guess is that the actual software just does… interesting stuff (aka being slow for some reason).

So increase the timeout and build reconnection logic into your program. Your program should hold the connection open all the time (because the initial connection takes long, and you don’t want to wait 30 seconds before the light turns on or off). For the initial connection code, add something like an exponential backoff timer that only reconnects after 30 seconds, then a minute, then two, four etc. After an unexpected disconnect the gateway seems to take 30 seconds to a few minutes until it can accept connections again, which can be annoying for debugging. Make sure to always call client.Close() before stopping your program, or you might need to wait a bit.
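A minimal sketch of such a backoff loop, independent of any KNX library: `connect` and `sleep` are passed in as functions, so the real tunnel setup (for example a closure around the NewGroupTunnel call shown earlier) can be plugged in. The names `backoffDelays` and `connectWithBackoff` and the 10 minute cap are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoffDelays returns n pauses that start at first and double each time,
// capped at max: 30s, 1m, 2m, 4m, ...
func backoffDelays(first, max time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := first
	for i := 0; i < n; i++ {
		delays = append(delays, d)
		d *= 2
		if d > max {
			d = max
		}
	}
	return delays
}

// connectWithBackoff tries connect once and then retries after each pause
// until it succeeds or the retries are used up.
func connectWithBackoff(connect func() error, sleep func(time.Duration),
	first, max time.Duration, retries int) error {
	err := connect()
	for _, d := range backoffDelays(first, max, retries) {
		if err == nil {
			return nil
		}
		sleep(d)
		err = connect()
	}
	return err
}

func main() {
	calls := 0
	// Fake gateway that refuses the first three attempts.
	connect := func() error {
		calls++
		if calls < 4 {
			return errors.New("response timeout reached")
		}
		return nil
	}
	// Print instead of sleeping so the example finishes immediately;
	// a real program would pass time.Sleep here.
	sleep := func(d time.Duration) { fmt.Println("waiting", d) }

	err := connectWithBackoff(connect, sleep, 30*time.Second, 10*time.Minute, 6)
	fmt.Println("connected:", err == nil)
}
```

Passing `sleep` as a parameter keeps the retry logic testable without actually waiting for minutes.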


Sending signals, switching lights

So now I assume that you have a working, connected KNX client in the code – the very same that we set up in the previous section.

The KNX library now provides the following example code to send 20.5°C to group address 1/2/3.

err = client.Send(knx.GroupEvent{
    Command:     knx.GroupWrite,
    Destination: cemi.NewGroupAddr3(1, 2, 3),
    Data:        dpt.DPT_9001(20.5).Pack(),
})

We of course want to adapt this to a light switch.

So in the ETS4 software there’s a tab for “group addresses”, and when you right-click on one, you can read/write a value:

[Image: The 'group address' window shows the address we want to write to, so we click 'read/write value' and then read the data point type from the group monitor window]

In the “group addresses” window, we select the light we want to switch for now (for debugging purposes). We right-click it, and ETS will open the “group monitor” window, which shows the group address. The type of data we need to send should be preconfigured.

Note that there are (at least) two formats for addresses: one with two numbers (1/2) and one with three numbers (1/2/3). Just make sure to use exactly the format that ETS uses.
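
To see why the format matters, here is a small sketch of how the two notations map onto the same 16-bit number. It assumes the usual KNX layout (5 bits main group, then either 11 bits for the two-level form or 3 + 8 bits for the three-level form); groupAddr2 and groupAddr3 are illustrative helpers written for this post, not the library's functions.

```go
package main

import "fmt"

// groupAddr2 packs a two-level address "main/sub":
// 5 bits for the main group, 11 bits for the sub group.
func groupAddr2(mainGroup uint8, sub uint16) uint16 {
	return uint16(mainGroup&0x1F)<<11 | sub&0x7FF
}

// groupAddr3 packs a three-level address "main/middle/sub":
// 5 bits, 3 bits and 8 bits.
func groupAddr3(mainGroup, middle, sub uint8) uint16 {
	return uint16(mainGroup&0x1F)<<11 | uint16(middle&0x07)<<8 | uint16(sub)
}

func main() {
	fmt.Printf("1/91   -> 0x%04x\n", groupAddr2(1, 91))
	fmt.Printf("1/0/91 -> 0x%04x\n", groupAddr3(1, 0, 91))

	// The same three numbers mean something else in the other notation.
	fmt.Printf("1/2/3  -> 0x%04x\n", groupAddr3(1, 2, 3))
	fmt.Printf("1/515  -> 0x%04x\n", groupAddr2(1, 515))
}
```

Under that assumption 1/91 and 1/0/91 are the same address, and so are 1/2/3 and 1/515. Sticking to the notation ETS shows saves you from doing this conversion in your head.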

So when we revisit the send snippet above, we can now write the following for a light switch:

err = client.Send(knx.GroupEvent{
    Command:     knx.GroupWrite,
    Destination: cemi.NewGroupAddr2(1, 91),
    Data:        dpt.DPT_1001(true).Pack(),
    Source:      cemi.NewIndividualAddr3(15, 15, 15),
})
  • The Command property is obvious: we want to write a value to the group address, so we use knx.GroupWrite.
  • The Destination is the group address we want to send to. Since 1/91 has two numbers, we choose the NewGroupAddr2 constructor (instead of NewGroupAddr3 for 3 numbers).
  • For Data it’s important that the data format is correct. In the screenshot we can see “1.001 Schalten” (German for “switching”) as data type, so we use the “Data Point Type 1.001”, aka DPT_1001. Here true stands for on; false would turn the light off.
  • We can also add a Source address, which identifies who sent the signal. I’m not 100% sure if the signal is accepted without a source, but you can just add it.

And that’s basically it. This now allows you to turn the light on and off. When changing the destination address, you should be able to switch any light connected to the KNX system.

Some things to note

I will be honest, when I started playing with the system I was kind of afraid that I could break it in some way. So here are some tips in Q&A style:

Can I break something in the system by turning on a light that is already on?

  • No. When you send an “on” signal (aka dpt.DPT_1001(true)), nothing happens when the light is already on.

How can I toggle a light without directly sending the new state it should have?

  • It doesn’t seem to be possible to just toggle a light. To toggle, your application needs to know the current state of the light and send the inverse. In my case, sending a knx.GroupRead command didn’t do anything and never returned any data (the same happened in ETS, so reading doesn’t seem to work at all). The solution is to listen to inbound messages (you can listen to all events sent over KNX) and keep a mapping of light addresses to their current state. When you want to toggle a light, you invert the last state you received for it. So yeah, rather annoying but possible to do.
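
The bookkeeping for this can be very small. The following is a minimal sketch, not the code I actually run: lightStates, observe and toggled are made-up names, addresses are kept as plain strings, and a light that was never seen on the bus is assumed to be off.

```go
package main

import (
	"fmt"
	"sync"
)

// lightStates remembers the last on/off value seen for each group address.
type lightStates struct {
	mu    sync.Mutex
	state map[string]bool
}

func newLightStates() *lightStates {
	return &lightStates{state: make(map[string]bool)}
}

// observe records a value seen on the bus, no matter who sent it.
func (l *lightStates) observe(addr string, on bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.state[addr] = on
}

// toggled returns the value to send in order to flip the light.
// An address we have never seen counts as "off", so the result is "on".
func (l *lightStates) toggled(addr string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return !l.state[addr]
}

func main() {
	lights := newLightStates()

	// Pretend an inbound event told us that 1/91 was switched on.
	lights.observe("1/91", true)

	fmt.Println("send to 1/91:", lights.toggled("1/91"))
	fmt.Println("send to 1/92:", lights.toggled("1/92"))
}
```

The mutex is there because inbound events and toggle requests typically arrive on different goroutines.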

Interesting applications

Now that we know how to switch lights (and shutters, and basically anything else in the system) I want to tell you about a few projects you can do with that knowledge.

Home software

The most obvious thing is to just make a website where you can switch lights.

Since the BAOS only allowed one connection, I created a “Hub” software that other software can send commands to. So it basically works like this:

This diagram shows the setup of how my programs interact with the hub, which connects to the KNX BAOS system

So this allows any software to connect to the hub to receive live events (via a WebSocket connection) if it needs to. Other software can just use the REST API, which means that it can send really simple POST requests to switch lights without having to know all the KNX stuff. This is especially useful for automation apps like “Siri Shortcuts” or the Android equivalent “Tasker” that allow you to send simple HTTP requests.

The hub runs on a Raspberry Pi and really doesn’t need many resources. It just needs to read which lights are switched by the KNX system and update its internal state accordingly. When a light switch request comes in, it inverts the last known state of the light and sends that to KNX. That way, a light that was on is switched off and vice-versa. So now let’s use the hub for real.

A light switch on your phone

On my Android phone, I use Tasker to send an HTTP request to the hub whenever I press a widget on my phone. With the introduction of the Android 11 power menu, this got even more interesting:

The Android 11 power menu shows light switch controls added using Tasker

Basically when I tap the button, Tasker sends a request to the hub (this request includes the group address of the target light). The hub checks if the light in question is on or off, and sends a request with the inverted state to the KNX BAOS (as described in the section about sending signals).

A light switch website

Since we can read live data from the hub, we can create a website that displays the current state of some light switches (e.g. by room). This site should of course also allow switching the lights.

And here’s what I came up with for my room:

A demo of my 'home' website that shows the light switches for my room and the current weather. It is possible to switch the switches from the site

The buttons switch automatically when the KNX system receives a switch event (either from physical light switches or from the hub). And it is of course also possible to switch the light using the switches on the website directly. I can’t tell you how surreal of a feeling it is when you switch a physical light switch and the website updates within milliseconds; it’s just cool to see.

Alarm clock

Another application of automatic light switching – as mentioned in the intro of this article – is an alarm clock. It really helps you wake up when the light is already on – there’s no chance to fall asleep again after that.

The most important part is calculating when you need to wake up (e.g. depending on an online calendar) and adding a fallback wakeup time in case the online source isn’t available for some reason.


Conclusion

In general it can be said that the KNX system is kind of annoying to use. But once you figure out the basics and make them work in your program, then it’s rather easy to apply the data gained from it (e.g. live switch events) to other software like the website.

I hope this article helped you in the quest of programmatically automating lights in your home and might have given you one or two ideas on what it could be useful for. If you have any questions please feel free to reach out either on GitHub (e.g. via an issue on my blog repository) or via e-mail.

Thanks for reading!

https://blog.010.one/programmatically-interact-with-a-KNX-smart-home-system
How to tap the Android screen from the underlying Linux system
In recent years phone screens seem to only have gotten bigger. This is great because it allows you to see more on your screen, but it also has some drawbacks. One of them has been very annoying to me: I can no longer reach buttons at the top left of the screen in a comfortable way.

In a way, I would divide the screen in three areas:

  • Easy to reach: the area can be reached with the thumb while holding the phone.
  • Not comfortable: you can reach the area, but it’s not as comfortable as the previously mentioned one.
  • Unreachable: this area is not in the reach of my thumb without repositioning my hand at the edge of the phone.
Here is a screenshot with an overlay that shows which areas are easy to reach with a thumb

So to me the most annoying buttons are those at the top left. While those on the top right can still be reached with a little effort, the ones in the top left corner require more effort.

So how do we solve this problem?

The best way I came up with to solve this problem was a simple idea: What if there was a way to tap the top left corner without leaving the “Easy to reach” category?

My phone has a fingerprint scanner at the back that is very easy to reach. This scanner also doesn’t have any functionality when the phone is unlocked.

Detecting a finger on the sensor

So I took a look at the Android system log and found the following lines when putting the finger on and off the sensor:

fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:1
fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:0

The only relevant difference between these lines is the number at the end – 1 for “finger down”, 0 for “finger up”.
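
Picking these lines out of a log stream takes only a regular expression. A minimal sketch, assuming the message text is exactly as shown above (other phones or HAL versions will likely log something different); parseFingerEvent is a name made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// fingerEvent matches the HAL log lines shown above and captures the
// trailing value. It is not anchored at the start, so the usual logcat
// prefix (timestamp, PID, tag) does not get in the way.
var fingerEvent = regexp.MustCompile(
	`fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:\s*([01])\s*$`)

// parseFingerEvent reports whether line is a fingerprint event and, if so,
// whether the finger went down (true) or up (false).
func parseFingerEvent(line string) (down, ok bool) {
	m := fingerEvent.FindStringSubmatch(line)
	if m == nil {
		return false, false
	}
	return m[1] == "1", true
}

func main() {
	lines := []string{
		"fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:1",
		"some unrelated log line",
		"fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:0",
	}
	for _, line := range lines {
		if down, ok := parseFingerEvent(line); ok {
			fmt.Println("finger down:", down)
		}
	}
}
```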

So that was easy – just write a program that scans the logcat output, detects these lines and then runs the input tap x y shell command to tap a specific point. Right?

No.

It’s so slow

The input command seemed very slow to me. It took quite some time from tapping the sensor to a reaction to the click. While testing it appeared to take at least 300ms, often worse with about 400ms.

According to a lot of anecdotal evidence, actions that take 100ms or less are perceived as instant. So this command definitely fails all expectations of “instant” (it was probably not designed to be fast, anyway). But why is that?

The “input” command

Android comes with a lot of different commands in /system/bin. Most of them are to be expected in a typical Linux environment (like tail, cat etc.) and some of them are specific to Android.

The input command, to my surprise, was just a shell script:

#!/system/bin/sh
# Script to start "input" on the device, which has a very rudimentary
# shell.
#
base=/system
export CLASSPATH=$base/framework/input.jar
exec app_process $base/bin com.android.commands.input.Input "$@"

If I read that correctly, it basically starts a Java program that can simulate a tap. There are also other actions it can do but for this post I don’t care.

Reducing the delay

One way to avoid the long, noticeable delay is – quite simply – not relying on the input command. It just writes some data, and that shouldn’t be too hard to replicate. So instead of starting a script that starts a program that writes a small piece of data, we can just write it ourselves.

But what should we write and where should the data be written?

I don’t know exactly why, but I never really looked at the documentation (also this now makes a lot more sense) and started reverse-engineering this… open source protocol. Yea… anyway.

The first step when trying to reproduce a behavior is watching it. So how can we watch taps on the screen as they happen?

The getevent utility allows us to watch certain events happen in real time. It also makes it easy to list device files associated with those events.

Using getevent -pl (in a root shell on the phone) we can get a nice overview of devices, their events and device file paths:

chiron:/ $ getevent -pl
add device 1: /dev/input/event6
name:     "msm8998-tasha-snd-card Button Jack"
events:
    KEY (0001): KEY_VOLUMEDOWN        KEY_VOLUMEUP          KEY_MEDIA             BTN_3
                BTN_4                 BTN_5
input props:
    INPUT_PROP_ACCELEROMETER
add device 2: /dev/input/event5
name:     "msm8998-tasha-snd-card Headset Jack"
events:
    SW  (0005): SW_HEADPHONE_INSERT   SW_MICROPHONE_INSERT  SW_LINEOUT_INSERT     SW_JACK_PHYSICAL_INS
                SW_PEN_INSERTED       0010                  0011                  0012
input props:
    <none>
add device 3: /dev/input/event4
name:     "uinput-fpc"
events:
    KEY (0001): KEY_KPENTER           KEY_UP                KEY_LEFT              KEY_RIGHT
                KEY_DOWN              BTN_GAMEPAD           BTN_EAST              BTN_C
                BTN_NORTH             BTN_WEST
input props:
    <none>
add device 4: /dev/input/event3
name:     "gpio-keys"
events:
    KEY (0001): KEY_VOLUMEUP
    SW  (0005): SW_LID
input props:
    <none>
add device 5: /dev/input/event0
name:     "qpnp_pon"
events:
    KEY (0001): KEY_VOLUMEDOWN        KEY_POWER
input props:
    <none>
add device 6: /dev/input/event2
name:     "uinput-goodix"
events:
    KEY (0001): KEY_HOME
input props:
    <none>
add device 7: /dev/input/event1
name:     "synaptics_dsx"
events:
    KEY (0001): KEY_WAKEUP            BTN_TOOL_FINGER       BTN_TOUCH
    ABS (0003): ABS_X                 : value 0, min 0, max 1079, fuzz 0, flat 0, resolution 0
                ABS_Y                 : value 0, min 0, max 2159, fuzz 0, flat 0, resolution 0
                ABS_MT_SLOT           : value 9, min 0, max 9, fuzz 0, flat 0, resolution 0
                ABS_MT_TOUCH_MAJOR    : value 0, min 0, max 255, fuzz 0, flat 0, resolution 0
                ABS_MT_TOUCH_MINOR    : value 0, min 0, max 255, fuzz 0, flat 0, resolution 0
                ABS_MT_POSITION_X     : value 0, min 0, max 1079, fuzz 0, flat 0, resolution 0
                ABS_MT_POSITION_Y     : value 0, min 0, max 2159, fuzz 0, flat 0, resolution 0
                ABS_MT_TRACKING_ID    : value 0, min 0, max 65535, fuzz 0, flat 0, resolution 0
input props:
    INPUT_PROP_DIRECT

It looks confusing at first, but the last device is especially interesting: It has all kinds of events that are associated with a multitouch device. That’s our screen. So now we know where to write data: the device file /dev/input/event1.

The question of what we should write can be answered by watching the getevent -l output:

/dev/input/event1: EV_ABS       ABS_MT_TRACKING_ID   0000504c
/dev/input/event1: EV_KEY       BTN_TOUCH            DOWN
/dev/input/event1: EV_KEY       BTN_TOOL_FINGER      DOWN
/dev/input/event1: EV_ABS       ABS_MT_POSITION_X    00000037
/dev/input/event1: EV_ABS       ABS_MT_POSITION_Y    0000008d
/dev/input/event1: EV_SYN       SYN_REPORT           00000000
/dev/input/event1: EV_ABS       ABS_MT_TOUCH_MAJOR   00000006
/dev/input/event1: EV_SYN       SYN_REPORT           00000000
/dev/input/event1: EV_ABS       ABS_MT_TRACKING_ID   ffffffff
/dev/input/event1: EV_KEY       BTN_TOUCH            UP
/dev/input/event1: EV_KEY       BTN_TOOL_FINGER      UP
/dev/input/event1: EV_SYN       SYN_REPORT           00000000

This is the output when doing a single tap in the top left corner of the display. Note that the numbers next to ABS_MT_POSITION_{X,Y} are the coordinates I just tapped. So the question is: how do we translate this? Not at all, we just remove the -l (“label event types and names in plain text”) option to get a more “raw” data stream:

/dev/input/event1: 0003 0039 0000504d        # ABS_MT_TRACKING_ID  
/dev/input/event1: 0001 014a 00000001        # BTN_TOUCH           
/dev/input/event1: 0001 0145 00000001        # BTN_TOOL_FINGER     
/dev/input/event1: 0003 0035 00000037        # ABS_MT_POSITION_X   
/dev/input/event1: 0003 0036 0000008d        # ABS_MT_POSITION_Y   
/dev/input/event1: 0000 0000 00000000        # SYN_REPORT          
/dev/input/event1: 0003 0030 00000006        # ABS_MT_TOUCH_MAJOR  
/dev/input/event1: 0000 0000 00000000        # SYN_REPORT          
/dev/input/event1: 0003 0039 ffffffff        # ABS_MT_TRACKING_ID  
/dev/input/event1: 0001 014a 00000000        # BTN_TOUCH           
/dev/input/event1: 0001 0145 00000000        # BTN_TOOL_FINGER     
/dev/input/event1: 0000 0000 00000000        # SYN_REPORT          

OK, so that is the data. And we know where to write it. But still… how? Let’s take a look at the source code of the sendevent command. It seems to basically be a lower-level version of the input command (not really, but still kind of).

The most interesting part is the input_event struct, which is filled with data and then written to a device file:

struct input_event {
	struct timeval time;
	__u16 type;
	__u16 code;
	__s32 value;
};

So before we had three columns with numbers in our output, and now we have three integer fields we want to fill with data: type, code and value. The first two are unsigned 16-bit numbers, while value is a signed 32-bit number, which is why the ffffffff in the output above stands for -1. The getevent command outputs hex numbers, so we have to make sure we don’t accidentally use the wrong number format when specifying them in a program (definitely never happened to me…sure ;)).
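
To avoid mixing up hex and decimal by hand, the columns can be parsed with base 16 explicitly. The following is a small sketch; rawEvent and parseRawLine are names invented for this example, and the input format is assumed to be the raw getevent output shown above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rawEvent holds the three columns of one raw getevent line.
type rawEvent struct {
	Type  uint16
	Code  uint16
	Value uint32
}

// parseRawLine parses a line such as
// "/dev/input/event1: 0003 0035 00000037". All three columns are hex.
func parseRawLine(line string) (rawEvent, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return rawEvent{}, fmt.Errorf("expected a device and three columns, got %q", line)
	}
	typ, err := strconv.ParseUint(fields[1], 16, 16)
	if err != nil {
		return rawEvent{}, err
	}
	code, err := strconv.ParseUint(fields[2], 16, 16)
	if err != nil {
		return rawEvent{}, err
	}
	value, err := strconv.ParseUint(fields[3], 16, 32)
	if err != nil {
		return rawEvent{}, err
	}
	return rawEvent{Type: uint16(typ), Code: uint16(code), Value: uint32(value)}, nil
}

func main() {
	for _, line := range []string{
		"/dev/input/event1: 0003 0035 00000037",
		"/dev/input/event1: 0003 0036 0000008d",
		"/dev/input/event1: 0003 0039 ffffffff",
	} {
		ev, err := parseRawLine(line)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		// Print value as signed, matching __s32 in the C struct.
		fmt.Printf("type=%d code=%d value=%d\n", ev.Type, ev.Code, int32(ev.Value))
	}
}
```

Read this way, the tap from the output above landed at x = 55 (0x37) and y = 141 (0x8d).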

Putting it all together

Now all we have to do is write the twelve events we observed previously in sequence to the device file and then test the program.

While implementing this is possible in any language, I chose Go for the task because of the ability to easily cross-compile from Windows to Arm64 Android. It also made it extra easy to define the events needed for a single tap:

// Define the input_event struct, but in Go
type InputEvent struct {
	Time  syscall.Timeval
	Type  EventType
	Code  EventCode
	Value uint32
}

// Some const definitions, names are from the getevent output
type EventType uint16

const (
	EV_ABS EventType = 0x0003
	EV_KEY EventType = 0x0001
	EV_SYN EventType = 0x0000
)

// Known event codes for a touch sequence
type EventCode uint16

const (
	ABS_MT_TRACKING_ID EventCode = 0x0039
	BTN_TOUCH          EventCode = 0x014a
	BTN_TOOL_FINGER    EventCode = 0x0145
	ABS_MT_POSITION_X  EventCode = 0x0035
	ABS_MT_POSITION_Y  EventCode = 0x0036
	ABS_MT_TOUCH_MAJOR EventCode = 0x0030
	SYN_REPORT         EventCode = 0x0000
)

// Value field of BTN_TOUCH, BTN_TOOL_FINGER
const (
	TOUCH_VALUE_DOWN = 0x00000001
	TOUCH_VALUE_UP   = 0x00000000
)

// This event occurs several times; it marks the end of a group of events that belong together
var eventSynReport = InputEvent{
    Type:  EV_SYN,
    Code:  SYN_REPORT,
    Value: 0x00000000,
}

// touch is the whole sequence of events that simulates a single tap
// While testing it seemed like not all SYN_REPORT events are necessary,
// but we will just use the same sequence as observed above
var touch = []InputEvent{
    {
        Type:  EV_ABS,
        Code:  ABS_MT_TRACKING_ID,
        Value: 0x0000e800, // Touch tracking ID, seems like we don't need to care about it
    },
    // Pretend to put the finger down
    {
        Type:  EV_KEY,
        Code:  BTN_TOUCH,
        Value: TOUCH_VALUE_DOWN,
    },
    {
        Type:  EV_KEY,
        Code:  BTN_TOOL_FINGER,
        Value: TOUCH_VALUE_DOWN,
    },
    // Top left corner
    {
        Type:  EV_ABS,
        Code:  ABS_MT_POSITION_X,
        Value: 0x00000071,
    },
    {
        Type:  EV_ABS,
        Code:  ABS_MT_POSITION_Y,
        Value: 0x000000a3,
    },
    eventSynReport,
    {
        Type:  EV_ABS,
        Code:  ABS_MT_TOUCH_MAJOR,
        Value: 0x00000005,
    },
    eventSynReport,
    {
        Type:  EV_ABS,
        Code:  ABS_MT_TRACKING_ID,
        Value: 0xffffffff,
    },
    // Now put the finger up again
    {
        Type:  EV_KEY,
        Code:  BTN_TOUCH,
        Value: TOUCH_VALUE_UP,
    },
    {
        Type:  EV_KEY,
        Code:  BTN_TOOL_FINGER,
        Value: TOUCH_VALUE_UP,
    },
    eventSynReport,
}

Now we just write our sequence to the device file f:

// Assumption: f is the opened display device file /dev/input/event1
for _, ievent := range touch {
    err := binary.Write(f, binary.LittleEndian, ievent)
    if err != nil {
        panic("writing input event: " + err.Error())
    }
}

You can find the whole program here.

One interesting detail about the sequence is that it doesn’t always have to be the same. Sometimes, there are more SYN_REPORT events in a sequence, but interestingly they do not appear to change the result. According to the documentation, if no SYN_REPORT has been sent between two events, they are seen as sent in the same moment of time; so this event type acts as a separator.

Now that we have the code for a single tap, we can of course adjust the code to be able to tap any position by simply changing the x and y values.
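
One way to do that is to turn the fixed sequence into a function. This is a condensed sketch, not the code from the repository: the event type leaves out the timestamp to keep the example short (it has to be added back before writing to the device file), and tapAt and the constant names are made up here. The coordinates must stay within the ranges reported by getevent -pl (0 to 1079 and 0 to 2159 on my phone).

```go
package main

import "fmt"

// event is one input event without its timestamp.
type event struct {
	Type  uint16
	Code  uint16
	Value uint32
}

// Event types and codes, with the values seen in the getevent output.
const (
	evSyn = 0x0000
	evKey = 0x0001
	evAbs = 0x0003

	synReport       = 0x0000
	absMTTouchMajor = 0x0030
	absMTPositionX  = 0x0035
	absMTPositionY  = 0x0036
	absMTTrackingID = 0x0039
	btnToolFinger   = 0x0145
	btnTouch        = 0x014a
)

// tapAt returns the same twelve events as the observed tap,
// with the position replaced by x and y.
func tapAt(x, y uint32) []event {
	syn := event{evSyn, synReport, 0}
	return []event{
		{evAbs, absMTTrackingID, 0x0000e800},
		{evKey, btnTouch, 1},
		{evKey, btnToolFinger, 1},
		{evAbs, absMTPositionX, x},
		{evAbs, absMTPositionY, y},
		syn,
		{evAbs, absMTTouchMajor, 5},
		syn,
		{evAbs, absMTTrackingID, 0xffffffff},
		{evKey, btnTouch, 0},
		{evKey, btnToolFinger, 0},
		syn,
	}
}

func main() {
	seq := tapAt(540, 1080) // roughly the middle of a 1080x2160 screen
	fmt.Println("events:", len(seq))
	fmt.Println("x:", seq[3].Value, "y:", seq[4].Value)
}
```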

In my tests this program has been a lot faster than the method with the input command, which was a nice outcome.

Actually using it

Now that we have done all the work to get a working tap program, we only need to integrate it into a program that detects the fingerprint press, then sends those events. I’ll spare you the details on that, you can see the whole program on GitHub.

It’s basically a daemon that runs in the background and detects the aforementioned log lines to react with a tap. It also has a few more commands, but they are not as technically interesting as the tap.

I also packaged the program into a Magisk (root solution with addons) module as that allows me to easily run it on boot.

Further ideas

One could use getevent and this method of writing events to create an event recorder that can accurately replay sequences of events. So if you want to automatically enter a PIN on the lock screen, that should be possible: as far as I can tell, the screen device file doesn’t have any restrictions on when a tap can happen, while I think the input command only works on an unlocked phone and has no lock screen access.

Thanks

If you found this interesting and want to create something like this or adapt the program for your phone, take a look at the repository.

If there are any mistakes in this post please feel free to point them out (by email, reddit etc.). Thank you :)

This post is also available on dev.to in case you want to comment there.

https://blog.010.one/how-to-tap-the-android-screen-from-the-underlying-linux-system
How to run a python script from GitHub, no experience required
In the past weeks people often asked me how to run a python script they found on GitHub. So here’s a full guide for beginners on how to do that, which pitfalls exist and how to avoid them.

I will explain all necessary details you need to know to get it running using examples, screenshots and videos.

But before starting please make sure the following is true:

  • You’re using Windows 10
  • The project you’re trying to run uses Python as its programming language. GitHub will show a “Languages” section at the right side of the project page, which should look like this:
The languages section should list 'Python'

You can click images to enlarge them

So here’s our plan:

  1. Preparation
  2. Install Python
  3. Install the script you want to run
  4. Run the script

If anything unexpected happens along the way, you can also jump to the help section to see if there’s a tip for you.

Preparation

In the beginning, we will need to prepare some settings to make sure the installation process works correctly.

Disabling preinstalled aliases

Windows 10 comes with certain shortcuts preinstalled, which can get in the way when starting a Python script. This is why we disable them.

To do so, search for “Manage App execution aliases” in the Windows 10 search bar typically located at the lower left side:

Windows search bar

In this settings window, we’ll disable everything that has to do with Python, which includes “python”, “idle” and the app installer that also mentions “python.exe”. After that, it should look similar to this:

Windows settings: App execution aliases

Install Python

Now that we prepared everything, we can proceed by installing Python. Python is the programming language used by the project we want to use. Later, we’ll basically tell the “python” program to start the program we got from GitHub.

To start off, we might need to know which version to install. Do a quick check if the program you want to use mentions a specific version (e.g. “above version 3.4” or “use python version 3.8 or higher”). If it doesn’t mention the version, just choose the newest one.

Download

Head over to the official download page and either download the newest version or choose the version that was specified on the project’s page:

Python download page

If you choose a specific version, you’ll get to the download page of that version. Find the “Files” section there and click on “Windows installer (64-bit)”:

Select this installer from the files section

Installation

Now that we downloaded the correct package, we need to run the installer. Make sure the “Add Python to PATH” box is checked and continue with “Install Now”.

Python installer settings

If you get an error during the installation, you might want to start the installer again, but with admin rights from the beginning. To do that, right-click the file and choose “Run as administrator”.

Finding python

Now use the Windows 10 search bar to make sure python is installed (just search “Python”). If you installed Python 3.9.1 (like I did), you should find it there. Please note that the other versions shown here are not important for us and you only need the one you installed.

Windows 10 search listing for python

After clicking on the right arrow near the program name, the menu shown here should come up. There, we’ll click “Open file location”.

A new window should open with a file listing. There, we right-click on the selected file and again open its file location:

Click 'Open file location' for this step

This will lead to the directory we actually need. One file named “python” will already be selected:

This is the directory we want

Please keep this window open for later, we’ll need it.

Script installation

Now everything is prepared and we can finally install the actual script.

This is where your part will likely be a bit different from what I’m doing, but the general stuff should be the same.

The project you want to use likely has installation instructions. You should follow them, but to do that you need to know several things:

Open Command prompt

Instructions are often written as commands issued to the computer.

They might look like this:

pip install -U gallery-dl

or

python -m pip install -U gallery-dl

or

python3 script.py

or

python script.py

You have to type these into the command prompt, which is a window we’ll open next. Type in “cmd” in the search bar and open it.

Open command prompt

It’s just a window where we can type in text:

An empty command prompt

Now we have at least three windows open:

  • The one that contains the “python” / “python.exe” file (we opened it in “Finding python” before)
  • The command prompt window we just opened
  • Your browser with this page and the project page

Script installation

Here comes the part where we actually install the program we want to use.

Let’s imagine I wanted to download all images from this Flickr account. I found the command-line tool gallery-dl on GitHub and want to install it.

Its installation instructions mention the following:

pip install -U gallery-dl

When you type that into the command line, it might work. To be 100% sure that it works, we have to do some extra steps:

  1. Drag & drop the “python” file from our opened window into the command prompt. This will fill in a long path.
  2. Write a space character (just “ “, without quotes) in the command prompt window
  3. This would start python, but we want to start pip (first word in the command above). We tell python to start pip by adding -m, then our actual command (pip install -U gallery-dl) that should be started. This is the command we actually type in:
    C:\directory\python.exe -m pip install -U gallery-dl
    

Here’s a quick video on how it works:

In general you’ll be given some commands you should type in to install. For each command, we try the following schema.

If it starts with…

  • python / python3 / py: drag & drop python in the command prompt window, add a space and then copy everything after the word python / python3 in there (add a space between the long python path and everything else)
  • pip / pip3: you want to drag & drop python in the window, then write a space, then -m pip, then another space and everything after the word pip / pip3
  • anything else: you likely have to do the same as above, add -m (with a space in front of it!) and then type in/paste the whole command. If it doesn’t work on the first try replace any _ (after -m) with - (or vice-versa)

Now type in all commands that are given/required by the authors of the script.

If you ever accidentally press enter too early and now you’re stuck in python’s interactive mode (the line where you type will start with >>>), you can type in exit() to get back to the normal command line.

This is Python’s interactive mode

Run the script

The project page mentions that I can run gallery-dl by typing this in the command prompt:

gallery-dl 

But that might not work.

This is why we now drop python in the command prompt again, add -m (with space in front of it) and then finally add the command from above:

There was an error: 'no module named gallery-dl'

Oh no, it couldn’t be found! One thing we can try in such a case is replacing the dash - with an underscore _, e.g. gallery-dl becomes gallery_dl. You can also try it the other way around, e.g. youtube_dl becomes youtube-dl. Often one of these tricks works.

And… it worked! At least the output we got is from the actual program.

We could start gallery_dl

But there’s still an error because we didn’t tell the program what to do. Note that it also tells us that we can add --help (make sure you add a space between the program name and --help) at the end “to get a list of all options”, as in the program will tell us what it can do (and how we specify it).

Please note that even though the program tells us that we can use gallery-dl --help to get more information, we still need to do our drag & drop routine from before. That means dragging python in there, writing a space, writing -m (again, with spaces around it) and finally writing the actual command it tells us to run.

C:\directory\python.exe -m gallery_dl --help

Here we used the third point from our schema above to start the program.

Command-line arguments

Most command-line programs don’t ask interactively what they are supposed to do, they expect you to tell them from the start.

In the case of gallery-dl it’s the following pattern:

usage: __main__.py [OPTION]... URL...

The __main__.py part could also be just the name of the program, as in:

usage: gallery-dl [OPTION]... URL...

[OPTION] means that there are optional (because of the brackets []) options we can use.

URL... means that gallery-dl for example expects one or multiple (because of the dots ...) URLs of galleries to download.

The order of these is important for most programs. That is, the options (if any) come first, then anything else (e.g. URLs, filenames).

Sometimes there are options that are together with a filename (or any text really), e.g. gallery-dl’s --write-log option:

--write-log FILE          Write logging output to FILE

This means that when we write --write-log, the next text (after a space) must be a filename. You would write it like this:

C:\directory\python.exe -m gallery_dl --write-log "log-file.txt" "https://www.flickr.com/photos/spacex/"

Starting the program

But if we want a simple download, we add the URL to the end of the command:

C:\directory\python.exe -m gallery_dl "https://www.flickr.com/photos/spacex/"

Also, when writing a URL (or file path) like this in the command line arguments of a program I recommend putting quotes " around it as done above.

So that seems to work!

… but wait. Where did it save these images?

If your program doesn’t show a path at all, or shows a relative path (one that starts with .\ or otherwise lacks a drive letter like C:\...), the files will likely be saved relative to the directory that is shown at the beginning of your command prompt (in my case it’s C:\Users\aio).

We can open that directory by typing explorer . in the command prompt and pressing enter.

If we’re looking for the image with the path .\gallery-dl\flickr\Official SpaceX Photos\flickr_16169086873.png, we should find a directory called gallery-dl in our opened folder. There is a flickr folder, then another Official SpaceX Photos folder and then there’s a bunch of images. That’s where we wanted to go.

Configuration

There are often cases where the “normal”/easiest way to start a program (just adding the URL after the start command) is not enough.

Often the help page can be seen by starting the program with --help:

C:\directory\python.exe -m gallery_dl --help

There you can find more options for starting the program. For example, I want to download the profile, but gallery-dl should also put the files in a ZIP archive. I found this in the help text:

Post-processing Options:
  --zip                     Store downloaded files in a ZIP archive

Now I run this command with the correct order of arguments:

C:\directory\python.exe -m gallery_dl --zip "https://www.flickr.com/photos/spacex/"

And that was fast! Instead of spending way too long downloading every image separately, we just instructed gallery-dl to do everything for us.

An additional tip

Instead of always opening the directory where python is located and then dragging it into the command prompt window, you could try this alternative (that might not work):

Instead of the full path, just write py at the start of the command, like this:

py -m gallery_dl --help

This shortcut can be quite nice if it’s there. But if it isn’t there, you have to use the other method.
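If you already have a working Python and are curious whether the py shortcut is available, the following sketch checks for it. The function name python_command is made up for this example; shutil.which simply looks for a program in the same places the command prompt would.

```python
import shutil
import sys

def python_command(which=shutil.which) -> str:
    """Return "py" if that shortcut is found, else the full path of this Python."""
    # which("py") returns the location of py, or None if it is not installed.
    return "py" if which("py") else sys.executable

# Shows the start of the command you could type in the command prompt.
print(python_command(), "-m gallery_dl --help")
```

If the output starts with py, the short form should work; otherwise it prints the full path to use instead.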


Something doesn’t work

When something doesn’t work, it can be quite frustrating and confusing. That’s normal.

Here are some things you can do:

  • Search the internet for your error message. Adding the script name (e.g. gallery-dl) to the search often yields results from people who have run into similar errors.
  • Check the project page for hints.
  • Go to the “Issues” tab of the project’s GitHub page and type the error message in the search bar. Often someone else already created an issue with details. If not, you can create one. Most projects are happy to answer any question you have.
  • Use an alternative program that does the same thing. Older projects that weren’t updated in the last few months are often no longer worked on and would require changes to work again. Don’t bother with that and search for another program.
  • Ask on forums or Reddit how you would do a certain thing with a certain program.

However, it could of course also mean that this guide is incomplete or has errors. If you think this is the case, please feel free to open an issue or write an e-mail (you can find the address at my GitHub profile). Please also feel free to open an issue/write a mail for any minor comments, feedback etc.

When you ask others a question about an error you got, you should definitely include these things:

  • What you’re trying to do, e.g. “I wanted to download all images from a Flickr profile using gallery-dl”
    • You should also provide a link to the tool you’re using, e.g. “https://github.com/mikf/gallery-dl”
  • The command you’re using to start the program, e.g. “python -m gallery_dl”
  • The output you got from the program. Post it as text (screenshots are usually hard to read): select everything and copy it to your post. Don’t assume that any part of the output is unnecessary, just post the whole thing. If the forum supports it, format it as a code block, which makes it more readable
  • What else you have done so far (“I installed python”)

This makes it more likely that someone else can spot the error and tell you how to fix it.
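If you want to collect these pieces automatically, here is a minimal sketch that runs a command and puts the goal, the tool link, the command and the complete output into one block of text. The function name build_report and the layout of the text are made up for this example; subprocess.run is part of Python itself.

```python
import subprocess
import sys

def build_report(goal: str, tool_link: str, command: list[str]) -> str:
    """Run a command and collect everything a helper would want to see."""
    # capture_output=True keeps everything the program prints;
    # text=True returns it as normal text instead of raw bytes.
    result = subprocess.run(command, capture_output=True, text=True)
    return "\n".join([
        f"What I'm trying to do: {goal}",
        f"Tool: {tool_link}",
        f"Command: {' '.join(command)}",
        "Output:",
        # Normal output and error messages, without leaving anything out.
        result.stdout + result.stderr,
    ])

# Demo with a command that always exists; for a real report you would pass
# your own command, e.g. [sys.executable, "-m", "gallery_dl", "<your URL>"].
print(build_report(
    "I wanted to download all images from a Flickr profile using gallery-dl",
    "https://github.com/mikf/gallery-dl",
    [sys.executable, "--version"],
))
```

You can then paste the printed text into your post and add what else you have tried so far.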

Thank you :)

https://blog.010.one/run-python-script-from-github-no-experience-required