Taco Steemers — GeistHaus

Taco Steemers Dec 31, 2023 Updated Dec 31, 2023

Software build dependencies are a risk in general. A specific risk less often discussed is that libraries and packages might become unavailable.

The problem

Build dependencies, alternatively called packages or libraries, can disappear from dependency servers. Dependency servers can go offline, even if only temporarily. This can leave projects in a state where they cannot be built.

The corporate solution

The easiest way for companies to solve this is to use their own dependency server and configure their build processes to pull dependencies from and publish dependencies to their private dependency server. That dependency server can then serve as a backup for the public dependency servers. It also becomes easier to use dependencies that were developed in-house. Example solutions are Sonatype Nexus, Artifactory and AWS CodeArtifact.

Other things to consider

Source code and documentation for libraries may also be considered important artifacts for the development process, depending on your specific situation. Documentation may disappear from the web and source code that used to be available might become closed off. I have also seen cases where applications and libraries that we started using many years back were sold, renamed or rewritten. As a result, the online documentation and sources covered versions that were too recent for us, and they were difficult to find because names and websites had changed. Consider making backups of documentation and source code for libraries, and of installers and documentation for software.

A hobby project solution

A less ideal solution is storing the dependencies offline. To be able to use backups of build dependencies we would have to set the build system to an offline mode where we point it to a directory. A better way might be to use the build system's own caching system and copy the dependencies there. That is easy to do with Java's Maven build system, but may need configuration and could be more difficult for other build systems such as Gradle.

One thing to consider is whether we want to back up entire build system cache directories, or just the dependencies used in an individual project's build steps.

For a hobby project written in Java I am trying out the Gradle task below. It should run on every build. It copies all dependencies that the build system, Gradle, can find. The copied files can be added to version control, or to some kind of backup system if that is preferred.

I am considering changing that code to use the information resolved by Gradle to look up the matching entries in the Gradle cache directory and back up those directories instead. That way it would be easy to insert them back into the cache and let Gradle use them.

import java.nio.file.Files
import java.nio.file.StandardCopyOption

task backupResolvedLibs() {
    doFirst {
        // Copies end up in <project directory>/build_dependencies
        var outDir = project.projectDir.toPath().toAbsolutePath().resolve('build_dependencies')
        Files.createDirectories(outDir)
        // 'resolvableImpl' is not a built-in configuration; it is assumed to be a
        // resolvable configuration defined elsewhere in this build script
        configurations.resolvableImpl.resolvedConfiguration.resolvedArtifacts.each {
            var source = it.getFile().toPath().toAbsolutePath()
            var target = outDir.resolve(it.getFile().getName())
            println source
            println target
            // Always refresh SNAPSHOT artifacts; copy other artifacts only once
            if (it.getFile().getName().contains("SNAPSHOT") || !Files.exists(target)) {
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)
            }
        }
    }
}
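
The cache-based variant mentioned before the task could start from something like the Java sketch below. It assumes that Gradle keeps downloaded modules under <Gradle user home>/caches/modules-2/files-2.1/<group>/<name>/<version>/. That layout is an internal detail of Gradle and may differ between versions, so treat it as an assumption to verify. The class name, method name and example coordinates are made up for this illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class GradleCachePaths {
    /**
     * Returns the directory where Gradle is assumed to cache the files of one
     * module version. The "modules-2/files-2.1" part is an internal detail of
     * Gradle and is an assumption here.
     */
    static Path cacheDirFor(Path gradleUserHome, String group, String name, String version) {
        return gradleUserHome
                .resolve("caches")
                .resolve("modules-2")
                .resolve("files-2.1")
                .resolve(group)
                .resolve(name)
                .resolve(version);
    }

    public static void main(String[] args) {
        // The Gradle user home defaults to ~/.gradle unless GRADLE_USER_HOME is set.
        String override = System.getenv("GRADLE_USER_HOME");
        Path home = override != null
                ? Paths.get(override)
                : Paths.get(System.getProperty("user.home"), ".gradle");
        // Hypothetical coordinates, only for illustration.
        System.out.println(cacheDirFor(home, "org.example", "example-lib", "1.0.0"));
    }
}
```

Inside the Gradle task the group, name and version could presumably be read from each resolved artifact's module version identifier, after which the whole directory can be copied instead of a single file.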

We need to make sure that the task runs regularly. In Gradle that means adding it to another task, a commonly run task. It also needs to be run after running the build step, to make sure that the dependencies have actually been resolved. The lines we need to add to that Gradle task look something like this:

dependsOn 'clean'
dependsOn 'build'
dependsOn 'backupResolvedLibs'
tasks.findByName('build').mustRunAfter 'clean'
tasks.findByName('backupResolvedLibs').mustRunAfter 'build'

One nice way to add things to build steps is to use a shell script that fails when any step fails. The shell script can call the 'backupResolvedLibs' Gradle task after the build has succeeded.

tag:tacosteemers.com,2024-01-01:/articles/consider_backing_up_build_dependencies.html

Factors in deciding which jlink compression level to use

Taco Steemers Mar 11, 2023 Updated Mar 11, 2023

jlink comes with a few options for reducing its output size. Here we look at the --compress option and at the factors to weigh when choosing a compression level.

jlink is the Java equivalent of a linker. It can be used to bundle your code and dependencies with a Java Runtime Environment.

With the use of jlink it becomes easier to deploy an application to a customer because we don't need the customer to manually install the correct Java runtime environment. jlink comes with a few options for reducing the output size.

Here we look at the --compress option. It accepts three values: 0 (no compression), 1 (constant string sharing) and 2 (ZIP). I think the numbers indicate how much time each one costs during the execution of jlink, with option 2 taking the most time.
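
To compare the levels we need one runtime image per level. The Java sketch below only builds and prints the jlink arguments for each level; it does not run jlink. The module name java.base and the output directory names are placeholders, and the class is made up for this illustration. Some newer JDK releases list different accepted values for this option, so check the output of jlink --help for the JDK in use.

```java
import java.util.List;

class JlinkCompressLevels {
    /** Builds the jlink arguments for one compression level (0, 1 or 2). */
    static List<String> jlinkArgs(int level, String modules, String outputDir) {
        if (level < 0 || level > 2) {
            throw new IllegalArgumentException("level must be 0, 1 or 2");
        }
        return List.of(
                "--add-modules", modules,
                "--compress=" + level,
                "--output", outputDir);
    }

    public static void main(String[] args) {
        // Print one jlink command per level so the resulting images can be compared.
        for (int level = 0; level <= 2; level++) {
            List<String> jlinkArgs = jlinkArgs(level, "java.base", "image-level-" + level);
            System.out.println("jlink " + String.join(" ", jlinkArgs));
        }
    }
}
```

The printed commands can be run from a shell. The same argument list could also be handed to the jlink tool from Java through java.util.spi.ToolProvider.findFirst("jlink").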

According to this bug report the ZIP option can lead to a 13 millisecond slower startup time on a Hello World program. Personally I am okay with small startup delays for the graphical desktop application I am working on.

What concerns me more is installer size and size on disk. I will show an example based on a private project I am working on. Here I used the excellent InnoSetup for creating the installer, but the general idea will apply to any installer or packaging system that applies compression to reduce the total package size.

jlink compression level | InnoSetup installer size | Size on disk | Size on disk compared to installer size
Level 0: No compression | 21,864,448 bytes | 97,210,368 bytes | ~4.5 times increase
Level 1: Constant string sharing | 21,835,776 bytes | 77,504,512 bytes | ~3.6 times increase
Level 2: ZIP | 36,855,808 bytes | 61,747,200 bytes | ~1.7 times increase

As we can see the installer size will be bigger when we use the most compression. I believe that this is because the components that have been compressed by jlink cannot be compressed much further and might reduce the potential for compression among the other files. It can also be that InnoSetup uses a more efficient compression method.
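
The last column of the table is the size on disk divided by the installer size. A small Java sketch that redoes that arithmetic with the byte counts from the table is shown here; the class and method names are made up for this illustration. The printed ratios use two decimals, so they are slightly more precise than the rounded figures in the table.

```java
import java.util.Locale;

class JlinkSizeRatios {
    /** How many times larger the installed files are than the installer. */
    static double ratio(long installerBytes, long onDiskBytes) {
        return (double) onDiskBytes / installerBytes;
    }

    public static void main(String[] args) {
        // Sizes in bytes, copied from the table above.
        String[] levels = {"Level 0", "Level 1", "Level 2"};
        long[] installer = {21_864_448L, 21_835_776L, 36_855_808L};
        long[] onDisk = {97_210_368L, 77_504_512L, 61_747_200L};
        for (int i = 0; i < levels.length; i++) {
            System.out.printf(Locale.ROOT, "%s: on disk is %.2f times the installer size%n",
                    levels[i], ratio(installer[i], onDisk[i]));
        }
    }
}
```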

Conclusion

To conclude, there are four factors we can take into account when deciding which jlink compression level we want to use.

Package or installer size. Use level 1 if you want a smaller package size. It will still benefit the size on disk after installation.
Size on disk. Use level 2 to get the smallest disk usage after installation. Note that more bytes will need to be stored and transferred by the software distribution system.
jlink execution speed. Choose level 0 if you want the fastest jlink execution speed.
Application startup time. Choose level 0 if you want the fastest application startup.

tag:tacosteemers.com,2023-03-11:/articles/java_jlink_compression_levels.html

Succeeding with work assignments

Taco Steemers Jul 18, 2022 Updated Jul 18, 2022

Here are some ideas that may help knowledge workers in their approach to assignments or projects.

Table of contents:

Refinement: planning ahead
Does the assignment fit our current capabilities?
Let's get to work
Keep refining, working on tasks and reviewing
Getting un-stuck
Keep stakeholders informed
Manage stakeholders' expectations
External dependencies
Storage and delivery of our work
Consider the project's (time) budget
Change requests
Some tasks are never finished

Taking action and solving problems is fun! But we can't get started taking action right away. We want the end result to be satisfactory, and we are unlikely to get a satisfying result if we jump right in.

Here I present some ideas that may help us to successfully complete our assignments. These are just suggestions and may not work well in your specific situation.

Usually an assignment has at least a refinement phase and an execution phase. In both phases the general idea is to take the next question or action that comes up and get started on that. Finally, there is a review phase, where we review our work or have it reviewed by others. When we are working on assignments with sub-tasks, like a whole project, we will keep going from refinement, to execution, to review, delivery and so on.

We will spend a lot of our time communicating with people. Asking for input, clarification on that input, keeping stakeholders informed and soliciting feedback on our work.

A good first question to ask may be if we are really supposed to do work for the person who gave us the assignment. We don't want to do good work but end up getting reprimanded for it by our manager.

Thinking things over will do wonders for the end result, but only up to a point. Thinking things over is not the same as completing the assignment. Completing something requires taking action. Each action we take will help clarify the assignment. We need to keep evaluating if we need a moment to investigate, or if we should take action based on what we know at this moment.

Refinement: planning ahead

First we must make sure there is a written description that we can use as a basis. If it does not yet exist we will make one. The reason we need a written description is that details and nuance matter. Without a description to fall back on and update over time, we will forget or fail to think of important details. It will not only help to clarify what should get done; it will clarify what did and what did not get done over the course of the assignment. We may need to talk to several people to get a complete enough picture.

It is important that the description is clear about what is expected from us, and what isn't. What does an acceptable result or delivery look like? Are we expected to provide support to anyone after we have finished this assignment? Whoever asked for the delivery might expect us to be available for questions and additional work afterwards. Is there a set number of hours listed in the contract? We need to take this type of information into account when we plan our work.

If the assignment doesn't meet our standards, we should refine it first. Find out what is missing from the story.

Let us look at the description of the assignment. In it, we find sub-tasks and more things we should look at. Maybe there are unfamiliar terms and acronyms or there is a lack of information in a specific area. Make note of these as we read through. Each loose end must be written down so that we can look into it at the right time. That way we hope to avoid wasting our time now and in the future. We write down new sub-tasks for these loose ends.

An assignment and each of its sub-tasks:

Should be specific as to the desired outcome versus the current situation.
Should have a short but accurate title. This makes it easier to talk about the assignment and avoid misunderstandings.
Should be very clear on what problems we are and are not solving here.
Should mention related assignments/tasks to allow for better decisions while working on this one.
Should be in some way time-boxed, though it may slide out of that time box. To time-box something means to indicate roughly how many hours it is okay to spend on it.
Should mention other topics that might be affected and might need our attention.
Should mention alternative solutions or workarounds, or a lack of them. Workarounds can be vital in determining our priorities.
Preferably can be explained easily. This may be hard to achieve but is worth the effort.

When we are done refining we may need to remove unnecessary information. We do this last; if we do this early we don't know enough to decide which parts are not relevant.

Does the assignment fit our current capabilities?

We should also pause to consider that an assignment should be assigned to someone that might have a good chance of completing it to everyone's satisfaction, given the resources and time available. Is that the case here? If not, waste no time and discuss this with the relevant people.

Broadly speaking we can aim for one of these three outcomes:

Expand the available resources, such as additional assistance becoming available to us.
Reduce what is being asked of us in the given time frame. For example, the assignment can be split up and assigned among more people or requirements can be dropped.
Get someone else to pick up the assignment.

Let's get to work

Some sub-tasks can be assigned a block of 25 minutes (like the Pomodoro technique). Others are tasks that have external dependencies. An example of this is when we put in a request for access or information somewhere, and after that we just have to wait it out. These can be picked up in-between other tasks. Don't forget to add a task to remind ourselves of these open tasks. In all likelihood we will need to follow up on these several times before they are resolved. While waiting, we may be able to pick up other tasks. If not, we might look at relevant standards and documents, as well as existing work we have in-house. Make notes of things that might be relevant. It would be great if we could use this waiting time to refine the assignment description and its sub-tasks.

Try to stay in the flow of picking up and completing tasks. Keeping it up is the easiest way to get through. There may be delays due to people reviewing our work or because we need to get access to external systems. That is all part of the job. We don't let it get to us.

Keep refining, working on tasks and reviewing

While working on completing our tasks, we keep making notes of any possible ways to split up bigger tasks, or ways to clear up large or vague tasks, and any open questions and loose ends that need to be investigated. We keep switching between refining, taking action and reviewing. We keep refining our backlog of open tasks and evaluating our finished work, preferably with our stakeholders. New insights may lead us to re-work things that we thought were already finished. We keep iterating on our work, building it up to an acceptable delivery.

These ideas are part of what some people call the "agile" way of working. Instead of planning everything ahead and strictly sticking to the original plan, we accept that requirements can change, as long as the changes fit within the timeframe and budget. More on that later.

We may feel that there could be undesirable interactions between the current work and existing activities. Write these thoughts down and follow them up in a timely manner. Don't pick them up when it feels too late already. Here I don't necessarily mean too late in the day, I mean to pick things up soon to make sure they don't become a problem.

In a general sense small open tasks are probably best handled as soon as possible. Done is done. After that we can get back into the flow, working on the bigger parts of the assignment where we have less context switching.

Getting un-stuck

Everyone gets stuck at some point, and we are no different. It is good to try to recognize if we are stuck, before we have lost too much time and energy. When we get stuck we can ask for help, but we can also take a walk or try again tomorrow. Whichever feels suitable. We might realize how we can get un-stuck while trying to explain the situation, while taking our walk or while unwinding at home.

In programming circles there is the idea of "rubber duck debugging". The solution to our problem often comes to us while explaining our problem to someone else, and we might as well explain it to a rubber duck. That way we don't bother anyone.

Keep stakeholders informed

The people who depend on our work will need to be kept informed. If we don't keep them informed they may get anxious. Try to find out who needs to be informed, and how often they want to be informed. Perhaps one person wants to have a short chat about our progress on a near daily basis, but another person prefers a more formal bi-weekly meeting.

We also keep the stakeholders informed of any risks we may see coming up.

Manage stakeholders' expectations

We should at all times try to manage expectations. Maybe we have some early results that we are happy to show off. Those early results might give people the idea that we are almost there, even when in reality we are just exploring something that will become a dead end. Getting feedback is great, but don't let stakeholders feel that they are looking at something that is nearly finished.

Some people say that it is better to "under-promise and over-deliver". What they mean is that it is better to promise less than we expect to deliver. That way we may either end up delivering what we promised, or delivering more than we promised. That would be better than delivering less than what we promised.

External dependencies

Sometimes the success of our assignment is dependent on things that are out of our control. In that case we need to make sure that these external dependencies are making progress as well. We don't want to get stuck with our work because other people did not start their part yet.

Potentially problematic external dependencies are one of the types of risks that stakeholders need to be informed about.

We also keep track of people's availability. We ask them if they are available on the days when we need them.

Storage and delivery of our work

Some questions to ask ourselves are:

In what format are we expected to deliver our work?
Are we expected to use specific technologies or formats?
How can we keep it accessible to stakeholders while the work is in progress?
How can we export our work from the application we are working in?
Is our work being backed up? We don't want to lose it all due to a computer failure or lost laptop.

Consider the project's (time) budget

Perhaps we are working on a project with a fixed budget. As a junior employee we are unlikely to have to worry about budgeting. Even so, it is good to know what the status of the project is (are we running out of money or time?) and who is paying for it.

Which budget is this work being paid from? Or is it billed directly to a client? We need to ask the relevant people how many hours we could spend on this at most. We need to plan accordingly, to give us a good chance of ending up with a good deliverable before we run out of budget. Note that we are not asking them how long they think the assignment should take.

We might be asked for a perfect deliverable. If our budget doesn't seem to allow for that, we may need to opt for a merely acceptable deliverable instead. Be sure to discuss this situation with the appropriate stakeholders. This may only become clear when work has already been started.

Sometimes we can avoid some work by spending money. It is easier to take advantage of that if we already know who to ask for permission and how to expense these costs to our organization or client. Any expense is probably coming out of the same budget as our working hours, and will decrease how many hours we can work on this assignment.

Finally, what are the rules for tracking the time or money spent on this assignment? Do we need to use a special code for our timesheet application?

Change requests

What do we do if we are asked to make changes late in the assignment? These changes and additional changes resulting from them may not fit in the budget. We should discuss the risk of failure as soon as possible, and get written approval and acknowledgement of the increased risk of failure before we get started on any changes.

Some tasks are never finished

Be aware that it is possible that people will keep asking us about the work we did for years to come. Perhaps our contribution makes us look like we are responsible for whatever related problems come up. New insights and new requests may come up, and our name is the first that people think of. If we don't have time for this additional work it is best to discuss that with our manager. Note that when requests for work are coming in from people other than our manager we may need to decline and refer these people to our manager. If we don't we may risk getting overloaded with work and deadlines.

tag:tacosteemers.com,2022-07-19:/articles/succeeding_with_work_assignments.html

In case you have doubts about putting your (hobby) stuff out there

Taco Steemers Jan 13, 2022 Updated Jan 13, 2022

There is going to be someone out there who is a good fit for what you made. They will like it, and even though you might not know they exist, and they forget about you the moment a notification comes in on their phone, something good was accomplished there.

A video version can be found here

Maybe you have wondered if you want to put things online, out in the public. Like a personal website, a blog, or videos. Maybe you wonder about what people think.

I did at some point have doubts. I have a personal website, I like writing small articles or little webpages like the one you see now. I was worried about what people might think. I know I usually shouldn't worry about what people think, but I do sometimes do that.

People might think that I have an amateurish website and low-quality articles. They might have opinions about that strange person in my YouTube videos, sounding kind of robotic.

But then I started looking at which of my pages get visits. And usually, none of my pages get any visits at all. So I guess there is nothing to worry about. I should just do what I like, nobody is having an opinion of me.

There are some search terms where I am in the top five of Google search results. Those people get value from me and can go on with their lives without being stuck on something. I didn't try to get a good Google ranking for those search terms. I was just doing what I like. So mission accomplished, I had fun and some people found a good result for their search terms.

These days I also have a few informational videos online on YouTube, and YouTube provides analytics. Some people open them, but most don't actually watch a useful amount of the video. I was thinking, that is okay, because I like having made them, and I learned about making videos and doing basic editing.

Then, someone out there put a like on one of the videos. That right there, is another mission accomplished. Someone was helped by my video. I think that makes it worth the time I have spent on making videos.

I think that it is important to realize that people will only have an opinion of you if they are thinking about you, and they probably are not thinking about you at all. They might click away what you made, but I do that too. That is fine.

There is going to be someone out there who is a good fit for what you made. They will like it, and even though you might not know they exist, and they forget about you the moment a notification comes in on their phone, something good was accomplished there.

So if you like making harmless things, think of putting them out there. You might enjoy it like I do, and someone out there will enjoy it too at some point. A true win-win situation.

tag:tacosteemers.com,2022-01-14:/articles/hobby_stuff_doubts.html

What is HTTP, actually? What happens when we access a web page

Taco Steemers Jan 7, 2022 Updated Jan 7, 2022

What is HTTP, actually? A short description of the Hypertext Transfer Protocol and what happens when we want to access a web page.

In this article we take a look at HTTP based on what people most often use it for: requesting a webpage.

HTTP stands for HyperText Transfer Protocol. This webpage is served to your browser by a webserver. Due to the Hypertext Transfer Protocol both your browser and the webserver know what they have to do when you decide to visit a website. A protocol is a set of rules that explain how to handle a situation. HTTPS is HTTP with added Security. Hypertext is text linked to by hyperlinks, the links we click to move from webpage to webpage. Webpages today are not purely text anymore; they also contain images, videos and sounds. For that reason we often talk about hypermedia instead of hypertext. The reason they call this media "hyper" is because it is interactive, as opposed to text on paper. We usually call these documents or resources instead of hypermedia. In this article I will use an example with a document.

HTTP is used for internet communications. It is an application layer protocol, meaning that it is used between applications. In the model for computer to computer communications, the Open Systems Interconnection model, application layer protocols sit at layer 7, the highest layer. Besides HTTP there are a lot more details to how this page was delivered to you!

In the protocol there is a client and a server. In this example my browser is the client and whichever computer contains my website is the server. The client sends an HTTP request to the server. The server gives a response. The client may request a specific document, using a specific version of HTTP. That is what we call the GET request, and it is the simplest example.

This is what a request from my browser for my webpage looks like:

GET https://tacosteemers.com/articles.html

The client can also add more details to their request, called header fields. Examples are the user's login information or that they prefer not to be tracked.

My browser has added many details to the request. Here are some of the request header fields:

Host: tacosteemers.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
If-Modified-Since: Sun, 02 Jan 2022 20:56:04 GMT

The Host request header is mandatory in HTTP/1.1 and clarifies which host we want to place a request with. This may seem redundant because we are already placing our request at this host when we GET https://tacosteemers.com/articles.html. However, it is not redundant. My website is served to the client from a server that serves up many websites. That server does not have the name tacosteemers.com. Instead it will be accessed by a name that may look like web887.dc3.example.com. To reach it, the browser first looks up the IP address that belongs to tacosteemers.com and then connects to that address, so the network connection itself does not tell the server which website name was used. The request may also pass through intermediaries such as load balancers before it arrives at the server somewhere in a datacenter. The server needs the Host request header to know which website we want.
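
Browser tools tend to display the request with the full URL, as in the example above. On the wire, an HTTP/1.1 request typically names only the path in its first line, and the website name travels in the Host header. The Java sketch below builds such a request as text. It assumes HTTP/1.1 and ignores query strings, ports and all other headers. The class name is made up for this illustration.

```java
import java.net.URI;

class RawGetRequest {
    /** Builds the text of a minimal HTTP/1.1 GET request for the given URL. */
    static String build(URI url) {
        String rawPath = url.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + url.getHost() + "\r\n"
                + "\r\n"; // an empty line ends the header section
    }

    public static void main(String[] args) {
        System.out.print(build(URI.create("https://tacosteemers.com/articles.html")));
    }
}
```

Without the Host line, a server that hosts many websites would only see a request for /articles.html and could not tell which site is meant.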

The Accept request header lets the server know what kind of documents the client can accept. If-Modified-Since means that the client only wants to receive the document if it has been changed since the given time. If the client sends this it means that it already has a copy from that date and time on disk, and if the server doesn't have a newer version it will tell the client in its response. The server's response will not include the document in that situation.
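
As a rough illustration of how a client could set these header fields, here is a Java sketch that uses the java.net.http API available since Java 11. It only builds the request and prints it; nothing is sent over the network. The shortened Accept value and the class name are assumptions made for this example. The Host header is not set here because the HTTP client is expected to fill it in from the URL when the request is sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ConditionalRequestExample {
    /** Builds a GET request that asks for the page only if it changed since the given date. */
    static HttpRequest build(String url, String ifModifiedSince) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/html,application/xhtml+xml")
                .header("If-Modified-Since", ifModifiedSince)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://tacosteemers.com/articles.html",
                "Sun, 02 Jan 2022 20:56:04 GMT");
        System.out.println(request.method() + " " + request.uri());
        // Print each header name with its list of values.
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```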

The server responds with:

A status code
A list of response headers
The response body, which contains the actual document

The status code for this response is 200, which simply means "OK". If the document on the server was not newer than the browser indicated with If-Modified-Since the server would have given status code 304 "Not Modified" and the response body would have been empty.
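
The choice between 200 and 304 can be sketched as a small Java function. This is a simplified illustration, not how any particular web server implements it: real servers also deal with invalid dates and other validators. The class and method names are made up, and the dates are the ones from the header examples in this article.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class NotModifiedCheck {
    /** Parses an HTTP date such as "Mon, 03 Jan 2022 06:33:28 GMT". */
    static ZonedDateTime parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    /**
     * Returns 304 when the client's copy is at least as new as the server's
     * document, and 200 otherwise. A null ifModifiedSince means the client
     * did not send the If-Modified-Since header.
     */
    static int statusFor(String lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;
        }
        boolean changed = parseHttpDate(lastModified).isAfter(parseHttpDate(ifModifiedSince));
        return changed ? 200 : 304;
    }

    public static void main(String[] args) {
        String lastModified = "Mon, 03 Jan 2022 06:33:28 GMT";
        // The client's copy is older than the document: prints 200
        System.out.println(statusFor(lastModified, "Sun, 02 Jan 2022 20:56:04 GMT"));
        // The client's copy is as new as the document: prints 304
        System.out.println(statusFor(lastModified, "Mon, 03 Jan 2022 06:33:28 GMT"));
    }
}
```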

Some of the response headers are:

Content-Length: 30224
Content-Type: text/html
Last-Modified: Mon, 03 Jan 2022 06:33:28 GMT

The first two response headers tell the client how to interpret the contents of the response body. Last-Modified tells us that the document has indeed changed since we last accessed it. My hosting company has also added two custom response headers that tell us which webserver and load balancer this request and response have passed through. They probably do this to allow them to diagnose problems in their network.

You may have noticed that we used the word GET, and wondered if there are any other words. We call these request methods. There are nine request methods.

GET
HEAD, a GET request without getting the body in the response
POST, where the client sends data to the server for further processing
PUT, where the client sends data that creates the resource at the given address or overwrites what already exists there on the server. That could be something that has been POST-ed earlier.
DELETE
CONNECT, a more complicated request method
OPTIONS, where the client asks the server what options there are for communicating with the server or a specific resource
TRACE, this method is new to me, apparently it is used for troubleshooting and will give back information about what the request looked like to the server after travelling through all the intermediary systems
PATCH, for sending instructions on how to partially update a resource or document

I haven't used PATCH but I imagine that PATCH is handy for when the client doesn't have the document or doesn't want to send it because it is too large, but the client does know what modifications need to be made to the document.

Here is the proposal for the current HTTP version, HTTP/2, from May 2015. The first eight request methods are described in the earlier HTTP/1 proposal. The PATCH method is described in a separate specification.

tag:tacosteemers.com,2022-01-08:/articles/http.html

Starting up: low friction, minimal process and minimal tools

Taco Steemers Jan 1, 2022 Updated Jan 1, 2022

Preserving project speed and enjoyment -- We can keep up a high iteration speed by keeping things simple and only introducing tools and processes when we absolutely have to.

When starting a new project we may not need everything that we need when working on established projects. We can probably hold off on setting up a build server and CI/CD, uploading a build from our computer instead. We may not need project management software. Perhaps it is just you, or you and two other people. You can use a chat app, emails and phone calls. A decision log can be noted in a notepad and emailed to the participants.

A few years ago (time flies, truly), I was working on the software side of a hardware platform project that took sensor input from an Arduino board, state output from (VR) games, and input from an existing piece of hardware that had an Arduino board attached to it that spoke the XBox controller protocol. Together, this platform can simulate almost any type of vehicle. I found it to be an interesting project, and it was refreshing to work with hardware again. I also enjoyed working with Arduino.

The other people on the project needed an easy way to receive my code. After they received my code it should be clear what they needed to do with it. The project was without any income and thus without any infrastructure. The people working on it were not users of version control platforms such as Gitlab or GitHub. I didn't think that introducing them to the modern software development process was a good use of our time. Instead, I resorted to scraping files together and zipping them up in an ad-hoc way. Sometimes several times a day, as what I had written in the morning before work would have been tested during the day, and I sent in improvements after work.

The process of sending updates was tedious and error-prone, so I created a tool to help gather files, remove unwanted files and zip them up. I created that tool when I felt I needed it. I did not create an installer and updater. Me emailing the file and the other person unzipping the file was all that was required. Simple, frictionless and very little development effort was required for making builds and installing them. Which is good, because that is not the focus at the start of a project.

We can set up CI/CD or a regular build and release process when not having them causes too much friction. Before we reach that moment we can work with a loosely described standard operating procedure and some code to automate the annoying bits.

One of the other participants set up a Trello board. We put effort into filling it up. In the end though, I felt the other participants didn't use it. They didn't record their thoughts or test results there, even if I did update my development tasks there. I chatted and called a lot with the main person behind the project, who had the hardware. I made notes of their conclusions and would iterate in the evening or in the morning. We just didn't need project management software yet.

In the end the project didn't succeed as a business. But working on it was a ton of fun. It was as frictionless as it could have been. By skipping a lot of administrative and non-core activities I was able to focus on what mattered: the minimum viable product.

My conclusion is that when starting a project we don't need to worry about what we will need to have in the future. Instead, I prefer to focus on the project goals and creating standard operating procedures for the current tasks. When friction appears we can perhaps automate that away, but we shouldn't get sidetracked on that.

Title keywords make company websites easier to find later on

Taco Steemers Dec 26, 2021 Updated Dec 26, 2021

It is a good idea to put some keywords in your landing page title. Without keywords in the title I can't find you anymore next week when I need you.

I mean the title of the page, not the title displayed in the page. In HTML terms, I mean the <title> tag in the <head>.

Some websites just state their name, like this made-up company example "FlowerHill Inc.". Why is this a problem? They clearly state their name. Yes, but the lack of keywords makes it difficult to find them again.

Let's say that this is a made-up landscaping company in my area. The week after I first visited that website I find myself in need of a landscaping company. This is a great opportunity for the company that I just found last week! They are immediately on the top of my mind.

But there is a problem. I search my browser history and my bookmarks. I search them for these terms:

- "landscaping"
- the three other local terms for landscaping
- colloquial name for my geographical area
- nearby town that they might have been located in
- other towns...

The page I am looking for isn't coming up.

By now I might remember that other company someone mentioned they used, and go with that instead. Or I start a web search for "landscaping in ...", where many other companies will show up in the web search.

I wanted to use the website I found first, and their services or product. Unfortunately I just couldn't find them anymore. So let's make sure that we have some keywords in the landing page title. For example: "FlowerHill landscaping and rainwater management -- Hill Valley City".

Catching an exception from an annotation on a JAX-RS resource

Taco Steemers Nov 24, 2021 Updated Nov 24, 2021

An exception resulting from an annotation cannot be caught in a regular try / catch block because the result of the annotation is computed before our code is executed.

Example

Take this resource for example:

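// Assumption: @Resource is javax.annotation.Resource.
import javax.annotation.Resource;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

// AnnotationThatThrowsException is a placeholder for any annotation
// whose handling code can throw an exception.
@Resource
@Path("/example")
public class ExampleResource {

    @GET
    @Path("/")
    @AnnotationThatThrowsException
    public Response getExample() {
        try {
            return Response.status(Response.Status.NO_CONTENT).build();
        } catch (Exception e) {
            // Never reached for exceptions thrown because of the annotation.
            return Response.serverError().entity(e.getMessage()).build();
        }
    }
}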

Here the code that is run because of the AnnotationThatThrowsException annotation throws an exception. This happens before we actually enter the method. Our try / catch block cannot catch the exception. How do we handle this?

Catching exceptions with a filter

In this example we know about an AccessDeniedException that can be thrown from code that is run because of an annotation. We want to return a 403 Forbidden status code when that exception has been thrown.

Depending on your situation, the exception may already have been caught and rethrown. For that reason we also check any throwable to see if its cause is an AccessDeniedException.

import javax.enterprise.context.ApplicationScoped;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
// Assumption: Logger and LoggerFactory come from SLF4J.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
// Also import the AccessDeniedException class that your security framework throws.

@ApplicationScoped
@WebFilter(filterName = "ExceptionHandlingFilter", urlPatterns = "/*")
public class ExceptionHandlingFilter implements Filter {

    private static final Logger LOGGER
        = LoggerFactory.getLogger(ExceptionHandlingFilter.class);

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException {
        HttpServletResponse httpResponse = (HttpServletResponse) response;
        try {
            chain.doFilter(request, response);
        } catch (AccessDeniedException e) {
            // The exception reached the filter directly.
            httpResponse.sendError(HttpServletResponse.SC_FORBIDDEN, e.getMessage());
        } catch (Throwable t) {
            if (t.getCause() instanceof AccessDeniedException) {
                // The exception was caught and rethrown inside a wrapper.
                httpResponse.sendError(HttpServletResponse.SC_FORBIDDEN, t.getCause().getMessage());
            } else {
                LOGGER.error("Unhandled exception while processing request", t);
                httpResponse.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, t.getMessage());
            }
        }
    }

}

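The filter should be registered automatically due to the filter's WebFilter annotation. If for some reason that doesn't work, one can try registering it programmatically. Note that the container only discovers a ServletContainerInitializer if its class name is listed in the file META-INF/services/javax.servlet.ServletContainerInitializer.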

import javax.inject.Inject;
import javax.servlet.DispatcherType;
import javax.servlet.FilterRegistration;
import javax.servlet.ServletContainerInitializer;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import java.util.EnumSet;
import java.util.Set;

public class FilterInitializer implements ServletContainerInitializer {

    @Inject
    private ExceptionHandlingFilter exceptionHandlingFilter;

    @Override
    public void onStartup(Set<Class<?>> c, ServletContext ctx) throws ServletException {
        FilterRegistration.Dynamic reg = 
            ctx.addFilter("ExceptionHandlingFilter", exceptionHandlingFilter);
        reg.setAsyncSupported(true);
        reg.addMappingForUrlPatterns(EnumSet.of(DispatcherType.REQUEST), false, "/*");
    }
}

Exception Mappers

Exception mappers are another way to handle exceptions when using JAX-RS. I believe these are the recommended solution for handling previously uncaught exceptions, and it is worth looking into them. However, I haven't gotten exception mappers to work for an exception from an annotation.

Checking checksums

Taco Steemers Sep 21, 2021 Updated Sep 21, 2021

This page is about file checksums for situations where the distributor of the file also provides the checksum. If available, we always want to compare a given checksum with the checksum of the file we downloaded.

Table of contents:

Topic
SHA checksums
First, create or check your checksum file
Using sha256sum (GNU)
Using shasum (more cross-platform)
Using OpenSSL
Comparing hashes by hand

Topic

This page is about file hashes (checksums) for situations where the distributor of the file also provides the checksum.

If you want to use checksums in your own code you might want to look at the CRC-32 algorithm.
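
As a minimal sketch of what that could look like in Python, the standard library's zlib module provides a CRC-32 function. The helper names here are made up for illustration. Keep in mind that CRC-32 only helps detect accidental corruption; it offers no protection against deliberate tampering.

```python
import zlib


def crc32_hex(data):
    """Return the CRC-32 of a bytes object as 8 lowercase hex digits."""
    return format(zlib.crc32(data), "08x")


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC-32 of a file as an unsigned integer.

    The file is read in chunks so that large files do not need to fit
    in memory; the running checksum is passed back in for each chunk.
    """
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum
```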

The examples here are for SHA-256 checksums but can easily be adjusted to SHA-512, for example. OpenSSL can also be used with other algorithms.

SHA checksums

If available, we always want to compare a given checksum with the checksum of the file we downloaded. This is to make sure nothing went wrong during transit, in memory or in storage. Another reason is to make it less likely we fall for a man-in-the-middle attack. Checking the checksum for that reason will only work if the man in the middle is not in a position to manipulate the page that lists the checksum.

First, create or check your checksum file

Before we run a checksum command on a file we need to have a corresponding checksum file from the distributor of the file. For example, I download a Gradle binary distribution and the corresponding checksum file:

https://services.gradle.org/distributions/gradle-6.9.1-bin.zip
https://services.gradle.org/distributions/gradle-6.9.1-bin.zip.sha256

The content of this checksum file is only the hash, as we see here:

$ cat gradle-6.9.1-bin.zip.sha256
8c12154228a502b784f451179846e518733cf856efc7d45b2e6691012977b2fe

The checksum tools that I use on Linux and macOS expect a format like the following:

8c12154228a502b784f451179846e518733cf856efc7d45b2e6691012977b2fe  gradle-6.9.1-bin.zip

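Note that there are two characters between the hash and the filename. The first is a separating space. The second indicates the mode: a space means the file will be read as regular text, and an asterisk means binary. Text mode is what we want, so we end up with two spaces.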

Let's create that file now, so we can use it in our examples:

$ echo "$(cat gradle-6.9.1-bin.zip.sha256)  gradle-6.9.1-bin.zip" > gradle-6.9.1-bin.zip.sha256.checksum

Using sha256sum (GNU)

sha256sum is available on GNU/Linux distributions, as part of the coreutils. As far as I know, on macOS it can be installed through the coreutils package in Homebrew, where the command is named gsha256sum.

$ cat gradle-6.9.1-bin.zip.sha256.checksum | sha256sum --check
gradle-6.9.1-bin.zip: OK

With --status it prints nothing and only sets the exit status: 0 for success and 1 otherwise. This is useful when you want to check the exit status in scripts.

$ cat gradle-6.9.1-bin.zip.sha256.checksum | sha256sum --check --status

We can also use it to create a checksum:

$ sha256sum gradle-6.9.1-bin.zip
8c12154228a502b784f451179846e518733cf856efc7d45b2e6691012977b2fe  gradle-6.9.1-bin.zip

Using shasum (more cross-platform)

shasum is available on Linux distributions and macOS. As far as I know it ships with macOS as part of the bundled Perl, so it should not need a separate install there.

We need to indicate which algorithm to use, with the -a argument.

$ shasum -a 256 -c gradle-6.9.1-bin.zip.sha256.checksum
gradle-6.9.1-bin.zip: OK

Returning a status code works the same as it does with sha256sum:

$ shasum -a 256 -c --status gradle-6.9.1-bin.zip.sha256.checksum

As does creating a checksum:

$ shasum -a 256 gradle-6.9.1-bin.zip
8c12154228a502b784f451179846e518733cf856efc7d45b2e6691012977b2fe  gradle-6.9.1-bin.zip

Using OpenSSL

openssl can also generate the hash for us.

$ openssl sha256 gradle-6.9.1-bin.zip
SHA256(gradle-6.9.1-bin.zip)= 8c12154228a502b784f451179846e518733cf856efc7d45b2e6691012977b2fe
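
If none of these tools is at hand, the same verification can be sketched with Python's standard hashlib module. The function names are made up for illustration, and the parser assumes a checksum file whose first line has the "hash, two characters, filename" layout described earlier.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split '<hash><space><mode><filename>' into (hash, filename)."""
    expected, _, rest = line.strip().partition(" ")
    # The character after the first space is the mode: a space or '*'.
    if rest[:1] in (" ", "*"):
        rest = rest[1:]
    return expected.lower(), rest


def verify_checksum_file(checksum_file, directory="."):
    """Check the file named on the first line of checksum_file."""
    first_line = Path(checksum_file).read_text().splitlines()[0]
    expected, filename = parse_checksum_line(first_line)
    return sha256_of_file(Path(directory) / filename) == expected
```

With the checksum file created earlier, verify_checksum_file("gradle-6.9.1-bin.zip.sha256.checksum") would play the role of sha256sum --check.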

Comparing hashes by hand

With sha256sum and shasum we can let the tool compare the hashes. Maybe we are using a tool that doesn't do the comparison for us, like openssl. In that case comparing hashes can be easy with Python or any other scripting language. We start the console, and copy and paste the hashes to do a string comparison.

$ python
>>> "the hash" == "the hash"
True
>>> quit()

We do need to make sure we did copy and paste the two different hashes, instead of pasting the one hash twice. One way to be sure is copying and pasting something else before we copy and paste the second hash.
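
To avoid trimming the pasted text by hand, the comparison can be wrapped in a small helper. This is only a sketch with made-up function names: it pulls the first run of 64 hexadecimal digits out of each pasted line, so the output of openssl or sha256sum can be pasted as-is. It cannot tell whether the same line was pasted twice, so the caveat above still applies.

```python
import re

# A SHA-256 hash is 64 hexadecimal digits.
SHA256_PATTERN = re.compile(r"\b[0-9a-fA-F]{64}\b")


def extract_hash(text):
    """Return the first SHA-256-looking value in text, lowercased, or None."""
    match = SHA256_PATTERN.search(text)
    return match.group(0).lower() if match else None


def same_hash(first, second):
    """True only if both texts contain a hash and the hashes are equal."""
    first_hash = extract_hash(first)
    second_hash = extract_hash(second)
    return first_hash is not None and first_hash == second_hash
```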

Thinking about how to organise my writing

Taco Steemers Sep 17, 2021 Updated Sep 17, 2021

Thinking out loud about where to place notes, as opposed to blog posts, and how to make them easy to find.

I am often conflicted on where to keep my notes. Specifically, notes that can be public and that I would like to be able to access from work devices as well as private devices. Things I know I will want to look up again in the future. Preferably I would just post them here, on my website. However, I often don't, for a number of reasons.

Information can easily become outdated.
Sharing can feel uncomfortable because I might write something that is incorrect.
I think the notes are not high-quality, in-depth or useful enough for others.
I think nobody will find them anyway, so why share.
I'm not sure where to place them. Should I place them as blog posts or separate notes?

As a result I tend to lose these types of notes, or I don't even bother to create them.

In this post I want to talk about the last point. Where would I place them? Notes will be added to over time. Blog posts don't expand in size, though there may be corrections. Notes don't have a story, they usually consist of a few bullet points, command examples and documentation links. I don't feel that blog posts and notes are a good mix.

Notes and blog posts could both be created as blog posts, and then separated by categories. They would both show up in post lists and share tags. As a result the notes would be easy to find. One downside might be that blog posts don't have any hierarchical sorting. They have only chronological sorting. I could work around that by just using one page per topic, expanding the page over time, and then linking the topics with tags.

The main benefit of adding my notes as blog posts would be that they can then share tags with the actual blog posts. This improves the discoverability of the notes as well as the blog posts. Some notes are already on this website, on the page "Code Notes and Snippets", which itself is hidden on another page. That is not easy to find, and it doesn't feel likely someone else might find the information when they need it.

It would be great if it was easier to find what I wrote. For that reason I think I will start turning my notes into blog posts, even though conceptually they are not a good match. The blog post listing and the category pages will make the notes easier to find.

There are some open questions:

How can I leave them out of the automatically generated RSS feed?
What will I do with the existing notes?
How can I make it really easy to add and edit notes? Currently, my website is statically generated from a specific computer.
How can I make my non-public writing accessible to myself on my private devices as well as my work devices?
What category name should I use for the notes? I already use "General Notes" and "Technical Notes" for my blog posts. Maybe something like "Quick Notes"?

Perhaps I should not have too many categories. The current "Technical Notes" could move to "General Notes". The quick notes could then move to "Technical Notes".

It does occur to me that I am perhaps making things too difficult.

Automatically blocking a git commit if we detect a known mistake

Taco Steemers Aug 9, 2021 Updated Aug 9, 2021

It is possible to automatically block a git commit if it makes a known mistake. This can be done with a pre-commit hook.

Recently I made a mistake. As a result I was thinking about something I hadn't done in a while: making a git pre-commit hook. A pre-commit hook is code that runs before a commit, and can block the commit if there are any problems. People write git pre-commit hooks to help detect problems such as files with cross-platform encoding and line-ending issues, filenames that differ only in capitalization (1), and files that are not allowed to end up in a repository (2).

Another problem that we can catch with a git pre-commit hook is a mistake in URL linking spotted by Pelican. Pelican reports it in the build output: WARNING: Unable to find '/articles/abc.html', skipping url replacement. One way to check for it is to display the build output to the user but store it in a file as well. In the pre-commit hook we then search the stored build output for this warning.
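
A sketch of that check in Python could look like this. The pattern is based only on the warning text quoted above, so the exact wording is an assumption that may not hold for every Pelican version, and the function names are made up.

```python
import re

# Based on the warning quoted in the text; adjust if your build output differs.
WARNING_PATTERN = re.compile(r"Unable to find '([^']+)', skipping url replacement")


def find_broken_links(build_output):
    """Return the link targets that the build reported as not found."""
    return WARNING_PATTERN.findall(build_output)


def check_build_log(log_path):
    """Return an exit code for a hook: 0 if the log is clean, 1 otherwise."""
    with open(log_path, encoding="utf-8") as handle:
        broken = find_broken_links(handle.read())
    for target in broken:
        print(f"Broken internal link: {target}")
    return 1 if broken else 0
```

A hook script would end with something like sys.exit(check_build_log("build.log")), where build.log is a hypothetical name for the stored build output. Git blocks the commit when the hook exits with a non-zero status.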

Many pre-commit hooks have been shared online. You may be able to find some that are useful to you. For example, you will find various indentation related pre-commit hooks if you do a web search on "git pre-commit hook indentation".
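
As an illustration of how small such a check can be, here is a sketch in Python of one of the checks mentioned earlier: finding filenames that differ only in capitalization. The function name is made up, and the list of paths is assumed to come from somewhere else, for example the output of git ls-files.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that are identical when capitalization is ignored.

    Returns a sorted list of groups; each group holds two or more paths.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

A hook could refuse the commit whenever this returns a non-empty list.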

What happened is that I updated my website, changed my mind about the title of the article I had just added, changed it, and updated the website again. The problem is that I also changed the slug, which is what the URL is based on, to match the title. This makes it look like a different article. It is possible that the article appeared twice in some people's feeds, as it did in mine.

Not a big deal. Except that I don't like making mistakes. Especially if they are avoidable.

To make this type of mistake less likely I am adding two small pieces of automation. First is adding a small script that will detect if the RSS feed has changed even though it still contains the same number of article titles. This runs as a git pre-commit hook. This is what I am testing now. The basic idea works, but it does depend on using one commit per article. It does not work if we commit a new article at the same time that we change the slug on an existing article. To me that is acceptable; I already prefer that type of commit, where each commit represents one topic.
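
The core of the first check can be sketched in Python as follows. This is not my actual script: the function names are made up, and it assumes we have the previous and the newly generated feed available as strings. Note that a feed containing a build timestamp changes on every build, in which case comparing extracted fields instead of the raw text would be needed.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/2005/Atom}'."""
    return tag.rsplit("}", 1)[-1]


def entry_titles(feed_xml):
    """Collect the <title> text of every RSS <item> or Atom <entry>."""
    titles = []
    for element in ET.fromstring(feed_xml).iter():
        if local_name(element.tag) in ("item", "entry"):
            for child in element:
                if local_name(child.tag) == "title":
                    titles.append((child.text or "").strip())
    return titles


def looks_like_renamed_article(old_feed_xml, new_feed_xml):
    """True if the feed changed while the number of titles stayed the same."""
    if old_feed_xml == new_feed_xml:
        return False
    return len(entry_titles(old_feed_xml)) == len(entry_titles(new_feed_xml))
```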

Second is adding some scripting that will work as a Pelican pre-upload hook. The script will need to stop the upload if there are any modified files left. In other words, all files must have been committed before we can upload the new website content. This way we know for sure the git pre-commit hook has been run.
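
One way to sketch the second check in Python is to ask git for its machine-readable status, which is empty when there is nothing left to commit. The function names are made up. In this sketch untracked files also count as "not clean", which may or may not be what you want.

```python
import subprocess


def uncommitted_paths(porcelain_output):
    """Parse `git status --porcelain` output into a list of paths.

    Each line starts with two status characters and a space.
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def working_tree_is_clean(repo_dir="."):
    """Return True if git reports no modified, staged or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return not uncommitted_paths(result.stdout)
```

The upload script would then refuse to continue when working_tree_is_clean() returns False.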

Footnotes

1. Filesystems used by Windows ignore capitalization in filenames by default. Many other filesystems do not. This results in problems when changing the capitalization of a version-controlled filename on a Windows computer. This change may not end up in a commit. As a result a build may succeed locally, but fail on other computers.

2. See my article "Some files and information should not be in source control" for reference. By creating a pre-commit hook for these situations we can detect these types of files before we commit them.

Avoid unmaintained and undifferentiated forks on your repository hosting profile

Taco Steemers Aug 8, 2021 Updated Aug 8, 2021

By forking many projects some people end up with too much on their profile page. Their original projects are hard to find. If you want to check out a project you can make a local clone. When you fork a project online, you are offering that project in that state. People might take you up on your offer and start using it as-is. You have a responsibility there.

When viewing people's profiles on web applications centered around repository hosting services, such as GitHub, GitLab and Bitbucket, we can find profiles that are full of forks of projects that this person does not seem to have done any meaningful work on. I think this is because some people click the "fork" button on any project that they want to play around with. That is not necessary, and it has a downside.

By forking many projects they end up with their original work mixed in with pages full of forked projects. Their handful of original projects are difficult to find among the many forks. To the casual profile visitor it looks as though they don't understand how version control can be used well together with these online platforms.

If you want to check out a project, just do "git clone" or equivalent on the main project. Create a local copy. Rebase it when you want to pull in updates. If you have changes you can go through that project's steps for contributors to get your changes included. This can be as simple as opening a pull request in whichever way is standard for that repository hosting service. Exact details will be different for each project. If it is unclear you might search for contact details of current contributors and ask them how to proceed.

If they don't want to merge your changes you can consider forking. When you do fork, you will have to keep your fork up to date. If not because of the feature updates then at least because of the security updates that may have been done on the original project. Note that updates to which versions of dependencies the project uses can also include security updates. As a result, security improvements may be mixed in with other types of updates.

When you fork a project online, you are offering that project in that state. People might take you up on your offer and start using your fork as-is, instead of the original. You have some responsibility there.

Diagramming can be a valuable tool for thinking as well as communication

Taco Steemers Nov 21, 2020 Updated Nov 21, 2020

Words alone are sometimes not sufficient. Diagrams can help us understand a situation in new ways, assisting with our thinking as well as communication. If you don't diagram already, try it out!

Table of contents:

Why create diagrams?
Where I am coming from on this topic
What does a diagram look like?
Pick any tool to start
Automatic class diagrams
Conclusion

Why create diagrams?

I don't do a lot of diagramming, but occasionally it can be a valuable tool. Diagrams can help us understand a situation in new ways. Words alone may not be sufficient, especially if the people who are trying to communicate come from different backgrounds or don't share a native language. Diagramming can be a fast and powerful way to communicate. The diagram can be a tool for thinking, individually or as a group. It can be more than just a way of creating documentation. It doesn't always need to be 100% correct when it is used as a tool for generating new insights. If you don't diagram already, try it out!

Where I am coming from on this topic

Some people may have learned to make diagrams while they were studying. A lot of people who program or design systems did not study computer science and may not be familiar with the practice. The same is true for myself.

Over the years I became accustomed to reading diagrams. Some people I worked with were old-school computer scientists who used a specific notation (UML) that took some time getting used to, especially as it wasn't really explained to me, and I did not have reference material. Occasionally I was tasked with creating class diagrams in specific UML tools, but personally I don't find creating hand-made class diagrams useful because they become outdated. When I was thinking about a new system or trying to find the source of bugs in a complex interaction I would doodle a bit on paper, but I was not comfortable sharing these drawings because I did not like the idea that people would think I didn't do them 'right'. Due to all this I quickly forgot about diagramming when I moved on to other assignments.

Today, I don't want 'right' to get in the way of 'useful'. I don't think we need to go all-in with the Unified Modelling Language (UML) to enjoy the benefits of the occasional quick diagram.

What does a diagram look like?

The official PlantUML site and Real World PlantUML have good examples that might get you inspired, and show the basic building blocks. A specific page I want to share is the deployment diagrams overview on uml-diagrams.org. I think it is a good demonstration of how a lot of useful information can be expressed in a diagram.

Of course there are also different situations where you might diagram, such as describing use cases.

My own PlantUML notes page shows the source of this example diagram.

Personally I don't bother too much with the 'correct' ways to draw something. As long as the general idea is clear the diagram can be of value. I do think that it will benefit the usefulness of your diagrams if you look up the basic conventions for the type of diagram you want to make. For example, there is a specific way to draw the connection between an interface and an implementation. It needs to be clear which element is the interface, which elements are using the interface and which elements are implementing the interface. It is good to use notation that people may already be familiar with and that can still be understood in the future, when we have moved on.

Pick any tool to start

I like the PlantUML tool. I use it as a command on the command line, and write the diagram in whichever plain text editor is at hand. They also provide an online service. I think PlantUML is a low-complexity way to go. Unfortunately error messages can be short and unclear. The example pages I linked to earlier can be handy to see how types of diagrams can be made.

Personally I feel that pencil and paper or any drawing program can be a valid tool for diagramming. Especially when we just want to get the basic ideas on paper, in a way that can be used as a starting point for further thinking.

If we are using software it can be handy if we also have a drawing program with support for multiple layers. Then we can add notes on top in a different color, like we might do with pencil and paper during a conversation with our colleagues. The multiple layers then make it possible to switch layers of notes on and off.

Pick any tool from a search result. Try out several if you like.

Automatic class diagrams

Sometimes we want to have class diagrams for existing codebases. Luckily many integrated development environments support creating them automatically. I have done so in the past with Microsoft's IDEs. IntelliJ IDEA also seems to support automatic class diagrams.
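
To give an idea of what such automatic generation boils down to, here is a toy sketch in Python that emits PlantUML source for a handful of Python classes. It is not how any IDE does it. The function name is made up, and it only draws inheritance between the classes it is given.

```python
def plantuml_class_diagram(classes):
    """Return PlantUML source with one class per entry plus inheritance arrows."""
    lines = ["@startuml"]
    for cls in classes:
        lines.append(f"class {cls.__name__}")
    for cls in classes:
        for base in cls.__bases__:
            # Only draw arrows to classes that are part of the diagram.
            if base in classes:
                lines.append(f"{base.__name__} <|-- {cls.__name__}")
    lines.append("@enduml")
    return "\n".join(lines)
```

The returned text can be saved to a file and rendered with the plantuml command.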

Conclusion

Diagramming can be a great way to jot down your thoughts, and offload whichever system design you are thinking about to paper, or computer. It allows us to think about the system in new ways. This is difficult to explain in words, and best experienced by yourself. If you don't see the use yet, just keep it in mind for the next time you need to communicate your ideas, or analyze a system.

On clarifying the status of demoed products

Taco Steemers Oct 25, 2020 Updated Oct 25, 2020

Giving a demonstration of a future product or feature can be a great way to check if development is on the right track. Unfortunately stakeholders don't always understand that a demo can be very far from a finished product.

Giving a demonstration of a possible product or feature can be a great way to get feedback and check if development is on the right track. Unfortunately stakeholders and customers don't always understand that a demo can be very far from a finished product.

A discussion of this phenomenon can be found here on Hacker News.

Personally I can't recall having had a lot of issues with this when the customer is inside my own (technical) organisation. It did get me thinking. It is important to tell our outward-facing colleagues that based on this demo we cannot make any promises to anyone outside the organisation. Unfortunately a generic statement like that may not be taken seriously, because it looks like standard boilerplate. Perhaps an analogy to the physical world might help? "What we are showing today is similar to an architect's 3D model of a house. We want to show the 3D model to get feedback. No work has been done on the actual house, and it might take a long time to build." This type of analogy may seem silly, I understand that. But I would rather be safe than sorry.

The situation gets more difficult when it comes to external customers. They may not be familiar with how long it can take to fully develop a feature or product, and get it ready for launching. They might think we have more people working on the project than we actually have. In our enthusiasm to show what we are working on we might end up turning the customer against us. It will look like the project is not working out if there are no regular updates after a demo. Enthusiasm will go down over time even if there are regular updates.

I think there should only be a project demonstration if we can give ourselves a hard deadline that we will make no matter what happens. That means the project needs to be at a certain stage of maturity before we demo it. It also means that we need to be able to add more people, or the right experienced people, when the project has setbacks. We need to be certain we actually want to do the project, and that we are able to. Once we have demonstrated the future product to customers, it is bad form to cancel it.

Some files and information should not be in source control

Taco Steemers Oct 22, 2020 Updated Oct 22, 2020

Which are they, what should we do with them instead, and how can we avoid mistakes?

Some files and information should not be stored in version control systems such as git. Which are they, what should we do with them instead, and how can we avoid mistakes?

Table of contents:

Secrets
Secrets, in practice
Generated files
An exception, generated interfaces
Other files
What about backups?
What about documentation?
How to avoid adding secrets to version control
How to avoid adding unwanted files to Git

Secrets

Examples of files that do not belong in a version control system are (unencrypted) files containing API credentials, keys, and anything else that is supposed to stay secret.

Over the course of a project's lifetime many people might get access to the source code. This makes the source code an unsafe place to keep secrets.

Another aspect is that some secrets change, such as credentials. Changing them is easier if the secrets are stored separately from the source code. If we had them in a distributed source control system it would be more work to change them for all active versions of the software. It would also require a new release and deployment.

Loose secrets and files that contain secrets are usually made available to the applications through environment variables, or a shared data source such as a secret store. The environment variables can be set by the software responsible for deploying, starting and stopping the applications. An example of a secret store is Vault.
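
On the application side, reading a secret from an environment variable can be sketched in Python like this. The function name, the exception class and the variable name EXAMPLE_DB_PASSWORD are made up for illustration. Failing early with a clear message tends to be easier to debug than a failed connection later on.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided by the environment."""


def read_secret(name, environ=None):
    """Return the secret stored in the environment variable called name.

    environ defaults to the process environment; passing a dict makes
    the function easy to test.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        # Only mention the variable name, never a secret value.
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value
```

An application would call read_secret("EXAMPLE_DB_PASSWORD") once at startup and pass the value on to its database connection code.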

Secrets, in practice

In practice, we are likely to find that both methods are used to make secrets available. The URI of the secret store could be stored in the application database. To get that information we need to connect to the application database first. We can't connect to a database to get the database password from the database. Thus, the database connection details may be passed through environment variables, or the deployment software may write them to a file on the application server.

Generated files

Files that are the result of build steps, such as output from generators and compilers, should not be added to a source control management system. These are not source files. Any changes made to them will be overwritten the next time the build is run. Another example is files created by runtime environments, such as the __pycache__ directory which is created when a Python program is run.

An exception, generated interfaces

As far as I know there is only one type of exception to the rule. Generating a SOAP interface from a local WSDL file during every build is a waste of resources. It can be an acceptable solution to generate it once and add the output to the project source files. An alternative to adding the output to version control is to package it as an artifact (dependency) and add it to the organization's private artifact repository.

Other files

An example of other, more mundane files is the .DS_Store file. This is a macOS file for storing details of how a directory needs to be shown on the desktop. It is unrelated to the project.

IDE files such as IML files and .idea directories should also not be added. These contain the developer's personal settings and preferences. Occasionally we may share them to help a new colleague get up and running, but it is not part of the project source code.

What about backups?

There is no need to store several versions of the file next to each other in the project directory. The version control system controls file versioning. The previous version of the file is the backup.

Database backups don't belong in the source control system. They belong on a properly secured storage server.

What about documentation?

Personally I feel that some level of documentation can be good to add. This includes instructions about development dependencies, local development setup, and documents concerning integration with external APIs. Having this type of information close at hand can be very helpful to developers.

How to avoid adding secrets to version control

This is a problem that probably does not have a full technical solution. Awareness is key.

There are projects such as git-secrets that try to solve this. Personally I have not used git-secrets or similar tools. Secrets detection is tricky to automate and won't be fool-proof. I imagine that they can detect secrets that they already know; common secrets such as AWS related credentials. Secrets specific to your software on the other hand, I expect to be difficult to detect. The creators of the tool are not familiar with them.
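
To make the limitation concrete, here is a minimal sketch of pattern-based detection. It is not how git-secrets is implemented; it only assumes the well-known shape of an AWS access key ID, and the second line of the sample is a made-up application-specific secret.

```javascript
// AWS access key IDs have a recognisable shape: "AKIA" followed by 16
// uppercase letters or digits. Known formats like this are the easy case.
const AWS_ACCESS_KEY_ID = /\bAKIA[0-9A-Z]{16}\b/;

// Returns the 1-based line numbers that appear to contain an AWS access key ID.
function findSuspectLines(text) {
    const suspects = [];
    text.split("\n").forEach((line, index) => {
        if (AWS_ACCESS_KEY_ID.test(line)) {
            suspects.push(index + 1);
        }
    });
    return suspects;
}

const sample = [
    "aws_key = AKIAIOSFODNN7EXAMPLE",     // example key from the AWS documentation
    "internal_api_token = blue-horse-42", // hypothetical application-specific secret
].join("\n");

console.log(findSuspectLines(sample)); // only the first line is flagged
```

The second secret goes unnoticed because nothing about its format is known to the scanner.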

How to avoid adding unwanted files to Git

Git has a special file, the .gitignore file. It lists the files and patterns that Git should not add to the source control system. The .gitignore file itself is normally committed, so that every developer can benefit from it.

This file is easy to create. Here is an example .gitignore file for a website project generated with Pelican. The developers are using macOS computers and the IntelliJ IDEA development environment.

.DS_Store
*.iml
.idea
generated/
pelican/output/
pelican/__pycache__/

As we can see, we ask Git to ignore the macOS-specific file, the IDEA-specific files and directories, and the output directories.

tag:tacosteemers.com,2020-10-23:/articles/2020-10-23-some-files-and-information-should-not-be-in-source-control.html

Prefer to create constructive or uplifting conversations at work

Taco Steemers Oct 17, 2020 Updated Oct 17, 2020

It is healthy to discuss negative parts of situations. When we are part of such a discussion, it is good if we can turn it into an uplifting or constructive conversation.

It is natural for people to focus on negative parts of situations. We experience something we don't like and want to discuss it. This can be healthy. We are letting off some steam, as it were.

Occasionally we might be actively participating in a discussion with a negative tone. If we don't take action or give actionable advice then all we are doing is complaining. Complaining is not helpful if we do it too often. On top of that, the situations we are bringing up for discussion are probably not new to our colleagues. They know about it, they know it is not optimal. They haven't had the energy or drive yet to fix it. If we are not bringing something constructive or positive to the discussion we will end up taking more of their time and energy.

It has become clear to me that this type of conversation tends not to become a constructive conversation, unless we make a conscious effort. If we let the conversation flow naturally it is rare for this kind of conversation to become one where solutions are offered and follow-up actions are defined.

When this kind of conversation comes up with colleagues, we have three good options. Option one is to let people just get it out, but keep it short. If the topic continues, we can go to options two and three. Option two is to share actionable advice now or even offer to solve the problem, if we can do so and the person is open to it. If we can't, option three is to offer to schedule a meeting with people who might. At this point the people we are talking with will indicate whether there really is a problem that needs solving, or whether they were just letting off some steam.

Sometimes a complaint comes up that we just can't really do anything with. This can cost us a lot of energy. The best way to handle this type of conversation may be to acknowledge the complaint, but add something that turns it into an uplifting conversation. Preferably there is something happy to be said related to the complaint. If the complaint goes on too long, and we can't think of anything relevant and useful to say, we can always transition the conversation to the good weather, sports, or whatever happened in a famous television show.

tag:tacosteemers.com,2020-10-18:/articles/2020-10-18-prefer-to-create-constructive-or-uplifting-conversations-at-work.html

Dark and light web themes: consider using a hybrid CSS/JS implementation

Taco Steemers Oct 16, 2020 Updated Oct 16, 2020

Instead of using either CSS media queries for operating system theme preferences or a JavaScript-based theme selector we can use both: automatic CSS-based switching, plus JS-based switching where the user can choose.

An excellent article on website dark mode and light mode implementation can be found on CSS-Tricks. It describes a style sheet-based implementation and a JavaScript-based implementation. The style sheet-based implementation uses the user's operating system preferences to automatically select a dark or light theme. The JavaScript-based implementation allows a user to select the theme they want to use.

Adding to that article, I want to advocate for a hybrid approach where we use both: automatic CSS-based switching, plus JS-based switching where the user can choose.

The advantage

The advantage is we can add theme selection, and the default theme can be the preferred theme as configured on the operating system level. This way we have all options open for users that browse with JS on. Users that browse with JS disabled will still get the style that they have selected as the preferred style in their operating system preferences.

The themeswitcher control

The tutorial linked to at the beginning of this article shows an example implementation. The tutorial code removes CSS classes and adds CSS classes when the user switches to a different theme.

My own implementation changes the entire stylesheet. This is done by changing the stylesheet href.

The stylesheet is linked as follows: <link rel="stylesheet" id="css_colors" href="/css/colors/1.css" />

It can be changed with Javascript in the following way: document.getElementById("css_colors").href="/css/colors/2.css";

The manual themeswitcher on this website is currently a big dropdown control. That is not necessary. It can be a button, an anchor link or a simple icon.
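
As a rough sketch of how the stylesheet swap could be combined with a remembered choice: the stylesheet paths, the "theme" storage key and the function names below are assumptions for illustration, not the code used on this website. The "auto" entry stands for the stylesheet that switches through media queries.

```javascript
// Hypothetical stylesheet paths; "auto" is the stylesheet that switches via media queries.
const THEME_STYLESHEETS = {
    auto: "/css/colors/auto.css",
    light: "/css/colors/light.css",
    dark: "/css/colors/dark.css",
};

// Pure helper: an unknown or missing choice falls back to the automatic stylesheet.
function stylesheetFor(choice) {
    return Object.prototype.hasOwnProperty.call(THEME_STYLESHEETS, choice)
        ? THEME_STYLESHEETS[choice]
        : THEME_STYLESHEETS.auto;
}

// Browser-only wiring: apply the choice and remember it for the next visit.
function applyTheme(choice) {
    document.getElementById("css_colors").href = stylesheetFor(choice);
    localStorage.setItem("theme", choice);
}

// On page load, restore a previously stored choice, if any.
if (typeof document !== "undefined") {
    applyTheme(localStorage.getItem("theme") || "auto");
}
```

If the link tag in the HTML points at the automatic stylesheet by default, visitors without JS still get the theme that matches their operating system preference.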

The downside to my hybrid approach

The downside to my hybrid approach is that there is some duplication. For the CSS-only functionality we need a stylesheet that uses switching based on media queries. For the JS-based functionality we need to be able to load a stylesheet specific to the chosen theme. The contents of these stylesheets would partially overlap. As a proponent of the 'do not repeat yourself' idea this is a downside. Duplicated source code makes it easy to create inconsistencies.

Solution to the downside

To avoid having to write duplicate stylesheet contents we can generate the required stylesheets from de-duplicated input files. My scripts for generating the css can be found here.

Let's take a look at how this would work.

The three color stylesheets can be generated from three input files:

A file containing the CSS rules that apply the color variables
A file containing the light mode color variables
A file containing the dark mode color variables.

We might have four stylesheet files for the website:

The general stylesheet that does not contain color information
The general color stylesheet that contains both light and dark mode color information, for operating system preference support. This is input files 1, 2 and 3 combined in a specific structure.
The light mode stylesheet for the JavaScript switching support. This is input files 1 and 2 combined.
The dark mode stylesheet for the JavaScript switching support. This is input files 1 and 3 combined.

Based on this idea, the general color stylesheet would look like this:

/* This section is identical to the light mode CSS file contents */
:root {
    color-scheme: light dark;

    --text-color: black;
    --background-color: white;
}

@media (prefers-color-scheme: dark) {
    /* This section is identical to the dark mode CSS file contents */
    :root {
        color-scheme: light dark;

        --text-color: white;
        --background-color: black;
    }
}

/* This section contains the rules for applying the color variables */
body {
    color: var(--text-color);
    background-color: var(--background-color);
}
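
The generation step could be sketched as follows. This is not the script linked above, only an illustration with made-up input contents and a hypothetical function name, and it assumes that the stylesheets for JavaScript switching combine one set of variables with the shared rules.

```javascript
// Hypothetical contents of the three input files.
const sampleRules = "body {\n    color: var(--text-color);\n    background-color: var(--background-color);\n}\n";
const sampleLight = ":root {\n    --text-color: black;\n    --background-color: white;\n}\n";
const sampleDark = ":root {\n    --text-color: white;\n    --background-color: black;\n}\n";

// Indents every non-empty line so the dark variables nest neatly in the media query.
function indent(css) {
    return css.replace(/^(?=.)/gm, "    ");
}

// Builds the three color stylesheets from the de-duplicated inputs.
function buildColorStylesheets(rules, lightVariables, darkVariables) {
    return {
        // Operating system preference support: light by default, dark via media query.
        auto: lightVariables
            + "\n@media (prefers-color-scheme: dark) {\n" + indent(darkVariables) + "}\n\n"
            + rules,
        // JavaScript switching support: one fixed theme per file.
        light: lightVariables + "\n" + rules,
        dark: darkVariables + "\n" + rules,
    };
}

console.log(buildColorStylesheets(sampleRules, sampleLight, sampleDark).auto);
```

A real build script would read the inputs from disk and write the three results to the website's CSS directory. Each color value then lives in exactly one input file.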

Conclusion

If we only support the automatic theme selection a user might be forced to use a theme that they don't want to use. The user might prefer their operating system controls in a dark theme, but that does not guarantee that they want to see our website in our website's dark theme. For that reason I prefer having the automatic system as well as giving the user the option to choose a theme. It takes a bit more work to support both, but it helps make our websites more accessible.

tag:tacosteemers.com,2020-10-17:/articles/2020-10-17-dark-and-light-web-themes-consider-using-a-hybrid-approach.html

Usability anti-pattern: no controls in fullscreen

Taco Steemers Oct 15, 2020 Updated Oct 15, 2020

A fullscreen window needs to have an obvious button to get out of fullscreen mode.

General observations on the Zoom user interface

The Zoom video meeting desktop application has different windows and types of controls. It regularly uses three main windows. These windows have overlays, drop down menus, and regular buttons.

Some buttons move around between different windows, based on the currently active features. For example, the mute / un-mute button moves around based on whether screen sharing is active on your machine.

Some labels look like buttons but are in fact not buttons.

There is a taskbar on the bottom of one of the main windows that contains buttons. Other buttons and labels can be found on the top edge, or above the bottom taskbar.

There is also one main window that has a right-click menu.

There are other separate windows that pop up when clicking controls on the main windows, such as the chat window.

As a whole it is messy. Some of the user interface choices are perhaps understandable due to the functionality in the application.

Fullscreen mode

The least user-friendly part of the application can be seen during fullscreen mode. When a participant shares a screen it is immediately shown fullscreen to the other people.

During some meetings the shared screen window has a menu that appears when the mouse comes close to the top edge. It can be used to exit fullscreen.

Unfortunately there appears to be a different type of screen sharing as well. During other meetings the fullscreen shared window does not have any button to turn the fullscreen window into a regular window. There are no controls at all. Moving the mouse or clicking the screen does not make any controls appear. The escape key does not work. In a sense, Zoom locks the user out of their computer. It turns out that double-clicking the video on the shared window exits fullscreen mode. The double-clicking is not obvious. It is the only double-clicking in the application.

Conclusion

It should be easy to escape fullscreen mode. Especially in the case of Zoom screen sharing. The user did not choose to enter fullscreen mode. Zoom did that by itself. This is confusing to the user.

tag:tacosteemers.com,2020-10-16:/articles/2020-10-16-ux-anti-pattern-no-controls-in-fullscreen.html

Thoughts on self-managing teams

Taco Steemers Oct 7, 2020 Updated Oct 7, 2020

Self-managing teams are great. People feel their shared responsibility and come together to solve problems. There should be someone that is authorized to make decisions on specific topics. And the topic 'people' should not be forgotten.

A self-managing team is a team without supervisory managers. Everyone who works toward the mission is an active member of the team. This kind of team can also be called a flat team. I find there to be many upsides to being part of a flat team. These mainly come from the fact that it allows the subject matter experts a great deal of freedom to decide how the work will be done.

Mutual decisions based on shared responsibility

It seems widely accepted that a software product development team needs a product owner. Someone who is a kind of customer representative, and keeps track of progress towards the long-term vision of the product and the business. Another widely accepted role, either implicit or explicit, is the technical lead, or technical product owner. The one person who can be relied upon to make the final decision on technical matters. We may not always agree with their decisions, but we respect the decisions. Because they are part of the team they share our lived experiences and should be taking them into account.

Should we have any other 'special' colleagues apart from these two roles? Perhaps not. We take away shared responsibility every time we give a specific person a specific title. For example, people feel a bit less responsible for sticking to the product vision if they are not the product owner.

Downsides

There can be downsides to a flat team too. In thinking about software teams and product teams we tend to forget that these are made up of people. The people in the team have a big impact on the success of the team's mission. People also need tending to. In a hierarchical team this responsibility would fall to the supervisory manager. In a flat hierarchy we can get into uncomfortable situations if teams don't have a specific person to discuss people problems with. There is nobody to go to for an outside perspective if there seems to be a problem in the team. There is nobody who is in a position to make a decision. If this person is part of the team they might become biased. If they are outside of the team they won't know what is going on.

This principle can also apply to other topics, such as budgets and spending.

Outside influences

The average business consists of more parts than just the one flat team. There are sales teams, account management teams, customer support teams, and others. A common outside influence on a software development team is when the sales team or account management team makes a promise to a customer that is up to the software development team to fulfill. It is not acceptable if someone else can make a decision about what the team can work on. The team is not a self-managing team if someone else can make a decision without involving the team. In that case the team is at best just a normal team. It doesn't have the power to say yes or no. At worst, the team can become stuck between conflicting decisions by others. This will reduce the team's impact and quality of work.

There is nobody to protect the team because there is no officially appointed team manager. The self-managing team must protect itself and clarify that any commitment that involves the team must be pre-approved by the team.

Conclusion

Flat teams are great. Generally people feel their shared responsibility and really come together to solve problems. However, even in a flat team there should be someone that is authorized to make decisions on specific topics. And the topic 'people' should not be forgotten.

tag:tacosteemers.com,2020-10-08:/articles/2020-10-08-thoughts_on_self-managing_teams.html

Git: how can we squash (flatten) commits

Taco Steemers Sep 26, 2020 Updated Sep 26, 2020

In this article I explain two ways in which we can squash commits.

Earlier I wrote about when (not) to squash commits. To squash commits means to flatten, or combine, a series of consecutive commits into one commit. Here I explain two ways we can combine commits.

Squashing consecutive commits on your feature branch

The first way is interactive rebasing. With this tool we can choose to combine several commits that only exist on this branch into one commit. In my opinion, interactive rebasing is only a good idea in these two situations:

you rebase right before merging the feature branch into the main branch, after this branch has been reviewed, and you know no-one has branched off of this branch
you know you are the only one working on this branch, and as a result there is no possibility that you will rebase commits that someone else is basing their own commits on

The basic example is as follows. We have finalized part of our work on this feature branch. This part is spread out over 5 commits. We would like to flatten this into one commit for this part of the work, rather than keeping the 5 separate intermediate commits. This can be done with the following command:

git rebase -i HEAD~5

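Note that the sign before the 5 is the tilde, ~, not the minus sign. This command will fail if the history contains only 5 commits in total, because HEAD~5 then refers to a commit that does not exist. In that case git rebase -i --root can be used instead.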

We will be presented with the text editor. It shows a list of commits, ordered from the oldest on top to the newest at the bottom. Here we will indicate which commits we want to pick (use as a basis to flatten on) and which commits we want to squash away. The text editor window will contain an explanation of all possible commands.

We will pick the first and squash the rest. We don't need to type out the full squash. Just replace pick with s.

After we save the file and exit the editor we will receive instructions on how to proceed.

How does this look in a bigger process?

Let's look at the situation where we are manually merging a feature branch into the main branch. Our feature branch has received 5 commits since it was branched off of the main branch. These commits have passed the review phase. Now we want to combine these commits into one commit and add that to the main branch. This one commit will be easier to backport than several commits, and easier to understand for future review than 5 separate commits would be.

First we update the main branch:

git checkout main
git pull

Then we squash our 5 commits on the feature branch:

git checkout feature
git rebase -i HEAD~5

The next step is where we rebase our feature branch with the current state of the main branch. In other words, we put the current state of the main branch below the additional commits that we added to the feature branch. It is possible that there are conflicts that we need to resolve by hand. We don't commit the changes that follow from the conflicts. We only stage the changed files.

git rebase main

If we are resolving conflicts, we can get a hint on how to proceed from git, by asking for the status.

git status

When we are done resolving conflicts, we finalize the rebase:

git rebase --continue

If we feel that something is going wrong, we can abort the rebase:

git rebase --abort

After finalizing the rebase we should perform our testing again. Once we are satisfied that everything is as it should be, we can merge the feature branch to the main branch.

Squash everything, while merging to the main branch

A second way we can combine commits is the actual git merge --squash command.

For conversation about this workflow: https://stackoverflow.com/questions/5308816/how-to-use-git-merge-squash

Personally I don't use this workflow by hand, as it is automated away where I work.

First we update the main branch.

git checkout main
git pull

Then we rebase the feature branch with the current state of the main branch, and resolve conflicts by hand.

git checkout feature
git pull
git rebase main

After we finish rebasing and resolving conflicts we should perform our testing again.

The next step is to check out the main branch and merge --squash the feature branch onto it:

git checkout main
git merge --squash feature

Finally, the changes need to be committed with git commit. The merge --squash command stages the combined changes but does not create the commit itself. You might want to write a new message for this, using git commit -m.

tag:tacosteemers.com,2020-09-27:/articles/2020-09-27-git-how-can-we-squash-flatten-commits.html

The Twitter share button does not need javascript

Taco Steemers Aug 29, 2020 Updated Aug 29, 2020

When we let Twitter generate a 'tweet this button' for our website, Twitter includes a javascript file. We don't need to include this javascript file. The share button can be just a hyperlink. Twitter uses the referrer header to determine which URL the user wants to tweet about.

The javascript does not benefit the reader. It might include tracking features that the reader does not like.

Twitter generates the following HTML:

<a href="https://twitter.com/share" 
    class="twitter-share-button" 
    data-via="{{TWITTER_USERNAME}}" 
    data-text="{{ article.title }}" 
    data-dnt="true" data-show-count="false" 
    data-count="horizontal">Tweet about this article</a>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

We only need the a-tag. The Twitter-specific anchor attributes become unused after removing the javascript. We can simplify the anchor to the following:

<a href="https://twitter.com/share?url=url_of_this_page">Tweet about this article</a>

In Pelican, the URL would be built up as follows:

https://twitter.com/share?url={{ SITEURL }}/{{ output_file }}
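
Because the page address is passed as a query parameter, it is safest to percent-encode it. A minimal sketch, assuming we build the link in JavaScript; the function name twitterShareUrl and the example.com address are made up for this example:

```javascript
// Builds the share link; URLSearchParams percent-encodes the page address.
function twitterShareUrl(pageUrl) {
    const shareUrl = new URL("https://twitter.com/share");
    shareUrl.searchParams.set("url", pageUrl);
    return shareUrl.href;
}

console.log(twitterShareUrl("https://example.com/articles/my-article.html"));
// https://twitter.com/share?url=https%3A%2F%2Fexample.com%2Farticles%2Fmy-article.html
```

On a statically generated site the same encoding would be applied once, at build time, so that the page itself still needs no JavaScript.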

If we want to have an icon then we can do that with an image. To get the image file you can copy one from Twitter, or any other share button such as the one on this website.

<div>
    <img class="icon" src="/{{ THEME_STATIC_DIR }}/images/twitter.png" alt="Twitter"/>
    <a href="https://twitter.com/share">Tweet about this article</a>
</div>

The image would preferably be hosted at a location that you control. The src attribute needs to refer to the exact location of the image.

tag:tacosteemers.com,2020-08-30:/articles/2020-08-30-twitter-share-button-does-not-need-javascript.html

Git pull without merge

Taco Steemers Aug 17, 2020 Updated Aug 17, 2020

Applying remote changes to our local branch without an additional merge commit

The problem

I want to update my local git branch based on the changes that are on the remote branch. If I just do git pull, I will need to do a merge. I don't want to get an additional commit just for merging. Instead, I want my local changes to be applied on top of the changes that have been made to the remote branch.

A solution

git pull --rebase

There will be no additional commit for merging. Instead, Git replays our local commits on top of the latest commit of the remote branch, and any changes needed to make them fit end up inside those local commits. Now it looks like the branch has always been based on the current state of the remote branch.

Background

This is what we call rebasing. One might say that the state we base our changes on is re-done.

There is also interactive rebasing, but that is a different topic.

We should not do git pull --rebase on the same main branch that other people are pushing work in progress to as well. That will get messy really fast. One proper way to handle that type of situation is to use a feature branch workflow.

tag:tacosteemers.com,2020-08-18:/articles/2020-08-18-git-pull-without-merge.html

When (not) to squash commits

Taco Steemers Jul 9, 2020 Updated Jul 9, 2020

In this article I want to talk about squashing (flattening) a series of commits into one commit. Squashing is a good tool to have, but not everything should be squashed away. Intermediate commit history can have value.

Summary: Squashing is a good tool to have, but not everything should be squashed away. Intermediate commit history can have value. Please don't squash rework on a PR during the review phase. If you do, the reviewers may have to start over from scratch.

In this article I want to talk about when it is appropriate to squash (flatten) a series of commits into one commit. I assume that we are using a branch for each issue (such as a bug or a feature), and we merge that branch to the main branch when it is considered finished, tested and properly reviewed.

Why do we squash during development?

In my experience it is good practice to commit often. It allows us to easily revisit or roll back any specific step that we take towards the final version of our code. This makes the process of finding the desired solution easier. When it comes time to share our work, we might want to squash our commits.

After we decide that we have found a sufficient solution for a specific problem, the road we took may not seem interesting anymore. At that point we are only interested in having solved the issue at hand. We might then squash a series of commits that we made while working towards this solution, and move on to the next problem.

There is another reason one might squash their commits. When we are writing code we make that code permanent by committing the code locally and eventually pushing it out to a central server, after which our code will be available to everyone. Sometimes we feel kind of bad about our earlier commits and don't want to make them permanently visible to others. Perhaps they were failed attempts and would show to others how little we understood when we started to work on this issue. This is also a valid reason to squash, and I don't want to minimize these kinds of feelings.

Whatever the reason, we decide to remove the earlier commits from existence such that only the final result remains in one neat commit.

Squashing is a good tool to have

Having a series of commits squashed into one can be very handy when that commit eventually ends up on the main branch; with each issue contained in one commit it is a lot easier to research changes than when changes related to single issues are spread out over ten or a hundred commits. For that reason, I always squash before I merge an approved pull request.

Another reason is that it makes cherry-picking less error-prone and less time-consuming.

Squashing is great. However, there are also situations where I think we should not squash commits.

Not everything should be squashed away

Some pull requests contain a large number of changes done in several steps that are clearly separate topics in the minds of the developers. In these cases it can be beneficial to do these steps one by one and squash the commits for one step into one commit before proceeding to the next step. This will make the end result easier to review because it is still separated into understandable steps. It also makes it easier to get review comments on intermediate work; this can be important if the requested changes have an impact on the next steps.

A situation where one should not squash at all is when one is committing rework in response to changes requested by reviewers. The reviewers are not as focused on your work as you are. They did the review and went on to another activity. By the time they get back to your pull request to see what has changed since the last time they looked at it, the reviewers will not remember the exact state of your branch before you did rework and squashed the branch again. The reviewers probably can't be certain if your rework has even improved the branch. Your reviewers will have to redo the entire review as a result of you squashing the rework on top of your original work. By the time they are half-way through re-reviewing your work they might be quite tired and have difficulty keeping focused on giving the pull-request a high-quality review.

We can save our colleagues as well as ourselves some time by taking a moment to consider before we squash. Our reviewers will be grateful!

tag:tacosteemers.com,2020-07-10:/articles/2020-07-10-when-(not)-to-squash-commits.html

Please use root relative URLs

Taco Steemers Jun 20, 2020 Updated Jun 20, 2020

Resources with URLs that do not specify the root will not be loaded if the user visits a subpage directly.

Recently I fixed a bug in one of my employer's web applications that would only sometimes occur, and only when using Internet Explorer 11. It had to do with application behavior that requires JavaScript.

Our mind may race to some obscure JavaScript Internet Explorer facts like the missing URL object constructor or that console.log() will fail if the console is not open. The actual cause was much more mundane.

The URL to the JavaScript relating to this feature was not relative to the root. It was missing the initial forward slash, /. As a result the browser resolves it relative to the path of the page in which the reference occurs. Resources with URLs that do not specify the root will not be loaded if the user visits a subpage directly. This is an easy mistake to make, but less easy to discover due to the modern web application and browser landscape.

Some modern web applications consist of a single page. This type of application is referred to as a Single Page Application, SPA. All JavaScript is loaded through that one page. Other applications may have a single page where the user lands, the main page of the application. In this situation the main page is likely to load most or all of the JavaScript for the application. In both cases, the JavaScript will load correctly even if the / has been omitted, as the URL for the page will be equal to the web application root.

If we were to use a nested URL directly, without visiting the main page, it would go wrong. The reference to the JavaScript file will be parsed as relative to the nested URL rather than to the root.
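
The difference can be shown with the standard URL constructor, which resolves a reference against a base address in the same way a browser resolves a script reference against the page address. The example.com address and file names are made up, and as mentioned above this constructor is not available in Internet Explorer 11 itself.

```javascript
// The page the user opened directly (hypothetical nested URL).
const page = "https://example.com/app/reports/2020/overview.html";

// Without the leading slash the reference is resolved against the page's directory.
console.log(new URL("js/feature.js", page).href);
// https://example.com/app/reports/2020/js/feature.js

// With the leading slash it is resolved against the root of the site.
console.log(new URL("/js/feature.js", page).href);
// https://example.com/js/feature.js
```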

In a stand-alone web application this may never be a problem. However, many modern applications integrate with other applications. Software vendors may provide one main application and several other linked applications for added value. Even a small business is likely to use a series of interconnected tools.

Please always use root relative URLs to avoid this type of issue. To make this easier we can stop putting slashes at the start and end of variables, and always put them between the variable names. This practice also makes the code easier to read because we never have to ask ourselves if the slashes are correct; we see them immediately.

The surprising part is that the failure situation does not occur in other browsers because they retrieve the correct file from the cache even though technically a wrong file is referenced. In my opinion Internet Explorer 11 shows the correct behavior. The other browsers make it impossible to reliably use the same JS file name in nested directories.

Due to the fact that Microsoft product support dates are linked to Operating System support dates, Internet Explorer 11 will be supported until September 2029 in some situations. An example is the Windows 10 Enterprise 2019 Long Term Servicing Channel.

tag:tacosteemers.com,2020-06-21:/articles/2020-06-21-please-use-root-relative-urls.html

Creating a new website without breaking search engine results and bookmarks

Taco Steemers Jun 13, 2020 Updated Jun 13, 2020

When moving our website to a different CMS or framework we might end up with a different URL structure. This is bad for search engine rankings and people's bookmarks. To avoid problems we can use rewrite rules and redirect rules.

When moving our website to a different CMS or framework we might end up with a different URL (web address) structure. This is bad for search engine rankings and people's bookmarks. To avoid problems we can use rewrite rules and redirect rules.

An acquaintance asked for help with a redirect rule on LinkedIn. They explained that their new website design used different software that placed their blog articles in a different directory, resulting in a different URL. The change in URLs was not acceptable to them because they did not want to lose the SEO profile (search engine rankings) for their website. If a search engine cannot find your pages anymore, your pages lose their ranking in the search results. They thought they needed a redirect rule to redirect the search engine bots from the old URL pattern, /YYYY/MM/slug, to the new pattern, /posts/YYYY/MM/slug. On Netlify, their previous host, they could add /:year/:month/:slug /posts/:year/:month/:slug to their _redirects-file to achieve the redirect effect. Their new host uses the Apache web server, a very common choice. They wondered how they could achieve the same effect with Apache. The Apache documentation has a great page on this topic.

It is clear what they want to achieve, and it can be done with the RedirectMatch directive, but they do not need to use a redirect. A redirect requires clients and search engine bots to actually visit the old address before they are redirected to the new address. That takes a small amount of time. Humans prefer low load times. Depending on the contents and architecture of your site, a single page on your site might load twenty resources such as fonts, styling documents, javascript, images, pieces of advertising copy, contact details and finally the actual content. In the end all small load times add up to a bigger, more noticeable load time. Redirects may also be applied on top of each other. Mistakes can lead to wrong or infinite redirects. This problem might suddenly occur when you want to take the website down for maintenance and redirect all URLs to an "under maintenance" page.
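To make the risk of stacked redirects more concrete, here is a minimal JavaScript sketch that follows a table of redirects and reports when they loop. The function name resolveRedirects and the example paths are hypothetical; this only imitates what a client experiences and is not how Apache evaluates its rules.

```javascript
// Follows redirects from a Map of "old path" -> "new path" (a hypothetical table)
// and reports every path visited, plus whether the chain loops back on itself.
function resolveRedirects(redirects, startPath) {
  const hops = [startPath];
  let current = startPath;
  while (redirects.has(current)) {
    current = redirects.get(current);
    if (hops.includes(current)) {
      // We have been here before: a client would be redirected forever.
      return { hops, loop: true };
    }
    hops.push(current);
  }
  return { hops, loop: false };
}
```

A catch-all maintenance redirect that also matches the maintenance page itself, for example a table containing both /x to /maintenance and /maintenance to /maintenance, would be reported as a loop. Every extra entry in hops is an extra round trip for the visitor.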

An alternative solution is possible with a RewriteRule directive. This solution does not use redirects towards the clients but instead tells Apache how to fulfill the request, internally. The following example keeps the URLs the same, regardless of the change in directory structure:

RewriteEngine On 
RewriteRule ^([0-9]+)/([0-9]+)/(.+)$ posts/$1/$2/$3 [NC,L]

As a result, visiting /1999/01/slug gives us the resource on the server-side path /posts/1999/01/slug. A RedirectMatch directive would look similar. Each pair of parentheses () gives us a numbered match that is loaded into the corresponding variable $N. The first match becomes variable $1, and so forth. This rule will be applied to everything that looks like numbers/numbers/any_characters. It can be improved by indicating that the first match should have four digits (for the year), and the second match should have two digits (for the month). The rule as is only specifies +, 'one or more'. The final square brackets give two more instructions. NC indicates that the rule is not case sensitive. L indicates that if this rule matches the request it should be the last to be applied. In other words, this finalizes the rewriting portion of the request handling process.
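The stricter pattern mentioned above could be written as ^([0-9]{4})/([0-9]{2})/(.+)$. To experiment with it outside Apache, here is a minimal JavaScript sketch that imitates the rewrite. The function name rewritePath is hypothetical, and stripping the leading slash before matching is an assumption that the rule lives in a per-directory context such as a .htaccess file.

```javascript
// Stricter variant of the pattern: exactly four digits for the year and two for the month.
// The "i" flag mirrors the NC (no case) flag of the Apache rule.
const STRICT_PATTERN = /^([0-9]{4})\/([0-9]{2})\/(.+)$/i;

// Hypothetical helper that imitates the internal rewrite for a single request path.
function rewritePath(path) {
  // Assumption: the pattern is matched against the path without its leading slash.
  const relative = path.startsWith("/") ? path.slice(1) : path;
  const match = STRICT_PATTERN.exec(relative);
  if (match === null) {
    return path; // No match: the path is left unchanged.
  }
  const [, year, month, slug] = match;
  return `/posts/${year}/${month}/${slug}`;
}
```

With this sketch a path such as /1999/01/slug maps to /posts/1999/01/slug, while /99/1/slug is left alone because the year and month no longer have the required number of digits.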

tag:tacosteemers.com,2020-06-14:/articles/2020-06-14-creating-a-new-website-without-breaking-search-engine-results-and-bookmarks.html

Notes on making drag-and-drop functionality with Javascript

Taco Steemers Jun 5, 2020 Updated Jun 5, 2020


When I wanted to make drag and drop functionality I found there were plenty of tutorials out there. This is the page I used myself.

I am adding a few more words on the topic because I had to find workarounds for issues that I didn't find mentioned elsewhere.

Drag and drop has four visual elements:

The items that can be dragged
The area they originally came from
The areas they can be dropped on
The area it gets dropped on when the drag action ends

We will call the items that can be dragged the draggables. The areas that a draggable can be dropped on will be called the receivers.

Topics

In this article I want to discuss three topics. For Safari the receivers need an extra attribute, to be able to receive events that indicate the mouse cursor has left the area while dragging. In all browsers the receivers need an extra attribute to be able to properly handle situations where the mouse cursor stays inside receivers, but while doing so has also been moved over an item inside the receiver that is itself not a receiver. The final topic is draggable elements that contain a textarea, and making the textareas function as expected in the Edge and Firefox web browsers.

Event dragexit does not work as expected on Safari

I gave a different background color to receivers while the user was dragging the draggable over the receivers. By using a different background color the user can see which areas are receivers. We don't want to let the user find out by trial and error; dropping the draggable somewhere that cannot receive a draggable will release it from the mouse cursor. The user would have to find the draggable again, and hope the next area would accept the draggable. Adding the different background color when the mouse cursor enters the receiver can be done with an event handler added to the ondragenter attribute. My addition comes in when we want to remove the background color after the mouse cursor has passed over the receiver. On Firefox it is sufficient to add the event handler to the ondragexit attribute. For Safari support, and possibly other browsers, we need to add the same event handler to the ondragleave attribute as well.

Event dragenter has quirky behaviour

With the previous improvement we have added a new problem. In all browsers we lose the background color on the receiver if we move the mouse cursor over an unrelated item inside the receiver, even though the mouse cursor has not left the receiver. This is as expected; the item the mouse cursor is now positioned above is not the receiver itself. As a result, the ondragexit or ondragleave event handler will be called.

The problem is that the background color is not added back to the receiver when the mouse cursor is positioned above the receiver again. This can be solved by adding the color-change code to both the dragenter event handler and the dragover event handler.
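The two workarounds described so far can be sketched together in a few lines of JavaScript. This is a minimal sketch, not my actual code: the function name attachReceiverHighlight and the class name receiver-highlight are hypothetical, and it uses addEventListener instead of the on... attributes.

```javascript
// Hypothetical CSS class that gives a receiver its different background color.
const HIGHLIGHT_CLASS = "receiver-highlight";

// Wires up the highlight behaviour for one receiver element.
function attachReceiverHighlight(receiver) {
  const addHighlight = (event) => {
    // Cancelling dragenter/dragover also tells the browser this element accepts drops.
    event.preventDefault();
    receiver.classList.add(HIGHLIGHT_CLASS);
  };
  const removeHighlight = () => {
    receiver.classList.remove(HIGHLIGHT_CLASS);
  };

  // Add the color on enter, and again on every dragover so that it comes back
  // after the cursor has passed over a non-receiver item inside the receiver.
  receiver.addEventListener("dragenter", addHighlight);
  receiver.addEventListener("dragover", addHighlight);

  // Remove the color with the same handler on both events:
  // dragexit for Firefox, dragleave for Safari and possibly other browsers.
  receiver.addEventListener("dragexit", removeHighlight);
  receiver.addEventListener("dragleave", removeHighlight);
}
```

In a page the function would be called once for every receiver element, for example for each element found with document.querySelectorAll and a selector of your choosing.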

Text selection in a textarea inside a draggable element

The final problem we need to solve is the text selection behavior in a textarea that is placed inside a draggable. In Edge and Firefox the mouse-based text selection does not work. It does work in Safari. I have not been able to solve this yet. Other applications I am aware of avoid this problem; when the user clicks the draggable element they open a modal screen where the user can edit the contents of the draggable element. I think this work-around also gives the user a better experience because they will always know which draggable they are typing in.

tag:tacosteemers.com,2020-06-06:/articles/2020-06-06-notes-on-making-drag-and-drop-functionality-with-javascript.html

Why can the data we just stored on disk not be read?

Taco Steemers May 30, 2020 Updated May 30, 2020


Data that our code writes to a file may not be accessible immediately after writing. One reason is that some output stream implementations use buffers, and prefer to write out their whole buffer in one go. In these situations, if we want to be sure the data can be read from the file right away, we flush the buffer. Another option in Java is to open the file for writing with the StandardOpenOption.SYNC parameter. The documentation describes SYNC as follows: "Requires that every update to the file's content or metadata be written synchronously to the underlying storage device." Note that this sounds like it could lead to performance issues, as opposed to a single sync when finished writing for the current task.
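The write, sync, then read idea can be sketched outside Java as well. The following is a minimal Node.js sketch, not the Java code from our test; the function name writeAndReadBack is hypothetical. It writes the text, asks the operating system to flush the file to the storage device with fsyncSync, and only then reads the file back.

```javascript
const fs = require("fs");

// Hypothetical helper: writes text to filePath, syncs it, and reads it back.
function writeAndReadBack(filePath, text) {
  const fd = fs.openSync(filePath, "w");
  try {
    fs.writeSync(fd, text);
    // Ask the operating system to flush its buffers for this file to the storage device.
    fs.fsyncSync(fd);
  } finally {
    // Always release the file descriptor, even if writing or syncing failed.
    fs.closeSync(fd);
  }
  return fs.readFileSync(filePath, "utf8");
}
```

This performs a single sync at the end of the write, which is the cheaper alternative to syncing on every update that is mentioned above.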

The problem

While investigating a flaky end-to-end test at work, I have found that the data may not be available after writing even if we have flushed the buffer, the output stream implementation does not buffer, or we used the SYNC option when opening the file for writing.

It is true that the SYNC option comes with some cautions. As far as I can tell we are fulfilling the requirements needed to get the expected behaviour.

How could the data not be on disk?

Several possible reasons occurred to me. Perhaps the runtime implementation does not honor the SYNC instruction. This seems unlikely because our error situation was occurring on an Oracle JVM implementation. I don't expect a bug in file writing code there.

Neither do I expect a bug in the operating system disk cache. Please check out the diagram listed on that page! It is incredible how many components our bytes might flow through.

Perhaps some other buffer is getting in between me and my data. SSDs and hard disks have a disk buffer, similar to RAM. The disk buffer is read from first, before looking further. That is the idea behind disk buffers: providing fast storage for recent or popular data. This article claims that they have seen write operations on SSDs take as long as 6 seconds, so the hardware write time may still be an interesting angle if there is no disk buffer, or there is a disk buffer bypass for some reason.

A filesystem driver could do additional buffering. I suppose it is possible that the filesystem driver does not honor the SYNC instruction.

Certainly a remote filesystem cannot be expected to honor that; it would require a lot more work to do so across computers and I would expect quite low performance, especially if the protocol is built on a reliable protocol such as TCP. The guarantees such reliable protocols give require a lot more network traffic than is required for transferring the data itself. I imagine that when very small byte chunks have to be transferred and immediately written to disk it will require more resources because it requires more traffic. It might also incur more delays because the bytes have to be written out in the correct order.

I see reliability in the face of hardware or software failure as the main benefit of direct data synchronization. The extra network traffic required to make a protocol reliable makes it less likely that this benefit of direct synchronization can be achieved. Several network packets may stay buffered on the network adapter because one packet is out of order. The missing packet may never arrive because of a failure in any of the involved systems. In that case the data inside buffered packets would not be written to disk. Note that the same is true with disk buffers; most SSDs don't have enough power stored to write their disk buffer to disk when power is lost. That kind of feature is called 'power loss protection'.

Other ways, in Java

Is there another way in Java to ask for the file contents to be written to the file immediately? The FileDescriptor class has a sync method. When opening the file with the SYNC option we use the java.nio.file.Files class and receive an output stream. The FileDescriptor class is in the java.io package. These two avenues cannot be combined. FileChannel's force method won't help us as we need to pass an output stream to an external library.

Conclusion

Our situation occurred in an end-to-end test. It is not clear to me how our specific situation can be solved without adding a wait period in between. This situation where we write to disk and immediately want to read the same data from disk should not occur in production code. It requires less disk I/O to keep the data in memory after writing it to disk, instead of reading it from disk directly after writing. That means the performance as a whole should be considerably better because RAM is considerably faster to access than disks. This problem should only occur in practice if you cannot change the code that writes to disk, and the code that reads from disk. In our production environment our testing problem is not relevant. Still, I wonder why the data could not be read from disk even though we had asked for it to be written out immediately.

tag:tacosteemers.com,2020-05-31:/articles/2020-05-31-why-can-the-data-we-just-stored-on-disk-not-be-read.html

How to avoid displaying directory listings on your website

Taco Steemers May 22, 2020 Updated May 22, 2020


Our websites contain directories with files that are not usually read by humans. Examples are directories containing Javascript files or files for XML feeds. Sometimes we want to disallow directory listings for these directory contents. Here is an example of a directory listing for my blog articles: Screenshot of a directory listing

Normal users of our site do not visit these directory listings. To reach them requires adjusting the address bar by hand.

The problems

The files in these directories are unlikely to be useful to visitors. If they are then the visitor should be guided towards them through your navigation structure. Then they will have the proper context to interpret these files.

The RSS and Atom feed files for my articles are available on my main page as well as the article category pages. There are also some automatically generated feed files that I don't list on my website because I don't think they will benefit people as much as the ones that I do list (1)☟.

A potential issue is that the files, when loaded directly from a directory listing, may not have the header and footer they would have when they are loaded the usual way. You may have navigation elements in the header and terms of service elements in the footer. A document that is accessed directly through a directory listing may not contain these elements.

A problem with the example in the screenshot is that the links in that automatically generated listing do not work. The links lead the user to an error page.

For these reasons I think the directory listings are not beneficial to our visitors.

The solution

This website uses a standard webhosting plan. The web servers that serve up the websites on these webhosting plans usually support the hypertext access (.htaccess) file.

This .htaccess file can be used to hide the directory listings by using the following entry:

Options -Indexes

The attribute name is Indexes because a directory listing can also be called a directory index. The - means 'no'. Here is a manual on the Options entry.

After adding this the following message appears on my site instead of the directory listings:

Forbidden
You don't have permission to access this resource.

The .htaccess file can be placed in the root directory of your website. Your hosting provider may have instructions on their site as well.

If the .htaccess file is not supported in your situation you may want to contact your hosting provider and ask them to disable directory listings for your site.

An alternative solution

There is a manual workaround. It requires adding a file to each directory that we want to hide the listing of. To each directory that should not be listed we can add an index.html file. Webserver software is usually configured in such a way that it will prefer sending this file to the client instead of showing the directory listing. The file can be empty, show a "file not found" message, or show your website navigation.

Footnotes

(1) It would be preferable if these unused files were not generated. I have not figured out how to stop these files from being generated. ☝

tag:tacosteemers.com,2020-05-23:/articles/2020-05-23-how-to-avoid-displaying-listings-on-your-website.html

How to include HTML documents inside HTML documents

Taco Steemers May 17, 2020 Updated May 17, 2020


There are two good ways to include our static HTML documents inside the main HTML documents that make up our website. One is to include them server-side, before the server sends the main HTML document to the web browser. The other is client-side loading where we have Javascript on the client (the web browser) request information from a server, format that as HTML and append that to the main HTML document inside the web browser. It is the best way to load dynamic content, such as a feed that receives updates. Mozilla provides a resource that explains why you would want to use this method to load dynamic data. However, the topic of my article today is including static HTML documents, and for that purpose I personally find client-side loading to be too involved. For example, it requires putting some thought in to error handling, and the order in which information and styling becomes visible to people during the page loading phase. To keep this article short I will skip client-side loading. I do want to mention the security implications related to loading other people's documents into your pages. To keep our users safe we have to assume that not all security problems can be solved by restrictions inside web browsers, and keep an eye on the security implications of using documents and scripts that we did not make ourselves.

There is a third way that we will discuss first. An iframe can be used to load an HTML document inside another document.

iframe

From talking to colleagues I know that they don't like iframes. Perhaps they have one of the many criticisms listed here. One of the biggest downsides of iframes, also listed on that page, is that screen readers have difficulty explaining them to the user.

In my opinion the iframe has one purpose that it serves well. It allows one to load an entirely separate page into another page. This can be handy for all sorts of uses such as manuals, blogs with comments, or separated functionality such as a widget that contains a video or a comic panel. In case of classic, book-inspired manuals the iframe can be used to create a navigation panel on one side of the page and show the user the page or manual they asked for on the other side of the page. Blogs with external commenting systems can use them to load the commenting functionality without letting the external scripts have access to the other contents of the blog. For external video and picture files it has the same advantages. If these contents from external documents were not restricted to the size of their iframe they might be able to show a full-page advertisement.

Given all the criticisms of iframes we saw on that Wikipedia page earlier, we really should avoid using them. At the time of writing I am unfortunately still using them in two locations on this website.

One is a page where I have some easy to use timers. I use them as a pomodoro timer and a break timer. To avoid having to add the same HTML several times I load these timers with an iframe. There is an important principle in computer programming: don't repeat yourself. The timers need HTML tags, styling and javascript code to be able to function as intended. If I were to paste several copies of everything they need on the same page I would be violating this DRY principle. The result would be that a single mistake would have to be fixed in several places. As I wrote about before, we don't want to set ourselves up for mistakes.

Another place I use an iframe is a page where I embed a board that I use when I want to track the progress of several smaller tasks I have to accomplish during my work day. My justification for using it here is that this board is a separate application maintained in a separate codebase. If I want to use an updated version on this website I copy the files over and overwrite the ones that were there. The naive way to avoid using an iframe for this use case is copying the actual HTML in to the Markdown document that my static blog generator uses to generate the page that ends up on my website. The reason we should avoid doing that is that it would again lead to repetition and an increased risk of mistakes.

Server-side inclusion

I am going to stop using iframes for these use cases. Instead, I am going to use the include statement of the blog generator's template language to include these HTML documents in to the main HTML document on the server, before the server sends the main document to the client. I will remove the head and body tags from the timer document. That way we can include the timer document without ending up with several head and body tags, as there should be only one of those if the document is not inside an iframe. Luckily documents without those tags are considered valid since at least 2014. As a result of that we will still be able to use the vertical board and the timers as stand-alone tools even when omitting these tags.

The downside for the timers example is that the payload increases in size. In the iframe situation the web browser would do only one extra request to the server to get the embedded HTML document that contains the timer. The web browser is smart enough to realise that it doesn't make sense to make another request for the other two timers; the response would be the same. In the new situation the three copies of the timer will actually have to be sent to the client in duplicate. All three will be sent as part of the main HTML document that contains them. In this case it is entirely acceptable because the timer HTML is small.

Currently, the timer Javascript and CSS are stored inside the timer HTML. Apart from some laziness, a valid reason to keep it that way is that by not using separate files for the Javascript and the CSS we avoid creating two extra HTTPS requests to the server. Keeping the JS and CSS separate from the HTML is what one should usually do in more standard situations. When the timer documents are not in iframes anymore I will move the JS and CSS to separate files. In this particular situation it will come at the cost of two more HTTPS requests to the server; one for the JS file and one for the CSS file. If anyone is still with me at this point: we will be making one more request than we did in the original situation. At the scale of a website like this the extra load on the server will not be noticed.

If we insist on using an iframe

If we insist on using an iframe, let's use them well! Iframes have some cross-browser problems that I have run in to myself.

One is that the difference in screen size between a computer screen and a smartphone screen can be large. We still need to get the document inside the iframe to be displayed properly on both screen sizes, without pushing away the content on the main document. The difference in screen sizes could in the past easily be accounted for with CSS media queries in combination with device-related CSS properties like max-device-height. Unfortunately these device-related properties are deprecated. The word 'deprecated' means that its use is discouraged and that browser developers may stop supporting it. The browser on my phone doesn't support it any more. That is how I found out that the property is deprecated even though testing on my computer showed that it worked as expected.

In my case I want the contents to be shown correctly on different sizes of screens, but I also want to show a message that explains the vertical board application does not work on small screens. To get a similar effect today, without using the device-related CSS properties, we have to take three steps.

- we use the regular max-height css property on the media queries inside the iframe to hide the application on the small screen and show the message instead.

@media only screen and (max-height: 800px) {
    #application {
        display: none;
    }
    #notmobilefriendly {
        display: block;
    }
}

- we set width: 100% on the body inside the iframe

body {
    width: 100%;
}

- on the iframe tag in the main document we add CSS to set a size based on how big the browser window is, like this:

    height: calc(100vh - 80px);
    width: 100vw;

Here I have subtracted the height of the header on the main document from the height of the window (100vh; vh stands for viewport height) to get the correct height for the iframe. The browser will now calculate the correct height and width for us.

Another problem is that the Safari webbrowser, used on Apple devices, does not handle iframe content size well. We will skip the details and go right to the solution.

We place the iframe inside a container that has some size of its own, a padded div for example:

<div style="padding: 1px;"><iframe class="timer" src="timer.html"></iframe></div>

For this situation we also set width: 100% on the body inside the iframe:

body {
    width: 100%;
}

Now the Safari webbrowser will also give the iframe the requested dimensions.

Web browser behaviour is a moving target

To get non-standard things like dynamically-sized iframes to work predictably across all web browsers is a difficult task. What makes it even more difficult is that the goal of having your document look the same in all browsers is a moving target; browsers receive updates and people on different continents tend to use different browsers. Whereas in Europe we mainly use Firefox, Chrome, Safari and Edge, we find other browsers in Asia. At the time of writing, this statcounter page about browser market share in China shows that the UC Browser and QQ Browser have a combined market share of 22.6% there.

Keeping the 'why' in mind

I think the best way to look at web pages is that they are a way to provide information to people. We can't expect the pages to look the same in all web browsers. We must make sure that the information we want to provide to people is readable in all browsers. For that reason I find it best to keep our designs simple. Our design can be hip, or elegant, or show our personality and that of our company, but the information must be readable at all times.

tag:tacosteemers.com,2020-05-18:/articles/2020-05-18-how-to-include-html-documents-inside-html-documents.html

Mistakes will be made

Taco Steemers May 11, 2020 Updated May 11, 2020


I occasionally make a mistake in a professional environment. Mistakes will be made, that is just how it is. We do have to keep looking at how we handle them and make sure we are not making a habit of it. Here I share some of my thoughts on the topic.

Last week a customer ran in to a problem that I had created by mistake. This is not a difficult kind of situation to handle and goes something like this: I own up to it, give a quick explanation and then fix it myself or assist a colleague in fixing it, and when the situation allows for it I talk to all involved and provide all the necessary details about how the mistake came to be.

That last part is important. Not only to the customer and your colleagues, but also for yourself. In explaining the situation in detail you will find exactly what went wrong and what role you played in the situation. Perhaps you zigged when you should have zagged, as they say.

Key to my learning has been not just the breaking but more importantly the fixing. The mistakes made I saw as lessons learned when I realised what had happened, and a chance to dive deeper in to what I was working on. It feels good to learn to fix things for yourself, especially when the pressure is low.

Mistakes will get made. Problems will occur. Hopefully everything will be fine. If the mistakes and the problems you created still hurt you will be okay, because you have an incentive to avoid them.

Colleagues

During the last 9 years I was able to experience other people make mistakes in professional environments as well. In some sense I enjoy the mistakes when they happen because they tend to be kind of harmless in the grand scheme of things, yet a lot can be learned by paying attention when we are correcting them. Another person's mistake and solution can be quite educational.

Would you have handled the situation the same way? If not, you may want to ask what your colleague thinks about your proposed solution, after the urgency has died down.

Sometimes you have to interrupt then and there if you see a new problem in the proposed solution. That is a more difficult situation and requires more insight in to the situation and people. How certain are you? Is the potential problem a big problem? Are you able to bring it up while keeping the discussion productive? You may have to be a bit insistent to make sure you are heard.

Though at times you may feel your colleagues are taking a sub-optimal action, a more likely situation is that you are missing some context that justifies your colleagues' action. Because of this a good approach can be to ask if the scenario you are thinking of is possible, just to give the others the opportunity to recognize something they might be overlooking, and then let them continue. If you still have doubts or questions after the situation has been resolved, you can discuss them with your colleagues at that time.

Take note of the situation and move on

You need to realize that you are going to make mistakes. Some won't be fixable. For example, some production data will never come back. It depends on a lot of factors, the simplest being the granularity of the backups. You can't just put the old backup back; all newer data would then be lost forever. The missing data might have to be scripted back in, which brings a lot of new risk.

Heated discussions can occur. Just remember it is not about you or them, it is about solving the problems at hand. Blame is not what it is about and rarely truly needs to be assigned. If there was no malice, ignore the blame. I have personally never seen malice, only honest mistakes.

Unfortunately there is a lot of carelessness as well. Carelessness is not the same as the initial inexperience that leads to mistakes. It stings a bit when I have to give someone my time due to their carelessness, but it is part of my job to give support where needed. I try to avoid being the careless one.

Don't hold a grudge. Tomorrow there will be another day with another problem that you will resolve together. Take note of the situation and move on. If the company culture allows for it, it may be a good idea to make a write-up to share what happened and what you have learned. In follow-up discussions all involved can walk away with more ideas for improvement.

Users and their representatives

I make sure to remember that even though I am sometimes far removed from the consequences of mistakes made, there is an end user of my work and that of my team. Perhaps it is a person who has no choice in their use of our product, but does have mouths to feed. It is probable that I am going to make some people sad, over the course of my career. That is kind of inevitable if you don't work in a vacuum. Software can be very frustrating and confusing, to the layman, the professional user and the software developers as well. Still, we should be gracious, even if a support ticket seems unfair to us and our work. Just as it isn't always easy to be in our position, it isn't always easy to be in their position. The best response is a helpful response.

There is your environment, and then there is you

Where does our responsibility start and end? If we have a responsibility in some area, we should also have a way to influence the outcome of the actions taken in that area.

The new senior software developer who was replacing me didn't appear to be listening when I tried to warn him about the inadequate deployment procedure, and the production database credentials that were included in the version control system. I was there as a contractor for only five months and tasked with creating two web applications in a domain that was new to me. I had no influence and no time to improve that part of the organization. It did feel bad when a few hours later some production data was lost. This is one of the many situations that I have learned from. Here I learned that I should trust my instinct and be more insistent the next time I feel that such a warning is not properly heard.

It is up to the workplace as a whole to get safe procedures in place. We can and must advocate for safe procedures. In the meantime we do what we can to avoid making mistakes and creating problems. We should also acknowledge that there are limits to what we can do by ourselves. We are always dependent on others. If situations in your environment force you to log in to a production database, and you then proceed to do so, there is a chance of you doing some damage. You should in your mind take responsibility for taking the risky action. You should also recognise that your environment is apparently pushing you towards situations where you are likely to create problems at some point. Can you and your colleagues improve the workflow? In my opinion it is always worth it to at least have the discussion, and it is worth the effort it takes to get time scheduled for an investigation into better practices.

Avoiding mistakes by avoiding risky situations

Mistakes are most likely to be made when we are tired, distracted, or not fully aware of the details of the system we are working in. The easiest way to avoid creating big problems is to not let ourselves be set up for a situation where mistakes are likely to be made and likely to have a meaningful impact. Mistakes are never just likely: I feel that in workflows where mistakes are likely, they are also inevitable.

tag:tacosteemers.com,2020-05-12:/articles/2020-05-12-mistakes-will-be-made.html

Still searching for a daily use computer that just works

Taco Steemers Apr 25, 2020 Updated Apr 25, 2020

Unfortunately there have been several problems with my macOS install, and with the Unix ports I install with brew.

There has been a time when the Finder appeared to not get an updated list of inodes. After moving files with the Finder the files appeared in the original directory as well as the updated directory.
Confused by this, you might think that you had accidentally copied the files instead of moving them. Deleting 'the original files' would then delete the files in their new directory as well. After all, there were never any copies; the Finder was just still showing them in the original directory. I don't know if the Trash would have been able to restore them; I have the bad habit of bypassing the trash can, an old habit from when disk sizes were small. So I resorted to always using the command line, even for file-related tasks that the Finder was able to handle reasonably efficiently in the past, using the Finder only for the image previews.
On one recent morning the Finder started working correctly again.

There have been other problems such as sound not working anymore.

Twice brew has appeared to become entirely confused, and a lot of packages had to be reinstalled. The second time I reinstalled brew itself as well, if I remember correctly.

Today I had to reinstall nmap. I don't know how it broke. It worked a few days earlier.

$ nmap XXX.YYY.ZZZ.0/24
dyld: Library not loaded: /usr/local/opt/openssl/lib/libssl.1.0.0.dylib
  Referenced from: /usr/local/bin/nmap
  Reason: image not found
Abort trap: 6
$ brew install nmap
...
Error: nmap 7.70 is already installed
To upgrade to 7.80_1, run `brew upgrade nmap`.
$ brew uninstall nmap
Uninstalling /usr/local/Cellar/nmap/7.70... (807 files, 26.8MB)
$ brew install nmap

After reinstalling it worked again.

The reason I switched from using only Linux to using macOS for my laptop was that I was getting a bit tired after roughly 15 years of fixing problems with my personal Linux installs. I wanted something that just works. Unfortunately I can't say that has really become true. What I will say though is that the general quality of this 2018 MacBook Pro is quite good. There have been few crashes, no unintended openings or cracks have appeared on the laptop, and there have been no bulging heat pipes so far. Fingers crossed!

Just make sure not to get any breadcrumbs in your keyboard.

And make sure to charge it from the right side ports.

tag:tacosteemers.com,2020-04-26:/articles/2020-04-26-still-searching-for-a-daily-use-computer-that-just-works.html

Quick text manipulation, a practical `sed` example

Taco Steemers Apr 21, 2020 Updated Apr 21, 2020

Suppose you have received a 10k line file of text in a format that is difficult for you to work with, like XML. You want to get some specific information from that file, and realize that getting that information by hand will take a long while.

In some cases an editor like IntelliJ IDEA can be really useful (see footnote), but a full-blown IDE may not be what you are looking for. You may want to use what you have, or you may need it as part of a script. Here I want to show you an example of how I have found sed, the stream editor, quite useful. It is available on Linux and macOS without having to install anything.

Our input file has tags (names) and values for many types of data. We only want the list of values of the identifier tags of a specific type of parent tag. Let's say we are working with some enterprise resource planning applications and are looking for a list of widget identifiers that are in the input file. For example, we want to use these to correlate data between two systems so that we can put together a more complete dataset for a once-yearly report. Luckily for us, the identifier we need is on the line after the opening tag of the widget datatype. Once we have the list of identifiers we want to use those to get more information from an SQL database.

To clarify, the input data we are interested in looks like this:


...
    <widget>
        <id>6Q</id>
...

Let us look at the different steps and how sed fits in. First we make the output file that we want to store the output in.

touch output.txt;

Then we want to search the input file for the <widget> tags. When we find one, we go to the next line and forget about the line that we found the tag on.

touch output.txt; sed -n '/<widget>/ {n;p;}' < input.xml

Here we have asked sed to go to the next line and print that if we have found <widget>. By specifying -n we ask sed not to print anything other than what we specifically asked it to.

Now we will have a list of about 2k identifiers, but they still have their tags, like so: <id>6Q</id>. We don't want those tags. Neither do we want the whitespace around the tags.

So let us pipe the stream that sed outputs into the next steps using that vertical bar called the pipe, |. For example, we will tell sed to replace <id> with nothing. We do that by using the pattern s/existing text we don't want/new text we do want/

touch output.txt; sed -n '/<widget>/ {n;p;}' < input.xml | sed 's/<id>//g' | sed 's/<\/id>/,/g'

Now we have removed <id> and replaced </id> with a comma.

Because we don't use any additional flag in the new sed commands we can also combine them in one call with the -e flag. I believe that will be faster because there is no more data transfer from one process to another. The version of sed on my Linux computer does not seem to support the -e flag though. If you are using sed on Linux you might need to keep piping the commands to each other.

touch output.txt; sed -n '/<widget>/ {n;p;}' <input.xml | sed -e 's/<id>//g' -e 's/<\/id>/,/g'

Next we remove the whitespace that we don't want.

touch output.txt; sed -n '/<widget>/ {n;p;}' < input.xml | sed -e ':a;N;$!ba' -e 's/<id>//g' -e 's/<\/id>/,/g' -e 's/ //g' -e 's/\n/ /g' > output.txt

We have replaced the spaces with nothing and the newlines with a single space. The part that handles the newlines is difficult to read. Why not just use sed 's/\r\n//g' to replace \n or \r\n with nothing? The newlines and carriage returns will not be seen by sed because sed normally works on the contents of each line, one line at a time. We have to do more work or switch to using tr, which is a simpler way to swap text. A good explanation of what the pattern does can be found here. In short, :a marks a position named a, and N appends the next line to the text sed is currently holding. $!ba means: if we are not at the last line, $!, we branch back to position a and keep going, until the whole file is inside our computer memory. Only then do the substitutions run, which is why the loop has to come first; if the s commands came before the loop, they would only be applied to the first line. This way sed can handle all input in one go, including newlines.

The result has been written to the file output.txt, which now contains all widget id values separated by commas. The last widget id also ends in a comma, which needs to be removed as well. Here sed comes to the rescue again: in a pattern, $ anchors the match to the end of the line, so .$ matches the last character. sed 's/.$//' < output.txt will remove the last comma. Now we can use it to make a select statement on a database table: SELECT name, quantity_sold, quantity_unit FROM widgets WHERE objectid IN (...) where we fill the brackets with the list of ids we created.

echo "SELECT name, quantity_sold, quantity_unit FROM widgets WHERE objectid IN ("; sed 's/.$//' < output.txt; echo ");"

will then result in the following format

SELECT name, quantity_sold, quantity_unit FROM widgets WHERE objectid IN ( 1Z, 2G, 3A, 4T, 5H, 6Q, 7P );

There we have it, the full query.
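
For comparison, the same extraction can be sketched in Python. This is only an illustration and not part of the workflow described above: the function names extract_widget_ids and build_query are made up for this example, and it assumes, like the sed version, that the <id> line directly follows the <widget> line. It also quotes every identifier, on the assumption that objectid is a text column.

```python
import re


def extract_widget_ids(lines):
    """Return the <id> value found on the line after each <widget> tag."""
    ids = []
    take_next = False
    for line in lines:
        if take_next:
            # Same job as the sed substitutions: drop the tags and whitespace.
            match = re.search(r"<id>(.*?)</id>", line)
            if match:
                ids.append(match.group(1).strip())
            take_next = False
        if "<widget>" in line:
            take_next = True
    return ids


def build_query(ids):
    """Build the SELECT statement, quoting each id as an SQL string literal."""
    quoted = ", ".join("'" + value.replace("'", "''") + "'" for value in ids)
    return (
        "SELECT name, quantity_sold, quantity_unit FROM widgets "
        "WHERE objectid IN (" + quoted + ");"
    )
```

Calling build_query(extract_widget_ids(open("input.xml"))) would produce the query in one step. For anything beyond a one-off report, passing the identifiers as query parameters through the database driver is usually safer than building the SQL text by hand.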

The Grymoire probably has the best sed page I have seen, go check it out!

IntelliJ IDEA footnote: It offers Column Selection Mode and multiple carets. Note that the linked blog post is quite old, I suspect more is possible these days.

tag:tacosteemers.com,2020-04-22:/articles/2020-04-22-quick-text-manipulation-a-practical-sed-example.html

I found myself working hard instead of smart

Taco Steemers Apr 15, 2020 Updated Apr 15, 2020

Ah, the often mentioned "work smart, not hard". As well as "not invented here"...

I had set out to re-launch my blog. Somehow, in my enthusiasm, I ended up writing a static website generator almost to completion. The only two remaining topics were the two topics that most looked like work instead of hobby. Though I love both learning and programming, there are limits to what I will do in my free time. The remaining work led me to think about how I could reduce the amount of work by scrapping features and by intelligent reuse of software projects I already had lying around. That was probably the first time I approached this new hobby project as a serious project, and the flaws in what I was doing immediately became clear to me.

Instead of doing things that I find challenging such as writing an interesting and readable text and then putting that work out there to be seen, I was doing something that I find easy and enjoyable; I was using my blog as a reason to refresh my shell scripting skills. Another flaw in my approach was that I had not approached the blog re-launch as a serious project and had not done some things I always do when I do work for someone else.

I was working hard instead of smart: I had failed to create any type of design document and kept every idea in my head. I had also decided not to use existing tooling because it is not as fun and flexible to use as I thought my own static website generator would be. While trying to use my own static website generator for this re-launch I found plenty of remaining work, which refutes those points; the last 20% of the work tends to be as much work as the first 80% of the work.

Luckily I did set a deadline for the initial objective, the re-launch of my blog, which has not been delayed by much. The question now is which road I will take: will I continue with my own static website generator, or will I use Pelican again like I am doing with my current blog? (Edit: I am using Pelican now and I do enjoy it.)

Either way, it was a refreshing learning experience. It was all on me and there were no external factors that could be blamed. That makes it easier to learn from the situation than in most situations that arise on the job.

tag:tacosteemers.com,2020-04-16:/articles/2020-04-16-i-found-myself-working-hard-instead-of-smart.html

Connecting to your printer on a Linux system

Taco Steemers Feb 7, 2015 Updated Feb 7, 2015

If you are using a GNU/Linux distribution and are having trouble finding your printer in print dialog screens, the following may be of help to you. Make sure you have CUPS installed, the Common Unix Printing System. Start cups first, on my system this can be done by executing

# /etc/init.d/cups start

CUPS 1.5.3 uses http://localhost:631/ as the configuration utility. On my install, I was unable to print anything because no printer was found. Unfortunately I was also unable to add a printer. I had to edit the configuration file, which is located at /etc/cups/cupsd.conf. Be sure to make a backup if you do decide to edit it. I changed all lines containing

"DefaultAuthType ..." into "DefaultAuthType none"
"AuthType ..." into "AuthType none"

and removed all lines such as "Require user @SYSTEM". Then I was able to find my printer and print.

There are probably ways to get more fine-grained access control that still allows you to print, but frankly, when you need to print something you need to print it right then, and not a couple of days later. The mentioned edits might be seen as bad for security, but for a small network the risk seems low. Especially when compared to not being able to print.

tag:tacosteemers.com,2015-02-08:/articles/2015-02-08-connecting-to-your-printer-on-a-linux-system.html

Programmatically creating scalable vector graphics (SVG)

Taco Steemers Feb 6, 2015 Updated Feb 6, 2015

This is a small note on programmatically creating scalable vector graphics. For this we use Python with svgwrite which was simply the first tool I found. We will not be comparing different tools.

When creating graphics for posters, programs, or the web there are some advantages in using scalable vector graphics over regular graphics such as PNG and JPG. SVG is scalable (resizeable) without any loss of detail, or 'fuzziness'.

Here we see the output of the source code used in this example. You can zoom in as much as you like without the graphic looking 'pixelated'. This is because SVG does not use pixels to describe the graphic; it uses vectors. Note that the program that you use to look at the SVG file may limit how far you can zoom; the browser I used to proof-read my post only allows a 2x zoom. However, there are no technical limitations.

The basic idea is that we create an object with a shape and a location. Then we add that object to our 'canvas', the SVG document. Like so:

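import svgwrite

# Example values, chosen only so that this snippet runs on its own
xStart, yStart, xEnd, yEnd, colorA = 10, 10, 90, 90, "green"

# Creating a canvas
svg_document = svgwrite.Drawing(filename="using-svgwrite.svg", size=("100px", "100px"))

# Creating a line
lineA = svg_document.line((xStart, yStart), (xEnd, yEnd), stroke_width=1, stroke=colorA)

# Placing the line on the canvas and writing the file
svg_document.add(lineA)
svg_document.save()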

It is that simple.

In the source code we use two different code paths for the 'horizontal' and the 'vertical' four-leaf clover. If the code had better structure, and the methods returned elements rather than adding them to the document immediately, we could have used the 'rotate' transformation to rotate the four-leaf clover as we wished.
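
The rotation idea can be sketched without the clover code. The example below is not the source code of this post. So that it runs without extra packages, it builds the SVG with Python's standard xml.etree module instead of svgwrite, and the helper names make_leaf, make_clover and make_document as well as the straight-line 'leaf' are made up for illustration. Each helper returns an element instead of adding it to the document, so the caller can wrap it in a group and apply the standard SVG rotate transformation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_leaf(cx, cy, length):
    """Return one leaf (a simple line here) without adding it to a document."""
    return ET.Element(
        "line",
        {
            "x1": str(cx), "y1": str(cy),
            "x2": str(cx), "y2": str(cy - length),
            "stroke": "green", "stroke-width": "1",
        },
    )


def make_clover(cx, cy, length, angle=0):
    """Return a group of four leaves, rotated by `angle` degrees around (cx, cy)."""
    clover = ET.Element("g", {"transform": f"rotate({angle} {cx} {cy})"})
    for quarter_turn in (0, 90, 180, 270):
        leaf_group = ET.SubElement(
            clover, "g", {"transform": f"rotate({quarter_turn} {cx} {cy})"}
        )
        leaf_group.append(make_leaf(cx, cy, length))
    return clover


def make_document(width, height, elements):
    """Create the SVG root element and place the given elements on it."""
    root = ET.Element(
        "svg", {"xmlns": SVG_NS, "width": f"{width}px", "height": f"{height}px"}
    )
    root.extend(elements)
    return root


# One code path serves every orientation; only the angle changes.
document = make_document(100, 100, [make_clover(50, 50, 30, angle=45)])
svg_text = ET.tostring(document, encoding="unicode")
```

Writing svg_text to a file with the .svg extension gives something a browser can display. With this structure the 'horizontal' and 'vertical' clover would differ only in the angle argument.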

One of the clovers doesn't look quite right. I was not able to get the 'swirl' to look right on all four leaves. Can you solve that?

You can create prettier patterns than the ones shown here, but I have deliberately not included mine. It is much more fun to try creating your own patterns than it is to look at someone else's!

Another library you might want to look at is Cairo, which has bindings for many different languages, according to Cairo's Wikipedia page.

The source was run with Python 2.7.3 and svgwrite 1.1.6.

tag:tacosteemers.com,2015-02-07:/articles/2015-02-07-programmatically-creating-scalable-vector-graphics-svg.html

Setting up a Secure FTP server (SFTP)

Taco Steemers Jan 28, 2015 Updated Jan 28, 2015

We want to set up a secure FTP server (let us call this 'the service', to avoid confusion). This service will receive backups. The service (and its clients) don't need access to any unrelated commands. So we will make an empty PATH and won't let the users perform a normal login. The service only needs to have access to one directory. We will attempt to restrict it to that directory by using the chroot utility ('change root') which will restrict the service's view of the server's filesystem. The service will need to be able to find all its dependencies as well. One complication is that this will entail having to place the service and its dependencies in a location that is not known to the normal system update service. I considered trying to fix this by automatically copying the updated versions over the old versions and restarting the service, until I realized a symlink would probably be much better.

As it turns out, there is a good but shallow tutorial over at Debian Administration. This tutorial shows the correct settings to set in /etc/ssh/sshd_config. These are 'Subsystem sftp internal-sftp' and

Match group sftponly
     ChrootDirectory /home/%u
     X11Forwarding no
     AllowTcpForwarding no
     ForceCommand internal-sftp

According to this tutorial, the system described there does not suffer from the mentioned update problems. We can use this tutorial. Be sure to set 'AllowTcpForwarding no' so that your service cannot be used as a proxy. Note that internal-sftp is not the binary that we run, that is still the regular ssh server. It is an instruction to use a version of the sftp service that can work in combination with chroot. Only that instruction will be run when a user in sftponly tries to use the service.

Unfortunately, there seems to be no way to have a user use a chrooted SFTP service while still allowing their accounts to easily be used for other services. This is because one will want to let these other services store files in the user's home directory, which the user will be able to access. This access is not always desired. The reason that one cannot constrain the SFTP user to a specific directory in the home directory of the user, has to do with the proper usage of access patterns when using chroot, and the access patterns enforced by sshd. If we want to use /home/%u as the chroot directory, we must allow only root to manipulate that directory. As a result, we need a directory that the user has privileges in, let's make that /home/%u/sftp. However, when the user connects to SFTP, they will be dropped into their home directory, /home/%u. We removed their privileges for this directory. To let the user be dropped into /home/%u/sftp instead, we need to make that their home directory. This makes it difficult to store files that should not be accessible over SFTP.

Let's start setting up the service. First, we create a group sftponly to which we will add the user accounts intended for sftp. We remove the users from other groups. From here, we can largely follow the other tutorial. To troubleshoot problems, stop the sshd service and use /usr/sbin/sshd -d which will give easy access to debug logging. To allow the sshd to do its work, we have to set chmod 755 /home/myuser/ and chmod 755 /home/myuser/sftp. Preferably I would only have let the owners interact with the intended directories, using chmod -R o-rwx . on /home/myuser. However, this will cause SFTP to malfunction; it cannot drop the user in the user's directory, as it does not have access to it. We set the user's home directory to the files directory inside the user's SFTP root directory; usermod -d /sftp myuser. At this point, the user will be dropped in /home/%u/sftp/files when they have successfully connected. To stop the user from logging in over SSH, we disable their shell; usermod -s /sbin/nologin myuser. If the user tries to SSH, they will be told that 'This service allows sftp connections only.'
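
When sshd refuses a chroot, the cause is often ownership or permissions somewhere along the path: sshd requires every component of the ChrootDirectory path to be a root-owned directory that is not writable by group or others. The following is a minimal sketch of that check, not something taken from the tutorial; the function name chroot_problems and its stat_func parameter are made up for this example.

```python
import os
import stat


def chroot_problems(path, stat_func=os.stat):
    """List reasons why sshd would reject `path` as a ChrootDirectory."""
    problems = []
    current = os.path.abspath(path)
    while True:
        info = stat_func(current)
        if not stat.S_ISDIR(info.st_mode):
            problems.append(f"{current} is not a directory")
        if info.st_uid != 0:
            problems.append(f"{current} is not owned by root")
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{current} is writable by group or others")
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return problems
        current = parent
```

Running chroot_problems("/home/myuser") on the server returns an empty list when the path should be acceptable, and otherwise names the directories that need a chown or chmod. The stat_func parameter only exists so the check can be tried out without touching a real filesystem.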

tag:tacosteemers.com,2015-01-29:/articles/2015-01-29-setting-up-a-secure-ftp-server-sftp.html

Some notes on trying out Crashplan, Duplicati and BackupPc

Taco Steemers Aug 29, 2014 Updated Aug 29, 2014

Observations after testing Crashplan.

The Crashplan test-setup has not been able to connect for a while. As a result these notes are partially from memory and could not be revisited. The user interface is confusing; buttons are placed far away from the context they operate in. There are buttons that fold out UI elements when you click on them, but not a lot of extra information appears. It would have been better if more information were grouped together under fewer buttons. The Linux and Windows versions have inconsistent user interfaces. At some point my test setup suddenly stopped connecting. I tried several command-line actions, both recommended and not recommended. It didn't help. The restore-screen has a nice filter for finding the file that you want. There are confusing problems with files that were recently removed. It looks like it does not let you restore a file if it does not realise yet that the file has recently been changed or restored. Or perhaps it is related to the files being in the trashcan? For reasons that are unclear it takes several completed backups before a 4GB file is actually backed up. The restore-list would show a 0 byte file during that time.

Observations after testing Duplicati.

It supports transfer over SSH which is great. I love tried and true technology. Unfortunately I have not been able to restore any files. The restore-screen requires a lot of clicking. It keeps giving warnings, even if you are not interested. There are very detailed descriptions of how Duplicati handles copying locked files. Even so, it is unclear to me how successful it tends to be in backing up locked files. Note that when you want to use SFTP, you should select 'SSH-based' rather than 'FTP-based'. This is because FTP and SFTP are not the same thing, and SFTP makes use of an SSH connection. Make sure to check the box 'Ignore file modification timestamp when making incremental backups'. Otherwise, Duplicati will only use a very simplistic measure of determining whether to make a backup (the file modification timestamp) which will fail when you get a file with an older timestamp than the one known to Duplicati. This can easily happen if you get the latest version of the file from a different computer (or a service running on a different computer, such as a mail service). A discussion of that can be found here: https://code.google.com/p/duplicati/issues/detail?id=911
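
To illustrate why the modification timestamp alone is a weak signal, here is a small sketch. It is not how Duplicati is implemented; the function names and the example timestamps are made up. It compares a check that only looks for a newer timestamp with a check that compares a hash of the contents.

```python
import hashlib


def changed_by_timestamp(known_mtime, current_mtime):
    """Simplistic check: only a newer modification time counts as a change."""
    return current_mtime > known_mtime


def content_digest(data):
    """Return the SHA-256 hex digest of the file contents (bytes)."""
    return hashlib.sha256(data).hexdigest()


def changed_by_content(known_digest, data):
    """Content check: any difference in the bytes counts as a change."""
    return content_digest(data) != known_digest


# A file last backed up with mtime 2000 is replaced by a different version
# that came from another computer and carries an older mtime, 1500.
old_data = b"draft from this computer"
new_data = b"final version from the mail server"
known_digest = content_digest(old_data)

missed = not changed_by_timestamp(2000, 1500)          # timestamp check sees no change
detected = changed_by_content(known_digest, new_data)  # content check does see it
```

The price of the content check is that every file has to be read in full on each run, which is presumably why a timestamp shortcut exists in the first place.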

Observations after testing BackupPc

BackupPc appears to perform the base required functions as expected. Files are being backed up and they are restorable. An additional positive feature is that the files are stored in such a manner that they appear to be manually restorable with ease. The server-side configuration of the clients is finicky. It can take several tries to find the correct format for listing the client address. Once this is done there is no hassle. BackupPc can mail reports, but I have not tried this. The downside, for my use case, is that I have not been able to think of an easy way to notify the client that a backup is running or has finished.

Conclusion

As it stands, I am not satisfied that I have found a workable solution. Currently I am considering using the backup utilities supplied with the Windows OS. These work over SMB shares. Initially this was not seen as an option because it is not CryptoLocker-safe. To bolt on such safety one might make a server-side copy of the share at regular intervals. Mirroring would be a mistake; the crypto-locked files will also be mirrored, so a simple mirror would provide no protection against CryptoLocker-like malware. To be more robust, the server-side backup of the backup target will have to contain versions from several moments in time. This will require additional storage space. The amount of required storage space can be reduced by using deduplication. This backup-of-a-backup (with deduplication) could perhaps be provided by the previously mentioned BackupPc. Unfortunately, SMB sharing has also let me down; the shares are often not reconnected on startup. Instead we find the notice that 'the network name cannot be found'. This is curious, because a ping using that network name is successful. After disconnecting and reconnecting

In this scheme, the clients would use the user-friendly Windows-native backup tools to back up to the share. These will be versioned backups. Let us call this the client backup collection. The server would make this process robust against malware by storing several versions of the client backup collection in a location that cannot be visited by clients. If a client becomes infected with malware that attacks its files, the client backup collection will most likely also be attacked. If we have infected clients, assuming that we are looking at a small network consisting of around ten devices, a single operator could take the following steps.

First:
- Stop all network connectivity.
- Inspect the devices to see which require further care. Easier said than done.

The following actions could be performed in parallel:
- Re-load/re-install the client OS. Make sure not to attach an infected device to a vulnerable network or device. It seems wise to keep the newly cleaned devices disconnected for now.
- On the server, bring the client backup collection back to the latest state before infection.

Once it is concluded that all devices are clear:
- Reconnect them.
- Restore their files using the OS-provided tools. Depending on network performance this may have to be performed in more than one batch.
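
The server-side part of this scheme, several versions in time with deduplication, can be sketched in a few lines. This is a toy in-memory model to show the idea, not how BackupPc or any other tool stores its data; the class name SnapshotStore and the example file names are made up.

```python
import hashlib


class SnapshotStore:
    """Keeps several versions of a file collection, storing identical content once."""

    def __init__(self):
        self.blobs = {}      # digest -> file content, stored only once
        self.snapshots = []  # one {path: digest} mapping per moment in time

    def take_snapshot(self, files):
        """Record the current state of `files`, a {path: bytes} mapping."""
        snapshot = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # deduplication happens here
            snapshot[path] = digest
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1  # index of the new snapshot

    def restore(self, index):
        """Return the {path: bytes} state recorded in snapshot `index`."""
        return {path: self.blobs[d] for path, d in self.snapshots[index].items()}


store = SnapshotStore()
clean = {"report.doc": b"quarterly numbers", "photo.jpg": b"holiday"}
store.take_snapshot(clean)
store.take_snapshot(clean)  # unchanged files add no new blobs
encrypted = {path: b"ENCRYPTED" + data for path, data in clean.items()}
store.take_snapshot(encrypted)  # the malware's version is just one more snapshot
```

Unlike a mirror, the encrypted state does not overwrite anything: store.restore(1) still returns the clean files, and the two identical clean snapshots cost the storage of one.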

tag:tacosteemers.com,2014-08-30:/articles/2014-08-30-some-notes-on-trying-out-crashplan-duplicati-and-backupc.html

Memory for the SuperMicro X7SPA-H-D525-0

Taco Steemers Jan 11, 2014 Updated Jan 11, 2014

I recently bought the SuperMicro X7SPA-H-D525-0 motherboard. The Micron memory listed as compatible by SuperMicro was no longer available. I then checked Kingston's compatibility list. It listed two options for this motherboard.

Some blog- and forum posts mentioned using similar Kingston memory for similar SuperMicro motherboards. Taking the Atom D525 CPU into account, I decided to order the first stick on Kingston's compatibility list. It didn't work.

I hadn't been able to find anyone mentioning my exact motherboard. There were plenty using similar motherboards (such as the HF, with IPMI) and all seemed to use Kingston, or memory that just wasn't available anymore. So I went ahead and ordered the second stick on Kingston's compatibility list. I think you can guess where this is going...

It didn't work. After a couple additional evenings of searching I finally found someone mentioning that they used this specific motherboard in combination with the Corsair CMSO2GX3M1A1333C9 (I lost this reference). I can confirm that this works. For the X7SPA-H-D525-0 you can use the Corsair CMSO2GX3M1A1333C9, and luckily it is widely available.

I figured I would just post this, in case anyone else was planning to get that motherboard...

tag:tacosteemers.com,2014-01-12:/articles/2014-01-12-memory-for-the-supermicro-x7spa-h-d525-0.html

Configuring an Apache installation for use with the SSL protocol

Taco Steemers Jan 11, 2014 Updated Jan 11, 2014

Is your ownCloud client saying Failed to connect to ownCloud: Connection refused?

A possible cause could be that the webserver that is serving your ownCloud does not have SSL enabled. In this note I will describe how I did that for my own Apache 2 install. If you do a websearch for apache2 ssl you will probably find many search results, but none of the pages I found applied to the install I had - all used different files and directories. For that reason I am posting this note.

If you are using the Apache web server, a version close to 2.2, you can probably enable SSL the way it is outlined in this note. To find out which version of Apache my server has, I ran apache2 -v on it.

$ apache2 -v
Server version: Apache/2.2.22 (Debian)
Server built:   Mar  4 2013 22:05:16

We will now create a private key and a certificate, but before we do that, we should create and navigate to the /etc/apache2/ssl/ directory, which is where the configuration further down expects to find the files. Our server is called server1. This is what we will enter as the 'common name' when we are asked for it. We can create a key/certificate pair with the following example command:

openssl req -x509 -new -newkey rsa:2048 -nodes -keyout server1.key -out server1.crt

The -x509 option makes openssl write a self-signed certificate (server1.crt) instead of a certificate signing request. It will probably make sense to add something like -days 365, which indicates how long the certificate should be valid; without it the certificate is valid for 30 days. In my case a long validity does not seem critical, as both server and clients are on my personal network.

Now we need to tell Apache to use it. We make sure the top of our site configuration, which is contained in /etc/apache2/sites-available/default by default, looks like this:

<VirtualHost *:443>
    ServerAdmin webmaster@localhost
    ServerName server1:443

We also add the following:

    SSLEngine on
    SSLCertificateKeyFile /etc/apache2/ssl/server1.key
    SSLCertificateFile /etc/apache2/ssl/server1.crt

We will instruct Apache to use the 'mod_ssl' module, which uses OpenSSL. Install it if it isn't installed yet (check /usr/lib/apache2/modules/ to see if it is installed). We can use a2enmod ssl to enable 'mod_ssl' for us. There is also a2ensite default-ssl, which enables the separate example SSL site defined in /etc/apache2/sites-available/default-ssl; that is not needed when the SSL settings have been added to /etc/apache2/sites-available/default as shown above.

We can also do it manually. If we check which files are listed in /etc/apache2/mods-available, we should find ssl.conf and ssl.load. We will now create symbolic links to these files in the /etc/apache2/mods-enabled directory. That way Apache knows we want these mods to be enabled.

cd /etc/apache2/mods-enabled
ln -s ../mods-available/ssl.conf ssl.conf
ln -s ../mods-available/ssl.load ssl.load

mod_ssl is now enabled.

Now the apache2 server needs to be restarted. One can use

service apache2 restart

on modern Debian(-based) installs.

Your ownCloud should now be reachable on 'https://<server>/owncloud'. Of course, your ownCloud client and web browser will ask you if you trust this self-signed certificate.

Warnings about current SSL Connection:
The host name did not match any of the valid hosts for this certificate
The certificate is self-signed, and untrusted

...
...

In this case I'm fine with this - I can check the certificate details myself, and am only really using the certificate to get my own ownCloud client working with my own ownCloud server.

tag:tacosteemers.com,2014-01-12:/articles/2014-01-12-configuring-an-apache-installation-for-use-with-the-ssl-protocol.html

Switched from Wordpress to Pelican

Taco Steemers Jan 2, 2014 Updated Jan 2, 2014

Over the past two years I've posted some writing on a different domain. That is a shared Wordpress site which hasn't been seeing much love (the other party hasn't made any posts). I've been thinking of also writing some smaller notes, things that I tend to forget and then have to find out again. I'd rather not post them on this shared site because of the constant need for Wordpress updates, and because I tend to find Wordpress to be too much of a hassle when you want to make small adjustments. Therefore I decided to use a different domain and add a static site there (that is, here, where you are right now).

This site uses Pelican to create a static site from files written in Markdown. I've used Thomas Frössman's exitwp tool to get my posts from TophatCoders, the Wordpress site, into text files with Markdown markup. This worked quite well. Exitwp, however, is targeted at Jekyll rather than Pelican. To make sure Pelican could read the files, I wrote a small script (Python 2) to remove two lines and remove the time from the date, as Pelican didn't support the time notation that was written to the files. Of course, I only found out afterwards that there is a file in the Pelican project that seems to do the same (pelican/tools/pelican_import.py). Because I haven't used it, I don't know how well it works, but such functionality also appears to be mentioned in the manual so it should be supported. Once again it is shown that reading the manual can be a good idea ;).
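The date clean-up can be illustrated with a short sketch. This is not the original script (which was Python 2 and is not shown here), and the header format is an assumption: it supposes a metadata line of the form "date: YYYY-MM-DD HH:MM:SS", which may not be exactly what exitwp writes. The name strip_time is hypothetical.

```python
import re

# Assumed header format: "date: 2012-10-23 18:04:00" (not confirmed by the post).
DATE_WITH_TIME = re.compile(r"^(date:\s*\d{4}-\d{2}-\d{2})[ T].*$", re.IGNORECASE)


def strip_time(line):
    """Return the line with any time-of-day removed from a date header.

    Lines that are not date headers, or that carry no time, are returned as-is.
    """
    return DATE_WITH_TIME.sub(r"\1", line)


converted = [strip_time(line) for line in [
    "title: Some post",
    "date: 2012-10-23 18:04:00",
    "date: 2012-10-23",
]]
```

Applied to every line of a converted post, a function like this would leave the body text alone and only shorten the date header.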

I must say I'm very pleased with how easy the entire process has been. I'm not very up to date on Pelican yet, and I haven't gotten around to adding comment functionality yet, but I am liking things so far.

There was only one problem with my workflow, and it involved accented characters:

WARNING: Could not process Notes/2014-01-03-switched-from-wordpress-to-pelican.markdown 'utf8' codec can't decode byte 0xf6 in position 967: invalid start byte

To my surprise, it choked on Thomas Frössman's name. I was typing the post on a Windows laptop and had saved that post to disk on a Debian server using nano, after pasting the characters into nano over a PuTTY SSH connection. PuTTY didn't use the UTF-8 encoding that was expected by Pelican. The fix was simple: set the Window->Translation->Character set translation->Remote character set setting to UTF-8.
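The byte in the warning can be reproduced with a few lines of Python. The sketch below assumes that PuTTY was sending ISO-8859-1 (Latin-1), which is not stated in the post but matches the 0xf6 byte: in Latin-1 the letter ö is the single byte 0xf6, while UTF-8 encodes it as the two bytes 0xc3 0xb6. The helper name decodes_as_utf8 is hypothetical.

```python
name = "Frössman"

# Assumption: the terminal sent Latin-1, where "ö" is the single byte 0xf6.
latin1_bytes = name.encode("latin-1")
# What Pelican expected: UTF-8, where "ö" is the two bytes 0xc3 0xb6.
utf8_bytes = name.encode("utf-8")


def decodes_as_utf8(data):
    """Return True when the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_is_valid_utf8 = decodes_as_utf8(latin1_bytes)
utf8_is_valid_utf8 = decodes_as_utf8(utf8_bytes)
```

A lone 0xf6 byte is never valid in UTF-8, which is why the decoder reports an invalid start byte instead of showing a wrong character.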

Now I have everything the way I want it. I have a low-maintenance site and I can add posts with all my devices.

tag:tacosteemers.com,2014-01-03:/articles/2014-01-03-switched-from-wordpress-to-pelican.html

Something to think about when storing or processing files in your web app

Taco Steemers May 17, 2013 Updated May 17, 2013

Recently I saw code that looked like the following:

function handleFileRequest
    variable file = getFile('storage/${params.filename}')
    send file

This code readily accepts a user-supplied piece of information to retrieve a file. This is very wrong. Luckily it was caught in a code review.

It is wrong because it allows for directory traversal, and directory traversal is dangerous.

It is natural to think something along the lines of 'Hmm, simply removing all instances of "..", and maybe "/", would be a good start!'. But that wouldn't solve much. It would only solve the one case where an attacker tries something like "../private_file" as an input.

Depending on what kind of tools you are using or could potentially use, you can likely find solutions that are better thought-through and likely less complex than re-inventing a solution for your particular case. Use the tools your framework provides for sanitization. If you don't use a framework, or it doesn't supply such a tool, find such a tool.
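As an illustration of leaning on existing tools instead of string filtering, many standard libraries can resolve a path and check where it ends up. The following is a minimal Python sketch under stated assumptions, not a complete defence: the directory name "storage" comes from the pseudocode above, and the function name resolve_inside_storage is hypothetical.

```python
import os

# Assumption: files are served from a directory called "storage".
STORAGE_DIR = os.path.realpath("storage")


def resolve_inside_storage(filename):
    """Resolve filename below STORAGE_DIR, refusing anything that escapes it."""
    # realpath() collapses "..", follows symlinks and handles absolute input.
    candidate = os.path.realpath(os.path.join(STORAGE_DIR, filename))
    if os.path.commonpath([STORAGE_DIR, candidate]) != STORAGE_DIR:
        raise ValueError("path escapes the storage directory")
    return candidate


safe_path = resolve_inside_storage("report.txt")
```

The check is done on the resolved path rather than on the raw input, so it does not depend on guessing every spelling an attacker might try.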

It would be better not to pass identifiers that are also file or directory names at all. It is likely that a better solution can be found in your case too. This is comparable to how most developers have moved on from building SQL queries by concatenating user-supplied input with function and parameter names, to parametrized queries and stored procedures.

It is always a good idea to use more than one security layer. Your web application likely runs on an operating system that has built-in security features such as file access control, and your application/web server (e.g. Tomcat or Apache) likely does too. So if possible, make sure to use proper access control! If you fetch files using a different process than your main process, perhaps that process can be run as a user that only has access to a particular directory. In most cases that won't be enough, because users should only be permitted to access their 'own' files. Information like that can be kept track of in a database. These two suggestions have different goals: one is aimed at protecting the web application from the user, the other is aimed at protecting users from each other.
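Both ideas, opaque identifiers instead of file names and keeping track of ownership, can be sketched together. In the Python sketch below a dictionary stands in for the database, and every name (StoredFile, FILES, path_for, the ids and paths) is hypothetical and only there for illustration.

```python
from typing import Dict, NamedTuple


class StoredFile(NamedTuple):
    owner: str
    path: str


# Stand-in for a database table: opaque id -> owner and server-side path.
FILES: Dict[str, StoredFile] = {
    "f1": StoredFile(owner="alice", path="storage/0001.bin"),
    "f2": StoredFile(owner="bob", path="storage/0002.bin"),
}


def path_for(user, file_id):
    """Return the server-side path for file_id, but only for its owner."""
    record = FILES.get(file_id)
    # Unknown ids and files of other users are rejected in the same way.
    if record is None or record.owner != user:
        raise PermissionError("no such file for this user")
    return record.path


alice_path = path_for("alice", "f1")
```

Because the client only ever sends an id, input such as "../private_file" is simply an unknown key and never reaches the filesystem.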

It could be a good idea to ask yourself, for your particular applications, who or what needs protecting from whom. And could it be possible that technology you are already using has a ready-made solution?

tag:tacosteemers.com,2013-05-18:/articles/2013-05-18-something-to-think-about-when-storing-or-processing-files-in-your-web-app.html

Weak references, are you sure you want to use them?

Taco Steemers May 8, 2013 Updated May 8, 2013

One of the projects that I have been working on lately is a standard C# codebase, a framework of sorts, for a particular niche category of software. This is for a client that develops and uses many applications of this kind in-house. Many of these have been written as a stand-alone effort. Some are difficult to maintain and extend, as each development effort followed its own path.

I have looked at the MVVM Light Toolkit to see if it could be of use to us. My test application would show odd behaviour. After a little while, an exception would show up when I clicked a button. The root cause is a series of null-checks in the toolkit, that don't cover all cases. Also of interest to me was how my code led to that bug showing up.

I found that the target of a reference that I had passed to a MVVM Light Messenger object had been garbage collected. A Messenger object can be used for communication between viewmodels. Internally, the Messenger object held a weak reference to the object I had passed in. In my test application, that object went out of scope almost immediately. Because there only existed a weak reference to it, the garbage collector removed it. Subsequently, the Messenger object's weak reference no longer had a target, i.e., it pointed to an object that did not exist anymore.

I had thought that the Messenger's reference to my object would keep my object from being gc'ed, but because some of the Messenger's internal references are WeakReference instances, they do not stop those objects from becoming eligible for garbage collection.
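The behaviour described above is not specific to C#. The sketch below reproduces the idea in Python with the standard weakref module; it is an analogy and not the Toolkit's code, and the names Recipient and register_weakly are hypothetical. It assumes CPython, where an object is reclaimed as soon as its last strong reference disappears.

```python
import gc
import weakref


class Recipient:
    """Stand-in for the object that was handed to the Messenger."""


def register_weakly():
    # The only strong reference is this local variable.
    recipient = Recipient()
    # Like the Messenger, we keep nothing but a weak reference.
    return weakref.ref(recipient)


ref = register_weakly()
gc.collect()  # not needed on CPython, but makes the intent explicit

# Calling a dead weak reference returns None instead of the object.
target_is_gone = ref() is None
```

Any code that later dereferences such a reference has to handle the missing target, which is the kind of null-check that was incomplete in the case described here.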

As it turns out, this has been listed as a bug on the MVVM Light Toolkit's CodePlex repository.

This is a simple example of how one might encounter this situation (copied from a post on that same page):

public MainWindow()
{
    InitializeComponent();

    LogManager _log = new LogManager(typeof(MainWindow).Name);

    Messenger.Default.Register<NotificationMessage>(this,
    x =>
    {
        // call to this fails on .Send unless you remove the local _log reference
        // or change the local variable to being a field of the
        // MainWindow class instead was a simple workaround for me
        _log.Error("An unexpected error occured. " + x.Notification, x.Content);
    });
}

If I recall correctly, my case was different. I passed the object as the first variable, the recipient.

I'm not certain that this should be classified as a bug in the MVVM Light Toolkit, as this is probably by design. It makes sense to use weak references for event-related things because one might forget to unsubscribe one's objects, a situation which I would consider a memory leak. Then again, why would you want that for the Messenger object, typically used by viewmodels which exist during the entire lifetime of an application?

I'm not making use of the MVVM Light Toolkit anymore, but because initially it looked as though I would, I had to do something about this situation. I can't give my coworkers a codebase that makes it that easy to create bugs that are easy to miss. A bug like this is easy to miss during testing because it occurs intermittently. Many developers have a fuzzy understanding of how memory management works (in general, or in whatever language or runtime they end up using today or tomorrow), which could make this kind of situation difficult to fix.

I set out to find a way to be able to use the MVVM Light Toolkit code to keep the relevant objects alive longer, and settled on using a ConditionalWeakTable. The way the adjusted Messenger makes use of it ensures that the targets of the weak references do not get gc'ed, but the entry itself will be removed and gc'ed when it is no longer relevant to the Messenger. This means that the rest of the code does not need changing. Unfortunately, this type of collection is only available starting from .NET Framework 4.0.
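The lifetime rule that makes such a table useful can be shown with a rough Python analogue, weakref.WeakKeyDictionary: the key is held weakly, the value strongly, so the value stays alive exactly as long as the key does. This is only an analogy under that assumption; it is not the adjusted Messenger, and the names Owner, Handler and attach are hypothetical. As before, CPython's immediate reclamation is assumed.

```python
import gc
import weakref


class Owner:
    """Stand-in for the object whose lifetime should decide everything."""


class Handler:
    """Stand-in for the object that used to be collected too early."""


# Keys are held weakly, values strongly.
attached = weakref.WeakKeyDictionary()


def attach(owner):
    handler = Handler()
    attached[owner] = handler  # the table now keeps the handler alive
    return weakref.ref(handler)  # only used to observe the handler's lifetime


owner = Owner()
handler_ref = attach(owner)
gc.collect()
alive_while_owner_lives = handler_ref() is not None

del owner  # the last strong reference to the key disappears
gc.collect()
gone_after_owner = handler_ref() is None
entries_left = len(attached)
```

The entry vanishes on its own once the key is gone, so nothing has to unsubscribe explicitly and nothing is kept alive forever.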

I haven't proposed a patch to the Toolkit's codebase. In this instance, my own preference simply appears to differ from that of the Toolkit's author. The Toolkit isn't meant to help develop intense event-based systems; it is intended for user-facing systems with a GUI. If you are not writing real-time systems, are you sure you need to use a weak reference?

tag:tacosteemers.com,2013-05-09:/articles/2013-05-09-weak-references-are-you-sure-you-want-to-use-them.html

Why I like the filesystem as an interface to the OS

Taco Steemers Apr 20, 2013 Updated Apr 20, 2013

A short while ago I made a post about a daemon that I have developed. This daemon needs information about connected storage devices and their mounted partitions. There is currently only a Linux version. I tried to develop a Windows version too (in the same code base even) but that effort stalled at some point. Developing the daemon for Linux was easy. It simply required reading the correct files from /proc, /sys and /dev.
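To give an impression of how little is needed, here is a minimal Python sketch that parses text in the format of /proc/mounts, where each line holds a device, a mountpoint, a filesystem type, mount options and two numbers. It is not the daemon's code (the daemon is written in C), the sample text is made up, and the names Mount, parse_mounts and block_device_mounts are hypothetical.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str


def parse_mounts(text: str) -> List[Mount]:
    """Parse text in /proc/mounts format into Mount records."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Note: spaces in mountpoints appear as the escape \040; left as-is here.
        mounts.append(Mount(fields[0], fields[1], fields[2]))
    return mounts


def block_device_mounts(text: str) -> List[Mount]:
    """Keep only the entries whose device is a node under /dev."""
    return [m for m in parse_mounts(text) if m.device.startswith("/dev/")]


# Made-up sample in the same format as /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /media/usbstick vfat rw,nosuid,nodev 0 0
proc /proc proc rw 0 0
"""

sample_mounts = block_device_mounts(SAMPLE)
```

On a Linux system the same function could be fed the contents of /proc/mounts, read like any other text file.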

Initially I tried to develop both versions at the same time. I looked at what daemons on Linux and services on Windows generally look like. I took a look at getting a small piece of code that could fork on both platforms. Then I looked at accessing information about hardware. Using C on Windows, things look to be massively less simple, and less easy as well. Windows Management Instrumentation seems most promising, but (parts of) it appear(s) to need to be installed on some clients, and the amount of code required to retrieve the needed information seems massive. Using C#'s libraries, the daemon's functionality looks easy to implement on Windows. Maybe a bit easier and less complex than it was using C on Linux, even. But C# also requires the installation of a runtime, something I would really like to avoid.

Anyone who can program can write a program that reads files from the file system, and they can educate themselves about which files they need simply by reading them. But as soon as you have to use complex series of commands that are not well documented and may not (easily) be available to your programming language, development environment or client environment, educating oneself becomes less easy and developing your program becomes more complex.

tag:tacosteemers.com,2013-04-21:/articles/2013-04-21-why-i-like-the-filesystem-as-an-interface-to-the-os.html

USB Storage Back Up Daemon

Taco Steemers Apr 12, 2013 Updated Apr 12, 2013

I am working on a daemon (service) that will automatically back up all mounted partitions on USB storage devices, with or without prior configuration for a particular drive. In its simplest form, the daemon will back up any USB storage device that you plug in. This includes thumb drives and all sorts of camera storage, such as SD and SDHC cards.

All desired functionality is in place, including white- and blacklist functionality. Additional functionality may be added in the future, and some ideas for that can be found in the readme file.

The software is known to work on the Ubuntu GNU/Linux distribution. It doesn't work on any Windows OS, but I'm working on a project specifically for versions Vista and 7.

tag:tacosteemers.com,2013-04-13:/articles/2013-04-13-usb-storage-back-up-daemon.html

A small script to help organize your torrent downloads

Taco Steemers Mar 18, 2013 Updated Mar 18, 2013

A couple of days ago I wrote a small housekeeping script (in Python), that lists stale torrent data. I use this to help clean up the directory that I let my torrent application store the torrent data to. The script will list those files that are not part of the torrents loaded by a torrent application, but do occur in the given download directory. I find this useful to keep track of which files I am still up- or downloading, and which I am not. Those files I might decide to remove, or move to a different directory.

Finding out which torrents are in use by the torrent application turned out to not be that difficult for Transmission and Deluge, as they keep a directory with current torrents. They do this regardless of whether you used a magnet link or an actual torrent file. To find out which files belong to torrent files, one will need to read the torrent files. As is shown on the bittorrent.org website, torrent files use a specific encoding, which is called bencoding. As it turns out, Fredrik Lundh published a decoder in August 2007, which was very useful to me.
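For readers who have not seen bencoding before, the following is a minimal decoder sketch in Python. It is not Fredrik Lundh's decoder and not the code from my script; it only covers the four bencoded types (integers, byte strings, lists and dictionaries) and the "info", "name", "files" and "path" keys of the torrent file format. The function names and the two sample torrents are hypothetical.

```python
def bdecode(data):
    """Decode one complete bencoded value from a bytes object."""
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("unexpected trailing data")
    return value


def _decode_at(data, i):
    """Decode the value starting at index i; return (value, next index)."""
    lead = data[i:i + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode_at(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode_at(data, i)
            result[key], i = _decode_at(data, i)
        return result, i + 1
    if lead.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % i)


def torrent_paths(meta):
    """List the relative paths of the files described by a decoded torrent."""
    info = meta[b"info"]
    name = info[b"name"].decode("utf-8")
    if b"files" not in info:  # single-file torrent
        return [name]
    return ["/".join([name] + [part.decode("utf-8") for part in entry[b"path"]])
            for entry in info[b"files"]]


# Hypothetical, heavily shortened torrents (real ones carry more keys).
SINGLE = b"d4:infod6:lengthi12e4:name5:a.txtee"
MULTI = b"d4:infod5:filesld4:pathl3:sub5:b.txteee4:name3:diree"

single_paths = torrent_paths(bdecode(SINGLE))
multi_paths = torrent_paths(bdecode(MULTI))
```

Collecting these paths for every torrent in the application's state directory gives the set of files that are still in use; everything else in the download directory is a candidate for the stale list.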

My script will list each file, with its full path. Example usage:

python listStaleTorrentData.py /home/user/.config/deluge/state /home/user/downloads/
python listStaleTorrentData.py /home/user/.config/transmission/torrents/ /home/user/downloads/

The script will work with any torrent application that uses a directory with torrent files to store its state. Note that only one torrent application should be using the download folder to store files, because only that application's known torrents will be checked.

If you wish to pass this list to a command, such as the rm command on your UNIX(-like) operating system, you may have to tweak the output a bit. If the resulting files don't contain spaces and only contain those characters in the basic Latin block that are allowed by your filesystem, you can probably pass the output to rm by doing something like | xargs rm, provided that your platform has a utility such as xargs. If your files do contain spaces, you will have to tweak the output such that quotes are added. Stack Exchange has you covered on that front.

You may notice on that page that the accepted answer uses a null character as a filename separator. Not every script or application accepts every character as a separator, but there is a good reason to use the null character if you can. When you pipe filenames to rm through xargs, you can separate the filenames with a null character. This will prevent one silly but dangerous vector of attack, which is filenames with a newline in them. xargs has a command-line argument (-0) that you can use to indicate that its input is null-separated data. The null character can never appear in filenames (at least not to my knowledge).

Regarding filenames with a newline in them, take a look at the following to see why they might be problematic (use a fresh, empty directory!):

$ touch 'a
b'
$ touch 'b'
$ ls
a?b  b
$ ls | xargs rm
rm: cannot remove `a': No such file or directory
rm: cannot remove `b': No such file or directory
$ ls
a?b

The rm command tried to remove the files a and b, after encountering a[newline]b. Of course a never existed, but b did, and it got removed. If the characters after the newline are a valid path, that file might be deleted if the file permissions allow it.
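The difference between the two separators can also be shown without touching any files. The Python sketch below is an illustration only and not part of my script; the helper names are hypothetical. It joins the two filenames from the example once with newlines and once with null characters, and then splits them again the way a consumer would.

```python
import os

# The two filenames from the example: one contains a newline.
paths = ["a\nb", "b"]


def newline_joined(names):
    """Emit one name per line, the way a plain listing does."""
    return b"".join(os.fsencode(n) + b"\n" for n in names)


def nul_joined(names):
    """Emit names terminated by a null character."""
    return b"".join(os.fsencode(n) + b"\0" for n in names)


def split_on(data, separator):
    """Split the stream the way a consumer of that separator would."""
    return [os.fsdecode(part) for part in data.split(separator) if part]


seen_with_newlines = split_on(newline_joined(paths), b"\n")
seen_with_nuls = split_on(nul_joined(paths), b"\0")
```

With newlines the consumer sees three names (a, b and b), none of which is the real first file; with null characters it sees exactly the two names that were sent. A script that writes its list this way could be combined with xargs -0.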

tag:tacosteemers.com,2013-03-19:/articles/2013-03-19-liststaletorrentdata-a-small-script-to-help-organize-your-torrent-downloads.html

Setting up network shares with NFS and Linux systems

Taco Steemers Feb 5, 2013 Updated Feb 5, 2013

For quite a while, I have had a backup solution that I was not happy with. I have now remedied this. I am using a small server, an HP Proliant Microserver N40L. I tend to use Debian-based GNU/Linux systems at home. While I do eventually want my new solution to support Windows systems, it is not a priority. To support both Windows and Linux operating systems, I would use Samba. However, I have had mixed results with Samba, in home and small business settings. I am now successfully using NFS (Network File System) to back up computers that run Linux. Unfortunately, NFS support under Windows isn't great. Apparently it is only supported by some versions and editions: Windows Vista, Server 2008, Windows 7 Enterprise and Windows 7 Ultimate (http://www.microsoft.com/en-us/download/details.aspx?id=2391). In this post I will outline how to install NFS under Debian-based operating systems, as a server and as a client. As it turns out, this is pretty simple.

The server

We will be using a volume that is mounted under /media/backupspace.

First, we will install the required software (I'm using apt and sudo to get the necessary packages and rights, just substitute as necessary).

sudo apt-get install nfs-kernel-server

This install should make sure that everything required for NFS is loaded at startup.

Now we will mount this same volume in a directory that will be used by NFS. First, we create the new mountpoint

sudo mkdir -p /exports/backupspace

and then we bind the volume, the one that we wish to place the backups on, to that new mountpoint. We can do this manually with

sudo mount --bind /media/backupspace /exports/backupspace

To have it be mounted during startup, we can add it to the /etc/fstab file. This file can only be edited by a superuser. We would add the following line:

/media/backupspace /exports/backupspace none bind 0 0

Three other files are relevant to our configuration:

/etc/default/nfs-kernel-server
/etc/default/nfs-common
/etc/exports

These three files can also only be edited by a superuser.

For this example we will not be setting up authentication requirements. You may wish to do so yourself, perhaps after you have gotten your share accessible without authentication. In /etc/default/nfs-kernel-server we will be setting NEED_SVCGSSD=no. This indicates that we don't need authentication.

In /etc/default/nfs-common we set NEED_GSSD=no because we are not using authentication, and NEED_IDMAPD=yes because we want to map user IDs from names (for properly preserving permissions). For that to work, 'idmapd' should be running and configured properly, something that it probably does by default.

In /etc/exports we will indicate that we want to use our newly created /exports/backupspace. We add something like

/exports/backupspace 10.0.0.0/24(rw,fsid=0,insecure,no_subtree_check,async)

This should share /exports/backupspace, to the indicated subnet, with fairly standard settings and no authentication.
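If it is unclear which clients fall inside the subnet given in the exports line, the notation can be checked with Python's standard ipaddress module. This is only a small sketch to illustrate what 10.0.0.0/24 covers; the client addresses and the function name client_is_covered are made up for the example.

```python
import ipaddress

# The subnet from the exports line above.
EXPORT_SUBNET = ipaddress.ip_network("10.0.0.0/24")


def client_is_covered(address):
    """Return True when the address lies inside the exported subnet."""
    return ipaddress.ip_address(address) in EXPORT_SUBNET


# Hypothetical client addresses.
laptop_covered = client_is_covered("10.0.0.17")
other_network_covered = client_is_covered("10.0.1.17")
addresses_in_subnet = EXPORT_SUBNET.num_addresses
```

A /24 covers the 256 addresses that share the first three numbers, so 10.0.0.17 is inside it and 10.0.1.17 is not.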

Restart the NFS service for the changes to have effect (/etc/init.d/nfs-kernel-server restart should do it).

The client

First, we need to determine where we will mount our network share (i.e., what path we will be using to access our files). This can't be done in an encrypted directory. I suggest using the location /media to create your mountpoint in, as this is the directory that storage devices are mounted to by the operating system itself. For this example I will be using /media/backupspace to mount my networked storage.

sudo mkdir /media/backupspace

We install the client software:

apt-get install nfs-common

We can manually mount our network share using the mount command:

sudo mount -t nfs4 -o proto=tcp,port=2049 backupserver:backupspace /media/backupspace

"-t nfs4" indicates that we want to access something that is shared over nfs4. "-o proto=tcp,port=2049" are the default connection settings for the NFS service. "backupserver:backupspace" indicates which server and which share we will be using. Note that we do not list the server-side path here. The last part, "/media/backupspace", is the mountpoint on our client.

If we want to have the networked share to be mounted on boot we can use fstab, as shown earlier. We will use the flags "hard,intr". To quote the NFS how-to on the "hard" setting:

"The program accessing a file on a NFS mounted file system will hang when the server crashes. The process cannot be interrupted or killed (except by a "sure kill") unless you also specify intr. When the NFS server is back online the program will continue undisturbed from where it was. We recommend using hard,intr on all NFS mounted file systems."

(http://nfs.sourceforge.net/nfs-howto/ar01s04.html)

Since this is a backup system, it is crucial that we do not use the "soft" setting, which will lead to data corruption in such situations.

We will add something like the following to fstab:

backupserver:backupspace /media/backupspace nfs rw,hard,intr 0 0

On my backup server I also have a share that is rarely written to. That share is itself not a backup, though a backup is created of it. Currently I mount it manually, but if I were to add it to fstab, I would have it mounted as read-only ("ro,hard,intr").

Some notes

It is worth spending some time thinking about NFS security (http://nfs.sourceforge.net/nfs-howto/ar01s04.html). It is probably wise to set up a firewall. You may wish to use a 'defense in depth' approach, and set up a firewall both for your network and specifically on your NFS server machine.

Currently I am using the rsync command to duplicate my files. This works fine. One downside is that there is no solid, pre-built way to handle filename changes. This isn't really surprising: without storing metadata there is no way to determine that a path has been adjusted. Instead, rsync concludes that items have been removed and that items have been added, and starts doing the same to the target location.

If you are copying to an NTFS formatted target then it is important to know that NTFS doesn't store Unix-style permissions, so these may not be preserved.

tag:tacosteemers.com,2013-02-06:/articles/2013-02-06-setting-up-network-shares-with-nfs-and-linux-systems.html

Minimal Linux and Windows process spawn test

Taco Steemers Oct 22, 2012 Updated Oct 22, 2012

I am working on a small backup utility meant to add value to other, proper backup utilities. At the least, I want this program to run on any vanilla install of a (semi-) recent Linux or Windows desktop version. I would also like to keep a small file-size, and prefer not to depend on any non-standard libraries. The language of choice is C. The planned functionality of this program requires spawning different processes, without destroying the original process.

For that reason, I have looked into writing the simplest cross-platform way of doing so. Before I started to look for information on launching new processes in C, running on a Linux and/or Windows OS, I had expected to find several examples of possible approaches. This turned out not to be the case, and so I had the pleasure of finding my own solution. Naturally, it was only after finishing my code sample that I stumbled upon the Wikipedia page for "Spawn (computing)", which contains a lot of useful information.

For spawning a different process that runs beside the current process, the fork function in combination with a function from the exec family appears to be the standard on Linux operating systems. fork() duplicates the running process, but execv() loads a different process image into the duplicate process. My preference to not depend on any non-standard libraries excludes a solution such as the Cygwin dll, which I am told would support fork() under Windows. Instead, I wrote some platform specific code that makes use of the _spawnv function. If you are not familiar with it yet, be sure to read that page before using it. The page contains important information on the environment of the spawned process. The second and third arguments of execv() and _spawnv() respectively, are the arguments (argv) for the new program. That is what the v in the function names refers to.

I have probably overlooked something that a practiced C programmer would not. If you find anything to improve, feel free to get in contact or fork the github gist.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#ifdef _WIN32
    #include <process.h> /* Required for _spawnv */
    #include <windows.h>
    /* We make getpid() work in a similar 
        way on Windows as it does on Linux */
    #define getpid() GetCurrentProcessId()
#endif
#ifdef __linux__
    #include <unistd.h>
#endif

void spawn_new_process(char * const *argv);
int pid;

int main(int argc, char *argv[])
{
    pid = getpid();
    if(argc > 1 && strcmp(argv[1],"the_new_process") == 0)
    {
        printf("[%d] This is a new process, and not a fork.\n", pid);
    }    
    else
    {
        printf("[%d] This is the original process.\n", pid);
        /* execv() and _spawnv() expect a NULL-terminated argument list */
        char *new_args[3];
        new_args[0] = argv[0];
        new_args[1] = "the_new_process";
        new_args[2] = NULL;
        spawn_new_process((char * const *)new_args);
    }
    return(0);
}

void spawn_new_process(char * const *argv)
{
    #ifdef _WIN32
        /* This code block will also be reached on a 
           64 bit version of a Windows desktop OS */
        _spawnv(_P_NOWAIT, argv[0], (const char * const *)argv);
    #endif


    #ifdef __linux__
        /* Create a copy of the current process */
        pid = fork();

        /* fork() returns 0 in the child, and the
           child's pid in the original process */
        if(pid == 0)
        {
            /* We are now in the child process.
               Replace it with a different process image */
            printf("[%d] Child (fork) process will call exec.\n",
                 getpid());
            execv(argv[0], argv);

            /* This code is only reached if execv() fails */
            perror("execv");
            exit(EXIT_FAILURE);
        }
    #endif

    /* We are still in the original process */
    printf("[%d] Original process is exiting.\n", getpid());
    exit(EXIT_SUCCESS);
}

When run on Linux, we get output like the following. The process IDs will differ per run, and because both processes run side by side, the order of the last lines can vary:

prompt> minimal_fork_test
[22166] This is the original process.
[22167] Child (fork) process will call exec.
[22166] Original process is exiting.
[22167] This is a new process, and not a fork.
prompt>

When run under Windows, we get the following output:

prompt>minimal_fork_test
[14800] This is the original process.
[14800] Original process is exiting.

prompt>[14808] This is a new process, and not a fork.
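For comparison, the same idea of starting a separate program without replacing the original one can be sketched in Python, where the standard subprocess module hides the platform difference between fork/exec and the Windows way of spawning. This is an illustration only and not part of the backup utility, which is written in C; the function name spawn_and_report is hypothetical.

```python
import os
import subprocess
import sys


def spawn_and_report():
    """Start a new Python process and collect the process ids involved."""
    child = subprocess.Popen(
        # The new process only prints its own process id.
        [sys.executable, "-c", "import os; print(os.getpid())"],
        stdout=subprocess.PIPE,
    )
    output, _ = child.communicate()  # wait for the new process to finish
    return os.getpid(), child.pid, int(output.decode().strip())


parent_pid, spawned_pid, reported_pid = spawn_and_report()
```

The original process keeps running after the call, and the id reported by the new process differs from the id of the original one, just as in the C example.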

tag:tacosteemers.com,2012-10-23:/articles/2012-10-23-minimal-linux-and-windows-process-spawn-test.html