Home · Simon Holywell on Simon Holywell

Simon Holywell Nov 19, 2024 Updated Nov 19, 2024

Code portability is the practice of writing code that can be easily reused or transferred between different environments, such as databases or frameworks, with minimal changes. By prioritising portability, developers enhance code maintainability, reduce vendor lock-in, and improve their own adaptability across projects and organisations.

Recently, a friend let me know that the SQL style guide I wrote was back on Hacker News. Against my better judgement, I took a moment to read the latest comments.

One recurring theme stood out—a misunderstanding of code portability. Although I’ve addressed this before in SQL Style Guide Misconceptions, it seems worth revisiting, as many readers still seem to struggle with the concept.

Many readers who have commented in the past and probably will comment in the future have a very limited vision of code portability and its benefits.

What is code portability?

Code portability generally refers to the ability to move code from one environment to another with minimal changes. For example, in the context of the SQL style guide, this might mean taking code written for MySQL and adapting it easily for PostgreSQL.

The same principle applies across software development: keeping your business logic decoupled from specific frameworks ensures it can be migrated to another framework with minimal effort.

However, this is unlikely to be something that many developers will experience in their careers. It is rare that a project will move to a different environment once it has been established. Where it does happen, it is likely to be a larger body of work and changing the SQL queries will be the least of your worries.

Whilst this is one aspect of portability it is not the only one. In my opinion it is not the most compelling reason to write portable code either.

What is the real benefit of code portability?

Reuse

One major benefit of portability is the ability to reuse code. You can reuse code in a different project, even if it’s running in a different environment, without making any changes. For example, a query written for a MySQL-based project could be reused in another project running PostgreSQL with minimal effort.

Reusing code saves time and ensures we avoid reinventing the wheel.

You, the developer

A very real benefit of writing portable code that is very often overlooked is that it makes you, the developer, more portable. If you write portable code then you are able to move between projects, companies and teams more easily.

Take SQL as an example: why would you want to pigeonhole yourself as a MySQL developer? Your next project might use PostgreSQL or MS SQL Server. If you have only learnt vendor-specific SQL functionality then your skills are not going to be transferable to your new environment.

Instead of learning the Laravel way of doing something, learn the PHP way of doing something. Instead of learning the React or NestJS way of doing something, learn the JavaScript way of doing something. Sure, use the framework to make your life easier, but do not compromise your business logic to accommodate the framework.

Maintainability and ease for others

Portable code is easier to maintain and understand, both for yourself and for others who might work on it in the future. Standards-compliant, portable code ensures a wider audience of developers can quickly grasp its intent, making debugging, updating, and extending the codebase simpler.

Remember, you’re not just writing code for now—you’re writing it for your future self and your colleagues.

Tooling

When you are writing portable code you are writing code that is more likely to be supported by the tools you use. This can be your editor, code linters, code formatters, other static analysis tools and dashboards.

In the case of SQL, this could be your desktop database client and its internal code formatting, syntax highlighting and code completion features.

Vendor lock-in

Writing portable code reduces the risk of vendor lock-in. Relying on vendor-specific features ties your code—and your team—to that vendor.

While this might not seem like a concern for individual developers, it can pose significant challenges for organisations. However, by writing portable code you are reducing the risk of this happening and you get this advantage for free.

That said, let’s be realistic: with many of us relying on specific cloud providers, some level of lock-in is often inevitable.

What if I can’t?

Sometimes it is just not possible to write portable code and you have to use vendor-specific functionality. This is fine and sometimes unavoidable. Do it. There is no need to self-flagellate.

Seriously, just move on and implement the feature.

Conclusion

If you have the option to write portable code, you should. Choosing to write vendor-specific code unnecessarily forfeits the benefits I have outlined above. Most commenters appear to be blind to this and I hope that this article has helped to explain why I think it is important.

Ultimately, I am some bloke on the internet and you can choose to ignore me. The results of your choices will be yours to live with. I would urge you to think of others that will come after you though and make their lives easier by writing portable code where you can.

https://www.simonholywell.com/post/a-note-on-code-portability/

BikeYoke Revive dropper post alternative oils

Simon Holywell Oct 15, 2024 Updated Nov 19, 2024

The BikeYoke Revive dropper post is a great piece of kit, but it can be hard to find the correct oil for servicing it. I have found some alternatives that work well and are easier to source.

The oil used in the BikeYoke Revive dropper post is a custom blend of oil and an additive that is designed to work in varied temperature conditions and provide both a smooth and consistent action. It is a key component of the dropper post system and should be replaced during a full service, which is usually only required if the post is not working as expected or if you accidentally spill some, as I did. The oil can be purchased directly from BikeYoke, but if you need oil in a hurry or you’re wondering if some suspension oil you already have will work, then read on.

BikeYoke Revive cartridge service

The BikeYoke Revive has a user serviceable cartridge system that allows you to replace the oil in the dropper post without having to send it back to the manufacturer. This is a great feature because it means you can keep your dropper post running smoothly without having to wait for a service or pay for one. The cartridge system is a simple design that allows you to remove the cartridge from the post, drain the oil, refill it and then reinsert it into the post.

I am not going to go into the details of how to service the cartridge in this post, but I will say that it is a simple process that only requires a few tools and some oil. You can follow along with the BikeYoke published instructional video on how to service the cartridge.

BikeYoke Revive oil

There are two recommended oils for the BikeYoke Revive dropper post:

BikeYoke Sanguine dropper fluid - 250ml
96% Mobil DTE 10 Excel 15 with 4% r.s.p No Stick Slip additive

The Sanguine dropper fluid is the oil that BikeYoke sells directly and is the recommended oil for the dropper post. It is also very reasonably priced so if there is stock and you can wait a few days then I would recommend buying it from your local BikeYoke distributor.

From what I can glean Sanguine is actually a pre-mixed version of the Mobil DTE 10 Excel 15 with the r.s.p No Stick Slip additive. This makes sense because the only way to buy the Mobil DTE 10 Excel 15 is in 20L drums, which is a lot of oil for a single dropper post service.

Speaking of which the Sanguine dropper fluid is sold in a 250ml bottle which is enough for a few services given the small oil capacities of the dropper post.

Oil capacities

The BikeYoke Revive dropper post comes in a few different lengths and each length has a different oil capacity.

Post length    Oil capacity    Tolerance
125mm          30ml            +/- 2ml
160mm          41.5ml          +/- 2ml
185mm          46ml            +/- 2ml
213mm          60ml            +/- 2ml

As you can see, the oil capacities are not large, so you’ll get a few services out of a 250ml bottle of the Sanguine dropper fluid, or any other oil you choose to use.
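As a rough worked example of that arithmetic, the sketch below divides a 250ml bottle by each capacity from the table. It works in tenths of a millilitre because shell arithmetic is integer-only, and it ignores the +/- 2ml tolerance and any spillage, so treat the results as an upper bound. The function name services_per_bottle is purely illustrative.

```shell
# Whole services you can get from one bottle for a given fill volume.
# Volumes are in tenths of a millilitre (41.5ml -> 415) because POSIX
# shell arithmetic only handles integers.
services_per_bottle() {
  bottle_tenths=$1
  capacity_tenths=$2
  echo $(( bottle_tenths / capacity_tenths ))
}

# post length : oil capacity in tenths of a millilitre
for entry in 125mm:300 160mm:415 185mm:460 213mm:600; do
  length=${entry%%:*}
  capacity=${entry##*:}
  echo "${length}: $(services_per_bottle 2500 "$capacity") services per 250ml bottle"
done
```

Even the longest post, at 60ml per fill, gets four complete fills from a single bottle.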

Selecting an alternative oil

BikeYoke are quite clear about the oil they recommend for the dropper post and note that the best performance will be achieved with the Sanguine dropper fluid. That said, forum posts by Sacki of BikeYoke show that they have been discussing the use of alternative oils from the beginning and have tested a few.

Early discussions on the forums reference a specification of 15cSt at 40°C for the oil, but recent technical documents from BikeYoke mention that Sanguine is a 16cSt oil at 40°C. Something within a reasonable range of this should work well in the dropper post.

Do note that we’re talking about viscosity in cSt here and not the weight (wt) rating of the oil, which does not directly correlate to the viscosity of the oil. Viscosity is also sometimes given in mm²/s, which is the same unit as cSt under a different name. You want a suspension oil that is as close as possible to 16cSt or 16mm²/s at 40°C.

Note that this value is taken from the oil and pressure specifications for BikeYoke products document.

Alternative oils

I have found a few alternative oils that are close to the 16cSt at 40°C specification of the Sanguine dropper fluid. These oils may be easier to find as they’re more commonly used in motorbikes.

Oil                                 Viscosity at 40°C
IPONE Fork Fluid 3                  15.5cSt
Maxima fork oil 5wt                 16.2cSt
Motorex Racing Fork Oil 4W          16.0cSt
Motul shock oil factory line        16.3cSt
Motul factory line fork oil 2.5W    15cSt
Showa SS-05                         15.1cSt
Suzuki L01                          15.5cSt

You are sure to find more oils that are close to the 16cSt at 40°C specification of the Sanguine dropper fluid, but these are the ones that I have found. I managed to find the Motul factory line fork oil 2.5W at a local motorbike shop, and it has worked well in my dropper post - I have not noticed any difference in the action of the post.

Refilling the oil

The easiest way to measure and refill the oil in the dropper post is to use a syringe. I got a syringe from my local chemist and, whilst it was only 5ml in capacity, with a few syringe refills I was able to measure out the correct amount of oil for my dropper post.

Pressurising the post

Once you’ve changed the oil you will need to repressurise the post with air and a shock pump.

Minimum pressure    Standard pressure    Maximum pressure
200psi              250psi               300psi

The post is designed to work with a pressure of 250psi, but you can run it at a lower pressure if you want a softer/slower return or a higher pressure if you want a firmer/faster return to maximum height.

Conclusion

The BikeYoke Revive dropper post is a great post that is easy to service and maintain. The oil capacities are small so you’ll get a few services out of a 250ml bottle of the Sanguine dropper fluid or any other oil you choose to use. My post has been running well with an alternative oil and I have not noticed any difference in the action of the post. I do not ride in freezing weather though, so if you do then you may want to stick with the Sanguine dropper fluid or performance may be affected.

https://www.simonholywell.com/post/bikeyoke-revive-alternative-oils/

Focus Jam frame bearing replacement

Simon Holywell Sep 27, 2024 Updated Nov 19, 2024

I recently replaced the frame bearings in my 2018 Focus Jam mountain bike and found it hard to find out exactly which bearings I needed until I had the whole thing apart. To make it easier next time I am sharing my notes from the process.

Introduction

I have owned a 2018 Focus Jam Elite 29er for a few years now and it has been a great bike. However, I have noticed that the rear suspension has been feeling a bit rough lately. I have been meaning to service the bearings in the frame for a while now, so I decided to finally get around to it.

You should do the same if the rear suspension feels rough, creaks or you can feel play in the rear triangle.

Bearings required

The Focus Jam from 2017 to 2021 has the following bearings in the frame:

4x 61901-2RS bearings (12mm x 24mm x 6mm)
2x 61902-2RS bearings (15mm x 28mm x 7mm)
2x 63801-2RS-MAX bearings (12mm x 21mm x 7mm)
- do note that the bearing is 7mm wide and not the more commonly available 8mm wide - be sure to double check this.
- alternative names for this bearing:
  - 3801-H7 Full complement
  - 63801 Full complement

I bought my bearings from my local SKF bearing supplier/factor, but you can also get them as kits. In fact, if I were to do it again I would probably buy a kit, as it is cheaper, or order from an Enduro bearing supplier online.

The DIY MTB wholesaler in Australia has the bearings and a kit for the Focus Sam 2017-2018, which looks like it is the same as the Jam 2017-2021. Do verify with the supplier before purchasing, but the GPBSETRSFO2 kit looks like it has all the bearings you need.

A quick note on pressing the bearings

You do not need any special tools to press the bearings or remove them from the frame. You can use a socket that is the same size as the bearing and a long bolt or some all-thread combined with some washers & nuts to press the bearing in.

Bearing pressing tool

I have a couple of photos of this tool in action later in the article that may help you understand how it works.

Removing the bearings

To remove the blind bearings you can gently work them out with a flat head screwdriver, like I did, or use a bearing puller if you have one. A blind bearing is one that is pressed into a hole and has no lip to press against.

To press out the non-blind bearings you can use a socket that is the same size as the bearing and a long bolt or some all-thread to press the bearing out. As you can see from the image I used a metric socket, a length of all-thread, two nuts and three washers. Then using two spanners to turn the nuts I was able to press the bearing out.

Pressing the bearings

To press the bearings back in you can use a socket that is the same size as the bearing and a long bolt or some all-thread to press the bearing in. Of course you can buy a bearing press tool if you want to, but it is not necessary.

Sockets are an excellent tool for pressing bearings in and out of frames because they come in many sizes and many people already have a set of sockets. Be sure to select a socket that is the same size as the outer race/shell of the bearing. Do not press on the inner race or the ball cage cover as you will damage the bearing.

Pressing a bearing into the frame

Again, from another angle to show you how I used the tool to press the bearing in.

Pressing a bearing into the frame - angle two

Conclusion

Hopefully this post has been helpful to you if you are looking to replace the bearings in your Focus Jam frame. It is a relatively simple job that can be done with basic tools and a bit of patience.

https://www.simonholywell.com/post/focus-jam-frame-bearing-replacement/

Paraíso dark VS Code and Zed themes

Simon Holywell Apr 4, 2024 Updated Nov 19, 2024

A dark theme for both the Visual Studio Code (vscode) and Zed editors based on the Paraíso theme from TextMate by Jan T. Sott and Chris Kempson.

Introduction

I like to use the Paraíso (dark) theme for my editor, but the options for VS Code were either no longer maintained or poorly executed, and there were none for Zed. So I decided to fork gerane.Theme-Paraiso_dark and create a version for both editors.

Screenshot

Paraíso dark theme in the Zed editor

Installing

Visual Studio Code

The VS Code theme is available in the marketplace and can be installed from the in-editor marketplace.

Zed

To install the theme in Zed you can search the extension store for “Paraíso” and install it from there.

Source code

You can see the VS Code theme and the Zed theme on GitHub.

https://www.simonholywell.com/post/paraiso-dark-vscode-and-zed-theme/

Git and delta

Simon Holywell Feb 19, 2024 Updated Nov 19, 2024

Adding delta to your workflow will give you a nice interactive diff and make command line git so much better!

Until recently I was using the diff-highlight script that comes with a git installation, but it stopped working and instead of investigating why, I started looking for more modern alternatives. One of the projects I kept seeing was Delta so I gave it a quick go and it was a great replacement with nice additional features.

It is nice and fast (written in Rust of course) and renders nice diffs when I am using git diff or git add -p (where -p is short for --patch) to add files to commits.

As an aside, if you’re not already using git add -p to stage your commits then you’re missing out. It allows you to interactively stage a file or just part of it (git refers to these parts as hunks). I have previously written about this in Staging patches with git add.

Getting set up is pretty easy: install the git-delta package from your system's package manager. I am using Nix with home-manager, which has a delta configuration built in, but it could be Homebrew or even Chocolatey.

Now you just need to configure git to use it as the pager and for interactive diffs (git add -p for example). In ~/.gitconfig you can add something like the following to get started; it is a good base to customise further too. The configuration options are in the delta documentation.

[core]
  pager = delta

[interactive]
  diffFilter = delta --color-only

[delta]
  navigate = true    # use n and N to move between diff sections

[merge]
  conflictstyle = diff3

[diff]
  colorMoved = default

As I am using home-manager my configuration looks more like this Nix configuration, which automatically handles setting delta as the pager and interactive diff tool.

  programs.git = {
    enable = true;
    delta = {
      enable = true;
      options = {
        hyperlinks = true; # makes file paths clickable in the terminal
        hyperlinks-file-link-format = "vscode://file/{path}:{line}"; # opens links in vscode

        features = "decorations interactive";

        interactive = {
          keep-plus-minus-markers = false;
        };

        decorations = {
          commit-decoration-style = "bold yellow box ul";
          file-style = "bold yellow ul";
          file-decoration-style = "none";
        };
      };
    };
  };

Now if you run git add -p you’ll get nicely highlighted diffs for each hunk that you’re staging for commit. Delta makes it easier to read the diffs and therefore, hopefully, spot mistakes quicker.

https://www.simonholywell.com/post/git-delta/

Staging patches with git add

Simon Holywell Feb 15, 2024 Updated Nov 19, 2024

Use git add -p to interactively stage specific parts of a file, allowing for more precise control over commits in git. This post provides a walkthrough on how to split hunks for granular commit control, ensuring that only desired changes are staged, and emphasizes the utility of this approach in enhancing commit precision and managing contributions more effectively in git.

Introduction

If you’re not already using git add -p to stage your commits then you’re missing out. It allows you to interactively stage a file or just part of it giving you greater control over your git commit process.

Why should you want to do this?

Here are some of the reasons why I prefer using git add -p in my workflow. Primarily, because it allows me to review my changes as I stage them and I often find mistakes this way. By reviewing changes during staging, I catch bugs, typos, and other issues that might have slipped through during initial coding or content creation.

There is an additional benefit though; it allows me to stage only part of a file. Git, rather oddly, refers to these parts as hunks so I will use that term going forward.

This feature is really useful when you have a number of changes, but you want to group them up into different commits. Imagine you’ve made several related changes across different parts of a file. With git add -p, you can selectively stage these changes together, ensuring cleaner and more organized commits.

Staging part of a file

Here is an example of the interface showing you the diff and then prompting you to “Stage this hunk?”.

diff --git a/main.mts b/main.mts
index e1132f2..8f7c279 100644
--- a/main.mts
+++ b/main.mts
@@ -1,2 +1,4 @@
 export const add = (a, b) => a + b
+export const div = (a, b) => a / b
 export const sum = (xs) => xs.reduce((acc, x) => add(acc, x))
+export const avg = (xs) => div(sum(xs), xs.length)
(1/1) Stage this hunk [y,n,q,a,d,s,e,?]?

In its simplest form we can enter y to stage that diff ready for commit or n not to. For this example I am not ready to commit the avg function, but I want to get div pushed up so I choose to enter s to split the hunk into smaller hunks. Git then asks me this.

Split into 2 hunks.
@@ -1,2 +1,3 @@
 export const add = (a, b) => a + b
+export const div = (a, b) => a / b
 export const sum = (xs) => xs.reduce((acc, x) => add(acc, x))
(1/2) Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]?

So I enter y to stage that hunk for commit and git responds with the next hunk.

@@ -2 +3,2 @@
 export const sum = (xs) => xs.reduce((acc, x) => add(acc, x))
+export const avg = (xs) => div(sum(xs), xs.length)
(2/2) Stage this hunk [y,n,q,a,d,K,g,/,e,?]?

Remembering that I only want to commit the div function I then enter q to quit the interactive hunk staging.

After interacting with the hunk staging, I return to the command prompt. From there, I can proceed with git commit or any other necessary commands.

Checking it worked

If I were to run a git status to check the staged files I would see that the file (main.mts) appears in both sections: to be committed and not staged for commit. This is because we only staged part of the file and it is what we wanted!

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   main.mts

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   main.mts

By selectively staging only part of the file, we’ve successfully prepared a single hunk—a patch—for our upcoming commit.

Other options

The list ([y,n,q,a,d,s,e,?]) of potential responses is shortened, but you can get extended information by entering ? to get the help documentation.

Here are some response options you can use during interactive hunk staging taken from the git documentation. You’ll notice that there are a lot more options than in the list we saw earlier.

y: stage this hunk
n: do not stage this hunk
a: stage this and all the remaining hunks in the file
d: do not stage this hunk nor any of the remaining hunks in the file
g: select a hunk to go to
/: search for a hunk matching the given regex
j: leave this hunk undecided, see next undecided hunk
J: leave this hunk undecided, see next hunk
k: leave this hunk undecided, see previous undecided hunk
K: leave this hunk undecided, see previous hunk
s: split the current hunk into smaller hunks
e: manually edit the current hunk
?: print help

When not to use it

I use git add -p nearly every time I commit every working day. There are two occasions where I don’t:

There is a newly created file to commit for the first time - when a file is newly created there is no previous version to diff against of course so git add -p cannot present a diff for you to approve for staging.
In rare cases, when I want to commit an entire directory and am confident about its content, I usually opt for the standard approach. However, even in such edge cases, I often find myself using git add -p for finer control.

Conclusion

By incorporating git add -p into your workflow, you’ll streamline your git and commit process. I use this technique, without exaggeration, nearly every single time I need to commit a changeset to git. It allows me to easily review my code as I stage it for commit and control exactly what goes into each of my commits.

https://www.simonholywell.com/post/git-add-p/

Dynamic docker image loading

Simon Holywell Feb 11, 2024 Updated Nov 19, 2024

Learn how to dynamically load different base images to build application images and for testing using Docker. Utilize build-time variables in a Dockerfile to specify the Node.js version/tag and even extend the flexibility to customize both the image name and tag. Bonus - Discover a GitHub Actions workflow that builds and tests your project against multiple Node.js versions, ensuring compatibility across different environments.

Introduction

In a recent project, I encountered the need to dynamically load different base images based on the Node.js version I wanted for testing. Docker came to the rescue, offering a simple yet powerful solution using build arguments within the FROM instruction of a Dockerfile.

The Dockerfile

Let’s start with a straightforward example of a Dockerfile with the dynamic version loading included.

ARG NODE_VERSION_TAG
FROM node:${NODE_VERSION_TAG} AS build

The ARG keyword sets up a build-time variable named NODE_VERSION_TAG. In the FROM clause, we immediately utilize this variable with ${NODE_VERSION_TAG}, dynamically loading the specified version/tag of the node image.

Building with dynamic arguments

When you execute the build process you can specify an argument as follows:

docker build --build-arg NODE_VERSION_TAG=latest -t custom-image-name:v1.0 .

You can also choose any other tag associated with different Node.js versions and underlying OS combinations.

docker build --build-arg NODE_VERSION_TAG=21.2.0 -t custom-image-name:v1.0 .

The image would then be built using the node image at version 21.2.0 as its base.
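Because the tag is just a build argument, building against several versions is a matter of looping over them. The sketch below wraps the same docker build command in a small function. The names build_all and DRY_RUN, and the node-<tag> image tag pattern, are illustrative assumptions rather than anything Docker provides. With DRY_RUN=1 it only prints the commands it would run.

```shell
# Build one image per Node.js tag using the NODE_VERSION_TAG build argument.
# Set DRY_RUN=1 to print the commands instead of running them.
build_all() {
  for tag in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker build --build-arg NODE_VERSION_TAG=${tag} -t custom-image-name:node-${tag} ."
    else
      docker build --build-arg "NODE_VERSION_TAG=${tag}" -t "custom-image-name:node-${tag}" .
    fi
  done
}

# Preview the builds for two tags; drop DRY_RUN=1 to build for real
DRY_RUN=1 build_all 18 20
```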

Extending for image name and tag

To take it a step further, you can dynamically alter both the image name and tag.

ARG BASE_IMAGE_NAME
ARG BASE_IMAGE_TAG
FROM ${BASE_IMAGE_NAME}:${BASE_IMAGE_TAG} AS build

Now, you can build with different base images.

docker build --build-arg BASE_IMAGE_NAME=node --build-arg BASE_IMAGE_TAG=21.2.0 -t custom-image-name-node:v1.0 .
docker build --build-arg BASE_IMAGE_NAME=rust --build-arg BASE_IMAGE_TAG=1.74.0 -t custom-image-name-rust:v1.0 .

This approach extends the versatility of your Dockerfile, allowing you to seamlessly adapt to various base images and tags.

GitHub Actions workflow

As a bonus, here is a GitHub Actions workflow that uses this technique to build a docker image against multiple versions of Node.js.

name: Node.js Version Matrix

on:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [14, 16, 18]

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}

      - name: Build and Test
        run: |
          docker build --build-arg NODE_VERSION_TAG=${{ matrix.node-version }} -t myapp-${{ matrix.node-version }}:test .
          docker run myapp-${{ matrix.node-version }}:test npm test

This workflow is performing the following actions.

The workflow triggers on pushes to the main branch.
It defines a job named “test” that runs on the latest version of Ubuntu.
The build matrix is configured with three different Node.js versions: 14, 16, and 18.
The actions/checkout action is used to fetch the repository.
The actions/setup-node action is used to set up the specified Node.js version from the matrix.
The build and test steps use the docker build command with the NODE_VERSION_TAG argument set to the corresponding Node.js version from the matrix.
The docker run command executes your project’s tests inside the Docker container.

With this workflow, your project will be built and tested against multiple Node.js versions, providing you with valuable insights into its compatibility across different environments.

https://www.simonholywell.com/post/dynamic-docker-image-loading/

Pinning nix-shell package versions for reproducibility

Simon Holywell Jan 16, 2024 Updated Nov 19, 2024

Learn to wield nix-shell’s power for precise project dependency management, ensuring hassle-free development environments. Discover techniques for pinning specific Node.js versions, simplifying dependency references, and integrating yarn for enhanced control.

Introduction

I rely on nix-shell to manage dependencies for the projects I work on, and when combined with direnv (more on that in a future post!), it automatically configures my shell. All the necessary dependencies are installed with the correct versions for the project, making them readily available in my path.

If you’re new to nix-shell, it acts as an environment manager designed to simplify project dependency management, providing a straightforward and reproducible approach to setting up development environments. It ensures that the required dependencies, libraries, and tools are configured consistently, offering a hassle-free development experience. Nix-shell’s key advantage lies in its ability to create self-contained environments, particularly valuable for projects with intricate dependencies.

You can create a shell with only the dependencies and versions you need, and nothing more.

Practical example

For example, consider a simple project written in TypeScript/Node.js/JavaScript. Typically, you’d have a file named shell.nix in the project’s root directory. To enter the shell, navigate to the directory containing the file and run nix-shell.

{ pkgs ? import <nixpkgs> {}
}:
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = [
    pkgs.bashInteractive
    pkgs.nodejs
  ];
}

It will include bash and nodejs from the latest nixpkgs when the user enters that shell on the command line.

Latest major version

Now what if we need a particular version of Node? For a major version that is pretty easy - we can just append a major version number to the package reference. In the following example we pin to Node.js version 18.* in semantic versioning terms, but you could also indicate 14, 16, 19 or 20 for example. Be aware though that not all major versions are available in the latest nixpkgs repository so you should check using the website I’ll describe next.

{ pkgs ? import <nixpkgs> {}
}:
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = [
    pkgs.bashInteractive
    pkgs.nodejs-18_x
  ];
}

Using with to simplify buildInputs

As an aside, we can use a little trick to simplify the references in buildInputs so we don’t have to write the pkgs. prefix in front of each dependency.

{ pkgs ? import <nixpkgs> {}
}:
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = with pkgs; [
    bashInteractive
    nodejs-18_x
  ];
}

Now each name in the list is looked up in pkgs automatically because we included with pkgs; after buildInputs =.

Pinning to a specific version

To pin to a specific version of Node.js though we need to add a little more code to the nix-shell file. This is because we need to pull the package from the nixpkgs repository at the git commit that references that particular version. Firstly, we can use the Nix package versions website to find a SHA hash of the dependency.

The author of that handy page also has a blog post that goes into more detail on the problem, including how this style of pinning can be imperfect, which you might find interesting as an aside.

Enter the name of the package you want to find and you’ll get back a table of available versions. I searched nodejs and got back a list of versions and found the version 18.14.0 that I was looking for. Copy the SHA hash from the Revision column of the table because you’ll need this up next - in my case this was 55070e598e0e03d1d116c49b9eff322ef07c6ac6.

As an example of how this kind of pinning can be imperfect there is no Node.js version 18.13.0 available because a derivation was never committed for it!

Now we can use the following sample nix-shell file to pull down a particular version of the nodejs package. We will still pull the latest version of Bash, but we are specifying a particular git commit hash of nixpkgs to pull nodejs from.

{ pkgs ? import <nixpkgs> {}
}:
let
  # node 18.14.0
  nodePkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/55070e598e0e03d1d116c49b9eff322ef07c6ac6.tar.gz") { };
in
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = with pkgs; [
    bashInteractive
    nodePkgs.nodejs-18_x
  ];
}

Note the new nodePkgs variable that imports a tar file of the commit in question from GitHub and is then used to prefix the reference to nodejs-18_x. This is how we can now be sure that we will always get node 18.14.0 when we instantiate this nix-shell.

Using yarn as the package manager

Now as a bonus let’s see how we can use yarn with our project and ensure it is referencing the correct version of node. Normally you would just add the package reference yarn into the buildInputs list, but we need to tell it to use the exact version of node that our project specifies.

{ pkgs ? import <nixpkgs> {}
}:
let
  # node 18.14.0
  nodePkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/55070e598e0e03d1d116c49b9eff322ef07c6ac6.tar.gz") { };
in
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = with pkgs; [
    bashInteractive
    nodePkgs.nodejs-18_x
    (yarn.override { nodejs = nodePkgs.nodejs-18_x; })
  ];
}

This installs yarn and passes an override through to it that specifies the correct version of node for yarn to reference.

Putting .bin from node_modules into $PATH

Speaking of node dependencies, we can also put the node_modules/.bin directory on the path to make it easier to run and reference scripts installed by our npm/yarn dependencies. This can be achieved by adding a shellHook to the nix-shell file. The hook is written in bash/shell and is executed right before the new shell is handed to the user.

{ pkgs ? import <nixpkgs> {}
}:
let
  # node 18.14.0
  nodePkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/55070e598e0e03d1d116c49b9eff322ef07c6ac6.tar.gz") { };
in
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = with pkgs; [
    bashInteractive
    nodePkgs.nodejs-18_x
    (yarn.override { nodejs = nodePkgs.nodejs-18_x; })
  ];

  shellHook = ''
    export PATH="$PWD/node_modules/.bin/:$PATH"
  '';
}

Of course you could run any code here or add any path to your $PATH.

Pinning all the things

You can also pin the overall packages import so that you always get the same version of bash or any other packages that are being imported from pkgs. This is done by replacing the <nixpkgs> token with a fetchTarball that takes a SHA hash from nixpkgs just like the nodePkgs pinning we did earlier.

{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/e0629618b4b419a47e2c8a3cab223e2a7f3a8f97.tar.gz") {}
}:
let
  # node 18.14.0
  nodePkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/55070e598e0e03d1d116c49b9eff322ef07c6ac6.tar.gz") { };
in
pkgs.mkShell {
  name = "projects.my-project-name";
  buildInputs = with pkgs; [
    bashInteractive
    nodePkgs.nodejs-18_x
  ];
}

A good place to find the latest safe hash for the various branches of nixpkgs is status.nixos.org, which shows the build status for each of them. Generally, you will want to grab either the latest stable version, which is nixpkgs-23.11 at the time of writing, or the latest unstable with nixpkgs-unstable.

Conclusion

So, there are a few tricks to make using a nix-shell a little easier and more specific. In another post I will describe how I combine this with direnv so that the shell is automatically instantiated for me when I change directories.

https://www.simonholywell.com/post/pinning-nix-shell-package-versions-for-reproducibility/

YOW! 2023 conference highlights

Simon Holywell Dec 6, 2023 Updated Nov 19, 2024

Highlights from the YOW! 2023 conference in Brisbane. This blog post covers engaging talks on topics ranging from creating a programming language for children to real-world applications of WebAssembly. Gain valuable insights from speakers like Felienne Hermans, Chanuki Illushka Seresinhe, Frank Yu, Dylan Beattie, Katie Bell, and Brooklyn Zelenka. Delve into the future of technology, deterministic execution, local-first software, and more.

Over the last two days I was fortunate enough to attend the YOW! 2023 conference in Brisbane. It took the form of three tracks of talks over the two days with an opening keynote each morning. Here are some of the talks that stood out to me from the event and my thoughts on the topics.

Hedy: Creating a programming language for everyone

Felienne Hermans told the story of the creation of an education tool that makes it easier for children to learn to programme. As I was taking my own children to school that morning I missed the beginning of this talk and caught it from about half-way through. However, what I did see was a very interesting look at the intersection of education and programming.

When a new user is given Hedy to work through they are given tasks that look like natural language and then slowly build the syntax up to be fully fledged Python. It is also possible to programme in many different languages including non-Latin and right-to-left languages to make it even easier for newer programmers to learn. Now they can learn the language without needing to learn bits of English at the same time.

You can see an older version of the talk on YouTube from GOTO 2022 too.

Deep learning computer vision

As her passion project, Chanuki Illushka Seresinhe has been working on some interesting things using computer vision to determine if a given photograph contains beauty or not. Working at Zoopla she has used computer vision to assess photos of properties to inform value and detect if renovations have been performed.

The understanding around what is a beautiful place and how it can be mapped and used to inform the creation of better public spaces was the more interesting aspect of the talk to me. One idea I particularly liked was having an app that could give you a suggested route to a given destination using the most scenic path possible.

She has co-founded a Community Interest Company in the United Kingdom called Beautiful Places to continue this work.

There is a condensed version of this talk from the Turing Institute two years ago.

Deterministic execution to scale and simplify

Frank Yu works at Coinbase on engineering problems that require low latency in the trading exchange. This talk covered making systems deterministic such that consistency between regions can be guaranteed. Essentially, you can send the same message to a service (or a collection of services) and the result will be the same.

The main thrust of the talk was saving money on transfer by computing smaller deterministic messages multiple times - the idea being that compute is cheap and network is expensive.

There is a video of a similar talk Frank gave at QCon London 2023.

Tipping points in Technology

Dylan Beattie took us through an entertaining discussion of what the future could look like through the lens of previous technological scares. When something new comes along (such as AI) then it is natural for some to be concerned or predict wild outcomes. It could be the case that AI takes all our jobs, but currently it is being used to produce things such as books and code that need a lot of human intervention to become useful.

There doesn’t seem to be a video of this talk out there anywhere at this point, but the git related song, re:bass, that Dylan created and played as an introduction is on YouTube.

Real world uses for WebAssembly

Katie Bell spent the talk discussing how WebAssembly can be used to create a safe execution target for arbitrary code on the server and browser/client.

She illustrated how this can work using the example of Shopify, which uses it to encapsulate user code within the checkout flow so it can ensure third-party code (plugins or APIs) does not slow down the checkout process. She also covered other technologies for isolated execution such as docker/containers with and without gVisor and V8 Isolates for Node.js. Cloudflare uses V8 Isolates at the edge to reduce the time a function takes to start up.

Using a rock, paper, scissors tournament application she demonstrated live in her talk how users could submit and execute code on her server. You can either upload a WebAssembly .wasm file or enter Python code directly into a code editor on the page to create a bot that will accept some input and return either rock, paper or scissors. It is then entered into a knockout tournament with other people’s bots to see who comes out on top.

Local first software

Brooklyn Zelenka discussed the work her team has been doing to bring about a local first form of computing. This is in contrast to the current paradigm of putting everything on the cloud and having the cloud as the single source of truth like Google Docs etc.

The main idea is that things should work locally first and also give the user control over their data. This is built upon deterministic units of work that are signed.

There is an older version of this talk available on YouTube.

There’s no such thing as plain text

Dylan Beattie gave a humorous look at just what constitutes text and how it is represented by computers. There is a video of this talk from Devoxx in the UK.

In summary

These were some of the standout talks I attended at YOW! 2023. While not every talk was directly relevant to my work, the conference provided interesting insights, prompting me to delve deeper into some of the ideas.

https://www.simonholywell.com/post/yow-2023-brisbane/

Duty-free with TRS for Australian residents

Simon Holywell Dec 5, 2023 Updated Nov 19, 2024

In a departure from common international practices, Australia allows both residents and travellers duty-free purchases under the Tourist Refund Scheme. Buy the goods, get an invoice and you can claim the duty back on the purchase as you depart. Even if you plan to return to Australia with the purchased goods, you can still claim back the tax.

Introduction

In a departure from common international practices, Australia allows both residents and travellers duty-free purchases under the Tourist Refund Scheme (TRS). Even if you plan to return to Australia with the purchased goods, you can still claim back the tax.

I dropped my phone before a recent overseas trip and so I claimed back the duty on its replacement when I travelled. This made the phone slightly cheaper to purchase outright and somewhat softened the blow of breaking a 9-month-old phone.

When travelling overseas it is possible to claim back the tax on purchases made within Australia totalling $300 or more including GST using the Tourist Refund Scheme (TRS). With an invoice and an international boarding pass you can claim the tax back even if you have already opened the packaging or started using the item.

This post recounts my latest TRS experience, but keep in mind that rules and regulations can change, so it’s advisable to check with Australian Border Force (ABF) beforehand.

What you need to make a claim

You must purchase the goods within 60 days of departure, obtain a tax invoice and spend at least $300 including GST. It is possible to buy the goods in multiple transactions so long as the total of all invoices for each retailer is $300 or more.

You could buy three different items from JB HiFi on three different invoices so long as all the ABNs matched across all three invoices.

“You cannot claim on goods wholly or partially consumed in Australia such as food, drinks, health supplements and perfumes” (Tourist Refund Scheme: Common Questions). This also only works for goods purchased in Australia from an Australian business.

You should make sure that you have:

the article/goods you are claiming for (you can even use it beforehand)
passport
boarding pass/proof of international travel
original invoice (in good condition) with
- an ABN, retailer name and address
- your name (and only your name) as it appears in your passport
- description of the goods
- the amount of GST paid
- the date of the purchase

For more details on this you should check the Australian Border Force TRS page and the associated Common Questions page.

I bought a phone and used it for over a week before flying out and claiming the duty on it. Importantly, just the phone was required at claim time - you do not need the original packaging - just the item you are claiming on so “long as the officer can verify the goods against the corresponding tax invoice when the TRS claim is processed” (TRS Common Questions).

In some cases, including where the goods must be handled as dangerous goods, it is possible to put them in your checked luggage at the airport, but it is best to have them with you at the time of the claim. This way the border agent can sight the goods and verify them against your invoices. This has been a source of spurious claims and fraud in the past, so having the goods handy to show is a good idea.

How much will you get back

Before making a purchase you can calculate the amount that you will get back by taking the total price and dividing that by 11 (assuming GST is still at the current rate of 10%). For example, with a purchase amount of $1,600 it would be a refund of $145.45 ($1,600 / 11 = $145.45).
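
The divide-by-11 rule above can be sketched in a few lines of TypeScript. This is only an illustration: the function name estimateGstRefund is made up for this example, and it assumes GST is still 10% and that the refund is simply the GST component of the price rounded to the nearest cent.

```typescript
// Hypothetical helper, not an official calculator.
// Assumes a GST rate of 10%, so the GST component is 1/11 of the
// GST-inclusive price. Works in whole cents to avoid floating point drift.
function estimateGstRefund(totalIncGst: number): number {
  const cents = Math.round(totalIncGst * 100);
  const gstCents = Math.round(cents / 11);
  return gstCents / 100;
}

// The $1,600 purchase from the example above.
console.log(estimateGstRefund(1600).toFixed(2)); // 145.45
```

The GST amount printed on the invoice is still the figure that matters when you make the claim.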

There is also a GST calculator on the Australian government Moneysmart financial advice website.

After the purchase you can refer to the GST amount indicated on the invoice for the product to determine how much you stand to claim back.

How to claim

On the day of departure you need to find the TRS desk at the airport with time to spare as all claims need to be processed no later than 30 minutes before your flight. Depending on your luck there can be long queues for the TRS service so it is a good idea to complete this step as soon as you have cleared security.

It is best to fill out the claims online beforehand and you might be able to use a dedicated queue, which could well be quicker. This is especially true where you have multiple invoices. If you don’t and they are busy then they may ask you to step out of line, complete the claim online and then return to the back of the queue.

I only had one invoice and it was processed at the desk, but the officer reminded me to use their phone apps or the website in future. He also asked me to show him the phone I had bought. This was simply a visual check and he did not even hold the phone let alone check the IMEI number against the invoice.

The staff you are claiming from here are Australian Border Force officers - in fact I saw the same officer that processed my claim checking passports on my return to Australia.

Keep the receipt you are issued for the refund by the officer in case you need it upon your return to Australia. I just tucked it into my Australian passport so I would have it handy at arrivals.

When you return

If you bought the items as gifts for family overseas and the items are not returning with you then the usual border process applies, but if you do bring the items back into the country with you then you might need to declare them.

On the Incoming Passenger Card (IPC) you must declare that you have goods that you purchased tax-free in Australia if the total of all goods you are returning with is over $900. That is all goods, which means those purchased during the trip and those you claimed under TRS at the point of departure. At the time of writing this is completed on the first page of the card as question 3 under “Are you bringing into Australia:”.

Don’t forget that this is effectively a statutory declaration/affidavit and the penalties for falsifying or incorrectly completing it can be high. It is definitely cheaper to declare and pay back any GST owing than to get caught mis-declaring and cop a fine.

As you pass through the border control desks at the airport you will be asked why you have declared and you need to tell them you made a TRS claim. I simply said that I had claimed for a phone on departure and the border officer directed me to the exit. That was it. I did not need to pay any tax back or, in this case, even show the goods in question on return.

If you do get stopped and they check the value of the goods then it is important to know that you cannot bring more than $900 of general goods into the country duty free as an adult and $450 as a child.

Pooling your duty-free allowance with others

When travelling with family you can pool your goods allowance so for two adults travelling together that is $1,800 for example. This is where a family is defined as:

A family includes a person and his or her de facto partner (including same-sex couples) and any of their children under 18 years of age; or a husband and wife, and any of their children.

Paying back the duty

If you have claimed over your allowance then you may have to pay back the GST that you claimed through the Tourist Refund Scheme on arrival in Australia.

Anecdotal evidence suggests that, unless you’re a regular customer or otherwise flagged for high-value items, the border officers will generally allow you through without checking too thoroughly. That is not to say they won’t check and you should be prepared to pay back the GST if you’ve brought in more than the allowance.

It should be noted that despite what forum users might write to the contrary there is in fact no consideration for depreciation on the items when they return. You will pay back the GST amount that you claimed on the item.

If the item is over the allowance then you will pay back all the GST on that item or items exceeding the allowance. You cannot just pay back the GST on the amount that exceeds the duty-free allowance.

So in the case of a single person with a phone that cost $1,500 including GST originally ($1,363.64 excluding GST) they would pay back the full GST amount of $136.36 because the total excluding GST exceeds $900. They could not just pay back the GST amount on the portion of the price over $900 and they would need to pay the whole GST amount. However, if the item was $990 including GST ($900 excluding GST) then on return you would not need to pay or declare (so long as it was the only duty-free item you had in your possession) because at the time of importation it is deemed to be $900 for tax purposes.

In my case I had pooled my allowance with my partner and the phone was well within that combined duty-free allowance should they have checked.
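
As a rough sketch of the all-or-nothing rule described above, here is how I would model it in TypeScript. Everything in it is an assumption for illustration: the names pooledAllowance and gstToRepay are made up, it only handles a single item, it ignores any other goods you are carrying, it assumes children’s allowances pool in the same way as adults’, and it uses the $900/$450 allowances and 10% GST mentioned in this post.

```typescript
// Allowances as described in this post (assumed current; check with ABF).
const ADULT_ALLOWANCE = 900;
const CHILD_ALLOWANCE = 450;

interface Travellers {
  adults: number;
  children: number;
}

// Hypothetical helper: combined allowance for a family travelling together.
function pooledAllowance(t: Travellers): number {
  return t.adults * ADULT_ALLOWANCE + t.children * CHILD_ALLOWANCE;
}

// Hypothetical helper: GST to repay on ONE item claimed under TRS.
// If the GST-exclusive value exceeds the allowance the whole GST amount
// is repaid, otherwise nothing is. Works in cents to avoid rounding drift.
function gstToRepay(priceIncGst: number, t: Travellers): number {
  const cents = Math.round(priceIncGst * 100);
  const gstCents = Math.round(cents / 11); // assumes 10% GST
  const exGstCents = cents - gstCents;
  const allowanceCents = pooledAllowance(t) * 100;
  return exGstCents > allowanceCents ? gstCents / 100 : 0;
}

const solo: Travellers = { adults: 1, children: 0 };
const couple: Travellers = { adults: 2, children: 0 };

console.log(gstToRepay(1500, solo)); // full GST is repaid
console.log(gstToRepay(990, solo)); // exactly $900 excluding GST, nothing to repay
console.log(gstToRepay(1500, couple)); // within the pooled $1,800, nothing to repay
```

Treat this as a way to reason about the rule rather than as advice, since the officer at the border has the final say.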

Fly safe

Of course, it goes without saying, that I am not a border agent or a specialist in tax matters so you should verify everything beforehand and make your own decisions.

That’s it! Have a great flight and enjoy the little extra bit of spending money for your holiday.

https://www.simonholywell.com/post/duty-free-with-trs-for-australians/

Docker cross-compilation

Simon Holywell Nov 17, 2023 Updated Nov 19, 2024

Explore the advantages of cross-compiling in Docker whilst working through a specific case involving a Node.js project using pkg, aiming for a Linux ARM64 architecture in the Docker build.

Every so often it can be really helpful to cross-compile a docker image or a stage in a multi-stage build. This is especially useful where the programming language you are using supports cross-platform builds or the tooling you are using for compilation provides good support for it.

Recently, I had this need when I was attempting to create a build of a project written in node using pkg. Although I did not end up going this route, I think the principle is interesting enough to talk about.

No matter the underlying platform I wanted to ensure I was getting a Linux ARM64 architecture during the docker build. You can use the same technique to build on other architectures and even more than one providing that the image you have chosen supports them.

Here I am using the official node image and it currently supports amd64, arm32v6, arm32v7, arm64v8, ppc64le and s390x.

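FROM --platform=linux/arm64 node:18 AS build

WORKDIR /app
# copy the project sources (assumes package.json, yarn.lock and index.js are in the build context)
COPY . .
RUN yarn install --frozen-lockfile
RUN npx pkg ./index.js

Note the inclusion of the --platform=linux/arm64 argument to the FROM directive in the Dockerfile.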

For more information the docker documentation has a helpful page that covers this functionality in more depth.

https://www.simonholywell.com/post/docker-cross-compilation/

DIY hybrid valve headphone amplifier (SSMH)

Simon Holywell Feb 12, 2022 Updated Nov 19, 2024

I made some circuit diagrams for the various versions of the Starving Student Millet Hybrid amplifier, which is a tube/valve headphone amplifier you can build yourself relatively easily for fun.

A number of years ago I made my own 12AU7 version of the Starving Student Millet Hybrid headphone amplifier. When the designs were first published using a 19J6 tube it was relatively easy to find them, but in recent years this has become much harder. To work around this a number of designs using various other valves have been posted on forums.

Trawling through hundreds of pages of posts to find the details was tiresome and as I enjoyed the project I wanted to give back to the community. So, two years ago I created schematics of the variations and published them to GitHub.

I used KiCad to create the diagrams and published both the source files and rendered schematics. You can download the variations and diagrams from GitHub. Alternatively, you can download the lot as a zip file.

My SSMH amplifier

https://www.simonholywell.com/post/diy-hybrid-valve-amplifier/

Windows 10 shortcuts

Simon Holywell Nov 14, 2020 Updated Nov 19, 2024

Here are some of the window management shortcuts that I have discovered and found very useful having moved to working on a Windows 10 machine.

Recently, I have moved from a MacBook Pro to a Dell XPS running Windows 10 with Windows Subsystem for Linux (WSL2) for my main workstation. This means learning new software and therefore keyboard shortcuts.

Here are some of the window management shortcuts that I have discovered and found very useful.

To cycle through the three window states (Minimized, Restore and Maximized) for a given window/application:

Win + Ctrl + Up arrow
Win + Ctrl + Down arrow

Force a window to a SnapZone (Windows 10 ships with a 50% split on each monitor so this shortcut will push an application into one):

Win + Left arrow
Win + Right arrow

Move a window to another physical monitor:

Win + Shift + Left arrow
Win + Shift + Right arrow

Jump to another virtual desktop/workspace:

Win + Ctrl + Left arrow
Win + Ctrl + Right arrow

Give a virtual desktop/workspace a vanity name:

Win + Ctrl + R

Move a window to another virtual desktop/workspace:

Win + Alt + Left arrow
Win + Alt + Right arrow

To open an application As Administrator you can hold down Ctrl + Shift whilst pressing Enter. The UAC dialogue will show, press Right arrow to move to the confirm button and press Enter again.

https://www.simonholywell.com/post/windows-10-shortcuts/

Jest Date mocking

Simon Holywell Sep 10, 2020 Updated Nov 19, 2024

The release of Jest 26 brought a new timer faking interface, which now supports Date mocks. I couldn’t readily find any documentation for this feature so here is how I used it in a project recently.

The release of Jest 26 brought a new timer faking interface, which now supports Date mocks. I couldn’t readily find any documentation for this feature so here is how I used it in a project recently.

If you are not already running Jest 26 then upgrade and if you’re using TypeScript don’t forget to update the @types package at the same time.

First, you need to enable the new timer handling in your test file as it is currently behind a feature flag.

jest.useFakeTimers("modern");

When Jest 27 is released then it should be the default - you’ll still need to enable fake timers of course! At that point you should be able to get away with the following:

jest.useFakeTimers();

Now to mock the Date in the tests I used the jest.setSystemTime() function.

jest.setSystemTime(new Date("2012-10-10")); // a Date object
jest.setSystemTime(1349852318000); // same date as a unix time in milliseconds

Any call to new Date() or Date.now() will now return the hardcoded time we set above.

it("should return the hardcoded date", () => {
  jest.setSystemTime(1349852318268);
  expect(new Date().valueOf()).toBe(1349852318268);
});

This will affect any code that uses Date that is called after you’ve set the time - either in the test file itself or in the subject code that is under test (your application code for example).

You can still get to the real time in the test if you need it with jest.getRealSystemTime(). Here is an example test that you can try out in Jest that shows this in action.

it("should return the hardcoded date", () => {
  jest.setSystemTime(1349852318268);
  expect(new Date().valueOf()).toBe(1349852318268);
  expect(jest.getRealSystemTime().valueOf()).not.toBe(1349852318268);
});

You can also remove the mocking entirely by calling jest.useRealTimers(). This is something you might like to do in an afterEach() function perhaps.

https://www.simonholywell.com/post/jest-date-mocking/

Yarn and NPM on WSL

Simon Holywell Jul 30, 2020 Updated Nov 19, 2024

Old versions of WSL can lead to issues with file system permissions with Node. Upgrading the version of WSL can solve this issue for you.

If you are running Node in Windows Subsystem for Linux then you may come across errors like the following when running yarn or npm install.

EPERM: operation not permitted, copyfile '/home/simon/.cache/yarn/v6/npm-@babel-code-frame-7.10.4-168da1a36e90da68ae8d49c0f1b48c7c6249213a-integrity/node_modules/@babel/code-frame/package.json' -> '/mnt/c/Users/simon/Documents/projects/project/node_modules/@babel/code-frame/package.json'

These errors are likely being triggered because you need to upgrade the version of WSL that you are running. To verify the version you can list the installed distros.

wsl.exe -l -v

You can drop the .exe part of the command if you’re using PowerShell or Command Prompt, but you don’t have to. It is only necessary when you’re in a WSL bash prompt.

If the distribution you are using has a version of 1 then you can upgrade it - make sure you replace Ubuntu-20.04 with the name of the distribution you’re using.

wsl.exe --set-version Ubuntu-20.04 2

To ensure that any further distributions use WSL2 you can also update the default installation version:

wsl.exe --set-default-version 2

Finally, when you listed the versions (wsl.exe -l -v) you may have noticed there was an asterisk (*) beside one distribution - this indicates that it is the default distribution. You can change this to your preferred distribution (again change the distribution name to match the distribution you are using).

wsl.exe --set-default Ubuntu-20.04

https://www.simonholywell.com/post/yarn-npm-wsl/

Making rope working fids

Simon Holywell Jun 14, 2020 Updated Nov 19, 2024

Rope working fids make it easier to splice ropes, but they can be expensive to buy for a small DIY project. Here are some ideas of how to make your own fids and the required dimensions of each fid size.

If you want to splice ropes then a common tool is the “Selma” fid. Essentially, it is a metal tube with a blunt point at one end. There are other styles but this is the most common.

Rope is loaded into the rear of a fid and the pointed tip is used to work into and through rope. In this way it can be used to splice rope much like a needle is used to stitch fabric.

You’ll see that a lot of rope-work tutorials use fids and, crucially, fid lengths to describe the various techniques. For limited hobbyist use fids are expensive to purchase and I wanted to avoid the expenditure whilst teaching myself rope splicing.

A common workaround is to buy metal knitting needles, cut them down to size and shape the rear to hold rope. The trouble is that finding the correct length can be difficult so I’ve written this quick article to refer back to later.

You can find the appropriate fid length by taking the rope diameter and multiplying it by twenty-one (21). The fid diameter is three-quarters (3/4) of the rope’s diameter. Finally, the short fid mark is made at three-quarters (3/4) of the fid’s overall length from the pointed tip.

   fid length
|--------------|   = 21 x rope diameter
<==========|====
           |---|   = 1/4 of fid length
           `-> short fid mark

So, if we’re working with 10mm diameter rope (most rock climbing rope is somewhere between 10mm and 13mm):

Length:
  21 x 10    = 210

Diameter:
  3/4 of 10, which is the same as
  0.75 x 10  = 7.5

Short fid mark:
  3/4 of 210, which is the same as
  0.75 x 210 = 157.5

The following measurements are all given in millimetres.

Rope diameter   Fid length   Fid diameter   Short fid mark
1               21           0.75           15.75
2               42           1.5            31.5
3               63           2.25           47.25
4               84           3              63
5               105          3.75           78.75
6               126          4.5            94.5
7               147          5.25           110.25
8               168          6              126
9               189          6.75           141.75
10              210          7.5            157.5
11              231          8.25           173.25
12              252          9              189
13              273          9.75           204.75
14              294          10.5           220.5
15              315          11.25          236.25
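
If you need a size that isn’t listed, the three formulas above are easy to turn into a small function. Here is a minimal TypeScript sketch; the names FidDimensions and fidDimensions are just ones I have picked for this example, and all values are assumed to be in millimetres.

```typescript
// All measurements are in millimetres.
interface FidDimensions {
  length: number; // 21 x rope diameter
  diameter: number; // 3/4 of the rope diameter
  shortFidMark: number; // 3/4 of the fid length, measured from the pointed tip
}

// Hypothetical helper that applies the formulas described in this post.
function fidDimensions(ropeDiameterMm: number): FidDimensions {
  const length = 21 * ropeDiameterMm;
  return {
    length,
    diameter: 0.75 * ropeDiameterMm,
    shortFidMark: 0.75 * length,
  };
}

// The 10mm rope from the worked example above.
const fid = fidDimensions(10);
console.log(fid.length, fid.diameter, fid.shortFidMark); // 210 7.5 157.5
```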

That should be all the information you need to start making fids of your own. You’ll probably also need a pusher to force the fid through the rope. This is a handle with a sturdy length of metal wire protruding from it. When a fid gets stuck, the pusher can be used to exert force on the rear of the fid.

Where even this struggles you can use a loop of wire to pull the rope through with pliers instead of pushing it. There are commercially available options here called needles and the D-Splicer.

https://www.simonholywell.com/post/making-rope-working-fids/

Testing TypeScript types

Simon Holywell Jul 14, 2019 Updated Nov 19, 2024

To make the construction and maintenance of more advanced types easier it can be helpful to write some tests that ensure their correct function. This sounds a little easier than it turns out to be. As part of the ecosystem for TypeScript Microsoft have written and released the dtslint tool. It can be used to lint and compile TypeScript types for static analysis and mostly serves to keep the @types/* packages in line.

To make the construction and maintenance of more advanced types easier it can be helpful to write some tests that ensure their correct function. This sounds a little easier than it turns out to be.

As part of the ecosystem for TypeScript Microsoft have written and released the dtslint tool. It can be used to lint and compile TypeScript types for static analysis and mostly serves to keep the @types/* packages in line.

Firstly, install the dependencies that we will need to test the types.

npm install --save-dev dtslint conditional-type-checks

Then, in the directory where you wish to write your tests (the examples in this article use a directory ./typings/__tests__ for this), create a new file index.d.ts with the following contents:

// TypeScript Version: 3.3
// see https://github.com/Microsoft/dtslint#specify-a-typescript-version for more information

The first line lets dtslint know which TypeScript version you would like to test your types against.

In that same directory you will also need to include a tsconfig.json file like the following:

// this additional tsconfig is required by dtslint
// see: https://github.com/Microsoft/dtslint#typestsconfigjson
{
  "compilerOptions": {
    "module": "commonjs",
    "lib": ["es6"],
    "noImplicitAny": true,
    "noImplicitThis": true,
    "strictNullChecks": true,
    "strictFunctionTypes": true,
    "noEmit": true,

    // If the library is an external module (uses `export`), this allows your test file to import "mylib" instead of "./index".
    // If the library is global (cannot be imported via `import` or `require`), leave this out.
    "baseUrl": "."
  }
}

Finally, dtslint needs to be added to the configuration for tslint in the tslint.json:

{
  "extends": ["dtslint/dtslint.json"],
  "rules": {
    "no-useless-files": false,
    "no-relative-import-in-test": false
  }
}

You should now have a directory structure that looks something like the following:

typings
  `-- __tests__
    `-- index.d.ts
    `-- tsconfig.json
    `-- tslint.json

Now in your package.json file you can add a script to run the dtslint testing.

  "ts:dtslint": "dtslint ./typings/__tests__",

To make the next few steps easier to follow we’ll quickly write out a new TypeScript type in ./typings - without this we wouldn’t actually have anything to test! So, let’s write an implementation of the Omit type that now comes with the TypeScript standard library.

It uses both Pick and Exclude, which are also included with TypeScript. If they are new to you then you might like to read my previous article Advanced TypeScript Types to get an introduction first.

/**
 * Remove all keys listed in K from the object T
 *
 * @example type MyType = Omit<{ a: '1'; b: '2'; c: '3' }, 'a' | 'b'>  // { c: '3' }
 */
export type Omit<T extends object, K extends keyof T> = Pick<
  T,
  Exclude<keyof T, K>
>;

Now you are ready to write some tests for the types you have defined. dtslint uses the $ExpectType annotation to state the type of the type expression on the next line.

// $ExpectType Pick<{ a: "1"; b: "2"; c: "3"; }, "c">
type Test_01_Omit = Omit<{ a: "1"; b: "2"; c: "3" }, "a" | "b">;

dtslint will now evaluate the Test_01_Omit expression and determine the resultant type to compare it against the type you’ve specified with $ExpectType. If you’re anticipating your type to result in a type error then this can be asserted with $ExpectError. These are documented in the README for the dtslint project.

Next up we can use some of the assertion types from the conditional-type-checks package we installed earlier to run some additional unit style tests of the type.

import { AssertFalse, AssertTrue, Has, IsExact } from "conditional-type-checks";

type Test_02_Omit =
  | AssertTrue<IsExact<Test_01_Omit, { c: "3" }>>
  | AssertFalse<Has<Test_01_Omit, { a: "1"; b: "2" }>>;

Here the assertions state that the final evaluated expression only has one key c and does not have the keys a or b. There are more conditional types that you can employ documented in the project README.
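For readers curious about what an exactness assertion is doing under the hood, here is a minimal hand-written sketch of the same idea. The Equals, Expect and MyOmit names are hypothetical and made up for this illustration; this is not the implementation used by conditional-type-checks, which covers more edge cases.

```typescript
// Hypothetical helpers for illustration only; not the conditional-type-checks source.
// Equals resolves to true only when the compiler treats A and B as identical types.
type Equals<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;

// Expect only accepts the literal type true, so a failed check is a compile error.
type Expect<T extends true> = T;

// The same Omit implementation used in this article, renamed so that it does
// not clash with the Omit that ships with TypeScript from 3.5 onwards.
type MyOmit<T extends object, K extends keyof T> = Pick<T, Exclude<keyof T, K>>;

type Result = MyOmit<{ a: "1"; b: "2"; c: "3" }, "a" | "b">;

// This line compiles only if Result is exactly { c: "3" }.
type Check = Expect<Equals<Result, { c: "3" }>>;

// Values of the checked types, so that the aliases above are actually used.
const checked: Check = true;
const sample: Result = { c: "3" };
```

If Result gained or lost a key, the Check alias would stop compiling, which is the same failure mode that the assertion types above rely on.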

Using both of these projects you can test the more advanced types that your projects employ to ensure their continued success against various TypeScript versions.

https://www.simonholywell.com/post/testing-typescript-types/

Advanced TypeScript types

Simon Holywell Jun 25, 2019 Updated Nov 19, 2024

As TypeScript applications get more complex so do the types required to describe them. There are a number of built-in types that can be used for this purpose or combined to create more specialised types. What I term modifying types such as Partial and Required are included in the language and I will quickly cover these first to warm up for the deeper types we’ll address later. This article will quickly move on to focus on the slightly more advanced types beginning with Extract.


As TypeScript applications get more complex so do the types required to describe them. There are a number of built-in types that can be used for this purpose or combined to create more specialised types.

What I term modifying types such as Partial and Required are included in the language and I will quickly cover these first to warm up for the deeper types we’ll address later.

This article will quickly move on to focus on the slightly more advanced types beginning with Extract. You can see the source of the various types by looking at the lib.es5.d.ts declaration file inside TypeScript.

Partial

This generic type takes a single argument, an object type, and returns a new type where all the properties are defined as optional.

interface UserRecord {
  name: string;
  age: number;
}

type User = Partial<UserRecord>;

With the application of the Partial type TypeScript will interpret User as the following type where all properties are now optional.

type User = {
  name?: string;
  age?: number;
};

You can also get a little creative and keep some properties required when applying Partial.

type User = Partial<UserRecord> & { name: UserRecord["name"] };

Whilst this works and the name property is now mandatory there are easier ways to do this that will become apparent further into this article.

Required

Much like Partial, this type takes a single argument of an object type and returns a new type where all the properties are required.

interface ComputerRecord {
  clockSpeed?: number;
  ram?: number;
}
type Computer = Required<ComputerRecord>;

Creates a new type with the following form when interpreted by TypeScript.

type Computer = {
  clockSpeed: number;
  ram: number;
};

Readonly

Pretty much what it says on the tin; this type marks all the properties of an object type as readonly.

interface CarRecord {
  make: string;
  seats?: number;
}
type Car = Readonly<CarRecord>;

Revealing a new TypeScript type that takes the following shape.

type Car = {
  readonly make: string;
  readonly seats?: number;
};

Record

This type is a little different to the three types that we’ve reviewed so far; it takes two arguments, a union of keys and a type. With this information TypeScript will construct a new type that includes each of these keys set to the supplied type.

type Building = Record<"streetNumber" | "floors" | "bedrooms", number>;

Which TypeScript will expand into the following type when it is interpreted.

type Building = {
  streetNumber: number;
  floors: number;
  bedrooms: number;
};

Again, you can get a little creative with this type and do some things like this.

type Building = Partial<Record<"streetNumber" | "floors" | "bedrooms", number>>;

This creates a type that, when interpreted, looks a lot like what you might write as:

type Building = {
  streetNumber?: number;
  floors?: number;
  bedrooms?: number;
};

Another neat trick is to use Record to create types that include properties of multiple types.

type Plant = Record<"name" | "family", string> &
  Record<"height" | "age", number>;

That will create a type that will be interpreted into the following:

type Plant = {
  name: string;
  family: string;
  height: number;
  age: number;
};

Extract (better known as intersection)

Set notation: A∩B

Extract venn diagram: includes a and b, but excludes x and z

Items that exist in both the first and second arguments to the type are kept, but unique items from either side are dropped. This type essentially fills the role of an intersection between two types.

type T1 = Extract<"a" | "b" | "x", "a" | "b" | "z">; // 'a' | 'b'

Describing the same operation in TypeScript code this type could be written using the in-built Array.prototype.filter() function.

const t1 = ["a", "b", "x"].filter((x) => ["a", "b", "z"].includes(x)); // ['a', 'b']

If you have two union types and you want to find the intersection then Extract is very useful.
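The type level Extract and the value level filter can also be tied together with a type guard. What follows is a small sketch under my own assumptions; the Left, Right, Both, rightValues, leftValues and isInBoth names are invented for the example.

```typescript
type Left = "a" | "b" | "x";
type Right = "a" | "b" | "z";

// The type level intersection of the two unions.
type Both = Extract<Left, Right>; // "a" | "b"

const rightValues: readonly string[] = ["a", "b", "z"];

// A type guard so that filter() narrows its result to Both[].
// indexOf is used rather than includes so the sketch compiles for older targets.
const isInBoth = (value: Left): value is Both =>
  rightValues.indexOf(value) !== -1;

const leftValues: Left[] = ["a", "b", "x"];

// The value level intersection, typed by the type level one.
const both: Both[] = leftValues.filter(isInBoth);
```

Here the array that comes out of filter() is typed as Both[], so the compiler knows that "x" can no longer appear in it.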

Exclude (better known as difference)

Set notation: A – B

Exclude venn diagram: includes x, but excludes a, b and z

Calculates the difference between two types (important to note that this is not the symmetrical difference). Everything that exists in the first argument excluding all items that appear in the second argument will be included in the resultant type.

// keep everything from the left excluding any from the right
type T2 = Exclude<"a" | "b" | "x", "a" | "b" | "z">; // 'x'

This can also be described by the following TypeScript implementation code.

const t2 = ['a', 'b', 'x'].filter(
  x => !(['a', 'b', 'z'].includes(x))
) // ['x']

Exclude is used to narrow union types back down again. I am including the following code as a demonstration, but it is not production ready code and in some ways takes the form of pseudocode.

enum ConfigType {
  INI,
  JSON,
  TOML,
}

interface ConfigObject {
  name: string;
  port: number;
}
type JSONConfig = string;
type TOMLConfig = string;
type INIConfig = string;
type ENVConfig = ConfigObject;

type Config = JSONConfig | TOMLConfig | INIConfig | ENVConfig;
type UnparsedConfig = Exclude<Config, ENVConfig | ConfigObject>;
type ParsedConfig = Exclude<Config, UnparsedConfig>;

function loadJsonConfig(cfg: JSONConfig): ParsedConfig {
  return JSON.parse(cfg);
}

function loadTomlConfig(cfg: TOMLConfig): ParsedConfig {
  return TOML.parse(cfg);
}

function loadIniConfig(cfg: INIConfig): ParsedConfig {
  return INI.parse(cfg);
}

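// hypothetical helpers, declared only so that the demonstration type checks
declare const TOML: { parse(source: string): ParsedConfig };
declare const INI: { parse(source: string): ParsedConfig };
declare function isType(type: ConfigType, cfg: UnparsedConfig): boolean;

function loadConfig(cfg: UnparsedConfig): ParsedConfig {
  if (isType(ConfigType.JSON, cfg)) {
    return loadJsonConfig(cfg);
  } else if (isType(ConfigType.TOML, cfg)) {
    return loadTomlConfig(cfg);
  } else if (isType(ConfigType.INI, cfg)) {
    return loadIniConfig(cfg);
  }
  // without this every code path would not return a ParsedConfig
  throw new Error("Unrecognised config format");
}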

Pick

Set notation: A∩B

Pick venn diagram: includes a and b, but excludes x

Similar to an intersection, but it is based on the keys defined in the first type argument. The second argument is a list of the keys to copy into the new type.

type T3 = Pick<{ a: string; b: number; x: boolean }, "a" | "b">;
// { a: string, b: number }

Here is a very contrived example of a possible use for Pick:

interface Config {
  host: {
    uri: string;
    port: number;
  };
  authentication: {
    oauth: {
      uri: string;
    };
  };
}

// these get config functions could be loading from the environment
// or different files etc in a real application. Here they are hard
// coded for demonstration purposes.
const getHostConfig = (): Pick<Config, "host"> => ({
  host: {
    uri: "http://example.org",
    port: 1337,
  },
});

const getAuthConfig = (): Pick<Config["authentication"], "oauth"> => ({
  oauth: {
    uri: "http://example.org",
  },
});

const main = (cfg: Config) => {
  // this is where your application code would probably be
};

// assemble the final config object by piecing together
// the various parts that were loaded up from the env etc.
// and start the application
main({ ...getHostConfig(), authentication: getAuthConfig() });

Omit

Set notation: A – B

Omit venn diagram: includes x, but excludes a and b

Again, this type is similar to the Exclude type, but it takes an object type and a list of keys. The keys indicate which properties should be dropped from the new object type.

type T4 = Omit<{ a: string; b: number; x: boolean }, "a" | "b">;
// { x: boolean }

This has recently been added to the set of types that come with TypeScript by default in 3.5, but older code will need to implement this manually using code like the following.

Notice how it builds upon two types that we’ve already looked at - Exclude and Pick.

export type Omit<T extends object, K extends keyof T> = Pick<
  T,
  Exclude<keyof T, K>
>;

Using the same example types as Pick we could have a function something like the following:

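import * as http from "http";

const startServer = (cfg: Omit<Config, "authentication">) => {
  // createServer() returns the server instance that exposes listen()
  http.createServer().listen(cfg.host.port, () => {
    console.log("Started...");
  });
};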

Difference (symmetrical)

Set notation: (A∩B)′ or (A∪B) - (A∩B)

Difference venn diagram: includes x and z, but excludes a and b

Providing types for symmetrical difference is a little more difficult. This is where values that are unique from both the left and right should be included in the resultant type. Essentially this will lead to a final type that will be used in the following way.

type T5 = SymmetricalDiff<
  { a: number; b: number; x: number },
  { a: number; b: number; z: number }
>; // { x: number; z: number }

As I mentioned this is a fair bit more difficult than it sounds and there are a number of steps required so hang in there.

To produce this we must first work out the difference between the keys in each of the input types. We’ll first write a key differencing type - AMinusB. This will take the keys of two object types and keep all the keys of A that do not exist in B.

export type AMinusB<A extends keyof any, B extends keyof any> = ({
  [P in A]: P;
} & { [P in B]: never } & { [x: string]: never })[A];

The set notation for this is A - B (as you would expect), which makes this type very similar to one that we’ve just explored - Omit. AMinusB is a little different in that it can take any two sets of keys and calculate the keys that exist in A, but not in B. Omit on the other hand dictates that the keys it is supplied are on the object it is given.

To get the symmetrical difference of the keys we can execute the AMinusB type twice and join the results in a union type.

export type SymmetricalKeyDiff<A extends object, B extends object> =
  | AMinusB<keyof A, keyof B>
  | AMinusB<keyof B, keyof A>;

Note that the key lists are flipped between the two calls to AMinusB so as to get key difference both ways - thus powering the “symmetrical” part of this difference type.

With these two key types we can now create the final differencing type that will take the keys and apply them to an object type. Given what we’ve already learned about the inbuilt types we know that Pick takes an object type and a list of keys and will return a new object type with just the specified properties/keys.

So, given SymmetricalKeyDiff and Pick we can create a symmetrical difference type. The input object for Pick is the intersection type A & B, which carries the properties of both A and B, and the list of keys is the SymmetricalKeyDiff of A and B.

export type SymmetricalDiff<A extends object, B extends object> = Pick<
  A & B,
  SymmetricalKeyDiff<A, B>
>;

Putting this type into action looks something like this:

type T5 = SymmetricalDiff<
  { a: number; b: number; x: number },
  { a: number; b: number; z: string }
>; // { x: number; z: string }
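On TypeScript versions that ship Exclude, the key differencing step can also be expressed with it directly. The following is a sketch of that alternative rather than a replacement for the types above; KeyDiff, SymDiff, LeftRecord, RightRecord and onlyUnique are names I have made up for the illustration.

```typescript
// Keys that appear on exactly one side, built from two Exclude calls.
type KeyDiff<A extends object, B extends object> =
  | Exclude<keyof A, keyof B>
  | Exclude<keyof B, keyof A>;

// Pick those keys out of the intersection type, which carries every property.
type SymDiff<A extends object, B extends object> = Pick<A & B, KeyDiff<A, B>>;

type LeftRecord = { a: number; b: number; x: number };
type RightRecord = { a: number; b: number; z: string };

// Only x and z survive, each keeping the type from the side it came from.
const onlyUnique: SymDiff<LeftRecord, RightRecord> = { x: 1, z: "one" };
```

Adding an a or b property to onlyUnique would be rejected by the compiler as an excess property, because those keys exist on both sides.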

Intersection

Using the same basic underlying types it is also possible to get the intersection of two object types.

export type Intersection<A extends object, B extends object> = Omit<
  A & B,
  SymmetricalKeyDiff<A, B>
>;

Put into practice this type can be used in the following way:

type T6 = Intersection<
  { a: number; b: number; x: number },
  { a: number; b: number; z: string }
>; // { a: number; b: number }

So, there you have it - some reasonably complicated types defined in TypeScript. Hopefully, you’ve been able to follow along until the end and you get some use out of what you’ve learnt here.

https://www.simonholywell.com/post/advanced-typescript-types/

TypeScript constructors and generic types

Simon Holywell May 27, 2019 Updated Nov 19, 2024

I have recently found myself needing a type for class constructors that is at once generic and tight enough to ensure a genuine constructor. This is useful in situations where you must handle a variety of classes - those that come from other libraries or applications that you cannot control. When writing your own dependency injection container or some other generalised library you cannot know what class constructors might be passed in.


I have recently found myself needing a type for class constructors that is at once generic and tight enough to ensure a genuine constructor. This is useful in situations where you must handle a variety of classes - those that come from other libraries or applications that you cannot control.

When writing your own dependency injection container or some other generalised library you cannot know what class constructors might be passed in. The code just needs to know that calling the constructor will lead to an instance.

I’ve settled on a type that I use for this situation. Whilst the type itself will land up being rather small, and some might say simple, it is, nevertheless, not particularly obvious.

An example class constructor we might want to pass to other functions could be something like this little Author class definition.

class Author {
  public readonly age: number = NaN;
  public readonly email: string = "";
  public readonly name: string = "";
}

When creating API endpoints it is common to accept a JSON string in the request body that needs to be validated and, ideally where TypeScript is involved, correctly typed. To facilitate this we might have a function that takes a JSON string and converts it into an instance of a predetermined class.

This code is for demonstration and not production ready, but you could imagine it handling requests for a REST API.

/**
 * Using a given JSON string construct and populate an instance of the
 * supplied class constructor
 * @param source JSON request payload string that the API receives
 * @param destinationConstructor a class constructor
 */
const json2Instance = (source: string, destinationConstructor: any) =>
  Object.assign(new destinationConstructor(), JSON.parse(source));

const simon = json2Instance('{"name":"simon"}', Author);

This looks like it will work nicely, but in practice by using the any type on the destinationConstructor the types have been broken. This prevents type checking from working correctly, which also means that auto hinting will no longer work in developers’ IDEs or editors. So, we need to come up with a type for it so that json2Instance() allows the type signatures to flow through.

Types given as any effectively block all benefits of using TypeScript in the first place - there is a place for them, but that is another article entirely.

Looking at the types available in lib.es5.d.ts from the TypeScript language source code shows us what a constructor type could look like. There are types for all the native JavaScript constructors such as Number, String, Function and Object.

Both the Function and Object constructor types have additional properties that are possibly undesirable, so instead of using them directly or extending them we’ll create our own.

The most basic of constructor types could be defined as a function that returns an Object, but this is not quite what we are after as you might see.

type Constructor = new () => Object;

const json2Instance = (source: string, destinationConstructor: Constructor) =>
  Object.assign(new destinationConstructor(), JSON.parse(source));

Unfortunately, we’re still losing the type - we know it’s an Author, but this constructor type is telling TypeScript that it is a standard or plain old JavaScript Object. To tighten this up it is necessary to introduce generic types.

Before we move onto that though - a quick word on constructors that take arguments (args in the example code). To handle constructor functions that take arguments we can make use of rest parameter syntax (...args) in the constructor type.

class AuthorWithConstructor extends Author {
  public readonly greeting!: string;
  constructor(name: string = "") {
    super(); // a derived class must call super() before it can use `this`
    this.greeting = `Top of the muffin to you, ${name}`;
  }
}
type Constructor = new (...args: any[]) => Object;

This Constructor type is still indicating that the returned value is of type Object, which as we discovered before is breaking the typings for the json2Instance() function. Using TypeScript’s generics features it is possible to correct this behaviour.

type Constructor<T> = new (...args: any[]) => T;

By passing in the type (T) as a generic type argument it is possible to state that the return value of the constructor is of type T. To use this new type there are a couple of changes to make to the json2Instance() function to allow for the generic to be specified.

const json2Instance = <T>(
  source: string,
  destinationConstructor: Constructor<T>,
): T => Object.assign(new destinationConstructor(), JSON.parse(source));

When called the type (Author) now flows through as the generic T type.

const simon = json2Instance('{"name":"simon"}', Author);
console.log({ age: simon.age, nextYear: simon.age + 1 });
// no type errors because it knows age is number in the addition

// also in your IDE/editor you'll now get code completion/suggestions where you type
// the instance name `simon` and get a list of possible properties:
// simon.
//   |--> age
//   |--> email
//   |--> name

So, we have solved the problem where the type of the constructor (Author) is known. However, it is not always possible or desirable to know the type. Think of defining other types or interfaces that have a constructor as a property.

A limited example of this in action might be to have a list of class constructors.

type ControllerList = Constructor[];

We do not know the class type of the constructors in the list and it is not necessary to know for our calling code. It just needs to know it can create an instance. By providing a default for the type argument (T) of {} we allow implementing types to avoid providing a type argument that they cannot know.

type Constructor<T = {}> = new (...args: any[]) => T;

By default the type will be a constructor that returns an object, but as before if you specify the type argument T then it will use the given type.

It is possible to tighten up our definition a little further using the extends keyword so that any T must have an object type - as all constructors do.

type Constructor<T extends {} = {}> = new (...args: any[]) => T;

And, there you have it. A constructor type that is at once flexible and restrictive.
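To see both sides of the type in one place, here is a small sketch in the spirit of the dependency injection container mentioned at the start. The Logger and Mailer classes, the registry list and the create() helper are all hypothetical names invented for this example.

```typescript
// The constructor type arrived at in this article.
type Constructor<T extends {} = {}> = new (...args: any[]) => T;

// Hypothetical classes standing in for services a container might manage.
class Logger {
  public readonly label: string = "logger";
}

class Mailer {
  public readonly label: string;
  constructor(label: string = "mailer") {
    this.label = label;
  }
}

// The default type argument lets us hold constructors of unknown classes.
const registry: Constructor[] = [Logger, Mailer];
const instances = registry.map((Service) => new Service());

// When the class is known, the instance type flows through as T.
const create = <T extends {}>(ctor: Constructor<T>, ...args: any[]): T =>
  new ctor(...args);

const mailer = create(Mailer, "newsletter");
```

The registry only knows that each entry can be constructed, whereas mailer is typed as Mailer, so mailer.label is available with code completion.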

https://www.simonholywell.com/post/typescript-constructor-type/

The lambda calculus for developers

Simon Holywell Feb 17, 2019 Updated Nov 19, 2024

This will be a quick introduction to the lambda calculus syntax, alpha (α) equivalence and beta (β) reduction. What does a lambda look like? I am going to use the identity function as an example for the simplicity it provides. This can be expressed as a lambda function with the notation λx.x. It is a function that when given an argument outputs that argument as its return value. You can also have multiple arguments with a lambda like λxy.


This will be a quick introduction to the lambda calculus syntax, alpha (α) equivalence and beta (β) reduction.

What does a lambda look like?

I am going to use the identity function as an example for the simplicity it provides. This can be expressed as a lambda function with the notation λx.x. It is a function that when given an argument outputs that argument as its return value. You can also have multiple arguments with a lambda like λxy.xy.

A lambda is comprised of a head, argument and body.

diagram illustrating the parts of a lambda function - head(argument).body

The head of the lambda begins with the lambda character (λ), which indicates the start of a function. This is immediately followed by the argument that is separated from the body of the function by a dot/period (.).

Applying a lambda

The identity function can be applied to any argument passed to it. Here it is when applied to the digit two (2).

(λx.x) 2
2

The steps to arrive at the final answer can be further broken down and illustrated.

the application of the identity lambda against the digit 2 as a diagram

We have just completed the simplest of beta reductions. This is the process of reducing all expressions to their normal form or smallest unit - to the point where you can do no more reduction. In this case we can reduce all the way to a single value, but this is not always the case as you’ll see further on.

In a programming language you may already know

In JavaScript this would be written as

((x) => x)(2);
// OR
(function (x) {
  return x;
})(2);

and in PHP

<?php
(function($x) {
    return $x;
})(2);

and Python

(lambda x: x) (2)

and Ruby

->(x) { x }.call(2)

and Haskell

(\x -> x) 2

Some more simple examples

(λx.x) 10
10
---
(λx.x * 2) 2
4
---
(λx.1 + x) 2
3

Free variables

A free variable is one that is not mentioned in the head of a lambda - y is a free variable in the expression λx.xy. This does not prevent the expression from being reduced though.

(λx.xy) z
zy

The opposite of a free variable is a bound variable - it is bound to an argument specified in the head of the lambda.

Higher order

The lambda calculus can also encode higher order operations - that is lambdas that can accept lambdas as arguments and/or return lambdas. To make it easier to keep track of this process I will use “notes to self” inside square brackets ([]) that illustrate the value an argument was substituted with.

For reference here is the identity example again with the additional substitution notation.

(λx.x) 2
[ x := 2 ]
2

Now for the first higher order lambda expression - the identity lambda applied to itself.

(λx.x) (λy.y)
[ x := (λy.y) ]
λy.y

This is a good time to mention that the lambda calculus is left associative during beta reduction. We start with the leftmost expression and apply the leftmost argument to it. In the following example I’ve added an extra initial step to wrap the first reduction inside parentheses (()) so as to make this association explicit.

(λx.x) (λy.y) z
((λx.x) (λy.y)) z  ; here are those extra parentheses
[ x := (λy.y) ]
(λy.y) z
[ y := z ]
z

α-equivalence

This refers to expressions that differ only in the names of their bound variables. Given the same argument they return the same result, so they are functionally equivalent to each other and you could substitute one for the other.

λx.x == λy.y == λz.z
λxy.yx == λzq.qz == λpt.tp

Conversely, due to the different ordering in the body, the following are not equivalent.

λxy.xy != λzq.qz

Importantly, this property gives you the opportunity to rename variables where there may be clashes in an expression.

(λz.z) λz.zz == (λz.z) λy.yy

Free variables cannot be renamed, so expressions that differ in a free variable are not equivalent, but you can still rename those that are bound (x in the following example).

λx.xz != λx.xy

If you wanted to avoid a variable name collision in the aforementioned expression you could rename x.

λx.xz -> λy.yz

Multiple arguments and currying

The same basic reduction rules apply when dealing with lambdas that have multiple arguments, but there is a little additional rule. Each argument is actually a lambda and they are nested - this is currying.

λxy.xy

is actually more like the following when considered in its most explicit form.

λx.(λy.xy)

The more arguments you have the more nested lambdas you’ll have.

λxyz.xyz == λx.(λy.(λz.xyz))
λxyzq.xyzq == λx.(λy.(λz.(λq.xyzq)))

This is to say that the first notation is shorthand for each argument being the application of a lambda function.
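The nesting can be made concrete in a programming language. Below is a rough TypeScript sketch of λxy.xy in its curried form; the apply, double, applyDouble and answer names are my own and only serve the illustration.

```typescript
// λxy.xy written in its explicit curried form λx.(λy.xy):
// a function of x that returns a function of y.
const apply =
  <A, B>(x: (value: A) => B) =>
  (y: A): B =>
    x(y);

const double = (n: number): number => n * 2;

// [ x := double ] leaves us holding the inner lambda λy.(double y)
const applyDouble = apply(double);

// [ y := 21 ] completes the reduction
const answer = applyDouble(21);
```

Each call supplies exactly one argument, mirroring the way each nested lambda binds a single variable.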

Now for a multi-argument reduction

(λxy.xy) p t
(λx.(λy.xy)) p t
[ x := p ]
(λy.py) t
[ y := t ]
pt

and again for a higher order example.

(λxy.xy) (λz.q) 1
(λx.(λy.xy)) (λz.q) 1
[ x := (λz.q) ]
(λy.(λz.q)y) 1
[ y := 1 ]
(λz.q) 1
[ z := 1 ]
q

Worth noting here that the z is not used in the lambda body so the value 1 simply evaporates (dropped from the expression/result).

It is, of course, possible to work through more complex problems like the following expression - remembering we start with the left most expression first.

(λxyz.xz(yz)) (λmn.m) (λp.p)
; let's expand to indicate curried arguments
(λx.(λy.(λz.xz(yz)))) (λm.(λn.m)) (λp.p)
[ x := (λm.(λn.m)) ]
(λy.(λz.(λm.(λn.m)) (z) (yz))) (λp.p)
[ y := (λp.p) ]
λz.(λm.(λn.m)) (z) ((λp.p) z)
; there are no more arguments to apply but
; we can still reduce internally
; again we want to do the left most first
[ m := z ]
λz.(λn.z) ((λp.p) z)
[ n := ((λp.p) z) ]
; as n is not used in the body it evaporates
; and the lambda returns z
λz.z

So after all that we land up with the identity function at the end.
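The same result can be checked by running it. This is a sketch that assumes number arguments purely so there is something concrete to compare; S, K and I are simply names I have given to the three lambdas from the expression above.

```typescript
// λxyz.xz(yz)
const S =
  <A, B, C>(x: (a: A) => (b: B) => C) =>
  (y: (a: A) => B) =>
  (z: A): C =>
    x(z)(y(z));

// λmn.m, where the second argument is ignored (it "evaporates")
const K =
  <A>(m: A) =>
  <B>(_n: B): A =>
    m;

// λp.p
const I = <A>(p: A): A => p;

// (λxyz.xz(yz)) (λmn.m) (λp.p) should behave like the identity function λz.z
const reduced = S<number, number, number>(K)(I);
```

Whatever number is handed to reduced comes straight back out, which agrees with the λz.z that the manual reduction arrived at.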

Combinators

A combinator is a lambda term with no free variables (all variables are bound), which serves to combine values.

λz.z
λzy.zy
λxyz.xz(yz)

As opposed to those that contain free variables - y in the first of these and both x and y in the second.

λz.y
λz.xy

There is a very famous combinator called the Y combinator that looks like this.

λf.(λx.f (x x))(λx.f (x x))

Divergence

Not all expressions can be considered to converge because they lead to replication and the beta reduction process never ends. Consider the Ω (Omega) divergence below.

(λx.xx) (λy.yy)
[ x := λy.yy ]
(λy.yy) (λy.yy)
[ y := λy.yy ]
(λy.yy) (λy.yy)
; and so on forever

Some exercises for you

The worked answers to these exercises are available after the summary.

Combinator exercises

For each of the following determine if they are combinators or not.

λq.qq
λts.stp
λz.fg
λxy.yx
λrgf.f (ri) g

α-equivalence exercises

Are these terms α-equivalent?

λz.z and λa.a
λbq.qb and λzt.zt
λz.zg and λp.pg
λb.λa.a and λa.λb.b
λd.λxy.y and λf.λyx.x

β-reduction exercises

Reduce the following to their β-normal forms. It will be a lot easier if you use a pen and paper or even a text document in your editor to work through these.

(λa.ab) (λq.q)
(λf.(λg.fgg)) (λn.n) m
(λs.(λp.(sp) s)) (λt.q)
(λb.(λm.(bb) m) (λq.vq)) (λx.(λe.e))
λf.((λx.f (x x)) (λx.f (x x)))
(λq.qg) (λp.(λs.ss)) (λt.t) z
(λfg.gf) (λba.a) (λz.z) pq
(λpt.pt) (λx.xx) (λf.ff)
(λpt.t) g (λq.(λv.vv)) o (λu.uu) (λpf.fz)

Summary

Lambda expressions are:

reduced from left to right

(λx.x) (λy.y) g
[ x := (λy.y) ]
(λy.y) g
[ y := g ]
g

left associative and greedy

(λx.x) (λy.y) g != (λx.x λy.y g)

; first expression
(λx.x) (λy.y) g
[ x := (λy.y) ]
(λy.y) g
[ y := g ]
g

; second expression
(λx.x λy.y g)
(λx.(x(λy.y g)))

applied/reduced through β-reduction to their β-normal form or point of divergence (they either self-replicate or grow - think Y-Combinator and Ω (Omega))
combinators when all variables are bound to arguments (no free variables) - therefore serving to combine values together

Answers

Combinator answers

Yes
No (p is free)
No (fg are free)
Yes
No (i is free)

α-equivalence answers

Yes
No
Yes (only the bound variable is renamed; g is free in both)
Yes
Yes

β-reduction answers

1

(λa.ab) (λq.q)
[ a := (λq.q) ]
(λq.q) b
[ q := b ]
b

2

(λf.(λg.fgg)) (λn.n) m
[ f := (λn.n) ]
(λg.(λn.n) gg) m
[ g := m ]
(λn.n) (m)m
[ n := m ]
mm

3

(λs.(λp.(sp) s)) (λt.q)
[ s := (λt.q) ]
(λp.(λt.q)p) (λt.q)
[ p := (λt.q) ]
(λt.q) (λt.q)
[ t := (λt.q) ]  ; not that it matters, `t` is dropped anyway
q

4

(λb.(λm.(bb) m) (λq.vq)) (λx.(λe.e))
[ b := (λx.(λe.e)) ]
(λm.((λx.(λe.e)) (λx.(λe.e))) m) (λq.vq)
[ m := (λq.vq) ]
(λx.(λe.e)) (λx.(λe.e)) (λq.vq)
[ x := (λx.(λe.e)) ]
(λe.e) (λq.vq)
[ e := (λq.vq) ]
(λq.vq)

5

λf.((λx.f (x x)) (λx.f (x x)))
[ x := (λx.f (x x)) ]
λf.(f((λx.f (x x)) (λx.f (x x))))
[ x := (λx.f (x x)) ]
λf.(f(f((λx.f (x x)) (λx.f (x x)))))
[ x := (λx.f (x x)) ]
λf.(f(f(f((λx.f (x x)) (λx.f (x x))))))
; the Y-combinator diverges and the expression actually grows!

6

(λq.qg) (λp.(λs.ss)) (λt.t) z
[ q := (λp.(λs.ss)) ]
(λp.(λs.ss)) g (λt.t) z
[ p := g ]
(λs.ss) (λt.t) z
[ s := (λt.t) ]
(λt.t) (λt.t) z
[ t := (λt.t) ]
(λt.t) z
[ t := z ]
z

7

(λfg.gf) (λba.a) (λz.z) pq
[ f := (λba.a) ]
(λg.g(λba.a)) (λz.z) pq
[ g := (λz.z) ]
(λz.z) (λba.a) pq
[ z := (λba.a) ]
(λba.a) pq
[ b := p ]
(λa.a) q
[ a := q ]
q

8

(λpt.pt) (λx.xx) (λf.ff)
[ p := (λx.xx) ]
(λt.(λx.xx)t) (λf.ff)
[ t := (λf.ff) ]
(λx.xx) (λf.ff)
[ x := (λf.ff) ]
(λf.ff) (λf.ff)
; this diverges

9

(λpt.t) g (λq.(λv.vv)) o (λu.uu) (λpf.fz)
[ p := g ]
(λt.t) (λq.(λv.vv)) o (λu.uu) (λpf.fz)
[ t := (λq.(λv.vv)) ]
(λq.(λv.vv)) o (λu.uu) (λpf.fz)
[ q := o ]
(λv.vv) (λu.uu) (λpf.fz)
[ v := (λu.uu) ]
(λu.uu) (λu.uu) (λpf.fz)
[ u := (λu.uu) ]
(λu.uu) (λu.uu) (λpf.fz)
; this diverges before it can reduce all its terms leaving `(λpf.fz)` unreachable/dangling

https://www.simonholywell.com/post/the-lambda-calculus-for-developers/

Search and replace with confirmation in Bash

Simon Holywell Sep 12, 2017 Updated Nov 19, 2024

Automated search and replace can be very handy, although there are occasions where a human needs to get involved in some of the decisions, such as when the search term isn’t unique or appears as part of other words. When this is the case you’ll want a confirm step where you can approve each replacement before it happens. With very little work we can achieve this using a combination of vim and grep.


Automated search and replace can be very handy, although there are occasions where a human needs to get involved in some of the decisions, for example when the search term isn’t unique or appears as part of other words. When this is the case you’ll want a confirm step where you can approve each replacement before it happens.

With very little work we can achieve this using a combination of vim and grep. I’ll use grep to find the search term and return a list of affected files. Looping over each of these files I can employ a simple vim substitution with the /c (confirmation) flag.

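#!/usr/bin/env bash
# Note: $FROM is treated as a pattern by both grep and vim, and any "/"
# in $FROM or $TO would need escaping for the vim substitution.
FROM="your search term here"
TO="your replacement here"
echo ""
echo "$FROM => $TO"
echo "-----------------------------------------------"
# grep -rl lists every file containing the search term, one per line.
# Reading that list on file descriptor 3 leaves stdin free for vim and
# copes with spaces in file names.
while IFS= read -r -u 3 SUBJECT_FILE ; do
    echo "$SUBJECT_FILE"
    # the c flag asks for confirmation of each match, wq saves and quits
    vim "$SUBJECT_FILE" -c "%s/$FROM/$TO/gc" -c "wq"
done 3< <(grep -rl -- "$FROM" *)
echo "-----------------------------------------------"
echo ""
echo "Done!"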

The key to this little script is the ability to pass vim a string of commands with the -c flag. In this way it is simple to:

open a file with vim: vim "$SUBJECT_FILE"
perform the substitution: -c "%s/$FROM/$TO/gc"
and then automatically save it before quitting: -c "wq"

A simple little hack I use quite often in some of my bash scripts - most recently used to upgrade from Bootstrap v4-alpha.2 to v4-alpha.5 in a project. Between the versions, a fair proportion of CSS class names had been changed so these needed to be updated across the project.

To get the list of all the classes that needed to be replaced I used the nokogiri (a Ruby HTML parsing library) command line tools. This was executed directly against the Bootstrap upgrade guide web site.

https://www.simonholywell.com/post/2017/09/search-and-replace-with-confirmation-in-bash/

PHP and immutability: objects and generalisation - part three

Simon Holywell Apr 27, 2017 Updated Nov 19, 2024

In the last article we learnt how to create modified copies of an immutable in PHP. This one is going to tackle an issue I have hitherto skirted around and avoided. Objects in immutable data structures.

This article is part of a series I have written on the topic of immutability in PHP code:

Part one - a discussion of caveats and a simple scalar handling immutable

Part two - improve the process of creating modified copies of the immutable

Part three - objects in immutable data structures and a generalised immutable implementation

Also available in Русский (Russian):

Часть 1 - PHP и неизменяемость. Часть 1

Часть 2 - PHP и неизменяемость: экземпляры, которые могут быть изменены. Часть 2

Часть 3 - PHP и неизменяемость: объекты и обобщение. Часть 3

What’s the problem with objects?

Objects, or instances of classes, are effectively passed by reference in PHP: what gets copied is a handle to the same underlying object. Any changes to the object will be reflected in all places it is passed to. This is different to scalar values like strings, which are passed by value instead.

$class = new stdClass();
function addItem($x, $item) {
    $x->$item = $item;
}
var_dump($class); // object(stdClass)#1 (0) {}
addItem($class, 'test');
var_dump($class);
/*
object(stdClass)#1 (1) {
  ["test"]=> string(4) "test"
}
*/

Here you can see a function called addItem() that adds a property to a stdClass instance - this produces a side effect. The original $class is also updated as it references the same value, so if we dump the variable we can see its value has changed.

Now consider the same example with a simple scalar string where pass by value takes effect.

$string = 'begin';
function addItem($x, $item) {
    $x .= $item;
}
var_dump($string); // string(5) "begin"
addItem($string, 'end');
var_dump($string); // string(5) "begin"

Here the original value remains intact because, unlike an object, there is no reference to it from within the addItem() function.

These side effects make putting an object into an immutable data structure difficult. Someone with access to the reference could simply change the object after the fact - thus breaking immutability.
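To make that concrete, here is a minimal sketch of a holder that stores whatever object it is given without copying it. NaiveHolder is a hypothetical class invented purely for this illustration; it is not part of the implementation built up later in the article.

```php
<?php
declare(strict_types=1);

// NaiveHolder is a hypothetical example class: it keeps the handle it is given.
final class NaiveHolder {
    private $data;
    public function __construct(stdClass $value) {
        $this->data = $value; // no clone, so the caller still shares this object
    }
    public function get(): stdClass {
        return $this->data;
    }
}

$test = new stdClass();
$test->data = 'test';

$holder = new NaiveHolder($test);
echo $holder->get()->data, PHP_EOL; // test

// the caller still holds a handle to the very same object
$test->data = 'simon';
echo $holder->get()->data, PHP_EOL; // simon
```

Nothing inside NaiveHolder was touched, yet the value it reports has changed.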

What about resources?

Turns out the same issues plague resources as well. They are just references to a resource ID so any change to one will affect all those that also reference it. Simply moving the pointer in a file resource would break immutability.

$f = fopen('/tmp/test.txt', 'r'); // contains "123456789"
$out = fread($f, 3);
var_dump($out); // string(3) "123"

$out2 = fread($f, 3);
var_dump($out2); // string(3) "456"

This happens because fread() advances the file pointer as it reads. Even if we rewind() the pointer there is no guarantee of getting the same value back.

An additional issue with resources is that they are, by their nature, not a finite thing so even if you did prevent changes within your program you could still end up having mutations - someone updating a file on disk for example.

$f = fopen('/dev/urandom', 'r');
$out = bin2hex(fread($f, 3));
var_dump($out); // string(6) "82e42b"

rewind($f); // reset pointer to beginning
$out2 = bin2hex(fread($f, 3));
var_dump($out2); // string(6) "e20c78"

In between the two calls to fread() the data in the resource has changed through outside intervention. A new random value has effectively been written to /dev/urandom meaning the dumped value changes too, even though we have rewound the pointer and read the same length of 3 bytes.

Note that the use of bin2hex() converts the binary bytes that /dev/urandom produces into a hexadecimal representation, making it more legible to humans. This conversion also doubles the length of the value because each byte becomes two hexadecimal characters; seven raw, largely unprintable, bytes become dd4044f5f84ed6 in hexadecimal notation. This is why the length passed to fread() may be 3, but the string that comes back is actually 6 characters long.

However, if your data source is not binary then you do not need to use bin2hex() in your code.

What can we do to fix it?

In the case of resources, it is too hard to protect them from unauthorised changes so we won’t bother. If you need an immutable resource you’ll have to fetch it as a scalar first and then put that into your immutable data structure.

$f = fopen('/dev/urandom', 'r');
$randomStr = bin2hex(fread($f, 7));
var_dump($randomStr); // string(14) "d102c7ca28b6f1"

var_dump(substr($randomStr, 0, 3)); // string(3) "d10"
var_dump(substr($randomStr, 0, 3)); // string(3) "d10"

As you can clearly see the value does not change between prints in this example because we are accessing a scalar string instead of a resource directly. You could just as easily feed the $randomStr into the Immutable definitions that are described further on.

In the case of objects though there is something we can do to protect the immutable from their pass by reference nature. For simple objects you can simply clone the incoming object value when setting it in an immutable data structure. This will create a new copy of the object with its own reference and, therefore, break the dependency on the previous reference - the two objects are not linked by reference. Any change in one will not be reproduced in the other.

declare(strict_types=1);

final class Immutable {
    private $data;
    private $mutable = true;
    public function __construct(stdClass $value) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->data = clone $value;
        $this->mutable = false;
    }
    public function get() {
        return $this->data;
    }
}

$test = new stdClass();
$test->data = 'test';
echo $test->data; // test

$imm = new Immutable($test);
echo $imm->get()->data; // test

$test->data = 'simon';
echo $test->data; // simon
echo $imm->get()->data; // test

By cloning the object we have created a duplicate instance and referenced that instead from within the Immutable. This means that when $test is later updated it does not affect the value inside $imm as it does not share the same reference as $test.

So, there it is, we are done.

Deep nesting though

Yeah, right, not so fast! The previous example can easily be broken with one small change; provide an object for storage inside $test.

$value = new stdClass();
$value->data = 'value';

$test = new stdClass();
$test->data = $value;
var_dump($test->data);
/*
object(stdClass)#1 (1) {
  ["data"]=> string(5) "value"
}
*/

$imm = new Immutable($test);
var_dump($imm->get()->data);
/*
object(stdClass)#1 (1) {
  ["data"]=> string(5) "value"
}
*/

// change the nested object's value to see if the immutable changes too
$value->data = 'changed value!';
var_dump($imm->get()->data);
/*
object(stdClass)#1 (1) {
  ["data"]=> string(14) "changed value!"
}
*/

As you might expect, just because we cloned $test when it was set inside Immutable does not mean its contents were cloned too; clone only makes a shallow copy. Unfortunately, $value is still referenced directly, so any subsequent updates get reflected across all referring locations - including inside our Immutable.

The same would be true of any immutable containing an array too. You could just set one of the array elements to be an object and change it later just like $value in this object example.
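A minimal sketch of the array case follows. ImmutableList is a hypothetical, cut-down class used only for this demonstration; the point is that PHP copies the array itself by value, but any object sitting inside it is still shared.

```php
<?php
declare(strict_types=1);

// ImmutableList is a hypothetical array-holding class for this illustration.
final class ImmutableList {
    private $data;
    public function __construct(array $values) {
        // the array is copied by value, the objects inside it are not
        $this->data = $values;
    }
    public function get(): array {
        return $this->data;
    }
}

$value = new stdClass();
$value->data = 'value';

$list = new ImmutableList([1, 'two', $value]);
echo $list->get()[2]->data, PHP_EOL; // value

// change the object that was placed in the array
$value->data = 'changed value!';
echo $list->get()[2]->data, PHP_EOL; // changed value!
```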

Long story short, this immutable is in fact mutable.

Immutable deep nesting with __clone()

You could work around the lack of protection by implementing the __clone() magic method in all classes that might be put inside an immutable. You could then clone all objects stored in the class when it, itself, is cloned. A simplified demonstration of how this could work is below.

declare(strict_types=1);

final class Immutable {
    private $data;
    private $mutable = true;
    public function __construct(MySimpleClass $value) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->data = clone $value;
        $this->mutable = false;
    }
    public function get() {
        return $this->data;
    }
}

class MySimpleClass {
    private $data;
    public function __construct(stdClass $value) {
        $this->data = $value;
    }
    public function get() {
        return $this->data;
    }
    public function __clone() {
        $this->data = clone $this->data;
    }
}

$stdClass = new stdClass();
$stdClass->value = 'Hello';
var_dump($stdClass);
/*
object(stdClass)#1 (1) {
  ["value"]=> string(5) "Hello"
}
*/
$toBeStored = new MySimpleClass($stdClass);
var_dump($toBeStored->get());
/*
object(stdClass)#1 (1) {
  ["value"]=> string(5) "Hello"
}
*/
$imm = new Immutable($toBeStored);
var_dump($imm->get()->get());
/*
object(stdClass)#5 (1) {
  ["value"]=> string(5) "Hello"
}
*/

As you can see MySimpleClass is very naive to make the demonstration easier to grasp. You will also note that the object ID jumps to 5 when the final var_dump() is applied - this is because __clone() in MySimpleClass was triggered.

If we step through the implementation again and attempt to make a change to $stdClass then it might be clearer.

$stdClass = new stdClass();
$stdClass->value = 'Hello';
var_dump($stdClass);
/*
object(stdClass)#1 (1) {
  ["value"]=> string(5) "Hello"
}
*/

$toBeStored = new MySimpleClass($stdClass);
// we can still modify this object as the clone has not yet happened
$stdClass->data = 'World';
var_dump($toBeStored->get());
/*
object(stdClass)#1 (2) {
  ["value"]=> string(5) "Hello"
  ["data"]=> string(5) "World"
}
*/

// the clone will be triggered by the constructor in Immutable right here
$imm = new Immutable($toBeStored);
// Note that the following line returns a different object (#5 instead of #1)
// due to the clone operation in the Immutable constructor
var_dump($imm->get()->get());
/*
object(stdClass)#5 (2) {
  ["value"]=> string(5) "Hello"
  ["data"]=> string(5) "World"
}
*/

// the following line will not affect the Immutable wrapped data as $stdClass references
// the original #1 object
$stdClass->combined = $stdClass->value . $stdClass->data;
var_dump($imm->get()->get());
/*
object(stdClass)#5 (2) {
  ["value"]=> string(5) "Hello"
  ["data"]=> string(5) "World"
}
*/

Unfortunately, this would require you to trust developers to actually implement this correctly and there would be no way of accurately verifying that a __clone() method has been specified properly.

To solve this issue we must eschew quite a bit of flexibility and only allow known immutable objects to be set inside the Immutable. This means that we have to recursively step down through any arrays looking for mutable classes and rejecting them too.

Generalised immutable deep nesting

For those of us who want a more stringently protected immutable we can generalise the problem by making an immutable class that can sanitise itself. It will only allow known immutables to be set as data inside it thereby preventing nested object state changes, which would break its immutable property.

declare(strict_types=1);

final class Immutable {
    private $data;
    private $mutable = true;

    public function __construct(array $args) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->data = $this->sanitiseInput($args);
        $this->mutable = false;
    }
    public function getData(): array {
        return $this->data;
    }
    public function sanitiseInput(array $args): array {
        return array_map(function($x) {
            if (is_scalar($x)) return $x;
            else if (is_object($x)) return $this->sanitiseObject($x);
            else if (is_array($x)) return $this->sanitiseInput($x);
            else throw new \InvalidArgumentException(gettype($x) . ' cannot be stored in an Immutable.');
        }, $args);
    }
    // This method prevents untrusted objects from being set by using a class
    // type hint, which PHP enforces regardless of the strict_types setting.
    // Note that it also clones the supplied object.
    private static function sanitiseObject(Immutable $object): Immutable {
        return clone $object;
    }
    public function __clone() {
        $this->data = $this->sanitiseInput($this->data);
    }
    public function __unset(string $id): void {}
    public function __set(string $id, $val): void {}
}

This class can then be implemented to create immutable lists of things.

$immA = new Immutable([1, 'unjani wena']);
var_dump($immA);
/*
object(Immutable)#1 (2) {
  ["data":"Immutable":private]=> array(2) {
    [0]=> int(1)
    [1]=> string(11) "unjani wena"
  }
  ["mutable":"Immutable":private]=> bool(false)
}
*/
$immB = new Immutable([2, $immA]);
var_dump($immB);
/*
object(Immutable)#2 (2) {
  ["data":"Immutable":private]=> array(2) {
    [0]=> int(2)
    [1]=> object(Immutable)#4 (2) {
      ["data":"Immutable":private]=> array(2) {
        [0]=> int(1)
        [1]=> string(11) "unjani wena"
      }
      ["mutable":"Immutable":private]=> bool(false)
    }
  }
  ["mutable":"Immutable":private]=> bool(false)
}
*/
$immC = new Immutable([2, new stdClass]);
// Error: Argument 1 passed to Immutable::sanitiseObject() must be an instance of Immutable,
// instance of stdClass given

The main new concept here is the sanitiseInput() method, which recursively steps through the data array, cloning any objects it finds. The cloning itself happens in sanitiseObject(), which, you will also note, uses a type hint to ensure only instances of Immutable can be set as values. This is how we ensure that only known immutable objects are being set inside an Immutable.

If you need to check for more than one known immutable class then you could check in a number of ways:

extend a base or abstract class when implementing them all,
use an interface that they all implement or
a simple set of instanceOf checks.

Something like this might do it.

/**
 * @param Immutable|MyOtherImmutable|SomeOtherImmutable $object
 */
protected function sanitiseObject($object) {
    if (array_filter(
        ['Immutable', 'MyOtherImmutable', 'SomeOtherImmutable'],
        function($x) use ($object) { return $object instanceOf $x; }
    )) {
        return clone $object;
    }
    throw new \InvalidArgumentException(gettype($object) . ' cannot be stored in an Immutable.');
}

Whichever way you choose or prefer is up to you of course.
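For comparison, the interface option might look something like the sketch below. KnownImmutable and Temperature are hypothetical names made up for this example, and sanitiseObject() is written as a plain function here only to keep the sketch short.

```php
<?php
declare(strict_types=1);

// KnownImmutable is a hypothetical marker interface; any class you trust to be
// immutable would implement it.
interface KnownImmutable {}

// Temperature is a made-up example of such a trusted class.
final class Temperature implements KnownImmutable {
    private $degrees;
    public function __construct(float $degrees) {
        $this->degrees = $degrees;
    }
    public function getDegrees(): float {
        return $this->degrees;
    }
}

// The type declaration does the checking: anything that does not implement
// KnownImmutable is rejected with a TypeError before the clone happens.
function sanitiseObject(KnownImmutable $object): KnownImmutable {
    return clone $object;
}

$original = new Temperature(21.5);
$copy = sanitiseObject($original);
echo $copy->getDegrees(), PHP_EOL; // 21.5

try {
    sanitiseObject(new stdClass());
} catch (\TypeError $e) {
    echo 'Rejected: stdClass is not a KnownImmutable', PHP_EOL;
}
```

The trade-off is that an interface only records a promise; it is still up to each implementing class to actually be immutable.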

So, that finally gives us a simple immutable structure that can store objects, scalars and arrays. You can use the techniques discussed in the previous article (part two) to easily create modified copies of your new immutable.

Using a generator to make generalisation easier

The same functionality can also be written using a generator class to create the immutable data structure. In this section though we are going to be extending the idea just a little further to add some convenience methods.

The data structure

Turning to the structure itself, we are going to add a few methods that will make data access more robust in the generalised class. To this end, it is useful to know if a value exists so we are going to add a has($key) method. This will also be used by a getOrElse($key, $default) function to allow a default value to be provided where a key does not already exist.

declare(strict_types=1);

final class ImmutableData {
    private $data = [];
    private function __construct() {}
    public static function create(array $args): ImmutableData {
        $immutable = new self;
        $immutable->data = static::sanitiseInput($args);
        return $immutable;
    }
    public function has($key) {
        return array_key_exists($key, $this->data);
    }
    public function get($key) {
        return $this->data[$key];
    }
    public function getOrElse($key, $default) {
        if($this->has($key)) {
            return $this->get($key);
        }
        return $default;
    }
    public function getAsArray(): array {
        return $this->data;
    }
    protected static function sanitiseInput(array $arr): array {
        return array_map(function($x) {
            if (is_scalar($x)) return $x;
            else if (is_object($x)) return static::sanitiseObject($x);
            else if (is_array($x)) return static::sanitiseInput($x);
            else throw new \InvalidArgumentException(gettype($x) . ' cannot be stored in an Immutable.');
        }, $arr);
    }
    protected static function sanitiseObject(ImmutableData $object): ImmutableData {
        return clone $object;
    }

    // return a parsable text representation of the class
    public function __toString(): string {
        return var_export($this->getAsArray(), true);
    }
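    // called when a var_export'd class is parsed; PHP requires this method to be
    // static and passes the exported properties keyed by name
    public static function __set_state(array $args): ImmutableData {
        return static::create($args['data']);
    }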
    public function __unset($a): void {}
    public function __set($a, $b): void {}
    private function __clone() {
        $this->data = static::sanitiseInput($this->data);
    }
}

This is the complete immutable structure that our generator will populate for us.

Unlike the last Immutable this one makes use of static methods and prevents access to the class constructor by making it a private method. This skips the $mutable true/false dance we have been doing elsewhere. I prefer the dance, but this serves as a nice example of another method to achieve a similar result.

You will notice that there are actually a few other methods in there that we have not discussed yet. There is a get($key) that allows us to access a value by its key easily and getAsArray() has taken over the duties of returning the complete $this->data array. Finally, there is a __toString() method, which produces a PHP parsable string representation of the stored data.

A generator in detail

Now onto the generator that will produce the populated instances of the ImmutableData class.

The main aim of this generator is to make it as generalised as possible - allowing a consumer to store the widest possible selection of types and values whilst ensuring immutability is not broken. In tandem with this we will also add some methods to make modifying a copy of the immutable easier.

All the data will be stored in an array internally to easily facilitate different data shapes that may be thrown at the Immutable class.

All data will need to be stored against a key so that it can be accessed again easily.

class Immutable {
    private $data = [];
    private function __construct() {}
    public static function create(): self {
        return new self;
    }
    public static function with(ImmutableData $old): self {
        $new = static::create();
        $new->data = $old->getAsArray();
        return $new;
    }
    public function set(string $key, $value): self {
        return $this->setData($key, $value);
    }
    public function unset($key): self {
        unset($this->data[$key]);
        return $this;
    }
    public function setIntKey(int $key, $value): self {
        return $this->setData($key, $value);
    }
    private function setData($key, $value): self {
        $this->data[$key] = $value;
        return $this;
    }
    public function arr(array $arr): self {
        foreach($arr as $key => $value) {
            if (is_string($key)) {
                $this->set($key, $value);
            } else if (is_int($key)) {
                $this->setIntKey($key, $value);
            }
        }
        return $this;
    }
    public function unsetArr(array $arr): self {
        foreach($arr as $key) {
            $this->unset($key);
        }
        return $this;
    }
    public function build(): ImmutableData {
        return ImmutableData::create($this->data);
    }
    public function getAsArray(): array {
        return $this->data;
    }
}

Again this class uses a private constructor and static method to prevent calls to the constructor. You could use the $mutable true/false setup here, very easily, if you wanted to though.
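To compare the two ways of protecting the constructor side by side, here is a small sketch. FlagGuarded and FactoryOnly are hypothetical classes written just for this comparison; they mirror the $mutable flag approach and the private constructor approach respectively.

```php
<?php
declare(strict_types=1);

// FlagGuarded and FactoryOnly are hypothetical classes for this comparison.
final class FlagGuarded {
    private $value;
    private $mutable = true;
    public function __construct(string $value) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->value = $value;
        $this->mutable = false;
    }
    public function get(): string {
        return $this->value;
    }
}

final class FactoryOnly {
    private $value;
    private function __construct() {}
    public static function create(string $value): FactoryOnly {
        $instance = new self;
        $instance->value = $value;
        return $instance;
    }
    public function get(): string {
        return $this->value;
    }
}

$a = new FlagGuarded('first');
try {
    $a->__construct('second'); // a public constructor can be called again
} catch (\BadMethodCallException $e) {
    echo $e->getMessage(), PHP_EOL; // Constructor called twice.
}
echo $a->get(), PHP_EOL; // first

$b = FactoryOnly::create('first');
try {
    $b->__construct(); // a private constructor cannot be reached from outside
} catch (\Error $e) {
    echo 'Blocked', PHP_EOL;
}
echo $b->get(), PHP_EOL; // first
```

Either way the stored value survives the second constructor call; the difference is only whether the guard lives in a flag or in the method's visibility.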

Simple usage

These two classes can now be used to generate an immutable data structure like so.

$immX = Immutable::create()
    ->set('test', 'a string goes here')
    ->set('another', 100)
    ->arr([1,2,3,4,5,6])
    ->arr(['a' => 1, 'b' => 2])
    ->build();
echo (string) $immX;

This uses the __toString() method to print a simple and parsable text representation.

array (
  'test' => 'a string goes here',
  'another' => 100,
  0 => 1,
  1 => 2,
  2 => 3,
  3 => 4,
  4 => 5,
  5 => 6,
  'a' => 1,
  'b' => 2,
)

You can also put a trusted object into the immutable as well - in this case we will just use the immutable we created earlier, $immX.

$immY = Immutable::create()
    ->set('anObject', $immX)
    ->build();
echo (string) $immY;

Again, the output is parsable by the PHP engine so you will notice the slightly weird __set_state() magic method call in there - you can safely ignore this and concentrate on the data itself. This magic method is implemented in the ImmutableData class that we defined earlier and it merely serves to populate a class with a set of data/state when a var_export() output is parsed by PHP.

array (
  'anObject' =>
  ImmutableData::__set_state(array(
     'data' =>
    array (
      'test' => 'a string goes here',
      'another' => 100,
      0 => 1,
      1 => 2,
      2 => 3,
      3 => 4,
      4 => 5,
      5 => 6,
      'a' => 1,
      'b' => 2,
    ),
  )),
)
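If you are curious how that parsing step works, the round trip can be sketched with a tiny stand-in class. Snapshot is a hypothetical class for this example only, and eval() is used here simply as a convenient way to parse the exported string; it is not something I would reach for in production code.

```php
<?php
declare(strict_types=1);

// Snapshot is a hypothetical stand-in used to show the var_export() round trip.
final class Snapshot {
    private $data = [];
    public static function create(array $data): Snapshot {
        $snapshot = new self;
        $snapshot->data = $data;
        return $snapshot;
    }
    public function getAsArray(): array {
        return $this->data;
    }
    // var_export() output calls this with the properties keyed by name
    public static function __set_state(array $state): Snapshot {
        return self::create($state['data']);
    }
}

$original = Snapshot::create(['x' => 1, 'y' => 2]);

// export to a parsable string, then let PHP parse it again
$exported = var_export($original, true);
$restored = eval('return ' . $exported . ';');

var_dump($restored->getAsArray() === $original->getAsArray()); // bool(true)
var_dump($restored === $original); // bool(false)
```

The restored object carries the same data but is a brand new instance.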

So, what is the point if we cannot get our data out? Well, remember those get(), has() and getOrElse() methods? They can be used to quickly and relatively easily access the stored data by key. The methods are fairly self-explanatory so here are a few examples just to demonstrate their usage against $immX, which is the structure that holds the 'test' key.

echo $immX->get('test'); // a string goes here
var_dump($immX->has('test')); // bool(true)
var_dump($immX->has('non-existent')); // bool(false)
echo $immX->getOrElse('test', 'some default text'); // a string goes here
echo $immX->getOrElse('non-existent', 'some default text'); // some default text

This should give you enough of a foundation to build additional functions like map, reduce, etc upon were you choose to do so. You could also write methods to fetch items by their value rather than their key as well.
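As a rough idea of what map and reduce might look like, here is a sketch on a deliberately cut-down class. TinyImmutable is a hypothetical stand-in for ImmutableData that skips the sanitising step and assumes scalar values only, and the map() and reduce() method names are my own assumptions rather than anything defined above.

```php
<?php
declare(strict_types=1);

// TinyImmutable is a cut-down, hypothetical stand-in for ImmutableData that
// only expects scalars, so the sketch stays short.
final class TinyImmutable {
    private $data = [];
    private function __construct() {}
    public static function create(array $args): TinyImmutable {
        $immutable = new self;
        $immutable->data = $args;
        return $immutable;
    }
    public function getAsArray(): array {
        return $this->data;
    }
    // map() returns a new instance and leaves the current one untouched
    public function map(callable $fn): TinyImmutable {
        return static::create(array_map($fn, $this->data));
    }
    // reduce() folds the stored values down to a single result
    public function reduce(callable $fn, $initial) {
        return array_reduce($this->data, $fn, $initial);
    }
}

$numbers = TinyImmutable::create(['a' => 1, 'b' => 2, 'c' => 3]);
$doubled = $numbers->map(function ($x) { return $x * 2; });

echo implode(',', $numbers->getAsArray()), PHP_EOL; // 1,2,3
echo implode(',', $doubled->getAsArray()), PHP_EOL; // 2,4,6
echo $doubled->reduce(function ($carry, $x) { return $carry + $x; }, 0), PHP_EOL; // 12
```

Because map() builds a fresh instance each time, the original data is never modified in place.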

Modifying copies of the immutable structure using the generator

The key to making immutables useful is allowing consumers to easily and quickly create modified copies of the underlying data. This has been written into the generator we defined earlier and can be best described with a few examples. Note that the with() static method can accept an ImmutableData object as its first parameter and modification is exactly what this is for. You can then use set() to add or modify values.

$immZ = Immutable::with($immY)
    ->set('a story', 'This is where someone should write a story')
    ->setIntKey(300, 'My int indexed value')
    ->arr(['arr: int indexed', 'arr' => 'arr: assoc key becomes immutable key'])
    ->build();
echo (string) $immZ;

In the result we should see our new properties added to the stored array from $immY.

array (
  'anObject' =>
  ImmutableData::__set_state(array(
     'data' =>
    array (
      'test' => 'a string goes here',
      'another' => 100,
      0 => 1,
      1 => 2,
      2 => 3,
      3 => 4,
      4 => 5,
      5 => 6,
      'a' => 1,
      'b' => 2,
    ),
  )),
  'a story' => 'This is where someone should write a story',
  300 => 'My int indexed value',
  0 => 'arr: int indexed',
  'arr' => 'arr: assoc key becomes immutable key',
)

Of course, you can also use arr() or setIntKey() here in the same way too when setting new values or overwriting existing ones. Just set a value with a key that already exists in the structure and you will overwrite it.

$throwAway = Immutable::with($immZ)
    ->set('a story', 'My story begins by the slow moving waters of the meandering river.')
    ->build();
echo (string) $throwAway;

This would result in a data structure like the following.

array (
  'anObject' =>
  ImmutableData::__set_state(array(
     'data' =>
    array (
      'test' => 'a string goes here',
      'another' => 100,
      0 => 1,
      1 => 2,
      2 => 3,
      3 => 4,
      4 => 5,
      5 => 6,
      'a' => 1,
      'b' => 2,
    ),
  )),
  'a story' => 'My story begins by the slow moving waters of the meandering river.',
  300 => 'My int indexed value',
  0 => 'arr: int indexed',
  'arr' => 'arr: assoc key becomes immutable key',
)

The generator can also be used to remove items from the data list quite simply. We can either remove them one at a time with unset($key) or remove many at once by supplying a list to unsetArr().

$immAA = Immutable::with($immZ)
    ->unset('anObject')
    ->unsetArr(['a story', 300])
    ->build();
echo (string) $immAA;

The execution of this results in the following modified output where a number of keys have been removed.

array (
  0 => 'arr: int indexed',
  'arr' => 'arr: assoc key becomes immutable key',
)

You can use unset(), unsetArr(), set(), setIntKey() and arr() as much as you like before calling build(), all in the one building chain.

Conclusion

Now you have a generalised immutable data structure that you can store anything you like in. If you have an untrusted object you will need to store it as a string using either serialize() or var_export(). The same goes for resources like file handles where you will need to extract the value as text before storing it.

Apart from these two caveats though, you are relatively free to use the immutable as you see fit.
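A minimal sketch of both caveats follows. The temporary file is created purely for the demonstration; the resulting strings are scalars, so they could then be handed to the generator in the usual way.

```php
<?php
declare(strict_types=1);

$untrusted = new stdClass();
$untrusted->name = 'Simon';

// serialize() turns the object into a plain string, which is a scalar
$frozen = serialize($untrusted);
var_dump(is_scalar($frozen)); // bool(true)

// later changes to the original object do not touch the string
$untrusted->name = 'changed';
$copy = unserialize($frozen);
echo $copy->name, PHP_EOL; // Simon

// for a resource, read the contents out as text first
$path = tempnam(sys_get_temp_dir(), 'imm'); // temporary file for the demo
file_put_contents($path, '123456789');
$f = fopen($path, 'r');
$contents = stream_get_contents($f);
fclose($f);
unlink($path);
echo $contents, PHP_EOL; // 123456789
```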

If you like this article then you might get a kick out of writing functional php code as taught in the Functional Programming in PHP book that I wrote.

https://www.simonholywell.com/post/2017/04/php-and-immutability-part-three/

PHP and immutability: modified copies - part two

Simon Holywell Apr 3, 2017 Updated Nov 19, 2024

In the last article we learnt how to create an immutable data structure in PHP. There were a few issues to work through, but we got there in the end. Now onto making the immutable class more useful and making it easier to create modified copies. Note that these are copies and not modifications, in-place, to the original objects.

This article is part of a series I have written on the topic of immutability in PHP code:

Part one - a discussion of caveats and a simple scalar handling immutable

Part two - improve the process of creating modified copies of the immutable

Part three - objects in immutable data structures and a generalised immutable implementation

Also available in Русский (Russian):

Часть 1 - PHP и неизменяемость. Часть 1

Часть 2 - PHP и неизменяемость: экземпляры, которые могут быть изменены. Часть 2

Часть 3 - PHP и неизменяемость: объекты и обобщение. Часть 3

Simple parameter mutations

When you want to modify a property in an immutable object you must, by definition, create a new object and insert the modified value into that new location. You could simply get the values from the current instance and pass them into a new instance as you create it.

$a = new Immutable('Test');
echo $a->getX(); // Test
$b = new Immutable($a->getX() . ' again');
echo $b->getX(); // Test again

So simple. Too simple!

This technique works reasonably well for this small dataset, but what if we had five or ten parameters that would have to be replayed every time? An exaggerated example to illustrate my point follows.

$a = new Immutable('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K');
echo $a->getK(); // K
$b = new Immutable(
    $a->getA(), $a->getB(), $a->getC(), $a->getD(), $a->getE(), $a->getF(),
    $a->getG(), $a->getH(), $a->getI(), $a->getJ(), $a->getK() . ' some change'
);
echo $b->getK(); // K some change

It is certainly doable, but I, for one, am not going to be executing that every time I need to work with an immutable instance. Certainly, not if I can avoid it.

Mutation at clone time

There is a very handy quirk in PHP that we can exploit. It will allow us to create new modified copies of the object in question.

Instead of mutating the object in place like you would in traditional OOP, we’re going to make a clone of the object and change its private properties. Yes, you read that correctly, you can change the private properties of a class instance!

So you’ve probably learnt that private means that a class property cannot be changed from outside or by other classes overriding it. Whilst this is generally true, when we clone an object we get a fleeting opportunity to change its private properties.

$a = new Immutable('A', 'B');
echo $a->getB(); // B
$b = clone $a;
$b->B = '22';
// Fatal error: Cannot access private property Immutable::$B

Well that didn’t work! I should’ve said that you can only perform the clone from within a method of the same class to be able to modify it like this.

declare(strict_types=1);

final class Immutable {
    private $x;
    private $mutable = true;
    public function __construct(string $input) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->x = $input;
        $this->mutable = false;
    }
    public function getX(): string {
        return $this->x;
    }
    public function withX(string $input): Immutable {
        $clonedClass = clone $this;
        $clonedClass->x = $input;
        return $clonedClass;
    }
}
$a = new Immutable('TEST');
echo $a->getX(); // TEST
$b = $a->withX('noop');
echo $b->getX(); // noop

In this way it becomes easier to modify a value inside an immutable - we can wrap up the clone and set the right values for them. Having a shortened syntax like this really serves to help implementers work with immutable objects.

Preventing the setting of unexpected properties

There are other ways that a seemingly immutable class can be messed with too. Fortunately, these can be stopped with a couple of PHP magic methods.

In PHP it is possible to add properties to an object at run time - even an instance of a final class. We don’t want this as it would change the shape of our object and therefore mean that it was mutable. The simple way to prevent this is to add an empty __set() magic method implementation to your class.

    public function __set(string $id, $val): void {
        return;
    }

It is also possible to remove property values by using the unset() construct. We can also prevent this using another empty magic method:

    public function __unset(string $id): void {
        return;
    }

It is important to ban these. Whilst they do not allow modification of our private properties - they do allow outside agents to change our immutable class by adding and removing their own public properties. The class would no longer be immutable were this allowed to happen.
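To see the two guards in action, here is a minimal sketch. The class name GuardedImmutable is made up for this illustration. One side effect worth knowing: once __set() is defined, PHP routes writes to inaccessible (private) properties through it as well, so they are silently swallowed rather than raising the fatal error seen earlier.

```php
<?php
declare(strict_types=1);

// Hypothetical class name, used only for this illustration.
final class GuardedImmutable {
    private $x;

    public function __construct(string $input) {
        $this->x = $input;
    }

    public function getX(): string {
        return $this->x;
    }

    // Swallow any attempt to write a non-existent or inaccessible property.
    public function __set(string $id, $val): void {
        return;
    }

    // Swallow any attempt to unset a non-existent or inaccessible property.
    public function __unset(string $id): void {
        return;
    }
}

$a = new GuardedImmutable('Test');
$a->extra = 'sneaky'; // routed to __set(), which does nothing
$a->x = 'changed';    // private from here, so also routed to __set()
unset($a->x);         // private from here, so routed to __unset()

var_dump(property_exists($a, 'extra')); // bool(false)
echo $a->getX(); // Test
```

If silently ignoring the write feels too quiet for your use case, throwing an exception from both methods is an equally valid choice.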

Merged clone time mutation

So far we’ve seen the ability to change one property using a withX style method, but what if we want to change more? Well, you could just chain the changes up with something like this.

$a = new MyFantasyImmutable('TEST', 'foo');
echo $a->getX(); // TEST
echo $a->getY(); // foo
$b = $a->withX('noop')->withY('bar');
echo $b->getX(); // noop
echo $b->getY(); // bar

Whilst it works, there are a few things that I dislike about this approach. A throwaway instance is created between the calls to withX() and withY(), you have to create a with*() function for every property and the method chaining quickly gets irritating.
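MyFantasyImmutable is not defined anywhere in this article, so the following is only a sketch of what such a class might look like, assuming two string properties. It leaves out the constructor guard and the magic method guards to stay short, and it shows both the per-property with*() boilerplate and the fact that the original instance is left untouched.

```php
<?php
declare(strict_types=1);

// Assumed shape of MyFantasyImmutable - an illustration only.
final class MyFantasyImmutable {
    private $x, $y;

    public function __construct(string $x, string $y) {
        $this->x = $x;
        $this->y = $y;
    }

    public function getX(): string {
        return $this->x;
    }

    public function getY(): string {
        return $this->y;
    }

    // One with*() method is needed for every property.
    public function withX(string $x): MyFantasyImmutable {
        $clone = clone $this;
        $clone->x = $x;
        return $clone;
    }

    public function withY(string $y): MyFantasyImmutable {
        $clone = clone $this;
        $clone->y = $y;
        return $clone;
    }
}

$a = new MyFantasyImmutable('TEST', 'foo');
// withX() returns a throwaway instance that withY() then clones again.
$b = $a->withX('noop')->withY('bar');

echo $a->getX(), ' ', $a->getY(), "\n"; // TEST foo
echo $b->getX(), ' ', $b->getY(), "\n"; // noop bar
```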

There is, of course, another way.

First, let’s define a new immutable class with a few properties.

declare(strict_types=1);

final class Bike {
    private $engineCc, $brakes, $tractionControl;
    private $mutable = true;
    public function __construct(int $engineCc, string $brakes, bool $tractionControl) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->engineCc = $engineCc;
        $this->brakes = $brakes;
        $this->tractionControl = $tractionControl;
        $this->mutable = false;
    }
    public function __get($property) {
        if (property_exists($this, $property)) {
            return $this->$property;
        }
    }
    public function __set(string $id, $val): void {
        return;
    }

    public function __unset(string $id): void {
        return;
    }
}

To keep the example shorter, I’ve employed a small __get() magic method instead of writing a get method for each property of the class. Otherwise you would have to write one for each, ending up with functions like getEngineCc(), getBrakes() and getTractionControl(). With __get() you access them directly as properties instead.

$zx9r = new Bike(900, '2 piston floating discs', false);
echo $zx9r->engineCc; // 900
echo $zx9r->brakes; // 2 piston floating discs

$cagivaRaptor = new Bike(1000, '2 piston floating discs', false);
var_dump($cagivaRaptor->tractionControl); // bool(false)

Anyway back to the mutations! To allow for the easy manipulation of the class’s properties when cloning we can add a simple method to the class.

    public function with(array $args): Bike {
        $clonedClass = clone $this;
        foreach($args as $property => $value) {
            if (property_exists($clonedClass, $property)) {
                $clonedClass->$property = $value;
            }
        }
        return $clonedClass;
    }

Now when you want a new instance with modifications - perhaps when you’re releasing a new motorbike model - you can just call with() and include an associative array.

$zx9r = new Bike(900, 'Floating 2 piston', false);
$zx10r = $zx9r->with(['engineCc' => 1000, 'tractionControl' => true]);
echo $zx10r->engineCc; // 1000
echo $zx10r->brakes; // Floating 2 piston
var_dump($zx10r->tractionControl); // bool(true)

While it works OK, you may have noticed that we’ve now effectively eliminated the ability for PHP to type check the input. We’re no longer populating the class via the constructor.

This is bad because a non-scalar value could be passed in (more on why this sucks in a future article).

One way we could solve this is to replace with() with a function that uses reflection to work out the constructor’s parameter order and merge newly supplied values in.

    public function with(array $args): Bike {
        $reflection = new ReflectionMethod($this, '__construct');
        $new_parameters = array_map(function($param) use ($args) {
            $x = $param->name;
            return (array_key_exists($x, $args))
                ? $args[$x] // use newly supplied value
                : $this->$x; // fallback to the current value
        }, $reflection->getParameters());
        return new self(...$new_parameters);
    }

When the new class instance is created the newly supplied values are passed to the constructor, which ensures that they’re correctly type checked.

You would call this method in the same way as the last with() implementation. It does make the assumption that the class properties will have the same name as the constructor parameter name ($this->engineCc is the same as constructor parameter $engineCc for example).

This would leave you a final Bike class of:

declare(strict_types=1);

final class Bike {
    private $engineCc, $brakes, $tractionControl;
    private $mutable = true;
    public function __construct(int $engineCc, string $brakes, bool $tractionControl) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->engineCc = $engineCc;
        $this->brakes = $brakes;
        $this->tractionControl = $tractionControl;
        $this->mutable = false;
    }
    public function __get($property) {
        if (property_exists($this, $property)) {
            return $this->$property;
        }
    }
    public function with(array $args): Bike {
        $reflection = new ReflectionMethod($this, '__construct');
        $new_parameters = array_map(function($param) use ($args) {
            $x = $param->name;
            return (array_key_exists($x, $args))
                ? $args[$x] // use newly supplied value
                : $this->$x; // fallback to the current value
        }, $reflection->getParameters());
        return new self(...$new_parameters);
    }
    public function __set(string $id, $val): void {
        return;
    }

    public function __unset(string $id): void {
        return;
    }
}

Also bear in mind that the reflection API provided by PHP is not crazily quick so if micro-optimisations are your thing then you’d probably want to avoid this. If you can take the hit then the extra security you get from the type checking is worth it.
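To see the constructor's type checks catching a bad value passed through with(), here is a trimmed-down sketch of the class. The guards are left out for brevity, getEngineCc() is added purely so the result can be inspected, and the variable names inside with() have been tidied. The wrong type surfaces as a TypeError raised by the constructor.

```php
<?php
declare(strict_types=1);

// Trimmed-down Bike for illustration only; guards omitted.
final class Bike {
    private $engineCc, $brakes, $tractionControl;

    public function __construct(int $engineCc, string $brakes, bool $tractionControl) {
        $this->engineCc = $engineCc;
        $this->brakes = $brakes;
        $this->tractionControl = $tractionControl;
    }

    public function getEngineCc(): int {
        return $this->engineCc;
    }

    public function with(array $args): Bike {
        $reflection = new ReflectionMethod($this, '__construct');
        $newParameters = array_map(function ($param) use ($args) {
            $name = $param->name;
            return array_key_exists($name, $args)
                ? $args[$name]   // use newly supplied value
                : $this->$name;  // fall back to the current value
        }, $reflection->getParameters());
        // Everything goes back through the typed constructor.
        return new self(...$newParameters);
    }
}

$zx9r = new Bike(900, 'Floating 2 piston', false);

$zx10r = $zx9r->with(['engineCc' => 1000]);
echo $zx10r->getEngineCc(), "\n"; // 1000

$rejected = false;
try {
    $zx9r->with(['engineCc' => 'massive']); // a string where an int is required
} catch (TypeError $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```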

Using a builder to generate immutable objects

Another way around this particular issue with immutable objects can be to use a second class to generate the immutable objects. This will allow you to avoid the use of the Reflection API and still give you the advantage of type checking. A little touch of irony here as we’ll use a mutable builder class to produce an immutable object, but bear with me.

Firstly, we need to define the immutable object our builder will produce. I am removing the __get() magic here too as our aim is to make it easier for our code to be analysed statically. This will help IDEs to type hint, code quality tools to read our code and ostensibly make the code easier to follow cognitively.

declare(strict_types=1);

final class Bike {
    private $engineCc, $brakes, $tractionControl;
    private $mutable = true;
    public function __construct(int $engineCc, string $brakes, bool $tractionControl) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->engineCc = $engineCc;
        $this->brakes = $brakes;
        $this->tractionControl = $tractionControl;
        $this->mutable = false;
    }
    public function getEngineCc(): int {
        return $this->engineCc;
    }
    public function getBrakes(): string {
        return $this->brakes;
    }
    public function getTractionControl(): bool {
        return $this->tractionControl;
    }
    public function __set(string $id, $val): void {
        return;
    }

    public function __unset(string $id): void {
        return;
    }
}

Now we have a simple little immutable class we can get on with the business of creating a generating class. This new class will accept all the values we wish to store in the immutable class and return an instance of Bike.

class BikeGenerator {
    private $engineCc, $brakes, $tractionControl;
    public static function create(): self {
        return new self;
    }
    public static function with(Bike $oldBike): self {
        $generator = new self;
        $generator->setEngineCc($oldBike->getEngineCc());
        $generator->setBrakes($oldBike->getBrakes());
        $generator->setTractionControl($oldBike->getTractionControl());
        return $generator;
    }

    public function setEngineCc(int $cc): self {
        $this->engineCc = $cc;
        return $this;
    }
    public function setBrakes(string $brakes): self {
        $this->brakes = $brakes;
        return $this;
    }
    public function setTractionControl(bool $tractionControl): self {
        $this->tractionControl = $tractionControl;
        return $this;
    }
    public function build(): Bike {
        return new Bike($this->engineCc, $this->brakes, $this->tractionControl);
    }
}

The BikeGenerator duplicates some code of the original class and really serves as a glorified queue. We add to the queue until we’re happy and execute build() to be given a freshly populated instance of Bike.

$zx9r = BikeGenerator::create()
  ->setEngineCc(900)
  ->setBrakes('2 piston floating disc')
  ->setTractionControl(false)
  ->build();

echo $zx9r->getEngineCc(); // 900
echo $zx9r->getBrakes(); // 2 piston floating disc
var_dump($zx9r->getTractionControl()); // bool(false)

$zx10r = BikeGenerator::with($zx9r)
  ->setEngineCc(1000)
  ->setBrakes($zx9r->getBrakes() . ' ABS')
  ->setTractionControl(true)
  ->build();

echo $zx10r->getEngineCc(); // 1000
echo $zx10r->getBrakes(); // 2 piston floating disc ABS
var_dump($zx10r->getTractionControl()); // bool(true)

This shows a use of the builder pattern to generate a ready made immutable Bike instance. You can then call ::with() to easily create a new modified version of an existing object.

Setting larger amounts of properties

There is not all that much that you can do to remove the tedium of dealing with many values in an immutable with PHP. One way to get past this is to pass in an array of values that are checked and stored in the immutable.

declare(strict_types=1);

final class Config {
    private $properties = [
        // property => data type
        // assume no type = string
        'name',
        'version'   => 'int',
        'released'  => 'bool',
        'licence',
        'private'   => 'bool',
        'url',
        'repo',
        'downloads' => 'int'
    ];
    private $data = [];
    private $mutable = true;
    public function __construct(array $values) {
        if (false === $this->mutable) {
            throw new \Exception('Constructor called twice.');
        }
        $this->set($values);
        $this->mutable = false;
    }

    public function __get($property) {
        if (array_key_exists($property, $this->data)) {
            return $this->data[$property];
        }
        throw new \Exception('The property ' . $property . ' does not exist');
    }

    private function set(array $values) {
        foreach($this->properties as $prop => $type) {
            if (!is_string($prop)) {
                // coalesce to string for properties that don't have a type specified
                $prop = $type;
                $type = 'string';
            }

            if (array_key_exists($prop, $values)) {
                $this->setValue($prop, $type, $values[$prop]);
            }
        }
    }

    private function setValue($prop, $type, $value) {
        $check = 'is_' . $type; // eg. is_int()
        if ($check($value)) {
            $this->data[$prop] = $value;
        } else {
            throw new \InvalidArgumentException('Incorrect type passed for the "' . $prop . '" property - expected ' . $type . ' , but got ' . gettype($value));
        }
    }
    public function __set(string $id, $val): void {
        return;
    }

    public function __unset(string $id): void {
        return;
    }
}

We’ve had to eschew the type system in favour of a small custom type check defined in the $properties class property and evaluated in the setValue() method.

$c = new Config([
    'name' => 'foo',
    'version' => '10'
]);
// Uncaught InvalidArgumentException: Incorrect type passed for the "version" property - expected int , but got string

The way that this works also makes it easy to handle optional arguments at instantiation - up until this point all arguments have been mandatory. Here you can supply an array missing one or more properties and they simply won’t be set in the $this->data array.

$c = new Config([
    'name' => 'foo',
    'version' => 10
]);
echo $c->name; // foo
echo $c->version; // 10
echo $c->repo; // Uncaught Exception: The property repo does not exist

There are further improvements that could be made here to make the class better able to handle non-existent values, but they’ll have to be the subject of another article.

Also, note that the way the set() method is designed means that it will silently ignore properties passed to it that do not exist in the $properties class property. You may wish to change this to throw an exception or warning depending on your use case, of course.
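If you would rather be told about unknown keys, one possible approach is sketched below. The unknownProperties() helper is hypothetical and sits outside the Config class for brevity; it assumes the same $properties layout used above, where untyped entries are plain values and typed entries are keyed by property name.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: lists the keys of $values that $properties does not declare.
function unknownProperties(array $properties, array $values): array {
    $known = [];
    foreach ($properties as $prop => $type) {
        // Untyped entries have an integer key and carry the name as their value.
        $known[] = is_string($prop) ? $prop : $type;
    }
    return array_values(array_diff(array_keys($values), $known));
}

$properties = ['name', 'version' => 'int', 'repo'];
$unknown = unknownProperties($properties, ['name' => 'foo', 'colour' => 'red']);

print_r($unknown); // lists the single unknown key: colour

if ($unknown !== []) {
    // Inside Config::set() this is where an exception could be thrown instead.
    echo 'Unknown properties: ' . implode(', ', $unknown), "\n";
}
```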

Merging larger collections of properties

It is a very simple exercise to perform a merge now that we have an array for the value store and we’re accepting an associative array for input. We can add the following method to the Config class.

    public function with(array $values): Config {
        return new self(array_merge($this->data, $values));
    }

This can then be exercised with:

$c = new Config([
    'name' => 'foo',
    'version' => 10,
    'repo' => 'github.com',
]);
echo $c->name; // foo
echo $c->version; // 10
echo $c->repo; // github.com

$c2 = $c->with(['name' => 'bar', 'version' => 12]);
echo $c2->name; // bar
echo $c2->version; // 12
echo $c2->repo; // github.com

By writing the code this way, we’ve effectively written our own little implementation of named parameters too. As great as that may be we’re also losing some clarity and IDE type hinting with the inherent indirection.

Any way you cut it, manipulating an immutable in PHP can get annoying pretty quickly. There appears to be no really simple way of avoiding typing more and/or affecting the type hinting abilities of IDEs. These are some of the techniques I’ve used before to work around some of the frustration, but there is definitely room for improvement.

This article is part of a series I have written on the topic of immutability in PHP code:

Part one - a discussion of caveats and a simple scalar handling immutable

Part two - improve the process of creating modified copies of the immutable

Part three - objects in immutable data structures and a generalised immutable implementation

Also available in Русский (Russian):

Часть 1 - PHP и неизменяемость. Часть 1

Часть 2 - PHP и неизменяемость: экземпляры, которые могут быть изменены. Часть 2

Часть 3 - PHP и неизменяемость: объекты и обобщение. Часть 3

If you like this article then you might get a kick out of writing functional PHP code as taught in the Functional Programming in PHP book that I wrote.

https://www.simonholywell.com/post/2017/04/php-and-immutability-part-two/

PHP and immutability: difficulties and scalars - part one

Simon Holywell Mar 16, 2017 Updated Nov 19, 2024

Being a weakly typed dynamic language, PHP has not really had the concept of immutability built into it. We’ve seen the venerable define() and CONSTANTS of course, but they’re limited. Whilst PHP does ship with an immutable class as part of its standard library, DateTimeImmutable, there is no immediately obvious method to create custom immutable objects.

This article is part of a series I have written on the topic of immutability in PHP code:

Part one - a discussion of caveats and a simple scalar handling immutable

Part two - improve the process of creating modified copies of the immutable

Part three - objects in immutable data structures and a generalised immutable implementation

Also available in Русский (Russian):

Часть 1 - PHP и неизменяемость. Часть 1

Часть 2 - PHP и неизменяемость: экземпляры, которые могут быть изменены. Часть 2

Часть 3 - PHP и неизменяемость: объекты и обобщение. Часть 3

Constants

The only immutable data store available in PHP is constants, which are set with define() or the const keyword on a PHP class. There is one important difference between the two options though.

Class constants (const) are set when the code is written and cannot be changed at runtime. This property makes them PHP’s most immutable user defined structure. If you wanted to set a different value conditionally or set a value from another variable you’re out of luck.

class Immutable {
    const TRICK = 'kickflip';

    public function __construct() {
        echo static::TRICK;     // kickflip
        static::TRICK = 'HHHH'; // Parse error unexpected '='
    }
}
new Immutable();

Traditional constants, set with define(), can be initialised conditionally and be built from other variables - once set though, they are immutable.

$skater = 'Mullen';

if ($skater === 'Mullen') {
    define('TRICK', $skater . ' created the flatground Ollie');
} else if ($skater === 'Hawk') {
    define('TRICK', $skater . ' invented the Kickflip McTwist');
}
echo TRICK; // Mullen created the flatground Ollie
define('TRICK', 'nothing'); // Notice: Constant TRICK already defined
echo TRICK; // Mullen created the flatground Ollie

As you can see, if you try to modify the constant then you’ll get a PHP notice - the value though will remain unchanged. You can check if a constant is already defined with the function defined() - note the extra d!
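A minimal sketch of guarding a definition with defined(), so the second attempt is skipped quietly rather than triggering a diagnostic:

```php
<?php
// Only define the constant when it does not exist yet.
if (!defined('TRICK')) {
    define('TRICK', 'Mullen created the flatground Ollie');
}

// TRICK now exists, so this branch is skipped and nothing is raised.
if (!defined('TRICK')) {
    define('TRICK', 'nothing');
}

echo TRICK; // Mullen created the flatground Ollie
```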

So, that works well for simple scalars. If you want to store any kind of array structure as an immutable though you’ll have to use a class constant or be disappointed by any version of PHP before PHP 7.

define('TRICKS', [
    'Ollie',
    'Darkslide',
    'Heelflip',
    'Nollie'
]);
// Warning: Constants may only evaluate to scalar values

If you’re feeling smug and running PHP 7 then, well, I have some bad news - arrays work, sure, but objects not so much!

define('TRICKS', new stdClass());
// Warning: Constants may only evaluate to scalar values or arrays

There are other reasons that you probably don’t want a constant anyway.

In the case of class constants we’ve seen how limited they are with their requirement to be set before runtime. You do get some scoping though, as they are bound to the class definition in question.

Traditional constants are defined globally. Is that really what you want? Surely you’d rather have a proper variable that you can pass around between functions and that adheres to PHP’s scoping rules.

Both of these methods result in weird syntax usage, where a far simpler variable syntax would be more appropriate and likely expected.

Custom immutable classes

It is possible to write your own immutables using some simple and sneaky PHP techniques though. We’re going to use a simplistic data requirement to make the examples in this article easier to follow. I’ll be using professional skateboarders and the tricks that they brought to the world.

First up is a simple immutable PHP class that accepts its values as parameters to the constructor. I have called the class Immutable, but this is probably a bad idea in non-trivial code as it may become a reserved word in the future. Right now though it has absolutely no significance beyond being a nice name to give the class.

class Immutable {
    private $skater, $trick;

    public function __construct($skater, $trick) {
        $this->skater = $skater;
        $this->trick = $trick;
    }

    public function getSkater() {
        return $this->skater;
    }

    public function getTrick() {
        return $this->trick;
    }
}

By using private class properties here we have ensured that the values cannot be changed by code outside of the Immutable class definition.

$x = new Immutable('Hawk', 'Frontside 540');
echo $x->skater = 'Mullen'; // Fatal error: Cannot access private property Immutable::$skater

Allowing external access to the values bound up in an Immutable object is simply a case of writing a public method that returns the required value. In our example here we have two class properties we want access to so I have added getSkater() to get the skater’s name and getTrick() to get the name of the trick they invented.

This gives you a very simple immutable class that once initialised cannot be changed externally.

Avoiding mutations

It is important that no methods are added to Immutable that will allow values inside the class to be mutated. Obviously you do not want to be writing public methods into the class like setSkater() or setTrick() as these would allow an implementer to mutate our immutable class.

class Immutable {
    ...
    // WRONG: don't do this!
    public function setSkater($skater) {
        $this->skater = $skater;
    }
    ...
}
$x = new Immutable('Mullen', '50-50 Sidewinder');
echo $x->getSkater(); // Mullen
$x->setSkater('Hawk'); // Argh! No! You're MUTATING!
echo $x->getSkater(); // Hawk

Now that we have a working example and an understanding of an immutable object in PHP, it’s time to admit that there are a couple of other pitfalls here too. Sorry! You didn’t really expect it to be that easy did you?

Stopping circumventions

A lot of people out there don’t like immutability as much as you and I. They’ll do everything in their power to circumvent our very specifically designed immutable objects.

Overriding the immutable variable

The easiest way to get around intended immutability in PHP is simply to override the variable the immutable instance is assigned to.

$a = new Immutable('Mullen', 'Casper slide');
$a = new TrickRecord('Mullen', 'Airwalk');

When the code is run $a will be silently overridden by a new TrickRecord instead of throwing an error as we would hope. This is a symptom of PHP’s history and the intended behaviour. Of course, with a naughty developer involved, you can be sure they’ve not made TrickRecord immutable!

As of writing there is no way to prevent this from happening in PHP. As we know from before objects can’t be assigned to a constant (only scalars and in PHP 7 arrays). In JavaScript we now have the const keyword that would prevent this from happening, but no such luck in PHP.

The only way that you might be able to prevent this is to use type hints everywhere that you expect an Immutable to be passed in, but as we’ll see there are ways around this too!

Multiple calls to the constructor

Another simple way of frustrating the immutable is to simply call the constructor again with different parameters.

$x = new Immutable('Hawk', 'Frontside 540');
echo $x->getSkater(); // Hawk
$x->__construct('Song', 'Frontside 540');
echo $x->getSkater(); // Song

Obviously, this is a pain and highly undesirable. Luckily, we can work around this problem by including a flag in our class to prevent mutations. I’d just like to mention here that Daewon Song is a great skater - it’s just that he didn’t invent the frontside 540 so we cannot allow it.

class Immutable {
    private $skater, $trick;
    private $mutable = true;

    public function __construct($skater, $trick) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->skater = $skater;
        $this->trick = $trick;
        $this->mutable = false;
    }

    public function getSkater() {
        return $this->skater;
    }

    public function getTrick() {
        return $this->trick;
    }
}

With this change multiple calls to the constructor will not be allowed and will throw an exception, which is fatal if left uncaught.

$x = new Immutable('Hawk', 'Frontside 540');
echo $x->getSkater(); // Hawk
$x->__construct('Song', 'Darkslide');
// Fatal error: Uncaught BadMethodCallException: Constructor called twice.

Extending the Immutable class

It is also possible for a developer to write a class that extends our Immutable. This would allow them to overload our constructor with their own.

Once they have their own constructor they can assign $skater and $trick to their own public class properties. This breaks the immutability of the class as public properties can be changed. Similarly they could add new methods to set the value of their properties too.

class NaughtyDev extends Immutable {
    public $mySkater, $myTrick;

    public function __construct($skater, $trick) {
        $this->mySkater = $skater;
        $this->myTrick = $trick;
    }

    public function setTrick($trick) {
        $this->myTrick = $trick;
    }
}

As NaughtyDev extends Immutable any type checking done on an instance will pass. Here a sneaky developer is passing us an instance of NaughtyDev where our code wants an Immutable, but we’re none the wiser.

$x = new NaughtyDev('Hawk', '900');
$x instanceof Immutable; // true

function onlyGiveMeAnImmutable(Immutable $z) {
    return $z;
}
onlyGiveMeAnImmutable($x); // does not throw a type error

It is all perfectly valid code and the expected behaviour of PHP. We’re doing something strange by insisting on an immutable. So that’s a fun little caveat to the whole process.

There is a way around this one with PHP’s final keyword though. Final tells the parser that the class in question is complete and must not be extended with other functionality.

With one small change to our Immutable class we can prevent this attack on the immutable.

final class Immutable {
    private $skater, $trick;
    private $mutable = true;

    public function __construct($skater, $trick) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->skater = $skater;
        $this->trick = $trick;
        $this->mutable = false;
    }

    public function getSkater() {
        return $this->skater;
    }

    public function getTrick() {
        return $this->trick;
    }
}

class NaughtyDev extends Immutable {

}
// Fatal error: Class NaughtyDev may not inherit from final class (Immutable)

So now we know that we need a class that is declared final and uses private properties to store our data. It is important to ensure that no methods in Immutable allow changes to any values after the class is instantiated.

Finally, a more complete example

To finish off let’s go through a final working example of an immutable object in PHP. In this example PHP 7 type hinting has been added as well.

declare(strict_types=1);

final class SkateboardTrick {
    private $inventor, $trickName;
    private $mutable = true;

    public function __construct(string $skater, string $trick) {
        if (false === $this->mutable) {
            throw new \BadMethodCallException('Constructor called twice.');
        }
        $this->inventor = $skater;
        $this->trickName = $trick;
        $this->mutable = false;
    }

    public function getInventor(): string {
        return $this->inventor;
    }

    public function getTrickName(): string {
        return $this->trickName;
    }
}

The strict type declaration and the use of function argument hinting further fortify the immutable class. We don’t unexpectedly get given an object for instance. This would be bad because that object could be altered elsewhere, which would change the contents of our immutable - bad! More on this in a later article.
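As a quick check of that claim, the sketch below feeds the constructor an object and an integer where a string is expected. It uses a trimmed copy of SkateboardTrick (constructor guard and trick getter left out) and collects the type of each rejected value in a made-up $errors array. With strict types enabled both attempts end in a TypeError; without the declaration the integer would be quietly coerced to a string.

```php
<?php
declare(strict_types=1);

// Trimmed copy of SkateboardTrick for illustration only.
final class SkateboardTrick {
    private $inventor, $trickName;

    public function __construct(string $skater, string $trick) {
        $this->inventor = $skater;
        $this->trickName = $trick;
    }

    public function getInventor(): string {
        return $this->inventor;
    }
}

$errors = [];
foreach ([new stdClass(), 900] as $badTrick) {
    try {
        new SkateboardTrick('Hawk', $badTrick);
    } catch (TypeError $e) {
        // Record which kind of value the constructor refused.
        $errors[] = gettype($badTrick);
    }
}

echo implode(', ', $errors); // object, integer
```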

Conclusion

The methods and properties defined in the final class SkateboardTrick ensure that the object is immutable and cannot be extended. This is, of course, our goal.

$x = new SkateboardTrick('Mullen', '540 Shove-it');
echo $x->getInventor(); // Mullen
echo $x->getTrickName(); // 540 Shove-it

$x->inventor = 'Hawk'; // Fatal error: Cannot access private property SkateboardTrick::$inventor

$x->__construct('Hawk', $x->getTrickName());
// Fatal error: Uncaught BadMethodCallException: Constructor called twice.

This means that if you need to alter a value in SkateboardTrick you’ll have to create a new instance with the modified values.

$x = new SkateboardTrick('Mullen', 'Ollie');
echo $x->getInventor(); // Mullen
echo $x->getTrickName(); // Ollie

$z = new SkateboardTrick(
    $x->getInventor(),
    $x->getTrickName() . ' fingerflip'
);
echo $z->getInventor(); // Mullen
echo $z->getTrickName(); // Ollie fingerflip

In my next article I will cover how we can optimise the process of creating new instances with modified information contained within them. They’ll still be immutable of course, just with a little more sugar to make things a touch easier.

If you like this article then you might get a kick out of writing functional PHP code as taught in the Functional Programming in PHP book that I wrote.

https://www.simonholywell.com/post/2017/03/php-and-immutability/

Alter a MySQL column in all databases

Simon Holywell Mar 1, 2017 Updated Nov 19, 2024

When you have a series of applications all running the same database structure it can be annoying to roll out schema updates across all the databases. If you’ve got migrations then great - script their deployment, but when you’re dealing with an old legacy application you probably don’t have the luxury.

I was firmly in the latter class of devops when working on a project a couple of years ago so I wrote a handy little snippet of SQL to help me out. It allowed me to automatically assemble an ALTER TABLE query for MySQL - it’s gonna be a hack in case you’d not already guessed.

It is MySQL specific so to make it portable across MySQL installations I have used backticks to escape identifiers rather than the double quotes I advocate in the SQL style guide. tl;dr: double quotes as escape is not enabled in MySQL by default.

This query hinges on the information_schema tables and the GROUP_CONCAT functionality provided in MySQL, which will allow us to create one big query file in tandem with INTO OUTFILE.

So without further ado here is the SQL code:

SET SESSION group_concat_max_len = 1000000;

SELECT `query`
  FROM (SELECT GROUP_CONCAT(
                   CONCAT('ALTER TABLE `', `isc`.`table_schema`, '`.`', `isc`.`table_name`, '` ALTER `', `isc`.`column_name`, '` DROP DEFAULT;', '\n'),
                   CONCAT('ALTER TABLE `', `isc`.`table_schema`, '`.`', `isc`.`table_name`, '` CHANGE COLUMN `', `isc`.`column_name`, '` `', `isc`.`column_name`, '` VARCHAR(300) NOT NULL;')
                   SEPARATOR '\n'
               ) AS `query`,
               1 AS `groupbyme`
          FROM `information_schema`.`columns` AS `isc`
         WHERE `isc`.`table_schema` NOT IN ('information_schema', 'mysql', 'performance_schema')
           AND `isc`.`table_name` = 'users'
           AND `isc`.`column_name` = 'ipAddress'
         GROUP BY `groupbyme`
       ) AS T
   INTO OUTFILE '/tmp/alter_table.sql'
       FIELDS TERMINATED BY '\n'
       OPTIONALLY ENCLOSED BY ''
       ESCAPED BY ''
       LINES TERMINATED BY '\n';

Great, you say, but what does it do? Well here is a brief breakdown of its constituent components.

SET SESSION group_concat_max_len = 1000000;

By default MySQL has a much smaller GROUP_CONCAT maximum length and as we could be operating on a lot of tables (meaning many results to concatenate) we need to up this default for our particular query.

SELECT `query`
  FROM (SELECT GROUP_CONCAT(
                   CONCAT('ALTER TABLE `', `isc`.`table_schema`, '`.`', `isc`.`table_name`, '` ALTER `', `isc`.`column_name`, '` DROP DEFAULT;', '\n'),
                   CONCAT('ALTER TABLE `', `isc`.`table_schema`, '`.`', `isc`.`table_name`, '` CHANGE COLUMN `', `isc`.`column_name`, '` `', `isc`.`column_name`, '` VARCHAR(300) NOT NULL;')
                   SEPARATOR '\n'
               ) AS `query`

Here GROUP_CONCAT concatenates the results of the queries together separated by a newline character \n. Inside there are two CONCAT statements that are building two ALTER TABLE queries. The first removes a DEFAULT declaration from a column and the second changes the column to be a NOT NULL VARCHAR of 300 characters in length.

               1 AS `groupbyme`

This simply creates a column that the whole lot can easily be grouped by as every row is ascribed the same value of 1.

         WHERE `isc`.`table_schema` NOT IN ('information_schema', 'mysql', 'performance_schema')
           AND `isc`.`table_name` = 'users'
           AND `isc`.`column_name` = 'ipAddress'

We don’t want to accidentally perform actions against the information_schema or other internal MySQL tables so we exclude them here. To ensure we only operate on the correct table we then specify its name and the relevant column name. This way we can be sure that the DB has our target table name in it and that the target table has our target column name in it.

         GROUP BY `groupbyme`

This refers to the simple column we set up earlier - it meant all rows had a value of 1 allowing the results to be easily grouped.

       ) AS T

After all that the result must be set against an alias so for no particular reason I chose T here.

   INTO OUTFILE '/tmp/alter_table.sql'
       FIELDS TERMINATED BY '\n'
       OPTIONALLY ENCLOSED BY ''
       ESCAPED BY ''
       LINES TERMINATED BY '\n';

Finally, the results are written to a file on disk with a few custom options. Importantly, the fields are prevented from being enclosed or escaped - if these were left enabled then MySQL would break our query output by escaping new lines.

Now all you have to do is run the contents of the file we just created and the ALTER TABLE queries will be executed against your databases.
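One way to run the file is to feed it to the mysql command line client. The following is only a sketch: apply_sql_file is a hypothetical helper name, and the connection options you pass will depend on your own setup. Bear in mind that INTO OUTFILE writes the file on the database server host, so that is where it will be found.

```shell
# Hypothetical helper: feed the generated statements to the mysql client.
# The first argument is the SQL file; any remaining arguments (for example
# -u root -p) are passed straight through to mysql.
apply_sql_file() {
    sql_file="$1"
    if [ "$#" -gt 0 ]; then
        shift
    fi

    # Refuse to run when the file is missing or empty.
    if [ ! -s "$sql_file" ]; then
        echo "No statements found in $sql_file" >&2
        return 1
    fi

    mysql "$@" < "$sql_file"
}
```

It could then be called with something like apply_sql_file /tmp/alter_table.sql -u root -p, ideally after reading through the generated statements first.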

https://www.simonholywell.com/post/2017/03/mysql-alter-column-in-all-databases/

Email me when the file changes

Simon Holywell Jan 25, 2017 Updated Nov 19, 2024


It is important to ensure that Google does not index sites whilst they are still on a staging environment, but you cannot lock it down completely - how would your clients proof it? So I run a simple global rewrite rule in Apache that redirects all requests for robots.txt to a central disallow all response. This works great and Google appears to honour the rule as one would hope.

What happens though when something about that central file changes? One fateful night it so happened to occur on an old server I manage. Someone had altered the file and replaced it with an allow all rule. A site from the server started to appear in Google’s listings and thankfully it was picked up quickly, banned through Google webmaster tools and the original robots.txt put in place to protect against future indexing.

This left me needing a quick and dirty little monitoring script to keep an eye on the file. It really didn’t need to be anything crazy - just email me when the file changes so I can investigate what or who changed it - tell them to desist - and revert its contents.

To do this I employed sha512sum and mail inside a simple cron job that would regularly compare the file’s hash against the known good hash. If the hashes do not match then the script will email a short message to let me know to check into it.

Now of course you could just use the cron job to revert the contents automatically, but I wanted to look into why it was happening first. If you’re really worried you could of course replace the contents of the file and then email yourself. In this case it wasn’t so important.

There are plenty of command line tools to help you get a hash for a file - handy when you’ve downloaded something and you want to verify the integrity of the file. It used to be common for open source projects to list hashes beside their downloads before GitHub. Anyway, there are a number of choices, each producing a longer hash than the last and therefore being less prone to collisions (two different files creating the same hash):

md5sum
sha1sum
sha256sum
sha512sum

By default they’ll spit out the hash(es) onto the command line (STDOUT), but we’re going to redirect them to a file so we can refer to them later.

sha512sum robots.txt index.html > cron_sums.txt

This will create a text file containing two hash values that we can use to later verify against the files in question. If you, later, take another hash of the files and it doesn’t match the one in cron_sums.txt then that file has changed. There is a handy switch you can pass to sha512sum that makes this process much easier.

sha512sum --status -c cron_sums.txt

The above command’s exit status code can be used to generate a human readable message using the simple && (and) and || (or) operators on the command line.

sha512sum --status -c cron_sums.txt && echo "Success" || echo "Failed"

The above command is pretty self explanatory so I won’t bother working through it and I’ll move onto sending the email instead. This will be done by using the venerable mail command.

sha512sum --status -c cron_sums.txt && echo "Success" || echo "Failed" | mail -s "File hashes didn't match" example@example.co.zw

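Here the failure message is piped to mail and dutifully sent through to your inbox. Because the pipe binds more tightly than && and ||, only the "Failed" branch reaches mail; "Success" is simply printed to standard output, which cron will typically email to the crontab’s owner as well. I’d like to only be disturbed when it goes wrong - I don’t care if it succeeds. To do this we’ll redirect the success output to the /dev/null blackhole.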

sha512sum --status -c cron_sums.txt && echo "Success" > /dev/null || echo "Failed" | mail -s "File hashes didn't match" example@example.co.zw
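If you want to convince yourself of how &&, || and | group in that one-liner without sending any email, a small stand-in can help. This is only a sketch: check and notify are hypothetical functions standing in for sha512sum and mail respectively.

```shell
# Stand-ins for the real commands so the control flow can be tried safely.
check() { return "$1"; }            # pretend sha512sum: exits with the status given
notify() { sed 's/^/mailed: /'; }   # pretend mail: prefixes whatever arrives on stdin

# Same shape as the one-liner above.
run_check() {
    check "$1" && echo "Success" > /dev/null || echo "Failed" | notify
}

run_check 0   # hashes match: prints nothing
run_check 1   # hashes differ: prints "mailed: Failed"
```

Only the failure branch ever reaches the stand-in for mail, which is the behaviour we want from the real command.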

So we’ve worked out the bash command we want cron to run for us every minute of the day. Let’s tell cron about it! Execute crontab -e on the command line to open the crontab in your default editor.

Now add the following cron job to the file.

*   *   *   *   *    /usr/bin/sha512sum --status -c cron_sums.txt && echo "Success" > /dev/null || echo "Failed" | mail -s "File hashes didn't match" example@example.co.zw

It is worth noting that the paths in the cron_sums.txt file are all relative so you may need to change into the directory containing the files you want to check before running the sha512sum command from cron. Also cron will run in the user’s home directory by default.

*   *   *   *   *    cd /var/www; /usr/bin/sha512sum --status -c cron_sums.txt && echo "Success" > /dev/null || echo "Failed" | mail -s "File hashes didn't match" example@example.co.zw

It isn’t pretty and it certainly doesn’t scale (although you could email a list/forwarding group), but it does serve as a quick and dirty fix to warn you of file inconsistency.

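As a bonus; to automatically revert the file as well you could add the following to the crontab. The braces group the email and the revert together so that both only run when the check fails, and the content written back needs to match whatever was hashed into cron_sums.txt.

*   *   *   *   *    cd /var/www; /usr/bin/sha512sum --status -c cron_sums.txt && echo "Success" > /dev/null || { echo "Failed" | mail -s "File hashes didn't match" example@example.co.zw; echo "Disallow: All" > robots.txt; }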

Whilst this is a very simple one-liner example you could of course use the same principles to write a simple little bash script that would be triggered by failure instead.

*   *   *   *   *    cd /var/www; /usr/bin/sha512sum --status -c cron_sums.txt && echo "Success" > /dev/null || /usr/scripts/files_changed.sh
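As a rough idea of what such a script might contain, here is a minimal sketch. The function names are my own invention for illustration and it assumes GNU sha512sum, whose --quiet switch only reports the files that failed verification. It improves on the one-liner a little by naming the changed files in the email.

```shell
#!/bin/sh
# Hypothetical sketch of what /usr/scripts/files_changed.sh could define.

# Print the name of each file whose current hash no longer matches the sums file.
# LC_ALL=C keeps the "FAILED" marker untranslated so sed can find it.
changed_files() {
    LC_ALL=C sha512sum --quiet -c "$1" 2>/dev/null | sed -n 's/: FAILED.*$//p'
}

# Email the list of changed files; does nothing when every hash still matches.
notify_changes() {
    sums_file="$1"
    recipient="$2"
    changed="$(changed_files "$sums_file")"
    [ -n "$changed" ] || return 0
    printf 'These files changed:\n%s\n' "$changed" |
        mail -s "File hashes didn't match" "$recipient"
}
```

The script would then finish with a call such as notify_changes cron_sums.txt example@example.co.zw, relying on the cd /var/www in the cron job to resolve the relative paths.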

https://www.simonholywell.com/post/2017/01/email-when-file-changes/

SQL style guide misconceptions

Simon Holywell Dec 9, 2016 Updated Nov 19, 2024


Many people have read, reviewed and even implemented the SQL style guide that I wrote. This is great, but there have also been a number of commonly held misconceptions or incorrect readings of the points made in the guide. I have decided to address some of the more common ones via a blog post that will, hopefully, clarify the situation.

Basics

A lot of people seem to have a very weird understanding of the basic terms used in the guide so here is what I mean when I say ‘Avoid’ and ‘Try’:

In this context I mean that where it does not make sense (for performance, readability, etc reasons) then don’t follow the guide. Follow it where possible and be mindful that when you deviate you’re adding to tech debt.

I wrote the guide for people to employ with their brains engaged and not just to blindly follow it. It is just a guide at the end of the day and not the law! You may have an exception that the guide says to avoid - try to, but if you can’t then you can’t. Document it and highlight why the problem has been introduced to the application and move on.

Comma positioning

So many people took issue with this - they desperately wanted commas before terms. I like to stick to reading conventions as much as possible as it makes code far more legible to me. In the English language, a comma separated list always places the comma immediately after a term.

SELECT manufacturer, model, engine_size
  FROM motorbikes;

Note that commenting out the engine_size column here would cause a SQL parser error because there would be a dangling comma left after the model column.

Many people, when coding SQL, like to place the comma at the beginning of a term as they seem to think it makes it easier to comment out parts of a query.

SELECT manufacturer
       , model
       , engine_size
  FROM motorbikes;

There’s no way of being polite here - this looks hideous, weird and totally unconventional in a bad way. A proponent of this style would contend that they can now easily comment out the engine_size or model column if they needed to.

SELECT manufacturer
       , model
       -- , engine_size
  FROM motorbikes;

The query still works, yay! This MUST be better. Well until you try to comment out manufacturer and realise all you’ve succeeded in doing is moving the problem to the other end of the list!

SELECT -- manufacturer
       , model
       , engine_size
  FROM motorbikes;

Boom! A big SQL parser error because of the stray comma left in front of the model column. You’d be asking the parser to run the following broken SQL code:

SELECT , model
       , engine_size
  FROM motorbikes;

See what I mean now?

So just stick to the English language convention for legible queries and your query will be just as easy to comment out anyway.

Object oriented (OOP)

This is one of my favourite rules but has apparently left quite a few people scratching their heads. Given its importance, it makes sense to cover it briefly here.

Object oriented design principles do not effectively translate to relational database designs—avoid this pitfall

– Designs to avoid

Fortunately, this is pretty simple and shouldn’t require too much explaining, but it is a very important aspect of the guide and should not be ignored.

In its most simplified form, I am trying to say that you should not design your database with the schema (table structure) dictated by the application level code objects that access it. Do you use an ORM? So what, who cares? Your database certainly shouldn’t!

The database’s primary concern should be with the most efficient storage of data in a relational structure that emphasises normalisation. You can choose to ignore this, but you need to know that you’ve just introduced a pungent smell into your application.

This desire to allow OOP thinking to infect the database layer is known by the snappy title “Object-relational impedance mismatch” and there is a nice Wikipedia article of the same name that goes into far more detail on this topic.

id columns and surrogate keys

Another fun topic that has been bashed to death when discussing the SQL style guide is the banning of surrogate keys. You do not need them in most cases as better keys often exist in the data already.

Of course, like I mentioned before, you can use them at your own peril - we’re discussing a guide here - not the law.

So if you choose to use an ORM that requires them then use them! I have made recommendations for best practice, but you’re free to disregard them. Just because I think that’s foolish doesn’t mean that it is wrong for you or your project. Make your own choices - you’ve been warned, however.

Avoiding vendor/proprietary functions

Many readers have struggled with the recommendation to only use standard SQL, erroneously believing that the sole reason for this is to promote SQL portability between engines. What about ease of reading for other developers, reducing complexity and portability of you - the developer?

Why you would willingly choose a proprietary solution when a standard SQL method already exists is beyond me. I really do fail to see the problem that readers have here with this recommendation. You’re introducing complexity where it is unnecessary.

One reader was so incensed by this simple rule that they felt unable to continue reading the guide. This, of course, made me giggle at the absurdity of such a stance.

UPDATE 19/11/2024: 8 years on and this is still something that I see developers taking issue with on Hacker News. So, I figured I would expound upon the ideas here in a new article: A note on code portability.

Joins over the river

Another point of contention is the style of putting table joins on the other side of the river (east). There have been a few readers that find this interrupts their reading flow or they don’t like the way it makes the whole query look.

SELECT r.last_name
  FROM riders AS r
       INNER JOIN bikes AS b
       ON r.bike_vin_num = b.vin_num
          AND b.engines > 2

       INNER JOIN crew AS c
       ON r.crew_chief_last_name = c.last_name
          AND c.chief = 'Y';

Here are a few reasons for this style of join layout:

The join is an addition to the FROM clause so it makes logical sense for it to exist grouped under it
It provides a nice clean way of specifying join conditions (eg. AND b.engines > 2)
They’re an alteration or operation of the FROM clause
Join syntax doesn’t play nicely with the river either

Simple.

I really dislike X and Y

That’s perfectly understandable and precisely why I made it so easy to fork the guide. The core of the guide is even a separate Markdown file you can drop into a project repository and start editing.

The whole idea of the guide is that it gives you a nice set of defaults to work from and a nicely formatted output. You can enjoy these even with a completely different set of guidelines.

Where is the justification?

It beggars belief how many readers claim to have read the guide and yet I still get messages asking for justifications. At the very top, the introductory paragraph clearly states that Celko’s book is the place to find in-depth discussion of each point. Discussions of justification have no place in a style guide anyway.

So that is where you’ll find all the justification you could need. Buy it!

Who do you think you are telling me what to do?

Interestingly, some readers have laboured under the misapprehension that this guide is somehow the law. Far from it, as the name guide suggests these are suggestions or guidelines. Right at the top of the guide, there is a clear link to where it can be forked to suit your tastes. Use it as a handy base and hack away at it!

Conclusion

So the guide has been controversial, but that is just fine. You don’t have to like it or agree with it. All I’d hoped for was that it might get you thinking about standards in your projects and talking about well-formed SQL. If it was adopted by some then all the better of course.

An impetus if you will.

https://www.simonholywell.com/post/2016/12/sql-style-guide-misconceptions/

Quick way to create a PHP stdClass

Simon Holywell Nov 16, 2016 Updated Nov 19, 2024


A very short and simple trick for creating new stdClass objects without having to set every property individually. This is akin to JavaScript’s object notation, but not quite as elegant.

Creating a new object in JavaScript looks like the following example.

const x = {
  a: "test",
  b: "test2",
  c: "test3",
};

With PHP it is possible to use type casting to convert a simple array into a stdClass object which gives you a similar looking syntax although there is a little more typing required.

$x = (object) [
    'test',
    'test2',
    'test3',
];
var_dump($x);

/*
object(stdClass)#1 (3) {
  [0]=>
  string(4) "test"
  [1]=>
  string(5) "test2"
  [2]=>
  string(5) "test3"
}
*/

Note the type casting with (object) just before the array definition - this is what does the work of converting the simple array definition into a stdClass object.

Of course you’re going to want named properties too and handily casting an associative array does just that.

$x = (object) [
    'a' => 'test',
    'b' => 'test2',
    'c' => 'test3'
];
var_dump($x);

/*
object(stdClass)#1 (3) {
  ["a"]=>
  string(4) "test"
  ["b"]=>
  string(5) "test2"
  ["c"]=>
  string(5) "test3"
}
*/

What happens if you have two array indexes with the same key though? Obviously you cannot have two class properties with the same name so one would presume an error. Well being that this is PHP you may not be surprised to see the following.

$x = (object) [
    'a' => 'test',
    'b' => 'test2',
    'c' => 'test3',
    'a' => 'wipeout'
];
var_dump($x);

/*
object(stdClass)#1 (3) {
  ["a"]=>
  string(7) "wipeout"
  ["b"]=>
  string(5) "test2"
  ["c"]=>
  string(5) "test3"
}
*/

There is no great big crash. The duplicate is resolved in the array itself before the cast happens: PHP simply overwrites the value held against key a with the last value given for it. This is why $x->a is set to wipeout and not test.

Also note here that the order of the properties in the stdClass might not be what you were expecting either. Perhaps you were expecting $x->a to be last, but no, $x->a comes first because 'a' => 'test' claims the first position and its value is then overwritten in place (rather than the entry being removed and added again) by the later definition 'a' => 'wipeout'.

Well there you have it - a very simple way of getting yourself a new stdClass with your desired properties set.

https://www.simonholywell.com/post/2016/11/quick-way-to-create-php-stdclass/

Functional Programming in PHP Second Edition Available Now

Simon Holywell Oct 27, 2016 Updated Nov 19, 2024


It is with great pleasure that I announce the second edition of the Functional Programming in PHP book that I have been working on. There is twice the content of the first edition of the book as well as updates for PHP 7 and Facebook’s HHVM (HipHop Virtual Machine).

There are now more functional techniques and patterns included with pipelines, pattern matching and flat maps among them. I have added a section of the book dedicated to the handy syntax and functionality that HHVM can provide functional programmers with.

In addition I have, of course, listened to reader feedback and gone into a lot more detail about functions themselves, type signatures and their use, functional programming history and provided more examples of functional code in use. There is also a glossary of terms and appendices on libraries, REPLs and the frequently requested guide to using the UTF-8 ellipsis effectively in various editors.

On top of all that the book has been completely reorganised into a more logical structure with a better chapter breakdown.

So if you’re the kind of programmer who likes clean and easy to test code resulting in fewer bugs this is the PHP book for you. Even if functional programming isn’t really your thing all the techniques in the book will help you to become a better object oriented or procedural programmer.

To get your copy head on over to the book’s website for purchase links.

https://www.simonholywell.com/post/2016/10/functional-programming-in-php-second-edition/

Importing and aliasing PHP functions

Simon Holywell Oct 18, 2016 Updated Nov 19, 2024


As a follow on to my short post about namespaces and functions from a year ago I thought it would be worth covering importing a specific function and aliasing functions via namespace operators too. This has been possible since PHP 5.6, but there is a nice addition in PHP 7 I’ll cover towards the end.

In the previous article I demonstrated how you can namespace functions and use them, but as a refresher; you can enclose functions within a namespace just like a class. In the following example there is a function set up in the MyProject\MyModule namespace first, which is subsequently called by code inside the root namespace (namespace { }).

namespace MyProject\MyModule {
    function get_nice_superlative_for_me_please() {
        return 'gorgeous';
    }
}

namespace {
    use MyProject\MyModule as M;
    echo "You're " . M\get_nice_superlative_for_me_please() . ' today';
    // You're gorgeous today
}

Hopefully this is all pretty straightforward and clear to follow. Now onto the use keyword and how it can be implemented to import a specific function.

namespace MyProject\MyModule {
    function get_nice_superlative_for_me_please() {
        return 'gorgeous';
    }
}

namespace {
    use function MyProject\MyModule\get_nice_superlative_for_me_please;
    echo "You're " . get_nice_superlative_for_me_please() . ' today';
    // You're gorgeous today
}

Note the use function construct here instead of the usual straightforward use that you’d normally see when importing a PHP namespace. This imports just the specified function for use in the current scope.

So, my example function name sure is a little ungainly and a bit of a chore to type with its long name. For the sake of demonstration let’s assume that we can’t change the function name, but we still want it to be shorter in our current scope. To do this we can use the familiar as keyword used in namespace aliasing.

namespace MyProject\MyModule {
    function get_nice_superlative_for_me_please() {
        return 'gorgeous';
    }
}

namespace {
    use function MyProject\MyModule\get_nice_superlative_for_me_please as compliment;
    echo "You're " . compliment() . ' today';
    // You're gorgeous today
}

As we did earlier we’ve gone with use function to show that we’re importing and later aliasing a particular function. This also works for functions in the root or same namespace as each other.

use function get_nice_superlative_for_me_please as compliment;

function get_nice_superlative_for_me_please() {
    return 'gorgeous';
}

echo "You're " . compliment() . ' today';
// You're gorgeous today

Now if you’re running PHP 7 you can do another little trick with the namespace operators. If you want to include multiple specific functions in the current scope you can use the new braced grouping syntax.

namespace MyProject\MyModule {
    function get_nice_superlative_for_me_please() {
        return 'gorgeous';
    }
    function verb() {
        return 'looking';
    }
    function when() {
        return 'today';
    }
}

namespace {
    use function MyProject\MyModule\{verb,when};
    use function MyProject\MyModule\get_nice_superlative_for_me_please as compliment;
    echo "You're " . verb() . ' ' . compliment() . ' ' . when();
    // You're looking gorgeous today
}

Here you can see group syntax ({verb,when}) and function aliasing that we saw earlier working together to create the expected text output.

Unfortunately, you cannot alias the group as a whole in the one hit; aliases have to be given per item inside the braces instead ({verb as v, when as w}). Code like the following example generates a parse error (syntax error, unexpected 'as' (T_AS), expecting ';'):

namespace MyProject\MyModule {
    function get_nice_superlative_for_me_please() {
        return 'gorgeous';
    }
    function when() {
        return 'today';
    }
    function verb() {
        return 'looking';
    }
}

namespace {
    use function MyProject\MyModule\{verb,when} as {v,w};
    use function MyProject\MyModule\get_nice_superlative_for_me_please as compliment;
    echo "You're " . v() . ' ' . compliment() . ' ' . w();
}

So more recent additions to PHP give namespaces more power than previously discussed with the ability to import a specific function being the highlight.

https://www.simonholywell.com/post/2016/10/importing-and-aliasing-php-functions/

Installing pgmodeler on Ubuntu

Simon Holywell Oct 18, 2016 Updated Nov 19, 2024


Pgmodeler is a handy tool for designing databases with an ERD style interface specifically aimed at the PostgreSQL community. It can be obtained in a couple of different ways, but I am going to cover the self build process here. So if you want to learn how to build and install pgmodeler read on!

To be able to build anything you’ll need to install the tools from Ubuntu’s repositories.

sudo apt-get install gcc libxml2-dev postgresql

The first dependency you will need to install is the Qt5 UI toolkit.

sudo add-apt-repository ppa:ubuntu-sdk-team/ppa && sudo apt-get update && sudo apt-get dist-upgrade && sudo apt-get install ubuntu-sdk

Next up you’ll need to install libpq:

sudo apt-get install libpq-dev

To ensure that libpq and libxml2 are installed correctly verify that you get similar responses to me from the following commands:

pkg-config libpq --cflags --libs
# -I/usr/include -L/usr/lib64/libpq.so

pkg-config libxml-2.0 --cflags --libs
# -I/usr/include/libxml2 -lxml2

If you do not then there is a little manual intervention that you’ll need to perform by creating the /usr/lib/pkgconfig/libpq.pc file yourself. It should contain the following block.

prefix=/usr
libdir=${prefix}/lib/postgresql/[VERSION]/lib
includedir=${prefix}/include/postgresql

Name: LibPQ
Version: 5.0.0
Description: PostgreSQL client library
Requires:
Libs: -L${libdir}/libpq.so -lpq
Cflags: -I${includedir}

So that is all the dependencies taken care of and we can move onto building and installing the actual binary itself.

qmake pgmodeler.pro
make
sudo make install

The binary is now in the build directory so we can call it from there after passing it a few options with a start-up script.

Move into the build directory and create start-pgmodeler.sh.

cd build
vim start-pgmodeler.sh

In the start-up file we want to add the following lines of bash script.

#!/bin/bash

# Specify here the full path to the pgmodeler's root directory
export PGMODELER_ROOT="."

export PGMODELER_CONF_DIR="$PGMODELER_ROOT/conf"
export PGMODELER_SCHEMAS_DIR="$PGMODELER_ROOT/schemas"
export PGMODELER_LANG_DIR="$PGMODELER_ROOT/lang"
export PGMODELER_TMP_DIR="$PGMODELER_ROOT/tmp"
export PGMODELER_PLUGINS_DIR="$PGMODELER_ROOT/plugins"
export PGMODELER_CHANDLER_PATH="$PGMODELER_ROOT/pgmodeler-ch"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"$PGMODELER_ROOT"
export PATH=$PATH:$PGMODELER_ROOT

# Running pgModeler
pgmodeler

After writing that file and quitting vim you need to set the file’s executable bit using chmod.

chmod +x start-pgmodeler.sh

You can now run pgmodeler by calling the start-pgmodeler.sh shell script on the command line.

./start-pgmodeler.sh

After the application starts you will be greeted by the nice Qt interface of pgmodeler. You’re done!

https://www.simonholywell.com/post/2016/10/install-pgmodeler-ubuntu/

Tamiya Manta Ray

Simon Holywell Sep 13, 2016 Updated Nov 19, 2024

I’ve finally found some time to look at my old remote control car again - I was so happy when I bought it as a kid nearly twenty years ago. I’ve still got the original nickel cadmium battery (all 1300mAh!) and charger so I charged it up expecting that there’d be little to no response from the car. Amazingly the kids and I got about 5 minutes of full throttle action out of it before it slowed to a halt.

As I picked it up and the wheels spun, winding down the last of the charge in the old battery, I noticed that it was quite a bit noisier than I remembered as a kid. Especially from around the rear differential area so I figured some grease was required. Naturally, I looked around online for stories from others who’d resurrected their old cars and surprisingly there was a lot of information out there.

Most of it is in quite difficult to find tidbits spread out across many different online forum threads. So I’ve assembled all the information I found useful here for my own reference and yours too.

Parts compatible models

The Manta Ray (58087 and the later 58360 re-release), it would seem, is a much loved car both back then and now. It was part of the first wave of modern 4WD Tamiya remote control cars and was the first to carry the chassis designation DF01. There were other DF01 cars including the Blazing Star, Top Force, Top Force Evolution, Dirt Thrasher and Terra Conqueror.

Later Tamiya made a number of other rally (such as the Lancia Delta Integrale) and road cars (including the Nissan Skyline) that shared many of the same parts and chassis as the DF01 cars. These later cars were designated as, first, the TA01 and then later the TA02.

Some parts on the later cars are considered to be stronger or better designed so most of the people rebuilding will replace broken or worn parts on a DF01 with parts for the TA01 or TA02 chassis. Being rally/road cars though there are a couple of differences and deviations:

the suspension strut towers (front and back) are shorter than the buggy
in sympathy the struts themselves are shorter
with that the body mount pins are in different positions too.

It is important to note that parts from the DF02 or DF03 are completely different to the DF01, TA01 or TA02 so you can’t use them in your Manta Ray. Some people have mentioned that you can use the wheels from a DF03 on the DF01 as they’re both compatible with 12mm hex wheels - the advantage of swapping here is that you can use newer (read better) tyres in the new 2.2 profile.

Also note that the re-re-release and Quick Drive versions of the Manta Ray are incompatible too - you can tell these easily as they’ve got little impact wheels on either end of the front bumper and/or are smaller.

Replacement parts

As mentioned there are a number of common parts between the DF01 and the later TA01 or TA02. After reading around I noted down the parts that people were having issues with or those that regularly break and ordered the recommended Tamiya part numbers.

Tamiya 50197 Snap pin set - my model was missing a few securing the body - there are after market alternatives too
Tamiya 50478 TA01/TA02 Skyline rear drive/gear case - a weak point on the original so I wanted a spare and apparently these are an improved design
Tamiya 50602 Bevel gear differential set - can be worn so I got these as spares (you’ll probably want two - one for the front and one for the rear)
Tamiya 50541 TA01/TA02 4WD rally car front drive/gear case - another weak point like the rear
Tamiya 50529 TA01/TA02 4WD plastic gear set - replace the aluminium gears with a stronger, improved plastic gear set designed for the later cars

I may refer to the above part numbers later in the article, but these should give you a go-to list of compatible parts. These parts are relatively easy to find on Amazon, eBay and rcMart.

Upgrade/Hop-Up parts

The upgrade parts for the DF01 are all but impossible to find - Tamiya called these Hop-Up parts. There are some companies like Fibre-Lyte, Carson, GPM and Yeah Racing that do provide some after market parts for the TA01 and TA02 so you can use some of those.

If you can find them then the following Tamiya parts are highly prized:

Tamiya 53079 Stainless steel prop shaft for Manta Ray
Tamiya 53099 FRP double deck chassis plate set
Tamiya 53100 Top Force double deck carbon graphite chassis plate set
Tamiya 53070 Manta Ray/TA01 ball diff set
Tamiya 53071 Manta Ray/TA01 torque splitter set
Tamiya 53164 Hollow carbon gearshaft set
Tamiya 53073 Bearing set for the Manta Ray

Some of the more desirable after market parts include:

Fibre-Lyte TOP 01 & TOP 02 chassis set (you’ll need to choose the straps to pair with it based on the battery pack you intend to use - you’ll also need two of each strap)
Fibre-Lyte TOP 03 front strut tower/mount
Fibre-Lyte TOP 04 rear strut tower/mount
GPM TA1012 front gear case cover - this is good to have as the suspension arms connect to it and this upgrade will strengthen the front end in case of a crash
GPM TA1025 prop shaft or main shaft
Yeah Racing TA01-027BU rear gear case cover

Most of the Tamiya parts are really hard to find and cost exorbitantly large stacks of cash. It’s not uncommon for a chassis to be above $200 USD or the bearing set to be in excess of $140 USD. Thankfully there are after market parts that can fill this void and make upgrades to the Manta Ray affordable.

Bearing set

To help reduce the wear on moving parts, one of the first and most valuable upgrades you can make to the car is to replace all of the bushings with ball bearings. Due to exposure to dirt (being a buggy rather than a street car) it is recommended to go with a plastic/PTFE/nylon shielded sealed ball bearing as opposed to the slightly cheaper metal shielded variety.

You’ll need 22 bearings made up of:

6x 850 bearings (5 x 8 x 2.5mm)
16x 1150 bearings (5 x 11 x 4mm)

These can be bought as kits (look for TA01 kits) or you can use the bearing numbers/dimensions to buy them individually from a bearing or hobby shop.

Manufacturers of kits include:

Tamiya 53073 - rare as hen’s teeth
Acer Racing ceramic bearings
Alicenter Bearings
Yeah Racing YB0018B
Fast Eddy 769173950092
VXB Ball Bearings Kit177096
Boca Bearing #55-145, #55-145GS, #55-145C-YZ, #55-145C-YS and #55-145C-YU in ascending order of price

If you’re going to be installing a more powerful motor then you really should install the bearing kit beforehand.

Propeller shaft or main shaft

By far and away the easiest method of getting a new prop shaft (sometimes known as a main shaft) is to buy a Yeah Racing (TA01-134BU) after market alloy one. Some people using fast brushless motors have reported shearing the dog bones off the end of shafts like this. I am not sure how much abuse they were giving them, but it might be worth the extra dollars up front to get a more sturdy GPM (TA1025) or Tamiya Hop-Up (53079) part if you intend to go crazy brushless.

The next one is not a Hop-Up, but it is more substantial than the wire prop shaft that comes by default in many of the DF01s: apparently you can use the Top Force shaft, which is Tamiya 3485025 (also X10185). As they were standard with a few models they can be easier to come by on eBay - bear in mind you may need the shaft, two circlips and the end cups. The jury is out on that last point as I’ve also read that some people have had success with the standard prop shaft cups.

The standard Manta Ray prop, should you need it, is even cheaper and easier to get as it is not desirable - look for Tamiya 3485039 - sometimes derisively known as the twisted coat hanger in RC forums.

If you’re installing a more powerful motor then this is definitely a sensible upgrade to make to ensure the power is being expended to the wheels and not in deforming the prop shaft!

ESC (Electronic Speed Controller)

Most Manta Rays weren’t fitted with an ESC but with an MSC (Mechanical Speed Controller) and these aren’t really up to modern batteries or motors - quickly burning out. On top of that they’re less efficient and heavier than even the cheapest of ESCs.

You can swap in a new ESC very easily as they will plug right into the radio receiver already in your car so you don’t need to change your radio equipment. The connectors for the motor will probably also be compatible, but there may be some soldering required.

Interestingly you’ll also no longer need the additional power wire that runs from the MSC to the radio receiver. The ESC will power the radio receiver via its one control/power cable triptych.

I just went with the simplest and cheapest ESC I could find that had some positive reviews. A Hobby King X-Car 45, which is simple, small and suitable for brushed motors and up to a 2S LiPo battery - perfect!

If you want something waterproof or just that bit better then I’ve not seen any bad words said about the Hobbywing QuicRun WP 1060 Brushed or its rebadged clone the Yeah Racing Tritronic (ESC-1060WP).

As I am not running a brushless motor or ever intending to in this car I’ve not looked into brushless ESCs, but you need to ensure the ESC you select is specifically for a brushless motor if you intend to run one. You cannot, generally, use a brushed ESC with a brushless motor and vice versa so make sure you check carefully if you intend to do so - you’ll probably be able to tell by the crazy price you’re paying!

So if you plan to go to high capacity LiPo or NiMH batteries it is important that you look into replacing that MSC with a nice new ESC. In the case of a LiPo this is especially true as you’ll see in the batteries section.

Batteries and charging

Battery technology has really moved on since the Manta Ray was released along with its low capacity NiCd batteries (1300mAh!). The next step up from a NiCd is probably a NiMH battery, which is an improvement in terms of capacity and charge time. You can handle this pretty much like you would your NiCd pack from the past and they come in a similar form factor - round cells in a shrink plastic wrap.

They can be a bit of a pain like the old NiCds though in that they do end up suffering from the memory effect over time and they display voltage sag during use. Another aspect to consider is whether your old charger can charge a NiMH battery properly.

After looking at these options I decided that as I needed to upgrade the charger and I wanted to run an ESC anyway I might as well go for a LiPo battery pack and enjoy the longer and more consistent power delivery. The cost difference between a LiPo and a NiMH battery also appeared to be negligible for a similar capacity.

Some anecdotal evidence suggested that the greater output of a modern NiMH battery would be enough to burn out the MSC pretty quickly as well anyway so I saw no point in keeping the old technology.

It is important that LiPo batteries are not discharged too far so you’ll want to select an ESC that is designed to cut-out/off when the battery voltage drops - most LiPo compatible ones will provide this feature.

Charging a multi-cell LiPo battery is different to a NiMH or NiCd as the cells need to be balanced so it is best to get a charger that can do this for you. I went with a tried and trusted iMax B6 charger that can do up to 60W and 5A. As you shouldn’t charge at greater than 1C and the battery I intended to buy was 4200mAh a 5A charger should be more than enough.

The C rating is something that I found confusing at first, but it essentially expresses a charge or discharge rate relative to the battery’s capacity, so if your battery capacity is 4200mAh then 1C would be 4200mA or 4.2A and 2C would be 8400mA or 8.4A. This can apply to charge and discharge rates - usually a battery’s charge rate is lower than its discharge rate. This means it will take longer to charge than it will to be discharged - presuming it is discharged at its maximum rate.
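
The arithmetic is easy to check with a few lines of Ruby. This is a rough sketch using the 4200mAh pack and 5A charger mentioned above; the function names are my own, and the charge time is an idealised figure that ignores losses and the slower balancing phase at the end of a real LiPo charge.

```ruby
# Current in amps for a battery capacity (mAh) at a given C rate.
def c_rate_amps(capacity_mah, c_rate)
  capacity_mah * c_rate / 1000.0
end

# Idealised charge time in minutes from empty at a constant current.
def charge_minutes(capacity_mah, charge_amps)
  capacity_mah / (charge_amps * 1000.0) * 60
end

capacity = 4200                        # mAh
one_c = c_rate_amps(capacity, 1)       # charge limit at 1C
two_c = c_rate_amps(capacity, 2)
charger_limit = 5.0                    # A, the most the charger can supply

# Never exceed 1C, even if the charger could deliver more.
charge_amps = [one_c, charger_limit].min

puts "1C = #{one_c}A, 2C = #{two_c}A"
puts "Charging at #{charge_amps}A takes roughly #{charge_minutes(capacity, charge_amps).round} minutes"
```

In other words a 5A charger has enough headroom to charge this pack at its full 1C rate, whereas a pack much larger than 5000mAh would be limited by the charger instead.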

You’ll also need to be a little more careful with your LiPo batteries as they can catch fire if they’re incorrectly charged or damaged. Most people recommend charging them in a fireproof container like a sealed metal tin or a specialist LiPo charging bag. It is recommended that you only charge them on a solid non-flammable surface like concrete to further minimise any risk.

Now choosing a LiPo for your DF01/TA01 chassis cars isn’t as easy as you’d think. To reduce the chance of damaging the battery it is recommended that cars use the hard case packs - given their position in the buggy they could come into contact with something during a crash or hard landing. Most hard packs have a rectangular profile so they simply won’t fit into the chassis’s oval shaped battery receptacle.

So you’ll be looking for what are known as car stick packs or hardcase stick packs to use in your car. Choose a 2S pack (means that there are 2 cells in the battery) - 3S will not be the right shape for the Manta Ray or other DF01s.

I went with a Turnigy nano-tech 4200mAh 2S (NC4200.2S2P.4) LiPo battery that claimed to fit the Manta Ray chassis, but it doesn’t, I discovered, without some modification to enlarge the ovalised shape in my chassis. Others have claimed to get a good fit from the following batteries, but they weren’t available to me when I was buying mine:

Speed Energy SE-4800/30/TP 7.4v 4800mAh 30C 2S
Core RC CR158 7.4v 4000mAh 20C 2S
Core RC CR293 7.4v 4000mAh 30C 2S
Jamara #141390 7.4v 5000mAh 30C 2S
Gens ace #8 7.4v 4000mAh 25C 2S
Intellect IP2500 7.4v 2500mAh
Intellect IP4000 7.4v 4000mAh
Hyperion G3 SWIFT HP-SW20-4000CP-2S Classic Pack 4000mAh 7.4v 20C 2S

I also took the opportunity to change the old Tamiya battery connector over to a Deans connector, which has a larger contact area for better power transmission and they’re easier to use. They were marketed as T connectors by the shop I bought them from, but they’re the same as the Deans as far as I can tell.

Motor and pinion gears

With the standard motor (silver can) you probably want a 19 tooth (19T) steel pinion gear - the more powerful the motor you run the fewer teeth you’ll want, say a 16T. Confusingly, motors are also rated in terms of T except in this case it stands for turns. So a 9T motor will be faster than a 21T motor. For a hot, but still brushed, motor consider something like a 17T motor with a 19T pinion. You’ll want to be looking for 540 sized motors as replacements.

This steel pinion gear should be 0.6 pitch and there are at least two manufacturers currently making them: Robinson Racing (USA) and RW Racing (UK). Their part numbers are simple and you can adjust the last two digits to the number of teeth you want.

Robinson Racing RRP1119
RW Racing ARW0600-19

Many people have gone with the newer brushless motors to get top speed out of their models, such as a 9T motor paired with a 16T pinion gear. You will need a brushless specific ESC too, as mentioned in the ESC section. So back to pinions - a 16T would be:

Robinson Racing RRP1116
RW Racing ARW0600-16
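
The part number pattern described above can be sketched in Ruby. The prefixes are taken purely from the 19T and 16T examples in this article, and the function name is my own, so treat the output as a search term and check the manufacturer’s catalogue before ordering, as not every tooth count is necessarily produced.

```ruby
# Part number patterns inferred from the examples in this article; the
# tooth count is appended as two digits.
PINION_FORMATS = {
  'Robinson Racing' => 'RRP11%02d',
  'RW Racing'       => 'ARW0600-%02d'
}.freeze

# Returns a hash of manufacturer => part number for a given tooth count.
def pinion_part_numbers(teeth)
  PINION_FORMATS.transform_values { |pattern| format(pattern, teeth) }
end

[16, 19].each do |teeth|
  pinion_part_numbers(teeth).each do |maker, part|
    puts "#{teeth}T #{maker}: #{part}"
  end
end
```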

Another good upgrade is to replace the plastic motor mount with a metal one for increased strength. There is evidence that the plastic mount can deform or crack allowing the motor pinion gear to move around and crash into the gearbox destroying the gears. A number of people mentioned that they preferred the mounts that had fixed posts like the Yeah Racing one rather than those with screw on posts like the GPM.

It was also noted that some cheaper eBay versions don’t have one post shorter than the other like they should to match the factory plastic one. I ended up going for the Yeah Racing one based on recommendations although I was initially attracted to the GPM’s heat sink style design - it has screw on posts. Whilst looking I found the following mounts available:

Tamiya 53142
Yeah Racing TA02-013BU
GPM TA1002
Pargu MS0104

If you do decide to fit a more powerful motor then ensure you install an aluminium motor mount, a set of bearings, a compatible ESC and a more substantial prop shaft as well. These are considered the basic essentials before introducing more power to the DF01/TA01/TA02 chassis. You should also upgrade the gear set to the Tamiya 50529 and use one of the steel pinion gears discussed above.

Obviously the faster you go the harder you impact when you crash so maybe sticking the fastest motor the chassis can handle into a 20-30 year old RC car isn’t the best idea…

Wheels and tyres

You’re going to be looking for 12mm hex drive off road buggy wheels.

The wheels from a Tamiya DF-03 Dark Impact look good on the Manta Ray with the advantage that you can use newer (read better) tyres (2.2 profile) than the Manta wheels. You’d need both front (Tamiya 10440209) and rear (Tamiya 10440210) wheels with tyres:

Tamiya 54185 (front) and Tamiya 54186 (rear)
Tamiya 51240 (front) and Tamiya 51241 (rear)
Tamiya 53878 (front) and Tamiya 53879 (rear)

to match.

Other cheaper options also abound on eBay, but fitting may be more hit and miss with diameter, offset and interior space within the wheel for the axle stub/wheel hub.

Suspension

You can use pretty much any remote control car alloy strut that is between 95mm and 100mm long. I’ve not bothered looking into this so far as my dampers are in really good condition luckily. There are Tamiya rebuild kits available, but they seem very difficult to come across.

If mine do need replacement then I think I’d be looking for some generic aluminium dampers that are the correct length to suit the car. This looks to be an easier and cheaper way of fixing the suspension than buying replacement Tamiya parts - plus they look better than the yellow originals!

Conclusion

It is also worth pointing out here that some people have spent an awful lot of money improving their cars or replacing bits with up-spec parts. One build I saw was over $1k AUD and another around the $700 USD mark - so you can go nuts with aluminium, carbon and titanium. Set yourself a budget as all the tiny parts at $15-20 each very quickly add up too.

A few of the most hopped up cars I found whilst looking for information:

Some other places to get parts excluding the obvious:

Twokey’s RC Parts - Australia
Jason’s RC Store - Japan (cheap international postage)
rcMart - Hong Kong (cheap parts - reasonable postage)
Asiatees - Hong Kong
Stella Models - Hong Kong
Goldstar Stockists - Tamiya parts UK
Hobby King - (Australia, US, Hong Kong and UK warehouses)
Fusion Hobbies - UK
Modelsport - UK

https://www.simonholywell.com/post/2016/09/tamiya-manta-ray/

Brisbane

Simon Holywell Sep 2, 2016 Updated Nov 19, 2024

Way back in 2007 I arrived in London, England ready to start a new chapter of my life working in the big city. I’d left a good job at a web agency in Melbourne - one of the world’s most livable cities - to experience the financial capital of Europe.

It was a fantastic time and so much happened I could write a novel, but here are a few highlights.

During my time in England I met my now wife and decided to move to the south coast. Soon after she completed her degree we wed at the extravagant Royal Pavilion in Brighton.

On the way to a brief sojourn in Auckland, New Zealand we travelled through Europe by train and car before honeymooning in Phuket, Thailand.

Soon after flying back to England and setting up in Worthing (near Brighton) we welcomed the first of two children into the world. During this time we travelled to Zimbabwe where I met my wife’s parents for the first time.

We had our second child and bought our first house on a street in Worthing and did a little DIY.

Throughout this time I took up roles with a number of companies that finally culminated in becoming Technical Director at Mosaic Digital in Brighton.

Oh and yeah I wrote a book on Functional Programming in PHP.

After nearly nine years in the United Kingdom it was time to say goodbye and try something new. So for various reasons we decided to move to Brisbane in Australia.

We came via a motorcycle track day at Donington, a long weekend in Bruges, Belgium and a second wedding/vow renewal at Nesbitt Castle in Bulawayo, Zimbabwe.

Before leaving I’d organised a job with a company called Temando as a Lead Developer based here in Brisbane. I now work at a company called Aurion in Toowong as a Senior Developer primarily working with Node.js.

https://www.simonholywell.com/post/2016/09/brisbane/

Intelligent Vagrant and Ansible files

Simon Holywell Feb 8, 2016 Updated Nov 19, 2024

I use both Vagrant and Ansible to run and provision development virtual machines for testing work locally. This provides an easy to build environment as close to production as possible that all developers can easily create from the source code repository. A simple vagrant up and the associated Ansible scripts will handle all of the configuration and package installation for the VM.

This is unbelievably handy and it really helps to reduce the kind of bugs that are difficult to track down - “it works on my machine!”

Shared configuration

Recently, though, I got to thinking about how the configuration is bound up in a rather unhelpful Vagrantfile, which is in reality a Ruby script underneath. The same configuration details needed for the Vagrantfile will likely also be required by your provisioning scripts.

There are at least two ways to achieve this - each with their respective advantages and pitfalls. You can use a central file or pass the information as arguments to Ansible from the Vagrant provision commands. If you need to support machines that Ansible cannot run on then you’ll prefer the central configuration file as otherwise you need to pass the parameters in two locations. Using a bash script to support Windows machines is discussed further on.

Central configuration file

One way to work around this is to use a universal configuration file that both your provisioning scripts (Ansible, etc) and the Vagrantfile can read. The common thread between Ansible and Ruby (of course) is that they both parse YAML so a central config file is going to be the ticket. I am calling this file vagrant.yml and I have it sat at the same level as Vagrantfile in my projects.

In vagrant.yml you can have a structure like:

---
ip_address: 192.168.33.66
vm_name: example
server_domain: example.dev

From the Ruby script in Vagrantfile it is possible to parse the vagrant.yml configuration file and set the values against internal Vagrant options.

require 'yaml'
settings = YAML.load_file 'vagrant.yml'

Vagrant.configure("2") do |config|
  config.vm.network :private_network, ip: settings['ip_address']
end
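
One thing the bare YAML.load_file call will not do is tell you when a key is missing from vagrant.yml; the value simply comes back as nil and the failure surfaces later on. A minimal sketch of a stricter loader follows. The names check_settings, load_settings and REQUIRED_KEYS are my own and the list of required keys is an assumption based on the example file above.

```ruby
require 'yaml'

# Keys the examples in this article rely on (an assumption - adjust to suit).
REQUIRED_KEYS = %w[ip_address vm_name server_domain].freeze

# Raises a descriptive error if the parsed settings are unusable,
# otherwise returns them unchanged.
def check_settings(settings, source = 'vagrant.yml')
  raise ArgumentError, "#{source} must contain a YAML mapping" unless settings.is_a?(Hash)

  missing = REQUIRED_KEYS - settings.keys
  raise ArgumentError, "#{source} is missing: #{missing.join(', ')}" unless missing.empty?

  settings
end

# Stricter stand-in for calling YAML.load_file directly.
def load_settings(path = 'vagrant.yml')
  check_settings(YAML.load_file(path), path)
end
```

With those definitions at the top of the Vagrantfile, settings = load_settings would replace the YAML.load_file line and fail early with a readable message.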

You can also use these configuration details from an Ansible playbook by loading the file in a vars_files: directive. The variables will then become available in the global space. In the example code you can see the variables in use to define servername:.

---
- hosts: all
  sudo: true
  vars_files:
    - ../vagrant.yml
    - vars/common.yml
  vars:
    servername: "{{ server_domain }} www.{{ server_domain }} {{ ip_address }}"
    timezone: Europe/London
  roles:
    - init

Note that my Ansible configuration is in a subfolder hence the need to call the shared configuration with ../vagrant.yml.

Passed as arguments

Another way of having shared configuration between Vagrant and Ansible is to pass arguments from Vagrant into Ansible at provision time. This is done using the ansible API in your Vagrantfile and specifically the extra_vars property.

The sample code below illustrates how this might look in a simple Ansible backed Vagrant setup.

Vagrant.configure("2") do |config|
  ansible_inventory_dir = "ansible/hosts"

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "ansible/playbook.yml"
    ansible.inventory_path = "#{ansible_inventory_dir}/vagrant"
    ansible.limit = 'all'
    ansible.extra_vars = {
        vm_cores: cpus,
        vm_memory: mem,
        server_domain: servers['server_domain'],
        ip_address: servers['ip_address'],
        additional_server_domain_aliases: servers['additional_server_domain_aliases'],
        vm_user: settings['vm_user']
    }
  end
end

Just like the shared configuration these variables can be accessed in the global space of Ansible.

---
- hosts: all
  sudo: true
  vars_files:
    - vars/common.yml
  vars:
    servername: "{{ server_domain }} www.{{ server_domain }} {{ ip_address }}"
    timezone: Europe/London
  roles:
    - init

Dynamically create the Ansible inventory file

One aspect of projects that can be annoying to maintain or see committed into the project is the Ansible inventory file. Thankfully this can easily be automated from the Vagrantfile and the path dynamically set against Vagrant’s configuration.

In the code below the Ansible directory is set to a variable and then Ansible is set as the provisioning setup for Vagrant. This is all pretty much standard, but then the code moves on to handle the actual inventory file creation.

Vagrant.configure("2") do |config|
  ansible_inventory_dir = "ansible/hosts"

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "ansible/playbook.yml"
    ansible.inventory_path = "#{ansible_inventory_dir}/vagrant"
    ansible.limit = 'all'
  end

  # setup the ansible inventory file
  Dir.mkdir(ansible_inventory_dir) unless Dir.exist?(ansible_inventory_dir)
  File.open("#{ansible_inventory_dir}/vagrant" ,'w') do |f|
    f.write "[#{settings['vm_name']}]\n"
    f.write "#{settings['ip_address']}\n"
  end
end

It simply creates the directory if it doesn’t already exist and then opens the inventory file for writing whereupon it puts the machine name and IP address into the file. This is a simple way to save yourself a little work when creating new Ansible backed Vagrant projects.
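
The same idea can be pulled out into a small function, which makes it easier to try outside of Vagrant. This is a sketch rather than a drop-in replacement: write_inventory is a name I have made up, and it swaps Dir.mkdir for FileUtils.mkdir_p so that a missing parent directory (ansible/ in the example above) is created as well instead of raising an error.

```ruby
require 'fileutils'
require 'tmpdir'

# Writes a single-host Ansible inventory and returns the path to it.
def write_inventory(dir, vm_name, ip_address, filename = 'vagrant')
  FileUtils.mkdir_p(dir)               # also creates missing parent directories
  path = File.join(dir, filename)
  File.write(path, "[#{vm_name}]\n#{ip_address}\n")
  path
end

# Try it out in a throwaway directory rather than a real project.
Dir.mktmpdir do |tmp|
  path = write_inventory(File.join(tmp, 'ansible', 'hosts'), 'example', '192.168.33.66')
  puts File.read(path)
end
```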

Give the box all virtual cores and a quarter of the system’s memory

Another tip I have picked up is from Stefan Wrobel’s article How to make Vagrant performance not suck. He suggests an automatic method for determining the number of CPU cores available on your host machine and then giving the Vagrant box access to all of them. To further increase performance you can also have the Vagrantfile calculate and assign a quarter of available host system memory.

The Ruby code to perform this is reasonably self explanatory and uses the command line to establish the system resources.

Vagrant.configure("2") do |config|
  config.vm.provider :virtualbox do |v|
    v.name = settings['vm_name']

    # taken from http://www.stefanwrobel.com/how-to-make-vagrant-performance-not-suck#toc_1
    # assigns all available CPU cores and 1/4 of the host systems memory to the vm
    host = RbConfig::CONFIG['host_os']

    # Give VM 1/4 system memory & access to all cpu cores on the host
    if host =~ /darwin/
      cpus = `sysctl -n hw.ncpu`.to_i
      # sysctl returns Bytes and we need to convert to MB
      mem = `sysctl -n hw.memsize`.to_i / 1024 / 1024 / 4
    elsif host =~ /linux/
      cpus = `nproc`.to_i
      # meminfo shows KB and we need to convert to MB
      mem = `grep 'MemTotal' /proc/meminfo | sed -e 's/MemTotal://' -e 's/ kB//'`.to_i / 1024 / 4
    else # sorry Windows folks, I can't help you
      cpus = 2
      mem = 1024
    end

    v.customize ["modifyvm", :id, "--memory", mem]
    v.customize ["modifyvm", :id, "--cpus", cpus]
  end
end

Finally, using v.customize the values are set against the Vagrant configuration.
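
If you would rather not shell out to grep and sed, the Linux branch can be approximated in plain Ruby. This is a sketch of an alternative and not what the code above does: quarter_memory_mb is a hypothetical helper, Etc.nprocessors comes from Ruby’s standard etc library, and hosts without /proc/meminfo simply fall back to the same 1024MB default.

```ruby
require 'etc'

# Extracts MemTotal (reported in kB) from /proc/meminfo style text and
# returns a quarter of it in MB, or nil if the line is not present.
def quarter_memory_mb(meminfo_text)
  match = meminfo_text.match(/^MemTotal:\s+(\d+)\s+kB/)
  return nil unless match

  match[1].to_i / 1024 / 4
end

cpus = Etc.nprocessors                 # processors available to this machine
meminfo = '/proc/meminfo'
mem = File.readable?(meminfo) ? quarter_memory_mb(File.read(meminfo)) : nil
mem ||= 1024                           # same fallback as the Vagrantfile

puts "cpus=#{cpus} mem=#{mem}MB"
```

Keeping the parsing in a function that takes text also means it can be checked against a sample string without needing a Linux host.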

Provision without Ansible installed

It is possible to provision a Vagrant box on a system that doesn’t have Ansible installed by using a small shell script. This is the approach that phansible.com has taken and with some slight modification I have adopted.

The first step is to write some Ruby in the Vagrantfile that determines if Ansible is installed in the user’s path. If it is not then we should use the shell script as the provisioner.

# Check to determine whether we're on a windows or linux/os-x host,
# later on we use this to launch ansible in the supported way
# source: https://stackoverflow.com/questions/2108727/which-in-ruby-checking-if-program-exists-in-path-from-ruby
def which(cmd)
  exts = ENV['PATHEXT'] ? ENV['PATHEXT'].split(';') : ['']
  ENV['PATH'].split(File::PATH_SEPARATOR).each do |path|
    exts.each { |ext|
      exe = File.join(path, "#{cmd}#{ext}")
      return exe if File.executable? exe
    }
  end
  return nil
end

Vagrant.configure("2") do |config|
  if which('ansible-playbook')
    config.vm.provision "ansible" do |ansible|
      ansible.playbook = "ansible/playbook.yml"
      ansible.inventory_path = "#{ansible_inventory_dir}/vagrant"
      ansible.limit = 'all'
    end
  else
    config.vm.provision :shell, path: "ansible/windows.sh"
  end
end

This shell script will handle the base setup of the box before Ansible can run - installing Ansible’s dependencies, then Ansible itself, setting up SSH keys, linking the Ansible inventory and finally running the playbook locally on the box. This script targets Ubuntu/Debian Vagrant boxes, but it could be adapted for other POSIX systems.

#!/usr/bin/env bash
sudo apt-get update
sudo apt-get install -y python-software-properties
sudo add-apt-repository -y ppa:ansible/ansible
sudo apt-get update
sudo apt-get install -y ansible
cp /vagrant/ansible/hosts/vagrant /etc/ansible/hosts -f
chmod 666 /etc/ansible/hosts
cat /vagrant/ansible/files/authorized_keys >> /home/vagrant/.ssh/authorized_keys
sudo ansible-playbook /vagrant/ansible/playbook.yml --connection=local

Some non-config related tips

Handy plugins

In most of the configurations I prepare I also make use of vagrant-cachier and vagrant-hostsupdater. The former aims to prevent duplicate package downloads for a given Vagrant box so that subsequent provisions are faster. Hostsupdater will automatically add the IP address and host name of the project to your hosts file so that you don’t have to. Both of their configurations are pretty straightforward and dealt with on their respective project pages so I won’t duplicate effort here.

Moving the Vagrant and VirtualBox VMs to an external HDD

It is rare for a computer not to contain an SSD of some sort these days and they’re often set as the primary drive for the machine. This means that both Vagrant and VirtualBox will be storing their large boxes and VMs on your limited capacity (unless you’re ultra lucky) SSD. To free up space a USB 3.0 external HDD can really help without slowing down performance too much.

If you’ve already got a few boxes and/or VirtualBox VMs set up then this process can take some time as you will be copying large files - you may want to leave it overnight rather than for the length of a cup of coffee! It is a pretty simple process though.

For the sake of this example I am going to assume the external HDD is mounted at /media/simon/mydrive/ and you’ll need to substitute this for your drive as you follow along.

The first step is to move the Vagrant home directory to a new location on your external hard drive.

rsync -av ~/.vagrant.d/ /media/simon/mydrive/.vagrant.d/
echo 'export VAGRANT_HOME="/media/simon/mydrive/.vagrant.d"' >> ~/.bash_profile

We’ve also added the new location to your .bash_profile so that it will automatically be available when you boot your machine.
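If you run the echo line more than once, the export ends up in .bash_profile several times. A minimal sketch of a guard against that is below; append_once is a hypothetical helper name and not part of Vagrant.

```bash
#!/usr/bin/env bash
# Append a line to a file only when that exact line is not already present.
# append_once is a hypothetical helper for illustration.
append_once() {
    local line="$1"
    local file="$2"
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (not run here):
# append_once 'export VAGRANT_HOME="/media/simon/mydrive/.vagrant.d"' "$HOME/.bash_profile"
```

Calling it twice with the same line leaves a single copy in the file.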

With that out of the way the bulk of the copying is still to come! Open the VirtualBox application and then in Preferences set the Default Machine Folder to /media/simon/mydrive/VirtualBox VMs. Now to move your current VMs to the new location.

rsync -av "$HOME/VirtualBox VMs/" "/media/simon/mydrive/VirtualBox VMs/"

The next step may be unnecessary, but you can then re-open VirtualBox and remove any VMs that are showing as inaccessible. To re-add them you can simply run

find "/media/simon/mydrive/VirtualBox VMs/" -iname '*.vbox' -exec vboxmanage registervm '{}' \;
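Before registering anything it can be worth doing a dry run that only lists the .vbox files find would match. This is a sketch under the assumption that the VM folder name contains a space, as it does by default; list_vbox_files is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print every .vbox file under a directory, one per line, as a dry run
# before handing the same paths to vboxmanage registervm.
list_vbox_files() {
    # Quoting "$1" keeps "VirtualBox VMs" as a single argument, and quoting
    # '*.vbox' stops the shell expanding the pattern before find sees it.
    find "$1" -type f -iname '*.vbox' | sort
}

# Example (not run here):
# list_vbox_files "/media/simon/mydrive/VirtualBox VMs"
```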

Finally, you can now move your actual project directories to the external hard drive too and they’ll use the new locations for storage and access. They could also stay where they are as they are only small, so it is up to you!

https://www.simonholywell.com/post/2016/02/intelligent-vagrant-and-ansible-files/

Scraping websites with wget and httrack

Simon Holywell Sep 5, 2015 Updated Nov 19, 2024

Scrapes can be useful to take static backups of websites or to catalogue a site before a rebuild. If you do online courses then it can also be useful to have as much of the course material as possible locally. Another use is to download HTML only ebooks for offline reading.

There are two ways that I generally do this - one on the command line with wget and another through the GUI with httrack. By far the easiest if you want an entire site is the wget method so I’ll introduce that first.

I like to use the following command so that a browseable local copy is created. Two of the options that are useful to ensure this are --convert-links and --restrict-file-names=windows. The former converts any links into local relative URLs so that the site can be browsed locally and I am using --restrict-file-names for the purpose of ensuring safe file names. This is particularly relevant when the URLs you’re trying to scrape contain parameters.

wget -H -r --level=5 --restrict-file-names=windows --convert-links -e robots=off http://example.org

The rest of the options can easily be looked up on the wget manual page so I’ll leave that as an exercise for the reader to save some time.
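For quick reference, here is a sketch that rebuilds the same command one option at a time with a short note on each. It only prints the command rather than running it; scrape_command and its optional depth argument are hypothetical additions for illustration.

```bash
#!/usr/bin/env bash
# Build (but do not run) the wget command from above, one option per line.
# scrape_command is a hypothetical helper; the depth argument defaults to 5.
scrape_command() {
    local url="$1"
    local depth="${2:-5}"
    local cmd="wget"
    cmd="$cmd -H"                             # span hosts: follow links on to other domains
    cmd="$cmd -r"                             # recursive download
    cmd="$cmd --level=$depth"                 # maximum recursion depth
    cmd="$cmd --restrict-file-names=windows"  # escape characters that are unsafe in file names
    cmd="$cmd --convert-links"                # rewrite links so the copy is browseable locally
    cmd="$cmd -e robots=off"                  # ignore robots.txt
    printf '%s %s\n' "$cmd" "$url"
}

scrape_command http://example.org
```

Note that -H combined with a deep --level can pull in a lot of third party pages, so you may want to lower the depth when trying it out.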

For more complicated scrapes and those that require authentication in particular httrack is very handy. You do need a GUI though and it uses Chrome underneath to request the pages from what I can tell - at least for the WebHTTrack software for Linux. There is a Windows version (I have never used it) which seems to have a typical Windows application interface from the screenshots on their site so I can’t be sure what this is doing underneath.

The options for httrack are reasonably well documented on their site, but when you start it up it will walk you through a relatively straightforward wizard process anyway. Once it has started you’re able to pause and restart downloads, which is a nice feature, and it automatically fixes the URLs so the site is browseable locally.

As I read on the train and complete course work I find both methods very handy - they can also be used to download the videos!

https://www.simonholywell.com/post/2015/09/scrape-site-with-wget-and-httrack/

Crop and resize images with bash and ImageMagick

Simon Holywell Aug 24, 2015 Updated Nov 19, 2024

Not wanting to repeat myself I have written a small bash script to handle the parallel processing of the post images for this site. This involves resizing, cropping and then compressing the images ready for the web. Currently the script supports both JPEG and PNG images for all these operations.

On top of this I wanted to ensure that only recently added or modified images would be processed rather than processing the entire folder again. There is a handy option for touch that we’ll see later that makes this process much easier.

So let’s work through the bash script to slowly build it up into a working example. The first item on the agenda is to declare the hashbang for the script.

#! /usr/bin/env bash

Here we are using env to locate the bash executable - this should help to make the script more portable between systems rather than hard referencing /usr/bin/bash directly. Some systems might have bash in /bin/bash for example and using env will prevent this from breaking our script.

Now the script can begin in earnest by declaring a few variables to store the width and heights we want the final images to be. A temporary file path is also required to store the last run timestamp to prevent re-processing the same image twice.

TH_WIDTH=720
TH_HEIGHT=70

LG_WIDTH=720
LG_HEIGHT=480

TOUCHFILE="last.run.time"

Across the article I will refer to thumbnail, TH and list image interchangeably - same goes for large, LG and post image.

If the touch file doesn’t exist then we need to create it and specify the timestamp to use as its default. As I am tracking the entire project in git the last git commit date will do for the default date. This will prevent any already committed images from being run again.

if [ ! -f "$TOUCHFILE" ]; then
    # http://stackoverflow.com/a/19812608/461813
    LAST_COMMIT_TIMESTAMP=$(git show -s --format=%ct)
    # http://unix.stackexchange.com/a/36765/10219
    touch -d "@$LAST_COMMIT_TIMESTAMP" "$TOUCHFILE"
fi

There is one slight caveat here - if you clone the git repository then all the files will have a modification time of the clone date and not their original resize date. Therefore the resizing will be run against all images on initial clone. This is not an issue for me as I will rarely clone the repo - if it is for you then you could get the latest modification time across all the files and use that instead.
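A rough sketch of that alternative is below. It assumes GNU find (for -printf) and GNU touch (for -r); newest_file is a hypothetical helper name. Copying the timestamp from the newest file, rather than passing a number of seconds, avoids losing the sub-second part of the modification time.

```bash
#!/usr/bin/env bash
# Print the path of the most recently modified file under a directory.
# Assumes GNU find; newest_file is a hypothetical helper.
newest_file() {
    # %T@ is the modification time in seconds since the epoch, %p the path
    find "$1" -type f -printf '%T@ %p\n' | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Example (not run here): seed the touch file from the newest source image
# touch -r "$(newest_file src)" "$TOUCHFILE"
```

Because -newer is a strict comparison, no file in src will count as newer than a touch file seeded this way.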

All of the images we wish to resize are stored in a directory called src so we need to find all the files in there that have a more recent modification time than the touch file. find has a handy switch -newer that will allow us to easily locate them.

FILES=$(find src -newer "$TOUCHFILE" -iname '*.jpg' -or -newer "$TOUCHFILE" -iname '*.png')

This will find all files that are newer than the touch file and that have either .jpg or .png extensions. If there are any then we want to resize and crop them to the correct dimensions using ImageMagick’s convert utility. To complicate this we’re also going to use GNU parallel to process the images across processors.
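The same search can also be written with the two name tests grouped in escaped parentheses, so that -newer only appears once. This is just an equivalent sketch, not the form the script uses; changed_images is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# List JPEG and PNG files under a directory that were modified after a
# reference file. changed_images is a hypothetical helper.
changed_images() {
    # \( ... \) groups the name tests; -o is the portable spelling of -or
    find "$1" -type f -newer "$2" \( -iname '*.jpg' -o -iname '*.png' \) | sort
}

# Example (not run here):
# changed_images src "$TOUCHFILE"
```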

If you haven’t used parallel before it is probably worth checking out my other post to get an idea of the syntax and opportunities it provides.

To test that there are some files to process we can simply test it with the -n switch.

if [ -n "$FILES" ]; then
    # process the large images
    parallel -j8 convert "{}" -strip -resize "${LG_WIDTH}x${LG_HEIGHT}^" -gravity center -crop "${LG_WIDTH}x${LG_HEIGHT}+0+0" -filter catrom "t_post/{/}" ::: $FILES

    # process the image slices
    parallel -j8 convert "t_post/{/}" -gravity center -crop "${TH_WIDTH}x${TH_HEIGHT}+0+0" -filter catrom -extent "${TH_WIDTH}x${TH_HEIGHT}" +repage "t_list/{/}" ::: $FILES
fi

The cropping and resizing particulars can be researched in the ImageMagick manual so I won’t spend too much time covering them here. Note that the parallel utility uses the same syntax (pretty much) as xargs where the file names are passed into convert - as detailed in my previous post. Also note how $FILES is passed into the parallel command as an argument after the special ::: separator.

So in the first call to parallel you can see {} being used - that is the file name/path as it is passed back from find without modification. You’ll see it used elsewhere with {/}, which will be the same as {} except that it strips the preceding path from the argument before printing it (eg. /var/www/index.html becomes index.html). You can also strip the extension from the argument with {.} giving /var/www/index when fed /var/www/index.html. Finally you can also combine the two; {/.} produces index when given the same.
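If it helps to see what those replacement strings do without installing parallel, plain shell parameter expansion gives roughly the same results for a typical path. This is only an illustration of the idea, not how parallel implements it.

```bash
#!/usr/bin/env bash
path="/var/www/index.html"

file_name=${path##*/}         # like {/}  : strip everything up to the last slash
no_ext=${path%.*}             # like {.}  : strip the final extension
base_no_ext=${file_name%.*}   # like {/.} : strip both the path and the extension

printf '%s\n%s\n%s\n' "$file_name" "$no_ext" "$base_no_ext"
```

One difference to be aware of: the ${path%.*} form will also trim at a dot in a directory name when the file itself has no extension.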

As the thumbnail quality is less important than the actual large image I have cheated a little and performed the second crop and resize on the large image rather than re-cutting from the src. This has two purposes; it is quicker to process a smaller image and it means the image is already at the correct width.
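To make the resize step a little more concrete, the ^ at the end of the -resize geometry asks ImageMagick to treat the width and height as minimums, so the image is scaled until it fills the target area and the overflow is then cropped away. Here is a sketch of that arithmetic; fill_size is a hypothetical helper using integer maths rounded to the nearest pixel, and ImageMagick's own rounding may differ by a pixel.

```bash
#!/usr/bin/env bash
# Work out the size an image is scaled to by a "fill" resize (WxH^),
# before the centre crop is applied.
# Arguments: source width, source height, target width, target height.
fill_size() {
    local sw="$1" sh="$2" tw="$3" th="$4"
    if [ $((tw * sh)) -ge $((th * sw)) ]; then
        # the width needs the larger scale factor: match it, height overflows
        echo "${tw}x$(( (sh * tw + sw / 2) / sw ))"
    else
        # the height needs the larger scale factor: match it, width overflows
        echo "$(( (sw * th + sh / 2) / sh ))x${th}"
    fi
}

fill_size 1440 1080 720 480
fill_size 3000 1000 720 480
```

With the article's 720x480 target, a 1440x1080 source is scaled to 720x540 and then loses 60 rows in the crop, while a 3000x1000 source is scaled to 1440x480 and is cropped horizontally instead.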

So now we have resized and cropped both our large and thumbnail image - it is time to compress them. Before we get into that however now is a good time to go over the required dependencies and how to install them. I have wrapped them all up into an installation bash script that you can find at the end of this article too.

Handily some of the requirements can be obtained from Ubuntu/Debian’s repositories.

sudo apt-get install imagemagick optipng advancecomp parallel

This gives you the ImageMagick package to do the resizing and cropping, two PNG optimisation tools and GNU parallel to handle the multi-processor usage.
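A quick way to confirm the tools ended up on your PATH is to ask the shell for each one. The sketch below uses a hypothetical helper, missing_commands, that prints the name of any command it cannot find.

```bash
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
# missing_commands is a hypothetical helper.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example (not run here): an empty result means everything is installed
# missing_commands convert optipng advdef parallel
```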

Compressing JPEGs nicely takes a little more work as we must manually compile the dependencies here - not at all hard I promise! To facilitate compilation we need to install some build tools from the repositories.

sudo apt-get install build-essential autoconf pkg-config nasm libtool git

With these in place we can turn our attention to mozjpeg which sits under our final library jpeg-archive.

git clone https://github.com/mozilla/mozjpeg.git
cd mozjpeg
autoreconf -fiv
./configure --with-jpeg8
make
sudo make install
cd -

Now that has been built and installed it is possible to get jpeg-archive up and running with another simple build script.

git clone https://github.com/danielgtaylor/jpeg-archive.git
cd jpeg-archive
git checkout 2.1.1
make
sudo make install
cd -

After the dependencies are available we can get on with the process of compressing the resized and cropped image files. It is essential that different file types are handled differently here. You cannot compress a PNG with the same tools as a JPEG and vice versa. Additionally I want to compress the thumbnail/list images more than the large/post images.
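The script itself separates the types by running find once per extension, but the underlying idea is a simple dispatch on the file extension. As a sketch of that idea only (compressor_for is a hypothetical helper and the script does not contain it):

```bash
#!/usr/bin/env bash
# Name the compression route for a file based on its extension,
# ignoring case. The tool names are the ones used in this article.
compressor_for() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.jpg) echo "jpeg-recompress" ;;
        *.png) echo "optipng+advdef" ;;
        *)     echo "skip" ;;
    esac
}

compressor_for "t_post/photo.JPG"
compressor_for "t_list/diagram.png"
```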

Let’s begin with handling the JPEG results first.

JPOST_FILES=$(find t_post -newer "$TOUCHFILE" -iname '*.jpg')
JLIST_FILES=$(find t_list -newer "$TOUCHFILE" -iname '*.jpg')

The next step is to loop over these results in parallel and apply the compression tools we installed earlier.

if [ -n "$JPOST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality medium --min 60 "{}" "{}" ::: $JPOST_FILES
fi
if [ -n "$JLIST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality low --min 50 "{}" "{}" ::: $JLIST_FILES
fi

The above code uses jpeg-recompress from the jpeg-archive suite to perform the compression using the so-called smallfry algorithm/technique. As you can see the thumbnail/list and large/post images are handled separately and the options passed to jpeg-recompress for the list images are far more aggressive.

PNGs are simpler, because they’ve not got the same level of compression options. We’re going to use a PNG optimiser followed by a compressor/reducer (GZIP underneath essentially).

PNG_FILES=$(find t_post t_list -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$PNG_FILES" ]; then
    parallel -j8 optipng -o 3 -fix "{}" -out "{}" ::: $PNG_FILES
    parallel -j8 advdef --shrink-extra -z "{}" ::: $PNG_FILES
fi

Together these two utilities will shave something like 10% or so off of a PNG image in my limited experience with 10 or so images.

With all the actual operations now complete it just remains to update the last.run.time file to prevent the same images being run over twice.

touch "$TOUCHFILE"

Simple! So, yes, it took some work to get here, but you’ve now got repeatable and efficient image manipulation with a small and easily modified bash script.

To make it easier to copy and paste and verify your final result the full installation and resize scripts are included below.

resize.sh

#! /usr/bin/env bash
LG_WIDTH=720
LG_HEIGHT=480

TH_WIDTH=720
TH_HEIGHT=70

TOUCHFILE="last.run.time"

if [ ! -f "$TOUCHFILE" ]; then
    # http://stackoverflow.com/a/19812608/461813
    LAST_COMMIT_TIMESTAMP=$(git show -s --format=%ct)
    # http://unix.stackexchange.com/a/36765/10219
    touch -d "@$LAST_COMMIT_TIMESTAMP" "$TOUCHFILE"
fi

echo "Resizing in post images"
FILES=$(find src -newer "$TOUCHFILE" -iname '*.jpg' -or -newer "$TOUCHFILE" -iname '*.png')

if [ -n "$FILES" ]; then
    # process the large images
    parallel -j8 convert "{}" -strip -resize "${LG_WIDTH}x${LG_HEIGHT}^" -gravity center -crop "${LG_WIDTH}x${LG_HEIGHT}+0+0" -filter catrom "t_post/{/}" ::: $FILES

    # process the image slices
    parallel -j8 convert "t_post/{/}" -gravity center -crop "${TH_WIDTH}x${TH_HEIGHT}+0+0" -filter catrom -extent "${TH_WIDTH}x${TH_HEIGHT}" +repage "t_list/{/}" ::: $FILES
fi

# compress jpg images
JPOST_FILES=$(find t_post -newer "$TOUCHFILE" -iname '*.jpg')
JLIST_FILES=$(find t_list -newer "$TOUCHFILE" -iname '*.jpg')
if [ -n "$JPOST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality medium --min 60 "{}" "{}" ::: $JPOST_FILES
fi
if [ -n "$JLIST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality low --min 50 "{}" "{}" ::: $JLIST_FILES
fi

# compress png images
PNG_FILES=$(find t_post t_list -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$PNG_FILES" ]; then
    parallel -j8 optipng -o 3 -fix "{}" -out "{}" ::: $PNG_FILES
    parallel -j8 advdef --shrink-extra -z "{}" ::: $PNG_FILES
fi

echo " "
echo "Completed resize operation"
touch "$TOUCHFILE"

install.sh

echo "Installing imagemagick"
sudo apt-get install imagemagick

echo " "
echo "Installing optipng and advdef"
sudo apt-get install optipng advancecomp

echo " "
echo "Installing gnu parallel"
sudo apt-get install parallel

echo " "
echo "Installing mozjpeg"
sudo apt-get install build-essential autoconf pkg-config nasm libtool git
git clone https://github.com/mozilla/mozjpeg.git
cd mozjpeg
autoreconf -fiv
./configure --with-jpeg8
make
sudo make install

cd -

echo " "
echo "Installing jpeg-archive"
git clone https://github.com/danielgtaylor/jpeg-archive.git
cd jpeg-archive
git checkout 2.1.1
make
sudo make install

https://www.simonholywell.com/post/2015/08/image-resize-crop-bash-imagemagick/

Namespace PHP functions

Simon Holywell Aug 10, 2015 Updated Nov 19, 2024

With the release of PHP 5.3 namespaces became a reality in PHP and they’ve made so much possible including better autoloading. The majority of the time you’ll be used to seeing them at the top of each class file. They can also be used to namespace functions however.

A standard PHP namespace declaration would look similar to the following at the top of a class file.

namespace Treffynnon\Html;

class Tag {
    // ...
}

In this scheme you should not have multiple namespaces in the same file - generally frowned upon anyway as class files should really only include the one class declaration.

Namespaces can actually wrap code - rather than just being declared at the top of files. There is an unbracketed syntax too, but I don’t like it so I am not going to use it here. I much prefer the clear boundaries of the wrapping braces and some indentation.

namespace Treffynnon\Html {
    function get() {
        return '';
    }
}

You can also have multiple namespaces in the same file this way although I would not recommend this in practice much like I don’t like multiple classes in the same PHP file.

namespace Treffynnon\Html {
    function get() {
        return '';
    }
}

namespace Treffynnon\Utils {
    function get() {
        return '';
    }
}

If you want to dip back into the global namespace you can do so by specifying a namespace without an explicit name.

namespace Treffynnon\Html {
    function get() {
        return '';
    }
}

namespace {
    function get() {
        return '';
    }
}

I have found this especially helpful in Drupal or WordPress where you have hook functions that get called from/in the global namespace. These functions can be treated as entry points with the bulk of the actual code inside a module specific namespace.

namespace {
    use MyProject\MyModule as M;

    function my_module_menu() {
        return M\get_menu();
    }
}

namespace MyProject\MyModule {
    function get_menu() {
        return [
            // some Drupal menu implementation goes here
        ];
    }
}

This is a bit of a flimsy example, but you can see how it can be used to isolate your code from the huge global space of Drupal. It is more helpful with hooks that change the page or node or, in the case of data, a view. WordPress is similar in this regard so I won’t go into more detail.

I have written an addendum to this post that covers importing and aliasing functions with PHP’s namespace keywords (use and as).

https://www.simonholywell.com/post/2015/08/namespace-php-functions/

SQL style guide

Simon Holywell Jul 24, 2015 Updated Nov 19, 2024

When you’re working in a team you need ways to easily share and denote good style and taste. This is true of your primary programming language with PEP8 for Python and PSRs 1 & 2 for PHP being well known. There is probably even a style guide for HTML and CSS set out at your company. So why should SQL miss out on the party?

I have written a style guide for SQL to promote a consistent code style ensuring legible and maintainable projects - sqlstyle.guide.

SELECT a.title, a.release_date
  FROM albums AS a
 WHERE a.title = 'Charcoal Lane'
    OR a.title = 'The New Danger';

There are so many variant SQL styles that projects and people use, which can make code difficult to read. Looking over various questions on Stack Overflow (one of which was mine!) I noticed that there were elements of good style that were shared by most examples.

I figured it was time that SQL had a concise and easy to read style guide that could easily be adopted and/or modified for bespoke requirements.

It is trivial to apply this style to your projects now or going forward. In the case of PHP you could have some code like the following.

$year = filter_input(INPUT_GET, 'year', FILTER_SANITIZE_NUMBER_INT);
$db = new PDO(
    'mysql:host=localhost;dbname=testdb;charset=utf8',
    'username',
    'password'
);
$statement = $db->prepare("
SELECT r.last_name,
       (SELECT MAX(YEAR(championship_date))
          FROM champions AS c
         WHERE c.last_name = r.last_name
           AND c.confirmed = 'Y') AS last_championship_year
  FROM riders AS r
 WHERE r.last_name IN
       (SELECT c.last_name
          FROM champions AS c
         WHERE YEAR(championship_date) > :year
           AND c.confirmed = 'Y');
");
$statement->bindParam(':year', $year, PDO::PARAM_INT);
$statement->execute();
$rows = $statement->fetchAll(PDO::FETCH_ASSOC);

To produce the guide I settled upon using GitHub Pages, Jekyll and Markdown sources. This means it is very easy to make forks, open issues and pull requests as GitHub Pages will handle the hosting and site build process. It is released under the Creative Commons Attribution-ShareAlike 4.0 International License.

The style in this guide is explicitly designed to be compatible with Joe Celko’s book SQL Programming Style so that teams who have already read that book will find the guide easy to adopt.

To read the guide you can simply visit sqlstyle.guide and to access the sources you can find the repository on GitHub.

If you like the guide please consider sharing it with your team and via twitter - thanks!

https://www.simonholywell.com/post/2015/07/sql-style-guide/

International PHP dates with intl

Simon Holywell Jul 21, 2015 Updated Nov 19, 2024

I wrote about localising dates (and other data) in a recent blog post, but unfortunately there were some shortcomings where time zones were concerned. As I alluded to in that post there is a way around this via the Intl extension that exposes a simple API to format DateTime instances.

Thankfully this follow up post will be quite short as the setup is very simple. Those of you on Ubuntu/Debian can use the repositories.

sudo apt-get install php5-intl

For other distributions you can use PECL to install Intl after installing its dependency, ICU (on Red Hat this is as simple as yum install libicu libicu-devel.x86_64).

pecl install intl
echo "extension=intl.so" >> /etc/php.ini

Don’t forget to restart your webserver (eg. Apache) to make the new extension available to PHP.

Now it is installed you can use the extension to format dates in code.

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES',
    IntlDateFormatter::FULL,
    IntlDateFormatter::FULL
);
echo $IntlDateFormatter->format($DateTime);
// martes, 21 de julio de 2015, 14:11:11 (Hora de verano británica)

In this example I am telling the IntlDateFormatter that I want to use the Spanish locale (es_ES) to print out the full date and time. By changing the constants you can alter the format of the date that will be printed.

So if you just want the time you could create a formatter in the following manner.

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES',
    IntlDateFormatter::NONE,
    IntlDateFormatter::FULL
);
echo $IntlDateFormatter->format($DateTime);
// 14:15:04 (Hora de verano británica)

There are a few constants that can be passed into both the second (date) and third (time) parameters of the IntlDateFormatter constructor to change output format.

IntlDateFormatter::NONE - exclude this element from display
IntlDateFormatter::SHORT - shortest format (22/07/2007)
IntlDateFormatter::MEDIUM - abbreviated format (Jul 22, 2007)
IntlDateFormatter::LONG - unabbreviated format (July 22, 2007)
IntlDateFormatter::FULL - full date information

Another example using the long format with the Spanish locale:

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES',
    IntlDateFormatter::LONG,
    IntlDateFormatter::LONG
);
echo $IntlDateFormatter->format($DateTime);
// 21 de julio de 2015, 14:25:41 GMT+1

So now you have seen how to change the format of the date we can look at the timezone aspect of the equation. Normally it uses the default PHP timezone, but this can be specified as the fourth parameter of the IntlDateFormatter constructor. You can pass in an instance of DateTimeZone or IntlTimeZone or a timezone string such as Europe/London or GMT+10.

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES',
    IntlDateFormatter::FULL,
    IntlDateFormatter::FULL,
    'Australia/Yancowinna'
);
echo $IntlDateFormatter->format($DateTime);
// martes, 21 de julio de 2015, 23:03:30 (Hora estándar de Australia central)

Incidentally, you can pass other values into the format() method too and not just an instance of DateTime. You can also pass an IntlCalendar instance, the number of seconds since the Unix epoch (01/01/1970 00:00:00) or an array compatible with localtime().

By default Intl uses the Gregorian calendar but it can also make use of other calendars by specifying a fifth parameter in the calls to IntlDateFormatter constructor. So by default the previous example would include a calendar specification like the following.

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES',
    IntlDateFormatter::FULL,
    IntlDateFormatter::FULL,
    'Australia/Yancowinna',
    IntlDateFormatter::GREGORIAN
);
echo $IntlDateFormatter->format($DateTime);
// martes, 21 de julio de 2015, 23:12:43 (Hora estándar de Australia central)

Should you wish to use another calendar it can be specified as part of the locale. In the following example I have decided to use the Buddhist calendar.

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES@calendar=buddhist',
    IntlDateFormatter::FULL,
    IntlDateFormatter::FULL,
    'Australia/Yancowinna',
    IntlDateFormatter::TRADITIONAL
);
echo $IntlDateFormatter->format($DateTime);
// martes, 21 de julio de 2558 BE, 23:16:08 (Hora estándar de Australia central)

Notice that I have changed the code for the locale (first parameter) to es_ES@calendar=buddhist and the calendar (fifth parameter) to IntlDateFormatter::TRADITIONAL. You could also use the Islamic calendar with:

$DateTime = new DateTime();
$IntlDateFormatter = new IntlDateFormatter(
    'es_ES@calendar=islamic',
    IntlDateFormatter::FULL,
    IntlDateFormatter::FULL,
    'Australia/Yancowinna',
    IntlDateFormatter::TRADITIONAL
);
echo $IntlDateFormatter->format($DateTime);
// martes, 5 de Shawwal de 1436 AH, 23:23:10 (Hora estándar de Australia central)

The calendars ICU allows you to play with include:

Japanese (@calendar=japanese)
Buddhist (@calendar=buddhist)
Chinese (@calendar=chinese)
Persian (@calendar=persian)
Indian (@calendar=indian)
Islamic (@calendar=islamic)
Hebrew (@calendar=hebrew)
Coptic (@calendar=coptic)
Ethiopic (@calendar=ethiopic)

So there you have it; localised time zone aware dates with PHP on multiple calendar types. If the provided formats suit your application then this is a simple way to ensure your date and time information is readable in various locations and languages.

https://www.simonholywell.com/post/2015/07/international-php-dates-with-intl/

PHP date localisation with setlocale

Simon Holywell Jul 20, 2015 Updated Nov 19, 2024

Localising sites can be a chore, but PHP has the venerable setlocale() to use system locales. These are like templates or profiles that describe how various types of data should be displayed. Should a price have a comma or point to indicate the decimals? When printing a date should PHP output Monday or Montag?

All of these considerations are locale specific and they map to a geographical area. Various cultures have their own standards for displaying this kind of information not to mention different languages to accommodate. This is why the locale name first specifies the language (en - English) and then the geographic location (GB - Great Britain).

The full locale name would look like en_GB to match ISO639 and RFC1766.

To find the list of locales that your system has available you can run a simple Linux command.

locale -a

This will return a simple list of locale files; something like the following:

C
C.UTF-8
POSIX
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8

If the locale you need is not there then you will need to install it on your machine. In my case I wanted to use a Spanish locale (es_ES) and as you can see from the list above it is not currently available.
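Rather than scanning the list by eye you can filter it. The following is a minimal sketch that reads a locale -a style list on standard input; has_locale is a hypothetical helper that accepts the locale with or without an encoding suffix such as .utf8 or .UTF-8.

```bash
#!/usr/bin/env bash
# Succeed when the wanted locale appears in a `locale -a` style list on
# standard input, with or without an encoding suffix.
# has_locale is a hypothetical helper.
has_locale() {
    # -E: extended regex, -i: ignore case, -q: no output
    grep -Eiq "^$1(\.utf-?8)?$"
}

# Example (not run here):
# locale -a | has_locale es_ES || echo "es_ES is not installed"
```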

First check if your system can install the required locale by browsing through the supported locales with:

less /usr/share/i18n/SUPPORTED

Having spotted the locale you wish to install you can now compile it from the sources on your machine.

sudo locale-gen es_ES es_ES.utf8

Alternatively, if you’re on Ubuntu, you can install it from the package repository using apt-get.

sudo apt-get install language-pack-es

Where es at the end is the language you want to install.

You can now run locale -a again and you will see your new locale available in the returned list. A further test you can do before moving to PHP is to try the locale out from the command line. Firstly though you need to take a note of your current locale:

echo "$LC_ALL"

Make sure you keep the locale that is printed to hand as you’re going to want to switch back to it later.

To change the default locale to es_ES on the command line you would first set the value of the LC_ALL environment variable.

export LC_ALL=es_ES

Now we can do a simple test by printing out the current month and ensuring it is using our desired locale.

date +%B

The month should have been printed out in the language of the current locale - in my case this is Spanish so it would be julio.

Now that is tested we can switch back to your original locale using that same export method from before.

export LC_ALL=en_GB

In my case I am switching back to en_GB, but you will want to substitute that for the locale you noted down earlier - you did note it down didn’t you?
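If you would rather not change and restore LC_ALL by hand, you can apply the locale to a single command instead. This is a sketch using env; with_locale is a hypothetical helper name, and the current shell's locale is left untouched.

```bash
#!/usr/bin/env bash
# Run one command under a different locale without changing the current shell.
# with_locale is a hypothetical helper; env sets LC_ALL for that command only.
with_locale() {
    local loc="$1"
    shift
    env LC_ALL="$loc" "$@"
}

# Example (not run here): print the month name using the Spanish locale
# with_locale es_ES date +%B
```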

This is great - we’ve now got a working locale that we can use from PHP to localise our content. (You may need to restart the webserver if you’re using one).

My concern here is to change the locale for dates (LC_TIME), but you can find out the other constants by looking at the relevant PHP manual page. For the remainder of this article I will concentrate on the time.

The first thing we must do is set the locale against the time.

setlocale(LC_TIME, 'es_ES');

Now to actually print a date you will need to make use of the strftime() function, which takes the same format specifiers as the date command we were using from the command line earlier.

echo strftime('%B'); // junio

You should note that the de facto PHP date handling functionality does not respect the prevailing locale like strftime() does.

echo date('F'); // June

The same is true of PHP’s DateTime classes.

$date = new DateTime();
echo $date->format('F'); // June

One way around this is to pass the date out as a timestamp from DateTime and then format it using the strftime() function:

$date = new DateTime();
echo strftime("%B", $date->getTimestamp()); // junio

This is fine, but it does mean that you’re going to miss out on the time zone power of the DateTime library and have to revert to using date_default_timezone_set() to set the current time zone instead. By doing this you will miss out on the time zone conversion trick I wrote about previously: Convert UTC to local time.

Additionally strftime() does not support dates before the UNIX epoch (January 1, 1970). If these caveats are not a concern then setlocale() can provide a helpful way of localising the output of your PHP code.

Finally, there is a big red warning on the PHP manual page for setlocale() that applies to those running multithreaded servers:

Warning The locale information is maintained per process, not per thread. If you are running PHP on a multithreaded server API like IIS, HHVM or Apache on Windows, you may experience sudden changes in locale settings while a script is running, though the script itself never called setlocale(). This happens due to other scripts running in different threads of the same process at the same time, changing the process-wide locale using setlocale().

Another way around this is to use the Intl extension for PHP with its IntlDateFormatter class, but that is another post!

https://www.simonholywell.com/post/2015/07/php-date-setlocale-localisation/

Simultaneously benchmark many URLs with ApacheBench and GNU parallel

Simon Holywell Jun 25, 2015 Updated Nov 19, 2024

Once in a while you come across situations where someone wants to know what a server can do or how many requests it can handle under a realistic load scenario. It could simply be that you want to hit a large selection of sites or even that you want to simultaneously hit a number of different pages on the same site.

In my case I am testing the performance of a Drupal multisite installation where one core set of code is shared by many sites on different URLs. I wanted to find out how many simultaneous requests the server would be able to handle when key URLs in each of the sites were interacted with. In production the respective load on each site was estimated to be approximately the same, which made it easier as I could just replicate the same scenario on each site/URL.

This can be difficult to achieve because you need to simulate traffic across a number of website URLs and many of the benchmarking tools, including ApacheBench, do not support this. Whilst ApacheBench can perform concurrent requests to one URL it cannot do the same across a number of URLs or domains.

A way to work around this limitation is to make use of Ole Tange’s GNU parallel utility. When piped a list of arguments, parallel will execute a command against them concurrently (generally limited by the number of physical CPUs where one job maps to one processor). This means you can take any crusty Linux utility that runs serially and turn it into a concurrently executed task with minimal effort. (If you are using parallel with gzip though you might prefer pigz instead.) On top of that it is also possible to farm out the parallel processing to other machines if you have them.

If you are paying attention then you will probably have noticed that I just described a rudimentary botnet. You could take a server of limited resources offline in a DDoS (Distributed Denial of Service) style attack using the method I am going to describe here. Please only use this knowledge to interact with networks and hardware you own and be responsible. Of course this is not the most efficient or practical means of performing such an attack anyway so you would be wasting your own time as well as your target’s time.

Note

You could easily bring your own site offline or piss off your hosting provider so do not actually execute this against a URL without being sure of the consequences first. Basically, do not run it against your production server!

So now I have the obligatory warning out of the way we can get on with the good stuff.

If you are running Ubuntu (like me) or Debian then GNU parallel is really easy to install (other distros may be easy too - I have not tested).

sudo apt-get install parallel

In addition, if you have not already installed it, you will have to install ApacheBench (often known simply as ab).

sudo apt-get install apache2-utils

Should you have a number of servers you want to network the jobs out to then perform the same installation steps on them too. If you do not install parallel on the other hosts then the process will only have access to one CPU core. It is also important to note here that parallel uses SSH to communicate with the other machines so you will want to set up password-less login on those machines - I previously wrote about this in Securing SSH with Key Based Authentication.

A simple ApacheBench test might be to make 100 requests with up to 10 of those occurring concurrently at any one time. This simple test should be easy for most webservers to shrug off.

ab -n 100 -c 10 "http://www.example.org/"

You will be given a report back from ApacheBench containing all the vital stats of the benchmark run. As you can see this gives you concurrent tests on one URL or domain so this is where parallel will step in to parallelize the benchmarking process.

(echo "http://www.example.org/"; echo "http://www.example.com/") | parallel 'ab -n 100 -c 10 {}'

This command pipes two URLs into the parallel utility which then fires up a process running ab -n 100 -c 10 against each of the URLs. The results will be printed to screen in the order that the jobs are completed and not necessarily in the order the URLs are specified. This is normal and to be expected when working with parallel implementations, but it might seem strange at first.

You might be wondering what the weird {} near the end of the command means and why it might be needed. parallel uses the same syntax as xargs for handling the argument substitution in the command. In this case when the ab command is run parallel will substitute the token {} for the URL it has been passed. There are a number of other options and things you can do such as substitution without a file extension {.}, which are described on the manual page.

It is not strictly necessary in the commands we are doing here as “…[where] the command line contains no replacement strings then {} will be appended to the command…”, but I like to be explicit should I later want to add any other arguments or options to the command.
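The {} substitution can be tried out safely before any server is touched. The sketch below uses xargs as a stand-in, since it is installed almost everywhere and shares the token, with echo in place of a real run. It illustrates the substitution only and is not how parallel works internally; preview_commands is a hypothetical helper name.

```shell
#!/bin/sh
# preview_commands is a hypothetical helper: it reads URLs on stdin and
# prints the ab command line that would be built for each one.
# echo stands in for actually running ab, so no requests are made.
preview_commands() {
    xargs -I {} echo "ab -n 100 -c 10 {}"
}

printf '%s\n' 'http://www.example.org/' 'http://www.example.com/' | preview_commands
# ab -n 100 -c 10 http://www.example.org/
# ab -n 100 -c 10 http://www.example.com/
```

GNU parallel also has its own --dry-run option, which prints the commands it would run instead of executing them.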

If you want to maintain the order then you can simply pass parallel the -k switch as an argument.

(echo "http://www.example.org/"; echo "http://www.example.com/") | parallel -k 'ab -n 100 -c 10 {}'

As this is not really important though further examples will omit this parameter for brevity.

Another option you may wish to tweak is the number of jobs you want parallel to run concurrently using the -j option. Normally this would be mapped to the number of cores you have available on your machine (-j+0), but you can change it as you see fit. The following code would be limited to just two concurrent processes.

(echo "http://www.example.org/"; echo "http://www.example.com/") | parallel -j2 'ab -n 100 -c 10 {}'

What if you have more than two URLs to test though? You could keep chaining calls to echo for each one, but that would be a pain. The easiest method is to put all your target URLs into a text file with each URL on a new line like my URLs.txt file below.

http://example.org/
http://www.example.net/
http://tools.example.com/
http://secure.example.tk/my-passwords.html

This can then easily be passed into parallel through the use of the cat utility.

cat URLs.txt | parallel 'ab -c 10 -n 100 {}'
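A hand-maintained URL list tends to pick up blank lines and comments, and as far as I am aware ab rejects a URL that has no path after the host name. A small filter in front of parallel can tidy both up. This is only a sketch: clean_urls is a hypothetical helper name and the rules it applies are assumptions about what a typical list needs.

```shell
#!/bin/sh
# clean_urls is a hypothetical helper: it reads a URL list on stdin and
# writes a tidied list to stdout.
clean_urls() {
    # 1. Drop blank lines and lines that start with a # comment marker.
    # 2. Append a trailing slash when nothing follows the host name.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' |
        sed -E 's,^(https?://[^/?#]+)$,\1/,'
}

printf '%s\n' 'http://example.org' '' '# staging list' 'http://www.example.net/about' | clean_urls
# http://example.org/
# http://www.example.net/about
```

It then slots into the pipeline in place of cat: clean_urls < URLs.txt | parallel 'ab -c 10 -n 100 {}'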

Results for all four URLs will then be printed to screen. To speed the whole thing up we can employ other machines to handle some processing as I mentioned previously. This can be performed with a simple list of IP addresses or hostnames provided to parallel with the -S option. Do not forget that parallel communicates via SSH so you will need to set up password-less access to each of these servers before continuing - I previously wrote about this in Securing SSH with Key Based Authentication.

cat URLs.txt | parallel -S 192.168.0.215,192.168.0.99,: 'ab -c 10 -n 100 {}'

The colon (:) at the end of the list specifies that I also want the job to run on the local machine.

To make the process more transparent it is also possible to get parallel to generate an ETA and some server usage statistics with the --eta switch.

cat URLs.txt | parallel --eta -S 192.168.0.215,192.168.0.99,: 'ab -c 10 -n 100 {}'

This will cause parallel to output some additional information breaking down the percentage of jobs each server processed and the time it took.

ETA: 33s 8left 4.23avg  192.168.0.215:2/4/31%/14.0s  192.168.0.99:2/4/31%/10.1s  local:4/9/28%/6.2s

To increase the load on the server you can either add extra URLs to the text file or you can adjust the options passed to ab. In the following example -c and -n are increased tenfold.

cat URLs.txt | parallel --eta -S 192.168.0.215,192.168.0.99,: 'ab -c 100 -n 1000 {}'

Now you have seen how parallel can help you perform many benchmarking requests against many URLs using ApacheBench I will throw in a little bonus. parallel can be used to run all the jobs on all the available computers so it could actually be used to roll out a change to all machines as part of system orchestration. It is not designed to do this, but it can! Another use might be performing a benchmark on all machines to determine the best.

To make our earlier example with ApacheBench run all jobs on all available machines it is as simple as:

cat URLs.txt | parallel --onall --eta -S 192.168.0.215,192.168.0.99,: 'ab -c 100 -n 1000 {}'

There is one design decision that Ole made here though; if you use --onall and also specify -j the value passed to the latter will be used to determine the number of machines to log in to in parallel and not the number of jobs to run in parallel. This is an important distinction that the manual describes thus:

Run all the jobs on all computers given with -S. GNU parallel will log into -j number of computers in parallel and run one job at a time on the computer. The order of the jobs will not be changed, but some computers may finish before others.

Ole has also written about it on StackOverflow:

You are hitting a design decision: What does -j mean when you run --onall? The decision is that -j is the number of hosts to run on simultaneously (in your case 2). This was done so that it would be easy to run commands serially on a number of hosts in parallel.

To work round it he suggests wrapping the call to parallel with another call to parallel, which in our example would look like:

cat URLs.txt | parallel parallel -j2 --onall --eta -S 192.168.0.215,192.168.0.99,: 'ab -c 100 -n 1000 {}'

There is so much more you can do with both ApacheBench and GNU parallel so you should have a quick look over their respective manuals and resources too.

https://www.simonholywell.com/post/2015/06/parallel-benchmark-many-urls-with-apachebench/

Memoization or function cache

Simon Holywell May 18, 2015 Updated Nov 19, 2024

A little-known feature of PHP’s static keyword is that it allows for memoization or function caching. This is a process whereby a function’s heavy lifting can be cached so that subsequent calls are faster.

It is possible to store any value in a memoized way such as arrays or even objects. This is done without any external side effects - that is to say that the code calling the function will require no changes to support memoization.

This can be illustrated in the following example:

<?php
function get_static_rand() {
    static $rand = null;
    if(is_null($rand)) {
        $rand = rand();
    }
    return $rand;
}
echo get_static_rand(); // 985873932
echo get_static_rand(); // 985873932

As you can see, each time the function is called it will return the same value, but just how does this work? Well let’s run down from the top.

First up a static variable is declared ($rand) and this is the hidden secret that performs all the magic. It causes PHP to cache the value of the variable between calls to the function.

Next we set a value against the variable by calling rand(). Note that this only occurs if the $rand variable is null. Without this check the value would be changed on every call to get_static_rand().

Finally $rand is returned. It is clear from this example that you can implement this without changing the function’s external API and in a way that subsequent calls reference cached information.

In contrast the same function without memoization would look something like this:

<?php
function get_rand() {
    return rand();
}
echo get_rand(); // 1487005861
echo get_rand(); // 1262820787

Here is another simple example that takes a slowish call to read a JSON encoded data set from a local file and parses it. The file does not change between function calls so it is an ideal candidate for memoization.

<?php
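function get_json_index($index) {
    static $json = null;
    if(is_null($json)) {
        // Decode to an associative array so keys can be looked up below
        $json = json_decode(
            file_get_contents('stock_codes.json'),
            true
        );
    }
    if(is_array($json) && array_key_exists($index, $json)) {
        return $json[$index];
    }
    // Unknown index or unreadable data
    return null;
}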

As you can see, subsequent calls to get_json_index() will use the pre-chewed data held in the $json variable.

This effectively eliminates multiple calls to file_get_contents() and json_decode(). Of course in real production code you would include more checks before opening the file, such as: does it exist? Is it actually JSON?

Through memoization you would remove the overhead of these checks during subsequent calls to get_json_index() without the surrounding code needing to worry about the caching strategy used.

Whilst I have shown examples of functions, this technique can also be used in exactly the same way on class methods, although that is not demonstrated here.

Although a simple technique it is very often overlooked or unknown. If you only need to cache a value in a function for the current execution cycle (on a web server this is one request) then there is no need to get complicated.

Caching mechanisms utilising NoSQL datastores such as memcached or redis or even the venerable APC can be overkill in many cases. The caveat to bear in mind however is that the variable is only accessible in and from the current execution cycle.

On the next cycle the value will be reset and other processes will have no access to it. So if you have many parallel workers operating on the same cache then this is not the method for you.

One final example I will include involves caching a function based on its parameters. A common place that this might be useful is when you’re caching database queries that vary depending upon the function’s parameters. In this case we might like to cache the blog posts by tag as they’re used in a listing and in a latest posts widget on the same page.

<?php
function get_blog_list($tag) {
    static $_c = array();
    if(!array_key_exists($tag, $_c)) {
        $_c[$tag] = get_blogs_by_tag($tag);
    }
    return $_c[$tag];
}

You can probably think of a number of other ways that this simple pattern can be used to speed up operations. Either way it is a handy technique both for functional and object oriented programmers.

It is also worth a quick note, even though it is obvious, that whilst this technique will give you a faster running site it will consume more memory so don’t go memoizing everything!

note

If you’re interested in more functional techniques like this then check out my book: Functional Programming in PHP. A guide to advanced and powerful functional programming techniques in your favourite language.

https://www.simonholywell.com/post/2015/05/2015-05-18-memoization-or-function-cache/

PHP Function Objects

Simon Holywell Apr 16, 2015 Updated Nov 19, 2024

It is possible to treat a class instance as a function in PHP. Quite often this is referred to as a functor even though it should really be known as a function object. This is because functors actually serve a different role in languages that support their use.

The convenience of having a reusable function that can be overloaded and carry a context is something to weigh up against using functions or closures.

<?php
class MyFunctionObject {
   public function __invoke($name) {
       echo "My name is $name.";
   }
}
$MyFunctionObject = new MyFunctionObject;
echo $MyFunctionObject('Simon'); // My name is Simon.

The usual method for creating autoloaded functions is to create them as static methods on a class. This does work, but of course carries with it no context.

<?php
class MyStatic {
    public static function myName($name) {
        echo "My name is $name.";
    }
}

echo MyStatic::myName('Simon'); // My name is Simon.

Coming at it from another angle you can also bind closures to class instances. This then allows you access to the class’s context from within your closure. Of course you miss out on the autoloading in this case. Also it may make the code harder to read if the binding is not obvious.

<?php
class MyContextClass {
    public $context = 'My name is ';
}
$my_function = function($name) {
    echo $this->context . $name;
};
$MyContextClass = new MyContextClass;
// bindTo() returns a new closure rather than modifying the original
$my_function = $my_function->bindTo($MyContextClass);

$my_function('Simon'); // My name is Simon

So as you can see a function object is a happy medium that allows you the best of both. The main thing to note is that you can only have one function per object.

An instance of a function object can be passed into anything expecting a callable and it will be executed just like any other function. They also allow you to alias any function object using PHP’s namespace syntax:

<?php
use MyFunctionObject as F;

This is a small benefit that you cannot currently get from any other method already discussed. PHP has a pending patch to make this possible with regular functions as well.

There are at least two downsides to function objects however, the first of which is the most annoying. You cannot currently call the function inline without creating the instance.

<?php
$f = new F;
echo $f();

There is currently an RFC and patch targeted at PHP 5.6 that will allow for the following syntax:

<?php
echo new F();

Additionally function objects can be frowned upon as they are not self documenting. Every __invoke() API can be different making predictable use difficult.

In my research most IDEs are not able to give parameter hints for function objects either. Additionally, once an object is instantiated it is not clear whether it is a function object or not. Without documentation it can be difficult for an implementer to know whether to call it via the function notation or to continue using an OOP approach.

So there are a number of reasons that this feature is so underused, but it certainly does have its uses. Now you know about it and how it works you may have the killer blow for that problem that has been nagging you.

note

If you’re interested in more functional techniques like this then check out my book: Functional Programming in PHP. A guide to advanced and powerful functional programming techniques in your favourite language.

https://www.simonholywell.com/post/2015/04/php-function-objects/

Functional Programming in PHP - The book

Simon Holywell Aug 29, 2014 Updated Nov 19, 2024

After working hard on the guide to Functional Programming in PHP I am pleased to announce that it has been published by php[architect]! The book is officially now available and you can purchase your very own copy!

If you’re a programmer who wants fewer bugs and easier testing then this is the functional introduction for you. Throughout the chapters I gently lead you through the various functional constructs available in and with PHP.

You will witness the power of map/reduce and the ease of composition monads can bring to your code.

Even if functional isn’t your bag then (as any functional aware programmer will tell you) there is much you can apply from functional to OOP code. It gives you a new and beneficial perspective on coding be it procedural or object oriented.

Of course functional style code and object oriented code can also co-exist in PHP just as they can in Scala. You can use a functional approach where it makes sense to you and an object oriented one where it is an advantage in PHP.

Functional and event driven programming are here to stay - buy the book now and get ahead of the curve.

Increasingly computing power is being delivered through the use of multiple cores in CPUs, which require code that can run in parallel to fully exploit their power. Functional programming provides a way to reduce the difficulties that surround concurrent code. As computing requirements increase multi-process programming will be unavoidable - even in PHP.

This book provides a way for developers to gain an understanding of functional programming in their own language. Functional programming is a Turing complete methodology, which means that anything you can achieve in procedural or object oriented code you can also complete in a functional style.

By the end you will have seen and implemented a number of core functional concepts and patterns leaving you empowered to include them in your own projects.

For more information please see the Functional PHP book website or buy the book through php[architect] or Amazon.

https://www.simonholywell.com/post/2014/08/functional-programming-in-php-the-book/

Functional Programming on Three Devs and a Maybe

Simon Holywell Jul 24, 2014 Updated Nov 19, 2024

I was recently invited to join Edd and Michael on the Three Devs and a Maybe podcast to discuss functional programming. The recording of our chat is now available so head on over and have a listen.

If you haven’t listened to the podcast before there are some 34 past episodes archived there as well!

note

If you are interested in finding out more about my book on functional programming in PHP or to subscribe for notification of its release please visit functionalphp.com or follow @FunctionalPHP on Twitter.

https://www.simonholywell.com/post/2014/07/2014-07-24-three-devs-and-a-maybe-podcast/

HHVM vs Zephir vs PHP: The showdown

Simon Holywell Feb 28, 2014 Updated Nov 19, 2024

Since its inception the slow running speed of PHP has been widely publicised and over the years there have been a number of improvements. The first Zend Engine arrived with PHP4 and delivered various performance enhancements (among other features). Each release since this time has delivered some sort of increased efficiency in one way or another.

It has become more interesting recently however with three projects looking for improvements in different ways. The core has adopted the Zend OPcache for future versions of PHP, Facebook has been working on a just in time compiler called HipHop VM and the team that brought us Phalcon framework have created Zephir.

All of these projects have chosen to tackle the issue of PHP’s speed via different avenues. It has therefore left one simple question - who’s making the biggest improvements? Who’s the fastest?

With this question in my mind I decided to do something quite ridiculous and write a simple benchmarking setup to test the various ways these projects can be employed. Yes, there is one outright winner in this particular benchmark, but it is important not to get hung up on that.

Like I mentioned, all of these techniques are different and therefore they are likely to be a better fit for differing situations. Although one of them wins on outright speed in this particular test, it may not work for you from another perspective. Each carries with it certain side effects or caveats that you’ll need to take into consideration.

You should take all factors into consideration and additionally bear in mind that all benchmarks are flawed. The only way to truly test it out is to use real algorithms in a production environment. By using this benchmark code I am focusing on one particular and simplified problem.

Of course, as in this case, it is not always reasonably possible to port a sufficiently complex and realistic problem to all benchmark targets. So it is typical to pick a trivial, but computationally intense problem that can easily be implemented in all of the benchmark subjects.

Now that I have addressed some of the fundamental assumptions of benchmarks; let’s meet the contenders!

The contenders

HHVM

We’ll begin by introducing Facebook’s PHP runtime, which has been receiving a lot of attention recently. The project is interchangeably known as HHVM, HipHop VM and less frequently HipHop Virtual Machine.

Facebook originally created HipHop VM to replace HPHPc which was their PHP to C++ compiler. They also sought to speed up their application infrastructure through the use of just in time compilation. More recently they have put a lot of effort into improving compatibility with existing PHP libraries (including Idiorm and Paris).

This means that it is now possible to run many PHP applications directly on HHVM and take advantage of its JIT (just in time) compiler to increase the speed of code. There are still some rough edges and some aspects that will not work, but there are regular commits on the project from the core team. It would seem that eventually HHVM will become a fully compatible parallel runtime for PHP code.

PHP 5.5 and OPcache

PHP and Zend have also been busy trying to make PHP faster and with the advent of OPcache they have shaved between 10% and 20% off the running time of PHP processes. It is a modern replacement for the bytecode cache features of the APC PECL extension. Unlike APC it doesn’t have the userland memory key/value store and it is entirely focused on the caching and optimisation of PHP code.

I am not au fait with the techniques that these caches use, but reading through the available documentation I found the following high level explanation. Zend OPcache offers increased “performance by storing precompiled bytecode in shared memory” to reduce reads from disk. Additionally it will apply a number of “bytecode optimization patterns” to the code decreasing execution times.

Zephir

In a separate effort and taking a different direction the team behind the Phalcon framework for PHP have been working on Zephir. Phalcon is a web application framework written in C and bound as a PHP extension with the aim of being the fastest framework for PHP. It is worth mentioning here that Phalcon certainly is not the first to take this approach with Yaf existing before its inception.

With the pursuit of speed however there are usually some trade-offs and in the case of Phalcon it is difficult for PHP developers to understand the source code in the framework. This hampers adoption and also makes it difficult to encourage community contributions to the project. So they’re in the process of re-writing Phalcon on top of their own language called Zephir.

Zephir is a fairly simple language that is sort of a mixture of PHP and some aspects of C. When compiled Zephir converts its code into the lower level C underpinnings and the resulting library is installed as a PHP extension. In some ways it is a little bit like a cross between a PECL extension and Facebook’s old HPHPc project.

The work

This benchmark uses the Mandelbrot Set fractal as its algorithm of work for no particular reason other than the fractals look pretty when compiled. It is reasonably intensive computationally and does take a little bit of time for all test subjects to complete.

I did not actually write most of the code that implements the Mandelbrot Set as this was already available from The Computer Language Benchmarks Game under its BSD licence.

All the C based tests are extended from the C gcc #2 base programme and the PHP oriented languages (including Zephir) are based on the PHP programme.

I did however make a number of changes - for example the code now writes to a stream rather than directly to STDOUT and it can produce ASCII art interpretations or Portable Bitmap binary files. In these benchmarks all the programmes are set to create ASCII and print the result out to STDOUT.

To test out the various ways of creating faster PHP I have implemented the algorithm in the following ways:

Plain C
Plain PHP
PHP Extension (just like PECL)
HHVM’s HACK/PHP++/PHQ
HHVM Extension
Zephir CBLOCK (C code dumped inside Zephir)
Zephir Optimiser (C code accessible to Zephir - kinda like an extension)
Plain Zephir Lang

If you’re interested in finding out more about the code used to produce these sets of statistics then you can check out the repository for the code on GitHub.

The command line problem

note

Sara Golemon has graciously got in touch to help me cover an important caveat in the benchmarking in this article. If you didn’t already know Sara is on the HHVM team at Facebook and she has written PHP internals, extensions, articles and the Extending and Embedding PHP book.

She has prepared the following section that describes some of the possible short-comings in my method of benchmarking these PHP run times.

tl;dr: most PHP is run in the server where start up and shutdown times do not directly affect the run times in the way they do in CLI tests like I have used.

Every test run in this suite was executed on the command-line with a fresh process environment for each. This inherently biases the results in favor of pre-compiled solutions first, and strongly against a multi-threaded JIT based approach. In the real world, using a long-running webserver, these startup costs would disappear in the noise and we could focus solely on the per-request time.

In the case of PHP, the script must be recompiled to bytecode on every invocation since the bytecodes are not saved to disk. Worse, in fact, with an OpCache forcibly enabled (which it’s not normally for CLI), we must then make a second copy of those bytecodes into shared memory (shared only with ourselves) before execution can begin.

Similarly, in the case of HHVM, forcibly turning on the JIT incurs extra startup cost (though with a runtime benefit) since the script must be compiled from PHP to bytecode*, then from bytecode into machine code. For short-running scripts, the extra compilation time is often worse than the JIT savings, so it’s disabled from the command line by default.

A proper comparison of these technologies would require a warmed up multi-request environment, probably with each running as a FastCGI server using a basic fcgi client over unix socket to reduce the overhead of making the request.

Bottom line, these results are like most benchmarks: Only as good as the methodology.

* Normally HHVM caches the bytecode compilation to disk, however this test suite may be negating that due to the cache being inadvertently deleted or overwritten.

note

If anyone wants to help improve the tests then please submit pull requests on the GitHub project. This project was a way for me to play with all the elements and I was under no illusion that it would be perfect; there is certainly potential for improvement.

As I tried to make clear in this article you shouldn’t take these results as empirical or even as givens. Benchmarks test isolated things and will not be directly applicable to your situation. Unless you’re calculating the Mandelbrot set, these results are but a possible indication of a trend.

The fight

So how does PHP 5.5 with OPcache compare with Facebook’s Hip Hop VM or Phalcon’s Zephir project?

From here on out I will be addressing this question purely in terms of outright speed, measured in seconds elapsed. During the benching I did grab other statistics from the processes, but I will leave these for later discussion.

When actually performing the benchmarks I used a very simple system to account for variance between runs of the same code. Instead of just running each set of code once I ran them for 1, 20 and 40 iterations. This then allowed me to take an average of the results and hopefully arrive at a fairer figure.

In addition to the tests that have already been mentioned I also tested each HHVM item with the JIT on and off, and did the same for PHP and its OPcache.

The machine I am running the benchmarks on is an Intel® Core™ i3 CPU 530 @ 2.93GHz x 4 with 8GB of RAM and an installation of Linux Mint as the OS. The versions of the relevant software are PHP 5.5.8, Zephir (fc08fab1e - Feb 3 2014) and HHVM (55212b92 - Jan 21 2014).

To time each run I simply used the Linux utility time, which can also gather other information such as processor load for the task and memory usage.

The results are in

To be completely honest the results are not really shocking or that different to what you would expect, I imagine. Going into this testing I already had an order in my mind and an idea of the orders of magnitude that might exist between the various techniques. Needless to say I was so close to right as to make this benchmark almost entirely pointless.

With this particular benchmark Zephir is the outright fastest, followed by HHVM and then PHP where they are all set to use optimum speed. If you would like to see all the statistics up close then you can check out the graphs I prepared using D3 or read the raw CSV data dumps.

Based on this I would make the following loose recommendations:

You need outright speed = standard PHP C extension
You need non-C programmers to help maintain it = Zephir Lang
You cannot port away from PHP but can install other runtimes = HHVM
Your only deployment path is PHP = setup and enable the OPcache

As I mentioned previously, none of this is actually that much of a shock or a departure from the recommendations I already had in my mind. Additionally, you should of course do your own testing/benching using your specific domain model and not just take my word for it!

The caveats

I should point out here that Zephir lang is not as simple as it may seem and at times the syntax can make problems harder to express. I also found myself regularly having to look at the compiled C source code output to debug the operations within my code. This is something that would be difficult for someone with no prior knowledge of C or the PHP extension architecture.

Zephir is a nice project and it does bring with it a number of performance advantages, but I would agree with the project maintainers that it is not a general purpose language (at least not yet). For the time being it is more focused on the issues it was built to solve in the Phalcon project.

One of the unexpected outcomes came from HHVM, with its JIT-free run coming dead last. According to Facebook, HHVM without the JIT compilation should run at the same speed as PHP. In my testing it was much, much slower.

It is therefore worth noting here that by default HHVM does not JIT scripts when it is run from the command line. So if you do find yourself regularly running HHVM CLI scripts then don’t forget to set the appropriate flag: -vEval.Jit=1.

Running your own benchmarks

If you want to gather your own statistics in the same way that I have done above then you can use my code. It is all up on GitHub under a liberal 3-clause BSD licence.

In the repository you’ll find a handy readme that goes some way to explaining how it all works and how to compile and run all the code yourself. In some of the subdirectories there are also readmes that are related solely to that technique.

It would be pretty easy to add your own benchmarking algorithm or test out various other techniques using the loose benching “framework” I have thrown together here.

Should you find any bugs or have any suggestions for improvement then please report them on the repository issue tracker.

https://www.simonholywell.com/post/2014/02/2014-02-28-hhvm-zephir-php-benchmark/

Functional PHP talks

Simon Holywell Feb 13, 2014 Updated Nov 19, 2024

I was recently invited to speak about functional programming in PHP for both BrightonPHP and PHP Hampshire. The details of which are in a previous blog post.

If you attended either talk and you’ve yet to leave feedback then please do on the respective Joind.in pages:

Brighton PHP joind.in page
PHP Hampshire joind.in page

You can view the slides from the sessions on my website. I created the slides using reveal.js and my pandoc boilerplate project.

The main take-aways from the talk are:

Avoid state
Keep things small
Make reusable components
Separate logic from data

These principles apply no matter whether you’re writing in a functional style or an object oriented one. Learning functional programming is a great idea even if you do not intend to use it day to day. Things you learn in a functional paradigm are directly applicable to your object oriented code too.

You should follow @BrightonPHP and @PHPHants to keep up to date with these meetups in the future.

I enjoyed presenting at both events and it was great to meet you all!

note

If you are interested in finding out more about my book on functional programming in PHP or to subscribe for notification of its release please visit functionalphp.com or follow @FunctionalPHP on Twitter.

https://www.simonholywell.com/post/2014/02/functional-php-talks/

Speaking about Functional PHP at BrightonPHP and PHP Hampshire

Simon Holywell Jan 9, 2014 Updated Nov 19, 2024

I have been invited to speak at both the upcoming meetings of BrightonPHP and PHP Hampshire about functional programming. This is off the back of the site I created for my (soon to be released) book tentatively entitled Functional Programming in PHP.

To get a better idea of what the talk will include I have prepared an abstract:

In the PHP world functions are generally sneered at due to their simplicity and perceived as an evil side effect of spaghetti code. This is not necessarily the case however as when functions are combined in a logical manner they can be very powerful.

In fact they can be deployed to great effect in all manner of applications to create advanced and potentially less error prone software.

This talk will take the form of a gentle introduction to functional programming concepts in a PHP context. It will cater to a variety of levels of knowledge, from those who have never heard of functional programming to coders who have been practising aspects of it for years in other languages (JavaScript!) - perhaps without even knowing.

During my talk you’ll hear some history, functional theory (introduced gently I promise) and of course some practical examples. You definitely do not need to be a mathematician or expert/functional coder to enjoy this session.

Both are not-for-profit PHP user groups set up on the south coast of England that organise free-to-attend monthly meetups.

Brighton - January

The BrightonPHP talk is first up on Monday the 20th of January 2014 at 19:00 and will be held at The Skiff in the North Laine area of Brighton (easily accessible by train). To register your attendance for the evening please visit the associated Lanyrd page and there is also a joind.in page for the talk.

You should follow @BrightonPHP to keep up to date with this meetup in the future.

Portsmouth - February

On Wednesday the 12th of February I will be speaking at the PHP Hampshire user group; again starting at 7pm. The group meets at the Oasis Conference Centre on Arundel Street every month. You can claim your free spot by going to the EventBrite page for the meetup and there is also another joind.in page for the PHP Hampshire talk.

You should follow @PHPHants to keep up to date with this meetup in the future.

I look forward to presenting at both events and hopefully I will see you there!

note

If you are interested in finding out more about my book on functional programming in PHP or to subscribe for notification of its release please visit functionalphp.com or follow @FunctionalPHP on Twitter.

https://www.simonholywell.com/post/2014/01/speaking-at-brightonphp-and-php-hampshire/

Add a duration or interval to a date

Simon Holywell Jan 3, 2014 Updated Nov 19, 2024

In PHP you can easily add a duration to a DateTime instance in a number of ways. I will review the most common methods for completing the task starting with those available on the DateTime object itself.

If you are running PHP 5.2 then the only way to achieve this is to call the modify method. This allows you to pass in a date format, but in this case we are most interested in the relative formats.

So if we want to add one day to a date we would use the format +1 day, like so:

<?php
$date = new DateTime('2014-01-03');
$date->modify('+1 day');
echo $date->format('Y-m-d'); // 2014-01-04

PHP date and time handling was significantly improved in PHP 5.3+ to include dedicated add (and subtract) methods on DateTime. Instead of simply passing a text format, however, you must use a DateInterval to represent the duration.

In addition the format is slightly different as it follows the ISO 8601 duration specification. All intervals should begin with a P (period) and include an integer representation of the interval followed by a period designator such as D for day.

This would simply be P1D for our 1 day time period. A more complex example would be P4Y1M2D, which is 2 days, 1 month and 4 years.

A handy addition to the format specification is the availability of W for weeks such as P4Y1W, which works out as 1 week and 4 years. You should note that, before PHP 8.0, W cannot be combined with days because W units get converted into days.

You can follow this with a T if you wish to include a time period. Just as with the date you include an integer for the interval period followed by a period designator.

Should you want to make our period 1 day and 2 hours then the format would be P1DT2H. To make it more interesting here is a full format P4Y1M2DT1H2M3S that works out as 2 days, 1 month, 4 years and 1 hour, 2 minutes and 3 seconds.

Of course it is also possible to only specify the time portion of the interval. This allows you to add periods of an hour or even seconds to a DateTime instance. Logically this is specified as PT1H2M3S for an interval of 1 hour, 2 minutes and 3 seconds.

One final note on the formatting; you must specify the units in descending order where you are using more than one period designator. Essentially this means that larger units such as years (Y) should come before smaller units like seconds (S).

With this in mind we can see that P4YT2S and P4Y1D are valid formats whereas PT2S4Y and P1D4Y are not.
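To check how PHP reads one of these strings you can construct a DateInterval and inspect its public properties. The following is a small sketch using only the standard DateInterval and DateTime classes; the variable names and the fixed UTC date are assumptions made for illustration.

```php
<?php
// Parse a full date-and-time duration and inspect its parts
$interval = new DateInterval('P4Y1M2DT1H2M3S');
printf(
    "%dY %dM %dD %dH %dM %dS\n",
    $interval->y, $interval->m, $interval->d,
    $interval->h, $interval->i, $interval->s
); // 4Y 1M 2D 1H 2M 3S

// A time-only duration leaves the date parts at zero
$time_only = new DateInterval('PT1H2M3S');

// Apply the full duration to a fixed UTC date
$date = new DateTime('2014-01-03 00:00:00', new DateTimeZone('UTC'));
$date->add($interval);
echo $date->format('Y-m-d H:i:s'), "\n"; // 2018-02-05 01:02:03
```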

It is worth pointing out here that you can use the same relative formats as the DateTime::modify() method if you prefer, by using the DateInterval::createFromDateString() static method (see PHP documentation).
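As a brief sketch of that alternative, the relative string below is turned into a DateInterval and then added in the usual way. The two week duration and the UTC time zone are arbitrary choices for the example.

```php
<?php
// Build the interval from a relative format instead of an ISO 8601 string
$interval = DateInterval::createFromDateString('2 weeks');

$date = new DateTime('2014-01-03', new DateTimeZone('UTC'));
$date->add($interval);
echo $date->format('Y-m-d'), "\n"; // 2014-01-17
```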

So back to the original example of adding one day to our DateTime instance:

<?php
$date = new DateTime('2014-01-03');
$date->add(new DateInterval('P1D'));
echo $date->format('Y-m-d'); // 2014-01-04

You can also subtract intervals from DateTime instances:

<?php
$date = new DateTime('2014-01-03');
$date->sub(new DateInterval('P1D'));
echo $date->format('Y-m-d'); // 2014-01-02

As you can see PHP provides a rich API for dealing with dates and times and their modification via simple instructions. It is also possible to adjust a DateTime instance’s time zone as I have previously blogged in Convert UTC/GMT or any time zone to local time in PHP.

Based on this I would point out that if you are still using the older procedural API for your PHP date operations then you are missing out on a lot of the power afforded you by the language’s standard library.

For a final tidbit I would recommend a little further reading about the DateInterval::format() method, which can also be handy when debugging intervals.
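To give a flavour of it, here is a short sketch that formats a hand-built interval and one produced by DateTime::diff(). The dates and the format strings are arbitrary examples rather than anything prescribed by PHP.

```php
<?php
// Format an interval that was created from an ISO 8601 string
$interval = new DateInterval('P1DT2H');
echo $interval->format('%d day(s), %h hour(s)'), "\n"; // 1 day(s), 2 hour(s)

// Format the difference between two dates; %R is the sign, %a the total days
$utc = new DateTimeZone('UTC');
$start = new DateTime('2014-01-03 00:00', $utc);
$end = new DateTime('2014-01-05 06:00', $utc);
$diff = $start->diff($end);
echo $diff->format('%R%a days, %h hours'), "\n"; // +2 days, 6 hours
```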

https://www.simonholywell.com/post/2014/01/add-a-duration-or-interval-to-a-date/

Reverse a git pull request on GitHub the hard way

Simon Holywell Dec 12, 2013 Updated Nov 19, 2024

As you may know I am currently the maintainer of both Idiorm and Paris; well recently I merged in what looked to be an innocuous pull request from a contributor. Unfortunately this merge had unintended consequences and basically broke the backwards compatibility of the library. Shame on me!

After waiting for a patch that would fix the problem and coming up short I decided enough was enough. So today I backed out the errant merges in both Idiorm and Paris.

Now it is important to note here that both repositories had commits and merges in their history post the introduction of this rogue pull request. In addition these changes had been out in the wild for quite some time.

With this in mind it would not be possible to simply amend the previous commit, rebase the changes or reset to a previous commit. These would either break the history or in the case of amend simply be impossible as HEAD had moved on.

To do this I used a technique that I could find little reference to online - a reverse patch application. This is actually very simple with GitHub as you can use an equally unknown feature to obtain a patch file from a pull request.

You can go to any commit or pull request in the GitHub web interface and append .patch onto the end of the URL. This will then spit out the full raw commit or pull request as a pull request patch file.

For example if you have a pull request at https://github.com/user/project/pull/123 then you should access https://github.com/user/project/pull/123.patch to gain access to the aforementioned patch file. So either download this via your web browser or use wget on the command line.

wget https://github.com/user/project/pull/123.patch

We can now use git to reverse apply the patch:

git apply -R 123.patch
git status

You will now see that the files affected by the patch are in a modified state and you can commit them in just like you would any other commit or patch application.

Once committed you can push the changes to your project’s GitHub repository without worrying about breaking its commit history or angering other contributors! This technique can of course be used with any git patch file and not just those obtained via GitHub.

note

This is not THE way to do this, but the hard way. This is more of a post to expose these two features as they are not so well known. You would of course usually perform this task using git revert with the merge commit. There are tonnes of posts out there about that though so what would be the fun in that!

https://www.simonholywell.com/post/2013/12/reverse-github-pull-request/

Convert UTC/GMT or any time zone to local time in PHP

Simon Holywell Dec 11, 2013 Updated Nov 19, 2024

Wrangling dates and times can be a somewhat arduous task for all programmers. One very common requirement is to convert a time from one time zone to another.

In PHP this is greatly simplified with the DateTime standard library classes and especially DateTimeZone.

For this example let us assume we have a UTC date and time string (2011-04-27 02:45) that we would like to convert to ACST (Australian Central Standard Time).

<?php
$utc_date = DateTime::createFromFormat(
    'Y-m-d G:i',
    '2011-04-27 02:45',
    new DateTimeZone('UTC')
);

$acst_date = clone $utc_date; // we don't want PHP's default pass object by reference here
$acst_date->setTimeZone(new DateTimeZone('Australia/Yancowinna'));

echo 'UTC:  ' . $utc_date->format('Y-m-d g:i A');  // UTC:  2011-04-27 2:45 AM
echo 'ACST: ' . $acst_date->format('Y-m-d g:i A'); // ACST: 2011-04-27 12:15 PM

After some experimentation between time zones that do and do not currently have DST (Daylight Saving Time) I have discovered that this will take DST into account.
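One way to see the DST handling for yourself is to convert the same UTC clock time on a winter date and a summer date. In this sketch the helper name utcToZone, the Europe/London zone and the two dates are my own choices for illustration.

```php
<?php
/**
 * Convert a UTC date/time string to the given zone.
 * Returns the local time with its time zone abbreviation.
 */
function utcToZone($utc_string, $zone_name)
{
    $date = new DateTime($utc_string, new DateTimeZone('UTC'));
    $date->setTimezone(new DateTimeZone($zone_name));
    return $date->format('Y-m-d H:i T');
}

echo utcToZone('2014-01-15 12:00', 'Europe/London'), "\n"; // 2014-01-15 12:00 GMT
echo utcToZone('2014-07-15 12:00', 'Europe/London'), "\n"; // 2014-07-15 13:00 BST
```

The summer date comes out an hour ahead because British Summer Time is in force, without any DST logic in the calling code.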

Anyway a very simple tip for moving dates around in PHP.

https://www.simonholywell.com/post/2013/12/convert-utc-to-local-time/

Conditionally loaded responsive content on the client side

Simon Holywell Nov 21, 2013 Updated Nov 19, 2024

Sometimes it is helpful to entirely change chunks of markup when a certain CSS media query is triggered. This could be because a certain layout will not work on smaller screen sizes or because it refers to media that you would not want a mobile to download.

Recently I had this exact problem when dealing with a site that included a grid based layout in a carousel/slider. When the browser was sized down from desktop it needed to change the number of columns in the grid. If a column is suddenly removed from the grid then where does the content residing in it go?

Well in this case it would slip to the next slide in the carousel, which then means that every slide in the carousel changes as all the items are reshuffled in the respective grids. This re-gridding of the content could not realistically be performed by JavaScript on the fly.

To work around this I would need a way to have three different versions (one for each supported responsive break point) of the carousel in the page and only show the one relevant to the current media query. Before you run off screaming, no, I did not fire up three carousels and have them waiting for instruction!

You can have hidden HTML/content inside your document that you then conditionally load in at the right point in time using JavaScript. Firstly, to be able to match against media query break points you will want to use window.matchMedia and if you are stuck supporting IE9 or below you can use Paul Irish’s polyfill.

Next up we need to hide the content that we will conditionally load. Now there are a few ways to do this, but I settled on so called JavaScript templates. They are really simple and just involve one script tag surrounding the content:

<script type="text/conditional-html" id="tablet-a-unique-reference">
  <p>I am a hidden block of HTML.</p>
</script>

You should note that this script tag is a little different to the ones you are probably used to writing. Instead of having a type of text/javascript it has a type of text/conditional-html. This is what causes the content to be hidden and not evaluated by JavaScript - it serves no other purpose.

Now let us actually go ahead and replace the content. For this demonstration I have assumed that we are moving from a desktop responsive breakpoint to a tablet one. The initial HTML loaded with the page is that of the desktop experience.

var div = document.getElementById("a-unique-reference"),
  desktop_html = div.innerHTML,
  tablet_html = document.getElementById("tablet-a-unique-reference").innerHTML;

The above code just grabs the HTML from the currently displayed <div> and also the content of the JavaScript template we setup above.

var checkAndChangeHtml = function () {
  if (matchMedia("screen and (max-width : 62.000em)").matches) {
    div.innerHTML = tablet_html;
  } else {
    div.innerHTML = desktop_html;
  }
};
window.addEventListener("resize", checkAndChangeHtml);
checkAndChangeHtml(); // call it once on document load in case it
// initially loads at a lower width/on a mobile device

Now this is where the action takes place. The code adds a listener to the resize event so that we can run our code when the user resizes their browser. If the width of the browser falls into our tablet media query range then the div has its content set to the tablet HTML.

You should wrap all of the above in the following to ensure that the DOM is ready (supports >=IE8):

document.onreadystatechange = function () {
  if (document.readyState == "interactive") {
    // code would go here
  }
};

Obviously if you prefer you can use jQuery to abstract away these event bindings.

So there you have it. A very simple way of conditionally loading content on the client side for use in responsive designs.

note

After writing this I discovered that someone had already released something called ResponsiveComments that is similar. It however uses more mark up as it wraps the hidden content in a <div> and then HTML comments (<!-- -->). I do not like this approach as it inserts block level items into your source code unlike the script tag example given here. Divs have layout and script tags do not.

https://www.simonholywell.com/post/2013/11/conditionally-loaded-responsive-content/

Idiorm and Paris 1.4.0 Released

Simon Holywell Sep 5, 2013 Updated Nov 19, 2024

After a lot of work and many contributions from valued community members Idiorm and Paris versions 1.4.0 have been released into the wild. You can download them now from their respective repositories or via Composer/Packagist.

As you are probably aware from my previous posts Idiorm is a minimalist ORM that is targeted at PHP 5.2 and above. It combines a fluent query builder with a simple ORM interface to allow fast and easy access to databases over PDO. On top of this can sit Paris, if you so choose, which acts as a lightweight active record implementation to allow for more defined functionality and models. Both Idiorm and Paris have shared the same minimalist philosophy since their inception with the key aim being to provide compact and simple, but powerful code. This makes for a smaller and easier to learn set of libraries with less namespace pollution (it supports PHP 5.2 remember).

This lightweight and simple philosophy has been challenged along the way with requests for ever more complicated features to be added to the libraries. Unfortunately this has led to some pull requests being summarily closed over the past year or so, but for good reason. It is this very reason that has led Jamie and myself to decide that both Idiorm and Paris are more than feature complete at this time. So many useful features have been submitted over the years and we have been able to adopt most of them - some of them are out there as forks.

So what am I alluding to? Idiorm and Paris will no longer receive new features going forward. Of course bug fixes will still be maintained along with support of the libraries via the respective GitHub issue trackers. So if you find any issues please report them or better yet if you have bug fixing patches please submit them (preferably with a regression test or two).

Should you wish to add features to either library or raise their minimum requirement above PHP 5.2 then you are more than welcome to create your own fork for this purpose. You can, if you like, open a pull request, but it is likely to be given short shrift if I am honest.

Anyway with that out of the way there are a few niceties that Idiorm 1.4.0 brings with it:

find_many() now returns an associative array with the database’s primary ID as the array keys
Calls to set() and set_expr() return $this allowing them to be chained
PSR-1 compliant camelCase methods can be mapped to the original underscore ones (PHP 5.3+ required due to __callStatic() usage)
Add get_config() to access configuration values
Support for a logging callback function allowing usage of any logging library
MS SQL TOP select query limit style support
And many smaller bug fixes

Paris 1.4.0 also comes with a few new features:

Ability to call methods against the model class directly eg. User::find_many() (PHP 5.3+ required)
find_many() now returns an associative array with the database’s primary ID as the array keys
PSR-1 compliant camelCase methods can be mapped to the original underscore ones (PHP 5.3+ required due to __callStatic() usage)
And a couple of smaller bug fixes

There you have it. Idiorm and Paris have matured!

https://www.simonholywell.com/post/2013/09/idiorm-and-paris-1-4-0-released/

Improve PHP session cookie security

Simon Holywell May 14, 2013 Updated Nov 19, 2024

The security of session handling in PHP can easily be enhanced through the use of a few configuration settings and the addition of an SSL certificate. Whilst this topic has been covered numerous times before it still bears mentioning with a large number of PHP sites and servers having not implemented these features.

To prevent session hijacking through cross site scripting (XSS) you should always filter and escape all user supplied values before printing them to screen. However some bugs may slip through or a piece of legacy code might be vulnerable so it makes sense to also make use of browser protections against XSS.

By specifying the HttpOnly flag when setting the session cookie you can tell a user’s browser not to expose the cookie to client side scripting such as JavaScript. This makes it harder for an attacker to hijack the session ID and masquerade as the affected user.

A helpful setting has been added to the PHP configuration to automate this process for you.

session.cookie_httponly = 1

It is also a good idea to make sure that PHP only uses cookies for sessions and disallow session ID passing as a GET parameter:

session.use_only_cookies = 1

It is important to point out that HttpOnly, whilst useful as another layer in the onion of security, is not going to protect a user from other forms of XSS attack. As previously mentioned on GNUCitizen, session hijacking is often avoided by attackers as it requires getting and keeping a user in the right state in the target application. You must ensure that the rest of your application is not XSS vulnerable to prevent attackers utilising other vectors.

So it is just one very small step in making your PHP installation slightly more secure, but if you are not doing it then you are failing to exploit all the avenues available to you.

Another important way to increase the security of PHP sessions in your application is to install an SSL certificate on the web server and force all user interactions to occur over HTTPS only. This will prevent the user’s session ID from being transmitted in plain text, making it much harder to hijack the user’s session.

Helpfully PHP has another ini setting to assist you in ensuring session cookies are only sent over secure connections (thank you to Padraic for reminding me):

session.cookie_secure = 1
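To make the effect of these settings more concrete, here is a rough JavaScript sketch of the kind of Set-Cookie header a browser ends up receiving. PHP builds the real header itself; the build_session_cookie() function and the session ID are hypothetical, and the exact formatting PHP emits may differ slightly. PHPSESSID is PHP's default session cookie name.

```javascript
// Illustrative only: PHP generates this header from the ini settings.
// build_session_cookie() is a hypothetical helper, not a PHP or Node.js API.
function build_session_cookie(session_id, options) {
  var parts = ["PHPSESSID=" + encodeURIComponent(session_id), "path=/"];
  if (options.secure) {
    // Corresponds to session.cookie_secure = 1: only send over HTTPS.
    parts.push("Secure");
  }
  if (options.httponly) {
    // Corresponds to session.cookie_httponly = 1: hide from client side scripts.
    parts.push("HttpOnly");
  }
  return "Set-Cookie: " + parts.join("; ");
}

console.log(build_session_cookie("abc123", { secure: true, httponly: true }));
// Set-Cookie: PHPSESSID=abc123; path=/; Secure; HttpOnly
```

With both flags present the browser should neither expose the cookie to document.cookie nor send it over a plain HTTP connection.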

If you liked this post then you’ll probably also like 3 things I set on new servers for more security tips.

https://www.simonholywell.com/post/2013/05/improve-php-session-cookie-security/

3 things I set on new servers

Simon Holywell Apr 23, 2013 Updated Nov 19, 2024

There are a number of things you can do to make a server more secure whilst protecting your hosted entities and their users. Here are just three of the many things I do on every new server I commission. I hasten to add that these are not necessarily the most effective or at the top of my list - they are just that: 3 things I set on new servers.

Prevent framing

One method of attempting to fool users into interacting with a website is by loading it into an iframe on the attacker's page. More formally known as clickjacking, this technique involves overlaying the iframe with a different user interface.

As you can imagine this makes it possible for the attacker to mask the actions of the user. They think that they are using the attacker supplied interface when in actual fact they are performing actions on the background application in the iframe.

There are a couple of techniques that can be employed to combat this threat and protect your users: one at the server level and one as JavaScript in the web pages themselves.

On the server you can set the X-Frame-Options header, which tells the web browser how to treat the page when it is framed. It is possible to set this header to DENY, which blocks all loading of the page via frames. By setting it to SAMEORIGIN you can relax the restriction and only allow framing by pages on the same domain.

On the Apache web server this directive requires mod_headers to be enabled and is set like so (on Debian/Ubuntu servers the main configuration file is /etc/apache2/apache2.conf):

Header always append X-Frame-Options SAMEORIGIN

Alternatively if you are using nginx then you can implement it in the following way:

add_header X-Frame-Options SAMEORIGIN;

Do not forget of course that Apache configuration changes require an Apache server reload or restart.

Unfortunately this header is only supported on more recent browsers with the following minimum requirements:

Firefox 3.6.9
Chrome 4.1.249.1042
Internet Explorer 8
Opera 10.50
Safari 4.0

In addition to the two aforementioned settings there is also ALLOW-FROM to whitelist a certain domain.

Header always append X-Frame-Options ALLOW-FROM http://foo.com

There are at least three problems with this particular option though:

It is only (currently) supported in Firefox 18 and above as it is a relatively new addition to the specification in 2012.
For some reason it was not foreseen that someone implementing this directive would want to list a number of domains to be allowed to frame the content so you can only specify one domain.
Why add a hyphen between the words when SAMEORIGIN does not? Maybe there is a technical reason, but from the outside it just creates an inconsistency.

When any of these options are set and an invalid framing occurs Firefox will show a blank page, but Internet Explorer rather more helpfully will display an error for the user.

Now for legacy browsers you will need to drop back to using JavaScript framebusting code. It goes without saying, however, that this can be circumvented by a potential attacker through techniques such as double framing and exploiting cross site scripting filters in some browsers.

if (top != self) {
  top.location = self.location;
}

The latest recommendation from The Open Web Application Security Project (OWASP) is to include the following code in the <head> section of your web page:

<style id="antiClickjack">
  body {
    display: none !important;
  }
</style>
<script type="text/javascript">
  if (self === top) {
    var antiClickjack = document.getElementById("antiClickjack");
    antiClickjack.parentNode.removeChild(antiClickjack);
  } else {
    top.location = self.location;
  }
</script>

This works by hiding the whole page using the CSS style at the beginning and then later on in the JavaScript checking to see that the page is not framed. If it is not framed the script removes the style from the page's HTML, thereby revealing the content. If it is framed then it attempts to set itself as the top level page.

It should be noted that if the user has JavaScript turned off or it does not run due to an error then they will not be able to see anything of the page.

HTTP TRACE

Apache responds to debugging requests via the HTTP TRACE method with the exact request that was received. In general this does not appear harmful, but it can be used to find out information about a user's request. An attacker will attempt to trick a user into making a TRACE request and then obtain information about session cookies or authentication from the response. The information can then be used to exploit other potential vulnerabilities such as cross site scripting.

It is possible to disable this helpful debug functionality in production by setting the TraceEnable Apache directive in the following fashion:

TraceEnable Off

Do not forget of course that Apache configuration changes require a server restart.

If you do not have access to the main Apache configuration then you can still block TRACE requests using mod_rewrite:

RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
RewriteRule .* - [F]

You will note the addition of a rule to prevent TRACK requests as well, which is a similar debugging method.

For the nginx users out there, you have no need to worry: from version 0.5.17 all TRACE requests are rejected with a 405 error and it has no concept of a TRACK request. To debug this server you are expected to have access to the logs on disk.

Hide your versions

Another super simple, but often overlooked, adjustment to make is to prevent the server from broadcasting too much information about itself. Whilst attackers may be able to source the information in other ways, the harder we make it the more likely potential attackers are to give up and move on to a softer target. Broadcasting version details is similar to introducing yourself to someone and giving them specific details about yourself such as “I rarely lock the back window when I pop into town”.

Firstly, let us silence the Apache server a little. Gag it with the following configuration changes:

ServerSignature Off
ServerTokens ProductOnly

nginx servers can be muffled by setting the following directives in the configuration (the second requires the HttpHeadersMoreModule to be installed first):

server_tokens off;

# Install HttpHeadersMoreModule first for this one
more_set_headers 'Server: Nginx';

Secondly, PHP is also being too noisy about its presence. Put a sock in it by editing the core php.ini file (typically /etc/php5/apache2/php.ini on Debian/Ubuntu servers):

expose_php = Off

Once again do not forget of course that Apache and PHP configuration changes require an Apache server restart.

Join in the conversation about this article on Hacker News or reddit.

If you liked this post then you might also like my follow up post Improve PHP session security.

https://www.simonholywell.com/post/2013/04/three-things-i-set-on-new-servers/

Install Netbeans and Scala on Ubuntu

Simon Holywell Mar 17, 2013 Updated Nov 19, 2024

If you want to install and run the latest version of Scala and/or Netbeans then you cannot simply install it from your distributions repositories or pre-built packages. It may sound easy enough to just grab Netbeans from their site and install it, but most Linux distributions no longer have Sun Java packages in their repositories.

So after a little bit of mucking about, reading manual pages and documentation I struck upon the following method of setting it all up. Fortunately it is very easy to install and I have used the following instructions to set up and configure Ubuntu or Ubuntu-like machines.

Installing Sun Java JDK

As I mentioned earlier most distributions no longer come with Sun Java and do not have the ability to include it in their extras/proprietary packages. This is caused by a change in the licensing of Java after Oracle bought out Sun. To compound this pain further Oracle do not produce packages for the various popular Linux variants.

Luckily a number of like minded people have set about creating custom packages for those of us that require Sun Java. On Ubuntu you can use the PPA produced by the team at webupd8.org to install Sun JDK.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

To test and ensure that your system is running the correct version of Java you can execute the following command:

java -version

The resulting output should look something like the following:

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

The webupd8.org team have a number of troubleshooting hints on their announcement article if you have some kind of trouble getting this PPA installed.

Installing Netbeans

Netbeans is nice and simple to install on Linux as it has a handy graphical wizard to lead you through the installation steps. I use the following commands to download and install it (7.3 is the latest version at the time of writing, but you can easily supply another release's download URL here):

cd ~
wget download.netbeans.org/netbeans/7.3/final/bundles/netbeans-7.3-linux.sh
chmod +x netbeans-7.3-linux.sh
./netbeans-7.3-linux.sh

The installation wizard will then start up after a moment and you can pretty much stick with the default options. This should mean that it will install Netbeans to a directory at /home/<user>/netbeans-7.3.

Of course if you are more comfortable with a GUI then you can simply download it from the Netbeans website in your web browser.

Installing Scala

Finally we get to Scala itself and thankfully it also has a reasonably simple set of setup instructions. You will need to download the latest release from the Scala website. Once again here are the commands I used to download and install the latest release of Scala on my machine:

wget http://www.scala-lang.org/downloads/distrib/files/scala-2.10.1.tgz
tar -xzf scala-2.10.1.tgz
sudo mv scala-2.10.1 /usr/share/scala

Now we need to tell Linux where to look when references to Scala are made so edit ~/.profile to add the following environment variables:

export SCALA_HOME=/usr/share/scala
export PATH=$PATH:$SCALA_HOME/bin

You will now need to log out and back in again for the environment setting to take effect.

As an aside and in case you did not already know; .bashrc is executed whenever a new bash session is started (such as opening a new terminal) and .profile is executed when a user logs in. Netbeans is not able to see environment variables set in .bashrc as it is run after the environment Netbeans is started in has been constructed. This is explained in more detail by a Netbeans issue report.

There is another way around this problem, but I prefer the .profile route as it is universally available. However if you wish you can set a Netbeans startup configuration variable called scala.home on the end of the netbeans_default_options setting in ~/netbeans-7.3/etc/netbeans.conf like so:

netbeans_default_options="-J-client -J-Xss2m -J-Xms32m -J-XX:PermSize=32m -J-Dapple.laf.useScreenMenuBar=true -J-Dapple.awt.graphics.UseQuartz=true -J-Dsun.java2d.noddraw=true -J-Dsun.java2d.dpiaware=true -J-Dsun.zip.disableMemoryMapping=true -J-Dsun.awt.disableMixing=true -J-Dscala.home=/usr/share/scala"

Installing nbscala

Now that Netbeans is successfully installed and it can see the SCALA_HOME environment variable we can install the nbscala plugin. The simplest method is to simply install the plugin using the Netbeans plugin manager, which can be found in Tools > Plugins > Available Plugins. Install all the available Scala plugins.

If you prefer a more manual route then you can download the latest nbscala from the project on SourceForge. It is then possible to manually install the plugin using the process documented on the Netbeans wiki.

Restart Netbeans and then go File > New Project... followed by choosing Scala > Scala Application and clicking Next. After giving it a name and location click Finish to be presented with your new application. With the example Main.scala file open in your main Netbeans pane hit F6 to compile and run the code.

You should be presented with a nice compile status message and the Hello World message in your output pane at the base of the Netbeans IDE.

The installation of all the tools is now complete and you can go ahead and begin developing with Scala!

https://www.simonholywell.com/post/2013/03/install-netbeans-scala-ubuntu/

Create a Google Talk bot with Node.js Part Two

Simon Holywell Mar 1, 2013 Updated Nov 19, 2024

In part one of the tutorial you built a bot with Node.js that could connect to the Google Talk network and announce its presence to other users with a status message. The bot was also configured to listen for subscription requests from other users and automatically accept them.

Now you are going to further enhance the bot with additional functionality and commands as you proceed through part two of the tutorial. Once complete your bot will be able to furnish users with help information, bounce messages back and search twitter for user supplied keywords.

So let's get back to the hacking!

note

This article was originally published in the April 2012 issue of .net Magazine. You can also download a PDF of the original article.

Part one of this article is also published on my blog.

There is a demo bot for this article documented at http://njsbot.simonholywell.com and the complete code is on github.

Making the bot do something

Whilst it was fun and exciting to get your bot to this point, it is, unfortunately, a little boring. Wouldn’t it be neat if the bot could receive commands and action them?

To begin with a simple callback registry is added to allow for commands to be easily added to the bot. Near the top of bot.js add a new object variable definition and two new functions:

var commands = {};

The first function allows you to easily add new functionality to the bot by simply setting up a callback function for a command name.

function add_command(command, callback) {
  commands[command] = callback;
}

Now to allow the execution of a request we need to check if the supplied command has been added to the bot’s registry with the add_command() function.

function execute_command(request) {
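  // Check own properties only so inherited keys such as "constructor" are not treated as commands.
  if (
    Object.prototype.hasOwnProperty.call(commands, request.command) &&
    typeof commands[request.command] === "function"
  ) {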
    return commands[request.command](request);
  }
  return false;
}

These two functions mean that you can very easily add new commands to the bot. The following example changes the bot’s status message.

add_command("s", function (request) {
  set_status_message(request.argument);
  return true;
});
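The registry can be tried out on its own before the bot is wired up to the network. The following is a minimal sketch that repeats the two functions so it runs standalone. The set_status_message() stub and the current_status variable are stand-ins for the real function from part one, and the registry includes an own-property check so that inherited keys such as constructor are not mistaken for commands.

```javascript
var commands = {};

function add_command(command, callback) {
  commands[command] = callback;
}

function execute_command(request) {
  // Only names registered through add_command() count as commands.
  if (
    Object.prototype.hasOwnProperty.call(commands, request.command) &&
    typeof commands[request.command] === "function"
  ) {
    return commands[request.command](request);
  }
  return false;
}

// Hypothetical stub: the real set_status_message() sends a presence stanza.
var current_status = "";
function set_status_message(message) {
  current_status = message;
}

add_command("s", function (request) {
  set_status_message(request.argument);
  return true;
});

console.log(execute_command({ command: "s", argument: "Out to lunch" })); // true
console.log(current_status); // Out to lunch
console.log(execute_command({ command: "x", argument: "ignored" })); // false
console.log(execute_command({ command: "constructor", argument: "" })); // false
```

Returning false for anything unregistered is what later allows the bot to reply with an unknown command message.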

As you can see though a request object is passed as the parameter to the function, but where does this come from?

Dispatching requests

To handle the routing of incoming requests I like to use a simple dispatching mechanism that takes the incoming stanza and interprets it.

function message_dispatcher(stanza) {
  if ("error" === stanza.attrs.type) {
    util.log("[error] " + stanza.toString());
  } else if (stanza.is("message")) {
    util.log("[message] RECV: " + stanza.toString());
    var request = split_request(stanza);
    if (request) {
      if (!execute_command(request)) {
        send_unknown_command_message(request);
      }
    }
  }
}

The first step in the code addresses a common oversight; an XMPP client must not respond to a stanza with a type of error. In this instance we should log it to the console using util for debugging purposes.

If the stanza is a message then this function will log it to the console and attempt to split it up into a custom request object created by the split_request() function (defined in the next code block). Given that a valid request has been made it will then attempt to execute the supplied command.

Should the command not exist it will send the user an error message suggesting that they try again or consult the help text.

function split_request(stanza) {
  var message_body = stanza.getChildText("body");
  if (null !== message_body) {
    message_body = message_body.split(config.command_argument_separator);
    var command = message_body[0].trim().toLowerCase();
    if ("help" === command || "?" === command) {
      send_help_information(stanza.attrs.from);
    } else if (typeof message_body[1] !== "undefined") {
      return {
        command: command,
        argument: message_body[1].trim(),
        stanza: stanza,
      };
    }
  }
  return false;
}

This simply extracts the message text from the supplied stanza object and then splits the message using the regular expression defined earlier in the configuration file. By default this will split on a semicolon and allow for any amount of white space surrounding the separator (/\s*;\s*/).

The command is assumed to be the first part of the message before the separator and the argument is after it. The following is an example of what a command message sent by a user might look like:

s;This is the bot’s new status message!!

If the user has simply typed help or ? then we should send them a message explaining the available commands otherwise the function will construct the request object.

The request object consists of the command string, an argument string and the full initial stanza object that was sent to the bot for processing.
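To see how the separator behaves, the splitting step can be run in isolation. This is a minimal sketch: the separator variable stands in for config.command_argument_separator, and parse_body() is a hypothetical helper that leaves out the stanza handling and help shortcut.

```javascript
// Stand-in for config.command_argument_separator from the configuration file.
var separator = /\s*;\s*/;

// Hypothetical helper: split a raw message body into command and argument.
function parse_body(message_body) {
  var parts = message_body.split(separator);
  return {
    command: parts[0].trim().toLowerCase(),
    argument: typeof parts[1] !== "undefined" ? parts[1].trim() : null,
  };
}

console.log(JSON.stringify(parse_body("S ;  Out to lunch")));
// {"command":"s","argument":"Out to lunch"}
console.log(JSON.stringify(parse_body("t;node.js; xmpp")));
// {"command":"t","argument":"node.js"}
console.log(JSON.stringify(parse_body("help")));
// {"command":"help","argument":null}
```

Note from the second call that only the text between the first and second separators is kept as the argument, so a status message or search string containing a semicolon would be cut short.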

Now let’s attach the dispatcher to an event so that it is triggered when each new stanza comes in over the wire.

conn.addListener("stanza", message_dispatcher);
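The event wiring can be exercised without a network connection. The sketch below assumes the connection behaves like a standard Node.js EventEmitter, which the addListener() call suggests. The fake_stanza() helper is hypothetical and provides only the members this tutorial's dispatcher touches, and the listener is a cut down version of the dispatcher that records what it saw.

```javascript
var EventEmitter = require("events").EventEmitter;

// Hypothetical stand-in for a node-xmpp stanza object.
function fake_stanza(name, type, from, body) {
  return {
    attrs: { type: type, from: from },
    is: function (candidate) {
      return candidate === name;
    },
    getChildText: function (child) {
      return child === "body" ? body : null;
    },
    toString: function () {
      return "<" + name + ' from="' + from + '"/>';
    },
  };
}

var received = [];
var conn = new EventEmitter(); // stands in for the real XMPP connection

conn.addListener("stanza", function (stanza) {
  if ("error" === stanza.attrs.type) {
    received.push("error ignored"); // never reply to error stanzas
  } else if (stanza.is("message")) {
    received.push(stanza.getChildText("body"));
  }
});

conn.emit("stanza", fake_stanza("message", "chat", "user@example.com", "s;Hello"));
conn.emit("stanza", fake_stanza("message", "error", "user@example.com", "oops"));
conn.emit("stanza", fake_stanza("presence", "", "user@example.com", null));

console.log(JSON.stringify(received)); // ["s;Hello","error ignored"]
```

The presence stanza is silently skipped because it is neither an error nor a message.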

Handling errors and help

In the previous section the send_help_information() and send_unknown_command_message() functions were mentioned so here they are!

The help function outputs a simple message that describes how the user can make use of the bot’s commands.

function send_help_information(to_jid) {
  var message_body =
    "Currently bounce (b), twitter (t) and status (s) are supported:\n";
  message_body += "b;example text\n";
  message_body += "t;some search string\n";
  message_body += "s;A new status message\n";
  send_message(to_jid, message_body);
}

Should the user attempt to access a command that the bot does not recognise then they will be sent an error message.

function send_unknown_command_message(request) {
  send_message(
    request.stanza.attrs.from,
    'Unknown command: "' +
      request.command +
      '". Send "help" for more information.',
  );
}

As you may have noticed both of these functions depend upon the send_message() helper function that we have yet to define.

function send_message(to_jid, message_body) {
  var elem = new xmpp.Element("message", { to: to_jid, type: "chat" })
    .c("body")
    .t(message_body);
  conn.send(elem);
  util.log("[message] SENT: " + elem.up().toString());
}

This function accepts two strings as parameters; the JID of the receiving user and the message text. All messages that are sent are also logged to the console using the util package.

Finally we also need to add a handler for system error events originating from node-xmpp:

conn.on("error", function (stanza) {
  util.log("[error] " + stanza.toString());
});

Increasing one’s vocabulary

This is another good point to test your bot as there is already a command you can use: “s;My new status message” and error messages to provoke. This should be working flawlessly before you continue to add new functionality to the bot.

As the help function text alluded to earlier there will be two further commands added to the bot in this tutorial.

The first of which is the bounce functionality; when sent a message by the user the bot will bounce it straight back to them. This is a useful sanity check for when you are testing the bot as you add extra commands.

add_command("b", function (request) {
  send_message(request.stanza.attrs.from, request.stanza.getChildText("body"));
  return true;
});

Going global with Twitter

Next up the bot will talk to the outside world by searching Twitter for the user supplied request argument against the public JSON API.

add_command("t", function (request) {
  var to_jid = request.stanza.attrs.from;
  send_message(to_jid, "Searching twitter, please be patient...");
  var url =
    "http://search.twitter.com/search.json?rpp=5&show_user=true&lang=en&q=" +
    encodeURIComponent(request.argument);
  request_helper(url, function (error, response, body) {
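    if (!error && response.statusCode === 200) {
      // Use a separate name rather than redeclaring the body parameter.
      var results = JSON.parse(body).results;
      if (results.length) {
        // Iterate by index; for...in would also visit any inherited keys.
        for (var i = 0; i < results.length; i++) {
          send_message(to_jid, results[i].text);
        }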
      } else {
        send_message(
          to_jid,
          "There are no results for your query. Please try again.",
        );
      }
    } else {
      send_message(
        to_jid,
        "Twitter was unable to provide a satisfactory response. Please try again.",
      );
    }
  });
  return true;
});

Whilst the function is the longest in the bot so far it is also possibly the easiest to follow so I will skip to the more interesting bits.

The request_helper object was created at the top of the bot.js file right at the beginning of the tutorial and now it is finally being put to use. It is basically a nice simple wrapper around the httpClient functionality provided in Node.js by default.

If the request receives a satisfactory response then the JSON will be parsed and each tweet is sent to the user as a new message.
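The parsing step is worth a closer look because JSON.parse() throws when handed something that is not JSON, such as an HTML error page. Here is a minimal sketch of a more defensive version. The extract_tweets() helper and the sample payload are hypothetical, shaped only after the results and text fields the command above reads.

```javascript
// Hypothetical payload with the same shape the command reads:
// an object holding a "results" array whose items carry a "text" field.
var sample_body = JSON.stringify({
  results: [{ text: "First tweet" }, { text: "Second tweet" }],
});

// Hypothetical helper: return an array of tweet texts, or null when the
// body cannot be understood.
function extract_tweets(body) {
  var parsed;
  try {
    parsed = JSON.parse(body);
  } catch (e) {
    return null; // the body was not valid JSON
  }
  if (!parsed || !Array.isArray(parsed.results)) {
    return null; // valid JSON, but not the expected shape
  }
  return parsed.results.map(function (result) {
    return result.text;
  });
}

console.log(JSON.stringify(extract_tweets(sample_body)));
// ["First tweet","Second tweet"]
console.log(extract_tweets("<html>Service unavailable</html>"));
// null
```

A null return could then be mapped on to the same apology message the command sends when the request itself fails.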

C ya l8r

So you now possess a complete Google Talk bot (don’t get too giddy!), which you can easily add new functionality to with add_command(). It is pretty neat, but what use is it?

Perhaps you could program it to message you whenever a server drops offline or when your continuous integration builds break.

If your bot needs to handle exceptionally large numbers of users then you may wish to consider refactoring it as an XMPP Component rather than an XMPP Client (node-xmpp makes this very easy). The Twitter bot reached its limit with 40,000 subscribers though so there is plenty of head room with a client based bot.

Further resources

Stay in touch

If you liked this article then please follow me on Twitter and let me know.

In tandem with the Google bot tutorial I wrote four smaller articles:

The Node.js eco-system

XMPP and Jabber

Node.js in the real world

Node.js for hosting websites

https://www.simonholywell.com/post/2013/03/create-a-node-js-google-talk-bot-pt2/

Node.js for hosting websites

Simon Holywell Feb 26, 2013 Updated Nov 19, 2024

Whilst Node.js is primarily aimed at creating non-blocking servers it can also be used to host simple web pages such as homepages and blogs.

We are going to be using a simple web framework for Node.js called Express (http://expressjs.com), which can be installed via the Node Package Manager on the command line.

npm install express

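Firstly, create a new file called server.js and begin by instantiating the Express framework. The createServer() call used here is the Express 2.x API that was current at the time of writing; Express 3 and later are instantiated by calling express() directly.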

var app = require("express").createServer();

The server now needs a web route to dispatch, so define the URL and pass in a callback function to handle the request.

app.get("/hello/:name", function (request, response) {
  response.send("Hello " + request.params.name);
});

See how :name in the route becomes a variable you can access in the request parameters and is used in the response text from the server.

Finally from the server point of view you need to define a port number to listen to incoming requests on.

app.listen(3000);

If you now run node server.js on the command line and navigate to http://localhost:3000/hello/Test in your web browser you will see the page you just created. Try swapping Test for your name in the URL and refresh the browser.
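For comparison, here is a rough sketch of the same page using only the http module that ships with Node.js, which shows roughly what the framework's route handling saves you from writing. The match_hello() and create_hello_server() names are hypothetical, and the regular expression is just one way to approximate the /hello/:name route.

```javascript
var http = require("http");

// Pull the :name segment out of a /hello/:name path, or return null.
function match_hello(url) {
  var match = /^\/hello\/([^\/?#]+)\/?(?:[?#].*)?$/.exec(url);
  if (!match) {
    return null;
  }
  try {
    return decodeURIComponent(match[1]);
  } catch (e) {
    return null; // malformed percent-encoding
  }
}

// Build the server without starting it, so listening stays the caller's choice.
function create_hello_server() {
  return http.createServer(function (request, response) {
    var name = match_hello(request.url);
    if (name === null) {
      response.writeHead(404, { "Content-Type": "text/plain" });
      response.end("Not found");
      return;
    }
    response.writeHead(200, { "Content-Type": "text/plain" });
    response.end("Hello " + name);
  });
}

console.log(match_hello("/hello/Test")); // Test
console.log(match_hello("/goodbye")); // null
```

Calling create_hello_server().listen(3000) would then serve the page on the same port as the Express example.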

Now you are able to create simple pages with the Express framework and Node.js. For further experimentation you could take a look at using Jade (jade-lang.com) as a template engine and making use of static routes to serve images and CSS files.

If you like this article then you will probably also be interested in Getting started with Node.js and CouchDB and Nodester environment variables for sensitive data and passwords.

note

This article was originally published in the April 2012 issue of .net Magazine.

The accompanying tutorial to create a Google Talk bot with Node.js is also published on my blog.

In tandem with the Google bot tutorial I wrote four smaller articles:

The Node.js eco-system

XMPP and Jabber

Node.js in the real world

Node.js for hosting websites

https://www.simonholywell.com/post/2013/02/node-js-for-hosting-websites/

Node.js in the real world

Simon Holywell Feb 20, 2013 Updated Nov 19, 2024

With Node.js yet to reach a major release you may be wondering if it is mature enough for production environments and live projects. Whilst it is true that the Node.js creators have, in the past, warned off people with mission critical objectives, it is now in a much more stable state.

Some well known companies including Plurk, LinkedIn and GitHub are using Node.js to deliver vital parts of their offerings every day. In the case of Plurk this was a publicised move in January 2010 from Java and JBoss Netty to Node.js for a stated tenfold decrease in memory usage across their application. This version of Plurk is still live today and serving over 100k open connections simultaneously at any given moment.

LinkedIn have employed Node.js to handle the server aspect of their mobile applications. In pre-launch testing they estimated that the switch away from Ruby on Rails would take them from fifteen servers to just one whilst being able to handle over twice the traffic load.

Other companies such as Klout, Transloadit and LearnBoost (the team behind Socket.IO) are betting their main infrastructure on Node.js.

Yahoo! have also experimented with it internally, most notably in the Yahoo! Mail and YUI Library groups, but nothing has been publicly launched at this time.

Another notable mention goes to the search engine DuckDuckGo who are using Node.js to power their XMPP bot at im@ddg.gg.

Of course it should also be noted that Node.js has a commercial sponsor of development in the form of Joyent with Node.js creator, Ryan Dahl, on their books. In addition they offer hosting on their SmartMachines service and also organise the Node.js Camp conference.

So even if it is to remain a peripheral technology for you it is well worth keeping an eye on its progress in the future. For the more adventurous it may just become a full time vocation with both Klout and Yahoo! looking for more Node.js engineers at the time of writing.

note

This article was originally published in the April 2012 issue of .net Magazine.

The accompanying tutorial to create a Google Talk bot with Node.js is also published on my blog.

In tandem with the Google bot tutorial I wrote four smaller articles:

The Node.js eco-system

XMPP and Jabber

Node.js in the real world

Node.js for hosting websites

https://www.simonholywell.com/post/2013/02/node-js-in-the-real-world/

XMPP and Jabber

Simon Holywell Feb 18, 2013 Updated Nov 19, 2024

Jabber was invented in 1998 by Jeremie Miller, who was sick of using many different closed protocol instant messenger clients. To begin fixing this situation he created jabberd as an open source server in 1999 and by May of the next year version 1.0 was released.

Over the course of the last twelve or so years Jabber has evolved into a standards organisation (XMPP Standards Foundation) and developed an open industry standard for instant messaging called the Extensible Messaging and Presence Protocol (XMPP). As the name suggests XMPP is built on XML, which forms the basis for the system's message stanzas.
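To give a feel for what a stanza looks like on the wire, the following sketch builds a simple chat message as an XML string. The escape_xml() and message_stanza() helpers and the address are hypothetical; a real client would normally lean on an XMPP library to build and send stanzas.

```javascript
// Escape the characters that are significant in XML text and attributes.
function escape_xml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: a chat message stanza addressed to one user.
function message_stanza(to_jid, body) {
  return (
    '<message to="' + escape_xml(to_jid) + '" type="chat">' +
    "<body>" + escape_xml(body) + "</body>" +
    "</message>"
  );
}

console.log(message_stanza("friend@example.com", "Tea & biscuits?"));
// <message to="friend@example.com" type="chat"><body>Tea &amp; biscuits?</body></message>
```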

The protocol has been stable for some time and now enables projects to take advantage of instant messaging, presence, conference and video chat. With all this on tap Google are not the only ones working with XMPP.

AOL Instant Messenger, Skype and Facebook have dabbled with the protocol although more recently Microsoft have announced that the Windows Live APIs support XMPP. This will allow any software to communicate with Windows Live Messenger via Live Connect. Apple, Cisco, IBM, Nokia (Ovi) and Sun have also made use of XMPP in one way or another in various projects.

More recently Google and XMPP have added a draft extension to the protocol called Jingle, which enables voice over IP and video conferencing. Whilst it is yet to be fully approved it is at a stage where it is ready for deployment with the libjingle library used by Google Talk freely available under a BSD licence.

One surprising implementation that demonstrates the flexibility of the protocol is RemoteVNC (an application to remotely control a computer), which uses Jingle to share desktops between users.

Some other implementations of the system include team collaboration, geo location and vehicle tracking, data syndication, identity services and even games.

The XMPP website (http://xmpp.org) maintains a large catalogue of client applications, server software and code API libraries along with full specifications for each aspect of the protocol so it is an excellent place to start learning more about the standard.

Join in the conversation about this article on Hacker News.

note

This article was originally published in the March 2012 issue of .net Magazine.

The accompanying tutorial to create a Google Talk bot with Node.js is also published on my blog.

In tandem with the Google bot tutorial I wrote four smaller articles:

The Node.js eco-system

XMPP and Jabber

Node.js in the real world

Node.js for hosting websites

https://www.simonholywell.com/post/2013/02/xmpp-and-jabber/

ssdeep PHP extension in git

Simon Holywell Feb 13, 2013 Updated Nov 19, 2024

The PHP project as a whole has been migrating to git as its SCM of choice. This includes the core code, some PECL extensions such as BitSet, and the sources for the PHP.net web properties like wiki.php.net.

Well now the PHP team have helped me to migrate the source of ssdeep to git, which means it is now also mirrored (where you can star it) to the official PHP github account. If you also maintain a PECL extension then you might be interested in the thread I created on the pecl.dev mailing list.

If you do not know what the ssdeep extension is then I have previously written a post describing it and how it came into being.

https://www.simonholywell.com/post/2013/02/php-ssdeep-in-git/

The Node.js eco-system

Simon Holywell Feb 13, 2013 Updated Nov 19, 2024

JavaScript started life as a project named Mocha created by Brendan Eich at Netscape in 1995. By the time Netscape Navigator 2.0 was due for release the language had changed names twice becoming LiveScript and then, finally, JavaScript (JS).

Originally Netscape were considering a derivative of Scheme for client side scripting but, swept up in the buzz around Sun’s Java at the time, management stated that the language must look like Java. Eich famously prototyped JS in just 10 days, which is astounding given the exposure the language would receive over the course of the next 16 years.

Server Side JavaScript (SSJS) is nothing new with Netscape’s LiveWire launching in 1996 in the Enterprise Server 2.0 offering. Since then there have been a number of projects built upon the venerable Rhino engine, but it was not until recently that blistering performance has been brought to SSJS.

One such advancement is Google’s V8 JavaScript engine, which underpins Node.js.

The wave of popularity that Node.js currently rides is mostly down to its efficiency. It is at its best when serving short-lived requests to many clients (instant messaging, poll or rating applications, for example) rather than crunching numbers on the next aircraft wing design for days on end.

A concern for the Node.js project is born out of the event driven nature it employs with each action requiring a callback. This can lead to confusing code with nested callback functions obscuring the real business goals unless it is carefully managed.
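
As a rough illustration of the problem, the sketch below performs the same two-step lookup twice: once with nested anonymous callbacks and once with a named function that keeps the steps at one level of indentation. The helper functions are invented for the example and use setImmediate in place of real I/O.

```javascript
// Hypothetical asynchronous helpers that follow the Node.js convention of
// callback(error, result). setImmediate stands in for real I/O.
function find_user(id, callback) {
  setImmediate(function () {
    callback(null, { id: id, name: "Jamie" });
  });
}

function find_status(user, callback) {
  setImmediate(function () {
    callback(null, user.name + " is online");
  });
}

// Nested style: each step is indented inside the previous one.
function describe_user_nested(id, callback) {
  find_user(id, function (err, user) {
    if (err) return callback(err);
    find_status(user, function (err, status) {
      if (err) return callback(err);
      callback(null, status);
    });
  });
}

// Flattened style: a named function keeps each step at the same depth.
function describe_user_flat(id, callback) {
  find_user(id, on_user);

  function on_user(err, user) {
    if (err) return callback(err);
    find_status(user, callback);
  }
}

describe_user_flat(1, function (err, status) {
  console.log(status);
});
```

Both versions do the same work; the second simply becomes easier to follow as more steps are added.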

There are rivals to Node.js out there looking to fulfil your SSJS needs such as Aptana’s Jaxer (similar to LiveWire), EJScript and RingoJS among others. All of them approach the task of JavaScript on the server in slightly different ways and therefore may be a better fit for certain situations.

All these projects share one similarity though: rapidly evolving APIs, which can lead to your projects being left behind in obsolescence unless you keep on top of the changes.

Their ease of use makes them ideal for rapid development and prototyping, but their continuous evolution should also be factored into your language selection decision.

note

This article was originally published in the March 2012 issue of .net Magazine.

The accompanying tutorial to create a Google Talk bot with Node.js is also published on my blog.

In tandem with the Google bot tutorial I wrote four smaller articles:

The Node.js eco-system

XMPP and Jabber

Node.js in the real world

Node.js for hosting websites

https://www.simonholywell.com/post/2013/02/node-js-eco-system/

Create a Google Talk bot with Node.js: Part One

Simon Holywell Feb 6, 2013 Updated Nov 19, 2024

Programming a chat bot was once the domain of the hardcore hacker tapping packets as they passed over the wire from proprietary client applications to closed source servers, but not any more!

With the open Extensible Messaging and Presence Protocol (XMPP), once-closed networks are becoming accessible to the rest of us. I selected Google Talk as it is probably the most well known implementation of XMPP and it is easy and free to sign up for, but Windows Live Messenger, AIM and Skype all support it to some extent.

As an open protocol XMPP is supported by a large number of messaging networks and clients for just about all web enabled devices, which means your bot will also be able to communicate with users on other XMPP servers such as Jabber.

note

This article was originally published in the March 2012 issue of .net Magazine. You can also download a PDF of the original article.

There is a demo bot for this article documented at http://njsbot.simonholywell.com and the complete code is on github.

Let’s get set up

Firstly, an admission; I use Linux as my main desktop operating system so these setup instructions are taken from that environment, but also apply to Mac OS systems. On Windows you can either use Cygwin or a Linux virtual machine such as Ubuntu.

I am going to assume that you are familiar with Node.js and already have it and the NPM (Node Package Manager) installed and working. If not then there is great installation documentation on both their respective websites.

One of the Node.js packages (node-xmpp) we are going to be using has an external dependency on libexpat1-dev for its XML parsing. Some Linux distributions and Mac OS appear to come with this by default, but Ubuntu does not so keep this in mind as you proceed.

It also has an optional dependency on libicu-dev, which is not necessary to get the bot up and running. If you decide to skip installing this library then you can safely ignore the StringPrep warnings the Node.js console issues when you start up your bot.

Back to NPM and we need to install the following packages so we can begin development:

npm install node-xmpp
npm install request

Optionally, although recommended, we can install the supervisor package so that we no longer need to restart the Node.js process.

npm install supervisor -g

Supervisor will continually scan the project directory for any changes and restart the Node.js process for you, which makes development a little bit easier and faster.

Lastly, the bot will need a new Google Account (https://accounts.google.com/NewAccount) so that users are able to contact it. The account name will become your bot’s Jabber ID (JID); for example, my bot resides at n.js.bot@gmail.com, as explained at http://njsbot.simonholywell.com.

Configuration

To keep the project simple and clean we are going to move the configuration settings into a separate JavaScript file, which can be included at the top of the main bot script. Create a new file called config.js and include the following lines of code:

Add your bot’s Google Account details for jid and password after which you are ready to begin programming. Create a new file in the same directory called bot.js and start by including the configuration file:

const config = require("./config.js").settings;

Then include the Node.js packages that you installed earlier and util for logging actions in the bot:

const xmpp = require("node-xmpp");
const request_helper = require("request");
const util = require("util");

Connecting to the server

The node-xmpp package makes the process of communicating with XMPP servers very easy:

const conn = new xmpp.Client(config.client);

Notice how the code is using the client element from the configuration file that you created earlier to set up the parameters for the XMPP connection.

To help our bot have the best chances of living a long life the Node.js default socket time out setting of 60 seconds should really be overridden. I am using the following settings in my project:

conn.socket.setTimeout(0);
conn.socket.setKeepAlive(true, 10000);

The keep alive setting asks the operating system to send TCP keep-alive probes once the connection has been idle for 10 seconds (the 10000 is in milliseconds), which should stop proxies or routers on the way from timing out the connection when it is idle. Without these settings your bot will stop communicating after a minute of inactivity.

Announcing the bot’s arrival

The bot is now connected to the Google server, but nobody knows it is online so we have to announce its availability. We must now programme it to send out a presence XML stanza when it connects to the server. This is also an excellent opportunity to set a status message.

Firstly, let’s create a simple function to send out the stanza:

function set_status_message(status_message) {
  var presence_elem = new xmpp.Element("presence", {})
    .c("show")
    .t("chat")
    .up()
    .c("status")
    .t(status_message);
  conn.send(presence_elem);
}

Whilst this code may look a little complex it is in actual fact quite straightforward. Initially a new XML element called presence is created and then two child elements are added to it.

The show child has the text content of chat, which tells the server that the bot is ready to receive messages.

The up() function call causes the current element to return its parent element. So in this example it causes show to return its parent element presence. Without this the status element would be added as a child of show instead of presence.

The second child, status, is a little more obvious as it sets the status message with its text content.

Finally this whole XML element object is sent to the server using the connection (conn) that was created earlier. So the server will receive an XML element similar to the following given the previous code:

<presence>
    <show>chat</show>
    <status>I am a happy bot</status>
</presence>

For debug purposes you can see the XML in an xmpp.Element() or a stanza from the server (explained later) by calling .toString() on them.

console.log(presence_elem.toString());
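
To see why the up() call matters, here is a tiny stand-in builder that imitates the chaining behaviour described above. It is not node-xmpp’s code: Elem and its methods are a simplified model written for this example, and it skips details such as attributes and escaping.

```javascript
// A tiny stand-in for an XML element builder. It only imitates the chaining
// behaviour of c(), t() and up(); attributes and escaping are left out.
function Elem(name, parent) {
  this.name = name;
  this.parent = parent || null;
  this.children = [];
}

// c() appends a child element and returns the child.
Elem.prototype.c = function (name) {
  var child = new Elem(name, this);
  this.children.push(child);
  return child;
};

// t() appends text and returns the current element.
Elem.prototype.t = function (text) {
  this.children.push(String(text));
  return this;
};

// up() returns the parent, or the element itself when there is no parent.
Elem.prototype.up = function () {
  return this.parent || this;
};

// root() climbs to the outermost element.
Elem.prototype.root = function () {
  var node = this;
  while (node.parent) node = node.parent;
  return node;
};

Elem.prototype.toString = function () {
  return "<" + this.name + ">" + this.children.join("") + "</" + this.name + ">";
};

var with_up = new Elem("presence")
  .c("show").t("chat").up()
  .c("status").t("I am a happy bot")
  .root();

var without_up = new Elem("presence")
  .c("show").t("chat")
  .c("status").t("I am a happy bot")
  .root();

console.log(with_up.toString());
// <presence><show>chat</show><status>I am a happy bot</status></presence>
console.log(without_up.toString());
// <presence><show>chat<status>I am a happy bot</status></show></presence>
```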

That all sounds fantastic, but the function will never be run without being added to an event listener on the connection. In this case it should be triggered by the online event, which occurs right after the bot connects to the server.

conn.on("online", function () {
  set_status_message(config.status_message);
});

Once again the code is using the configuration file to set the content of the initial status message for the bot.

Meeting new people

The bot has now announced its arrival and other people can see it is available on the Google Talk network. Unfortunately if they attempt to connect with your bot in its current state their subscription requests will go unanswered.

Firstly, there is a Google specific roster call that needs to be made so that the bot will be registered to receive subscription request notifications from the server. Without including this function you will not be able to see requests coming over the wire.

function request_google_roster() {
  var roster_elem = new xmpp.Element("iq", {
    from: conn.jid,
    type: "get",
    id: "google-roster",
  }).c("query", {
    xmlns: "jabber:iq:roster",
    "xmlns:gr": "google:roster",
    "gr:ext": "2",
  });
  conn.send(roster_elem);
}

As you can see this uses the same syntax as the set_status_message() you wrote earlier although it is a fraction more complex. The objects that are passed as the second parameter to xmpp.Element() and c() are interpreted as attributes on the XML element.

To make this a little clearer, here is the XML form of such a call (the from and id values are illustrative):

<iq type='get'
    from='romeo@gmail.com/orchard'
    id='google-roster-1'>
  <query xmlns='jabber:iq:roster' xmlns:gr='google:roster' gr:ext='2'/>
</iq>

As with set_status_message() this function will need to be plumbed in with an event listener on the connection object.

if (config.allow_auto_subscribe) {
  conn.addListener("online", request_google_roster);
  conn.addListener("stanza", accept_subscription_requests);
}

As you may wish to deactivate auto acceptance of subscription requests the listener is only added if the configuration file allows it. Just like when the status message was set earlier the request_google_roster() function is set to listen to the online event. The second listener is addressed in the next section.
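
If the event plumbing is unfamiliar, the following sketch shows the same pattern using nothing but the EventEmitter from the Node.js core events module. The fake_conn and fake_config objects are stand-ins invented for the example; they are not the real connection or configuration.

```javascript
const { EventEmitter } = require("events");

// A plain EventEmitter standing in for the XMPP connection object.
const fake_conn = new EventEmitter();
// Hypothetical flag mirroring config.allow_auto_subscribe.
const fake_config = { allow_auto_subscribe: true };
const seen = [];

function on_online() {
  seen.push("online");
}

function on_stanza(stanza) {
  seen.push("stanza:" + stanza.type);
}

if (fake_config.allow_auto_subscribe) {
  // addListener() and on() are two names for the same method.
  fake_conn.addListener("online", on_online);
  fake_conn.on("stanza", on_stanza);
}

// emit() calls the registered listeners synchronously, in order.
fake_conn.emit("online");
fake_conn.emit("stanza", { type: "subscribe" });
console.log(seen);
```

Setting allow_auto_subscribe to false in the sketch means neither listener is registered, so the emitted events are simply ignored.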

Making friends

The bot is now receiving subscription requests so it is time to begin accepting them with the accept_subscription_requests() function mentioned in the previous code sample.

function accept_subscription_requests(stanza) {
  if (stanza.is("presence") && stanza.attrs.type === "subscribe") {
    var subscribe_elem = new xmpp.Element("presence", {
      to: stanza.attrs.from,
      type: "subscribed",
    });
    conn.send(subscribe_elem);
  }
}

This function has already been added to a listener for the stanza event in the previous section. That event is triggered by more than just subscription requests, so each incoming stanza is checked to confirm it is a presence stanza with a type of subscribe.

To accept the request another simple XML element is sent back to the requesting party, whose address is taken from the from attribute of the subscription stanza (stanza.attrs.from). Setting the type to subscribed accepts their subscription request whilst setting it to unsubscribed will reject it.
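
Should you prefer to accept only some requests, the decision could be sketched along these lines. The function works on a plain object shaped like the stanza above and returns the attributes for the reply; subscription_reply() and the allow-list are hypothetical and not part of the bot’s code.

```javascript
// Decide how to answer a presence stanza. `stanza` is a plain object with
// `name` and `attrs`; `allowed_jids` is a hypothetical allow-list.
function subscription_reply(stanza, allowed_jids) {
  if (stanza.name !== "presence" || stanza.attrs.type !== "subscribe") {
    return null; // not a subscription request, so nothing to send
  }
  // Strip any resource part (such as "/orchard") before checking the list.
  var bare_jid = stanza.attrs.from.split("/")[0];
  var accepted = allowed_jids.indexOf(bare_jid) !== -1;
  return {
    to: stanza.attrs.from,
    type: accepted ? "subscribed" : "unsubscribed",
  };
}

console.log(
  subscription_reply(
    { name: "presence", attrs: { type: "subscribe", from: "romeo@gmail.com/orchard" } },
    ["romeo@gmail.com"]
  )
);
```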

Connect with your bot

This is the first point at which connecting with your bot will give you any meaningful results so let’s give it a run. On your system’s console run the following command (if you have not installed supervisor then use node in its place).

supervisor bot.js

Now log in to your own Google Talk account (not the bot account you created earlier) and add your bot as a contact. If the code is working properly then you will now see your bot online with the status message you set in the configuration file.

As mentioned earlier you will not need to restart the node process as supervisor will do this for you whenever you save changes to the project’s files.

’til next time

In part one of the tutorial you have successfully written a bot in Node.js that can announce its presence and auto accept new subscription requests. I will guide you through adding new functionality and commands to your bot in the next instalment.

Once you have completed part two your bot will be able to provide users with help information, bounce messages back and search Twitter for user supplied keywords. It will also become clear just how easy it is to add your own custom commands to your project.

Part two of this article is now also posted to my blog - you should check it out!

Stay in touch

If you liked this article then please follow me on Twitter and let me know.

note

Part two of this article is also published on my blog.

For now the complete code is on github.

This article was originally published in the March 2012 issue of .net Magazine.

In tandem with the Google bot tutorial I wrote four smaller articles:

The Node.js eco-system

XMPP and Jabber

Node.js in the real world

Node.js for hosting websites

https://www.simonholywell.com/post/2013/02/create-a-node-js-google-talk-bot-pt1/

Git tag secrets

Simon Holywell Feb 4, 2013 Updated Nov 19, 2024

Tags are quite a simple aspect of git, but there are a few things that a lot of people don’t know about. These shortcuts will make it quick for you to tag and manage those tags in your git repositories.

Probably the most common use of tags is to note when a version of the software the repository is tracking is released. Usually this will be something like ‘1.2.8’ if you are sticking with SemVer.

Creating tags

Lightweight tags

At its simplest level you can simply add a tag to your git repository with:

git tag 1.2.8

These tags can then be used to easily revert to or diff between versions of the project. For example to see the changes between versions of the project you can simply execute:

git diff 1.2.6 1.2.8

Annotated tags

Tags can also include a more descriptive body message or annotation much like a commit message. Usually this is achieved using (-a for annotation):

git tag -a 1.2.8

This will open up your commit editor - in my case this is vim - so that you can enter an annotation with your tag.

However if you only have a one line description to include then you can simply use -m just like you can with git commit:

git tag -m "Releasing version 1.2.8" 1.2.8

Listing tags

Lightweight tags

To simply list the tags in your repository you can call:

git tag

If your project has many tags then it can be very useful to filter the list you are getting:

git tag -l '1.2.*'

Which would show you all the tags beginning with 1.2. as * is a wildcard.

Annotated tags

You will notice that when you call git tag you do not get to see the contents of your annotations, but just the headline tag (1.2.8). To see the annotations you must add -n to your command:

git tag -n

If you have entered multiple lines in your annotation then you will need to specify how many of those lines you want git tag to display. By default -n only shows the first line so if you wanted to see the first 10 lines for example you would use:

git tag -n10

Tag details

To see the details of a particular tag you can call git show much like you would with a commit hash:

git show 1.2.8

This will present you with the tag details and the information from the commit that was tagged.

Publishing tags

When you push your changes to another repository git does not transfer the tags along with it by default.

This is easily overridden like so:

git push --tags origin master

If you are pulling in someone else’s changes then their tags will be pulled in and tracked by default. If you do not want this to happen then you can add the following switch:

git pull --no-tags origin master

https://www.simonholywell.com/post/2013/02/git-tag-secrets/

Idiorm and Paris 1.3.0 released - the minimalist ORM and fluent query builder for PHP

Simon Holywell Jan 31, 2013 Updated Nov 19, 2024

Idiorm is a PHP ORM that eschews complexity and deliberately remains lightweight with support for PHP5.2+. It consists of one file and primarily one class that can easily be configured in a few lines of code. There are no models to create, no convoluted configuration formats and there is no database introspection, just a simple PDO connection string.

However, having said this, Idiorm is very powerful and it makes most of the queries PHP applications require pain free. Some of these features include fluent query building, multiple connection support and result sets for easy record manipulation.

Paris sits on top of Idiorm to provide a simplified active record implementation based upon the same minimalist philosophy. To configure Paris you just add simple models so that you can exploit the full power of Idiorm’s fluent query API, table/object associations (one-to-one, one-to-many, etc.) and filter methods to encompass common queries.

Some simple examples

Fetching and updating a record is easy with Idiorm:

<?php
$user = ORM::for_table('user')
        ->where_equal('username', 'j4mie')
        ->find_one();

$user->first_name = 'Jamie';
$user->save();

$tweets = ORM::for_table('tweet')
          ->select('tweet.*')
          ->join('user', array(
              'user.id', '=', 'tweet.user_id'
            ))
          ->where_equal('user.username', 'j4mie')
          ->find_many();

foreach ($tweets as $tweet) {
    echo $tweet->text;
}

Whereas Paris makes it easier to think of records and associations as objects:

<?php
class User extends Model {
    public function tweets() {
        return $this->has_many('Tweet');
    }
}

class Tweet extends Model {}

$user = Model::factory('User')
        ->where_equal('username', 'j4mie')
        ->find_one();
$user->first_name = 'Jamie';
$user->save();

$tweets = $user->tweets()->find_many();
foreach ($tweets as $tweet) {
    echo $tweet->text;
}

These examples show a very simple use case worked in both the Idiorm ORM way and using the Paris Active Record method. Of course the documentation for each project is a good place to start and find out more information on each project’s capabilities.

A potted history

Both projects were written by Jamie Matthews; a friend and former colleague, but in recent times his focus has shifted to Python. This left Idiorm and Paris in an unmaintained state until, first, Durham Hale and then myself took up maintenance duties on the libraries.

This has led to a number of new features recently making their way into Idiorm and some long standing bugs being quashed. In this release quite a few changes have taken place including:

Multiple connections
Result set implementation
Tests refactored to use PHPUnit and configured with Travis-CI for continuous integration
Documentation revisited and now built with Sphinx on Read the Docs at idiorm.rtfd.org
Support for Firebird and PostgreSQL on top of the existing MySQL and SQLite
Installation via Packagist/Composer for Idiorm
And much more besides in the changelog

These updates have been complemented by some similar changes in Paris:

Documentation on Read the Docs at paris.rtfd.org
Installation via Packagist/Composer for Paris
With more minor updates maintained in the changelog

The future

The present aim with both Idiorm and Paris is to maintain the current code base and add suitable new features as and when they become required or useful. There have been a number of suggestions for possible improvements that would break backwards compatibility such as:

Changing the code to utilise late static bindings and thereby drop PHP5.2 support
Updating the classes, tests and documentation to meet the PSR-1 FIG standard

Neither of these two is holding the current project back, nor would either significantly contribute to its use. They are proposals that would make the libraries inaccessible to a large number of users on cheap shared hosting environments or working on legacy projects.

It is likely that, at some point, Idiorm and Paris will make the move to only support PHP5.3+, but without a compelling reason there is no point in breaking backwards compatibility. I have previously addressed this in a pull request so for the full arguments please head over there.

If you have any suggestions or better yet pull requests then please lodge an issue on github for the relevant project. Some people disagree with this and others agree, but I would like to hear your opinion.

Project updates

If you like Idiorm and Paris then please follow me on Twitter and github for updates. You might also like some of my other projects such as Navigator, php_ssdeep and PHP at Job Queue Wrapper.

https://www.simonholywell.com/post/2013/01/idiorm-and-paris-the-minimalist-orm/

Unicode shortcut in Netbeans for React/Curry

Simon Holywell Jan 22, 2013 Updated Nov 19, 2024

In some of my code I use a PHP library called React/Curry and to save typing it uses a unicode ellipsis (…) for a method name. Yes, that is right unicode method names can be legal in PHP!

<?php
$firstChar = Curry\bind('substr', Curry\…(), 0, 1);

See I told you so!

Well that is great, but how do you type a unicode character into a Netbeans document?

To save having to constantly copy and paste the … character from a symbols list I have set up a very simple macro in Netbeans to print the character for me. This means that control + alt + . now dumps the … character into the document at the current cursor position.

You too can enjoy this revelation in four simple steps:

Tools > Options > Editor > Macros
Click New and give it a name (“ellipsis” perhaps)
Enter the following (include the quotes!) into the Macro Code box: "…"
Click Set Shortcut and choose the keystroke you want (I chose control + alt + .)

Now when you bash those keys in your documents you’ll get a nice … character or whatever other unicode character you have set up.

One little caveat is that if you are using jVi like I am then the Macro editor seems to play a little havoc with it after setting up a new macro. This is easily resolved by ticking jVi off and on again in the Options menu of Netbeans.

Slightly annoying, but perfectly liveable.

I picked this little tip up on the Netbeans forums and as it took me a long time to find I thought I would share it!

https://www.simonholywell.com/post/2013/01/unicode-shortcut-in-netbeans/

Navigator: Geographic calculation library for PHP

Simon Holywell Jan 18, 2013 Updated Nov 19, 2024

Navigator is a PHP library for easily performing geographic calculations and distance unit conversions on Earth or any other spheroid.

Currently it supports distance calculations between two coordinates using Vincenty, Haversine, Great Circle or Cosine Law. By default it uses the most accurate, but computationally intensive: Vincenty.
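
To give a feel for what one of the simpler methods involves, here is a rough JavaScript sketch of the Haversine formula. It is not Navigator’s code; the function names and the mean Earth radius of 6,371,000 metres are assumptions made for the example, and it treats the planet as a perfect sphere.

```javascript
// Mean Earth radius in metres (an assumed spherical model).
const EARTH_RADIUS_M = 6371000;

function to_radians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two points given in decimal degrees.
// `radius` is optional so that other spherical bodies can be used.
function haversine_distance(lat1, lon1, lat2, lon2, radius) {
  const r = radius === undefined ? EARTH_RADIUS_M : radius;
  const d_lat = to_radians(lat2 - lat1);
  const d_lon = to_radians(lon2 - lon1);
  const a =
    Math.pow(Math.sin(d_lat / 2), 2) +
    Math.cos(to_radians(lat1)) *
      Math.cos(to_radians(lat2)) *
      Math.pow(Math.sin(d_lon / 2), 2);
  // Math.min guards against rounding pushing the value just above 1.
  return 2 * r * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Distance in metres between the two coordinates used in the PHP example.
console.log(haversine_distance(10, 81.098, 50.821389, -0.147222));
```

A spherical model like this trades accuracy for speed, which is why a spheroid-aware method such as Vincenty is the more accurate choice.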

To calculate the distance between two points on Earth in metres it is as simple as:

<?php
use Treffynnon\Navigator as N;
$distance = N::getDistance(10, 81.098, 50.821389, -0.147222);

The example above uses decimal notation for the latitude and longitude of the coordinate, but you can also specify them in DMS (degrees, minutes and seconds).

<?php
$distance = N::getDistance(10, 81.098, 15.6, '5° 10\' 11.009"W');

There are also helpers to swap coordinate notation back and forth between decimals and DMS.
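
The arithmetic behind such a conversion is straightforward, as this JavaScript sketch shows. It is only an illustration: dms_to_decimal() is an invented name and the accepted string format is an assumption, so do not expect it to match Navigator’s own helpers.

```javascript
// Convert a DMS string such as 5° 10' 11.009"W to signed decimal degrees.
// South and west are negative. The accepted format is an assumption made
// for this sketch.
function dms_to_decimal(dms) {
  const match = /^\s*(\d+)°\s*(\d+)'\s*(\d+(?:\.\d+)?)"\s*([NSEW])\s*$/i.exec(dms);
  if (!match) {
    throw new Error("Unrecognised DMS value: " + dms);
  }
  const degrees = Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const sign = /[SW]/i.test(match[4]) ? -1 : 1;
  // One minute is 1/60 of a degree and one second is 1/3600 of a degree.
  return sign * (degrees + minutes / 60 + seconds / 3600);
}

console.log(dms_to_decimal("5° 10' 11.009\"W"));
```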

In addition there are utilities for converting between distance units, including metres to nautical miles, kilometres, miles, parsecs, furlongs and leagues. Custom converters can easily be added to support other units of measurement should you need them.

By default the library will perform calculations on Earth, but Mars and Earth’s Moon have also been configured. Of course it is easy to add new celestial bodies should you need to perform distance calculations on Mercury for some reason!

The code is available on github of course and it is fully documented.

https://www.simonholywell.com/post/2013/01/navigator-geographic-calculations-library-for-php/

Force URLs to lowercase with Apache rewrite and PHP

Simon Holywell Nov 1, 2012 Updated Nov 19, 2024

Canonical pages are an important aspect of maintaining a website and ensure that search engine rankings are not affected by any duplicated content.

In *NIX based systems file names with varying capitalisation are treated as separate files. For example filename.txt is not the same file as FileName.TXT. This extends into the world of Apache where URLs are also case sensitive.

So that means that we really should pick a case for our URLs and force all browsers to redirect to our chosen scheme. Lowercase is the only sensible choice so my examples will only cover it.

mod_rewrite does not have an easy way to do this from a .htaccess file without a large amount of repeat (recursive) requests to itself for each letter. This is certainly not a desirable load to put Apache under when it can be handled in another way.

This is where your favourite programming language steps in to save the day. I am using PHP in these examples as it is the most commonly paired language with Apache.

In your project’s .htaccess file you’ll need to add the following rewrite rules.

RewriteEngine on
RewriteBase /

# force url to lowercase if upper case is found
RewriteCond %{REQUEST_URI} [A-Z]
# ensure it is not a file on the drive first
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule (.*) rewrite-strtolower.php?rewrite-strtolower-url=$1 [QSA,L]

To describe briefly what is happening here:

Set up the rewrite module
Check if the incoming URL contains any uppercase letters
Ensure that the incoming URL does not refer to a file on disk (you may want to host a file with upper case letters in its name - something like a PDF file that a client has uploaded through the CMS you have supplied them for instance)
Rewrite all requests that match the aforementioned conditions to our script, which does the actual conversion to lowercase. The only thing to note here is the QSA modifier, which makes sure all the GET “variables” are passed on to the script

Next up is the little snippet of PHP that does all the work! This is a file called rewrite-strtolower.php in the same directory as your .htaccess file mentioned above.

<?php
if(isset($_GET['rewrite-strtolower-url'])) {
    $url = $_GET['rewrite-strtolower-url'];
    unset($_GET['rewrite-strtolower-url']);
    $params = http_build_query($_GET);
    if(strlen($params)) {
        $params = '?' . $params;
    }
    // if you don't have SSL/a security certificate at the destination change https:// to http:// below
    header('Location: https://' . $_SERVER['HTTP_HOST'] . '/' . strtolower($url) . $params, true, 301);
    exit;
}
header("HTTP/1.0 404 Not Found");
die('Unable to convert the URL to lowercase. You must supply a URL to work upon.');

As you can quickly see this is a very simple script that simply takes in the URL passed to it from the rewrite rules above.

It grabs the supplied URL and removes it from the $_GET variable to stop it from being passed to the destination page amongst the GET query parameters
It then rebuilds the GET parameters into a query string for use in the redirect. If there are none then this will just be an empty string which will have no consequence on the final URL
Finally the redirect is performed using PHP’s header() function
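
The same steps are not tied to PHP. As a rough illustration only, they can be sketched in JavaScript. The function name buildLowercaseRedirect and the example.org host are hypothetical and are not part of the script above.

```javascript
// Hypothetical helper: builds the lowercase redirect target from the
// path captured by the rewrite rule and the remaining GET parameters.
function buildLowercaseRedirect(host, path, query) {
  // Copy the parameters, dropping the internal one added by the rewrite rule
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    if (key !== "rewrite-strtolower-url") {
      params.append(key, value);
    }
  }
  const queryString = params.toString();
  // Only the path is lowercased; the query string is passed through as supplied
  return (
    "https://" + host + "/" + path.toLowerCase() +
    (queryString.length ? "?" + queryString : "")
  );
}

console.log(buildLowercaseRedirect("example.org", "mY-TEST-url", { page: "2" }));
```

As in the PHP version, an empty set of parameters simply leaves the query string off the final URL.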

After these few steps have been completed a browser will always be redirected to the lowercase version of a URL. Try entering /mY-TEST-url and you'll see it instantly become /my-test-url.

I first came up with this technique a good few years ago so if you know of a better solution for Apache that has appeared in the interim then please let me know.

Alternatives

It is also worth noting that there are alternatives for those without the .htaccess file requirement.

If you are not on a shared hosting environment and are happy to enter the rules directly into your Apache configuration you can use mod_rewrite's RewriteMap directive to do the lowercase conversion:

RewriteMap lc int:tolower
RewriteRule (.*?[A-Z]+.*) ${lc:$1} [R]

For more information on this see the Apache manual: Redirect a URI to an all-lowercase version of itself. Although it is noted there that it is recommended to use mod_speling instead of this rewrite rule.

https://www.simonholywell.com/post/2012/11/force-lowercase-urls-rewrite-php/

Fish Console Reborn

Simon Holywell Jun 11, 2012 Updated Nov 19, 2024

Installing curses:

sudo apt-get install libncurses5-dev

fish is now installed on your system. To run fish, type ‘fish’ in your terminal.

To use fish as your login shell:

add the line ‘/usr/local/bin/fish’ to the file ‘/etc/shells’.
use the command ‘chsh -s /usr/local/bin/fish’.

To set your colors, run ‘fish_config’

To scan your man pages for completions, run ‘fish_update_completions’

Have fun!

https://www.simonholywell.com/post/2012/06/fish-console/

NetBeans with jVi vim bindings

Simon Holywell Mar 20, 2012 Updated Nov 19, 2024

I love vim and its very handy shortcuts, but I also like to be in a GUI IDE for most of my development. Thankfully there is an answer: add vim's keybindings to the NetBeans environment with jVi.

Once you have NetBeans 7+ installed you can install jVi by going to Tools > Plugins > Available Plugins and searching for jVi. Select jVi for NB-7.0 Update Center and click Install.

Now click the Reload Catalog button and wait for the updates to stream in. Select jVi for NetBeans and click Install.

You will then be asked to restart NetBeans and jVi will be installed. It can easily be enabled or disabled from the Tools menu by clicking jVi.

When enabled you can use vim commands as you would in vim. For example typing :w<enter> will save the document as in vim.

https://www.simonholywell.com/post/2012/03/netbeans-jvi-vim-bindings/

.net magazine article: Create a Google Talk bot with Node.js

Simon Holywell Feb 3, 2012 Updated Nov 19, 2024

I have written a two part article for this month's .net magazine detailing how easy it is to write a Google Talk bot with the evented power of Node.js.

“Programming a chat bot was once the domain of the hardcore hacker, tapping packets as they passed over the wire from proprietary client applications to closed source servers, but not any more!”

note

As of 6/2/2013 I have now published this article on my blog.

In issue 225 (out now) you will learn how to build a Google Talk bot that is able to set its own status messages and accept new contact requests. I then follow this up in part two of the tutorial (issue 226, on sale 28 February) by adding message bounce back and Twitter searching functionality.

Additionally, I give a little bit of history for both Node.js and XMPP/Jabber along with some background on projects and companies that are using Node.js and hiring experts. As an aside there is a micro-tutorial on creating webpages with Node.js using the express framework, like the bot's demo website.

There is also a demo bot and documentation over at njsbot.simonholywell.com, which is hosted on cloudno.de (Thanks Hans). The source code for the demo site and bot can be found on github.

https://www.simonholywell.com/post/2012/02/create-a-google-talk-bot-with-nodejs/

Installing a MySQL UDF errors with Function already exists

Simon Holywell Jan 31, 2012 Updated Nov 19, 2024

When installing a UDF recently I got an annoying error message, which didn't seem to want to go away. Deleting the function before attempting to reinstall it did not work so I used the following set of escalating commands to attempt to get it to install.

But back to the error for a moment:

bash > mysql -u user -p < installdb.sql
Enter password:
ERROR 1125 (HY000) at line 7: Function 'lib_mysqludf_ssdeep_info' already exists

This can be solved really simply with the following options:

Attempt to delete the function and then reinstall it
Delete the function row from the mysql.func table and then reinstall it
Stop the MySQL server (after trying option 2), start it again and then reinstall it

https://www.simonholywell.com/post/2012/01/mysql-udf-install-error-function-already-exists/

Nodester environment variables for sensitive data and passwords

Simon Holywell Oct 27, 2011 Updated Nov 19, 2024

When I began using Cloudno.de recently to have a go at Node.js and CouchDB I stored my username and password in plain text in a configuration file. If you are also looking to get CouchDB going with CloudNo.de then my earlier Getting started with Node.js and CouchDB post may be of interest.

The configuration file was fine for testing as nobody who came across the database login details could do any real damage, but as the project got more interesting I wanted to send it live and these details would need to be kept private.

Thankfully the Nodester platform, which CloudNo.de is using, has environment variables built in and you can use them to store sensitive data such as passwords. It is also good to know that the variables will persist even after the host machine is cycled.

To set an environment variable you simply make a curl request to the API like the following:

curl -k -X PUT -u "[username]:[api key/password]" -d "appname=[app name]&key=[environment variable name]&value=[environment variable value]" https://api.cloudno.de/env

Replace the items in the square brackets with the values from your Nodester based hosting solution (such as cloudno.de) and your environment variable content.

If you are not using CloudNo.de then you will also need to change the URL at the end of the command above to point to the correct API URL for your service. For example with Nodester's own hosting the URL should be http://api.nodester.com/env.

So if I wanted to set an environment variable called “first_name” and set the value to “Simon” I would issue the following curl request.

curl -k -X PUT -u "username:api_key" -d "appname=app_name&key=first_name&value=Simon" https://api.cloudno.de/env

To check it has been set correctly you can GET the value:

curl -X GET -u "[username]:[api key/password]" -d "appname=[app name]&key=[environment variable name]" https://api.cloudno.de/env

You can also delete an environment variable from the system using the following command:

curl -X DELETE -u "[username]:[api key/password]" -d "appname=[app name]&key=[environment variable name]" https://api.cloudno.de/env

So that sets up the environment variable, but how do you access it from within Node.js?

It is as simple as accessing the global process object in the following manner:

var environment_variable_value = process.env.[environment variable name];

Again replace the square brackets with the actual environment variable name to access it. So to access the environment variable I set in my earlier example it would look like:

var environment_variable_value = process.env.first_name
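
If the variable has not been set then process.env simply returns undefined, which can be easy to miss. A minimal sketch of one way to guard against that follows; the requireEnv helper is a hypothetical name for illustration and is not part of Node.js or Nodester.

```javascript
// Hypothetical helper: read a required environment variable or fail loudly.
// The optional second argument lets a plain object stand in for process.env.
function requireEnv(name, env) {
  var source = env || process.env;
  var value = source[name];
  if (value === undefined || value === "") {
    throw new Error("Missing environment variable: " + name);
  }
  return value;
}

// A plain object is passed here so the example runs without any setup
var firstName = requireEnv("first_name", { first_name: "Simon" });
console.log(firstName);
```

In an application the second argument would be left off so that the value comes from process.env.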

Now you have a working environment variable setup to store your sensitive data in.

https://www.simonholywell.com/post/2011/10/nodester-environment-variables-for-sensitive-data/

PHP extension writing: PHP Extensions Made Eldrich

Simon Holywell Oct 26, 2011 Updated Nov 19, 2024

PHP extension writing: PHP Extensions Made Eldrich

Since writing my 15 Excellent Resources for PHP Extension Development post in September last year Kristina Chodorow of 10gen (MongoDB) has written an excellent four part article on writing PHP Extensions on her blog Snail in a Turtleneck.

https://www.simonholywell.com/post/2011/10/php-extensions-made-elrich/

The world has lost an excellent and exciting rider in Marco Simoncelli

Simon Holywell Oct 24, 2011 Updated Nov 19, 2024

The world has lost an excellent and exciting rider in Marco Simoncelli. He was a true character. RIP #58.

You can leave a tribute on the MotoGP website.

https://www.simonholywell.com/post/2011/10/marco-simoncelli/

Getting started with Node.js and CouchDB

Simon Holywell Oct 21, 2011 Updated Nov 19, 2024

Node.js and CouchDB feel like they were made for each other right from the very first time I used them. With the cradle node package the integration becomes even easier.

Whilst both Node.js and CouchDB are open source with packages for most operating systems it may be easier for you to start out using a hosted solution such as CloudNo.de (has CouchDB now) or Nodester for example. As far as the CouchDB portion goes there is only one place to go and that is IrisCouch.

With the exception of IrisCouch they are all in private beta so there might be a small wait before you get access.

I wrote the following example code before CloudNo.de supported CouchDB so it references IrisCouch, but the setup is similar.

So at this point I am going to assume that you have successfully installed Node.js and CouchDB or are using hosted services. All the projects have good installation documentation for most platforms and there are many blog posts rehashing this step out there.

Firstly you will need to install the cradle package using:

npm install cradle

It should be noted here that if you are using a hosted service the command will be something like:

cloudnode app npm install cradle

from the application's root directory (the one with the .git folder in it). Both the CloudNo.de documentation and Nodester documentation cover this process on their respective websites.

CouchDB is documented on the Apache foundation’s web pages and the cradle middleware has its source and documentation on GitHub.

var http = require("http");
http
  .createServer(function (req, http_res) {
    http_res.writeHead(200, { "Content-Type": "text/plain" });
    var response = "";

    var cradle = require("cradle");
    var connection = new cradle.Connection(
      "https://subdomain.iriscouch.com",
      443,
      {
        auth: { username: "username", password: "password" },
      },
    );

    var db = connection.database("database_name");
    db.save(
      "document_key",
      {
        name: "A Funny Name",
      },
      function (err, res) {
        if (err) {
          // Handle error
          response += " SAVE ERROR: Could not save record!!\n";
        } else {
          // Handle success
          response += " SUCESSFUL SAVE\n";
        }
        db.get("document_key", function (err, doc) {
          response += " DOCUMENT: " + doc + "\n";
          http_res.end(response);
        });
      },
    );
  })
  .listen(8071);

I think that the code is fairly easy to follow if you have written JavaScript code utilising callbacks before, but I will step through it for clarity.

A new HTTP server is set up first so that we can access the Node.js programme through a web browser to trigger the database changes. Incidentally this would be accessed at http://localhost:8071 or http://subdomain.cloudno.de.

Cradle is then set up and the connection parameters are set, with the CouchDB database being selected by the call to connection.database().

Next a new document is saved into the DB (replace document_key with your document name to customise) with the contents of { name: "A Funny Name" }. Once this is saved a callback function is called that will attempt to retrieve the record.

Depending upon how successful the whole process is you should see something like

SUCESSFUL SAVE
DOCUMENT: { name: "A Funny Name" }

in your browser.
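
The example above ignores the err argument passed to the db.get callback. A small sketch of how that callback's arguments could be turned into text is shown below. The describeDocument name is hypothetical, and the sketch assumes that the error object carries the error and reason fields found in CouchDB error responses.

```javascript
// Hypothetical helper: turn the (err, doc) pair passed to a callback
// into a line of text for the HTTP response.
function describeDocument(err, doc) {
  if (err) {
    // Assumption: the error object exposes CouchDB's error/reason fields
    return " GET ERROR: " + (err.reason || err.error || String(err)) + "\n";
  }
  // JSON.stringify avoids the "[object Object]" that plain string
  // concatenation gives for an ordinary object
  return " DOCUMENT: " + JSON.stringify(doc) + "\n";
}

console.log(describeDocument(null, { name: "A Funny Name" }));
console.log(describeDocument({ error: "not_found", reason: "missing" }, undefined));
```

Inside the db.get callback this would be used as response += describeDocument(err, doc); before calling http_res.end(response).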

As you can see it is really easy to change what is saved and how it is retrieved. This was just a very simple introduction, but there are many neat tricks documented at each project's homepage.

https://www.simonholywell.com/post/2011/10/getting-started-node-js-couch-db/

PHP elephants

Simon Holywell Oct 21, 2011 Updated Nov 19, 2024

PHP elephants

https://www.simonholywell.com/post/2011/10/php-elephants/

How the ssdeep PHP extension came into being

Simon Holywell Jul 14, 2011 Updated Nov 19, 2024

Recently (well in a loose sense anyway) I had the need to build a document bank in PHP for a client at Mosaic. It was a fairly involved application with various public and private APIs for integration into the client's network of websites.

The core PHP code was written on top of the Agavi framework and various PHP libraries for extracting text and metadata from documents. One of the major features the client required was for the system to detect similar files to prevent unintentional duplicates making it into the document bank.

The idea was that this document bank would be the one central resource for all of the documents written and managed by the organisation. Duplicates or near duplicates would of course make this a pointless exercise. So I turned to StackOverflow for some pointers, but came up empty.

After some research and much searching of the web I came across an open source package called ssdeep written by Jesse Kornblum. I found it through reading his research paper, Identifying almost identical files using context triggered piecewise hashing.

ssdeep is based upon work by Andrew Tridgell of samba fame who produced spamsum and the basis of the mathematics behind ssdeep. To summarise ssdeep would be to say that it can detect homologous files or signatures in files.

Despite the fact that ssdeep was originally intended to be used for malware detection it is equally suited to the more mundane task of detecting duplicate documents.

With this discovery I immediately began creating a prototype version written in basic PHP that would serve as a wrapper around the ssdeep binary. I have, by request, made this code public, but it is a pretty old hack and I would not recommend using it.

As I got this prototype up and running I began to see how powerful ssdeep was, but with one small caveat - it works best on files above 4KB as noted in an erroneous bug report on the ssdeep package in PECL. In my application this was fine as it was handling large PDF and Word documents for the most part.

Soon I became aware that there was an API for the ssdeep package that I could build upon to create a PHP extension. So I spent quite some time figuring out how to actually write a PHP extension from various sources and then even more time looking into the autoconf build process.

If you are interested in writing your own extension I have documented my resources previously on this blog.

After some annoying errors with baffling outputs I finally had my extension written, building and tested. Pretty soon it was on a production box and working like a dream on thousands of documents.

Now I wanted to share the code for others to use so initially I hosted it all on GitHub, but soon realised it would get far more exposure if it was included in PECL. I also wanted to have ssdeep properly documented in the main PHP manual to further promote the extension.

This did look unlikely in the beginning as the ssdeep package is licensed under the GPL, which cannot be accepted into PECL due to its viral nature. Thankfully, after contacting Jesse it became clear that the original work by Tridgell had been dual licensed and he could therefore grant me an exemption from the GPL for the purposes of PECL.

After completing the application process, accepting code reviews and jumping the legal hurdles I was finally ready to publish my first PECL extension! I began building the PECL extension locally and installing it on as many machines as possible through the PECL method.

Thankfully it all went to plan apart from a bug I discovered in Pyrus and I managed to get a release up. Next was the process of documenting it all for PHP; suffice to say it's not as easy as it sounds. (Two spaces of indent not four!)

In the end though with thanks to Pierre, Johannes, Gustavo and of course Jesse I had released my first extension into the wild.

There you have it. That is how a PHP extension is born and merged into PECL.

https://www.simonholywell.com/post/2011/07/how-php-ssdeep-was-made/

Running a sane version of Linux on a Dell Inspiron 2500

Simon Holywell Jun 22, 2011 Updated Nov 19, 2024

I have ended up with a very old piece of hardware and of course the first thing I did was wipe the Windows 2000 installation and stick a few versions of Linux on it. Unfortunately it only came with 128MB of memory from the factory so nothing would run very well. The PCMCIA wireless card that came with it wouldn't work with WPA2 under Windows 2000 so an upgrade was required.

£5 later I got a matched pair of Crucial 256MB sticks on eBay so I could try out Linux Mint 6 Fluxbox edition and Linux Mint 10 LXDE. Whilst both worked right out of the box I did have to configure the X11 monitor settings so that it would support the full 1024x768 splendour that the Inspiron 2500 affords you. See the gist I have set up on GitHub for my configuration file and some short instructions on getting it set up.

In the end I decided to go with Mint 10 LXDE as it used a similar level of system resources, but ran more smoothly and of course benefited from being the latest version with full package update support.

I expected to have a load of issues getting the Belkin wireless adapter working, but in the end this distro had all the drivers so there was no need for all those ndiswrapper recipes that are out there. It is running much better than the Windows 2000 OS was as well and benefits from running newer versions of all the applications I need.

Of course web streaming of Flash video is somewhat staccato with such a small CPU but it is a perfectly usable web browser and word processor.

https://www.simonholywell.com/post/2011/06/linux-on-dell-inspiron-2500/

I remember watching trains from this bridge when I first arrived…

Simon Holywell Jun 10, 2011 Updated Nov 19, 2024

I remember watching trains from this bridge when I first arrived in Edinburgh from Australia. It was cold then too!

nationalgeographicdaily:

Children Watching Train, Edinburgh

Photograph by Vishal Soniji

This photo was taken during my visit to Edinburgh as I was on my way to Edinburgh Castle.

https://www.simonholywell.com/post/2011/06/i-remember-watching-trains-from-this-bridge-when-i/

Ah, so eloquent! jeffreyshek: Only in Scotland. Photo taken by a…

Simon Holywell Jun 3, 2011 Updated Nov 19, 2024

Ah, so eloquent!

jeffreyshek:

Only in Scotland. Photo taken by a friend of mine.

https://www.simonholywell.com/post/2011/06/fucking-keep-it/

New version of the Agavi framework support for NetBeans 7.0

Simon Holywell May 5, 2011 Updated Nov 19, 2024

necora-markus:

Released a new version of the Agavi framework support plugin for the shiny new NetBeans 7.0. Still depends on implementation versions of the PHP-plugin, so if something doesn’t work, please let me know.

Prebuilt NBM available here, source code here.

UPDATE: Even newer version available for download here. Should fix a null pointer exception.

https://www.simonholywell.com/post/2011/05/netbeans-7-agavi-support/

TZ3 Stradale

Simon Holywell May 5, 2011 Updated Nov 19, 2024

TZ3 Stradale

https://www.simonholywell.com/post/2011/05/tz3-stradale/

Whilst the wedding is a great event for the participants, I am n…

Simon Holywell Apr 28, 2011 Updated Nov 19, 2024

Whilst the wedding is a great event for the participants, I am not interested in the slightest. However William was seen last night out on his motorbike in less than adequate gear (squidding) and confirming his continued love for Ducati - some would say at the cost of Hinckley. Triumph being the only mass produced British rival to the big Duke.

https://www.simonholywell.com/post/2011/04/prince-william-a-squid/

Why won't ssh-agent save my unencrypted key for later use?

Simon Holywell Apr 26, 2011 Updated Nov 19, 2024

Why won’t ssh-agent save my unencrypted key for later use?

I was recently annoyed by having to enter my private key's passphrase every time I wanted to do a git push to or pull from a public git repository. It turns out that if you are logged into a Gnome session on an Ubuntu machine it will automatically add your key to ssh-agent, but if you are logged into a bash session (as I was) then it won't.

So you can either manually do the ssh-add yourself or, following the instructions in the answer to my question, you can set up an automatic way of facilitating this.

One problem I discovered is that if you have git displaying the current branch information in your bash prompt like me then when you start a session it will ask you for your passphrase before rendering your bash prompt.

I am thinking that to work around this I could change the git function in the .bash_profile file to look at the arguments passed to it and if it is a remote operation such as a pull, push or clone then trigger the ssh-add otherwise it can safely skip it.

Any other ideas or patches?

https://www.simonholywell.com/post/2011/04/ssh-agent-not-storing-unencrypted-key/

hotvvheels: Cool is a Color

Simon Holywell Mar 9, 2011 Updated Nov 19, 2024

hotvvheels:

Cool is a Color

https://www.simonholywell.com/post/2011/03/hotvvheels-cool-is-a-color/

Gearman, PHP and mod_gearman_status on Ubuntu

Simon Holywell Feb 11, 2011 Updated Nov 19, 2024

Installing Gearman is pretty easy as there are packages for it in Ubuntu:

sudo apt-get install gearman libgearman-dev

The development headers (libgearman-dev) are only required if you need to compile a library for your programming language such as a PHP extension. To install the PHP module you would run:

sudo pecl install channel://pecl.php.net/gearman-0.7.0

If you have trouble with the above step then it is probably because you are running an older version of Ubuntu. In this case take a look at my previous post Getting gearman to install on Ubuntu.

Moving onto mod_gearman_status, which is an Apache module to show the status of Jobs and their associated workers. It looks like the following:

Firstly let's ensure that the system has the build-essential package installed:

sudo apt-get install build-essential

We now need to install the Apache2 development headers, which assuming you installed the standard Ubuntu Apache2 package will be the prefork edition. If you don’t understand this then don’t worry you can just continue below.

sudo apt-get install apache2-prefork-dev

Now download the module's C file and run the following command to build it and install it as an Apache module:

sudo apxs2 -c -i mod_gearman_status.c

Apache now needs to be told how to load the module and what configuration settings to use. In /etc/apache2/mods-available you need to create two files.

/etc/apache2/mods-available/gearman_status.load:

LoadModule gearman_status_module /usr/lib/apache2/modules/mod_gearman_status.so

/etc/apache2/mods-available/gearman_status.conf:

<IfModule mod_gearman_status.c>
    <Location /gearman-status>
        SetHandler gearman_status
    </Location>
</IfModule>

Enable the module with the following command:

sudo a2enmod gearman_status

Restart Apache to load the module:

sudo service apache2 restart

Now in your browser you can visit http://example.org/gearman-status where example.org is your server's address.

https://www.simonholywell.com/post/2011/02/gearman-php-mod-gearman-status/

Winter motorcycle storage: battery and electrical systems

Simon Holywell Jan 23, 2011 Updated Nov 19, 2024

There are many things to consider as you wrap your bike up for winter such as ensuring your fuel does not go stale, but the electrical system needs attention too. One of the most common failures when placing your motorcycle in storage is the battery losing charge and eventually becoming damaged beyond repair.

Causes of battery failure

Most vehicles have a residual draw that slowly saps power from the battery even with the ignition switched off, but this is exacerbated by the fitment of alarms or other after market accessories. I highlighted this in an article I recently wrote for webBikeWorld reviewing the HealTech GIpro gear indicator, which was flattening my battery in about 3 weeks.

On top of this, batteries also naturally lose charge over time and require regular top-up charges to remain in optimum shape. When a battery is left in a state of discharge for a long time, a reaction between the lead and the acid produces lead sulfate, resulting in what is known as a sulfated battery. The longer the sulfation is allowed to progress the more permanent damage is done to the battery.

Another issue of long term storage of a lead acid battery is stratification, which is the separation of the electrolyte in the battery. The water and the acid will eventually begin to become distinct layers in the battery with the water forming on top of the acid, which can cause greater corrosion to the lead plates. Charging helps to mix the two again as does moving the battery. So riding your bike is one of the best cures for stratification!

Corrosion

In humid environments electrical systems often begin to corrode when they are left sitting for extended periods of time. This leads to poor operating performance, which can manifest itself as dim headlights or a dead horn. Sometimes the motorcycle will no longer be able to start as the earths have stopped passing current.

Problems with the wiring or electrical systems can be very hard to diagnose and may take hours of probing with a multimeter. In general the more modern the motorcycle is the more complex its electrical system will be.

Battery maintenance

To protect yourself from these problems there are a couple of simple defences you can employ. When placing a motorcycle into storage always ensure the battery charge is maintained. Preferably the battery should be removed from the bike and brought inside so that it is not exposed to extremes of temperature. Regular charging of at least once a fortnight is then required to keep the battery in top condition.

The best chargers for this duty are called battery tenders or trickle chargers, which as their name suggests gently ensure that the battery is always charged up. Some of the more expensive chargers are also smart enough to be able to recover partially damaged batteries and attempt to reverse the destructive sulfation process.

Wiring and general electrics

The wiring and electrical systems need to be protected from moisture so connections should be coated in a non-conductive hydrophobic material. Often this can be as simple as daubing the connectors in some grease such as the copper slip used behind brake pads or just about any anti-seize lubricant. This serves as a water repellent layer over the metal of connections and prevents corrosion.

When selecting the grease to use it is important that it does not conduct electricity if you plan to use it on connectors with many wires. Otherwise you may end up inadvertently shorting out some wiring or a connector, which could lead to expensive damage or a poorly running motorcycle.

As you bring the bike back out of storage in the spring it is a good idea to thoroughly check that all the electrical systems are operating as you would expect. You don't want a car missing that brake light!

https://www.simonholywell.com/post/2011/01/winter-motorcycle-electrics/

Winter motorcycle storage: suspension and tyres

Simon Holywell Jan 5, 2011 Updated Nov 19, 2024

This is by no means an exhaustive list of steps for long term motorcycle (or car for that matter) storage, but a few tips I have picked up along the way.

When a motorcycle is put into storage it will often remain in the same position for extended periods of time. This is not the intended purpose from the factory though, and as such the tyres and suspension will not thank you for it.

Tyres

Tyres tend to lose pressure through the winter months and under inflated tyres are more likely to get flat spots. If the pressure drops significantly then the side walls will tend to crack if they are left sitting on cold concrete.

Once a tyre has been affected by either a flat spot or a crack then it should be replaced immediately. Both problems are a major hazard that could lead to dramatic tyre deflation at speed or big blow outs. Often this will lead to the bike continuing down the street on its side.

Suspension

Although it is not a significant issue in my experience or research it also makes sense to unload the suspension so that it can relax for the duration of the storage. This might help to prevent spring sag, but I am unaware of any other benefits.

Suspension is basically comprised of oil, springs and seals. Seals are made of rubber and when not in use rubber stiffens and becomes brittle, which can cause cracking and seal failure.

Replacing fork seals is a reasonably time consuming process and therefore can be an expensive trip to the dealer if you are not doing it yourself.

The best way to avoid this issue is to visit your motorcycle every fortnight or so, sit in the saddle and drop the bike off of the stand. Now it may sound odd, but you need to have a good bounce on the bike to cause the suspension to compress and extend a number of times. This puts the seals through their paces so that their propensity to stiffen is reduced.

I find the easiest way to get the forks to travel nicely is to rock the bike back and forth applying the brake firmly as the bike comes forward. If there is more than one person present you can have them push down on the head stock as you sit on the bike to stabilise it.

Another tip is to add a little fork oil to your fork stanchions so that your seals remain a little more protected. I like to add some onto the stanchions before I attempt to bounce the bike as mentioned above. Avoid WD-40 as it can swell rubber seals, and if it penetrates the seal it can mix with the oil in the fork and thin it, reducing damping.

Stand Solutions

If your motorcycle has a centre stand then you can make use of it to avoid both problems. With the bike on the stand the rear wheel will clear the ground, which leaves you the simple task of propping the front wheel off of the ground.

A lot of modern bikes do not come with centre stands to improve ground clearance, but there are solutions such as the Superbike Stand from abba. Other options include the Bursig and the Becker-Technik Motorbike-Lifter. I am using an abba stand myself, but the Becker-Technik does look like a nice alternative.

If you just want to get the wheels off the ground then you can use a pair of paddock stands, but this will not take the pressure off of the suspension. A couple of paddock stand options include the Venom Headlift Paddock Stand, Genssi Motorcyle Pro Front & Rear Stands or MOTO-D Front & Rear Paddock Stands.

Flooring Solutions

Seems like too much effort? Then you should at least consider putting the bike on a sheet of timber or chipboard to help protect the tyres from the cold concrete. Some people have also used foam or rubber tiles under the bike, like these NT interlocking rubber tiles for example.

You will still need to move your bike or rotate the wheels regularly so that the tyres do not develop flat spots.

The essential information to take away from this is that your bike does not like being stationary and it should be kept off of its wheels.

https://www.simonholywell.com/post/2011/01/winter-motorcycle-tyres/

PHP Hangs When Fed 2.2250738585072011e-308

Simon Holywell Jan 4, 2011 Updated Nov 19, 2024

A pretty horrible bug: when you assign the number 2.2250738585072011e-308 to a variable, PHP will hang on 32-bit Linux or Windows builds of PHP. This affects $_GET and $_POST variables as well, and as such could be used as an exploit against some PHP sites.

So, for example, the following code will hang PHP:

$var = 2.2250738585072011e-308;

Or, if a page is given a GET parameter like page.php?param=2.2250738585072011e-308, either of these lines will do the same:

$var = $_GET['param'] + 1;
//OR
$var = (double)$_GET['param'];
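
One possible stop-gap is to reject the offending value before PHP ever converts it to a number. The following is only a rough sketch under my own assumptions: the containsHangTrigger() function name is made up for illustration, and it assumes that searching for the number's digit sequence with the decimal point removed is good enough (other spellings of the same value may well slip past it).

```php
<?php
/**
 * Hypothetical guard (not part of PHP): returns true when a raw request
 * value contains the digit sequence of the problematic number, ignoring
 * where the decimal point sits.
 */
function containsHangTrigger($raw) {
    if (!is_string($raw)) {
        return false;
    }
    // "2.2250738585072011e-308" and "22.250738585072011e-309" both
    // collapse to a string containing the same run of digits.
    $digits = str_replace('.', '', $raw);
    return strpos($digits, '22250738585072011') !== false;
}

// Check the raw string first and only cast values that pass the check.
$param = isset($_GET['param']) ? $_GET['param'] : '';
if (is_string($param) && !containsHangTrigger($param)) {
    $var = (float) $param;
} else {
    $var = 0.0; // fall back to a safe default instead of casting
}
```

The important detail is that the check runs against the raw string, because the hang happens during the string to float conversion itself.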

More debate available on http://news.ycombinator.com/item?id=2066084

https://www.simonholywell.com/post/2011/01/php-number-error/

Winter motorcycle storage: fuel system

Simon Holywell Jan 4, 2011 Updated Nov 19, 2024

This is by no means an exhaustive list of steps for long term motorcycle (or car for that matter) storage, but a few tips I have picked up along the way.

The most commonly known issue with bikes in storage is bad or stale fuel, the causes of which I have described in a previous post: Why does fuel go stale? Generally petrol is fairly stable and won’t break down all that quickly, but motorcycle fuel tanks present problems that dedicated fuel containers do not. All tanks are vented to the outside world, which means that air can get in and bring moisture with it that will condense inside the tank, therefore adding water to your fuel! If you do not have a plastic fuel tank like my Cagiva then you will also potentially suffer from rust inside the tank that will leach into your fuel.

When fuel goes off it begins to lose its volatility and then starts to turn into a sludge, which begins to block the fuel hoses and carburettors or throttle bodies on an injected bike.

This sludge will eventually leave a varnish-like coating all through your fuel system that will have to be removed with a combination of special cleaners and a good few blasts of compressed air. I alluded to this process in a previous post about my old Suzuki Bandit.

For more information about fuel systems I recommend the following books:

Motorcycle Fuel Systems TechBook (Haynes Techbooks)
The Haynes Manual on Carburettors (Haynes DIY Manuals)
Motor Cycle Carburettor Manual

To avoid having to remove the fuel system, flush it out and potentially rebuild your carburettors, there are at least two things you can do:

1. Drain the fuel tank as completely as you can and then either run the bike until all the fuel in the system has been burnt and the machine conks out, or open up the carb float bowl drain plugs and drain the remaining fuel out that way. Once this is done, if you have a steel fuel tank you will want to either prep it with some thin oil to stop it from rusting or refill the tank to the absolute brim with fresh fuel (think about adding a fuel stabiliser for longer periods of storage, see below).
2. Take the bike out for a nice long run and get the empty light flashing, whereupon you pull into a petrol station and add some fuel stabiliser to the tank such as Lucas, Sta-Bil, Putoline Fuel Stabiliser or Sea Foam. Then fill the tank to the brim to prevent the build up of condensation.

By adding the fuel stabiliser first you will get a good mix as the fuel is added because the flow of fuel will cause turbulence inside the tank.

To ensure that the fuel stabiliser works its way through the entire system take the bike for a quick blast.

The bike is now ready from a fuel point of view for its winter storage.

https://www.simonholywell.com/post/2011/01/winter-motorcycle-fuel/

FullOctane: Bike and Car Blog

Simon Holywell Jan 3, 2011 Updated Nov 19, 2024

I have set up a site to post all my automotive related discoveries. Currently I have two posts up there about fuel degradation and storage.

Putting motorcycles into storage or “winterizing” can be an involved procedure, so I have decided to cover it in a series of posts beginning with the fuel system in the post entitled ‘Winter Motorcycle Storage: Fuel System’.

Continuing the fuel theme, the reasoning behind the previous post is backed up by a technical article that addresses the question ‘Why does fuel go stale?’.

https://www.simonholywell.com/post/2011/01/fulloctane-introduction/

Why does fuel go stale?

Simon Holywell Dec 23, 2010 Updated Nov 19, 2024

When stored correctly, high quality gasoline should continue to be stable forever (well almost!). There are a few factors that contribute to the degradation of petrol, with the two primary concerns being oxidation and water.

Oxidation

If petrol is not stored in an air tight container then the process of oxidation occurs. Fuel that has been exposed to air flow will begin to look cloudy and get darker in colour. Sometimes you may even be able to see particles floating in the fuel if it is badly oxidised.

Once the fuel is in this state it is dangerous to add it to an engine as the fuel will form deposits in the fuel system. Fuel stored in a vehicle is not air tight – particularly in motorcycles and therefore these deposits can build up and affect the proper working of the engine.

A common way to prevent this is to add anti-oxidants such as Lucas Fuel Stabilizer.

Water in the fuel

Water in fuel is a big problem as internal combustion engines cannot ignite water! Water and petrol do not mix so if your fuel tank had transparent sides you would see the petrol sitting on top of a layer of water. This water is sometimes clear, but usually appears as a rusty or dirty colour.

The fuel pickup for most fuel pumps is located towards the bottom of the fuel tank so the engine will be fed the water first. Once the water gets through the fuel lines the engine will be starved of petrol and fail.

Many countries now have up to 10% ethanol (Brazil has 25%!) mixed into fuel for a variety of reasons such as the reduction of carbon emissions or to increase the octane rating of the fuel. Ethyl alcohol is hygroscopic and can easily absorb moisture from humid air, which increases the water content of the fuel.

This may not be such a big problem because the water that binds with the alcohol will be burnt by the engine, but it will reduce the quality of performance that can be achieved.

An old trick that can remove water from fuel is to throw a bottle of methylated spirits into the fuel tank. This trick exploits the hygroscopic properties of the alcohol so that water is absorbed into the alcohol and can ignite in the engine. However it is important to note that the alcohol will become saturated if there is more water in the tank than it can bind with.

A more controlled way of dealing with water in the fuel is to add a bottle of Wynns Dry Fuel or Wurth Petrol Engine Additive to the suspect fuel.

Other Factors

Other factors in fuel deterioration include contaminants such as rust, dirt or oil and the temperature. The degradation of fuel is accelerated by higher temperatures (above 26°C or 80°F) so fuel should be stored in cool areas and obviously out of direct sunlight!

Vapour Pressure

Whilst not strictly stale fuel; poor starting using stored fuel could be attributed to government regulated volatility. I believe this is likely to only be a problem for the US.

This is measured using RVP (Reid vapour pressure) and differing levels are used depending on the ambient temperature. Fuels with a higher RVP evaporate more easily than those with a lower RVP rating because the components in the latter have a heavier molecular weight.

In summer a fuel of lower RVP (~7.8 to ~9 PSI) is used, which prevents the fuel from evaporating and being wasted into the atmosphere or causing vapour lock. Vapour lock is caused by the fuel turning into a gas in the fuel lines, which then starves the engine of fuel because the pump can only move liquid.

As winter comes around the RVP will be increased with as many as eight graduations and it could eventually end up in the region of ~15 PSI. This means that the fuel can evaporate more easily in colder temperatures therefore making it easier to start a vehicle.

The change in the fuel mixture or oxygenates is usually performed to improve the clean burning of fuels in the summer months to reduce smog and pollution. More information on this subject can be found on the Environmental Protection Agency website.

This being the case you could find it difficult to start a vehicle in winter that is filled up with stored fuel from summer. One potential solution in this case is to get it started with some start spray or drain the fuel and replace it with the correct winter fuel.

Protecting fuel in storage

To prevent these problems from affecting your fuel there are five things you can do:

Only use dedicated fuel containers such as jerry cans.
Ensure containers are air tight and capped tightly to prevent evaporation and exposure to air and moisture.
Use a fuel stabiliser product like Lucas Fuel Stabilizer.
Fill containers as completely as possible leaving a 5% air gap for expansion if the temperature rises.
Store containers out of direct sunlight where the temperature does not exceed 26°C or 80°F. If the temperature is exceeded then fuel will begin to degrade more rapidly.

With these precautions in place some people have reported up to 5 years of stable storage, but I am sure there are others out there who have managed to keep fuel for longer.

https://www.simonholywell.com/post/2010/12/why-does-fuel-go-stale/

wombert: xkcd: Convincing

Simon Holywell Dec 13, 2010 Updated Nov 19, 2024

wombert:

xkcd: Convincing

https://www.simonholywell.com/post/2010/12/wombert-xkcd-convincing/

Logging global PHP objects and saving memory using a lazy loading proxy

Simon Holywell Dec 2, 2010 Updated Nov 19, 2024

Quite often when you are working with legacy code you will come across a mess of globals. Every single method will make use of the same global instance of the database class for example. So where do you begin to work with this massive impediment?

Logging is a great way to see what methods and classes are being used by your application and where. To achieve this you would normally need to add a logging call to each and every method in the code base. Clearly this would be incredibly tedious and time consuming.

This is where a proxy object can be implemented to save time and centralise the logging functions. The basic idea of a proxy object is that it will be instantiated in place of the actual class and the proxy will delegate any calls through to the original class. For the purposes of this example the original class will be called Database and the proxy object will be called LazyLoadingProxy.

Lazy Loading Proxy diagram

Based upon the diagram above we can work through a simple example to demonstrate this powerful technique. Initially the lazy loading proxy class is instantiated and assigned to the global variable $Database.

<?php
$Database = new LazyLoadingProxy('Database', '/home/simon/DatabaseClass.php');
$Blog     = new LazyLoadingProxy('Blog', '/var/www/classes/Blog.class.php5');

When the proxy is set up it is told the class name and then the exact path where it can find the class. The proxy will now lie in wait for a request to a method or class property of the proxied class (Database).

<?php
echo $Database->getTable(); // echo the name of the table

If the underlying class is accessed the proxy will require_once() the file at the class path it was given (DatabaseClass.php in the example above) and create a new instance of the class on the fly. This is the lazy loading aspect of the process. In this way resources are not consumed until the Database class is really needed. Once instantiated the object is “cached” so that any future requests will reuse the same instance of the class.

So our lazy loading proxy class is very simple, as follows:

<?php
/**
 * @author Simon Holywell <treffynnon@php.net>
 */
class LazyLoadingProxy {
    /**
     * Where the instance of the actual class is stored.
     * @var $instance object
     */
    private $instance = null;

    /**
     * The name of the class to load
     * @var $class_name string
     */
    private $class_name = null;

    /**
     * The path to the class to load
     * @var $class_path string
     */
    private $class_path = null;

    /**
     * Set the name of the class this LazyLoader should proxy
     * at the time of instantiation
     * @param $class_name string
     */
    public function __construct($class_name, $class_path = null) {
        $this->setClassName($class_name);
        $this->setClassPath($class_path);
    }

    public function setClassName($class_name) {
        if(null !== $class_name) {
            $this->class_name = $class_name;
        }
    }

    public function getClassName() {
        return $this->class_name;
    }

    public function setClassPath($class_path) {
        if(null !== $class_path) {
            $this->class_path = $class_path;
        }
    }

    public function getClassPath() {
        return $this->class_path;
    }

    /**
     * Get the instance of the class this LazyLoader is proxying.
     * If the instance does not already exist then it is initialised.
     * @return object An instance of the class this LazyLoader is proxying
     */
    public function getInstance() {
        if(null === $this->instance) {
            $this->instance = $this->initInstance();
        }
        return $this->instance;
    }

    /**
     * Load an instance of the class that is being proxied.
     * @return object An instance of the class this LazyLoader is proxying
     */
    private function initInstance() {
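        // Read the class name first so that it is defined before it is logged
        $class_name = $this->class_name;
        Logger::log('Loaded: ' . $class_name);
        // The path is optional, for example when an autoloader is in use
        if(null !== $this->class_path) {
            require_once($this->class_path);
        }
        return new $class_name();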
    }

    /**
     * Magic Method to call functions on the class that is being proxied.
     * @return mixed Whatever the requested method would normally return
     */
    public function __call($name, $arguments) {
        $instance = $this->getInstance();
        Logger::log('Called: ' . $this->class_name . '->' . $name . '(' . print_r($arguments, true) . ');');
        return call_user_func_array(
                array($instance, $name),
                $arguments
            );
    }

    /**
     * These are the standard PHP Magic Methods to access
     * the class properties of the class that is being proxied.
     */
    public function __get($name) {
        Logger::log('Getting property: ' . $this->class_name . '->' . $name);
        return $this->getInstance()->$name;
    }

    public function __set($name, $value) {
        Logger::log('Setting property: ' . $this->class_name . '->' . $name);
        $this->getInstance()->$name = $value;
    }

    public function __isset($name) {
        Logger::log('Checking isset for property: ' . $this->class_name . '->' . $name);
        return isset($this->getInstance()->$name);
    }

    public function __unset($name) {
        Logger::log('Unsetting property: ' . $this->class_name . '->' . $name);
        unset($this->getInstance()->$name);
    }
}

To log the calls you can simply echo out the request in each magic method or you could make it more sophisticated with the help of FirePHP and FireBug. I use the latter and it is really handy to see where the legacy code is calling classes and methods that it should not be!
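
The proxy class above calls Logger::log(), which is not defined anywhere in this post. A minimal sketch of such a class might look like the following; the getMessages() and dump() method names are placeholders of my own, and it assumes that keeping the messages in memory for the length of the request is all that is needed.

```php
<?php
/**
 * Hypothetical minimal Logger: collects messages in memory so they can be
 * inspected or echoed out at the end of the request.
 */
class Logger {
    /** @var array Messages logged so far, in order */
    private static $messages = array();

    /** Record a single message */
    public static function log($message) {
        self::$messages[] = $message;
    }

    /** Return every message logged so far */
    public static function getMessages() {
        return self::$messages;
    }

    /** Echo every message on its own line */
    public static function dump() {
        foreach (self::$messages as $message) {
            echo $message, PHP_EOL;
        }
    }
}

Logger::log('Loaded: Database');
Logger::log('Called: Database->getTable()');
Logger::dump();
```

Swapping the body of log() for a call into FirePHP or a log file would leave the proxy itself untouched, which is the benefit of centralising the logging in one place.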

The only hiccup I have found with this system is that all tests using method_exists() need to be changed to is_callable(). This is because method_exists() only reports methods that are actually declared on the proxy class, whereas is_callable() also takes the __call() magic method into account.
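
To see the difference between the two checks in isolation, here is a small sketch using a stand-in class. MagicProxy is a made-up name for illustration; it declares no real methods and answers everything through __call(), just as the lazy loading proxy does.

```php
<?php
// Hypothetical stand-in for the proxy: no declared methods, only __call()
class MagicProxy {
    public function __call($name, $arguments) {
        return $name;
    }
}

$proxy = new MagicProxy();

// method_exists() only sees methods declared on the class itself
$declared = method_exists($proxy, 'getTable');

// is_callable() also accepts methods that __call() would handle
$callable = is_callable(array($proxy, 'getTable'));

var_dump($declared); // bool(false)
var_dump($callable); // bool(true)
```

Bear in mind that with __call() in place is_callable() will answer true for any method name at all, so it no longer tells you whether the proxied class really has that method.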

This design pattern also has another application in systems that make extensive use of global objects which are instantiated at bootstrap. It will save memory as only the classes that are actually referenced will be instantiated just in time.

I recently used this to make a large legacy application more efficient. In the bootstrap file it simply looped through all the files in a directory called classes and looked for files beginning with a certain prefix (for example SYSBlog.php or SYSSessions.php). When it found a file it would load it via require_once() and then using eval() it would instantiate the class. This would subsequently be used globally as you can see below.

<?php
$dir = 'classes/';
if (is_dir($dir)) {
    if ($dh = opendir($dir)) {
        while (($file = readdir($dh)) !== false) {
            if($file != '.' && $file != '..' && !is_dir($dir.$file)){
                require_once($dir.$file);
            }
        }
        closedir($dh);
    }
}
// SNIP
// some unrelated code such as DB connectors appeared here
// /SNIP
$classes = get_declared_classes();
foreach($classes as $class){
    if(substr($class,0,3)=='SYS'){
        $class2 = str_replace('SYS','',$class);
        eval('$'.strtolower($class2).' = new '.$class.'();');
    }
}

A few optimisations that could be applied to the aforementioned code would be to:

make use of the glob() function in PHP, which allows you to loop through files that match a pattern. In this case something like glob('classes/SYS*.php') would have done the trick.
remove the use of eval(), which could just as easily be implemented using variable variables. So in the example this could be written as $$varName = new $className() and, if you need to call a method, $$varName->getMethod().
make use of the SPL autoload functions that PHP provides to call the classes when they are needed.
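
Put together, the three suggestions above might look something like the sketch below. This is not code from the project: registerClassDir() and createSysObjects() are names I have made up, and it assumes that every file in the directory is named after the class it declares (so SYSBlog.php declares SYSBlog).

```php
<?php
/**
 * Hypothetical helper: autoload classes from $dir, assuming each file is
 * named after the class it declares.
 */
function registerClassDir($dir) {
    spl_autoload_register(function ($class) use ($dir) {
        $path = $dir . $class . '.php';
        if (is_file($path)) {
            require_once $path;
        }
    });
}

/**
 * Hypothetical helper: build one object per SYS*.php file, keyed by the
 * lower-cased class name without its prefix (SYSBlog becomes "blog").
 */
function createSysObjects($dir) {
    $objects = array();
    // glob() returns only the matching files, or false on error
    $files = glob($dir . 'SYS*.php');
    foreach ($files ? $files : array() as $file) {
        $className = basename($file, '.php');
        // new triggers the autoloader registered above
        $objects[strtolower(substr($className, 3))] = new $className();
    }
    return $objects;
}

$dir = __DIR__ . '/classes/';
registerClassDir($dir);
foreach (createSysObjects($dir) as $varName => $object) {
    $$varName = $object; // variable variable instead of eval()
}
```

This tidies up the bootstrap but still instantiates every class on every request, so it does nothing for memory usage on its own.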

But let's just be honest with ourselves; the code is crap and needs to be rewritten, but what if we do not have the time? Then we can use the following (only slightly improved I know):

<?php
$dir = 'classes/';
if (is_dir($dir)) {
    if ($dh = opendir($dir)) {
        while (($file = readdir($dh)) !== false) {
            // Only proxy class files carrying the SYS prefix, for example SYSBlog.php
            if(!is_dir($dir.$file) && substr($file,0,3)=='SYS' && substr($file,-4)=='.php'){
                // Assumes the file is named after the class it declares
                $class = basename($file, '.php');
                $class2 = str_replace('SYS','',$class);
                // The class file is not required here; the proxy loads it on first use
                eval('$'.strtolower($class2).' = new LazyLoadingProxy("'.$class.'", "'.$dir.$file.'");');
            }
        }
        closedir($dh);
    }
}

Now the class will only be loaded into memory when it is actually being used. This makes sense when you think that a response to a poll would never need to use methods in the blog classes for example. I found this technique was saving 30-70% of memory depending on the request. This could be most easily seen when making ajax requests as they often only made use of one class to perform their actions.

In the actual project the code is slightly different in that it does use an autoloader called Überloader written by Jamie Matthews (a colleague at Mosaic), which is available over on his github account. With the autoloader I was able to dispense with all the require_once()s and the need to pass the class path into the lazy loading proxy class.

https://www.simonholywell.com/post/2010/12/logging-global-php-objects-lazy-loading-proxy/

Set up a new port forward on a Draytek Vigor over the telnet interface

Simon Holywell Nov 20, 2010 Updated Nov 19, 2024

I needed to add a new port forward to a router, but I did not have access to the web interface through a graphical browser. Attempts to get in using Lynx stalled as it seems the router will not serve up the frames in the interface independently of each other and it kept issuing 404 errors.

Either way I had to use the telnet interface using the following command (replace 192.168.1.1 with the IP address of your router; 23 is the telnet port):

 telnet 192.168.1.1 23

This is fine except that Draytek have absolutely no documentation available for the commands. So to discover the correct command I had to go through all the available options (and sub options and sub sub options) as it was not immediately clear to me which option port forwarding was hiding under. To give you an idea here is a list of the top level options (run the ? command to get this view):

 > ?
 % Valid commands are:
 adsl         bpa          csm          webf         ddns         ddos
 urlf         kw           exit         fe           internet     ip
 ipf          log          mngt         port         portmaptime  prn
 quit         show         srv          sys          tsmail       upnp
 vigbrg       vlan         vpn          wan          wol          qos

The option we are interested in is srv, which has a number of sub options, but we are only interested in nat. Now we have yet more options, but let's just stick with portmap.

If you need extra information about a command or its sub options you can run the ? option at any time. For example:

 srv nat portmap ?

Of the options available under portmap we are interested in add and table.

Firstly you need to execute:

 srv nat portmap table

This shows the port forwards that have already been set up, which allows you to find the next available index and the WAN numbers. Do not use q to quit; just press Enter until you get back to the telnet prompt. My printout looks something like this:

 > srv nat portmap table

 NAT Port Redirection Configuration Table:

 Index  Service Name    Protocol  Public Port  Private IP      Private Port ifno
  1     SSH                6         1963   192.168.0.255          22      -2
  2                        0            0                        0      -2
  3                        0            0                        0      -2
  4                        0            0                        0      -2
  5                        0            0                        0      -2
  6                        0            0                        0      -2
  7                        0            0                        0      -2
  8                        0            0                        0      -2
  9                        0            0                        0      -2
 10                        0            0                        0      -2
 11                        0            0                        0      -2
 12                        0            0                        0      -2
 13                        0            0                        0      -2
 14                        0            0                        0      -2
 15                        0            0                        0      -2
 16                        0            0                        0      -2
 17                        0            0                        0      -2
 18                        0            0                        0      -2
 19                        0            0                        0      -2
 20                        0            0                        0      -2

 Protocol: 0 = Disable, 6 = TCP, 17 = UDP
 --- MORE ---   ['q': Quit, 'Enter': New Lines, 'Space Bar': Next Page] ---

 ifno: 0 = all, 3 = wan1, 4 = wan2

ifno is the interface number, which translates to our WAN number in the srv nat portmap add command. I am using 0 so that the port forward is available to all WANs. From the index column I can also see that the next available slot is 2.
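Finding the next free slot by eye is easy enough, but it can also be sketched in shell. The helper below is hypothetical (next_free_index is my own name, not a router command) and assumes that an unused row has an empty service name and private IP, so that it splits into exactly five whitespace-separated fields with the protocol in second place.

```shell
# Hypothetical helper: reads "srv nat portmap table" output on stdin and
# prints the index of the first unused row.
# Assumption: an unused row has an empty service name and private IP, so it
# splits into exactly five fields, with the protocol (0 = disabled) second.
next_free_index() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $2 == 0 { print $1; exit }'
}

# Example using the first rows of the table shown above.
next_free_index <<'EOF'
 Index  Service Name    Protocol  Public Port  Private IP      Private Port ifno
  1     SSH                6         1963   192.168.0.255          22      -2
  2                        0            0                        0      -2
  3                        0            0                        0      -2
EOF
```

For the sample rows this prints 2, which matches the slot picked by hand.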

Now we have enough information to add the port forward! The add command has the following syntax (we are looking at the second line):

 > srv nat portmap add ?
 % srv nat portmap add <idx> <serv name> <proto> <pub port> <pri ip> <pri port> <wan1/wan2>

So let us translate this to use the same terms as the table we saw earlier:

idx (Index)
serv name (Service Name): Surround this with quotes if you want to have spaces in the name.
proto (Protocol): This must be in lowercase only, such as tcp or udp.
pub port (Public Port): The public port number you want to forward to your internal machine.
pri ip (Private IP): The IP address of your internal machine.
pri port (Private Port): The port number you are using on the internal machine.
wan1/wan2 (ifno): In my case this was 0 for all, 3 for wan1 and 4 for wan2.

So this means I need to run:

 srv nat portmap add 2 "Simons Test" tcp 3840 192.168.0.255 3841 0

to add a new port forward. You're done, and you can now access the machine via the public port.

As a simple example, if I wanted to open up HTTP over port 8080 instead of the standard port 80, I could use the following port forward command:

 srv nat portmap add 2 "Non-standard HTTP Port" tcp 8080 192.168.0.255 80 0

Now Apache on my internal machine (192.168.0.255) is still serving on port 80 internally to the network, but to access it from the outside world you need to specify port 8080.
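If you add port forwards often, the argument order can be wrapped in a small shell function. This is only a sketch: portmap_add_cmd is a hypothetical name, and it merely prints the command text for you to paste into the telnet session; it does not talk to the router. The one check it makes is the lowercase tcp/udp rule mentioned above.

```shell
# Hypothetical helper: prints a "srv nat portmap add" command line.
# Arguments: idx, service name, proto, public port, private IP, private port, ifno
portmap_add_cmd() {
    if [ "$#" -ne 7 ]; then
        echo "usage: portmap_add_cmd idx name proto pub_port pri_ip pri_port ifno" >&2
        return 1
    fi

    # The router expects the protocol in lowercase (tcp or udp).
    case "$3" in
        tcp|udp) ;;
        *) echo "proto must be lowercase tcp or udp" >&2; return 1 ;;
    esac

    # Quote the service name so that spaces survive.
    printf 'srv nat portmap add %s "%s" %s %s %s %s %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

portmap_add_cmd 2 "Non-standard HTTP Port" tcp 8080 192.168.0.255 80 0
```

The call at the end prints the same HTTP example command shown above.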

https://www.simonholywell.com/post/2010/11/setup-a-new-portforward-on-vigor-over-telnet/

An Excellent Development Server for a Team of Developers

Simon Holywell Nov 8, 2010 Updated Nov 19, 2024

Introduction

When working in a team it is very useful to have a central web server with multiple environments and a configuration as close to the live server as possible. This can be a bit of a nightmare though if you need to set up a new VirtualHost container in Apache every time a new project is brought on or when a developer wants to work on a version of the site in their own environment.

The good news is that this can all be handled automatically and new sites can be set up by simply adding a new directory to the file system. There are at least two ways of getting this going: the first is the mod_vhost_alias module for Apache and the second uses mod_rewrite. I prefer the second method as it is more flexible and it allows you to tap into the ability of mod_rewrite to introduce environment variables and redirect requests (this is particularly useful for robots.txt, as you will see).

The Apache2 Manual does have a very good page dedicated to overcoming this problem, but I will be sharing all the settings I am using, including those you will need to stop Google et al. from crawling sites served from the staging environment.

Intended Audience

This article is targeted at people who have a fairly good working knowledge of Apache and VirtualHost container configuration. I have previously written a more basic, step-by-step development server configuration article, but it is aimed at single developers as it uses a virtual machine.

It is still a good introduction though and takes you through all the installation steps on an Ubuntu machine. If you are not confident installing and configuring a LAMP server then head over and give my A Good Windows Development Environment and Ubuntu Virtualbox article a read first. Skip to step eight if you are not going to be using a virtual machine.

My Setup

In my environment I have three subdomains that I use to host various aspects of a project:

proof.example.org
*.dev.example.org
staging.example.org

They are all essentially handled in the same way by Apache. When a request comes into the server for one of the above domains, mod_rewrite will direct the request to the correct file location. I use the following (respectively):

/var/vhosts/proof
/home/*/www/
/var/vhosts/staging

The *.dev subdomain is used for each user's individual development area. So simon.dev.example.org would map to /home/simon/www/ in the file system. Staging is for demonstrating sites to clients and proof is a location used to show designs and wireframes for sign-off.

I will now run through how you would create a staging site using this system. Simply:

Create a new directory in /var/vhosts/staging using the name of your site (e.g. mysitename).
Now create a directory called pub inside your newly created directory (/var/vhosts/staging/mysitename).
Copy or create your public files in pub (/var/vhosts/staging/mysitename/pub). You may need to include a .htaccess file here with a RewriteBase / line in it.
Navigate to mysitename.staging.example.org to see the served page.

Having the pub directory allows you to keep the bulk of your application outside of the public HTTP folder.

Now that you can see how quick and simple it is to use, let's delve into how it works.

Development Server Configuration

Initial Include

The first file that is necessary is a simple link to our main development server configuration file. This keeps the main Apache configuration file simple and clean, which will make upgrades further down the line less painful.

# This file is /etc/apache2/httpd.conf
# This file is automatically included by Ubuntu in /etc/apache2/apache2.conf

Include /etc/apache2/dev-server.conf

The above file is at /etc/apache2/httpd.conf by default in Ubuntu, but your distribution may differ in which case it might be /etc/httpd/httpd.conf. Essentially you need to add the Include line to your default Apache configuration.

VirtualHost Containers

This leads us to the VirtualHost configuration file. This includes two VirtualHost containers, one for port 80 traffic and one for SSL port 443 traffic.

Both containers are essentially the same except for the actual SSL options, so I have included the main or common elements in a separate file (/etc/apache2/dev-server-vhost.conf) so they can easily be reused in both. This saves a lot of duplication and allows you to easily update them both in one place. This will make more sense when I describe the dev-server-vhost.conf file further on.

# This file is /etc/apache2/dev-server.conf
# This file is included by /etc/apache2/httpd.conf
# Hide server information and setup VirtualHost container skeletons for HTTP and HTTPS

# Hide server vitals from responses
ServerSignature Off
ServerTokens Min

# Mass Virtual Hosting
<VirtualHost *:80>
    <IfModule mod_ssl.c>
        SSLEngine off
    </IfModule>
    Include /etc/apache2/dev-server-vhost.conf
</VirtualHost>


# SSL Mass Virtual Hosting
<VirtualHost *:443>
    <IfModule mod_ssl.c>
        SSLEngine on
        SSLOptions +StrictRequire
        SSLCertificateFile /etc/ssl/certs/server.crt
        SSLCertificateKeyFile /etc/ssl/private/server.key
    </IfModule>
    Include /etc/apache2/dev-server-vhost.conf
</VirtualHost>

In this file I firstly configure the server to broadcast as little as possible about itself as a simple way of slowing down possible attackers and hopefully deterring script kiddies. Next the first VirtualHost container is specified for standard port 80 traffic.

In the VirtualHost container I have ensured that SSL is definitely deactivated for this port and then included the common elements from the dev-server-vhost.conf file mentioned earlier.

The second VirtualHost container is basically the same except for the mod_ssl options. Firstly it enables SSL and forces all traffic on this port through SSL encryption. Also specified here are the SSL certificate and the encryption key that the server should use for communication with the client. If you do not know how to create an SSL certificate then the Apache2 manual has you covered. Finally the dev-server-vhost.conf file is included again.

VirtualHost Container Common Configuration Include

The third file is the dev-server-vhost.conf file which is included in both of our VirtualHost containers as mentioned above. This is quite a large set of configuration options, but nothing overly complex.

Basic Configuration

The first few options should be familiar to you; they simply set the DocumentRoot, ServerName and so on. After this there are two SetEnvIfNoCase lines that flag known robots, and visitors arriving from search engine listings, as bad_bot, just in case they do not respect robots.txt. These are followed by Directory directives that enforce the bad_bot blocking. They also stop the contents of directories being displayed or indexed.

# This file is /etc/apache2/dev-server-vhost.conf
# This file is included from the virtual hosts in /etc/apache2/dev-server.conf
# The guts of the VirtualHost container with robot blocking and mass virtual hosting via mod_rewrite

ServerName dev.example.org
ServerAdmin example@example.org
ServerAlias *.dev *.dev.example.org staging.example.org *.staging *.staging.example.org proof.example.org *.proof *.proof.example.org
DocumentRoot /var/www
LimitInternalRecursion 15


# Detect bots by user agent
SetEnvIfNoCase User-Agent ".*(Googlebot|msnbot|Yahoo! Slurp|YahooSeeker|Yahoo-Blogs|bot|robot|spider|Ask Jeeves|ArchitextSpider|Scooter|AltaVista|Slurp|Crawler|WebCrawler|Lycos).*" bad_bot

# Detect incorrect website access by referer. This is to stop anybody
# clicking on a link from the following search engine's listings.
SetEnvIfNoCase Referer ".*(Google\.|Yahoo\.|Bing\.|Ask\.|Excite\.|Lycos\.|AltaVista\.|WebCrawler\.).*" bad_bot

# Enforce the bad_bot blocking and switch off directory listings
<Directory /home/>
    Options -Indexes +FollowSymLinks +Multiviews
    AllowOverride All
    Order allow,deny
    Allow from all
    Deny from env=bad_bot
</Directory>
<Directory /var/vhosts/virtual>
    Options -Indexes +FollowSymLinks +Multiviews
    AllowOverride All
    Order allow,deny
    Allow from all
    Deny from env=bad_bot
</Directory>
<Directory /var/vhosts/proof>
    Options -Indexes +FollowSymLinks +Multiviews
    AllowOverride All
    Order allow,deny
    Allow from all
    Deny from env=bad_bot
</Directory>

RewriteEngine on

# Rewrite log settings
RewriteLog /var/log/apache2/dev-rewrite.log
#0-9: 0 = none 9 = verbose
RewriteLogLevel 0

# Block robots with a central robots.txt for all sites
RewriteRule ^/robots\.txt$ /var/vhosts/robots.txt [L]

# Filter to parse URLs to lowercase
RewriteMap lowercase int:tolower

# Dev sites
# With the format "user    /home/dir/"
RewriteMap vhost txt:/etc/apache2/dev-server-rewrite.map
RewriteCond ${lowercase:%{SERVER_NAME}} ([^.]+)\.([^.]+)\.dev\.

# Windows:
# RewriteCond %1.${vhost:%2}£%2 ^([a-z]+)\.([d].*)£([a-z]+)$

# Linux:
RewriteCond %1.${vhost:%2}£%2 ^([a-z]+)\.(/.*)£([a-z]+)$

# The £%2 is only there so that the developer
# username can be passed between RewriteConds.
# The £ symbol is just a delimiter.
RewriteRule ^/(.*)$ %2/%1/pub/$1 [L,E=VIRTUAL_DOCUMENT_ROOT:%2/%1/pub,E=WE_ARE_ON_STAGING:TRUE,E=DEVELOPER_USERNAME:%3]

# Staging sites
RewriteCond ${lowercase:%{SERVER_NAME}} ([^.]+)\.staging\.
RewriteRule ^/(.*)$ /var/vhosts/staging/%1/pub/$1 [L,E=VIRTUAL_DOCUMENT_ROOT:/var/vhosts/staging/%1/pub,E=WE_ARE_ON_STAGING:TRUE]

# Proof sites
RewriteCond ${lowercase:%{SERVER_NAME}} ([^.]+)\.proof\.
RewriteRule ^/(.*)$ /var/vhosts/proof/%1/pub/$1 [L,E=VIRTUAL_DOCUMENT_ROOT:/var/vhosts/proof/%1/pub,E=WE_ARE_ON_STAGING:TRUE]

# Logging
LogLevel debug
LogFormat "%{Host}i %h %l %u %t \"%r\" %s %b" vcommon
CustomLog /var/log/apache2/dev-access.log vcommon
ErrorLog /var/log/apache2/dev-error.log

mod_rewrite Setup and Logging

Next mod_rewrite is employed and a log file is specified for debugging rewrite requests. I have set the log level to zero by default to disable logging but bump it up to 9 and it will give you full request logging.

Central Robots.txt

All requests to robots.txt on a particular site are rewritten to a central file that simply denies all robot traffic to all URLs. If you are not sure how to specify this then check out the documentation on Robotstxt.org.

# This file is /var/vhosts/robots.txt
# The central robots.txt file.

User-agent: *
Disallow: /

Development Environments for Individuals

Further on we have the first rewrite rule for *.dev.example.org sites. These are for each developer's individual development environment. It uses a RewriteMap file to match the developer's name in the URL to a location in the file system. The map file looks like the following and is self-explanatory:

# This file is /etc/apache2/dev-server-rewrite.map
# This file is referenced by the RewriteMap directive in /etc/apache2/dev-server-vhost.conf
# Maps a username to a location in the file system using the following format:
# name<tab>/dir/location

simon   /home/simon/www
joe     /home/joe/www
jane    /home/jane/www

As you can see from the directory layout above, you will need to set up a new user for each developer and add their username and path to the map file.
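To make the lookup easier to follow, here is a rough shell sketch of what the rewrite rules resolve a host name to. It is an illustration only, not part of the server configuration: resolve_docroot is a hypothetical function, and it assumes host names of exactly the form site.user.dev.domain, site.staging.domain or site.proof.domain.

```shell
# Hypothetical re-implementation of the rewrite logic, for illustration only.
# resolve_docroot HOST MAPFILE prints the directory a request would be served from.
resolve_docroot() {
    # Mirrors the "lowercase" RewriteMap applied to SERVER_NAME.
    host=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    map=$2

    case "$host" in
        *.*.dev.*)
            site=${host%%.*}        # first label: the site name
            rest=${host#*.}
            user=${rest%%.*}        # second label: the developer
            # Look the developer up in the map file (format: name<tab>/dir).
            dir=$(awk -v u="$user" '$1 == u { print $2; exit }' "$map" 2>/dev/null)
            [ -n "$dir" ] || return 1
            printf '%s/%s/pub\n' "$dir" "$site"
            ;;
        *.staging.*)
            printf '/var/vhosts/staging/%s/pub\n' "${host%%.*}"
            ;;
        *.proof.*)
            printf '/var/vhosts/proof/%s/pub\n' "${host%%.*}"
            ;;
        *)
            return 1
            ;;
    esac
}

# Staging sites do not need the map file, so /dev/null will do here.
resolve_docroot mysitename.staging.example.org /dev/null
```

With the map file above, resolve_docroot mysitename.simon.dev.example.org /etc/apache2/dev-server-rewrite.map would print /home/simon/www/mysitename/pub, and a developer who is missing from the map makes the function fail rather than guess a path.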

Environment Variables for Bootstrapping

When the developer's environment is rewritten, three environment variables are set for use in your scripts (in PHP these are available in $_SERVER). The variables would look like the following if you were to access the site mysitename.simon.dev.example.org:

VIRTUAL_DOCUMENT_ROOT = /home/simon/www/mysitename/pub
WE_ARE_ON_STAGING = TRUE
DEVELOPER_USERNAME = simon

VIRTUAL_DOCUMENT_ROOT

The VIRTUAL_DOCUMENT_ROOT is the location of the files that are being served and can be used in place of DOCUMENT_ROOT in your scripts.

This can get tedious, however, so I have set up an auto_prepend file in my PHP configuration that converts VIRTUAL_DOCUMENT_ROOT to DOCUMENT_ROOT automatically.

<?php
// This file is /etc/php/auto_prepend.php
// It is actioned by setting auto_prepend_file="/etc/php/auto_prepend.php" in your PHP INI/Config
// Converts the virtual document root to be the standard document root

if(isset($_SERVER['VIRTUAL_DOCUMENT_ROOT']) and
   !empty($_SERVER['VIRTUAL_DOCUMENT_ROOT'])) {
    $_SERVER['DOCUMENT_ROOT'] = $_SERVER['VIRTUAL_DOCUMENT_ROOT'];
}

Then set the PHP INI configuration variable auto_prepend_file to the path of your file. In my case above this would look like: auto_prepend_file="/etc/php/auto_prepend.php"

This is the only way I have found to override the value in the DOCUMENT_ROOT environment variable. Attempting to do so in the RewriteRule will not work.

WE_ARE_ON_STAGING and DEVELOPER_USERNAME

The next two environment variables WE_ARE_ON_STAGING and DEVELOPER_USERNAME are used to automatically setup the correct configuration environment in development, staging and production.

I currently use an XML configuration file that is interpreted by the Zend Framework Zend_Config class. Zend_Config has the concept of multiple environments that can extend each other. This is described in more detail in the Zend Framework manual, but you could just as easily write your own or tie it into any framework's bootstrap.

Essentially these variables allow me to work out which set of configuration data I should use when bootstrapping the application. Each developer may be working from their own database, for example, so if DEVELOPER_USERNAME is set we can automatically use their configuration.

If the WE_ARE_ON_STAGING variable is not set then we are on the production server so we can use the live configuration options.
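The decision described above boils down to a few lines of logic. The sketch below expresses it in shell rather than PHP purely for illustration; select_config and the section names it prints (production, staging, development-<user>) are my own assumptions and not the names used in the real configuration file.

```shell
# Hypothetical helper: prints which configuration section to load.
# $1 = value of WE_ARE_ON_STAGING (empty if unset)
# $2 = value of DEVELOPER_USERNAME (empty if unset)
select_config() {
    on_staging=$1
    developer=$2

    if [ -z "$on_staging" ]; then
        # No staging flag at all: we must be on the production server.
        echo "production"
    elif [ -n "$developer" ]; then
        # A developer's own environment, e.g. their own database.
        echo "development-$developer"
    else
        echo "staging"
    fi
}

# Uses whatever the current environment provides.
select_config "${WE_ARE_ON_STAGING:-}" "${DEVELOPER_USERNAME:-}"
```

Checking for the staging flag first means a server that sets neither variable falls through to the live configuration, which is the behaviour described above.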

This works for me, but you may want to tweak it to suit your workflow.

Apache Logging

I have this set to the maximum logging level and I have redirected all logging for sites going through this VirtualHost container to a central error and access log. This is all explained in more depth in the Apache2 manual section on Log Files.

Other Niceties

There are a few other helpful things I have running on the server as well to aid development. I would go into more detail about them, but they are fairly easy to install and configure, plus they are not required for the main functioning of the server.

Install and set up Samba shares for the dev, staging and proof directories so Windows users can easily access the files and create new sites.
Set up a mail server so that your applications can send out test emails. I use Postfix for this and there is a nice tutorial for Ubuntu to help you along.
If you are using PHP then seriously consider installing XDebug and looking into its profiling capabilities.
Also on PHP, install APC to speed up PHP.
Consider setting up mod_cache to test your long life caches.
I have added memcached and Redis to the mix as well.

I have recently written about installing memcached and APC on my blog for both RedHat and Ubuntu.

Conclusion

I have found this to be a very effective development environment when working with groups of people. You can develop in your own environment and then push it to staging for client approval. It also ensures that the staging environment and development environments are almost identical. I hope that this proves useful to you as well.

https://www.simonholywell.com/post/2010/11/team-development-server/

Redis: under the hood (internals)

Simon Holywell Oct 18, 2010 Updated Nov 19, 2024

I was curious to learn more about Redis’s internals, so I’ve been familiarizing myself with the source, largely by reading and jumping around in Emacs. After I had peeled back enough of the onion’s layers, I realized I was trying to keep track of too many details in my head, and it wasn’t clear how it all hung together. I decided to write out in narrative form how an instance of the Redis server starts up and initializes itself, and how it handles the request/response cycle with a client, as a way of explaining it to myself, hopefully in a clear fashion.

https://www.simonholywell.com/post/2010/10/redis-under-the-hood/

PECL Install Issues on Redhat

Simon Holywell Oct 6, 2010 Updated Nov 19, 2024

Installing via the pecl command can be a pain on Redhat. First of all you will need to install the php-devel package:

yum install php-devel

Then you will need to ensure that the PEAR/PECL installer is at the latest version, so as root run:

pear channel-update pear.php.net
pear upgrade pear

You may need to force pear to upgrade itself by using:

pear upgrade --force pear

I had to use the --force option because my version of PEAR was so old that the installer thought my version of Archive_Tar might not have been up to scratch. It was, however.

With all this in place you are ready to attempt to install your chosen extension:

pecl install ssdeep

If you get something like the following back:

/usr/bin/phpize: /tmp/ssdeep/build/shtool: /bin/sh: bad interpreter: Permission

Then it is likely that your temporary directory is mounted in a safer noexec state, which means that you cannot execute scripts within the /tmp directory. To test this you can put a simple bash script into your /tmp directory and chmod it with +x. I used the following bash script:

#!/bin/bash
echo "SIMON"

If you do not get SIMON back when you execute the file, but an error like “/bin/sh: bad interpreter: Permission” then the directory is set to noexec.
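Another way to check, sketched below, is to read the mount options directly instead of running a test script. This assumes a Linux system with /proc/mounts and that /tmp is its own mount point; has_noexec and mount_options are hypothetical helper names.

```shell
# has_noexec OPTIONS: succeeds if the comma-separated mount options include noexec.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# mount_options DIR: prints the options of the mount point DIR as listed in
# /proc/mounts (Linux only). Prints an empty line if DIR is not a mount point.
mount_options() {
    awk -v d="$1" '$2 == d { opts = $4 } END { print opts }' /proc/mounts 2>/dev/null
}

if has_noexec "$(mount_options /tmp)"; then
    echo "/tmp is mounted noexec"
else
    echo "/tmp is not a noexec mount point"
fi
```

Wrapping the option string in commas before matching avoids false positives from options that merely contain the text, and means exec on its own is never mistaken for noexec.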

There are a few ways to overcome this with the easiest being to:

Remount the directory as exec:

mount -o remount,exec /tmp

Install your pecl extension:

pecl install ssdeep

Remount the directory as noexec again for safety:

mount -o remount,noexec /tmp
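The three steps can be combined so that /tmp is put back to noexec even when the install fails. The following is a sketch under stated assumptions rather than a tested installer: pecl_install_with_exec_tmp and the RUN variable are hypothetical names, and by default the function performs a dry run that only prints the commands. Setting RUN to an empty string (as root) would execute them for real.

```shell
# RUN defaults to "echo", which turns every command into a printed dry run.
# Set RUN="" before calling the function to actually execute the commands.
RUN=${RUN-echo}

# Hypothetical wrapper around the three steps above.
pecl_install_with_exec_tmp() {
    ext=$1

    # Step 1: allow execution in /tmp; give up if the remount fails.
    $RUN mount -o remount,exec /tmp || return 1

    # Step 2: install the extension, remembering whether it worked.
    $RUN pecl install "$ext" && status=0 || status=$?

    # Step 3: always restore noexec, even if the install failed.
    $RUN mount -o remount,noexec /tmp

    return "$status"
}

pecl_install_with_exec_tmp ssdeep
```

In dry-run mode this prints the two mount commands with the pecl install line between them, which makes it easy to review what would be run before doing it for real.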

There are a variety of other ways to get this working documented in the Media Temple wiki pages if the above technique does not work for you.

https://www.simonholywell.com/post/2010/10/pecl-install-issues-on-redhat/

Forcing NetBeans to Use Unix (LF) Line Endings

Simon Holywell Oct 5, 2010 Updated Nov 19, 2024

NetBeans usually uses the operating system's default line ending when creating a new file (it establishes this from what the JVM tells it). So, for example, in Windows it will automatically use CRLF and in Unix it will automatically use LF. This behaviour has its advantages, but sometimes you want to be specific about the line endings you need.

To do this you can add the following switch to your call to the NetBeans binary. On Windows this would be done on the shortcut by opening its properties and adding the switch at the very end of the Target line.

-J-Dline.separator=LF

Whilst this is handy, a proper setting in the preferences of NetBeans would be preferable, as it could then easily be backed up and taken from machine to machine. Vote for the issue on the NetBeans bug tracker.

https://www.simonholywell.com/post/2010/10/force-netbeans-line-endings/

https://feeds.feedburner.com/SimonHolywell

Posts