Jeroen van Wilgenburg's blog

Jeroen van Wilgenburg Aug 31, 2021

From now on (Aug 30, 2021), new articles will be published on jvwilge.github.io because I swear too much when adding code snippets to WordPress (and it is ad-free).


http://vanwilgenburg.wordpress.com/?p=660


Running your java integration tests with Gradle and docker-compose with only one command (optionally with Bitbucket Pipelines)

Jeroen van Wilgenburg Sep 2, 2020


This article will show you how to run your Java integration tests against Docker containers started via docker-compose, using only the Gradle build step and without needing to fiddle with Docker or docker-compose yourself.
In addition I’ll show how to do it on Bitbucket Pipelines, but of course you can skip this step.

Note that this is one of the first projects I set up using Gradle; most projects I worked on used Maven. So please help me improve this article when I don’t do things the ‘Gradle way’.

Separation of unit and integration tests

I’m a huge proponent of not using mocks when integration testing, so you’re testing against a realistic environment. With docker-compose it’s so easy to use the real thing that you won’t even think about mocking it. When you really want to be strict you can even make a physical separation between integration and unit tests (see this article on how to make this happen).

The big advantage of using docker-compose is that you can keep your containers running, even after your tests finish. This increases your development speed, since running your integration tests again is now much faster. Be careful to clean up after you finish, otherwise you might end up with ‘dirty’ data.

It is even possible to use the running containers when running your application locally.

Example code on GitHub

I created a GitHub repository with example code so you can run everything yourself (assuming you have Java 11 and Docker installed).

For this project I started a fresh Spring Gradle project via start.spring.io with the following dependencies:

PostgreSQL Driver
Spring Data JDBC

Setting up docker-compose.yaml with a PostgreSQL database

In order to use docker-compose you have to put a docker-compose.yaml in the root directory of your project. I used a simple compose file that starts a postgres image (yes, this could’ve been done with Docker only, but this is to illustrate how to do it with docker-compose).

I took the inspiration for the docker-compose.yaml from this article.

Adding the gradle-docker-compose-plugin to your build.gradle

After some searching for plugins I found a nice plugin that does the trick. It doesn’t have a lot of stars yet, but you can change that if you like the plugin.

Add the following line to the plugins block in your build.gradle:

id 'com.avast.gradle.docker-compose' version '0.13.2'

Also add this configuration block to your build.gradle:

dockerCompose {
    isRequiredBy(test)

    // enable the following line when debugging container issues
    // captureContainersOutput(true)
}

Handling random port numbers

When running tests, chances are you already have a postgres instance running on your machine. To prevent port conflicts you can omit the host port in docker-compose.yaml and Docker will pick a random one.
To access this port in your application add the following to your build.gradle:

test.doFirst {
    // the postgres port is mapped to a random port number that is located in a map
    def dbPort = dockerCompose.servicesInfos.database.postgres_db.tcpPorts[5432]
    environment.put("spring.datasource.url", "jdbc:postgresql://localhost:$dbPort/postgres")
}

This sets an environment variable for the test JVM that will be picked up by Spring. More information on the inner workings can be found in the ContainerInfo.groovy class or the project readme.

Now run ./gradlew composeUp to spin up the containers. This isn’t strictly necessary, since the containers are also started in the test step, but it’s nice to see it work.

With ./gradlew composeDown you can shut down the containers.

If you want to know the port number (for example to run your application locally or to access the database with a separate client), a nice table with the port mappings is displayed during startup:

+-------------+----------------+-----------------+
| Name        | Container Port | Mapping         |
+-------------+----------------+-----------------+
| postgres_db | 5432           | localhost:32772 |
+-------------+----------------+-----------------+

Running the integration test

For the integration test I’ve been a bit lazy and just used @SpringBootTest with a simple JDBC query (see JdbcTest.java).

Spring isn’t really needed for this mechanism; it’s just convenient for demo purposes. You can of course also use it in, for example, a Vert.x application.
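Outside Spring the mechanism works the same way, because the test.doFirst block simply exports an environment variable to the test JVM. A minimal plain-Java sketch of picking it up (the class name and the fallback URL for a postgres on its default port are my own assumptions, not code from the example repo):

```java
import java.util.Map;

public class DatasourceUrl {

    // Environment variable exported by the test.doFirst block in build.gradle
    static final String KEY = "spring.datasource.url";

    // Hypothetical fallback: a postgres instance on its default port
    static final String FALLBACK = "jdbc:postgresql://localhost:5432/postgres";

    /** Returns the URL exported by Gradle, or the fallback when running outside ./gradlew test. */
    static String resolve(Map<String, String> env) {
        String url = env.get(KEY);
        return (url == null || url.isEmpty()) ? FALLBACK : url;
    }

    public static void main(String[] args) {
        // Hand this to DriverManager.getConnection(...) or your Vert.x client configuration
        System.out.println(resolve(System.getenv()));
    }
}
```

The fallback keeps the code usable when it runs outside the Gradle test task, for example against a postgres you started yourself.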

Keep your containers running when your tests finish

Of course it takes time to spin up containers, and in many cases you can reuse them. To keep the containers running after a test run, copy this script to ~/.gradle/init.gradle.

Note that ./gradlew composeDown now doesn’t actually stop the containers anymore; due to some constraints that wasn’t possible. In the past the plugin didn’t show a warning about this, so I filed an issue that was fixed and released within a few days! Kudos for that; now there is a nice warning:

> Task :composeDown
You're trying to stop the containers, but stopContainers is set to false. Please use composeDownForced task instead.

To stop the containers anyway you can use ./gradlew composeDownForced.

Adding Docker to Bitbucket Pipelines

When running on Bitbucket there are a few tweaks you need in order to run docker-compose with Bitbucket Pipelines.
First you need to enable Docker by adding it as a service to the steps where you need it. For example:

pipelines:
  default:
    - step:
        name: Build and test
        script:
          - ./gradlew build
        services:
          - docker

There is also an option to add it to all steps, but I think it is nicer to only add it where you need it.
For more information see Run Docker commands in Bitbucket Pipelines.

Note that when you’re using the Gradle release plugin by Researchgate you should be aware that the release step also does a full test run and needs the same Docker configuration as your build step!

Adding docker-compose to Bitbucket Pipelines

When running docker-compose you need an image that includes docker-compose. A quick, temporary fix is to install docker-compose with a script that runs before ./gradlew build (for example by adding - ci/dependencies.sh to the script section of the step). The ci/dependencies.sh script:

#!/usr/bin/env sh

set -eu

# prevent 'ImportError: No module named ssl_match_hostname'
apt-get remove python-configparser --yes

pip install docker-compose
docker-compose -v

Note that this script is for an Ubuntu image and highly depends on which image you’re using. Please use it as inspiration, just like I did from Docker-compose and pipelines and Using docker-compose in Bitbucket Pipelines.

Fixing working directory and $BITBUCKET_CLONE_DIR issues

Bitbucket is a bit picky on where you can put data. I soon ran into the following error:

authorization denied by plugin pipelines: ... only supports $BITBUCKET_CLONE_DIR and its subdirectories

I solved it by using the PWD environment variable that’s available both on Bitbucket and in your shell.

For example, a MySQL service in the docker-compose.yaml looks like this:

mysql-myapp-main:
  image: mysql:5.7
  restart: always
  environment:
    MYSQL_DATABASE: 'myapp-main'
    MYSQL_PASSWORD: 'myapp123'
    MYSQL_USER: 'myapp'
    MYSQL_RANDOM_ROOT_PASSWORD: 'true'
  ports:
    - 3306:3306
  volumes:
    - '${PWD}/build/docker-volumes/myapp-mysql-main_mysql:/var/lib/mysql'
    - '${PWD}/build/docker-volumes/myapp-mysql-main_files/db:/var/files'

When running from IntelliJ (or maybe other IDEs) the PWD variable might not be available. The resulting error is a bit cryptic, since docker-compose substitutes the missing variable with an empty string, so the volume paths suddenly start at the root:

ERROR: for b233df47a809387947f80c7fecdff6b7_myapp__mysql-myapp-main_1  Cannot start service mysql-myapp-main: Mounts denied:
The paths /build/docker-volumes/myapp-mysql-main_mysql and /build/docker-volumes/myapp-mysql-main_files/db
are not shared from OS X and are not known to Docker.
You can configure shared paths from Docker -> Preferences... -> File Sharing.
See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more info.

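I solved this by adding the following line to the dockerCompose block in build.gradle, which falls back to the project directory when PWD is missing:

environment.put("PWD", System.getenv("PWD") ?: projectDir)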
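The cryptic message makes more sense with a tiny model of what happens. The sketch below is a simplified, hypothetical stand-in for docker-compose’s variable substitution (it only handles ${PWD}) combined with the same fallback as the Gradle line, written in Java; /home/me/myapp is a made-up project directory:

```java
import java.util.HashMap;
import java.util.Map;

public class ComposePwd {

    /** Simplified model of docker-compose substitution: an unset ${PWD} becomes "". */
    static String hostPath(String volume, Map<String, String> env) {
        return volume.replace("${PWD}", env.getOrDefault("PWD", ""));
    }

    /** Java version of `System.getenv("PWD") ?: projectDir` (Groovy's ?: also treats "" as false). */
    static Map<String, String> withPwdFallback(Map<String, String> env, String projectDir) {
        Map<String, String> result = new HashMap<>(env);
        String pwd = env.get("PWD");
        if (pwd == null || pwd.isEmpty()) {
            result.put("PWD", projectDir);
        }
        return result;
    }

    public static void main(String[] args) {
        String volume = "${PWD}/build/docker-volumes/myapp-mysql-main_mysql:/var/lib/mysql";
        Map<String, String> ideEnv = Map.of(); // e.g. started from an IDE without PWD
        // starts with /build/docker-volumes/..., the path from the "Mounts denied" error
        System.out.println(hostPath(volume, ideEnv));
        // starts with the project directory, which Docker is allowed to mount
        System.out.println(hostPath(volume, withPwdFallback(ideEnv, "/home/me/myapp")));
    }
}
```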

Conclusion

I hope you enjoyed the article, if you have any improvements/comments let me know!

http://vanwilgenburg.wordpress.com/?p=644


Using a ConnectableFlux to do background batching on elasticsearch

Jeroen van Wilgenburg Jan 9, 2020


We have a Project Reactor application which is a bit brittle when it comes to refactoring. The load on our elasticsearch cluster is pretty high due to many single gets/inserts by id. Adding batch reads by id would have been so much work that I looked for a different solution. I eventually came up with one using a ConnectableFlux.

Introduction

Note that all the code examples are available on GitHub. The code snippets in this article are shortened versions (with less logging and documentation) to improve readability. The source repository also contains all the scripts to create test data and run performance tests. It is basically a standard Spring Boot application generated with Spring Initializr.

The application I’m talking about reads messages from a queue. For each message a document is retrieved from elasticsearch, updated with information from that message and inserted back into elasticsearch. Latency is not a big issue: when messages are processed within tens of seconds that’s ok. What’s not ok is the pressure on our elasticsearch cluster. Response times exceed 250 ms at peak load, which eventually leads to delays of minutes in the message processing.
We do a lot of single gets by id, so those are a good candidate to look into. When batching works for those, the same approach applies to inserts by id.

ConnectableFlux

I was searching for some kind of stream you can put items on and retrieve the answers from later, with minimal changes to the existing application of course. This stream would be a long-living Flux. After reading some Reactor documentation, ConnectableFlux seemed like a good candidate.
The idea is to buffer for some time (50 ms) and/or size (5) and then make a bulk request to elasticsearch. After retrieving the answer we have to split the bulk response by filtering out the correct id (this feels a bit wrong, it’s like: “here are a bunch of answers, check if the right answer is there”).

The batches don’t need to be big: some quick tests with multi get show that even batches of 3 improve response times drastically, while batches larger than 5 don’t improve performance much further. So don’t set the number too high, since you will add a small delay to your requests.

A pseudo-marble diagram explaining the concept

Before we start coding it might be useful to include a marble-like diagram to see what is happening. When you don’t understand the diagram yet scroll back to it when you read the code samples and everything will probably be much clearer.

In this diagram we make 5 requests (get by id). The buffer size is set to 3 (and no timeout for brevity). All requests are sent to the connectableFlux. When the first three (green, yellow, blue) arrive the buffer is full and a request to elasticsearch is made. When the result arrives, all 3 requests receive all answers simultaneously and filter for the requested id (this is all hidden in a method, so no user action is required). Since magenta and red are not enough to fill a buffer (yet) there is no request made to elasticsearch.
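To make the diagram concrete without Reactor, here is a plain-JDK sketch of the same idea that uses CompletableFuture instead of a ConnectableFlux. It is a swapped-in technique, not the code from the repository: it only batches on size (like the diagram, no timeout), and batchCall and idOf are hypothetical parameters standing in for the elasticsearch multi get and the product id getter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Size-only micro-batcher: a plain-JDK analogue of buffer + batch call + filter by id. */
public class SizeBatcher<K, V> {

    private final int batchSize;
    private final Function<List<K>, List<V>> batchCall; // e.g. a multi get by ids
    private final Function<V, K> idOf;                  // extracts the id from a result
    private final List<K> bufferedIds = new ArrayList<>();
    private final List<CompletableFuture<V>> waiting = new ArrayList<>();

    public SizeBatcher(int batchSize, Function<List<K>, List<V>> batchCall, Function<V, K> idOf) {
        this.batchSize = batchSize;
        this.batchCall = batchCall;
        this.idOf = idOf;
    }

    /** Buffers the id; the returned future completes once its batch has been fetched. */
    public CompletableFuture<V> get(K id) {
        CompletableFuture<V> result = new CompletableFuture<>();
        List<K> ids;
        List<CompletableFuture<V>> callers;
        synchronized (this) {
            bufferedIds.add(id);
            waiting.add(result);
            if (bufferedIds.size() < batchSize) {
                return result; // buffer not full yet (magenta and red in the diagram)
            }
            ids = new ArrayList<>(bufferedIds);
            callers = new ArrayList<>(waiting);
            bufferedIds.clear();
            waiting.clear();
        }
        List<V> batch;
        try {
            batch = batchCall.apply(ids); // one request for the whole buffer
        } catch (RuntimeException e) {
            callers.forEach(f -> f.completeExceptionally(e));
            return result;
        }
        for (int i = 0; i < ids.size(); i++) {
            K wanted = ids.get(i);
            // every caller sees the whole batch and filters out its own id
            callers.get(i).complete(batch.stream()
                    .filter(v -> wanted.equals(idOf.apply(v)))
                    .findFirst()
                    .orElse(null));
        }
        return result;
    }
}
```

With a batch size of 3 and five calls, the first three futures complete after a single batch call and the last two keep waiting, just like in the diagram. A real version would also flush on a timer (the time half of bufferTimeout) and would not run the batch call on the thread that happens to fill the buffer.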

The code

So let’s set up the init method of a @Service containing a ConnectableFlux:

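import java.time.Duration;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import reactor.core.Disposable;
import reactor.core.publisher.ConnectableFlux;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxSink;
import reactor.core.publisher.Mono;

// fields and init() of the @Service class
private final AtomicReference<FluxSink<Long>> input = new AtomicReference<>();
private ConnectableFlux<Product> connectableFlux;
private Disposable connectionToConnectableFlux;

public void init() {
  connectableFlux = Flux.create(input::set, FluxSink.OverflowStrategy.ERROR)
      .bufferTimeout(5, Duration.ofMillis(50))   // emit a batch at 5 ids or after 50 ms
      .concatMap(ids ->
          Mono.<List<Product>>create(sink -> {
            repository.getBatchAsync(ids, sink);  // one multi get per batch
          })
      )
      .flatMap(Flux::fromIterable)               // back to single products
      .publish();

  // https://www.slideshare.net/Pivotal/reactive-programming-with-pivotals-reactor slide 20
  connectionToConnectableFlux = connectableFlux.connect(); // start pumping, with or without subscribers
}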

The input holds the FluxSink you submit items (product ids) to, using its next() method.
bufferTimeout emits a batch when either the size or the time limit is reached, whichever comes first. Don’t set the buffer size too high, otherwise the average response times will increase too much (because chances are you hit the time limit before the size).
The concatMap makes the actual batch call. publish() turns the Flux into a ConnectableFlux, and with connect() things start flowing (with or without any subscriber).

The next step is replacing the repository.getById call with repositoryConnectableFlux.get.

public Mono<Product> get(Long id) {
  return connectableFlux
      .doOnSubscribe(ignore -> {
        // send id to connectableFlux so it will be included in a batch
        // send id in onSubscribe or otherwise the result might arrive before the actual subscribe()
        input.get().next(id);
      })
      .filter(product -> id.equals(product.getId())) // connectableFlux contains all results from batch, filter on id
      .next(); // basically a take(1) that converts to mono : https://stackoverflow.com/questions/42021559/convert-from-flux-to-mono
}

In doOnSubscribe we push the id we eventually want to retrieve into the connectableFlux with input.get().next(id), and get() returns the Mono. This might look like a strange place to submit the id, but it prevents the answer from arriving too soon (since we technically don’t know when someone subscribes to the Mono).

When the product id arrives at input it is put into the buffer of connectableFlux. On a full buffer (or when the time limit has passed) the buffer is converted into a batch request to elasticsearch. When an answer is ready (probably with some other ids in it too) we go back to our Flux and filter for the requested id (because all the other ids also float around on the ConnectableFlux).
With next() we say: “give me the next item and cancel the subscription” (effectively converting the Flux to a Mono).

In the repository we have to add a getBatchAsync method:

public void getBatchAsync(List<Long> ids, MonoSink<List<Product>> monoSink) {
  MultiGetRequest multiGetRequest = new MultiGetRequest();
  // the ids are Longs, but MultiGetRequest.add expects a String id
  ids.forEach(id -> multiGetRequest.add(INDEX_NAME, DOC_TYPE, String.valueOf(id)));

  ActionListener<MultiGetResponse> actionListener = new ActionListener<MultiGetResponse>() {
    @Override
    public void onResponse(MultiGetResponse multiGetItemResponses) {
      // skip failed and missing documents; toProduct is your own (hypothetical) mapping
      List<Product> result = Arrays.stream(multiGetItemResponses.getResponses())
          .filter(item -> !item.isFailed() && item.getResponse().isExists())
          .map(item -> toProduct(item.getResponse().getSourceAsMap()))
          .collect(Collectors.toList());
      monoSink.success(result);
    }

    @Override
    public void onFailure(Exception e) {
      monoSink.error(e);
    }
  };
  client.mgetAsync(multiGetRequest, RequestOptions.DEFAULT, actionListener);
}

This is basically wrapping the async call in a Mono by using the elasticsearch ActionListener.
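The same callback-to-async wrapping can be shown with the JDK alone. In this sketch Listener is a hypothetical stand-in for elasticsearch’s ActionListener, and a CompletableFuture plays the role of the MonoSink:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class CallbackAdapter {

    /** Hypothetical stand-in for elasticsearch's ActionListener. */
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    /** Turns a callback-style async call into a future, like Mono.create does with a MonoSink. */
    static <T> CompletableFuture<T> toFuture(Consumer<Listener<T>> asyncCall) {
        CompletableFuture<T> future = new CompletableFuture<>();
        asyncCall.accept(new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response); // like monoSink.success(result)
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e); // like monoSink.error(e)
            }
        });
        return future;
    }
}
```

In the repository you would pass a lambda that starts the real asynchronous call with the listener, in the same way getBatchAsync hands its ActionListener to client.mgetAsync.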

Why not use a normal Flux?

This was a very valid review comment by Tim van Eijndhoven. I didn’t know the answer, and it might even be the case that the ConnectableFlux was left over from earlier experiments. In any case I hadn’t documented it properly, so I needed to revisit my construction. I quickly ran into problems where the sink was created multiple times; it even seemed the whole Flux was duplicated.
I eventually found the answer in Flight of the Flux 1 – Assembly vs Subscription (paragraph ‘Hot’): a plain Flux is cold, so every subscribe() runs the Flux.create lambda again and builds a new sink and pipeline, while publish() makes it hot and shared. So a share() would also be a valid solution. The downside of share() is the ‘unnecessary’ subscribe you need to make to get the pointer to input. You can argue that connect() is the same, but in my opinion it reflects the purpose better.

Conclusion

I’m not sure if this is the right solution for the problem, but it is quite easy to add to our application and I learned a lot about Reactor when writing this article.

Although this solution looks promising I’m still a bit hesitant to add it to our application. Simplicity is more important at this stage in the life of the application.

There aren’t that many articles on Project Reactor (yet). Since RxJava has the same foundation chances are you can find a solution to your Reactor problem by reading RxJava articles. The methods might have other names or just a different signature.

Make sure to read the articles mentioned in the sources since there are some tricky things you might not know.

Sources

Daily Reactive: Splitting a stream
ConnectableObservable: So Hot Right Now
Flux sharing in Project Reactor: from one to many
Flight of the Flux 1 – Assembly vs Subscription
RxJava One Observable, Multiple Subscribers

http://vanwilgenburg.wordpress.com/?p=627


Running your elasticsearch integration tests with JUnit 5, Karate and TestContainers (Docker)

Jeroen van Wilgenburg Jul 8, 2019


Earlier this year I wrote an article on how to run your integration tests with an embedded elasticsearch. When upgrading to elasticsearch 7 this method didn’t work (yet). An alternative (and maybe even better) method is using Testcontainers to run elasticsearch in a Docker container. I will also show how you can leverage Karate to do your integration testing.

Setup and requirements

For this article I use Maven 3.6.1, Java 11 and Testcontainers (a Maven dependency).
Testcontainers uses Docker, so you need to have Docker installed.

Note that when you eventually run Testcontainers on Jenkins you need to be able to run Docker on Jenkins too.

To verify Maven and Java are installed correctly:

mvn -v

Should contain a Maven and Java version.

To verify Docker is installed and running correctly:

docker info

Should contain a result like: Server Version: 18.09.2

All the sample code is available on GitHub, and I will show the important code snippets in this article.

Use Karate to test elasticsearch – You know, for search

Our initial step is a JUnit test that spins up a testcontainer with elasticsearch and a Karate-test that checks if elasticsearch is running.

Add a Maven dependency

To use the elasticsearch Testcontainer add the following Maven dependency:

    <dependency>
      <groupId>org.testcontainers</groupId>
      <artifactId>elasticsearch</artifactId>
      <version>1.11.4</version>
    </dependency>

Create your test class

Add a class DemoKarateIT:

@Testcontainers
public class DemoKarateIT {

  private static final String url = "docker.elastic.co/elasticsearch/elasticsearch:7.2.0";

  @Container
  private static ElasticsearchContainer container = new ElasticsearchContainer(url);

  @BeforeAll
  static void beforeAll() {
    String httpHostAddress = container.getHttpHostAddress();
    System.setProperty("elasticsearch.address", httpHostAddress);
    System.setProperty("karate.env", "test");
  }

  @Karate.Test
  Karate testAll() {
    return new Karate().feature("classpath:karate");
  }

}

Karate uses system properties to pass values. karate.env is used to determine whether you’re running the tests from JUnit or standalone (more on this later). The simplest use of a Testcontainer is annotating a container field with @Container and adding the @Testcontainers annotation to your test class. We also call container.getHttpHostAddress() and pass the address via the elasticsearch.address system property. Note that the port is randomized, so we can’t hardcode the port number.
Because the container field is static, the container is started once before the first test method of this class and stopped after the last one (an instance field would be started and stopped for every test method). When you want to start the container only once for all your test classes, use the singleton container pattern (thanks to Sergei Egorov for pointing this out).
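The singleton container pattern boils down to a static field in a shared base class that is started in a static initializer, so the JVM starts it once no matter how many test classes extend it. The sketch below only shows that mechanism: FakeContainer is a hypothetical stand-in for ElasticsearchContainer so the example runs without Docker, and the IT class names are made up. With real Testcontainers the Ryuk container takes care of stopping the singleton at the end of the test run.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SingletonContainerSketch {

    /** Hypothetical stand-in for a Testcontainers container; it only counts starts. */
    static class FakeContainer {
        static final AtomicInteger STARTS = new AtomicInteger();

        void start() {
            STARTS.incrementAndGet();
        }

        String getHttpHostAddress() {
            return "localhost:32768"; // made-up mapped address
        }
    }

    /** Shared base class: its static initializer runs once, when the first subclass is initialized. */
    abstract static class AbstractElasticsearchIT {
        static final FakeContainer CONTAINER;

        static {
            CONTAINER = new FakeContainer();
            CONTAINER.start(); // no @Container/@Testcontainers, so JUnit never stops it between classes
        }
    }

    static class FirstIT extends AbstractElasticsearchIT {
    }

    static class SecondIT extends AbstractElasticsearchIT {
    }
}
```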

Add the Karate config

The karate-config.js:

(function () {
    var env = karate.env; // get java system property 'karate.env'

    if (!env) {
        env = 'local'; //no environment found, assuming local (laptop)
        karate.properties['elasticsearch.address'] = 'localhost:9200'
    }

    karate.log('karate.env property:', env);

    var config = {};

    return config;
});

The check on env is added so we can run the .feature files directly from within the IDE without JUnit, which can speed up development (you need a running elasticsearch instance, and possibly other services, for this). When env is set (to ‘test’ in our integration test) the address of the elasticsearch Docker container is used.

Add a Feature file

Finally we have to add a feature file called es-basic.feature:

Feature: Basic elasticsearch tests

  Background:

  Scenario: Verify elasticsearch is running
    Given url 'http://' + karate.properties['elasticsearch.address'] + '/'
    When method GET
    Then status 200
    And match $.tagline == 'You Know, for Search'

This test calls the root URL and verifies the tagline, which sums up why you want to use elasticsearch in the first place.

Now you can run the integration test DemoKarateIT. Note that the first run might take a while, since the Docker image for elasticsearch needs to be downloaded once. This is a caveat on Jenkins: if it is configured incorrectly, the image will be downloaded every time.

Auto detect the elasticsearch version

For my previous article I used a class AutoDetectElasticVersion to auto-detect the version of elasticsearch. You can use this class to use the same Docker image version as your Maven elasticsearch dependency.

Prepare your test data

Now that our basic setup is running, it is time to create some test data so we can write realistic tests.

It’s imperative to have proper test data in your project. When there is no proper data try to create realistic data (not just data to keep your tests happy), this will prevent many nasty bugs.

In the sample code there is a class called OwnerGeneratorUtil that produces json lines (.jsonl) that can be used to do bulk inserts in elasticsearch.

A sample record:

{"index":{"_index":"owner","_id":"ee38304188"}}
{"id":"ee38304188","firstName":"Jane","lastName":"Doe","car":"BMW"}

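Note that the .jsonl file must end with a newline: the bulk API requires the last line to be terminated as well, which shows up as an empty line at the end in most editors.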
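OwnerGeneratorUtil itself isn’t shown here; the sketch below only illustrates the format it produces, an action line followed by the document, with every line (including the last) terminated by a newline. The Owner class is a hypothetical stand-in, and the values are not JSON-escaped, so it assumes they contain no quotes or backslashes:

```java
import java.util.List;

public class OwnerBulkLines {

    /** Hypothetical owner document; the sample repository's model may differ. */
    static final class Owner {
        final String id;
        final String firstName;
        final String lastName;
        final String car;

        Owner(String id, String firstName, String lastName, String car) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
            this.car = car;
        }
    }

    /** Two lines per document (action + source); every line ends with a newline character. */
    static String toBulkLines(String index, List<Owner> owners) {
        StringBuilder sb = new StringBuilder();
        for (Owner o : owners) {
            sb.append(String.format("{\"index\":{\"_index\":\"%s\",\"_id\":\"%s\"}}\n", index, o.id));
            sb.append(String.format("{\"id\":\"%s\",\"firstName\":\"%s\",\"lastName\":\"%s\",\"car\":\"%s\"}\n",
                    o.id, o.firstName, o.lastName, o.car));
        }
        return sb.toString();
    }
}
```

For the owner ee38304188 (Jane Doe, BMW) this produces exactly the sample record shown above, followed by the terminating newline.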

The Background step in a Karate Feature is executed before every Scenario. When you’re doing read-only requests on elasticsearch it makes no sense to re-insert the data every time. Re-inserting also prevents you from running your tests in parallel (to disable parallel execution add the tag @parallel=false to your feature). To do an insert only once you can use callonce.

In our example callonce reads a feature file and executes it with the file name as a parameter:

 * callonce read('es-load-testdata.feature') { 'fileName' : 'test-owners.jsonl' }

It is important when loading data in elasticsearch to call refresh afterwards since the data might not be available for search yet!
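The es-load-testdata.feature file isn’t shown, so as an illustration here is a plain-Java sketch (java.net.http, Java 11) of the two calls involved: a bulk insert of the .jsonl content followed by a refresh of the index. The class and method names are hypothetical; address is the host:port string returned by container.getHttpHostAddress():

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BulkLoadRequests {

    /** POST /_bulk with the .jsonl content; the bulk API expects newline-delimited JSON. */
    static HttpRequest bulk(String address, String jsonl) {
        return HttpRequest.newBuilder(URI.create("http://" + address + "/_bulk"))
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(jsonl))
                .build();
    }

    /** POST /{index}/_refresh makes the freshly indexed documents visible to search. */
    static HttpRequest refresh(String address, String index) {
        return HttpRequest.newBuilder(URI.create("http://" + address + "/" + index + "/_refresh"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Send both with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), bulk first. Alternatively the bulk API accepts a refresh query parameter (for example _bulk?refresh=true), which saves the separate call.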

Creating a real test

For this test we spin up a Vert.x server that exposes the endpoints /all and /{search term}.

It is not really important that we use Vert.x here; it is just more realistic than calling elasticsearch endpoints directly.

es-advanced.feature

Feature: More advanced elasticsearch tests

  Background:
    * url baseUrl
    * callonce read('es-load-testdata.feature') { 'fileName' : 'test-owners.jsonl' }

  Scenario: Verify find all
    Given path '/all'
    When method GET
    Then status 200
    And match $ == '#[180]'

  Scenario: Verify Hans Gruber drives a Peugeot (might return other Grubers)
    Given path '/gruber'
    When method GET
    Then status 200
    And match $ contains any { 'firstName' : 'Hans', 'lastName' : 'Gruber', 'car' : 'Peugeot', 'id' : '#notnull' }

The first scenario verifies that there are 180 records when calling the /all endpoint.
The second scenario searches for ‘gruber’ and verifies that Hans Gruber drives a Peugeot (who would’ve guessed that?). Note that you can use the #notnull placeholder when handling generated ids (which are technically JavaScript strings here).

Miscellaneous

In this paragraph I will tie up some loose ends you also might run into.

JUnit 5 with Karate

To use Karate with JUnit 5 you have to add a dependency:

    <dependency>
      <groupId>com.intuit.karate</groupId>
      <artifactId>karate-junit5</artifactId>
      <version>0.9.4</version>
      <scope>test</scope>
    </dependency>

Running tests in parallel with Karate and JUnit 5

When you want to run your tests in parallel you can’t use the @Karate.Test annotation anymore. Use the following snippet instead:

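  @Test
  void testParallel() {
    Results results = Runner.parallel(getClass(), 2, "target/failsafe-reports");
    // assertEquals (org.junit.jupiter.api.Assertions) reports Karate's error messages on failure;
    // calling withFailMessage after isTrue() never shows the message
    assertEquals(0, results.getFailCount(), results.getErrorMessages());
  }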

This method will start 2 threads and won’t look as nice when running it in your IDE (with @Karate.Test every scenario was displayed separately, but that’s probably impossible or very challenging).

Use the tag @parallel=false to prevent Scenarios from running in parallel (Features will still run in parallel though).

Surefire, failsafe or skip everything?

Since the JUnit test is actually an integration test, the class name should end with IT so it will be run by the failsafe plugin instead of the surefire plugin.

Don’t forget to add the proper goals, and use at least version 2.22.0 (the first version with built-in JUnit 5 support) to prevent issues:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <version>2.22.2</version>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>

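mvn clean verify -Dmaven.test.skip skips compiling and running all tests, -DskipTests compiles the tests but skips running both unit and integration tests, and -DskipITs skips only the integration tests run by failsafe. Skipping tests is frowned upon, so try to improve the run times of your tests instead of skipping them.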

Conclusion

I hope this was a coherent piece to read. I wanted to include a lot of issues you might run into to save you the troubleshooting. If you have any comments/improvements please let me know. Thanks for reading!

Sources and more information

Karate
Testcontainers (elasticsearch module)

http://vanwilgenburg.wordpress.com/?p=612


How to remotely reload classes with Spring Developer Tools without opening extra ports

Jeroen van Wilgenburg Jun 3, 2019


With Spring Developer Tools it is possible to reload classes on a local or remote machine from within your IDE without using JPDA. In this article I’ll show you how to do this and how to prevent this from happening on your production environment.

The nice thing about Spring dev tools is that it works via the http server port of your spring boot application and you don’t need additional firewall rules (or Docker config changes) like you need with JPDA.

Naming

When I talk about the dev server I mean the environment where your spring boot application is running. This can be a remote machine or your local environment (also from within your IDE).
With application or project I mean your spring boot application.

Configure Spring Dev Tools in your application

To enable dev tools add the following dependency to your pom:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-devtools</artifactId>
  <version>2.1.5.RELEASE</version>
  <optional>true</optional>
</dependency>

By default the dev tools are not packaged in your project, because you don’t want this feature to be enabled on your production environment. Ideally you want to pass a Maven command line argument to make a local build with the option of class reloading. When you’re using Spring Boot version 2.2.0 (not yet available at the time of writing) or higher, this option is called spring-boot.repackage.excludeDevtools. When you’re using an older version you have to add a configuration section to your spring-boot-maven-plugin, which has an option to include dev tools in the build:

<plugin>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-maven-plugin</artifactId>
  <configuration>
    <!-- TODO remove excludeDevtools when running spring-boot 2.2.0 or higher -->
    <excludeDevtools>${spring-boot.repackage.excludeDevtools}</excludeDevtools>
  </configuration>
</plugin>

Use the same property name to be forward compatible. When you upgrade your spring boot version the configuration property can be removed.

Building

When building the application you now can pass this flag on the command line:

mvn clean install -Dspring-boot.repackage.excludeDevtools=false

This will result in a jar you can run on your dev server. Don’t ever run this jar in production!

To verify that the right classes are included in the packaged jar you can run:

 jar tf target/my-application-1.0-SNAPSHOT.jar | grep devtools

This should list an entry like BOOT-INF/lib/spring-boot-devtools-[version].RELEASE.jar.
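The same check can be automated, for example as a guard in a release pipeline so a jar with devtools never reaches production. A minimal sketch using java.util.jar; the class name is hypothetical, and it simply looks for any entry whose name contains spring-boot-devtools:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

public class DevtoolsGuard {

    /** True when any entry of the (fat) jar mentions spring-boot-devtools, like `jar tf ... | grep devtools`. */
    static boolean containsDevtools(Path jarPath) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.stream().anyMatch(entry -> entry.getName().contains("spring-boot-devtools"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = Paths.get(args[0]);
        if (containsDevtools(jar)) {
            System.err.println(jar + " contains spring-boot-devtools: never deploy it to production!");
            System.exit(1);
        }
        System.out.println(jar + " does not contain spring-boot-devtools");
    }
}
```

With Java 11 you can run it directly from source, for example java DevtoolsGuard.java target/my-application-1.0-SNAPSHOT.jar; the non-zero exit code fails the pipeline step.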

Running the application

To run the application on your dev server you have to pass a secret (you could also put it in your property file, but again: it might end up in production, and it is wise to have multiple safeguards, so pass it on the command line). Note that JVM system properties have to come before -jar; anything after the jar name is passed to the application as a program argument.

java -Dspring.devtools.remote.secret=bratwurst99 -jar my-application.jar

When dev tools is enabled you should see a log statement in the logging like

WARN  o.s.b.d.a.RemoteDevToolsAutoConfiguration - Listening for remote restart updates on /.~~spring-boot!~/restart

Connecting to the application from your IDE

To connect to your application you have to run the Remote Client Application (packaged in the spring dev tools dependency). For IntelliJ you have to add a run configuration :

Main class: org.springframework.boot.devtools.RemoteSpringApplication
VM Options: -Dspring.devtools.remote.secret=bratwurst99
Program arguments: http://my-remote-host:8080/

Make some changes to your classes and build your project, now the new classes will be pushed to your dev server.
When the change is successful you will see a message in your IDE like:

12:02:21.089 [File Watcher] INFO  o.s.b.d.r.c.ClassPathChangeUploader - Uploaded 1 class resource

You can even add or remove methods and those changes will still be reloaded! Note that this is also possible with STS, Eclipse, NetBeans or other IDEs; the naming will be slightly different.

When you get the error Exception in thread "File Watcher" java.lang.IllegalStateException: Unexpected 403 FORBIDDEN response uploading class files, you might need to change your Spring Security config to permit the path /.~~spring-boot!~/restart (and again, don’t do this on production!).
The access denied error will also show up in the logging of your dev server when the Spring log level is set to DEBUG.

On your dev server spring boot will be reloaded (you will see the spring boot logo for a second time in your logging).

Next steps

For more details (like live reloading in your browser) consult the Spring docs; I don’t want to copy the manual here.

http://vanwilgenburg.wordpress.com/?p=602

Writing integration tests for CORS headers (with Karate)

Jeroen van Wilgenburg May 3, 2019

On many projects CORS headers are configured incorrectly. Usually by putting some wildcards (*) in the config and things ‘work’. In this article I will show how to create tests for the correct headers (using Karate, but it should be applicable to any test framework).

Introduction

CORS stands for Cross-Origin Resource Sharing, and that’s pretty much all I will tell you about CORS. You’re probably here to get some quick results and won’t listen to me when I tell you it is wise to expand your CORS knowledge first.

Hopefully some of the tests in this article will fail on your project, and then you’ll have to read about CORS anyway. An excellent source is Mozilla’s article about CORS, which covers pretty much everything you need to know. I also recommend this article by Derric Gilling; it is a bit easier to read when you’re new to CORS (or want some more elaborate explanations).

My tests are written in Karate. They’re easy to understand without any knowledge about Karate and it’s my preferred way to do integration testing.

With CORS there are two kinds of requests, simple and advanced :

Simple request (and response)

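You’re probably dealing with a simple request when the HTTP request method is GET, HEAD or POST, the request only sets CORS-safelisted headers (such as Accept, Accept-Language, Content-Language and Content-Type) besides the ones the browser sets itself, and the Content-Type is text/plain or form data (application/x-www-form-urlencoded or multipart/form-data).
You won’t find many simple requests nowadays, since most requests have Content-Type: application/json (which triggers an advanced request).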

More details about simple requests

Your web server should treat a request as a CORS request when it contains an Origin header. Scripts running in your browser can’t modify the Origin header, but it is still editable with tools like curl, so never trust a request header!

The response should contain an Access-Control-Allow-Origin header with the same value as the Origin header. When Access-Control-Allow-Origin is not a wildcard (*), the Vary header should also contain Origin (so caches don’t serve a response meant for one origin to another origin).
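On the server side the logic behind these two headers is small. A minimal sketch in plain Java, assuming a hypothetical hard coded allowlist (in a real application the list would come from configuration, and your web framework probably has CORS support built in):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Response headers for a simple CORS request, based on a server-side allowlist of origins. */
final class SimpleCorsHeaders {

    // Hypothetical allowlist; normally read from configuration
    static final Set<String> ALLOWED_ORIGINS = Set.of("http://localhost:3000");

    /** Returns the CORS headers to add, or an empty map when the origin is missing or not allowed. */
    static Map<String, String> forOrigin(String origin) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (origin == null || !ALLOWED_ORIGINS.contains(origin)) {
            return headers; // not a CORS request, or an unknown origin: add no CORS headers
        }
        // Echo the allowed origin instead of using a wildcard...
        headers.put("Access-Control-Allow-Origin", origin);
        // ...and tell caches that the response differs per Origin
        headers.put("Vary", "Origin");
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(forOrigin("http://localhost:3000"));
        System.out.println(forOrigin("https://evil.example"));
    }
}
```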

In a Karate test this will look like :

Scenario: Simple CORS request
  Given path '/simple'
  And header Origin = 'http://localhost:3000'
  When method GET
  Then status 200
  And match header Access-Control-Allow-Origin == 'http://localhost:3000'
  And match header Vary contains 'Origin'

Advanced requests

All other requests are advanced and your browser precedes these requests with a ‘preflight request’ (unless a cached version is still valid). The preflight request is an HTTP OPTIONS request. Note that your browser does all the work, you don’t need to manually create or send an OPTIONS request.

Besides the Origin header, the browser sets the Access-Control-Request-Method header, which contains the HTTP method that will be used in the subsequent call (the actual request). Finally it sets the Access-Control-Request-Headers header, which contains the headers that will be sent along with the subsequent request.

Advanced response

The response should contain the same headers as a simple response plus a few more:

Access-Control-Max-Age: indicates how long (in seconds) a browser can cache the preflight response. This is important since the response can contain information regarding other requests you might make in the future, and you don’t want the extra roundtrip on every call.
Access-Control-Allow-Methods: will contain all the HTTP methods that are allowed for CORS-requests
Access-Control-Allow-Headers: will contain the HTTP headers that are allowed in a CORS request. Any other header supplied should result in your browser NOT making the call (the conversation stops after the preflight response and the browser prints an error message in your console).
Access-Control-Allow-Credentials: indicates whether the client can send credentials (cookies, authorization headers or TLS client certificates).
Access-Control-Expose-Headers: contains the list of response headers that scripts in the browser are allowed to read; any other headers are hidden by the browser (unless it’s a CORS-safelisted response header).

The test for a preflight request/response will look like :

Scenario: CORS preflight request (one valid, one invalid header)
  Given path '/advanced'
  And header Origin = 'http://localhost:3000'
  And header Content-type = 'application/json'
  And header Access-Control-Request-Method = 'GET'
  And header Access-Control-Request-Headers = 'authorization,fake-header'
  When method OPTIONS
  Then status 200
  And match header Access-Control-Allow-Origin == 'http://localhost:3000'
  And match header Access-Control-Allow-Headers == 'authorization'
  And match header Access-Control-Allow-Methods == 'GET'
  And match header Cache-Control == '#string'
  And match header Vary contains 'Origin'

Note that if a browser made this preflight call, the actual GET wouldn’t be executed because fake-header is not allowed. The Cache-Control header is a CORS-safelisted response header, so it doesn’t need to be listed in Access-Control-Expose-Headers.
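The server-side counterpart of this scenario filters the requested headers against an allowlist. A minimal sketch in plain Java; the allowlists and the Max-Age of 600 seconds are hypothetical example values, and a real server would typically use its framework’s CORS support instead:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Preflight (OPTIONS) response headers that would satisfy the Karate scenario above. */
final class PreflightCorsHeaders {

    // Hypothetical allowlists; header names are case-insensitive, so they are stored in lower case
    static final Set<String> ALLOWED_ORIGINS = Set.of("http://localhost:3000");
    static final Set<String> ALLOWED_METHODS = Set.of("GET");
    static final Set<String> ALLOWED_HEADERS = Set.of("authorization");

    static Map<String, String> preflight(String origin, String requestMethod, String requestHeaders) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (origin == null || !ALLOWED_ORIGINS.contains(origin)) {
            return headers; // unknown origin: no CORS headers, so the browser blocks the call
        }
        headers.put("Access-Control-Allow-Origin", origin);
        headers.put("Vary", "Origin");
        if (requestMethod != null && ALLOWED_METHODS.contains(requestMethod)) {
            headers.put("Access-Control-Allow-Methods", requestMethod);
        }
        // Only return requested headers that are on the allowlist; "fake-header" is dropped,
        // so a browser would refuse to send the actual request
        String allowedHeaders = requestHeaders == null ? "" : Arrays.stream(requestHeaders.split(","))
                .map(header -> header.trim().toLowerCase(Locale.ROOT))
                .filter(ALLOWED_HEADERS::contains)
                .collect(Collectors.joining(","));
        if (!allowedHeaders.isEmpty()) {
            headers.put("Access-Control-Allow-Headers", allowedHeaders);
        }
        headers.put("Access-Control-Max-Age", "600"); // hypothetical: cache the preflight for 10 minutes
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(preflight("http://localhost:3000", "GET", "authorization,fake-header"));
    }
}
```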

Extra checks and debugging

It can also be a good idea to check explicitly that some headers are missing. Your web server might, for example, include too much (and/or wrong) information.

When you’re debugging/creating your tests you might want Karate to show some extra information. Lowering the log level is one option, but produces a lot of noise. A better idea is to print your response and/or headers with the following statements :

* print response
* print responseHeaders

Remember to clean up those print statements after you’re done!

There is a website where you can test your CORS headers : https://www.test-cors.org/. This site of course sends the header Origin: https://www.test-cors.org, so adapt your web server accordingly. It is also possible to check out the source code and run it yourself (useful because with the website your data is sent over the internet).

How to configure CORS on your web server

Now follows a list of guidelines for configuring CORS properly.

Don’t set the Access-Control-Allow-Origin to * unless you want to allow any Origin to access your resources. * also cannot be used in combination with credentials. Make sure you have a server side list with allowed Origins.

For caching of the preflight response (Access-Control-Max-Age header) pick a reasonable value. Most browsers have a cap on the value ranging from 10 minutes to 1 day. Disabling (-1) will degrade performance (you need an extra roundtrip on every CORS-request), so a value of a few minutes is reasonable.

It’s okay to allow all HTTP methods (Access-Control-Allow-Methods: *) unless you have a very specific reason not to (only allowing read-only requests, for example).

Pay attention to Access-Control-Allow-Headers. Be aware that there are some safelisted request headers that don’t need to be listed, but Content-Type only counts as one for form data and text/plain. Since most people send json nowadays, you usually have to list Content-Type explicitly.

Access-Control-Allow-Credentials should return true when the withCredentials property of an XMLHttpRequest is set to true (and your server expects credentials along with the request).

Access-Control-Expose-Headers lists the response headers that scripts in the client browser are allowed to read. When this header is set incorrectly, the browser hides the headers that are missing from it.

Conclusion

I found a lot of articles about CORS, but ultimately the Mozilla site contained all the (correct) info. I hope I condensed it into a usable article. If you have any remarks/improvements/tips please feel free to comment on this article. Since it’s a bit of a rabbit hole I might have missed some important information.

Be aware that there are a lot of bad explanations of CORS on Stack Overflow and the rest of the internet. When you find something, always cross-check it with reputable sources like Mozilla or the RFCs at the IETF (so also don’t trust this article).

Remember that all of this is protection in your browser; with full control over your HTTP headers (e.g. with curl) it’s still possible to make all kinds of malicious calls. So always treat incoming calls with caution.

http://vanwilgenburg.wordpress.com/?p=594

Running your JUnit 5 integration test with an embedded elasticsearch on a random port (and optionally Spring Boot)

Jeroen van Wilgenburg Jan 22, 2019

With recent versions of elasticsearch (5+) the learning curve for an integration test became a bit steeper but will result in a cleaner solution in the end. In this article I will describe how to set up your test with JUnit 5 to run your elasticsearch integration tests. I will also discuss how to make it work with Spring-Boot Test.

So why did they make it harder you’re probably wondering? It is described in this article. In short : security and class loading problems. It is also good to know the TransportClient will be deprecated soon so the high level rest client is also an upgrade that should be on your radar.

As of version 5 you have to spin up a standalone server and run your integration test against that server. David Pilato (from elastic) suggests a few tools and I found embedded-elasticsearch after some research. Elastic has given a maven plugin low priority so they suggest the tools David mentioned.

Requirements

When evaluating the tools it was important for us to be able to run the test from within the IDE and use Nexus (preferably without hard coding a url in the application) to download the elasticsearch distributions (since there is no direct internet connection on our Jenkins build server).

Random ports

Before we start with embedded-elasticsearch I’ll explain how to run any server on a random port. Even on your local machine it is a bad idea to use static ports in your tests. On a Jenkins server you immediately run into flaky tests when multiple builds use the same port at the same time. On your local machine ports might clash with already running applications. So be a good citizen and set it up properly from the start.

With the call new ServerSocket(0) java will allocate a random port for you. With getLocalPort this port number is revealed.

In code it will look like this :

private static Integer findRandomPort() throws IOException {
  try (ServerSocket socket = new ServerSocket(0)) {
    return socket.getLocalPort();
  }
}

Setup embedded-elasticsearch

I will explain how to setup your server followed by instructions on how to setup Nexus (this step is optional).

First add embedded-elasticsearch as a dependency to your pom.xml (you might also have to add commons-io when it is not already in your project).

    <dependency>
      <groupId>pl.allegro.tech</groupId>
      <artifactId>embedded-elasticsearch</artifactId>
      <version>2.8.0</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>2.6</version>
      <scope>test</scope>
    </dependency>

I assume you already use JUnit 5 (and otherwise this might be a good time to start using it).
The first setup looks like :

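import java.net.URL;
import java.util.UUID;

import org.junit.jupiter.api.BeforeAll;
import pl.allegro.tech.embeddedelasticsearch.EmbeddedElastic;
import pl.allegro.tech.embeddedelasticsearch.PopularProperties;

public class EmbeddedElasticTest {

  private static final String ELASTIC_VERSION = "6.5.4";
  private static EmbeddedElastic embeddedElastic;
  private static Integer port;

  @BeforeAll
  public static void beforeClass() throws Exception {

    // findRandomPort() is the method from the random ports section, added to this class
    port = findRandomPort();

    final URL esUrl = new URL(String.format("https://secret-internal-host.local/nexus/repository/elasticsearch/elasticsearch-%s.zip", ELASTIC_VERSION));

    embeddedElastic = EmbeddedElastic.builder()
        .withElasticVersion(ELASTIC_VERSION)
        // the (unused) transport port also gets its own random port
        .withSetting(PopularProperties.TRANSPORT_TCP_PORT, findRandomPort())
        .withSetting(PopularProperties.HTTP_PORT, port)
        .withSetting(PopularProperties.CLUSTER_NAME, UUID.randomUUID())
        .withDownloadUrl(esUrl)
        .build()
        .start();
  }
}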

I decided to start the elasticsearch server once, it is of course possible to start/stop a server every test.

When you don’t use Nexus it suffices to use https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-%s.zip as the url. Also don’t forget to pass the url as a configuration property.
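Passing the url as a configuration property can be as simple as reading a url template from a system property with a fallback. A sketch; the property name elasticsearch.download.url is a hypothetical choice, and %s is replaced by the elasticsearch version:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Builds the elasticsearch download url from a configurable template instead of hard coding it. */
final class ElasticDownloadUrl {

    // Hypothetical property name; its value is a template where %s is the elasticsearch version
    static final String PROPERTY = "elasticsearch.download.url";
    static final String DEFAULT_TEMPLATE =
            "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-%s.zip";

    static URL forVersion(String version) throws MalformedURLException {
        String template = System.getProperty(PROPERTY, DEFAULT_TEMPLATE);
        return URI.create(String.format(template, version)).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // e.g. run with -Delasticsearch.download.url=https://secret-internal-host.local/nexus/repository/elasticsearch/elasticsearch-%s.zip
        System.out.println(forVersion("6.5.4"));
    }
}
```

In the test class you could then replace the hard coded esUrl with ElasticDownloadUrl.forVersion(ELASTIC_VERSION).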

Note that, even though you don’t use it, you have to set the TCP transport port to a random port as well, since multiple processes claiming the same (default) port can lead to problems.

Don’t forget to clean up after you’re done :

  @AfterAll
  public static void afterClass() {
    embeddedElastic.stop();
  }

Right now you have a running server. It is time to add your tests, create indexes and add mappings. Since this is very specific to your needs and a basic setup is provided in the readme of embedded-elasticsearch, I decided to skip this.

The important thing to know is that there is an elasticsearch REST server available on http://localhost:{port}.

Not hardcoding the elasticsearch version in your test

I’m not happy with hardcoding the version, so I submitted a pull request that scans your classpath for elasticsearch client jars and uses the highest version found. It’s a bit cumbersome, but it will save you the hassle of keeping two versions in sync.
When you want to try it out, have a look at commit 04bfba7473. When you include AutoDetectElasticVersion in your project you can set the version by replacing ELASTIC_VERSION with AutoDetectElasticVersion.detect() (works with elasticsearch 5 and higher).
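The idea behind such a detection fits in a few lines. This is NOT the code from the pull request, just a hypothetical illustration: it scans a classpath string (normally the java.class.path system property) for jars whose file name starts with elasticsearch and compares the versions numerically, so 7.10.2 counts as higher than 7.9.3 (a plain string comparison would get that wrong).

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find the highest elasticsearch version among the jars on a classpath. */
final class ElasticVersionGuess {

    // Matches .../elasticsearch-6.5.4.jar and .../elasticsearch-rest-high-level-client-6.5.4.jar,
    // but not .../embedded-elasticsearch-2.8.0.jar (the file name must start with "elasticsearch")
    private static final Pattern ES_JAR =
            Pattern.compile("(?:^|[/\\\\])elasticsearch(?:-[a-z-]+)?-(\\d+)\\.(\\d+)\\.(\\d+)\\.jar$");

    static Optional<String> highestVersion(String classPath) {
        return Arrays.stream(classPath.split(File.pathSeparator))
                .map(ES_JAR::matcher)
                .filter(Matcher::find)
                .map(m -> new int[] {
                        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))})
                // compare major, minor and patch numerically
                .max(Comparator.comparingInt((int[] v) -> v[0])
                        .thenComparingInt(v -> v[1])
                        .thenComparingInt(v -> v[2]))
                .map(v -> v[0] + "." + v[1] + "." + v[2]);
    }

    public static void main(String[] args) {
        System.out.println(highestVersion(System.getProperty("java.class.path"))
                .orElse("no elasticsearch jar found"));
    }
}
```

Called from within a test, this should return the version of the elasticsearch client jars your build tool put on the test classpath.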

Setup Nexus

To setup Nexus correctly click on the server admin cog in Nexus. Click on repositories, create repository and add a ‘raw (proxy)’ with the following settings :
* name : elasticsearch
* remote storage : https://artifacts.elastic.co/downloads/elasticsearch/
* Blob store : not sure what you should enter here, our config has a default option

When reviewing the setup it should look like this :

The only downside of this configuration is that we have to hardcode the url in our application. The elasticsearch-maven-plugin uses the Maven distribution model (but you can’t run it from your IDE from within the integration test).

Setup random port with SpringBootTest

Since many people use Spring Boot and it is not trivial to set the port I’ll explain it here.
Our project uses the configuration property elasticsearch.port to define the port number. Since it is random now there is no way to set it in application-test.yml.

Luckily Phil Webb has a great suggestion.

Add an inner class in your test :

static class Initializer
    implements ApplicationContextInitializer<ConfigurableApplicationContext> {

  @Override
  public void initialize(
      ConfigurableApplicationContext configurableApplicationContext) {
    TestPropertyValues.of("elasticsearch.port=" + port)
        .applyTo(configurableApplicationContext.getEnvironment());
  }

}

And add the following annotations to your test :

@ExtendWith(SpringExtension.class)
@SpringBootTest(classes = SomeApplication.class)
@ContextConfiguration(initializers = EmbeddedElasticTest.Initializer.class)
public class EmbeddedElasticTest

This will override the elasticsearch.port property when starting your tests.

Improvements/afterthoughts

I hope this article will help you. Improvements/suggestions are more than welcome in the comments.

http://vanwilgenburg.wordpress.com/?p=588

Using ZAP-proxy and nginx to debug and tamper with HTTP traffic – Emulate timeouts and other unexpected behaviour

Jeroen van Wilgenburg Oct 2, 2018

I recently ran into a problem where I was unable to set a proxy in an application. I wanted to use this proxy as a man-in-the-middle proxy to debug an external web service call. I solved this problem by using nginx to redirect the traffic to the proxy. Since 99 out of 100 articles about nginx proxies are about reverse proxies I decided to write about a forward proxy to save you some time and show you some of the features of a man-in-the-middle proxy.

The proxy was of course the Zed Attack Proxy (ZAP for short). Before ZAP I used a tool known as WebScarab on numerous occasions. WebScarab is superseded by ZAP. Both tools are able to show HTTP traffic and tamper with requests/responses. It is also possible to track TLS traffic, but for this article I’ll leave this feature in the ZAP toolbox.

Our application showed some strange errors on long running requests to an external web service. When I tried to redirect the traffic to ZAP (to slow down traffic) I wasn’t able to change the proxy settings (probably because it was abstracted away by the many layers of frameworks in the application). Using the (MacOS) system proxy also was fruitless. I was however able to change the address of the external web service, so I changed the address to an nginx server.

Redirecting traffic to ZAP in nginx

Setting a real HTTP proxy in nginx was a bit of a conundrum since most people on the internet want to use nginx as a reverse proxy. I finally found a stackoverflow post that gave me 90% of the directions. The last 10% I found here.
It is extremely important to have your slashes right. When something doesn’t work as expected: check your slashes first!

Update your nginx configuration (/usr/local/etc/nginx/nginx.conf if you used brew on MacOS) :

server {
    listen 18226;

    location / {
        proxy_pass       http://127.0.0.1:48205/;
        proxy_set_header Host $host:8226/;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Request-URI $request_uri;
    }
}

In this example nginx runs on port 18226 (the number is not important, but it is smart to relate it to the original service you’re calling). 18226 is the port number you configure in your application. The original web service runs on port 8226; this port is set in the Host header. Note that this header ends with a slash! (Thanks Danny the Flame for saving me a lot of time.)
proxy_pass points to the address of ZAP, which listens on port 48205 in this example. ZAP runs on port 8080 by default (you can change it under settings, local proxy).

The old and new situations are depicted in the diagram below.

When you’re running on MacOS, stay away from the hostname localhost in your configurations. I’m still not sure what the problem is, but this hostname doesn’t always work as expected. That’s why I use 127.0.0.1: it should be exactly the same as localhost, but it isn’t.
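One possible explanation (an assumption, not something verified for this setup): localhost can resolve to the IPv6 loopback address ::1 as well as to 127.0.0.1, while a plain listen 18226; in nginx only listens on IPv4. This small Java program prints what localhost resolves to on your machine:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Prints every address the hostname "localhost" resolves to on this machine. */
final class LocalhostAddresses {

    static InetAddress[] resolve(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }

    public static void main(String[] args) throws UnknownHostException {
        for (InetAddress address : resolve("localhost")) {
            // 127.0.0.1 is the IPv4 loopback; 0:0:0:0:0:0:0:1 (::1) is the IPv6 loopback
            System.out.println(address.getHostAddress());
        }
    }
}
```

When both addresses show up, a client may try ::1 first and fail to reach a server that only listens on 127.0.0.1.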

Setting a breakpoint and editing the request/response in ZAP

For my next experiment I changed the Host header (proxy_set_header Host) to google.com. When ZAP is running and you point your browser to http://127.0.0.1:18226, you should be redirected to http://www.google.com. Note that I use http in this example because your browser has some https tricks that would complicate things for this demo.
When the redirect happens you should see an entry in the History tab in ZAP. Right-click the entry, pick Break… and hit save. ZAP will now detect the URL and stop (just like a debugger in an IDE).

Go to http://127.0.0.1:18226 again. ZAP now stops at this breakpoint. The first pause at the breakpoint allows you to edit the request, when you hit the play button (the one without the bar in front of it) the edited request is sent and now you can edit the response. In the response change http://www.google.com to http://www.bing.com and hit play again. You will be redirected to the bing site.

Visit Zed Attack Proxy – Intercepting Traffic and Modifying with Breakpoints on YouTube for a more elaborate explanation on how to set a breakpoint.

Other uses for breakpoints

This was of course a simple example. It gets interesting when you edit response codes or break json structures.

Another option when ZAP hits a breakpoint is to do nothing or wait a while to see what happens. This way you can emulate timeouts or slow responses.
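When you emulate a timeout this way, make sure the HTTP client in your own application actually has a timeout configured; otherwise it may simply wait. The sketch below uses the Java 11+ java.net.http client, which is not necessarily the client your application uses. It emulates a response held at a breakpoint with a local socket that accepts the connection but never answers, and checks that the request times out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

/** Emulates a response held at a ZAP breakpoint and checks that the client gives up. */
final class HeldResponseDemo {

    static boolean timesOut(Duration requestTimeout) {
        try (ServerSocket server = new ServerSocket(0)) {
            // Accept the connection but never answer, like a request stuck at a breakpoint
            Thread holder = new Thread(() -> {
                try (Socket ignored = server.accept()) {
                    Thread.sleep(requestTimeout.toMillis() * 10);
                } catch (IOException | InterruptedException e) {
                    // the demo is over, nothing to clean up
                }
            });
            holder.setDaemon(true);
            holder.start();

            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(2))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(
                            URI.create("http://127.0.0.1:" + server.getLocalPort() + "/"))
                    .timeout(requestTimeout) // without a timeout the client could wait forever
                    .GET()
                    .build();
            try {
                client.send(request, HttpResponse.BodyHandlers.discarding());
                return false;
            } catch (HttpTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Timed out: " + timesOut(Duration.ofMillis(500)));
    }
}
```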

Conclusion

ZAP is an enormous toolbox from which I used just a simple screwdriver.

A few tips :

Be very careful with slashes in nginx
Mac users should refrain from using localhost, 127.0.0.1 is a better idea

Be careful when you’re redirecting production traffic through ZAP. You might want to use the upstream module of nginx to partially tap traffic to ZAP.

http://vanwilgenburg.wordpress.com/?p=573

Lessons learned after serving thousands of concurrent users in a devops team for a year

Jeroen van Wilgenburg Aug 22, 2018

I just celebrated a year at my customer. When I arrived the project was in good shape with a few rough edges. There was a solid code review process, a stable Jenkins server, motivated people, a reasonable amount of integration tests and a product owner who is very involved and very often in our vicinity. We serve thousands of users and our analyst has the confidence to do a release at the busiest time of day.

Clean up your logging and keep it clean

In my first week I was amazed by the thousands of daily error messages in the production logging and the fact that nobody was screaming in panic about it. Most people were used to these amounts as it grew historically (not because people didn’t care, it just happened). The problem with this phenomenon is that small warnings are lost in the sea of errors. New features were introduced, logged some errors and nobody noticed. The customer noticed eventually. When your feedback loop is this long it is harder to fix the error since the features you created are not fresh in your memory anymore and could be ‘contaminated’ with code from other features.

My goal was to reduce the error logging to 0 and make sure the team is on the alert when error messages show up on the dashboard. Our logging is collected in a central place, visualized with Kibana and displayed on a large monitor in the team room. This made things very easy. (When you are not collecting your logs in a central place it really is time to start now. Products like Logstash and Graylog are great tools to make this happen.)
Now you have to create some structure in the error messages. Start with a top 10 of error messages and create tickets for them. This usually solves 80% of the problems. I also created a page with ‘known errors’ for these log messages with a ticket number (or an explanation of the solution). A lot of errors recur in the long term (even if you’re pretty sure they won’t. Yes, also on your project), so this will be a good investment.
Repeat this cycle until the amount of errors is approaching zero.
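If you want to create such a top 10 from a plain exported log file, the counting is easy to sketch. A minimal Java example, assuming a hypothetical log format where each error line contains the word ERROR followed by the logger and the message; numbers (ids, durations) are replaced by # so similar messages end up in the same bucket:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Counts the most frequent ERROR messages in a log (hypothetical log format). */
final class TopErrors {

    // Replace numbers (ids, durations, ...) so similar messages are counted together
    private static final Pattern NUMBERS = Pattern.compile("\\d+");

    static List<Map.Entry<String, Long>> topErrors(List<String> lines, int limit) {
        Map<String, Long> counts = lines.stream()
                .filter(line -> line.contains("ERROR"))
                // drop the timestamp in front of ERROR, then normalise the numbers
                .map(line -> NUMBERS.matcher(line.substring(line.indexOf("ERROR"))).replaceAll("#"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an exported log file
        for (Map.Entry<String, Long> entry : topErrors(Files.readAllLines(Paths.get(args[0])), 10)) {
            System.out.println(entry.getValue() + "x " + entry.getKey());
        }
    }
}
```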
Now it is time to prevent this from happening again by appointing a ‘developer of the day’.

Appoint a developer of the day

At my previous customer a developer of the day was introduced. At first it might seem like a waste of resources but it has several advantages :

The other developers can focus on the sprint and have less context switches.
Knowledge is shared automatically
You keep your logging clean

Context switches are expensive. When you’re in a deep concentration it can take 15 minutes to ‘restore’ that context when you’re disturbed (and this happens often since we’re in an open-plan office).
Since everybody sees all the error messages all parts of the application are touched. When you’re researching a problem you automatically learn about the code (or can ask the developer who ‘owns’ that part).
Because error messages are investigated every day the logging stays clean. Somehow it’s easier to clean up small bits instead of a big mess.

When we just started with this concept it took almost the whole day; now it’s usually one or two hours a day.

Our integration tests were very flaky. When the build failed a retry was usually sufficient to make it succeed. I did some experiments with Karate and really loved it. The whole team liked it when I proposed it, so we decided to replace all the flaky tests with Karate. This process can take a while, but is really worth it in the end. An advantage of Karate is that our test team can also read the Karate tests and is involved in the reviews of these tests. It even saves them some time with their end-to-end tests because they know which cases are already covered. When your tests are replaced, run a coverage tool and check where integration tests are missing. Note that this isn’t watertight, but it gives you a good indication of where to focus.
After a few months we had so much confidence in the code that scary refactorings were possible again. The Karate tests also saved us on numerous occasions.

When I started on the project there was a custom dashboard application that was constructed by taking screenshots from graphs (that were ‘shot’ too early sometimes). It wasn’t great anymore, but it worked at the time. One day it broke so badly that the operations team decided not to fix it and switched to Grafana.
This immediately improved the quality of our application. Grafana has data sources for Cloudwatch, web server logging and Elasticsearch (the store behind Kibana). We were now able to detect problems before they became a problem (increasing response times and high application load are canaries in the coal mine).

When you have a great dashboard your team members will scream when something is red. Make sure that there really is a problem when things are red otherwise the dashboard is like the boy who cried wolf and you end up like you did with the sea of log messages.

There is a Dutch saying : ‘Meten is weten’ that means ‘to measure is to know’.
I already wrote an article about the memory problems we had : Introduction to java heap tuning
And this rule is also applicable to other things you can measure (like the Mongo slow query log). Don’t blindly change parameters, prove that your gut feeling is right.

One caveat is that tuning can also hurt. We added a feature that needed a lot more memory than before. Since we still need to improve our load testing, this problem appeared very late (after we had already released) and slowed down the application. So don’t be too aggressive and have some safeguards (like a proper load test).

A year ago the team was doing about one release every 8-10 days. This isn’t really bad, but we were having some QA-issues and the rest of our infrastructure is in such a good shape that it shouldn’t be too hard to reach continuous deployments in order to tackle the QA-issues.
There is a saying ‘if it hurts do it more often’, so that’s what I’m pushing for.
Release frequency has improved slightly and there is a correlation with the QA-issues, but I still think that not all the pain is visible. This is still a thing I’m fighting for, but I probably need some more ammo to show that it really helps.

So how did I come up with all these ideas? Most of them came from books I read and talks I watched. But all those ideas are theoretical; the real proof of the pudding is practice. You can’t try all ideas, but what you can do is pitch them at the coffee machine; when you get a lot of positive feedback, you should give it a shot. Drinking coffee is a surprisingly good way to exchange information with other teams/developers. Since the applications of the customer have a lot of common ground (and most of them share a platform), you’ll probably run into the same problems.

I hope this article will help to improve the quality of your application.
At our customer the conditions are pretty good. We have an involved product owner and there is room for suggestions and improvement with management. I do realise that we’re lucky and this makes improvements a lot easier. As you can read between the lines there’s still room for improvement and when those things are improved new things can be improved.
Take baby steps and don’t try to build a pyramid when your shed still has a leaky roof.
I’d like to conclude with a saying I recently heard: “Don’t let perfect be the enemy of good”. Stop improving when a certain area is good enough; more polishing will only annoy people and waste resources.

http://vanwilgenburg.wordpress.com/?p=568

Introduction to Java heap tuning – Some easy steps to improve response times

Jeroen van Wilgenburg Mar 5, 2018

At our current project we wanted to upgrade our EC2 instance to a newer family and generation to improve response times and make our application start faster. This simple task started with blindly increasing the heap size and ended with counting strings on the heap.

A few months ago our cpu usage was running a bit high and we thought it was caused by garbage collection (since the memory usage was high). We increased the heap size of some jvms and our problems seemed solved. A few weeks later we wanted to upgrade our r4.large (15.25 GB RAM) instances to m5.large (8 GB RAM). The m5 instances are cheaper, have better response times with our application and start our application faster. Due to the earlier ‘fix’ we were about 1 GB short to run on an 8 GB machine. Since the cpu usage didn’t change much after the ‘fix’, I decided to use a less blunt weapon.

A bit of background information: we’re running about 15 jvms on a single node and most interactions with those jvms are stateless.

Before you start your tuning you should set -Xms and -Xmx (starting and max heap size) to the same value to make sure you’re measuring the right thing. Another thing to keep in mind is to keep the memory usage below 90%. We haven’t figured out yet what happens at this limit (why it is 90% and whether it applies to your environment), but it makes the measurements less reliable (and you need some slack anyway for OS caching and buffering).

JStat

My starting point was the excellent article ‘How to Tune Java Garbage Collection’. The article left us with 4 simple rules supplemented with my own rule (the last one).

1. Minor GC is processed quickly (within 50 ms).
2. Minor GC is not frequently executed (about 10 seconds).
3. Full GC is processed quickly (within 1 second).
4. Full GC is not frequently executed (once per 10 minutes).
5. GC time should not exceed 1% cpu usage.

These rules are just rough guidelines and are a starting point to begin the tuning. In practice you will probably find your own numbers that match your application.

The GC numbers can be found by executing the jstat command with a pid as parameter (e.g. jstat -gc 13214).

An example result :

    S0C     S1C S0U    S1U       EC       EU       OC       OU      MC      MU    CCSC    CCSU  YGC   YGCT FGC  FGCT    GCT
25600.0 26624.0 0.0 3452.1 559616.0 174936.8 307200.0 257942.1 93312.0 83157.9 11904.0 10339.9 5807 86.154  32 8.709 94.863

The columns are explained in the jstat documentation.
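The set of columns differs between JDK versions (newer ones add columns such as CGC and CGCT), so it's safer to look values up by header name than by position. A minimal parsing sketch; the class JstatParser is hypothetical, and it assumes a dot as decimal separator and treats a "-" value as not available.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the header line and one value line of `jstat -gc` output
// into a column-name -> value map. Assumes "." as decimal separator; "-" becomes NaN.
final class JstatParser {

    static Map<String, Double> parse(String headerLine, String valueLine) {
        String[] names = headerLine.trim().split("\\s+");
        String[] values = valueLine.trim().split("\\s+");
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                    names.length + " column names but " + values.length + " values");
        }
        Map<String, Double> columns = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            columns.put(names[i], values[i].equals("-") ? Double.NaN : Double.parseDouble(values[i]));
        }
        return columns;
    }

    public static void main(String[] args) {
        // The example output shown above.
        String header = "S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT";
        String values = "25600.0 26624.0  0.0   3452.1 559616.0 174936.8  307200.0   257942.1  93312.0 83157.9 11904.0 10339.9   5807   86.154  32      8.709   94.863";
        Map<String, Double> gc = parse(header, values);
        System.out.println("YGC=" + gc.get("YGC") + ", YGCT=" + gc.get("YGCT") + " s");
    }
}
```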

The average minor GC time (1) is YGCT divided by YGC; the average full GC time (3) is FGCT divided by FGC. jstat reports these times in seconds.
For the other numbers you need the uptime of the JVM (so NOT the uptime of your machine).
The minor GC interval (2) is the uptime in seconds divided by YGC; the full GC interval (4) is the uptime in minutes divided by FGC.
The CPU usage (5) is GCT divided by the JVM uptime (both in seconds).
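The same arithmetic as a sketch. The class and method names are hypothetical; the counters in main are the ones from the example output above, but the uptime of one day is made up purely for illustration, so don't read anything into the printed results. In practice you'd take the uptime from the Timestamp column that jstat prints when you add -t.

```java
// Hypothetical helper: the five numbers from jstat -gc counters
// (YGC, YGCT, FGC, FGCT, GCT; times in seconds) and the JVM uptime in seconds.
final class GcNumbers {

    static double avgMinorGcMs(long ygc, double ygctSeconds) {                // (1)
        return ygc == 0 ? 0 : ygctSeconds / ygc * 1000;
    }

    static double secondsBetweenMinorGcs(double uptimeSeconds, long ygc) {   // (2)
        return ygc == 0 ? Double.POSITIVE_INFINITY : uptimeSeconds / ygc;
    }

    static double avgFullGcSeconds(long fgc, double fgctSeconds) {           // (3)
        return fgc == 0 ? 0 : fgctSeconds / fgc;
    }

    static double minutesBetweenFullGcs(double uptimeSeconds, long fgc) {    // (4)
        return fgc == 0 ? Double.POSITIVE_INFINITY : uptimeSeconds / 60 / fgc;
    }

    static double gcShareOfUptime(double gctSeconds, double uptimeSeconds) { // (5)
        return gctSeconds / uptimeSeconds;
    }

    public static void main(String[] args) {
        // Counters from the example jstat output above.
        long ygc = 5807, fgc = 32;
        double ygct = 86.154, fgct = 8.709, gct = 94.863;
        // HYPOTHETICAL uptime of one day, for illustration only; use the real JVM uptime
        // (for example the Timestamp column printed by `jstat -gc -t`).
        double uptimeSeconds = 86_400;
        System.out.printf("(1) avg minor GC:       %.1f ms%n", avgMinorGcMs(ygc, ygct));
        System.out.printf("(2) minor GC every:     %.1f s%n", secondsBetweenMinorGcs(uptimeSeconds, ygc));
        System.out.printf("(3) avg full GC:        %.2f s%n", avgFullGcSeconds(fgc, fgct));
        System.out.printf("(4) full GC every:      %.1f min%n", minutesBetweenFullGcs(uptimeSeconds, fgc));
        System.out.printf("(5) GC share of uptime: %.2f%%%n", gcShareOfUptime(gct, uptimeSeconds) * 100);
    }
}
```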

Visualizing jstat output

Since our application is used by high school students, our peak load is between 8:30 and 14:00, so it makes sense to tune the application for this period. You can do this by running jstat twice, once at 8:30 and once at 14:00, and subtracting the first result from the second (the -t option adds a Timestamp column with the JVM uptime in seconds). This also lets you calculate the CPU usage of GC during that period.
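The subtraction for a time window can be sketched like this, assuming each sample is one line of jstat -gc -t output (the Timestamp plus the cumulative counters). GcWindow and Sample are hypothetical names. If the JVM restarts between the two samples the counters reset; the check on the window length only partly guards against that.

```java
// Hypothetical helper: GC numbers for a time window (e.g. 8:30 to 14:00) from two
// `jstat -gc -t` samples. jstat counters are cumulative since JVM start.
final class GcWindow {

    static final class Sample {
        final double timestampSeconds; // Timestamp column added by -t: JVM uptime in seconds
        final long ygc;                // YGC
        final double ygctSeconds;      // YGCT
        final double gctSeconds;       // GCT

        Sample(double timestampSeconds, long ygc, double ygctSeconds, double gctSeconds) {
            this.timestampSeconds = timestampSeconds;
            this.ygc = ygc;
            this.ygctSeconds = ygctSeconds;
            this.gctSeconds = gctSeconds;
        }
    }

    // Fraction of the window's wall-clock time spent in GC (rule 5, for this window only).
    static double gcShare(Sample start, Sample end) {
        double window = end.timestampSeconds - start.timestampSeconds;
        if (window <= 0) {
            throw new IllegalArgumentException("end must be sampled after start (did the JVM restart?)");
        }
        return (end.gctSeconds - start.gctSeconds) / window;
    }

    // Average minor GC time in ms during the window (rule 1), or 0 if none happened.
    static double avgMinorGcMs(Sample start, Sample end) {
        long count = end.ygc - start.ygc;
        return count == 0 ? 0 : (end.ygctSeconds - start.ygctSeconds) / count * 1000;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL samples taken 5.5 hours (19,800 s) apart.
        Sample at0830 = new Sample(1_000, 100, 1.5, 1.9);
        Sample at1400 = new Sample(20_800, 1_300, 19.5, 20.4);
        System.out.printf("minor GC avg: %.1f ms, GC share: %.3f%%%n",
                avgMinorGcMs(at0830, at1400), gcShare(at0830, at1400) * 100);
    }
}
```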

Another useful tool is jstatplot, which draws a plot of the statistics. When you want to look at young collections, make sure the sampling interval is short enough to be useful (about 200 ms).
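If you want to keep the raw samples around (for example to feed a plotting tool later), jstat itself can sample at that rate: it accepts an interval with an ms or s suffix after the pid. A minimal sketch, assuming jstat is on the PATH and runs as the same user as the target JVM; the class JstatSampler and the file name jstat-gc.log are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs `jstat -gc -t <pid> 200ms` and appends its output to a file.
final class JstatSampler {

    // Builds e.g. [jstat, -gc, -t, 13214, 200ms].
    static List<String> command(long pid, int intervalMillis) {
        return Arrays.asList("jstat", "-gc", "-t", Long.toString(pid), intervalMillis + "ms");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = Long.parseLong(args[0]); // pid of the JVM to observe
        Process jstat = new ProcessBuilder(command(pid, 200))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.appendTo(new File("jstat-gc.log")))
                .start();
        // Without a sample count, jstat keeps sampling until it is stopped or the target JVM exits.
        jstat.waitFor();
    }
}
```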

This image is output from our application. The red area is the old space. As you can see, it shows a pretty nice sawtooth and it’s relatively small compared to the young space.

Changing the young/old ratio

The heap is roughly divided into young and old space. Objects with a short lifetime live in the young space; objects that survive long enough are promoted to the old space. Our application is stateless, so we can expect relatively many short-lived objects.

The first step in tuning the ratio is -XX:NewRatio. The default is 2, a 2:1 ratio (2 parts old, 1 part young), which means about 66.7% of the heap is dedicated to old objects. Because NewRatio only accepts whole numbers, the only improvement we can make here is changing 2 to 1 for a 1:1 ratio, or 50% each.
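The split is simple arithmetic: with -XX:NewRatio=N the young generation gets 1/(N+1) of the heap. A sketch with a hypothetical 1536 MB heap and a hypothetical class name (NewRatioSplit); the JVM aligns generation sizes, so the real numbers can be slightly off.

```java
// Hypothetical helper: how -XX:NewRatio=N splits a heap (old = N x young).
// The JVM aligns generation sizes, so real sizes can differ slightly.
final class NewRatioSplit {

    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long oldMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1536; // hypothetical heap (-Xms = -Xmx = 1536m)
        for (int ratio : new int[] {2, 1}) {
            System.out.printf("NewRatio=%d: young %d MB, old %d MB%n",
                    ratio, youngMb(heapMb, ratio), oldMb(heapMb, ratio));
        }
    }
}
```

With NewRatio=2 this gives 512 MB young and 1024 MB old; with NewRatio=1 it's 768 MB each.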

To tune more precisely (or if 50% for the old space is still way too much) you can use -XX:NewSize, which sets an exact size (for example in MB). Note that this is an absolute value that doesn’t grow with the heap, so it’s wise to have -Xms and -Xmx the same size. Another warning: when some fool later increases the heap size, he’ll probably forget to change -XX:NewSize as well.
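Why a fixed young size is fragile, in numbers. This sketch assumes the young generation really stays at the configured size (in practice that means also setting -XX:MaxNewSize, or using -Xmn, which sets both); the class name YoungShare and the sizes are hypothetical.

```java
// Hypothetical helper: young share of the heap for a fixed young size vs. a ratio.
final class YoungShare {

    // Fixed young size (e.g. -Xmn, or -XX:NewSize together with -XX:MaxNewSize).
    static double withFixedSize(long youngMb, long heapMb) {
        return (double) youngMb / heapMb;
    }

    // -XX:NewRatio=N: young is 1/(N+1) of the heap, whatever the heap size.
    static double withRatio(int newRatio) {
        return 1.0 / (newRatio + 1);
    }

    public static void main(String[] args) {
        long youngMb = 768; // hypothetical: tuned for a 1536 MB heap
        for (long heapMb : new long[] {1536, 3072}) { // someone later doubles the heap
            System.out.printf("heap %d MB: fixed size -> %.0f%% young, NewRatio=1 -> %.0f%% young%n",
                    heapMb, withFixedSize(youngMb, heapMb) * 100, withRatio(1) * 100);
        }
    }
}
```

After doubling the heap, the fixed size drops from 50% to 25% young, while NewRatio=1 stays at 50%.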

Verify your changes

Once the five numbers from jstat are within the margins, it’s also wise to keep track of the total CPU and memory usage of your node and the response times of your application. Since we’re running 15 JVMs and our performance is more than acceptable, we decided to level all the numbers so each application spends about the same time garbage collecting. This could probably be improved with some nifty machine learning algorithms, but since we’re running on EC2 it is probably cheaper to just add another machine.
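If you want each JVM to report its own GC share next to your CPU, memory and response time metrics (which makes comparing the JVMs on a node easier), the standard java.lang.management API exposes the cumulative GC time. A minimal sketch with a hypothetical class name (InProcessGcShare); depending on the collector, this number may not match jstat’s GCT exactly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: share of this JVM's uptime spent in GC, measured from inside the JVM.
final class InProcessGcShare {

    static double gcShareOfUptime() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // accumulated ms, or -1 if not supported
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis == 0 ? 0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC share of uptime: %.3f%%%n", gcShareOfUptime() * 100);
    }
}
```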

Conclusion

Don’t blindly increase the heap size; it can actually harm your response times. Always verify your changes (or better, do A/B testing).
Don’t go overboard tuning every MB of the heap. Keeping things foolproof is usually the better choice (-XX:NewRatio, for example, grows with the heap; -XX:NewSize doesn’t, but is more precise). Also keep in mind that DevOps time is often more expensive than adding another node.

I also did some experiments with the G1 garbage collector and -XX:+UseStringDeduplication (after analysing a heap dump), but neither improved things for us. They might in your situation, so keep them in mind as a next step.

Sources

Sun Java System Application Server Enterprise Edition 8.2 Performance Tuning Guide
Advanced JVM and GC tuning
Heap tuning parameters
Java SE Tools reference

http://vanwilgenburg.wordpress.com/?p=559
