Jake Wharton — GeistHaus

Apr 1, 2026 Updated Apr 1, 2026

Eight years ago we launched a library of Kotlin extensions for the Android platform, Android KTX. In the time since, the library became Core KTX, and numerous other KTX libraries were written for other AndroidX libraries. And while KTX's approach to adding Kotlin niceties was built on a strong technology foundation, we’ve made the difficult decision to begin winding down our KTX extension libraries [1].

That's right folks–toss another entry on the Killed By Google board!

Despite the date this is no joke.

However, mourn not, friends. The KTX libraries were killed because the adoption of Kotlin has been such a resounding success. All extensions have now been merged directly into their respective main library. Woo!

Below is a table of every library which had a -ktx module and the first version where it became empty and thus obsolete.

KTX library                              Obsolete in version
---------------------------------------  ----------------------
activity-ktx                             1.9.0
appsearch-ktx                            None, empty [2]
collection-ktx                           1.3.0
concurrent-futures-ktx                   1.4.0-alpha01 [3][4]
core-ktx                                 1.19.0-alpha01 [3][4]
dynamicanimation-ktx                     1.2.0-alpha01 [3][4]
fragment-ktx                             1.9.0-alpha01 [3][4]
lifecycle-livedata-ktx                   2.7.0
lifecycle-livedata-core-ktx              2.8.0
lifecycle-reactivestreams-ktx            2.6.0
lifecycle-runtime-ktx                    2.8.0
lifecycle-viewmodel-ktx                  2.8.0
loader-ktx                               1.2.0-alpha01 [3][4]
navigation-common-ktx                    2.4.0
navigation-fragment-ktx                  2.4.0
navigation-runtime-ktx                   2.4.0
navigation-ui-ktx                        2.4.0
paging-common-ktx                        3.0.0
paging-runtime-ktx                       3.0.0
paging-rxjava2-ktx                       3.0.0
palette-ktx                              1.1.0-alpha01 [3][4]
preference-ktx                           1.3.0-alpha01 [3][4]
savedstate-ktx                           1.3.0
security-crypto-ktx                      None, deprecated [5]
sqlite-ktx                               2.7.0-alpha03 [3][4]
tracing-ktx                              1.3.0
transition-ktx                           1.8.0-alpha01 [3][4]
watchface-complications-data-source-ktx  1.4.0-alpha01 [3][4]
work-runtime-ktx                         2.9.0

There is a feature request on Lint to provide a warning when you declare a KTX library at a version equal to or newer than the one in which it became obsolete. Hopefully this will be implemented and can aid in migrating your codebase over time.

I had the privilege of starting the KTX libraries. And I also had the privilege of eliminating the last of them!

Huge thanks to Chris Banes and Romain Guy who built that first Android KTX library alongside me. Thanks to Aurimas Liutikas and Alan Viverette for putting the infrastructure in AndroidX so it could move into AOSP, grow into multiple libraries, and ultimately become obsolete as the main libraries became Kotlin-first. Thanks to Marcello Galhardo for getting the ball rolling on the final elimination. And finally, thanks to the ~157 contributors [6] both inside Google and externally who helped build them out over the years.

[1] A parody of Stadia's death, and many, many others.
[2] This library has a KTX module, but it's always been empty.
[3] As of April 1st, 2026, this version has not yet been released. I will update the table once that occurs in the coming weeks.
[4] Libraries with only an alpha version listed do not have a stable release yet. I will update the table once the library goes stable in the coming months.
[5] This library has all public API (including its KTX) deprecated. As a result, it will not be migrated. You should eliminate your use of the entire library regardless of KTX usage.
[6] Computed via git rev-list --all --pretty="%an" -- "*/*ktx*" | grep -v "commit " | grep -iv "treehugger" | sort | uniq | wc -l.

https://jakewharton.com/an-update-on-android-ktx

Let's defuse the Compose BOM

Dec 3, 2025 Updated Dec 3, 2025

Many people rely on the Compose bill of materials (BOM) artifact to provide the complete set of Compose dependency versions.

If we use Compose’s foundation 1.8.0 but a transitive dependency bumps foundation-layout, there’s a risk that these two versions are incompatible with each other despite otherwise being stable libraries. The Compose BOM will unify the versions so that all are guaranteed to work with each other.

Since Compose comprises about 15 individual libraries, the Compose BOM provides us with only a single version that we have to manually change when upgrading. Nice and simple.

But wait…

We don't really need those things!

Every AndroidX library automatically bundles peer dependency constraints into its Gradle module metadata which ensures that within a library group all artifacts resolve to the same version.

Here's a fragment from the Gradle module metadata for foundation-layout v1.10.0:

"dependencyConstraints": [
  {
    "group": "androidx.compose.foundation",
    "module": "foundation",
    "version": {
      "requires": "1.10.0"
    },
    "reason": "foundation-layout is in atomic group androidx.compose.foundation"
  },
  {
    "group": "androidx.compose.foundation",
    "module": "foundation-lint",
    "version": {
      "requires": "1.10.0"
    },
    "reason": "foundation-layout is in atomic group androidx.compose.foundation"
  }
],

This means that in the scenario above, with a mismatched transitive dependency bump, the module metadata instructs Gradle to automatically bump all artifacts in that group. No manual action or BOM usage required.

As to the single version, did you know there are actually only five library groups in the Compose BOM? Despite encompassing about 15 libraries, it’s only actually defining four distinct versions (Compose UI and Material library groups share a version). Tools like Renovate or Dependabot can track the libraries in use and query upstream Maven repositories for new versions. When one is available, a PR is automatically created bumping the affected libraries. No manual version bumping required.

Why does the Compose BOM exist at all then?

Build systems which do not use the Gradle module metadata will not automatically align sibling artifacts which exist in the same library group. The BOM is a concept created by Maven, and it’s defined using the same pom.xml format as a normal artifact published to a Maven repository. We publish BOMs for libraries with multiple artifacts like OkHttp, Retrofit, and Wire. Someday we’ll also publish Gradle module metadata for those libraries (and more), making their BOMs entirely redundant for Gradle users.

Additionally, Gradle’s version catalogs are a relatively new concept which standardizes and centralizes how a project defines its external library coordinates and their versions. Prior to this, you either did it in build script code or for smaller projects put versions directly in the build file. The Compose BOM allows these types of builds to omit the versions on the 15 individual libraries it covers and only specify its sole version. When the version catalog is in place, however, the four Compose versions are defined once in the [versions] table.

The BOM can hold you back

Unlike the normal AndroidX release cadence, it is only released once or sometimes twice a month. Sometimes it inexplicably doesn’t release for a month or two. Sometimes a new release actually contains zero changes from the last. Sometimes it does bump all the Compose dependencies, but those libraries can still have their own individual releases between the BOM releases. I don’t fully understand what process or policy causes all this inconsistency on Google’s side, but it all combines to make the BOM a somewhat unreliable source of versioning.

This single, date-based number ends up masking the real versions. Oh, that bug you hit was fixed in Foundation 1.9.4? Well you’re on 2025.10.01 of the BOM. Now go find the website which contains the mapping to determine if you've already got the fix or not.

Finally, if you start gradual adoption of AndroidX betas, you are partially overriding the versions declared in the BOM. So not only is it an indirection that masks the real versions, it’s not even the full source of truth.

Because the versions in the BOM still participate in normal Gradle dependency resolution semantics, transitive dependencies and local overrides can change the resolved version. Unless you commit a lock file, you won't actually know what version is going into your final artifact.

The next time you start a new project, consider defining the two or three versions of the Compose library groups you use the same way you do every other dependency. And if you're already on an existing project which is using it, consider calling in the BOM squad to safely dispose of this artifact.

🚫💣

https://jakewharton.com/defuse-the-compose-bom

You should use AndroidX betas

Nov 19, 2025 Updated Nov 19, 2025

Did you know the versioning of AndroidX libraries and their stability guarantees are different from most libraries? Their betas and RCs are actually production-ready, and you should be using them!

In a “normal” library, such as the ones I release, features are added and known bugs are fixed to produce a stable release which might be released as version 1.2.0. If any bugs are found in that release, they get fixed and put into a version 1.2.1. If new APIs are added, the next version becomes 1.3.0. This is basic semantic versioning.

AndroidX does not do versioning this way. When a library has its features added and its known bugs fixed they promote that library to beta01. This artifact is now API stable! Don’t believe me? This is documented in their guidelines. They also have tooling which validates that you cannot break APIs or even introduce new APIs once an artifact has reached beta.

Thus, when AndroidX releases a 1.2.0-beta01 of some library, it is equivalent to a normal library releasing a 1.2.0. This is still semantic versioning, but it’s a more strict subset that imposes restrictions on prerelease versions.

Why do they do this? The motivation is simple: they want the stable versions to be extremely stable. That is to say, they want most of the bugs that would otherwise necessitate subsequent patch releases to be caught in the ramp-up to 1.2.0.

Here’s all those words in chart form:

Normal library    AndroidX library
--------------    -----------------------------
1.2.0-RC          1.2.0-alpha01
1.2.0             1.2.0-beta01
1.2.1             1.2.0-beta02 (etc.)
1.2.2             1.2.0-rc01 (etc.)
                  1.2.0 (same bits as final RC)

Wondering if anyone else uses these betas? All of Google’s first-party apps ship against the code in AndroidX HEAD. Not only are they relying on these beta and RC versions, they build, test, and ship with the alpha versions and random commits in-between. By the time a library even reaches -beta01 it has already been widely tested and deployed. AndroidX is to Google’s apps what all of your util- and common- modules are to your app: shared code which is part of their codebase.

In the past, at Cash App, we've had to occasionally bump to a beta to work around a bug or get access to a new feature.

- There was a bug in the graphics shape library stable version that was breaking the rounded corners of our bottom sheet decoration.
- The collection library’s primitive specializations of ScatterMap had a bug which caused deleted values to be returned during insertion.
- A handful of Compose issues, whether in the runtime, foundation, or UI around recomposition problems, layout problems, retained state problems, etc.

In all those cases we reported bugs (or found existing bugs) and bumped to the next version’s betas or RCs in the meantime.

Sticking to stable versions is the easy play, but there’s a cost to that too. One of the tradeoffs made in AndroidX’s choice to version this way is that stable versions are much farther apart. Let’s say you’re on Compose UI 1.8, and you find a bug when bumping to Compose UI 1.9. Not only are you stuck on 1.8 (assuming it has no workaround), but you now have to wait months for Compose UI 1.10. And then you hope the cycle doesn’t repeat. If you found that same bug in the Compose UI 1.9 betas, however, then it’s fixed in a few weeks, and you can upgrade months sooner.

Choosing to do this doesn’t mean unleashing prerelease chaos across the hundreds of AndroidX libraries all at once. You can gradually try this out–that's what we're doing. Mature libraries (collection, core, activity, etc.) simply don’t have a lot of changes happening, so the risk is low. Libraries which are wildly load-bearing (Compose runtime, foundation, and UI) are also good candidates since you really need to find any bugs in them as soon as possible.

If you've put the necessary testing infrastructure in place, you're already set up to do this with confidence. Comprehensive unit tests ensure correct behavior. Screenshot tests ensure correct pixels. And instrumented tests, end-to-end tests, and/or manual QA testing is the reliability backstop ensuring nothing falls through the cracks.

No matter what, as AndroidX library users, we have a shared responsibility to report bugs upstream so that they get fixed. You cannot assume that someone else will find them. At Cash App, we have a very large codebase, and we do lots of interesting things with these libraries. In order to really maximize the benefits of the betas, we plan to continue to find and report any bugs found. This helps us stay on the path of upgradability, but also helps everyone else using these libraries. And hey, I think you should consider doing the same!

https://jakewharton.com/you-should-use-androidx-betas

Custom short-link redirector

Oct 14, 2025 Updated Oct 14, 2025

I used to use Bit.ly to put links into my presentations. Their service allowed you to customize the path portion of the link, so I was able to create links like bit.ly/ok-libs. If you were attending the talks live, watching the recording, or browsing the slides, the short URL was easy to type into a browser.

Unfortunately, the path customization of Bit.ly is a global namespace. Short and memorable paths became increasingly hard to find.

Ten years ago I bought the jakes.link domain to solve this problem. Bit.ly let you point custom domains at their service, and each then gets its own path namespace without risk of collision.

A few months ago bit.ly announced that they would show a preview page with ads before redirecting. Gross. This happens for links on their domain and on custom domains for all free users. I'd be happy to pay a few bucks a year to avoid this, but the cheapest plan which supports custom domains is $350/year (paid annually). That's nothing short of ridiculous for the one or two links per year which I create.

Netlify

I use Netlify for hosting this site because one of its features is server-side redirects. This ensures that I can keep old URLs working, because a good URL is forever. But it also makes Netlify a great candidate for a build-your-own short-link redirector.

Here's how I migrated jakes.link in three steps:

1. Create a git repo with a _redirects file following Netlify's redirect documentation.
```
/        https://jakewharton.com  302
/how-to  https://jakewharton.com/custom-short-link-redirector/ 302
```
(I use HTTP 302 redirects so that if third-party content moves over time I can at least update my redirects.)
2. Create a project on Netlify and link it to the git repo (on GitHub or wherever else). It will automatically deploy your redirects to a subdomain on their domain which you can use to test (e.g., jakes-link.netlify.app). Pushes to the git repo will now be automatically deployed.
3. In the "Domain management" section on the Netlify project, add your custom domain as an alias. You will have to either point your nameservers at Netlify to have it manage DNS automatically, or add the necessary DNS records to whatever service manages your domain. After a day or so, the new nameservers and DNS will propagate and everything should be working.

Try it out: jakes.link/how-to (link to this post).

As a nice bonus, this can all be accomplished on Netlify's free tier. Their paid plans aren't expensive, but probably too expensive for just a link redirector.

Alternatives

I am very satisfied with the Netlify-based solution. It's free, entirely server-side, and driven by data that I control. Bit.ly worked for ten years, and I'll be happy if I get ten years out of Netlify, too.

The two big features lost in the move are analytics and the experience of adding a link. I don't need analytics, and I'm very comfortable editing a file in a git repo to create new links, but perhaps you're not! Thankfully there are plenty of alternative approaches to consider.

I briefly looked at short.io which is more of a direct competitor to bit.ly. Their features look comprehensive, and the free and paid tiers look like good value. If I wasn't already paying Netlify I would have probably used this service.

There are a few link-shortening apps which you could host yourself, like yourls.org. I do self-host a few services, but given the overwhelming simplicity of a link shortener I don't expect hosted services to churn that much. There's just not enough value for me to host my own given how infrequently I shorten links.

Finally, you could write your own simple server-side URL shortener in about 15 lines of code and a full-featured one in about 50. If you choose the right language, these can be deployed on various service providers under the name "workers", "functions", "lambdas", etc. In the old, dark days I used to run an express.js-based one for ActionBarSherlock. With the Netlify setup having data-driven routes in a git repo I control, this is probably my fallback option.
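As a rough illustration of that self-hosted fallback, here is a minimal sketch of a data-driven redirector built on the JDK's bundled com.sun.net.httpserver package. The class name ShortLinks, the start helper, and port 8080 are assumptions made up for this example, and the routes simply mirror the two entries from the _redirects file; it is not the express.js service mentioned above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Map;

/** Hypothetical self-hosted redirector. Names and routes are illustrative only. */
final class ShortLinks {
  private ShortLinks() {}

  /** Starts a server which answers each known path with an HTTP 302. */
  static HttpServer start(int port, Map<String, String> routes) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/", exchange -> {
      String target = routes.get(exchange.getRequestURI().getPath());
      if (target == null) {
        // A length of -1 tells the server that no response body follows.
        exchange.sendResponseHeaders(404, -1);
      } else {
        exchange.getResponseHeaders().set("Location", target);
        exchange.sendResponseHeaders(302, -1);
      }
      exchange.close();
    });
    server.start();
    return server;
  }

  public static void main(String[] args) throws IOException {
    // The same data as the _redirects file, kept in code for brevity.
    start(8080, Map.of(
        "/", "https://jakewharton.com",
        "/how-to", "https://jakewharton.com/custom-short-link-redirector/"));
  }
}
```

Unknown paths get a 404 here. A real deployment would also need TLS and some way to reload routes, both of which this sketch leaves out.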

https://jakewharton.com/custom-short-link-redirector

Fan-in to a single required GitHub Action

May 7, 2025 Updated May 7, 2025

It doesn't take long for a project to spawn multiple jobs in its GitHub Actions workflows. Parallelization can lead to huge speedups for PRs. Job grouping makes it easier to conditionally enable or disable multiple steps. Each time you add a new job, however, you have to mark it as required in branch protection to prevent failing PRs from accidentally merging.

Being a clever person, you might create a final job which lists all the other jobs as required, and then mark that as the single required job.

jobs:
  # …

  final-status:
    needs:
      - build
      - unit-tests
      - emulator-tests
      - screenshot-tests
    # …

Unfortunately, this does not work in practice.

GitHub will skip the 'final-status' job if any of its 'needs' fail, and skipped jobs are treated as passing according to the docs:

A job that is skipped will report its status as "Success". It will not prevent a pull request from merging, even if it is a required check.

To work around this undesirable behavior, first, change the job to always run (unless canceled):

   final-status:
+    if: ${{ !cancelled() }}
     needs:
       - build
       - unit-tests
       - emulator-tests
       - screenshot-tests
     …

Next, add a step which ensures the status of each 'needs' job was successful:

    steps:
      - name: Check
        run: |
          results=$(tr -d '\n' <<< '${{ toJSON(needs.*.result) }}')
          if ! grep -q -v -E '(failure|cancelled)' <<< "$results"; then
            echo "One or more required jobs failed"
            exit 1
          fi

Finally, you can mark this job as the only required one. It will now successfully reflect the status of all jobs. You can also hang additional steps on it, or even entire subsequent jobs (provided they aren't needed for PRs).

I'm using this setup on a few repos such as Mosaic where you can also see a downstream 'publish' job which only runs on the integration branch.

An alternative is to have a final job which only runs when one of its 'needs' fails and then to fail itself. An example of this strategy was posted on the Actions issue tracker. This approach is simpler, but precludes any additional steps or jobs.

https://jakewharton.com/fan-in-to-a-single-required-github-action

Compile-time validation of JNI signatures

Mar 12, 2025 Updated Mar 12, 2025

JNI allows managed code inside the JVM or ART to call into native code. Java methods can be declared as native, and then a corresponding C function [1] can be written and automatically wired together when the native library is loaded.

Native code lacks mechanisms like packages and overloads, so a special format is used to encode the Java method signature. A Java method defined as:

package com.example;

class Things {
  static native long createThing(String name, int count);
}

Requires a matching C declaration which looks like:

jlong Java_com_example_Things_createThing(
    JNIEnv *env, jclass type, jstring name, jint count) {
  // …
}

If you add method overloading into the mix, the C declaration must also include two underscores followed by the encoded parameter signature:

jlong Java_com_example_Things_createThing__Ljava_lang_String_2I(
    JNIEnv *env, jclass type, jstring name, jint count) {
  // …
}

Woof! And if you get any part of the encoding wrong, the method call will fail at runtime:

Exception in thread "main" java.lang.UnsatisfiedLinkError:
    'long Things.createThing(java.lang.String, int)'
  at Things.createThing(Native Method)
  at Main.main(example.java:6)

In my experience, these signatures do not change frequently. Once they're correct you can mostly just leave them untouched. However, it's a class of problem that would be nice to eliminate completely. Especially if within your projects they do change frequently.

JNI header generation

When compiling native code, a header represents a series of functions implemented somewhere else. It allows consumers of a library to compile against its API without requiring the full implementation. When compiling the library itself, the compiler requires all header functions have corresponding implementations.

Defining a manually-written header for our C functions would be redundant and subject to all the same problems above. Instead, we want to automatically derive the header from the corresponding Java code. As of Java 8, javac can do this for us with its -h flag. Let's learn how to use it from javac's help output:

❯ javac -h
error: -h requires an argument
Usage: javac <options> <source files>
use --help for a list of possible options

Wait… shit. Please don't use -h for real flags.

Anyway, it just takes a directory.

❯ javac -h h -d out Example.java

❯ tree
.
├── Example.java
├── things.c
├── h
│   └── com_example_Things.h
└── out
    └── com
        └── example
            └── Things.class

Here is the full content of com_example_Things.h:

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class com_example_Things */

#ifndef _Included_com_example_Things
#define _Included_com_example_Things
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     com_example_Things
 * Method:    createThing
 * Signature: (Ljava/lang/String;I)J
 */
JNIEXPORT jlong JNICALL Java_com_example_Things_createThing
  (JNIEnv *, jclass, jstring, jint);

#ifdef __cplusplus
}
#endif
#endif

Among the requisite boilerplate for single inclusion and C++ support is a function signature matching the one we wrote above! Including this file from our .c will cause the native compiler to validate all header functions have corresponding C implementations.

Let's change the Java native method signature and see what happens.

 class Things {
-  static native long createThing(String name, int count);
+  static native long createThing(String name, int count, byte[] buffer);
 }

❯ javac -h h -d out Example.java

❯ clang -I "$JAVA_HOME/include" \
    -I "$JAVA_HOME/include/darwin" \
    -I h \
    things.c

things.c:4:7: error: conflicting types for 'Java_com_example_Things_createThing'
jlong Java_com_example_Things_createThing(
      ^
h/com_example_Things.h:15:25: note: previous declaration is here
JNIEXPORT jlong JNICALL Java_com_example_Things_createThing
                        ^
1 error generated.

It's not the most amazing message that I have seen. But it did fail compilation! Fixing the problem is now a matter of comparing the two signatures and updating the C file as needed.

Gradle does this automatically

Do you use Gradle? Good news! It automatically configures your javac with the -h flag so you don't really need to do much.

❯ tree
.
├── build
│   ├── classes
│   │   └── java
│   │       └── main
│   │           └── com
│   │               └── example
│   │                   └── Things.class
│   ├── generated
│   │   └── sources
│   ⋮       └── headers
│               └── java
│                   └── main
│                       └── com_example_Things.h
⋮
├── build.gradle
├── src
│   └── main
│       └── java
│           └── com
│               └── example
│                   └── Example.java
└── things.c

This is the result after moving Example.java into src/main/java/, writing apply plugin: 'java-library' in build.gradle, and invoking ./gradlew assemble.

If your native build occurs outside Gradle, the compileJava task should be run first, then the external native build, and finally (with the native binaries put somewhere like src/main/resources/) the full assemble or build task can be run.

For native builds which run as a Gradle task, you can consume the associated JavaCompile task's options.headerOutputDirectory property which becomes an additional include directory.

Kotlin (and other JVM languages)

Alternative languages which target Java bytecode usually have equivalent markers to bind to native functions, such as Kotlin's external modifier. Unsurprisingly, when writing Kotlin we cannot use javac -h because we don't have any Java!

There remains a long-standing feature request for kotlinc to generate these headers like javac. Until then, there are three approaches to solving this problem: just write Java, use javah, or write our own tool.

Just write Java

Since JNI methods are only stubs, continuing to write them in Java is not too painful. The Kotlin compiler supports bidirectional references from Java-to-Kotlin and Kotlin-to-Java. This allows your Kotlin to reference the Java stubs, while any Kotlin types needed in those Java stubs still work.

final class Jni {
  static { System.loadLibrary("my-library"); }
  private Jni() {}
  static native long createThing(String name, int count);
}

Centralizing the stubs in a Java class creates a single location for the native library to be loaded. This does end up limiting access to a single package, which may or may not be desired.

If you are building with Gradle, both Kotlin/JVM and Kotlin/Multiplatform still get the automatic inclusion of the -h flag on the resulting javac execution. All you need to do is create the Java file, and you're good to go!

Finally, writing JNI stubs in Java helps avoid the need to understand how Kotlin (or any other language) maps its features to the underlying bytecode. You no longer need to worry about how objects, internal funs, or value class parameters get converted.

Use old javah tool

Prior to javac's -h flag producing headers, the JDK contained a standalone javah tool which parsed Java .class files. This means that any other language which targeted Java bytecode and used its ACC_NATIVE flag could generate headers.

While this sounds like the perfect solution for alternate languages, the tool was deprecated in Java 9 and removed in Java 10. However, if your code still targets Java 9 or older your class files can be read by this tool.

If you use Gradle, fetching old-ass JDKs is the perfect use-case for Gradle toolchains (which despite their docs are otherwise rarely a good idea).

def launcher = javaToolchains.launcherFor {
  languageVersion = JavaLanguageVersion.of(8)
}
tasks.register('generateHeaders', Exec) {
  def javah = launcher.map { it.metadata.installationPath.file("bin/javah") }
  executable(javah.get().asFile) // TODO Lazy, once Gradle supports.
  args("-h") // TODO Pass class dirs, etc.
}

❯ ./gradlew -q generateHeaders
Usage:
  javah [options] <classes>
  …

Write your own tool

With the ASM library or Java's new Class-File API, parsing these files to find native methods is possible. Once found, the mapping to their native signature can be done with a well-documented formula. Someone just has to do the work.
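To give a sense of that formula, here is a minimal sketch of the name escaping described in the JNI specification. JniNames and its methods are hypothetical names rather than an existing tool. It also swaps in reflection on already-loaded classes to find native methods, where a real tool would parse class files with ASM or the Class-File API.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the JDK or any published library. */
final class JniNames {
  private JniNames() {}

  /** Escapes a class name, method name, or argument descriptor. */
  static String mangle(String name) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
        out.append(c);
      } else if (c == '.' || c == '/') {
        out.append('_'); // package separator
      } else if (c == '_') {
        out.append("_1");
      } else if (c == ';') {
        out.append("_2");
      } else if (c == '[') {
        out.append("_3");
      } else {
        out.append(String.format("_0%04x", (int) c)); // e.g. '$' becomes _00024
      }
    }
    return out.toString();
  }

  /** Name used when the native method is not overloaded. */
  static String shortName(String className, String methodName) {
    return "Java_" + mangle(className) + "_" + mangle(methodName);
  }

  /** Name used for overloaded natives: "__" plus the escaped argument descriptor. */
  static String longName(String className, String methodName, String argDescriptor) {
    return shortName(className, methodName) + "__" + mangle(argDescriptor);
  }

  /** Lists the long names of every native method declared on a loaded class. */
  static List<String> nativeLongNames(Class<?> type) {
    List<String> names = new ArrayList<>();
    for (Method method : type.getDeclaredMethods()) {
      if (!Modifier.isNative(method.getModifiers())) continue;
      String descriptor = MethodType
          .methodType(method.getReturnType(), method.getParameterTypes())
          .toMethodDescriptorString(); // e.g. "(Ljava/lang/String;I)J"
      String args = descriptor.substring(1, descriptor.indexOf(')'));
      names.add(longName(type.getName(), method.getName(), args));
    }
    return names;
  }
}
```

For the Things example, JniNames.longName("com.example.Things", "createThing", "Ljava/lang/String;I") yields Java_com_example_Things_createThing__Ljava_lang_String_2I. Emitting the surrounding header boilerplate and the C parameter types is the remaining work.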

There was a repo which attempted this, but it is incomplete and now seemingly abandoned. When the JDK removed the javah tool, the Scala community forked that library to create their sbt-jni plugin. But to my knowledge there is no other general-purpose tool for other languages which fulfills this need today.

Java 22 and FFM

JNI is generally avoided unless extenuating circumstances demand its use. It remains a challenging system even with compile-time validation due to performance concerns, multiple memory spaces, and almost no safety or security. Elaborate tools such as JNA and SWIG were invented to try and simplify native library use over JNI.

Starting last year with Java 22, the new Foreign Function & Memory API became available to use. FFM inverts ownership of the stubs, generating the Java sources from native headers using the jextract tool.

If I manually write a things.h file with a regular C API, it can be fed into jextract.

#ifndef things_h
#define things_h

long createThing(char* name, int count, void* buffer);

#endif // things_h

❯ jextract --output ffm things.h

❯ tree ffm
ffm
└── things_h.java

The resulting things_h.java file is a chonker, but among a slew of FFM implementation detail is a public Java API which corresponds to our native function.

public class things_h {
    // …

    /**
     * {@snippet lang=c :
     * long createThing(char *name, int count, void *buffer)
     * }
     */
    public static long createThing(MemorySegment name, int count, MemorySegment buffer) {
        var mh$ = createThing.HANDLE;
        try {
            if (TRACE_DOWNCALLS) {
                traceDowncall("createThing", name, count, buffer);
            }
            return (long)mh$.invokeExact(name, count, buffer);
        } catch (Throwable ex$) {
           throw new AssertionError("should not reach here", ex$);
        }
    }
}

Since both the name and buffer parameters on the C function were pointers to memory, they come across typed as MemorySegments. If we have a Java String and byte[], conversion and/or pinning of their managed memory such that it can be used by native code is required. The FFM API provides utilities for this conversion in a similar way to which jni.h provided conversion utilities for jstring and jbyteArray types.
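For example, here is a minimal sketch of that conversion, assuming Java 22 or newer. SegmentConversions and roundTrip are hypothetical names, and the call into the generated things_h class is left as a comment so that the block runs without a native library present.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

/** Hypothetical demo class. Only the java.lang.foreign calls are real API. */
final class SegmentConversions {
  private SegmentConversions() {}

  /** Copies a String and a byte[] into native memory, then reads them back. */
  static String roundTrip(String name, byte[] buffer) {
    // A confined arena frees everything allocated from it when closed.
    try (Arena arena = Arena.ofConfined()) {
      // NUL-terminated UTF-8 copy of the String, suitable for a char*.
      MemorySegment nameSegment = arena.allocateFrom(name);
      // Off-heap copy of the array contents, suitable for a void*.
      MemorySegment bufferSegment = arena.allocateFrom(ValueLayout.JAVA_BYTE, buffer);

      // A real caller would now invoke something like:
      //   things_h.createThing(nameSegment, buffer.length, bufferSegment);

      byte[] copy = bufferSegment.toArray(ValueLayout.JAVA_BYTE);
      return nameSegment.getString(0) + ":" + copy.length;
    }
  }
}
```

Both allocations are copies, so anything the native code writes into the buffer segment has to be copied back to the byte[] before the arena closes.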

The boilerplate which makes up the rest of the file helps the JVM understand the shape of the native code so that it can be optimized alongside regular Java as well as layered with safety checks to avoid things like unrestricted memory access.

If your minimum-supported JDK is not yet Java 22 or newer, you can still use FFM through multi-release jars. This embeds your FFM-flavored class files inside the META-INF/versions/22/ directory which is only loaded when the consumer is running on Java 22 or newer.

With the ownership reversed, there is no chance of changes to Java breaking the native code. Instead, changes to the native code will now break the Java compilation.

 #ifndef things_h
 #define things_h
 
-long createThing(char* name, int count, void* buffer);
+long createThing(char* name, int count);
 
 #endif // things_h

❯ jextract --output ffm things.h

❯ javac -d out Example.java ffm/things_h.java
Example.java:23: error: method createThing in class things_h cannot be applied to given types;
    return things_h.createThing(nameSegment, count, bufferSegment);
                   ^
  required: MemorySegment,int
  found:    MemorySegment,int,MemorySegment
  reason: actual and formal argument lists differ in length
1 error

With jextract able to parse native headers, the need to write custom C code to support the use of native libraries is diminished. Ideally you would only run jextract on the headers of the desired native libraries and then write 100% of your interaction with it from Java or your favorite JVM language.

[1] Or any language which can produce functions compatible with the C ABI.

https://jakewharton.com/compile-time-validation-of-jni-signatures

Deprecating idling resource libraries

Feb 19, 2025 Updated Feb 19, 2025

When Espresso was made public a decade ago, one of its banner features was the "idling resource" concept. This monitored the main thread and any background thread pools to prevent your test from progressing until the app became idle. Waiting until idle generally increased the stability of tests since at that point the UI should be stable.

We released RxIdler and okhttp-idling-resource for monitoring RxJava schedulers and OkHttp's dispatcher, respectively. Today I am deprecating both libraries. In the years since their release, I have become disillusioned with the idling resource mechanism–and I'm not alone.

Like using R.id to target views, idling resources expose the internals of your application to the testing framework in a way that no real user can match. The point of building tests in the robot pattern was to describe interaction at a high-level. If you can't read a UI test to someone over the phone interacting with the real app then it probably encodes implementation detail.

"Okay dad, now wait for OkHttp's Dispatcher to report itself as idle before clicking 'continue'."

Yeah… no.

What do we do as real users? We wait until some UI condition is met which signals our ability to progress.

"Okay dad, now wait for the 'continue' button to turn green before clicking it."

Much better.

We don't care how the application is performing the work nor the means by which it signals the UI that it is complete. Moreover, test failures that occur based on condition waits are failures which can occur in the wild.

I've been sitting on these deprecations and this blog post for a few years now. Telling you to switch to a new technique without actually demonstrating it is not great. Turns out that around the same time Google was also changing their tune on idling resources. That guidance has since been promoted to the official documentation as well. These links demonstrate how to wait on conditions using new built-in Compose testing APIs.

Flow chart showing 'click on button' pointing to 'is the condition met'. Its 'no' branch recurses onto itself. The 'yes' branch points to 'Assert text is displayed'. (Image courtesy developer.android.com)

For View-based layouts, you can write a custom ViewAction that loops on yielding to the main thread, checking the condition, and then either breaking or looping. Yes I know I'm still not really demonstrating how to do this for views. Sorry!
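As a framework-agnostic illustration of the condition-wait idea, here is a minimal Java sketch. It is not an Espresso ViewAction and not code from either deprecated library. The `ConditionWait` class, the `waitUntil` method, and its timeout and poll parameters are hypothetical names, and a real View-based version would yield to the main thread rather than sleep.

```java
import java.util.function.BooleanSupplier;

final class ConditionWait {
  /**
   * Polls {@code condition} until it reports true or the timeout elapses.
   * Returns true if the condition was met, false on timeout or interruption.
   */
  static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in for "the 'continue' button turned green": becomes true after roughly 50 ms.
    boolean met = waitUntil(() -> System.currentTimeMillis() - start >= 50, 2_000, 10);
    System.out.println(met ? "condition met" : "timed out");
  }
}
```

The test only ever asks "is the visible condition true yet?", which is the same question a real user would ask.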

Both of these idling resource libraries are stable and reliable. They haven't needed any commits or releases in years. If you are relying on them today then absolutely nothing is changing for you. Deprecation is a signal to new users that this is not the recommended approach. And it's a nudge to existing users that they can migrate at their own pace to a superior solution.

https://jakewharton.com/deprecating-idling-resource-libraries

Using Renovate to update build JDK

Jan 8, 2025 Updated Jan 8, 2025

You want to be using the latest JDK for development. Don't use Gradle toolchains, they'll needlessly force you to use old JDKs. You can still target and test on old JVM versions but develop with the latest and greatest. Java and the JDK are literally built for this.

Locally this hasn't been a problem. Homebrew (or your favorite equivalent) will keep your default JDK at the latest. Keeping my GitHub Actions workflows up-to-date, however, frequently slips my mind. I find projects using 19 or 20 simply because I haven't touched the CI build in the two years since 19 or 20 was the latest.

We're already using Renovate to keep dependencies up to date. With a little extra programming in JSON (wince) we can have the JDK version updated to latest as well.

First, migrate the existing build JDK version in your GitHub Action to a .github/.java-version file [1].

Next, change the setup-java action to use this file rather than a hard-coded version.

 - uses: actions/setup-java@v4
   with:
     distribution: 'zulu'
-    java-version: 21
+    java-version-file: .github/.java-version

Finally, in your renovate.json5 [2], add a custom manager to update this file [3].

ignorePresets: [
  // Ensure we get the latest version and are not pinned to old versions.
  'workarounds:javaLTSVersions',
],
customManagers: [
  // Update .java-version file with the latest JDK version.
  {
    customType: 'regex',
    fileMatch: [
      '\\.java-version$',
    ],
    matchStrings: [
      '(?<currentValue>.*)\\n',
    ],
    datasourceTemplate: 'java-version',
    depNameTemplate: 'java',
    // Only write the major version.
    extractVersionTemplate: '^(?<version>\\d+)',
  },
],
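To see what the two regular expressions in that config capture, here is a small Java sketch that applies the same pattern text to sample inputs. Renovate evaluates these patterns itself, so Java is only a stand-in here. The class name, method names, and the sample version string `23.0.1+11` are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavaVersionFilePatterns {
  // Same pattern text as matchStrings: everything on the line before the newline.
  static final Pattern MATCH_STRING = Pattern.compile("(?<currentValue>.*)\\n");
  // Same pattern text as extractVersionTemplate: the leading digits only.
  static final Pattern EXTRACT_VERSION = Pattern.compile("^(?<version>\\d+)");

  /** Returns the value captured from the .java-version file contents, or null if nothing matches. */
  static String currentValue(String fileContents) {
    Matcher m = MATCH_STRING.matcher(fileContents);
    return m.find() ? m.group("currentValue") : null;
  }

  /** Reduces a full version string to its major version, or null if it has no leading digits. */
  static String majorVersion(String datasourceVersion) {
    Matcher m = EXTRACT_VERSION.matcher(datasourceVersion);
    return m.find() ? m.group("version") : null;
  }

  public static void main(String[] args) {
    System.out.println(currentValue("21\n"));      // prints 21
    System.out.println(majorVersion("23.0.1+11")); // prints 23
  }
}
```

Note that the first pattern requires a trailing newline, so in this sketch a file containing `21` with no newline would not match.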

Commit, push, and wait for Renovate to send you a PR [4]. Now your CI build automatically tracks the latest JDK.

[1] I'm putting the .java-version file into the .github/ folder because I don't want to force this version on people using jenv or the like. The whole point of this setup is you can build with any version of Java newer than our very, very old baseline of Java 8 (although things like Gradle have a higher minimum requirement).
[2] JSON5 is just JSON with fewer quotes, more trailing commas, and actual comments. You don't have to migrate to use this functionality (but you should!).
[3] You can adapt the file-matching regex and string-matching regex to target the GitHub Action .yamls directly if you want. I choose to use the .java-version file so that I can install multiple older JDKs with a second setup-java (which won't be updated) for testing with multiple JDKs.
[4] Here's an example sequence of PRs which did this migration: Switch to .java-version, add custom Renovate config, and Renovate's automatic JDK bump.

https://jakewharton.com/using-renovate-to-update-build-jdk

Nonsensical Maven is still a Gradle problem

Mar 28, 2024 Updated Mar 28, 2024

There was a time when I used Maven heavily, but today all the libraries I work on build with Gradle. Even though I'm publishing with Gradle, consumers can use Gradle, Maven, Bazel, jars in libs/ (but please don't), or anything else. That's a huge JVM ecosystem win!

In general, I don't have to think about what build system someone is using. I'm not here to debate subjective pros and cons of one versus any other. There is one notable exception, however. Maven's dependency resolution strategy is objectively bonkers. And if we want to support Maven consumers, we need to think about it.

If you already are familiar with the concept of dependency resolution, you can skip to the nonsense.

Dependency resolution primer

Chances are your build system of choice (or a separate dependency resolver tool) gives you a declarative way to describe your dependencies. At build time, those declarations are resolved to .jars which can be put on the compiler classpath.

Sometimes we call this a dependency tree, but it's actually a dependency graph, as separate nodes can converge back to something common to both.

Project (build.gradle)
├── A
│   └── B
│       └── C v1.0
└── D
    └── C v1.0

If library B and library D agree on the version of library C, then that is the .jar version which is used. If they disagree on versions, some policy needs to decide the appropriate single version to use.

Pop quiz: If library B wants version 1.1 of library C, and library D wants version 1.0 of library C, which single version of C should we use?

Project (build.gradle)
├── A
│   └── B
│       └── C v1.1
└── D
    └── C v1.0

This is not a trick question. Hopefully the answer feels obvious: you use the newer version, 1.1. That version is probably compatible with 1.0, so it's safe for both library B and library D to use. We can't know for sure, to be clear, but it's a safe choice. This behavior is the default in many dependency resolvers, including the one inside Gradle.
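The "newest wins" policy can be sketched in a few lines of Java. This is a simplified model, not Gradle's resolver: the `NewestWins` and `Request` names are hypothetical, and versions are assumed to be purely numeric and dot-separated, which real version ordering is not.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NewestWins {
  /** One requested module version somewhere in the graph, e.g. ("C", "1.1"). */
  static final class Request {
    final String module;
    final String version;

    Request(String module, String version) {
      this.module = module;
      this.version = version;
    }
  }

  /** Compares dotted numeric versions segment by segment; missing segments count as 0. */
  static int compareVersions(String a, String b) {
    String[] as = a.split("\\.");
    String[] bs = b.split("\\.");
    for (int i = 0; i < Math.max(as.length, bs.length); i++) {
      int x = i < as.length ? Integer.parseInt(as[i]) : 0;
      int y = i < bs.length ? Integer.parseInt(bs[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  /** Picks the highest requested version for each module, regardless of where it was requested. */
  static Map<String, String> resolve(List<Request> requests) {
    Map<String, String> selected = new LinkedHashMap<>();
    for (Request r : requests) {
      selected.merge(r.module, r.version, (old, next) -> compareVersions(next, old) > 0 ? next : old);
    }
    return selected;
  }

  public static void main(String[] args) {
    // B requests C 1.1 and D requests C 1.0, as in the pop quiz.
    System.out.println(resolve(List.of(new Request("C", "1.1"), new Request("C", "1.0"))));
    // prints {C=1.1}
  }
}
```

The position of a request in the graph never enters into the decision, which is what makes the result stable when dependencies are reordered.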

The nonsense

When building with Maven, given two dependencies that disagree on a transitive dependency version, the default resolution strategy is... uh... let's say "interesting". From their docs,

Maven picks the "nearest definition". That is, it uses the version of the closest dependency to your project in the tree of dependencies. ... Note that if two dependency versions are at the same depth in the dependency tree, the first declaration wins.

So in a dependency graph, if library B wants version 1.1 of library C, and library D wants version 1.0 of library C, which single version of C does Maven choose?

Project (pom.xml)
├── A
│   └── B
│       └── C v1.1
└── D
    └── C v1.0

The final build will use version 1.0 of library C. Wat.

If library B was using a new API from library C's version 1.1, the application will throw a NoSuchMethodError or the like at runtime.

As if that wasn't bad enough, disagreements which occur on the same conceptual level of the graph are resolved by whichever comes first. If our project replaces its library A with direct usage of library B, suddenly the resolved version is 1.1 because it came first.

Project (pom.xml)
├── B
│   └── C v1.1
└── D
    └── C v1.0

But if by chance library D was declared first in the pom.xml then oops! we're back to getting 1.0.

Project (pom.xml)
├── D
│   └── C v1.0
└── B
    └── C v1.1
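The three Maven outcomes above can be reproduced with a small Java model of "nearest definition, first declaration wins". This is a sketch of the documented rule, not Maven's actual resolver. The `NearestWins` and `Dep` names are hypothetical, and the versions given to A, B, and D are placeholders.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class NearestWins {
  static final class Dep {
    final String module;
    final String version;
    final List<Dep> children;

    Dep(String module, String version, Dep... children) {
      this.module = module;
      this.version = version;
      this.children = Arrays.asList(children);
    }
  }

  /**
   * Breadth-first walk in declaration order. The first version seen for a module is kept,
   * which approximates "nearest definition", with the first declaration winning ties.
   */
  static Map<String, String> resolve(List<Dep> declared) {
    Map<String, String> selected = new LinkedHashMap<>();
    Queue<Dep> queue = new ArrayDeque<>(declared);
    while (!queue.isEmpty()) {
      Dep dep = queue.remove();
      if (selected.putIfAbsent(dep.module, dep.version) == null) {
        // Only the winning node's own dependencies are followed.
        queue.addAll(dep.children);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Dep b = new Dep("B", "1.0", new Dep("C", "1.1"));
    Dep a = new Dep("A", "1.0", b);
    Dep d = new Dep("D", "1.0", new Dep("C", "1.0"));
    System.out.println(resolve(List.of(a, d)).get("C")); // prints 1.0 (C via D is nearer)
    System.out.println(resolve(List.of(b, d)).get("C")); // prints 1.1 (tie, B declared first)
    System.out.println(resolve(List.of(d, b)).get("C")); // prints 1.0 (tie, D declared first)
  }
}
```

Nothing in `resolve` ever compares two versions. Only distance and declaration order matter, which is why reordering a pom.xml can change what ends up on the classpath.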

This behavior is not user-friendly. You can always force a specific version by declaring it directly in your pom.xml, but that also means you take ownership of monitoring the versions requested by the entire dependency graph and ensuring you declare the one you need. Gee, that sounds like something the build tool should do for you.

Still a Gradle problem

So Maven has some nonsensical dependency resolution semantics. Why should you, a Gradle user, even care?

In the examples above, the version mismatches were demonstrated using peer dependencies on the Maven project. But disagreements can occur within the transitive graph of a single Gradle-built library.

If I am the author of library A from above, I only have a dependency on library B and it has a dependency on library C.

Project A (build.gradle)
└── B
    └── C v1.1

If I want to start using library C, I may add my own dependency (such as if C is an implementation dependency of B) and select an older version.

 Project A (build.gradle)
 ├── B
 │   └── C v1.1
+└── C v1.0

I have just unknowingly created a time bomb for all of my Maven consumers.

Not a hypothetical

Is this a frequent problem? Seems like no. Is this a real problem? Absolutely.

OkHttp 4.12 ships with two dependencies: Okio 3.6 and the Kotlin stdlib 1.8.21. Okio 3.6, however, depends on Kotlin stdlib 1.9.10.

OkHttp v4.12.0
├── Okio v3.6.0
│   └── Kotlin stdlib v1.9.10
└── Kotlin stdlib v1.8.21

This specific configuration is probably okay in practice, as Okio is unlikely to have used anything new. In general, however, the ability to create such a dependency graph with a mismatch is setting our Maven users up for future failure.

Detecting from Maven

If you are a Maven user, you can eagerly detect this case by using the Maven enforcer plugin and its built-in dependency convergence rule.

A Maven project with an OkHttp 4.12 dependency will now fail like this:

[ERROR] Rule 0: org.apache.maven.enforcer.rules.dependency.DependencyConvergence failed with message:
[ERROR] Failed while enforcing releasability.
[ERROR]
[ERROR] Dependency convergence error for org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.9.10 paths to dependency are:
[ERROR] +-com.example:example:jar:1.0-SNAPSHOT
[ERROR]   +-com.squareup.okhttp3:okhttp:jar:4.12.0:compile
[ERROR]     +-com.squareup.okio:okio:jar:3.6.0:compile
[ERROR]       +-com.squareup.okio:okio-jvm:jar:3.6.0:compile
[ERROR]         +-org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.9.10:compile
[ERROR] and
[ERROR] +-com.example:example:jar:1.0-SNAPSHOT
[ERROR]   +-com.squareup.okhttp3:okhttp:jar:4.12.0:compile
[ERROR]     +-org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.8.21:compile

Now a Maven consumer can temporarily resolve the conflict, and go and ask the library maintainer to correct this configuration.

Ignoring the problem with Gradle

Since Gradle is going to resolve to the newest version of a dependency, your tests end up running with the newest version rather than the declared version. As such, you can tell Gradle to replace your declared version with the resolved version when publishing.

This behavior is not Gradle's default, so we must choose it when setting up publishing. The Gradle docs have an example:

publishing {
  publications {
    mavenJava(MavenPublication) {
      versionMapping {
        usage('java-api') {
          fromResolutionOf('runtimeClasspath')
        }
        usage('java-runtime') {
          fromResolutionResult()
        }
      }
    }
  }
}

There's very little harm in doing this, and it will prevent the Maven issue completely. Nice!

The tradeoff is that it somewhat undermines the versions you declare. Keep in mind, though, even if you declare a dependency version and resolve to that same version, a downstream consumer may resolve a newer version or force an older version.

For me, I want the versions which I declare to be those which are resolved, at least local to my project. So this solution isn't going to work, but it might for your projects.

(Thanks to Paul Merlin for suggesting this solution which was added after initial publishing)

Trying to fix with Gradle

I'm going to outright dismiss "just don't use Maven" as a potential fix. There are lots of reasons not to use Maven that one can explore elsewhere. Ultimately it remains in widespread use, and you can either be sympathetic to those users or not.

Library developers using Gradle could change the default resolution strategy to fail on version conflict. This does precisely what it says: it fails your build if the transitive graph contains conflicts.

// OkHttp's build.gradle
dependencies {
  implementation 'com.squareup.okio:okio:3.6.0'
  implementation 'org.jetbrains.kotlin:kotlin-stdlib:1.8.21'
}

configurations.configureEach {
  resolutionStrategy.failOnVersionConflict()
}

Now when building we get a failure:

Execution failed for task ':compileJava'.
> Could not resolve all dependencies for configuration ':compileClasspath'.
   > Conflicts found for the following modules:
       - org.jetbrains.kotlin:kotlin-stdlib-common between versions 1.9.10 and 1.8.21
       - org.jetbrains.kotlin:kotlin-stdlib between versions 1.9.10 and 1.8.21

The failure suggests running dependencyInsight, which shows you a wall of text containing the subgraph of affected dependencies which led to the conflict.

> Task :dependencyInsight
Dependency resolution failed because of conflicts on the following modules:
   - org.jetbrains.kotlin:kotlin-stdlib-common between versions 1.9.10 and 1.8.21

org.jetbrains.kotlin:kotlin-stdlib-common:1.9.10
  Variant compile:
    | Attribute Name                 | Provided | Requested    |
    |--------------------------------|----------|--------------|
    | org.gradle.status              | release  |              |
    | org.gradle.category            | library  | library      |
    | org.gradle.libraryelements     | jar      | classes      |
    | org.gradle.usage               | java-api | java-api     |
    | org.gradle.dependency.bundling |          | external     |
    | org.gradle.jvm.environment     |          | standard-jvm |
    | org.gradle.jvm.version         |          | 21           |
   Selection reasons:
      - By conflict resolution: between versions 1.9.10 and 1.8.21

org.jetbrains.kotlin:kotlin-stdlib-common:1.9.10
+--- com.squareup.okio:okio-jvm:3.6.0
|    \--- com.squareup.okio:okio:3.6.0
|         \--- compileClasspath
\--- org.jetbrains.kotlin:kotlin-stdlib:1.9.10
     +--- compileClasspath (requested org.jetbrains.kotlin:kotlin-stdlib:1.8.21)
     +--- org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.9.10
     |    \--- com.squareup.okio:okio-jvm:3.6.0 (*)
     \--- org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.9.10
          \--- org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.9.10 (*)

The fix for OkHttp is simple: upgrade to a matching version.

Unfortunately, if you upgrade to a version that's newer than your transitive dependency, the build still fails.

> Could not resolve all dependencies for configuration ':compileClasspath'.
   > Conflicts found for the following modules:
       - org.jetbrains.kotlin:kotlin-stdlib between versions 1.9.23 and 1.9.10
       - org.jetbrains.kotlin:kotlin-stdlib-jdk8 between versions 1.9.10 and 1.8.0
       - org.jetbrains.kotlin:kotlin-stdlib-common between versions 1.9.23 and 1.9.10
       - org.jetbrains.kotlin:kotlin-stdlib-jdk7 between versions 1.9.10 and 1.8.0

You have to force the use of 1.9.23 everywhere, but doing so will ironically prevent failOnVersionConflict() from detecting mismatches in the future.

Gradle has other mechanisms like constraints and resolution strategy callbacks that have tons of power to customize dependency resolution, but none provide the ability to reject upgrades. I would love to be corrected on this, but I spent a few days searching and experimenting with no success. Instead, we have to build our own solution.

Actually fixing with Gradle

I wrote a task which consumes the dependency graph and checks if the first-order dependencies (i.e., those your project declared directly) select the same version as they request.

> Task :sympathyForMrMaven FAILED
e: org.jetbrains.kotlin:kotlin-stdlib:1.8.21 changed to 1.9.10

* What went wrong:
Execution failed for task ':sympathyForMrMaven'.
> Declared dependencies were upgraded transitively. See task output above. Please update their versions.

When I bump my declaration to 1.9.10 to match, or even 1.9.23 which is the latest right now, the task no longer fails.

BUILD SUCCESSFUL in 354ms
4 actionable tasks: 4 executed

This is what I hacked up in Groovy very quickly this morning (and to finish the damn post):

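// Assumes `configuration` is the resolvable configuration being checked; the snippet leaves it undefined.
def fail = false
def root = configuration.incoming.resolutionResult.rootComponent.get()
// Walk only the root's direct edges: the first-order dependencies declared by this project.
((ResolvedComponentResult) root).dependencies.forEach {
  if (it instanceof ResolvedDependencyResult) {
    def rdr = it as ResolvedDependencyResult
    def requested = rdr.requested
    def selected = rdr.selected
    // Skip project dependencies and the like, which carry no requested module version.
    if (requested instanceof ModuleComponentSelector) {
      def requestedVersion = (requested as ModuleComponentSelector).version
      def selectedVersion = selected.moduleVersion.version
      if (requestedVersion != selectedVersion) {
        // LogLevel is qualified so the snippet does not depend on a static import of ERROR.
        logger.log(LogLevel.ERROR, "e: ${rdr.requested} changed to ${selectedVersion}")
        fail = true
      }
    }
  }
}
if (fail) {
  throw new IllegalStateException("Declared dependencies were upgraded transitively. See task output above. Please update their versions.")
}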
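Stripped of the Gradle API, the core comparison is small. Here is a plain-Java sketch of just that check, operating on two maps rather than a real resolution result. The `DeclaredVersionCheck` class and its `findUpgrades` method are hypothetical names, and this is not the requested plugin, only the logic it would wrap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DeclaredVersionCheck {
  /**
   * Returns one message per first-order dependency whose selected version
   * differs from the version the project requested.
   */
  static List<String> findUpgrades(Map<String, String> requested, Map<String, String> selected) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> entry : requested.entrySet()) {
      String selectedVersion = selected.get(entry.getKey());
      if (selectedVersion != null && !entry.getValue().equals(selectedVersion)) {
        problems.add(entry.getKey() + ":" + entry.getValue() + " changed to " + selectedVersion);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // What the project declared directly.
    Map<String, String> requested = new LinkedHashMap<>();
    requested.put("com.squareup.okio:okio", "3.6.0");
    requested.put("org.jetbrains.kotlin:kotlin-stdlib", "1.8.21");

    // What resolution ended up selecting after transitive upgrades.
    Map<String, String> selected = new LinkedHashMap<>();
    selected.put("com.squareup.okio:okio", "3.6.0");
    selected.put("org.jetbrains.kotlin:kotlin-stdlib", "1.9.10");

    findUpgrades(requested, selected).forEach(System.out::println);
    // prints org.jetbrains.kotlin:kotlin-stdlib:1.8.21 changed to 1.9.10
  }
}
```

Because only first-order dependencies are compared, a declaration that is already at or above every transitive request passes, which is the case that failOnVersionConflict() could not accept.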

This needs to be cleaned up before it can be used generally–sorry! In a long post about how Maven's dependency resolution is annoying, I instead became very annoyed at Gradle and just want to stop working on this.

Someone please change it to Java, wrap it in a task, wrap that in a com.yourname.maven-sympathy plugin, publish to Maven Central, and ping me to update this post. I have about 30 projects I'd love to slap it on, and hopefully other sympathetic library authors who read this post will too!

https://jakewharton.com/nonsensical-maven-is-still-a-gradle-problem

Gradle toolchains are rarely a good idea

Mar 21, 2024 Updated Mar 21, 2024

The last post featured some Kotlin code inadvertently targeting a new Java API when the build JDK was bumped to 21. This can be solved with the -Xjdk-release Kotlin compiler flag, or by using Gradle toolchains to build with an old JDK.

If you read the Gradle docs…

Using Java toolchains is a preferred way to target a language version

…or the Android docs…

We recommend that you always specify the Java toolchain

…you wouldn't be blamed for thinking Java toolchains are the way to go!

However, Java toolchains are rarely a good idea. Let's look at why.

Bad docs

Last week I released a new version of Retrofit which uses a Java toolchain to target Java 8. Its use of toolchains was contributed a while ago, and I simply forgot to remove it. As a consequence, its Javadoc was built using JDK 8 and is thus not searchable. Searchable Javadoc came in JEP 225 with JDK 9.

The next release of Retrofit will be made without a toolchain and with the latest JDK. Its docs will have all the Javadoc advancements from the last 10 years including search and better modern HTML/CSS.

Resource ignorance

Old JVMs were somewhat notorious for being ignorant of resource limitations imposed by the system. The rise of containers, especially on CI systems, means your process resource limits are different from those of the host OS. JDK 10 kicked things into high gear with cgroups support and JDK 15 extended that to cgroups2.

Both of those changes were backported to the 8 and 11 branches, but since Gradle toolchains will use an already-installed JDK if available you have to have kept your JDK 8 and/or JDK 11 up-to-date. Have you?

Not to stray too far off-topic, but if you installed it with SDKMAN! or similar JDK management tools there's a good chance it's wildly out of date. I keep all my JDKs up-to-date by installing them through a Homebrew tap which itself updates automatically using the Azul Zulu API. As long as I do a brew upgrade every so often, each major JDK release that I have installed will be updated.

Without a Java toolchain, a modern JDK (even an outdated patch release of one) will honor resource limits and perform much better in containerized environments.
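One way to check what a given JVM believes its limits are is to ask it directly. The following Java sketch prints the processor count and maximum heap that the running JVM reports. The `JvmResourceView` class is a hypothetical name, and the numbers depend entirely on the JDK and container you run it in.

```java
final class JvmResourceView {
  /** Summarizes the CPU and heap limits as seen by the running JVM. */
  static String describe() {
    Runtime runtime = Runtime.getRuntime();
    return "java.version=" + System.getProperty("java.version")
        + " availableProcessors=" + runtime.availableProcessors()
        + " maxMemoryMiB=" + (runtime.maxMemory() / (1024 * 1024));
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
```

Running this inside a CPU- and memory-limited container with different JDKs is a quick way to see whether a particular JDK install honors those limits.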

Compiler bugs

All software has bugs, and sometimes the JVM, the Java compiler, or both have bugs. When you are using a 10-year-old version of the JVM and Java compiler, you run a much greater risk of compiler bugs, especially around features introduced near to that release.

There were many compilation problems around lambdas which were introduced in Java 8. If you are using the Java compiler from JDK 8 to target Java 8 JVMs you can still run into those bugs. Even if you are keeping your JDK 8 up-to-date many fixes are not backported. You can find examples on the issue tracker without much effort.

Now is the Java compiler in JDK 22 completely bug-free? No. But is using the Java compiler from JDK 22 on sources targeting Java 8 using only Java 8 language features much safer than using one from JDK 8? Absolutely.

Worse performance

Oracle and other large JVM shops devote lots of person-hours to making the JVM faster. We have newer garbage collectors that use less memory and consume less CPU. Work that happened on startup gets deferred to first-use to try and spread the cost out over the lifetime of the process. Algorithms and in-memory representations are specialized for common cases.

A language compiler is basically a worst-case scenario for the JVM. Endless string manipulation, object creation, and so so many maps. These areas have received many improvements over the years. My favorite is that strings which are ASCII-based suddenly occupy half as much memory in Java 9 as they did in Java 8. You know what's often entirely ASCII? Java and Kotlin source code!

Not needed for cross-compilation

Using the Java compiler from JDK 8 I can set -source and -target to "1.7" to compile a class that works on a Java 7 JVM. This does not prevent me from using Java 8 APIs, however. You have to add -bootclasspath with a pointer to a JDK 7 runtime (rt.jar) so that the compiler knows what APIs are available in Java 7. You could alternatively use a tool like Animal Sniffer to validate that no APIs newer than Java 7 were used. In this world, simply compiling with JDK 7 to target Java 7 might actually be easier.

In JDK 9, however, this all changed. The compiler now contains a record of all public APIs from every Java version going back to Java 8. It also allows specifying a single compiler flag, --release, which sets the source code language version, the target bytecode version, and the available runtime APIs to the specified release. There is simply no value in compiling with an older JDK to target an older JVM anymore.

Wasted disk space

All those JDKs needlessly take up space in your home directory. Each JDK is a few hundred MiB. By default, Gradle will try to match an existing JDK when a toolchain is requested. Project owners can specify additional attributes such as the JDK vendor which might cause existing JDKs to not match. This means even though one project forced you to install Eclipse Temurin JDK 8, another might force Azul Zulu JDK 8. So not only do you now have a bunch of old JDKs, you have two or three copies of each. My JDK cache in ~/.gradle is nearly 2 GiB.

Not the Gradle JVM

Toolchains are only used for tasks that create a new JVM. That means compilation (of Java or Kotlin) and running unit tests. They do not control the JVM that is used for running the actual Gradle build or any of the plugins therein. If you have minimum requirements there, or in other JVM-based tools which are invoked by the Gradle build, the toolchain does not help you.

If your build already has a minimum JDK requirement then why force installation of old JDKs given the newer one is already available on disk, can cross-compile perfectly, has fewer compiler bugs, builds faster, and respects system CPU and memory limits more effectively?

Not all bad

I want to stress that toolchains are unequivocally not a good idea for compilation. They still have utility elsewhere, however.

Retrofit has runtime behavior that changes based on the JVM version on which it's running. (This is because until Java 16 it took various different hacks to support invoking default methods through a Proxy.) That code needs to be tested on different JVM versions. As a result, we compile with the latest Java, but test through the lowest-supported Java using toolchains on the Test task. No need to worry about the user having weird old JDKs for Java 14 because it's now installed on-demand when the full test suite is run.

Some tools that dip into JDK internals regularly break on newer versions of the compiler because they rely on unstable APIs. I'm thinking about things like Google Java Format or Error-Prone. There's no need to hold the rest of your project back from enjoying the latest JDK. If those tools are run via a JavaExec task, you can use a toolchain to keep them on an older JDK until a newer version of the tool is available.

What do I do?

Use the --release flag if you're compiling Java! Gradle exposes a property for it now.

Use the -Xjdk-release flag if you're compiling Kotlin. Future versions of the Kotlin Gradle plugin will expose a nice DSL property for it.

If you're targeting Android (with Java, Kotlin, or both) you need only specify the sourceCompatibility (for Java) and jvmTarget (for Kotlin). You don't need the targetCompatibility as it will default to match the sourceCompatibility.

No matter what the Gradle or Android docs tell you, don't use a toolchain! Save toolchains for JVM unit tests or incompatible tools.

https://jakewharton.com/gradle-toolchains-are-rarely-a-good-idea

Kotlin's JDK release compatibility flag

Mar 13, 2024 Updated Mar 13, 2024

Yesterday, our Android app crashed with a weird NoSuchMethodError.

java.lang.NoSuchMethodError: No interface method removeFirst()Ljava/lang/Object; in class Ljava/util/List; or its super classes (declaration of 'java.util.List' appears in /apex/com.android.art/javalib/core-oj.jar)
    at app.cash.redwood.lazylayout.widget.LazyListUpdateProcessor.onEndChanges(SourceFile:165)
    at app.cash.redwood.lazylayout.view.ViewLazyList.onEndChanges(SourceFile:210)
    at app.cash.redwood.protocol.widget.ProtocolBridge.sendChanges(SourceFile:125)
    at app.cash.redwood.treehouse.ViewContentCodeBinding.receiveChangesOnUiDispatcher(SourceFile:419)
    at app.cash.redwood.treehouse.ViewContentCodeBinding$sendChanges$1.invokeSuspend(SourceFile:383)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(SourceFile:33)
    at kotlinx.coroutines.DispatchedTask.run(SourceFile:104)
    at android.os.Handler.handleCallback(Handler.java:938)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loop(Looper.java:250)
    at android.app.ActivityThread.main(ActivityThread.java:7868)

The offending code is written in Kotlin, and looks like this:

val widget = edit.widgets.removeFirst()

The IDE showing an italicized blue style for removeFirst means it's a Kotlin extension function which compiles down to a static helper in the bytecode. However, the exception clearly indicates we are calling a member function on List directly. What gives?

In JDK 21, as part of the sequenced collection effort, the List interface added removeFirst() and removeLast() methods. According to the Kotlin docs on extension functions:

If a class has a member function, and an extension function is defined which has the same receiver type, the same name, and is applicable to given arguments, the member always wins.

When we bumped our build JDK to 21, the new member became available and accidentally took precedence. Oops!
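You can confirm which side of that line a given JDK falls on with a quick reflection probe. This Java sketch checks whether `java.util.List` exposes a public no-argument method of a given name on the running JDK. The `ListMemberProbe` class is a hypothetical name, and it inspects the JDK running the probe, not Android's core library.

```java
import java.util.List;

final class ListMemberProbe {
  /** True when java.util.List exposes a public no-arg method with this name on the running JDK. */
  static boolean listHasMethod(String name) {
    try {
      List.class.getMethod(name);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("JDK feature release: " + Runtime.version().feature());
    System.out.println("List.removeFirst() is a member: " + listHasMethod("removeFirst"));
  }
}
```

On a JDK where the probe reports true, a Kotlin call to `removeFirst()` on a List can bind to the member unless the compiler is told which JDK API set to target.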

But wait, we set our Kotlin jvmTarget to 1.8 in order to be backwards compatible. Is that not enough?

val javaVersion = JavaVersion.VERSION_1_8
tasks.withType(KotlinJvmCompile::class.java).configureEach {
  it.kotlinOptions.jvmTarget = javaVersion.toString()
}
// Kotlin requires that the Java compatibility matches despite having no Java sources.
tasks.withType(JavaCompile::class.java).configureEach {
  it.sourceCompatibility = javaVersion.toString()
  it.targetCompatibility = javaVersion.toString()
}

This setting controls the Java bytecode version that the Kotlin compiler emits for JVM and Android targets. We can confirm this is being honored by inspecting the offending class with javap.

$ javap -v redwood-lazylayout-widget/build/classes/kotlin/jvm/main/app/cash/redwood/lazylayout/widget/LazyListUpdateProcessor.class | head -8
Classfile redwood-lazylayout-widget/build/classes/kotlin/jvm/main/app/cash/redwood/lazylayout/widget/LazyListUpdateProcessor.class
  Last modified Mar 13, 2024; size 16001 bytes
  SHA-256 checksum dbeed7bba16c023a98fa356bab7cada7abe686d5da7d4824781790de577e94a2
  Compiled from "LazyListUpdateProcessor.kt"
public abstract class app.cash.redwood.lazylayout.widget.LazyListUpdateProcessor<V extends java.lang.Object, W extends java.lang.Object> extends java.lang.Object
  minor version: 0
  major version: 52
  flags: (0x0421) ACC_PUBLIC, ACC_SUPER, ACC_ABSTRACT

The classfile's major version is listed as 52, which we can reverse lookup using a version table and see that this corresponds to Java 8. So we know that's working, at least.
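If you would rather not consult a table, the mapping is simple arithmetic: for modern releases the Java version is the major version minus 44. Here is a minimal Java sketch that reads the major version out of a class file header and converts it. The `ClassFileVersion` class and its method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class ClassFileVersion {
  /** Reads the major version from the first 8 bytes of a class file. */
  static int majorVersion(byte[] classBytes) {
    ByteBuffer buffer = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
    if (buffer.remaining() < 8 || buffer.getInt() != 0xCAFEBABE) {
      throw new IllegalArgumentException("Not a class file");
    }
    buffer.getShort(); // skip the minor version
    return buffer.getShort() & 0xFFFF;
  }

  /** Maps a class file major version to its Java release, e.g. 52 to 8. */
  static int javaRelease(int majorVersion) {
    return majorVersion - 44;
  }

  public static void main(String[] args) {
    // Magic number, minor version 0, major version 52.
    byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
    int major = majorVersion(header);
    System.out.println("major " + major + " = Java " + javaRelease(major));
    // prints major 52 = Java 8
  }
}
```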

Further down the output, however, the offending reference can also be seen.

405: checkcast     #101        // class app/cash/redwood/lazylayout/widget/LazyListUpdateProcessor$Edit$Insert
408: invokevirtual #107        // Method app/cash/redwood/lazylayout/widget/LazyListUpdateProcessor$Edit$Insert.getWidgets:()Ljava/util/List;
411: invokeinterface #151,  1  // InterfaceMethod java/util/List.removeFirst:()Ljava/lang/Object;
416: checkcast     #121        // class app/cash/redwood/widget/Widget
419: astore        6

The reason this can happen is that the Java bytecode version is independent from the set of JDK APIs that you can reference. This is not unique to Kotlin. javac's -target flag behaves the same way, as you can see in this Godbolt sample.

This can be fixed with javac by specifying the -bootclasspath argument and pointing at the rt.jar from a JDK 8 install. The JDK 21 compiler emits a warning telling us to do this when targeting any bytecode version other than the default:

warning: [options] bootstrap class path not set in conjunction with -source 8

Starting with Java 9, javac has a new flag, --release, which sets the -source, -target, and -bootclasspath flags automatically to the same version (and doesn't require having the old JDK available). If we switch the Java sample to use --release it now fails to compile!
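The difference between the two flag styles can also be reproduced locally with the in-process compiler API. This Java sketch is not the Godbolt sample. It compiles a tiny hypothetical source file that calls `List.of` (added in Java 9), once with `-source 8 -target 8` and once with `--release 8`. The `ReleaseFlagDemo` and `UsesNewApi` names are assumptions, and it needs to run on a JDK 9 or newer that still accepts release 8.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class ReleaseFlagDemo {
  // Hypothetical source: List.of was added in Java 9, so it is not part of the Java 8 API.
  static final String SOURCE =
      "import java.util.List;\n"
          + "class UsesNewApi {\n"
          + "  Object names() { return List.of(\"a\", \"b\"); }\n"
          + "}\n";

  /** Compiles SOURCE in-process with the given javac flags and reports whether it succeeded. */
  static boolean compiles(String... flags) {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    if (compiler == null) {
      throw new IllegalStateException("No system compiler; run on a JDK, not a JRE");
    }
    JavaFileObject file =
        new SimpleJavaFileObject(URI.create("string:///UsesNewApi.java"), JavaFileObject.Kind.SOURCE) {
          @Override
          public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return SOURCE;
          }
        };
    try {
      Path output = Files.createTempDirectory("release-flag-demo");
      List<String> options = new ArrayList<>(Arrays.asList(flags));
      options.add("-d");
      options.add(output.toString());
      // Diagnostics go to the StringWriter so warnings and errors don't clutter the console.
      return compiler
          .getTask(new StringWriter(), null, null, options, null, List.of(file))
          .call();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Bytecode version is limited, but the build JDK's APIs are all still visible.
    System.out.println("-source 8 -target 8: " + compiles("-source", "8", "-target", "8"));
    // The API surface is limited to Java 8 as well, so List.of cannot be resolved.
    System.out.println("--release 8: " + compiles("--release", "8"));
  }
}
```

The first compilation should succeed and the second should fail, because only `--release` restricts the visible API to that of Java 8.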

Kotlin 1.7 brought a new flag to kotlinc (Kotlin's JVM compiler) which acts just like javac's --release: -Xjdk-release. As far as I can tell, this has flown massively under the radar but is an essential piece to the cross-compilation toolkit.

Let's configure our JVM target's compilation to use this flag and see what changes.

kotlin.targets.withType(KotlinJvmTarget::class.java) { target ->
  target.compilations.configureEach {
    it.kotlinOptions.freeCompilerArgs += listOf(
      "-Xjdk-release=$javaVersion",
    )
  }
}

After compiling and dumping the Java bytecode there is a welcome change.

 405: checkcast     #101        // class app/cash/redwood/lazylayout/widget/LazyListUpdateProcessor$Edit$Insert
 408: invokevirtual #107        // Method app/cash/redwood/lazylayout/widget/LazyListUpdateProcessor$Edit$Insert.getWidgets:()Ljava/util/List;
-411: invokeinterface #151,  1  // InterfaceMethod java/util/List.removeFirst:()Ljava/lang/Object;
+411: invokestatic  #152        // Method kotlin/collections/CollectionsKt.removeFirst:(Ljava/util/List;)Ljava/lang/Object;
 414: checkcast     #121        // class app/cash/redwood/widget/Widget
 417: astore        6

With the JDK API unavailable, the removeFirst extension now resolves to the static method in the Kotlin standard library.

The -Xjdk-release flag is useful for the Kotlin JVM plugin or the JVM targets of the Kotlin multiplatform plugin to ensure compatibility with your target minimum JVM. Users of the Kotlin Android plugin or the Android targets of the Kotlin multiplatform plugin do not need to do this, as the use of the android.jar as the boot classpath limits the java.* APIs to those of your compileSdk (and Android Lint ensures you don't use anything newer than your minSdk).

Unfortunately there's no Gradle DSL for this yet, but KT-49746 tracks that.

If you use Gradle toolchains you don't have this problem. This is because you actually use the ancient JDK and JVM of your minimum target to run javac and kotlinc and miss out on a decade's worth of compiler improvements. Gradle toolchains are rarely a good idea. But that's a topic for next week…

https://jakewharton.com/kotlins-jdk-release-compatibility-flag

Perils of duplicate finding

Feb 14, 2024 Updated Feb 14, 2024

Given an array of integers ([1, 2, 3, 1, 3, 1]), find the elements which are duplicated. No, we're not interviewing. I'm trying to prevent a user from specifying a reserved value twice.

Elsewhere in the file I already have duplicate detection for object tags.

val dupes: Map<Int, List<Widget>> =
    widgets.groupBy(Widget::tag)
      .filterValues { it.size > 1 }

I can do the same technique for the integer array with an identity function and grabbing the resulting keys.

val dupes: Set<Int> =
    ints.groupBy { it }
      .filterValues { it.size > 1 }
      .keys

This prints [1, 3].

So… done? Yes! But no, using the map seems wasteful, right?

Attempt 1

My first attempt to avoid the map was to remove the set of integers from a list of them. This should result in a list of any duplicated elements.

val dupes: List<Int> =
    ints.toList() - ints.toSet()

No matter the content of ints, this will always print []. Why?

The minus operator says that it "returns a list containing all elements of the original collection except the elements contained in the given elements collection". So it removes all occurrences of each element in the set from the list.

This is some surprising behavior to hide behind an operator whose signature operates on an Iterable receiver and Collection argument.

Attempt 2

Second attempt switches to MutableList.removeAll which takes a collection of elements. The MutableList.remove function only removes the first occurrence of an element, so this should remove the first occurrence of each element in the set.

val dupes: List<Int> =
    ints.toMutableList()
      .apply { removeAll(ints.toSet()) }

This once again prints []. But why?

Kotlin made me a liar. MutableList.remove does indeed only remove the first occurrence of the element. MutableList.removeAll, however, removes all occurrences of each element in the supplied collection. That's quite the subtle asymmetry.

There is no function for removing all occurrences of a single element. Nor a function to remove only the first occurrences of each element of a supplied collection.

You needn't be mad at Kotlin, though. It inherited this behavior from Java.
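Since the behavior comes from Java, here is the same asymmetry sketched against java.util.List. The RemoveAsymmetry class and its helpers are illustrative names of mine. Note the Integer.valueOf: with a bare int, Java would pick the remove(int index) overload instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveAsymmetry {
  /** Copies the list and removes only the first occurrence of value. */
  static List<Integer> afterRemove(List<Integer> ints, int value) {
    List<Integer> copy = new ArrayList<>(ints);
    copy.remove(Integer.valueOf(value)); // remove(Object), not remove(int index)
    return copy;
  }

  /** Copies the list and removes every occurrence of each element in values. */
  static List<Integer> afterRemoveAll(List<Integer> ints, List<Integer> values) {
    List<Integer> copy = new ArrayList<>(ints);
    copy.removeAll(values);
    return copy;
  }

  public static void main(String[] args) {
    List<Integer> ints = Arrays.asList(1, 2, 3, 1, 3, 1);
    System.out.println(afterRemove(ints, 1)); // [2, 3, 1, 3, 1]
    System.out.println(afterRemoveAll(ints, Arrays.asList(1))); // [2, 3, 3]
  }
}
```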

Attempt 3

Third attempt now with MutableList.remove.

val dupes: List<Int> =
    ints.toMutableList()
      .apply { ints.toSet().forEach(::remove) }

This (finally) prints [1, 3, 1]. If we want just the set of duplicates to match the map-based approach above we can tag on a toSet().

val dupes: Set<Int> =
    ints.toMutableList()
      .apply { ints.toSet().forEach(::remove) }
      .toSet()

Visually this is not the greatest. It's also not really that efficient (not that we've been worrying about that yet). We got here because I started with a clever-but-incorrect approach (toList() - toSet()) that I then had to refactor until it was correct.

Attempt 4

Fourth attempt is a chance to reset our approach. I thought that we could partition the elements based on whether we've seen the value before. A set tracks the values, and its MutableSet.add returns a boolean indicating whether the collection was mutated (i.e., true when the value has not been seen before).

val dupes: Set<Int> =
    HashSet<Int>()
      .run { ints.partition(::add) }
      .second
      .toSet()

This prints [1, 3] correctly. Visually the code is just dreadful. It's hard to quickly discern what value is flowing from line to line.

Using partition was just my first intuition. But a partition that throws away half the result has another name: a filter!

Attempt 5

Fifth attempt at this now using a filter.

val dupes: Set<Int> =
    ints.filterNot(HashSet<Int>()::add)
      .toSet()

This continues to print [1, 3] correctly. We use filterNot because we want to keep elements where MutableSet.add returns false. Visually this is pretty decent.

The use of HashSet<Int>()::add is what's known as a bound reference. We are specifying a function reference of MutableSet::add as our filterNot lambda, but bound to an instance of HashSet which we are creating on-the-fly. This is an equivalent version of the above code.

val seen = HashSet<Int>()
val dupes: Set<Int> =
    ints.filterNot(seen::add)
      .toSet()

The advantage of inlining the HashSet instantiation is that we don't need to name it.[1]
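For comparison, the same trick works in Java, where Set.add also returns false when the element is already present. This is a sketch with names of my own choosing (Duplicates.find), and it assumes a sequential stream because the filter mutates the seen set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class Duplicates {
  /** Returns the values which occur more than once, in order of first repeat. */
  static Set<Integer> find(List<Integer> ints) {
    Set<Integer> seen = new HashSet<>();
    return ints.stream()
        .filter(value -> !seen.add(value)) // add returns false for a repeat
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    System.out.println(find(Arrays.asList(1, 2, 3, 1, 3, 1))); // [1, 3]
  }
}
```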

Attempt 5.1

Finally, almost all of Kotlin's collection extensions have To-suffixed variants which allow supplying a destination collection. This can save you from having to add a toSomething() after an operation by instead just using that Something in the operation directly.

val dupes: Set<Int> =
    ints.filterNotTo(HashSet(), HashSet<Int>()::add)

Pretty, pretty, pretty good.

Benchmarks

Performance is not really a concern in my usage, but let's look anyway.

Benchmark                                       Score      Error   Units
----------------------------------------------  ---------  ------  -----
IntDupes.map                                     94.015 ±   0.469  ns/op
IntDupes.map:·gc.alloc.rate.norm                776.000 ±   0.001   B/op

IntDupes.mutableListRemove                      155.744 ±  17.829  ns/op
IntDupes.mutableListRemove:·gc.alloc.rate.norm  560.000 ±   0.001   B/op

IntDupes.partition                              135.693 ±  18.976  ns/op
IntDupes.partition:·gc.alloc.rate.norm          544.000 ±   0.001   B/op

IntDupes.filterNot                               97.748 ±   1.055  ns/op
IntDupes.filterNot:·gc.alloc.rate.norm          504.000 ±   0.001   B/op

IntDupes.filterNotTo                             39.904 ±   0.331  ns/op
IntDupes.filterNotTo:·gc.alloc.rate.norm        432.000 ±   0.001   B/op

So the filterNotTo winds up being the fastest and allocates the fewest bytes. Double win!

[1] As I'm writing this I'm realizing the partition above could have been ints.partition(HashSet<Int>()::add).second.toSet(). This produces the same bytecode, but from more compact Kotlin.

https://jakewharton.com/perils-of-duplicate-finding

Intermediate collection avoidance

Feb 7, 2024 Updated Feb 7, 2024

Given a list of users, extract their names and join them into a comma-separated list. Kotlin's extension functions on collections make this trivial.

users.map { it.name }.joinToString()

Writing this in IntelliJ IDEA produces a "weak warning" offering advice.

Call chain on collection type may be simplified

An intention action will refactor the code for you to a more efficient form.

users.joinToString() { it.name }

Mapping the user to their name now occurs during construction of the joined string rather than as a discrete operation. The additional iterator and intermediate collection produced by the map is eliminated.

This code is both shorter and faster, and the IDE helps you discover this superior form.

Two similar fused operations that I like but which don't benefit from IDE advice are array and pre-sized list initialization with a lambda.

If we wanted to create an array of our users' names, instead of doing

users.map { it.name }.toTypedArray()

we can use

Array(users.size) { users[it].name }

This again trades the intermediate iterator and collection within map for an indexed loop. Primitive array versions are also available.

IntArray(users.size) { users[it].age }

Arrays are not used too often. Mostly for memory-sensitive or performance-sensitive code, or when calling out to a Java API. Thankfully this lambda-accepting initializer is also available for pre-sized lists.

MutableList(users.size) { users[it].name }

Use this to initialize element default values, compute elements based on the index, or derive data from another source.

In the case of deriving data, the source needs to support random access in order to actually result in a more efficient computation.[1] If you use a list backed by an alternate structure (linked, persistent, etc.) performance will be abysmal. This technique works best for internal library usage and should not be used when you don't control the original list.
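If you don't control the list, one possible guard is to check for random access before indexing. The sketch below is in Java, where the java.util.RandomAccess marker interface exists for this purpose and Arrays.setAll plays the role of the lambda-accepting array initializer. IndexedInit and toNames are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.RandomAccess;
import java.util.function.Function;

final class IndexedInit {
  /** Builds an array of names without an intermediate list. */
  static <T> String[] toNames(List<T> items, Function<? super T, String> name) {
    String[] names = new String[items.size()];
    if (items instanceof RandomAccess) {
      // get(i) is cheap here, so fill each slot by index.
      Arrays.setAll(names, i -> name.apply(items.get(i)));
    } else {
      // get(i) may be O(n) (a linked list, say), so walk the list once instead.
      Iterator<T> iterator = items.iterator();
      for (int i = 0; i < names.length; i++) {
        names[i] = name.apply(iterator.next());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<String> users = Arrays.asList("alice", "bob");
    System.out.println(Arrays.toString(toNames(users, String::toUpperCase))); // [ALICE, BOB]
  }
}
```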

Benchmark                                       Score     Error   Units
--------------------------------------------- ---------- -------- -----
NamesJoinToString.map                         126.582 ±  38.237   ns/op
NamesJoinToString.map:·gc.alloc.rate.norm     232.000 ±   0.001    B/op
NamesJoinToString.lambda                       73.586 ±   1.960   ns/op
NamesJoinToString.lambda:·gc.alloc.rate.norm  168.000 ±   0.001    B/op

NamesToTypedArray.map                          78.444 ±  22.427   ns/op
NamesToTypedArray.map:·gc.alloc.rate.norm     120.000 ±   0.001    B/op
NamesToTypedArray.lambda                       10.326 ±   0.129   ns/op
NamesToTypedArray.lambda:·gc.alloc.rate.norm   40.000 ±   0.001    B/op

As you can see in the benchmarks above, the lambda initialization variants are both faster due to the use of indexed loops and allocate fewer bytes with no iterator or intermediate collection. We could hand-write such loops, but Kotlin's zero-overhead functions keep our code short and sweet.

[1] You might be aware of Compose UI's horribly-named "fast" collection functions which also use this strategy.

https://jakewharton.com/intermediate-collection-avoidance

A stable, multiplatform Molecule 1.0

Jul 19, 2023 Updated Jul 19, 2023

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/molecule-1-0.

https://jakewharton.com/molecule-1-0

Native UI and multiplatform Compose with Redwood

Jul 5, 2023 Updated Jul 5, 2023

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/native-ui-and-multiplatform-compose-with-redwood.

https://jakewharton.com/native-ui-and-multiplatform-compose-with-redwood

Flow testing with Turbine

Jun 21, 2023 Updated Jun 21, 2023

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/flow-testing-with-turbine.

https://jakewharton.com/flow-testing-with-turbine

Using jlink to cross-compile minimal JREs

Jan 16, 2023 Updated Jan 16, 2023

jlink is a JDK tool to create bespoke, minimal JREs for your applications. Let's try it with a "Hello, world!" program:

class Main {
  public static void main(String... args) {
    System.out.println("Hello, world!");
  }
}

My laptop is an M1 Mac and I have downloaded the Azul Zulu JDK 19 build for it. With the JDK I can both compile Java and then run the resulting program.

$ mkdir out
$ zulu19.30.11-ca-jdk19.0.1-macosx_aarch64/bin/javac -d out in/Main.java
$ zulu19.30.11-ca-jdk19.0.1-macosx_aarch64/bin/java -cp out Main
Hello, world!

Azul Zulu also provides a JRE that I can use to run compiled programs.

$ zulu19.30.11-ca-jre19.0.1-macosx_aarch64/bin/java -cp out Main
Hello, world!

Note the slight change in folder name ("jdk" → "jre").

If we were shipping a runtime to end users, choosing the JRE over the JDK would be an easy win for binary size.

$ du -hs zulu*
329M    zulu19.30.11-ca-jdk19.0.1-macosx_aarch64
136M    zulu19.30.11-ca-jre19.0.1-macosx_aarch64

But 136MiB just for "Hello, world"? Don't tell Reddit or Hacker News!

Thankfully, jlink is here to help us build a minimal JRE with only what we need. Given our program, a sibling tool, jdeps, lists the Java modules which are required.

$ zulu19.30.11-ca-jdk19.0.1-macosx_aarch64/bin/jdeps \
      --print-module-deps \
      out/Main.class
java.base

Our program is so simple that it only needs the "base" module. Now with jlink we can produce a minimal JRE.

$ zulu19.30.11-ca-jdk19.0.1-macosx_aarch64/bin/jlink \
      --compress 2 \
      --strip-debug \
      --no-header-files \
      --no-man-pages \
      --output zulu-hello-jre \
      --add-modules java.base

$ du -hs zulu*
 28M    zulu-hello-jre
329M    zulu19.30.11-ca-jdk19.0.1-macosx_aarch64
136M    zulu19.30.11-ca-jre19.0.1-macosx_aarch64

28MiB won't win any language wars, but it's a massive 80% savings over the full JRE.

$ zulu-hello-jre/bin/java -cp out Main
Hello, world!
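To double-check what ended up in a runtime, a tiny program can print the modules in its boot layer (running java --list-modules against the image should give similar information). This is a sketch with a made-up class name, RuntimeModules, and it needs Java 9 or newer. On the minimal runtime built above I would expect it to list only java.base.

```java
import java.util.List;
import java.util.stream.Collectors;

final class RuntimeModules {
  /** Returns the sorted names of the modules in the boot layer of the running JVM. */
  static List<String> names() {
    return ModuleLayer.boot().modules().stream()
        .map(Module::getName)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    names().forEach(System.out::println);
  }
}
```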

We can ship it to our client and call it a day, right?

$ tar -czf hello.tgz zulu-hello-jre out

$ scp hello.tgz jw@server:
hello.tgz            100%   14MB   2.0MB/s   00:07

$ ssh jw@server "tar xzf hello.tgz && zulu-hello-jre/bin/java -cp out Main"
bash: zulu-hello-jre/bin/java: cannot execute binary file: Exec format error

Nope! While the Java bytecode we compiled is platform independent, the JRE is specific to each platform and my server runs Linux x64.

Thankfully, jlink can operate on JDKs for different platforms. Let's download the Linux x64 JDK and point jlink at its Java modules using --module-path.

$ zulu19.30.11-ca-jdk19.0.1-macosx_aarch64/bin/jlink \
      --compress 2 \
      --strip-debug \
      --no-header-files \
      --no-man-pages \
      --output zulu-hello-jre-linux-x64 \
      --module-path zulu19.30.11-ca-jdk19.0.1-linux_x64/jmods \
      --add-modules java.base

$ du -hs zulu*
 28M    zulu-hello-jre
 36M    zulu-hello-jre-linux-x64
338M    zulu19.30.11-ca-jdk19.0.1-linux_x64
329M    zulu19.30.11-ca-jdk19.0.1-macosx_aarch64
136M    zulu19.30.11-ca-jre19.0.1-macosx_aarch64

The Linux x64 JRE is a little larger than the one for my ARM Mac, but it's still small compared to the full-size JRE. Does it work on the client?

$ tar -czf hello-linux.tgz zulu-hello-jre-linux-x64 out

$ scp hello-linux.tgz jw@server:
hello-linux.tgz      100%   16MB   2.1MB/s   00:08

$ ssh jw@server "tar xzf hello-linux.tgz && zulu-hello-jre-linux-x64/bin/java -cp out Main"
Hello, world!

It works! Now we can grab JDKs for any architecture for any platform and use our host jlink to effectively cross-compile minimal JREs for each target.

This is a great solution for multi-architecture Docker containers, desktop clients like JetBrains Compose UI, shipping to devices where you can't fit a full JDK, and more. Be sure to explore all the options on jdeps and jlink for ways to keep your runtimes small.

https://jakewharton.com/using-jlink-to-cross-compile-minimal-jres

Report card: Java 19 and the end of Kotlin

Sep 20, 2022 Updated Sep 20, 2022

Three years ago I gave the talk "What's new in Java 19: The end of Kotlin?" which forecasted what a future Java language would look like in September 2022 when Java 19 was released. Check your calendars, folks. It's September 2022 right now and Java 19 was released today!

As expected my predictions were not perfect, but I'm pretty happy with the results. Let's check in with each feature and see how my predictions fared report-card style[1].

Local methods

This feature allows for methods to be declared inside of other methods making them effectively private to that method.

public static boolean anyMatch(Graph graph, Predicate<Node> predicate) {
  var seen = new HashSet<Node>();

  boolean hasMatch(Node node) {
    if (!seen.add(node)) return false; // already seen
    if (predicate.test(node)) return true; // match!
    return node.getNodes().stream().anyMatch(n -> hasMatch(n));
  }

  return hasMatch(graph.getRoot());
}

Grade: F 🔴

Working support for local methods was added to a branch in Project Amber in October 2019. It seemed like a slam dunk, but a JEP for the feature was never created. The branch still sits in the Project Amber repo unchanged in three years.

If I had to guess, all eyes in Amber are focused on pattern matching and its related features. Hopefully someday local methods can be picked back up as a proposed feature.

Text blocks

A multiline string literal for when one line just isn't enough.

System.out.println("""
  SELECT *
  FROM users
  WHERE name LIKE 'Jake %'
""");

Grade: A 🟢

Delivered in Java 15.

Records

A read-only type that exists solely for carrying data with strong, semantic names.

record Person(String name, int age) { }

Grade: A 🟢

Delivered in Java 16.

Sealed hierarchies

Define the list of permitted subtypes of your class or interface and prevent any others.

sealed interface Developer { }
record Person(String name, int age) implements Developer { }
record Business(String name) implements Developer { }

Grade: A 🟢

Delivered in Java 17.

Type patterns

Declare a new name to bind when a type test succeeds.

Object o = 1;
if (o instanceof Integer i) {
  System.out.println(i + 1);
}

Grade: A 🟢

Delivered in Java 16 for instanceof. Third preview in Java 19 for use in a switch.

Record patterns

Bind the component parts of a record type to local names.

Developer alice = new Person("Alice", 12);
switch (alice) {
  case Person(var name, var age) -> // ...
}

Grade: C 🟠

First preview in Java 19.

Just now starting in preview and does not include things like the use of an underscore (_) as a wildcard or syntax for destructors.

Virtual threads

All of your blocking calls with none of the blocking.

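try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
  for (int i = 10; i > 0; i--) {
    int count = i; // effectively final copy which the lambda can capture
    executor.submit(() -> {
      Thread.sleep(100 * count);
      System.out.println(count);
      return null; // returning a value makes this a Callable, which may throw
    });
  }
}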

Grade: B 🟡

First preview in Java 19.

Nice to see it just make the cut, although expect a few previews to be needed before it's stable.

All things considered I think this is a passing report card (despite failing one elective).

The next three years of Java will hopefully see the completion of the items above as well as see larger efforts like Project Panama and Project Valhalla start to come to fruition. It's a great time to be a Java developer.

To the surprise of no one, Kotlin did not end. It continued to evolve in the last three years with language features such as context receivers, sealed interfaces, and exhaustive-by-default. It's also a great time to be a Kotlin developer.

But in the end I think we can all agree on one thing: there's no such thing as OpenJDK LTS and the best long-term version of the JDK is always the latest one. Welcome to my hill. Update to Java 19 today!

[1] A is perfect, F is fail, and there is no E.

https://jakewharton.com/report-card-java-19-and-the-end-of-kotlin

Build on latest Java, test through lowest Java

May 17, 2022 Updated May 17, 2022

In the past, when a new version of Java was released, I would add that version to our open source project's CI builds.

 strategy:
   matrix:
     java-version:
       - 8
       - 9
         ⋮
       - 17
+      - 18

This ensures that each project can be built and its tests pass on every major version.

But this makes no sense! No user is building these projects on different versions. No user is building these projects at all. Consumers are using the pre-built .jar which we ship to Maven Central built on a single version.

Testing on every version, however, is something extremely valuable. Thankfully, Gradle toolchains let us retain this while still only building once.

First, CI only has to build on a single version. We choose the latest because Java has excellent cross-compilation capabilities, and we want to be using the latest tools.

 - uses: actions/setup-java@v2
   with:
     distribution: 'zulu'
-    java-version: ${{ matrix.java-version }}
+    java-version: 18

Second, unchanged from before, we still target whichever Java version is the lowest supported through either the --release flag or sourceCompatibility/targetCompatibility per the Gradle docs.

And finally, we set up tests to run on every supported version.

// Normal test task runs on compile JDK.
(8..17).each { majorVersion ->
  def jdkTest = tasks.register("testJdk$majorVersion", Test) {
    javaLauncher = javaToolchains.launcherFor {
      languageVersion = JavaLanguageVersion.of(majorVersion)
    }

    description = "Runs the test suite on JDK $majorVersion"
    group = LifecycleBasePlugin.VERIFICATION_GROUP

    // Copy inputs from normal Test task.
    def testTask = tasks.getByName("test")
    classpath = testTask.classpath
    testClassesDirs = testTask.testClassesDirs
  }
  tasks.named("check").configure { dependsOn(jdkTest) }
}

This setup reduces CI burden since we only compile the main and test sources once but execute the tests on every supported version from latest to lowest.

Verification tasks
------------------
check - Runs all checks.
test - Runs the test suite.
testJdk10 - Runs the test suite on JDK 10
testJdk11 - Runs the test suite on JDK 11
testJdk12 - Runs the test suite on JDK 12
testJdk13 - Runs the test suite on JDK 13
testJdk14 - Runs the test suite on JDK 14
testJdk15 - Runs the test suite on JDK 15
testJdk16 - Runs the test suite on JDK 16
testJdk17 - Runs the test suite on JDK 17
testJdk8 - Runs the test suite on JDK 8
testJdk9 - Runs the test suite on JDK 9

For projects using multi-release jars, this compilation and testing setup is essential since the source sets require compiling with newer versions but testing through a lower version bound.

So if adding Java versions to a CI matrix is something you've been doing, consider switching to compile with a single Java version and varying your test execution instead. And if you only build and test on a single version today, adding this can ensure correctness on all versions that you support.

Not every project needs to test on multiple versions. If your code is mostly algorithmic you won't gain much from doing this. But if you vary behavior based on Java version, conditionally leverage APIs on newer versions, or interact with non-public APIs then this is a best practice.
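As an example of version-dependent behavior, libraries which still support Java 8 often branch on the java.specification.version system property, which reads "1.8" on Java 8 but "9", "11", "17", and so on afterwards. The sketch below uses hypothetical names (JavaVersion, parseFeature, current). It is the kind of code that only gets fully exercised when the tests run on each supported JDK.

```java
final class JavaVersion {
  /** Parses a java.specification.version value, e.g. "1.8" to 8 or "17" to 17. */
  static int parseFeature(String specVersion) {
    String value = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
    return Integer.parseInt(value);
  }

  /** Returns the feature version of the JVM running this code. */
  static int current() {
    return parseFeature(System.getProperty("java.specification.version"));
  }

  public static void main(String[] args) {
    if (current() >= 9) {
      System.out.println("Running on Java " + current() + ": newer APIs can be used");
    } else {
      System.out.println("Running on Java 8: falling back to older APIs");
    }
  }
}
```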

P.S. Are you an Android developer? You probably keep your compileSdk high, your minSdk low(-ish), and execute instrumentation tests on a few versions between those two. Great news, you're already following this advice as it's always been the norm!

https://jakewharton.com/build-on-latest-java-test-through-lowest-java

Slope-intercept library design

Apr 5, 2022 Updated Apr 5, 2022

The equation y=mx+b defines a line in slope-intercept form. The line will intercept the y-axis at the value b, and for each unit increase in x the value of y changes by the slope m (the amount the line goes up or down).

Slope-intercept gives me a way to think about the design of libraries in relation to each other. The intercept is the initial cost of learning and setup for a library, and the slope is how the library's complexity changes over time. There are no real units here and the values are entirely subjective. Let's try it!

Picasso

Exactly 10 years ago today I introduced Picasso internally at Square. As an image loading library for Android, its primary selling point was a low intercept. It required no real configuration and only one line of code (even in a ListView adapter).

Picasso.with(context).load("https://...").into(imageView);

At the time this was a refreshing change from the existing libraries which required a lot of up-front and per-request configuration.

The downside, however, was that as your needs grow the slope of complexity also grows faster than desired. Configuring the global instance, managing multiple instances, intercepting requests, and transforming images are all possible but more difficult than if the library was designed differently.

Retrofit

Retrofit is a declarative HTTP client abstraction for the JVM and Android. It requires configuration of a central object before you can use it to create instances of service interfaces.

interface GitHubService {
  @GET("users/{user}/repos")
  Call<List<Repo>> listRepos(@Path("user") String user);
}

var retrofit = new Retrofit.Builder()
  .baseUrl("https://api.github.com/")
  .addConverterFactory(MoshiConverterFactory.create())
  .build();

var service = retrofit.create(GitHubService.class);

This up-front configuration gives Retrofit a higher intercept on the y-axis. Exposure to these APIs gives you an entrypoint to discover functionality and encourages you to manage their lifetimes in an efficient way for your usage, which keeps the slope of complexity from being as steep.

Dagger

Dagger is an annotation processor-based dependency injection library for the JVM and Android. It has almost no API of its own aside from a handful of annotations. In order to use Dagger you need to learn dependency injection as a concept, learn how to build the various types to which its annotations apply, and then decide how dependency injection will fit into your architecture. It's just about the least turn-key library I've ever used which gives it an extremely high conceptual intercept.

@Component(modules = {
  AppModule.class
})
interface AppComponent {
  App app();
}

@Module
final class AppModule {
  @Provides static Database provideDatabase() {
    return new Database();
  }
}

final class App {
  private Database database;

  @Inject App(Database database) {
    this.database = database;
  }

  void run() {
    System.out.println(database.getUsers());
  }

  public static void main(String... args) {
    DaggerAppComponent.create().app().run();
  }
}

That's a lot of lines to basically do new App(new Database()).run()! But of course nothing stays that simple.

Once you have Dagger fully integrated into a large application, adding and connecting new dependencies is as easy as adding a parameter. The library automatically figures out how to wire the two together and shares instances across pre-defined lifetimes. Its slope of complexity is extremely shallow.

The slope-intercept evaluations of Picasso, Retrofit, and Dagger look roughly like this:

What are the units? It doesn't matter! This is a subjective approximation of concepts.

Design

Slope-intercept evaluation really shines when designing new libraries. It serves as a framework for discussing the amount of complexity you front-load onto a user and the amount which is spread over the continued usage of a library.

Can some parameter be specified globally or should it be passed with each call? Should it be available in both locations with overriding behavior? Is there an implicit default or should you always explicitly require that it is supplied?

As answers to those questions are being determined, you can start to look at the library as a whole. Can multiple parameters become a composite type? Do certain parameters imply defaults for the others? Are too many concepts being pushed into global configuration rather than local?

And finally, you can compare your design against others to determine if you're comfortable with its approximate slope and intercept. Picasso was built to combat image loading libraries whose complexity was that of Dagger. With the initial design I missed the mark and over-corrected to be too simple. Being closer to Retrofit would have been a much more comfortable place for the long-term health of the library.

Layering

Ideally every library would have an intercept near zero and a slope near zero. That is, a library which is trivial to get started with and whose API can accommodate every use case over time without learning anything new.

In practice this never happens simply due to the nature of complexity. You can't build libraries to solve non-trivial tasks while keeping the API basic and supporting myriad use cases. But what you can do is cheat by providing multiple of these hypothetical slope-intercept lines through layering.

Providing multiple APIs at different levels of abstraction allows solving 80% of use cases with a simple API, then 80% of the remaining 20% with a more detailed API, and then the final slice with a low-level API. Each layer is built on top of the next one with a measured reduction in API complexity.

In an HTTP client, for example, you can expose the declarative API for the majority, an imperative API for the minority, and then low-level protocol handlers for exotic needs. If a layer does not meet your requirements then you can always drop down to the next one for more control but also more responsibility.
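A rough Java sketch of that layering, using invented types (Transport, Call, SimpleClient) and no real HTTP library. Each layer is a thin wrapper over the one beneath it, so dropping down a level never means starting over. The top layer here is a one-line convenience method standing in for a declarative API.

```java
/** Lowest layer: full control over the exchange, and full responsibility. */
interface Transport {
  String exchange(String method, String url);
}

/** Middle layer: an imperative call object built on the transport. */
final class Call {
  private final Transport transport;
  private final String method;
  private final String url;

  Call(Transport transport, String method, String url) {
    this.transport = transport;
    this.method = method;
    this.url = url;
  }

  String execute() {
    return transport.exchange(method, url);
  }
}

/** Top layer: covers the common case in a single line. */
final class SimpleClient {
  private final Transport transport;

  SimpleClient(Transport transport) {
    this.transport = transport;
  }

  /** The simple API for the majority of callers. */
  String get(String url) {
    return newCall("GET", url).execute();
  }

  /** Escape hatch to the middle layer when get() is not enough. */
  Call newCall(String method, String url) {
    return new Call(transport, method, url);
  }
}
```

A fake transport such as (method, url) -> method + " " + url is enough to try all three layers without touching a network.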

And now what you've created is that exact same graph as above, except representing one library and its three layers of APIs.

This is certainly no exact science. But perhaps it will help you build a better library in the future. It's helped me!

https://jakewharton.com/slope-intercept-library-design

The state of managing state (with Compose)

Nov 11, 2021 Updated Nov 11, 2021

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/the-state-of-managing-state-with-compose.

https://jakewharton.com/the-state-of-managing-state-with-compose

Multiplatform Compose and Gradle module metadata abuse

Nov 4, 2021 Updated Nov 4, 2021

My primary work project for the better part of a year (named Redwood) is built on top of Compose[1] and runs on every platform that Kotlin supports. This of course means Android, but we also have Compose running on iOS, the web, the JVM, and all other native targets. It's truly a multiplatform Compose project[2].

Getting Compose to run on all these platforms isn't as hard as you would think. The Compose runtime is written as multiplatform Kotlin code but Google only ships it compiled for Android. JetBrains goes farther by shipping versions compiled for the web and for the JVM. We simply go the whole distance and compile it for every Kotlin target, while also shipping it as a single Kotlin multiplatform artifact.

For a year this worked fine. However, Compose UI recently went stable which meant our Android engineers were eager to start using it in the main app (as opposed to just samples). Upon Compose UI's introduction, D8 failed with a duplicate class error:

> Duplicate class androidx.compose.runtime.AbstractApplier found in
  redwood-compose-runtime (app.cash.redwood:compose-runtime-android:0.1.0-square.15) and
  runtime-1.0.0-runtime (androidx.compose.runtime:runtime:1.0.0)

The androidx.compose.* types are compiled into Redwood's multiplatform Compose runtime artifact. Compose UI depends on the official Compose runtime for Android which also contains these types. Since the two artifacts have different Maven coordinates, Gradle allows both to be included in the app which eventually causes D8 to complain[3].

Redwood was already building Compose from the same git SHAs as Google's release builds. Ideally we could use our own builds for every platform except Android, and then point at Google's artifact solely for Android. This would allow Gradle to see the two projects as sharing a common dependency thereby de-duplicating the Compose runtime classes.

Gradle module metadata

The mechanism by which Kotlin multiplatform artifacts resolve the correct dependency is through Gradle's module metadata format.

"Gradle Module Metadata is a unique format aimed at improving dependency resolution by making it multi-platform and variant-aware."

The module metadata is a JSON document which describes the supported platforms through key/value attributes. For Redwood's Compose runtime the module metadata looks roughly like this:

{
  "component": {
    "group": "app.cash.redwood",
    "module": "compose-runtime",
    "version": "0.1.0-square.15"
  },
  "variants": [
    {
      "name": "releaseApiElements-published",
      "attributes": {
        "org.gradle.usage": "java-api",
        "org.jetbrains.kotlin.platform.type": "androidJvm"
      },
      "available-at": {
        "url": "../../compose-runtime-android/0.1.0-square.15/compose-runtime-android-0.1.0-square.15.module",
        "group": "app.cash.redwood",
        "module": "compose-runtime-android",
        "version": "0.1.0-square.15"
      }
    },
    {
      "name": "iosArm64ApiElements-published",
      "attributes": {
        "artifactType": "org.jetbrains.kotlin.klib",
        "org.gradle.usage": "kotlin-api",
        "org.jetbrains.kotlin.native.target": "ios_arm64",
        "org.jetbrains.kotlin.platform.type": "native"
      },
      "available-at": {
        "url": "../../compose-runtime-iosarm64/0.1.0-square.15/compose-runtime-iosarm64-0.1.0-square.15.module",
        "group": "app.cash.redwood",
        "module": "compose-runtime-iosarm64",
        "version": "0.1.0-square.15"
      }
    },
    ...
  ]
}

When a 64-bit iOS ARM target consumes the app.cash.redwood:compose-runtime dependency, Gradle will parse this JSON file and actually resolve the app.cash.redwood:compose-runtime-iosarm64 artifact. It behaves somewhat like an HTTP 302 redirect by replacing the user-friendly Maven coordinate with the canonical platform-specific coordinate.

For an Android consumer the artifact redirect resolves to app.cash.redwood:compose-runtime-android which is one of the offending artifact coordinates seen in the duplicate class error from D8. As I mentioned above, what we want is to have this variant redirect to Google's build of the Compose runtime and not our own.

We could try to alter the values in the available-at object to point to Google's artifact, but according to the Gradle module metadata spec the url key must also point to a metadata file which is something Google does not ship.

Thankfully, just below available-at in the spec, the dependencies array affords the ability to point at arbitrary Maven coordinates. This would allow us to define a variant with no available-at but a single dependency item to the associated Google Compose runtime artifact.

 {
   "name": "releaseApiElements-published",
   "attributes": {
     "org.gradle.usage": "java-api",
     "org.jetbrains.kotlin.platform.type": "androidJvm"
   },
-  "available-at": {
-    "url": "../../compose-runtime-android/0.1.0-square.15/compose-runtime-android-0.1.0-square.15.module",
-    "group": "app.cash.redwood",
-    "module": "compose-runtime-android",
-    "version": "0.1.0-square.15"
-  }
+  "dependencies": [
+    {
+      "group": "androidx.compose.runtime",
+      "module": "runtime",
+      "version": {
+        "prefers": "1.0.4"
+      }
+    }
+  ]
 }

But the module metadata file is entirely generated by Gradle based on project information. How can we modify it to change the output of only a single variant?

Modifying Gradle module metadata

Spoiler alert: You can't. At least not using any stable APIs that Gradle provides[4].

The best (only?) mechanism that I've found is to hook into the module metadata file generation task and perform text-based modification of the JSON immediately after it is generated.

First, we define a text file which contains the expected JSON contents to be replaced[5].

    {
      "name": "releaseApiElements-published",
      "attributes": {
        "org.gradle.usage": "java-api",
        "org.jetbrains.kotlin.platform.type": "androidJvm"
      },
      "available-at": {
        "url": "../../compose-runtime-android/{REDWOOD_VERSION}/compose-runtime-android-{REDWOOD_VERSION}.module",
        "group": "app.cash.redwood",
        "module": "compose-runtime-android",
        "version": "{REDWOOD_VERSION}"
      }
    },
    {
      "name": "releaseRuntimeElements-published",
      "attributes": {
        "org.gradle.usage": "java-runtime",
        "org.jetbrains.kotlin.platform.type": "androidJvm"
      },
      "available-at": {
        "url": "../../compose-runtime-android/{REDWOOD_VERSION}/compose-runtime-android-{REDWOOD_VERSION}.module",
        "group": "app.cash.redwood",
        "module": "compose-runtime-android",
        "version": "{REDWOOD_VERSION}"
      }
    },

Notice how the {REDWOOD_VERSION} placeholder is used to minimize changes to this file over time.

Next, define the replacement JSON in another file.

    {
      "name": "releaseApiElements-published",
      "attributes": {
        "org.gradle.usage": "java-api",
        "org.jetbrains.kotlin.platform.type": "androidJvm"
      },
      "dependencies": [
        {
          "group": "androidx.compose.runtime",
          "module": "runtime",
          "version": {
            "prefers": "{COMPOSE_VERSION}"
          }
        }
      ]
    },
    {
      "name": "releaseRuntimeElements-published",
      "attributes": {
        "org.gradle.usage": "java-runtime",
        "org.jetbrains.kotlin.platform.type": "androidJvm"
      },
      "dependencies": [
        {
          "group": "androidx.compose.runtime",
          "module": "runtime",
          "version": {
            "prefers": "{COMPOSE_VERSION}"
          }
        }
      ]
    },

Once again we use a special string {COMPOSE_VERSION} to minimize the need to change this file as we update to new Compose versions.

Finally, perform this text-based substitution immediately after the file is generated. Here the {REDWOOD_VERSION} and {COMPOSE_VERSION} placeholders are replaced with their real values.

tasks.named("generateMetadataFileForKotlinMultiplatformPublication").configure {
  doLast {
    // Fill in the placeholders so both snippets match the versions used by this build.
    String find = file('module_find.txt').text.replace('{REDWOOD_VERSION}', version)
    String replace = file('module_replace.txt').text.replace('{COMPOSE_VERSION}', versions.compose)

    // The module metadata JSON which Gradle just generated.
    File moduleFile = outputFile.get().getAsFile()
    String text = moduleFile.text

    // Fail loudly if the generated JSON no longer contains the expected variants.
    int start = text.indexOf(find)
    if (start == -1) {
      throw new RuntimeException("Unable to locate module_find.txt contents in module JSON ($moduleFile)")
    }
    int end = start + find.length()

    // Splice the replacement variants in place of the originals.
    moduleFile.text = text.substring(0, start) + replace + text.substring(end)
  }
}

This is some very hacky code, but any unexpected changes to the module metadata format will cause a build failure allowing you to reevaluate the approach. Perhaps in the future Gradle will support this type of transformation with a stable public API.
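
The core of that task is a find-and-replace which refuses to succeed silently. As a minimal sketch, here is the same logic as a plain Java function that can be exercised outside of Gradle. The class name ModuleMetadataPatcher, the method name replaceOrFail, and the heavily shortened JSON strings in main are made up for illustration and are not part of Redwood's build.

```java
final class ModuleMetadataPatcher {
  /**
   * Replaces the first occurrence of {@code find} in {@code text} with {@code replace}.
   * Throws when {@code find} is absent so that a format change cannot go unnoticed.
   */
  static String replaceOrFail(String text, String find, String replace) {
    int start = text.indexOf(find);
    if (start == -1) {
      throw new IllegalStateException("Unable to locate expected contents in module JSON");
    }
    int end = start + find.length();
    return text.substring(0, start) + replace + text.substring(end);
  }

  public static void main(String[] args) {
    // Hypothetical stand-ins for the generated file and the two text files.
    String generated = "{\"variants\":[{\"available-at\":{\"version\":\"0.1.0\"}}]}";
    String find = "\"available-at\":{\"version\":\"{REDWOOD_VERSION}\"}"
        .replace("{REDWOOD_VERSION}", "0.1.0");
    String replace = "\"dependencies\":[{\"version\":{\"prefers\":\"{COMPOSE_VERSION}\"}}]"
        .replace("{COMPOSE_VERSION}", "1.0.4");

    System.out.println(replaceOrFail(generated, find, replace));
  }
}
```

Throwing on a missing match is what turns an unexpected change in the generated JSON into a build failure rather than a quietly unpatched file.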

This simple text substitution solves the original duplicate class problem today. And it does so in a way which does not require the consumer to understand the nuances of how the Compose runtime is built.

Despite solving the issue for Android builds, we still have the duplicate class problem for the other platforms on which multiple Compose-based projects can be used. If you happened to use Redwood on the JVM with JetBrains' Compose for Desktop you would have two copies of the Compose runtime (potentially built from different versions). The same is true for targeting the web and using JetBrains' Compose for Web.

Google really should be shipping the Compose runtime as a proper multiplatform artifact for all Kotlin targets to remedy this situation. Unfortunately their Kotlin multiplatform story is a few years behind the community's need, and this is very unlikely to happen anytime soon. The best we can hope for now is for JetBrains to ship a proper multiplatform artifact of the Compose runtime with the same versioning as Google's and to use this hack to point the Android variant at Google's binary. Then everyone in the multiplatform Compose space could standardize on their artifacts.

Until then, however, we'll continue the imperfect practice of building our own Compose runtime for Redwood and pointing to Google's artifact for Android[6].

[1] Obligatory: I mean Compose and NOT Compose UI! See /a-jetpack-compose-by-any-other-name/
[2] Continuing with the poor naming surrounding Compose, JetBrains has a project called "Compose Multiplatform" which is not fully multiplatform nor fully ports Compose UI to each supported platform. Our project is "just" the Compose runtime (not Compose UI) but running fully multiplatform.
[3] Unlike the JVM, whose classpath is a set of jars which each contain classes where the first wins, Android's classpath is a single set of classes in which duplicates are not supported (because of the dex file format).
[4] As of Gradle 7.2.
[5] Omitted from the earlier example, some variants have both an "api" and "runtime" entry.
[6] We also have to build the Compose Kotlin compiler plugin for native because of how the Kotlin/Native compiler works. Google could ship it, or JetBrains could make the existing plugins work for native.

https://jakewharton.com/multiplatform-compose-and-gradle-module-metadata-abuse

Gradle dependency license validation

Jun 8, 2021 Updated Jun 8, 2021

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/gradle-dependency-license-validation.

https://jakewharton.com/gradle-dependency-license-validation

Case-insensitive filesystems considered harmful (to me)

Jun 4, 2021 Updated Jun 4, 2021

Having been burned by case-insensitive filesystem bugs one too many times, I long ago switched my development folder to a case-sensitive filesystem partition on my otherwise case-insensitive Mac. Unfortunately this can actually work against me as I interact with the computers of coworkers and service providers which use the default. Well, I was burned again, and this is the tale!

I've been working on two projects based on Jetpack Compose[1] which require me to recompile its sources. Despite building them unmodified, I still run its tests against my compiled version to ensure this core functionality of my project behaves as expected. However, both of my projects recently started experiencing test failures on CI, and it was the same, single test failing on both projects.

The first project failed about a month ago when I added a macOS worker in addition to the Linux worker to build a JNI library. Being so focused on the JNI compilation, I figured the Compose failure was a flake or something wrong with my setup. Its failure was:

androidx.compose.runtime.CompositionTests[jvm] > testInsertOnMultipleLevels[jvm] FAILED
    java.lang.NoClassDefFoundError: androidx/compose/runtime/CompositionTests$testInsertOnMultipleLevels$1$item$1 (wrong name: androidx/compose/runtime/CompositionTests$testInsertOnMultipleLevels$1$Item$1)
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
         ⋮
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        at androidx.compose.runtime.CompositionTests$testInsertOnMultipleLevels$1.invokeSuspend$Item(CompositionTests.kt:2055)

Like I said, I didn't look too closely at this output and assumed it was my own fault.

The second project (which is not open source yet) started failing yesterday when I added a Windows worker to publish new targets for its Kotlin multiplatform library. Notably, the project already had a macOS worker, and the PR to add the Windows worker did see both workers succeed. The merge commit, however, failed with an exception on the Windows worker which looked awfully familiar:

androidx.compose.runtime.CompositionTests[jvm] > testInsertOnMultipleLevels[jvm] FAILED
    java.lang.NoClassDefFoundError: androidx/compose/runtime/CompositionTests$testInsertOnMultipleLevels$1$Item$1 (wrong name: androidx/compose/runtime/CompositionTests$testInsertOnMultipleLevels$1$item$1)
        at java.lang.ClassLoader.defineClass1(Native Method)
         ⋮
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at androidx.compose.runtime.CompositionTests$testInsertOnMultipleLevels$1.invokeSuspend$Item(CompositionTests.kt:2055)

"It's the same exception!", my brain thought. But look closely: it is the same kind of failure with the two names swapped. In this case we tried to load CompositionTests$testInsertOnMultipleLevels$1$Item$1 (note the uppercase "I" in Item) but found a class named CompositionTests$testInsertOnMultipleLevels$1$item$1 (note the lowercase "i" in item). This is in contrast to the first exception above where the "item" casing is reversed.

Cracking open CompositionTests we can look at the testInsertOnMultipleLevels method and see the source of this class:

fun testInsertOnMultipleLevels() = compositionTest {
  // …code…

  fun Item(number: Int, numbers: List<Int>) {
    Linear {
      // --> This lambda is the source! <--
      // …code…
    }
  }

  // …code…
}

The anonymous lambda passed to compositionTest becomes $1, the nested Item function becomes $Item, and the lambda passed to Linear becomes another $1 producing the final class name of CompositionTests$testInsertOnMultipleLevels$1$Item$1.

This all seems fine, though. So how could the name of the class for the function change casing from Item to item?

Thankfully, with the investigative powers of Isaac Udy helping, we stumbled upon more code further down the function:

fun testInsertOnMultipleLevels() = compositionTest {
  // …code…

  fun Item(number: Int, numbers: List<Int>) {
    Linear {
      // …code…
    }
  }

  // …code…

  fun MockViewValidator.item(number: Int, numbers: List<Int>) {
    Linear {
      // …code…
    }
  }

  // …code…
}

The class generation in this second nested function follows a similar formula to the first. The anonymous lambda passed to compositionTest once again becomes $1, the nested MockViewValidator.item function becomes $item, and the lambda passed to Linear becomes another $1 producing the final class name of CompositionTests$testInsertOnMultipleLevels$1$item$1.

And there it is. The lambda inside the first function produces a class named CompositionTests$testInsertOnMultipleLevels$1$Item$1 which is written to CompositionTests$testInsertOnMultipleLevels$1$Item$1.class on the filesystem. The lambda inside the second function produces a class named CompositionTests$testInsertOnMultipleLevels$1$item$1 which is written to CompositionTests$testInsertOnMultipleLevels$1$item$1.class on the filesystem. Except on a case-insensitive filesystem, those are the same file!

To be clear, the problematic steps are these:

1. The build system cleans the output directory giving us a blank slate on the filesystem.
2. The Kotlin compiler generates the class CompositionTests$testInsertOnMultipleLevels$1$Item$1.
3. The Kotlin compiler opens the CompositionTests$testInsertOnMultipleLevels$1$Item$1.class file (which does not exist and is created), writes the bytecode for CompositionTests$testInsertOnMultipleLevels$1$Item$1, and closes the file.
4. The Kotlin compiler generates the class CompositionTests$testInsertOnMultipleLevels$1$item$1.
5. The Kotlin compiler opens the CompositionTests$testInsertOnMultipleLevels$1$item$1.class file (but the filesystem sees CompositionTests$testInsertOnMultipleLevels$1$Item$1.class as an existing match and opens it as an existing file), writes the bytecode for CompositionTests$testInsertOnMultipleLevels$1$item$1, and closes the file.
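
The steps above can be sketched with a toy model of a directory which matches names case-insensitively but preserves the casing of whichever name created the file. This is a simplification for illustration only. It assumes names are matched by lowercasing them, whereas real filesystems apply their own folding rules, and the class CaseInsensitiveDirectory and the shortened class names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of a case-insensitive, case-preserving directory. */
final class CaseInsensitiveDirectory {
  // Key: lowercased name used for matching. Value: {name as first created, current contents}.
  private final Map<String, String[]> entries = new LinkedHashMap<>();

  void write(String name, String contents) {
    String key = name.toLowerCase(Locale.ROOT);
    String[] existing = entries.get(key);
    if (existing == null) {
      // No match: a new file is created with the requested name.
      entries.put(key, new String[] {name, contents});
    } else {
      // Match: the existing file keeps its name and its contents are overwritten.
      existing[1] = contents;
    }
  }

  /** Returns each stored file name mapped to its contents, in creation order. */
  Map<String, String> listing() {
    Map<String, String> result = new LinkedHashMap<>();
    for (String[] entry : entries.values()) {
      result.put(entry[0], entry[1]);
    }
    return result;
  }

  public static void main(String[] args) {
    CaseInsensitiveDirectory dir = new CaseInsensitiveDirectory();
    dir.write("Tests$1$Item$1.class", "bytecode of Tests$1$Item$1");
    dir.write("Tests$1$item$1.class", "bytecode of Tests$1$item$1");
    // Only one file remains: named after the first class, holding the second class.
    System.out.println(dir.listing());
  }
}
```

In this model the mismatch between file name and contents is the same shape as the "wrong name" in the NoClassDefFoundError.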

When the project builds on my machine, the non-standard, case-sensitive filesystem sees those as separate files and the failure does not occur. On macOS- and Windows-based CI workers with their filesystem defaults, however, they're seen as the same file and one overwrites the other. This is what leads to the class name of the second appearing in the file name of the first.

The fix here is easy: rename one of the functions to produce different names. And in an ironic twist of timing, JetBrains made the exact fix to Compose just 12 hours ago.

-fun MockViewValidator.item(number: Int, numbers: List<Int>) {
+fun MockViewValidator.validateItem(number: Int, numbers: List<Int>) {
   Linear {
     // …code…
   }
 }

A simple git submodule update and all my problems are now solved.

Or are they?

This is not the first time I have had this problem, and it likely won't be the last. I would like to make the argument that this is a Kotlin compiler bug. Regardless of whether you are targeting a case-insensitive filesystem, the Kotlin compiler could avoid this entire class of problem by further mangling the name of this otherwise unnamed type to avoid case-insensitive collision.

You can trivially reproduce this if you have a case-insensitive filesystem:

class Hey
class hey

$ kotlinc Hey.kt
$ ls Hey*
Hey.class	Hey.kt

And a minimal reproducer for the more cryptic cause in this post would be:

fun complex() = run {
  fun Nested() {
    run { println("Nested") }
  }
  fun String.nested() {
    run { println("String.nested") }
  }
}
fun run(lambda: () -> Unit) = lambda()

$ kotlinc Complex.kt
$ ls Complex*
Complex.kt	ComplexKt$complex$1$Nested$1.class	ComplexKt$complex$1.class	ComplexKt.class

I have filed KT-47123 to advocate that the compiler should automatically prevent this from happening.

Hey Java users, you're not totally immune either!

class Hey {}
class hey {}

$ javac Hey.java
$ ls Hey*
Hey.class	Hey.java

I'm confident that this year will finally be the year of the Linux desktop to solve all these problems with its case-sensitive-by-default filesystems, right? But until then, having tools which are smarter about filesystem interaction in a world where both case-sensitive and case-insensitive variants exist would go a long way to reducing developer headaches like this.
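
As one hedged sketch of what "smarter" could mean, a build step might scan the names of generated class files and report any which differ only by case. The names CaseCollisions and find are hypothetical, and lowercasing with Locale.ROOT is only an approximation of how a real filesystem compares names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class CaseCollisions {
  /** Groups distinct names which are equal when lowercased; only groups of two or more are returned. */
  static List<List<String>> find(List<String> fileNames) {
    Map<String, List<String>> byFoldedName = new LinkedHashMap<>();
    for (String name : fileNames) {
      List<String> group =
          byFoldedName.computeIfAbsent(name.toLowerCase(Locale.ROOT), key -> new ArrayList<>());
      if (!group.contains(name)) {
        group.add(name);
      }
    }
    List<List<String>> collisions = new ArrayList<>();
    for (List<String> group : byFoldedName.values()) {
      if (group.size() > 1) {
        collisions.add(group);
      }
    }
    return collisions;
  }

  public static void main(String[] args) {
    List<String> names = List.of(
        "ComplexKt.class",
        "ComplexKt$complex$1$Nested$1.class",
        "ComplexKt$complex$1$nested$1.class");
    System.out.println(find(names));
  }
}
```

A check like this has to run against the names the compiler intends to write, not a listing of the output directory, since on a case-insensitive filesystem the second file never exists separately.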

[1] Obligatory: I mean Compose and NOT Compose UI! See /a-jetpack-compose-by-any-other-name/

https://jakewharton.com/case-insensitive-filesystems-considered-harmful-to-me

Cross-compiling static Rust binaries in Docker for Raspberry Pi

May 27, 2021 Updated May 27, 2021

Earlier this year I built a web-based garage door controller using Rust for the Raspberry Pi called Normally Closed. My deployment includes a Pi 3b and a Pi Zero which are ARMv7 and ARMv6 devices, respectively. I deploy services with Docker and wanted to continue using it here for simplicity.

Getting all of this set up and working together was not easy. There are also a lot of words in that title which might not mean much to you. That's okay! This is the blog post that I needed two weeks ago, so let's take a look at the steps required to accomplish this task.

Cross-compiling and static linking

Rust has excellent facilities for cross-compiling and static linking through Cargo. I got started following this guide on cross-compiling Rust for the Raspberry Pi.

The guide recommends using the armv7-unknown-linux-gnueabihf Rust target which would support my Pi 3b. For the Pi Zero we can infer from Rust's platform support list that we need arm-unknown-linux-gnueabihf. However, these targets dynamically link against GNU libc whereas I wanted to statically link with musl. Referencing the platform support list again we can find the armv7-unknown-linux-musleabihf and arm-unknown-linux-musleabihf targets to use instead.

$ rustup target add armv7-unknown-linux-musleabihf
$ rustup target add arm-unknown-linux-musleabihf

In addition to the target, the linker needs to be changed since it otherwise will use the one from your machine and its architecture. This can be specified in .cargo/config:

[target.armv7-unknown-linux-musleabihf]
linker = "arm-linux-gnueabihf-ld"

[target.arm-unknown-linux-musleabihf]
linker = "arm-linux-gnueabihf-ld"

On Ubuntu you can install this linker by running sudo apt install gcc-arm-linux-gnueabihf. On my Mac I was able to install it with brew install arm-linux-gnueabihf-binutils. (Despite using the musl compilation target, the GNU linker will still work.)

This is enough to compile working binaries!

$ cargo build --target armv7-unknown-linux-musleabihf --release
$ cargo build --target arm-unknown-linux-musleabihf --release

Bonus: Smaller binaries

I really like to make my binaries and Docker containers as small as possible. Once again Rust's Cargo gives us a simple mechanism to achieve this in our Cargo.toml:

[profile.release]
opt-level = 'z'
lto = true
codegen-units = 1

Setting opt-level to z instructs the compiler to favor a smaller binary size over performance. This will not be appropriate for anything CPU-intensive, but for a web server which will only see a few interactions per year we don't require maximum performance.

LTO is short for "link-time optimization" which performs optimization on the whole program rather than locally on individual functions. By virtue of analyzing the whole program it also improves the ability to remove dead code.

Finally, the codegen-units setting reduces the parallelism of Cargo to allow compilation to occur in a single unit and be optimized as a single unit. This allows compilation and optimization to have the maximum impact by always seeing the entire program.

Building inside Docker

Having started with only the Pi 3b and needing ARMv7, getting the build going in Docker was not too difficult. We basically just run the commands from above to produce the binary and then copy that into an Alpine container.

FROM rust:1.52.1 AS rust
RUN rustup target add armv7-unknown-linux-musleabihf
RUN apt-get update && apt-get -y install binutils-arm-linux-gnueabihf
WORKDIR /app
COPY .cargo ./.cargo
COPY Cargo.toml Cargo.lock .rustfmt.toml ./
COPY src ./src
RUN cargo build --release --target armv7-unknown-linux-musleabihf

FROM --platform=linux/arm alpine:3.12
WORKDIR /app
COPY --from=rust /app/target/armv7-unknown-linux-musleabihf/release/normally-closed ./
# ENTRYPOINT setup...

This container can be built with the regular docker build . command.

Adding a second architecture complicates things significantly. Docker does support containers which are built for multiple architectures through Docker buildx.

However, unlike the buildx examples, we cannot naively run docker buildx build --platform linux/arm/v7,linux/arm/v6 . and have it just work. For one, the Rust container is not available for those architectures. But even if it were, we still need to specify the custom compilation target per architecture due to our desire to use musl.

The first step towards making this work is having the Rust container always use the architecture of the machine on which it is running. This is similar to having run Cargo directly on our machine before, and it works because Rust is already allowing us to cross-compile to ARM.

-FROM rust:1.52.1 AS rust
+FROM --platform=$BUILDPLATFORM rust:1.52.1 AS rust
 RUN rustup target add armv7-unknown-linux-musleabihf
  ⋮

The BUILDPLATFORM argument is documented as one which is available by default in the global scope of a Dockerfile.

With Rust always running on our host architecture we still need to vary the Rust target which is used for cross-compilation. Also present in the list of default arguments in Docker is TARGETPLATFORM which will contain either linux/arm/v7 or linux/arm/v6 in our case. We can use a case statement to determine the associated Rust target.

 FROM --platform=$BUILDPLATFORM rust:1.52.1 AS rust
+ARG TARGETPLATFORM
+RUN case "$TARGETPLATFORM" in \
+  "linux/arm/v7") echo armv7-unknown-linux-musleabihf > /rust_target.txt ;; \
+  "linux/arm/v6") echo arm-unknown-linux-musleabihf > /rust_target.txt ;; \
+  *) exit 1 ;; \
+esac
 RUN rustup target add armv7-unknown-linux-musleabihf
  ⋮

We write the value to a file since export does not work and there's not really another mechanism for passing data between build steps. A read of that file replaces the hard-coded targets in the steps that follow.

  ⋮
 esac
-RUN rustup target add armv7-unknown-linux-musleabihf
+RUN rustup target add $(cat /rust_target.txt)
 RUN apt-get update && apt-get -y install binutils-arm-linux-gnueabihf
  ⋮
 COPY src ./src
-RUN cargo build --release --target armv7-unknown-linux-musleabihf
+RUN cargo build --release --target $(cat /rust_target.txt)

The Alpine build stage also references the target in the source folder of the binary. Rather than worry about passing along the file which holds this value, an easy workaround is to copy the binary to a location which does not contain the target name.

  ⋮
 RUN cargo build --release --target $(cat /rust_target.txt)
+# Move the binary to a location free of the target since that is not available in the next stage.
+RUN cp target/$(cat /rust_target.txt)/release/normally-closed .
  ⋮

The Alpine build stage can now remove the target platform and copy from the new location.

  ⋮
-FROM --platform=linux/arm alpine:3.12
+FROM alpine:3.12
 WORKDIR /app
-COPY --from=rust /app/target/armv7-unknown-linux-musleabihf/release/normally-closed ./
+COPY --from=rust /app/normally-closed ./
 # ENTRYPOINT setup...

At this point we have a fully-working, cross-compiling, static-linking, multi-architecture Docker container built from Rust!

$ docker buildx build --platform linux/arm/v7,linux/arm/v6 .
[+] Building 147.9s (34/34) FINISHED

You can see the result reflected on the Docker Hub listing as compared to the latest release.

The full and final Dockerfile can be found here for reference. The repository also contains GitHub Actions setup for building the standalone binaries as well as the multi-architecture Docker container.

Hopefully this helps someone! It was a couple nights of piecing together all the steps for me. And hey if you have a garage door and a spare Pi lying around maybe try out Normally Closed!

https://jakewharton.com/cross-compiling-static-rust-binaries-in-docker-for-raspberry-pi

Migrating from Burst to TestParameterInjector

Apr 15, 2021 Updated Apr 15, 2021

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/migrating-from-burst-to-testparameterinjector.

https://jakewharton.com/migrating-from-burst-to-testparameterinjector

Integration verbosity and good layering

Apr 7, 2021 Updated Apr 7, 2021

One of my favorite non-features from building view binding is that it lacks integration with activities or fragments. If you use view binding with activities or fragments, however, this fact might be to your dismay. Every activity using view binding is forced to do something along the lines of:

override fun onCreate(savedInstanceState: Bundle?) {
  super.onCreate(savedInstanceState)

  val binding = ProfileViewBinding.inflate(layoutInflater)
  setContentView(binding.root)

  // Do stuff with 'binding'
}

This is textbook verbosity, and some would argue boilerplate. It only gets worse with fragments (due to their poor design and to no specific fault of view binding which works the same as any View reference).

View binding exists at a different layer of abstraction than is appropriate for integration with higher-level components like activities or fragments. It serves as a type-safe representation of a schema declared in an XML file and that's it. It has no more knowledge of activities and fragments than the associated R.layout.profile_view integer does.

Higher-level libraries like androidx.activity and androidx.fragment have integrations with those R.layout integers. If you're upset that view binding has no turn-key solution for activities and fragments then this is the tree you should be barking up.

View binding wasn't built with verbosity in mind. Hell, it's not even that verbose. It ended up this way because that is the design which its layer of abstraction demands.

The same pattern occurs in some of my other favorite libraries. Dagger offers you nothing and requires that you build up the dependency injectors, their hierarchy, and their lifecycle entirely yourself. SQLDelight makes you specify database info in the build configuration and a database driver in the runtime API. RecyclerView requires at minimum that you provide an adapter subtype and choose and configure a layout manager. The layer at which these tools operate is sufficiently general that their good design requires them to hoist a bunch of decisions to their caller.

If Dagger was more opinionated about integration with Android it's hard to imagine Hilt could have been built as it is today. If SQLDelight was more opinionated about talking to SQLite on Android it's hard to imagine it could support talking to SQLite, MySQL, or Postgres on any platform as it does today. If RecyclerView was more opinionated about layout managers or adapters it's hard to imagine ViewPager 2 could have been built as it is today.

I certainly bear many scars of layering mistakes in my library past.

Picasso shipped with a global, static get() method so that image loading could be a one-liner with no setup. But what if you need to configure the HTTP client or set cache policy or need two different versions of those things? Libraries even shipped on top of Picasso using get() and assuming it would behave a certain way. It is a mistake to assume there will be only one configuration even if it is true 99% of the time. Picasso 3 (if, like Half-Life 3, it ever ships) corrects this mistake by only offering instance-based APIs. If you want a global instance it's only one line of code, and it's now your decision to make.

Retrofit 1 shipped with a Gson dependency that was enabled by default. You could still swap in a different converter if you wanted, but Gson would always be there. It is a mistake to assume someone will be speaking JSON and that they will want to use Gson even if that was true (then) 99% of the time. We know literally nothing about the enclosing application or the server it's speaking to! Retrofit 2 corrects this mistake by only speaking bytes in its core. You're forced to bring a serialization format converter and configure it on each instance, even if it's always JSON and Gson (please stop using Gson).

You can usually spot these types of problems in libraries because they start to accumulate weird ceremony in order to support different use-cases like testing[1]. It can be tempting as a library author to over-correct away from exposing verbosity. By removing required configuration options and reducing the use of inversion of control you make the happy path happy, but alternative use-cases and alternative integrations become much harder.

Instead of trying to push verbosity down into the library when faced with situations like the view binding activity usage above, package it into an integration library that's easy to evolve or throw away. When one of those integrations inevitably disappears, or a new one arrives, your core library won't need to change.
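
A minimal sketch of that split, written in plain Java rather than against any real library. All of the names here (ByteStore, TypedStore, Utf8Stores) are hypothetical. The core types know nothing about formats and force the caller to supply the conversion, while the small integration class packages one common choice and could be deleted or replaced without touching the core.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Core layer: only moves bytes and makes no assumption about their format. */
final class ByteStore {
  private byte[] stored = new byte[0];

  void put(byte[] bytes) {
    stored = bytes.clone();
  }

  byte[] get() {
    return stored.clone();
  }
}

/** Still core: the caller must supply both directions of conversion. There is no default. */
final class TypedStore<T> {
  private final ByteStore store;
  private final Function<T, byte[]> encode;
  private final Function<byte[], T> decode;

  TypedStore(ByteStore store, Function<T, byte[]> encode, Function<byte[], T> decode) {
    this.store = store;
    this.encode = encode;
    this.decode = decode;
  }

  void put(T value) {
    store.put(encode.apply(value));
  }

  T get() {
    return decode.apply(store.get());
  }
}

/** Integration layer: a separate, disposable artifact which packages one opinionated setup. */
final class Utf8Stores {
  static TypedStore<String> create() {
    return new TypedStore<String>(
        new ByteStore(),
        text -> text.getBytes(StandardCharsets.UTF_8),
        bytes -> new String(bytes, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    TypedStore<String> store = create();
    store.put("hello");
    System.out.println(store.get());
  }
}
```

Callers who want the one-liner depend on the integration class, and everyone else keeps full control by constructing TypedStore themselves.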

[1] Um, this sentence is somewhat ridiculous, right? Testing is not a different use case. It's a primary use case! I hope libraries come to mind here. Many did for me when I wrote it.

https://jakewharton.com/integration-verbosity-and-good-layering

AssistedInject is dead, long live AssistedInject!

Mar 31, 2021 Updated Mar 31, 2021

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/assisted-inject-is-dead-long-live-assisted-inject.

https://jakewharton.com/assisted-inject-is-dead-long-live-assisted-inject

A Jetpack Compose by any other name

Dec 30, 2020 Updated Dec 30, 2020

I really like Jetpack Compose. Between work and personal stuff I have three projects which are each built on top of it. It's great!

So far my biggest problem is its name… but that requires some explaining. Welcome to one of the hills I'll die on!

What is Jetpack Compose?

If you're already familiar with it, something should pop in your head when asked: What is Jetpack Compose?

A new UI toolkit for Android? Yep, that's right. A declarative Android UI framework? Sure, that is correct. A multiplatform application UI? Thanks to JetBrains this is also true.

If you're somewhat in tune with how the sausage is made you may also reference the fact that it's a Kotlin compiler plugin and DSL to build Android UI or multiplatform UI. It's those things, too.

None of these answers are wrong. However, they're doing a bit of a disservice to the internals of Compose and its unrealized potential.

Pedigree

What we now know as Jetpack Compose started as two separate projects:

The first was a solution for writing declarative Android UIs using the existing platform UI toolkit. Take the declarative components of React, wrap them in an ahead-of-time compiler like Svelte, and target Android's UI toolkit with Kotlin. All existing View-based UI could suddenly level-up by changing their programming paradigm from imperative to declarative.

Separately, the toolkit team was about to ramp up on unbundling as many UI widgets as possible from the OS. This followed on the success of ViewPager, RecyclerView, and what was learned from the AppCompat and Design libraries. By removing a ton of OEM touchpoints and normalizing behavior across all versions of Android, the work required to build a good UI would be reduced.

Over time these efforts became inescapably linked.

If you are building standalone versions of the platform UI widgets then why not take the opportunity to correct mistakes in their API and overcome limitations of the resource system? And if you're changing their API, why not have the new declarative system target only these unbundled widgets? Each project only empowered the other as they spiraled closer together.

In hindsight, it seems inevitable they would become a single effort. Being a single effort does not necessarily mean tight coupling, however.

Layering

None of my three projects built on Compose use the new Compose UI toolkit. That can be a confusing statement even to those who have done a lot of Compose work. Didn't we just call them inescapably linked? Didn't we define it earlier as a UI toolkit?

While Compose became a single effort started from two projects, a layering and responsibility split similar to those original projects still exists. In fact, that separation has only become more defined.

What this means is that Compose is, at its core, a general-purpose tool for managing a tree of nodes of any type. Well, a "tree of nodes" describes just about anything, and as a result Compose can target just about anything.

Those tree nodes could be the new Compose UI toolkit internals. But it could just as easily be the old View nodes inside a ViewGroup tree, it could be a RemoteView and the various remote views within its tree, or a notification and the content inside it.

The nodes don't have to be UI related at all. The tree could exist at the presenter layer as view state objects, at the data layer as model objects, or simply be a value tree of pure data.

Compose doesn't care! This is the part I really like. This is the part that's great.

Separately, however, Compose is also a new UI toolkit and DSL which renders applications on Android and Desktop. This part of Compose is built on top of the aforementioned core as a tree of nodes which happen to be able to render themselves to a canvas.

The two parts are called the Compose compiler/runtime and Compose UI, respectively.

This separation of concerns is very welcome. Conflating both under the name of Compose, in my opinion, is not welcome.

Naming

Adjusting the naming here would address two problems: specificity and pigeonholing.

Placing a general-purpose compiler and runtime with a specific UI toolkit implementation under an umbrella name means discussions about them are imprecise by nature. This post started by saying that I'm working on three projects built on Compose… did you think I meant Compose UI? You almost certainly did.

The Compose name is more akin to Jetpack than it is AppCompat. We don't treat it like that, and there are no signs of Google correcting our perception. So now I must endlessly clarify that I'm working on three projects built on the Compose compiler/runtime which do not use the Compose UI toolkit and oh, by the way, yes, those are separate things.

Maybe you don't think this is a good enough reason to have two names. After all, how many people are going to build something on just the compiler and runtime?

Yet that is all the more reason to rename it! You've just pigeonholed the project to not be anything more than what it already is–a cool technology on which Compose UI is built. The possible applications of the general-purpose Compose compiler/runtime are widespread and should be encouraged. Right now it feels like Google buried the lede.

A separate name is an easy way to draw attention to the great work which is the compiler and runtime of Compose. There's been the rare tweet about it, the casual mention in a talk, and the occasional blog post showcasing a different use, but aside from that there's not much breathing room. The excitement around Compose UI (which is also much deserved) drowns it out.

Compose compiler/runtime supports more platforms and targets than Compose UI. In addition to Android I've run Compose-based projects on the JVM (in a server, not on desktop) and limped one along in a non-browser JS engine. These are places where Compose UI is impossible, but Compose is not!

I'm excited for these three projects of mine to make their way into open source to showcase what the Compose compiler and runtime can do on their own. I am not excited about having to continually clarify that the Compose compiler/runtime duo are not related to Compose UI or to Android.

Crane

The internal codename for the new UI toolkit was "crane". Before it was public I voiced support of retaining that name. After Compose was public I voiced support of using two names and using "crane" for Compose UI. But messages in chat rooms are easy to ignore–even if some agreed.

Unfortunately, much time has passed, and the tea leaves are showing that Compose is about to enter beta. It's too late to make this naming change for Compose UI. No one even calls it Compose UI. It's always been just Compose.

So this blog post is a hail-mary plea to Google: please rename the compiler and runtime to something else!

Compose is such a bland name anyway. Name it Evergreen (like the trees). Name it Juliet (who wrote the blog title). Hell, name it Crane (for maximum internal confusion). Give it the different name that it deserves so it can stand on its own.

But please, don't relegate this amazing, general-purpose, multiplatform compiler and runtime to live behind the blanket nomenclature that is just Compose!

https://jakewharton.com/a-jetpack-compose-by-any-other-name

Treating Dockerfiles as shell scripts

Dec 3, 2020 Updated Dec 3, 2020

I use Docker to run a lot of tools. With the tools all wrapped up in containers, my computers are free of Python and Go and the various other dependencies needed for their use. While this is a nice win for isolation and reproducibility, the user experience is sub-par.

To run a tool, I just use docker run:

$ docker run --rm tool arguments...

But the editing workflow is something like:

$ nano tool.dockerfile
# hack hack hack...
$ docker build -t tool - < tool.dockerfile
$ docker run --rm tool arguments...

I also use a lot of bash scripts with easy-to-remember names. To run a script I just type its name: ./update_containers.sh. To edit, I open it in an editor, save, and then run. The user experience of this is top-notch!

Can we combine the two?

Executable Dockerfiles

If the first line of an executable starts with #!, unix-y systems will treat what follows on that line as an executable for interpreting the rest of the file. This is called the shebang, and the bash scripts I use start with one: #!/usr/bin/env bash.

In order to do this with a Dockerfile, though, we need a program which will conditionally run docker build and then docker run the resulting image. Thankfully docker build is already conditional and won't rebuild anything unless necessary, so we can always run it.

#!/usr/bin/env bash
NAME=$(basename "$1")
docker build -t "$NAME" - < "$1" > /dev/null
shift # Remove script name from arguments
docker run --rm --name "$NAME" "$NAME" "$@"

With this saved as dockerfile-shebang.sh, we can add it as the shebang in a Dockerfile.

#!/path/to/dockerfile-shebang.sh

FROM alpine:latest
ENTRYPOINT ["echo"]

Saving this as echo.dockerfile and running chmod +x echo.dockerfile provides the user experience we're after:

$ ./echo.dockerfile Hello, world!
Hello, world!
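
For readers who would rather see the wrapper's logic outside of bash, here is a rough Python sketch of the same two steps. The function name docker_commands is made up for illustration, and the sketch only builds the argument lists rather than invoking Docker.

```python
import os


def docker_commands(script_path, args):
    """Return the (build, run) argument lists the bash wrapper above would execute."""
    # The image and container name both come from the Dockerfile's file name.
    name = os.path.basename(script_path)
    # "-" tells docker build to read the Dockerfile from stdin; the bash
    # wrapper redirects the script file into it.
    build = ["docker", "build", "-t", name, "-"]
    # Everything after the script path is handed to the container untouched.
    run = ["docker", "run", "--rm", "--name", name, name, *args]
    return build, run


build, run = docker_commands("./echo.dockerfile", ["Hello,", "world!"])
print(" ".join(build))
print(" ".join(run))
```

Seeing the two commands side by side makes it clear that the file name does triple duty: image tag, container name, and the thing you type to run it.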

It's Dockerfile-Shebang!

I have wrapped up this utility into an executable, dockerfile-shebang. You can find it at github.com/JakeWharton/dockerfile-shebang.

The implementation is a bit more complicated than above for a few usability and correctness concerns:

Builds can be slow, so a message will be displayed if the container is currently being built.
If the build step fails, its entire output will be displayed to aid in debugging.
Most importantly, there's a mechanism for passing arguments to the docker run command for mounting volumes, setting environment variables, and any other container-level flags.

A real-world invocation looks something like:

$ ./tool.dockerfile -v /tanker/backups:/backups -e UID=1000 -- /backups/path/to/file.txt
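
Judging only from the invocation above, a -- appears to separate flags meant for docker run from the arguments meant for the tool. A minimal Python sketch of that split follows. The function name and the behavior when no -- is present (everything goes to the tool) are assumptions for illustration, not a description of the actual implementation.

```python
def split_arguments(args):
    """Split args at the first "--" into (docker_run_flags, tool_args)."""
    if "--" not in args:
        # Assumption: without a separator, everything belongs to the tool.
        return [], list(args)
    index = args.index("--")
    return list(args[:index]), list(args[index + 1:])


flags, tool_args = split_arguments(
    ["-v", "/tanker/backups:/backups", "-e", "UID=1000", "--", "/backups/path/to/file.txt"]
)
print(flags)
print(tool_args)
```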

While I've been using Docker to wrap tools for a while, I've only been using this shebang for a week. If you're feeling similar usability pain around Dockerfiles, try it out, let me know if it works, and let me know of any use cases you have which aren't covered.

https://jakewharton.com/treating-dockerfiles-as-shell-scripts

Peeking at command-line ANSI escape sequences

Oct 28, 2020 Updated Oct 28, 2020

Command-line programs use color to convey additional information and to look pretty. For example, compare the output of ls with and without the --color flag:

The color helps convey information in this compact output that would otherwise only be available in more verbose forms (-l).

In addition to color, a program may update existing output. You can see this when updating images with docker-compose:

Both of these effects are created using something called ANSI escape sequences.

ANSI escape crash course

Reading the Wikipedia entry on ANSI escapes is a great starting point for learning how to recreate these examples. Each escape sequence starts with a 0x1B (escape) character followed usually by [ and then one or more commands using letters or numbers.

The ls example above uses green and blue text as well as making the colored entries bold which we can recreate.

echo -e "\e[1;32mbinary\e[0m  file  \e[1;34mfolder\e[0m"

Let's break down the interesting parts:

echo -e – Adding the -e flag to echo instructs it to enable backslash escapes.
\e[1;32m – \e is a backslash escape for the 0x1B escape character and the [ starts a sequence. 1 enables bold and 32 is the color green. Numbers are separated by ; and terminated by m. Anything that follows will now be displayed as bold and green.
\e[0m – Once again \e[ starts a sequence and m terminates it. The 0 clears all previous formatting.
\e[1;34m – Nearly identical to the sequence from before except it uses 34 for a blue color.
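
The same breakdown can be sketched in Python, which is handy if you are generating these sequences from a program rather than from echo. The helper names sgr and style are invented for this example.

```python
ESC = "\x1b"  # the 0x1B escape character that \e stands for


def sgr(*codes):
    """Build a sequence such as ESC[1;32m from numeric codes."""
    return ESC + "[" + ";".join(str(code) for code in codes) + "m"


def style(text, *codes):
    """Wrap text in the given codes and clear all formatting afterwards."""
    return sgr(*codes) + text + sgr(0)


BOLD, GREEN, BLUE = 1, 32, 34
line = style("binary", BOLD, GREEN) + "  file  " + style("folder", BOLD, BLUE)
print(line)
```

Printing line in a terminal that understands ANSI escapes should render the same bold green and bold blue entries as the echo command.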

The docker-compose example moves the cursor to rewrite previous output which we can begin to recreate.

echo "Pulling zulu-jdk-15 ... downloading" && \
echo "Pulling zulu-jdk-11 ... downloading" && \
echo "Pulling zulu-jdk-8  ... downloading" && \
sleep 2 && \
echo -e "\e[2A\e[24C\e[32mdone\e[0m\e[K" && \
sleep 1 && \
echo -e "\e[24C\e[32mdone\e[0m\e[K" && \
sleep 1 && \
echo -e "\e[3A\e[24C\e[32mdone\e[0m\e[K\n\n"

Let's break down the interesting parts for this example:

\e[2A – Each echo emits a trailing newline, so after the third echo our cursor is below the third line at column 0. This command moves the cursor up (A) by two lines placing it on the "zulu-jdk-11" line still at column 0.
\e[24C – Move the cursor to the right (C) by 24 columns. This places the cursor directly before the "d" in "downloading".
\e[32m – Set the color to green. Remember this from the last section?
\e[K – After writing "done", the "loading" part of "downloading" is still visible. This command clears the current line from the cursor position to the line end.
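
To make the cursor arithmetic concrete, here is a small Python sketch that assembles the same sequences. The helper names are made up, and the column is derived from the length of the text before the status word instead of being hard-coded.

```python
ESC = "\x1b"


def cursor_up(lines):
    return f"{ESC}[{lines}A"


def cursor_right(columns):
    return f"{ESC}[{columns}C"


ERASE_TO_LINE_END = f"{ESC}[K"
GREEN = f"{ESC}[32m"
RESET = f"{ESC}[0m"

prefix = "Pulling zulu-jdk-15 ... "
column = len(prefix)  # the column where the status word starts


def mark_done(lines_up):
    """Sequence that rewrites the status on the line `lines_up` rows above the cursor."""
    move = cursor_up(lines_up) if lines_up else ""
    return move + cursor_right(column) + GREEN + "done" + RESET + ERASE_TO_LINE_END


print(repr(mark_done(2)))
```

Passing 0 skips the upward move entirely, which matches the middle echo in the example where the cursor is already on the right line.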

With these ANSI escape sequences we can recreate existing programs and begin to create our own. But how do we know whether we're using the same techniques as these programs? And if we don't know how to produce a particular output how can we discover how it was created?

Displaying ANSI sequences

Given that ANSI sequences start with the 0x1B character and then [ we can replace that escape with something else to disable it.

ls --color | sed -r 's/\x1b\[/\\e\[/g'

The sed command[1] matches 0x1B and [ and replaces it with \e[ which is shown as normal text. This particular replacement is convenient because you can copy the output into an echo and see the rendered form.

In this output we can see ls is using almost exactly the same ANSI sequences as we were. The only addition is that they start with \e[0m in order to clear any existing formatting.

You may also notice that the output has changed to list each entry on its own line rather than on a single line. This is because ls detects that its output is going into a pipe rather than to a terminal display. Programs may also choose to omit color when piped which defeats the whole purpose of adding the sed command. To solve both cases, run the program using unbuffer before piping.

unbuffer ls --color | sed -r 's/\x1b\[/\\e\[/g'

With the pipe usage hidden by unbuffer, the output of ls is back to being a single line.

If you run docker-compose with unbuffer and pipe it to sed the result is clearly not correct:

unbuffer docker-compose pull | sed -r 's/\x1b\[/\\e\[/g'

This is because docker-compose is using carriage returns (\r) to move the cursor back to column 0 on a line. We can update our sed to include a command to escape carriage returns too.

unbuffer docker-compose pull | sed -r -e 's/\x0d/\\r/g' -e 's/\x1b\[/\\e\[/g'

Now we can see all the commands. There is a lot of output here because docker-compose is updating the display very rapidly. Unlike our toy version above, each line is fully rewritten for each update. At the very end, though, you can see the \e[32mdone\e[0m sequence as part of updating the "zulu-jdk-15" line.
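
If sed is not available, the same two substitutions are easy to express in Python. This is a sketch of the idea rather than a drop-in replacement for the pipeline, and the function name reveal is invented here.

```python
def reveal(output):
    """Make escape sequences and carriage returns visible as plain text."""
    # Same order as the sed expressions: carriage returns first, then ESC[.
    return output.replace("\r", "\\r").replace("\x1b[", "\\e[")


sample = "\x1b[1A\x1b[2K\rPulling zulu-jdk-8  ... \x1b[32mdone\x1b[0m\r\x1b[1B"
print(reveal(sample))
```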

Bonus technique: Asciinema

Asciinema can also be used to inspect ANSI sequences, carriage returns, and everything else that a program outputs. Every terminal image and animation captured in this post was captured using Asciinema before being fed to svg-term.

For example, the docker-compose output can be captured like this:

asciinema rec -c "docker-compose pull" docker.json

(Yes, I captured the above example of using Asciinema inside Asciinema!)

The resulting docker.json contains a JSON header object followed by a series of JSON arrays, one per write, which describe the output commands.

{"version": 2, "width": 122, "height": 48, "timestamp": 1603858671, "env": {"SHELL": "/bin/bash", "TERM": "xterm-256color"}}
[0.412745, "o", "Pulling zulu-jdk-15 ... \r\r\nPulling zulu-jdk-11 ... \r\r\nPulling zulu-jdk-8  ... \r\r\n"]
[0.671883, "o", "\u001b[1A\u001b[2K\rPulling zulu-jdk-8  ... pulling from azul/zulu-openjdk\r\u001b[1B"]
[0.672048, "o", "\u001b[1A\u001b[2K\rPulling zulu-jdk-8  ... digest: sha256:13d16ca0335fbe1df3...\r\u001b[1B"]
[0.672159, "o", "\u001b[1A\u001b[2K\rPulling zulu-jdk-8  ... status: image is up to date for a...\r\u001b[1B"]
[0.672478, "o", "\u001b[1A\u001b[2K\r"]
[0.672507, "o", "Pulling zulu-jdk-8  ... \u001b[32mdone\u001b[0m\r\u001b[1B"]
[0.782864, "o", "\u001b[2A\u001b[2K\rPulling zulu-jdk-11 ... pulling from azul/zulu-openjdk\r\u001b[2B"]
[0.782985, "o", "\u001b[2A\u001b[2K\r"]
[0.78307, "o", "Pulling zulu-jdk-11 ... digest: sha256:315e0a2a7b6bcc2343...\r\u001b[2B"]
[0.783146, "o", "\u001b[2A\u001b[2K\rPulling zulu-jdk-11 ... status: image is up to date for a...\r\u001b[2B"]
[0.783372, "o", "\u001b[2A\u001b[2K\r"]
[0.783428, "o", "Pulling zulu-jdk-11 ... \u001b[32mdone\u001b[0m\r\u001b[2B"]
[1.091186, "o", "\u001b[3A\u001b[2K\rPulling zulu-jdk-15 ... pulling from azul/zulu-openjdk\r\u001b[3B"]
[1.09136, "o", "\u001b[3A\u001b[2K\rPulling zulu-jdk-15 ... digest: sha256:bf2d25e46d2c9fc373...\r\u001b[3B"]
[1.091511, "o", "\u001b[3A\u001b[2K\r"]
[1.091571, "o", "Pulling zulu-jdk-15 ... status: image is up to date for a...\r\u001b[3B"]
[1.091859, "o", "\u001b[3A\u001b[2K\rPulling zulu-jdk-15 ... \u001b[32mdone\u001b[0m\r"]
[1.091919, "o", "\u001b[3B"]

For a complex output like docker-compose the JSON form can be easier to understand. One other advantage is that each individual write to standard out gets its own line whereas with the sed escape technique we don't differentiate individual writes.
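
Because every line of the recording is valid JSON on its own, a few lines of Python are enough to walk through the writes one by one. The sketch below inlines an abridged copy of the recording shown earlier rather than reading docker.json from disk, and the function name is made up.

```python
import json

# Abridged from the recording above: a trimmed header line plus two events.
recording = "\n".join([
    '{"version": 2, "width": 122, "height": 48}',
    '[0.672478, "o", "\\u001b[1A\\u001b[2K\\r"]',
    '[0.672507, "o", "Pulling zulu-jdk-8  ... \\u001b[32mdone\\u001b[0m\\r\\u001b[1B"]',
])


def parse_recording(text):
    """Return (header, events) from a recording laid out like docker.json."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = json.loads(lines[0])  # first line: a JSON object
    events = [json.loads(line) for line in lines[1:]]  # rest: [time, type, data]
    return header, events


header, events = parse_recording(recording)
for time, kind, data in events:
    if kind == "o":  # "o" marks a write to standard out in the excerpt above
        print(f"{time:>9.6f}  {data!r}")
```

Printing each write with repr keeps the escape characters visible, much like the sed technique, while preserving the boundaries between writes.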

If you use tools like Docker, Gradle, Bazel, and even just ls you may be familiar with seeing colored and updating output daily. By using tools like sed and asciinema you can learn how those tools render their output. Should you find yourself building a command-line tool in the future, knowledge of how to use these ANSI sequences can help delight your users–even if it's only yourself!

[1] If you are on macOS, you'll need GNU sed for the -r flag, which can be installed via brew install gnu-sed and then used as gsed or through alias sed=gsed.

https://jakewharton.com/peeking-at-colorful-command-line-output

Smaller APKs with resource optimization

Sep 1, 2020 Updated Sep 1, 2020

How many times does the name of a layout file appear in an Android APK? We can build a minimal APK with a single layout file to count the occurrences empirically.

Building an Android app with Gradle requires only one thing: an AndroidManifest.xml file with a package. From there we can add a dummy layout whose contents are just <merge/> since we only care about its name.

.
├── build.gradle
└── src
    └── main
        ├── AndroidManifest.xml
        └── res
            └── layout
                └── home_view.xml

Running gradle assembleRelease will produce a release APK measuring a paltry 2,118 bytes. We can dump its contents using xxd and look for home_view byte sequences.

$ xxd build/outputs/apk/release/app-release-unsigned.apk
   ⋮
000004c0: 0000 0074 0000 0018 0000 0072 6573 2f6c  ...t.......res/l
000004d0: 6179 6f75 742f 686f 6d65 5f76 6965 772e  ayout/home_view.
000004e0: 786d 6c63 66e0 6028 6160 6060 6490 61d0  xmlcf.`(a```d.a.
   ⋮
00000570: 0000 0000 0000 0000 1818 7265 732f 6c61  ..........res/la
00000580: 796f 7574 2f68 6f6d 655f 7669 6577 2e78  yout/home_view.x
00000590: 6d6c 0000 0002 2001 f801 0000 7f00 0000  ml.... .........
   ⋮
00000700: 0000 0000 0909 686f 6d65 5f76 6965 7700  ......home_view.
00000710: 0202 1000 1400 0000 0100 0000 0100 0000  ................
   ⋮
00000870: 0000 ad04 0000 7265 732f 6c61 796f 7574  ......res/layout
00000880: 2f68 6f6d 655f 7669 6577 2e78 6d6c 504b  /home_view.xmlPK
   ⋮

There are three uncompressed occurrences of the path and one uncompressed occurrence of only the name in the APK based on this output.

If you have not read my post on calculating zip entry size or are not familiar with the structure of a zip file, a zip file is a list of file entries followed by a directory of all available entries. Each entry contains the file path and so does the directory. This accounts for the first occurrence (the entry header) and the last occurrence (the directory record) in the output.
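
You can observe the two zip-level occurrences without building an APK at all. The Python sketch below writes a single uncompressed entry into an in-memory zip and counts how often its path shows up in the raw bytes. The entry contents are a stand-in, not a real compiled layout.

```python
import io
import zipfile

path = "res/layout/home_view.xml"

buffer = io.BytesIO()
# ZIP_STORED keeps the entry uncompressed so the bytes are easy to inspect.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr(path, "<merge/>")

raw = buffer.getvalue()
occurrences = raw.count(path.encode("ascii"))
print(occurrences)  # once in the entry header, once in the directory record
```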

The middle two occurrences in the output are from inside the resources.arsc file which is a database of sorts for resources. Its contents are visible because the file is uncompressed inside the APK. Running aapt dump --values resources build/outputs/apk/release/app-release-unsigned.apk shows the home_view record and its mapping to the path:

Package Groups (1)
Package Group 0 id=0x7f packageCount=1 name=com.example
  Package 0 id=0x7f name=com.example
    type 0 configCount=1 entryCount=1
      spec resource 0x7f010000 com.example:layout/home_view: flags=0x00000000
      config (default):
        resource 0x7f010000 com.example:layout/home_view: t=0x03 d=0x00000000 (s=0x0008 r=0x00)
          (string8) "res/layout/home_view.xml"

The APK contains a fifth occurrence of the name inside the classes.dex file. It does not show up in the xxd output because the file is compressed. Running baksmali dump <(unzip -p build/outputs/apk/release/app-release-unsigned.apk classes.dex) shows the dex file's string table which contains an entry for home_view:

                           |[10] string_data_item
000227: 09                 |  utf16_size = 9
000228: 686f 6d65 5f76 6965|  data = "home_view"
000230: 7700               |

This is for the field inside the R.layout class which maps the layout name to a unique integer value. Incidentally, that integer is the index into the resources.arsc database to look up the associated file name for reading its XML contents.

To summarize the answer to our question, for each resource file, the full path appears three times and the name appears twice.

Optimizing resources

Android Gradle plugin 4.2 introduces the android.enableResourceOptimizations=true flag which will run optimizations targeted for resources. This invokes the aapt optimize command on the merged resources and resources.arsc file before they are packaged into the APK. The optimization only applies to release builds and will run regardless of whether minifyEnabled is set to true.

With the flag added to gradle.properties we can compare two APKs using diffuse to see its effects. The output is long, so we'll break it apart by section.

          │       compressed        │       uncompressed
          ├─────────┬───────┬───────┼─────────┬─────────┬───────
 APK      │ old     │ new   │ diff  │ old     │ new     │ diff
──────────┼─────────┼───────┼───────┼─────────┼─────────┼───────
      dex │   695 B │ 695 B │   0 B │ 1,016 B │ 1,016 B │   0 B
     arsc │   682 B │ 674 B │  -8 B │   576 B │   564 B │ -12 B
 manifest │   535 B │ 535 B │   0 B │ 1.1 KiB │ 1.1 KiB │   0 B
      res │   185 B │ 157 B │ -28 B │   116 B │   116 B │   0 B
    asset │     0 B │   0 B │   0 B │     0 B │     0 B │   0 B
    other │    22 B │  22 B │   0 B │     0 B │     0 B │   0 B
──────────┼─────────┼───────┼───────┼─────────┼─────────┼───────
    total │ 2.1 KiB │ 2 KiB │ -36 B │ 2.7 KiB │ 2.7 KiB │ -12 B

First is a diff of the contents in the APK. The "compressed" columns are the size cost inside the APK, and the "uncompressed" columns are the cost when extracted.

The res category represents our single resource file whose size dropped 28 bytes. The arsc category is for the resources.arsc file which itself dropped 8 bytes. We'll see the cause of these changes shortly.

 DEX     │ old │ new │ diff
─────────┼─────┼─────┼───────────
   files │   1 │   1 │ 0
 strings │  15 │  15 │ 0 (+0 -0)
   types │   8 │   8 │ 0 (+0 -0)
 classes │   2 │   2 │ 0 (+0 -0)
 methods │   3 │   3 │ 0 (+0 -0)
  fields │   1 │   1 │ 0 (+0 -0)


 ARSC    │ old │ new │ diff
─────────┼─────┼─────┼──────
 configs │   1 │   1 │  0
 entries │   1 │   1 │  0

These two sections represent the code and contents of the resource database. Having no changes, we can infer that the optimizations have not affected the R.layout.home_view field nor the home_view resource entry.

=================
====   APK   ====
=================

   compressed   │  uncompressed  │
───────┬────────┼───────┬────────┤
 size  │ diff   │ size  │ diff   │ path
───────┼────────┼───────┼────────┼────────────────────────────
       │ -185 B │       │ -116 B │ - res/layout/home_view.xml
 157 B │ +157 B │ 116 B │ +116 B │ + res/eA.xml
 674 B │   -8 B │ 564 B │  -12 B │ ∆ resources.arsc
───────┼────────┼───────┼────────┼────────────────────────────
 831 B │  -36 B │ 680 B │  -12 B │ (total)

Finally, a granular diff of the file changes shows the effect of optimization. Our layout resource had its filename significantly truncated and was moved out of the layout/ folder!

Inside the Gradle project, the folder and file names of XMLs have meaning. The folder is the resource type, and the name corresponds to the generated field and resource entry in the .arsc file. Once those files are inside the APK, however, the file path is meaningless and arbitrary. Resource optimization leverages this fact by making the names as short as possible[1].

The output of aapt dump confirms that the resource database also reflects the file change:

Package Groups (1)
Package Group 0 id=0x7f packageCount=1 name=com.example
  Package 0 id=0x7f name=com.example
    type 0 configCount=1 entryCount=1
      spec resource 0x7f010000 com.example:layout/home_view: flags=0x00000000
      config (default):
        resource 0x7f010000 com.example:layout/home_view: t=0x03 d=0x00000000 (s=0x0008 r=0x00)
          (string8) "res/eA.xml"

All three occurrences of the path in the APK are now shorter, which results in the 36-byte savings. And while 36 bytes is a very small number, remember that the entire binary is only 2,118 bytes. A 36-byte savings is a 1.7% size reduction!

Real-world examples

The resources of a real application number far more than just one. What does this optimization look like when applied to a real application?

Plaid

Nick Butcher's Plaid app has 734 resource files. In addition to their quantity, the names of the resource files are more descriptive (which is a fancy way of saying they're longer). Instead of home_view, Plaid contains names like searchback_stem_search_to_back.xml, attrs_elastic_drag_dismiss_frame_layout, and designer_news_story_description.xml.

After updating the project to AGP 4.2, I used diffuse to compare a build without resource optimization to one with it enabled:

          │            compressed             │           uncompressed
          ├───────────┬───────────┬───────────┼───────────┬───────────┬───────────
 APK      │ old       │ new       │ diff      │ old       │ new       │ diff
──────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────
      dex │   3.8 MiB │   3.8 MiB │       0 B │   9.9 MiB │   9.9 MiB │       0 B
     arsc │ 316.7 KiB │ 292.5 KiB │ -24.2 KiB │ 316.6 KiB │ 292.4 KiB │ -24.2 KiB
 manifest │     3 KiB │     3 KiB │       0 B │  11.9 KiB │  11.9 KiB │       0 B
      res │ 539.2 KiB │ 490.7 KiB │ -48.5 KiB │ 617.2 KiB │ 617.2 KiB │       0 B
   native │   4.6 MiB │   4.6 MiB │       0 B │   4.6 MiB │   4.6 MiB │       0 B
    asset │       0 B │       0 B │       0 B │       0 B │       0 B │       0 B
    other │  83.6 KiB │  83.6 KiB │       0 B │ 128.6 KiB │ 128.6 KiB │       0 B
──────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────
    total │   9.4 MiB │   9.3 MiB │ -72.7 KiB │  15.6 MiB │  15.5 MiB │ -24.2 KiB

Resource optimization netted a 0.76% savings on APK size. The native library size kept the impact smaller than I had hoped.

SeriesGuide

Uwe Trottmann's SeriesGuide app has 1044 resource files. Unlike Plaid, it is free of native libraries which should increase the impact of the optimization.

Once again I updated the project to AGP 4.2 and used diffuse to compare two builds:

          │            compressed             │           uncompressed
          ├───────────┬───────────┬───────────┼───────────┬───────────┬───────────
 APK      │ old       │ new       │ diff      │ old       │ new       │ diff
──────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────
      dex │   2.4 MiB │   2.4 MiB │       0 B │   5.7 MiB │   5.7 MiB │       0 B
     arsc │   1.7 MiB │   1.6 MiB │ -32.9 KiB │   1.7 MiB │   1.6 MiB │ -32.9 KiB
 manifest │   5.6 KiB │   5.6 KiB │       0 B │  28.3 KiB │  28.3 KiB │       0 B
      res │ 693.9 KiB │   628 KiB │   -66 KiB │ 992.2 KiB │ 992.2 KiB │       0 B
    asset │  39.9 KiB │  39.9 KiB │       0 B │ 100.4 KiB │ 100.4 KiB │       0 B
    other │ 118.1 KiB │ 118.1 KiB │       0 B │ 148.8 KiB │ 148.8 KiB │       0 B
──────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────
    total │   4.9 MiB │   4.8 MiB │ -98.9 KiB │   8.6 MiB │   8.6 MiB │ -32.9 KiB

Here resource optimization was able to reduce the APK size by 2.0%!

Tivi

Chris Banes' Tivi app has a non-trivial subset written using Jetpack Compose which means fewer resources overall. A current build still contains 776 resource files.

By virtue of using Compose, Tivi is already using the latest AGP 4.2. With two quick builds we can see the impact of resource optimization:

          │            compressed             │           uncompressed
          ├───────────┬───────────┬───────────┼───────────┬───────────┬───────────
 APK      │ old       │ new       │ diff      │ old       │ new       │ diff
──────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────
      dex │     3 MiB │     3 MiB │       0 B │   6.8 MiB │   6.8 MiB │       0 B
     arsc │ 363.4 KiB │ 337.9 KiB │ -25.6 KiB │ 363.3 KiB │ 337.7 KiB │ -25.6 KiB
 manifest │   3.6 KiB │   3.6 KiB │       0 B │  16.1 KiB │  16.1 KiB │       0 B
      res │ 680.4 KiB │ 629.2 KiB │ -51.2 KiB │   1.2 MiB │   1.2 MiB │       0 B
    asset │  39.9 KiB │  39.9 KiB │       0 B │ 100.4 KiB │ 100.4 KiB │       0 B
    other │ 159.9 KiB │ 151.7 KiB │  -8.2 KiB │ 306.3 KiB │ 254.8 KiB │ -51.5 KiB
──────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────
    total │   4.2 MiB │   4.1 MiB │   -85 KiB │   8.8 MiB │   8.7 MiB │ -77.1 KiB

Once again we hit the 2.0% mark for APK size reduction!
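
The percentages quoted in this post can be reproduced from the compressed totals that diffuse reports. The Python sketch below does the arithmetic. Note that the MiB totals in the tables are already rounded, so the results for the three real apps are approximate.

```python
KIB = 1024
MIB = 1024 * KIB


def savings_percent(saved_bytes, original_bytes):
    return 100 * saved_bytes / original_bytes


# (saved, original) compressed totals as reported in the tables above.
apks = {
    "single layout": (36, 2118),
    "Plaid": (72.7 * KIB, 9.4 * MIB),
    "SeriesGuide": (98.9 * KIB, 4.9 * MIB),
    "Tivi": (85 * KIB, 4.2 * MIB),
}

for name, (saved, original) in apks.items():
    print(f"{name}: {savings_percent(saved, original):.2f}%")
```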

One more occurrence

All four examples so far have not used signed APKs. There are multiple versions of APK signing, and if your minSdkVersion is lower than 24 you are required to include version 1 (V1) when signing. V1 signing uses Java's .jar signing specification which signs each file individually as a text entry in the META-INF/MANIFEST.MF file.

After creating and configuring a keystore for the original single-layout app, dumping the manifest file with unzip -c build/outputs/apk/release/app-release.apk META-INF/MANIFEST.MF shows these signatures:

Manifest-Version: 1.0
Built-By: Signflinger
Created-By: Android Gradle 4.2.0-alpha08

Name: AndroidManifest.xml
SHA-256-Digest: HdoGVd8U3Zjtf2VkGLExAPCQ1fq+kNL8eHKjVQXGI60=

Name: classes.dex
SHA-256-Digest: BVA1ApPvECg56DrrNPgD3jgv1edcM8VKYjcJEAG4G44=

Name: res/eA.xml
SHA-256-Digest: nDn7UQex2OWB3/AT054UvSAx9pYNSWwERCLfgdM6J6c=

Name: resources.arsc
SHA-256-Digest: 6w7i2Z9+LjwqlXS7YhhjzP/XhgvJF3PUuyJM60t0Qbw=

The full path of each file makes an appearance bringing the total occurrences of each resource path to four. Since shorter names will once again result in this file containing fewer bytes, resource optimization has an even greater impact.
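
To see why a shorter path shrinks this file too, here is a simplified Python sketch that builds one manifest entry in the shape shown above. It is not how the signing tool works internally: the newline style is an assumption, the wrapping of long lines that the manifest format requires is ignored, and the file contents are placeholder bytes.

```python
import base64
import hashlib


def manifest_entry(path, contents):
    """Build a Name/SHA-256-Digest block like the ones shown above."""
    digest = base64.b64encode(hashlib.sha256(contents).digest()).decode("ascii")
    # Assumption: plain "\n" line endings, with a blank line ending the entry.
    return f"Name: {path}\nSHA-256-Digest: {digest}\n\n"


contents = b"<merge/>"  # placeholder bytes, not the real compiled layout
before = manifest_entry("res/layout/home_view.xml", contents)
after = manifest_entry("res/eA.xml", contents)
print(len(before) - len(after))
```

The digest line has a fixed length regardless of the path, so the difference between the two entries is exactly the number of characters trimmed from the name.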

The Google-internal email which introduced me to this feature purported a savings of 1-3% on final APK size. Based on real-world tests this range seems to be about right. Ultimately the savings will depend on the size and number of resource files in your APK.

If you're already using AGP 4.2 add android.enableResourceOptimizations=true to your gradle.properties and enjoy this free APK size savings. If you are not yet on AGP 4.2 add it anyway so that you don't forget when you eventually upgrade!

[1] In this example, notably, the name doesn't seem as small as possible since it is two characters instead of one. A hash function computes the new name for each file. The number of resource files dictates the size of the hash which has a lower bound of two. The algorithm appears to work with a lower bound of one, so I'm not sure why the author chose to use two. Perhaps they didn't expect projects to contain fewer than 64 resources. I sent r.android.com/1416749 to lower the bound.

https://jakewharton.com/smaller-apks-with-resource-optimization

Shrinking a Kotlin binary by 99.2%

Aug 24, 2020 Updated Aug 24, 2020

We'll get to the shrinking, but first let's motivate the binary in question. Three years ago I wrote the "Surfacing Hidden Change to Pull Requests" post which covered pushing important stats and diffs into PRs as a comment. This avoids surprises with changes that affect binary size, manifests, and dependency trees.

Showing dependency trees used Gradle's dependencies task and diff -U 0 to display changes from the previous commit. The example in that post bumped the Kotlin version from 1.1-M03 to 1.1-M04 producing the following diff:

@@ -125,2 +125,3 @@
-|    \--- org.jetbrains.kotlin:kotlin-stdlib:1.0.4 -> 1.1-M03
-|         \--- org.jetbrains.kotlin:kotlin-runtime:1.1-M03
+|    \--- org.jetbrains.kotlin:kotlin-stdlib:1.0.4 -> 1.1-M04
+|         \--- org.jetbrains.kotlin:kotlin-runtime:1.1-M04
+|              \--- org.jetbrains:annotations:13.0
@@ -145,2 +146 @@
-+--- org.jetbrains.kotlin:kotlin-stdlib:1.1-M03
-+--- org.jetbrains.kotlin:kotlin-runtime:1.1-M03
++--- org.jetbrains.kotlin:kotlin-stdlib:1.1-M04

Aside from seeing the version bump reflected, there are two extra facts here we can deduce about the change:

The kotlin-runtime dependency gained a dependency on JetBrains' annotations artifact as seen in the first section of the diff.
A direct dependency on kotlin-runtime was removed as seen in the second section of the diff. This is fine, as the first section already tells us that kotlin-runtime is a dependency of kotlin-stdlib.

These two facts are shown in the displayed diff, but there's a subtle third fact which is only implied. Because the first section is indented, we know that one of our direct dependencies has a transitive dependency on kotlin-stdlib. Unfortunately we have no idea which dependency is affected.

To solve this problem I wrote a tool called dependency-tree-diff which shows the path to a root dependency for any changes in the tree.

 +--- com.jakewharton.rxbinding:rxbinding-kotlin:1.0.0
-|    \--- org.jetbrains.kotlin:kotlin-stdlib:1.0.4 -> 1.1-M03
-|         \--- org.jetbrains.kotlin:kotlin-runtime:1.1-M03
+|    \--- org.jetbrains.kotlin:kotlin-stdlib:1.0.4 -> 1.1-M04
+|         \--- org.jetbrains.kotlin:kotlin-runtime:1.1-M04
+|              \--- org.jetbrains:annotations:13.0
-+--- org.jetbrains.kotlin:kotlin-stdlib:1.1-M03 (*)
-\--- org.jetbrains.kotlin:kotlin-runtime:1.1-M03
+\--- org.jetbrains.kotlin:kotlin-stdlib:1.1-M04 (*)

Our implicit third fact, which other direct dependency was affected, is now explicit in the output. Change authors can now reflect on whether there may be any compatibility issues with the affected dependencies.
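The core idea, recovering the path from a root dependency to each node, can be sketched in a few lines of Kotlin. This is not the tool's actual implementation. The function name pathsToRoot is hypothetical, and the sketch assumes Gradle's layout where every nesting level adds five characters before the "+--- " or "\--- " marker.

```kotlin
// Hypothetical helper: for every node, returns the coordinates from its root dependency down to it.
fun pathsToRoot(treeLines: List<String>): List<List<String>> {
  val stack = ArrayList<String>()
  val paths = ArrayList<List<String>>()
  for (line in treeLines) {
    // A line holds exactly one marker, so the larger index is the one that was found.
    val marker = maxOf(line.indexOf("+--- "), line.indexOf("\\--- "))
    if (marker < 0) continue // Not a dependency line.
    val depth = marker / 5
    // Drop ancestors which belong to deeper or sibling branches.
    while (stack.size > depth) stack.removeAt(stack.size - 1)
    stack.add(line.substring(marker + 5))
    paths.add(stack.toList())
  }
  return paths
}
```

Diffing the sets of paths from the old and new trees, rather than the raw lines, is what lets the root dependency show up alongside each change.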

You can learn more about the tool and see another example in its README.

Shrinking the binary

This tool needs to be checked into our repo and run on CI. Having successfully built adb-event-mirror using Kotlin script, the first version of this tool also used Kotlin script. While it worked and was tiny, kotlinc is not installed on the CI machines. We rely on the Kotlin Gradle plugin to compile Kotlin, not a standalone binary.

You can locally redirect the Kotlin script cache directory to capture the compiled jar, but it still depends on the Kotlin script artifact which is large, has lots of dependencies, and is still quite dynamic. It was clear this wasn't the right path, but I filed KT-41304 to hopefully make producing a fat .jar of a script easier in the future.

I switched to a classic Kotlin Gradle project and produced a fat .jar with the kotlin-stdlib dependency included. After prepending a script to make the jar self-executing, the binary clocked in at 1699978 bytes (or ~1.62MiB). Not bad, but we can do better!
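The self-executing trick works because zip readers locate the archive's directory from the end of the file, so leading bytes are tolerated. A minimal sketch, assuming a plain sh launcher (the exact script used by the project is not shown here, and makeSelfExecuting is a made-up name):

```kotlin
import java.io.File

// Hypothetical helper: writes a shell launcher followed by the untouched jar bytes.
fun makeSelfExecuting(jar: File, output: File) {
  // Assumed launcher: re-invoke this same file as a jar, forwarding all arguments.
  val header = "#!/bin/sh\nexec java -jar \"\$0\" \"\$@\"\n"
  output.outputStream().use { out ->
    out.write(header.toByteArray())
    jar.inputStream().use { it.copyTo(out) }
  }
  output.setExecutable(true)
}
```

The launcher adds only a few dozen bytes, so nearly all of the reported size is the jar itself.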

Removing Kotlin metadata

Listing the files in the .jar using unzip -l shows that aside from .class, the majority are .kotlin_module or .kotlin_metadata. These are used by the Kotlin compiler and by Kotlin's reflection, and neither is needed for our binary.

We can filter these out of the binary along with module-info.class which is used for Java 9's module system and files in META-INF/maven/ which propagate information about projects built with the Maven tool.

Removing all these files yields a new binary size of 1513414 bytes (~1.44MiB), an 11% reduction in size.
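The filtering step can be approximated with the JDK's zip APIs. This is only a sketch under assumptions: the real project may well do this inside its Gradle build, and the names isRemovable and stripJar are invented here. The patterns mirror the file kinds named above.

```kotlin
import java.io.File
import java.util.zip.ZipEntry
import java.util.zip.ZipFile
import java.util.zip.ZipOutputStream

// Assumption: these four patterns cover the file kinds named in the text.
fun isRemovable(name: String): Boolean =
  name.endsWith(".kotlin_module") ||
    name.endsWith(".kotlin_metadata") ||
    name == "module-info.class" || name.endsWith("/module-info.class") ||
    name.startsWith("META-INF/maven/")

// Hypothetical helper: copies every entry except the removable ones into a new jar.
fun stripJar(input: File, output: File) {
  ZipFile(input).use { zip ->
    ZipOutputStream(output.outputStream()).use { out ->
      for (entry in zip.entries()) {
        if (isRemovable(entry.name)) continue
        out.putNextEntry(ZipEntry(entry.name))
        if (!entry.isDirectory) zip.getInputStream(entry).use { it.copyTo(out) }
        out.closeEntry()
      }
    }
  }
}
```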

Using R8

R8 is the code optimizer and obfuscator for Android builds. While it's normally used to optimize and obfuscate Java classfiles during conversion to the Dalvik executable format, it also supports outputting Java classfiles. In order to use it, we need to specify the entry point to the tool using ProGuard's configuration syntax.

-dontobfuscate
-keepattributes SourceFile, LineNumberTable

-keep class com.jakewharton.gradle.dependencies.DependencyTreeDiff {
  public static void main(java.lang.String[]);
}

In addition to the entrypoint, obfuscation is disabled, and we retain the source file and line number attributes so that any exceptions which occur will still be understandable.

Passing the fat .jar through R8 produces a new minified .jar which can then be made executable. The resulting binary is now just 41680 bytes (~41KiB), a 98% reduction in size. Nice!

Since we are producing a binary and not a library, the -allowaccessmodification option will make optimizations like class merging and inlining more effective by allowing hidden members to be made public. Adding this produces a binary of 37630 bytes (~37KiB).

Tweaking standard library usage

It is absolutely safe to stop here, but I'm bad at stopping...

Now that the binary is sufficiently small we can start looking at what code is contributing to the size. Normally I would turn to javap for peeking at bytecode, but since we only care about seeing API calls we can unzip the binary and open the classfiles in IntelliJ IDEA which will use the Fernflower decompiler to show roughly-equivalent Java.

The main method starts by reading in the arguments as files:

fun main(vararg args: String) {
  if (args.size == 2) {
    val old = args[0].let(::File).readText()
    val new = args[1].let(::File).readText()

The decompiled code looks like this:

public static final void main(String... var0) {
  Intrinsics.checkNotNullParameter(var0, "args");
  if (var0.length == 2) {
    String[] var10000 = var0;
    String var3 = var0[0];
    var3 = FilesKt__FileReadWriteKt.readText$default(new File(var3), (Charset)null, 1);
    String var1 = var10000[1];
    String var8 = FilesKt__FileReadWriteKt.readText$default(new File(var1), (Charset)null, 1);

Peeking at FilesKt__FileReadWriteKt shows the unfortunate file reading code we've all written at some point in the past, and it pulls in kotlin.ExceptionsKt, kotlin.jvm.internal.Intrinsics, and kotlin.text.Charsets.

Switching from java.io.File to java.nio.file.Path means we can use a built-in method, Files.readString from Java 11, for reading the contents.

 fun main(vararg args: String) {
   if (args.size == 2) {
-    val old = args[0].let(::File).readText()
-    val new = args[1].let(::File).readText()
+    val old = args[0].let(Paths::get).let(Files::readString)
+    val new = args[1].let(Paths::get).let(Files::readString)

With these changes the binary drops to 30914 bytes (~30KiB).

Another standard library usage that caught my eye is splitting the inputs by line:

private fun findDependencyPaths(text: String): Set<List<String>> {
  val dependencyLines = text.lineSequence()
    .dropWhile { !it.startsWith("+--- ") }
    .takeWhile { it.isNotEmpty() }

The decompiled Java looks somewhat like this:

public static final Set findDependencyPaths(String var0) {
  String[] var10000 = new String[]{"\r\n", "\n", "\r"};
  List var1;
  DelimitedRangesSequence var2;

This indicates that we're using a Kotlin implementation of splitting and using its Sequence type. Java 11 added a String.lines() which returns a Stream that also has the dropWhile and takeWhile operators which are already in use. Unfortunately Kotlin also has a String.lines() extension, so we need a cast in order to use the Java 11 method.

 private fun findDependencyPaths(text: String): Set<List<String>> {
-  val dependencyLines = text.lineSequence()
+  val dependencyLines = (text as java.lang.String).lines()
     .dropWhile { !it.startsWith("+--- ") }
     .takeWhile { it.isNotEmpty() }

This change drops the binary to just 13643 bytes (~13KiB) for a 99.2% reduction.

Remaining bloat

Kotlin being a multiplatform language means that it has its own implementation of an empty list, set, and map. When targeting the JVM, however, there's no reason to use these over the ones provided by java.util.Collections. I filed KT-41333 to track this enhancement.
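The duplication is easy to observe on the JVM. A small sketch (the two function names are invented for illustration) that reports which class backs each empty list:

```kotlin
import java.util.Collections

// Kotlin's emptyList() is backed by its own singleton class, which R8 then has to retain.
fun kotlinEmptyListClass(): String = emptyList<String>().javaClass.name

// The JDK already ships an equivalent immutable empty list.
fun jdkEmptyListClass(): String = Collections.emptyList<String>().javaClass.name
```

The first returns kotlin.collections.EmptyList, the same class that appears in the jar listing, while the second returns a class nested inside java.util.Collections that costs the binary nothing.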

Dumping the contents of the final binary shows its empty collections (and related types) contribute about 50% of the remaining size:

$ unzip -l build/libs/dependency-tree-diff-r8.jar
Archive:  build/libs/dependency-tree-diff-r8.jar
  Length      Date    Time    Name
---------  ---------- -----   ----
       84  12-31-1969 19:00   META-INF/MANIFEST.MF
      926  12-31-1969 19:00   com/jakewharton/gradle/dependencies/DependencyTrees$findDependencyPaths$dependencyLines$1.class
      854  12-31-1969 19:00   com/jakewharton/gradle/dependencies/DependencyTrees$findDependencyPaths$dependencyLines$2.class
     6224  12-31-1969 19:00   com/jakewharton/gradle/dependencies/DependencyTreeDiff.class
      604  12-31-1969 19:00   com/jakewharton/gradle/dependencies/Node.class
     2534  12-31-1969 19:00   kotlin/collections/CollectionsKt__CollectionsKt.class
     1120  12-31-1969 19:00   kotlin/collections/EmptyIterator.class
     3227  12-31-1969 19:00   kotlin/collections/EmptyList.class
     2023  12-31-1969 19:00   kotlin/collections/EmptySet.class
     1958  12-31-1969 19:00   kotlin/jvm/internal/CollectionToArray.class
     1638  12-31-1969 19:00   kotlin/jvm/internal/Intrinsics.class
---------                     -------
    21192                     11 files

In addition to those extra types, the bytecode contains a bunch of extra null checks. For example, the decompiled bytecode for findDependencyPaths from the last section actually looks like this:

public static final Set findDependencyPaths(String var0) {
  Intrinsics.checkNotNullParameter(var0, "$this$lines");
  Intrinsics.checkNotNullParameter(var0, "$this$lineSequence");
  String[] var10000 = new String[]{"\r\n", "\n", "\r"};
  Intrinsics.checkNotNullParameter(var0, "$this$splitToSequence");
  Intrinsics.checkNotNullParameter(var10000, "delimiters");
  Intrinsics.checkNotNullParameter(var10000, "$this$asList");

These Intrinsics calls enforce the nullability invariants of the type system on function parameters, but after inlining all but the first one are redundant. Duplicate calls like this appear all over the code. This is an R8 bug caused by Kotlin renaming these intrinsic methods and R8 not updating to properly track that change.

With these two issues fixed, it's likely the binary will drop into single-digit KiBs producing a high-99 percent reduction from the original fat .jar.

If you are building a JVM binary or a JVM library which shades dependencies, make sure you use a tool like R8 or ProGuard to remove unused code paths, or use a Graal native image to produce a minimal native binary. This tool was kept as Java bytecode so that a single .jar can be used on multiple platforms.

The full source code and build setup for dependency-tree-diff is available on GitHub.

https://jakewharton.com/shrinking-a-kotlin-binary

Wire Support For Swift, Part 1

Aug 19, 2020 Updated Aug 19, 2020

This post was published externally on Cash App Code Blog. Read it at https://code.cash.app/wire-support-for-swift-part-1.

https://jakewharton.com/wire-support-for-swift-part-1

Sixteen corners

Aug 6, 2020 Updated Aug 6, 2020

Last year I built a library called Picnic for rendering data tables in monospaced environments like your terminal. Part of rendering the table is calculating what character to use for each wall and each corner separating the cells.

Here's a representative output with a bunch of different corner styles:

          │          compressed           │          uncompressed
          ├───────────┬───────────┬───────┼───────────┬───────────┬────────
 APK      │ old       │ new       │ diff  │ old       │ new       │ diff
──────────┼───────────┼───────────┼───────┼───────────┼───────────┼────────
      dex │ 664.8 KiB │ 664.8 KiB │ -25 B │   1.5 MiB │   1.5 MiB │ -112 B
     arsc │ 201.7 KiB │ 201.7 KiB │   0 B │ 201.6 KiB │ 201.6 KiB │    0 B
 manifest │   1.4 KiB │   1.4 KiB │   0 B │   4.2 KiB │   4.2 KiB │    0 B
      res │ 418.2 KiB │ 418.2 KiB │ -14 B │ 488.3 KiB │ 488.3 KiB │    0 B
    asset │       0 B │       0 B │   0 B │       0 B │       0 B │    0 B
    other │  37.1 KiB │  37.1 KiB │   0 B │  36.3 KiB │  36.3 KiB │    0 B
──────────┼───────────┼───────────┼───────┼───────────┼───────────┼────────
    total │   1.3 MiB │   1.3 MiB │ -39 B │   2.2 MiB │   2.2 MiB │ -112 B

Wall border calculation is straightforward. For a vertical wall, a vertical pipe is used if either or both of the two cells want a border, otherwise an empty space is used.1

Corner calculation is a bit more involved. A corner has four potential segments for the four cardinal directions that may be drawn. The four adjacent cells each participate in the visibility of two segments.
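One way to picture this is with a simplified model. It is not Picnic's real API, and Cell, Segments, and cornerSegments are all hypothetical names. Each segment is a wall shared by two of the four cells, and it is drawn when either cell asks for a border on that side. Cells outside the table are passed as null.

```kotlin
// Hypothetical cell model: one flag per side which wants a border.
data class Cell(
  val borderLeft: Boolean = false,
  val borderRight: Boolean = false,
  val borderTop: Boolean = false,
  val borderBottom: Boolean = false
)

data class Segments(val left: Boolean, val right: Boolean, val up: Boolean, val down: Boolean)

// Each cell contributes to exactly two segments, e.g. topLeft affects only up and left.
fun cornerSegments(topLeft: Cell?, topRight: Cell?, bottomLeft: Cell?, bottomRight: Cell?) =
  Segments(
    left = topLeft?.borderBottom == true || bottomLeft?.borderTop == true,
    right = topRight?.borderBottom == true || bottomRight?.borderTop == true,
    up = topLeft?.borderRight == true || topRight?.borderLeft == true,
    down = bottomLeft?.borderRight == true || bottomRight?.borderLeft == true
  )
```

For instance, a lone top-left cell with right and bottom borders yields only the left and up segments, which is the ┘ corner.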

Corner Characters

Once the code determines the four boolean values for the four segments of a corner we need to map that to the display character. Four booleans produce sixteen possible values.

Initially I started with the naive nesting of conditionals to get it working.

return if (left) {
  if (right) {
    if (up) {
      if (down) {
        '┼'
      } else {
        '┴'
      }
    } else {
      if (down) {
        '┬'
      } else { /*..*/ }
    }
  } else { /*..*/ }
}

Nesting conditionals is an optimization so that each boolean is only checked once. If we wanted, we could flatten the conditionals by repeatedly checking each boolean.

if (left && right &&  up &&  down) return '┼'
if (left && right &&  up && !down) return '┴'
if (left && right && !up &&  down) return '┬'
if (left && right && !up && !down) return '─'
// ...

The boolean type is a facade over the binary values 0 and 1. Replacing these conditionals with the corresponding binary yields familiar values: 1111, 1110, 1101, 1100, etc. These are the decimal values 15, 14, 13, 12, and so on down to 0.

Mapping the four booleans to these bits gives a decimal we can use to index into a single string which contains all the corner characters.

val corners = " ╷╵│╶┌└├╴┐┘┤─┬┴┼"
val index =
  (if (down) 0b0001 else 0) or
  (if (up) 0b0010 else 0) or
  (if (right) 0b0100 else 0) or
  (if (left) 0b1000 else 0)
return corners[index]

Much nicer!

Testing Corners

The logic of determining the four booleans and then choosing the corner character needs tests. Once again I started with the naive approach of a bunch of 2x2 tables with varying borders so that the middle corner was different in each.

@Test fun borderLeftRightUpDown() {
  val table = table { /*..*/ }
  assertThat(table.renderText()).isEqualTo("""
    |1│2
    |─┼─
    |3│4
    |""".trimMargin())
}

@Test fun borderLeftRightUp() {
  val table = table { /*..*/ }
  assertThat(table.renderText()).isEqualTo("""
    |1│2
    |─┴─
    |3 4
    |""".trimMargin())
}

Needing sixteen different tests feels very much like the nested conditionals above. Sure it's correct, but can we do better? That was the question I presented to two friends who had already been watching me build the library.

(At this point I think they know to just stand back as I fall down these rabbit holes.)

What do you think? Feel free to give it a try! Scroll down for the answer...

1111...

1110...

1101...

1100...

1011...

1010...

1001...

1000...

0111...

0110...

0101...

0100...

0011...

0010...

0001...

0000!

After about 10 minutes at the whiteboard I managed to come up with a configuration that worked.

This translates nicely into a single test.

@Test fun allCorners() {
  val table = table { /*..*/ }
  assertThat(table.renderText()).isEqualTo("""
    |┌─┬─┐ ╷
    |│1│2│3│
    |├─┤ ╵ │
    |│4│5 6│
    |└─┼───┘
    | 7│8 9 
    |╶─┴─╴  
    |""".trimMargin())
}

The number of theoretical arrangements of corners is 16!, or 20,922,789,888,000, so finding a solution felt like a nice win.

This post was supposed to stop here, but...

Finding All Possible Arrangements

I did the above work a year ago, but upon seeing the very large value of 16! in preparing the post I began to wonder how many valid arrangements exist.

Once again starting naive, I wrote a recursive function which created permutations of the numbers [0,15] and then did a validation pass to see if all corresponding corners had matching edges.

(0 until 16)
    .permutationSequence() // <-- produces Sequence<IntArray>
    .filter { validateCorners(it) }
    .forEach { println(it.contentToString()) }

This was exorbitantly slow. I let it run for an hour, and it never got far enough to find a single match.
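The two helpers used in that snippet are not shown, so here is a sketch of what they might look like. Both are stand-ins written for illustration rather than the original code. The validation uses the same bit mapping as the corner string: down = 1, up = 2, right = 4, left = 8.

```kotlin
private fun IntArray.swap(a: Int, b: Int) {
  val t = this[a]; this[a] = this[b]; this[b] = t
}

// Hypothetical stand-in: lazily yields every ordering of the range by swapping in place.
fun IntRange.permutationSequence(): Sequence<IntArray> = sequence {
  val items = this@permutationSequence.toList().toIntArray()
  suspend fun SequenceScope<IntArray>.permute(start: Int) {
    if (start == items.size) {
      yield(items.clone())
      return
    }
    for (i in start until items.size) {
      items.swap(start, i)
      permute(start + 1)
      items.swap(start, i) // Undo so the next choice starts from the same state.
    }
  }
  permute(0)
}

// Hypothetical stand-in: checks a complete 4x4 arrangement stored row by row.
fun validateCorners(corners: IntArray): Boolean {
  for (index in corners.indices) {
    val corner = corners[index]
    // Corner 0 has no segments, so it represents "nothing there" beyond the top and left edges.
    val left = if (index % 4 == 0) 0 else corners[index - 1]
    val above = if (index < 4) 0 else corners[index - 4]
    if (((left and 4) != 0) != ((corner and 8) != 0)) return false // Horizontal mismatch
    if (((above and 1) != 0) != ((corner and 2) != 0)) return false // Vertical mismatch
    if (index % 4 == 3 && (corner and 4) != 0) return false // Right column
    if (index > 11 && (corner and 1) != 0) return false // Bottom row
  }
  return true
}
```

Because every permutation is fully built before it is checked, nothing is ever pruned, which is why this approach cannot finish in any reasonable time for 16 elements.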

Instead of validating each complete permutation, huge sets of permutation candidates could immediately be rejected as soon as two corners were invalid. For example, if the very first corner (upper left) has a left or up segment we can immediately reject it and eliminate 15! candidates.

fun validTables(): Sequence<IntArray> = sequence {
  val state = IntArray(16)
  suspend fun SequenceScope<IntArray>.placeCorner(index: Int) {
    if (index == 16) {
      yield(state.clone())
      return
    }
    for (corner in 0 until 16) {
      // TODO validate corner fits here!

      state[index] = corner
      placeCorner(index + 1)
    }
  }
  placeCorner(0)
}

Instead of using a two-dimensional array to map the 4x4 grid it is flattened into a single 16-element array.

Since each corner needs to be different, we need to track which of the 16 were already used. This could be done with a Set<Int> but that would require allocation. Since the range of values is [0,15] and we only need to store a boolean value we can once again turn to using bits in a single Int.

 fun validTables(): Sequence<IntArray> = sequence {
   val state = IntArray(16)
-  suspend fun SequenceScope<IntArray>.placeCorner(index: Int) {
+  suspend fun SequenceScope<IntArray>.placeCorner(index: Int, used: Int) {
     if (index == 16) {
       yield(state.clone())
       return
     }
     for (corner in 0 until 16) {
+      if (used.hasBit(corner)) continue
+
       // TODO validate corner fits here!

       state[index] = corner
-      placeCorner(index + 1)
+      placeCorner(index + 1, used.withBit(corner))
     }
   }
-  placeCorner(0)
+  placeCorner(0, 0)
 }
+
+fun Int.hasBit(bit: Int) = ((1 shl bit) and this) != 0
+fun Int.withBit(bit: Int) = (1 shl bit) or this

There are three constraints for placing a corner at the current index that must be validated:

If the corner is at the edge of the square, no corner segment may be present in the direction of the edge. For example, index 0, which is at the top and left edge of the 4x4, cannot be ├ because it has an up segment.

If there is a corner to the left of the current index in the 4x4 grid, this corner must have a left segment exactly when that corner has a right segment. For example, if ┐ is at index 1 then ┬ is invalid for index 2 since they do not agree about the presence of a horizontal segment.

If there is a corner above the current index in the 4x4 grid, this corner must have an up segment exactly when that corner has a down segment. For example, if ╶ is at index 0 then ├ is invalid for index 4 since they do not agree about the presence of a vertical segment.

In the same way four booleans were used as bits to create the numbers [0,15] in the first section, we can invert that operation to extract the four booleans from the numbers to perform validation.

fun Int.hasDownSegment() = (0b0001 and this) != 0
fun Int.hasUpSegment() = (0b0010 and this) != 0
fun Int.hasRightSegment() = (0b0100 and this) != 0
fun Int.hasLeftSegment() = (0b1000 and this) != 0

With these helpers we can add the validation.

     for (corner in 0 until 16) {
       if (used.hasBit(corner)) continue
 
-      // TODO validate corner fits here!
+      if (index > 11 && corner.hasDownSegment()) continue // Bottom row
+      if (index % 4 == 3 && corner.hasRightSegment()) continue // Right column
+
+      // Find the previous row and column corners so we can test if the current corner can fit at
+      // this position. Use 0 when in top row or left column since it will always be incompatible.
+      val previousRowCorner = if (index % 4 == 0) 0 else state[index - 1]
+      val previousColCorner = if (index < 4) 0 else state[index - 4]
+
+      if (previousRowCorner.hasRightSegment() != corner.hasLeftSegment()) continue // Horizontal mismatch
+      if (previousColCorner.hasDownSegment() != corner.hasUpSegment()) continue // Vertical mismatch
 
       state[index] = corner
       placeCorner(index + 1, used.withBit(corner))
     }

With no allocation and being able to quickly reject massive sets of invalid candidates this should hopefully produce results in less than an hour. Let's run it!

fun main() {
  val time = measureTimeMillis {
    validTables().forEachIndexed { index, corners ->
      val table = corners.map { " ╷╵│╶┌└├╴┐┘┤─┬┴┼".get(it) }
        .joinToString("")
        .chunked(4)
        .joinToString("\n")
      println("#${index + 1}: ${corners.contentToString()}\n$table\n")
    }
  }
  println("Done. Took $time milliseconds.")
}

Survey says?

#1: [0, 1, 4, 8, 5, 15, 12, 9, 3, 7, 13, 11, 2, 6, 14, 10]
 ╷╶╴
┌┼─┐
│├┬┤
╵└┴┘

...

#652: [5, 13, 12, 9, 7, 15, 8, 3, 6, 11, 1, 2, 4, 14, 10, 0]
┌┬─┐
├┼╴│
└┤╷╵
╶┴┘ 

Done. Took 57 milliseconds.

Considerably faster than taking hours! Only 652 valid candidates out of 20,922,789,888,000 possible permutations. You can check out the full list here.
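For anyone who wants to reproduce the run, the fragments above can be assembled into one file. This is a consolidation sketch rather than the original source: the neighbor variables are renamed for clarity, and render is a helper added here for printing.

```kotlin
const val CORNERS = " ╷╵│╶┌└├╴┐┘┤─┬┴┼"

fun Int.hasBit(bit: Int) = ((1 shl bit) and this) != 0
fun Int.withBit(bit: Int) = (1 shl bit) or this

fun Int.hasDownSegment() = (0b0001 and this) != 0
fun Int.hasUpSegment() = (0b0010 and this) != 0
fun Int.hasRightSegment() = (0b0100 and this) != 0
fun Int.hasLeftSegment() = (0b1000 and this) != 0

fun validTables(): Sequence<IntArray> = sequence {
  val state = IntArray(16)
  suspend fun SequenceScope<IntArray>.placeCorner(index: Int, used: Int) {
    if (index == 16) {
      yield(state.clone())
      return
    }
    for (corner in 0 until 16) {
      if (used.hasBit(corner)) continue
      if (index > 11 && corner.hasDownSegment()) continue // Bottom row
      if (index % 4 == 3 && corner.hasRightSegment()) continue // Right column

      // Corner 0 has no segments, so it stands in for the missing neighbor at the top and left edges.
      val leftNeighbor = if (index % 4 == 0) 0 else state[index - 1]
      val aboveNeighbor = if (index < 4) 0 else state[index - 4]
      if (leftNeighbor.hasRightSegment() != corner.hasLeftSegment()) continue // Horizontal mismatch
      if (aboveNeighbor.hasDownSegment() != corner.hasUpSegment()) continue // Vertical mismatch

      state[index] = corner
      placeCorner(index + 1, used.withBit(corner))
    }
  }
  placeCorner(0, 0)
}

// Added for this sketch: draws an arrangement as four rows of four characters.
fun render(corners: IntArray): String =
  corners.map { CORNERS[it] }.joinToString("").chunked(4).joinToString("\n")
```

Since corners are tried in ascending order, the first table yielded is the lexicographically smallest valid arrangement, which is the #1 entry shown above. Counting the whole sequence with validTables().count() should match the 652 reported in the output.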

If we look at output #1 above, this table is technically invalid since it contains an orphan corner in the upper right. There is no way to create such a corner by setting borders on table cells. However, purely from a segment-validation standpoint it is valid. Visual inspection of the candidates makes it seem like about 15-25% suffer from this case.

I'm out of time on this post, so finding the true number of valid configurations expressible by table cell borders will have to be an exercise left to the reader.

Creating Picnic was a fun rabbit hole to fall into for a few days last year. Aside from the challenges of corners, it implements the CSS specification for measuring and laying out tables and supports row and column spans, vertical and horizontal text alignment, and vertical and horizontal cell padding.

If you ever need to display a command-line table and have written an HTML table in your life it should be very approachable with Picnic.

It's actually a little more complicated than this. If none of the rows want to draw a border between the two cells in these columns then the border width will be zero and won't occupy any space. ↩

https://jakewharton.com/sixteen-corners

R8 Optimization: Lambda Groups

Apr 30, 2020 Updated Apr 30, 2020

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

Lambda usage in Kotlin feels more pervasive than Java because of the functional nature of the Kotlin standard library. Some lambdas are merely syntactic constructs that are eliminated at compile-time through the use of inline functions. The rest materialize into whole classes for use at runtime.

The mechanisms by which lambdas work were covered in the Android Java 8 support post, but here's a quick refresher:

javac hoists lambda bodies to a package-private method and writes an invoke-dynamic bytecode for the target lambda type at the call-site. The JVM spins a class at runtime of the desired type and invokes the package-private method in the method body. Android does not ship this runtime support, so D8 performs a compile-time transformation to a class which implements the desired type and which invokes the package-private method.
kotlinc skips the invoke-dynamic bytecode (even when targeting Java 8+) and generates full classes directly.

Here are two Kotlin classes and some lambda usage that we can experiment with.

class Employee(
  val id: String,
  val joined: LocalDate,
  val managerId: String?
)

class EmployeeRepository(val allEmployees: () -> Sequence<Employee>) {
  fun joinedAfter(date: LocalDate) =
      allEmployees()
          .filter { it.joined >= date }
          .toList()

  fun reports(manager: Employee) =
      allEmployees()
          .filter { it.managerId == manager.id }
          .toList()
}

The EmployeeRepository class accepts a lambda which produces a sequence of employees and exposes two functions for listing the employees who joined after a particular date and those who report to a particular employee. Both functions use a lambda to filter the sequence to the desired items before converting to a list.

Kotlin's approach to lambdas is immediately visible after compiling this class.

$ kotlinc EmployeeRepository.kt
$ ls *.class
Employee.class
EmployeeRepository.class
EmployeeRepository$joinedAfter$1.class
EmployeeRepository$reports$1.class

Each lambda has a unique name formed by joining the enclosing class name, enclosing function name, and a monotonic value.

Kotlin Lambdas and D8

To establish a baseline of what ends up in our APK, let's run these classfiles through D8.

$ java -jar $R8_HOME/build/libs/d8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --release \
      --output . \
      *.class

You can dump the whole output with dexdump -d classes.dex, but let's focus on the bodies of the joinedAfter and reports functions.

[000590] EmployeeRepository.joinedAfter:(Ljava/time/LocalDate;)Ljava/util/List;
0000: iget-object v0, v2, LEmployeeRepository;.allEmployees:Lkotlin/jvm/functions/Function0;
0002: invoke-interface {v0}, Lkotlin/jvm/functions/Function0;.invoke:()Ljava/lang/Object;
0005: move-result-object v0
0006: new-instance v1, LEmployeeRepository$joinedAfter$1;
0008: invoke-direct {v1, v3}, LEmployeeRepository$joinedAfter$1;.<init>:(Ljava/time/LocalDate;)V
000b: invoke-static {v0, v1}, Lkotlin/sequences/SequencesKt;.filter:(Lkotlin/sequences/Sequence;Lkotlin/jvm/functions/Function1;)Lkotlin/sequences/Sequence;
000e: move-result-object v0
000f: invoke-static {v0}, Lkotlin/sequences/SequencesKt;.toList:(Lkotlin/sequences/Sequence;)Ljava/util/List;
0012: move-result-object v0
0013: return-object v0

[0005dc] EmployeeRepository.reports:(LEmployee;)Ljava/util/List;
0000: iget-object v0, v2, LEmployeeRepository;.allEmployees:Lkotlin/jvm/functions/Function0;
0002: invoke-interface {v0}, Lkotlin/jvm/functions/Function0;.invoke:()Ljava/lang/Object;
0005: move-result-object v0
0006: new-instance v1, LEmployeeRepository$reports$1;
0008: invoke-direct {v1, v3}, LEmployeeRepository$reports$1;.<init>:(LEmployee;)V
000b: invoke-static {v0, v1}, Lkotlin/sequences/SequencesKt;.filter:(Lkotlin/sequences/Sequence;Lkotlin/jvm/functions/Function1;)Lkotlin/sequences/Sequence;
000e: move-result-object v0
000f: invoke-static {v0}, Lkotlin/sequences/SequencesKt;.toList:(Lkotlin/sequences/Sequence;)Ljava/util/List;
0012: move-result-object v0
0013: return-object v0

There's a lot going on here, but each function is almost identical so we can break both down at once:

0000-0005 gets the Sequence<Employee> by invoking the allEmployees lambda.
0006 creates an instance of the respective lambda class for each function.
0008 calls the lambda class constructor, passing in either the date or manager argument as the sole parameter.
000b-000e calls filter on the sequence passing in the lambda instance.
000f-0012 calls toList on the filtered sequence.
0013 returns the list.

If we looked at the lambda classes we would find each implementing the Function1 interface, having a field of type LocalDate or Employee, having a constructor which accepts a parameter and sets its field, and having an invoke method with the body of the lambda.
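Written by hand, those two classes would look roughly like the following. This is an approximation for illustration only: the class names are invented (the compiler uses the $joinedAfter$1 style names listed earlier), and the generated code goes through Function1's erased invoke(Object) rather than a typed method.

```kotlin
import java.time.LocalDate

class Employee(val id: String, val joined: LocalDate, val managerId: String?)

// Roughly what is emitted for `{ it.joined >= date }`: one captured field, one invoke method.
class JoinedAfterLambda(private val date: LocalDate) : (Employee) -> Boolean {
  override fun invoke(employee: Employee): Boolean = employee.joined >= date
}

// Roughly what is emitted for `{ it.managerId == manager.id }`.
class ReportsLambda(private val manager: Employee) : (Employee) -> Boolean {
  override fun invoke(employee: Employee): Boolean = employee.managerId == manager.id
}
```

The two classes differ only in the type of the captured value and the body of invoke, which is the similarity R8 goes on to exploit.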

D8 performs a straightforward translation of the Java bytecode into the equivalent Dalvik bytecode. It's only when we break out R8 that interesting things start to happen.

Kotlin Lambdas and R8

Since we have no actual usage of these APIs, they need to be explicitly kept or R8 will produce an empty dex file.

-keep class Employee { *; }
-keep class EmployeeRepository { *; }
-dontobfuscate

With our two classes kept, let's run R8 and see what changes.

$ java -jar $R8_HOME/build/libs/r8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --release \
      --output . \
      --pg-conf rules.txt \
      *.class kotlin-stdlib-*.jar

We can see what has changed in the bodies of the joinedAfter and reports functions.

 [000dd4] EmployeeRepository.joinedAfter:(Ljava/time/LocalDate;)Ljava/util/List;
 0000: iget-object v0, v3, LEmployeeRepository;.allEmployees:Lkotlin/jvm/functions/Function0;
 0002: invoke-interface {v0}, Lkotlin/jvm/functions/Function0;.invoke:()Ljava/lang/Object;
 0005: move-result-object v0
-0006: new-instance v1, LEmployeeRepository$joinedAfter$1;
-0008: invoke-direct {v1, v3}, LEmployeeRepository$joinedAfter$1;.<init>:(Ljava/time/LocalDate;)V
+0006: new-instance v1, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;
+0008: const/4 v2, #int 0
+0009: invoke-direct {v1, v2, v4}, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;.<init>:(ILjava/lang/Object;)V
 000d: invoke-static {v0, v1}, Lkotlin/sequences/SequencesKt;.filter:(Lkotlin/sequences/Sequence;Lkotlin/jvm/functions/Function1;)Lkotlin/sequences/Sequence;
 0010: move-result-object v0
 0011: invoke-static {v0}, Lkotlin/sequences/SequencesKt;.toList:(Lkotlin/sequences/Sequence;)Ljava/util/List;
 0014: move-result-object v0
 0015: return-object v0

 [000e34] EmployeeRepository.reports:(LEmployee;)Ljava/util/List;
 0000: iget-object v0, v3, LEmployeeRepository;.allEmployees:Lkotlin/jvm/functions/Function0;
 0002: invoke-interface {v0}, Lkotlin/jvm/functions/Function0;.invoke:()Ljava/lang/Object;
 0005: move-result-object v0
-0006: new-instance v1, LEmployeeRepository$reports$1;
-0008: invoke-direct {v1, v3}, LEmployeeRepository$reports$1;.<init>:(LEmployee;)V
+0006: new-instance v1, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;
+0008: const/4 v2, #int 1
+0009: invoke-direct {v1, v2, v4}, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;.<init>:(ILjava/lang/Object;)V
 000d: invoke-static {v0, v1}, Lkotlin/sequences/SequencesKt;.filter:(Lkotlin/sequences/Sequence;Lkotlin/jvm/functions/Function1;)Lkotlin/sequences/Sequence;
 0010: move-result-object v0
 0011: invoke-static {v0}, Lkotlin/sequences/SequencesKt;.toList:(Lkotlin/sequences/Sequence;)Ljava/util/List;
 0014: move-result-object v0
 0015: return-object v0

Let's break down the new bytecode:

0006 creates an instance of a class named -$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA. And, notably, both functions are creating an instance of the same class now.
0008 stores an integer value of 0 for joinedAfter and 1 for reports.
0009 calls the class constructor and passes the integer and either the date or manager (but as an Object).

Both functions are now instantiating the same class for their lambda. Let's peek at that class.

Class #15            -
  Class descriptor  : 'L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Interfaces        -
    #0              : 'Lkotlin/jvm/functions/Function1;'
  Instance fields   -
    #0              : (in L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;)
      name          : '$capture$0'
      type          : 'Ljava/lang/Object;'
      access        : 0x1011 (PUBLIC FINAL SYNTHETIC)
    #1              : (in L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;)
      name          : '$id$'
      type          : 'I'
      access        : 0x1011 (PUBLIC FINAL SYNTHETIC)
  Direct methods    -
    #0              : (in L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;)
      name          : '<init>'
      type          : '(ILjava/lang/Object;)V'
      access        : 0x10001 (PUBLIC CONSTRUCTOR)
      code          -
[000db0] -$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA.<init>:(ILjava/lang/Object;)V
0000: iput v1, v0, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;.$id$:I
0002: iput-object v2, v0, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;.$capture$0:Ljava/lang/Object;
0004: return-void

This output tells us that the class implements the Function1 interface, has two fields (an object and an integer id), and has a constructor which accepts an integer and an object and assigns them to the two fields.

Now let's look at the implementation of its invoke function.

[000d14] -$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
0000: iget v0, v4, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;.$id$:I
0002: iget-object v1, v4, L-$$LambdaGroup$ks$D2r6uJKXMyXfodlTO7Kw1WcCloA;.$capture$0:Ljava/lang/Object;
0004: if-eqz v0, 002c
0006: const/4 v2, #int 1
0007: if-ne v0, v2, 002a

000a: check-cast v1, LEmployee;
 ⋮
0029: return-object v5

002a: const/4 v5, #int 0
002b: throw v5

002c: check-cast v1, Ljava/time/LocalDate;
 ⋮
0044: return-object v5

I've trimmed quite a lot, but let's break it down:

0000 loads the integer id value from the field.
0002 loads the object value from the field.
0004 checks if the id is zero and if so jumps to 002c.
0006-0007 checks if the id is not one and if so jumps to 002a.
000a-0029 casts the object to Employee and runs the code from the reports lambda body. Remember, this codepath is taken if the previous comparison of id != 1 fails.
002a-002b throws a null reference, which causes a NullPointerException. Remember, this codepath is taken if id is not zero and not one.
002c-0044 casts the object to LocalDate and runs the code from the joinedAfter lambda body. Remember, this codepath is taken if id is zero.

It can be hard to follow exactly what this transformation means solely by looking at Dalvik bytecode. We can make the equivalent transformation in source code to illustrate it more clearly.

 class EmployeeRepository(val allEmployees: () -> Sequence<Employee>) {
   fun joinedAfter(date: LocalDate) =
       allEmployees()
-          .filter { it.joined >= date }
+          .filter(MyLambdaGroup(date, 0))
           .toList()

   fun reports(manager: Employee) =
       allEmployees()
-          .filter { it.managerId == manager.id }
+          .filter(MyLambdaGroup(manager, 1))
           .toList()
 }
+
+private class MyLambdaGroup(
+  private val capture0: Any?,
+  private val id: Int
+) : (Employee) -> Boolean {
+  override fun invoke(employee: Employee): Boolean {
+    return when (id) {
+      0 -> employee.joined >= (capture0 as LocalDate)
+      1 -> employee.managerId == (capture0 as Employee).id
+      else -> throw NullPointerException()
+    }
+  }
+}

The two lambdas which would have produced two classes have been replaced by a single class with an integer discriminator for its behavior. By merging the bodies of the lambdas, the number of classes in the APK can be reduced.

This only works because the two lambdas have the same shape. They do not need to be exactly the same as we can see in our example. One lambda captures a LocalDate but the other captures an Employee. Since both only capture a single value they have the same shape and can be merged into this single "lambda group" class.
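To watch the merged shape run, the same idea can be hand-written in plain Java. This is a minimal sketch under stated assumptions, not R8 output: `LambdaGroupSketch`, `MergedLambda`, and the trimmed-down `Employee` are hypothetical names, and `java.util.function.Predicate` stands in for Kotlin's `Function1` so the example needs no Kotlin stdlib.

```java
import java.time.LocalDate;
import java.util.function.Predicate;

final class LambdaGroupSketch {
  // Hypothetical stand-in for the article's Employee: only the fields the lambdas read.
  static final class Employee {
    final int id;
    final int managerId;
    final LocalDate joined;

    Employee(int id, int managerId, LocalDate joined) {
      this.id = id;
      this.managerId = managerId;
      this.joined = joined;
    }
  }

  // One class for both lambdas: a single captured value plus an integer discriminator.
  static final class MergedLambda implements Predicate<Employee> {
    private final int id;
    private final Object capture0;

    MergedLambda(int id, Object capture0) {
      this.id = id;
      this.capture0 = capture0;
    }

    @Override
    public boolean test(Employee employee) {
      switch (id) {
        case 0: // body of the joinedAfter lambda
          return employee.joined.compareTo((LocalDate) capture0) >= 0;
        case 1: // body of the reports lambda
          return employee.managerId == ((Employee) capture0).id;
        default: // unreachable for ids 0 and 1, mirroring the throw in the bytecode
          throw new NullPointerException();
      }
    }
  }

  static Predicate<Employee> joinedAfter(LocalDate date) {
    return new MergedLambda(0, date);
  }

  static Predicate<Employee> reports(Employee manager) {
    return new MergedLambda(1, manager);
  }

  public static void main(String[] args) {
    Employee boss = new Employee(1, 0, LocalDate.of(2015, 3, 1));
    Employee report = new Employee(2, 1, LocalDate.of(2019, 6, 1));
    System.out.println(joinedAfter(LocalDate.of(2018, 1, 1)).test(report)); // true
    System.out.println(reports(boss).test(report)); // true
    System.out.println(reports(report).test(boss)); // false
  }
}
```

Both predicates are instances of the same class and differ only in the id and the captured object, which is the property the merge relies on.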

Java Lambdas and R8

Let's rewrite our repository in Java and see what happens.

final class EmployeeRepository {
  private final Function0<Sequence<Employee>> allEmployees;

  EmployeeRepository(Function0<Sequence<Employee>> allEmployees) {
    this.allEmployees = allEmployees;
  }

  List<Employee> joinedAfter(LocalDate date) {
    return SequencesKt.toList(
      SequencesKt.filter(
          allEmployees.invoke(),
          e -> e.getJoined().compareTo(date) >= 0));
  }

  List<Employee> reports(Employee manager) {
    return SequencesKt.toList(
      SequencesKt.filter(
          allEmployees.invoke(),
          e -> Objects.equals(e.getManagerId(), manager.getId())));
  }
}

We're using Kotlin's Function0 instead of Supplier, Sequence instead of Stream, and sequence extensions as static helpers to keep the two examples as close to each other as possible. We can compile with javac and reuse the same R8 invocation.

$ rm EmployeeRepository*.class
$ javac -cp . EmployeeRepository.java
$ java -jar $R8_HOME/build/libs/r8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --release \
      --output . \
      --pg-conf rules.txt \
      *.class kotlin-stdlib-*.jar

The joinedAfter and reports function bodies should look the same as when they were written in Kotlin, right?

[000d2c] EmployeeRepository.joinedAfter:(Ljava/time/LocalDate;)Ljava/util/List;
 ⋮
0008: new-instance v1, L-$$Lambda$EmployeeRepository$RwNrgP_DBeZWqltgaXgoLCrPfqI;
000a: invoke-direct {v1, v4}, L-$$Lambda$EmployeeRepository$RwNrgP_DBeZWqltgaXgoLCrPfqI;.<init>:(Ljava/time/LocalDate;)V
 ⋮

[000d80] EmployeeRepository.reports:(LEmployee;)Ljava/util/List;
 ⋮
0008: new-instance v1, L-$$Lambda$EmployeeRepository$JjZ4a6TbrR3768PIUyNflFlLVF8;
000a: invoke-direct {v1, v4}, L-$$Lambda$EmployeeRepository$JjZ4a6TbrR3768PIUyNflFlLVF8;.<init>:(LEmployee;)V
 ⋮

They do not! Each implementation is calling into its own lambda class rather than using a lambda group.

As far as I can tell, there's no technical limitation as to why this would only work for Kotlin lambdas but not Java lambdas. The work just hasn't been done yet. Issue 153773246 tracks adding support for merging Java lambdas into lambda groups.

By merging lambdas of the same shape together, R8 reduces the APK size impact and runtime classloading burden at the expense of increasing the method body of the lambda.

While the optimization does run on the entire app, by default merging will only occur within a package. This ensures any package-private methods or types used in the lambda body are accessible. Add the -allowaccessmodification directive to your shrinker rules to enable R8 to globally merge lambdas by increasing the visibility of referenced methods and types when needed.

You may have noticed that the names of the classes generated for Java lambdas and lambda groups appear to have some kind of hash in them. In the next post we're going to dig into the unique naming of these classes.

https://jakewharton.com/r8-optimization-lambda-groups

Which is better on Android: divide by 2 or shift by 1?

Apr 23, 2020 Updated Apr 23, 2020

I've been porting the AndroidX collection library to Kotlin multiplatform to experiment with binary compatibility, performance, tooling, and the different memory models. Some of the data structures in the library use array-based binary trees to store elements. The Java code has a lot of shifts to replace power-of-two multiplications and divides. When ported to Kotlin, these turn into the slightly-awkward infix operators which further obfuscate the intent of the code.

I sampled a few people about bitwise shifts vs. multiplication/division and many had heard anecdotal claims of shifts having better performance, but everyone remained skeptical of whether it was true. Some assumed that one of the compilers the code passes through before it runs on a CPU would handle optimizing this case.

In an effort to satisfy my curiosity (and partially to avoid Kotlin's infix bitwise operators) I set out to answer which is better and some other related questions. Let's go!

Does anyone optimize this?

There are three major compilers that code passes through before it hits the CPU: javac/kotlinc, D8/R8, and ART.

Each of these has the opportunity to optimize. But do they?

javac

class Example {
  static int multiply(int value) {
    return value * 2;
  }
  static int divide(int value) {
    return value / 2;
  }
  static int shiftLeft(int value) {
    return value << 1;
  }
  static int shiftRight(int value) {
    return value >> 1;
  }
}

This Java can be compiled with javac from JDK 14 and the resulting bytecode can be displayed with javap.

$ javac Example.java
$ javap -c Example
Compiled from "Example.java"
class Example {
  static int multiply(int);
    Code:
       0: iload_0
       1: iconst_2
       2: imul
       3: ireturn

  static int divide(int);
    Code:
       0: iload_0
       1: iconst_2
       2: idiv
       3: ireturn

  static int shiftLeft(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishl
       3: ireturn

  static int shiftRight(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishr
       3: ireturn
}

Every method starts with iload_0 which loads the first argument value. The multiply and divide methods both then have iconst_2 which loads the constant value 2. Each then runs imul or idiv to perform integer multiplication or integer division, respectively. The shift methods load the constant value 1 before ishl or ishr which is an integer shift left or integer shift right, respectively.

No optimization here, but if you know anything about Java this isn't unexpected. javac isn't an optimizing compiler and leaves the majority of the work to its runtime compilers on the JVM or ahead-of-time compilers.

kotlinc

fun multiply(value: Int) = value * 2
fun divide(value: Int) = value / 2
fun shiftLeft(value: Int) = value shl 1
fun shiftRight(value: Int) = value shr 1

The Kotlin is compiled to Java bytecode with kotlinc from Kotlin 1.4-M1 where the javap tool can once again be used.

$ kotlinc Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
  public static final int multiply(int);
    Code:
       0: iload_0
       1: iconst_2
       2: imul
       3: ireturn

  public static final int divide(int);
    Code:
       0: iload_0
       1: iconst_2
       2: idiv
       3: ireturn

  public static final int shiftLeft(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishl
       3: ireturn

  public static final int shiftRight(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishr
       3: ireturn
}

Exactly the same output as Java. This is using the original JVM backend of Kotlin, but using the forthcoming IR-based backend (via -Xuse-ir) also produces the same output.

D8

We'll use the Java bytecode output from the Kotlin example as input to the latest D8 built from master (SHA 2a2bf622d at the time of writing).

$ java -jar $R8_HOME/build/libs/d8.jar \
      --release \
      --output . \
      ExampleKt.class
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '035'
Class #0            -
  Class descriptor  : 'LExampleKt;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LExampleKt;)
      name          : 'divide'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000118:                              |[000118] ExampleKt.divide:(I)I
000128: db00 0102                    |0000: div-int/lit8 v0, v1, #int 2 // #02
00012c: 0f00                         |0002: return v0

    #1              : (in LExampleKt;)
      name          : 'multiply'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000130:                              |[000130] ExampleKt.multiply:(I)I
000140: da00 0102                    |0000: mul-int/lit8 v0, v1, #int 2 // #02
000144: 0f00                         |0002: return v0

    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000148:                              |[000148] ExampleKt.shiftLeft:(I)I
000158: e000 0101                    |0000: shl-int/lit8 v0, v1, #int 1 // #01
00015c: 0f00                         |0002: return v0

    #3              : (in LExampleKt;)
      name          : 'shiftRight'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000160:                              |[000160] ExampleKt.shiftRight:(I)I
000170: e100 0101                    |0000: shr-int/lit8 v0, v1, #int 1 // #01
000174: 0f00                         |0002: return v0

(Note: output slightly trimmed)

Dalvik bytecode is register-based instead of stack-based like Java bytecode. As a result, each method only has one real bytecode which does the associated integer operation. Each uses the v1 register which will be the first argument value and an integer literal of 2 or 1.

So no change in behavior, but D8 isn't an optimizing compiler (although it can do method-local optimization).

R8

To run R8 we need to define a rule in order to keep our methods from being removed.

-keep,allowoptimization class ExampleKt {
  <methods>;
}

The rules are passed with --pg-conf and we also supply the Android APIs to link against using --lib.

$ java -jar $R8_HOME/build/libs/r8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --release \
      --pg-conf rules.txt \
      --output . \
      ExampleKt.class
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '035'
Class #0            -
  Class descriptor  : 'LExampleKt;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LExampleKt;)
      name          : 'divide'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000118:                              |[000118] ExampleKt.divide:(I)I
000128: db00 0102                    |0000: div-int/lit8 v0, v1, #int 2 // #02
00012c: 0f00                         |0002: return v0

    #1              : (in LExampleKt;)
      name          : 'multiply'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000130:                              |[000130] ExampleKt.multiply:(I)I
000140: da00 0102                    |0000: mul-int/lit8 v0, v1, #int 2 // #02
000144: 0f00                         |0002: return v0

    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000148:                              |[000148] ExampleKt.shiftLeft:(I)I
000158: e000 0101                    |0000: shl-int/lit8 v0, v1, #int 1 // #01
00015c: 0f00                         |0002: return v0

    #3              : (in LExampleKt;)
      name          : 'shiftRight'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000160:                              |[000160] ExampleKt.shiftRight:(I)I
000170: e100 0101                    |0000: shr-int/lit8 v0, v1, #int 1 // #01
000174: 0f00                         |0002: return v0

Same exact output as D8.

ART

We'll use the Dalvik bytecode output from the R8 example as the input to ART running on Android 10 on an x86 emulator.

$ adb push classes.dex /sdcard/classes.dex
$ adb shell
generic_x86:/ $ su
generic_x86:/ # dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
generic_x86:/ # oatdump --oat-file=/sdcard/classes.oat
OatDexFile:
0: LExampleKt; (offset=0x000003c0) (type_idx=1) (Initialized) (OatClassAllCompiled)
  0: int ExampleKt.divide(int) (dex_method_idx=0)
    CODE: (code_offset=0x00001010 size_offset=0x0000100c size=15)...
      0x00001010:     89C8      mov eax, ecx
      0x00001012:   8D5001      lea edx, [eax + 1]
      0x00001015:     85C0      test eax, eax
      0x00001017:   0F4DD0      cmovnl/ge edx, eax
      0x0000101a:     D1FA      sar edx
      0x0000101c:     89D0      mov eax, edx
      0x0000101e:       C3      ret
  1: int ExampleKt.multiply(int) (dex_method_idx=1)
    CODE: (code_offset=0x00001030 size_offset=0x0000102c size=5)...
      0x00001030:     D1E1      shl ecx
      0x00001032:     89C8      mov eax, ecx
      0x00001034:       C3      ret
  2: int ExampleKt.shiftLeft(int) (dex_method_idx=2)
    CODE: (code_offset=0x00001030 size_offset=0x0000102c size=5)...
      0x00001030:     D1E1      shl ecx
      0x00001032:     89C8      mov eax, ecx
      0x00001034:       C3      ret
  3: int ExampleKt.shiftRight(int) (dex_method_idx=3)
    CODE: (code_offset=0x00001040 size_offset=0x0000103c size=5)...
      0x00001040:     D1F9      sar ecx
      0x00001042:     89C8      mov eax, ecx
      0x00001044:       C3      ret

(Note: output significantly trimmed)

The x86 assembly reveals that ART has indeed stepped in and normalized the arithmetic operations to use shifts!

First, multiply and shiftLeft now have the exact same implementation. They both use shl for a left bitwise shift of 1. Beyond this, if you look at the offsets in the file (the leftmost column), they are actually the same. ART has recognized these functions have the same body when compiled into x86 assembly and has de-duplicated them.

Next, while divide and shiftRight are not the same, they do share the use of sar for a right bitwise shift of 1. The four additional instructions in divide that precede sar handle the case when the input is negative by adding 1 to the value1.

Running the same commands on a Pixel 4 running Android 10 shows how ART compiles this code to ARM assembly2.

OatDexFile:
0: LExampleKt; (offset=0x000005a4) (type_idx=1) (Verified) (OatClassAllCompiled)
  0: int ExampleKt.divide(int) (dex_method_idx=0)
    CODE: (code_offset=0x00001009 size_offset=0x00001004 size=10)...
      0x00001008: 0fc8      lsrs r0, r1, #31
      0x0000100a: 1841      adds r1, r0, r1
      0x0000100c: 1049      asrs r1, #1
      0x0000100e: 4608      mov r0, r1
      0x00001010: 4770      bx lr
  1: int ExampleKt.multiply(int) (dex_method_idx=1)
    CODE: (code_offset=0x00001021 size_offset=0x0000101c size=4)...
      0x00001020: 0048      lsls r0, r1, #1
      0x00001022: 4770      bx lr
  2: int ExampleKt.shiftLeft(int) (dex_method_idx=2)
    CODE: (code_offset=0x00001021 size_offset=0x0000101c size=4)...
      0x00001020: 0048      lsls r0, r1, #1
      0x00001022: 4770      bx lr
  3: int ExampleKt.shiftRight(int) (dex_method_idx=3)
    CODE: (code_offset=0x00001031 size_offset=0x0000102c size=4)...
      0x00001030: 1048      asrs r0, r1, #1
      0x00001032: 4770      bx lr

Once again multiply and shiftLeft both use lsls for a left shift and were de-duplicated and shiftRight uses asrs for a right shift. divide is also using asrs for its right shift, but it uses another right shift, lsrs, to handle adding 1 for negative values3.
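The two rounding fix-ups can be checked against plain division in ordinary Java. This is a small sketch, assuming the hypothetical names `DivideByTwo`, `x86Style`, and `armStyle`; the expressions are the source-level equivalents of the instruction sequences described above, not anything ART emits as Java.

```java
final class DivideByTwo {
  // x86 sequence: use value + 1 when the input is negative, then arithmetic shift right.
  static int x86Style(int value) {
    return (value < 0 ? value + 1 : value) >> 1;
  }

  // ARM sequence: a logical shift by 31 yields 1 for negatives and 0 otherwise,
  // which is added to the input before the arithmetic shift right.
  static int armStyle(int value) {
    return (value + (value >>> 31)) >> 1;
  }

  public static void main(String[] args) {
    int[] samples = {0, 1, -1, 3, -3, 4, -4, Integer.MAX_VALUE, Integer.MIN_VALUE};
    for (int value : samples) {
      int expected = value / 2; // Java division truncates toward zero
      if (x86Style(value) != expected || armStyle(value) != expected) {
        throw new AssertionError("Mismatch for " + value);
      }
    }
    // A bare shift rounds toward negative infinity instead, hence the fix-up.
    System.out.println(-3 / 2);  // -1
    System.out.println(-3 >> 1); // -2
  }
}
```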

With this we can now definitively say that replacing value * 2 with value << 1 offers no benefit. Stop doing it for arithmetic operations and reserve it only for strictly bitwise things!
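The two forms also agree at the language level, including when the multiplication overflows, since both wrap to 32 bits. A quick sketch to confirm this (the class name `MultiplyShiftCheck` is made up for illustration):

```java
final class MultiplyShiftCheck {
  static int multiply(int value) {
    return value * 2;
  }

  static int shiftLeft(int value) {
    return value << 1;
  }

  public static void main(String[] args) {
    // Includes the extremes, where value * 2 overflows and wraps around.
    int[] samples = {0, 1, -1, 7, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
    for (int value : samples) {
      if (multiply(value) != shiftLeft(value)) {
        throw new AssertionError("Mismatch for " + value);
      }
    }
    System.out.println("value * 2 == value << 1 for every sample");
  }
}
```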

However, value / 2 and value >> 1 still produce different assembly instructions and thus presumably have different performance characteristics. Thankfully, doing value / 2 avoids using generic division and is still primarily based on right shift, so they're likely not that far apart in terms of performance.

Is shift faster than division?

To determine whether a divide or shift is faster we can use the Jetpack benchmark library.

class DivideOrShiftTest {
  @JvmField @Rule val benchmark = BenchmarkRule()

  @Test fun divide() {
    val value = "4".toInt() // Ensure not a constant.
    var result = 0
    benchmark.measureRepeated {
      result = value / 2
    }
    println(result) // Ensure D8 keeps computation.
  }

  @Test fun shift() {
    val value = "4".toInt() // Ensure not a constant.
    var result = 0
    benchmark.measureRepeated {
      result = value shr 1
    }
    println(result) // Ensure D8 keeps computation.
  }
}

I don't have any x86 devices but I do have an ARM-based Pixel 3 running Android 10. Here are the results:

android.studio.display.benchmark=4 ns DivideOrShiftTest.divide
count=4006
mean=4
median=4
min=4
standardDeviation=0

android.studio.display.benchmark=3 ns DivideOrShiftTest.shift
count=3943
mean=3
median=3
min=3
standardDeviation=0

There's effectively zero difference between using division versus a shift with numbers this small. Those are nanoseconds, after all. Using a negative number shows no difference in the result.

With this we can now definitively say that replacing value / 2 with value >> 1 offers no benefit. Stop doing it for arithmetic operations and reserve it only for strictly bitwise things!

Can D8/R8 use this information to save APK size?

Given two different ways to express the same operations we should choose the one that has the better performance. But if both have the same performance, we should choose whichever results in a smaller APK size.

We know that value * 2 and value << 1 produce the same assembly from ART. Thus, if one is more space-efficient than the other in Dalvik bytecode we should unconditionally rewrite it into the smaller form. Looking at the output from D8 these produce the same size bytecode:

    #1              : (in LExampleKt;)
      name          : 'multiply'
      ⋮
000140: da00 0102                    |0000: mul-int/lit8 v0, v1, #int 2 // #02

    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      ⋮
000158: e000 0101                    |0000: shl-int/lit8 v0, v1, #int 1 // #01

While there are no gains to be had for this power of 2, the multiplication runs out of bytecode space before the shift for storing the literal value. Here's value * 32_768 compared to value << 15:

    #1              : (in LExampleKt;)
      name          : 'multiply'
      ⋮
000128: 1400 0080 0000               |0000: const v0, #float 0.000000 // #00008000
00012e: 9201 0100                    |0003: mul-int v1, v1, v0

    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      ⋮
00015c: e000 000f                    |0000: shl-int/lit8 v0, v0, #int 15 // #0f

I have filed an issue on D8 to investigate optimizing this automatically, but I strongly suspect the cases where it applies to be near zero so it's likely not worthwhile.
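The cutoff follows from how Dalvik embeds literals: the lit8 and lit16 arithmetic forms carry a signed 8-bit or 16-bit constant, while a shift distance is at most 31 and so always fits lit8. The sketch below only classifies which form a multiplier fits. The helper names are hypothetical, it does not call D8, and it ignores any shorter const encodings D8 might pick for particular values.

```java
final class DalvikLiteralFit {
  // True if the literal fits the signed 8-bit slot of a binop/lit8 instruction.
  static boolean fitsLit8(int literal) {
    return literal >= Byte.MIN_VALUE && literal <= Byte.MAX_VALUE;
  }

  // True if the literal fits the signed 16-bit slot of a binop/lit16 instruction.
  static boolean fitsLit16(int literal) {
    return literal >= Short.MIN_VALUE && literal <= Short.MAX_VALUE;
  }

  // Names the multiplication form the literal fits, mirroring the listings above.
  static String multiplyEncoding(int literal) {
    if (fitsLit8(literal)) return "mul-int/lit8";
    if (fitsLit16(literal)) return "mul-int/lit16";
    return "const + mul-int"; // literal needs its own register load first
  }

  public static void main(String[] args) {
    for (int shift : new int[] {1, 7, 14, 15}) {
      int literal = 1 << shift;
      System.out.println("value * " + literal + " -> " + multiplyEncoding(literal)
          + ", value << " + shift + " -> shl-int/lit8");
    }
  }
}
```

Only the last sample, 32,768, falls out of the literal forms, which matches the two-instruction multiply in the listing.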

The output of D8 and R8 also tells us that value / 2 and value >> 1 cost the same in terms of Dalvik bytecode.

    #0              : (in LExampleKt;)
      name          : 'divide'
      ⋮
000128: db00 0102                    |0000: div-int/lit8 v0, v1, #int 2 // #02

    #3              : (in LExampleKt;)
      name          : 'shiftRight'
      ⋮
000170: e100 0101                    |0000: shr-int/lit8 v0, v1, #int 1 // #01

These will also diverge in bytecode size when the literal reaches 32,768. Unconditionally replacing a power-of-two division with a right shift is never safe because of the behavior around negatives. We could do the replacement if the value was guaranteed to be non-negative, but D8 and R8 do not track the possible ranges of integer values at this time.

Does unsigned number power-of-two division use shift?

Java bytecode lacks unsigned numbers, but you can emulate them by using the signed counterparts. In Java there are static helper methods for operating on signed types as unsigned values. Kotlin offers types like UInt which do similar things but are completely abstracted behind a type. It's conceivable then that division by a power of two could be rewritten as a shift.

We can use Kotlin to model both of these cases.

fun javaLike(value: Int) = Integer.divideUnsigned(value, 2)
fun kotlinLike(value: UInt) = value / 2U

There are a few cases to look at just with how the code is compiled. We'll start with plain kotlinc (again with Kotlin 1.4-M1).

$ kotlinc Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
  public static final int javaLike(int);
    Code:
       0: iload_0
       1: iconst_2
       2: invokestatic  #12       // Method java/lang/Integer.divideUnsigned:(II)I
       5: ireturn

  public static final int kotlinLike-WZ4Q5Ns(int);
    Code:
       0: iload_0
       1: istore_1
       2: iconst_2
       3: istore_2
       4: iconst_0
       5: istore_3
       6: iload_1
       7: iload_2
       8: invokestatic  #20       // Method kotlin/UnsignedKt."uintDivide-J1ME1BU":(II)I
      11: ireturn
}

Kotlin does not recognize this as a power-of-two division where it could use the iushr bytecode. I've filed KT-38493 to track adding this behavior.
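The rewrite would be safe because unsigned values have no negative case to round: unsigned division by two and an unsigned right shift by one give the same result for every 32-bit input. A short sketch to check this on the edge cases (the class and method names are hypothetical):

```java
final class UnsignedDivideCheck {
  static int viaHelper(int value) {
    return Integer.divideUnsigned(value, 2);
  }

  static int viaShift(int value) {
    return value >>> 1; // the operation the iushr bytecode performs
  }

  public static void main(String[] args) {
    // -1 is 4294967295 when read as unsigned; MIN_VALUE is 2147483648.
    int[] samples = {0, 1, 2, 3, -1, -2, Integer.MAX_VALUE, Integer.MIN_VALUE};
    for (int value : samples) {
      if (viaHelper(value) != viaShift(value)) {
        throw new AssertionError("Mismatch for " + Integer.toUnsignedString(value));
      }
    }
    System.out.println(Integer.toUnsignedString(-1) + " / 2 = " + viaShift(-1)); // 4294967295 / 2 = 2147483647
  }
}
```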

Using -Xuse-ir doesn't change anything (except remove some of the load/store noise). However, targeting Java 8 does.

$ kotlinc -jvm-target 1.8 Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
  public static final int javaLike(int);
    Code:
       0: iload_0
       1: iconst_2
       2: invokestatic  #12       // Method java/lang/Integer.divideUnsigned:(II)I
       5: ireturn

  public static final int kotlinLike-WZ4Q5Ns(int);
    Code:
       0: iload_0
       1: iconst_2
       2: invokestatic  #12       // Method java/lang/Integer.divideUnsigned:(II)I
       5: ireturn
}

The Integer.divideUnsigned method is available as of Java 8 so it's preferred when targeting 1.8 or newer. Since this makes both function bodies identical, let's revert back to the old output just to see what happens with it in comparison.

Next up is R8. Notably different from when it was invoked above is that we include the Kotlin stdlib as an input and we also pass --min-api 24 since Integer.divideUnsigned is only available on API 24 and newer.

$ java -jar $R8_HOME/build/libs/r8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --min-api 24 \
      --release \
      --pg-conf rules.txt \
      --output . \
      ExampleKt.class kotlin-stdlib.jar
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '039'
Class #0            -
  Class descriptor  : 'LExampleKt;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LExampleKt;)
      name          : 'javaLike'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
0000f8:                              |[0000f8] ExampleKt.javaLike:(I)I
000108: 1220                         |0000: const/4 v0, #int 2 // #2
00010a: 7120 0200 0100               |0001: invoke-static {v1, v0}, Ljava/lang/Integer;.divideUnsigned:(II)I // method@0002
000110: 0a01                         |0004: move-result v1
000112: 0f01                         |0005: return v1

    #1              : (in LExampleKt;)
      name          : 'kotlinLike-WZ4Q5Ns'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000114:                              |[000114] ExampleKt.kotlinLike-WZ4Q5Ns:(I)I
000124: 8160                         |0000: int-to-long v0, v6
000126: 1802 ffff ffff 0000 0000     |0001: const-wide v2, #double 0.000000 // #00000000ffffffff
000130: c020                         |0006: and-long/2addr v0, v2
000132: 1226                         |0007: const/4 v6, #int 2 // #2
000134: 8164                         |0008: int-to-long v4, v6
000136: c042                         |0009: and-long/2addr v2, v4
000138: be20                         |000a: div-long/2addr v0, v2
00013a: 8406                         |000b: long-to-int v6, v0
00013c: 0f06                         |000c: return v6

Kotlin has its own unsigned integer division implementation which was inlined into our function. It converts the input argument and the literal to longs, performs long division, and then converts back to int. When we eventually run them through ART they're just translated to equivalent x86 so we're going to leave this function behind. The opportunity for optimization here was already missed.
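The widen, mask, divide, narrow sequence in that bytecode can be written out in Java for comparison. This is a sketch of the technique as it appears in the dexdump, not the Kotlin stdlib source, and `uintDivide` here is a hypothetical Java method that only borrows the name.

```java
final class LongBasedUnsignedDivide {
  // Widen both operands to long, mask off the sign extension, divide, then narrow.
  static int uintDivide(int dividend, int divisor) {
    long unsignedDividend = dividend & 0xFFFFFFFFL;
    long unsignedDivisor = divisor & 0xFFFFFFFFL;
    return (int) (unsignedDividend / unsignedDivisor);
  }

  public static void main(String[] args) {
    int[] samples = {0, 1, 5, -1, -5, Integer.MAX_VALUE, Integer.MIN_VALUE};
    for (int value : samples) {
      // The JDK helper is the reference result for each sample.
      if (uintDivide(value, 2) != Integer.divideUnsigned(value, 2)) {
        throw new AssertionError("Mismatch for " + Integer.toUnsignedString(value));
      }
    }
    System.out.println("long-based division matches Integer.divideUnsigned");
  }
}
```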

For the Java version, R8 failed to replace the divideUnsigned call with a shift. I've filed issue 154712996 to track this for D8 and R8.

The last opportunity to optimize this case is ART.

$ adb push classes.dex /sdcard/classes.dex
$ adb shell
generic_x86:/ $ su
generic_x86:/ # dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
generic_x86:/ # oatdump --oat-file=/sdcard/classes.oat
OatDexFile:
0: LExampleKt; (offset=0x000003c0) (type_idx=1) (Initialized) (OatClassAllCompiled)
  0: int ExampleKt.javaLike(int) (dex_method_idx=0)
    CODE: (code_offset=0x00001010 size_offset=0x0000100c size=63)...
      0x00001010:         85842400E0FFFF             test eax, [esp + -8192]
        StackMap[0] (native_pc=0x1017, dex_pc=0x0, register_mask=0x0, stack_mask=0b)
      0x00001017:                     55             push ebp
      0x00001018:                 83EC18             sub esp, 24
      0x0000101b:                 890424             mov [esp], eax
      0x0000101e:     6466833D0000000000             cmpw fs:[0x0], 0  ; state_and_flags
      0x00001027:           0F8519000000             jnz/ne +25 (0x00001046)
      0x0000102d:             E800000000             call +0 (0x00001032)
      0x00001032:                     5D             pop ebp
      0x00001033:             BA02000000             mov edx, 2
      0x00001038:           8B85CE0F0000             mov eax, [ebp + 4046]
      0x0000103e:                 FF5018             call [eax + 24]
        StackMap[1] (native_pc=0x1041, dex_pc=0x1, register_mask=0x0, stack_mask=0b)
      0x00001041:                 83C418             add esp, 24
      0x00001044:                     5D             pop ebp
      0x00001045:                     C3             ret
      0x00001046:         64FF15E0020000             call fs:[0x2e0]  ; pTestSuspend
        StackMap[2] (native_pc=0x104d, dex_pc=0x0, register_mask=0x0, stack_mask=0b)
      0x0000104d:                   EBDE             jmp -34 (0x0000102d)
  1: int ExampleKt.kotlinLike-WZ4Q5Ns(int) (dex_method_idx=1)
    CODE: (code_offset=0x00001060 size_offset=0x0000105c size=67)...
      ⋮

ART does not intrinsify calls to divideUnsigned so instead we get the machinery to jump to the regular method implementation. I filed issue 154693569 to track adding the ART intrinsics for unsigned divide.

Well that certainly was a journey. Congrats if you made it this far (or just scrolled to the bottom). Let's summarize:

ART rewrites power-of-two multiplication to left shift and power-of-two division to right shift (with a few extra instructions to handle negatives).
There is no observable performance difference between a right shift and power-of-two division.
There is no size difference in Dalvik bytecode between shifts and multiply/divide.
Nobody optimizes unsigned division (yet), but you're probably not using it anyway.

With these facts we can answer the title of this post:

Which is better on Android: divide by 2 or shift by 1?

Neither! So use division for arithmetic and only use shifts for actual bitwise operations. I'll be switching the AndroidX collection port from shifts to multiply and divide. See you next time.

-3 in binary is 0b11111101. If we attempt to divide by 2 by solely performing the right shift the result is 0b11111110 which is -2, an incorrect result. By adding 1 to -3 first we get -2 which in binary is 0b11111110. Shifted right we get 0b11111111 which is -1, the correct result.

In terms of the actual instructions:
- mov eax, ecx saves the original input argument value in eax.
- lea edx, [eax + 1] adds 1 to the input argument and stores the result in edx, the register we will be shifting.
- test eax, eax does a bitwise AND of the input argument against itself which results in a few CPU flags being set based on properties of the input argument.
- cmovnl/ge edx, eax then maybe overwrites edx (value+1) with eax (value) based on the result of the test.
From there the instructions do a normal right shift. This is basically equivalent to (value < 0 ? value + 1 : value) >> 1. ↩
Thanks to Sergey Vasilinets for providing this. dex2oat can only be run as root on modern Android versions so a normal Android install such as on my Pixel 3 can't run it. ↩
In terms of the actual instructions:
- lsrs r0, r1, #31 does a logical (i.e., not sign-extending) shift of the input argument by 31 bits into r0. This results in 1 for negative numbers and 0 for zero and positive numbers.
- adds r1, r0, r1 adds the result of the previous instruction to the input argument, effectively adding 1 to negative inputs.
From there the instructions do a normal right shift. This is basically equivalent to (value + (value >>> 31)) >> 1. ↩
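The two expressions from these footnotes are easy to sanity-check on the JVM. This is only a sketch: the class and method names are made up for illustration, and it compares Java expressions rather than the machine code ART actually emits.

```java
// Hypothetical helper for checking the footnote expressions against plain division.
final class ShiftDivide {
  // x86 shape: add 1 only when the value is negative, then arithmetic shift.
  static int x86Form(int value) {
    return (value < 0 ? value + 1 : value) >> 1;
  }

  // ARM shape: the logical shift by 31 yields 1 for negatives and 0 otherwise.
  static int armForm(int value) {
    return (value + (value >>> 31)) >> 1;
  }

  public static void main(String[] args) {
    int[] samples = {Integer.MIN_VALUE, -7, -3, -2, -1, 0, 1, 2, 3, 7, Integer.MAX_VALUE};
    for (int value : samples) {
      int expected = value / 2;
      if (x86Form(value) != expected || armForm(value) != expected) {
        throw new AssertionError("Mismatch for " + value);
      }
    }
    // A bare shift rounds toward negative infinity while division rounds toward zero.
    System.out.println("-3 >> 1 = " + (-3 >> 1) + ", -3 / 2 = " + (-3 / 2));
  }
}
```

Both forms agree with value / 2 for every sample, including the extremes, while the bare shift of -3 gives -2 instead of -1.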

https://jakewharton.com/which-is-better-on-android-divide-by-two-or-shift-by-one

Simple Multiplatform RPC with Kotlin Serialization

Apr 15, 2020 Updated Apr 15, 2020

I recently played a minor role in helping add Cast support to an Android app. Both the Android app and Cast display are written in Kotlin. The Android Cast SDK relays JSON strings to the JavaScript SDK which invokes your callback with the deserialized equivalent as a JS object. A multiplatform library holds the model objects so that they can be shared between Android and JS.

class Game(
  val players: Array<Player>
)
class Player(
  val name: String,
  val color: String,
  val scores: Array<Int>
)

Moshi serializes the models to JSON in the Android app.

val game = Game(arrayOf(
  Player("Jesse", "#ff0000", arrayOf(1, 2, 3)),
  Player("Matt", "#ff00ff", arrayOf(3, 0, 2))
))

val gameAdapter = moshi.adapter(Game::class.java)
val gameJson = gameAdapter.toJson(game)
// {"players":[{"name":"Jesse",...},{"name":"Matt",...}]}

castSdk.send(gameJson)

The Cast app receives the deserialized JS object and interprets it as being of the same type.

castSdk.addCustomMessageListener { message ->
  val game = message.data.unsafeCast<Game>()
  ui.render(game)
}

This works but imposes some severe limitations. The model objects can only use collections available natively to JS which means Arrays instead of Lists. Custom serialization is also not supported because the JSON to JS object conversion happens outside the library.

It was clear this setup wasn't going to work long-term.

Kotlin Serialization

kotlinx.serialization is Kotlin's multiplatform, reflection-free, format-agnostic serialization library. Its compiler plugin generates code for types which are annotated as @Serializable.

+@Serializable
 class Game(
   val players: Array<Player>
 )
+@Serializable
 class Player(
   val name: String,

Updating the Android app requires specifying that we're using the JSON format and supplying a reference to the generated serializer.

-val gameAdapter = moshi.adapter(Game::class.java)
-val gameJson = gameAdapter.toJson(game)
+val gameJson = Json.stringify(Game.serializer(), game)
 // {"players":[{"name":"Jesse",...},{"name":"Matt",...}]}

 castSdk.send(gameJson)

Normally in this situation, changing the serialization library would only affect the Android app since the Cast SDK internally parses JSON to JS objects. However, kotlinx.serialization has the unique feature of being able to "parse" a JS object.

+val objectParser = DynamicObjectParser()
 castSdk.addCustomMessageListener { message ->
-  val game = message.data.unsafeCast<Game>()
+  val game = objectParser.parse(message.data, Game.serializer())
   ui.render(game)
 }

This walks the object properties as if it were JSON and passes them through the serializer. Now we can use all of the features of the library from custom serializers to simple things like using a List.

 @Serializable
 class Game(
-   val players: Array<Player>
+   val players: List<Player>
 )
 @Serializable
 class Player(
   val name: String,
   val color: String,
-  val scores: Array<Int>
+  val scores: List<Int>
 )

This future-proofed the app to ensure that its models could continue to be shared even as they grew in complexity. And they were about to.

Simple RPCs

The Cast app started as a stateless rendering of the game model but it lacked some of the Android app's flair. Instead of sending only the bare model, the Android app was changed to send an event. This allowed showing animations on the Cast display after an action. Each event contained a copy of the game model as well as any other information about the event.

@Serializable
data class PlayerAdded(
  val game: Game,
  val player: Player
)

@Serializable
data class SpinTheBottle(
  val game: Game,
  val winner: Int
)

The type will determine the behavior of the Cast app in response to these events.

when (event) {
  is PlayerAdded -> { .. }
  is SpinTheBottle -> { .. }
}

Unfortunately this does not work as-is. When serialized, the root JSON object contains only the properties of the object and not which specific type was serialized.

{"game":{ /*..*/ },"winner":1}

You can try to infer the type from which properties are present but it's a brittle setup.

This is generally solved by using something called "polymorphic serialization" which uses some kind of marker to encode which type was serialized. In kotlinx.serialization 0.14.0, the compiler automatically enables polymorphic serialization for Kotlin sealed hierarchies so it's an obvious choice.

+@Serializable
+sealed class GameEvent {
+  abstract val game: Game
+}

 @Serializable
 data class PlayerAdded(
-  val game: Game,
+  override val game: Game,
   val player: Player
-)
+) : GameEvent()

 @Serializable
 data class SpinTheBottle(
-  val game: Game,
+  override val game: Game,
   val winner: Int
-)
+) : GameEvent()

The JSON will now include a discriminator, a string identifying which type was used, so that the deserialization code picks the corresponding type on the other side. By default the library uses array-based discriminators (but you could elect to add a property to the object itself).

["com.example.model.SpinTheBottle",{"game":{ /*..*/ },"winner":1}]

Additionally, by using a sealed class, Kotlin can now enforce that a when on the event types is exhaustive1.

kotlinx.serialization 0.20.0 added support for polymorphic serialization in DynamicObjectParser allowing the Cast app to take advantage of it.

 val objectParser = DynamicObjectParser()
 castSdk.addCustomMessageListener { message ->
-  val game = objectParser.parse(message.data, Game.serializer())
+  val event = objectParser.parse(message.data, GameEvent.serializer())
+  val game = event.game
   ui.render(game)
+  when (event) {
+    is PlayerAdded -> { .. }
+    is SpinTheBottle -> { .. }
+  }
 }

This setup creates a pretty robust unidirectional RPC system for the Android app to talk to the Cast display. The build will fail if you forget to handle a new event on the Cast side. The sending code and transport don't need to be updated for new events since it's all based on the GameEvent supertype.

With the Cast SDK imposing JSON and automatic deserialization to JS objects, the feature set of Kotlin serialization fits right in. It allows maximizing code reuse without imposing too much complexity. And, granted, it's just about the most basic RPC system you could build, but it serves the app well. Supporting requirements like associated responses and bidirectional streaming is better left to more heavyweight systems like gRPC.

Note: The snippet with this code is not set up to be exhaustive for simplicity. ↩

https://jakewharton.com/simple-multiplatform-rpc-with-kotlin-serialization

Litmus-Testing Kotlin's Many Memory Models

Apr 8, 2020 Updated Apr 8, 2020

When writing multiplatform code, Kotlin's three compiler backends each have different memory models which must be considered.

JavaScript is single-threaded so you really can do no wrong. The JVM model is arguably too permissive where you can do incorrect things and have them work 99.9% of the time. When targeting native, Kotlin enforces some invariants which help protect you from those 0.1% bugs that crop up on the JVM.

I've been porting the AndroidX collection library to Kotlin multiplatform to experiment with binary compatibility, performance, tooling, and the different memory models. The library consists of mutable, single-threaded data structures. This should mean the different memory models never come into play. But weirdly they do, and let's look at how.

On Deck

The Kotlin standard library contains general-purpose collections like lists, sets, and maps in both mutable and read-only form. Kotlin 1.3.70 added another collection, ArrayDeque, a "double-ended queue" for efficient stacks and queues.

During the 1.3.70 EAP, Kevin Galligan opened an issue where ArrayDeque could only be instantiated on the main thread and not a background thread when targeting Kotlin/Native. At the time I didn't read into it, but as I was porting these collections it came to mind.

The underlying cause was that the implementation relied on a top-level val for a shared, empty array when the collection was empty. Arrays are fixed-length, so an empty array is effectively immutable and thus can be shared by all empty collections. But that seems fine?

It is fine for Kotlin/JS and Kotlin/JVM but Kotlin/Native is different here. By default, Kotlin/Native only allows the main thread to access top-level vals. If you want to access the value from multiple threads (potentially concurrently) you must choose whether you want thread-local or shared-but-immutable behavior with an annotation. ArrayDeque's empty array was missing this annotation.

As it turns out, my collections had the exact same issue! Each started with a shared, empty array and only allocated its own storage when the first element arrived. I had tests, but the tests were only exercising the type on the main thread. It's an easy fix, just add @SharedImmutable, but how do I prevent regression and future problems of this nature?

Testing Threads

Since Kotlin/Native enforces different semantics between its main thread and background threads, it's only logical to run the tests once on the main thread and once on a background thread to ensure compliance.

If our test is written solely for Kotlin/Native this is pretty easy. The native version of the standard library has a Worker API for running on a background thread.

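import kotlin.native.concurrent.TransferMode.SAFE
import kotlin.native.concurrent.Worker
import kotlin.native.concurrent.freeze

fun threadedTest(body: () -> Unit) {
  // Run once synchronously on the calling (main) thread.
  body()

  // Freeze the lambda so it can be handed to the worker.
  body.freeze()
  val worker = Worker.start()
  val future = worker.execute(SAFE, { body }) {
    runCatching(it)
  }
  // Block on the worker's result and rethrow any failure.
  future.result.getOrThrow()
}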

This function accepts a lambda which it runs synchronously (which will be on the main thread) and then transfers that lambda to a background thread where it's run a second time. The main thread blocks on the result of the background thread where it rethrows any exceptions that occurred.

Each test case is updated to put its body inside a call to this function.

-@Test fun isEmpty() {
+@Test fun isEmpty() = threadedTest {
   val map = ArrayMap()
   assertTrue(map.isEmpty())
 }

Running without @SharedImmutable now causes the test to correctly fail. Say goodbye to an entire class of Kotlin/Native bugs!

Multiplatform

For multiplatform libraries, like my collection library, the tests are written in platform-agnostic "common" Kotlin with no access to the Kotlin/Native-specific Worker API. We can instead rely on the expect/actual language feature of multiplatform Kotlin to make this work.

In src/commonTest/kotlin/ the threadedTest function is declared as an expect fun:

expect fun threadedTest(body: () -> Unit)

The native-specific implementation is put in src/nativeTest/kotlin/:

actual fun threadedTest(body: () -> Unit) {
  // Same as Kotlin/Native code from previous section.
}

For JavaScript in src/jsTest/kotlin/ we don't need threading so its implementation just inlines itself away.

actual inline fun threadedTest(body: () -> Unit) = body()

For the JVM in src/jvmTest/kotlin/ you're free to either inline it away like JavaScript or use the Thread APIs to invoke body twice. Since the memory models of the JVM and Android give no special treatment to the main thread there's really no reason to run it twice.

Now our test from the previous section can live in src/commonTest/kotlin/ and wrap itself in threadedTest. On JS and JVM the test will run normally and only on native targets will it run twice.

The memory model of Kotlin/Native helps eliminate bugs that would probabilistically occur on more permissive platforms like the JVM. With the constraints of its memory model being runtime checked, running your unit tests on both the main thread and a background thread prevents bugs like the one which occurred with ArrayDeque.

I filed an issue on the Kotlin/Native repo asking for some kind of built-in mechanism to support this use case. And ideally it would be something that you could apply to a whole class rather than having to remember to do it for each function.

https://jakewharton.com/litmus-testing-kotlins-many-memory-models

D8 Optimization: Assertions

Mar 25, 2020 Updated Mar 25, 2020

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The assert keyword is quirky Java language syntax used for testing invariants. That is: things you expect to always be true.

Its syntax has two forms:

assert <bool-expression>;
assert <bool-expression> : <expression>;

The first expression will only be evaluated at runtime if the -ea (enable assertions) flag is set on the JVM. The second expression, if present, is used as the argument to the AssertionError constructor that's thrown if the first expression returns false.
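Both forms can be seen in a small, plain-JVM sketch. The class and method names here are hypothetical and exist only for illustration. Run it once with `java -ea` and once without to see the difference.

```java
// Hypothetical demo class; not from any library.
final class AssertForms {
  // Common idiom: the assignment inside the assert only executes when
  // assertions are enabled, so this reports whether -ea is in effect.
  static boolean assertionsEnabled() {
    boolean enabled = false;
    assert enabled = true;
    return enabled;
  }

  static int half(int value) {
    // Second form: the message expression is only evaluated when the check fails.
    assert value % 2 == 0 : "Expected an even value but was " + value;
    return value / 2;
  }

  public static void main(String[] args) {
    System.out.println("Assertions enabled: " + assertionsEnabled());
    System.out.println(half(4));
    try {
      // Without -ea the check is skipped and integer division quietly returns 1.
      System.out.println(half(3));
    } catch (AssertionError e) {
      System.out.println("Caught: " + e.getMessage());
    }
  }
}
```

With assertions disabled the odd input slips through silently, which is exactly the kind of invariant violation the keyword exists to catch.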

As an Android developer you might not be too familiar with assert. This is because every Android app runs on a VM which is forked from a shared "zygote" process which has assertions disabled. Thus, even if you put an assert in your code, there is no way to actually enable it.

So why bother talking about it? Well it turns out they're about to become useful on Android for the first time!

Today's behavior

assert statements guard things which must always be true in order for your program to execute correctly. Let's write one.

import android.os.Looper;

class IdGenerator {
  private int id = 0;

  int next() {
    // Invariant: only ever called from the main thread.
    assert Thread.currentThread() == Looper.getMainLooper().getThread();
    return id++;
  }
}

This class creates unique IDs and guarantees they're unique by only allowing calls from the main thread. If this class was called concurrently from multiple threads you might see duplicate values. Sure it's a little contrived and there's things like @MainThread which is checked by Lint but we're focusing on assert so roll with it.

The Null Data Flow Analysis post introduced the SSA form that R8 uses to eliminate branches of code which it can prove will never be executed. The SSA for the next() method when parsed from Java bytecode looks very roughly like this:

D8 knows that Android does not support Java assertions. It will remove the check and replace it with false allowing dead-code elimination to occur. This propagates to the nodes which can only be taken when it returns true.

As a result, the boolean expression and optional message expression are entirely eliminated from the bytecode. Only the field read, field increment, and return remain.
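It can help to picture what javac actually hands to D8. An assert compiles to a check of a synthetic static field (named $assertionsDisabled) followed by the invariant itself. The sketch below writes that out by hand in plain Java. The class name, the constant name, and the owner-thread invariant are stand-ins so that it runs without the Android SDK.

```java
// Hand-written approximation of javac's output for a class containing an assert.
final class DesugaredIdGenerator {
  // javac generates a synthetic field like this, initialized in the static initializer.
  static final boolean ASSERTIONS_DISABLED =
      !DesugaredIdGenerator.class.desiredAssertionStatus();

  private final Thread owner = Thread.currentThread();
  private int id = 0;

  int next() {
    // Equivalent of: assert Thread.currentThread() == owner;
    // The first operand is the "enabled" check, the second is the invariant.
    if (!ASSERTIONS_DISABLED && Thread.currentThread() != owner) {
      throw new AssertionError();
    }
    return id++;
  }

  public static void main(String[] args) {
    DesugaredIdGenerator generator = new DesugaredIdGenerator();
    System.out.println(generator.next());
    System.out.println(generator.next());
  }
}
```

If the compiler treats the field read as a constant meaning "disabled", the whole if becomes dead code and only the field read, increment, and return are left.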

We can confirm this by sending the Java source through the compilation pipeline:

$ javac -bootclasspath $ANDROID_HOME/platforms/android-29/android.jar IdGenerator.java
$ java -jar $R8_HOME/build/libs/d8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --output . \
      IdGenerator.class
$ dexdump -d classes.dex
 ⋮
[00011c] IdGenerator.next:()I
0000: iget v0, v2, LIdGenerator;.id:I
0002: add-int/lit8 v1, v0, #int 1
0004: iput v1, v2, LIdGenerator;.id:I
0006: return v0
 ⋮

Eliminating a runtime check which always returns false is an easy win, but the SSA form means that we eliminate the bytecode for both expressions of the assert statement including any intermediate values they rely on.

Tomorrow's behavior

The version of D8 in AGP 4.1 slightly changes the thinking around Java assert. Instead of assuming that the runtime check will always fail at runtime (which it still does), it computes the check at compile-time based on whether your build is debuggable.

In practice, this means that any debug variant will replace the assertions-enabled check at compile-time with true.

This eliminates the enabled check but retains the invariant check.

Sending IdGenerator through D8 with the --force-enable-assertions flag that AGP automatically adds for debug variants shows this in Dalvik bytecode:

 $ java -jar $R8_HOME/build/libs/d8.jar \
       --lib $ANDROID_HOME/platforms/android-29/android.jar \
+      --force-enable-assertions \
       --output . \
       IdGenerator.class
 $ dexdump -d classes.dex
  ⋮
 [000190] IdGenerator.next:()I
+0000: invoke-static {}, Ljava/lang/Thread;.currentThread:()Ljava/lang/Thread;
+0003: move-result-object v0
+0004: invoke-static {}, Landroid/os/Looper;.getMainLooper:()Landroid/os/Looper;
+0007: move-result-object v1
+0008: invoke-virtual {v1}, Landroid/os/Looper;.getThread:()Ljava/lang/Thread;
+000b: move-result-object v1
+000c: if-ne v0, v1, 0015
 000e: iget v0, v2, LIdGenerator;.id:I
 0010: add-int/lit8 v1, v0, #int 1
 0012: iput v1, v2, LIdGenerator;.id:I
 0014: return v0
+0015: new-instance v0, Ljava/lang/AssertionError;
+0017: invoke-direct {v0}, Ljava/lang/AssertionError;.<init>:()V
+001a: throw v0
  ⋮

Our debug build still tests the invariant at runtime but the release build completely eliminates the check. This behavior is now similar to the JVM where unit tests turn on the -ea flag whereas production does not.

(If you're wondering why the code which throws the exception was moved to the bottom of the method check out the Optimizing Bytecode by Manipulating Source Code post.)

This feature is already available in the latest AGP 4.1 alphas. The nature of invariants is such that they should never fail unless you're already doing something very wrong. By checking them in debug builds we have only confidence to gain in the correctness of our libraries and application code when running on Android.

Kotlin's assert() function currently has a subtle behavior difference compared to Java's assert keyword. For more information see Jesse Wilson's Kotlin’s Assert Is Not Like Java’s Assert post. D8 currently does not recognize Kotlin's assert() to apply the optimization in this post, but the original D8 feature request remains open for this very reason.

Unlike some of the R8 optimizations covered in recent posts, this optimization is localized to the body of a single method which is why it can also be performed by D8. Check out the D8 Optimizations post for more optimizations which apply in both D8 and R8.

And stay tuned for more D8 and R8 optimization posts coming soon!

https://jakewharton.com/d8-optimization-assertions

Removing Google as a Single Point of Failure Part 2: Gmail

Mar 18, 2020 Updated Mar 18, 2020

I want to remove Google as a single point of failure in my life. In the first blog post on this subject I detailed my setup for backing up Google Photos and Google Drive contents onto my home server and remotely to rsync.net. Left out of that post was a solution for Gmail because I hadn't found one yet. Now I have.

Source of truth

That first post started with an important qualification:

This does not mean that I'm going to stop using Google products. Quite the opposite. Gmail, Google Photos, and Google Drive will remain the source-of-truth for all of the things I listed above. What's different is that should Google disappear tomorrow (or just my account) I would lose no data.

This was easy to achieve with Photos and Drive because the data is all there is. With email that's unfortunately not true.

Incrementally backing up the email data is pretty straightforward–we'll get into that shortly. But with Gmail your email address is still tied to the @gmail.com domain. So if my account or all of Google disappears, I won't be able to receive any more email.

Of course the "easy" fix here is to just use a domain that I control. Obviously I own jakewharton.com, and I intend to set that up, but I wanted something shorter. I've owned cob.io for many years with the intention of setting up j@cob.io, but I go by "Jake". Luckily the last few years have seen an influx of new TLDs so I managed to grab ke.fyi. Say hello to j@ke.fyi!

Having an email on my own domain doesn't address the problem that there's still hundreds or thousands of services that I've given the Gmail address to. While I can migrate many, there are inevitably those which I can't or that I simply don't know exist. The old address needs to remain working.

Fastmail

After browsing a few hosted email solutions, I settled on Fastmail (Note: referral link). In addition to a positive recommendation from a friend, there were a few key motivating factors.

Domain catch-all

A popular feature of Gmail is the ability to append a + to your user followed by any text and mail will still be sent to you. This can be used for filters or to see who is selling your email address to others.

A domain catch-all is the same thing but you can change the entire username. Now I can use addresses like southwest@ke.fyi without needing to set anything up first. Aside from knowing if they sell my email it also slightly improves security. While the format is human guessable, an automated attack which reuses emails from a data breach will fail because the address from one service simply doesn't exist on any other.

Fastmail supports replying to these catch-all emails using the same address to which it was sent. This is critical to maintain the illusion, especially when dealing with people rather than automated systems.

Multiple domains

Aside from ke.fyi I also set up jakewharton.com and a few other domains. Fastmail sends all emails to my configured domains to a unified inbox rather than forcing me to switch accounts. My replies still match the incoming address, the same as they do for the catch-all.

Additionally, when composing emails I can choose the domain from which it will be sent. And for those with catch-all set up, I can even pick arbitrary usernames on those domains. Neat!

Gmail support

Since my Gmail address will receive some mail for the foreseeable future it's important to use a service that supports more than just a one-time import. Fastmail performs near-realtime incremental syncs to pull in any new email or calendar events from Gmail. Not only is it very fast, but they seem to be able to bypass the rate limits that otherwise exist when downloading your email over IMAP.

I can compose email using the Gmail address. I don't know why I would ever need this, but it's nice to have.

In replies to any email Fastmail lets me change the address from which I'm replying. When a person sends an email to my old address, I can use this feature to gradually migrate them over to the new one.

IMAP availability

Remember, it's not enough to migrate from Gmail to Fastmail for an address on our own domain. We still need to ensure a backup solution. Thankfully Fastmail supports any and every protocol you'd need.

As a nice bonus, the Gmail to Fastmail sync bypasses Google's rate limit meaning you can also sync your entire Gmail within minutes through Fastmail rather than having to spread it out over multiple days when accessing it directly.

Backup

Almost immediately after the previous blog post people were sending me a myriad of tools for Gmail backup. Thank you for that!

mbsync

After trying a few tools I settled on mbsync which is part of the isync project. The tool is very generic but can be used to synchronize emails to the Maildir format over IMAP.

Maildir is a standard format that can be read by many tools. Unlike mbox, the format used in Google Takeout for Gmail, Maildir uses individual files for each email. This lends itself to incremental updates, tools like grep, and compression.

Few clients operate on Maildir directly, unfortunately. Definitely none which I'm comfortable using (sorry, Mutt).

It's quite easy to push Maildir back into any IMAP-supported host with mbsync should you need to restore from a backup. And if you really need an always-on, self-hosted client you can push into one as part of your sync.

Docker

In order to automate this procedure I wrapped mbsync up in a Docker container as jakewharton/mbsync which can run it on a periodic schedule.

It uses the same healthchecks.io service as the rclone and gphotos-sync containers from the last post for monitoring. I personally send this to my own Slack workspace which gives me simple history and easy notifications on all my devices.

Here's its entry in my docker-compose.yml:

version: "3.6"

services:
  # Services from previous blog post...

  mbsync-jake:
    container_name: mbsync-jake
    image: jakewharton/mbsync:latest
    restart: unless-stopped
    volumes:
      - /tanker/backup/jake/mail:/mail
      - ${USERDIR}/docker/mbsync-jake:/config
    environment:
      # Hourly
      - "CRON=0 * * * *"
      - "CHECK_URL=https://hc-ping.com/..."

For information on how to set up the container please see the repo's README.

Storage

Just like the "Data Storage" and "Data Replication" sections from the last post, the backup goes to a dedicated ZFS filesystem. This filesystem is regularly snapshotted to provide local history. The data and all its snapshots are also synchronized to rsync.net for an off-site copy.

$ zfs list
NAME                          USED  AVAIL     REFER  MOUNTPOINT
tanker                       18.4T  2.08T      151K  /tanker
tanker/backup                 529G  2.08T      151K  /tanker/backup
tanker/backup/angela          172G  2.08T      140K  /tanker/backup/angela
tanker/backup/angela/photos   172G  2.08T      172G  /tanker/backup/angela/photos
tanker/backup/jake            337G  2.08T      151K  /tanker/backup/jake
tanker/backup/jake/drive     78.9G  2.08T     78.9G  /tanker/backup/jake/drive
tanker/backup/jake/mail      12.1G  2.08T     12.1G  /tanker/backup/jake/mail
tanker/backup/jake/photos     246G  2.08T      148G  /tanker/backup/jake/photos

I didn't bother enabling compression on the filesystem because it's only 12GiB. I suspect it would compress very well and it's something that I can always turn on later.

I did this migration one week after the previous post so I've been on Fastmail for about three weeks now. In general it's been a positive experience. The Android app is hybrid so sometimes it feels a bit weird, but otherwise the clients have some nice features. My favorite so far is how it deals with quoted sections in long threads:

Screenshot of Fastmail showing a large quoted section collapsed

Having much more control over my email, photos, and files is comforting but I sincerely hope I never need to rely on these backups.

Once configured the Docker containers have been almost entirely maintenance free. I haven't touched the photos or files sync for over a month now. Sometimes it hiccups and notifies me, but it's always recovered on its own.

Screenshot of Slack channel showing healthchecks.io notifications of sync being down and then recovering an hour later

Now that Google is mostly removed as a single point of failure (I'm still relying on them for Keep and employment for now), it seems like getting automated backups rolling for all my GitHub projects is the next most pressing matter.

https://jakewharton.com/removing-google-as-a-single-point-of-failure-gmail

Removing Google as a Single Point of Failure

Feb 19, 2020 Updated Feb 19, 2020

I want to remove Google as a single point of failure in my life. They have two decades of my email. They have two decades of my photos. They have the only copy of thousands of documents, projects, and other random files from the last two decades.

Now I trust Google completely in their ability to correctly retain my data. But I think it's clear that over the last 5 years the company has lost something intrinsically important in the way it operates. I no longer trust them not to permanently lock me out of my account. And I say this as a current Google employee.

This year I've embarked on a mission to reclaim ownership of my data. This does not mean that I'm going to stop using Google products. Quite the opposite. Gmail, Google Photos, and Google Drive will remain the source-of-truth for all of the things I listed above. What's different is that should Google disappear tomorrow (or just my account) I would lose no data.

Get Your Data

Step 1: Takeout

The first thing you need to do today is visit takeout.google.com and export your Gmail, Photos, and Drive data (and anything else you want). This will send you links to a set of 50GB .tar.gz files of your data that you can download.

That is, provided it works. It took me 5 attempts of exporting just my Photos data to have one succeed. Persistence pays off, though, so don't give up even though this is a slow process. Get. Your. Data.

Google providing the Takeout service is amazing, but as far as backup solutions go it is woefully inadequate. It's an extremely manual, slow, and non-incremental process. However, it's also comprehensive in ways that no other solution can match. Because of that, I have a monthly recurring task to perform a Takeout. Do it during a boring meeting so it feels less like a chore and more like a welcome distraction.

Seriously, do this today!

Step 2: Drive Sync

The rclone tool can incrementally sync your Google Drive contents. It will also take Google's proprietary document formats and convert them into well-defined standard formats (which usually means Microsoft Office formats).

I run rclone hourly using the bcardiff/docker-rclone Docker container onto a large, redundant storage array (more on this array later). This container is nice because it pings healthchecks.io after each hourly sync. The check is set up to expect an hourly ping with a grace period of two hours.

Step 3: Photo Sync

While Google Photos does have an API, it does not provide access to the original image. This is some bullshit. I pay for Google Drive and Google Photos storage but I can only access original files for Drive. Some bullshit.

Thankfully, after tweeting about said bullshit I was pointed at the gphotos-cdp tool (built by some very smart people). This uses the Chrome DevTools protocol to drive the Google Photos website and download the original photos one-by-one. This is awful and awesome and scary and it totally works!

In an effort to automate this, I wrapped the tool up in a Docker container as jakewharton/gphotos-sync which can run it on a periodic schedule and uses the same healthchecks.io service as the rclone container. The initial setup is a little rough, but I've been running two instances hourly for two weeks without incident. Try it out!

Step 4: Gmail Sync

I looked into a bunch of tools to back up Gmail but I couldn't find one that was still maintained and still worked. This is bad. Takeout is a start for this, but I want something more real-time.

Anyone have a solution here? Please let me know!

Data Storage

I recently built a brand new home server with the intent of using it for storing backups of my Google data (among other things). It has four 8TB drives in a ZFS pool to ensure data is written to more than one physical drive. ZFS is an incredible storage technology that can ensure data is written in a way that is resilient to both drive failures and bitrot on both the writing and reading side.

ZFS supports creating separate filesystems as easily as you would normally create folders. Each filesystem can manage things like storage quotas and their own snapshots of content. Each of the Docker containers running rclone or gphotos-cdp writes into its own ZFS filesystem.

$ zfs list
NAME                          USED  AVAIL     REFER  MOUNTPOINT
tanker                       17.8T  2.71T      151K  /tanker
tanker/backup                 417G  2.71T      151K  /tanker/backup
tanker/backup/angela          170G  2.71T      140K  /tanker/backup/angela
tanker/backup/angela/photos   170G  2.71T      170G  /tanker/backup/angela/photos
tanker/backup/jake            227G  2.71T      151K  /tanker/backup/jake
tanker/backup/jake/drive     78.9G  2.71T     78.9G  /tanker/backup/jake/drive
tanker/backup/jake/photos     148G  2.71T      148G  /tanker/backup/jake/photos

I currently use znapzend to recursively create automatic snapshots of all these filesystems under "tanker/backup". My current policy is:

Hourly snapshots retained for one day.
Daily snapshots retained for one month.
Monthly snapshots retained for one year.

This policy is a hedge against any deleted or changed file. The simplicity of cd-ing into the hidden .zfs directory means these older copies are easily browsed, if ever needed.
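A minimal sketch of how such a policy could be written in znapzend's plan notation, where each entry reads "retain for this long => snapshot this often". The plan string is an assumption derived from the three rules above, not a copy of the configuration actually in use, and the command is only printed so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Each plan entry is "retain-for=>snapshot-every".
# Assumed mapping of the policy above: hourly for a day,
# daily for a month, monthly for a year.
PLAN='1d=>1h,1month=>1d,1y=>1month'
DATASET='tanker/backup'

# Build the znapzendzetup invocation without running it.
znapzend_setup_command() {
  printf "znapzendzetup create --recursive SRC '%s' %s\n" "$PLAN" "$DATASET"
}

znapzend_setup_command
```

Once snapshots exist, the older copies of a filesystem such as tanker/backup/jake/drive show up as read-only directories under /tanker/backup/jake/drive/.zfs/snapshot/.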

Data Replication

The frequently-repeated, best-practice rule for data storage is the "3–2–1 rule". That is: three copies of the data, across two storage mediums, with one off-site location. In this framework, Google serves as one copy, one storage medium, and one off-site location. The local backups that we're synchronizing serve as a second copy and a second storage medium (HDDs vs. the cloud).

For the third copy, I chose rsync.net which is quite the nerdy backup solution. Normally turning back to rclone for synchronizing the data to Dropbox or Backblaze would be an obvious solution. But rsync.net is unique in that they give you direct access to a ZFS zpool over SSH as root. This means that I can not only synchronize the latest data, but I can also synchronize the historical snapshots of it from the last year. The znapzend tool that I am already using handles sending the incremental snapshots as they're taken. While rsync.net is a slightly more expensive alternative for cloud storage, the raw ZFS access and ability to store historical snapshots make it worthwhile.
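For a sense of what sending an incremental snapshot involves underneath, here is a rough sketch of the equivalent manual command. The remote host, remote dataset, and snapshot names are all hypothetical placeholders, and the command is printed rather than executed; znapzend takes care of this step itself.

```shell
#!/usr/bin/env bash
# Hypothetical SSH target; substitute the account details from the provider.
REMOTE='user@example.rsync.net'

# Print a pipeline that sends only the changes between two snapshots
# of a local dataset to a dataset on the remote pool.
incremental_send_command() {
  local dataset=$1 previous=$2 latest=$3 remote_dataset=$4
  echo "zfs send -i $dataset@$previous $dataset@$latest | ssh $REMOTE zfs receive $remote_dataset"
}

incremental_send_command tanker/backup/jake/drive hourly-01 hourly-02 data1/jake/drive
```

Because only the difference between the two snapshots crosses the wire, keeping a year of history off-site costs little more than the changed data itself.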

Self Hosting

In the unlikely event that Google implodes (or the far-more-likely scenario that they lock you out of your account) your data may be backed up but is otherwise relatively inaccessible. This is not very useful.

So far I have been serving read-only copies of my "tanker/backup" folder using NextCloud via the linuxserver/nextcloud Docker container. This not only affords me access on the go, but I can also easily share content with others.

NextCloud is a generic file host that offers document editing, photo viewing, and video playback in addition to just serving raw files. It offers many similar features to Google Drive. For example, if you do not want to set up the gphotos-cdp tool to back up your photos, you can run the NextCloud app on your phone which can automatically synchronize new photos to your server.

In order to expose NextCloud to the internet, you need, at minimum, knowledge of your IP address. While I do have business internet at home, I don't have a static IP. Instead, I use the oznu/cloudflare-ddns Docker container to update a Cloudflare DNS A record on one of my domains.

Instead of exposing NextCloud directly to the internet, I use the traefik Docker container as a reverse proxy. It takes care of talking to Let's Encrypt to keep a valid SSL certificate in rotation as well as routing traffic for the domain to the NextCloud container.

Docker

The NextCloud, Traefik, Cloudflare DDNS, rclone, and gphotos-cdp containers are all managed by Docker Compose. This makes it easy to update and manage their configuration.

In order to monitor the host I also run Netdata and Portainer.

Here's my docker-compose.yml:

version: "3.6"

services:
  portainer:
    container_name: portainer
    image: portainer/portainer
    command: -H unix:///var/run/docker.sock
    restart: always
    ports:
      - "11080:9000"
    volumes:
      - ${USERDIR}/docker/portainer/data:/data
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - TZ=${TZ}

  netdata:
    container_name: netdata
    image: netdata/netdata
    restart: unless-stopped
    hostname: netdata
    ports:
      - 19999:19999
    environment:
      - PGID=998 #docker group
    cap_add:
      - SYS_PTRACE
    security_opt:
      - apparmor:unconfined
    volumes:
      - ${USERDIR}/docker/netdata:/etc/netdata:ro
      # For monitoring:
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/os-release:/etc/os-release:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /var/log/smartd:/var/log/smartd:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro

  traefik:
    container_name: traefik
    image: traefik
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.http.address=:80"
      - "--entrypoints.https.address=:443"
      - "--certificatesresolvers.letsencrypttls.acme.tlschallenge=true"
      - "--certificatesresolvers.letsencrypttls.acme.email=example@example.com"
      - "--certificatesresolvers.letsencrypttls.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - "${USERDIR}/docker/traefik/letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    labels:
      - "traefik.enable=true"
      # HTTP-to-HTTPS Redirect
      - "traefik.http.routers.http-catchall.entrypoints=http"
      - "traefik.http.routers.http-catchall.rule=HostRegexp(`{host:.+}`)"
      - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"

  cloudflare-ddns:
    container_name: cloudflare-ddns
    image: oznu/cloudflare-ddns
    restart: unless-stopped
    environment:
      - API_KEY=apikey
      - ZONE=example.com
      - SUBDOMAIN=*

  nextcloud:
    container_name: nextcloud
    image: linuxserver/nextcloud
    restart: unless-stopped
    environment:
      - TZ=${TZ}
      - PUID=${PUID}
      - PGID=${PGID}
    volumes:
      - ${USERDIR}/docker/nextcloud:/config
      - /tanker/nextcloud:/data
      - /tanker/backup:/backup:ro
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.nextcloud.rule=Host(`files.example.com`)"
      - "traefik.http.routers.nextcloud.entrypoints=https"
      - "traefik.http.routers.nextcloud.tls.certresolver=letsencrypttls"

  rclone-drive-jake:
    container_name: rclone-drive-jake
    image: pfidr/rclone
    restart: unless-stopped
    volumes:
      - ${USERDIR}/docker/rclone-drive-jake:/config
      - /tanker/backup/jake/drive:/gdrive
    environment:
      - "UID=${PUID}"
      - "GID=${PGID}"
      - "TZ=${TZ}"
      - "SYNC_SRC=gdrive:"
      - "SYNC_DEST=/gdrive"
      - "CHECK_URL=https://hc-ping.com/..."
      # Hourly
      - "CRON=0 * * * *"
      # TODO update to https://github.com/rclone/rclone/issues/2893 when released
      - "SYNC_OPTS=-v --drive-alternate-export"

  gphotos-sync-jake:
    container_name: gphotos-sync-jake
    image: jakewharton/gphotos-sync:latest
    restart: unless-stopped
    volumes:
      - ${USERDIR}/docker/gphotos-sync-jake:/tmp/gphotos-cdp
      - /tanker/backup/jake/photos:/download
    environment:
      - TZ=${TZ}
      # Hourly
      - "CRON=0 * * * *"
      - "CHECK_URL=https://hc-ping.com/..."

  gphotos-sync-angela:
    container_name: gphotos-sync-angela
    image: jakewharton/gphotos-sync:latest
    restart: unless-stopped
    volumes:
      - ${USERDIR}/docker/gphotos-sync-angela:/tmp/gphotos-cdp
      - /tanker/backup/angela/photos:/download
    environment:
      - TZ=${TZ}
      # Hourly
      - "CRON=0 * * * *"
      - "CHECK_URL=https://hc-ping.com/..."

All of the containers store their configuration in ${USERDIR}/docker which is in my home directory. This folder is mounted as a ZFS filesystem on a partition of the OS drive. It has a znapzend snapshot policy, is replicated into /tanker/backup/home, and is synchronized to rsync.net. In the event of this machine failing or being destroyed it should be fairly easy to set up a replacement.

So far I'm pretty happy with this setup for backing up my Google Drive and Photos content. The apps for Drive and Photos are best-in-class and so I prefer to keep using them as the source of truth as long as possible. It's nice to know that NextCloud could step in here if needed, but hopefully it never comes to that.

Gmail backups remain a problem to be solved. It's also a huge problem that I cannot take control of my email address if it were needed. The Gmail webapp and mobile app also haven't seen innovation in a decade and increasingly feel like legacy software. The thought of migrating my email is daunting, but it feels like it's looming.

I continue to believe that trusting Google with your data is a safe bet, but it is not a sufficient backup strategy by itself. Take control of your data.

https://jakewharton.com/removing-google-as-a-single-point-of-failure

Extracting 100% of Data From a Stubborn, Dying ZFS Pool

Feb 12, 2020 Updated Feb 12, 2020

In 2010 I built a home server with five 2TB drives. It ran Solaris and ZFS for the redundancy and data checksumming to ensure no data could be lost or corrupted. Just 16 months later five 3TB drives were added to the pool. This computer took the 2600-mile trip to live in San Francisco with me. It then endured the 2600-mile return trip when I left.

Having sat unplugged for five years, I recently powered the server back on for new workloads. But relying on 10 ten-year-old hard drives in 2020 is asking for cascading failure. And not only were the drives old, they've experienced physical trauma. So instead I built a new server and endeavored to migrate the data.

During the transfer the drives exhibited consistent read failures as expected, but ZFS was able to transparently mitigate them. Occasionally, though, the pool would lock up in a way that could only be fixed with a hard reboot. These lock ups sent me on a weird journey of software and hardware orchestration to complete the data transfer.

Symptoms

During transfer of the data, progress would stall randomly in a way that seemingly could not be killed. CTRL+C had no effect. No kill signal had an effect. Even last-resort shutdown -r nows did nothing.

The system was oddly otherwise responsive. You could SSH in from another tab and poke around. ps showed that the transfer process was in the "D+" state, which means uninterruptible sleep in the foreground.

jake     21749  1.1  0.0   8400  2124 pts/0    D+   23:42   0:00 rsync ...

That explained why the process wouldn't die. The dmesg output also confirmed the problem happened deep in the I/O stack.

[ 3626.101527] INFO: task rsync:30680 blocked for more than 120 seconds.
[ 3626.101547]       Tainted: P           O      5.3.0-26-generic #28-Ubuntu
[ 3626.101563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3626.101580] rsync           D    0 30680      1 0x00000000
[ 3626.101584] Call Trace:
[ 3626.101590]  __schedule+0x2b9/0x6c0
[ 3626.101596]  schedule+0x42/0xb0
[ 3626.101601]  schedule_timeout+0x152/0x2f0
[ 3626.101717]  ? __next_timer_interrupt+0xe0/0xe0
[ 3626.101730]  io_schedule_timeout+0x1e/0x50
[ 3626.101756]  __cv_timedwait_common+0x15e/0x1c0 [spl]
[ 3626.101767]  ? wait_woken+0x80/0x80
[ 3626.101790]  __cv_timedwait_io+0x19/0x20 [spl]
[ 3626.102007]  zio_wait+0x11b/0x230 [zfs]
[ 3626.102166]  dmu_buf_hold_array_by_dnode+0x1db/0x490 [zfs]
[ 3626.102322]  dmu_read_uio_dnode+0x49/0xf0 [zfs]
[ 3626.102523]  ? zrl_add_impl+0x31/0xb0 [zfs]
[ 3626.102680]  dmu_read_uio_dbuf+0x47/0x60 [zfs]
[ 3626.102880]  zfs_read+0x117/0x300 [zfs]
[ 3626.103086]  zpl_read_common_iovec+0x99/0xe0 [zfs]
[ 3626.103292]  zpl_iter_read_common+0xa8/0x100 [zfs]
[ 3626.103496]  zpl_iter_read+0x58/0x80 [zfs]
[ 3626.103509]  new_sync_read+0x122/0x1b0
[ 3626.103525]  __vfs_read+0x29/0x40
[ 3626.103536]  vfs_read+0xab/0x160
[ 3626.103547]  ksys_read+0x67/0xe0
[ 3626.103558]  __x64_sys_read+0x1a/0x20
[ 3626.103569]  do_syscall_64+0x5a/0x130
[ 3626.103581]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3626.103594] RIP: 0033:0x7fe95027a272
[ 3626.103608] Code: Bad RIP value.
[ 3626.103616] RSP: 002b:00007ffd1cf2b8c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3626.103629] RAX: ffffffffffffffda RBX: 00005649e1948980 RCX: 00007fe95027a272
[ 3626.103637] RDX: 0000000000040000 RSI: 00005649e1993120 RDI: 0000000000000004
[ 3626.103645] RBP: 0000000000040000 R08: 00000002a1df57af R09: 00000000108f57af
[ 3626.103654] R10: 00000000434ed0ba R11: 0000000000000246 R12: 0000000000000000
[ 3626.103662] R13: 0000000000040000 R14: 0000000000000000 R15: 0000000000000000

Visual inspection of the computer also showed anywhere from one to three drive LEDs were solid. None of the drive arms were moving to actually access data (the platters were still spinning).

Picture of stuck LEDs on hard drive cage

If there was a way to recover from this state I could not find it.

Mitigation

The only way out of this state was a hard reboot which required holding down the power button. But after a reboot another lockup would occur randomly from within 10 seconds, to a few minutes in, to sometimes many hours later. Since I was only here to get data off of the machine, I did not want to spend a lot of time fixing the problem when a simple hard reboot was enough to unblock progress.

While the lockups were seemingly random, I estimated that 100 hard reboots would be all that was needed. This meant progress could only be made during the day which increased transfer time from about 4 days to 10. Coupled with the restart time and then resuming transfer it looked more like 12 days would be required.

I decided this was unfortunate, but doable. I would work from my basement next to the machine which was hooked up to display on a TV. Whenever I noticed it was locked up, I would hard reboot the machine, wait for it to boot, and then resume the transfer by typing on its keyboard. In two weeks it would be over with.

After one day of working next to the machine I knew I needed to find a better solution. That day it had locked up 20-30 times which was triple what I had estimated for one day. Not only were the occurrences more frequent, but it took me a while to notice and using a keyboard attached to the server to restart the transfer was tedious.

In order for this transfer to complete with my sanity intact I needed to somehow automate the process. There were two problems to solve: figuring out when rsync was hung and performing a hard reboot.

Automating Detection

Because rsync was stuck deep in the I/O stack there was no chance for timeouts to apply. Neither its transfer-based timeout nor a timeout on the SSH connection would cause it to exit when hung. Even when I moved rsync to run from the new machine I could not get a timeout to trigger. Running on the new machine did allow killing the process, which was a start.

Since rsync wouldn't exit normally, I decided to automate hung detection the same way I checked manually: monitoring the output. If the last two lines of the output (which show the current file and transfer progress) haven't changed in 10 seconds we consider the transfer to be hung.

# Clear state from previous runs to ensure we can detect immediate hangs.
truncate -s 0 rsync.txt

rsync -ahHv --progress \
  theflame:/tanker/* \
  /tanker | tee rsync.txt &

LAST_OUTPUT=""
while sleep 10
do
  NEW_OUTPUT=$(tail -2 rsync.txt)
  if [[ "$LAST_OUTPUT" == "$NEW_OUTPUT" ]]; then
    break # Output has not changed, assume locked up.
  fi
  LAST_OUTPUT="$NEW_OUTPUT"
done

This script will now exit when the transfer is hung. I could now detect hangs by playing beeps or sending myself a push notification with curl on exit. Using this script on the second day meant that almost no time was wasted waiting for me to notice the transfer had stopped. I was still hard rebooting the machine and re-starting the script 20-30 times, though.
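A minimal sketch of what that exit hook could look like. NOTIFY_URL is a hypothetical webhook endpoint and the message format is made up for illustration; when the variable is unset the function only rings the terminal bell and prints a line.

```shell
#!/usr/bin/env bash
# Announce a hang: ring the terminal bell, print a message, and
# optionally POST the message to a webhook (NOTIFY_URL is an assumption).
notify_hung() {
  local message=$1
  printf '\a'
  echo "HUNG: $message"
  if [[ -n "${NOTIFY_URL:-}" ]]; then
    curl -s -X POST --data "$message" "$NOTIFY_URL" > /dev/null
  fi
}

notify_hung "rsync output has not changed in 10 seconds"
```

Calling notify_hung right after the detection loop breaks turns the script's exit into something you can hear or see from another room.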

Automating Reboot

Last year I got a TP-Link Kasa Smart Plug as a stocking stuffer. I found a few sites which detailed how to use their undocumented API but while I was able to authenticate I was unable to toggle the power. Thankfully they have integration with IFTTT. I linked the plug and set up two applets which were each triggered by webhook.

IFTTT Applets for powering on and off the plug

I hooked the old server's power through the plug and I could now control its power with two curl commands!

Integrating this into the script was a bit more complicated than expected. I started with a simple infinite loop running the above sync and then doing a power cycle with a delay.

while :
do
  # truncate, rsync, while loop from above...

  echo "POWER: off"
  curl -X POST -s https://maker.ifttt.com/trigger/power_off/with/key/... > /dev/null
  sleep 5

  echo "POWER: on"
  curl -X POST -s https://maker.ifttt.com/trigger/power_on/with/key/... > /dev/null
  echo "Waiting 50s for startup..."
  sleep 50
done

After a few successful reboots the script would endlessly power-cycle the server or it would simply remain off. This was because IFTTT has no guarantees on latency or order of events. Instead of using sleep to time things, I switched to monitoring the actual machine for its state through SSH.

while :
do
  # Loop until SSH appears to confirm power on.
  ssh -q -o ConnectTimeout=1 theflame exit
  if [[ $? != 0 ]]; then
    echo "SSH is not available. waiting 5s..."
    sleep 5
    continue
  fi

  # truncate, rsync, while loop from above...

  echo "POWER: off"
  curl -X POST -s https://maker.ifttt.com/trigger/power_off/with/key/... > /dev/null

  while :
  do
    sleep 5
    ssh -q -o ConnectTimeout=1 theflame exit
    if [[ $? != 0 ]]; then
      break # SSH is down!
    fi
    echo "POWER: SSH is still available. Waiting 5s..."
  done

  echo "POWER: on"
  curl -X POST -s https://maker.ifttt.com/trigger/power_on/with/key/... > /dev/null
  echo "Waiting 50s for startup..."
  sleep 50
done

Now when IFTTT was delayed in delivering the power_off event the script would wait to confirm the machine powered off. That delay would sometimes spike as high as 10 minutes. But whenever the event eventually triggered, 5 seconds later the power_on event would be sent and the machine would start coming back up. After 50 seconds the script confirms SSH availability before restarting the rsync.

Sadly I didn't capture a video of this in action. I did capture some of the output to excitedly share with some friends as they watched me go down this rabbit hole though.

Other/backup/presentations/...
          1.20G  79%   79.29MB/s    0:00:25
POWER: off
POWER: on
Waiting 50s for startup...
Starting rsync...
receiving incremental file list
Other/
Other/backup/presentations/...
          1.45G  95%   81.64MB/s    0:00:05

I let this script run for two days and it managed to complete the transfer of all files. IFTTT reports that the power_on and power_off events were each triggered 151 times! A few hours of scripting and a $10 gift saved me from two weeks of doing this myself.

Final Thoughts

Had I known the drives would be so much trouble I would have taken a totally different approach. A better option would have been to run dd on the drives to copy their raw content to 2TB and 3TB files which would do a single pass across the platters. I think this would have been less likely to cause a freeze than the random access that rsync was doing. Then I could mount these files as storage on the new machine, import them as a ZFS pool, and do a local zfs send | zfs recv to get the data out.
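As a rough sketch of the first two steps of that alternative, the commands might look like the following. The device name, image directory, and new pool name are hypothetical, and everything is printed rather than executed since this approach was never actually run.

```shell
#!/usr/bin/env bash
# Hypothetical paths and names; adjust before use.
IMAGE_DIR='/tanker/images'
OLD_POOL='tanker'       # pool name stored on the old drives
IMPORT_AS='oldtanker'   # new name to avoid clashing with the new machine's pool

# Print the command to image one drive in a single sequential pass.
# conv=noerror,sync continues past read errors and pads short reads with zeros.
image_drive_command() {
  local device=$1 name=$2
  echo "dd if=$device of=$IMAGE_DIR/$name.img bs=1M conv=noerror,sync status=progress"
}

# Print the command to import the pool from the image files under a new name.
import_pool_command() {
  echo "zpool import -d $IMAGE_DIR $OLD_POOL $IMPORT_AS"
}

image_drive_command /dev/sdb disk1
import_pool_command
```

With the pool imported from image files, taking a snapshot and piping zfs send into zfs recv no longer touches the failing hardware at all.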

I chose to use rsync over zfs send | zfs recv because I was unable to get a snapshot to complete before locking up. Once the initial transfer completed, I did a second pass using this script where rsync did a checksum of the file content on both ends (normally it only compares size and date). This found a few inconsistencies and re-transferred about 100GB of data.
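To see why that second pass matters, here is a small self-contained sketch (temporary files only, no rsync involved) of two files that match on size and modification time yet differ in content. The quick check mirrors rsync's default comparison and the checksum check stands in for its --checksum (-c) option; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Quick check: same byte count and same modification time.
same_size_and_date() {
  [[ $(wc -c < "$1") -eq $(wc -c < "$2") ]] &&
    [[ ! "$1" -nt "$2" && ! "$1" -ot "$2" ]]
}

# Thorough check: compare a checksum of the content.
same_content() {
  [[ "$(cksum < "$1")" == "$(cksum < "$2")" ]]
}

work=$(mktemp -d)
printf 'good data' > "$work/source"
printf 'bad? data' > "$work/copy"      # same length, different bytes
touch -r "$work/source" "$work/copy"   # copy the modification time

same_size_and_date "$work/source" "$work/copy" && echo "quick check: match"
same_content "$work/source" "$work/copy" || echo "checksum: MISMATCH"
rm -rf "$work"
```

The quick check reports a match while the checksum does not, which is the kind of silent inconsistency the second pass was looking for.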

Here's the full script in its entirety: gist.github.com/JakeWharton/968859c48fd1bd7e85a0f78a164253b9. There are some additional features such as mounting the old ZFS pool as readonly on boot and using a third IFTTT trigger to update a push notification on my phone for the current item.

More on what this new machine is for in future posts.

https://jakewharton.com/extracting-100-percent-of-data-from-a-stubborn-dying-zfs-zpool

D8 Library Desugaring

Dec 18, 2019 Updated Dec 18, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

So far in this series the coverage of D8 has been about desugaring of Java 8 language features, working around vendor- and version-specific bugs in the platform, and performing method-local optimization. In this post we'll cover an upcoming feature of D8 called "core library desugaring" which makes newer APIs available on older versions of Android.

Library desugaring of Java 8 APIs such as streams, optional, and the new time APIs was announced at the developer keynote of Google I/O 2019 and delivered at Android DevSummit 2019 with the first canary build of Android Studio 4.0. This will allow developers to use these features introduced in API 24 and 26 on every version their app targets. No more backport libraries and duplicated APIs!

This is also a boon to the Java library ecosystem. Many libraries have long-since moved on to Java 8 but are unable to use newer APIs in order to maintain Android compatibility. While every new API is not available, D8 desugaring should allow these libraries to use the APIs which are most desired.

Not a new feature

Despite the recent fanfare, desugaring APIs is not actually a new feature of D8. Since it became a usable alternative to dx, D8 has desugared calls to the API 19 Objects.requireNonNull method. But, why that one method?

Certain code patterns will cause the Java compiler to synthesize an explicit null check.

class Counter {
  final int count = 0;
}
class Main {
  void doSomething(Counter counter) {
    int count = counter.count;
  }
}

When compiled with JDK 8, the Java bytecode of the doSomething method contains a call to getClass() whose return value is then thrown away.

void doSomething(Counter);
  Code:
     0: aload_1
     1: invokevirtual #2   // Method java/lang/Object.getClass:()Ljava/lang/Class;
     4: pop
     5: iconst_0
     6: istore_2
     ⋮

The zero value of count gets inlined into doSomething at bytecode index 5. As a result, the field is never actually read from the Counter, so passing null would not on its own throw a null-pointer exception. By including a call to getClass() on the Counter, the correct program behavior is maintained.

If you recompile this snippet with JDK 9, the bytecode changes.

 void doSomething(Counter);
   Code:
      0: aload_1
-     1: invokevirtual #2   // Method java/lang/Object.getClass:()Ljava/lang/Class;
+     1: invokestatic  #2   // Method java/util/Objects.requireNonNull:(Ljava/lang/Object;)Ljava/lang/Object;
      4: pop
      5: iconst_0
      6: istore_2
      ⋮

JDK-8074306 changed the behavior of the Java compiler in this scenario to produce better exceptions. But the Android toolchain has historically not worked correctly with JDK 9 (and newer), so you may be wondering how these calls came to be.

The primary source was Google's error-prone compiler and static analyzer which works with JDK 8 but is built on top of the JDK 9 compiler. While error-prone resolved the issue by introducing an off-by-default flag, Retrolambda added desugaring for the API which basically required that D8 do the same.

Running D8 on the Java bytecode (with a minimum API level of less than 19) desugars the call back into a getClass() invocation.

[00016c] Main.doSomething:(LCounter;)V
0000: invoke-virtual {v1}, Ljava/lang/Object;.getClass:()Ljava/lang/Class;
 ⋮
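For anyone wanting to reproduce output like this, a sketch of the commands involved is below. The d8.jar location, the android-28 platform jar, and the 28.0.3 build-tools version are assumptions about a local setup, so the script only prints the commands instead of running them. It also assumes the .class files were already compiled with JDK 9 or newer.

```shell
#!/usr/bin/env bash
# Print a D8 invocation for the given minimum API level.
# Paths are hypothetical; point them at a real d8.jar and Android SDK.
d8_command() {
  local min_api=$1
  echo "java -jar d8.jar --lib \$ANDROID_HOME/platforms/android-28/android.jar --release --min-api $min_api --output . *.class"
}

# Print the command to disassemble the resulting dex file.
dexdump_command() {
  echo "\$ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex"
}

d8_command 18
dexdump_command
```

Raising the value passed to --min-api to 19 or higher should leave the Objects.requireNonNull call in place, since the method exists on those platform versions.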

Objects.requireNonNull was the only API that D8 was able to desugar for a long time, and it did so using a simple rewrite. But soon its desugaring capabilities would have to expand in order to actually backport functionality.

Kotlin's Java 8

Unlike the Java compiler, the Kotlin compiler emits references to many APIs when generating bytecode for its language features. A data class is an example of the compiler generating a lot of bytecode on your behalf.

data class Duration(val amount: Long, val unit: TimeUnit)

In Kotlin 1.1.60, when targeting Java 8 bytecode, the hashCode method of a data class changed to start referencing some Java 8 APIs.

public int hashCode();
  Code:
     0: aload_0
     1: getfield      #10   // Field amount:J
     4: invokestatic  #71   // Method java/lang/Long.hashCode:(J)I
     ⋮

The compiler is free to call Long.hashCode because we told it that we were targeting Java 8. This is a new static method which has been added to the Long class.

Normally this would not be a problem for Android since the Kotlin compiler targets Java 6 by default. Unfortunately, the community push to target Java 8 for its language features interacted poorly with a decision to have the Kotlin compiler respect the specified target of your Java compiler in Kotlin 1.3. As a result, Android developers started seeing NoSuchMethodErrors for these hashCode calls because they were only available in API 24 and newer.

While the behavior of the Kotlin compiler was reverted for Android projects, there still was a potential for libraries consumed by Android projects to be targeting Java 8 and to reference these methods. The D8 team decided to step in and mitigate this problem by desugaring the hashCode APIs.

Running D8 on the Java bytecode (with a minimum API level of less than 24) shows the desugaring.

[0003e4] Duration.hashCode:()I
0000: iget-wide v0, v2, LDuration;.amount:J
0002: invoke-static {v0, v1}, L$r8$backportedMethods$utility$Long$1$hashCode;.hashCode:(J)I
 ⋮

I'm not sure how you expected Long.hashCode to be desugared, but I'm guessing it wasn't to a class named $r8$backportedMethods$utility$Long$1$hashCode! Unlike Objects.requireNonNull which was rewritten to getClass() to produce the same observable behavior, Long.hashCode has an implementation which cannot be replicated with a trivial rewrite.

Backporting methods

Inside of the D8 project, there are template implementations of each API that it can backport.

public final class LongMethods {
  public static int hashCode(long l) {
    return (int) (l ^ (l >>> 32));
  }
}

The code for these APIs is either written from the Javadoc specification of the method or adapted from libraries like Google Guava. When D8 is built, these templates are automatically converted into abstract representations of the method body.

public static CfCode LongMethods_hashCode() {
  return new CfCode(
      /* maxStack = */ 5,
      /* maxLocals = */ 2,
      ImmutableList.of(
          new CfLoad(ValueType.LONG, 0),
          new CfLoad(ValueType.LONG, 0),
          new CfConstNumber(32, ValueType.INT),
          new CfLogicalBinop(CfLogicalBinop.Opcode.Ushr, NumericType.LONG),
          new CfLogicalBinop(CfLogicalBinop.Opcode.Xor, NumericType.LONG),
          new CfNumberConversion(NumericType.LONG, NumericType.INT),
          new CfReturn(ValueType.INT)));
}

When D8 is compiling bytecode and first encounters a call to Long.hashCode, it generates a class on-the-fly with a hashCode method whose body is created by calling that factory method. Each Long.hashCode call is then rewritten to point at this newly-generated class.

Class #0            -
  Class descriptor  : 'L$r8$backportedMethods$utility$Long$1$hashCode;'
  Access flags      : 0x1401 (PUBLIC ABSTRACT SYNTHETIC)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0
      name          : 'hashCode'
      type          : '(J)I'
      access        : 0x1009 (PUBLIC STATIC SYNTHETIC)
00044c:                   |[00044c] $r8$backportedMethods$utility$Long$1$hashCode.hashCode:(J)I
00045c: 1300 2000         |0000: const/16 v0, #int 32
000460: a500 0200         |0002: ushr-long v0, v2, v0
000464: c202              |0004: xor-long/2addr v2, v0
000466: 8423              |0005: long-to-int v3, v2
000468: 0f03              |0006: return v3

This process allows the Java 8-targeting data class to work on versions of Android prior to API 24. If you look closely, you can probably map each Dalvik bytecode back to the abstract representation and then back to the template source code.

It may sound overkill to generate one class per method but this ensures that there is only one implementation of each API that requires backporting. When using R8, these synthesized classes also participate in optimizations such as method inlining and class merging which ultimately reduce their impact.

D8 can desugar 98 individual APIs from Java 7 and Java 8 which were added to existing types. But why stop there?

Because of how easy it is to add these templates, D8 can also desugar an additional 58 individual APIs from Java 9, 10, and 11 on existing types. This potentially allows Java libraries to target even newer versions of Java and still be used on Android.

A full list of the APIs which are available to desugar can be found here. Most of these are already available in AGP 3.6.0.

Backporting Types

Types like Optional, Function, Stream, and LocalDateTime are just some of those added in Java 8 which came to Android in API 24 and API 26. Backporting these to work on older API levels is more complicated than what it took to backport a single method for a few reasons.

class Main {
  public static void main(String... args) {
    System.out.println(LocalDateTime.now());
  }
}

LocalDateTime was introduced in Android API 26 and an app whose minimum API level is 26 or higher can call into the class directly.

[000240] Main.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: invoke-static {}, Ljava/time/LocalDateTime;.now:()Ljava/time/LocalDateTime;
0005: move-result-object v0
0006: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0009: return-void

To enable the use of these types when the minimum API is below 26, the Android Gradle plugin (4.0 or newer) requires that you enable "core library desugaring" in its DSL.

android {
  compileOptions {
    coreLibraryDesugaringEnabled true
  }
}

Recompiling will change the bytecode to reference the backport types.

 [000240] Main.main:([Ljava/lang/String;)V
 0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
-0002: invoke-static {}, Ljava/time/LocalDateTime;.now:()Ljava/time/LocalDateTime;
+0002: invoke-static {}, Lj$/time/LocalDateTime;.now:()Lj$/time/LocalDateTime;
 0005: move-result-object v0
 0006: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
 0009: return-void

The call to java.time.LocalDateTime was simply rewritten to j$.time.LocalDateTime, but the rest of the APK has changed dramatically.

Using the diffuse tool we can get a high-level view of the changes.

$ diffuse diff app-min-26.apk app-min-25.apk
OLD: app-min-26.apk (signature: V2)
NEW: app-min-25.apk (signature: V2)

          │          compressed          │         uncompressed
          ├───────┬──────────┬───────────┼─────────┬──────────┬─────────
 APK      │ old   │ new      │ diff      │ old     │ new      │ diff
──────────┼───────┼──────────┼───────────┼─────────┼──────────┼─────────
      dex │ 680 B │   44 KiB │ +43.4 KiB │   944 B │ 90.9 KiB │ +90 KiB
     arsc │ 524 B │    520 B │      -4 B │   384 B │    384 B │     0 B
 manifest │ 603 B │    603 B │       0 B │ 1.2 KiB │  1.2 KiB │     0 B
    other │ 229 B │    229 B │       0 B │    95 B │     95 B │     0 B
──────────┼───────┼──────────┼───────────┼─────────┼──────────┼─────────
    total │ 2 KiB │ 45.4 KiB │ +43.4 KiB │ 2.6 KiB │ 92.6 KiB │ +90 KiB


         │        raw        │           unique
         ├─────┬──────┬──────┼─────┬─────┬────────────────
 DEX     │ old │ new  │ diff │ old │ new │ diff
─────────┼─────┼──────┼──────┼─────┼─────┼────────────────
   count │   1 │    2 │   +1 │     │     │
 strings │  16 │ 1005 │ +989 │  16 │ 996 │ +980 (+983 -3)
   types │   7 │  175 │ +168 │   7 │ 170 │ +163 (+164 -1)
 classes │   1 │   88 │  +87 │   1 │  88 │  +87 (+87 -0)
 methods │   5 │  728 │ +723 │   5 │ 727 │ +722 (+724 -2)
  fields │   1 │  255 │ +254 │   1 │ 255 │ +254 (+254 -0)

There are two important things that this summary tells us:

Our APK size grew by 43.4 KiB, which is entirely attributed to dex files. Looking at the dex changes, there are a bunch of new classes, methods, and fields.
The number of dex files increased from one to two despite the number of total methods being nowhere close to the limit. These were release builds so we should be getting the minimum number of dex files.

Let's break each of these down.

APK size impact

Historically, in order to use the java.time APIs in an app with a minimum supported API level below 26 you would need to use the ThreeTenBP library (or ThreeTenABP). This is a standalone repackaging of the java.time APIs in the org.threeten.bp package which requires you to update all your imports.

D8 is basically performing that same operation but at the bytecode level. It rewrites your code from calling java.time to j$.time as seen in the bytecode diff above. To accompany that rewrite, an implementation needs to be bundled into the application. That is the cause of the large APK size change.

In this example the release APK is minified using R8 which also minifies the backport code. If minification is disabled, the increase in dex size jumps up to 180KB, 206 classes, 3272 methods, and 713 fields.

Second Dex

A release build will cause D8 or R8 to produce the minimum number of dex files required, and that's actually still the case here. D8 and R8 are responsible for producing the dex files for user code and your declared libraries. This means that only the Main type will be present in the first dex which we can confirm by dumping its members.

$ unzip app-min-25.apk classes.dex && \
    diffuse members --dex --declared classes.dex
com.example.Main <init>()
com.example.Main main(String[])

As D8 or R8 compiles your code and performs rewrites to the j$ packages, it records the types and APIs that are being rewritten. This produces a set of shrinker rules that are specific to the backported types. Currently (i.e., for AGP 4.0.0-alpha06) these rules are located at build/intermediates/desugar_lib_project_keep_rules/release/out/4 and for this example contain only the LocalDateTime.now() reference.

-keep class j$.time.LocalDateTime {
    j$.time.LocalDateTime now();
}

All of the available backported types have been pre-compiled from OpenJDK source to a dex file as part of Google's desugar_jdk_libs project. That dex file is downloaded from Google's maven repo and then fed into a tool called L8 along with those generated keep rules. L8 shrinks this dex file in isolation using the provided rules to produce the final, second dex file.

Dumping the L8-minified second dex file shows a set of types and APIs that have been entirely obfuscated except for the LocalDateTime.now() API that the application is referencing.

$ unzip app-min-25.apk classes2.dex && \
    diffuse members --dex classes2.dex | grep -C 6 'LocalDateTime.now'
j$.time.LocalDateTime c(s) → long
j$.time.LocalDateTime compareTo(Object) → int
j$.time.LocalDateTime d() → h
j$.time.LocalDateTime d(s) → x
j$.time.LocalDateTime equals(Object) → boolean
j$.time.LocalDateTime hashCode() → int
j$.time.LocalDateTime now() → LocalDateTime
j$.time.LocalDateTime toString() → String
j$.time.a <init>(k)
j$.time.a a() → k
j$.time.a a: k
j$.time.a b() → f
j$.time.a c() → long

L8 is purpose-built for processing this special dex file. Previously in this series, R8 was introduced as...

...a version of D8 that also performs optimization. It’s not a separate tool or codebase, just the same tool operating in a more advanced mode.

Well L8 is a version of R8 that optimizes the JDK desugar dex file. It's not a separate tool or codebase, just the same tool operating in a more advanced mode.

It may not be clear why the explicit extra dex is needed rather than consuming the desugared JDK types like any other library and allowing them to be processed normally by R8. First of all, Google probably doesn't want me talking about it which should itself be somewhat of an indication why the extra ceremony is needed. For more information you can consult the OpenJDK source code license, specifically the very end. Sorry if that's not enough information, but I suspect that's all I'm allowed to say.

By virtue of always requiring at least a second dex, you either need to have a minimum supported API of 21 or use legacy multidex. Most applications should choose the former, or use this feature as yet another justification to potentially increase your minimum to 21.

Backporting methods on backported types

In addition to backporting methods on the types that have been around since API 1 like Long, D8 and R8 will also backport newer methods on these backportable types like Optional. These use the same template mechanism as detailed earlier, but will only be available when your minimum API level is high enough to access the target type or you have core library desugaring enabled.

For Stream and the four different optional types, D8 and R8 will backport 18 methods from Java 9, 10, and 11. The full list of those APIs can be found here.
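
As a rough illustration of the kind of calls this covers, the sketch below uses three Java 9 to 11 additions on Optional and Stream. Which methods fall within the 18 is my assumption from memory rather than something stated in this post (the linked list is authoritative), and the class and method names are made up for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; only the Optional/Stream calls are real JDK APIs.
final class NewerOptionalApis {
  // Optional.or (Java 9): lazily fall back to a second Optional.
  static Optional<String> nicknameOrName(Optional<String> nickname, String name) {
    return nickname.or(() -> Optional.of(name));
  }

  // Optional.isEmpty (Java 11): the inverse of isPresent.
  static boolean missing(Optional<String> value) {
    return value.isEmpty();
  }

  // Stream.ofNullable (Java 9): zero elements for null, one element otherwise.
  static List<String> asList(String maybeNull) {
    return Stream.ofNullable(maybeNull).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(nicknameOrName(Optional.empty(), "Alice")); // Optional[Alice]
    System.out.println(missing(Optional.empty()));                 // true
    System.out.println(asList(null));                              // []
  }
}
```

None of these calls need any special syntax at the source level. If they are indeed on the supported list, the rewriting happens entirely in D8 or R8.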

Developer Story

As a developer wanting to write code using these APIs, how do you know which ones are available for backport? Currently there's not a great way to know about them all.

To start with, once you enable coreLibraryDesugaring the IDE and Lint will start allowing you to use the new types and new APIs when supported. Running Lint on this example will produce no errors despite the minimum supported API being below 26 which LocalDateTime would otherwise require. When library desugaring is disabled, though, the NewApi check fails as it normally would.

Main.java:7: Error: Call requires API level 26 (current min is 25): java.time.LocalDateTime#now [NewApi]
    System.out.println(LocalDateTime.now());
                                     ~~~

This ensures you don't errantly use an unsupported type or API, but it does not help for discoverability.

For now the best list of backported types is in the Android Studio 4.0 feature list and the best list of backported APIs on existing types are the two lists in this post (1, 2). Hopefully in the future these will be more discoverable, though.

The backporting of individual APIs has been improving since D8 and R8's inception. With core library desugaring now becoming available in Android Gradle plugin 4.0 alphas, applications have access to the foundational types from Java 8 even when their minimum supported API level is lower than when those types were introduced. It also means that Java libraries can start to leverage these types while still maintaining compatibility with Android.

It's important to remember that even with all this shiny new API availability, the JDK and Java APIs are continuing to improve along their six-month release cadence. While D8 and R8 can help bridge the gap by desugaring some of those APIs from Java 9, 10, and 11 even before they land in Android, pressure must be maintained to actually ship these APIs in the Android framework.

https://jakewharton.com/d8-library-desugaring

Public API challenges in Kotlin

Nov 21, 2019 Updated Nov 21, 2019

Kotlin is justifiably lauded for its language features compared to today's Java. It has constructs which allow expressing common patterns with more concise alternatives. An overused example in every intro-to-Kotlin talk or blog post is comparing a Java "POJO" to a Kotlin data class.

Here's yet another one of those comparisons, but bear with me as it will be used to illustrate the points in this post.

public final class Person {
  private final @NonNull String name;
  private final int age;

  public Person(@NonNull String name, int age) {
    this.name = name;
    this.age = age;
  }

  public @NonNull String getName() { return name; }
  public int getAge() { return age; }

  @Override public String toString() {
    return "Person(name=" + name + ", age=" + age + ')';
  }
  @Override public boolean equals(@Nullable Object o) {
    if (o == this) return true;
    if (!(o instanceof Person)) return false;
    Person other = (Person) o;
    return name.equals(other.name)
        && age == other.age;
  }
  @Override public int hashCode() {
    return Objects.hash(name, age);
  }
}

data class Person(
  val name: String,
  val age: Int
)

Let us assume that this Person type is exposed in a library. As a result, evolving its public API needs to be done in a way that's source and binary-compatible with previous versions. This post will cover some of the challenges of porting a library containing types like Person from Java to Kotlin while maintaining the required flexibility and exposing the correct conventions to each language.

Binary Compatibility

What changes are necessary in order to add a new property, nickname, to Person in a binary-compatible way?

For the manually-written Java type we add a new field, getter, and constructor parameter. In order to maintain compatibility, we retain the old constructor signature for old callers.

 public final class Person {
   private final @NonNull String name;
+  private final @Nullable String nickname;
   private final int age;

-  public Person(@NonNull String name, int age) {
+  public Person(@NonNull String name, @Nullable String nickname, int age) {
     this.name = name;
+    this.nickname = nickname;
     this.age = age;
   }

+  public Person(@NonNull String name, int age) {
+    this(name, null, age);
+  }
+
   public @NonNull String getName() { return name; }
+  public @Nullable String getNickname() { return nickname; }
   public int getAge() { return age; }

   @Override public String toString() {
-    return "Person(name=" + name + ", age=" + age + ')';
+    return "Person(name=" + name + ", nickname=" + nickname + ", age=" + age + ')';
   }
   @Override public boolean equals(@Nullable Object o) {
     if (o == this) return true;
     if (!(o instanceof Person)) return false;
     Person other = (Person) o;
     return name.equals(other.name)
+        && Objects.equals(nickname, other.nickname)
         && age == other.age;
   }
   @Override public int hashCode() {
-    return Objects.hash(name, age);
+    return Objects.hash(name, nickname, age);
   }
 }

So tedious!

The Kotlin class only needs a new property and the secondary constructor for compatibility.

 data class Person(
   val name: String,
+  val nickname: String?,
   val age: Int
-)
+) {
+  constructor(name: String, age: Int) : this(name, null, age)
+}

Much nicer, right? Unfortunately we have created two backwards-incompatible changes in the Kotlin version despite our efforts.

Destructuring Functions

For each property defined in the primary constructor, a data class will generate a componentN() function to facilitate destructuring declarations. We can see these by running javap on the original Kotlin version of Person:

$ javap Person.class
Compiled from "Person.kt"
public final class Person {
  public final java.lang.String getName();
  public final int getAge();
  public final java.lang.String component1();
  public final int component2();
   ⋮

Adding the nickname property in the middle of the primary constructor causes these component methods to shift incompatibly.

 public final class Person {
   public final java.lang.String getName();
+  public final java.lang.String getNickname();
   public final int getAge();
   public final java.lang.String component1();
-  public final int component2();
+  public final java.lang.String component2();
+  public final int component3();
    ⋮

Consumers who are destructuring Person will receive a NoSuchMethodError at runtime unless they also recompile their code.
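
To see why the call fails, recall that the JVM links a call site by method name and full descriptor, return type included. The sketch below mimics that lookup with method handles. PersonV1 and PersonV2 are hand-written Java stand-ins for the two compiled versions of the data class (not real compiler output), and the links helper is a name invented for this example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ComponentShiftDemo {
  // Stand-in for the original data class: Person(name, age).
  static final class PersonV1 {
    public String component1() { return "Alice"; }
    public int component2() { return 99; }
  }

  // Stand-in after 'nickname' is inserted in the middle.
  static final class PersonV2 {
    public String component1() { return "Alice"; }
    public String component2() { return null; }
    public int component3() { return 99; }
  }

  // findVirtual requires an exact match on name, parameters, and return type,
  // which is the same information a compiled call site carries.
  static boolean links(Class<?> type, String name, Class<?> returnType) {
    try {
      MethodHandles.lookup().findVirtual(type, name, MethodType.methodType(returnType));
      return true;
    } catch (NoSuchMethodException | IllegalAccessException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // A caller compiled against V1 expects component2() to return int.
    System.out.println(links(PersonV1.class, "component2", int.class)); // true
    System.out.println(links(PersonV2.class, "component2", int.class)); // false
  }
}
```

A method named component2 still exists in the second version, but its descriptor no longer matches what the old caller was compiled against.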

We can work around this by only adding new properties at the end of the primary constructor. This will ensure that existing component methods do not change their return type.

A nice property of being forced to only append properties is that we can rely on default values and the @JvmOverloads annotation to avoid having to manually write secondary constructors.

-data class Person(
+data class Person @JvmOverloads constructor(
   val name: String,
   val age: Int,
+  val nickname: String? = null
 )

The downside of this approach is that you can no longer control the order of properties.

Copy Functions

In addition to the component functions, two copy functions are also generated automatically.

$ javap Person.class
Compiled from "Person.kt"
public final class Person {
   ⋮
  public final Person copy(java.lang.String, int);
  public static Person copy$default(Person, java.lang.String, int, int, java.lang.Object);
   ⋮

These support creating a new instance of a Person while also updating a subset of its properties (e.g., alice.copy(age = 99)).

Unfortunately, adding the nickname property changes the signature of both of these methods breaking compatibility.

 public final class Person {
    ⋮
-  public final Person copy(java.lang.String, int);
+  public final Person copy(java.lang.String, java.lang.String, int);
-  public static Person copy$default(Person, java.lang.String, int, int, java.lang.Object);
+  public static Person copy$default(Person, java.lang.String, java.lang.String, int, int, java.lang.Object);
    ⋮

Even if you are only appending properties to avoid breaking the component functions, these two signatures always change. The use of @JvmOverloads on the primary constructor does not propagate to the copy functions. Any consumers using copy will now receive a NoSuchMethodError at runtime.
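
It can help to picture what the copy$default bridge does. The following is a hand-written Java approximation, not decompiled output: the method is named copyDefault here, and the mask handling reflects my understanding that bit N being set means "parameter N was omitted, use its default".

```java
final class CopyDefaultSketch {
  static final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
      this.name = name;
      this.age = age;
    }

    Person copy(String name, int age) {
      return new Person(name, age);
    }

    // Approximation of the synthetic static bridge. Every property appears in
    // the parameter list, so adding a property changes this signature too.
    // 'marker' is unused in this sketch.
    static Person copyDefault(Person self, String name, int age, int mask, Object marker) {
      if ((mask & 0b01) != 0) name = self.name; // 'name' omitted by the caller
      if ((mask & 0b10) != 0) age = self.age;   // 'age' omitted by the caller
      return self.copy(name, age);
    }
  }

  public static void main(String[] args) {
    Person alice = new Person("Alice", 98);
    // Roughly what alice.copy(age = 99) turns into at a Kotlin call site.
    Person older = Person.copyDefault(alice, null, 99, 0b01, null);
    System.out.println(older.name + " " + older.age); // Alice 99
  }
}
```

Because the caller bakes the full parameter list and the mask into its own bytecode, there is no way for an old call site to match a bridge that has grown a parameter.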

Mitigation: No data

The only real way to avoid these binary incompatibilities for public API is to avoid the data modifier from the start and implement equals, hashCode, and toString yourself. Adding nickname to a non-data class can now be done in a fully-compatible way.

class Person(
  val name: String,
  val nickname: String?,
  val age: Int
) {
  // ...

  constructor(name: String, age: Int) : this(name, null, age)

  override fun toString() = "Person(name=$name, nickname=$nickname, age=$age)"
  override fun equals(other: Any?) = other is Person
      && name == other.name
      && nickname == other.nickname
      && age == other.age
  override fun hashCode() = Objects.hash(name, nickname, age)
}

You can implement the componentN() functions yourself to support destructuring. If you plan to add properties in the middle of the list, however, it may not make sense for the type to support destructuring.

The copy method can also be written manually, but evolving it compatibly is tricky. The simplest way is to maintain all of the old versions of the function but mark them as @Deprecated(level=HIDDEN). This will keep their methods in the bytecode for old callers, but prevent new users from calling anything but the latest version.

class Person(
  val name: String,
  val nickname: String?,
  val age: Int
) {
  // ...

  @Deprecated("", level = HIDDEN) // For binary compatibility.
  fun copy(name: String = this.name, age: Int = this.age) =
      copy(name = name, age = age) // Calls the function below.

  fun copy(name: String = this.name, nickname: String? = this.nickname, age: Int = this.age) =
      Person(name, nickname, age)
}

Interop Compatibility

Another part of compatibility when migrating the Person library from Java to Kotlin is maintaining correct conventions for the API exposed to each language.

To avoid the explosion of constructors in Java, the Person type would traditionally hide its constructor and expose a nested Builder class. This not only allows adding new properties without a concern of binary compatibility, but allows properties to be supplied in any order and for partially-constructed instances to be passed around.

 public final class Person {
   ⋮

-  public Person(@NonNull String name, @Nullable String nickname, int age) {
+  private Person(@NonNull String name, @Nullable String nickname, int age) {
     this.name = name;
     this.nickname = nickname;
     this.age = age;
   }

   ⋮
+
+  public static final class Builder {
+    private String name;
+    private String nickname;
+    private int age;
+
+    public Builder setName(String name) { this.name = name; return this; }
+    public Builder setNickname(String nickname) { this.nickname = nickname; return this; }
+    public Builder setAge(int age) { this.age = age; return this; }
+
+    public Person build() {
+      return new Person(requireNonNull(name), nickname, age);
+    }
+  }
 }
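
For reference, here is a self-contained sketch of how a Java caller might use such a builder, showing both any-order property assignment and a partially-constructed builder being passed along. The wrapper class BuilderUsage and the withDefaults helper are invented for this example, and equals, hashCode, and toString are omitted for brevity.

```java
import static java.util.Objects.requireNonNull;

final class BuilderUsage {
  static final class Person {
    final String name;
    final String nickname; // May be null.
    final int age;

    private Person(String name, String nickname, int age) {
      this.name = name;
      this.nickname = nickname;
      this.age = age;
    }

    static final class Builder {
      private String name;
      private String nickname;
      private int age;

      // Each setter returns the builder so that calls can be chained.
      Builder setName(String name) { this.name = name; return this; }
      Builder setNickname(String nickname) { this.nickname = nickname; return this; }
      Builder setAge(int age) { this.age = age; return this; }

      Person build() {
        return new Person(requireNonNull(name, "name == null"), nickname, age);
      }
    }
  }

  // A partially-constructed builder can be handed to other code to finish.
  static Person.Builder withDefaults() {
    return new Person.Builder().setAge(99);
  }

  public static void main(String[] args) {
    // Properties are supplied in whatever order is convenient.
    Person alice = withDefaults().setNickname("Al").setName("Alice").build();
    System.out.println(alice.name + " " + alice.nickname + " " + alice.age); // Alice Al 99
  }
}
```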

Creating the builder in Kotlin is nearly identical.

-class Person(
+class Person private constructor(
   val name: String,
   val nickname: String?,
   val age: Int
 ) {
   override fun toString() = TODO()
   override fun equals(other: Any?) = TODO()
   override fun hashCode() = TODO()
+
+  class Builder {
+    private var name: String? = null
+    private var nickname: String? = null
+    private var age: Int = 0
+
+    fun setName(name: String?) = apply { this.name = name }
+    fun setNickname(nickname: String?) = apply { this.nickname = nickname }
+    fun setAge(age: Int) = apply { this.age = age }
+
+    fun build() = Person(name!!, nickname, age)
+  }
 }

Nothing too interesting here, but by supporting Java we're starting to create problems for Kotlin.

Builder Boilerplate

A builder is usually a mutable(ish) version of an immutable type that also is responsible for validating any invariants (such as, in this case, that name is not null). It can be tempting to rewrite it in Kotlin as public vars to avoid the manual setter boilerplate.

 class Builder {
-  private var name: String? = null
+  var name: String? = null
-  private var nickname: String? = null
+  var nickname: String? = null
-  private var age: Int = 0
+  var age: Int = 0

-  fun setName(name: String?) = apply { this.name = name }
-  fun setNickname(nickname: String?) = apply { this.nickname = nickname }
-  fun setAge(age: Int) = apply { this.age = age }
-
   fun build() = Person(name!!, nickname, age)
 }

Unfortunately, doing so would be incorrect. The return type of the generated setters is now void instead of Builder, so Java callers can no longer chain them.

Without a language change to allow property setters to return values, we are forced to use setter functions. I tend to keep the public var but hide its void-returning setter from Java with the @JvmSynthetic annotation. This allows Kotlin users to still get full usage of the property for reading and writing.

 class Builder {
-  private var name: String? = null
+  @set:JvmSynthetic // Hide 'void' setter from Java
+  var name: String? = null
-  private var nickname: String? = null
+  @set:JvmSynthetic // Hide 'void' setter from Java
+  var nickname: String? = null
-  private var age: Int = 0
+  @set:JvmSynthetic // Hide 'void' setter from Java
+  var age: Int = 0

   fun setName(name: String?) = apply { this.name = name }
   fun setNickname(nickname: String?) = apply { this.nickname = nickname }
   fun setAge(age: Int) = apply { this.age = age }

   fun build() = Person(name!!, nickname, age)
 }

There is no annotation to hide the setter functions from Kotlin callers. While not essential, they're far better served by mutating the properties in an apply { } block.

Constructor

By virtue of making the primary constructor private we've removed the idiomatic means of creating a Person for Kotlin. Instead of a builder, Kotlin prefers default parameter values and named arguments. The @JvmSynthetic annotation can't be used to hide constructors from Java, so we need to pursue a different approach.

There is a convention of defining a top-level function whose name is the same as a type which we can use to replicate the constructor.

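fun Person(name: String, nickname: String? = null, age: Int): Person {
  // The constructor is private to the class, so go through the Builder.
  // Calling Person(name, nickname, age) here would resolve to this function.
  return Person.Builder().setName(name).setNickname(nickname).setAge(age).build()
}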

Since this is a regular function and not a constructor, we can hide it from Java with @JvmSynthetic.

+@JvmSynthetic // Hide from Java callers who should use Builder.
 fun Person(name: String, nickname: String? = null, age: Int): Person {
  ⋮

Once again, however, we've fallen into a binary compatibility trap. This signature has the same problem as the copy function that was generated for a data class.

Thankfully, since we wrote this function, the same mitigation trick can be used as outlined above for a manually-written copy. That is, we maintain the old versions of the function and mark them as @Deprecated(level=HIDDEN).

These factory functions have no way of enforcing only named-parameter usage. As a result, they are vulnerable to source-incompatibility issues as arguments change position.

There's also the problem of having to duplicate default values in each of these factory functions and the builder. A best practice would be to maintain defaults in private constants that could be re-used, but that requires additional discipline and continues to add boilerplate.

Mitigation: Factory DSL?

While currently unconventional, another potential workaround for the constructor problem is to change from a function-like syntax to a DSL-like syntax leveraging the Builder.

@JvmSynthetic // Hide from Java callers who should use Builder.
fun Person(initializer: Person.Builder.() -> Unit): Person {
  return Person.Builder().apply(initializer).build()
}

Creation of an instance now looks more like inline-JSON.

val alice = Person {
  name = "Alice Alison"
  age = 99
}

This also has the advantage of re-using any default values from the builder allowing them to be localized in one place.

DSLs tend to have specialized usage and do not currently see widespread use as factories. Their ability to enforce named usage and maintain source and binary compatibility as properties are introduced makes them an attractive solution, however.

Summary

Using Kotlin types whose properties will change over time in public API requires extra care to maintain source and binary compatibility as well as an idiomatic API for each language.

Avoid using the data modifier. Instead, implement equals, hashCode, and toString yourself for these value-based types.
Expose a builder for Java callers. Public vars are not enough, fluent setters need to be written.
Hide constructors and be mindful of factory function binary compatibility. Reusing the builders for a DSL-factory may be a way to avoid this.

If your type is not going to change its properties over time (like a 2D point) you can ignore this advice and stick with a simple data class.

Here is the final Person declaration for the public API of a library:

class Person private constructor(
  val name: String,
  val nickname: String?,
  val age: Int
) {
  override fun toString() = "Person(name=$name, nickname=$nickname, age=$age)"
  override fun equals(other: Any?) = other is Person
      && name == other.name
      && nickname == other.nickname
      && age == other.age
  override fun hashCode() = Objects.hash(name, nickname, age)

  class Builder {
    @set:JvmSynthetic // Hide 'void' setter from Java
    var name: String? = null
    @set:JvmSynthetic // Hide 'void' setter from Java
    var nickname: String? = null
    @set:JvmSynthetic // Hide 'void' setter from Java
    var age: Int = 0

    fun setName(name: String?) = apply { this.name = name }
    fun setNickname(nickname: String?) = apply { this.nickname = nickname }
    fun setAge(age: Int) = apply { this.age = age }

    fun build() = Person(name!!, nickname, age)
  }
}

@JvmSynthetic // Hide from Java callers who should use Builder.
fun Person(initializer: Person.Builder.() -> Unit): Person {
  return Person.Builder().apply(initializer).build()
}

Quite the distance from the simple data class version, but it's at least safe to change over time.

Future versions of Kotlin will stabilize compiler plugins allowing these patterns to be placed behind annotations or custom modifiers.

// Hypothetical 'value' on 'class' provides generated 'equals',
// 'hashCode', and 'toString' similar to 'data'.
value class Person private constructor(
  val name: String,
  val nickname: String? = null,
  val age: Int
) {
  // Hypothetical 'builder' on nested 'class' exposes mutable
  // versions of primary constructor properties.
  builder class Builder
}

This will eliminate the boilerplate required to create Kotlin types suitable for evolving in public APIs.

https://jakewharton.com/public-api-challenges-in-kotlin

D8 Optimizations

Oct 30, 2019 Updated Oct 30, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

No, that's not a typo! While the optimizations in this series so far have been done by R8 during whole-program optimization, D8 can also perform some simple optimizations.

D8 was introduced as the new Java-to-Dalvik bytecode compiler for Android. It handles backporting of Java 8 language features to work on Android (as well as those of Java 9 and beyond). It also works around vendor- and version-specific bugs in the platform.

That's what we've seen from D8 so far in the series, but it has two other responsibilities that we'll cover in this post and the next:

Backporting methods to work on older API levels where they didn't exist.
Performing local optimizations to reduce bytecode size and/or improve performance.

We'll cover API backporting in the next post in the series. For now, let's look at some of the local optimizations that D8 might perform.

Switch Rewriting

The last two posts (1, 2) have dealt with optimizing switch statements. Both have slightly lied about the bytecode that D8 and R8 produce for certain switch statements. Let's look at one of those examples again.

enum Greeting {
  FORMAL, INFORMAL;

  static String greetingType(Greeting greeting) {
    switch (greeting) {
      case FORMAL: return "formal";
      case INFORMAL: return "informal";
      default: throw new AssertionError();
    }
  }
}

The full Java bytecode that was shown for greetingType used the lookupswitch bytecode which has offsets for where to jump when a value is matched.

static java.lang.String greetingType(Greeting);
  Code:
     0: getstatic     #2      // Field Main$1.$SwitchMap$Greeting:[I
     3: aload_0
     4: invokevirtual #3      // Method Greeting.ordinal:()I
     7: iaload
     8: lookupswitch  {
                   1: 36
                   2: 39
             default: 42
        }
    36: ldc           #4      // String formal
    38: areturn
    39: ldc           #5      // String informal
    41: areturn
    42: new           #6      // class java/lang/AssertionError
    45: dup
    46: invokespecial #7      // Method java/lang/AssertionError."<init>":()V
    49: athrow

The lookupswitch Java bytecode was shown as being rewritten to packed-switch when converted to Dalvik bytecode.

[000584] Main.greetingType:(LGreeting;)Ljava/lang/String;
0000: sget-object v0, LMain$1;.$SwitchMap$Greeting:[I
0002: invoke-virtual {v2}, LGreeting;.ordinal:()I
0005: move-result v1
0006: aget v0, v0, v1
0008: packed-switch v0, 00000017
000b: new-instance v0, Ljava/lang/AssertionError;
000d: invoke-direct {v0}, Ljava/lang/AssertionError;.<init>:()V
0010: throw v0
0011: const-string v0, "formal"
0013: return-object v0
0014: const-string v0, "informal"
0016: return-object v0
0017: packed-switch-data (8 units)

If we actually compile and dex the above source file with D8, its Dalvik bytecode output is different.

 [0005f0] Main.greetingType:(LGreeting;)Ljava/lang/String;
 0000: sget-object v0, LMain$1;.$SwitchMap$Greeting:[I
 0002: invoke-virtual {v1}, LGreeting;.ordinal:()I
 0005: move-result v1
 0006: aget v0, v0, v1
-0008: packed-switch v0, 00000017
+0008: const/4 v1, #int 1
+0009: if-eq v0, v1, 0014
+000b: const/4 v1, #int 2
+000c: if-eq v0, v1, 0017
 000e: new-instance v0, Ljava/lang/AssertionError;
 0010: invoke-direct {v0}, Ljava/lang/AssertionError;.<init>:()V
 0013: throw v0
 0014: const-string v0, "formal"
 0016: return-object v0
 0017: const-string v0, "informal"
 0019: return-object v0
-0017: packed-switch-data (8 units)

Instead of a packed-switch at bytecode index 0008, there is a series of if/else if-like checks. Based on the indices, you might think this winds up producing a larger binary but it's actually the opposite. The original packed-switch is accompanied by a packed-switch-data payload that reports itself as being 8 units long. The indices are hexadecimal, so the packed-switch version has a total cost of 31 code units (0x17 + 8) whereas the if/else if version only costs 26 (0x19 + 1).

Rewriting switches to normal conditionals is only done when there is a bytecode savings. This depends on the number of case blocks, whether there's fallthrough, and whether the values are contiguous. D8 computes the cost of both forms and then chooses whichever is smaller.
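
The comparison can be sketched with a simplified cost model. This is not D8's actual heuristic. It only counts 16-bit code units using the instruction sizes from the Dalvik bytecode format, ignores fallthrough and sparse keys, and assumes every key is small enough for const/4. All names are made up for the example.

```java
final class SwitchCostSketch {
  // packed-switch instruction (3 units) plus its payload:
  // ident (1) + size (1) + first_key (2) + one 32-bit target per case (2 each).
  static int packedSwitchUnits(int cases) {
    return 3 + 4 + 2 * cases;
  }

  // One const/4 (1 unit) and one if-eq (2 units) per case.
  static int ifChainUnits(int cases) {
    return 3 * cases;
  }

  // Pick the if/else chain only when it is strictly smaller.
  static boolean preferIfChain(int cases) {
    return ifChainUnits(cases) < packedSwitchUnits(cases);
  }

  public static void main(String[] args) {
    // The two-case Greeting example: 11 units as a switch, 6 as conditionals.
    System.out.println(packedSwitchUnits(2)); // 11
    System.out.println(ifChainUnits(2));      // 6
    System.out.println(preferIfChain(2));     // true
    System.out.println(preferIfChain(10));    // false
  }
}
```

Under this model the conditional form stops winning once there are seven or more contiguous cases, which gives a feel for why the rewrite only shows up on small switches.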

String Optimizations

Back in February there was a post on R8's string constants operations. It showed an example from OkHttp where a call to String.length was made on a constant.

static String patternHost(String pattern) {
  return pattern.startsWith(WILDCARD)
      ? pattern.substring(WILDCARD.length())
      : pattern;
}

When compiled with the old dx tool the output is a straightforward translation.

[0001a8] Test.patternHost:(Ljava/lang/String;)Ljava/lang/String;
0000: const-string v0, "*."
0002: invoke-virtual {v2, v0}, Ljava/lang/String;.startsWith:(Ljava/lang/String;)Z
0005: move-result v1
0006: if-eqz v1, 0010
0008: invoke-virtual {v0}, Ljava/lang/String;.length:()I
000b: move-result v1
000c: invoke-virtual {v2, v1}, Ljava/lang/String;.substring:(I)Ljava/lang/String;
000f: move-result-object v2
0010: return-object v2

Bytecode index 0008 performs the String.length call on the constant loaded at index 0000.

With D8, however, this method call on a constant is detected and evaluated at compile-time to its corresponding numerical value.

 [0001a8] Test.patternHost:(Ljava/lang/String;)Ljava/lang/String;
 0000: const-string v0, "*."
 0002: invoke-virtual {v1, v0}, Ljava/lang/String;.startsWith:(Ljava/lang/String;)Z
 0005: move-result v0
 0006: if-eqz v0, 000d
-0008: invoke-virtual {v0}, Ljava/lang/String;.length:()I
-000b: move-result v1
+0008: const/4 v0, #int 2
 0009: invoke-virtual {v1, v0}, Ljava/lang/String;.substring:(I)Ljava/lang/String;
 000c: move-result-object v1
 000d: return-object v1

Removing a method call is not something that D8 or even R8 will normally do. This optimization is only safe to apply because String is a final class in the framework with well-defined behavior.

In the nine months since the original post, the number of methods on a string which can be optimized has grown substantially. Both D8 and R8 will compute isEmpty(), startsWith(String), endsWith(String), contains(String), equals(String), equalsIgnoreCase(String), contentEquals(String), hashCode(), length(), indexOf(String), indexOf(int), lastIndexOf(String), lastIndexOf(int), compareTo(String), compareToIgnoreCase(String), substring(int), substring(int, int), and trim() on a constant string. Obviously it's unlikely that most of these will apply without R8 inlining, but they're there when it does occur.
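
Expressed back in source form, the effect on the OkHttp example is roughly the following. The second method is a hand-written equivalent of what the optimized bytecode does, not something you would write yourself, and the class name is invented for this sketch.

```java
final class ConstantLengthSketch {
  static final String WILDCARD = "*.";

  // As written: the length is derived from the constant to avoid a magic number.
  static String patternHost(String pattern) {
    return pattern.startsWith(WILDCARD)
        ? pattern.substring(WILDCARD.length())
        : pattern;
  }

  // What the optimized bytecode effectively does: the call is folded to 2.
  static String patternHostFolded(String pattern) {
    return pattern.startsWith("*.")
        ? pattern.substring(2)
        : pattern;
  }

  public static void main(String[] args) {
    System.out.println(patternHost("*.example.com"));       // example.com
    System.out.println(patternHostFolded("*.example.com")); // example.com
  }
}
```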

Known Array Lengths

Just like how you might call length() on a constant string to maintain a single source of truth, it's not uncommon to see code call length on an array which has a constant size for the same reason.

Let's once again turn to OkHttp for a Kotlin example of this pattern.

private fun decodeIpv6(input: String, pos: Int, limit: Int): InetAddress? {
  val address = ByteArray(16)
  var b = 0

  var i = pos
  while (i < limit) {
    if (b == address.size) return null // Too many groups.

The use of address.size (which becomes a call to length in bytecode) prevents having to duplicate the 16 constant or extract it to a shared constant value. The downside is that each iteration of this parsing loop has to resolve the array length, as seen in the output of dx.

[00020c] OkHttpKt.decodeIpv6:(Ljava/lang/String;II)Ljava/net/InetAddress;
0000: const/16 v5, #int 16
0002: new-array v0, v5, [B
0004: const/4 v1, #int 0
0005: const/4 v2, #int 0
0006: if-ge v2, v8, 0036
0008: array-length v6, v0
0009: if-ne v1, v6, 000b
 ⋮

The constant 16 is loaded into register v5 at bytecode index 0000 which is used as the array size at index 0002. The resulting array reference is stored in register v0. The loop then starts at index 0006 with the i < limit comparison. Inside the loop, v0's array length is loaded into v6 at index 0008 to be tested in the if at index 0009.

D8 recognizes that the length lookup is being done on an array reference which does not change and whose size is known at compile-time.

 [00020c] OkHttpKt.decodeIpv6:(Ljava/lang/String;II)Ljava/net/InetAddress;
 0000: const/16 v5, #int 16
 0002: new-array v0, v5, [B
 0004: const/4 v1, #int 0
 0005: const/4 v2, #int 0
 0006: if-ge v2, v8, 0036
-0008: array-length v6, v0
-0009: if-ne v1, v6, 000b
+0009: if-ne v1, v5, 000b
  ⋮

The call to array-length is removed and the if is rewritten to re-use register v5 which is the size that was used to create the array.

On its own this pattern is not overly common. Once again it plays well when R8 inlining comes into effect and a method checking array.length is inlined into a caller that declares a new array.
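As a rough sketch of that inlining scenario in Java, consider a helper which compares against the array's own length and a caller which creates the array. The names isFull and fillAddress are made up for illustration.

```java
final class ArrayLengthSketch {
  // Hypothetical helper which checks a count against the array's own length.
  static boolean isFull(byte[] buffer, int count) {
    return count == buffer.length;
  }

  static int fillAddress() {
    byte[] address = new byte[16];
    int written = 0;
    // If isFull is inlined here, 'buffer.length' refers to an array created in
    // this method with a known size, so the comparison could use 16 directly.
    while (!isFull(address, written)) {
      address[written] = (byte) written;
      written++;
    }
    return written;
  }

  public static void main(String... args) {
    System.out.println(fillAddress()); // 16
  }
}
```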

Each of these optimizations is small. D8 can only perform an optimization when it has no externally-visible effect and does not change program behavior. That pretty much limits it to optimizations which occur inside of a single method body.

At runtime you cannot tell that a switch was rewritten to if/else conditionals. You cannot tell that a call to length() on a constant string was replaced with its equivalent constant value. You cannot tell that a call to length on an array initialized in the same method was replaced with the input size. Each of these optimizations (and the few others) that D8 is able to perform results in slightly smaller and more-efficient bytecode. And, of course, when you invoke the full power of R8, their impact is multiplied.

In the next post we'll start to cover how D8 backports new APIs on existing types to work on older API levels.

https://jakewharton.com/d8-optimizations

R8 Optimization: Enum Switch Maps

Oct 16, 2019 Updated Oct 16, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The previous post on R8 covered enum ordinals which then allowed branch elimination to apply to a switch statement. In that post, the full bytecode for switch on an enum was omitted because there's actually more to the optimization.

Let's start with a simple enum and a switch over its contents in two separate source files (this will be important later).

enum Greeting {
  FORMAL, INFORMAL
}

class Main {
  static String greetingType(Greeting greeting) {
    switch (greeting) {
      case FORMAL: return "formal";
      case INFORMAL: return "informal";
      default: throw new AssertionError();
    }
  }

  public static void main(String... args) {
    System.out.println(greetingType(Greeting.INFORMAL));
  }
}

If we compile and run these files the output is as expected.

$ javac Greeting.java Main.java
$ java -cp . Main
informal

The bytecode in the previous post showed that the compiler produces a call to ordinal() which is then used in the switch. But if that was all that the compiler did, re-ordering the constants of Greeting would break the output of Main.

 enum Greeting {
-  FORMAL, INFORMAL
+  INFORMAL, FORMAL
 }

After changing the constant order, we can recompile only Greeting.java and yet the application still produces the correct output.

$ javac Greeting.java
$ java -cp . Main
informal

If the bytecode was only relying on the value of ordinal(), this code would have produced "formal".

Into The Bytecode

To understand how this works we can look at the Java bytecode of greetingType.

$ javap -c Main.class
class Main {
  static java.lang.String greetingType(Greeting);
    Code:
       0: getstatic     #2      // Field Main$1.$SwitchMap$Greeting:[I
       3: aload_0
       4: invokevirtual #3      // Method Greeting.ordinal:()I
       7: iaload
       8: lookupswitch  {
                     1: 36
                     2: 39
               default: 42
          }
      36: ldc           #4      // String formal
      38: areturn
      39: ldc           #5      // String informal
      41: areturn
      42: new           #6      // class java/lang/AssertionError
      45: dup
      46: invokespecial #7      // Method java/lang/AssertionError."<init>":()V
      49: athrow
}

Let's break the contents down. The first bytecode of this method has a lot of information to unpack:

0: getstatic     #2      // Field Main$1.$SwitchMap$Greeting:[I

This looks up a static field on the class Main$1 with the name $SwitchMap$Greeting and the type int[]. We obviously did not write this class or field, so it must have been generated automatically by javac.

The next two bytecodes perform the call to ordinal() on the method argument.

3: aload_0
4: invokevirtual #3      // Method Greeting.ordinal:()I

Java bytecode is stack-based, so the int[] result of getstatic and the int value of ordinal() both remain on the stack. (If you don't understand how a stack-based machine works, you can watch this presentation for an introduction.) The next instruction uses that int[] and int as its operands.

7: iaload

This "integer array load" instruction looks up a value in the int[] at the index returned by ordinal(). The rest of the bytecodes of the method are a "normal" switch statement which uses the value from the array as its input.

Switch Maps

It's pretty clear that this $SwitchMap$Greeting array is the mechanism which allows our code to continue to work despite the ordinals changing their value. So how does it work?

When compiled, each case of the switch is assigned a one-based index. The default branch is assigned zero.

switch (greeting) {
  case FORMAL: ...   // <-- index 1
  case INFORMAL: ... // <-- index 2
  default: ...       // <-- index 0
}

The $SwitchMap$Greeting array is populated at runtime in the static initializer of Main$1. The empty int[] is created first and assigned to the $SwitchMap$Greeting field.

0: invokestatic  #1      // Method Greeting.values:()[LGreeting;
3: arraylength
4: newarray      int
6: putstatic     #2      // Field $SwitchMap$Greeting:[I

The length of this array is the same as the number of constants (which might not match the number of case blocks). This is important since ordinals are used as an index into this array.

The next bytecodes are repeated for each constant used in the switch statement.

 9: getstatic     #2      // Field $SwitchMap$Greeting:[I
12: getstatic     #3      // Field Greeting.FORMAL:LGreeting;
15: invokevirtual #4      // Method Greeting.ordinal:()I
18: iconst_1
19: iastore

The ordinal of FORMAL, the first case subject, is used as the offset in the array where its corresponding switch index value of 1 is stored. The same is done for the ordinal of INFORMAL and the value 2. This int[] effectively creates a map from the ordinals which may change to a fixed set of integer values which will not.

Diagram showing the switch map working when the ordinals are changed.

By using this map, the switch statement can remain stable even when we re-arrange the constants of Greeting.
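Written by hand, the mechanism looks roughly like the following. This is a sketch of the idea and not the exact code javac emits; GreetingSwitchMap and SWITCH_MAP are stand-ins for the synthetic Main$1 class and its $SwitchMap$Greeting field.

```java
enum Greeting { FORMAL, INFORMAL }

// Hand-written stand-in for the synthetic Main$1 class. Names are illustrative.
final class GreetingSwitchMap {
  // Sized by the number of constants since ordinals are used as the index.
  static final int[] SWITCH_MAP = new int[Greeting.values().length];
  static {
    // One store per constant used in the switch. Unused slots stay 0 (default).
    SWITCH_MAP[Greeting.FORMAL.ordinal()] = 1;
    SWITCH_MAP[Greeting.INFORMAL.ordinal()] = 2;
  }
}

final class SwitchMapSketch {
  static String greetingType(Greeting greeting) {
    // Switch on the stable index from the map, not on the ordinal itself.
    switch (GreetingSwitchMap.SWITCH_MAP[greeting.ordinal()]) {
      case 1: return "formal";
      case 2: return "informal";
      default: throw new AssertionError();
    }
  }

  public static void main(String... args) {
    System.out.println(greetingType(Greeting.INFORMAL)); // informal
  }
}
```

Because the map is filled in by looking up each constant's ordinal at runtime, swapping the declaration order of FORMAL and INFORMAL changes where the 1 and 2 are stored but not which branch each constant reaches.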

The Optimization

The switch map indirection created by javac is useful when the enum may be recompiled separately from the callers. Android applications are packaged as a single unit, so the indirection is nothing but wasted binary size and runtime overhead.

Running D8 on the class files from above shows that the indirection is maintained.

$ java -jar $R8_HOME/build/libs/d8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --release \
      --output . \
      *.class

$ $ANDROID_HOME/build-tools/29.0.2/dexdump -d classes.dex
 ⋮
[00040c] Main.greetingType:(LGreeting;)Ljava/lang/String;
0000: sget-object v0, LMain$1;.$SwitchMap$Greeting:[I
0002: invoke-virtual {v1}, LGreeting;.ordinal:()I
0005: move-result v1
0006: aget v1, v0, v1
0008: packed-switch v1, 00000024
 ⋮

R8, however, performs whole-program analysis and optimization. There's no point for it to retain this indirection since the enum cannot change independently of the switch.

 [00040c] Main.greetingType:(LGreeting;)Ljava/lang/String;
-0000: sget-object v0, LMain$1;.$SwitchMap$Greeting:[I
 0000: invoke-virtual {v1}, LGreeting;.ordinal:()I
 0003: move-result v1
-0006: aget v1, v0, v1
 0004: packed-switch v1, 00000024

The branches of the switch are rewritten to account for the fact that the input now uses the zero-based ordinal directly instead of the one-based values from the switch map. With the Main$1 class and its array being no longer referenced, it is eliminated like normal dead code.
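At the source level, the effect is roughly equivalent to switching on the ordinal directly. The following is only a Java approximation of the rewritten method, not actual R8 output, and the class name is made up for illustration.

```java
enum Greeting { FORMAL, INFORMAL }

final class DirectOrdinalSketch {
  static String greetingType(Greeting greeting) {
    // No switch map lookup: the zero-based ordinal is the switch input.
    switch (greeting.ordinal()) {
      case 0: return "formal";   // Was switch index 1 via the map.
      case 1: return "informal"; // Was switch index 2 via the map.
      default: throw new AssertionError();
    }
  }

  public static void main(String... args) {
    System.out.println(greetingType(Greeting.INFORMAL)); // informal
  }
}
```

Writing this by hand would be fragile since it breaks if the constants are re-ordered. It is only safe for R8 because the enum and the switch are always compiled together.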

Only with this indirection removed can the enum ordinal optimization from the previous post result in eliminating the switch. Otherwise, the ordinal value would flow into the int[] as an index which is not safe to eliminate in the general case.

Kotlin

An enum used in a Kotlin when will also produce a similar indirection for the same reasons.

val Greeting.type get() = when (this) {
  Greeting.FORMAL -> "formal"
  Greeting.INFORMAL -> "informal"
}

When compiled, the Java bytecode shows a similar mechanism but with different names.

$ javap -c MainKt
public final class MainKt {
  public static final java.lang.String getType(Greeting);
    Code:
      0: aload_0
      1: getstatic     #21     // Field MainKt$WhenMappings.$EnumSwitchMapping$0:[I
      4: swap
      5: invokevirtual #27     // Method Greeting.ordinal:()I
      8: iaload
      9: tableswitch   {
                    1: 36
                    2: 41
              default: 46
         }
     ⋮

The generated class is suffixed with $WhenMappings instead of an arbitrary integer and the array is named $EnumSwitchMapping$0.

R8 initially did not detect Kotlin mappings because of these slightly different names. Version 1.6 of R8 (included in AGP 3.6) will correctly detect and eliminate them.

Switch map elimination is a nice win for binary size and runtime performance. More importantly, by removing an indirection between the input to a switch and its branching logic, other optimizations like turning calls to ordinal() into a constant can result in branch elimination.

More R8 optimization posts coming soon. Stay tuned!

https://jakewharton.com/r8-optimization-enum-switch-maps

R8 Optimization: Enum Ordinals and Names

Oct 9, 2019 Updated Oct 9, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

Enums are (and have always been!) a recommended way to model a fixed set of constants. Most commonly an enum only provides a set of possible constants and nothing more. But being full classes, enums can also carry helper methods and fields (both instance and static) or even implement interfaces.

A common optimization for enums in tools that perform whole-program optimization is to replace simple occurrences (i.e., those which don't have fields, methods, or interfaces) with integer values. However, there are other optimizations which are applicable to all enums that are still available.

Ordinal

Each enum constant has an ordinal() which returns its position in the list of all constants. Since the ordinal range is always [0, N), it can be used for indexing into other zero-based data structures such as arrays or even bits. The most common usage is actually by the Java compiler itself for switch statements over enums.

enum Greeting {
  FORMAL {
    @Override String greet(String name) {
      return "Hello, " + name;
    }
  },
  INFORMAL {
    @Override String greet(String name) {
      return "Hey " + name + '!';
    }
  };

  abstract String greet(String name);

  static String type(Greeting greeting) {
    switch (greeting) {
      case FORMAL: return "formal";
      case INFORMAL: return "informal";
      default: throw new AssertionError();
    }
  }
}

The compiled bytecode reveals the hidden call to ordinal().

[000a34] Greeting.type:(LGreeting;)Ljava/lang/String;
0000: invoke-virtual {v1}, LGreeting;.ordinal:()I
0003: move-result v1
 ⋮

If we call this method with one of the constants, an opportunity for optimization presents itself.

public static void main(String... args) {
  System.out.println(Greeting.type(Greeting.INFORMAL));
}

As this is the only usage of type in our whole application, R8 inlines the method.

[000b60] Greeter.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: sget-object v0, LGreeting;.INFORMAL:LGreeting;
0004: invoke-virtual {v0}, LGreeting;.ordinal:()I
0007: move-result v0
 ⋮
0047: invoke-virtual {v1, v2}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0050: return-void

Bytecode index 0002 looks up the INFORMAL enum constant and then 0004 - 0007 invokes its ordinal() method. This is now a wasteful operation since the ordinal of the constant is known at compile-time.

R8 detects when a constant lookup flows into a call to ordinal() and replaces the call and lookup with the correct integer value that the call would produce.

 [000b60] Greeter.main:([Ljava/lang/String;)V
 0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
-0002: sget-object v0, LGreeting;.INFORMAL:LGreeting;
-0004: invoke-virtual {v0}, LGreeting;.ordinal:()I
-0007: move-result v0
+0002: const/4 v0, #int 1
  ⋮
 0042: invoke-virtual {v1, v2}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
 0045: return-void

This constant value now flows into the switch statement which can be eliminated leaving only the desired branch.

 [000b60] Greeter.main:([Ljava/lang/String;)V
 0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
-0002: const/4 v0, #int 1
- ⋮
+0002: const-string v0, "informal"
 0004: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
 0007: return-void

Even though the language provides switches over an enum, its implementation is all based on integers from the ordinal values. It's a simple optimization to replace calls to ordinal() on fixed constants, but it enables more advanced optimizations like branch elimination to apply where they otherwise could not.
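The three stages can be approximated in Java source. This is a sketch only: the enum is simplified to omit greet, the method names are hypothetical, and R8 performs these steps on bytecode rather than source.

```java
enum Greeting { FORMAL, INFORMAL }

final class OrdinalFoldingSketch {
  // After inlining: the switch input is ordinal() on a known constant.
  static String afterInlining() {
    switch (Greeting.INFORMAL.ordinal()) {
      case 0: return "formal";
      case 1: return "informal";
      default: throw new AssertionError();
    }
  }

  // After ordinal replacement: INFORMAL is declared second so its ordinal is 1.
  static String afterOrdinalReplacement() {
    int ordinal = 1;
    switch (ordinal) {
      case 0: return "formal";
      case 1: return "informal";
      default: throw new AssertionError();
    }
  }

  // After branch elimination: only the reachable branch remains.
  static String afterBranchElimination() {
    return "informal";
  }

  public static void main(String... args) {
    System.out.println(afterInlining());           // informal
    System.out.println(afterOrdinalReplacement()); // informal
    System.out.println(afterBranchElimination());  // informal
  }
}
```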

Name

In addition to ordinal(), each enum constant exposes its declared name through the name() method. The toString() will also return the declared name by default, but since that method can be overridden it's important to have a distinct name().

enum Greeting {
  FORMAL { /* … */ },
  INFORMAL { /* … */ };

  abstract String greet(String name);

  @Override public String toString() {
    return "Greeting(" + name().toLowerCase(US) + ')';
  }
}

The value of name() is sometimes used for display, logging, or serialization.

static void printGreeting(Greeting greeting, String name) {
  System.out.println(greeting.name() + ": " + greeting.greet(name));
}

public static void main(String... args) {
  printGreeting(Greeting.FORMAL, "Jake");
}

This program prints "FORMAL: Hello, Jake" when run. Once again, by virtue of only being called from one place, R8 inlines printGreeting into main.

[000474] Greeting.main:([Ljava/lang/String;)V
0000: sget-object v3, LGreeting;.FORMAL:LGreeting;
0002: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0004: new-instance v1, Ljava/lang/StringBuilder;
0006: invoke-direct {v1}, Ljava/lang/StringBuilder;.<init>:()V
0009: invoke-virtual {v3}, LGreeting;.name:()Ljava/lang/String;
000c: move-result-object v2
 ⋮
0022: invoke-virtual {v1, v2}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0025: return-void

Bytecode index 0000 looks up the FORMAL enum constant and then 0009 - 000c invokes its name() method. Just like ordinal(), this is a wasteful operation as the name of the constant is known at compile-time.

R8 again detects when a constant enum lookup flows into a call to name() and replaces the call and lookup with a string constant. If you read the economics of generated code post, it talked about the cost of generating new string constants. Thankfully, because these strings share their name with the name of the enum constant, we do not pay for a new string.

 [000474] Greeting.main:([Ljava/lang/String;)V
 0000: sget-object v3, LGreeting;.FORMAL:LGreeting;
 0002: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
 0004: new-instance v1, Ljava/lang/StringBuilder;
 0006: invoke-direct {v1}, Ljava/lang/StringBuilder;.<init>:()V
-0009: invoke-virtual {v3}, LGreeting;.name:()Ljava/lang/String;
-000c: move-result-object v2
+0009: const-string v2, "FORMAL"
  ⋮
 0020: invoke-virtual {v1, v2}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
 0023: return-void

The lookup at bytecode index 0000 still occurs because the code needs to invoke the greet method, but the call to name() was eliminated.

This optimization won't enable other large optimizations like branch elimination to apply. But, since it produces a string, any string operations that are done on the result of the name() call may also be performed at compile-time.
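As a small illustration of that chaining, assume a method which asks for the length of a constant's name. The class and method names here are hypothetical, and whether both steps actually apply depends on the R8 version and the surrounding code.

```java
enum Greeting { FORMAL, INFORMAL }

final class NameFoldingSketch {
  static int formalNameLength() {
    // Step 1: name() on a known constant could become the string "FORMAL".
    // Step 2: length() on that constant string could then become 6.
    return Greeting.FORMAL.name().length();
  }

  public static void main(String... args) {
    System.out.println(formalNameLength()); // 6
  }
}
```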

For enums without a toString() override, this optimization will also apply to calls to toString() which defaults to being the same as name().

Both of these enum optimizations are small and really only work in the context of other R8 optimizations. Although, if it wasn't clear in this series by now, that's how most of these optimizations achieve their true power.

So far in this series I chose to highlight optimizations based on having found bugs in them or sometimes even suggesting them myself through the R8 issue tracker. But the two optimizations in this post are somewhat special because I actually managed to contribute these myself! I suspect we won't see much else of my contribution in the series, but it feels good to have at least played a small part.

In the next post we'll come back to the enum ordinal optimization because switch statements on enums are far more complicated than they seem. Stay tuned!

https://jakewharton.com/r8-optimization-enum-ordinals-and-names

R8 Optimization: Class Reflection and Forced Inlining

Sep 25, 2019 Updated Sep 25, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The previous post on R8 covered method outlining which automatically de-duplicated code. This was actually a detour from what I had promised was next at the end of the class constant operations post which preceded it. So let's get back on track.

Class constant operations allow R8 to take calls such as MyActivity.class.getSimpleName() and replace it with the string literal "MyActivity". This was presented in the context of log tags, where you might write that expression instead of the string literal so that the tag always reflects the actual class name, even after obfuscation. This works great in a static context where the MyActivity.class literal is fixed, but it does not work when used on an instance.

Instance reflection

When dealing with an instance, the Class reference is obtained by calling getClass() instead of a MyActivity.class literal. This operation is not terribly expensive, but it is still a form of reflection.

class MyActivity extends Activity {
  @Override void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    String name = this.getClass().getSimpleName();
    Log.e(name, "Hello!");
  }
}

The getClass() API is just a normal method on every Object and appears as a normal invoke-virtual in bytecode.

[0003d0] MyActivity.onCreate:(Landroid/os/Bundle;)V
0000: invoke-super {v1, v2}, Landroid/app/Activity;.onCreate:(Landroid/os/Bundle;)V
0003: invoke-virtual {v1}, Ljava/lang/Object;.getClass:()Ljava/lang/Class;
0006: move-result-object v2
0007: invoke-virtual {v2}, Ljava/lang/Class;.getSimpleName:()Ljava/lang/String;
000a: move-result-object v2

Since R8 is performing whole-program analysis, it knows that there are no subtypes of MyActivity even though it's not marked as final. As a result, it can replace calls to this.getClass() with MyActivity.class.

 [000170] MyActivity.onCreate:(Landroid/os/Bundle;)V
 0000: invoke-super {v1, v2}, Landroid/app/Activity;.onCreate:(Landroid/os/Bundle;)V
-0003: invoke-virtual {v1}, Ljava/lang/Object;.getClass:()Ljava/lang/Class;
-0006: move-result-object v2
+0003: const-class v2, Lcom/example/MyActivity;
 0005: invoke-virtual {v2}, Ljava/lang/Class;.getSimpleName:()Ljava/lang/String;
 0008: move-result-object v2

Beyond that, the Class<?> reference immediately flows into a call to getSimpleName(). Thus, the optimization covered in the class constant operations post can now apply, producing only the simple constant string.

 0000: invoke-super {v1, v2}, Landroid/app/Activity;.onCreate:(Landroid/os/Bundle;)V
-0003: const-class v2, Lcom/example/MyActivity;
-0005: invoke-virtual {v2}, Ljava/lang/Class;.getSimpleName:()Ljava/lang/String;
-0008: move-result-object v2
+0003: const-string v2, "MyActivity"

But how often do you write this.getClass() where the class is known unequivocally?

In keeping with the example of logging, let's look at a hypothetical library which accepts an Activity and an optional name for use with logging.

class SomeLibrary {
  static SomeLibrary create(Activity activity) {
    return create(activity, activity.getClass().getSimpleName());
  }

  static SomeLibrary create(Activity activity, String name) {
    return new SomeLibrary(activity, name);
  }

  private SomeLibrary(Activity activity, String name) {
    // ...
  }

  void doSomething() {
    Log.d(name, "Starting work!");
    // ...
  }
}

When a name is not supplied, it is inferred from the activity class name using getClass().getSimpleName(). Since the input is not a fixed class literal, this cannot be replaced with a string at compile-time.
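A plain-Java sketch shows why. The classes below are hypothetical stand-ins for Activity subclasses, and inferName plays the role of the create(Activity) overload. The same call produces a different string depending on the runtime type of its argument, so no single constant can replace it inside the library.

```java
class BaseScreen {}
class HomeScreen extends BaseScreen {}
class SettingsScreen extends BaseScreen {}

final class RuntimeClassSketch {
  // Stand-in for SomeLibrary.create(Activity): the parameter's static type
  // says nothing about which subclass will actually arrive.
  static String inferName(BaseScreen screen) {
    return screen.getClass().getSimpleName();
  }

  public static void main(String... args) {
    System.out.println(inferName(new HomeScreen()));     // HomeScreen
    System.out.println(inferName(new SettingsScreen())); // SettingsScreen
  }
}
```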

Calling this from an activity is straightforward and reminiscent of a few popular libraries.

class MyActivity extends Activity {
  private SomeLibrary library;

  @Override void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    library = SomeLibrary.create(this);
  }

  @Override void onResume() {
    library.doSomething();
  }
}

The inlining of method bodies has been a staple in previous R8 posts as it often unlocks optimizations that otherwise would not apply. This example is no different in that regard, but it is different because the create(Activity) method is too large to be inlined normally. The three method calls to getClass(), getSimpleName(), and the create() overload, along with specifying the arguments to those methods, exceeds the maximum allowed method body size for inline candidates.

Inlining by force

R8 advertises its configuration rules as being compatible with those documented for ProGuard, the tool it's meant to replace. But aside from honoring what ProGuard supports, it does have undocumented rules of its own. An example of this was shown in the value assumption post (and ProGuard has since come to add support for that rule!). While undocumented, this rule is supported by R8.

Another undocumented, R8-specific rule which can help guide inlining is -alwaysinline. This directive overrides the limitations of normal inlining to inline method bodies which might not otherwise be considered. Unfortunately, this rule is undocumented for a very good reason: it is completely unsupported and supposed to be for testing purposes only.

By using -alwaysinline, the create(Activity) method can be forced to be inlined.

-alwaysinline class com.example.SomeLibrary {
  static com.example.SomeLibrary create(android.app.Activity);
}

This causes the getClass().getSimpleName() calls to be moved from the library code to each call site.

 @Override void onCreate(Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
-  library = SomeLibrary.create(this);
+  library = SomeLibrary.create(this, this.getClass().getSimpleName());
 }

As a result, we've created the above scenario where the enclosing class is known at compile time. It will be replaced with the MyActivity.class class literal which is then quickly replaced with the "MyActivity" string literal.

 @Override void onCreate(Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
-  library = SomeLibrary.create(this, this.getClass().getSimpleName());
+  library = SomeLibrary.create(this, "MyActivity");
 }

Once again we see the power of successive optimizations applying. No more reflection!

Unlike previous posts where inlining happened automatically, the unsupported -alwaysinline directive forced this behavior in R8. Inlining should only be forced like this when you know that a subsequent optimization will apply to offset the bytecode impact. In this example, there is a chance that the instance cannot be determined at compile-time and we end up slightly bloating the bytecode. And, of course, the unsupported nature of the rule means it may change or disappear at any time. For a stable solution, Kotlin's inline function modifier has the same effect, but only for Kotlin callers.

Replacing calls to getClass() with a class literal is a very small optimization. It saves only four bytes when inlined, but its greatest contribution is enabling other optimizations to apply. Subsequent calls to methods like getSimpleName() can now be eliminated which then opens up string optimizations to potentially apply.

In future R8 posts we'll come back to this getClass() optimization and others which it enables. But for now, there's a lot of other R8 optimizations that I want to cover without promising a specific topic next, so stay tuned.

https://jakewharton.com/r8-optimization-class-reflection-and-forced-inlining

Calculating the true impact of zip file entries

Sep 20, 2019 Updated Sep 20, 2019

How can we determine the impact of each entry on a zip file's size? It seems like a trivial problem, but things quickly don't add up.

There's three built-in ways to read information about the contents of a zip file in Java:

Mount the zip as a FileSystem using FileSystems.newFileSystem and then access its contents using Paths.
Open it with ZipInputStream for a one-shot iteration over the zip entries.
Open it with ZipFile for random access to the zip entries.

The first mechanism is extremely convenient. It allows interacting with the contents of a zip file using the same APIs as normal files. Unfortunately, by virtue of being exposed like regular files, you only have one way to check their size: Files.size(Path). This delegates to an API called BasicFileAttributes.size() which returns the size of the uncompressed file contents. While there is a ZipFileAttributes.compressedSize() for returning the size of the compressed contents, it's internal to the JDK and not available for our use.

The other two mechanisms, ZipInputStream and ZipFile, both expose entries using the ZipEntry type. These being zip-centric APIs, many of the properties of the zip file format are directly available. Notably for our use case, there's a getCompressedSize() method.

Problem solved? Not exactly…

If you sum the compressed size of all entries in a zip the result will not equal the size of the zip file. This isn't entirely unexpected. After all, the zip file format surely requires additional metadata to track per-entry information like the relative path of each compressed file.

So if we're looking to calculate the actual size impact of an entry on the final zip, can we do it?

Zip file format

An overview of the zip file format specification can be found on Wikipedia. It consists of a list of entries which are each defined as a header followed by the compressed data (whose length is specified in the header). Finally, at the end, there is a central directory which lists all of the entries available in the file.

A slight tangent: Given this format, it's pretty obvious how ZipInputStream and ZipFile work. The former simply iterates forward through the bytes reading each entry as it comes. The latter parses the central directory at the end and then jumps to the offset of whichever entry you request.
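The difference between the two is easy to see in a small sketch. Everything here (the class name, the sample entries, the temp file) is made up for illustration, and readAllBytes assumes Java 9 or newer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import static java.nio.charset.StandardCharsets.UTF_8;

final class ZipReadingSketch {
  // Builds a tiny two-entry zip in memory to read back below.
  static byte[] sampleZip() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry("a.txt"));
      zip.write("alpha".getBytes(UTF_8));
      zip.closeEntry();
      zip.putNextEntry(new ZipEntry("b.txt"));
      zip.write("bravo".getBytes(UTF_8));
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // ZipInputStream: forward-only, entries arrive in the order they were written.
  static List<String> namesInOrder(byte[] zipBytes) {
    List<String> names = new ArrayList<>();
    try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
      for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
        names.add(entry.getName());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  // ZipFile: needs a real file, but can jump straight to an entry by name.
  static String readEntry(byte[] zipBytes, String name) {
    try {
      Path file = Files.createTempFile("sketch", ".zip");
      try {
        Files.write(file, zipBytes);
        try (ZipFile zip = new ZipFile(file.toFile());
             InputStream in = zip.getInputStream(zip.getEntry(name))) {
          return new String(in.readAllBytes(), UTF_8);
        }
      } finally {
        Files.delete(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String... args) {
    byte[] zip = sampleZip();
    System.out.println(namesInOrder(zip));       // [a.txt, b.txt]
    System.out.println(readEntry(zip, "b.txt")); // bravo
  }
}
```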

Back on our problem, ZipEntry.getCompressedSize() is only exposing the length of compressed data (pictured as the blue <data> blocks). However, the header for each entry and the record in the central directory also contribute to the overall size impact. Thus, to get the real value, we need to be able to calculate the size of those two things.

Zip entry header

The header for each entry is defined as follows:

Offset  Size  Description
0       4     Local file header signature
...     ...   ...
26      2     File name length (n)
28      2     Extra field length (m)
30      n     File name
30+n    m     Extra field

Here we can see that the size of the header will be a fixed 30 bytes plus the length of ZipEntry.getName() (as UTF-8 bytes) plus the length of ZipEntry.getExtra() (which returns opaque bytes).

There is also an optional trailer which can be either 12 or 16 bytes. This is only present when a specific bit in one of the fields of the header is set. Unfortunately, the field which contains the bit is not exposed in the API of ZipEntry, and so we cannot include it in the calculation. Thankfully, this seems infrequently used.

Central directory record

The central directory is a list of records for each file followed by a single end-of-directory record.

The record for each entry is defined as follows:

Offset  Size  Description
0       4     Central directory file header signature
...     ...   ...
42      4     Relative offset of local file header
46      n     File name
46+n    m     Extra field
46+n+m  k     File comment

The size will be 46 bytes plus the length of ZipEntry.getName() plus the length of ZipEntry.getExtra() plus the length of ZipEntry.getComment() (as UTF-8 bytes).

The end-of-directory record is defined as follows:

Offset  Size  Description
0       4     End of central directory signature
...     ...   ...
20      2     Comment length (n)
22      n     Comment

Its size is 22 bytes plus the length of ZipFile.getComment() (as UTF-8 bytes). ZipInputStream, since it only iterates forward over the entries, does not expose the zip comment.

Putting it all together

With this knowledge of the zip file format we can now calculate a more accurate representation of the impact of each entry.

static long entryImpactBytes(ZipEntry entry) {
  int nameSize = entry.getName().getBytes(UTF_8).length;
  int extraSize = entry.getExtra() != null
      ? entry.getExtra().length
      : 0;
  int commentSize = entry.getComment() != null
      ? entry.getComment().getBytes(UTF_8).length
      : 0;

  // Calculate the actual compressed size impact in the zip, not just compressed data size.
  // See https://en.wikipedia.org/wiki/Zip_(file_format)#File_headers for details.
  return entry.getCompressedSize()
      // Local file header. There is no way of knowing whether a trailing data descriptor
      // was present since the general flags field is not exposed, but it's unlikely.
      + 30 + nameSize + extraSize
      // Central directory file header.
      + 46 + nameSize + extraSize + commentSize;
}

Using this method, a sum of all entries will put you very close to the actual size of the zip file. All that's left is to account for the end-of-directory record from the central directory.

static int additionalBytes(ZipFile file) {
  int commentSize = file.getComment() != null
      ? file.getComment().getBytes(UTF_8).length
      : 0;
  return 22 + commentSize;
}

Using these two functions, the sum total should now exactly match the size of the zip file.
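As a quick sanity check of that claim, here is a minimal sketch which writes a tiny zip and compares the computed total against the file size on disk. The class name, entry names, and contents are made up for illustration. It deliberately writes STORED entries with their size and CRC set up front so that no trailing data descriptors are emitted, which is exactly the case the arithmetic cannot see.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipSizeCheck {
  // Same arithmetic as entryImpactBytes above: data + local header + central directory header.
  static long entryImpactBytes(ZipEntry entry) {
    int nameSize = entry.getName().getBytes(UTF_8).length;
    int extraSize = entry.getExtra() != null ? entry.getExtra().length : 0;
    int commentSize = entry.getComment() != null ? entry.getComment().getBytes(UTF_8).length : 0;
    return entry.getCompressedSize()
        + 30 + nameSize + extraSize
        + 46 + nameSize + extraSize + commentSize;
  }

  // Same arithmetic as additionalBytes above: the end-of-directory record.
  static long additionalBytes(ZipFile file) {
    int commentSize = file.getComment() != null ? file.getComment().getBytes(UTF_8).length : 0;
    return 22 + commentSize;
  }

  /** Writes a temporary zip of STORED entries (illustrative data only) and returns its path. */
  static Path writeStoredZip(String... names) {
    try {
      Path zip = Files.createTempFile("zip-size-check", ".zip");
      try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
        for (String name : names) {
          byte[] data = ("contents of " + name).getBytes(UTF_8);
          CRC32 crc = new CRC32();
          crc.update(data);
          ZipEntry entry = new ZipEntry(name);
          // STORED entries need size and CRC before writing, which avoids a data descriptor.
          entry.setMethod(ZipEntry.STORED);
          entry.setSize(data.length);
          entry.setCompressedSize(data.length);
          entry.setCrc(crc.getValue());
          out.putNextEntry(entry);
          out.write(data);
          out.closeEntry();
        }
      }
      return zip;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Sums the impact of every entry plus the end-of-directory record. */
  static long computedSize(Path zip) {
    try (ZipFile file = new ZipFile(zip.toFile())) {
      long total = additionalBytes(file);
      Enumeration<? extends ZipEntry> entries = file.entries();
      while (entries.hasMoreElements()) {
        total += entryImpactBytes(entries.nextElement());
      }
      return total;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path zip = writeStoredZip("a.txt", "dir/b.txt");
    System.out.println("computed=" + computedSize(zip) + " actual=" + zip.toFile().length());
  }
}
```

For zips produced by other tools, or with DEFLATED entries streamed through ZipOutputStream, expect the computed number to come up short by the size of any data descriptors.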

There are some small improvements to be had here if we want. For one, we don't need to encode the name and comment as UTF-8 bytes only to then read the length. Libraries like Guava and Okio provide methods for calculating the UTF-8 length directly on a String. Additionally, the zip format is so simple that you could write your own parser which included the file trailers in its calculation depending on how accurate you needed the numbers to be.
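If you'd rather not pull in a library just for that, counting the UTF-8 length by hand is only a few lines. This is a rough sketch rather than what Guava or Okio actually do internally, and the class and method names are mine. It assumes you want the same answer as String.getBytes(UTF_8), including its habit of replacing an unpaired surrogate with a single '?' byte.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

final class Utf8LengthSketch {
  /** Counts the bytes String.getBytes(UTF_8) would produce without allocating the array. */
  static int utf8Length(String s) {
    int bytes = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        bytes += 1; // ASCII
      } else if (c < 0x800) {
        bytes += 2;
      } else if (!Character.isSurrogate(c)) {
        bytes += 3;
      } else if (Character.isHighSurrogate(c)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        bytes += 4; // a valid surrogate pair is one supplementary code point
        i++; // skip the low surrogate we just consumed
      } else {
        bytes += 1; // assumption: an unpaired surrogate is replaced by '?', as getBytes does
      }
    }
    return bytes;
  }

  public static void main(String[] args) {
    String[] names = {"a.txt", "r\u00e9sum\u00e9.pdf", "\u65e5\u672c\u8a9e.txt"};
    for (String name : names) {
      System.out.println(utf8Length(name) + " vs " + name.getBytes(UTF_8).length);
    }
  }
}
```

Swapping this in for the getBytes(UTF_8).length calls changes nothing about the result. It only avoids a temporary byte array per entry.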

This entryImpactBytes method can be useful for calculating how much a zip file size will change when an entry is added or removed. But it really shines when you have two versions of a zip file. For example, reducing the contents of one file by 100 bytes and removing 50 bytes from its name will result in a net change of -200 bytes (2 * name diff + content diff). If you were only using ZipEntry.getCompressedSize() to compute such a difference, the result would only show a change of -100 bytes.
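To make that arithmetic concrete, here is a small sketch which applies the same formula to plain numbers instead of a ZipEntry. The sizes are invented for the example: a 1000-byte entry with a 60-byte name shrinks to 900 bytes with a 10-byte name.

```java
final class ImpactDiffSketch {
  /** The entryImpactBytes formula, taking raw sizes so no real zip is needed. */
  static long impact(long compressedSize, int nameSize, int extraSize, int commentSize) {
    return compressedSize
        + 30 + nameSize + extraSize // local file header
        + 46 + nameSize + extraSize + commentSize; // central directory file header
  }

  public static void main(String[] args) {
    // Hypothetical before/after sizes for a single entry.
    long before = impact(1000, 60, 0, 0);
    long after = impact(900, 10, 0, 0);

    System.out.println("true impact diff: " + (after - before)); // -200
    System.out.println("compressed size diff: " + (900 - 1000)); // -100
  }
}
```

The fixed 30 and 46 bytes cancel out in the difference, which is why only the name (twice) and the content show up.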

https://jakewharton.com/calculating-zip-file-entry-true-impact

Exceptions and proxies and coroutines, oh my!

Jul 31, 2019 Updated Jul 31, 2019

Checked exceptions are a concept that exists only in the Java compiler and is enforced only in source code. In Java bytecode and at runtime in the virtual machine you're free to throw checked exceptions from anywhere regardless of whether they're declared. At least, anywhere except from an instance created by a Java Proxy.

A Proxy creates instances of interfaces at runtime where a single callback intercepts every method call. Libraries like Retrofit use proxies to create HTTP calls based on the annotations of interface methods. These methods tend to return promise-like objects such as RxJava's Single, Guava's ListenableFuture, or its own Call type.

// MyService.java
interface MyService {
  @GET("/user/{id}")
  Call<User> user(@Path("id") long id);
}
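The Proxy restriction on checked exceptions is easy to reproduce without Retrofit. In this minimal sketch the Greeter interface and class name are invented for the example. The handler throws an IOException that the interface method does not declare, and the proxy wraps it before it reaches the caller.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

final class ProxyCheckedExceptionSketch {
  /** Hypothetical interface whose method declares no checked exceptions. */
  interface Greeter {
    String greet(String name);
  }

  /** Creates a proxy whose single callback always fails with a checked exception. */
  static Greeter failingGreeter() {
    InvocationHandler handler = (proxy, method, args) -> {
      throw new IOException("broken");
    };
    return (Greeter) Proxy.newProxyInstance(
        Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
  }

  /** Calls the proxy and returns whatever was thrown, or null if nothing was. */
  static Throwable callAndCapture() {
    try {
      failingGreeter().greet("jw");
      return null;
    } catch (RuntimeException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    Throwable thrown = callAndCapture();
    System.out.println(thrown instanceof UndeclaredThrowableException);
    System.out.println(thrown.getCause());
  }
}
```

Had greet been declared with throws IOException, the proxy would have let the exception through untouched.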

Retrofit recently added support for Kotlin coroutines' suspend functions which behave a bit differently. Aside from the suspend modifier, the method signature otherwise appears synchronous.

// MyService.kt
interface MyService {
  @GET("/user/{id}")
  suspend fun user(@Path("id") id: Long): User
}

Kotlin does not require declaring checked exceptions. With Retrofit using a Proxy and performing a network call that may throw an IOException, you might expect to be required to declare @Throws(IOException::class) though. This isn't actually required because the method signature gets rewritten by the Kotlin compiler to accept a Continuation parameter where both exceptions and results are forwarded.

// Approximate Java for the compiled bytecode of MyService.kt:
interface MyService {
  void user(@Path("id") long id, Continuation<? super User> continuation);
}

Despite rewriting the bytecode to be callback-based and Retrofit asynchronously invoking the Continuation, rare calls to this method were resulting in an UndeclaredThrowableException. This indicates a checked exception was somehow being synchronously thrown.

To understand why this was occurring and to craft a fix, we need to learn more about how coroutines work…

Coroutine Implementation Crash Course

The above approximation of the Kotlin MyService bytecode is inaccurate. While the Continuation parameter is the primary mechanism of delivering a success or error result, it's not the only mechanism.

// Exact Java equivalent of MyService.kt bytecode
interface MyService {
  Object user(@Path("id") long id, Continuation<? super User> continuation);
}

Object is used as the return type because a User instance can be directly returned if available synchronously. Otherwise, the method returns the "coroutine suspended" marker object to indicate suspension (where the result will be delivered to the Continuation).

This is one way that a checked exception could occur synchronously outside of Retrofit. When the method fails synchronously, the exception is allowed to propagate.

For asynchronous results, the Kotlin standard library provides the suspendCoroutine API.

suspend fun user(id: Long): User {
  return suspendCoroutine { continuation ->
    executor.execute {
      continuation.resume(User("jw"))
      // or continuation.resumeWithException(IOException("broken"))
    }
  }
}

This approximates to the following Java source:

public Object user(long id, Continuation<? super User> continuation) {
  // code inside lambda that calls into 'continuation'
  return COROUTINE_SUSPENDED;
}

The marker object is returned up the stack which frees the thread to run other code. Once the continuation is invoked, our code will resume as soon as any thread is free again.

Retrofit Coroutine Implementation

Retrofit uses the suspendCoroutine API with its own Callback to suspend while the HTTP request is sent on a background thread.

suspend fun <T : Any> Call<T>.awaitResponse(): Response<T> {
  return suspendCoroutine { continuation ->
    enqueue(object : Callback<T> {
      override fun onResponse(call: Call<T>, response: Response<T>) {
        continuation.resume(response)
      }

      override fun onFailure(call: Call<T>, t: Throwable) {
        continuation.resumeWithException(t)
      }
    })
  }
}

The implementation of Call.enqueue is very similar to the sample above which calls executor.execute { .. }. A thread pool picks up the Call, runs the request, and invokes the Callback when a reply is received.

It seems that Retrofit is not doing any work synchronously that would cause a checked exception. The stacktrace of the UndeclaredThrowableException even confirms that the work ran on the background Executor:

java.lang.reflect.UndeclaredThrowableException
    at ...
Caused by: java.net.UnknownHostException
    at ...
    at retrofit2.AsyncCall.execute(AsyncCall.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Despite doing everything seemingly right, there's still clearly a bug or we never would see the UndeclaredThrowableException.

The Bug

There's one behavior of suspendCoroutine that was not mentioned above which is designed to protect the execution stack. If the lambda passed to suspendCoroutine invokes the Continuation parameter synchronously then instead of calling the real Continuation, the value is intercepted and propagated synchronously.

Going back to the sample, removing the call to executor.execute would create this behavior.

suspend fun user(id: Long): User {
  return suspendCoroutine { continuation ->
    continuation.resume(User("jw"))
  }
}

Without stack protection, invoking the continuation like this could cause the caller's code to resume beneath the current stack frame. This would lead to extremely deep call stacks which would eventually trigger a StackOverflowError.

suspendCoroutine performs interception by wrapping the Continuation. Here is the approximated Java equivalent:

public Object user(long id, Continuation<? super User> real) {
  ContinuationImpl<? super User> continuation = new ContinuationImpl(real);
  // code inside lambda that calls into 'continuation'
  return continuation.getResult();
}

The getResult() call will do one of three things:

1. If resume was already called on continuation, return the value that was supplied.
2. If resumeWithException was already called on continuation, throw the exception that was supplied.
3. Otherwise, return COROUTINE_SUSPENDED. Future calls to resume and resumeWithException will forward to the real continuation.
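Those three cases can be modeled in plain Java. This is a simplified sketch of the idea, not the Kotlin standard library's code: the Continuation interface, the Wrapper class, and the marker objects are all stand-ins I made up, and it makes no attempt to guard against resuming twice.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ContinuationWrapperSketch {
  /** Minimal stand-in for Kotlin's Continuation (hypothetical, for illustration). */
  interface Continuation<T> {
    void resume(T value);
    void resumeWithException(Throwable error);
  }

  static final Object COROUTINE_SUSPENDED = new Object();
  private static final Object UNDECIDED = new Object();

  private static final class Success {
    final Object value;
    Success(Object value) { this.value = value; }
  }

  private static final class Failure {
    final Throwable error;
    Failure(Throwable error) { this.error = error; }
  }

  /** Intercepts resumes that arrive before getResult() and forwards the ones that arrive after. */
  static final class Wrapper<T> implements Continuation<T> {
    private final Continuation<T> real;
    private final AtomicReference<Object> state = new AtomicReference<>(UNDECIDED);

    Wrapper(Continuation<T> real) { this.real = real; }

    @Override public void resume(T value) {
      // Still undecided: stash the value for getResult(). Otherwise forward it.
      if (!state.compareAndSet(UNDECIDED, new Success(value))) real.resume(value);
    }

    @Override public void resumeWithException(Throwable error) {
      if (!state.compareAndSet(UNDECIDED, new Failure(error))) real.resumeWithException(error);
    }

    Object getResult() throws Throwable {
      // Case 3: nothing was delivered yet, so commit to suspending.
      if (state.compareAndSet(UNDECIDED, COROUTINE_SUSPENDED)) return COROUTINE_SUSPENDED;
      Object result = state.get();
      if (result == COROUTINE_SUSPENDED) return result;
      // Case 2: the failure was delivered first, so it is thrown synchronously.
      if (result instanceof Failure) throw ((Failure) result).error;
      // Case 1: the value was delivered first, so it is returned synchronously.
      return ((Success) result).value;
    }
  }
}
```

The compare-and-set on a single state field is what makes the outcome a race: whichever of resume, resumeWithException, or getResult gets there first decides which of the three cases you are in.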

The behavior of case #2 provides a probable source of a checked exception being thrown synchronously which in turn would cause the UndeclaredThrowableException.

But this only explains the bug if the callback is invoked before the calling method is able to return. Since enqueue dispatches work to an Executor and immediately returns, the likelihood of this happening is zero. That is, at least, until you consider preemption.

There are two threads here: the caller and the background worker. If we ignore the case where these execute on different CPU cores, a single core may preempt the caller thread to let the background worker make progress.

[Diagram: the caller thread being preempted between the call to enqueue and returning, and the worker thread invoking the continuation.]

Occasionally that preemption will occur precisely between the ContinuationImpl creation (green) and the call to getResult() (red). If the background work is quick enough the continuation may be invoked (orange) before switching back. In this example, an exception is quickly thrown due to a failed DNS lookup that was cached.

The Fix

Detecting this case in Retrofit is simple. When the Java-based implementation delegates to the suspend fun it captures checked exceptions with a try/catch block.

try {
  return KotlinExtensions.awaitResponse(call, continuation);
} catch (Exception e) {
  // but now what?
}

Invoking the continuation in the catch block is possible, but would defeat the stack protection of suspendCoroutine that caused this behavior in the first place. The current method call needs to be suspended before the exception is delivered. In Kotlin, this can be achieved with yield().

suspend fun Exception.yieldAndThrow(): Nothing {
  yield()
  throw this
}

From Java this function will always return COROUTINE_SUSPENDED because of yield(). The continuation will then receive the exception at the next available time on the current coroutine dispatcher.

try {
  return KotlinExtensions.awaitResponse(call, continuation);
} catch (Exception e) {
  return KotlinExtensions.yieldAndThrow(e, continuation);
}

It's not clear why a Proxy requires checked exceptions to be declared when normal methods do not. Libraries providing suspend fun support through a Proxy will need to be mindful of this behavior and put similar workarounds in place.

This bug fix is available in Retrofit 2.6.1 today!

https://jakewharton.com/exceptions-and-proxies-and-coroutines-oh-my

R8 Optimization: Method Outlining

Apr 11, 2019 Updated Apr 11, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

I recently wrote about the economics of generated code which talked about performing optimizations to generated code that aren't worthwhile in manually-written code. While the examples in that post were motivated by changes to code generators that I had worked on in the past, it also resulted in some new changes being made.

One change proposed to Moshi, a JSON serializer, replaced its generated strings with StringBuilder to de-duplicate the constant parts. Each non-null property in your JSON model generates an exception to ensure non-null values are read from the JSON.

 name = stringAdapter.fromJson(reader) ?:
     throw JsonDataException(
-        "Non-null value 'name' was null at ${reader.path}")
+        StringBuilder("Non-null value '").append("name")
+            .append("' was null at ").append(reader.path).toString())

A second exception is generated when that non-null property lacks a default value and no value was present in the JSON.

 return Person(
   name = name ?: throw JsonDataException(
-      "Required property 'name' missing at ${reader.path}"),
+      StringBuilder("Required property '").append("name")
+          .append("' missing at ").append(reader.path).toString()),

These two diffs are the result of applying the advice from that post.

Each of these exceptions is generated for every property in the type. This means if you have a type with 10 properties you get 20 exceptions generated (assuming they're non-null and don't have default values). This winds up creating a lot of StringBuilder bytecode!

One way to reduce this bytecode bloat is to generate a private method which takes four arguments (prefix, name, suffix, path) and returns the final string. This was proposed as part of the change to Moshi. We ultimately duplicated the code instead of generating a method because it ends up optimizing to a smaller APK thanks to R8. Let's find out why.
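For reference, the helper being discussed would look roughly like this in Java. This is only a sketch of the shape of such a method. The class and method names and the "$.name" path value are placeholders of mine, not what the Moshi change actually proposed.

```java
final class MessageHelperSketch {
  /** Joins the constant prefix and suffix with the property name and the reader's path. */
  static String message(String prefix, String name, String suffix, Object path) {
    return new StringBuilder(prefix)
        .append(name)
        .append(suffix)
        .append(path)
        .toString();
  }

  public static void main(String[] args) {
    // Each generated call site would shrink to four loads and one invoke.
    System.out.println(message("Non-null value '", "name", "' was null at ", "$.name"));
    System.out.println(message("Required property '", "name", "' missing at ", "$.name"));
  }
}
```

Generated as a private method, a copy of this would live in every adapter class, which matters for the comparison that follows.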

Representative Example

Instead of dealing with Moshi, kapt, and generated Kotlin directly, it's easier to work with a representative example. To start with, we need some JSON model objects. In order to require both of the StringBuilder usages from above, each property has a non-null type and has no default value.

data class User(
  val id: String,
  val username: String,
  val displayName: String,
  val email: String,
  val created: OffsetDateTime,
  val isPublic: Boolean
)

data class Tweet(
  val id: String,
  val userId: String,
  val content: String,
  val created: OffsetDateTime
)

When used with Moshi, these types would be annotated with @JsonClass which causes the annotation processor to generate code. That code then interacts with Moshi's JsonReader type to parse the values of each property. We can replicate this using Android's built-in JsonReader type and writing the generated code by hand.

object TweetParser {
  fun fromJson(reader: JsonReader): Tweet {
    var id: String? = null
    // other properties…

    reader.beginObject()
    while (reader.peek() != JsonToken.END_OBJECT) {
      when (reader.nextName()) {
        "id" -> id = reader.nextString() ?:
            throw IllegalStateException(
                StringBuilder("Non-null value '").append("id")
                    .append("' was null at ").append(reader).toString())
        // other properties…
        else -> reader.skipValue()
      }
    }
    reader.endObject()

    return Tweet(
      id = id ?: throw IllegalStateException(
          StringBuilder("Required property '").append("id")
             .append("' missing at ").append(reader).toString()),
      // other properties…
    )
  }
}

This is the version for Tweet showing only one property. You would do the same for the userId, content and created properties, and create a similar type for parsing User [1].

If you compile with kotlinc, dex with D8, and dump the bytecode with dexdump you'll see the StringBuilder code repeated many times.

0181: new-instance v1, Ljava/lang/IllegalStateException;
0183: new-instance v4, Ljava/lang/StringBuilder;
0185: invoke-direct {v4, v3}, Ljava/lang/StringBuilder;.<init>:(Ljava/lang/String;)V
0188: invoke-virtual {v4, v12}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
018b: invoke-virtual {v4, v2}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
018e: invoke-virtual {v4, v0}, Ljava/lang/StringBuilder;.append:(Ljava/lang/Object;)Ljava/lang/StringBuilder;
0191: invoke-virtual {v4}, Ljava/lang/StringBuilder;.toString:()Ljava/lang/String;
0194: move-result-object v0
0195: invoke-direct {v1, v0}, Ljava/lang/IllegalStateException;.<init>:(Ljava/lang/String;)V
0198: check-cast v1, Ljava/lang/Throwable;
019a: throw v1

This bytecode sequence weighs less than generating the single string so it's a net win no matter what, but this still feels like a waste. Generating a method with this code in each parser type would reduce its impact. So why did we elect not to?

Outlining

Most of the posts in this R8 series have touched on inlining in one way or another. This optimization is when a method is small enough and/or called infrequently enough that it becomes beneficial to copy the method body contents to the call site and remove the method. Outlining is the opposite optimization where common bytecode sequences are identified and extracted to a shared method.

Before running R8, let's add a main function which uses our parsers and can serve as an entry point for optimization.

fun main() {
  println(TweetParser.fromJson(JsonReader(StringReader(""))))
  println(UserParser.fromJson(JsonReader(StringReader(""))))
}

We don't need real data because we're not executing the code. This is just enough to ensure R8 keeps the codepaths we care about. With some simple rules to keep the main method, let's see what R8 does.

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(...);
}
-dontobfuscate

$ java -jar r8.jar \
      --lib $ANDROID_HOME/platforms/android-28/android.jar \
      --release \
      --output . \
      --pg-conf rules.txt \
      *.class

Dumping the output of R8 shows a very different picture for the exception code compared to what D8 produced.

 0181: new-instance v1, Ljava/lang/IllegalStateException;
-0183: new-instance v4, Ljava/lang/StringBuilder;
-0185: invoke-direct {v4, v3}, Ljava/lang/StringBuilder;.<init>:(Ljava/lang/String;)V
-0188: invoke-virtual {v4, v12}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
-018b: invoke-virtual {v4, v2}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
-018e: invoke-virtual {v4, v0}, Ljava/lang/StringBuilder;.append:(Ljava/lang/Object;)Ljava/lang/StringBuilder;
-0191: invoke-virtual {v4}, Ljava/lang/StringBuilder;.toString:()Ljava/lang/String;
+0183: invoke-static {v3, v12, v2, v0}, Lcom/android/tools/r8/GeneratedOutlineSupport;.outline0:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String;
 0194: move-result-object v0
 0195: invoke-direct {v1, v0}, Ljava/lang/IllegalStateException;.<init>:(Ljava/lang/String;)V
-0198: check-cast v1, Ljava/lang/Throwable;
 019a: throw v1

The outlining optimization has recognized that the StringBuilder code is repeated many times. The bytecode sequence is de-duplicated to the outline0 method on this com.android.tools.r8.GeneratedOutlineSupport class. Every occurrence of the bytecode sequence is replaced with a call to this new method.

Taking a look at the new method shows the common StringBuilder code.

[000eb4] com.android.tools.r8.GeneratedOutlineSupport.outline0:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String;
0000: new-instance v0, Ljava/lang/StringBuilder;
0002: invoke-direct {v0, v1}, Ljava/lang/StringBuilder;.<init>:(Ljava/lang/String;)V
0005: invoke-virtual {v0, v2}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
0008: invoke-virtual {v0, v3}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
000b: invoke-virtual {v0, v4}, Ljava/lang/StringBuilder;.append:(Ljava/lang/Object;)Ljava/lang/StringBuilder;
000e: invoke-virtual {v0}, Ljava/lang/StringBuilder;.toString:()Ljava/lang/String;
0011: move-result-object v1
0012: return-object v1

R8 has created the helper method which we were considering adding ourselves!

I specifically chose to use two types in the example which together have 10 properties resulting in 20 StringBuilder usages. This is the lower bound of duplicate sequences that R8 will consider outlining. The duplicated bytecode must also be between 3 and 99 bytes.

If Moshi generated a private StringBuilder helper method our example would still have two copies. You would need 20 JSON model objects before R8 stepped in and de-duplicated the helper method. By electing to duplicate the StringBuilder code, only 20 properties are needed in any number of JSON model objects before R8 outlining kicks in. Once that happens we only pay for the code once no matter how many JSON model objects and properties are in use.

Outlining works really well with generated code since it tends to produce repeated patterns. In examples like the one above, you can avoid putting a helper function in your runtime library and instead rely on R8 to de-duplicate bytecode when it's repeated enough. And because R8 is doing whole-program analysis, unrelated code which happens to have the same bytecode patterns participates in the de-duplication.

It's also interesting to think about how this interacts with Kotlin's inline function modifier. The more you use inline functions (and especially if you invoke inline functions inside other inline functions) the more likely you are to have R8 outline some of the function body back into a regular method. Make sure that you're using inline for things like reified generics or to avoid allocating lambda objects as it's intended.

In the previous post about R8 I teased that the next post (aka this one) would cover an optimization that created const-class bytecodes. After writing two posts outside of this series on generated code and having the discussion on the Moshi change, however, it felt like a natural progression to cover outlining. With outlining out of the way the next R8 post will get back on track with producing const-class bytecodes.

[1] The full example code is available at gist.github.com/JakeWharton/6d08b7fb74c320b048db68e21912d878

https://jakewharton.com/r8-optimization-method-outlining

Optimizing Bytecode by Manipulating Source Code

Apr 2, 2019 Updated Apr 2, 2019

This post is a follow-up to "The Economics of Generated Code" which argued that spending time optimizing generated code is more worthwhile than the same optimizations done in manually-written code.

The second example from that post dealt with looking up views, checking for null, and potentially throwing an exception. In an effort to reduce the impact of the generated exception message string, each was split into a prefix which will be de-duplicated and the view ID name which was effectively free since it matched a field name. If you're lost on what that all means, check out the other post first.

 public static MainBinding bind(View root) {
   TextView name = root.findViewById(R.id.name);
   if (name == null) {
-    throw new NullPointerException("View 'name' required but not found");
+    throw new NullPointerException("Missing required view with ID: ".concat("name"));
   }
   TextView email = root.findViewById(R.id.email);
   if (email == null) {
-    throw new NullPointerException("View 'email' required but not found");
+    throw new NullPointerException("Missing required view with ID: ".concat("email"));
   }
   return new MainBinding(root, name, email);
 }

That change was just about strings, but I also mentioned that there's more optimization which could be done. So let's do it!

By virtue of the fact that we throw an exception when a view is absent, that case is expected to be rare. This is what allowed us to justify sacrificing a single string constant in favor of multiple constants and runtime concatenation. While that allowed us to de-duplicate the strings, it creates more duplication in the bytecode.

[000288] MainBinding.bind:(Landroid/view/View;)LMainBinding;
0000: sget v0, LR$id;.name:I
0002: invoke-virtual {v3, v0}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0005: move-result-object v0
0006: check-cast v0, Landroid/widget/TextView;

0008: if-nez v0, 0018

000a: new-instance v0, Ljava/lang/NullPointerException;
000c: const-string v1, "Missing required view with ID: "
000e: const-string v2, "name"
0010: invoke-virtual {v1, v2}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
0013: move-result-object v1
0014: invoke-direct {v0, v1}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
0017: throw v0

0018: sget v1, LR$id;.email:I
001a: invoke-virtual {v3, v1}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
001d: move-result-object v1
001e: check-cast v1, Landroid/widget/TextView;

0020: if-nez v1, 0030

0022: new-instance v0, Ljava/lang/NullPointerException;
0024: const-string v1, "Missing required view with ID: "
0026: const-string v2, "email"
0028: invoke-virtual {v1, v2}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
002b: move-result-object v1
002c: invoke-direct {v0, v1}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
002f: throw v0

0030: new-instance v2, LMainBinding;
0032: invoke-direct {v2, v3, v0, v1}, LMainBinding;.<init>:(Landroid/view/View;Landroid/widget/TextView;Landroid/widget/TextView;)V
0035: return-object v2

I've spaced the bytecode out so it's easier to see the logical sections and, hopefully, identify what we want to change.

Indices 000a–0017 and 0022–002f are near-exact duplicates of each other which only vary by the name of the missing view. Again, because this code is expected to never run, it would be nice to remove the duplication. Fixing this will be the focus of the post, but I also want to point out a second problem that we'll fix in tandem.

In addition to the exception code being duplicated it's also interspersed between "normal" code. This means that the common execution path of required views being present has to jump over unused bytecode.

The code was actually compiled with the old dx tool to produce the bytecode above. Simply compiling with D8 instead produces a dramatically different arrangement of the control flow.

[000258] MainBinding.bind:(Landroid/view/View;)LMainBinding;
0000: sget v0, LR$id;.name:I
0002: invoke-virtual {v3, v0}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0005: move-result-object v0
0006: check-cast v0, Landroid/widget/TextView;

0008: const-string v1, "Missing required view with ID: "

000a: if-eqz v0, 0028

000c: sget v2, LR$id;.email:I
000e: invoke-virtual {v3, v2}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0011: move-result-object v2
0012: check-cast v2, Landroid/widget/TextView;

0014: if-eqz v2, 001c

0016: new-instance v1, LMainBinding;
0018: invoke-direct {v1, v3, v0, v2}, LMainBinding;.<init>:(Landroid/view/View;Landroid/widget/TextView;Landroid/widget/TextView;)V
001b: return-object v1

001c: new-instance v3, Ljava/lang/NullPointerException;
001e: const-string v0, "email"
0020: invoke-virtual {v1, v0}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
0023: move-result-object v0
0024: invoke-direct {v3, v0}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
0027: throw v3

0028: new-instance v3, Ljava/lang/NullPointerException;
002a: const-string v0, "name"
002c: invoke-virtual {v1, v0}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
002f: move-result-object v0
0030: invoke-direct {v3, v0}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
0033: throw v3

D8 understands that the case in which you throw an exception is, well, exceptional. Thus, the conditionals are inverted so that the exceptional cases move to the end of the method. This makes the common case not require any jumps.

Another side-effect of using D8 is that the loading of the exception message prefix string was de-duplicated at bytecode index 0008. This is actually an unfortunate behavior since it now occurs during normal execution as well.

Before attempting to fix these problems, let's manually re-arrange the bytecode (with dummy indices, for simplicity) to the ideal form we'd like to produce.

[000258] MainBinding.bind:(Landroid/view/View;)LMainBinding;
0000: sget v0, LR$id;.name:I
0001: invoke-virtual {v3, v0}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0002: move-result-object v0
0003: check-cast v0, Landroid/widget/TextView;

0010: if-eqz v0, 0060

0020: sget v1, LR$id;.email:I
0021: invoke-virtual {v3, v1}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0022: move-result-object v1
0023: check-cast v1, Landroid/widget/TextView;

0030: if-eqz v1, 0050

0040: new-instance v2, LMainBinding;
0041: invoke-direct {v2, v3, v0, v1}, LMainBinding;.<init>:(Landroid/view/View;Landroid/widget/TextView;Landroid/widget/TextView;)V
0042: return-object v2

0050: const-string v2, "email"
0051: goto 0070

0060: const-string v2, "name"

0070: const-string v1, "Missing required view with ID: "
0071: new-instance v3, Ljava/lang/NullPointerException;
0072: invoke-virtual {v1, v2}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
0073: move-result-object v2
0074: invoke-direct {v3, v2}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
0075: throw v3

This has everything we want: the normal execution case flows from index 0000 to 0042 without jumps and the exception-handling code is de-duplicated at index 0070 to 0075. There's only one load of the prefix string as part of creating the exception message. When a null is found, the code jumps to a section which loads the correct view ID string and then jumps (or falls through) to the exception.

Now that we have a goal it's easier to iterate on the generated Java code to see how our changes move us closer or farther from achieving it. Let's start by de-duplicating the exception code.

 public static MainBinding bind(View root) {
+  String missingId = null;
   TextView name = root.findViewById(R.id.name);
   if (name == null) {
+    missingId = "name";
-    throw new NullPointerException("Missing required view with ID: ".concat("name"));
   }
   TextView email = root.findViewById(R.id.email);
   if (email == null) {
+    missingId = "email";
-    throw new NullPointerException("Missing required view with ID: ".concat("email"));
   }
-  return new MainBinding(root, name, email);
+  if (missingId == null) {
+    return new MainBinding(root, name, email);
+  }
+  throw new NullPointerException("Missing required view with ID: ".concat(missingId));
 }

This produces bytecode which successfully de-duplicates the exception code but with a slight penalty on the other parts.

[000258] MainBinding.bind:(Landroid/view/View;)LMainBinding;
0000: sget v0, LR$id;.name:I
0002: invoke-virtual {v3, v0}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0005: move-result-object v0
0006: check-cast v0, Landroid/widget/TextView;

0008: if-nez v0, 000d

000a: const-string v1, "name"
000c: goto 000e

000d: const/4 v1, #int 0

000e: sget v2, LR$id;.email:I
0010: invoke-virtual {v3, v2}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0013: move-result-object v2
0014: check-cast v2, Landroid/widget/TextView;

0016: if-nez v2, 001a

0018: const-string v1, "email"

001a: if-nez v1, 0022

001c: new-instance v1, LMainBinding;
001e: invoke-direct {v1, v3, v0, v2}, LMainBinding;.<init>:(Landroid/view/View;Landroid/widget/TextView;Landroid/widget/TextView;)V
0021: return-object v1

0022: new-instance v3, Ljava/lang/NullPointerException;
0024: const-string v0, "Missing required view with ID: "
0026: invoke-virtual {v0, v1}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
0029: move-result-object v0
002a: invoke-direct {v3, v0}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
002d: throw v3

Since the throw statement was removed from the if check body, D8 no longer understands that they're exceptional cases. This means that the jumps in normal execution have returned. There's also a slight behavior change in that we now report the last missing view instead of the first.
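That behavior change is easy to see when both views are missing. Here is a small sketch which swaps the Android View lookups for a Map so it runs anywhere. The Map stand-in, class name, and method names are assumptions for the example and not part of the generated binding code.

```java
import java.util.Collections;
import java.util.Map;

final class MissingViewOrderSketch {
  /** Original shape: throw as soon as a lookup comes back null. */
  static void bindThrowEarly(Map<String, Object> views) {
    if (views.get("name") == null) {
      throw new NullPointerException("Missing required view with ID: ".concat("name"));
    }
    if (views.get("email") == null) {
      throw new NullPointerException("Missing required view with ID: ".concat("email"));
    }
  }

  /** De-duplicated shape: remember the missing ID and throw once at the end. */
  static void bindWithFlag(Map<String, Object> views) {
    String missingId = null;
    if (views.get("name") == null) {
      missingId = "name";
    }
    if (views.get("email") == null) {
      missingId = "email"; // overwrites "name" when both are missing
    }
    if (missingId == null) {
      return;
    }
    throw new NullPointerException("Missing required view with ID: ".concat(missingId));
  }

  /** Runs a bind call and returns the exception message, or null if it succeeded. */
  static String messageOf(Runnable bind) {
    try {
      bind.run();
      return null;
    } catch (NullPointerException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    Map<String, Object> empty = Collections.emptyMap();
    System.out.println(messageOf(() -> bindThrowEarly(empty)));
    System.out.println(messageOf(() -> bindWithFlag(empty)));
  }
}
```

With neither view present, the first version names "name" and the second names "email". Whether that matters is debatable since either message points at a broken layout, but it's a difference worth knowing about.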

The first thing that comes to my mind for trying to eliminate the needless jumps is nesting the conditionals.

 public static MainBinding bind(View root) {
-  String missingId = null;
+  String missingId;
   TextView name = root.findViewById(R.id.name);
-  if (name == null) {
-    missingId = "name";
-  }
-  TextView email = root.findViewById(R.id.email);
-  if (email == null) {
-    missingId = "email";
-  }
-  if (missingId == null) {
-    return new MainBinding(root, name, email);
+  if (name != null) {
+    TextView email = root.findViewById(R.id.email);
+    if (email != null) {
+      return new MainBinding(root, name, email);
+    } else {
+      missingId = "email";
+    }
+  } else {
+    missingId = "name";
   }
   throw new NullPointerException("Missing required view with ID: ".concat(missingId));
 }

Lo and behold, we've done it!

[000258] MainBinding.bind:(Landroid/view/View;)LMainBinding;
0000: sget v0, LR$id;.name:I
0002: invoke-virtual {v3, v0}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
0005: move-result-object v0
0006: check-cast v0, Landroid/widget/TextView;

0008: if-eqz v0, 001d

000a: sget v1, LR$id;.email:I
000c: invoke-virtual {v3, v1}, Landroid/view/View;.findViewById:(I)Landroid/view/View;
000f: move-result-object v1
0010: check-cast v1, Landroid/widget/TextView;

0012: if-eqz v1, 001a

0014: new-instance v2, LMainBinding;
0016: invoke-direct {v2, v3, v0, v1}, LMainBinding;.<init>:(Landroid/view/View;Landroid/widget/TextView;Landroid/widget/TextView;)V
0019: return-object v2

001a: const-string v3, "email"
001c: goto 001f

001d: const-string v3, "name"

001f: new-instance v0, Ljava/lang/NullPointerException;
0021: const-string v1, "Missing required view with ID: "
0023: invoke-virtual {v1, v3}, Ljava/lang/String;.concat:(Ljava/lang/String;)Ljava/lang/String;
0026: move-result-object v3
0027: invoke-direct {v0, v3}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
002a: throw v0

Modulo a few register re-numberings, this is exactly the same bytecode as the ideal case we crafted above. The key which makes this work is mostly in the else branches. Once an else branch is taken, it then immediately jumps down to the exception code because it's the last statement in the if branch in every layer above.

So are we done?

While we shouldn't care too much about how generated code looks, I still find this solution to be unsatisfactory. If you have 20 views in a layout you'll get 20 levels of nesting. Even though generated code isn't written by hand, you still might find yourself reading it when clicking through elements of a stacktrace or during debugging. As a result, if a more readable solution is available without sacrificing the value we should prefer it.

In order to flatten the generated code, we need a similar mechanism which allows control flow to jump to a particular point. This sounds awfully similar to a "goto", and it is, but all control flow is a form of "goto" so we might as well use whatever the language provides. For Java, the break statement of a switch or loop comes to mind as something to try.

 public static MainBinding bind(View root) {
   String missingId;
-  TextView name = root.findViewById(R.id.name);
-  if (name != null) {
+  while (true) {
+    TextView name = root.findViewById(R.id.name);
+    if (name == null) {
+      missingId = "name";
+      break;
+    }
     TextView email = root.findViewById(R.id.email);
-    if (email != null) {
-      return new MainBinding(root, name, email);
-    } else {
+    if (email == null) {
       missingId = "email";
+      break;
     }
-  } else {
-    missingId = "name";
+    return new MainBinding(root, name, email);
   }
   throw new NullPointerException("Missing required view with ID: ".concat(missingId));
 }

By using a return statement as the last statement of the infinite loop, we never actually loop and instead just borrow the break feature. This is functionally equivalent to the previous version and it produces the exact same bytecode but without nesting.

Does a loop that doesn't actually loop offend your sensibilities? It certainly does for IntelliJ IDEA which produces a warning: "'while' loop does not loop". We could generate a suppression, but it would be nice to just use something else more suited for this case. There's actually one more construct where a break can be used: labeled blocks.

 public static MainBinding bind(View root) {
   String missingId;
-  while (true) {
+  missingId: {
     TextView name = root.findViewById(R.id.name);
     if (name == null) {
       missingId = "name";
-      break;
+      break missingId;
     }
     TextView email = root.findViewById(R.id.email);
     if (email == null) {
       missingId = "email";
-      break;
+      break missingId;
     }
     return new MainBinding(root, name, email);
   }
   throw new NullPointerException("Missing required view with ID: ".concat(missingId));
 }

Now this really looks like a "goto", but the compiler will still validate that missingId is initialized in all execution paths that lead to the exception just like it did with while (true) and the nested if/elses. And, unsurprisingly, the bytecode remains the same.
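If you want to try the labeled block outside of Android, the following is a small self-contained sketch. It assumes a Map in place of the View and a hypothetical Binding holder class; only the control flow mirrors the generated code. Note that the label and the local variable can share the name missingId because labels live in their own namespace.

```java
import java.util.Map;

final class LabeledBreakSketch {
  /** Hypothetical stand-in for the generated binding class. */
  static final class Binding {
    final String name;
    final String email;

    Binding(String name, String email) {
      this.name = name;
      this.email = email;
    }
  }

  static Binding bind(Map<String, String> views) {
    String missingId; // Deliberately not initialized: the compiler checks every path.
    missingId: {
      String name = views.get("name");
      if (name == null) {
        missingId = "name";
        break missingId; // Jumps to the statement after the labeled block.
      }
      String email = views.get("email");
      if (email == null) {
        missingId = "email";
        break missingId;
      }
      return new Binding(name, email);
    }
    // Only reachable through one of the breaks, so missingId is definitely assigned.
    throw new NullPointerException("Missing required view with ID: ".concat(missingId));
  }
}
```

Unlike the flat version with a nullable missingId, this reports the first missing entry, and removing either assignment before a break turns into a compile error rather than a runtime surprise.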

This is the final form of this specific example of generated code as it stands right now. The bytecode size was reduced from 55 bytes to 31. The duplication was removed and the control flow is now tailored for all views being present. The source code actually got a little bit longer, but it's still very readable. The labeled block is admittedly something you don't see often and probably wouldn't use in manually written code unless it was for breaking across nested loops.

You don't need to dig this deep if you're building something that generates code. Start with generating a good API and producing correct behavior. All of this optimization can be done later, or even never. I get involved in this optimization because it's a fun exploration, but also because the economics of generated code mean that the work almost always pays for itself.

https://jakewharton.com/optimizing-bytecode-by-manipulating-source-code

The Economics of Generated Code

Mar 26, 2019 Updated Mar 26, 2019

Among the many things that I've stolen (er, learned) from Jesse Wilson is the phrase "the economics of generated code". This captures the idea that the things we value when generating code are different than those we value for code that's manually written.

A code generator is only written once but the code it generates occurs many times. Thus, any investment into making the generator emit more efficient code will pay for itself very quickly. This generally means outputting less code and allocating fewer objects wherever possible. I'd like to expand on that with two specific, real-world examples which I've run into.

Extra Method References

While it's not as much of a problem as it used to be, method reference count is still something worth keeping an eye on. This is especially true for generated code. Small changes in the generator can result in the count going up or down by the hundreds or thousands.

It's common for generated classes to be a subtype of a class in the runtime library. Aside from facilitating polymorphism, this allows consolidating common utilities and behavior. Take a JSON model that wants to retain unknown keys and values encountered during parsing. Each generated class could maintain its own Map<String, ?> for the unknown pairs, but this is a great candidate for consolidation into a base class in the library.

abstract class JsonModel {
  private final Map<String, ?> unknownPairs;

  public final Map<String, ?> getUnknownPairs() {
    return unknownPairs;
  }

  // …
}

Not having a getUnknownPairs() method in each generated class should obviously reduce the count. But since the count is not just about declared methods, reducing the referenced methods in the generated code will also have an impact.

Each generated class extends JsonModel and implements toString() which outputs its own fields and the getUnknownPairs() map.

final class UserModel extends JsonModel {
  private final String name;
  private final String email;

  // …

  @Override public String toString() {
    return "UserModel{"
        + "name=" + name + ", "
        + "email=" + email + ", "
        + "unknownPairs=" + getUnknownPairs()
        + '}';
  }
}

When you compile, dex, and dump the Dalvik bytecode of the above class with dexdump, the way in which toString() invokes the getUnknownPairs() method is surprising.

[00024c] UserModel.toString:()Ljava/lang/String;
0000: iget-object v0, v5, LUserModel;.name:Ljava/lang/String;
0002: iget-object v1, v5, LUserModel;.email:Ljava/lang/String;
0004: invoke-virtual {v5}, LUserModel;.getUnknownPairs:()Ljava/util/Map;
0007: move-result-object v2

Despite placing the getUnknownPairs() method on the JsonModel supertype, each generated class produces a reference to that method as if it were defined directly on the generated type. Moving the method does not actually reduce the count!

A medium-sized app might have 100 models for its API layer. If each generated class contains four calls to a method defined in the supertype that's 400 method references created for no purpose.

Changing the generated code to explicitly use super will produce method references which all point directly to the supertype method.

 @Override public String toString() {
   return "UserModel{"
       + "name=" + name + ", "
       + "email=" + email + ", "
-      + "unknownPairs=" + getUnknownPairs()
+      + "unknownPairs=" + super.getUnknownPairs()
       + '}';
 }

 [00024c] UserModel.toString:()Ljava/lang/String;
 0000: iget-object v0, v5, LUserModel;.name:Ljava/lang/String;
 0002: iget-object v1, v5, LUserModel;.email:Ljava/lang/String;
-0004: invoke-virtual {v5}, LUserModel;.getUnknownPairs:()Ljava/util/Map;
+0004: invoke-virtual {v5}, LJsonModel;.getUnknownPairs:()Ljava/util/Map;
 0007: move-result-object v2

Those 400 extra references are now reduced to just one! We would normally be unlikely to make such a change, but because we control the base class and the generated class this change is safe and results in a significant reduction of method references.
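The reason the change is safe can be shown with a runnable sketch. The constructors below are assumptions (the original classes elide them with "// …"), and toStringUnqualified() is a hypothetical helper added only for comparison. Because getUnknownPairs() is final, no subclass can override it, so the qualified and unqualified calls always return the same map.

```java
import java.util.Map;

abstract class JsonModel {
  private final Map<String, ?> unknownPairs;

  // Assumed constructor; the original elides how the map is populated.
  JsonModel(Map<String, ?> unknownPairs) {
    this.unknownPairs = unknownPairs;
  }

  public final Map<String, ?> getUnknownPairs() {
    return unknownPairs;
  }
}

final class UserModel extends JsonModel {
  private final String name;
  private final String email;

  UserModel(String name, String email, Map<String, ?> unknownPairs) {
    super(unknownPairs);
    this.name = name;
    this.email = email;
  }

  // Form the generator should emit: the reference is qualified with the supertype.
  @Override public String toString() {
    return "UserModel{"
        + "name=" + name + ", "
        + "email=" + email + ", "
        + "unknownPairs=" + super.getUnknownPairs()
        + '}';
  }

  // Hypothetical comparison helper using the unqualified call.
  String toStringUnqualified() {
    return "UserModel{"
        + "name=" + name + ", "
        + "email=" + email + ", "
        + "unknownPairs=" + getUnknownPairs()
        + '}';
  }
}
```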

It's important to point out that using R8 to optimize your app will change this method reference automatically. Not every consumer of your code generator will be using an optimizer, though. Making this small change will ensure everyone benefits.

String Duplication

Having strings in generated code isn't a given, but it shows up frequently enough to think about its impact. In my experience, strings in generated code tend to fall into two categories: keys for some type of serialization or error messages for exceptions. There's not much we can do about the former, but the latter is interesting because those strings exist in code paths which are expected to be rarely taken.

Take, for example, a code generator which binds Android views from a layout into fields of a class. Views are required when they're present in every configuration of the layout and we validate their presence at runtime with a null check.

public final class MainBinding {
  // …

  public static MainBinding bind(View root) {
    TextView name = root.findViewById(R.id.name);
    if (name == null) {
      throw new NullPointerException("View 'name' required but not found");
    }
    TextView email = root.findViewById(R.id.email);
    if (email == null) {
      throw new NullPointerException("View 'email' required but not found");
    }
    return new MainBinding(root, name, email);
  }
}

If you compile, dex, and dump the contents of the .dex file using Baksmali, you can see these strings in the string data section of the output.

                           |[20] string_data_item
00044f: 22                 |  utf16_size = 34
000450: 5669 6577 2027 6e61|  data = "View \'name\' required but not found"
000458: 6d65 2720 7265 7175|
000460: 6972 6564 2062 7574|
000468: 206e 6f74 2066 6f75|
000470: 6e64 00            |
                           |[21] string_data_item
000473: 23                 |  utf16_size = 35
000474: 5669 6577 2027 656d|  data = "View \'email\' required but not found"
00047c: 6169 6c27 2072 6571|
000484: 7569 7265 6420 6275|
00048c: 7420 6e6f 7420 666f|
000494: 756e 6400          |

In order to be encoded in the dex file format, these strings require 36 and 37 bytes, respectively (the extra two bytes for each encode their length and a null terminator).

With some napkin math we can estimate the cost of these strings in a real app. Each string requires 32 bytes plus the length of the view ID which we'll say is usually around 12 characters. A medium-sized app has around 50 layouts each with around 10 views. So 50 * 10 * (32 + 12) yields a total cost of 22KB. This isn't a huge amount of space, but considering we expect these strings to never be used unless there's a programming error the overhead feels unfortunate.
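The napkin math can be written out as a short sketch. The class and method names are made up for illustration; the 12-character ID length and the app size are the same rough guesses used above, not measurements.

```java
final class StringCostEstimate {
  // Fixed part of each message once the view ID is removed, plus one byte for
  // the length and one for the null terminator.
  static int fixedBytes() {
    return "View '' required but not found".length() + 2;
  }

  // Total string data: one message per view in every layout.
  static int totalBytes(int layouts, int viewsPerLayout, int averageIdLength) {
    return layouts * viewsPerLayout * (fixedBytes() + averageIdLength);
  }

  public static void main(String[] args) {
    System.out.println(fixedBytes());           // 32
    System.out.println(totalBytes(50, 10, 12)); // 22000
  }
}
```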

Strings are de-duplicated in dex so if the common parts of the string were separated we would only pay their cost once. Additionally, the string data section is also used to hold the names of fields so strings which match the name of a field will be free. Using this information, we might naively try to split up the string into three pieces.

 if (name == null) {
-  throw new NullPointerException("View 'name' required but not found");
+  throw new NullPointerException("View '" + "name" + "' required but not found");
 }
 TextView email = root.findViewById(R.id.email);
 if (email == null) {
-  throw new NullPointerException("View 'email' required but not found");
+  throw new NullPointerException("View '" + "email" + "' required but not found");
 }

Unfortunately, javac sees the concatenation of constants as something it can optimize so it turns them back into single, unique strings. To outsmart it, we need to generate code which uses a StringBuilder or the little-known String.concat method.

 if (name == null) {
-  throw new NullPointerException("View 'name' required but not found");
+  throw new NullPointerException("Missing required view with ID: ".concat("name"));
 }
 TextView email = root.findViewById(R.id.email);
 if (email == null) {
-  throw new NullPointerException("View 'email' required but not found");
+  throw new NullPointerException("Missing required view with ID: ".concat("email"));
 }

Now the dex file only contains a single prefix string and we don't pay for the ID strings because they were already being used for the R.id. fields.

                           |[17] string_data_item
00046a: 1f                 |  utf16_size = 31
00046b: 4d69 7373 696e 6720|  data = "Missing required view with ID: "
000473: 7265 7175 6972 6564|
00047b: 2076 6965 7720 7769|
000483: 7468 2049 443a 2000|

22KB of string data reduced to 33 bytes! Now it is worth noting that we spend an extra 7 bytes loading the second string and invoking String.concat, but since the string was always more than 32 bytes it's still a nice win. There's still room to de-duplicate the actual concatenation and exception throwing code so that it's only paid once per class instead of once per view, but I'll leave that for another post.
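The javac behavior described earlier can be observed without looking at bytecode. This is only a sketch with made-up names: it relies on the fact that compile-time constant strings are interned, so a folded concatenation is the very same object as the equivalent literal, while the result of String.concat with a non-empty argument is a new object created at runtime.

```java
final class ConstantFoldingSketch {
  static final String LITERAL = "View 'name' required but not found";

  // All three operands are compile-time constants, so javac folds them into a
  // single constant string which is interned alongside LITERAL.
  static final String FOLDED = "View '" + "name" + "' required but not found";

  // String.concat is an ordinary method call that javac leaves alone, so the
  // pieces stay separate constants and are joined at runtime.
  static String concatenated() {
    return "View '".concat("name").concat("' required but not found");
  }

  // Identity comparison is intentional here: it reveals whether folding happened.
  static boolean foldedIsSameObject() {
    return FOLDED == LITERAL;
  }

  static boolean concatenatedIsSameObject() {
    return concatenated() == LITERAL;
  }
}
```

Here foldedIsSameObject() returns true and concatenatedIsSameObject() returns false, even though concatenated().equals(LITERAL) holds.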

Seeing either of these optimizations in manually written code should raise an eyebrow. The individual savings of applying them are not worth their otherwise unidiomatic nature. With code generation, however, the economics are different. A single change to the generator can have optimizations like this apply to hundreds or thousands of locations producing a much larger effect.

https://jakewharton.com/the-economics-of-generated-code

R8 Optimization: Class Constant Operations

Feb 27, 2019 Updated Feb 27, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The previous post in the series showed R8 (and D8) invoking string methods at compile-time when the inputs were all constants. R8 is able to do this because the content of constant strings is available inside the bytecode. That post also claimed that strings are the only non-primitive type that can be manipulated like this at compile-time.

There is, however, another object type that can be manipulated at compile-time: classes. Classes are templates for the instances we interact with at runtime. Since bytecode fundamentally exists to hold these templates, some operations on classes can thus be performed at compile time.

Log Tags

There's an ongoing debate (if you can even call it that) on the best way to define a tag string in a class. Historically there have been two strategies: string literals and calling getSimpleName() on the class.

private static final String TAG = "MyClass";
// or
private static final String TAG = MyClass.class.getSimpleName();

Let's compare the difference in bytecode by defining both and adding some log messages.

class MyClass {
  private static final String TAG_STRING = "MyClass";
  private static final String TAG_CLASS = MyClass.class.getSimpleName();

  public static void main(String... args) {
    Log.d(TAG_STRING, "String tag");
    Log.d(TAG_CLASS, "Class tag");
  }
}

Compiling, dexing, and dumping the Dalvik bytecode shows the effect of the choice.

[000194] MyClass.<clinit>:()V
0000: const-class v0, LMyClass;
0002: invoke-virtual {v0}, Ljava/lang/Class;.getSimpleName:()Ljava/lang/String;
0005: move-result-object v0
0006: sput-object v0, LMyClass;.TAG_CLASS:Ljava/lang/String;
0008: return-void

[000120] MyClass.main:([Ljava/lang/String;)V
0000: const-string v1, "MyClass"
0002: const-string v0, "String tag"
0004: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
0007: sget-object v1, LMyClass;.TAG_CLASS:Ljava/lang/String;
0009: const-string v0, "Class tag"
000b: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
000e: return-void

In the main method, index 0000 loads the constant string of the tag. Index 0007, on the other hand, has to look up the static field in order to get the tag value. In the <clinit> method, the static field is initialized by loading the MyClass class and then invoking getSimpleName at runtime. This method is automatically invoked the first time the class is loaded.

The string literal is more efficient but using the class reference is resilient to things like refactoring. But if you've read any of these posts so far, you should know where this is going! Let's try again with R8 and look at its output.

[000120] MyClass.main:([Ljava/lang/String;)V
0000: const-string v1, "MyClass"
0002: const-string v0, "String tag"
0004: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
0007: const-string v0, "Class tag"
0009: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
000c: return-void

The bytecode which came after index 0004 that loaded the second tag has disappeared and v1, the string literal tag, was re-used for the second call to Log.

Since the simple name of MyClass is known at compile-time, R8 has replaced MyClass.class.getSimpleName() with the string literal "MyClass". Because the field value is now a constant, the <clinit> method becomes empty and is removed. At the usage site, the sget-object bytecode was replaced with a const-string for the constant. Finally, the two const-string bytecodes which reference the same string were de-duplicated and the value is reused.

So while the verdict might not be in on which pattern to use for log tag fields, R8 makes sure that those choosing the class-based route don't incur any additional runtime overhead. And because the getSimpleName() computation is trivial, D8 will actually perform it as well!1

Applicability

Being able to compute getSimpleName() (and getName() and getCanonicalName() too!) on a MyClass.class reference seems of limited use–potentially even solely for this log tag case. The optimization only works with a class literal reference–getClass() won't work! It is once again in combination with other R8 features that this optimization starts to apply more.

Consider a class which abstracts logging and uses a static factory method that accepts which class will be sending log messages.

class Logger {
  static Logger get(Class<?> cls) {
    return new Logger(cls.getSimpleName());
  }
  private Logger(String tag) { /* … */ }

}

class MyClass {
  private static final Logger logger = Logger.get(MyClass.class);
}

If Logger.get is inlined to all of its call sites, the call to Class.getSimpleName which previously had a dynamic input from the method parameter will change to a static input of a class reference (MyClass.class in this case). R8 can now replace the call with a string literal resulting in a field initializer that directly invokes the constructor (which will also have its private modifier removed).

class MyClass {
  private static final Logger logger = new Logger("MyClass");
}

This relies on the get method being small enough or being called in a way that the heuristics of R8 will perform the inlining.
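As a rough source-level picture of what the inlining buys, the sketch below keeps the original factory next to a hand-written approximation of the call site after R8 has inlined it and folded getSimpleName(). The tag field and the inlinedForMyClass() method are hypothetical additions for the comparison; R8 performs the equivalent rewrite on bytecode, not on source.

```java
final class Logger {
  final String tag; // Hypothetical field so the result can be inspected.

  private Logger(String tag) {
    this.tag = tag;
  }

  // Original factory: the class arrives as a runtime parameter.
  static Logger get(Class<?> cls) {
    return new Logger(cls.getSimpleName());
  }

  // Approximation of the inlined call site: the class is now a literal, so its
  // simple name can be replaced by a string constant.
  static Logger inlinedForMyClass() {
    return new Logger("MyClass");
  }
}

final class MyClass {
  static final Logger logger = Logger.get(MyClass.class);
}
```

Both paths produce a logger whose tag is "MyClass"; the second one simply does no work at runtime to get there.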

The Kotlin language offers the ability to force a function to be inlined. It also allows marking a generic type parameter on an inline function as "reified" which ensures that the compiler knows which class it resolves to when compiling. With these features we can ensure our function is always inlined and that getSimpleName is always called on an explicit class reference.

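// @PublishedApi internal (rather than private) so the public inline function below can call it.
class Logger @PublishedApi internal constructor(val tag: String) {

}
inline fun <reified T : Any> logger() = Logger(T::class.java.simpleName)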

class MyClass {

  companion object {
    private val logger = logger<MyClass>()
  }
}

The initializer for logger will always have the bytecode equivalent of MyClass.class.getSimpleName() which R8 can then always replace with a string literal.

For other Kotlin examples, type inference can often allow omitting the explicit type parameter.

inline fun <reified T> typeAndValue(value: T) = "${T::class.java.name}: $value"
fun main() {
  println(typeAndValue("hey"))
}

This example outputs "java.lang.String: hey" and its bytecode contains only two constant strings, a StringBuilder to concatenate them, and a call to System.out.println. And if this issue was implemented, you'd wind up with only a single string and the call to System.out.println.

Obfuscation and Optimization

Since this optimization operates on classes, it has to interact with the other features of R8 that might affect a class such as obfuscation and different optimizations.

Let's go back to the original example.

class MyClass {
  private static final String TAG_STRING = "MyClass";
  private static final String TAG_CLASS = MyClass.class.getSimpleName();

  public static void main(String... args) {
    Log.d(TAG_STRING, "String tag");
    Log.d(TAG_CLASS, "Class tag");
  }
}

What happens if this class is obfuscated? If R8 was not replacing the getSimpleName call, the first log message would have a tag of "MyClass" and the second would have a tag matching the obfuscated class name such as "a".

In order for R8 to be allowed to replace getSimpleName it needs to do so with a value that matches what the behavior would be at runtime. Thankfully, since R8 is also the tool which is performing obfuscation, it can defer the replacement until the class has been given its final name.

[000158] a.main:([Ljava/lang/String;)V
0000: const-string v1, "MyClass"
0002: const-string v0, "String tag"
0004: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
0007: const-string v1, "a"
0009: const-string v0, "Class tag"
000b: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
000e: return-void

Note how index 0007 now will load a tag value for the second log call (unlike the original R8 output) and how it correctly reflects the obfuscated name.

There are other R8 optimizations which affect the class name even when obfuscation is disabled. While I plan to cover it in a future post, R8 will sometimes merge a superclass into a subtype if it can prove the superclass isn't needed and the subtype is the only one. When this happens, the class name string optimization will correctly reflect the subtype name even if the original code was equivalent to TheSupertype.class.getSimpleName().

String Data Section

The previous post talked about how performing an operation like String.substring or string concatenation at compile-time could lead to the string section of the dex file increasing in size.2 The optimization in this post produces strings which might not otherwise exist so that is also a possibility here.

There are two cases to consider: when obfuscation is enabled and when it is disabled.

When obfuscation is enabled calls to getSimpleName() should not create a new string. Both classes and methods will be obfuscated using the same dictionary which by default starts with single letters. This means that for an obfuscated class named b, inserting the string "b" is almost always free since there is going to be a method or field whose name is also b. In the dex file all strings are stored in a single pool which contains the literals, class names, method names, and field names making the probability of a match when obfuscating very high.

With obfuscation disabled, though, replacing getSimpleName() is never free. Despite the unified string section of the dex file, class names are stored in type descriptor form. This includes the package name, uses / as separators, and is prefixed with L and suffixed with ;. For MyClass, if in a hypothetical com.example package, the string data contains an entry for Lcom/example/MyClass;. Because of this format, the string "MyClass" doesn't already exist and will need to be added.

Both getName() and getCanonicalName() will also, unfortunately, always create new strings. Even though these return fully-qualified strings, they don't match the type descriptor form which is already present in string data.
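To see why none of the three name forms can reuse the existing entry, here is a small sketch that builds the type descriptor by hand. The descriptor() helper is hypothetical and only handles non-array reference types.

```java
final class DescriptorSketch {
  // Builds the type descriptor form used for class names in the dex string
  // data: an 'L' prefix, '/' as the package separator, and a ';' suffix.
  static String descriptor(Class<?> cls) {
    return "L" + cls.getName().replace('.', '/') + ";";
  }

  public static void main(String[] args) {
    System.out.println(descriptor(String.class));      // Ljava/lang/String;
    System.out.println(String.class.getName());        // java.lang.String
    System.out.println(String.class.getSimpleName());  // String
  }
}
```

The descriptor contains the other names only as substrings, and the string data section de-duplicates whole strings, so each computed name needs its own entry.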

Since this optimization has the potential to create a large amount of strings, it's currently disabled for everything except top-level types. This means that it works in the MyClass example from this post but in a nested type or anonymous type it will not apply. There is also some escape analysis done to avoid applying the optimization for calls inside a single method. Both of these minimize any adverse impact on your dex size.

The next post on R8 will look at an optimization which produces class literals like those used in this post (i.e., the const-class bytecodes created from MyClass.class). You won't be surprised when that post shows class literal creation which in turn allows the optimizations from this post to apply which in turn allows the string optimizations to apply and so on.

(This post was adapted from a part of my Digging into D8 and R8 talk that was never presented. Watch the video and look out for future blog posts for more content like this.)

1. It won't, however, replace the sget-object bytecodes with const-string nor remove the now-empty <clinit> method.
2. Coincidentally, compile-time substring() computation landed in R8 yesterday!

https://jakewharton.com/r8-optimization-class-constant-operations

R8 Optimization: String Constant Operations

Feb 12, 2019 Updated Feb 12, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The previous post in the series covered an R8 flag which allows you to specify the return value range of a field or method. R8 can use this to automatically remove conditionals against SDK_INT based on your app's minimum supported API level, for example. That can only happen because multiple R8 features are working together. This post (and the next few) will cover smaller optimizations of R8 which work best when combined with others.

Aside from the eight primitive types of Java, all of the other values your program interacts with are instances of classes whose data can only be manipulated at runtime. That is, all except for one type: strings. Strings are such a fundamental and ubiquitous type that they are given special treatment in the Java and Kotlin languages, in Java bytecode, and in Dalvik bytecode. And because of that special treatment, tools like R8 can manipulate them at compile-time!

Constant Pool and String Data

When you write a string literal in Java or Kotlin, the contents of that string are encoded in a special section of the bytecode. For Java bytecode it's called the constant pool. For Dalvik bytecode it's called the string data section. In addition to string literals which were present in the source code, strings for the names of types, methods, fields, and other structural elements are included in these sections.

When you look at the Java bytecode of a class file through javap as these posts have been doing, references to the constant pool use an octothorpe (#) followed by a number.

0: new           #2  // class java/lang/StringBuilder
3: dup
4: invokespecial #3  // Method java/lang/StringBuilder."<init>":()V
7: ldc           #4  // String A:

Helpful comments are included so that we don't have to manually consult the constant pool to figure out what each means.

If you invoke javap with the -v argument the constant pool will be included in the output.

Constant pool:
   #1 = Methodref          #9.#18         // java/lang/Object."<init>":()V
   #2 = Class              #19            // java/lang/StringBuilder
   #3 = Methodref          #2.#18         // java/lang/StringBuilder."<init>":()V
   #4 = String             #20            // A:
    ⋮
  #10 = Utf8               <init>
  #11 = Utf8               ()V
    ⋮
  #18 = NameAndType        #10:#11        // "<init>":()V
  #19 = Utf8               java/lang/StringBuilder
  #20 = Utf8               A:

#4 is a String type whose data is at #20 which is a UTF-8 entry for "A:". This was one of the string literals from the source (taken from the Java 9 string concat example). If you look at #2 or #3, they're signatures for a Class and Methodref (method reference), respectively. Each uses one or more UTF-8 entries to create the signature it represents.

When using dexdump to look at Dalvik bytecode, the program doesn't show the string data section directly. Instead, strings are substituted into the bytecode output to make it easier to read.

0000: new-instance v0, Ljava/lang/StringBuilder; // type@0003
0002: invoke-direct {v0}, Ljava/lang/StringBuilder;.<init>:()V // method@0003
0005: const-string v1, "A: " // string@0002

Hints of the string data section are shown in the comments which follow each line. string@0002 indicates this literal comes from index 2 in the string data section. The type@0003 and method@0003 hints point to separate sections of the dex which themselves eventually use the string data to create their signatures (similar to how the constant pool in the Java bytecode worked).

String Operations

Performing string operations on literals isn't something that frequently happens in your source code. You wouldn't write something like new User("OliveJakeHazel".substring(5, 9)) to create a User named "Jake". You would use "Jake" as the string literal without a substring call. One notable exception to this is computing the length of a string literal.

static String patternHost(String pattern) {
  return pattern.startsWith(WILDCARD)
      ? pattern.substring(WILDCARD.length())
      : pattern;
}

This code is adapted from a real example inside OkHttp where a string is tested for a prefix and then conditionally removed. The length is computed so that if the constant changes the value passed to substring remains correct.

Let's take a look at what Dalvik bytecode this example produces.

[0001a8] Test.patternHost:(Ljava/lang/String;)Ljava/lang/String;
0000: const-string v0, "*."
0002: invoke-virtual {v2, v0}, Ljava/lang/String;.startsWith:(Ljava/lang/String;)Z
0005: move-result v1
0006: if-eqz v1, 0010
0008: invoke-virtual {v0}, Ljava/lang/String;.length:()I
000b: move-result v1
000c: invoke-virtual {v2, v1}, Ljava/lang/String;.substring:(I)Ljava/lang/String;
000f: move-result-object v2
0010: return-object v2

In index 0000 to 0002, the WILDCARD constant (whose value is the literal "*.") is loaded into register v0 in order to call startsWith on the parameter (in v2). Later, in index 0008 to 000b, the length of v0 is calculated and stored in v1 so that it can be used to call substring on the parameter.

Since WILDCARD is a constant initialized with a string literal, its length is also a constant. Computing its length at runtime is a waste of time because it will always produce the same value. When the above code is compiled with R8, the call to length() on a constant is replaced with the value as determined at compile-time.

[0001a8] Test.patternHost:(Ljava/lang/String;)Ljava/lang/String;
0000: const-string v0, "*."
0002: invoke-virtual {v1, v0}, Ljava/lang/String;.startsWith:(Ljava/lang/String;)Z
0005: move-result v0
0006: if-eqz v0, 000d
0008: const/4 v0, #int 2
0009: invoke-virtual {v1, v0}, Ljava/lang/String;.substring:(I)Ljava/lang/String;
000c: move-result-object v1
000d: return-object v1

Index 0008 now loads the constant value of 2 which is immediately passed to the substring call. The bytecode gets the performance benefit of a hardcoded value without the maintenance burden of keeping the two values in sync in the source code.

And because this computation was trivial and removing the call to length() won't change the program's behavior, D8 will also perform this optimization!
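At the source level, the effect of this optimization can be sketched as two equivalent methods. The LengthFolding class and the patternHostFolded method are names made up for illustration; R8 and D8 perform the rewrite on bytecode, not on source.

```java
class LengthFolding {
  private static final String WILDCARD = "*.";

  // What we write: the length is derived from the constant.
  static String patternHost(String pattern) {
    return pattern.startsWith(WILDCARD)
        ? pattern.substring(WILDCARD.length())
        : pattern;
  }

  // Roughly what the optimized bytecode does: the length is the literal 2.
  static String patternHostFolded(String pattern) {
    return pattern.startsWith(WILDCARD)
        ? pattern.substring(2)
        : pattern;
  }

  public static void main(String... args) {
    System.out.println(patternHost("*.example.com"));       // example.com
    System.out.println(patternHostFolded("*.example.com")); // example.com
    System.out.println(patternHostFolded("example.com"));   // example.com
  }
}
```

Both methods behave identically for every input as long as WILDCARD stays "*.", which is exactly the guarantee the compiler has and the hand-written 2 lacks.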

Inlining

Computing the length of a constant string isn't the only string operation that can happen at compile-time. Common string operations such as startsWith, indexOf, and substring can all be computed provided that their arguments are also constants. While this is rare to find verbatim in source code, method inlining can create situations where this happens.

class Test {
  private static final String WILDCARD = "*.";

  private static String patternHost(String pattern) {
    return pattern.startsWith(WILDCARD)
        ? pattern.substring(WILDCARD.length())
        : pattern;
  }

  public static String canonicalHost(String pattern) {
    String host = patternHost(pattern);
    return HttpUrl.get("http://" + host).host();
  }

  public static void main(String... args) {
    String pattern = "*.example.com";
    String canonical = canonicalHost(pattern);
    System.out.println(canonical);
  }
}

Take this more complete example where the main method calls a public library method canonicalHost with a string literal. The canonicalHost library method delegates to patternHost which is a private library method. Because this program is so small both methods will ultimately be inlined into the main method.

We can pretend this inlining happened at the source-level to see how the code changes as the string optimizations apply.

class Test {
  private static final String WILDCARD = "*.";

  public static void main(String... args) {
    String pattern = "*.example.com";
    String host = pattern.startsWith(WILDCARD)
        ? pattern.substring(WILDCARD.length())
        : pattern;
    String canonical = HttpUrl.get("http://" + host).host();
    System.out.println(canonical);
  }
}

R8's intermediate representation (IR) during compilation uses static single-assignment form (SSA) (introduced in part 1 of the null analysis) which allows it to, among other things, trace the origin of local variables. Despite startsWith operating on the variable pattern, that variable's origin can be traced to the string literal "*.example.com". The argument to startsWith, WILDCARD, is also a string constant allowing the whole operation to be replaced with its result at compile-time.

 String pattern = "*.example.com";
-String host = pattern.startsWith(WILDCARD)
+String host = true
     ? pattern.substring(WILDCARD.length())

Dead-code elimination removes the impossible 'else' branch and the conditional.

 String pattern = "*.example.com";
-String host = true
-     ? pattern.substring(WILDCARD.length())
-     : pattern;
+String host = pattern.substring(WILDCARD.length());
 String canonical = HttpUrl.get("http://" + host).host();

The call to length() on a string constant is replaced with the constant integer value as demonstrated in the previous section.

 String pattern = "*.example.com";
-String host = pattern.substring(WILDCARD.length());
+String host = pattern.substring(2);
 String canonical = HttpUrl.get("http://" + host).host();

Compiling and dexing the original three-method example with R8 confirms that this is the final result.

$ javac -cp okhttp-3.13.1.jar Test.java

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(java.lang.String[]);
}

$ java -jar r8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    --pg-conf rules.txt \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[0001c0] Test.main:([Ljava/lang/String;)V
0000: const/4 v2, #int 2
0001: const-string v0, "*.example.com"
0003: invoke-virtual {v0, v2}, Ljava/lang/String;.substring:(I)Ljava/lang/String;
0006: move-result-object v2
0007: new-instance v0, Ljava/lang/StringBuilder;
0009: invoke-direct {v0}, Ljava/lang/StringBuilder;.<init>:()V
000c: const-string v1, "http://"
000e: invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
0011: invoke-virtual {v0, v2}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
0014: invoke-virtual {v0}, Ljava/lang/StringBuilder;.toString:()Ljava/lang/String;
0017: move-result-object v2
0018: invoke-static {v2}, Lokhttp3/HttpUrl;.get:(Ljava/lang/String;)Lokhttp3/HttpUrl;
001b: move-result-object v2
001c: invoke-virtual {v2}, Lokhttp3/HttpUrl;.host:()Ljava/lang/String;
001f: move-result-object v2
0020: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0022: invoke-virtual {v0, v2}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0025: return-void

The startsWith check and conditional have been removed because inlining has made the receiver string available as a constant. Our dex file is a bit smaller and our program runs a bit faster now because this condition which always produced the same value was computed at compile-time.
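To convince yourself that each source-level step preserved behavior, the stages can be written out as plain methods and compared. This is only a sketch: the FoldingStages class and its method names are hypothetical, and the HttpUrl call is left out so that the example has no dependency on OkHttp.

```java
class FoldingStages {
  private static final String WILDCARD = "*.";

  // After inlining: the check, the length, and the substring share one body.
  static String inlined() {
    String pattern = "*.example.com";
    return pattern.startsWith(WILDCARD)
        ? pattern.substring(WILDCARD.length())
        : pattern;
  }

  // After startsWith folds to true and the dead 'else' branch is removed.
  static String branchRemoved() {
    String pattern = "*.example.com";
    return pattern.substring(WILDCARD.length());
  }

  // After length() on the constant folds to 2.
  static String lengthFolded() {
    String pattern = "*.example.com";
    return pattern.substring(2);
  }

  public static void main(String... args) {
    System.out.println(inlined());       // example.com
    System.out.println(branchRemoved()); // example.com
    System.out.println(lengthFolded());  // example.com
  }
}
```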

Money Left on the Table

Having length() and startsWith() replaced with a value computed at compile-time is a nice win. Other methods on String can be computed at compile-time such as isEmpty(), contains(), endsWith(), equals(), and equalsIgnoreCase(). Looking at the result above leaves me unsatisfied because optimizations were left on the table. Let's look at the final form as if it were source code and analyze what didn't happen.

String pattern = "*.example.com";
String host = pattern.substring(2);
String canonical = HttpUrl.get("http://" + host).host();
System.out.println(canonical);

The now-removed call to startsWith was able to be eliminated because the receiver (i.e., the target string) and argument were both known at compile-time. Looking at the above example, that condition holds true for the call to substring. It should have been eliminated.

-String pattern = "*.example.com";
-String host = pattern.substring(2);
+String host = "example.com";
 String canonical = HttpUrl.get("http://" + host).host();

The argument sent to HttpUrl.get is now the result of string concatenation of two string literals. The need to concatenate those at runtime should have been eliminated.

-String host = "example.com";
-String canonical = HttpUrl.get("http://" + host).host();
+String canonical = HttpUrl.get("http://example.com").host();

These optimizations are likely to be included in a future version of R8 but they're not as trivial as they might seem.

Every existing string optimization returns a primitive value such as a boolean or int which can be represented directly in the bytecode. As a result of those optimizations, it's possible for the string data section to shrink if a string becomes unused. In the example above, WILDCARD becomes unused since its only two uses (as an argument to startsWith and as a receiver for length) were replaced with primitives and so it does not appear in the final dex file.

Computing a substring or performing concatenation at compile-time has the potential to increase the size of the string data section. If the input strings are still used in other parts of the application they won't be eliminated. The new string, however, will always be added.

Doing these optimizations on the trivial program in this post removes 16 bytes of bytecode but adds 18 bytes of string data. In this case, because the input strings are not used anywhere else, an additional 20 bytes is removed for a net reduction of 18 bytes (ignoring the other parts of a dex).
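The accounting is simple enough to write down. The sketch below uses only the three byte counts quoted above; the class and method names are invented for illustration, and the "inputs still used" case is my own extrapolation of those numbers rather than a measurement.

```java
class DexSizeAccounting {
  static final int BYTECODE_REMOVED = 16;      // substring and concat instructions
  static final int STRING_DATA_ADDED = 18;     // the newly computed string
  static final int UNUSED_INPUTS_REMOVED = 20; // input strings, if nothing else uses them

  // A positive result means the dex shrinks; a negative result means it grows.
  static int netReduction(boolean inputsStillUsedElsewhere) {
    int reduction = BYTECODE_REMOVED - STRING_DATA_ADDED;
    if (!inputsStillUsedElsewhere) {
      reduction += UNUSED_INPUTS_REMOVED;
    }
    return reduction;
  }

  public static void main(String... args) {
    System.out.println(netReduction(false)); // 18
    System.out.println(netReduction(true));  // -2
  }
}
```

If the input strings survive because some other code references them, the same optimization would make this dex two bytes larger instead of 18 bytes smaller.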

In real-world applications it becomes less clear whether computing these is the correct choice. For now, these optimizations are not performed.
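As a point of comparison, javac itself already folds concatenation when every operand is a compile-time constant expression as defined by the Java Language Specification. A substring call is never a constant expression, so javac can't help with the example above. The sketch below uses made-up names and relies on reference equality purely as a trick to reveal whether a string came from the constant pool or was built at runtime; don't compare strings this way in real code.

```java
class JavacFolding {
  // Both operands are constants, so javac emits the single literal
  // "http://example.com", which is the same interned object as the
  // literal on the right-hand side of the comparison.
  static boolean constantConcatIsFolded() {
    final String host = "example.com";
    String url = "http://" + host;
    return url == "http://example.com";
  }

  // substring is a method call, so host is not a constant and the
  // concatenation runs at runtime, producing a fresh String object.
  static boolean substringConcatIsFolded() {
    String host = "*.example.com".substring(2);
    String url = "http://" + host;
    return url == "http://example.com";
  }

  public static void main(String... args) {
    System.out.println(constantConcatIsFolded());  // true
    System.out.println(substringConcatIsFolded()); // false
  }
}
```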

When combined with inlining, R8's string optimizations help eliminate dead code and improve runtime performance when working with string literals. To track updates to and show support for new String methods being computed at compile-time, star issuetracker.google.com/issues/119364907. For string concatenation, star issuetracker.google.com/issues/114002137.

The next post in the series will look at an optimization that creates string literals at compile-time which otherwise would need to be created at runtime.

(This post was adapted from a part of my Digging into D8 and R8 talk that was never presented. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/r8-optimization-string-constant-operations

R8 Optimization: Value Assumption

Jan 22, 2019 Updated Jan 22, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The previous post (part 1, part 2) featured R8 performing data-flow analysis of variables in order to determine if they were maybe null, always null, or never null, and then potentially performing dead-code elimination based on that info.

Another way to think about that optimization is that R8 tracks the use of a variable along with a range of its possible nullability. If any conditional against that range can be determined to always produce the same result, dead-code elimination removes the unused branches and the conditional disappears. Part 2 of the last post ended with an example where an args variable was passed into a first method and then checked for null before printing.

System.out.println(first(args));
if (args == null) {
  System.out.println("null!");
}

The range of nullability for args in that snippet is [null, non-null] (meaning it's either null or a non-null reference).

System.out.println(first(args/* [null, non-null] */));
if (args/* [null, non-null] */ == null) {
  System.out.println("null!");
}

In this state, R8 can't do anything to the conditional because the reference might actually be null. However, if the first method checks its argument for null and throws an exception (as it did in that post), null can be eliminated as a possible value after the method call.

System.out.println(first(args/* [null, non-null] */));
if (args/* [non-null] */ == null) {
  System.out.println("null!");
}

With args only able to be a non-null reference at the time of the if check against null, the conditional will always be false and can be removed by normal dead-code elimination.

System.out.println(first(args/* [null, non-null] */));
if (false) {
  System.out.println("null!");
}
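Here is a runnable sketch of that situation. The body of first is a hypothetical stand-in (the real one lives in the previous post); all that matters is that it throws when handed null, so returning normally proves the argument was non-null.

```java
class NullRange {
  // Hypothetical stand-in: rejects null, otherwise returns the first element.
  static String first(String[] args) {
    if (args == null) {
      throw new NullPointerException("args == null");
    }
    return args.length > 0 ? args[0] : "<empty>";
  }

  static String describe(String[] args) {
    String result = first(args); // args is [null, non-null] going in
    // args is [non-null] here, so this branch can never run.
    if (args == null) {
      return "null!";
    }
    return result;
  }

  public static void main(String... args) {
    System.out.println(describe(new String[] {"hello"})); // hello
  }
}
```

Calling describe(null) never reaches the second check either: the exception from first leaves the method before the conditional is evaluated.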

Right now this range tracking doesn't extend beyond nullability. Checking an integer for being positive twice in a method does not cause R8 to eliminate the second conditional. That being said, there is a way to manually help R8 understand the range of other types.

Value Assumption

R8 uses the same configuration syntax as ProGuard in order to simplify migration. Once you've migrated, though, there are some R8-specific flags you can specify. This post deals with one of those flags: -assumevalues.1

The -assumevalues flag informs R8 that the specified field value or method's return value will always be within a certain range or equal to a single value. The paragraph above mentioned that R8 won't eliminate a second check for a positive value like it would a second check for null. If the integer value being checked comes from a method or is stored in a field this flag can help.

class Count {
  public static void main(String... args) {
    count = 3;
    sayHi();
  }

  private static int count = 1;

  private static void sayHi() {
    if (count < 0) {
      throw new IllegalStateException();
    }
    for (int i = 0; i < count; i++) {
      System.out.println("Hi!");
    }
  }
}

This example has a static field that dictates how many times "Hi!" is printed. Compiling, dexing with R8, and dumping the resulting bytecode shows that the check for negative remains in the bytecode despite being an impossible condition.

$ javac *.java

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(java.lang.String[]);
}
-dontobfuscate

$ java -jar r8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    --pg-conf rules.txt \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000148] Count.main:([Ljava/lang/String;)V
0000: const/4 v2, #int 3
0001: sput v2, LCount;.count:I
0003: sget v2, LCount;.count:I
0005: if-ltz v2, 0017
0007: const/4 v2, #int 0
0008: sget v0, LCount;.count:I
000a: if-ge v2, v0, 0016
000c: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
000e: const-string v1, "Hi!"
0010: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0013: add-int/lit8 v2, v2, #int 1
0015: goto 0008
0016: return-void
0017: new-instance v2, Ljava/lang/IllegalStateException;
0019: invoke-direct {v2}, Ljava/lang/IllegalStateException;.<init>:()V
001c: throw v2

R8 has inlined sayHi() into main() but everything is still here. Bytecode index 0000-0001 assigns the value of 3 to count. Then index 0003-0005 reads count and checks if it's less than 0, jumping to index 0017 if so. Index 0007-0015 is the loop, 0016 is the implicit return, and 0017 is the exception code (notice how it's been moved to the bottom as explained in the previous post).

In order for R8 to eliminate the negative check it would need to analyze how the entire program interacts with count. While it would be trivial in this tiny example, in a real program the complexity of this task makes it infeasible.

Since this is application code in our control, we have additional knowledge of the domain of count which R8 can't infer. Adding an -assumevalues flag to our rules.txt gives R8 the expected range of values that reading count will produce.

 -keepclasseswithmembers class * {
   public static void main(java.lang.String[]);
 }
 -dontobfuscate
+-assumevalues class Count {
+  static int count return 0..2147483647;
+}

Just as it did for tracking whether or not a reference could be null, R8 can now track the range of values of count.

if (count/* [0..2147483647] */ < 0) {
  throw new IllegalStateException();
}
for (int i = 0; i < count/* [0..2147483647] */; i++) {
  System.out.println("Hi!");
}

With count only able to be a non-negative value at the time of the if check for negative, the conditional will always be false and can be removed by normal dead-code elimination.

if (false) {
  throw new IllegalStateException();
}
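The reasoning behind that step can be modeled with a tiny range type. This is only my sketch of the idea; IntRange and Outcome are invented names and say nothing about how R8 represents ranges internally.

```java
enum Outcome { ALWAYS_TRUE, ALWAYS_FALSE, UNKNOWN }

class IntRange {
  final int min;
  final int max;

  IntRange(int min, int max) {
    this.min = min;
    this.max = max;
  }

  // Evaluates "value < bound" for every value in [min, max].
  Outcome lessThan(int bound) {
    if (max < bound) return Outcome.ALWAYS_TRUE;   // even the largest value is below
    if (min >= bound) return Outcome.ALWAYS_FALSE; // even the smallest value is not below
    return Outcome.UNKNOWN;                        // the range straddles the bound
  }
}

class RangeDemo {
  public static void main(String... args) {
    IntRange assumed = new IntRange(0, Integer.MAX_VALUE);
    IntRange unknown = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println(assumed.lessThan(0)); // ALWAYS_FALSE
    System.out.println(unknown.lessThan(0)); // UNKNOWN
  }
}
```

Without the rule, count spans every int and the comparison stays UNKNOWN, which is why the check survived in the first bytecode dump.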

Running R8 with the new rules.txt validates that this works.

[000128] Count.main:([Ljava/lang/String;)V
0000: const/4 v2, #int 3
0001: sput v2, LCount;.count:I
0003: sget v2, LCount;.count:I
0005: const/4 v2, #int 0
0006: sget v0, LCount;.count:I
0008: if-ge v2, v0, 0014
000a: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
000c: const-string v1, "Hi!"
000e: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0011: add-int/lit8 v2, v2, #int 1
0013: goto 0006
0014: return-void

Bytecode index 0000-0001 is still the assignment, 0005-0013 is the loop, and 0014 is the implicit return. No conditional in sight!

Side-Effects

In the final bytecode from the previous example, index 0003 still reads count despite its value never actually being used (it's immediately overwritten with 0 by the very next bytecode). This is the field read that would have been used for the now-eliminated conditional. Previous posts showed R8 eliminating unused code like this using its static, single-assignment intermediate representation (SSA IR). Why isn't that happening here?

When R8 eliminates code based on -assumevalues it explicitly keeps the method call or field read despite not needing the value. A method call might trigger some other side-effect which would result in a behavior change if removed. A field read might cause a class to be loaded for the first time where a static initializer could have side-effects. It's usually unlikely that your application has these side-effects or that you rely on them. Changing the rule from -assumevalues to -assumenosideeffects assures R8 of this allowing index 0003 to be removed.2
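The class-loading case is easy to reproduce in plain Java, since the first read of a non-constant static field triggers class initialization. In this sketch (all names invented for illustration) the value that was read is thrown away, yet removing the read would change what the program prints.

```java
import java.util.ArrayList;
import java.util.List;

class InitLog {
  static final List<String> EVENTS = new ArrayList<>();
}

class Config {
  static int count = 3;

  static {
    // Side-effect of initializing the class, observable from outside.
    InitLog.EVENTS.add("Config initialized");
  }
}

class SideEffectDemo {
  static List<String> run() {
    int unused = Config.count; // value is discarded, but the read initializes Config
    return InitLog.EVENTS;
  }

  public static void main(String... args) {
    System.out.println(run()); // [Config initialized]
  }
}
```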

This example is obviously small and contrived. But does anything come to mind as a real-world use case for eliminating impossible if branches by telling R8 the range of an integer field?

Build.VERSION.SDK_INT

As Android developers, we're accustomed to varying implementation based on the version of the OS that our libraries and applications are running on. This is done by checking the Build.VERSION.SDK_INT integer field against known API levels.

if (Build.VERSION.SDK_INT >= 21) {
  System.out.println("21+ :-D");
} else if (Build.VERSION.SDK_INT >= 16) {
  System.out.println("16+ :-)");
} else {
  System.out.println("Pre-16 :-(");
}

With -assumevalues, R8 can now be used to eliminate these unused branches by specifying the supported API range.

-assumevalues class android.os.Build$VERSION {
  int SDK_INT return 21..2147483647;
}

The range from this rule is used to see if any conditionals can be made constant.

if (Build.VERSION.SDK_INT/* [21..2147483647] */ >= 21) {
  System.out.println("21+ :-D");
} else if (Build.VERSION.SDK_INT/* [21..2147483647] */ >= 16) {
  System.out.println("16+ :-)");
} else {
  System.out.println("Pre-16 :-(");
}

In this example, based on the supplied range, both conditional checks will always evaluate to true.

if (true) {
  System.out.println("21+ :-D");
} else if (true) {
  System.out.println("16+ :-)");
} else {
  System.out.println("Pre-16 :-(");
}

With the first branch guaranteed to always be taken, dead-code elimination kicks in to remove the else if and else branches leaving only a single print.

System.out.println("21+ :-D");
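To see how the surviving branches change with your minimum SDK, here is a small sketch that walks an if/else-if chain of SDK_INT >= N checks and reports which branches are still reachable. It reasons about reachability rather than folding each conditional to true or false, and the SdkBranches class is a made-up helper, not anything R8 exposes.

```java
import java.util.ArrayList;
import java.util.List;

class SdkBranches {
  // thresholds are the N values of "SDK_INT >= N" in source order,
  // followed by an implicit final else branch.
  static List<String> liveBranches(int minSdk, int... thresholds) {
    List<String> live = new ArrayList<>();
    // Largest SDK_INT that could still arrive at the current branch,
    // given that every earlier check failed.
    long upper = Integer.MAX_VALUE;
    for (int threshold : thresholds) {
      long lower = Math.max(minSdk, threshold);
      if (lower <= upper) {
        live.add(">= " + threshold);
      }
      upper = Math.min(upper, (long) threshold - 1);
    }
    if (minSdk <= upper) {
      live.add("else");
    }
    return live;
  }

  public static void main(String... args) {
    System.out.println(liveBranches(21, 21, 16)); // [>= 21]
    System.out.println(liveBranches(16, 21, 16)); // [>= 21, >= 16]
    System.out.println(liveBranches(14, 21, 16)); // [>= 21, >= 16, else]
  }
}
```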

For SDK_INT conditionals in the application code we write day-to-day, there aren't going to be branches for API levels lower than our minimum SDK version. Android's lint tool will actually validate this with its ObsoleteSdkInt check (which you should set to error!).

These conditionals are far more pervasive in libraries since they tend to support a larger API range than the consuming application. It's almost guaranteed then that the libraries have branches which will never be executed in the context of your application.

AndroidX Core

Whether you know it or not, these SDK_INT conditionals are all over your app. The AndroidX 'core' library (formerly the Support 'compat' library) is present in practically 100% of apps and it exists almost exclusively to host compatibility APIs which use SDK_INT checks to vary their implementation. Its minimum supported SDK is 14 which is almost certainly lower than that of your app.

// ViewCompat.java
public static boolean hasOnClickListeners(@NonNull View view) {
  if (Build.VERSION.SDK_INT >= 15) {
    return view.hasOnClickListeners();
  }
  return false;
}

There are conditionals for every API level regardless of whether they're needed for your app. The above example has a trivial fallback, but some of the compatibility implementations start to require quite a bit of code.

// ViewCompat.java
public static int getMinimumWidth(@NonNull View view) {
  if (Build.VERSION.SDK_INT >= 16) {
    return view.getMinimumWidth();
  }

  if (!sMinWidthFieldFetched) {
    try {
      sMinWidthField = View.class.getDeclaredField("mMinWidth");
      sMinWidthField.setAccessible(true);
    } catch (NoSuchFieldException e) { }
    sMinWidthFieldFetched = true;
  }
  if (sMinWidthField != null) {
    try {
      return (int) sMinWidthField.get(view);
    } catch (Exception e) { }
  }
  return 0;
}

That legacy implementation after the first if sits in your APK despite few (if any) apps actually needing a pre-API 16 implementation. Some of the compatibility implementations also require entire classes for support.

// DrawableCompat.java
public static Drawable wrap(@NonNull Drawable drawable) {
  if (Build.VERSION.SDK_INT >= 23) {
    return drawable;
  } else if (Build.VERSION.SDK_INT >= 21) {
    if (!(drawable instanceof TintAwareDrawable)) {
      return new WrappedDrawableApi21(drawable);
    }
    return drawable;
  } else {
    if (!(drawable instanceof TintAwareDrawable)) {
      return new WrappedDrawableApi14(drawable);
    }
    return drawable;
  }
}

If your minimum SDK is less than 23 then the WrappedDrawableApi21 class is in your APK. And if your minimum SDK is less than 21 the WrappedDrawableApi14 is also in your APK.

There are over 850 SDK_INT checks in AndroidX 'core' across every API level–double that number across all of AndroidX. You might use a few of these static helpers in your app, but it's other libraries that are the biggest users of these APIs. Things like RecyclerView, fragments, CoordinatorLayout, and AppCompat all support API 14 as well and so they frequently call into these methods.

Using -assumevalues allows R8 to eliminate compatibility implementations in these methods which will never be used by your app. This means fewer classes, fewer methods, fewer fields, and less code in your release APK.

Zero-Overhead Abstraction

A common theme of these posts is multiple features of R8 combining to produce really impressive results. This post is no different! The SDK_INT checks in the AndroidX 'core' library delegate to the framework mechanism when available. If your minimum SDK is high enough, R8 will eliminate all of the conditionals in a compat method leaving only the call to the framework.

import android.os.Build;
import android.view.View;

class ZeroOverhead {
  public static void main(String... args) {
    View view = new View(null);
    setElevation(view, 8f);
  }
  public static void setElevation(View view, float elevation) {
    if (Build.VERSION.SDK_INT >= 21) {
      view.setElevation(elevation);
    }
  }
}

An app with a minimum SDK of 21 using -assumevalues should expect to see the setElevation static method become a simple trampoline to the built-in method.

$ javac *.java

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(java.lang.String[]);
}
-dontobfuscate
-assumevalues class android.os.Build$VERSION {
  int SDK_INT return 21..2147483647;
}

$ java -jar r8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    --pg-conf rules.txt \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[00013c] ZeroOverhead.main:([Ljava/lang/String;)V
0000: new-instance v1, Landroid/view/View;
0002: const/4 v0, #int 0
0003: invoke-direct {v1, v0}, Landroid/view/View;.<init>:(Landroid/content/Context;)V
0006: sget v0, Landroid/os/Build$VERSION;.SDK_INT:I
0008: const/high16 v0, #int 1090519040
000a: invoke-virtual {v1, v0}, Landroid/view/View;.setElevation:(F)V
000d: return-void

After running this through R8, the static setElevation method has completely disappeared. At the call site in main, bytecode index 000a now shows a direct call to the real View.setElevation method.

After -assumevalues removed the conditional, the body of the static setElevation method is small enough that it becomes eligible for inlining. All calls to ViewCompat.setElevation will be rewritten to directly call view.setElevation. The small penalty that would otherwise be incurred from the extra method call and conditional can be completely eliminated when they no longer serve a purpose.

No Configuration Necessary

If you read the post on VM-specific workarounds you might remember that D8 and R8 have a --min-api flag. When the Android Gradle plugin (AGP) invokes D8 or R8 it sets this flag to the minimum SDK version that your app supports. Starting with R8 1.4.22 which is part of AGP 3.4 beta 1 (and newer), a rule for Build.VERSION.SDK_INT is automatically added based on the --min-api flag's value.

-assumevalues public class android.os.Build$VERSION {
  public static int SDK_INT return <minApi>..2147483647;
}

Instead of having to know about this R8 feature and manually enable it with your minimum SDK version, the tool enables it by default so that everyone gets smaller APKs and better runtime performance.

Because of the use of -assumevalues for this automatic rule, the read of the Build.VERSION.SDK_INT field will be retained. You can see this in the bytecode above at index 0006. Unfortunately, switching to -assumenosideeffects won't cause the read to be removed like an application field would. Follow issuetracker.google.com/issues/111763015 for supporting this behavior on framework fields.

Defining a range for SDK_INT is by far the most compelling demo of value assumption and now that it's enabled by default should have a positive impact on APKs. Marking View.isInEditMode() as always false is potentially another useful default, but issuetracker.google.com/issues/111763015 prevents it from working correctly. Other examples will likely vary from app-to-app or depend on the libraries in use.

The next post in the series will take a look at a few optimizations that R8 applies to values which are constants.

(This post was adapted from a part of my Digging into D8 and R8 talk. Watch the video and look out for future blog posts for more content like this.)

1. Since the original presentation, ProGuard has opted to include this flag and functionality in its 6.1.0 version (currently in beta at the time of writing). So while it originated in R8, it is technically no longer R8-specific.
2. Currently this only works for methods and not fields. Follow issuetracker.google.com/issues/123080377 for updates on supporting fields.

https://jakewharton.com/r8-optimization-value-assumption

R8 Optimization: Null Data Flow Analysis (Part 2)

Jan 15, 2019 Updated Jan 15, 2019

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

Part 1 of this post demonstrated R8's ability to eliminate null checks after method inlining. This was accomplished by virtue of nullability information being present in R8's (and D8's) intermediate representation (IR). When the arguments flowing into a method were always non-null or always null, the now-inlined null check can be computed at compile-time.

Examples in the last two posts have mostly used Kotlin. To improve readability of their bytecode, I've been removing a section of it. The last post started with an example of a coalesce function being called from a main function.

fun <T : Any> coalesce(a: T?, b: T?): T? = a ?: b

fun main(args: Array<String>) {
 println(coalesce("one", "two"))
 println(coalesce(null, "two"))
}

Multiple versions of the compiled bytecode of this function were shown in that post and they all started with sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;. This is the bytecode looking up the static System.out field on which it can eventually invoke the println method.

If you compile, dex, and dump the bytecode of the Kotlin source above, however, the first bytecodes are something quite different.

$ kotlinc *.kt

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class kotlin-stdlib-1.3.11.jar

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[00023c] NullsKt.main:([Ljava/lang/String;)V
0000: const-string v0, "args"
0002: invoke-static {v2, v0}, Lkotlin/jvm/internal/Intrinsics;.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
0005: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
…

Instead of bytecodes representing the body of the function we wrote, the Kotlin compiler first emits a call to the standard library's Intrinsics.checkParameterIsNotNull function. This call is a behind-the-scenes runtime validation of a compile-time constraint.

Kotlin's type system models the nullability of references. By making the parameter of my main function Array<String>, I have declared it as never being null. But since this is a public API that anyone in any language can invoke, nothing prevents a non-Kotlin caller from passing null. In order to validate the non-null constraint and protect its users, the Kotlin compiler inserts defensive checks for non-null parameters in every public API function.

(Note: While there is a way to disable generation of these defensive checks, it's not wise to do.)

Let's take a look at how using R8 on the same source file changes the output.

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(java.lang.String[]);
}
-dontobfuscate

$ java -jar r8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    --pg-conf rules.txt \
    *.class kotlin-stdlib-1.3.11.jar

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000314] NullsKt.main:([Ljava/lang/String;)V
0000: if-eqz v1, 0011
0002: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
…
0010: return-void
0011: const-string v1, "args"
0013: invoke-static {v1}, Lkotlin/jvm/internal/Intrinsics;.throwParameterIsNullException:(Ljava/lang/String;)V

The string constant load and Intrinsics method call which started the method body in the D8 output above has been replaced with a standard null check via the if-eqz bytecode. If that null check succeeds (i.e., the reference is null), the program jumps ahead to the end of the method's bytecodes where the code that builds and throws the exception lives. As a result, in the normal operation of this method where args is non-null, the runtime can execute bytecodes index 0000 through 0010 without a jump.

If we make a quick conjecture about why the bytecode looks like this with R8 we might say that it's the result of inlining. In part 1 we saw the coalesce function be inlined and so here the Intrinsics.checkParameterIsNotNull implementation could have just been inlined too. A quick glance at its implementation does show a standard null check and a call to Intrinsics.throwParameterIsNullException.

public static void checkParameterIsNotNull(Object value, String paramName) {
  if (value == null) {
    throwParameterIsNullException(paramName);
  }
}

But the actual R8 bytecode doesn't match what you would expect when thinking about how inlining works. If this method were inlined, the body of the if should appear at the top of the method immediately after the check. Beyond that, despite being a tiny method it's actually larger than R8's inlining threshold. The only way it would be inlined is if it was used infrequently (which it isn't). There are a few tricks at play here which produce the actual result we're seeing.

The first trick is that R8 will increase the inlining threshold for a method when it null checks an argument whose value is also an argument at the call site. Since checkParameterIsNotNull is only used for arguments at call sites the inlining threshold for this method goes up. The body of the method is otherwise empty so it becomes eligible and is inlined.

The second trick is that R8 can recognize the sequence of bytecodes which both perform a null check on an argument and then throw an exception. When this pattern is recognized, R8 assumes it's the uncommon path for method execution. In order to optimize for the common path, the null check is inverted so that the non-null case immediately follows the check. The exception-throwing code is pushed to the bottom of the method.

But the if check of checkParameterIsNotNull does not match the sequence of bytecodes R8 needs to recognize the argument-check pattern. The body of the if contains a static method call instead of an exception being thrown. So the final trick is that R8 has an intrinsic which recognizes calls to Intrinsics.throwParameterIsNullException as being equivalent to throwing an exception. This allows the body to correctly match the pattern.

These three tricks combine to explain why R8 produces the bytecode we see above.

And remember, every method which is potentially visible to a non-Kotlin caller has this code for every non-null parameter. That's a large number of occurrences in any non-trivial app!

With R8 replacing a static method call with a standard null-check and moving the uncommon case to the end of the method the code retains the safety of the check while minimizing the performance implications.
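At the source level, the combined effect of the three tricks can be approximated in Java. This is a hand-written sketch and not R8 output: Checks is a hypothetical stand-in for Kotlin's Intrinsics class (including the exception type it throws), and greetBefore and greetAfter are made-up names used only to contrast the two shapes.

```java
final class NullCheckInversionSketch {
  /** Hypothetical stand-in for kotlin.jvm.internal.Intrinsics. */
  static final class Checks {
    static void checkParameterIsNotNull(Object value, String paramName) {
      if (value == null) {
        throwParameterIsNullException(paramName);
      }
    }

    // Assumption: the exception type and message are illustrative only.
    static void throwParameterIsNullException(String paramName) {
      throw new IllegalArgumentException("Parameter " + paramName + " is null");
    }
  }

  // Shape before optimization: a static call guards the parameter.
  static String greetBefore(String name) {
    Checks.checkParameterIsNotNull(name, "name");
    return "Hello, " + name;
  }

  // Approximate shape after inlining and inversion: the common, non-null
  // path directly follows the check and the throwing call sits at the bottom.
  static String greetAfter(String name) {
    if (name != null) {
      return "Hello, " + name;
    }
    Checks.throwParameterIsNullException("name");
    // javac cannot know the call above always throws, so end the method explicitly.
    throw new AssertionError("unreachable");
  }

  public static void main(String[] args) {
    System.out.println(greetBefore("Olive"));
    System.out.println(greetAfter("Olive"));
  }
}
```

Both methods behave the same for every input; only the layout of the common and uncommon paths differs.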

Combining Null Information

Once again, part 1 of this post showed R8 using nullability information of values to eliminate unnecessary null checks. The first half of this part showed R8 raise inlining thresholds to ignore null checks and replace Kotlin's Intrinsics method with standard null check bytecodes. These two features sound like they could combine to make some impactful changes. And they do!

This example adds a function, String.double, which just duplicates the string it's called on. This function is invoked on the result of coalesce with a safe-call operator since null might be returned.

fun String.double(): String = this + this

fun coalesce(a: String?, b: String?): String? = a ?: b

fun main(args: Array<String>) {
  println(coalesce(null, "two")?.double())
}

Before looking at the R8 output, let's enumerate the null checks which are present before dexing and optimizing:

args argument is checked because it's a public function.
The return value of coalesce is checked before conditionally invoking double.
coalesce checks the first argument before conditionally returning either first or second.
double's receiver is checked because it's a public function.

You can run D8 on the example to confirm. But running with R8 produces a pretty picture.

[000310] NullsKt.main:([Ljava/lang/String;)V
0000: if-eqz v1, 0019
0002: const-string v1, "two"
0004: new-instance v0, Ljava/lang/StringBuilder;
0006: invoke-direct {v0}, Ljava/lang/StringBuilder;.<init>:()V
0009: invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
000c: invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
000f: invoke-virtual {v0}, Ljava/lang/StringBuilder;.toString:()Ljava/lang/String;
0012: move-result-object v1
0013: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0015: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0018: return-void
0019: const-string v1, "args"
001b: invoke-static {v1}, Lkotlin/jvm/internal/Intrinsics;.throwParameterIsNullException:(Ljava/lang/String;)V

All of the null checks except the one guarding the argument were eliminated!

Because R8 can prove coalesce returns a non-null reference, all downstream null checks can be eliminated. This means the safe-call isn't needed and is replaced with a normal method call. The null check on the receiver of the double function is also eliminated.
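To make the before and after concrete, here is a rough Java approximation of the two shapes. It is a sketch under assumptions, not decompiled output: the names before, after, and doubled are invented for illustration, and the args check (the one check that survives) is left out for brevity.

```java
final class CombinedNullInfoSketch {
  static String coalesce(String a, String b) {
    return a != null ? a : b; // null check on the first argument
  }

  static String doubled(String receiver) {
    // Defensive receiver check, similar in spirit to what a public Kotlin extension gets.
    if (receiver == null) throw new NullPointerException("receiver == null");
    return receiver + receiver;
  }

  // Before optimization: the safe call checks the result of coalesce.
  static String before() {
    String result = coalesce(null, "two");
    return result != null ? doubled(result) : null;
  }

  // Roughly what the optimized bytecode does: the constant is appended twice
  // with no remaining null checks on the value.
  static String after() {
    String value = "two";
    return new StringBuilder().append(value).append(value).toString();
  }

  public static void main(String[] args) {
    System.out.println(before());
    System.out.println(after());
  }
}
```

Both methods return the same string, which is why the checks in before can be dropped without changing behavior.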

No Inlining Required

The examples so far have included inlining to aid in reducing the output. In practice, inlining won't happen to the degree that it does in these small examples. That doesn't prevent elimination of all null checks.

While I find the Kotlin examples particularly compelling here because of the forced, defensive null checks, looking at the optimization for Java is interesting because of the opposite behavior. Java doesn't put defensive null checks on public method arguments and so data flow analysis can use other patterns for nullability signals even without inlining.

final class Nulls {
  public static void main(String[] args) {
    System.out.println(first(args));
    if (args == null) {
      System.out.println("null!");
    }
  }

  public static String first(String[] values) {
    if (values == null) throw new NullPointerException("values == null");
    return values[0];
  }
}

Every reference is potentially-nullable in Java. As a result, it's not uncommon to see defensive checks in library methods like first (even when annotated @NonNull!). Library methods might be large or called from all over your application and so they usually aren't inlined. To simulate this, we can explicitly tell R8 to keep first as a method in the rules.txt.

 -keepclasseswithmembers class * {
   public static void main(java.lang.String[]);
 }
 -dontobfuscate
+-keep class Nulls {
+   public static java.lang.String first(java.lang.String[]);
+}

Even without inlining the output is favorable.

[000144] Nulls.first:([Ljava/lang/String;)Ljava/lang/String;
0000: if-eqz v1, 0006
0002: const/4 v0, #int 0
0003: aget-object v1, v1, v0
0005: return-object v1
0006: new-instance v1, Ljava/lang/NullPointerException;
0008: const-string v0, "values == null"
000a: invoke-direct {v1, v0}, Ljava/lang/NullPointerException;.<init>:(Ljava/lang/String;)V
000d: throw v1

[000170] Nulls.main:([Ljava/lang/String;)V
0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: invoke-static {v1}, LNulls;.first:([Ljava/lang/String;)Ljava/lang/String;
0005: move-result-object v1
0006: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0009: return-void

In first, R8 has once again inverted the null check so that the uncommon case of throwing an exception is at the bottom of the method at index 0006. Normal execution of this method will flow from index 0000 straight down to 0005 and return.

In main, the explicit null check of args and its printing has disappeared. This is because R8 has tracked that the args reference flowed into first where it became impossible to be null after that call. As a result, any null checks that occur after the call to first don't need to occur.
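The reasoning can be checked at the source level with a small Java sketch. The names mainBefore and mainAfter are hypothetical, and output is collected into a list rather than printed so the two variants can be compared directly.

```java
import java.util.ArrayList;
import java.util.List;

final class DeadNullCheckSketch {
  static String first(String[] values) {
    if (values == null) throw new NullPointerException("values == null");
    return values[0];
  }

  // Original shape of main.
  static List<String> mainBefore(String[] args) {
    List<String> out = new ArrayList<>();
    out.add(first(args));
    if (args == null) {
      out.add("null!"); // never reached: first(args) has already thrown for null
    }
    return out;
  }

  // Shape after the redundant check is removed.
  static List<String> mainAfter(String[] args) {
    List<String> out = new ArrayList<>();
    out.add(first(args));
    return out;
  }

  public static void main(String[] args) {
    String[] input = {"a", "b"};
    System.out.println(mainBefore(input).equals(mainAfter(input)));
  }
}
```

For a non-null array both variants produce the same output, and for null both throw from first before the check could run.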

All of these examples are small and somewhat contrived, but they demonstrate a part of the data-flow analysis that R8 is doing with regard to nullability and null checking. In the scope of your whole application whether it's Java, Kotlin, or mixed, unnecessary null checks and unused branches can be eliminated without sacrificing the safety they otherwise afford.

Next week's R8 post will cover my favorite feature of the tool. It's also the one which I think produces the best demo and which resonates with every Android developer. Stay tuned!

(This post was adapted from a part of my Digging into D8 and R8 talk. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/r8-optimization-null-data-flow-analysis-part-2

Inline Classes Make Great Database IDs

Jan 10, 2019 Updated Jan 10, 2019

Kotlin 1.3's experimental inline class feature allows creating type-safe, semantic wrappers around values which are erased at runtime. Database IDs are a perfect use case for this functionality. Combined with SQLDelight which automatically generates model objects and APIs for querying, different tables' IDs become different types which prevent erroneous use.

In modeling an app that sends payments, the domain includes customers, instruments (like debit cards and bank accounts), and payments. These otherwise would all have their IDs represented by a Long allowing programming bugs such as passing a payment ID as a customer ID to go undetected.

Instead, define an inline class for each ID around a Long (or whatever your ID type is).

package com.example.db

inline class CustomerId(val value: Long)
inline class InstrumentId(val value: Long)
inline class PaymentId(val value: Long)

When defining your schema, tell SQLDelight to use these types for the ID columns.

-- src/main/sqldelight/com/example/db/Customer.sq
CREATE TABLE customer(
  id INTEGER AS CustomerId PRIMARY KEY,

  -- other columns…
);

-- src/main/sqldelight/com/example/db/Instrument.sq
CREATE TABLE instrument(
  id INTEGER AS InstrumentId PRIMARY KEY,

  -- other columns…
);

(Just like when specifying any other Kotlin type for a column, you will need to register a ColumnAdapter for these types)
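The adapter itself is just a two-way mapping between the wrapper type and the raw column value. The Java sketch below only illustrates that shape: Adapter is a hypothetical stand-in interface (SQLDelight's real ColumnAdapter is a Kotlin type), CustomerId is a plain wrapper class that, unlike a Kotlin inline class, is not erased at runtime, and how an adapter gets registered is not shown because it depends on the SQLDelight version.

```java
final class IdAdapterSketch {
  /** Hypothetical stand-in with a decode/encode pair, not the SQLDelight interface. */
  interface Adapter<T, S> {
    T decode(S databaseValue);
    S encode(T value);
  }

  /** Plain Java wrapper used here in place of the Kotlin inline class. */
  static final class CustomerId {
    final long value;

    CustomerId(long value) {
      this.value = value;
    }
  }

  static final Adapter<CustomerId, Long> CUSTOMER_ID_ADAPTER =
      new Adapter<CustomerId, Long>() {
        @Override public CustomerId decode(Long databaseValue) {
          return new CustomerId(databaseValue); // raw column value -> typed ID
        }

        @Override public Long encode(CustomerId value) {
          return value.value; // typed ID -> raw column value
        }
      };

  public static void main(String[] args) {
    CustomerId id = CUSTOMER_ID_ADAPTER.decode(42L);
    System.out.println(CUSTOMER_ID_ADAPTER.encode(id));
  }
}
```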

The payment table also uses these types for its own ID as well as the foreign key IDs to other tables.

-- src/main/sqldelight/com/example/db/Payment.sq
CREATE TABLE payment(
  id INTEGER AS PaymentId PRIMARY KEY,

  sender_id INTEGER AS CustomerId NOT NULL,
  recipient_id INTEGER AS CustomerId NOT NULL,
  instrument_id INTEGER AS InstrumentId NOT NULL,

  -- other columns…

  FOREIGN KEY(sender_id) REFERENCES customer(id),
  FOREIGN KEY(recipient_id) REFERENCES customer(id),
  FOREIGN KEY(instrument_id) REFERENCES instrument(id)
);

(Note: SQLDelight will soon enforce that these FOREIGN KEY relationships use the same type so that you can't mix them up)

Named queries whose arguments or selected columns reference these IDs will now automatically use these types.

paymentsBySender:
SELECT id
FROM payment
WHERE sender_id = ?;

The generated Kotlin signature for this query accepts a CustomerId and the query returns PaymentIds as expected.

fun paymentsBySender(sender_id: CustomerId): Query<PaymentId> {
  // …
}

If you're already looking at a single payment in this app, you might want to fetch all payments sent from that sender. With a reference to the current Payment object, you can invoke the named query to get the list.

val payment: Payment = // …

val bySender = queries.paymentsBySender(payment.id).executeAsList()

Before using inline classes, this code would have compiled and returned an empty list at runtime because of programmer error. The Payment's own id was erroneously supplied for the sender ID instead of the sender_id.

But because inline classes were used, this mistake can be caught at compile-time.

PaymentPresenter.kt:189:43: error: type mismatch: inferred type is PaymentId but CustomerId was expected
  val bySender = queries.paymentsBySender(payment.id).executeAsList()
                                          ^

After defining and using the inline classes in the schema once, this extra validation is effectively free because SQLDelight generates both the Payment model object and the function for the query.

A quick fix to pass sender_id allows the code to compile and also reflect the original intended behavior.

 val payment: Payment = // …
 
-val bySender = queries.paymentsBySender(payment.id).executeAsList()
+val bySender = queries.paymentsBySender(payment.sender_id).executeAsList()

When moving data around inside the database domain, the use of inline classes can prevent using semantically incorrect IDs. When combined with SQLDelight, these inline classes automatically apply to all of your models and query arguments adding an additional layer of safety to your database interaction. Enjoy!

https://jakewharton.com/inline-classes-make-great-database-ids

R8 Optimization: Null Data Flow Analysis (Part 1)

Dec 18, 2018 Updated Dec 18, 2018

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". For an intro to R8 read "R8 Optimization: Staticization".

The last post in this series was the first to cover R8 and one of its optimizations. This post will cover an optimization which performs data flow analysis of nullability. Let's dig in!

A coalesce function returns the first non-null argument that is provided. Running the following example, unsurprisingly, prints "one" and then "two".

fun <T : Any> coalesce(a: T?, b: T?): T? = a ?: b

fun main(vararg args: String) {
 println(coalesce("one", "two"))
 println(coalesce(null, "two"))
}

R8 and ProGuard will both perform function inlining when a function is small or if it's only called in one place. Since coalesce is small, its body will be inlined to every call site to be equivalent to the following source.

fun main(vararg args: String) {
  println("one" ?: "two")
  println(null ?: "two")
}

Were this actual source, the Kotlin compiler would determine that both of the elvis operators (?:) can be resolved at compile-time. Compiling and dexing that fake source produces two calls to println with "one" and "two" and zero conditionals.

[000180] NullsKt.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: const-string v0, "one"
0004: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0007: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0009: const-string v0, "two"
000b: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
000e: return-void

But since the inlining occurs inside of R8 and not prior to running the Kotlin compiler, the actual Dalvik bytecode contains the conditionals.

[000144] NullsKt.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: const-string v0, "one"
0004: if-nez v0, 0008
0006: const-string v0, "two"
0008: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
000b: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
000d: const/4 v0, #int 0
000f: if-nez v0, 0012
0010: const-string v0, "two"
0012: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0015: return-void

Note how bytecode index 0002 loads the string "one" and then index 0004 performs a non-null check that will always succeed. This makes index 0006 which loads "two" dead code. Similarly, index 000d loads 0 (which represents null) and then index 000f does a non-null check that will always fail and fall through into index 0010.

As mentioned in the previous post, R8 uses an intermediate representation (IR) for code. This IR uses static single assignment form (SSA) in order to facilitate certain optimizations. With SSA, R8 can determine how data flows through the program. For the value that flows into the first println after inlining its SSA graph looks a bit like the following.

The foundational property of SSA is that each variable is only assigned once. This is why "two" is assigned to y instead of overwriting x. z uses a special phi function (Φ) to select between x or y based on which branch was taken. As you can see in the previous bytecode output, x, y, and z all wind up becoming register v0 which does get overwritten–single assignment is only for the IR!

If we take the above graph and add nullability information to it, both x and y would be marked as non-nullable since they are both initialized with a constant. As a result, z would also be non-nullable. Since w is a field lookup of a reference, it is potentially nullable.

With x being non-nullable, R8 determines that the if-nez bytecode which checks if x is non-null will always be true and thus is useless. The false branch of the conditional which assigns y will never be taken and so it is also useless.

These useless bytecodes can then be pruned from the graph since we know that they are dead code.

z is now a phi function on a single variable, x, which means we can just replace all usages of z directly with x.

What's left is just the System.out lookup into w, assignment of the "one" constant into x, and then the call to println on w with the value x.

The above was only the SSA graph which flows into the first println. The second println is the inverse case where the value is initialized to null, a null check is performed, and then a fallback value is conditionally set.
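The pruning logic for both cases can be modeled with a toy nullability lattice. This Java sketch is an illustration only and does not reflect R8's actual IR or API; the Nullability enum and the phi and elvis helpers are invented names.

```java
final class NullabilitySketch {
  enum Nullability { NON_NULL, DEFINITELY_NULL, MAYBE_NULL }

  /** Joins the operands of a phi: agreement is kept, any disagreement becomes MAYBE_NULL. */
  static Nullability phi(Nullability first, Nullability... rest) {
    for (Nullability next : rest) {
      if (next != first) {
        return Nullability.MAYBE_NULL;
      }
    }
    return first;
  }

  /** Nullability of `value ?: fallback` after branches with a known outcome are pruned. */
  static Nullability elvis(Nullability value, Nullability fallback) {
    switch (value) {
      case NON_NULL:
        // The non-null check always succeeds, so the fallback branch is dead.
        return Nullability.NON_NULL;
      case DEFINITELY_NULL:
        // The non-null check always fails, so only the fallback survives.
        return fallback;
      default:
        // Both branches stay live: the value is non-null on one path and
        // the fallback flows in on the other.
        return phi(Nullability.NON_NULL, fallback);
    }
  }

  public static void main(String[] args) {
    // First println: coalesce("one", "two")
    System.out.println(elvis(Nullability.NON_NULL, Nullability.NON_NULL));
    // Second println: coalesce(null, "two")
    System.out.println(elvis(Nullability.DEFINITELY_NULL, Nullability.NON_NULL));
  }
}
```

In both of the example's calls one branch has a known outcome, so no phi is needed and the result is a single non-null constant.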

With the SSA IR, R8 is able to determine that both conditionals are useless after the inlining of coalesce and remove the dead branches.

$ kotlinc *.kt

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(java.lang.String[]);
}
-dontobfuscate

$ java -jar r8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    --pg-conf rules.txt \
    *.class kotlin-stdlib-1.3.11.jar

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000340] NullsKt.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: const-string v0, "one"
0004: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0007: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0009: const-string v0, "two"
000b: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
000e: return-void

The final Dalvik bytecode now matches that which the manually-inlined source file above produced.

Analysis Inside D8

In attempting to create the bytecode that would be generated after inlining but before nullability analysis eliminated dead code I tried to use equivalent Java.

class Nulls {
  public static void main(String... args) {
    Object first = "one";
    if (first == null) {
      first = "two";
    }
    System.out.println(first);
    Object second = null;
    if (second == null) {
      second = "two";
    }
    System.out.println(second);
  }
}

When you compile, dex with D8, and dump the bytecode from this example, though, the conditionals are still eliminated.

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000224] Nulls.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: const-string v0, "one"
0004: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0007: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0009: const-string v0, "two"
000b: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
000e: return-void

This happens because the same IR is used by D8 and the nullability information is still present. Even without any of R8's optimizations, when the IR contains conditionals that are trivially determined to be always true or always false, dead code elimination can still occur.

If you use the legacy dx tool whose IR does not contain this information the bytecode will retain the conditionals and dead code.

$ $ANDROID_HOME/build-tools/28.0.3/dx --dex --output=classes.dex *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000204] Nulls.main:([Ljava/lang/String;)V
0000: const-string v0, "one"
0002: if-nez v0, 0006
0004: const-string v0, "two"
0006: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0008: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
000b: const/4 v0, #int 0
000c: if-nez v0, 0010
000e: const-string v0, "two"
0010: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0012: invoke-virtual {v1, v0}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0015: return-void

So while the data flow analysis really shines when optimizations like inlining are being applied by R8, if constant conditionals and dead code are present directly from source they'll still be eliminated by D8.

This post only scratches the surface of the data flow analysis inside R8. The next post will continue to expand on the nullability analysis with respect to how Kotlin enforces nullability constraints at runtime.

(This post was adapted from a part of my Digging into D8 and R8 talk. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/r8-optimization-null-data-flow-analysis-part-1

R8 Optimization: Staticization

Dec 11, 2018 Updated Dec 11, 2018

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support". This post introduces R8.

The first three posts (1, 2, 3) in this series explored D8. Alongside its core responsibility of converting Java bytecode to Dalvik bytecode, it desugars new Java language features and works around vendor- and version-specific bugs in Android's VMs.

In general, D8 doesn't perform optimization. It may choose to use Dalvik bytecodes which more efficiently represent the intent of Java bytecodes (as seen with the not-int example). Or, in the process of desugaring language features, it may choose to optimize the desugared code it is generating. Aside from these very localized changes, D8 otherwise performs a direct translation.

R8 is a version of D8 that also performs optimization. It's not a separate tool or codebase, just the same tool operating in a more advanced mode. Where D8 first parses Java bytecode into its own intermediate representation (IR) and then writes out the Dalvik bytecode, R8 adds optimization passes over the IR before it's written out.

This post (and a bunch of future posts) are going to explore some of the individual optimizations that R8 performs. We start with an optimization called staticization which means the act of making something static.

Companion Objects

Kotlin uses companion objects to model the features of Java's static modifier. They're actually a much more powerful language feature allowing things like inheritance and implementing interfaces. That power comes with an associated cost, however, and we pay for that cost regardless of whether we're using the added power or just emulating static.

fun main(vararg args: String) {
  println(Greeter.hello().greet("Olive"))
}

class Greeter(val greeting: String) {
  fun greet(name: String) = "$greeting, $name!"

  companion object {
    fun hello() = Greeter("Hello")
  }
}

In this example, the Greeter class uses a companion object to expose functionality that isn't tied to instances of Greeter. A convenience factory hello returns instances of Greeter initialized with the string "Hello". A main function calls the factory and then greets my dog Olive.

Compiling with kotlinc, dexing with D8, and dumping the Dalvik bytecode with dexdump we can see how this is implemented.

$ kotlinc *.kt

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
…
[000370] GreeterKt.main:([Ljava/lang/String;)V
0000: sget-object v1, LGreeter;.Companion:LGreeter$Companion;
0002: invoke-virtual {v1}, LGreeter$Companion;.hello:()LGreeter;
0005: move-result-object v1
0006: const-string v0, "Olive"
0008: invoke-virtual {v1, v0}, LGreeter;.greet:(Ljava/lang/String;)Ljava/lang/String;
000b: move-result-object v1
000c: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
000e: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
0011: return-void
…

Bytecode index 0000 loads an instance of the Greeter$Companion class from a static Companion field on Greeter. Index 0002 then makes a virtual method call to the hello function on that instance.

Looking at the nested Companion class confirms that it contains virtual (aka non-static) methods.

Virtual methods   -
  #0              : (in LGreeter$Companion;)
    name          : 'hello'
    type          : '()LGreeter;'
    access        : 0x0011 (PUBLIC FINAL)
[000314] Greeter.Companion.hello:()LGreeter;
0000: new-instance v0, LGreeter;
0002: const-string v1, "Hello"
0004: invoke-direct {v0, v1}, LGreeter;.<init>:(Ljava/lang/String;)V
0007: return-object v0

The use of a companion on Greeter means that a second, nested class named Companion is generated which adds to our binary size and slows startup because of additional class loading. The singleton instance of this class is retained in memory for the life of our application adding memory pressure. And finally, the use of instance methods require virtual calls which are slower than static calls. Granted, the impact of all these things for just one class is extremely minor, but in a large application written entirely in Kotlin it begins to contribute non-trivial overhead.

We can convert the Java classfiles to Dalvik using R8 instead of D8 and see what optimizations it applies. The flags to run R8 are nearly identical to D8's except that it requires adding --pg-conf to supply a ProGuard-compatible configuration file. The one in use here keeps the main method as an entry point (otherwise the dex file would be empty) and disables class and method name obfuscation for the sake of readability.

$ cat rules.txt
-keepclasseswithmembers class * {
  public static void main(java.lang.String[]);
}
-dontobfuscate

$ java -jar r8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    --pg-conf rules.txt \
    *.class

R8 will produce a classes.dex just like D8 except with contents that have been optimized.

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
…
[000234] GreeterKt.main:([Ljava/lang/String;)V
0000: invoke-static {}, LGreeter;.hello:()LGreeter;
0003: move-result-object v1
0004: const-string v0, "Olive"
0006: invoke-virtual {v1, v0}, LGreeter;.greet:(Ljava/lang/String;)Ljava/lang/String;
0009: move-result-object v1
000a: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
000c: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/Object;)V
000f: return-void
…

The main method has changed slightly from the original version. Instead of an sget-object to look up the Companion instance and an invoke-virtual to call a hello instance method, only an invoke-static remains. It's also important to note that R8 hasn't just made the hello method static inside the Companion class, it has moved the method from the Companion to be directly on the Greeter class.

  #1              : (in LGreeter;)
    name          : 'hello'
    type          : '()LGreeter;'
    access        : 0x0019 (PUBLIC STATIC FINAL)
[000240] Greeter.hello:()LGreeter;
0000: new-instance v0, LGreeter;
0002: const-string v1, "Hello"
0004: invoke-direct {v0, v1}, LGreeter;.<init>:(Ljava/lang/String;)V
0007: return-object v0

With the hello method having been moved, the entire Companion class and the singleton field holding its instance on Greeter have both been removed.

This is staticization in practice. R8 finds occurrences of instance methods where the instance isn't actually required and makes them static. It also has special knowledge of how Kotlin implements companions so that in addition to making their methods static the extra class they'd otherwise generate can also be removed.

Source Transformation

Understanding exactly how a Kotlin companion is represented in bytecode and how R8's optimization works in bytecode can be challenging. In order to better understand both of these things we can emulate them at the source-code level.

The Kotlin compiler compiles the original Greeter class into Java bytecode which approximates to the following Java source code.

public final class Greeter {
  public static final Companion Companion = new Companion();

  private final String greeting;

  public Greeter(String greeting) {
    this.greeting = greeting;
  }

  public String getGreeting() {
    return greeting;
  }

  public String greet(String name) {
    return greeting + ", " + name + "!";
  }

  public static final class Companion {
    private Companion() {}

    public Greeter hello() {
      return new Greeter("Hello");
    }
  }
}

The val greeting: String primary constructor property declaration is translated into a private field, constructor parameter, constructor assignment statement, and getter method. The companion object becomes a nested class named Companion and the enclosing Greeter class keeps a static, final singleton instance of it.

The main method is put into yet another class called GreeterKt which is based on the filename, Greeter.kt.

public final class GreeterKt {
  public static void main(String[] args) {
    System.out.println(Greeter.Companion.hello().greet("Olive"));
  }
}

In order to access the hello factory method, the main method calls through the static Companion field.

R8's optimization alters the code into what we otherwise would have written if the original Greeter was written in Java.

 public final class Greeter {
-  public static final Companion Companion = new Companion();
-
   private final String greeting;
@@

-  public static final class Companion {
-    private Companion() {}
-
-    public Greeter hello() {
-      return new Greeter("Hello");
-    }
-  }
+  public static Greeter hello() {
+    return new Greeter("Hello");
+  }
 }

The hello method becomes a static method directly inside Greeter and the Companion class and singleton instance field are removed.

 public final class GreeterKt {
   public static void main(String[] args) {
-    System.out.println(Greeter.Companion.hello().greet("Olive"));
+    System.out.println(Greeter.hello().greet("Olive"));
   }
 }

The main method is also updated to reflect this change, again looking more like what would have been written in Java.

@JvmStatic

If you're familiar with Kotlin and its Java interoperability story, using the @JvmStatic annotation might have come to mind to achieve a similar effect.

   companion object {
+    @JvmStatic
     fun hello() = Greeter("Hello")

With the annotation added to the original example, running it through D8 only and dumping the bytecode shows an interesting result.

$ kotlinc *.kt

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
…
  #2              : (in LGreeter;)
    name          : 'hello'
    type          : '()LGreeter;'
    access        : 0x0019 (PUBLIC STATIC FINAL)
[00042c] Greeter.hello:()LGreeter;
0000: sget-object v0, LGreeter;.Companion:LGreeter$Companion;
0002: invoke-virtual {v0}, LGreeter$Companion;.hello:()LGreeter;
0005: move-result-object v1
0006: return-object v1
…

A static hello method was added to the Greeter class, but it's just a trampoline into the Companion instance and the instance method of the same name.
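In Java source terms, the trampoline looks roughly like the following. This is a hand-written approximation of what the Kotlin compiler emits with the annotation present, in the same spirit as the earlier source transformation, and not decompiled output.

```java
final class Greeter {
  public static final Companion Companion = new Companion();

  private final String greeting;

  public Greeter(String greeting) {
    this.greeting = greeting;
  }

  public String greet(String name) {
    return greeting + ", " + name + "!";
  }

  // Static trampoline added because of @JvmStatic: it only forwards to the
  // instance method on the Companion singleton (the field, not the class).
  public static Greeter hello() {
    return Companion.hello();
  }

  public static final class Companion {
    private Companion() {}

    public Greeter hello() {
      return new Greeter("Hello");
    }
  }

  public static void main(String[] args) {
    System.out.println(Greeter.hello().greet("Olive"));
  }
}
```

The Companion class and its singleton field are both still present, so the annotation alone removes none of the overhead described earlier.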

[000234] GreeterKt.main:([Ljava/lang/String;)V
0000: sget-object v1, LGreeter;.Companion:LGreeter$Companion;
0002: invoke-virtual {v1}, LGreeter$Companion;.hello:()LGreeter;
…

And even with that static method present, Kotlin callers still do the Companion instance lookup and virtual method call.

Even with @JvmStatic present, R8 will still perform the staticization optimization. The Companion's hello method body will move into the static hello method on Greeter, the main function will do a static method call, and the entire Companion class will be removed.

More Than Companions

This optimization isn't limited to only Kotlin companion objects. Regular Kotlin objects will have their methods made static.

@Module
object HelloGreeterModule {
  @Provides fun greeter() = Greeter("Hello")
}

Java classes will also receive this optimization when the instance is not needed.

public final class Thing {
  public static final Thing INSTANCE = new Thing();

  private Thing() {}

  public void doThing() {
    // …
  }
}

Running R8 on these examples and validating the resulting bytecode is left as an exercise for the reader.

In summary, staticization takes instance methods which don't actually require access to an instance and makes them static. For Kotlin, it understands the bytecode of companion objects and can often eliminate them entirely when they're only being used to emulate Java's static.

Many R8 optimizations are aware of Kotlin-specific bytecode patterns in order to make them more effective. Stay tuned for the next post which features another R8 optimization that works well with Kotlin.

(This post was adapted from a part of my Digging into D8 and R8 talk. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/r8-optimization-staticization

Avoiding Vendor- and Version-Specific VM Bugs

Dec 4, 2018 Updated Dec 4, 2018

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support".

The first two posts (1, 2) in this series explored how D8 is responsible for desugaring new Java language features to work on all versions of Android. Desugaring is the more interesting feature to demonstrate, but it's secondary functionality of D8. The primary responsibility is converting the stack-based Java bytecode into register-based Dalvik bytecode so that it can run on Android's VM.

At this point in Android's tenure it'd be reasonable to think that this conversion (called dexing) is a solved problem. During the process of building and rolling out D8, however, interesting vendor-specific and version-specific bugs in different VMs were uncovered which this post is going to explore.

Not A Not

D8 takes compiled Java bytecode and produces equivalent functionality using Dalvik bytecode. We can see this with a simple example that uses Java's bitwise not operator.

class Not {
  static void print(int value) {
    System.out.println(~value);
  }
}

Compiling and dumping the class file shows the bytecodes that are used to implement this feature.

$ javac *.java

$ javap -c *.class
class Not {
  static void print(int);
    Code:
       0: getstatic     #2      // Field java/lang/System.out:Ljava/io/PrintStream;
       3: iload_0
       4: iconst_m1
       5: ixor
       6: invokevirtual #3      // Method java/io/PrintStream.println:(I)V
       9: return
}

Bytecode indices 3, 4, and 5 load the argument value onto the stack, load the constant -1, and perform a bitwise exclusive-or. If your bitwise skills are a little rusty, -1 is represented as all 1s and an exclusive-or sets a bit if and only if exactly one of the two bits is set.

00010100  (value)
 xor
11111111  (-1)
 =
11101011

By performing an exclusive-or on a number whose bits are all set to 1, we are left with a number whose bits are the opposite of the original yielding the bitwise not.
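
If you'd rather see that identity in code than in a bit diagram, here's a minimal Java sketch. The class and method names are made up for illustration; the only thing it relies on is that XOR with -1 flips every bit.

```java
// Illustration only: models the iload / iconst_m1 / ixor sequence shown above.
final class BitwiseNotDemo {
  // Flips every bit by XOR-ing with -1, whose bits are all set.
  static int notViaXor(int value) {
    return value ^ -1;
  }

  public static void main(String[] args) {
    int value = 0b00010100; // 20, the value from the diagram
    // Mask to the low 8 bits so the output lines up with the diagram.
    System.out.println(Integer.toBinaryString(notViaXor(value) & 0xFF)); // 11101011
    System.out.println(notViaXor(value) == ~value); // true
  }
}
```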

Running this through D8 shows the operation is implemented similarly in Dalvik bytecode.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000134] Not.print:(I)V
0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: xor-int/lit8 v1, v1, #int -1
0004: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(I)V
0007: return-void

Index 0002 performs an exclusive-or on register v1 (the argument value) with a constant of -1 and stores it back into v1. This is a very straightforward mapping from Java bytecode and if you didn't know any better you wouldn't give it a second thought. But its inclusion in this post should tip you off that there is more to the story.

All of the Dalvik bytecodes are available for browsing on the Android developer documentation site. If you look closely, there's a unary operator section which contains a bytecode called not-int. Instead of doing an exclusive-or on the argument value with -1 a dedicated bitwise not bytecode could be used. This has the potential for using more efficient machine instructions and hardware in the CPU. So why isn't it being used?

The answer lies with the old dx tool and the fact that it also does not use the not-int instruction.

$ $ANDROID_HOME/build-tools/28.0.3/dx \
      --dex \
      --output=classes.dex \
      *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000130] Not.print:(I)V
0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: xor-int/lit8 v1, v2, #int -1
0004: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(I)V
0007: return-void

The old dx tool is hosted in dalvik/dx/ of AOSP. If we grep its codebase, we can find the constant used for the not-int instruction.

$ grep -r -C 1 'not-int' src/com/android/dx/io
OpcodeInfo.java-522-    public static final Info NOT_INT =
OpcodeInfo.java:523:        new Info(Opcodes.NOT_INT, "not-int",
OpcodeInfo.java-524-            InstructionCodec.FORMAT_12X, IndexType.NONE);

So while dx knows that the instruction exists, when you grep its codebase for uses of that constant when converting from a class file there are zero! For comparison I've also included the if-eq bytecode's constant.

$ grep -r -C 1 'NOT_INT' src/com/android/dx/cf

$ grep -r -C 1 'IF_EQ' src/com/android/dx/cf
code/RopperMachine.java-885-            case ByteOps.IFNULL: {
code/RopperMachine.java:886:                return RegOps.IF_EQ;
code/RopperMachine.java-887-            }

This means that the dx tool will never emit a not-int instruction no matter what Java bytecodes were used. This is unfortunate, but ultimately isn't that big of a deal.

The real problem stems from the fact that because the bytecode was never used by the canonical dexing tool, some vendors decided that they wouldn't bother supporting it in their Dalvik VM's JIT! Once D8 came along and started using the full bytecode set, JIT-compiled apps running on these specific phones would crash. As a result, D8 can't use the not-int instruction in this case even if it wants to.

With the introduction of the ART VM in API 21, all phones now have support for this instruction. As a result, passing --min-api 21 to D8 will change the bytecodes used to leverage not-int.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --min-api 21 \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000134] Not.print:(I)V
0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: not-int v1, v1
0003: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(I)V
0006: return-void

Index 0002 now contains the more specific instruction as we expect.

In a similar manner to how language features are desugared to work on older versions of Android, D8 can change the shape of individual bytecodes to ensure compatibility. As the ecosystem and our minimum API level rises, D8 will automatically use the more efficient bytecodes.

Long Compare

Even when all of the instructions in use are supported, vendor-specific JITs are software like any other and can contain bugs. This happened close to home in code that was present in OkHttp and Okio.

Both libraries deal in moving and counting bytes. Their methods frequently start with a check for a negative count (which is invalid) and then a zero count (no work to do).

class LongCompare {
  static void somethingWithBytes(long byteCount) {
    if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0");
    if (byteCount == 0) return; // Nothing to do!
    // Do something…
  }
}

When you compile and dex this code, the constant 0 is loaded and then two comparisons are made.

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000138] LongCompare.somethingWithBytes:(J)V
0000: const-wide/16 v0, #int 0
0002: cmp-long v2, v3, v0
0004: if-ltz v2, 000b
0006: cmp-long v2, v3, v0
0008: if-nez v2, 000a
…

Based on these bytecodes, we can infer that cmp-long produces a value that's either less-than zero, zero, or greater-than zero. After each comparison, a check for less-than zero occurs and then a check for non-zero, respectively. But if a single cmp-long produces the comparison result, why does index 0006 perform it a second time?

The reason is that one vendor-specific JIT will crash if a non-zero check is performed immediately after a less-than zero check. This would cause the program to see impossible exceptions such as a NullPointerException when only dealing with longs.
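
To make the two shapes concrete, here's a rough Java model. It assumes cmp-long behaves like Long.compare (a negative, zero, or positive result), and the class and method names are invented for this sketch rather than taken from D8.

```java
// Illustration only: a source-level model of the two bytecode shapes.
final class CmpLongDemo {
  // Stand-in for Dalvik's cmp-long: negative, zero, or positive.
  static int cmpLong(long a, long b) {
    return Long.compare(a, b);
  }

  // Mirrors the dex output above: the comparison is repeated before each branch.
  static String compareTwice(long byteCount) {
    if (cmpLong(byteCount, 0L) < 0) return "invalid";   // if-ltz
    if (cmpLong(byteCount, 0L) != 0) return "do work";  // if-nez
    return "nothing to do";
  }

  // The shape you might expect instead: one comparison, two branches on its result.
  static String compareOnce(long byteCount) {
    int result = cmpLong(byteCount, 0L);
    if (result < 0) return "invalid";   // if-ltz
    if (result != 0) return "do work";  // if-nez
    return "nothing to do";
  }
}
```

On a correct VM both methods always agree. The repeated comparison only exists to keep the two branch instructions from sitting directly back to back.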

Just like in the last example, the introduction of the ART VM resolved this problem. Passing --min-api 21 produces the more efficient sequence which only does a single cmp-long.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --min-api 21 \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000138] LongCompare.somethingWithBytes:(J)V
0000: const-wide/16 v0, #int 0
0002: cmp-long v2, v2, v0
0004: if-ltz v2, 0009
0006: if-nez v2, 0008
…

Once again D8 changes the shape of the bytecodes it uses for the purpose of compatibility. When your application no longer supports the versions of Android which have the broken vendor implementations, the bytecode is updated to the more efficient form.

But while ART has brought a normalization to the VM across the ecosystem eliminating (or at least reducing) these vendor-specific bugs, it is not exempt from bugs itself.

Recursion

Bugs that occur in ART itself affect specific versions of Android regardless of the vendor. As D8 evolves and changes the bytecode it emits, dormant bugs in ART can suddenly surface.

The example which demonstrates an interesting bug is admittedly very contrived, but the code was derived from a real application and distilled into a self-contained example.

import java.util.List;

class Recursion {
  private void f(int x, double y, double u, double v, List<String> w) {
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    f(x, y, u, v, w);
    w.add(g(y, u, v));
  }

  private String g(double y, double u, double v) {
    return null;
  }
}

In Android 6.0 (API 23), ART's ahead-of-time (AOT) compiler added call analysis in order to perform method inlining. Due to the heavily-recursive nature of the f method above, the dex2oat compiler will actually consume all of the memory on the device during this analysis and crash. This was fixed in the next release, Android 7.0 (API 24).

When your minimum SDK is below 24, D8 will change the dex file to work around this bug. But before looking at the workaround, let's reproduce the crash.

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --min-api 24 \
    --output . \
    *.class

We pass --min-api 24 to D8 in order to produce a dex file that does not contain the workaround for the bug. If you push this dex file to an API 23 device, dex2oat will refuse to compile it.

$ adb push classes.dex /sdcard

$ adb shell dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat

$ adb logcat
…
11-29 13:57:08.303  4508  4508 I dex2oat : dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
11-29 13:57:08.306  4508  4508 W dex2oat : Failed to open .dex from file '/sdcard/classes.dex': Failed to open dex file '/sdcard/classes.dex' from memory: Unrecognized version number in /sdcard/classes.dex: 0 3 7
11-29 13:57:08.306  4508  4508 E dex2oat : Failed to open some dex files: 1
11-29 13:57:08.309  4508  4508 I dex2oat : dex2oat took 7.440ms (threads: 4)

The documentation for the dex file format defines that the first 8 bytes should be the characters dex, a newline character, three number characters indicating the version, and then a null byte. Because --min-api 24 was specified, the dex file declares version 037. Dumping the first few bytes of the dex file confirms this.

$ xxd classes.dex | head -1
00000000: 6465 780a 3033 3700 e595 2d8c 49b5 d6b6  dex.037...-.I...
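
The same check can be sketched in Java. This is a minimal illustration of the header layout described above, not code from D8 or ART, and the DexMagic class is a made-up name.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: reads the version out of the 8-byte dex magic.
final class DexMagic {
  // Returns the three version digits, or throws if the magic is malformed.
  static String version(byte[] header) {
    if (header.length < 8
        || header[0] != 'd' || header[1] != 'e' || header[2] != 'x'
        || header[3] != '\n' || header[7] != 0) {
      throw new IllegalArgumentException("Not a dex header");
    }
    for (int i = 4; i < 7; i++) {
      if (header[i] < '0' || header[i] > '9') {
        throw new IllegalArgumentException("Version is not three digits");
      }
    }
    return new String(header, 4, 3, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    // The first eight bytes from the xxd dump above.
    byte[] header = {0x64, 0x65, 0x78, 0x0a, 0x30, 0x33, 0x37, 0x00};
    System.out.println(version(header)); // 037
  }
}
```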

In order to get this dex file to install on an older device the version must be changed to 035. Any hex editor can be used to do this. I used xxd again to convert from binary to hexadecimal, edited the hexadecimal in an editor (which I know how to exit), and then used xxd again to convert hexadecimal back to binary.

$ xxd -p classes.dex > classes.hex

$ nano classes.hex  # Change 303337 to 303335

$ xxd -p -r classes.hex > classes.dex
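
If a hex editor isn't your thing, the same three-byte edit is easy to script. Here's a hedged Java sketch that works on a byte array holding the file contents; DexVersionPatch is a hypothetical helper, and like the manual edit it only touches the version digits and nothing else in the header.

```java
// Illustration only: rewrites the three version digits in the dex magic.
final class DexVersionPatch {
  // Returns a copy of the dex bytes with bytes 4..6 replaced by the given version.
  static byte[] withVersion(byte[] dex, String version) {
    if (dex.length < 8) {
      throw new IllegalArgumentException("Too short to contain a dex magic");
    }
    if (version.length() != 3 || !version.chars().allMatch(c -> c >= '0' && c <= '9')) {
      throw new IllegalArgumentException("Version must be three digits");
    }
    byte[] copy = dex.clone();
    for (int i = 0; i < 3; i++) {
      copy[4 + i] = (byte) version.charAt(i); // ASCII digit, e.g. '5' is 0x35
    }
    return copy;
  }
}
```

Reading classes.dex into a byte array, calling withVersion(bytes, "035"), and writing the result back out performs the same change as the xxd round trip.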

With the version changed this dex file will now compile on Android 6.0 devices but with a different result.

$ adb push classes.dex /sdcard

$ adb shell dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
Segmentation fault

Whoops! We (successfully) crashed the AOT compiler. Running dex2oat with the same dex file on Android 7.0 or newer does not trigger the crash, as expected.

Removing the --min-api 24 line will force D8 to insert its workaround for this AOT compiler problem. Before doing so the old dex file is renamed so that we can compare the two.

$ mv classes.dex classes_api24.dex

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

Dumping the bytecodes of both shows the difference.

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes_api24.dex
[000190] Recursion.f:(IDDDLjava/util/List;)V
0000: invoke-direct/range {v7, v8, v9, v10, v11, v12, v13, v14, v15}, LRecursion;.f:(IDDDLjava/util/List;)V
…
0018: invoke-direct/range {v7, v8, v9, v10, v11, v12, v13, v14, v15}, LRecursion;.f:(IDDDLjava/util/List;)V
001b: move-object v0, v7
001c: move-wide v1, v9
001d: move-wide v3, v11
001e: move-wide v5, v13
001f: invoke-direct/range {v0, v1, v2, v3, v4, v5, v6}, LRecursion;.g:(DDD)Ljava/lang/String;
0022: move-result-object v8
0023: invoke-interface {v15, v8}, Ljava/util/List;.add:(Ljava/lang/Object;)Z
0026: return-void
  catches       : (none)

$ $ANDROID_HOME/build-tools/28.0.3/dexdump -d classes.dex
[000198] Recursion.f:(IDDDLjava/util/List;)V
0000: invoke-direct/range {v7, v8, v9, v10, v11, v12, v13, v14, v15}, LRecursion;.f:(IDDDLjava/util/List;)V
…
0018: invoke-direct/range {v7, v8, v9, v10, v11, v12, v13, v14, v15}, LRecursion;.f:(IDDDLjava/util/List;)V
001b: move-object v0, v7
001c: move-wide v1, v9
001d: move-wide v3, v11
001e: move-wide v5, v13
001f: invoke-direct/range {v0, v1, v2, v3, v4, v5, v6}, LRecursion;.g:(DDD)Ljava/lang/String;
0022: move-result-object v8
0023: invoke-interface {v15, v8}, Ljava/util/List;.add:(Ljava/lang/Object;)Z
0026: return-void
0027: move-exception v8
0028: throw v8
  catches       : 1
    0x0018 - 0x001b
      Ljava/lang/Throwable; -> 0x0027

The contents of each version of the method are the exact same until the very end. The version which works around the bug has two extra bytecodes, move-exception and throw, and an entry in the catches section. This is the bytecode equivalent of a try-catch block that simply re-throws the exception. By inserting this try-catch block, the AOT compiler's call analysis for method inlining is disabled.
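
In source form, that catch-and-rethrow would look roughly like the sketch below. D8 adds the handler at the bytecode level, so this is only an illustration of the shape. The recursion here is depth-limited (an assumption made so the example terminates) and it returns a call count purely so the behavior can be observed.

```java
// Illustration only: a try-catch that re-throws leaves behavior unchanged.
final class RethrowDemo {
  // Returns how many times f was invoked, including this call.
  static int f(int depth) {
    if (depth == 0) return 1;
    int calls = 1 + f(depth - 1);
    try {
      calls += f(depth - 1); // the guarded recursive call
    } catch (Throwable t) {
      throw t; // move-exception followed by throw
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(f(2)); // 7
  }
}
```

The handler never changes what the method does. Its only purpose is to alter the shape of the method that the compiler sees.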

The range of the catch block only covers the last recursive call from bytecode index 0018 to 001b. If you were to remove a single call to f in the original source code, the level of recursion wouldn't be large enough to trigger the bug in the AOT compiler. Therefore the try-catch workaround only surrounds the recursive calls when they're problematic.

The same code when dexed with the old dx compiler will not cause a crash on Android 6.0. This is because the bytecode is less efficient and uses more registers which prevents the inlining analysis from even running.

The three examples above are a few cases of vendor- and version-specific bugs in Android's VMs. Just like the language feature desugaring covered in the previous posts, D8 will only apply workarounds for these bugs when necessary based on your minimum API level.

The conditionals which control whether these are applied are at the bottom of a file named InternalOptions.java in the D8 codebase. Bugs in the VM aren't only found in old versions of Android. If you search for AndroidApiLevel.Q in that file you'll find two workarounds for VM bugs present in every version of Android (at least at time of writing).
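
As a mental model, those conditionals boil down to simple comparisons against the minimum API level. The sketch below is not the contents of InternalOptions.java. Its class and method names are invented, and the thresholds are just the ones described in this post.

```java
// Illustration only: hypothetical names, thresholds taken from the three examples above.
final class WorkaroundModel {
  private final int minApi;

  WorkaroundModel(int minApi) {
    this.minApi = minApi;
  }

  // not-int is only safe once every supported device runs ART (API 21+).
  boolean canUseNotInt() {
    return minApi >= 21;
  }

  // The repeated cmp-long is needed while pre-ART vendor JITs are still supported.
  boolean needsRepeatedCmpLong() {
    return minApi < 21;
  }

  // The recursive inlining crash in dex2oat was fixed in API 24.
  boolean needsRecursionGuard() {
    return minApi < 24;
  }
}
```

Raising the minimum API level flips each of these on its own, which is why the workarounds quietly disappear from your dex output over time.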

It's important to remember that none of these problems were caused by D8. They were uncovered by D8 in its effort to use registers more effectively and order bytecodes more efficiently when compared to dx. For optimizing dex even further, we have to turn to D8's optimizing sibling, R8, which we'll start to examine in the next post.

(This post was adapted from a part of my Digging into D8 and R8 talk that was only partially presented. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/avoiding-vendor-and-version-specific-vm-bugs

Android's Java 9, 10, 11, and 12 Support

Nov 27, 2018 Updated Nov 27, 2018

Note: This post is part of a series on D8 and R8, Android's new dexer and optimizer, respectively. For an intro to D8 read "Android's Java 8 support".

The first post in this series explored Android's Java 8 support. Having support for the language features and APIs of Java 8 is table stakes at this point. We're not quite there with the APIs yet, sadly, but D8 has us covered with the language features. There's a future promise for the APIs which is essential for the health of the ecosystem.

A lot of the reaction to the previous post echoed that Java 8 is quite old. The rest of the Java ecosystem is starting to move to Java 11 (being the first long-term supported release after 8) after having toyed with Java 9 and 10. I was hoping for that reaction because I mostly wrote that post so that I could set up this one.

With Java releases happening more frequently, Android's yearly release schedule and delayed uptake of newer language features and APIs feels more painful. But is it actually the case that we're stuck with those of Java 8? Let's take a look at the Java releases beyond 8 and see how the Android toolchain fares.

Java 9

The last release on the 2 - 3 year schedule, Java 9 contains a few new language features. None of them are major like lambdas were. Instead, this release focused on cleaning up some of the sharp edges on existing features.

Concise Try With Resources

Prior to this release the try-with-resources construct required that you define a local variable (such as try (Closeable bar = foo.bar())). But if you already have a Closeable, defining a new variable is redundant. As such, this release allows you to omit declaring a new variable if you already have an effectively-final reference.

import java.io.*;

class Java9TryWithResources {
  String effectivelyFinalTry(BufferedReader r) throws IOException {
    try (r) {
      return r.readLine();
    }
  }
}

This feature is implemented entirely in the Java compiler so D8 is able to dex it for Android.

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ ls
Java9TryWithResources.java  Java9TryWithResources.class  classes.dex

Unlike the lambdas or static interface methods of Java 8 which required special desugaring, this Java 9 feature becomes available to all API levels for free.

Anonymous Diamond

Java 7 introduced the diamond operator which allowed omitting a generic type from the initializer if it could be inferred from the variable type.

List<String> strings = new ArrayList<>();

This cut down on redundant declarations, but it wasn't available for use on anonymous classes. With Java 9 that is now supported.

import java.util.concurrent.*;

class Java9AnonymousDiamond {
  Callable<String> anonymousDiamond() {
    Callable<String> call = new Callable<>() {
      @Override public String call() {
        return "Hey";
      }
    };
    return call;
  }
}

Once again this is entirely implemented in the Java compiler so the resulting bytecode is as if String was explicitly specified.

$ javac *.java

$ javap -c *.class
class Java9AnonymousDiamond {
  java.util.concurrent.Callable<java.lang.String> anonymousDiamond();
    Code:
       0: new           #7  // class Java9AnonymousDiamond$1
       3: dup
       4: aload_0
       5: invokespecial #8  // Method Java9AnonymousDiamond$1."<init>":(LJava9AnonymousDiamond;)V
       8: areturn
}

class Java9AnonymousDiamond$1 implements java.util.concurrent.Callable<java.lang.String> {
  final Java9AnonymousDiamond this$0;

  Java9AnonymousDiamond$1(Java9AnonymousDiamond);
    Code:
       0: aload_0
       1: aload_1
       2: putfield      #1  // Field this$0:LJava9AnonymousDiamond;
       5: aload_0
       6: invokespecial #2  // Method java/lang/Object."<init>":()V
       9: return

  public java.lang.String call();
    Code:
       0: ldc           #3  // String Hey
       2: areturn
}

Because there is nothing interesting in the bytecode, D8 handles this without issue.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ ls
Java9AnonymousDiamond.java  Java9AnonymousDiamond.class  Java9AnonymousDiamond$1.class  classes.dex

Yet another language feature available to all API levels for free.

Private Interface Methods

Interfaces with multiple static or default methods can often lead to duplicated code in their bodies. If these methods were part of a class and not an interface, private helper functions could be extracted. Java 9 adds the ability for interfaces to contain private methods which are only accessible to its static and default methods.

interface Java9PrivateInterface {
  static String hey() {
    return getHey();
  }

  private static String getHey() {
    return "hey";
  }
}

This is the first language feature that requires some kind of support. Prior to this release, the private modifier was not allowed on an interface member. Since D8 is already responsible for desugaring default and static methods, private methods were straightforward to include using the same technique.

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ ls
Java9PrivateInterface.java  Java9PrivateInterface.class  classes.dex
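
Conceptually, desugaring static interface methods for older API levels amounts to relocating them somewhere those runtimes accept. The Java below is a hand-written approximation of that idea, assuming the methods move to a sibling class. The names Greeting and GreetingCompanion are invented, and the real output is generated at the bytecode level and may differ in naming and visibility.

```java
// Illustration only: the interface after its static members have been moved out.
interface Greeting {}

// Hypothetical sibling class that receives the relocated methods.
final class GreetingCompanion {
  private GreetingCompanion() {}

  // Was: static String hey() on the interface.
  static String hey() {
    return getHey();
  }

  // Was: private static String getHey() on the interface.
  private static String getHey() {
    return "hey";
  }
}
```

Call sites that used to invoke the interface method would be rewritten to call the relocated one instead.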

Static and default methods are supported natively in ART as of API 24. When you pass --min-api 24 for this example, the static method is not desugared. Curiously, though, the private static method is also not desugared.

$ $ANDROID_HOME/build-tools/28.0.2/dexdump -d classes.dex
Class #1            -
  Class descriptor  : 'LJava9PrivateInterface;'
  Access flags      : 0x0600 (INTERFACE ABSTRACT)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LJava9PrivateInterface;)
      name          : 'getHey'
      type          : '()Ljava/lang/String;'
      access        : 0x000a (PRIVATE STATIC)
00047c:                 |[00047c] Java9PrivateInterface.getHey:()Ljava/lang/String;
00048c: 1a00 2c00       |0000: const-string v0, "hey"
000490: 1100            |0002: return-object v0

We can see that the getHey() method's access flags still contain both PRIVATE and STATIC. If you add a main method which calls hey() and push this to a device it will actually work. Despite being a feature added in Java 9, ART allows private interface members since API 24!

Those are all the language features of Java 9 and they all already work on Android. How about that.

The APIs of Java 9, though, are not yet included in the Android SDK. A new process API, var handles, a version of the Reactive Streams interfaces, and collection factories are just some of those which were added. Since libcore (which contains implementation of java.*) and ART are developed in AOSP, we can peek and see that work is already underway towards supporting Java 9. Once included in the SDK, some of its APIs will be candidates for desugaring to all API levels.

String Concat

The new language features and APIs of a Java release tend to be what we talk about most. But each release is also an opportunity to optimize the bytecode which is used to implement a feature. Java 9 brought an optimization to a ubiquitous language feature: string concatenation.

class Java9Concat {
  public static String thing(String a, String b) {
    return "A: " + a + " and B: " + b;
  }
}

If we take this fairly innocuous piece of code and compile it with Java 8 the resulting bytecode will use a StringBuilder.

$ java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)

$ javac *.java

$ javap -c *.class
class Java9Concat {
  public static java.lang.String thing(java.lang.String, java.lang.String);
    Code:
       0: new           #2  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #3  // Method java/lang/StringBuilder."<init>":()V
       7: ldc           #4  // String A:
       9: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      12: aload_0
      13: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      16: ldc           #6  // String  and B:
      18: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      21: aload_1
      22: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      25: invokevirtual #7  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      28: areturn
}

The bytecode contains the code we otherwise would have written if the language didn't allow simple concatenation.

If we change the compiler to Java 9, however, the result is very different.

$ java -version
java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)

$ javac *.java

$ javap -c *.class
class Java9Concat {
  public static java.lang.String thing(java.lang.String, java.lang.String);
    Code:
       0: aload_0
       1: aload_1
       2: invokedynamic #2,  0  // InvokeDynamic #0:makeConcatWithConstants:(
                                                      Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
       7: areturn
}

The entire StringBuilder usage has been replaced with a single invokedynamic bytecode! The behavior here is similar to how native lambdas work on the JVM which was discussed in the last post.

At runtime, on the JVM, the JDK class StringConcatFactory is responsible for returning a block of code which can efficiently concatenate the arguments and constants together. This allows the implementation to change over time without the code having to be recompiled. It also means that the StringBuilder can be pre-sized more accurately since the argument's lengths can be queried.
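
You can poke at this machinery by hand on a Java 9 or newer JVM. The sketch below calls the StringConcatFactory bootstrap method directly for the same concatenation. It's an illustration rather than what javac emits: the JVM links a real invokedynamic call site once, whereas this made-up helper re-runs the bootstrap on every call.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Illustration only: manually bootstraps the concatenation for "A: " + a + " and B: " + b.
final class ConcatBootstrapDemo {
  static String thing(String a, String b) {
    try {
      CallSite site = StringConcatFactory.makeConcatWithConstants(
          MethodHandles.lookup(),
          "makeConcatWithConstants", // call site name
          MethodType.methodType(String.class, String.class, String.class),
          "A: \u0001 and B: \u0001"); // recipe: \u0001 marks where each argument goes
      MethodHandle concat = site.getTarget();
      return (String) concat.invokeExact(a, b);
    } catch (Throwable t) {
      // Wrapped so callers don't have to deal with checked exceptions.
      throw new IllegalStateException(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(thing("x", "y")); // A: x and B: y
  }
}
```

The recipe string is the interesting part. The constants live in the bootstrap arguments instead of the method body, which is why only the two aload instructions remain in the bytecode above.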

If you want to learn more about why this change was made, Aleksey Shipilëv gave a great presentation on the motivations, implementation, and resulting benchmarks of the change.

Since the Android APIs don't yet include anything from Java 9, there is no StringConcatFactory available at runtime. Thankfully, just like it did for LambdaMetafactory and lambdas, D8 is able to desugar StringConcatFactory for concatenations.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ $ANDROID_HOME/build-tools/28.0.2/dexdump -d classes.dex
[000144] Java9Concat.thing:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
0000: new-instance v0, Ljava/lang/StringBuilder;
0002: invoke-direct {v0}, Ljava/lang/StringBuilder;.<init>:()V
0005: const-string v1, "A: "
0007: invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
000a: invoke-virtual {v0, v2}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
000d: const-string v2, " and B: "
000f: invoke-virtual {v0, v2}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
0012: invoke-virtual {v0, v3}, Ljava/lang/StringBuilder;.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
0015: invoke-virtual {v0}, Ljava/lang/StringBuilder;.toString:()Ljava/lang/String;
0018: move-result-object v2
0019: return-object v2

This means that all of the language features of Java 9 can be used on all API levels of Android despite changes in the bytecode that the Java compiler emits.

But Java is now on a six-month release schedule making Java 9 actually two versions old. Can we keep it going with newer versions?

Java 10

The only language feature of Java 10 was called local-variable type inference. This allows you to omit the type of a local variable by replacing it with var when that type can be inferred.

import java.util.*;

class Java10 {
  List<String> localVariableTypeInferrence() {
    var url = new ArrayList<String>();
    return url;
  }
}

This is another feature implemented entirely in the Java compiler.

$ javac *.java

$ javap -c *.class
Compiled from "Java10.java"
class Java10 {
  java.util.List<java.lang.String> localVariableTypeInferrence();
    Code:
       0: new           #2  // class java/util/ArrayList
       3: dup
       4: invokespecial #3  // Method java/util/ArrayList."<init>":()V
       7: areturn
}

No new bytecodes or runtime APIs are required for this feature to work and so it can be used for Android just fine.

Of course, like the versions of Java before it, there are new APIs in this release such as Optional.orElseThrow, List.copyOf, and Collectors.toUnmodifiableList. Once added to the Android SDK in a future API level, these APIs can be trivially desugared to run on all API levels.

Java 11

Local-variable type inference was enhanced in Java 11 to support its use on lambda variables. You don't see types used in lambda parameters often so a lot of people don't even know this syntax exists. This is useful when you need to provide an explicit type to help type inference or when you want to use a type-annotation on the parameter.

import java.util.function.*;

@interface NonNull {}

class Java11 {
  void lambdaParameterTypeInferrence() {
    Function<String, String> func = (@NonNull var x) -> x;
  }
}

Just like Java 10's local-variable type inference this feature is implemented entirely in the Java compiler allowing it to work on Android.

New APIs in Java 11 include a bunch of new helpers on String, Predicate.not, and null factories for Reader, Writer, InputStream, and OutputStream. Nearly all of the API additions in this release could be trivially desugared once available.
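
To get a feel for why these are good desugaring candidates, consider Predicate.not. A backport needs nothing beyond what already exists on older API levels. The helper below is a hypothetical stand-in written for this post, not something D8 produces.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Illustration only: a hand-written equivalent of Java 11's Predicate.not.
final class PredicateBackport {
  private PredicateBackport() {}

  // Returns a predicate that yields the opposite of the target's result.
  static <T> Predicate<T> not(Predicate<? super T> target) {
    Objects.requireNonNull(target);
    return value -> !target.test(value);
  }

  public static void main(String[] args) {
    Predicate<String> notEmpty = PredicateBackport.<String>not(String::isEmpty);
    System.out.println(notEmpty.test("hey")); // true
    System.out.println(notEmpty.test(""));    // false
  }
}
```

A desugaring step would only need to redirect calls from the missing API to a helper of this shape.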

A major API addition to Java 11 is the new HTTP client, java.net.http. This client was previously available experimentally in the jdk.incubator.http package since Java 9. This is a very large API surface and implementation which leverages CompletableFuture extensively. It will be interesting to see whether or not this even lands in the Android SDK let alone is available via desugaring.

Nestmates

Like Java 9 and its string concatenation bytecode optimization, Java 11 took the opportunity to fix a long-standing disparity between Java's source code and its class files and the JVM: nested classes.

In Java 1.1, nested classes were added to the language but not the class specification or JVM. In order to work around the lack of support in class files, nesting classes in a source file instead creates sibling classes which use a naming convention to convey nesting.

class Outer {
  class Inner {}
}

Compiling this with Java 10 or earlier will produce two class files from a single source file.

$ java -version
java version "10" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10+46)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode)

$ javac *.java

$ ls
Outer.java  Outer.class  Outer$Inner.class

As far as the JVM is concerned, these classes have no relationship except that they exist in the same package.

This illusion mostly works. Where it starts to break down is when one of the classes needs to access something that is private in the other.

class Outer {
  private String name;

  class Inner {
    String sayHi() {
      return "Hi, " + name + "!";
    }
  }
}

When these classes are made siblings, Outer$Inner.sayHi() is unable to access Outer.name because it is private to another class.

In order to work around this problem and maintain the nesting illusion, the Java compiler adds a package-private synthetic accessor method for any member accessed across this boundary.

 class Outer {
   private String name;
+
+  String access$000() {
+    return name;
+  }

   class Inner {
     String sayHi() {
-      return "Hi, " + name + "!";
+      return "Hi, " + access$000() + "!";
     }

This is visible in the compiled class file for Outer.

$ javap -c -p Outer.class
class Outer {
  private java.lang.String name;

  static java.lang.String access$000(Outer);
    Code:
       0: aload_0
       1: getfield      #1  // Field name:Ljava/lang/String;
       4: areturn
}

Historically this has been at most a small annoyance on the JVM. For Android, though, these synthetic accessor methods contribute to the method count in our dex files, increase APK size, slow down class loading and verification, and degrade performance by turning a field lookup into a method call!

In Java 11, the class file format was updated to introduce the concept of nests to describe these nesting relationships.

$ java -version
java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)

$ javac *.java

$ javap -v -p *.class
class Outer {
  private java.lang.String name;
}
NestMembers:
  Outer$Inner

class Outer$Inner {
  final Outer this$0;

  Outer$Inner(Outer);
    Code: …

  java.lang.String sayHi();
    Code: …
}
NestHost: class Outer

The output here has been trimmed significantly. Two class files are still produced, but Outer no longer contains an access$000 method and the new NestMembers and NestHost attributes are present. These allow the VM to enforce a level of access control between package-private and private called nestmates. As a result, Inner can directly access Outer's name field.
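The nest relationship can also be observed at runtime through reflection. The following is a minimal sketch, assuming a Java 11 or newer JDK for both compiling and running; the NestDemo class and its describe method are illustrative names, not part of the original example.

```java
class NestDemo {
  private String name = "nest";

  class Inner {
    String sayHi() {
      // When compiled for Java 11 or newer, javac emits a direct read of the
      // private field here rather than a call to a synthetic accessor.
      return "Hi, " + name + "!";
    }
  }

  static String describe() {
    // getNestHost and isNestmateOf were added to java.lang.Class in Java 11.
    Class<?> host = Inner.class.getNestHost();
    boolean nestmates = NestDemo.class.isNestmateOf(Inner.class);
    return "host=" + host.getName() + ", nestmates=" + nestmates;
  }

  public static void main(String... args) {
    System.out.println(describe());
    System.out.println(new NestDemo().new Inner().sayHi());
  }
}
```

Both reflective calls read the same NestHost and NestMembers attributes that javap displays.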

ART does not understand the concept of nestmates, so this direct access needs to be desugared back into synthetic accessor methods.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class
Compilation failed with an internal error.
java.lang.UnsupportedOperationException
  at com.android.tools.r8.org.objectweb.asm.ClassVisitor.visitNestHostExperimental(ClassVisitor.java:158)
  at com.android.tools.r8.org.objectweb.asm.ClassReader.accept(ClassReader.java:541)
  at com.android.tools.r8.org.objectweb.asm.ClassReader.accept(ClassReader.java:391)
  at com.android.tools.r8.graph.JarClassFileReader.read(JarClassFileReader.java:107)
  at com.android.tools.r8.dex.ApplicationReader$ClassReader.lambda$readClassSources$1(ApplicationReader.java:231)
  at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
  at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
  at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
  at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
  at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

Unfortunately, at the time of writing, this does not work. The version of ASM, the library used to read Java class files, predates the final implementation of nestmates. Beyond that, though, D8 does not support desugaring of nestmates. You can star the D8 feature request on the Android issue tracker to convey your support for this feature.

Without support for desugaring nestmates it is currently impossible to use Java 11 for Android. Even if you avoid accessing things across the nested boundary, the mere presence of nesting will cause compilation to fail.

Without the APIs from Java 11 in the Android SDK, its single language feature of lambda parameter type inference isn't compelling. For now, Android developers are not missing anything by being stuck on Java 10. That is, until we start looking forward…

Java 12

With a release date of March 2019, Java 12 is quickly approaching. The language features and APIs of this release have been in development for a few months already. Through early-access builds, we can download and experiment with these today.

In the current EA build, number 20, there are two new language features available: expression switch and raw string literals.

class Java12 {
  static int letterCount(String s) {
    return switch (s) {
      case "one", "two" -> 3;
      case "three" -> 5;
      default -> s.length();
    };
  }

  public static void main(String... args) {
    System.out.println(`
 __        ______    ______   ______   ______    ______    ______    
/\ \      /\  ___\  /\__  _\ /\__  _\ /\  ___\  /\  == \  /\  ___\   
\ \ \____ \ \  __\  \/_/\ \/ \/_/\ \/ \ \  __\  \ \  __<  \ \___  \  
 \ \_____\ \ \_____\   \ \_\    \ \_\  \ \_____\ \ \_\ \_\ \/\_____\ 
  \/_____/  \/_____/    \/_/     \/_/   \/_____/  \/_/ /_/  \/_____/
`);
    System.out.println("three: " + letterCount("three"));
  }
}

Once again, both of these features are implemented entirely as part of the Java compiler without any new bytecodes or APIs.

$ java -version
openjdk version "12-ea" 2019-03-19
OpenJDK Runtime Environment (build 12-ea+20)
OpenJDK 64-Bit Server VM (build 12-ea+20, mixed mode, sharing)

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ ls
Java12.java  Java12.class  classes.dex

We can push this to a device to ensure that it actually works at runtime.

$ adb push classes.dex /sdcard
classes.dex: 1 file pushed. 0.6 MB/s (1792 bytes in 0.003s)

$ adb shell dalvikvm -cp /sdcard/classes.dex Java12

 __        ______    ______   ______   ______    ______    ______
/\ \      /\  ___\  /\__  _\ /\__  _\ /\  ___\  /\  == \  /\  ___\
\ \ \____ \ \  __\  \/_/\ \/ \/_/\ \/ \ \  __\  \ \  __<  \ \___  \
 \ \_____\ \ \_____\   \ \_\    \ \_\  \ \_____\ \ \_\ \_\ \/\_____\
  \/_____/  \/_____/    \/_/     \/_/   \/_____/  \/_/ /_/  \/_____/

three: 5

This works because the bytecode for expression switch is the same as the "regular" switch we would otherwise write with an uninitialized local, case blocks with break, and a separate return statement. And a multi-line string literal is just a string with newlines in it, something we've been able to do with escape characters forever.
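To make that equivalence concrete, here is a sketch of the same program written without either new feature. The class name Java12Desugared and the shortened banner text are stand-ins chosen for illustration; this is the hand-written form the paragraph describes, not output from any tool.

```java
class Java12Desugared {
  // The expression switch rewritten as a "regular" switch statement.
  static int letterCount(String s) {
    int result; // uninitialized local which every branch assigns
    switch (s) {
      case "one":
      case "two":
        result = 3;
        break;
      case "three":
        result = 5;
        break;
      default:
        result = s.length();
        break;
    }
    return result; // separate return statement
  }

  // A multi-line literal expressed with newline escape characters.
  static String banner() {
    return "\nfirst line\nsecond line\n";
  }

  public static void main(String... args) {
    System.out.println(banner());
    System.out.println("three: " + letterCount("three"));
  }
}
```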

As with all the other releases covered, there will be new APIs in Java 12 and it's the same story as before. They'll need to be added to the Android SDK and evaluated for desugaring capability.

Hopefully by the time Java 12 is actually released D8 will have implemented desugaring for Java 11's nestmates. Otherwise the pain of being stuck on Java 10 will go up quite a bit!

Java 8 language features are here and desugaring of its APIs is coming (star the issue!). As the larger Java ecosystem moves forward to newer versions, it's reassuring that every language feature between 8 and 12 is already available on Android.

With Java 9 work seemingly happening in AOSP (cross your fingers for Android P+1), hopefully we'll have a new batch of APIs in the summer as candidates for desugaring. Once that lands, the smaller releases of Java will hopefully yield faster integration into the Android SDK.

Despite this, the end advice remains the same as in the last post. It's vitally important to maintain pressure on Android for supporting the new APIs and VM features from newer versions of Java. Without APIs being integrated into the SDK they can't (easily) be made available for use via desugaring. Without VM features being integrated into ART, D8 bears a desugaring burden for all API levels instead of only providing backwards compatibility.

Before these posts move on to talk about R8, the optimizing version of D8, the next one will cover how D8 works around version-specific and vendor-specific bugs in the VM.

(This post was adapted from a part of my Digging into D8 and R8 talk that was never presented. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/androids-java-9-10-11-and-12-support

Android's Java 8 Support

Nov 20, 2018 Updated Nov 20, 2018

I've worked from home for a few years, and during that time I've heard people around the office complaining about Android's varying support for different versions of Java. Every year at Google I/O you could find me asking about it at the fireside chats or directly to the folks responsible. At conferences and other developer events it comes up in conversation or in talks with different degrees of accuracy. It's a complicated topic because what exactly we mean when talking about Android's Java support can be unclear. There's a lot to a single version of Java: the language features, the bytecode, the tools, the APIs, the JVM, and more.

When someone talks about Android's Java 8 support they usually are referring to the language features. So let's start there with a look at how Android's toolchain deals with the language features of Java 8.

Lambdas

The banner language feature of Java 8 was by far the addition of lambdas. This brought a more terse expression of code as data whereas previously more verbose constructs like anonymous classes would be used.

class Java8 {
  interface Logger {
    void log(String s);
  }

  public static void main(String... args) {
    sayHi(s -> System.out.println(s));
  }

  private static void sayHi(Logger logger) {
    logger.log("Hello!");
  }
}

After compiling this program with javac, running it through the legacy dx tool produces an error.

$ javac *.java

$ ls
Java8.java  Java8.class  Java8$Logger.class

$ $ANDROID_HOME/build-tools/28.0.2/dx --dex --output . *.class
Uncaught translation error: com.android.dx.cf.code.SimException:
  ERROR in Java8.main:([Ljava/lang/String;)V:
    invalid opcode ba - invokedynamic requires --min-sdk-version >= 26
    (currently 13)
1 error; aborting

This is because lambdas use a newer bytecode, invokedynamic, added in Java 7. As the error message indicates, Android's support for this bytecode requires a minimum API of 26 or newer–something practically unfathomable for applications at the time of writing. Instead, a process named desugaring is used which turns lambdas into representations compatible with all API levels developers are targeting.

Desugaring History

The history of the Android toolchain's desugaring capability is… colorful. The goal is always the same: allow newer language features to run on all devices.

Initially a third-party tool called Retrolambda had to be used. This worked by using the built-in mechanism which the JVM uses to turn lambdas into classes at runtime except happening at compile-time. The generated classes were very expensive in terms of method count, but work on the tool over time reduced the cost to something reasonable.

The Android tools team then announced a new compiler which would provide Java 8 language feature desugaring along with better performance. This was built on the Eclipse Java compiler but emitted Dalvik bytecode instead of Java bytecode. The Java 8 desugaring was extremely efficient, but otherwise adoption was low, performance was worse, and integration with other tooling was non-existent.

When the new compiler was (thankfully) abandoned, a Java bytecode to Java bytecode transformer which performed desugaring was integrated into the Android Gradle plugin from Bazel, Google's bespoke build system. The desugaring output remained efficient but performance still wasn't great. It was eventually made incremental, but work was happening concurrently to provide a better solution.

The D8 dexer was announced to replace the legacy dx tool with a promise of having desugar occur during dexing rather than a standalone Java bytecode transformation. The performance and accuracy of D8 compared to dx was a big win and it brought with it more efficient desugared bytecode. It was made the default dexer in Android Gradle plugin 3.1 and it then became responsible for desugaring in 3.2.

Using D8 to compile the above example to Dalvik bytecode succeeds.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

$ ls
Java8.java  Java8.class  Java8$Logger.class  classes.dex

To see how D8 desugared the lambda we can use the dexdump tool which is part of the Android SDK. The tool produces quite a lot of output so we'll only look at the relevant sections.

$ $ANDROID_HOME/build-tools/28.0.2/dexdump -d classes.dex
[0002d8] Java8.main:([Ljava/lang/String;)V
0000: sget-object v0, LJava8$1;.INSTANCE:LJava8$1;
0002: invoke-static {v0}, LJava8;.sayHi:(LJava8$Logger;)V
0005: return-void

[0002a8] Java8.sayHi:(LJava8$Logger;)V
0000: const-string v0, "Hello!"
0002: invoke-interface {v1, v0}, LJava8$Logger;.log:(Ljava/lang/String;)V
0005: return-void
…

If you haven't seen bytecode before (Dalvik or otherwise) don't worry–most of it can be picked up without a full understanding.

In the first block, our main method, bytecode index 0000 retrieves a reference from a static INSTANCE field on a class named Java8$1. Since the original source didn't contain a Java8$1 class, we can infer that it was generated as part of desugaring. The main method's bytecode also doesn't contain any traces of the lambda body so it likely has to do with this Java8$1 class. Index 0002 then calls the static sayHi method with the INSTANCE reference. The sayHi method requires a Java8$Logger argument so it would seem the Java8$1 class implements that interface. We can verify all of this in the output.

Class #2            -
  Class descriptor  : 'LJava8$1;'
  Access flags      : 0x1011 (PUBLIC FINAL SYNTHETIC)
  Superclass        : 'Ljava/lang/Object;'
  Interfaces        -
    #0              : 'LJava8$Logger;'

The presence of the SYNTHETIC flag means that the class was generated and the interfaces list includes Java8$Logger.

This class is now representing the lambda. If you look at its log method implementation, you might expect to find the missing lambda body.

…
[00026c] Java8$1.log:(Ljava/lang/String;)V
0000: invoke-static {v1}, LJava8;.lambda$main$0:(Ljava/lang/String;)V
0003: return-void
…

Instead, it invokes a static method on the original Java8 class named lambda$main$0. Again, the original source didn't contain this method but it's present in the bytecode.

…
    #1              : (in LJava8;)
      name          : 'lambda$main$0'
      type          : '(Ljava/lang/String;)V'
      access        : 0x1008 (STATIC SYNTHETIC)
[0002a0] Java8.lambda$main$0:(Ljava/lang/String;)V
0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0005: return-void

The SYNTHETIC flag again confirms that this method was generated. And its bytecode contains the body of the lambda: a call to System.out.println. The reason that the lambda body is kept inside the original class is that it might access private members that the generated class wouldn't have access to.

All of the puzzle pieces for understanding how desugaring works are here. Seeing it in Dalvik bytecode, though, can be a bit dense and intimidating.

Source Transformation

In order to better understand how desugaring works we can perform the transformation at the source code level. This is not how it actually works, but it's a useful exercise both for learning what happens and for reinforcing what we saw in the bytecode.

Once again, we start from the original program with a lambda.

class Java8 {
  interface Logger {
    void log(String s);
  }

  public static void main(String... args) {
    sayHi(s -> System.out.println(s));
  }

  private static void sayHi(Logger logger) {
    logger.log("Hello!");
  }
}

First, the lambda body is moved to a sibling, package-private method.

   public static void main(String... args) {
-    sayHi(s -> System.out.println(s));
+    sayHi(s -> lambda$main$0(s));
   }
+
+  static void lambda$main$0(String s) {
+    System.out.println(s);
+  }

Then, a class is generated which implements the target interface and whose method body calls the lambda method.

   public static void main(String... args) {
-    sayHi(s -> lambda$main$0(s));
+    sayHi(new Java8$1());
   }
@@
 }
+
+class Java8$1 implements Java8.Logger {
+  @Override public void log(String s) {
+    Java8.lambda$main$0(s);
+  }
+}

Finally, because the lambda doesn't capture any state, a singleton instance is created and stored in a static INSTANCE variable.

   public static void main(String... args) {
-    sayHi(new Java8$1());
+    sayHi(Java8$1.INSTANCE);
   }
@@
 class Java8$1 implements Java8.Logger {
+  static final Java8$1 INSTANCE = new Java8$1();
+
   @Override public void log(String s) {

This results in a fully desugared source file that can be used on all API levels.

class Java8 {
  interface Logger {
    void log(String s);
  }

  public static void main(String... args) {
    sayHi(Java8$1.INSTANCE);
  }

  static void lambda$main$0(String s) {
    System.out.println(s);
  }

  private static void sayHi(Logger logger) {
    logger.log("Hello!");
  }
}

class Java8$1 implements Java8.Logger {
  static final Java8$1 INSTANCE = new Java8$1();

  @Override public void log(String s) {
    Java8.lambda$main$0(s);
  }
}

If you actually look in the Dalvik bytecode for the generated lambda class it won't have a name like Java8$1. The real name will look something like -$$Lambda$Java8$QkyWJ8jlAksLjYziID4cZLvHwoY. The reason for the awkward naming and the advantages it brings are content for another post…

Native Lambdas

When we used the dx tool to attempt to compile lambda-containing Java bytecode to Dalvik bytecode its error message indicated that this would only work with a minimum API of 26 or newer.

$ $ANDROID_HOME/build-tools/28.0.2/dx --dex --output . *.class
Uncaught translation error: com.android.dx.cf.code.SimException:
  ERROR in Java8.main:([Ljava/lang/String;)V:
    invalid opcode ba - invokedynamic requires --min-sdk-version >= 26
    (currently 13)
1 error; aborting

Thus, if you re-run D8 and specify --min-api 26 it's reasonable to assume that "native" lambdas will be used and desugaring won't actually occur.

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --min-api 26 \
    --output . \
    *.class

But if you dump the .dex file, you'll still find the -$$Lambda$Java8$QkyWJ8jlAksLjYziID4cZLvHwoY class was generated. Maybe it's a D8 bug?

To learn why desugaring always occurs we need to look inside the Java bytecode of the Java8 class.

$ javap -v Java8.class
class Java8 {
  public static void main(java.lang.String...);
    Code:
       0: invokedynamic #2, 0   // InvokeDynamic #0:log:()LJava8$Logger;
       5: invokestatic  #3      // Method sayHi:(LJava8$Logger;)V
       8: return
}
…

The output has been trimmed for readability, but inside the main method you'll see the invokedynamic bytecode at index 0. Its operand, #2, points at a constant pool entry which javap expands in the trailing comment, and the #0 in that comment is the index of the associated bootstrap method. A bootstrap method is a bit of code that runs the first time that the bytecode is executed and it defines the behavior. The list of bootstrap methods is present at the bottom of the output.

…
BootstrapMethods:
  0: #27 invokestatic java/lang/invoke/LambdaMetafactory.metafactory:(
                        Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;
                        Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;
                        Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)
                        Ljava/lang/invoke/CallSite;
    Method arguments:
      #28 (Ljava/lang/String;)V
      #29 invokestatic Java8.lambda$main$0:(Ljava/lang/String;)V
      #28 (Ljava/lang/String;)V

In this case, the bootstrap method is called metafactory on the java.lang.invoke.LambdaMetafactory class. This class lives in the JDK and is responsible for creating anonymous classes on-the-fly at runtime for lambdas in a similar fashion to how D8 creates them at compile time.
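On a regular JVM the same bootstrap method can be called by hand, which shows what the invokedynamic instruction triggers on first execution. This is a rough sketch: MetafactoryDemo, lambdaBody, and the messages list are hypothetical names, with lambdaBody standing in for the compiler-generated lambda$main$0. It will not run on Android, where LambdaMetafactory is absent.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

class MetafactoryDemo {
  interface Logger {
    void log(String s);
  }

  static final List<String> messages = new ArrayList<>();

  // Stand-in for the synthetic lambda$main$0 method holding the lambda body.
  static void lambdaBody(String s) {
    messages.add(s);
  }

  static Logger createLogger() {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      // (Ljava/lang/String;)V, matching the bootstrap method arguments.
      MethodType logType = MethodType.methodType(void.class, String.class);
      MethodHandle body = lookup.findStatic(MetafactoryDemo.class, "lambdaBody", logType);
      CallSite site = LambdaMetafactory.metafactory(
          lookup,
          "log",                               // interface method name
          MethodType.methodType(Logger.class), // call site type: () -> Logger
          logType,                             // interface method signature
          body,                                // implementation to invoke
          logType);                            // signature enforced at invocation
      // The call site's target produces an instance of the runtime-generated class.
      return (Logger) site.getTarget().invokeExact();
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }

  public static void main(String... args) {
    createLogger().log("Hello!");
    System.out.println(messages);
  }
}
```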

If you look at the Android documentation for java.lang.invoke or the AOSP source code for java.lang.invoke, though, you'll notice this class isn't present in the Android runtime. This is why desugaring always happens at compile-time regardless of your minimum API level. The VM has the bytecode support for an equivalent to invokedynamic, but the JDK's built-in LambdaMetafactory is not available to use.

Method References

In addition to lambdas, method references were added to the language in Java 8. They're an efficient way to create a lambda whose body points to an existing method.

The logger example in this post has been using a lambda body whose contents call an existing method, System.out.println. We can substitute the explicit lambda for a method reference to save some code.

   public static void main(String... args) {
-    sayHi(s -> System.out.println(s));
+    sayHi(System.out::println);
   }

This compiles with javac and dexes with D8 the same as the lambda version with one notable difference. When dumping the Dalvik bytecode, the body of the generated lambda class has changed.

[000268] -$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM.log:(Ljava/lang/String;)V
0000: iget-object v0, v1, L-$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM;.f$0:Ljava/io/PrintStream;
0002: invoke-virtual {v0, v2}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0005: return-void

Instead of calling the generated Java8.lambda$main$0 method which contains the call to System.out.println, the log implementation now invokes System.out.println directly.

The lambda class is also no longer a static singleton. Bytecode index 0000 above is reading an instance field for a PrintStream reference. This reference is System.out which is resolved at the call-site in main and passed into the constructor (which is named <init> in bytecode).

[0002bc] Java8.main:([Ljava/lang/String;)V
0000: sget-object v1, Ljava/lang/System;.out:Ljava/io/PrintStream;
0003: new-instance v0, L-$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM;
0004: invoke-direct {v0, v1}, L-$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM;.<init>:(Ljava/io/PrintStream;)V
0008: invoke-static {v0}, LJava8;.sayHi:(LJava8$Logger;)V

Performing this at the source level again results in a straightforward transformation.

   public static void main(String... args) {
-    sayHi(System.out::println);
+    sayHi(new -$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM(System.out));
   }
@@
 }
+
+class -$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM implements Java8.Logger {
+  private final PrintStream ps;
+
+  -$$Lambda$1Osqr2Z9OSwjseX_0FMQJcCG_uM(PrintStream ps) {
+    this.ps = ps;
+  }
+
+  @Override public void log(String s) {
+    ps.println(s);
+  }
+}
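Capturing the receiver in the constructor has an observable consequence: a bound method reference evaluates its receiver once, when it is created, whereas a lambda re-reads it on every call. The sketch below illustrates this with a mutable static field. BoundReceiverDemo and its members are hypothetical names used only for this demonstration.

```java
class BoundReceiverDemo {
  interface Logger {
    void log(String s);
  }

  static StringBuilder target;

  static String run() {
    target = new StringBuilder("a");
    StringBuilder original = target;

    Logger reference = target::append;     // receiver captured here, once
    Logger lambda = s -> target.append(s); // field read on each invocation

    // Swap the field after both loggers have been created.
    target = new StringBuilder("b");

    reference.log("1"); // appends to the original builder
    lambda.log("2");    // appends to the replacement builder

    return original + "," + target;
  }

  public static void main(String... args) {
    System.out.println(run()); // prints "a1,b2"
  }
}
```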

Interface Methods

The other significant language feature of Java 8 was the ability to have static and default methods in interfaces. Static methods on interfaces allow providing instance factories or other helpers directly on the interface type on which they operate. Default methods allow you to compatibly add new methods to interfaces which have default implementations.

interface Logger {
  void log(String s);

  default void log(String tag, String s) {
    log(tag + ": " + s);
  }

  static Logger systemOut() {
    return System.out::println;
  }
}

Both of these new method types on interfaces are supported by D8's desugaring. Using the tools above it's possible to understand how these are desugared to work on all API levels. That investigation is left as an exercise for the reader.
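As a starting point for that investigation, here is one plausible source-level shape for the result. It is a sketch under assumptions rather than D8's exact output: the TaggedLogger, TaggedLoggerCompanion, and CollectingLogger names are hypothetical, and the collecting factory replaces systemOut so that the behavior can be checked.

```java
import java.util.ArrayList;
import java.util.List;

// The interface keeps only abstract methods.
interface TaggedLogger {
  void log(String s);

  void log(String tag, String s); // formerly a default method
}

// Hypothetical companion class which receives the moved method bodies.
final class TaggedLoggerCompanion {
  private TaggedLoggerCompanion() {}

  // Former default method body; the receiver becomes an explicit parameter.
  static void defaultLog(TaggedLogger self, String tag, String s) {
    self.log(tag + ": " + s);
  }

  // Former static interface method, now an ordinary static method.
  static TaggedLogger collecting(List<String> sink) {
    return new CollectingLogger(sink);
  }
}

final class CollectingLogger implements TaggedLogger {
  private final List<String> sink;

  CollectingLogger(List<String> sink) {
    this.sink = sink;
  }

  @Override public void log(String s) {
    sink.add(s);
  }

  // Forwarding method for an implementor which did not override the default.
  @Override public void log(String tag, String s) {
    TaggedLoggerCompanion.defaultLog(this, tag, s);
  }
}

class InterfaceDesugarDemo {
  public static void main(String... args) {
    List<String> sink = new ArrayList<>();
    TaggedLoggerCompanion.collecting(sink).log("tag", "message");
    System.out.println(sink);
  }
}
```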

It is worth noting, though, that both of these features are implemented natively in the Android VM as of API 24. As a result, unlike lambdas and method references, specifying --min-api 24 to D8 will result in them not having to be desugared.

Just Use Kotlin?

By this point, a large majority of readers will have thought of Kotlin in some capacity. Yes, Kotlin provides lambdas and method references for passing code as data. Yes, Kotlin provides default and static(-like) functions on interfaces. All of those features are actually implemented by kotlinc in exactly the same way that D8 desugars the Java 8 bytecode (modulo small implementation details).

Android's development toolchain and VM support of newer Java language features is still important even if you are writing 100% Kotlin code. New versions of Java bring more efficient constructs in both bytecode and in the VM that Kotlin can then take advantage of.

It's not unreasonable to think that Kotlin will stop supporting Java 6 and Java 7 bytecode at some point in the future. The IntelliJ platform has moved to Java 8 as of version 2016.1. Gradle 5.0 has moved to Java 8. The number of platforms running on older JVMs is dwindling. Without support for Java 8 bytecode and VM functionality, Android is in danger of becoming the largest ecosystem holding Kotlin's Java bytecode generation back. Thankfully D8 and ART are stepping up here to ensure that isn't the case.

Desugaring APIs

Thus far this post has focused on the language features and bytecode of newer Java versions. The other major benefit of new Java versions is the new APIs that come with them. Java 8 brought a ton of new APIs such as streams, Optional, functional interfaces, CompletableFuture, and a new date/time API.

Going back to the original logger example, we can use the new date/time API in order to know when messages were logged.

import java.time.*;

class Java8 {
  interface Logger {
    void log(LocalDateTime time, String s);
  }

  public static void main(String... args) {
    sayHi((time, s) -> System.out.println(time + " " + s));
  }

  private static void sayHi(Logger logger) {
    logger.log(LocalDateTime.now(), "Hello!");
  }
}

We can again compile this with javac and convert it to Dalvik bytecode with D8 which desugars it to run on all API levels.

$ javac *.java

$ java -jar d8.jar \
    --lib $ANDROID_HOME/platforms/android-28/android.jar \
    --release \
    --output . \
    *.class

You can actually push this onto a phone or emulator to verify it works, something we didn't do with the previous examples.

$ adb push classes.dex /sdcard
classes.dex: 1 file pushed. 0.5 MB/s (1620 bytes in 0.003s)

$ adb shell dalvikvm -cp /sdcard/classes.dex Java8
2018-11-19T21:38:23.761 Hello!

If your device runs API 26 or newer you will see a timestamp and the string "Hello!" as expected. But running it on a device with a version earlier than API 26 produces a very different result.

java.lang.NoClassDefFoundError: Failed resolution of: Ljava/time/LocalDateTime;
  at Java8.sayHi(Java8.java:13)
  at Java8.main(Java8.java:9)

D8 has desugared the new language feature of lambdas to work on all API levels but didn't do anything with the new API usage of LocalDateTime. This is disappointing because it means we only see some of the benefits of Java 8, not all of them.

Developers can choose to bundle their own Optional class or use a standalone version of the date/time library called ThreeTenBP to work around this. But if you can manually rewrite your code to use versions bundled in your APK, why can't desugar in D8 do it for you?

It turns out that D8 already does this but only for a single API: Throwable.addSuppressed. This API is what allows the try-with-resources language feature of Java 7 to work on all versions of Android despite the API only being available from API 19.
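To see why try-with-resources depends on that API, the sketch below compares the language feature with a simplified hand-written equivalent which calls addSuppressed itself. SuppressedDemo and its members are hypothetical names, and the manual version is a rough approximation of what the compiler generates rather than its exact expansion.

```java
class SuppressedDemo {
  static class Resource implements AutoCloseable {
    @Override public void close() {
      throw new IllegalStateException("close failed");
    }
  }

  static void fail() {
    throw new RuntimeException("body failed");
  }

  // The Java 7 language feature: the close failure is attached to the body failure.
  static Throwable viaTryWithResources() {
    try (Resource resource = new Resource()) {
      fail();
    } catch (RuntimeException e) {
      return e;
    }
    return null;
  }

  // Simplified manual equivalent showing where addSuppressed gets called.
  static Throwable viaManualExpansion() {
    Resource resource = new Resource();
    Throwable primary = null;
    try {
      fail();
    } catch (RuntimeException e) {
      primary = e;
    }
    try {
      resource.close();
    } catch (RuntimeException closeFailure) {
      if (primary == null) {
        primary = closeFailure;
      } else {
        primary.addSuppressed(closeFailure);
      }
    }
    return primary;
  }

  public static void main(String... args) {
    Throwable t = viaTryWithResources();
    System.out.println(t.getMessage() + " / " + t.getSuppressed()[0].getMessage());
  }
}
```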

All we need for the Java 8 APIs to work on all API levels then is a compatible implementation that we can bundle in the APK. It turns out the team that works on Bazel have again already built this. Their code that does the rewriting can't be used, but the standalone repackaging of these JDK APIs can be. All we need is for the D8 team to add support in their desugaring tool to do the rewriting. You can star the D8 feature request on the Android issue tracker to convey your support.

While the desugaring of language features has been available in various forms for some time, the lack of API desugaring remains a large gap in our ecosystem. Until the day that the majority of apps can specify a minimum API of 26, the lack of API desugaring in Android's toolchain is holding back the Java library ecosystem. Libraries which support both Android and the JVM cannot use the Java 8 APIs that were introduced nearly 5 years ago!

And despite Java 8 language feature desugaring now being part of D8, it's not enabled by default. Developers must explicitly opt in by setting their source and target compatibility to Java 8. Android library authors can help force this trend by building and publishing their libraries using Java 8 bytecode (even if you don't use the language features).

D8 is being actively worked on and so the future still looks bright for Java language and API support. Even if you're solely a Kotlin user, it's important to maintain pressure on Android for support of new versions of Java for the better bytecodes and new APIs. And in some cases, D8 is actually ahead of the game for versions of Java beyond 8 which we'll explore in the next post.

(This post was adapted from a part of my Digging into D8 and R8 talk that was never presented. Watch the video and look out for future blog posts for more content like this.)

https://jakewharton.com/androids-java-8-support

Increased accuracy of aapt2 "keep" rules

Aug 7, 2018 Updated Aug 7, 2018

The aapt2 tool packages your Android application resources into the format used at runtime. It also generates "keep" rules for ProGuard or R8 so that the types referenced inside of your resources do not get removed. Views referenced only in layout XML, action providers referenced only in menu XML, and broadcast receivers referenced only in the manifest XML are some examples of types that would otherwise be removed from the final APK were it not for these rules.

Prior to version 3.3.0-alpha05 of the Android Gradle plugin, aapt2 would generate "keep" rules for the constructors of these types using an argument wildcard. Some rules for an application class, activity class, and view reference look like this:

# Referenced at frontend/android/build/intermediates/merged_manifests/release/AndroidManifest.xml:20
-keep class com.jakewharton.sdksearch.SdkSearchApplication { <init>(...); }
# Referenced at frontend/android/build/intermediates/merged_manifests/release/AndroidManifest.xml:28
-keep class com.jakewharton.sdksearch.ui.MainActivity { <init>(...); }
# Referenced at search/ui-android/build/intermediates/packaged_res/release/layout/search.xml:57
-keep class android.support.v7.widget.RecyclerView { <init>(...); }

Dumping the methods of the release APK we get:

com.jakewharton.sdksearch.SdkSearchApplication <init>()
com.jakewharton.sdksearch.ui.MainActivity <init>()
android.support.v7.widget.RecyclerView <init>(Context)
android.support.v7.widget.RecyclerView <init>(Context, AttributeSet)
android.support.v7.widget.RecyclerView <init>(Context, AttributeSet, int)

SdkSearchApplication and MainActivity contain only a default constructor but RecyclerView contains three. As far as the reflective lookup is concerned, only one constructor will be used. For types in the manifest the default (no-argument) constructor is used. For types in a layout XML file the two-arg Context+AttributeSet constructor is invoked by LayoutInflater. By generating rules with <init>(...) we are forcing every constructor to be retained despite only needing one.
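The reason a single constructor suffices is that a reflective lookup names the exact parameter types it wants. The following sketch mimics that lookup with plain Java stand-ins. The Context, AttributeSet, and FakeView classes here are hypothetical placeholders and not the Android types, and inflate is only an approximation of what an inflater does.

```java
import java.lang.reflect.Constructor;

class ReflectiveLookupDemo {
  // Placeholder types standing in for their Android namesakes.
  static class Context {}
  static class AttributeSet {}

  static class FakeView {
    final int argCount;

    public FakeView(Context context) {
      argCount = 1;
    }

    public FakeView(Context context, AttributeSet attrs) {
      argCount = 2;
    }

    public FakeView(Context context, AttributeSet attrs, int defStyleAttr) {
      argCount = 3;
    }
  }

  // Looks up one specific constructor by its parameter types and invokes it.
  static int inflate() {
    try {
      Constructor<FakeView> constructor =
          FakeView.class.getConstructor(Context.class, AttributeSet.class);
      FakeView view = constructor.newInstance(new Context(), new AttributeSet());
      return view.argCount;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String... args) {
    System.out.println("Constructor arguments used: " + inflate());
  }
}
```

Only the two-argument constructor is ever resolved here, so the other two could be removed without affecting the lookup.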

Starting with version 3.3.0-alpha05 of the Android Gradle plugin, a new version of aapt2 is used which generates more precise rules that reference only the exact constructor which the reflective lookup will use.

# Referenced at frontend/android/build/intermediates/merged_manifests/release/AndroidManifest.xml:20
-keep class com.jakewharton.sdksearch.SdkSearchApplication { <init>(); }
# Referenced at frontend/android/build/intermediates/merged_manifests/release/AndroidManifest.xml:28
-keep class com.jakewharton.sdksearch.ui.MainActivity { <init>(); }
# Referenced at search/ui-android/build/intermediates/packaged_res/release/layout/search.xml:57
-keep class android.support.v7.widget.RecyclerView { <init>(android.content.Context, android.util.AttributeSet); }

Dumping the methods of the release APK again now shows:

com.jakewharton.sdksearch.SdkSearchApplication <init>()
com.jakewharton.sdksearch.ui.MainActivity <init>()
android.support.v7.widget.RecyclerView <init>(Context, AttributeSet)
android.support.v7.widget.RecyclerView <init>(Context, AttributeSet, int)

The <init>(Context) of RecyclerView is no longer present! That constructor used to be forced into the release APK despite never actually being used. The three-argument constructor is still kept because the two-argument one delegates to it:

public RecyclerView(@NonNull Context context, @Nullable AttributeSet attrs) {
    this(context, attrs, 0);
}

If optimization is also enabled and there are no other uses of that three-argument constructor it may get inlined–something that couldn't have happened with the old rules.

This seems like a small change, and it mostly is. Application, activity, and action provider subtypes tend to only have the one constructor so their counts are unlikely to change. View subtypes, however, very frequently have three or four constructors and you will likely now see two or three of those being removed. In the scope of an entire APK that allows on the order of tens or hundreds of methods to be removed which were needlessly being kept. As the specificity of "keep" rules increases it not only reduces the raw number of methods that wind up in the final APK, but often allows optimization passes to have a greater effect.

If you find any bugs with the new rules, please report them on the Android issue tracker.

https://jakewharton.com/increased-accuracy-of-aapt2-keep-rules

Tracing Gradle task execution

Aug 1, 2018 Updated Aug 1, 2018

Gradle provides two built-in mechanisms for tracing your build: --profile and --scan. The former produces a simple HTML report of task execution times. You can get a rough idea of where time was spent but are unlikely to glean any real insights. The latter sends a detailed report to Gradle's servers (or to a Gradle Enterprise installation) with much more granular information. Task details are rendered on a concurrent timeline corresponding to their execution. For CI builds, I tend to want something more granular than --profile but I don't like the idea of sending details of every build to Gradle with --scan. It seems entirely needless considering their plugin has all of that information locally but chooses to render it remotely.

The Gradle profiler project started a few years ago as a way to deterministically measure build speeds. By creating scenarios such as an ABI-breaking change, ABI-compatible change, Android resource change, etc., the tool can run these scenarios multiple times to first warm up the JVM and then to produce an accurate picture of what gets executed. It offers integrations and outputs for use with popular JVM-based performance analysis tools such as YourKit and Java Flight Recorder.

For CI builds, executing through the Gradle profiler would be an annoying abstraction to use. We can instead use it for inspiration and run its integrations on individual builds.

Java Flight Recorder can be used on individual Gradle builds with the jcmd binary in the JDK and with flags to java specified on the org.gradle.jvmargs in your gradle.properties. There are even Gradle plugins which offer to start and stop the recording automatically. We can then open the resulting .jfr file in Java Mission Control or use a command-line tool to convert it into a flamegraph.

The flamegraph can show where time is being spent inside of tasks over the course of the build. The stacks aren't correlated to a task, though, so it's important to remember that you're looking at the larger picture. This also doesn't handle tasks which communicate with their own daemons such as the Kotlin compiler.

While this produces a pretty output, its utility is small and the Gradle plugin integration is not the most stable. I would refrain from using this on CI as a result unless you're going to build out a strong integration with jcmd directly. These visualizations work well when you have a small subset of tasks to run rather than when your entire project is being built.

The Gradle profiler also includes support for Chrome traces. This output will be familiar to Android users who have used the systrace tool. Again we can integrate this into our builds without jumping through the Gradle profiler.

The code for producing a Chrome trace lives inside the Gradle profiler repository. Clone and build the project which will produce a jar at subprojects/chrome-trace/build/libs/chrome-trace.jar. Copy this jar into the gradle/ directory of your project. This jar contains a plugin which can be applied inside a Gradle initialization script.

// init.gradle

initscript {
  dependencies {
    classpath files('gradle/chrome-trace.jar')
  }
}

rootProject {
  def date = new java.text.SimpleDateFormat("yyyy-MM-dd-HH-mm-ss").format(new Date())
  ext.chromeTraceFile = new File(rootProject.buildDir, "reports/trace/trace-${date}.html")
}

apply plugin: org.gradle.trace.GradleTracingPlugin

When invoking Gradle we need to reference this script and also pass a flag to enable the tracing.

$ ./gradlew --init-script init.gradle -Dtrace build

This will produce a trace file at build/reports/trace/trace-(date).html which you can open in Chrome and navigate using the arrow keys and A-S-D-W keys.

The trace gives a picture of concurrent task execution and timings therein. There is very little here that isn't in the --profile report, but it's presented in a manner that gives you more context. The most notable and welcome addition is that of CPU load, heap size, and GC events.

Unfortunately, the granularity per-task is near zero. There are no insights into workers that operate as part of a task. We cannot get flame graphs of the call stacks inside of a task.

I have added this to SDK Search's CI builds in addition to the other reports it already generates if you'd like to see a full integration: https://github.com/JakeWharton/SdkSearch/commit/3cc9bd8bc9741cf8459bf975a186e0c36e5481d8.

Neither is perfect but both can be useful in different situations. Hopefully in the future visibility into workers will be added to the Chrome trace. Figuring out how to merge the Java Flight Recorder data into the Chrome trace would also be an amazing addition. For now, having the Chrome trace run on CI gives a good picture of how the build is performing and then Java Flight Recorder can be used either manually or with the Gradle profiler to dig into individual task performance.

Here are the four tracing outputs of a single build:

--profile report
Chrome trace
JFR flamegraph
--scan report

https://jakewharton.com/tracing-gradle-task-execution

Introducing Android KTX: Even Sweeter Kotlin Development for Android

Feb 5, 2018 Updated Feb 5, 2018

This post was published externally on Android Developers Blog. Read it at https://android-developers.googleblog.com/2018/02/introducing-android-ktx-even-sweeter.html.

https://jakewharton.com/introducing-android-ktx

Surfacing Hidden Change to Pull Requests

Jul 13, 2017 Updated Jul 13, 2017

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/surfacing-hidden-change-to-pull-requests.

https://jakewharton.com/surfacing-hidden-change-to-pull-requests

Generating Kotlin code with KotlinPoet

May 16, 2017 Updated May 16, 2017

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/generating-kotlin-code-with-kotlinpoet/.

https://jakewharton.com/generating-kotlin-code-with-kotlinpoet

An Optional's place in Kotlin

May 14, 2017 Updated May 14, 2017

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/an-optionals-place-in-kotlin.

https://jakewharton.com/an-optionals-place-in-kotlin

Square Open Source ♥s Kotlin

May 12, 2017 Updated May 12, 2017

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/square-open-source-loves-kotlin.

https://jakewharton.com/square-open-source-loves-kotlin

Web Sockets now shipping in OkHttp 3.5!

Dec 2, 2016 Updated Dec 2, 2016

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/web-sockets-now-shipping-in-okhttp-3-5.

https://jakewharton.com/web-sockets-now-shipping-in-okhttp

Forcing bytes downward in Okio

Sep 6, 2016 Updated Sep 6, 2016

Okio's BufferedSink is a high-level abstraction for writing binary and character data as bytes. Its design stems from frustrations with the JDK's java.io.* and java.nio.* libraries. At Droidcon Montreal last year I gave a presentation comparing it with the former, but also showcased Okio's concept of a segment and how it enables the library to cheaply move bytes. If you aren't familiar with Okio I encourage you to go watch the presentation first since the rest of this post will assume at least cursory knowledge of its types.

If you never look at the source code of Okio you won't know that this segment concept exists. It's an implementation detail for performance that remains completely opaque to the consumer of the library. That is, except for one* notable exception: the emitCompleteSegments() method on BufferedSink.

This method is part of a family of three methods which force buffered bytes to be moved to the underlying Sink. Their difference is subtle, but understanding that difference ensures correctness and can make or break throughput.

First let's understand the difference in behavior and then look at some use cases for each.

flush()

Flush is a common concept in stream APIs and its semantics remain unchanged in Okio. Calls to this method cause all buffered bytes to be moved to the underlying Sink and then that Sink is also instructed to flush itself. When calls to flush() return, you are guaranteed that all bytes have been sent all the way to the destination Sink.

When multiple levels of buffering are in use, a call to flush() will clear the buffers at every level. In Okio multiple levels of buffering are so cheap that they're practically free. A flush just amounts to each level moving its segments down to the next level of the chain.

With java.io.* streams, however, multiple levels of buffering require each level to allocate and manage its own byte[]. This means that a flush operation will result in each level doing an arraycopy() of its data down to the next level (which also might have required buffer expansion).
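
The java.io.* behavior can be observed with nothing but the JDK. The sketch below stacks two BufferedOutputStream instances (the 16-byte buffer sizes and the LayeredFlushDemo name are arbitrary choices for illustration) and counts how many bytes have reached the destination before and after flush().

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LayeredFlushDemo {
  // Writes three bytes through two stacked buffers and reports how many bytes
  // reached the destination before and after flush().
  static int[] bytesAtDestinationBeforeAndAfterFlush() {
    ByteArrayOutputStream destination = new ByteArrayOutputStream();
    BufferedOutputStream inner = new BufferedOutputStream(destination, 16);
    BufferedOutputStream outer = new BufferedOutputStream(inner, 16);
    try {
      outer.write(new byte[] {1, 2, 3});
      int before = destination.size(); // the bytes are still in outer's own byte[]
      outer.flush(); // outer copies into inner's byte[], then inner writes to destination
      int after = destination.size();
      return new int[] {before, after};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = bytesAtDestinationBeforeAndAfterFlush();
    System.out.println("before flush: " + counts[0] + ", after flush: " + counts[1]);
  }
}
```

Each level owns a separate array, so the three bytes are copied once per level on their way down.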

Calls to close() on stream types typically behave similarly to flush() in that they write all buffered bytes to the underlying stream before also instructing it to close.

emit()

Emitting bytes is very similar to flushing except that it is not a recursive operation. Calls to this method cause all buffered bytes to be moved to the underlying Sink. Unlike flush(), however, that Sink is not told to do any other operations.

Because buffering is so inexpensive with Okio, it's not uncommon to accept a Sink for API flexibility and immediately wrap it in a BufferedSink for the implementation's convenience. It's important to not leave any buffered bytes unwritten to the original Sink which is what calls to emit() will ensure.

emit() is a nice alternative to flush() since it allows you to use the more useful BufferedSink type without the concern of needlessly causing bytes to be sent all the way down the chain every time you're finished with the abstraction.

emitCompleteSegments()

If you understand the behavior of emit() and the concept of segments then the behavior of this method should be straightforward. Calls to this method cause only the bytes which are part of complete segments to be moved to the underlying Sink. If you haven't buffered enough bytes to create a complete segment this method will actually do nothing!

Remember, segments are an implementation detail of Okio and as such so are their sizes. So why does the public API expose their concept in this method?

This method exists to ensure that your code is not buffering too many bytes. When sending large amounts of data over a long period of time through a BufferedSink, it can be beneficial to occasionally write parts of the data to the underlying Sink. This ensures that the destination isn't overwhelmed by a single gigantic write. Instead, the Sink can incrementally process bytes as they're available and optionally send back signals to the producer (either with an exception or out-of-band notifications like HTTP/2's flow control).
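
The three behaviors can be compared side by side with a toy model. Everything below is hypothetical: ToySink, RecordingSink, and ToyBufferedSink are not Okio types, the 4-byte segment size is chosen only to keep the numbers small, and the toy moves bytes only when asked, making no claim about when a real implementation moves them on its own.

```java
import java.util.Arrays;

final class ToySegmentDemo {
  public static void main(String[] args) {
    RecordingSink destination = new RecordingSink();
    ToyBufferedSink sink = new ToyBufferedSink(destination);

    sink.write(new byte[10]);    // two complete 4-byte segments plus 2 loose bytes
    sink.emitCompleteSegments(); // moves 8 bytes, keeps 2 buffered
    sink.emit();                 // moves the remaining 2, downstream is not flushed
    sink.flush();                // nothing left to move, but downstream is flushed
    System.out.println(destination.bytesReceived + " bytes, " + destination.flushCount + " flush");
  }
}

// Hypothetical minimal sink; not Okio's interface.
interface ToySink {
  void write(byte[] bytes);
  void flush();
}

// Records what it receives so the difference between the methods is observable.
final class RecordingSink implements ToySink {
  int bytesReceived;
  int flushCount;

  @Override public void write(byte[] bytes) { bytesReceived += bytes.length; }
  @Override public void flush() { flushCount++; }
}

final class ToyBufferedSink {
  static final int SEGMENT_SIZE = 4; // assumption for illustration only

  private final ToySink delegate;
  private byte[] buffered = new byte[0];

  ToyBufferedSink(ToySink delegate) { this.delegate = delegate; }

  void write(byte[] bytes) {
    byte[] joined = Arrays.copyOf(buffered, buffered.length + bytes.length);
    System.arraycopy(bytes, 0, joined, buffered.length, bytes.length);
    buffered = joined;
  }

  // Moves only whole segments; a trailing partial segment stays buffered.
  void emitCompleteSegments() {
    moveToDelegate(buffered.length - (buffered.length % SEGMENT_SIZE));
  }

  // Moves everything to the delegate but does not ask it to flush.
  void emit() {
    moveToDelegate(buffered.length);
  }

  // Moves everything to the delegate and then flushes the delegate too.
  void flush() {
    emit();
    delegate.flush();
  }

  private void moveToDelegate(int count) {
    if (count == 0) return;
    delegate.write(Arrays.copyOfRange(buffered, 0, count));
    buffered = Arrays.copyOfRange(buffered, count, buffered.length);
  }
}
```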

Now that we know the difference in behavior of these three methods, let's look at some use cases of when you would want to use each.

Writing messages to a WebSocket

WebSockets are long-lived connections to a server over which string or binary messages are constantly streamed in both directions. The frequency of these messages can be extremely rapid or quite sparse. An API for sending messages on a WebSocket class would look something like this:

public void sendMessage(String content);
public void sendMessage(byte[] content);

This WebSocket type would wrap a socket stream with a BufferedSink for sending messages. Because we don't know when the next message will come from the application, the implementation of sendMessage needs to call flush() before returning to optimize for latency. This ensures all of the data from each message will be sent down through the socket.

If you were to use emitCompleteSegments() part of the message would almost always be left in the buffer. Using emit() would only work if there's no intermediate buffering which is hard to guarantee. This is why flush() is the only appropriate operation for this example.

Encoding a video to a file

Video encoding is a CPU-bound, memory-intensive process which generates data at a fairly consistent rate. Writing this data to a file as it's being encoded keeps the buffer size small and ensures that the slower disk drive can keep up.

At regular intervals the implementation writing encoded data to a BufferedSink should call emitCompleteSegments() to allow large portions of the buffer to get moved to the underlying Sink and start trickling down the chain. The reason that emitCompleteSegments() is preferred here over emit() is that more data will be coming into the buffer. Sending a partially-completed segment would be wasteful since it has empty bytes that can still be used for the incoming data.

It's important to note again that emitCompleteSegments() only writes to the underlying Sink and not all the way down the chain (i.e., it's not flushCompleteSegments()). This means that if there is an intermediate buffer which isn't monitoring its size and occasionally calling this method you will end up buffering the whole video.

When the video is done encoding and no more bytes will be written a call to emit() (with the same caveats as the last paragraph) or flush() should happen so that the final bytes are not left in the buffer.

Serializing an object to JSON

If we wanted to create a library that took an object and serialized it to JSON we would probably give it a method signature like this:

public void toJson(Object o, Sink sink)

Because Sink offers no convenience on its own, the implementation would buffer it into a BufferedSink for access to high-level APIs.

Once the implementation was finished writing the JSON representation to the BufferedSink it needs to return. In order to not leave any bytes in the buffer it needs to call emit(). This writes all of the buffered bytes to the underlying Sink, but not any further. Whether or not the Sink should be flushed all the way down is left as a decision for the caller. This affords more control so that if the caller wants to serialize multiple objects and flush them all at once they are able to do so.
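
A rough sketch of that division of responsibility follows. JsonSink, CollectingSink, and ToyJson are hypothetical names, a StringBuilder stands in for the wrapping BufferedSink, and the "serializer" only quotes a value's string form; it is not how Moshi or any real library works.

```java
final class ToyJson {
  // Serializes a value, then hands every buffered character to the caller's sink.
  // It deliberately does not flush; that decision stays with the caller.
  static void toJson(Object value, JsonSink sink) {
    StringBuilder buffer = new StringBuilder(); // stand-in for the internal buffer
    String text = String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"");
    buffer.append('"').append(text).append('"');
    sink.write(buffer.toString()); // the emit() step: move bytes down one level only
  }

  public static void main(String[] args) {
    CollectingSink sink = new CollectingSink();
    toJson("first", sink);
    toJson("second", sink);
    sink.flush(); // the caller flushes once after serializing both values
    System.out.println(sink.received + " (flushes: " + sink.flushCount + ")");
  }
}

// Hypothetical minimal sink; not Okio's interface.
interface JsonSink {
  void write(String text);
  void flush();
}

// Collects what it receives and counts flushes so the behavior is observable.
final class CollectingSink implements JsonSink {
  final StringBuilder received = new StringBuilder();
  int flushCount;

  @Override public void write(String text) { received.append(text); }
  @Override public void flush() { flushCount++; }
}
```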

Astute readers might be aware of Moshi–a small JSON serialization library built with Okio. Unlike our hypothetical toJson method above, Moshi's method actually requires a BufferedSink directly. The reason for this is completely unrelated to flushing or emitting data, but rather for symmetry with the fromJson method which requires a BufferedSource.

So flush(), emit(), and emitCompleteSegments() each instruct a BufferedSink to move data to the underlying Sink in slightly different ways. Understanding that difference ensures that your bytes do not get lost inside of intermediate buffers but also that you move the minimal amount of bytes for your needs.

* Technically there are two exceptions. Buffer has a completeSegmentByteCount() method which returns the number of bytes that would be moved by a call to emitCompleteSegments().

https://jakewharton.com/forcing-bytes-downward-in-okio

Just Say mNo to Hungarian Notation

Jan 21, 2016 Updated Jan 21, 2016

Every day new Java code is written for Android apps and libraries which is plagued with an infectious disease: Hungarian notation.

The proliferation of Hungarian notation on Android is an accident and its continued justification erroneous. Let's dispel its common means of advocacy:

"The Android Java style guide recommends its use"

There is no such thing as an Android Java style guide that provides any guidance on how you should write Java code. Most people referencing this non-existent style guide are referring to the style guide for contributions to the Android Open Source Project (AOSP).

You are not writing code for AOSP so you do not need to follow their style guide.

If you're working on code that might someday live in AOSP you don't even need to follow this style guide. Almost all of the Java libraries imported by AOSP do not follow it, and even some of the ones developed inside of AOSP don't either.

"The Android samples use it"

These samples started life in the platform inside of AOSP so they adhere to the AOSP style. For those which did not come from AOSP, the author either incorrectly believes the other points of advocacy in this post or simply forgot to correct their style when writing the sample.

"The extra information helps in code review"

The 'm' or 's' prefix on a name indicates a private/package instance field or private/package static field, respectively, where this would otherwise not be known in code review. This assumes the field isn't visible in the change, since then its visibility would obviously be known regardless.

Before I attempt to refute this, let's define Hungarian notation. According to Wikipedia, there are two types of Hungarian notations:
- System notation encoded the data type of the variable in its name. A user ID that was a long represented in Java would name a variable lUserId to indicate both usage and type information.
- Apps notation encoded the semantic use of the variable rather than its logical use or purpose. A variable for storing private information had a prefix (like mUserId) whereas a variable for storing public information had another prefix, or none whatsoever.

So when you see the usage of a field, which piece of information is more important for the review: the visibility of that field or the type of that field?

The visibility is a useless attribute to care about in a code review. The field is already present and available for use, and presumably its visibility was code-reviewed in a previous change. The type of a field, however, has a direct impact on how that field can be used in the change. The correct methods to call, the position in arguments, and the methods which can be called all are directly related to its type.

Not only is advocating for 'apps' Hungarian wrong because it's not useful, but it's doubly wrong since 'system' Hungarian would provide more relevant info. That's not to say you should use 'system': both the type and visibility of a field change over time and you will forget to update the name. It's not hard to find static mContext fields, after all.
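
A small sketch of how a prefix drifts away from the declaration it describes. SessionHolder and PrefixDrift are hypothetical; the check reads the real modifiers through reflection and compares them with what the name claims.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class PrefixDrift {
  // Returns true when a field's 'm' or 's' prefix disagrees with its real modifiers.
  static boolean prefixDisagrees(Class<?> type, String fieldName) {
    Field field;
    try {
      field = type.getDeclaredField(fieldName);
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("No field " + fieldName + " on " + type, e);
    }
    boolean isStatic = Modifier.isStatic(field.getModifiers());
    boolean claimsInstance = hasPrefix(fieldName, 'm');
    boolean claimsStatic = hasPrefix(fieldName, 's');
    return (claimsInstance && isStatic) || (claimsStatic && !isStatic);
  }

  private static boolean hasPrefix(String name, char prefix) {
    return name.length() > 1 && name.charAt(0) == prefix && Character.isUpperCase(name.charAt(1));
  }

  public static void main(String[] args) {
    System.out.println("mToken disagrees: " + prefixDisagrees(SessionHolder.class, "mToken"));
    System.out.println("userId disagrees: " + prefixDisagrees(SessionHolder.class, "userId"));
  }
}

// Hypothetical class whose field began as an instance field and was later made
// static. The prefix was never updated, so the name now misstates membership.
final class SessionHolder {
  static String mToken = "abc"; // the name says "member", the modifier says "static"
  private long userId = 42L;    // no prefix; the declaration is the source of truth

  long userId() {
    return userId;
  }
}
```

The compiler and IDE always read the modifiers, never the name, so only the human reader is misled when the two disagree.
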
"The extra information helps in development"

Android Studio and IntelliJ IDEA visually distinguish field names based on membership (instance or static).

IDEs will enforce correct membership, visibility, and types by default so a naming convention isn't going to add anything here. A popup showing all three properties (and more) of a field is also just a keypress away.

"I want to write Java code like Google does"

While Android and AOSP are part of the company, Google explicitly and actively forbids Hungarian notation in their Java style guide. This public Java style guideline is the formalization of long-standing internal conventions.

Android had originated outside of Google and the team early on chose to host the Hungarian disease. Changing it at this point would be needless churn and cause many conflicts across branches and third-party partners.

With your continued support and activism on this topic, this disease can be eradicated in our lifetime.

mFriends don't let sFriends use Hungarian notation!

https://jakewharton.com/just-say-no-to-hungarian-notation

Java Interoperability Policy for Major Version Updates

Dec 11, 2015 Updated Dec 11, 2015

Major version updates to libraries solve the API warts of old and bring shiny new APIs to address previous shortcomings—often in a breaking fashion. Updating an Android or Java app is usually a day or two affair before you reap the benefits. Problems arise, however, when other libraries you depend on have transitive dependencies on older versions of the updated library.

Retrofit 2.0 is nearing release and it comes with three years of knowledge gained since its version 1.0—some of which is in backwards-incompatible API changes. We are fortunate to say that Retrofit has become a popular library, but it presents a real problem in that other libraries have been published which rely on its 1.x API. While a sudden breaking change doesn't present an immediate problem for them, consumers of those libraries wanting to upgrade their apps to the new API face a difficult choice.

This problem is not new, and I won't waste time rehashing all its nuances. After some discussion with Jesse Wilson, we have decided on a course of action for the libraries we manage going forward in order to mitigate this pain. The following does not assume strict semantic versioning, but general adherence to its idea of major version bumps.

For major version updates in significantly foundational libraries we will take the following steps:

Rename the Java package to include the version number.

This immediately solves the API compatibility problem from transitive dependencies on multiple versions. Classes from each can be loaded on the same classpath without interacting negatively.

Users can perform major version updates gradually or in increments rather than requiring an immediate switch. If possible, shims for older versions can be built on newer versions in a sibling artifact.

For example, versions 0.x and 1.x would be under com.example.retrofit, versions 2.x would be under com.example.retrofit2, and so on.

(Libraries with a major version of 0 or 1 can skip this, and only start with major version 2 and above.)

Include the library name as part of the group ID in the Maven coordinates.

Even for projects that have only a single artifact, including the project name in the group ID allows future updates that may introduce additional artifacts to not pollute the root namespace. In projects that have multiple artifacts from inception, it provides a means of grouping them together on artifact hosts like Maven Central.

For example, the Maven coordinates for the main Retrofit artifact could be com.example.retrofit:retrofit. Additional modules (present or future) can be listed under the same group ID such as com.example.retrofit:converter-moshi.

Rename the group ID in the Maven coordinates to include the version number.

Individual group IDs prevent dependency resolution semantics from upgrading older versions to newer, incompatible ones. Each major version is resolved independently allowing transitive dependencies to be upgraded compatibly.

For example, take a project given Library A with a dependency on 1.2.0, Library B with a dependency on 1.3.0, Library C with a dependency on 2.1.0, and a direct dependency on 2.4.0. A dependency resolver would first choose 1.3.0 for which Library A and Library B are compatible using the 1.x group ID. The resolver would then choose 2.4.0 for which Library C is compatible using the 2.x group ID.
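
The resolution in that example can be sketched in a few lines. This is a toy, not a real resolver: ToyResolver is a hypothetical name, it assumes a "highest requested version wins" strategy within each group ID (some build tools use other strategies), it handles only purely numeric dotted versions, and it reuses the com.example group IDs from above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ToyResolver {
  // Each request is a {groupId, version} pair. Versions only compete with other
  // versions that share a group ID, so 1.x and 2.x never replace one another.
  static Map<String, String> resolve(List<String[]> requests) {
    Map<String, String> chosen = new LinkedHashMap<>();
    for (String[] request : requests) {
      chosen.merge(request[0], request[1], (a, b) -> compareVersions(a, b) >= 0 ? a : b);
    }
    return chosen;
  }

  // Compares dotted numeric versions part by part, treating missing parts as 0.
  static int compareVersions(String left, String right) {
    String[] l = left.split("\\.");
    String[] r = right.split("\\.");
    for (int i = 0; i < Math.max(l.length, r.length); i++) {
      int a = i < l.length ? Integer.parseInt(l[i]) : 0;
      int b = i < r.length ? Integer.parseInt(r[i]) : 0;
      if (a != b) return Integer.compare(a, b);
    }
    return 0;
  }

  public static void main(String[] args) {
    Map<String, String> chosen = resolve(Arrays.asList(
        new String[] {"com.example.retrofit", "1.2.0"},   // Library A
        new String[] {"com.example.retrofit", "1.3.0"},   // Library B
        new String[] {"com.example.retrofit2", "2.1.0"},  // Library C
        new String[] {"com.example.retrofit2", "2.4.0"})); // direct dependency
    System.out.println(chosen);
  }
}
```

Had all four requests shared one group ID, the same strategy would have picked 2.4.0 for everyone, handing Library A and Library B an API they were not compiled against.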

Group ID renaming is chosen over the artifact ID for a few reasons:
- The filename of built artifacts is the combination of the artifact ID and the version. If the artifact ID contained the major version it would appear redundant (e.g., retrofit2-2.1.0).
- Projects can be comprised of multiple artifacts and not all of them contain the raw name of the project. Properly describing the contents of the artifact is more important than including versioning information.
- Maven-based builds reference dependencies on sibling modules in the same project using their artifact ID but can use variables for the group ID and version. If the artifact ID were to change, a lot of error-prone pom.xml changes would be required instead of one group ID change.

(Libraries with a major version of 0 or 1 can skip this, and only start with major version 2 and above.)

None of these steps is a new idea on its own. The growing usage of the libraries on which we work has forced us to figure out a reasonable policy to ensure major version upgrades are as smooth as possible. We are excited to offer something that will allow our users to upgrade sooner while also having relatively low maintenance cost for us.

The forthcoming releases of Retrofit 2.0 and OkHttp 3.0 will be the first two libraries to apply this policy. Enjoy!

https://jakewharton.com/java-interoperability-policy-for-major-version-updates

SQLBrite: A reactive Database Foundation

Feb 25, 2015 Updated Feb 25, 2015

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/sqlbrite-a-reactive-database-foundation.

https://jakewharton.com/sqlbrite-a-reactive-database-foundation

Better Parameterized Tests with Burst

Nov 21, 2014 Updated Nov 21, 2014

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/better-parameterized-tests-with-burst.

https://jakewharton.com/better-parameterized-tests-with-burst

The Conference Speaker Investment

Nov 20, 2014 Updated Nov 20, 2014

Speaking at conferences is an investment by speakers of time, energy, and knowledge. The quality of a presentation and of the conference itself can be measured in the amount of investment made.

Here are my tips for future conference speakers.

Choosing A Conference

At which conference you choose to speak is arguably as important as the topic selection and the amount of work put into the presentation. There are very obvious factors that everyone should consider such as the overall theme, audience that will be attending, and overall conference size.

A conference of 200 people on a specific topic in a remote region deserves a very different preparation than one of 2000 people on a general topic in a large tech hub. The smaller of these two would require something very detailed on which you consider yourself an expert because the people in attendance will expect it. At the larger one you can still give the same detailed talk, but you also have the option of being more broad, appealing more to introductory learners, and not having to be an absolute authority on the subject.

Beyond those, there are other very important factors that I consider essential criteria when selecting where to speak.

Recording

Unless you are going to give the same presentation over and over again (and some do this, refining it over time), having your talk recorded means you instantly increase the size of your potential audience by 1000x. Not only does this force you to know what you are talking about and make pretty slides, but it helps justify the amount of work required to prepare.

Rather than having only static images to refer to or your sweet (cough) prezi, those wishing to review your presentation can do so directly from you in addition to the static version (see 'Publishing' below).

Some conferences will really go above and beyond in recording. Devoxx not only uses professional recording equipment with manned cameras, but they also publish their talks on parleys.com which features slides that are synchronized with the video. You can watch my talk from last year on Dagger for an example.

Venue

With the amount of time it takes to prepare a presentation, you want it to be seen and heard by as many as possible. A bad venue can really turn an otherwise interesting set of speakers into a painful experience. Not being able to see or hear in a room is bad, but having to turn away tons of people due to poor planning or room size is even worse.

Beyond the physical, the technical facilities of a conference are of great importance. The size, placement, and resolution of the screen ensures that your content will be legible to all in the room. At this point it is inexcusable to have anything other than a widescreen, high-definition projector. Screen size should simply be as large as possible. For a presentation to anything over 20 or 30 people, a speaker and mic setup is essential. The size of the room should dictate how large the speaker setup needs to be.

This is another Devoxx slam-dunk as it is hosted inside a movie theater with stadium seating (pictured). Presenters wear tiny mics and stand beneath an absolutely massive screen onto which both your slides and a real-time video feed are shown. As both an attendee and presenter it's a fantastic experience.

Organizers

The people behind the conference are an interesting factor to gauge a conference by, but it turns out to be an important one. Some conferences are run by companies who have motives beyond the conference content itself. All conferences will have sponsors and may even be run by people who work for a single company. The difference is the companies for which running conferences is part of their job. By doing this, their motivation for recording talks, finding the best venues, and having the best equipment is greatly diminished by ulterior motives.

A conference that is run by a company is likely still a good conference to attend and will attract good speakers. It's just a choice you have to make as to whether you want to support something like that. There is a well-known Android conference in San Francisco that I would love to see destroyed by a Droidcon SF since they are guilty of not doing the things listed above.

Creating Your Slides

There are tons of articles on how to create effective slides. Slides should normally exist to augment and reinforce you. Granted, some things like screenshots and code snippets are invaluable, and invert the roles allowing you to augment them. I'm not going to reiterate the well-known things and will instead focus on what's applicable for technical talks.

Getting Started

There's a variety of tools to choose from for creating slides. Google Slides is fantastic for collaborating on content, Keynote has a fantastic balance of simplicity and power, and JavaScript-based tools allow fantastic interactivity. There's also PowerPoint, but I'm not a Windows user. All will work, so choose the one which will make you the most productive; I'll leave it to other articles to compare them.

Slides should be done in widescreen (16:9) at 1080p (1920x1080) resolution. Hopefully this will be the native resolution at which they will be played (as discussed above), but at worst they will be downscaled which is always better than having them upscaled.

Never start a presentation with a description of who you are, what you do, and the outline of your slides. This is content for the abstract of your talk that most people will have read and used in choosing to attend. Those who didn't read this information don't need it because they chose your talk for its catchy title or something and thus won't care either way.

Text & Bullets

Good ol' bullets of text. The fundamental building block of a presentation. Some people can get away with using only single words, short phrases, and pretty images but rarely does that fly for a technical talk.

There is no "too large" for font size of your text. Better to err on the side of large than too small. A single slide should only be able to fit 6 single-line bullets of text. What you think is readable when sitting two feet from your 23" monitor might not be readable at full scale. Start your slideshow and stand back 6 to 10 feet (depending on monitor size) and ensure that everything is readable.

Use simple layouts for text on slides which maximize the available space and readability. The large size of your font tends to force this, but some slide layouts too heavily emphasize design rather than practicality. For example, Square has this one absolutely atrocious slide template (pictured) which puts the title and bullets side-by-side and ends up wasting two thirds of the screen real estate. It looks great when sitting in front of your laptop, but in an actual presentation only the first few rows of people will be able to read it with ease. The title should be at the top, bullets underneath, and font size at least doubled.

The text on the actual bullets is tricky to get right. You want enough information to convey the point you are trying to get across but not so much that the audience is forced to spend most of their time reading. Sticking to a single line is usually a good litmus test for the right amount of content. Each bullet should animate in individually when you are talking about it. Having all the bullets on the screen at once will send most people off to read them all rather than listen to you.

Visualizations

Conveying abstract or complicated concepts effectively with only text can be challenging. Visualizations can aid in helping the audience understand exactly what it is you are describing.

Similar to text, visualizations should be large and unambiguous. If there are more than 10 or 12 components you are going to saturate the individual's ability to understand the content. Remember, while you are familiar with everything happening on screen, this is the first time the attendees are seeing it.

A visualization should start simple and build up gradually if it has more than 5 or 6 components. In addition to bringing components in gradually, deliberate animation is also extremely helpful. The animation draws the eye to the correct place and should be used to help convey the concept which you are describing. In Keynote, break the visualization up into multiple slides and use the 'Magic Move' slide transition to handle figuring out most of the moves for you.

A visualization is not a substitute for a description of what is happening and why. Phrases like "as you can see" are red flags that you need to be the one explaining the behavior rather than relying on the visualization or animation to do it for you. Avoid trying to tell the audience where to look with commands. If there is an area you want to focus on, fade the unimportant parts of the visualization to 50% opacity to naturally draw their eyes to the important sections.

Code

What is a technical presentation without code? Showing code on screen is a very hard thing to do correctly. Code comes with an inherent cognitive overhead versus text, but it also is fraught with opinion in both language and style.

It should come as no surprise that yet again size is important here. You don't need to go as large as text, but limit the number of lines of code to 12 to 15 on a slide. Similar to the gradual enhancement in visualizations, if you need to show more than 5 or 6 lines of code break it up into logical chunks and fade them in individually. Explain each section as it comes in and then summarize in one line again once the complete snippet is on the screen to reinforce the behavior.

Syntax highlighting of code is essential for slides. Programmers' eyes are trained to recognize the semantic building blocks of code through syntax highlighting. There is absolutely no reason to omit this. IntelliJ 14 or newer will do rich-text copy with syntax highlighting and can even be customized to use a different style when copying than the active one (search 'rich' in preferences).

Often when showing code in a presentation, you will want to show code snippets that change between slides as a result of demonstrating new APIs or just as a different example. Back-to-back slides with syntax-highlighted code are hard on your audience because they need to both understand what changed and then understand why. Every single change in code you make should initially desaturate the unchanged parts and then automatically re-saturate after a short interval (pictured). Duplicate the original slide and update it with the changed code. Copy and paste the code on the new slide and position it at the same x and y positions. Select the parts of the code which haven't changed and set them to a neutral gray. Put a fade-out animation on the desaturated code of 0.5 seconds with a delay of 1.5 seconds and set it to play automatically when the slide is shown. Put a 0.5 second cross-fade transition between the two slides.

The same desaturation technique should be used whenever you need to emphasize a section of code that is already on screen. Here, however, it's wise to do it as explicit animation steps for both desaturation and re-saturation since you want the focus to be kept for as long as you are talking about that code (pictured). Use the same technique as above with separate slides with the exception of having the desaturated code fade out on click.

Publishing

Unless a conference has really professional recording and post-processing, the quality of your slides on the video will be low. Publishing them separately allows for a much higher quality and gives control to the viewer as to how quickly or slowly they want to move forward. If you have followed the above recommendations your slides can most likely stand on their own without the video. Those referring back to them won't need to use the recording and those not in attendance can look at them while waiting for the video to be released.

SpeakerDeck is my publishing platform of choice. It has a no-frills interface without ads or noisy chrome around the slides like other sites. It also allows embedding if you have your own website that chronicles your presentations.

To export your slides, make a copy of the original version and append '-ForExport'. You will almost certainly have to make changes for export and operating only on a copy ensures you don't accidentally overwrite the original. After opening the copy, immediately export a PDF. Be sure to select the options that create a separate slide for each animation step and include the slide number. Open the PDF and flip through from start to end, noting any changes you want to make in a text editor along with the slide number. Flip back to the presentation and make the changes, re-export, and repeat until you have something you are satisfied with. The two most common changes I have to make are removing animations which play immediately after a transition and splitting slides with complex move animations into multiple slides.

Once you are satisfied with your PDF, do one final export which disables the slide numbers and sets the image quality to its highest setting. Save both the 'ForExport' version of the presentation and the final exported PDF alongside the original. I've had to circle back and tweak the export or email the exported PDF on multiple occasions. Upload the presentation to SpeakerDeck, give it a good title and description, and check the little publish box. Once the video is made available, be sure to update the description with a link to the video.

These are my opinions having done this only about 10 times. By no means am I an expert, but I think I have a good grasp on how to choose the presentations that I give. And if you didn't pick up on it, Devoxx and Keynote are my current bar by which I measure the venue and presentation quality, respectively.

I've uploaded some files from my most recent presentation on Dagger 2 as examples:

Original Keynote (zip): drive.google.com/open?id=0B490UMAh3G13TjBrTGQ2OVBFdjA
Original PDF Export: drive.google.com/open?id=0B490UMAh3G13YjZUNnNoaHluWmc
'ForExport' Keynote (zip): drive.google.com/open?id=0B490UMAh3G13U3pOZ0o0WXNUXzA
Final PDF Export: drive.google.com/open?id=0B490UMAh3G13ZlFfVVdiZDJvcGs
SpeakerDeck upload: speakerdeck.com/jakewharton/dependency-injection-with-dagger-2-devoxx-2014

https://jakewharton.com/the-conference-speaker-investment

Coercing Picasso To Play With Palette

Oct 24, 2014 Updated Oct 24, 2014

Features feel how I imagine children would. They are relatively easy to spawn but require a lengthy commitment and constant care. Picasso's API would be nothing short of a Greek tragedy if we humored every feature request that we received. Finding the right balance of what is appropriate to add and what isn't is a constant struggle.

When the Palette library was teased the inevitable feature request came in to Picasso for a means of supporting it. This was certainly not an unreasonable request, and while we might not explicitly support it directly we will probably add something in the future to more easily facilitate its use.

But what do we do in the interim? The API for proper support is many months if not a year away from actually being implemented. Let's walk through an attempt to adapt the existing APIs to allow Palette's use.

The fundamental component of Picasso's data pipeline after the request has been fulfilled is a Bitmap. We use this in the return values and method parameters which traverse upwards to the main thread. Bitmap is a final class in Android so we will not get an opportunity to hang extra metadata on a subclass.

There are two ways that Picasso can notify the caller of a successful image download: the Callback when loading directly into an ImageView or the more generalized Target. A Target is given direct access to the Bitmap object but the Callback is not (although you can get it indirectly with some cleverness). Regardless of Bitmap access, the problem is that both of these are called on the main thread. Since Palette does a decent amount of computation we don't want to do it here anyway.

Palette actually has an asynchronous mode of operation that we could leverage in these two callback locations but you wouldn't want to. The images are ready to be displayed when the callbacks are invoked so either delaying display until you can run Palette on another background thread or displaying the image right away and getting the Palette information later both seem like sub-par experiences.

From the time Picasso gets the Bitmap to the time it makes it back to the main thread, where can we hook in to allow the invocation of Palette inside of the threading model which is already being managed by Picasso? Those familiar with Picasso will know that there are two places: the Downloader (or RequestHandler in the upcoming v2.4) and a Transformation.

Downloader and the upcoming RequestHandler are means to obtaining the original Bitmap instances (or InputStream instances) to fulfill a request. While we could invoke Palette here, I'm going to immediately reject it for a few reasons. There are multiple sources from which an image can be loaded which means we need to duplicate our logic across all of them. Additionally, the number of sources is constantly changing and some of them you cannot replace (I'm looking at you, drawable resource ID loading). Sometimes sources provide an InputStream which means we now have the burden of doing the initial Bitmap decoding ourselves--a job which is supposed to be Picasso's, right? Finally, the Bitmap at this level is the raw, original sized version. Not only will Palette operate much more slowly on it but a later transformation might alter the color makeup of the image.

A Transformation, it would seem, remains our only hope. And in fact, the more you look at it the more appealing it becomes. A Transformation always receives a Bitmap instance. It is invoked after all of the internal transformations have been applied. This means that the ever-common fit()/resize() & centerCrop() combo has already been executed. Custom transformations run last in the pipeline and their order is controlled by the caller. This means that we can place a custom transformation as the very last thing that is run before Picasso starts the process of sending the Bitmap back to the main thread.

I think we found our hook. Let's get started on some code:

public final class PaletteTransformation implements Transformation {
  @Override public Bitmap transform(Bitmap source) {
    // TODO Palette all the things!
    return source;
  }

  @Override public String key() {
    return ""; // Stable key for all requests. An unfortunate requirement.
  }
}

While a Transformation gives us the very last minute hook into the processing pipeline, as we noted before, its return type of Bitmap means we aren't hanging any metadata directly on the return value. How can we propagate this metadata from the transformation back to the call site?

Looking at how we invoke Picasso with our transformation should give you a clue:

Picasso.with(context)
    .load(url)
    .fit().centerCrop()
    .transform(new PaletteTransformation())
    .into(imageView, new EmptyCallback() {
      @Override public void onSuccess() {
        // TODO I can haz Palette?
      }
    });

Those familiar with Picasso best practices should be screaming about the new PaletteTransformation() snippet. In general, all Picasso transformations should be completely stateless functions so that a single instance can be used for every call to .transform(). In this case we are going to make an exception because the transformation looks like a great place to pass along our metadata.

final PaletteTransformation paletteTransformation = new PaletteTransformation();
Picasso.with(context)
    .load(url)
    .fit().centerCrop()
    .transform(paletteTransformation)
    .into(imageView, new EmptyCallback() {
      @Override public void onSuccess() {
        Palette palette = paletteTransformation.getPalette();
        // TODO apply palette to text views, backgrounds, etc.
      }
    });

Now that we have a working model to hand off the metadata, let's update our PaletteTransformation to actually extract the palette from the Bitmap that is passing through.

public final class PaletteTransformation implements Transformation {
  private Palette palette;

  public Palette getPalette() {
    if (palette == null) {
      throw new IllegalStateException("Transformation was not run.");
    }
    return palette;
  }

  @Override public Bitmap transform(Bitmap source) {
    if (palette != null) {
      throw new IllegalStateException("Instances may only be used once.");
    }
    palette = Palette.generate(source);
    return source;
  }

  // ...
}

While this looks like a working solution, there are two problems:

It requires an additional object allocation for every Picasso request (something we try very hard to minimize).
Images which are cached do not pass through the transformation pipeline and thus will always return null from getPalette().

Object allocations are becoming cheaper with newer platform versions but will never be free. Since Picasso is often called thousands of times in very performance-sensitive areas of applications we aim for the utmost efficiency in terms of CPU and memory use.

Saving the allocation for our transformation is easy. We can use the well-known pattern of object pooling. Thanks to recent support library updates, this is even easier to do than before with the Pools helper.

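import android.support.v4.util.Pools.Pool;
import android.support.v4.util.Pools.SynchronizedPool;

public final class PaletteTransformation implements Transformation {
  private static final Pool<PaletteTransformation> POOL = new SynchronizedPool<>(5);

  public static PaletteTransformation getInstance() {
    // acquire() returns null when the pool has no instance to hand out.
    PaletteTransformation instance = POOL.acquire();
    return instance != null ? instance : new PaletteTransformation();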
  }

  private Palette palette;

  private PaletteTransformation() {}

  public Palette extractPaletteAndRelease() {
    Palette palette = this.palette;
    if (palette == null) {
      throw new IllegalStateException("Transformation was not run.");
    }
    this.palette = null;
    POOL.release(this);
    return palette;
  }

  // ...
}

Our calling code only changes slightly to use the new static factory and the more semantically named palette extraction method.

final PaletteTransformation paletteTransformation = PaletteTransformation.getInstance();
Picasso.with(context)
    .load(url)
    .fit().centerCrop()
    .transform(paletteTransformation)
    .into(imageView, new EmptyCallback() {
      @Override public void onSuccess() {
        Palette palette = paletteTransformation.extractPaletteAndRelease();
        // TODO apply palette to text views, backgrounds, etc.
      }
    });

I have hard coded the pool size to retain 5 instances. This is an educated guess based on how I know Picasso's internals to work. If you were adopting this implementation you should add logging to test whether the size needs to be increased for your application.
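
One way to gather that evidence is to count how often the pool fails to help. The following is a minimal sketch in plain Java, not the support library's Pools class; CountingPool and its counters are hypothetical names introduced only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bounded pool that records how often it cannot help. */
final class CountingPool<T> {
  private final Deque<T> items = new ArrayDeque<>();
  private final int maxSize;
  private int misses;    // acquire() calls that found the pool empty
  private int overflows; // release() calls that found the pool full

  CountingPool(int maxSize) {
    if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be > 0");
    this.maxSize = maxSize;
  }

  /** Returns a pooled instance, or null when the caller must allocate. */
  synchronized T acquire() {
    T item = items.pollFirst();
    if (item == null) misses++;
    return item;
  }

  /** Returns false when the pool was full and the instance was dropped. */
  synchronized boolean release(T item) {
    if (items.size() >= maxSize) {
      overflows++;
      return false;
    }
    items.addFirst(item);
    return true;
  }

  synchronized int misses() { return misses; }
  synchronized int overflows() { return overflows; }
}
```

A few misses right after startup are expected because the pool begins empty. A steadily growing overflow count suggests that more instances are in flight than the pool can retain, which is the signal to try a larger size.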

Dealing with memory-cached images is a bit less straightforward. Picasso's Cache is hard coded to use Bitmap as the value which means we can't wrap up the Palette instance alongside. Since we can't use the main cache we will be forced to mirror it.

In deciding on our cache key we have a choice: the Bitmap which is the source of truth for the pixels or the URL which is the source of truth for the image. The choice here leads to two different implementations and neither one is applicable to all use cases. We'll quickly explore both which will yield the final result.

Keying by Bitmap creates the most elegant implementation of the transformation at the expense of ugliness in the calling code. We can even revert to using a single transformation instance with an embedded cache. Pooling is fun but not having to pool is even better!

public final class PaletteTransformation implements Transformation {
  private static final PaletteTransformation INSTANCE = new PaletteTransformation();
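  // WeakHashMap is not thread-safe, and transform() runs off the main thread
  // while getPalette() is called on it, so guard access with a synchronized wrapper.
  private static final Map<Bitmap, Palette> CACHE =
      java.util.Collections.synchronizedMap(new WeakHashMap<Bitmap, Palette>());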

  public static PaletteTransformation instance() {
    return INSTANCE;
  }

  public static Palette getPalette(Bitmap bitmap) {
    return CACHE.get(bitmap);
  }

  private PaletteTransformation() {}

  @Override public Bitmap transform(Bitmap source) {
    Palette palette = Palette.generate(source);
    CACHE.put(source, palette);
    return source;
  }

  // ...
}

The WeakHashMap will release the Palette reference when its associated Bitmap is garbage collected. We rely on Picasso's memory cache to retain the strong reference even if it isn't currently being displayed in an ImageView.

The calling code has to obtain the final Bitmap in order to query the cache. This is trivial in Picasso's Target, but much more ugly in the more common Callback.

Picasso.with(context)
    .load(url)
    .fit().centerCrop()
    .transform(PaletteTransformation.instance())
    .into(imageView, new EmptyCallback() {
      @Override public void onSuccess() {
        Bitmap bitmap = ((BitmapDrawable) imageView.getDrawable()).getBitmap(); // Ew!
        Palette palette = PaletteTransformation.getPalette(bitmap);
        // TODO apply palette to text views, backgrounds, etc.
      }
    });

This approach places each Bitmap as the source of truth for the Palette instance. The same URL which is displayed at different resolutions might have slightly different values in each swatch because of this. Using the URL as the key on a cache would ensure that multiple sizes of the same image had exactly the same palette. I'm not exactly sure of how much of an issue this is in practice, if any.

There are other problems with a URL String key approach, however. Picasso's default LruCache implementation does not expose a callback for when entries are being purged. This means we have no way of reference counting the Palette instances in a parallel cache (remember, multiple entries in the main cache could reference the same Palette). We could create a new Cache implementation based on the support-v4 library LruCache (on which Picasso's is based) but now we are duplicating functionality for little gain.

Another way to solve the problem would be a double-key map where the URL String is mapped to the Palette instance, but each resulting transformed Bitmap instance is also used as a map key with a weak reference. When a reference queue callback is invoked because a Bitmap was garbage collected, we check to see whether any Bitmap mappings still exist for that URL and if not purge the String to Palette mapping. This is viable, but it's more work than I am willing to do because ultimately we are designing a temporary solution.
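
For the curious, the double-key idea could look roughly like the sketch below. It is plain Java with generic stand-ins (I for the image type, M for the metadata) rather than Bitmap and Palette, and UrlKeyedCache is a hypothetical name, not anything Picasso provides. It also assumes the first metadata stored for a URL is the one to keep.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical cache: URL -> metadata, kept only while some image for that URL is alive. */
final class UrlKeyedCache<I, M> {
  /** Weak reference to an image that remembers which URL it came from. */
  private static final class ImageRef<I> extends WeakReference<I> {
    final String url;
    ImageRef(I image, String url, ReferenceQueue<I> queue) {
      super(image, queue);
      this.url = url;
    }
  }

  private static final class Entry<M> {
    final M metadata;
    int liveImages;
    Entry(M metadata) { this.metadata = metadata; }
  }

  private final ReferenceQueue<I> queue = new ReferenceQueue<>();
  private final Map<String, Entry<M>> byUrl = new HashMap<>();
  // Strong references to the ImageRef objects (not the images) so that the
  // references themselves stay alive until they are enqueued.
  private final Set<ImageRef<I>> refs = new HashSet<>();

  /** Registers an image for a URL. The first metadata stored for a URL wins. */
  synchronized Reference<I> put(String url, I image, M metadata) {
    purge();
    Entry<M> entry = byUrl.get(url);
    if (entry == null) {
      entry = new Entry<>(metadata);
      byUrl.put(url, entry);
    }
    entry.liveImages++;
    ImageRef<I> ref = new ImageRef<>(image, url, queue);
    refs.add(ref);
    return ref;
  }

  synchronized M get(String url) {
    purge();
    Entry<M> entry = byUrl.get(url);
    return entry == null ? null : entry.metadata;
  }

  /** Drops URL entries whose images have all been collected. */
  @SuppressWarnings("unchecked")
  private void purge() {
    Reference<? extends I> collected;
    while ((collected = queue.poll()) != null) {
      ImageRef<I> ref = (ImageRef<I>) collected;
      refs.remove(ref);
      Entry<M> entry = byUrl.get(ref.url);
      if (entry != null && --entry.liveImages == 0) {
        byUrl.remove(ref.url);
      }
    }
  }
}
```

There is no real callback here: the queue is simply drained on each put() and get(). Returning the Reference from put() is mostly a convenience so the purge path can be exercised by calling enqueue() on it rather than waiting for the garbage collector.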

It looks like the Bitmap key based approach is the most viable at the expense of potential (but not proven) variance in the Palette instances for multiple Bitmaps of the same image URL.

This is a long post and there isn't exactly a nice neat bow to tie it all together with. I wrote it in this way because I wanted to showcase two very important things when it comes to how we do feature and API design:

Not every use case deserves a first-party API but with some clever thinking you can usually accomplish what you are after by thinking outside of the box. We ultimately will support the use of Palette in Picasso but less as a first-class citizen and more through a general means of passing along arbitrary metadata through the pipeline.
Exhausting multiple implementations and solutions to a problem is essential for ironing out the best approach. This was a journey to the final conclusion and it's one we frequently make for most of the changes we make to Picasso and all of our open source libraries. Not only can we be sure that we reached a good solution, but we also are able to defend and justify our decisions.

If you come upon any other bright ideas for applying Palette into Picasso I'd love to hear about them. Otherwise get playing with the fantastic Palette library today and keep an eye out for Picasso 2.4 in the next few days!

https://jakewharton.com/coercing-picasso-to-play-with-palette

Play Services 5.0 Is A Monolith Abomination

Jul 3, 2014 Updated Jul 3, 2014

Guava is a monolithic library, but that's not necessarily a bad thing. Nobody thinks twice when bundling it for the JVM. In the world of Android the mention of Guava has a bit of a negative stigma due to the dex file format's method limit and a concern about bloating APK size. The latter is no longer a valid argument. The dex method limit is a hard 64k (65,536) limit to which Guava contributes just over 14k methods. More than 20% of this hard limit vanishes when you include Guava.

Sounds scary, right? It isn't.

Google Play Services 5.0, which just launched, contributes over twenty thousand methods to your app. 20k+. Nearly one third of the limit! Now that is scary.

The Play Services library includes proprietary functionality built on the normal Android APIs and a separate APK downloaded on all devices with the Play Store. Some of the services it provides are invaluable. Like Guava it is also a monolithic library but it is a bad thing in this case.

A lot of really cool functionality is being put in Play Services. You'll have a hard time making a compelling app that lives in the Google Play ecosystem without it. You should want to put it in your applications and not have to worry about the overhead it brings.

Most of the library's offerings are very disparate, having only the fact that they're by Google as a common thread. This screams for small, modular artifacts which can be composed!

Google, it's time to unbundle. All the cool kids are doing it. (Spoiler alert: it happened)

At worst, we specify a few dependencies manually:

dependencies {
  compile 'com.google.android.gms:play-services-ads:5.0.+'
  compile 'com.google.android.gms:play-services-analytics:5.0.+'
  compile 'com.google.android.gms:play-services-games:5.0.+'
}

Best case would be a plugin that provided a clear DSL to what you were getting and offered easier configuration of the various components.

apply plugin: 'com.google.playservices'

playServices {
  version '5.0.+'
  components 'ads', 'analytics', 'games'
}

(You can even still provide the "fat" jar for both the dependency management world and the people who like manual dependency management.)

ProGuard is not the answer. Yes, for release builds it's nice to strip out any methods which are not being used. However, this is not justification for having large chunks of unused code as dependencies. Besides, if you read my post on a simulator you know that we deserve a faster development build pipeline which removes steps, not adds them.

It's not going to be a walk in the park but the packages inside Play Services are surprisingly well-suited to partitioning:

(Top-left: Games, top-center: Drive, middle-left: Plus, middle: common, middle-right: Maps, bottom: Ads)

Here's Guava for comparison which has less clear partition lines:

Here's how the method counts were determined:

$ curl 'http://search.maven.org/remotecontent?filepath=com/google/guava/guava/17.0/guava-17.0.jar' > guava.jar
$ ~/android-sdk/build-tools/20.0.0/dx --dex --output guava.dex guava.jar
$ dex-method-count guava.dex
14824

$ cp ~/android-sdk/extras/google/m2repository/com/google/android/gms/play-services/5.0.77/play-services-5.0.77.aar .
$ unzip play-services-5.0.77.aar
$ ~/android-sdk/build-tools/20.0.0/dx --dex --output play-services.dex classes.jar
$ dex-method-count play-services.dex
20298
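
To put those two counts in terms of the limit, here is a small arithmetic sketch. DexBudget is a hypothetical helper name; the only inputs are the counts printed above and the 65,536 method reference limit of a single dex file.

```java
/** Hypothetical helper for relating method counts to the dex limit. */
final class DexBudget {
  /** Method references addressable by a single dex file: 2^16. */
  static final int LIMIT = 65_536;

  /** Share of the limit consumed by methodCount, as a percentage. */
  static double percentOfLimit(int methodCount) {
    return 100.0 * methodCount / LIMIT;
  }

  /** Method references left after subtracting each of the given counts. */
  static int remaining(int... methodCounts) {
    int left = LIMIT;
    for (int count : methodCounts) {
      left -= count;
    }
    return left;
  }
}
```

With the numbers above, percentOfLimit(14824) is roughly 22.6 and percentOfLimit(20298) is roughly 31.0, and remaining(14824, 20298) leaves 30414 references for the rest of the app.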

And the full by-package breakdown of Play Services:

$ dex-method-count-by-package play-services.dex
20298 com
20298 com.google
207   com.google.ads
169   com.google.ads.mediation
73    com.google.ads.mediation.admob
62    com.google.ads.mediation.customevent
20188 com.google.android
20188 com.google.android.gms
2     com.google.android.gms.actions
480   com.google.android.gms.ads
135   com.google.android.gms.ads.doubleclick
25    com.google.android.gms.ads.identifier
88    com.google.android.gms.ads.mediation
4     com.google.android.gms.ads.mediation.admob
73    com.google.android.gms.ads.mediation.customevent
26    com.google.android.gms.ads.purchase
118   com.google.android.gms.ads.search
866   com.google.android.gms.analytics
52    com.google.android.gms.analytics.ecommerce
10    com.google.android.gms.appindexing
151   com.google.android.gms.appstate
80    com.google.android.gms.auth
644   com.google.android.gms.cast
1026  com.google.android.gms.common
12    com.google.android.gms.common.annotation
382   com.google.android.gms.common.api
235   com.google.android.gms.common.data
202   com.google.android.gms.common.images
126   com.google.android.gms.common.internal
126   com.google.android.gms.common.internal.safeparcel
1940  com.google.android.gms.drive
87    com.google.android.gms.drive.events
897   com.google.android.gms.drive.internal
241   com.google.android.gms.drive.metadata
202   com.google.android.gms.drive.metadata.internal
205   com.google.android.gms.drive.query
151   com.google.android.gms.drive.query.internal
451   com.google.android.gms.drive.realtime
451   com.google.android.gms.drive.realtime.internal
123   com.google.android.gms.drive.realtime.internal.event
38    com.google.android.gms.drive.widget
332   com.google.android.gms.dynamic
4534  com.google.android.gms.games
73    com.google.android.gms.games.achievement
113   com.google.android.gms.games.event
2956  com.google.android.gms.games.internal
858   com.google.android.gms.games.internal.api
43    com.google.android.gms.games.internal.constants
8     com.google.android.gms.games.internal.data
31    com.google.android.gms.games.internal.events
9     com.google.android.gms.games.internal.experience
215   com.google.android.gms.games.internal.game
56    com.google.android.gms.games.internal.multiplayer
23    com.google.android.gms.games.internal.notification
80    com.google.android.gms.games.internal.player
86    com.google.android.gms.games.internal.request
256   com.google.android.gms.games.leaderboard
640   com.google.android.gms.games.multiplayer
239   com.google.android.gms.games.multiplayer.realtime
256   com.google.android.gms.games.multiplayer.turnbased
213   com.google.android.gms.games.quest
150   com.google.android.gms.games.request
210   com.google.android.gms.games.snapshot
47    com.google.android.gms.gcm
111   com.google.android.gms.identity
111   com.google.android.gms.identity.intents
62    com.google.android.gms.identity.intents.model
5760  com.google.android.gms.internal
295   com.google.android.gms.location
2342  com.google.android.gms.maps
804   com.google.android.gms.maps.internal
1068  com.google.android.gms.maps.model
483   com.google.android.gms.maps.model.internal
14    com.google.android.gms.panorama
902   com.google.android.gms.plus
352   com.google.android.gms.plus.internal
316   com.google.android.gms.plus.model
192   com.google.android.gms.plus.model.moments
126   com.google.android.gms.plus.model.people
33    com.google.android.gms.security
1367  com.google.android.gms.tagmanager
867   com.google.android.gms.wallet
376   com.google.android.gms.wallet.fragment
143   com.google.android.gms.wallet.wobs
1011  com.google.android.gms.wearable
714   com.google.android.gms.wearable.internal
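
The counts in this listing appear to be cumulative: each package's total includes its sub-packages, which is why com and com.google both show the full 20298. A minimal sketch of that kind of roll-up follows. PackageRollup is a hypothetical name, and the input (one declaring class name per method reference) is an assumption about the data, not a description of what the script actually does.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical roll-up of per-method class names into per-package totals. */
final class PackageRollup {
  /**
   * Counts one method per entry in declaringClasses and credits it to the
   * class's package and to every enclosing package prefix.
   */
  static Map<String, Integer> countByPackage(Iterable<String> declaringClasses) {
    Map<String, Integer> counts = new TreeMap<>(); // sorted by package name
    for (String className : declaringClasses) {
      int lastDot = className.lastIndexOf('.');
      if (lastDot < 0) continue; // default package: nothing to roll up
      String pkg = className.substring(0, lastDot);
      // Walk "a.b.c" as "a", then "a.b", then finally "a.b.c".
      int dot = pkg.indexOf('.');
      while (dot >= 0) {
        counts.merge(pkg.substring(0, dot), 1, Integer::sum);
        dot = pkg.indexOf('.', dot + 1);
      }
      counts.merge(pkg, 1, Integer::sum);
    }
    return counts;
  }
}
```

Because every prefix is credited, the top-level entries always equal the grand total, matching the shape of the output above.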

You can grab these two scripts from here: gist.github.com/JakeWharton/6002797
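
For a rough idea of where a total like this can come from: the dex header stores the number of method references as a little-endian 32-bit value named method_ids_size at byte offset 88. The sketch below reads that field in Java. It is an illustration of the file format only, with DexHeader as a hypothetical name, and is not a copy of the scripts in the gist.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reader for a single field of the dex header. */
final class DexHeader {
  /** Byte offset of method_ids_size within the dex header. */
  static final int METHOD_IDS_SIZE_OFFSET = 88;

  /** Reads the number of method references declared in a dex file's header. */
  static int methodCount(byte[] dex) {
    if (dex.length < METHOD_IDS_SIZE_OFFSET + 4) {
      throw new IllegalArgumentException("Too short to contain a dex header");
    }
    return ByteBuffer.wrap(dex)
        .order(ByteOrder.LITTLE_ENDIAN)
        .getInt(METHOD_IDS_SIZE_OFFSET);
  }
}
```

Something like DexHeader.methodCount(Files.readAllBytes(Paths.get("guava.dex"))) would then return the count for a file on disk. The sketch does not validate the magic bytes, so it will happily return garbage for a file that is not a dex.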

The dependency graphs were generated using degraph and yEd. Download the .graphml for Play Services and Guava.

https://jakewharton.com/play-services-is-a-monolith

Android Needs A Simulator, Not An Emulator

Jun 16, 2014 Updated Jun 16, 2014

Two years ago I wrote a blog post complaining that the Android build system was broken. At the time, Eclipse ADT and Ant were the blessed solutions and they just hadn't scaled with the platform. Third-party solutions existed for both tooling and IDE but they always felt a bit illegitimate and at risk for problems. My post joined the cries of others who knew that something had to be done.

Xavier Ducrohet swooped in and dropped a bomb on the resulting Google+ thread: "We are looking at revamping the whole thing".

In the two years since he and the tools team have transformed the landscape of how Android development is done. A first-party Gradle plugin now provides the powerful and dynamic platform on which any app of quality is built. Ownership of the Android plugin inside IntelliJ IDEA (with a sprinkle of branding) yields a development environment that moves mountains for you.

Neither the Gradle plugin nor the IntelliJ IDEA plugin (bundled as Android Studio) is at v1.0 yet. They're both still beta (albeit arguably in the sense that Gmail was circa 2008).

Why was all of this important and why is it important for Google moving forward?

Developers are the top of the funnel for Android's continued success. Without quality development tools there are no quality apps, without quality apps there are no quality users, and without quality users the developers will flee. Half of the developer flow in this funnel comes from the tools and the other half from APIs. This post is about the former.

My first exposure to Android was the M3 pre-release SDK's emulator (pictured). The emulator started up quickly and was responsive. Each successive release up to and beyond version 1.0 added much needed functionality to both the OS and the emulator. And in each successive release the emulator slowed.

Android's pubescent period (otherwise known as Honeycomb) and its eventual emergence into adulthood had a devastating effect on emulator performance. Despite our development environments becoming more powerful, two factors outpaced Moore's law:

Support of tablets and advances in screen technology meant that devices were gaining a lot of pixels.
A more advanced graphics pipeline pushed rendering from software down into the hardware. This brought great performance at the expense of internal complexity.

Working with Intel, Google eventually released an x86 version of the emulator which eliminated ARM emulation and leveraged virtualization technologies built in to CPUs. This was better. In fact, it was so much better that a lot of people were satisfied, myself included.

Recently, a company called Genymotion was formed around the work of a project that compiled Android to run in a VirtualBox VM. They not only delivered an experience that was faster and simpler than the x86 emulator, but they provided much-needed tools for modern app development. Easier sensor controls, touchscreen input using a remote device, and screen capture support for both images and video are just some of these features which really make the product shine.

Hopefully none of what I've covered is news to you. But now I am ready to start talking about this post's true topic:

All existing emulator solutions are terrible.

The management interface for creating, configuring, and starting emulators is a minimally viable Swing app. While the low-quality list view is actually fine, the configuration pane is a mess (pictured).

This screen lacks any design to facilitate the correct behavior. The instruction set is usually defaulted to ARM which has to be fully emulated (very slow). Use of the host GPU is defaulted to off which means that the display pipeline will not be hardware accelerated (again, very slow). Unless you've been Googling passive-aggressive phrases about the emulator speed you might never even cross HAXM.

The pain does not cease once the emulator is running (whether using optimal settings or not). Each instance requires significant system resources and the performance of the contained OS will vary wildly. Instances will occasionally hang, crash, or disappear from adb's visibility, requiring manual restarts. Controls and interfaces to a few of the sensors are present, but they are far from comprehensive.

Genymotion is a step up from the first-party offering. It has fewer options for configuration because it is already set up for optimal performance. As previously mentioned, its sensor and developer controls are much more rich and useful.

The initial downside of Genymotion is the required user account and strange pricing of commercial licenses (and lack of a site license). They also are not without problems which plague the actual use of the emulator. The VirtualBox images occasionally get corrupted or stuck, which requires a trip into the depths of your filesystem for manual purging. The free license cripples functionality that would otherwise exist if they hadn't explicitly disabled it (screenshot, recording). Their pricing model for commercial use also does not reflect the amount of utility you actually receive.

We put up with these solutions because they are an improvement compared to what came prior. However, they are nowhere near what we truly need or deserve.

Android needs a simulator for day-to-day development and testing.

sim·u·la·tor /ˈsimyəˌlātər/

A machine with a similar set of controls designed to provide a realistic imitation of the operation of a vehicle, aircraft, or other complex system, used for training purposes.

A simulator is a shim that sits between the Android operating system runtime and the computer's running operating system. It bridges the two into a single unit which behaves closely to how a real device or full emulator would at a fraction of the overhead.

The most well known simulator to any Android developer is probably (and ironically) the one that iOS developers use from Apple. The iPhone and iPad simulators allow quick, easy, and lightweight execution of in-development apps. If you haven't seen this simulator in action, I would encourage you to take a two-minute tour of one before continuing this post.

What does a simulator buy us that a traditional emulator does not?

Creating, configuring, and running simulators becomes about the runtime, not the right configuration of options for optimal performance.
Instrumentation tests gain stability and speed which thus massively increases their utility. A simulator that runs headless as part of the normal build means your build can run these tests no differently than it compiles the Java sources.
The need for a separate, JVM-based unit test solution diminishes drastically. Even more exciting is that the need for a third-party testing solution like Robolectric diminishes with it. When a headless simulator is part of your build (and with an upcoming test runner diversification), unit tests on the JVM become a first-party delight.
The lack of an emulated architecture layer, the overhead of a whole OS running, and the need for build steps like packaging (more on that to follow) means you are able to develop and deploy at a speed which just isn't possible in the current setup.

There always will be a need for a proper emulator for acceptance testing your application in an environment that behaves exactly like a device. For day-to-day development this is simply not needed. Developer productivity will rise dramatically and the simplicity with which testing can now be done will encourage more of it and, with any luck, improve overall app quality.

Android actually already has two simulators which are each powerful in different ways, but nowhere near powerful enough. Before we talk about them, let's cover why a simulator is a perfect fit for Android development.

Apps are text files of Java The Language™, compiled with javac to JVM bytecode, transformed with dex to Dalvik bytecode, zipped up into an .apk, and signed using jarsigner. Other tools like zipalign and ProGuard are optionally a part of this toolchain but since they aren't usually used in development we can safely ignore them. Prior to the invocation of javac, all of the resources of an app must be parsed with aapt for code generation and special encoding of some files. This is a lot of steps!

Using a simulator would reduce this to a single step: compilation with javac. We are already relying on the JVM for running our IDE, our build system, and our compilation. Why aren't we leveraging it for running a simulated OS?

Ok so I glossed over two other toolchain components we'd need to run our class files in a JVM-hosted simulator:

A modified aapt whose only responsibility was generation of R files would still be needed. Thankfully the slower operations this resource step performs (image optimization and text file encoding) wouldn't be needed. XML files can be read on-the-fly as text by the simulator. Images don't need optimized since they are just being displayed from the local filesystem.
A signing key is required for the OS to verify installation and grant special permissions. Rather than having to actually sign anything, a simple certificate can be created from the keystore and included as a string.

Imagine how quick the time between modifying your source code and running the application becomes when the only steps needed are a resource scan, javac, and copying a string. Oh, but do you have a ton of dependencies? Not a problem since all that's needed is appending the file path of the .jar file onto the JVM classpath.
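As a rough illustration of how little work the dependency step becomes, here is a minimal sketch of assembling a classpath for an exploded app. The class name `ExplodedClasspath` and the paths in the usage note are hypothetical; nothing here is an existing Android tool.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the JVM classpath for an "exploded" app. */
class ExplodedClasspath {
    /**
     * Joins the javac output directory and each dependency .jar into one
     * classpath string using the host platform's path separator.
     */
    static String build(String classesDir, List<String> dependencyJars) {
        List<String> entries = new ArrayList<>();
        entries.add(classesDir);        // compiled app classes come first
        entries.addAll(dependencyJars); // each library is just another entry
        return String.join(File.pathSeparator, entries);
    }
}
```

Calling `ExplodedClasspath.build("build/classes", Arrays.asList("libs/a.jar", "libs/b.jar"))` yields a string that could be handed to `java -cp`. Adding a dependency is one more list entry rather than another dex and packaging pass.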

The prospect of this "exploded" .apk application should get you seriously excited.

Even more exciting is that there are already two simulators which work with these exploded apps. The first and most well known is Robolectric, a tool for running unit tests on the JVM. The second is named "layoutlib" which is far less known but is used daily by every Android developer.

Robolectric runs a compiled version of the OS in a separate classloader using techniques like bytecode rewriting and proxies. This puts most of the real OS infrastructure at your disposal for unit testing code paths of your app that have to touch Android code. People often abuse Robolectric for testing the wrong things but it usually works because the real OS code is used by default.

It takes about two seconds for Robolectric to initialize. Most of this time is creating the custom classloader and initializing all the proxy classes. Once running, application code is loaded into the classloader and run like normal Java code. The resources are lazily resolved directly from the source files.

"layoutlib" is a module whose purpose is to run view code on the JVM including parsing layout XML and loading resources. If you've ever used the layout designer or layout preview in either Eclipse or IntelliJ IDEA/Android Studio then you have used this library.

Running your view code (including custom views) is done like any other Java code. The classes inside the library fake out the Context and the resource loading it brings. The rendering pipeline is also simply mapped into normal Java rendering primitives so you can see real-time updates of your layouts.

Both of these libraries use very clever techniques to simulate parts of Android to great success. Neither one is suited to running an application during development which is what we are after.

There are hurdles to be tackled in building a simulator that can host development applications. If you'll remember from above, we already have a pared down version of aapt, a simple representation of the signing key, and are leveraging javac and the JVM classpath for loading code and libraries. Let's enumerate what else is required — none of which are insurmountable.

Native code has to be compiled for x86 in order to be run. Aside from the regular pain of JNI there should not be too much trouble here.
The graphics pipeline of the OS needs hooked in to the host. I won't pretend to know a lot about what would be required in this area. Native code covers software rendering and hooking up OpenGL should facilitate hardware rendering.
A variety of interfaces with hardware need replicated or faked. Some have obvious native equivalents on the host like Bluetooth and networking. Some can simply be fixed to default values like the accelerometer and compass. Others can be marked as not present or just emulated.

Thankfully these are all solvable problems. Each one just needs the right person with the time and effort to tackle it. However, therein lies another problem.

The single greatest hurdle to the creation of the simulator we deserve is the time, effort, and desire required to build and maintain it. Sorry, tools team! I'm told they're hiring.

Follow the discussion on Google+ and Reddit.

https://jakewharton.com/android-needs-a-simulator

Hello Picasso 2.3

May 30, 2014 Updated May 30, 2014

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/hello-picasso-2-3.

https://jakewharton.com/hello-picasso-2.3

Dynamic Images with Thumbor

Jan 20, 2014 Updated Jan 20, 2014

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/dynamic-images-with-thumbor.

https://jakewharton.com/dynamic-images-with-thumbor

Enhance Your Application Using Picasso

May 14, 2013 Updated May 14, 2013

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/enhance-your-application-using-picasso.

https://jakewharton.com/enhance-your-application-using-picasso

Easy HTTP Requests with Retrofit

May 13, 2013 Updated May 13, 2013

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/easy-http-requests-with-retrofit.

https://jakewharton.com/easy-http-requests-with-retrofit

MimeCraft, JavaWriter, and ProtoParser

May 8, 2013 Updated May 8, 2013

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/mimecraft-javawriter-and-protoparser.

https://jakewharton.com/mimecraft-javawriter-and-protoparser

Seven Days of Open Source

May 6, 2013 Updated May 6, 2013

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/seven-days-of-open-source.

https://jakewharton.com/seven-days-of-open-source

The Resurrection of Testing for Android

Apr 3, 2013 Updated Apr 3, 2013

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/the-resurrection-of-testing-for-android.

https://jakewharton.com/the-resurrection-of-testing-for-android

Deprecated From Inception

Oct 1, 2012 Updated Oct 1, 2012

Stop using ActionBarSherlock! Well... eventually.

If you are writing an application right now with a minSdkVersion lower than 14 you should be using it. I am not saying this just as the developer of the library, but as someone who likes to minimize wasted time.

Writing applications is hard and nobody wants to spend time writing boilerplate code. It sucks valuable engineering time away from what is most important in your app: the content. Throw in ActionBarSherlock and a slew of other open source libraries to bootstrap your development so you can make the best app possible.

In order to have pleasant longevity, libraries you integrate into your application should be as small and modular as possible. Inevitably new libraries will be released and some will have features that you wish to integrate at the cost of others. If a library is small, these replacements can be seamless (and often times drop-in).

ActionBarSherlock is very much the opposite of a small, modular library. When you decide to include ActionBarSherlock in your app you are committing to using it for the foreseeable future. Or are you?

By design, ActionBarSherlock uses the native action bar API and theming. Yes you have to use types that exist in different packages and duplicate theme attributes but the names of classes, methods, and attributes mirror their native counterparts exactly. In fact, you have to go out of your way in order to find things that are not API-compatible with the native action bar.

Some day, market saturation and the demands of your application will necessitate updating your minSdkVersion to 14 (or higher).

If you are using a different action bar library or you have rolled your own then you will either have zero work ahead of you (which means you will still be using your crappy implementation) or you have a lot of work ahead of you (migrating to use the native components). Both of these situations are not ideal and I can think of handfuls of massively popular apps who will someday find themselves in this predicament.

On the other hand, if you happen to be using ActionBarSherlock, all you have to do is switch from using the custom types back to the native types. This switch is so easy that 99% of it can be scripted. It boils down to changing a few imports, replacing calls to getSupportActionBar with getActionBar, and using a Holo parent theme rather than a Sherlock one.
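To give a feel for how scriptable the switch is, here is a minimal sketch of the textual rewrite in plain Java. The class `SherlockToNative` is hypothetical and the replacement table is deliberately incomplete; a real migration would cover more types and the theme change in XML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical migration helper: rewrites Sherlock references to native ones. */
class SherlockToNative {
    // Illustrative subset only; insertion order is kept so imports are handled first.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("import com.actionbarsherlock.app.SherlockActivity;",
                "import android.app.Activity;");
        REPLACEMENTS.put("import com.actionbarsherlock.app.ActionBar;",
                "import android.app.ActionBar;");
        REPLACEMENTS.put("import com.actionbarsherlock.view.", "import android.view.");
        REPLACEMENTS.put("extends SherlockActivity", "extends Activity");
        REPLACEMENTS.put("getSupportActionBar()", "getActionBar()");
        REPLACEMENTS.put("getSupportMenuInflater()", "getMenuInflater()");
    }

    /** Applies every replacement to the given Java source text. */
    static String migrate(String source) {
        String result = source;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Because the class, method, and attribute names mirror the native ones, plain string replacement gets most of the way there. The remaining work is swapping the Sherlock parent theme for a Holo one in your styles.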

This is the entire purpose of the library. It has been designed for this use case specifically. It has been deprecated from inception with the intention that you could some day throw it away.

The process also works in reverse, in fact. At AnDevCon III I gave a talk where I migrated multiple code examples written for the native action bar to work with ActionBarSherlock. The entire process took less than a few minutes and the result was code that only ran on ~8% of devices (at the time) now being able to be run on about 91% of devices.

Google has announced that they are working on a library that will backport a subset of the action bar. I was disappointed in this announcement for two reasons: it was announced 12 months too late (and remains yet unreleased) and it is wasting engineering time of a talented developer. There are so many gaping holes in Android development where their time could be better spent. If they wanted to come out with a backport then it should have been done at the same time as ICS dropped.

Let us hope they honor the notion of deprecation from inception as well, otherwise they will only make things worse.

Follow the discussion on Google+ and Reddit.

https://jakewharton.com/deprecated-from-inception

The Android Build System Is Broken

Jul 22, 2012 Updated Jul 22, 2012

The blessed souls in the Android world with regard to compilation are Eclipse and ant. Both serve admirably if you have a small-to-medium-sized app. You might even pull in a library project and a jar or two. This works great. As the complexity grows beyond this, however, both of these players break down and you are left to fend for yourself.

Advanced configurations which are designed to make it more efficient for you, the developer, end up causing unnecessary strain because the build system cannot handle it. When you have multiple modules for production and development versions of your app so that both can be installed at once and library projects that depend on library projects that depend on library projects—all of which have different overlapping jar dependencies—you are on your own.

Why is this? Well it is mostly because I have been lying to you. Android does not have a build system.

What Android has is a scripting language that has been shoved unceremoniously into XML, a default configuration that attempts to cover all of your use cases, and an IDE whose configuration attempts to mirror the scripting language configuration but has only marginal integration.

This is awful.

The Android community should settle for nothing short of the following:

Dependency management - You should never have to copy a jar or library project into your tree. Version number differences should automatically be resolved. Transitive dependencies should be recursively pulled in.
Build order - Multi-module builds are a directed, acyclic graph and the order of their compilation can be determined by the build system. If you add a dependency between two modules the order should automatically change to accommodate.
Non-Android projects - Modules should not have to be Android library projects to be part of the build path. Pure Java projects (and anything that compiles to class files) must be supported.
Seamless IDE integration - Changes to configurations should be reflected in both command-line builds and IDE builds without any additional effort.
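The build-order requirement is a standard topological sort. Below is a minimal sketch, assuming modules are identified by name and each maps to the names it depends on; `BuildOrder` and the module names in the usage note are hypothetical and not taken from any real build tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders modules so dependencies compile first. */
class BuildOrder {
    /** Returns the modules so that each one appears after everything it depends on. */
    static List<String> resolve(Map<String, List<String>> dependencies) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : dependencies.keySet()) {
            visit(module, dependencies, done, inProgress, order);
        }
        return order;
    }

    // Depth-first visit: emit a module only after all of its dependencies.
    private static void visit(String module, Map<String, List<String>> dependencies,
            Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            // Seeing a module again while it is still being visited means a cycle.
            throw new IllegalStateException("Dependency cycle at " + module);
        }
        List<String> deps = dependencies.getOrDefault(module, Collections.<String>emptyList());
        for (String dep : deps) {
            visit(dep, dependencies, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }
}
```

Given an `app` module that depends on `lib-ui` and `lib-core`, with `lib-ui` also depending on `lib-core`, the result places `lib-core` before `lib-ui` and both before `app`. Adding a new edge changes the order with no manual bookkeeping, which is the behavior the list above asks for.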

Could all of this be accomplished with ant and Eclipse? Maybe. Should it be attempted? Absolutely not.

I use maven and IntelliJ IDEA for all of my projects and while it solves all of the requirements listed above, it does not feel perfect. At Square we are currently using ant (with a lot of customization) and IntelliJ IDEA. I think almost everyone on the team would agree that it feels far from perfect but it works well enough. Results are hard to argue with but a resounding endorsement this is not.

A build system should empower, not constrain. It should enable, not restrict. It should be dynamic, not rigid.

The bottom line is that whether you use maven, sbt, or gradle we all lose because Google is advocating and supporting ant and the Eclipse plugin.

We finally have an operating system that has been refined at an amazing level of detail. We have tooling around developing and debugging applications to an unparalleled depth. We deserve a build system with the same attention to detail.

Follow discussion on Google+ or Reddit

Update: Be sure you check out Xavier Ducrohet's reply on the Google+ thread (approximately 17 comments down).

https://jakewharton.com/the-android-build-system-is-broken

Decoupling Android App Communication with Otto

Jul 2, 2012 Updated Jul 2, 2012

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/decoupling-android-app-communication-with-otto.

https://jakewharton.com/decoupling-android-apps-with-otto

Using ActionBarSherlock As A Base

May 15, 2012 Updated May 15, 2012

This post was published externally on Square Corner. Read it at https://developer.squareup.com/blog/using-actionbarsherlock-as-a-base.

https://jakewharton.com/using-actionbarsherlock-as-a-base

Announcing ActionBarSherlock Version 4.0

Mar 7, 2012 Updated Mar 7, 2012

It's been approximately three months since the Android 4.0 Ice Cream Sandwich source code landed on the Android Open Source Project's servers. Since then I have spent countless nights developing what I described in an earlier blog post as the "first true release" of ActionBarSherlock: version 4.0. As of 11:57:57PM PST I have finally tagged and released this new revolutionary version of the library into the wild for your consumption.

This date of release was chosen because of its historical significance within the scope of ActionBarSherlock. Exactly one year ago the first version, 1.0, was tagged and released. If you read the history lesson blog post you'll know that this version only lasted 24 hours before 2.0 came along but it still represents the very first milestone. Version 4.0 is another huge milestone so the fact that it is occurring on the same date should help convey its significance.

For those who are not aware, version 4.0 is a feature-complete backport of the Android 4.0 action bar and its supporting widgets to Android 2.x and up. This allows applications to integrate a 100% API and theme-compatible action bar with extremely little effort. Rather than worrying about interfacing with a custom third-party action bar or adding features to the ActionBarCompat sample, developers can now drop in ActionBarSherlock and focus on the most important part of their app, the content!

In the next 48 hours I plan to fill out the release a bit with a proper migration guide for applications coming from the v3.x branch along with some minor updates to the website with new screenshots. Savvy developers should already have all they need, however. The library is available for download on the website along with its sample applications--the source code to which is included in the download.

For those who would like to stay up-to-date with the library in a more automated fashion you can download its samples from the newly re-branded Play Store:

I would like to issue a special thank you to all the developers with whom I have worked closely on the development of this release and for all of the detailed feedback, bug reporting, and (my personal favorite) pull requests. Some of these developers have even already released applications which are using early versions of the library and I urge you to show them your support as well:

If you release an application that uses ActionBarSherlock version 4 I would love to hear about it. Please contact me via Twitter, Google+, or email.

Quick follow up: It should be noted that some bugs still exist and may rear their heads depending on how you use the library. The library is under constant development and I will do my best to get updates and fixes out ASAP. The library is certainly more stable than not and I could not in good conscience allow anyone to develop using v3.5.x any longer. I'm hoping that the increased exposure that should come with the final release will help attract more developer attention which should aid in resolving bugs more quickly.

Please use the Google group for all library-related discussions @ abs.io/forum

Also, pull requests save lives so send them!

https://jakewharton.com/actionbarsherlock-four-point-oh

Advanced Pre-Honeycomb Animation with NineOldAndroids

Jan 18, 2012 Updated Jan 18, 2012

The lovely new animation framework in Android 3.0 came with some additional methods on the View class to allow for transformations such as translation, scale, rotation, and alpha. The NineOldAndroids library allows for the use of this API on any platform but is limited only to modifying values for which methods exist on the running platform.

Recently I set out to solve this problem and allow for utilizing my library to animate these properties regardless of the API level. Neither an answer to the linked StackOverflow question nor a quick exchange with the animation guru Chet Haase himself seemed to produce a reliable, stable implementation for this--the recommendation always being to just use the built-in view animation.

As I was digging around in the View class I noticed that there really was no way to achieve this effect directly, even with reflection. It was only once I started poking around in how view animations are processed and executed that a rather clever solution appeared to me.

For those that are not familiar with how the view animation framework works, an animation receives a callback with a Transformation object and a time interval. Each animation then adjusts the object in order to reflect the state of the associated view at whatever time interval it is at. Since the Transformation object contains a method for setting alpha and a matrix which is applied to the canvas rendering the view we can easily achieve all of the transformations of the native methods introduced in Honeycomb.

void applyTransformation(float interpolatedTime, Transformation t) {
    //Perform transformations
}

So now that we know these transformations are possible, how best to implement them in a manner that can be used by the new animation API? To accomplish this we use a few tricks of view animation to keep things as lightweight and fast as possible (we're on the UI thread, remember).

Only one animation can be applied to a view at a time so it was obvious that a custom class extending Animation was required to apply our many transformations. Now it became a matter of synchronizing our new class with the NineOldAndroids library since it would be the one actually controlling the animation.

Instead of attempting to integrate NineOldAndroids directly in this custom class I chose to make it act only as a proxy to the alpha and the Transformation object by exposing methods to allow changing the various properties that were introduced in Honeycomb.

In order to take the native stepping of view animation out of the equation, our custom class immediately sets two properties on itself: setDuration(0) and setFillAfter(true). This effectively disables the timer internally triggering the transformation and it allows the transformations that we make to be persisted on the view after the animation has completed. In order for the latter to occur the animation is kept around so that its transformation can be applied whenever the view is invalidated. This is the behavior that we leverage in order to provide our animation.

AnimatorProxy(View view) {
    setDuration(0); //perform transformation immediately
    setFillAfter(true); //persist transformation beyond duration
    view.setAnimation(this);
    mView = view;
}

We expose our new properties as getter and setter methods that the new animation API can interact with and hold them in instance variables in our animation. Each invalidation then triggers our callback, in which we apply the newly updated values for each property, thus animating the view.

void setAlpha(float alpha) {
    mAlpha = alpha;
    mView.invalidate();
}

This works extremely well and provides fluid, multi-property animation using NineOldAndroids for the new animation API but it still requires us to use the animation proxy class for these specific properties. In order to provide a more seamless experience, we need a way to have this handled automatically.

In order to determine when this class is required we add a small check in the initialization method of ObjectAnimator. If the animation meets the following four conditions then a proxy instance is used: we are using a named property and not a Property, we are running on pre-3.0 Android, the target class is an instance of View, and the named property is one of the ones introduced in Honeycomb.

if ((mProperty == null) && AnimatorProxy.NEEDS_PROXY && (mTarget instanceof View)
        && PROXY_PROPERTIES.containsKey(mPropertyName)) {
    setProperty(PROXY_PROPERTIES.get(mPropertyName));
}

Here, PROXY_PROPERTIES is a Map which maps the required property names to special Property classes that automatically use an instance of our proxy animation class. By setting a Property instance on the animation we will essentially override the string equivalent so that reflection on the method is not attempted.
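The lookup idea can be sketched in plain Java without any Android classes. Everything below is a stand-in: `FakeProxy` and `Setter` are hypothetical substitutes for the proxy animation and the `Property` type, and this is not the NineOldAndroids source.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plain-Java model of the name-to-property lookup. */
class ProxyPropertySketch {
    /** Stand-in for the proxy animation; holds values instead of transforming a view. */
    static class FakeProxy {
        float alpha = 1f;
        float scaleX = 1f;
    }

    /** Stand-in for a typed Property: applies a value without reflection. */
    interface Setter {
        void set(FakeProxy proxy, float value);
    }

    static final Map<String, Setter> PROXY_PROPERTIES = new HashMap<>();
    static {
        PROXY_PROPERTIES.put("alpha", (proxy, value) -> proxy.alpha = value);
        PROXY_PROPERTIES.put("scaleX", (proxy, value) -> proxy.scaleX = value);
    }

    /** Returns true when the named property was handled through the proxy. */
    static boolean apply(FakeProxy proxy, String propertyName, float value) {
        Setter setter = PROXY_PROPERTIES.get(propertyName);
        if (setter == null) {
            return false; // a caller would fall back to reflective setter lookup here
        }
        setter.set(proxy, value);
        return true;
    }
}
```

The point of the map is that a known name resolves to an object that already knows how to write the value, so no reflective method lookup is attempted on the view for those names.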

Now you can enjoy advanced Honeycomb-style animation of post-Honeycomb View properties by simply changing your imports to use NineOldAndroids!

AnimatorSet set = new AnimatorSet();
set.playTogether(
    ObjectAnimator.ofFloat(myView, "rotationX", 0, 360),
    ObjectAnimator.ofFloat(myView, "rotationY", 0, 180),
    ObjectAnimator.ofFloat(myView, "rotation", 0, -90),
    ObjectAnimator.ofFloat(myView, "translationX", 0, 90),
    ObjectAnimator.ofFloat(myView, "translationY", 0, 90),
    ObjectAnimator.ofFloat(myView, "scaleX", 1, 1.5f),
    ObjectAnimator.ofFloat(myView, "scaleY", 1, 0.5f),
    ObjectAnimator.ofFloat(myView, "alpha", 1, 0.25f, 1)
);
set.setDuration(5 * 1000).start();

Download NineOldAndroids 2.0.0 from nineoldandroids.com and check it out on GitHub.

https://jakewharton.com/advanced-pre-honeycomb-animation

Something Beta This Way Comes!

Jan 2, 2012 Updated Jan 2, 2012

Some implementation details…

There are 6 base activities, 4 of which are in the core library and one in each of two plugins.

Core: SherlockActivity, SherlockPreferenceActivity, SherlockListActivity, and SherlockExpandableListActivity - Former two should be obvious, latter two shouldn’t really be used anymore. Try fragments.
Compat-Lib: FragmentActivity - Modified version of the official library to support having an action bar by default.
Maps: SherlockMapActivity - Since referencing the base class MapActivity requires compiling with the Google APIs this is in its own plugin. Remember: This does not work like FragmentMapActivity from v3.x.

Everything has been moved into the com.actionbarsherlock.* package tree. Stay away from com.actionbarsherlock.internal.*. Check your imports.

Again, as with v3.x, the default options menu methods have been marked final in the base activities so that you cannot use them erroneously. A majority of the supposed bugs that I receive have to do with incorrect imports. Check your imports and check the samples before filing a bug or emailing the mailing list.

Don’t file a bug or suggestion on anything related to the compat-lib plugin that does not directly relate to this library. As nice as it was to upstream some bugfixes I am not doing that anymore. File them on b.android.com.

SherlockPreferenceActivity does not have fragment or loader support like v3.x did. No I will not enable it. Yes I’ll look at porting PreferenceFragment in the future. Don’t ask for an ETA.

There are bugs and missing features. Check abs.io/4 before reporting anything! Check the samples. If a sample is missing, take a few minutes to write it.

Pay attention for new betas. Check the website often. You can even follow the site’s repository on GitHub for better notification: github.com/JakeWharton/beta.abs.io.

Want fixes sooner? Check the 4.0-wip branch. You’ll have to build the plugins yourself though if changes were made. Please don’t ask me how. Maven, SDK deployer, and mvn clean package.

Try everything. Write a new app, port an old app, write more samples. Do something. Don’t complain if you jump on the final release and find bugs without having tried the betas.

Use Theme.Sherlock. Use Theme.Sherlock!

There is no light theme… yet. Use black for testing. Don’t complain and don’t bother implementing it. A light and a light/dark action bar theme will be present in the release candidate.

Things are broken. Most is working. Try before you buy. All sales are final.

I’ll leave you with a semi-related, partially-humorous quote from Equilibrium (which is actually from W. B. Yeats):

But I, being poor, have only my dreams. I have spread my dreams under your feet. Tread softly because you tread on my dreams.

How to report bugs:

Fix it yourself and send a pull request…

Ok, you don’t have to do that but I’ll seriously love you for it.

Create a new issue on GitHub, include as much description, code, and images as humanly possible to make your problem apparent to someone who has never done Android. It’s not that I don’t understand your problem, it’s that I don’t want to have to spend extra time deciphering it or have any doubt about what you think the problem is.

I’ll do my best to thank you no matter how severe of a bug you find :)

https://jakewharton.com/something-beta-this-way-comes

ActionBarSherlock - A Love Story (Part 3)

Jan 1, 2012 Updated Jan 1, 2012

I am talking, of course, about version 3 and version 4, respectively. And I’m also lying a bit because I won’t just be abandoning the 3.x users either. I’ll give you a two-month deprecation window from version 4’s release. Because…

ActionBarSherlock v4 is coming and it is awesome.

Now I realize that I am a bit biased, but let me explain how this version is the first version that I think I will be truly proud of.

No more shuffling between native and custom implementations.

Google’s support library operates in this way in that it makes no attempt to use any native implementations even if they exist. It is far easier and more stable to keep all of the functionality in the library. Plus, the Android 4.0 action bar has been designed to accommodate every conceivable screen size that the platform can run on so why should we continue bothering to switch to the native implementation?

Additionally, this change became more and more of an apparent need rather than a choice due to changes in Android 4.0’s MenuItem interface.

The support library classes are no longer included in the core of the library.

Though somewhat ironic based on the last point, the decision to allow the library to stand alone was made in order to accommodate developers who were uncomfortable using a custom built version of the support library (or who didn’t use it at all).

A version of the support library will be provided as a plugin .jar that has been modified to add ActionBarSherlock support. The changes to the library will be kept at a minimum and will not include any unrelated fixes. File bugs on b.android.com for that, please.

WARNING: This means that if you are using FragmentMapActivity or using fragments in SherlockPreferenceActivity you WILL have to change your implementation or create your own versions of these base classes. I will no longer be maintaining support for these.

Extending from a custom base activity is no longer required (but still recommended).

Similar to how ActionBarSherlock v1 and v2 operated, you can perform static attachment of the action bar to your activities. This allows for the use of alternate base activities such as those provided by other third-party libraries (e.g., RoboGuice).
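
To make the idea of static attachment concrete, here is a minimal sketch in plain Java rather than against the real library. Every name in it (Activity, InjectedActivity, ActionBarHelper, attach) is a hypothetical stand-in and not the actual v4 API; the only point is that the action bar logic is composed into the activity instead of being inherited, so the activity stays free to extend a third-party base class.

```java
// Hypothetical stand-in for the platform's base activity class.
class Activity {
    String name() { return getClass().getSimpleName(); }
}

// Hypothetical stand-in for a base activity supplied by another library.
class InjectedActivity extends Activity { }

// Hypothetical helper that owns the action bar logic for one activity.
final class ActionBarHelper {
    private final Activity host;
    private String title = "";

    private ActionBarHelper(Activity host) { this.host = host; }

    // Static attachment: works for any Activity, whatever its superclass is.
    static ActionBarHelper attach(Activity host) {
        return new ActionBarHelper(host);
    }

    void setTitle(String title) { this.title = title; }

    String describe() { return host.name() + " [" + title + "]"; }
}

// The activity keeps its third-party superclass and composes the helper.
class VmListActivity extends InjectedActivity {
    final ActionBarHelper actionBar = ActionBarHelper.attach(this);
}
```

With this shape, a base activity shipped by the library could hold the same helper and forward to it, which is one way both routes could end up exposing the same API.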

The added side-effect of this is that all of the interaction logic has been placed within this single class which is also the one used by the base activities. This means that whether you do use a base activity or choose to interact with the static attachment you are afforded the full API.

Fully mirrored theming support to mimic the native action bar.

Forget the ‘ab’-prefixed attributes of v3.x; v4 now allows for defining proper styles for the action bar, action mode, and various other sub-components of the action views.

It is the Ice Cream Sandwich action bar!

…but you probably knew that already.

Split action bar, action modes, action providers, condensed tab navigation, and so much more!

I have been working on this for nearly 8 weeks now so it’s easy for me to get excited. Starting tomorrow the version 4 beta will be officially announced and detailed in a much more technical manner so that you can begin testing and hopefully join in the excitement.

As it stands now there are still large bugs and “bugs” with version 4. You can find them under milestone 4.0.0 on the GitHub issue tracker. As always, code contributions are welcomed and encouraged.

There is no timeline yet for the final release. There will be one or two release candidates beforehand, during which I will be working with a few devs on real implementations to determine any problems that exist. If everything goes smoothly the final release will not be far behind that.

Thank you everyone for your support thus far. Happy new year to all.

https://jakewharton.com/actionbarsherlock-a-love-story-part-3

ActionBarSherlock - A Love Story (Part 2)

Dec 19, 2011 Updated Dec 19, 2011

Despite this, however, most users probably have never used the first or second versions, let alone the “lost” third version which was scrapped just hours from release. In future posts we’ll look forward to where version four will take us. For this post, however, we’ll be taking a quick look back at the origins of the library.

Like most libraries, ActionBarSherlock was birthed out of personal necessity. With my recent migration of the majority of our servers at work to VMware and the vSphere platform in January 2011, I wanted an app which allowed me to view essential VM information and perform quick vMotions from my phone. At this time there were only two apps of exceptionally inferior quality on the Android Market which offered such functionality, neither being open source. It’s easy to guess what happened next: I began down the path of writing my own.

Writing an app which interfaces with vCenter Server (essentially vSphere’s coordination hub) is no trivial task. I spent a month porting the Java SDK to run on Android. During this process of ripping and replacing I ended up writing my own SOAP client which married aggressive caching with lazy loading objects. During this I discovered a lot of fun little-known facts about Android and Dalvik (Did you know that when reflecting on a class’ properties Android will return them in alphabetical order while desktop Java returns them in declaration order?). After about one month I had a mostly-working API wrapper which led to the next logical step, creating the application shell.
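
The field-order surprise is worth a small illustration. The Java documentation for Class.getDeclaredFields() says the returned array is not in any particular order, so code that maps fields to a wire format is safer sorting them explicitly. The sketch below assumes a made-up VmInfo type with made-up field names; it is not taken from the SOAP client described above.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FieldOrder {
    // Hypothetical payload type; the field names are invented for this sketch.
    static class VmInfo {
        String name;
        int cpuCount;
        long memoryMb;
    }

    // Returns the declared field names in an explicitly sorted, stable order
    // instead of relying on whatever order the runtime happens to return.
    static List<String> sortedFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            // Skip compiler- or tool-generated fields.
            if (!field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Calling FieldOrder.sortedFieldNames(FieldOrder.VmInfo.class) gives the same list on any runtime, whichever order reflection reports the fields in.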

At that time, and much to my amusement today, I thought GreenDroid to be the end-all, be-all library for implementing the common user interface patterns easily. If you’re following the timeline in your head, you might have already guessed that the Honeycomb SDK had landed this very same week and I chose to be forward thinking and support both phones and tablets with a single APK. This would, however, necessitate a bit of work since we were pre-Android Compatibility Library. I set off to proxy the action bar APIs of both GreenDroid and Honeycomb behind a single custom API.

The first version was completed in one day. It proxied only the methods which I needed and required you implement two static inner-classes to handle the pre- and post-Honeycomb configurations. I discovered that both GreenDroid and Android used the getActionBar() method name which required a last-minute switch to Hameno’s fork where it was changed to getGDActionBar(). I had also discovered the compatibility library had launched and hastily slapped a mention of compatibility in the README file.

If you followed the link above you’ll notice that the entire library was a single class file and a rudimentary sample—a far cry from what it’s grown into today. Not very impressive, comprehensive, or even useful. I still dealt with menu inflation and action item creation manually!

The second version was released the next day, a complete rewrite. I came to realize that GreenDroid just wasn’t going to cut it despite its numerous beautiful widgets, so I added better support for implementing your own “pre-Honeycomb” handler. Android-ActionBar support was added, my new library of choice for the vSphere client app (you almost forgot, didn’t you?).

Over the next few weeks I managed to get simple screens of the application working such as listing VMs, viewing their information, and browsing things like the datacenter objects and datastores. As more screens were introduced, a more dynamic action bar was required and as such I implemented more and more of the native ActionBar API. Two weeks after 2.0.0, I released version 2.1.0 which represented the first step towards where the library exists today. The compatibility library became a required dependency, Maven was adopted as the build and release system, and APIs such as list navigation, menu inflation, and Fragment support were added.

At this point there were a handful of users who knew of the library and likely even fewer using it, progress on the VM app had stalled because of issues in getting my custom SOAP client working properly, and I had become aware of just how limiting my custom API really was. The following code snippet was taken from one of the samples of version 2.1.1:

ActionBarSherlock.from(this)
    .with(savedInstanceState)
    .layout(R.layout.activity_hello)
    .menu(R.menu.hello)
    .homeAsUp(true)
    .title(R.string.hello)
    .handleCustom(ActionBarForAndroidActionBar.Handler.class)
    .attach();

While there was nothing fundamentally wrong with this API in the context of my app, requests were coming in for the support of other action bar features. Constantly implementing methods for every item was clearly going to create a mess of a library. I chose once again to embark on a rewrite to afford a more flexible model.

Over the next month I reworked the entire library to be based on interfaces which each provided a single action bar feature. This way, you could drop in whatever third-party action bar library you wanted and only implement the feature interfaces in its handler that the library supported. On May 12th, 2011 the code was feature complete and ready for a 3.0.0 release. This tree, 83283d9f, is the “lost” 3.0.
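
The one-interface-per-feature design can be sketched roughly as follows. The interface and class names (HasTitle, HasHomeAsUp, TitleOnlyHandler, FeatureDispatcher) are hypothetical and are not the names used in the lost 3.0 tree; the sketch only shows how a handler opts in to the features its underlying action bar library supports.

```java
// Each interface covers exactly one action bar feature (hypothetical names).
interface HasTitle {
    void setTitle(String title);
}

interface HasHomeAsUp {
    void setHomeAsUp(boolean enabled);
}

// A handler for some third-party action bar that only supports titles.
class TitleOnlyHandler implements HasTitle {
    String title;

    @Override
    public void setTitle(String title) {
        this.title = title;
    }
}

class FeatureDispatcher {
    // Applies the title only when the handler implements that feature.
    // Returns whether the feature was applied.
    static boolean applyTitle(Object handler, String title) {
        if (handler instanceof HasTitle) {
            ((HasTitle) handler).setTitle(title);
            return true;
        }
        return false;
    }

    // Same check for the home-as-up feature.
    static boolean applyHomeAsUp(Object handler, boolean enabled) {
        if (handler instanceof HasHomeAsUp) {
            ((HasHomeAsUp) handler).setHomeAsUp(enabled);
            return true;
        }
        return false;
    }
}
```

A handler that implements only HasTitle silently skips home-as-up, which keeps each handler small but also means every new action bar feature needs another interface and another dispatch method.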

At some point during that evening (I have no recollection of the exact moment) I had an epiphany.

If I am providing an API for various action bar methods through a custom class, why am I not just providing the full API through a getSupportActionBar() method exactly like the compatibility library that I’m already dependent on?

It turns out that adapting my code to provide the full action bar API was the easy part and didn’t take long. The majority of the next month was spent working with Johan Nilsson to expand Android-ActionBar’s feature set to very nearly match that of Honeycomb.

Finally, on June 5th, 2011, version 3.0.0 was released which fully internalized the Android-ActionBar sources for a seamless mirroring of the native API on Android 1.6 and newer.

Releases on the 3.x branch came steadily over the next few months culminating in the release of 3.5.0 last night. If you’re reading this, you likely came aboard the ActionBarSherlock ship sometime during this time and are probably reasonably familiar with how it operates. Through the support and contributions of the community it’s become quite a useful library, even solving problems such as supporting MapViews in fragments and preference activities.

I want to especially thank Cyril Mottier of GreenDroid fame and Johan Nilsson of Android-ActionBar fame. Despite now being my competition (not really), without their efforts the library would likely not exist. Thank you to all of the users who sent in pull requests. Thank you Chris Banes for fleshing out device support and providing tiny bug fixes and enhancements.

Thank you to all of the implementations. SeriesGuide, RateBeer, FriendCaster, Minus, Cargo Decoder, Folder Organizer, mAnalytics, Traktoid, CrossFit Travel, BubbleUPnP, Bird Bar, and the many, many more! (I’m working on a webapp to organize all of the implementations)

Now, for the second time, did you remember this blog post was about a vSphere application? The explosion of interest in ActionBarSherlock as well as some of my other projects overwhelmed my free time and, coupled with my unstable SOAP client, meant I was never able to make any more progress. In fact, I have never written an application using any of my own libraries. …yet!

This week we’re not pouring one out, but rather I raise my glass to all of you, the ActionBarSherlock community. Cheers! See you at 4.0.

In the next installment of this series I will begin to talk about where version 4.0 will take us and how it is a return to its roots.

https://jakewharton.com/actionbarsherlock-a-love-story-part-2

ActionBarSherlock - A Love Story (Part 1)

Dec 1, 2011 Updated Dec 1, 2011

The library has been out for nearly 9 months already and has seen 22 releases across 3 major versions. Fast approaching is the next major release, version 4.0. This will bring the full functionality of Ice Cream Sandwich’s action bar to all relevant APIs and will be what I consider the first true release. I’m going to be writing a series of posts which talk about various development decisions and the reasoning behind them as well as a bit of the history of the library. The series will culminate in the formal announcement and release of v4.0 some time in the near future. So strap in and hold on, with this first post I’m just going to tear off the band-aid…

I am officially dropping 1.6 support from ActionBarSherlock 4.0. This post originally was going to be a call for arguments against this, but with the release of the latest platform distributions I’m rounding 1.6’s share down to 1% which was my mental event horizon for its support.

This will likely upset a few people. I know of a lot of apps leveraging my library that still support 1.6. I even have an implementer who supports 1.5 because he has an extremely niche market with a bunch of users who own the Motorola i1!

I’ve said time and time again to developers that as long as the compatibility library supported 1.6 then so would I. For ActionBarSherlock 3.x this was an important thing because the library was so tightly integrated with the compat lib that it actually became a near drop-in replacement for it since its classes were included.

This will all change with ABS 4.0. I have chosen to take the library in a completely separate direction. While the next post will talk about the trials and tribulations of dealing with having a library which extends (and effectively replaces) the official compat lib, all we need to know for now is that the core library will now have zero dependencies.

Since the library is no longer immediately dependent on the compatibility library my justification for forcing support for Android 1.6 is no longer there. Now this is not to say that ActionBarSherlock 4.0 won’t be supporting tight integration with the compat lib—it most certainly will. I have instead simplified the main component of the library, the action bar, to stand alone.

As such, we can now focus solely on the betterment of the functionality that we’ve all come for. And with that comes waving a fond farewell to the version which I consider to be Android’s true “1.0”, API level 4.

To those that support Android 1.6 I commend you. It is in every sense of the word a bastard version. Development on Android was exploding and 1.5’s shortcomings were widely documented (I’ll spare the links here) so 1.6 brought a bit of fresh air. In my opinion it was the first version of the platform that had the future in mind. Unfortunately for it, so were the next two API levels, the now defunct 2.0 and 2.0.1. Stability finally started to arrive in API 7, Android 2.1, which is to be the new minimum target of the library. This period of unrest saw the death of my primary argument against supporting 1.6, its classloader.

Android’s 1.6 classloader is an over-eager know-it-all. It seeks out and checks every method call present in your loaded (key word!) classes. This means that even if you’ve blocked out a section in a check against Build.VERSION.SDK_INT, Android 1.6 will check every method inside. This can be easily remedied by surrounding these calls in concise static inner-classes which is annoying, but doable. Problems arise when you need to call the superclass’s version of a method you have implicitly overridden. You’re out of luck.
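
The static inner-class workaround looks roughly like the sketch below. It is plain Java that runs on a desktop JVM, so it cannot reproduce Android 1.6’s method checking; it only illustrates the underlying property the workaround relies on, namely that a nested class is not initialized until it is first used. The names VersionGuard, NewApi, sdkInt and loaded are assumptions made for this example, with sdkInt standing in for Build.VERSION.SDK_INT.

```java
import java.util.ArrayList;
import java.util.List;

class VersionGuard {
    // Stand-in for Build.VERSION.SDK_INT (hypothetical; API level 4 is Android 1.6).
    static int sdkInt = 4;

    // Records which nested classes have been initialized, to make laziness visible.
    static final List<String> loaded = new ArrayList<>();

    // Every call to a newer platform method would live in here, so the outer
    // class never references those methods directly.
    static final class NewApi {
        static {
            loaded.add("NewApi");
        }

        static String describe() {
            return "new API path";
        }
    }

    static String describe() {
        // API level 11 is Honeycomb. NewApi is only touched on this branch.
        if (sdkInt >= 11) {
            return NewApi.describe();
        }
        return "legacy path";
    }
}
```

With sdkInt at 4, describe() returns the legacy result and the loaded list stays empty; raising sdkInt to 11 takes the other branch and initializes NewApi on first use. Note that this isolates ordinary method calls only. It does nothing for the super-call case described above, since a call to the superclass has to be written inside the overriding class itself.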

It turns out that the latter comes into play a lot in the ICS version of the action bar. Things such as accessibility and configuration changes on views are unable to call up to their superclass implementations. While probably not a show-stopper, couple this with constantly having to battle the former situation and it becomes a seemingly never-ending battle of abstraction to these static classes.

My choice to drop Android 1.6 mostly comes out of wanting to be able to more rapidly develop and ship the library. I charge you, the reader (and hopefully ActionBarSherlock user), with the responsibility of determining whether maintaining support for 1.6 is feasible and also implementing it if you deem it so.

I’m sure some users will still be angered by this decision despite the above explanation. Having developed, supported, and maintained this library for 9 months completely in my spare time and never charging a penny, I say, “Show me the money.” If you want to sponsor development of 1.6 support I will make every effort possible towards it. However, I’d rather you forked it, did it yourself, and sent a pull request… or better yet, just let it die.

Pour one out for Android 1.6 and all the users of it.

In the next installment of this series I will be talking about the history of ActionBarSherlock and how version 4.0 is a return to its roots.

https://jakewharton.com/actionbarsherlock-a-love-story-part-1

https://jakewharton.com/atom.xml

Posts