Gabriel Garcia Torrico

Jan 25, 2026

Looking back at 16 years of GitHub gists.

Show full content

About #

In this post, I look back at several years of recreational programming that I have saved as GitHub Gists.

I am also doing this to bring some visibility to the different experiments I have written over the years and to have useful links to share with people when I say, "Oh, I remember doing something like this at some point!"

Let's begin.

2007 #

I started programming around 2005-2006, exclusively for the web. I first got started with HTML and CSS and eventually moved on to learn PHP and SQL. I built my own website, my own blog, and a few other webapps. Unfortunately, all of that code is lost.

So 2007 is our starting point.

San Andreas Multiplayer Scripts #

I almost forgot about this! I used to play a mod for GTA: San Andreas that enabled multiplayer gameplay. The mod used Pawn for server scripts.

For example, I wrote a /givemygunto command that gives your current weapon to another player. I remember barely having a grasp of what was going on in terms of the language and programming in general; I just copied and pasted my way to my goals.

I am pretty sure this was the first non-web programming I ever did.

2008 # PunBB extensions #

I used to be an active member of the PunBB community. PunBB was a PHP forum software that was lightweight and fast, compared to alternatives like phpBB.

I wrote several extensions for it: Sitemap Generator, XML Backup, Improved Smilies System, Gravatar support, Genders support, Akismet Spam Protection, and some others.

The actual code for these extensions is no longer available, unfortunately.

Blog Themes #

I got a bit into web design around this time. I find it funny that I kept trying to get into blogging by writing my own blog software and even designing my own blog themes. All of this to no avail; I never actually wrote anything worthwhile until many years later.

Some of the themes I designed during that period: Black on White, Horizontal, Red, Blue, Blue Alt, and Simple.

2010 # codeigniter-mongodb #

This is a library that I wrote for an early version of CodeIgniter, an application framework for PHP.

Based on this line, I think this library was intended for use on OnePage, my first ever coding collaboration with another person (Buffer's Joel Gascoigne).

CSS3 Experiments #

CSS3 changed everything when it finally came about, and I was very excited about it.

I used it to replicate some desktop UIs: Windows 7 window chrome, iTunes for Windows, Pidgin for Linux, Windows 7 taskbar, Windows Live Mail desktop client.

imgur_upload.py #

A Python 2 script that uploads image files to Imgur. Apparently, I put some effort to make it cross-platform.

The gist also contains a .desktop file for easy drag-and-drop invocation from a desktop UI on Linux.

I think that I started learning Python in 2009. And I remember that in 2010, the Tornado web server/framework had become popular. I had tried to do something with it, but it was too advanced for me at the time. I was more comfortable with PHP's stateless execution mode. In this sense, simple Python scripts did fit my mental model of programming at the time.

2011 #

This was a gap year for me, before starting university.

This was also the year when I realized that PHP was not an end-game language, and so I started learning a bunch of languages at once: C, C++, C#, D, Go, Haskell, Java, Python (more of it).

I wasn't planning on learning D, but I was lucky enough to stumble upon Andrei Alexandrescu's "The D Programming Language" in a secluded section of a random bookstore in São Paulo. Looking back, this makes no sense at all. But it happened so. And I'm very glad it did, because it introduced me to a) Andrei's entertaining writing style, and b) that language design can be approachable and fun.

youtube_subs.py #

Another Python 2 script that appears to transform an RSS feed of YouTube subscription videos. I can't remember why I needed this.

I see the use of the BeautifulSoup library, which at some point had become a staple in my web scraping scripts. I found myself doing that frequently, and this library was very handy.

It's also funny that I thought that I needed to use global to read global variables. AFAIK, that's only needed for writing. Maybe I was trying to be explicit about what's local and what's global?

js-flash-audio-input #

Also not a gist, but it could have been. This seems to be a proof of concept on how to access the microphone from JavaScript before the Media Streams API became available a couple of years later.

Audio samples are captured by a Flash (ActionScript) component and fed into JavaScript via a callback. The demo then stores these samples to create a WAV file in memory and expose it to the user via a Data URL for playback in an Audio element.

minify.py #

Yet another Python 2 script/utility. This one reads a URL from the clipboard, minifies it, and writes that back to the clipboard. I also tried to make it cross-platform.

I remember URL minification being a big deal around this time, I'm not sure why. I think Twitter back then wouldn't do that automatically? I also remember Google's URL minifier coming in as a new player in this URL minification space, so I think I was just eager to implement a tool with it.

The code looks somewhat neat overall. It's very imperative, too.

kag-map-renderer #

A Python script that renders King Arthur's Gold maps.

I think this was one of my earliest ventures into graphics, so the code is very messy. But it did a decent job.

This eventually led to a collaboration with the game's creator, Michał Marcinkowski, to build a new website for the game.

2012 #

This year I started university.

For whatever reason, I barely used GitHub and gists during this period. So a lot the code that I have saved from this period lives on my Google Drive.

In particular, I wrote a lot of C++, because that's what my university's curriculum was based on. Because I was already familiar with the language, I used the spare time and energy going deeper into it. I got into template metaprogramming, decompiling a la Godbolt before that became a thing, poking at RTTI and vtable internals, and so on.

cinecenter.py #

This Python 2 script scraped the movie listings of my local cinema at the time.

I can see the usual combination of urllib2 and BeautifulSoup (version 4, this time). I also see that I structured the code in a class this time. Maybe I was getting into OOP. Though this is a very poor use of classes.

Pixelated (source) #

This is a clone of a Blackberry game by the same name. I remember playing this game on a friend's phone during a road trip. When we arrived at our location, I just had to pull out my laptop and code up the game.

2013 #

My second year of university.

static_iterator.cpp #

One of my first C++ template metaprogramming experiments. It seems that I was trying to emulate statically-defined loops through an iterators-like API. For example, the 32-bit constant 0xDEADBEEF would need 32 iterations to be turned into a std::array<bool, 32> and those were defined at compiled time.

screenshot.py #

A Python 2 script that takes a screenshot and uploads it to Imgur.

leonardo_numbers.cpp #

Computes (Leonardo numbers)[https://en.wikipedia.org/wiki/Leonardo_number] at compile time via C++ templates.

I remember having watched some Channel 9 videos by none other than Stephan T. Lavavej about template metaprogramming. I then reached out to him to ask some questions. He was kind enough to explain some concepts and techniques to me.

closure_list.js #

This appears to implement a list data structure in JavaScript using eval and dynamically-constructed variable names. The first element is stored in v, the second in vv, and so on. This is very weird! Maybe I was trying to understand closures and lexical scopes?

ruby_n_times.d #

The first appearance of D! Here I was trying to replicates Ruby's N.times construct in D using templates and UFCS.

I was probably trying to exemplify the expressiveness of D to some peer.

variadic_mergesort.cpp #

Implements mergesort with C++11 variadic templates.

Judging by the local variable named meow, this is probably a continuation of my exploration of template programming inspired by Stephan.

function_traits.cpp #

A C++ experiment that seems to extract RTTI information from each parameter of a function. Was I trying to undertand the C++ ABI? (Like array-to-pointer decay.) Or was I preparing to do some runtime inspection of types? I can't remember.

simplex.py #

I remember being taught the Simplex algorithm in a linear programming course at university. As the teacher explained it on the blackboard, I transcribed the steps to Python code. I'm pretty sure it didn't work out in the end, though!

tuple_explode.cpp #

Based on the code comment, this was also inspired by a Channel 9 video. I think the challenge was to call a function by exploding a std::tuple into a parameter pack.

My comment explains:

This is the first idea that came to mind after watching the video.

I wouldn't recommend its use, but it is one way of tackling the challenge.

devirt.cpp #

I remember this. I was trying to manually implement devirtualization. Maybe because I had heard about this compiler optimization and was trying to understand it?

2014 #

My third year of university.

count_digits.d #

A D experiment that counts the number of decimal digits of a runtime integer by building the binary search statically.

I vaguely remember this being inspired by one of Alexandrescu's talks.

descent.d, descent.hs #

I think around this time I was taking Andrew Ng's course on Machine Learning.

These two programs seem to implement one of the course's descent algorithms in D and in Haskell, respectively.

bk.py #

This is a funny one. Around the time, our local Burger King was running a campaign that allowed you to win some food vouchers by playing a football minigame. (2014 was a FIFA World Cup year, so that tracks.) You had to pick one of five positions to shoot and the goalie may or may not defend your shot. The minigame was completely luck-based.

Naturally, I wrote a Python script that played the game for you, for a number of email-password pairs.

automata-pila.hs #

A barebones stack automaton implementation in Haskell, for one of our programming language courses.

I wanted to showcase Haskell's conciseness to my teacher and peers.

pushdown.hs #

A similar thing but for pushdown automata.

This seems to make an attempt to generalize the concept with a type class and type families.

iterators.py, generators.py #

I think I was trying to explain Python's iterator and generator protocols to a peer.

The second script reads much more like a tutorial. This type of writing eventually became one my core style for explaining things, as demonstrated by some posts in this blog.

roundrobin-kernel.hs #

I do not remember this at all. It looks like a CPU process scheduling simulation in Haskell. I was probably taking an operating systems course at the time.

It looks pretty involved, though running it doesn't seem to do much. Maybe it is incomplete.

google_ascii.py #

A Python 3 script that picks a random image from a Google search, downloads it, and then outputs it as ASCII art. I remember writing this while bored during class.

2015 #

My fourth year of university.

Lambda.hs #

Possibly my first lambda calculus evaluator in Haskell. I'm actually surprised that it happened this late.

curry.cpp #

Implements function currying in C++14. I guess I still had some template programming itches to itch! I wrote a second version here: curry2.cpp.

meow.cpp #

This is a C++ type that records how many times it has been constructed, copied, moved, etc. I remember I used this to explain move semantics to someone.

translate_im.py, translate_im.d #

A toy script that translates a word from Spanish to Portuguese using Bing's API and then downloads a random image from Google using that word as a search term.

I remember writing this to help my then-girlfriend learn Portuguese.

I wrote the script in both Python and D. I think I was pleasantly surprised with D's conciseness compared to Python. Combined with rdmd (compile & run), D proved a good alternative to Python.

pptx_image_extractor.html #

A web 'script' that extracts images from a PPTX and saves them to a ZIP file.

I wrote this for my grandfather, who enjoyed sending and receiving PowerPoint presentations that contained pretty photos accompanied by music and inspirational quotes. He would manually copy the images to this computer and use them in his desktop wallpaper randomizer.

pixelated-hs, pixelated.d, pixelated.cpp #

This was an attempt at writing a solver for the game Pixelated. Despite my best attempt at the time, I think the algorithm had an exponential running time.

copiarlog.c #

It's not entirely clear what the purpose of this C program is. But I do remember having read The Linux Programming Interface and wanting to use some of the concepts I had learned.

encoderWAV.js #

Not the first time I write a WAV file encoder in JavaScript! I did that the first time with my js-flash-audio-input project in 2011. This time, though, I had access to typed arrays.

rx.speech.js #

This seems to be an RxJS wrapper for the Web Speech API. I can't remember what I used this for.

Program.dnx.il #

I was early adopter of .NET's new compiler toolchain back then. Before Roslyn was production-ready, they still relied on Mono for Linux builds. I found some issue and reported it.

cliente_sockets.c #

A toy HTTP client written in C. I remember I wrote this for an operating systems course. The HTTP server counterpart was written in Haskell, also using raw sockets.

json.hs #

I suppose this was my first attempt at using parser combinators to parse JSON in Haskell?

Temporizado.cs #

Some concurrency experiment in C#. I think I was trying to grasp C#'s memory model, based on references to 'relaxed atomic load' and such.

I remember being heavily influenced by Herb Sutter's atomic Weapons talks and also by CLR via C#.

pib.hs #

A Haskell script that appears to plot some linear regression model, mean-squared error, and a Savitzky–Golay filter. Probably did this for some simulations course.

I also remember having learned and used aivika, a simulation library for Haskell. Reading through its manual was very enlightening in terms of its use of monads and arrows.

fibs.ipynb #

Different ways of computing Fibonacci numbers in Haskell.

expr.hs #

Yet another lambda calculus in Haskell. This time, with a syntax and parser.

gene.{hs,py} #

A transcription of a genetic algorithm from a thesis print I found in my university's library. Unfortunately, I could not get it to work.

Parser.hs #

Possibly one of my first attempts at writing my own parser combinators in Haskell.

data.hs #

An early attempt at understanding Church (or Mogensen-Scott) encodings of data types in Haskell.

I think this also kickstarted my itch for initial/final encodings.

ffi.idr #

A basic demo of using the C FFI in Idris.

Apparently around this time I started playing around with Idris.

paks.{d,hs} #

I used to be an avid Arch Linux user. These scripts seem to print installed packages sorted by installation date?

2016 #

My fifth and last year of university.

This year I moved to Lausanne, Switzerland to work for a small algorithmic trading startup called Nafora. We used a lot of Python and Java there.

emscripten.php #

A PHP script that compiles C++ using emscripten and then runs the output on Node.js. What a weird combination of technologies! I am not sure what I built this for.

miaw.cpp #

A C++ interpreter for a basic stack-based language.

I think I was trying to prove to a friend that 'inventing' a language is not that hard.

test300.raw.xz.b64 #

300 seconds of 440Hz sine signal in raw 32bit IEEE Float format, xz-ipped, in base64

I have no idea what that was for. So cryptic!

async.js #

I think this tried to emulate async/await via generators in JavaScript.

ib_codes.json #

Around this time I was working with Interactive Brokers Trader Workstation API.

vortex.html #

A colleague of mine liked to draw this sort of spiral pattern on paper. So I wrote a JS script that draws the same pattern programmatically.

I decided to write it in a very functional style.

lambda_delegate.cpp #

Implements some C++/CLI helpers to be able to use C++ lambdas as CLI delegates.

I remember using C++/CLI a lot in 2012 because my university was really into C++ and WinForms. I found it really interesting how they managed to compile C++ for the CLI. But I have no idea why I was still looking into this in 2016!

lazy_primality.cpp #

It is 'lazy' in the sense that it just delegates the primality check to the factor program.

leaking.cpp #

A C++ template that purposefully leaks (doesn't call the destructor) of a given type.

simple-any.cpp #

Implements a toy version of std::any.

novice-tuple.cpp #

Implements a toy version of std::tuple.

JavaStringMemchr.java #

Uses C's memchr on a Java byte array. I guess I was experimenting with the JNI for the first time.

2017 #

This year I moved to Mendoza, Argentina to work for Eventbrite.

I wrote more Haskell code this and the next years because I started a Haskell study group at work. This is something that I have since done at every company I have worked for.

Circle.hs #

I think I had found a cool solution to some Leetcode-style problem. This gist describes my analysis.

2018 # HM.hs #

I think this implements Hindley-Milner style type inference.

EventDescriptionValidity.java #

A demo of using phantom types in Java to hold validation invariants.

Snake.hs #

Snake game in Haskell.

Quad.hs #

I think this implements Quadtree for 2D collisions in Haskell?

FunctionalParsers.java #

My first attempt at writing very functional Java, inspired by Haskell.

StackExpr.hs #

My first experiment with initial encodings and the interpretation problem in Haskell.

Vect.{hs,java} #

Length-indexed vectors in Haskell and Java.

2019 #

This year I moved to Amsterdam to work for Uber.

free_monad.py #

An attempt to build a sort of effect system in Python 3 using generators.

di.py #

A basic dependency injection container in Python 3 with type annotations.

2020 # Parser.java #

The first version of the code that eventually became my post on Declarative and composable input validation with rich errors in Java.

ExprCompiler.java #

Continuing my exploration of initial/final encodings, this time in Java.

TypedExpr.java #

A similar thing but with typed expressions.

free.py #

A continuation of the Python algebraic effects idea, which then became this blog post.

Life.hs #

Conway's Game of Life in Haskell.

Parsec.java #

A Parsec adaptation to Java.

2021 # MicroWire.java #

A toy dependency injection container in Java.

ParserClass.hs #

Further exploration of initial/final encodings in Haskell, applied to parsers.

I wasn't sure what I was really doing, so I started a thread on Haskell's Discourse and came to learn of those terms.

ParserClass.java #

I implemented the same thing in Java. This taught me about the higher kinds encoding for Java and about type equalities.

Nat.{cs,java,kt} #

Length-indexed vectors in C#, Java, and Kotlin.

hk.py #

Applying the higher kinds encoding to Python 3 with type annotations.

Sub.{cs,java} #

Just like the type equality witness above, this is a subtyping witness.

2022 # Stages.java #

An interesting idea about representing the type of each field in a structure with a unique type parameter. Then, computations can be split into 'stages', each on which accurately represents the change of structure in each field.

2024 # sql.go #

A SQL sort of DSL written in Go, which can be 'compiled' to actual SQL or can be executed into actual values.

2025 #

This year I started working for Picnic.

vscode-tunnel.php #

A PHP script that manages the background process for VS Code Remote Tunnels.

vscode-tunner.py #

The same thing but in Python 3 as a CGI script, also implementing Google authentication.

vscode-tunnel.tsx #

The same thing again but in Deno + TypeScript, also as CGI script.

refined.java #

An idea for refinement types in Java.

TypeClassSystem.java #

The very first version on my type classes implementation for Java.

This eventually became my post, Full Haskell-like Type Class resolution in Java.

rex.c #

A regular expressions DSL for C, inspired by Haskell's Parsec library. Continued in https://github.com/Garciat/parsing.c.

https://garciat.com/posts/gists-lookback/

Extensions

GHC Generics in Java via associated types

Dec 27, 2025

Extending my type class resolution library to support an approximation of associated types, so that I can build something like GHC Generics for Java.

Show full content

Background #

This post is a continuation of my previous post on Type Classes for Java.

During the implementation of type classes for Java, I had an interesting thought: what if we had GHC Generics in Java?

This could allow us to write data type-generic code without reflection. Not only could that be promising in terms of ergonomics and (in theory) efficiency, but also potentially more principled because the generic code then only has access to the structure of the data type and not the data type itself.

I parked the idea while I was still implementing the type class resolution library. But also because I just couldn't get past the challenge of how to properly encode associated types in Java.

Until now — I eventually managed to come up with a sufficient encoding, which we will explore later.

First, let's take a look at the Haskell code that we're trying to emulate.

The Goal #

class Generic a where
  type Rep a :: Type
  from :: a -> Rep a
  to :: Rep a -> a

type Rep a :: Type there is a case of associated type synonyms.

-- # (Reduced) Type Representation Constructors:

-- No-args constructor
data U1 = U1
-- Reference to type
data K1 a = K1 a
-- Product type
data a :*: b = a :*: b
-- Sum type
data a :+: b = L1 a | R1 b

-- # Example:

data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Compiler-generated:
instance Generic (Maybe a) where
  type Rep (Maybe a) = K1 a :+: (K1 (Tree a) :*: K1 (Tree a))

  from = ...
  to = ...

-- # Usage:

data JSON
  = JsonString String
  | JsonInt Int
  | JsonObject [(String, JSON)]
  | JsonArray [JSON]

class ToJSON a where
  toJSON :: a -> JSON

class GenericToJSON a where
  gToJSON :: a -> Bool

instance GenericEq U1 where
  gToJSON U1 = JsonObject []

-- This may recurse!
instance ToJSON a => GenericToJSON (K1 a) where
  gToJSON (K1 a) = toJSON a

instance (GenericToJSON a, GenericToJSON b) => GenericToJSON (a :*: b) where
  gToJSON (a :*: b) = JsonArray [gToJSON a, gToJSON b]

instance (GenericToJSON a, GenericToJSON b) => GenericToJSON (a :+: b) where
  gToJSON (L1 a) = gToJSON a
  gToJSON (R1 b) = gToJSON b

-- (Note those implementations are just toy examples.)

instance ToJSON Int where
  toJSON i = JsonInt i

instance ToJSON a => ToJSON (Tree a) where
  toJSON a = gToJSON (from a)

The Encoding #

I first had to find a way to represent Haskell's associated type synonyms in Java.

My first idea was trying something with existentials:

@TypeClass
interface Generic<A> {
  <R> R accept(Visitor<A, R> visitor);

  interface Visitor<A, R> {
    <Rep> R visit(Function<A, Rep> from, Function <Rep, A> to);
  }
}

The Visitor type says: there exists a type Rep which is isomorphic to A (via from and to).

But that is not enough. It means the right thing, but we actually need access to Rep itself so that it can be recursively pattern-matched by type class instance resolution.

I then thought, go simple:

@TypeClass
interface Generic<A, Rep> {
  Rep from(A value);

  A to(Rep rep);
}

But I need Rep to be an output of the type class, not a regular type that gets matched during resolution.

So I 'just' declared it as such:

@TypeClass
interface Generic<A, @Out Rep> {
  Rep from(A value);

  A to(Rep rep);
}

That's it. That's the encoding I went for. Everything else unfolds from this.

This is not a normal type parameter.
It is computed by resolution, not supplied by the caller.
It changes resolution from unification-only → dependency-driven inference.

The expectation is that witness resolution will match on just Generic<A, _> and when it finds a match, the Rep type argument will flow outwards in an instance declaration:

@TypeClass.Witness
static <A, Rep> SomeReturnType example(Generic<A, Rep> g, GenericToJson<Rep> json);

In this example, GenericToJson<Rep> depends on the resolution of Generic<A, Rep>, out of which we will get the actual Rep type to resolve it with.

An Example #

We can now start translating the motivating Haskell code into Java:

// Type Representation Constructors
interface TyRep {
  record U1() {}

  record K1<A>(A value) {}

  sealed interface Sum<A, B> {
    record L1<A, B>(A left) implements Sum<A, B> {}

    record R1<A, B>(B right) implements Sum<A, B> {}
  }

  record Prod<A, B>(A first, B second) {}
}

// Example data type with Generic representation
sealed interface Tree<A> {
  record Leaf<A>(A value) implements Tree<A> {}

  record Node<A>(Tree<A> left, Tree<A> right) implements Tree<A> {}

  @TypeClass.Witness
  static <A> Generic<Tree<A>, Sum<K1<A>, Prod<K1<Tree<A>>, K1<Tree<A>>>>> generic() {
    return new Generic<>() {
      // mechanic transformation code
    };
  }
}

Defining generic instances looks like this:

interface JsonValue {
  record JsonString(String value) implements JsonValue {}

  record JsonInteger(int value) implements JsonValue {}

  record JsonObject(List<Prop> props) implements JsonValue {}

  record JsonArray(List<JsonValue> values) implements JsonValue {}

  record Prop(String key, JsonValue value) {}
}

@TypeClass
interface ToJsonGeneric<Rep> {
  JsonValue toJson(Rep rep);

  @TypeClass.Witness
  static ToJsonGeneric<TyRep.U1> u1() {
    return _ -> new JsonValue.JsonObject(List.of());
  }

  @TypeClass.Witness
  static <A> ToJsonGeneric<K1<A>> k1(Lazy<ToJson<A>> toJsonA) {
    return rep -> toJsonA.get().toJson(rep.value());
  }

  @TypeClass.Witness
  static <A, B> ToJsonGeneric<Prod<A, B>> prod(
      ToJsonGeneric<A> toJsonA, ToJsonGeneric<B> toJsonB) {
    return rep ->
        new JsonValue.JsonArray(
            List.of(toJsonA.toJson(rep.first()), toJsonB.toJson(rep.second())));
  }

  @TypeClass.Witness
  static <A, B> ToJsonGeneric<Sum<A, B>> sum(ToJsonGeneric<A> toJsonA, ToJsonGeneric<B> toJsonB) {
    return rep ->
        switch (rep) {
          case L1(var value) -> toJsonA.toJson(value);
          case R1(var value) -> toJsonB.toJson(value);
        };
  }
}

And the final bit that brings it all together:

@TypeClass
interface ToJson<A> {
  JsonValue toJson(A a);

  @TypeClass.Witness
  static ToJson<Integer> toJsonInteger() {
    return JsonValue.JsonInteger::new;
  }
}

sealed interface Tree<A> {
  // ...

  // Notice here how `Rep` is an "output" of `Generic<Tree<A>, Rep>`
  // which becomes an input for `ToJsonGeneric<Rep>`
  @TypeClass.Witness
  static <A, Rep> ToJson<Tree<A>> toJson(
      Generic<Tree<A>, Rep> generic, ToJsonGeneric<Rep> toJsonGeneric) {
    return tree -> toJsonGeneric.toJson(generic.from(tree));
  }

  // ...
}

The Machinery #

I introduced @Out which annotates a type parameter in a type class. It denotes that a witness for this type class outputs a type through the respective type argument.

The API is really simple. But then we have a couple of implementation challenges to tackle:

Revamping witness resolution so that types flow out of type variables whose target parameter is annotated with @Out.
Tying the knot for recursive data types such as Tree above, so that:
- resolution terminates, and
- the resolved witness constructor plan can be reified.

Let's tackle (2) first.

First, I introduced a wrapper type that introduces laziness for witness constraints:

interface Lazy<A> {
  A get();
}

It needs to be applied, for example, when handling the K1 type representation constructor, which may or may not introduce recursion. For example:

@TypeClass
interface ToJsonGeneric<Rep> {
  JsonValue toJson(Rep rep);

  @TypeClass.Witness
  static <A> ToJsonGeneric<K1<A>> k1(Lazy<ToJson<A>> toJsonA) {
    return rep -> toJsonA.get().toJson(rep.value());
  }
}

Then, I updated the recursive resolution algorithm:

Introduced a Lazy constructor for the ParsedType ADT. This marks the request for lazy resolution by the client.
Started keeping track of previously seen resolution targets. (Similar to a DFS algorithm.)
When the resolution target is has Lazy on the head, then:
- If this the type under the Lazy constructor has been seen before, then emit a LazyLookup node.
  - In this case, witness reification is expected to keep its own cache to tie the know and use this a signal to look up in the cache.
- Otherwise, recurse into resolution as usual, but wrap the result in a LazyWrap node.
  - This is so that witness reification can correctly return an instance of Lazy<A>, even though we have a concrete object.

Now, onto tackling (1).

At a high level:

Type variables which are arguments to @Out-annotated type parameters are annotated in the ParsedType structure somehow.
After finding a single witness constructor candidate (through unification and overlapping instances reduction):
- The candidate's constraints (its argument dependencies) are substituted with the result from unification. (This is not new).
  - These resolved constraint types may contain unbound type variables.
  - Some of which are under an Out constructor. (Note: unification ignores Out nodes.)
- From each constrain type, we can derive: (provides: List<Var>, expects: List<Var>).
  - Type variables under Out constructors go in provides and the rest go under expects.
- This denotes a directed graph. (Possibly acyclic due to Java's type system; not sure.)
- Constraints can be topologically sorted, from fewest expectations to most. (Like a TreeMap<Integer, List<Constraint>>.)
- At each expectation stratum:
  - Resolve each constraint type.
  - Unify each newly-resolved constraint type with the original constraint type.
  - Merge all substitutions maps and apply them to all of the constraint types (or just the following stratum).
- By the end of this, there should be no unbound type variables.
  - If there are any, resolution fails.
- Resolution succeeds with a witness constructor match where its return type and constraint types have been fully substituted .

The Code #

I prototyped this solution in this PR, and the latest version (as of writing) of the instance resolution code is here. It reads slightly better than the prototype.

What else can we do? #

It turns out that now we can do a fair bit of type-level computation!

A fantastic inspiration for this is Alexis King's An introduction to typeclass metaprogramming.

Flattening of arbitrarily nested lists #

Based on this Haskell code:

type family ElementOf a where
  ElementOf [[a]] = ElementOf [a]
  ElementOf [a]   = a

class Flatten a where
  flatten :: a -> [ElementOf a]

instance Flatten [a] where
  flatten x = x

instance {-# OVERLAPPING #-} Flatten [a] => Flatten [[a]] where
  flatten x = flatten (concat x)

We can now write in Java:

@TypeClass
interface Flatten<A, @Out T> {
  List<T> flatten(A list);

  @TypeClass.Witness
  static <A> Flatten<List<A>, A> here() {
    return list -> list;
  }

  @TypeClass.Witness(overlap = TypeClass.Witness.Overlap.OVERLAPPING)
  static <A, R> Flatten<List<List<A>>, R> there(Flatten<List<A>, R> e) {
    return list -> list.stream().flatMap(innerList -> e.flatten(innerList).stream()).toList();
  }
}

And the usage code looks like:

Flatten<List<String>, String> e1 = witness(new Ty<>() {});

assertThat(e1.flatten(List.of("a", "b", "c")))
    .isEqualTo(List.of("a", "b", "c"));

Flatten<List<List<String>>, String> e2 = witness(new Ty<>() {});

assertThat(e2.flatten(List.of(List.of("a", "b"), List.of("c"))))
    .isEqualTo(List.of("a", "b", "c"));

Natural number addition #

void example() {
  ReifyNatAdd<S<S<Z>>, S<S<S<Z>>>> reifyAdd = witness(new Ty<>() {});

  assertThat(reifyAdd.reify()).isEqualTo(5);
}

sealed interface Nat<N extends Nat<N>> {
  record Z() implements Nat<Z> {}

  // Note that we don't store the predecessor!
  record S<N extends Nat<N>>() implements Nat<S<N>> {}
}

@TypeClass
interface ReifyNat<N extends Nat<N>> {
  int reify();

  @TypeClass.Witness
  static ReifyNat<Z> reifyZ() {
    return () -> 0;
  }

  @TypeClass.Witness
  static <N extends Nat<N>> ReifyNat<S<N>> reifyS(ReifyNat<N> rn) {
    return () -> 1 + rn.reify();
  }
}

@TypeClass
interface NatAdd<A, B, @Out C> {
  Unit trivial();

  @TypeClass.Witness
  static <B extends Nat<B>> NatAdd<Z, B, B> addZ() {
    return Unit::unit;
  }

  @TypeClass.Witness
  static <A extends Nat<A>, B extends Nat<B>, C extends Nat<C>> NatAdd<S<A>, B, S<C>> addS(
      NatAdd<A, B, C> prev) {
    return Unit::unit;
  }
}

@TypeClass
interface ReifyNatAdd<A, B> {
  int reify();

  @TypeClass.Witness
  static <A extends Nat<A>, B extends Nat<B>, C extends Nat<C>> ReifyNatAdd<A, B> reifyAddS(
      NatAdd<A, B, C> addAB, ReifyNat<C> reifyC) {
    return reifyC::reify;
  }
}

https://garciat.com/posts/java-associated-types/

Extensions

Full Haskell-like Type Class resolution in Java

Dec 6, 2025

Typeclasses, Higher-Kinded Types, and Overlapping Instances in Pure Java: A post about re-implementing a mini Haskell in Java, using reflection, generic metadata, and type unification.

Show full content

TL;DR #

This post explores how far Java’s type system can be pushed by building a small Haskell-style type class instance resolution engine entirely in plain Java.

Using only reflection, generic type metadata, and a tiny first-order unifier, we can automatically resolve type class instances (including higher-kinded ones), support overlapping instances, and derive witnesses for arbitrarily nested generic types.

It’s not intended for production, but as an experiment it reveals just how much structure Java actually preserves at runtime — and how surprisingly close it can get to type-level programming without language changes.

You can also jump straight to the full implemention here:

https://gist.github.com/Garciat/204226a528018fa7d10abb93fa51c4ca

Update: I published a GitHub repo for the project at github.com/Garciat/java-type-classes.

Intro: What are Type Classes? #

Consider:

interface Equatable<T> {
  boolean eq(T other);
}

record Point(int x, int y) implements Equatable<Point> {
  @Override
  public boolean eq(Point other) {
    return x == other.x() && y == other.y();
  }
}

That's a pretty simple OOP-style generic interface.

However:

record Dup<A>(A fst, A snd) implements Equatable<Dup<A>> {
  @Override
  public boolean eq(Dup<A> other) {
    return fst.eq(other.fst()) && snd.eq(other.snd());
    // Error: no method eq() in type A!
  }
}

We can't implement eq() for Dup without knowing the fact that A extends Equatable<A>.

So let's try that:

record Dup<A extends Equatable<A>>(A fst, A snd) implements Equatable<Dup<A>> {
  @Override
  public boolean eq(Dup<A> other) {
    return fst.eq(other.fst()) && snd.eq(other.snd());
    // OK
  }
}

Now the code compiles.

However, Dup is now only compatible with types that extend Equatable. Dup is now less capable.

We could have checked at runtime if each object is instanceof Equatable, but that doesn't work very well with generics. And it may fail at runtime.

Ideally, there would be a way to say:

Dup implements Equatable<Dup<A>> if and only if A implements Equatable<A>.

Also, if you think about it, it is a bit strange that the definition of equality for a type A depends on the object instance. As in, eq() is an instance method.

It would make more sense if equality depended only on the type A itself. Right?

That's where type clases come in:

interface Eq<A> {
  boolean eq(A x, A y);
}

record Point(int x, int y) {
  public static Eq<Point> eq() {
    return (p1, p2) ->
        p1.x() == p2.x()
        && p1.y() == p2.y();
  }
}

record Dup<A>(A fst, A snd) {
  public static <A> Eq<Dup<A>> eq(Eq<A> eqA) {
    return (d1, d2) ->
        eqA.eq(d1.fst(), d2.fst())
        && eqA.eq(d1.snd(), d2.snd());
  }
}

Let's unpack this:

Eq<A> now compares two values of type A.
- This indicates that we are no longer expecting the implicit this parameter to be relevant in equality.
Point no longer implements the interface.
- Now it provides a static constructor (factory) for Eq<Point>.
- We call this a witness that Point conforms to Eq.
Dup also does not implement the interface.
- Its static constructor for Eq<Dup<A>> has a parameter of type Eq<A>.
  - This indicates the dependency that we were looking for:
  - In order to prove Eq<Dup<A>>, we must first prove Eq<A>.
- It uses the witness Eq<A> eqA, which contains boolean eq(A, A), to implement its own intended behavior.

The code is rather simple, but it marks a dramatic change of perspective:

Equality now belongs to the type definition and not to an object instance.

So, what are type classes?

Type classes can be seen as a pattern that allows us to model shared behavior in a way that puts types first AND is entirely compositional.

Now, in the above example, how do we actually use Dup's eq()?

Eq<Dup<Point>> pointDupEq = Dup.eq(Point.eq());

pointDupEq.eq(new Point(1, 1), new Point(1, 1));
// Returns: true

We must first construct the witness Eq<Dup<Point>> by chaining calls to the respective Eq witness constructors of Dup and Point.

Then, we use this witness to access/invoke the actual logic that we're after.

Notice how this pattern composes very nicely:

static <K, V> Eq<Map<K, V>> mapEq(Eq<K> eqK, Eq<V> eqV) { ... }

static <E> Eq<List<E>> listEq(Eq<E> eqE);

static Eq<Integer> integerEq();

static Eq<String> stringEq();

// then:

Eq<Map<String, List<Integer>>> eq =
  mapEq(stringEq(), listEq(integerEq()));

Map<String, List<Integer>> m1 = ...;
Map<String, List<Integer>> m2 = ...;

eq.eq(m1, m2);

With witness constructors as our building blocks, we can recursively construct infinitely many witnesses, as needed.

A note on nomenclature:

In Haskell, type class 'implementations' are called instances.

Here, I am calling them witnesses because that's how Java's Architect, Brian Goetz, decided to refer to them in a recent talk he gave about bringing type classes to Java sometime in the future.

Now, wouldn't it be great if we didn't have to manually construct these witnesses?

Goal: Programmatically instantiating witnesses #

Given:

@TypeClass
interface Show<A> {
  String show(A value);

  // Convenience shortcut
  static <A> String show(Show<A> showA, A value) {
    return showA.show(value);
  }

  @TypeClass.Witness
  static Show<Integer> showInteger() { ... }

  @TypeClass.Witness
  static <A> Show<List<A>> showList(Show<A> showA) { ... }
}

Instead of:

Show<List<List<Integer>>> w =
    Show.showList(Show.showList(Show.showInteger()));

Show.show(w, List.of(List.of(1, 2), List.of(3, 4)));
// Returns: "[[1, 2], [3, 4]]"

We want:

Show.show(witness(), List.of(List.of(1, 2), List.of(3, 4)));
// Returns: "[[1, 2], [3, 4]]"

Which means that the witness() method is able to automatically construct the required witness for Show<List<List<Integer>>>.

How do we get there?

First, we must understand some aspects of how Java generics work at runtime.

Aside: Java Generics and Type Erasure #

It is a well-known fact that Java generics get erased at runtime.

However, there are a few scenarios in which generics do get reified!

Note: 'Type Reification' is the process by which type information is made concrete and available at runtime.

Given:

interface Stream<T> { ... }

interface IntStream extends Stream<Integer> { ... }

Then:

IntStream.class.getInterfaces()[0];
// Returns: Stream

IntStream.class.getGenericInterfaces()[0];
// Returns: Stream<Integer>

This means that Java did preserve the generic type arguments for the supertype Stream<Integer>. (We just need to know where to look for it.)

Note that this also occurs with anonymous classes:

var s = new Stream<String>() { ... };

s.getClass().getGenericInterfaces()[0];
// Returns: Stream<String>

This leads us to our first workaround:

Aside: Capturing static types #

We define:

interface Ty<T> {
  default Type type() {
    return requireNonNull(
        ((ParameterizedType) getClass().getGenericInterfaces()[0])
            .getActualTypeArguments()[0]);
  }
}

Which lets us write:

Type type = new Ty<Map<String, List<Integer>>>() {}.type();

println(type);
// Prints: Map<String, List<Integer>>

Why do we need this?

Because our witness() method will need a runtime representation of the type that it is trying to instantiate.

So our use case has turned into:

Show.show(witness(new Ty<>() {}), List.of(List.of(1, 2), List.of(3, 4)));

Note that we do not have to specify the type argument for Ty<>. Thankfully, type inference does that for us.

Type inference works in this case because the List.of(...) parameter has a well-defined type, and the call to <A> Show.show(Show<A>, A) lets type inference flow from the second argument to the first.

Aside: Understanding java.lang.reflect.Type's hierarchy #

The standard java.lang.reflect.Type hierarchy consists of:

Class<?>
GenericArrayType
ParameterizedType
TypeVariable<?>
WildcardType

Where:

Class<?>, sometimes referred to as a 'raw type', represents either:
- A non-generic type like String or Integer.
- A generic type like Function<T, R> where its type parameter list [T, R] can be retrieved via TypeVariable<?>[] getTypeParameters().
GenericArrayType represents an array type E[] whose component type E is a ParameterizedType or a TypeVariable<?>.
ParameterizedType represents a type T<Arg1, Arg2, ..., ArgN>. Its getRawType() method returns the class or interface T.
TypeVariable<?> represents an unbound type parameter of a generic declaration like a generic class/interface or generic method.
WildcardType represents an occurrence of ? within a ParameterizedType.

Moreover, a java.lang.reflect.Method can be queried via TypeVariable<?>[] getTypeParameters(), Type[] getGenericParameterTypes(), and Type getGenericReturnType().

For example:

class Example {
  static <A> List<A> hello(String s, A[] arr, Optional<?> opt) { ... }
}

Inspecting the Example's hello method at runtime would yield something like:

Method(
  name: "hello",
  typeParameters:
    [TypeVariable(A)],
  genericReturnType:
    ParameterizedType(
      rawType: List.class,
      typeArguments: [TypeVariable(A)],
    ),
  genericParameterTypes:
    [
      String.class,
      GenericArrayType(componentType: TypeVariable(A)),
      ParameterizedType(
        rawType: Optional.class,
        typeArguments: [WildcardType()]
      )
    ]
)

Note that the three ocurrences of TypeVariable(A) are unique in the sense that they compare equal() to each other, but not to other type variable instances, even if their names are also A.

Subgoal: Parsing java.lang.reflect.Type into an AST #

We want to do this for a few reasons:

java.lang.reflect.Type is not convenient for programming:
- It is not a sealed hierarchy, so pattern-matching on it is error-prone.
- Both Class<?> and GenericArrayType may be array types.
- Class<?> may represent a primitive type, like int or float, which does not participate in generic parameterization (yet).
- Type application as represented by ParameterizedType is variadic. Single-parameter-a-time is easier to program against.
When we get to higher-kinded type classes (spoiler!), then we will need our own type representation anyway.

Here's the AST:

sealed interface ParsedType {
  record Var(TypeVariable<?> java) implements ParsedType {}

  record App(ParsedType fun, ParsedType arg) implements ParsedType {}

  record ArrayOf(ParsedType elementType) implements ParsedType {}

  record Const(Class<?> java) implements ParsedType {}

  record Primitive(Class<?> java) implements ParsedType {}
}

And these are our parsing rules:

sealed interface ParsedType {
  // ...

  static ParsedType parse(Type java) {
    return switch (java) {
      case Class<?> arr when arr.isArray() ->
          new ArrayOf(parse(arr.getComponentType()));
      case Class<?> prim when prim.isPrimitive() ->
          new Primitive(prim);
      case Class<?> c ->
          new Const(c);
      case TypeVariable<?> v ->
          new Var(v);
      case ParameterizedType p ->
          parseAll(p.getActualTypeArguments()).stream()
              .reduce(parse(p.getRawType()), App::new);
      case GenericArrayType a ->
          new ArrayOf(parse(a.getGenericComponentType()));
      case WildcardType w ->
          throw new IllegalArgumentException("Cannot parse wildcard type: " + w);
      default ->
          throw new IllegalArgumentException("Unsupported type: " + java);
    };
  }
}

Note: the rule for ParameterizedType will take a type like T<A, B> and turn it into App(App(Const(T), A), B).

This means that the generic type T is first applied to A and then to B.

For example, Map<Integer, List<String>> becomes:

App(
  App(Const(Map.class), Const(Integer.class)),
  App(Const(List.class), Const(String.class))
)

That's it! Really, it's not much, but the added uniformity will help our code down the line.

Recap #

Until this point we have:

new Ty<T>() {}: a way to capture a type T from a static context and access it at runtime.
ParsedType: a uniform representation for Java's java.lang.reflect.Types.

Now, let's move on to the problem of type class instance resolution.

Subgoal: Witness resolution #

Let's consider the following scenario:

@TypeClass
interface Show<A> {
  String show(A value);

  // Convenience shortcut
  static <A> String show(Show<A> showA, A value) {
    return showA.show(value);
  }

  @TypeClass.Witness
  static Show<Integer> showInteger() { ... }

  @TypeClass.Witness
  static <A> Show<List<A>> showList(Show<A> showA) { ... }
}

record Pair<A, B>(A fst, B snd) {
  @TypeClass.Witness
  public static <A, B> Show<Pair<A, B>> show(Show<A> showA, Show<B> showB) { ... }
}

We observe that, for example, in order to summon a witness for Show<List<Pair<Integer, List<Integer>>>, then we must apply several witness constructors recursively.

What is a witness constructor? It is a public static method annotated with @TypeClass.Witness.

But first, we must find the witness constructors!

How do we do that? Do we need to do a runtime scan of all loaded classes?

That would be way too complicated.

In order to reduce our search scope, we can define the following convention:

When trying to resolve a witness C<T> for some type class C and a concrete type T, then we only look for witness constructors within the definitions of C and T and nowhere else.

For example, when looking for Show<Integer>, then we scan the methods of Show and Integer.

Generally, we will prefer to define witness constructors within concrete types and not within type classes. However, for a built-in type like Integer, we have no choice and we must define its witness constructors within the relevant type classes, as we have done with static Show<Integer> showInteger().

Now that we are able to find the relevant witness constructors, how do we:

know which of them we must invoke,
and in which order?

First, we must understand type unification:

Aside: Type Unification #

Type unification is the process of finding a substitution that makes two types identical.

For example, given Pair<[A], String> and Pair<Integer, String>, then the substitution {A -> Integer} would make the first type identical to the second.

Here, I have placed the type A between square brackets to indicate that it is a type variable. We only substitute type variables!

Unification may fail when it encounters incompatible types. For example, List<Integer> and List<String> cannot be unified because no substitution exists that would make them identical.

Conversely, when two types are already identical, unification succeeds with an empty substitution {}.

Because types are recursive, the unification algorithm must also be recursive (in nature).

Consider:

unify(
  Map<Pair<String, [T]>, List<[U]>>,
  Map<Pair<String, Integer>, List<Optional<Boolean>>>
)

// we are unifying two types of shape:
// - Map<K1, V1>
// - Map<K2, V2>
// the type constructors match on either side (Map)
// so we recurse to their type arguments:
// unify(K1, K2) ∪ unify(V1, V2)
// and so on!

=   unify(Pair<String, [T]>, Pair<String, Integer>)
  ∪ unify(List<[U]>, List<Optional<Boolean>>)

=   unify(String, String) // first arguments of Pair
  ∪ unify([T], Integer)   // second arguments of Pair
  ∪ unify([U], <Optional<Boolean>) // arguments of List

=   {}
  ∪ {T -> Integer}
  ∪ {U -> Optional<Boolean>}

= {T -> Integer, U -> Optional<Boolean>}

// Done!

If any unification step fails, then the entire unification fails:

unify(Pair<[T], Integer>, Pair<String, String>)

=   unify([T], String)
  ∪ unify(Integer, String)

=   {T -> String}
  ∪ err

= err

Following is a code listing for the concrete type unification algorithm unify() for our ParsedType definition, along with a subsitute() method that applies a substitution to a type.

Listing: Type Unification algorithm for ParsedType #

Here we use the Maybe type, which is analogous to Java's Optional. It is used to indicate that the unify() function may fail with a nothing() (empty) case. This is different from it returning just(Map.of()) (an empty map), which signals success but without any substitutions. In short, based on the examples above, err is nothing() and {} is just(Map.of()).

A key step here is the unification of App. This is the main driver for the algorithm's recursion.

The following helper methods are relevant to this step:

// applies function `f` if both `ma` and `mb` are present
<A, B, C> Maybe<C> apply(BiFunction<A, B, C> f, Maybe<A> ma, Maybe<B> mb);

// merges two maps
// it fails at runtime if duplicate keys /with different values/ exist
<K, V> Map<K, V> merge(Map<K, V> m1, Map<K, V> m2);

Here's the full code:

class Unification {
  public static Maybe<Map<ParsedType.Var, ParsedType>> unify(ParsedType t1, ParsedType t2) {
    return switch (Pair.of(t1, t2)) {
      case Pair<ParsedType, ParsedType>(ParsedType.Var var1, ParsedType.Primitive p) ->
          Maybe.nothing(); // no primitives in generics
      case Pair<ParsedType, ParsedType>(ParsedType.Var var1, var t) ->
          Maybe.just(Map.of(var1, t));
      case Pair<ParsedType, ParsedType>(ParsedType.Const const1, ParsedType.Const const2)
          when const1.equals(const2) ->
          Maybe.just(Map.of());
      case Pair<ParsedType, ParsedType>(
              ParsedType.App(var fun1, var arg1),
              ParsedType.App(var fun2, var arg2)) ->
          Maybe.apply(Maps::merge, unify(fun1, fun2), unify(arg1, arg2));
      case Pair<ParsedType, ParsedType>(
              ParsedType.ArrayOf(var elem1),
              ParsedType.ArrayOf(var elem2)) ->
          unify(elem1, elem2);
      case Pair<ParsedType, ParsedType>(
              ParsedType.Primitive(var prim1),
              ParsedType.Primitive(var prim2))
          when prim1.equals(prim2) ->
          Maybe.just(Map.of());
      default ->
          Maybe.nothing();
    };
  }

  public static ParsedType substitute(Map<ParsedType.Var, ParsedType> map, ParsedType type) {
    return switch (type) {
      case ParsedType.Var var ->
          map.getOrDefault(var, var);
      case ParsedType.App(var fun, var arg) ->
          new ParsedType.App(substitute(map, fun), substitute(map, arg));
      case ParsedType.ArrayOf var ->
          new ParsedType.ArrayOf(substitute(map, var.elementType()));
      case ParsedType.Primitive p ->
          p;
      case ParsedType.Const c ->
          c;
    };
  }

  public static List<ParsedType> substituteAll(
      Map<ParsedType.Var, ParsedType> map, List<ParsedType> types) {
    return types.stream().map(t -> substitute(map, t)).toList();
  }
}

Extra: A type representation for static methods #

record FuncType(Method java, List<ParsedType> paramTypes, ParsedType returnType) {
  public static FuncType parse(Method method) {
    if (!Modifier.isStatic(method.getModifiers())) {
      throw new IllegalArgumentException("Method must be static: " + method);
    }
    return new FuncType(
        method,
        ParsedType.parseAll(method.getGenericParameterTypes()),
        ParsedType.parse(method.getGenericReturnType()));
  }
}

Subgoal: Witness resolution, part 2 #

Why is type unification relevant?

Type unification will guide the witness resolution process in two ways:

By checking if any given witness constructor is relevant to our witness goal.
And if it is relevant, then it will tell us which witness subgoals to resolve next.

For example, when resolving Show<List<Integer>>, we check:

Which witness constructors can we find?

We scan List:
- It contains zero witness constructors.
We scan Show, and it contains:
- Show<Integer> showInteger()
- <A> Show<List<A>> showList(Show<A> showA)

Which witness constructors apply to our goal?

Does Show<Integer> showInteger() apply?
- We try to unify Show<Integer> and Show<List<Integer>>.
- Unification fails.
- Skip this constructor ❌
Does <A> Show<List<A>> showList(Show<A> showA) apply?
- We try to unify Show<List<A>> and Show<List<Integer>>.
- Unification succeeds with the substitution {A -> Integer}.
- We can use this constructor ✅
- But this constructor has an argument: Show<A>
- If we apply the substitution to the argument, we find our next goal:
  - substitute({A -> Integer}, Show<A>) yields Show<Integer>.
- Add Show<Integer> to our goals.

Recurse until we have no further goals.

That outlines the witness resolution algorithm.

Recap #

Until this point we have:

new Ty<T>() {}: a way to capture a type T from a static context and access it at runtime.
ParsedType: a uniform representation for Java's java.lang.reflect.Types.
Unification: an algorithm for type unification of ParsedTypes.
And a rough sketch for the overall recursive witness resolution algorithm.

Now, let's put it all together!

Subgoal: Witness resolution implementation #

Let's start with the witness constructor lookup code:

@Retention(RetentionPolicy.RUNTIME)
@interface TypeClass {
  @Retention(RetentionPolicy.RUNTIME)
  @interface Witness {}
}

class TypeClasses {
  // ...

  private static List<InstanceConstructor> findRules(ParsedType target) {
    return switch (target) {
      case ParsedType.App(var fun, var arg) ->
          Lists.concat(findRules(fun), findRules(arg));
      case ParsedType.Const(var java) ->
          rulesOf(java);
      case ParsedType.Var(var java) ->
          List.of();
      case ParsedType.ArrayOf(var elem) ->
          List.of();
      case ParsedType.Primitive(var java) ->
          List.of();
    };
  }

  private static List<InstanceConstructor> rulesOf(Class<?> cls) {
    return Arrays.stream(cls.getDeclaredMethods())
        .filter(TypeClasses::isWitnessMethod)
        .map(FuncType::parse)
        .map(InstanceConstructor::new)
        .toList();
  }

  private static boolean isWitnessMethod(Method m) {
    return m.accessFlags().contains(PUBLIC)
        && m.accessFlags().contains(STATIC)
        && m.isAnnotationPresent(TypeClass.Witness.class);
  }

  private record InstanceConstructor(FuncType func) {}

  // ...
}

Now, let's move on to how we choose relevant witness constructors and find our subsequent goals:

class TypeClasses {
  // ...

  private static List<Candidate> findCandidates(ParsedType target) {
    return findRules(target).stream()
        .flatMap(
            rule ->
                rule
                    .tryMatch(target)
                    .map(requirements -> new Candidate(rule, requirements))
                    .stream())
        .toList();
  }

  private record Candidate(WitnessRule rule, List<ParsedType> requirements) {}

  // Spoiler: we will have another subtype of WitnessRule later on (;
  private sealed interface WitnessRule {
    Maybe<List<ParsedType>> tryMatch(ParsedType target);

    Object instantiate(List<Object> dependencies);
  }

  private record InstanceConstructor(FuncType func) implements WitnessRule {
    @Override
    public Maybe<List<ParsedType>> tryMatch(ParsedType target) {
      return Unification.unify(func.returnType(), target)
          .map(map -> Unification.substituteAll(map, func.paramTypes()));
    }

    @Override
    public Object instantiate(List<Object> dependencies) {
      try {
        return func.java().invoke(null, dependencies.toArray());
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }

  // ...
}

Then, the code that drives the recursive resolution:

class TypeClasses {
  // ...

  private static Either<SummonError, Object> summon(ParsedType target) {
    return switch (ZeroOneMore.of(findCandidates(target, context))) {
      case ZeroOneMore.One<Candidate>(Candidate(var rule, var requirements)) ->
          summonAll(requirements, context)
              .map(rule::instantiate)
              .mapLeft(error -> new SummonError.Nested(target, error));
      case ZeroOneMore.Zero<Candidate>() ->
          Either.left(new SummonError.NotFound(target));
      case ZeroOneMore.More<Candidate>(var candidates) ->
          Either.left(new SummonError.Ambiguous(target, candidates));
    };
  }

  private static Either<SummonError, List<Object>> summonAll(List<ParsedType> targets) {
    return Either.traverse(targets, TypeClasses::summon);
  }

  private sealed interface SummonError {
    record NotFound(ParsedType target) implements SummonError {}

    record Ambiguous(ParsedType target, List<Candidate> candidates) implements SummonError {}

    record Nested(ParsedType target, SummonError cause) implements SummonError {}
  }

  // ...
}

And, finally, the public entry point for all of it:

class TypeClasses {
  public static <T> T witness(Ty<T> ty) {
    return switch (summon(ParsedType.parse(ty.type()))) {
      case Either.Left<SummonError, Object>(SummonError error) ->
          throw new WitnessResolutionException(error);
      case Either.Right<SummonError, Object>(Object instance) -> {
        @SuppressWarnings("unchecked")
        T typedInstance = (T) instance;
        yield typedInstance;
      }
    };
  }

  public static class WitnessResolutionException extends RuntimeException {
    private WitnessResolutionException(SummonError error) {
      super(error.format());
    }
  }

  // ...
}

Surprinsingly, that's it!

You can find a mostly accurate compilation of all of the above code in this Gist. (This was the first version of the type class system that I built.)

Now, let's see it in action.

Examples: First-Order Type Classes & Instances # Type Class: Show #

Given:

@TypeClass
interface Show<A> {
  String show(A a);

  static <A> String show(Show<A> showA, A a) {
    return showA.show(a);
  }

  @TypeClass.Witness
  static Show<Integer> integerShow() {
    return i -> Integer.toString(i);
  }


  @TypeClass.Witness
  static Show<String> stringShow() {
    return s -> "\"" + s + "\"";
  }

  @TypeClass.Witness
  static <A> Show<Optional<A>> optionalShow(Show<A> showA) {
    return optA -> optA.map(a -> "Some(" + showA.show(a) + ")").orElse("None");
  }

  @TypeClass.Witness
  static <A> Show<List<A>> listShow(Show<A> showA) { ... }

  @TypeClass.Witness
  static <K, V> Show<Map<K, V>> mapShow(Show<K> showK, Show<V> showV) { ... }
}

Then:

Map<String, List<Optional<Integer>>> m1 =
    Map.of(
        "a", List.of(Optional.of(1), Optional.empty()),
        "b", List.of(Optional.of(2), Optional.of(3)));

println(Show.show(witness(new Ty<>() {}), m1));
// Prints: {"a": [Some(1), None], "b": [Some(2), Some(3)]}

Note that this requires the recursive instantiation of 5 witness constructors. Here's what some debug logs show:

Instantiating: () -> Show[A](String)
Instantiating: () -> Show[A](Integer)
Instantiating: ∀ A. (Show[A](A)) -> Show[A](Optional[T](A))
Instantiating: ∀ A. (Show[A](A)) -> Show[A](List[E](A))
Instantiating: ∀ K V. (Show[A](K), Show[A](V)) -> Show[A](Map[K, V](K)(V))

Type Class: Monoid & Num #

Given:

@TypeClass
interface Monoid<A> {
  A combine(A a1, A a2);

  A identity();

  static <A> A combineAll(Monoid<A> monoid, List<A> elements) {
    A result = monoid.identity();
    for (A element : elements) {
      result = monoid.combine(result, element);
    }
    return result;
  }
}

@TypeClass
interface Num<A> {
  A add(A a1, A a2);

  A mul(A a1, A a2);

  A zero();

  A one();

  @TypeClass.Witness
  static Num<Integer> integerNum() {
    return new Num<>() {
      @Override
      public Integer add(Integer a1, Integer a2) {
        return a1 + a2;
      }

      @Override
      public Integer mul(Integer a1, Integer a2) {
        return a1 * a2;
      }

      @Override
      public Integer zero() {
        return 0;
      }

      @Override
      public Integer one() {
        return 1;
      }
    };
  }
}

record Sum<A>(A value) {
  @TypeClass.Witness
  public static <A> Monoid<Sum<A>> monoid(Num<A> num) {
    return new Monoid<>() {
      @Override
      public Sum<A> combine(Sum<A> s1, Sum<A> s2) {
        return new Sum<>(num.add(s1.value(), s2.value()));
      }

      @Override
      public Sum<A> identity() {
        return new Sum<>(num.zero());
      }
    };
  }
}

Then:

var sums = List.of(new Sum<>(3), new Sum<>(5), new Sum<>(10));

println(Monoid.combineAll(witness(new Ty<>() {}), sums));
// Prints: Sum[value=18]

Type Class: PrintAll #

This example is based on: https://wiki.haskell.org/Varargs

It abuses type classes in order to implement variadic functions. Please read the link above to understand how this works.

It's really striking how it just works with our system!

Given:

@TypeClass
interface PrintAll<T> {
  T printAll(List<String> strings);

  static <T> T of(PrintAll<T> printAll) {
    return printAll.printAll(List.of());
  }

  @TypeClass.Witness
  static PrintAll<Void> base() {
    return strings -> {
      for (String s : strings) {
        System.out.println(s);
      }
      return null;
    };
  }

  @TypeClass.Witness
  static <A, R> PrintAll<Function<A, R>> func(Show<A> showA, PrintAll<R> printR) {
    return strings -> a -> printR.printAll(Lists.concat(strings, List.of(showA.show(a))));
  }
}

Then:

Function<String, Function<List<String>, Function<Integer, Void>>> printer =
    PrintAll.of(witness(new Ty<>() {}));

printer.apply("Items:").apply(JavaList.of("apple", "banana", "cherry")).apply(42);
// Prints:
// "Items:"
// ["apple", "banana", "cherry"]
// 42

Type Class: Type Equality! #

This examples shows how we can encode propositions about types themselves and have the resolver construct evidence for them.

Reified type equality is very neat construct that I'd like to write more about soon.

For now, let's appreciate how this neatly encodes Haskell's own type equality in Java.

Consider:

@TypeClass
sealed interface TyEq<A, B> {
  A castR(B b);

  B castL(A a);

  @TypeClass.Witness
  static <T> TyEq<T, T> refl() {
    return new Refl<>();
  }

  record Refl<T>() implements TyEq<T, T> {

    @Override
    public T castR(T t) {
      return t;
    }

    @Override
    public T castL(T t) {
      return t;
    }
  }
}

TyEq<A, B> represents the proposition that A = B.

And the only possible constructor for it is refl(), which can only prove that TyEq<T, T> for any T.

How is that useful?

TyEq<T, T> may be trivial in the static context where it is constructed. But it certainly is not in the context where it may be needed.

Take a look:

class List<T> {
  // ...

  int sum(TyEq<T, Integer> ty) {
    int s = 0;
    for (T item : this) {
      s += ty.castL(item); // casts T to Integer !
    }
    return s;
  }

  // ...
}

// then

var xs = List.of(1, 2, 3);

xs.sum(refl());
// Returns: 6

Within List<T>, we have no idea what T. Code can make no assumptions about it. However, if we have a proof that T = Integer, then we can sum the elements up into an int.

Now, we can also request them as witnesses in a witness constructor:

@TypeClass
interface SumAllInt<A> {
  A sum(List<Integer> list);

  static <T> T of(SumAllInt<T> sumAllInt) {
    return sumAllInt.sum(List.of());
  }

  @TypeClass.Witness
  static SumAllInt<Integer> base() {
    return list -> list.stream().mapToInt(Integer::intValue).sum();
  }

  @TypeClass.Witness
  static <A, R> SumAllInt<Function<A, R>> func(TyEq<A, Integer> eq, SumAllInt<R> sumR) {
    return list -> a -> sumR.sum(Lists.concat(list, List.of(eq.castL(a))));
  }
}

Similar to the PrintAll example, this lets us summon variadic functions:

Function<Integer, Function<Integer, Function<Integer, Integer>>> sum =
    SumAllInt.of(witness(new Ty<>() {}));
println(sum.apply(1).apply(2).apply(3));

However, notice how in the func rule, we requested TyEq<A, Integer>. This constrains the A type argument to just Integer.

Funny enough, this is actually only necessary in Haskell due to how integer literals are overloaded. So I implemented this here for no gain. But it was cool that it just worked!

Goal: Higher-Order Type Classes #

Consider the Functor type class in Haskell:

class Functor f where
  fmap :: (a -> b) -> f a - > f b

Notice how f is not a type?

It is not one because it is being applied to types a and b, respectively.

This means that f is a type constructor.

A type constructor is a sort of type-level function that builds new types from existing types.

Here, we mean function in the sense that it has the mathematical property of being injective. That is, no two inputs map to the same output.

In practice, this means that if we know that the output is the same, then the input must be the same:

F<X> = F<Y> ⇒ X = Y

This property is what allows us to unify under type constructors.

In Java, class List<T> can be seen as a type constructor. Though we don't use type application syntax as in Haskell, we do use type parameterization syntax: List<Integer>. In Java, generic types cannot be partially applied. That is, for a type class C<T1, T2, ..., TN>, we must provide all N type arguments at once.

Let's try representing the Functor type class with a Java interface:

interface Functor<F> {
  <A, B> F<B> fmap(Function<A, B> f, F<A> fa);
}

Uh-oh:

The type F is not generic; it cannot be parameterized with arguments <B>

Indeed, Java requires that the type parameter F is a type, and not a type constructor.

Are we cooked?

Not quite.

Aside: What is a kind? #

Simply: values have types; types have kinds.

42 has type Integer.

Integer has kind * (read: star).

The syntax * for the kind of plain types comes from Haskell.

We could also say Integer has kind Type, but that may be confusing given how overloaded the word 'type' is in this context.

List<Integer> also has kind *.

So what is the kind of List itself?

List has kind * -> *.

That is, it is a type-level function that accepts a type and returns a type.

Aside: Higher-Kinded Types in Java #

Here we introduce an encoding of higher-kinded types for Java.

Recall that interface Functor<F> failed because Java insists F is a concrete type, not a type constructor. This encoding gives us a way to pretend F is higher-kinded.

Note: I did not invent this encoding. I have seen several projects using it (see the Related Work section). I am not sure where it originated, but it is rather elegant.

Consider:

interface TApp<F, A> {}

abstract class TagBase {}

sealed interface Maybe<A> extends TApp<Maybe.Tag, A> {
  record Nothing<A>() implements Maybe<A> {}
  
  record Just<A>(A value) implements Maybe<A> {}

  final class Tag extends TagBase {}

  static <A> Maybe<A> unwrap(TApp<Maybe.Tag, A> m) {
    return (Maybe<A>) m; // unsafe if we misuse TApp and tag types
  }
}

Let's unpack this:

TApp<F, A> represents type application, hence its name.
- We assume that F has kind * -> * and A has kind *.
- TApp<F, A> represents F<A>.
- Therefore TApp<F, A>, in principle, has kind *.
Maybe.Tag is a sort of proxy type for Maybe as an unapplied type constructor.
- TApp<Maybe.Tag, A>, in principle, means Maybe<A>.
- That is why Maybe<A> extends TApp<Maybe.Tag, A>.
Maybe.Tag extends TagBase just so that we can easily identify this type via reflection later.

How is this useful?

Now we do have a mechanism (more of a convention) to represent Functor!

Check it out:

interface Functor<F> {
  <A, B> TApp<F, B> fmap(Function<A, B> f, TApp<F, A> fa);
}

Remember: TApp<F, A> means F<A>.

And we can also define the witness constructor:

sealed interface Maybe<A> extends TApp<Maybe.Tag, A> {
  // ...

  default <B> Maybe<B> map(Function<A, B> f) { ... }

  @TypeClass.Witness
  static Functor<Maybe.Tag> functor() {
    return new Functor<>() {
      @Override
      public <A, B> TApp<Maybe.Tag, B> fmap(Function<A, B> f, TApp<Maybe.Tag, A> fa) {
        return unwrap(fa).map(f);
      }
    };
  }
}

Let's unpack that:

We define a witness constructor for Functor<Maybe.Tag>.
- Remember: Functor<Maybe.Tag> means Functor<Maybe>, where Maybe is the unapplied type constructor.
We then have to implement:
- <A, B> TApp<Maybe.Tag, B> fmap(Function<A, B> f, TApp<Maybe.Tag, A> fa)
- And remember:
  - TApp<Maybe.Tag, A> means Maybe<A>
  - TApp<Maybe.Tag, B> means Maybe<B>
- If we squint our eyes a bit, it does make sense.
We use unwrap() to convert from TApp<Maybe.Tag, A> to Maybe<A>.
Maybe<B> is a subtype of TApp<Maybe.Tag, B>, so the return type just works.

That's it!

Now, we must extend our type class system to understand these typing conventions.

Subgoal: Extending ParsedType to support HKTs #

This is actually not very involved. Check it out:

sealed interface ParsedType {
  // ...

  static ParsedType parse(Type java) {
    return switch (java) {
      // New:
      case Class<?> tag
        when parseTagType(tag) instanceof Maybe.Just<Class<?>>(var tagged) ->
          new Const(tagged);
      // New:
      case ParameterizedType p
          when parseAppType(p)
              instanceof Maybe.Just<Pair<Type, Type>>(Pair<Type, Type>(var fun, var arg)) ->
          new App(parse(fun), parse(arg));
      // etc
    };
  }

  private static Maybe<Class<?>> parseTagType(Class<?> c) {
    return switch (c.getEnclosingClass()) {
      case Class<?> enclosing when c.getSuperclass().equals(TagBase.class) ->
          Maybe.just(enclosing);
      case null -> Maybe.nothing();
      default -> Maybe.nothing();
    };
  }

  private static Maybe<Pair<Type, Type>> parseAppType(ParameterizedType t) {
    return switch (t.getRawType()) {
      case Class<?> raw when raw.equals(TApp.class) ->
          Maybe.just(Pair.of(t.getActualTypeArguments()[0], t.getActualTypeArguments()[1]));
      default -> Maybe.nothing();
    };
  }
}

We only had to add two new pattern-match cases to parse:

case Class<?> tag when parseTagType(tag) ... tagged
- -> new Const(tagged)
- This replaces any occurrence of T.Tag with T itself.
case ParameterizedType p when parseAppType(p) ... (fun, arg)
- -> new App(parse(fun), parse(arg))
- This replaces any occurrence of TApp<F, A> with new App(parse(F), parse(A)).

That's it!

I was also surprised! The witness resolution code needs no changes whatsoever!

Although, I was worried about potential bugs coming from misuses of Tags and TApp. So I came up with a lightweight embedded kind-checking mechanism.

Aside: Kind-checking Java-embedded HKTs #

Given:

interface Kind<K extends Kind.Base> {
  sealed interface Base {}

  // KStar = *
  final class KStar implements Base {}

  // KArr k = * -> k
  final class KArr<K extends Base> implements Base {}
}

abstract class TagBase<K extends Kind.Base> implements Kind<K> {}

// Full application of a unary type constructor
// TApp :: (* -> *) -> * -> *
interface TApp<Tag extends Kind<KArr<KStar>>, A> extends Kind<KStar> {}

// Partial application of a binary type constructor
// TPar :: (* -> * -> *) -> * -> (* -> *)
interface TPar<Tag extends Kind<KArr<KArr<KStar>>>, A> extends Kind<KArr<KStar>> {}

TApp now can only apply to tags of kind * -> *, itself becoming a kind *.
TPar applies to tags of kind * -> * -> *, itself becoming a kind * -> *.
This gives us some rudimentary kind-checking in Java

Then:

sealed interface Maybe<A> extends TApp<Maybe.Tag, A> {
  // ...

  final class Tag extends TagBase<KArr<KStar>> {}

  static <A> Maybe<A> unwrap(TApp<Tag, A> value) {
    return (Maybe<A>) value;
  }
}

And also:

@FunctionalInterface
interface State<S, A> extends TApp<TPar<State.Tag, S>, A> {
  // ...

  @TypeClass.Witness
  static <S> Functor<TPar<Tag, S>> functor() { ... }

  final class Tag extends TagBase<KArr<KArr<KStar>>> {}

  static <S, A> State<S, A> unwrap(TApp<TPar<Tag, S>, A> value) {
    return (State<S, A>) value;
  }
}

Notice in this case that State has two type parameters:

State.Tag has kind KArr<KArr<KStar>>
- This means: * -> * -> *
The first application of State.Tag must go through TPar.
The subsequent application must go through TApp.

Examples: Higher-Kinded Type Clases # Type Class: Traversable #

According to Haskell's documentation on Traversable:

Traversable is the class of data structures that can be traversed from left to right, performing an action on each element.

The Haskell type class looks like this:

class (Functor t, Foldable t) => Traversable t where
  traverse :: Applicative f => (a -> f b) -> t a -> f (t b)

Notice the use of the following features:

Traversable is higher-kinded.
- Its parameter t has kind * -> *.
Traversable has two superclasses: Functor and Foldable.
Its method traverse is quantified on three type variables:
- a and b are plain types, of kind *.
- f is of kind * -> *, as it seen applied as f b and f (t b).
The type variable f has the constraint Applicative f.

Traversable is a rather advanced type class.

Let's translate it into Java.

For now, let's ignore its superclasses:

@TypeClass
interface Traversable<T extends Kind<KArr<KStar>>> {

  <F extends Kind<KArr<KStar>>, A, B>
      TApp<F, ? extends TApp<T, B>> traverse(
          Applicative<F> applicative,
          Function<A, ? extends TApp<F, B>> f,
          TApp<T, A> t);

  // And a static helper, for convenience:
  // (I omitted some types and arguments for brevity)
  static <F, A, B> ... traverse(
      Traversable<T> traversable,
      Applicative<F> applicative,
      ...) {
    return traversable.traverse(applicative, f, t);
  }
}

We see that:

T has kind KArr<KStar> which means * -> *.
The method traverse:
- Is generic on three type variables: F, A, B.
  - F is also declared as having kind * -> *.
- Depends on a witness of Applicative<F>.
- Declares covariance TApp<T, B> on a couple of places:
  - Because we know that we won't return values of static type TApp<T, B> itself, but rather values of type T<B>, which in our HKT convention will extend TApp<T, B>.

Then:

// Unfortunately for HKTs, we must wrap regular Java lists
record JavaList<A>(List<A> toList) implements TApp<JavaList.Tag, A> {
  // ...

  @TypeClass.Witness
  public static <A> Show<JavaList<A>> show(Show<A> showA) { ... }

  @TypeClass.Witness
  public static Functor<JavaList.Tag> functor() { ... }

  @TypeClass.Witness
  public static Traversable<JavaList.Tag> traversable() { ... }

  // ...
}

// and:

sealed interface Maybe<A> {
  // ...

  @TypeClass.Witness
  static Applicative<Maybe.Tag> applicative() { ... }

  // ...
}

For example:

Traversable.traverse(
    witness(new Ty<>() {}), // for Traversable<JavaList>
    witness(new Ty<>() {}), // for Applicative<Maybe>
    JavaList.of(1, 2, 3),
    Maybe::just);
// Returns: Just[value=JavaList[toList=[1, 2, 3]]]

Type Class: Functor, Applicative, Monad #

As expected:

@TypeClass
interface Functor<F extends Kind<KArr<KStar>>> {
  <A, B> TApp<F, B> map(Function<A, B> f, TApp<F, A> fa);
}

@TypeClass
interface Applicative<F extends Kind<KArr<KStar>>> extends Functor<F> {
  <A> TApp<F, A> pure(A a);

  <A, B> TApp<F, B> ap(TApp<F, Function<A, B>> ff, TApp<F, A> fa);

  @Override
  default <A, B> TApp<F, B> map(Function<A, B> f, TApp<F, A> fa) {
    return ap(pure(f), fa);
  }
}

@TypeClass
interface Monad<M extends Kind<KArr<KStar>>> extends Applicative<M> {
  <A, B> TApp<M, B> flatMap(Function<A, TApp<M, B>> f, TApp<M, A> fa);

  @Override
  default <A, B> TApp<M, B> map(Function<A, B> f, TApp<M, A> fa) {
    return flatMap(a -> pure(f.apply(a)), fa);
  }

  @Override
  default <A, B> TApp<M, B> ap(TApp<M, Function<A, B>> ff, TApp<M, A> fa) {
    return flatMap(a -> map(f -> f.apply(a), ff), fa);
  }
}

I currently don't have any interesting Monad examples. I may add one or two soon.

Recap #

We have built:

A simple mechanism to capture static types for runtime access.
An AST and a parser for Java types.
- Including support for higher-kinded types encoded via 'tag types'.
Type unification on said AST of types.
A witness resolution algorithm that:
- For a witness type C<T>, finds witness constructors in C and T.
  - A witness constructor is a public static method annotated with @TypeClass.Witness.
- Uses type unification to:
  - Filter relevant witness constructors.
  - Guide the recursive search for witness dependencies.

And altogether:

We are able to resolve arbitrarily nested witnesses for first-order type clases like Eq, Ord, and Monoid and also for higher-kinded type classes like Functor, Monad, and Traversable.
The witness summoning syntax looks like:
- C<T> w = witness(new Ty<>() {});
Witness resolution is currently done at runtime. Ideally, we should be able to do this work at compile time, as that's where the power of type classes lies in languages like Haskell.

Conclusion #

This was a lot of fun to work on. I had never before implemented type classes in a language, and it turned out to be way simpler than I thought. The key insight was figuring out that type unification was all that I needed. Stupidly, I did not check Haskell's spec to figure that out. But it did come somewhat naturally after a bit of thinking, given my experience in implementing type systems.

This project was partially inspired by this recent Brian Goetz JVMLS talk where he presents his early ideas on how to bring type clases to Java. His proposed syntax is Show<Integer>.witness. And, of course, the mechanism is supposed to resolve witnesses at compile time.

You can find the complete implementation in this Gist.

The code also contains several other type class examples like QuickCheck's Arbitrary type class.

Future work #

The resolution mechanism could use some caching:
- For witness constructor lookups.
- For witness instances themselves.
It would be ideal to shift witness resolution to compile time.
- Perhaps with a javac plugin or something like that?

Related Work #

Here are some related projects and posts:

HighJ
- HKT encoding via Higher<F, A>.
- Provides Functor/Monad/etc.
- No automatic instance resolution, instances are explicit classes.
Cyclops React
- Functional programming toolkit for Java.
- Type class-like interfaces (Functor, Monad, etc.).
- Manual instance lookup, no reflection or unification.
Higher-Kinded-J
- Modern HKT encoding using witness types.
- Supports Functor, Applicative, Monad, MonadError, etc.
- Instances must be explicitly provided, not inferred.
mduerig/java-functional
- Experimental HKT + type class encodings in Java.
- Conceptually related exploration.
- No generic instance search or unification engine.
Type Classes in Java (blog post)
- Demonstrates type class pattern using interfaces.
- Requires manual wiring of instances.
JavaGI
- Research extension to Java with Haskell-style type classes.
- Uses unification-based instance resolution.
- Requires its own compiler, not a Java library.

Annex: Context Instances #

Consider this example:

static <A> String example(Show<A> showA, A value) {
  return Show.show(witness(new Ty<>() {}), JavaList.of(value));
}

It is equivalent to the following Haskell code:

example :: Show a => a -> String
example value = show [value]

In the Java code, we try to lookup Show<List<A>>, but we don't know what A is!

Sure, at runtime we may know its real type, but we actually would like resolution to be static. (Even though we use reflection for witness resolution.)

In the Haskell code, the available instance of Show a as captured by the function's signature becomes available as a 'context instance' that is used to derive Show [a] for the call of show.

How can we achieve this in Java?

First, we define a type that can capture a witness along with its static type:

abstract class Ctx<T> {
  private final T instance;

  Ctx(T instance) {
    this.instance = instance;
  }

  public T instance() {
    return instance;
  }

  public Type type() {
    return requireNonNull(
        ((ParameterizedType) getClass().getGenericSuperclass())
            .getActualTypeArguments()[0]);
  }
}

It leverages the same mechanism as Ty<T> to capture static types.

Then, we pass it to the witness() method:

static <A> String example(Show<A> showA, A value) {
  return Show.show(
      witness(new Ty<>() {}, new Ctx<>(showA) {}),
      JavaList.of(value));
}

Finally, we update a bit of our type class resolution code:

class TypeClasses {
  // New: second parameter
  public static <T> T witness(Ty<T> ty, Ctx<?>... context) {
    return switch (summon(ParsedType.parse(ty.type()), parseContext(context))) {
      // ...
    };
  }

  // New: parsing Ctx<?> into ContextInstance
  private static List<ContextInstance> parseContext(Ctx<?>[] context) {
    return Arrays.stream(context)
        .map(ctx -> new ContextInstance(ctx.instance(), ParsedType.parse(ctx.type())))
        .toList();
  }

  // ...

  // New: second parameter
  private static Either<SummonError, Object> summon(
      ParsedType target, List<ContextInstance> context) {
    // ...
  }

  // New: second parameter
  private static Either<SummonError, List<Object>> summonAll(
      List<ParsedType> targets, List<ContextInstance> context) {
    // ...
  }

  // New: second parameter
  private static List<Candidate> findCandidates(
      ParsedType target, List<ContextInstance> context) {
    // New: use the context instances along with the discovered witness constructors!
    return Stream.<WitnessRule>concat(
            context.stream(),
            findRules(target).stream())
        .flatMap(...)
        .toList();
  }

  // ...

  // New: a new case class for WitnessRule
  private record ContextInstance(Object instance, ParsedType type) implements WitnessRule {
    @Override
    public Maybe<List<ParsedType>> tryMatch(ParsedType target) {
      // This is a concrete instance, so we only check for type equality
      return target.equals(type) ? Maybe.just(List.of()) : Maybe.nothing();
    }

    @Override
    public Object instantiate(List<Object> dependencies) {
      // Trivial
      return instance;
    }
  }
}

That's all. A bit noisy, but rather simple.

Now this works:

static <A> String example(Show<A> showA, A value) {
  return Show.show(
      witness(new Ty<>() {}, new Ctx<>(showA) {}),
      JavaList.of(value));
}

println(example(witness(new Ty<>() {}), 123));

Here, the type captured by new Ctx<>(showA) {} is Show<A>, where A is the unique type variable belonging to the example method.

The Show<List<A>> witness that we are trying to summon needs a Show<A> whose type could only possibly exist in this static context!

Annex: Overlapping Instances #

In Haskell, the String type is defined as:

type String = [Char]

That is, a String is just a list of Char.

Now, notice the difference here:

show [1, 2, 3]
-- "[1, 2, 3]"

show [('a', 1), ('b', 2)]
-- "[('a', 1), ('b', 2)]"

show ['a', 'b']
-- "\"ab\"" what?

The generic Show instance for [a] simply intercalates ", " between elements.

But the Show instance for [Char] behaves differently.

Why is that? This is due to a language extension called overlapping instances.

It allows otherwise ambiguous instances to coexist:

instance Show a => Show [a] where ...

instance {-# OVERLAPPING #-} Show [Char] where ...

The OVERLAPPING pragma tells the compiler that this instance may override another instance iff it is more specific.

The rules for instance specificity are explained in the same link.

Now, consider in Java:

sealed interface FwdList<A> extends TApp<FwdList.Tag, A> {
  record Nil<A>() implements FwdList<A> {}

  record Cons<A>(A head, FwdList<A> tail) implements FwdList<A> {}

  @TypeClass.Witness
  static <A> Show<FwdList<A>> show(Show<A> showA) { ... }

  // Ambiguous!
  @TypeClass.Witness
  static Show<FwdList<Character>> show() { ... }
}

FwdList (name inspired by C++'s std::forward_list) implements a data structure like Haskell's lists.

As it is, witness resolution will fail with an ambiguous witness error.

In order to support overlapping instance, we must apply some changes.

First, let's model Haskell's OVERLAPPING and OVERLAPPABLE pragmas:

@Retention(RetentionPolicy.RUNTIME)
@interface TypeClass {
  @Retention(RetentionPolicy.RUNTIME)
  @interface Witness {
    Overlap overlap() default Overlap.NONE;

    enum Overlap {
      NONE,
      OVERLAPPING,
      OVERLAPPABLE
    }
  }
}

Then, we make it accessible from InstanceConstructor:

private record InstanceConstructor(FuncType func) implements WitnessRule {
  public TypeClass.Witness.Overlap overlap() {
    return func.java().getAnnotation(TypeClass.Witness.class).overlap();
  }

  // ...
}

Finally, we implement the overlapping instances deduction algorithm as described in Haskell's spec:

private static List<InstanceConstructor> reduceOverlapping(
    List<InstanceConstructor> candidates) {
  return candidates.stream()
      .filter(
          iX ->
              candidates.stream()
                  .filter(iY -> iX != iY)
                  .noneMatch(iY -> isOverlappedBy(iX, iY)))
      .toList();
}

private static boolean isOverlappedBy(
    InstanceConstructor iX, InstanceConstructor iY) {
  return (iX.overlap() == OVERLAPPABLE || iY.overlap() == OVERLAPPING)
      && isSubstitutionInstance(iX, iY)
      && !isSubstitutionInstance(iY, iX);
}

private static boolean isSubstitutionInstance(
    InstanceConstructor base, InstanceConstructor reference) {
  return Unification.unify(base.func().returnType(), reference.func().returnType())
      .fold(() -> false, map -> !map.isEmpty());
}

And we apply it to our candidate resolution function:

private static List<Candidate> findCandidates(
    ParsedType target, List<ContextInstance> context) {
  return Stream.<WitnessRule>concat(
          context.stream(),
          reduceOverlapping(findRules(target)).stream())
      .flatMap(...)
      .toList();
}

And, of course, we annotate our witness constructor:

sealed interface FwdList<A> extends TApp<FwdList.Tag, A> {
  record Nil<A>() implements FwdList<A> {}

  record Cons<A>(A head, FwdList<A> tail) implements FwdList<A> {}

  @TypeClass.Witness
  static <A> Show<FwdList<A>> show(Show<A> showA) { ... }

  // OK now!
  @TypeClass.Witness(overlap = OVERLAPPING)
  static Show<FwdList<Character>> show() { ... }
}

Now, our witness resolution code will pick Show<FwdList<Character>> because it is more specific AND it declares that it may overlap other instances.

https://garciat.com/posts/java-type-classes/

Extensions

Calling Haskell from Java

Jan 30, 2025

I put together a self-contained example that calls compiled Haskell from Java using the new Foreign Function & Memory API from JEP 454.

Show full content

Around 2021, while working at Uber, I collaborated briefly with Joshua Shinavier on Dragon, a schema integration tool based on Algebraic Property Graphs. This was a very interesting project to me, because it was the only (to my knowledge) project at Uber that used Haskell.

Joshua had written the Dragon's core in Haskell and wanted to expose its functionality to Java without having to go through the hassle of creating an RPC service or something similar. Since I was not familiar with Property Graphs, I thought that I could at least help with this problem.

My proposal was to:

Compile Haskell into a shared library that exposed C-compatible symbols via Haskell's FFI.
Use Java's JNI to wrap the C functions into Java-callable native functions.
Write an idiomatic Java façade that eventually calls into the JNI-exposed functions.

In principle, the idea was rather simple and almost obvious; but in practice, it turned out to be a pain to get it all working, especially on macOS. I think we got it working on Linux, but we also needed macOS support because other Uber devs had to be able to run the project locally. I can't really recall what sort of issue we encountered, but we just couldn't make it work after days of trying.

Today I remembered that experience and wondered whether the state of things had improved. For example, I was aware of JEP 454 1 that introduced a much better alternative to JNI. After a few minutes of searching and a bit of trial-and-error, I got a small demo working. I didn't struggle at all this time, and the final result (code and steps) were pretty neat and simple.

I uploaded the demo to garciat/haskell-from-java, in case I or anyone ever needs a basic starting point to call Haskell from Java.

https://garciat.com/posts/haskell-from-java/

Extensions

Compiling Go to WASM

Jan 16, 2025

I share my experience compiling Go to WASM for a side project.

Show full content

Background #

Last year I wrote a type-checker for Golang (written in Go) whose aim is to expand the language's type inference capabilities by using bidirectional typechecking. (It's just a small "what if" experiment.)

It was my first experience with that technique. In retrospect, I think I did a decent job but I am sure I did not grasp the concept fully at the time. For example, I would've liked to split my AST in the way that Conor McBride describes in this talk on type systems.

As I developed the type-checker, I found myself repeatedly clearing my terminal screen and re-running the tool to check its output. My write-run-debug loop was inefficient. So I put together a tiny web app.

The web app consisted of a frontend and a backend. The frontend makes use of Monaco Editor to display two code editors side by side (a la Compiler Explorer) -- one to edit the source Go code and one to display the tool's output. The backend receives POST /compile requests from the frontend every time the user edits the source code; the input is the user's code as text, and the output is the type-checker's internal logs.

Admittedly, I spent too much time on this (I have a weak spot for webdev), but it worked out pretty well.

...and it wasn't long before I wanted to share my work with friends and colleagues 😄. So I used Google Cloud Run's free tier to build and deploy the app, which gave me access to a public URL for the app.

For the time being, I was satisfied with all the work that I did. And I moved on.

Fast forward to a few weeks ago...

During the holidays, I had enough time and energy to finally pour some love into my personal website. It had been neglected for years, and one of the reasons for such neglect was my annoyance of working with Ruby and Jekyll. I just found it clunky and uninteresting.

On my search for alternative static site generators, I found Lume. It is a dead-simple, batteries-included framework that just works. But more importantly, it just makes sense. I was instantly delighted, so I spent the following days moving every site that I own (mainly GitHub Pages) to Lume. I had lots of fun and was extremely productive using Deno, Lume, TypeScript, and Preact.

And then it hit me: What if I don't need a backend for my type-checker web app? Could I also turn it into a statically-built website?

The answer was yes. I was able to replace the backend call with a direct call to the type-checker compiled as WASM. That allowed me to get rid of the Google Cloud deployment and simply build the app with Lume and deploy it for free to GitHub Pages.

So, that's the why. Now, let's focus on the how.

Compiling and running #

(Note: I'm currently running Go version 1.23.4.)

Compiling Go to WASM (docs) is pleasantly simple and easy:

GOOS=js GOARCH=wasm go build -o build/main.wasm github.com/garciat/gobid/cmd/main

To my surprise, the command ran successfully on the first try without any changes at all to the codebase. Props to the Golang dev team.

In order to execute the code, the documentation asks us to copy the file "$(go env GOROOT)/misc/wasm/wasm_exec.js" into our project and to import it as a helper script:

<script src="wasm_exec.js"></script>
<script>
  const go = new Go();
  WebAssembly.instantiateStreaming(fetch("main.wasm"), go.importObject)
    .then((result) => {
      go.run(result.instance);
    });
</script>

I did so, and it worked. I saw the following logged into the developer console:

wasm_exec.js:22 === CheckPackageDeclarations ===
wasm_exec.js:22 === CheckPackageCycles ===
wasm_exec.js:22 === ResolvePackageNames(main) ===
wasm_exec.js:22 === Loading Package (main) ===
wasm_exec.js:22 === Checking Package (main) ===
wasm_exec.js:22 === Verifying Package (main) ===
wasm_exec.js:22 === Verify ===
wasm_exec.js:22 === iteration 0 ===
wasm_exec.js:22 Context:
{} || {}
wasm_exec.js:22 Steps:
wasm_exec.js:22 === DONE ===
wasm_exec.js:22 {{  }}

But, wait, there's more.

My tool's main executable is meant to be called with file paths passed as commandline arguments. For example:

go run github.com/garciat/gobid/cmd/main hello.go

I needed to be able to:

Provide program arguments. (os.Args)
Provide readable files.
Capture stdout into a JS string, instead of console.log.

I could'ved skipped points 1 and 2 by changing the program to read from stdin instead. That would've been good enough for an MVP. But for whatever reason I chose to stick to its original behavior. How hard could that be?

Providing program arguments #

This is actually easy to do:

const go = new Go();
go.argv.push("hello.go");
// ...

But how did I figure that out? Go's official documentation for WebAssembly does not document the interface of the Go object provided by the support library.

Our only options are to either read the file itself to infer its API, or to read the code of one of the many provided example apps.

As we'll see further on, this will not be our last problem with underspecification.

Providing readable files #

The wasm_exec.js support file (link) gives us a hint as to how filesystem access works in Go-WASM:

if (!globalThis.fs) {
  let outputBuf = "";
  globalThis.fs = {
    constants: {/** ... */},
    writeSync(fd, buf) {
      // ...
    },
    // Several more methods like:
    // write, chmod, chown, etc.
  };
}

That piece of code seems to be defining a Node.js-like filesystem interface. But there are essentially zero direct source references to fs in the wasm_exec.js file itself. How is this being used then?

It turns out that the Go standard library file src/syscall/fs_js.go (link) reads the fs global variable like so:

// NOTE: some lines were redacted away for brevity

package syscall

import (
  "syscall/js"
)

var jsFS = js.Global().Get("fs")
var constants = jsFS.Get("constants")

func Ftruncate(fd int, length int64) error {
  _, err := fsCall("ftruncate", fd, length)
  return err
}

func fsCall(name string, args ...any) (js.Value, error) {
  type callResult struct {
    val js.Value
    err error
  }

  c := make(chan callResult, 1)
  f := js.FuncOf(func(this js.Value, args []js.Value) any {
    var res callResult

    if len(args) >= 1 { // on Node.js 8, fs.utimes calls the callback without any arguments
      if jsErr := args[0]; !jsErr.IsNull() {
        res.err = mapJSError(jsErr)
      }
    }

    res.val = js.Undefined()
    if len(args) >= 2 {
      res.val = args[1]
    }

    c <- res
    return nil
  })
  defer f.Release()
  jsFS.Call(name, append(args, f)...)
  res := <-c
  return res.val, res.err
}

From this, we may infer the following:

Go expects a global name fs.
Go redirects filesystem syscalls to method calls on fs.
The syscall API mimicks Node.js's fs module.
- Go relies on specific overloads of each function.
- For example: Node.js's fs.ftruncate (link) may be called with fd, callback or with fd, len, callback, but Go only uses the fd, len, callback overload.
All syscalls are callback-based.
- The Go callback specifically expects null when there is no error. I.e. undefined or some other falsey value will not do.

A Google search for "nodejs fs on browser" eventually led me to ZenFS, a browser-compatible filesystem emulation library that is compatible with Node.js's fs API.

I tried doing the following:

import { fs } from "https://esm.sh/@zenfs/core";
globalThis.fs = fs;
const go = new Go();
fs.writeFileSync("main.go", "package main\n");
go.argv.push("main.go");

But it did not work right away. After hours of debugging, I found that I needed to write quite a bit of custom code to get ZenFS to work with Go's implicit expectations and in order to work around some of ZenFS's strange behavior.

In summary, I defined a GoFileSystemAdapter class that wraps a ZenFS FileSystem object and exposes it as the Node.js-like fs API that Go expects.

Capturing stdout #

ZenFS itself has no support for any of the standard streams. I ended up manually creating /dev/stdout and /dev/stderr files and redirecting operations on file descriptors 1 and 2 respectively.

I did not spend time on enabling /dev/stdin, though, because I did not need it.

I also could've implemented a custom ZenFS Device that more accurately mimicks the usual stdin and stdout behavior. But the above was sufficient for my use case.

Upon calling GoFileSystemAdapter#finalize(), I close and fully read those special files and return their contents as JS strings.

Updating wasm_exec.js to JS modules #

I also spent some time modifying wasm_exec.js to make it easier to import as a JS module. More specifically, I made the following changes:

I converted the file extension .ts for TypeScript support. This allowed me to explicitly define the expected API of the filesystem FFI. For example:

export interface GoFileSystemError extends Error {
  readonly errno: number;
  readonly code: string;
}

export type GoFileSystemCallback<T> = (
  err: GoFileSystemError | null,
  value?: T,
) => void;

export interface GoFileSystemFFI {
  // ...

  write(
    fd: number,
    buf: Uint8Array,
    offset: number,
    length: number,
    position: number | null,
    callback: GoFileSystemCallback<number>,
  ): void;

  // ...
}

Then, I changed the constructor of new Go(...) to take in fs and process as parameters instead of reading them from globalThis; and I used export class Go instead of globalThis.Go = class Go.

Sadly, because Go's dependency on globalThis.fs happens in the standard library, it would've been a pain to change this. I had previously hacked on the Go compiler, but I did not want to go down this path at the time. (I will consider it in the future.)

Finally, I renamed the file to wasm_exec@1.23.4.ts in order to make it clear that it is tied to Go version 1.23.4.

Putting it all together #

In the end, my 'compile' function ended up looking like this. I named the function gobid, because that's the name I chose for the type-checker back when I started it (it is a portmanteau of Go and Bidirectional).

import { createFS } from "./go-fs.ts";
import { process } from "./go-process.ts";
import { Go } from "./wasm_exec@1.23.4.ts";

type Path = string;
type GoSourceCode = string;

export async function gobid(
  inputs: Record<Path, GoSourceCode>,
): Promise<string> {
  const fs = await createFS();

  const go = new Go(fs, process);

  for (const [path, source] of Object.entries(inputs)) {
    fs.writeFileSync(path, source);
    go.argv.push(path);
  }

  const source = await WebAssembly.instantiateStreaming(
    fetch(import.meta.resolve("../build/main.wasm")),
    go.importObject,
  );

  await go.run(source.instance);

  const { stdout, stderr } = fs.finalize();

  if (stderr) {
    return stderr;
  }

  return stdout;
}

Although globalThis shenanigans do end up happening, at the very least those details do not leak into the client code.

Check it out #

You can access the web app here: https://garciat.com/gobid/

Conclusions #

Go's WASM support works well, but it is sorely underspecified. It would be great if its JS FFI contract were strictly specified in their documentation and, ideally, with TypeScript interfaces (or at least with JSDoc types).

Go's filesystem JS FFI reliance on a global fs object makes it hard to isolate WASM instances. As an alternative, the syscall code could use the Go object instance itself as a top-level scope and read fs and other FFI dependencies off it directly. GitHub issue #56084 proposed a similar approach back in 2022, but no changes have been landed since.

wasm_exec.js does not leverage modern JavaScript features like modules (aka ESM), and is otherwise too coupled to Node.js API expectations. Also, rather than copying the file off the local Go installation, it would be more convenient to be able to download or import it from NPM or JSR. For example:

import { Go } from "jsr:@go/wasm-ffi@1.23.4"

Finally, implementing a fully-compliant filesystem for the Go JS FFI is non-trivial and there are no ready-made solutions available, as far as I can tell.

Update (2025-02-04) #

I tried using Go's WASI support as an alternative, better-defined system interface for the executable. Using WASI should allow me to ignore the wasm_exec.js shenanigans and directly plug in a browser-compatible WASI environment that provides its own contained file system.

Unfortunately, Go makes use of asynchronous IO (via poll_oneoff) which seems to be implemented via SharedArrayBuffer and Atomics in the WASI implementations that I found (antmicro/jswasi and wasmerio/wasmer-js). And for the time being, GitHub Pages does support the required security headers to enable SharedArrayBuffer, so without recurring to something like setting up Cloudfare Workers I will not be able to leverage WASI.

https://garciat.com/posts/go-wasm/

Extensions

TypeScript and Haskell: Unions & Singletons

Dec 14, 2024

I briefly explore union types and singleton types in Haskell and TypeScript.

Show full content

I like TypeScript's type system. In particular, I like the combination of Union Types (aka anonymous Sum Types) and Literal Types (aka Singleton Types).

Those features allow us to achieve API precision, while also minding conciseness:

// Not very precise
type Result = {
  ok: boolean;
  error?: string;
  userRole?: string;
  avatarSize?: number;
};

// Much more precise
type Result =
  | { ok: false; error: string }
  | { ok: true; userRole: "ADMIN" | "GUEST"; avatarSize: 96 | 128 | 512 };

Let's compare that to Haskell without any language extensions:

data Result =
  ResultError { resultError :: String }
  | ResultOk { resultUserRole :: UserRole, resultAvatarSize :: AvatarSize }

data UserRole = UserRoleAdmin | UserRoleGuest

data AvatarSize = AvatarSize96 | AvatarSize128 | AvatarSize512

Unfortunately, Haskell falls short compared to TypeScript:

We must declare the types UserRole and AvatarSize.
Actual value alternatives (e.g 96, 128) are obfuscated by their respective ADT tags (e.g AvatarSize96 and AvatarSize128).

So I went on the hunt for Haskell extensions that could help bridge those gaps.

Here's one alternative that I found:

{-# Language DataKinds, GADTs #-}

import GHC.Base
import GHC.TypeLits

---

type OneOfNat :: [Nat] -> Type

data OneOfNat ns where
  Here :: SNat n -> OneOfNat (n ': ns)
  There :: OneOfNat ns -> OneOfNat (n ': ns)

---

type AvatarSize = OneOfNat [96, 128, 512]

hello :: AvatarSize -> String
hello size =
  case size of
    Here @96 _ -> "96"
    There (Here @128 _) -> "128"
    There (There (Here @512 _)) -> "512"

---

main = print $ hello $ There (Here SNat)

Here's another alternative:

{-# Language DataKinds, UnboxedSums #-}

import GHC.Base
import GHC.TypeLits

---

type AvatarSize = (# SNat 96 | SNat 128 | SNat 512 #)

hello :: AvatarSize -> String
hello size =
  case size of
    (# SNat @96 | | #) -> "96"
    (# | SNat @128 | #) -> "128"
    (# | | SNat @512 #) -> "512"

---

main = print $ hello $ (# SNat @96 | | #)

I like this solution because the syntax is flat, compared to the previous nested Heres and Theres.

However, unpacked sum types behave very differently from regular types.

For example, we cannot do this:

type family UOneOfNat ns where
  UOneOfNat [a, b] = (# SNat a | SNat b #)
  UOneOfNat [a, b, c] = (# SNat a | SNat b | SNat c #)

type AvatarSize = UOneOfNat [96, 128, 512]

Because each unpacked sum type has a different kind based on its arity:

Main.hs:10:25: error: [GHC-83865]
    • Couldn't match kind: '[LiftedRep]
                     with: '[]
      Expected kind ‘TYPE (SumRep [LiftedRep, LiftedRep])’,
        but ‘(# SNat a | SNat b | SNat c #)’ has kind ‘TYPE
                                                         (SumRep [LiftedRep, LiftedRep, LiftedRep])’
    • In the type ‘(# SNat a | SNat b | SNat c #)’
      In the type family declaration for ‘UOneOfNat’
   |
10 |   UOneOfNat [a, b, c] = (# SNat a | SNat b | SNat c #)
   |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Moreover, neither alternative holds the desired semantics of the type union behaving like a set of types, i.e. normalizing duplicates. We could use type-level Sets for this, but that only further complicates our representation.

https://garciat.com/posts/haskell-union-types/

Extensions

Understanding Go generics

May 17, 2024

A complete brain dump of my understanding of Go generics at the time.

Show full content

Introduction #

As of Go 1.18, type parameters can be used in the following declarations:

Function declarations
- But not methods
  - Although generic type may have methods (just not generic methods)
Named type declarations
- But not type aliases

Generic functions by example #

func IdentityInt(x int) x { return x }
func IdentityString(x string) x { return x }
// etc...

// They can be replaced by
func Identity[T any](x T) T { return x }

// Identity[int]    ~ IdentityInt
// Identity[string] ~ IdentityString

func Concat[T any](slices ...[]T) []T {
  var s []T
  for _, slice := range slices {
    for _, elem := range slice {
      s = append(s, elem)
    }
  }
  return s
}
// Concat([]int{1,2,3}, []int{4,5}, []int{6}) == []int{1,2,3,4,5,6}

Generic types by example #

type Pair[T any, U any] struct {
  T Fst
  U Snd
}
// Can have Pair[int, string] or Pair[struct{}, Pair[*int, func()]] etc.

// This is a method for the generic type `Pair`
// The type parameters `T` and `U` are implicitly declared
// We cannot add any other explicitly declared type parameters with e.g. `[X any]`
func (p Pair[T, U]) Swap() Pair[U, T] {
  return Pair[U, T]{p.Snd, p.Fst}
}

Instantiation #

Generic functions are not functions; and generic types are not types. They become functions and types, respectively, when they are instantiated.

Instantiation is substituting (applying) type arguments for each of the declared type parameters. Instantiation may fail (at compile time, of course) if the given type arguments do not satisfy their respective type parameters’ type constraints.

func IdentityInt(x int) x { return x }
func Identity[T any](x T) T { return x }

var f = IdentityInt // OK
var g = Identity    // NOT OK: "cannot use generic function without instantiation"
var h = Identity[int] // OK
var i func(int)int = Identity // OK: implicit instantiation; type arguments were inferred

Type constraints # Usage #

The operations possible on an unconstrained type are limited:

func example[T any](t T) T {
  t2 := t  // OK: can be 'copied'
  var t3 T // OK: can declare new variable
  return t // OK: can return it
  new(T)   // OK: can allocate for it
  // etc.. moving it around, using it in other declarations; generally fine

  t + t    // NOT OK: `+` operator may not be available for all Ts
  t.field  // NOT OK: field may not be available
  t.m()    // NOT OK: method may not be available
  map[T]int{} // NOT OK: `T` may not be compatible for map keys (like a function type)
  // etc.. specific operations; cannot be assumed
}

So we use constraints to ‘gain’ information about a type that we may use in the function implementation:

type Stringer interface { String() string }

func example[T Stringer](t T) string {
  return t.String() // OK! we declared that `T` must implement `Stringer`
                    //     so we can rely on the presence of method `String() string`
}

Syntax #

Type constraints are defined through the general interface syntax:

Intersection of zero or more of (separated by ;)
- Method
- Union of one or more of (separated by |)
  - Empty interface type
  - Non-interface type
  - Underlying type of (denoted by ~)
    - Non-interface type

And any of the above may be wrapped by interface { … } zero or more times.

An interface type denotes a type set; the set of all types that may be assigned to it.

A union denotes a closed type set.

An intersection denotes an open type set, if it does not include any unions.

The comparable constraint is a builtin that denotes the type set of types that are strictly comparable. I.e. types that can be used as map keys.

Let’s explore:

[T any] // no constraint on `T`
[T interface{}] // same
[T interface{interface{}}] // valid syntax; exactly the same as above
[T interface{interface{interface{}}}] // same

[T interface{ M() }] // interface type with one method `M` with type signature `func()`
[T interface{ interface{ M() }] // again, same thing
// A type `T` satisfies the above if it has a method `M()`

[T interface{ M(); N() }] // interface with two methods
[T interface{ interface{ M() }; interface{ N() } }] // same thing
[T interface{ M(); interface{ N() } }] // same thing
// A type `T` satisfies the above if it has method `M()` AND it has method `N()`

[T interface{ int }] // type union of one element, `int`
[T interface{ interface{ int } }] // same
[T int] // shorthand for unions
// A type `T` satisfies the above if it is `int`

[T interface{ int | float32 }] // type union of two elements, `int` and `float32`
[T interface{ int | interface{ float32 }}] // you get the idea...
[T int|float32] // shorthand for unions
// A type `T` satisfies the above if it is either `int` or `float32`

[T interface{ ~int }] // type union of one element, `~int`
// A type `T` satisfies the above if its underlying type is `int`
// For example: `type PrimaryKey int`
// The `~` syntax basically 'unwraps' named types
[T ~int] // shorthand

[T interface{ ~int; M() }]
// A type `T` satisfies the above if its underlying type is `int` AND it has method `M()`
// For example: `type Code int; func (c Code) M() { ... }`

[T interface{ int; M() }]
// A type `T` satisfies the above if it is `int` AND has method `M()`
// ... which is impossible
// The builtin type `int` has no methods (nor any may be declared for it)

[T interface{ int; float32 }] // Intersection of union `int` and union `float32`
// A type `T` satisfies the above it is `int` AND it is `float32`
// ... which is impossible
// So the above denotes an empty type set.

[T interface{ comparable }]
[T comparable] // same
// A type `T` satisfies the above if it can be compared with `==` and `!=`
// This is needed to use a type parameter as a map key

// given
type Stringer interface { String() string }
// then
[T interface{ ~int; comparable; Stringer }]
// A type `T` satisfies the above if: its underlying type is `int`
//                                AND it is comparable
//                                AND it implements Stringer

[T interface{ M() U }, U any]
// Satisfies if `T` has a method `M() U`
// And `U` can be any type

type WithM[T any] = interface{ M() T }
[T WithM[U], U any] // same as above

Type inference #

Type inference follows suit from constraints and turns out to be rather principled. (Perhaps this is no surprise given that Philip Wadler worked on an early version of Go generics.)

Inference kicks in when we do not provide sufficient explicit type arguments at an instantiation site. That is, given a generic declaration with N type parameters, if we provide 0 to N-1 arguments during instantiation, inference will deal with the rest.

Type inference supports calls of generic functions and assignments of generic functions to (explicitly function-typed) variables.

The inference process uses a unification algorithm and is pretty math-y.

Let’s explore:

// I'll use a simplified syntax for type parameter lists + instantiation
[T any](T)[_](32) // means: given a generic function
                  //          with type parameters [T any]
                  //          and function parameters (T)
                  //        do not provide an explicit type argument for T
                  //        and provide 32 as a function argument
// inference will derive T=int

[T any, U any](T, U)[_, _]("hi", 42)
// straightforward: T=string; U=int

[T struct{Name U}, U any](T)[_, _](struct{Name string}{})
// T = struct{Name string} -- given by function argument
// but also:
// struct{Name U} = struct{Name string} -- given by constraint on T
// U = string -- given by 'pattern matching' on struct shape

[T any](struct{Name T})[_](struct{Name string}{}) // kind of the same
// struct{Name T} = struct{Name string} -- given by function argument
// T = string -- given by 'pattern matching' on struct shape

[T int|float32](T)[_](42)
// known:
//   T=int | T=float32 -- from constraints of `T`
//   T=int -- from function argument
// infers:
//   T=int

[T int|float32]()[_]() // note no function arguments!
// known:
//  T=int | T=float32 -- from constraints of `T`
// infers:
//  ??? fail

[T int]()[_]() // note no function arguments!
// known:
//  T=int -- from constraints of `T`
// infers:
//  T=int -- follows directly from known equations!

[T any, PT *T](T)[_, _](42)
// known:
//  PT=*T -- from constraints of `PT`
//  T=int -- from function argument
// infers:
//  T=int
//  PT=*int  -- from replacing T=int in PT=*T !
[T any, PT *T](T)[int, *int](42) // this becomes the effective call

type User struct{ ... }
type Users []User
[T ~[]E, E any, PE *E](T)[_, _, _](Users{})
// known:
//   T~[]E -- from constraint on `T`
//   PE=*E    -- from constraint on `PE`
//   T=Users  -- from function argument
// infers:
//   T=Users  -- follows directly
//   E=User   -- from T=User & T~[]E & Users~[]User ==> []E=[]User ==> E=User
//   PE=*User -- from PE=*E & E=User
[T ~[]E, E any, PE *E](T)[Users, User, *User](Users{}) // effective call

// wild!

Compilation #

A compiler must follow the semantics described in the language specification.

The language specification doesn’t say anything about translation (compilation), so any details about how language constraints are optimized and compiled are implementation-specific.

Regardless, it is worth having a general idea of how the current (1.22) Go compiler compiles generics.

Generics compilation is described in this design document.

In short, the compiler does stenciling (a la C++ templates), but it tries to reduce instantiations by reusing them across types that have the same ‘GC shape’.

The GC shape of a type means how that type appears to the allocator / garbage collector. It is determined by its size, its required alignment, and which parts of the type contain a pointer. Reference

For example:

Given the following Go code:

package main

type Num int

//go:noinline
func add[T ~int | ~float64](x, y T) T {
  return x + y
}

func main() {
  println(add(3, 3))
  println(add(5.2, 2.3))
  println(add[Num](9, 99))
}

We get the following x86 code:

main.main:
  PUSHQ	BP
  MOVQ	SP, BP
  SUBQ	$40, SP

  LEAQ	main..dict.add[int](SB), AX
  MOVL	$3, BX
  MOVQ	BX, CX
  CALL	main.add[go.shape.int](SB)
  MOVQ	AX, main..autotmp_2+32(SP)
  CALL	runtime.printlock(SB)
  MOVQ	main..autotmp_2+32(SP), AX
  CALL	runtime.printint(SB)
  CALL	runtime.printnl(SB)
  CALL	runtime.printunlock(SB)

  LEAQ	main..dict.add[float64](SB), AX
  MOVSD	$f64.4014cccccccccccd(SB), X0
  MOVSD	$f64.4002666666666666(SB), X1
  CALL	main.add[go.shape.float64](SB)
  MOVSD	X0, main..autotmp_3+24(SP)
  CALL	runtime.printlock(SB)
  MOVSD	main..autotmp_3+24(SP), X0
  CALL	runtime.printfloat(SB)
  CALL	runtime.printnl(SB)
  CALL	runtime.printunlock(SB)

  LEAQ	main..dict.add[main.Num](SB), AX
  MOVL	$9, BX
  MOVL	$99, CX
  CALL	main.add[go.shape.int](SB)
  MOVQ	AX, main..autotmp_3+32(SP)
  CALL	runtime.printlock(SB)
  MOVQ	main..autotmp_3+32(SP), AX
  CALL	runtime.printint(SB)
  CALL	runtime.printnl(SB)
  runtime.printunlock(SB)

  ADDQ	$40, SP
  POPQ	BP
  RET

main.add[go.shape.float64]:
  ADDSD	X1, X0
  RET

main.add[float64]:
  LEAQ	main..dict.add[float64](SB), AX
  CALL	main.add[go.shape.float64](SB)
  RET

main.add[go.shape.int]:
  LEAQ	(BX)(CX*1), AX // fancy add lol
  RET

main.add[int]:
  MOVQ	BX, CX
  MOVQ	AX, BX
  LEAQ	main..dict.add[int](SB), AX
  CALL	main.add[go.shape.int](SB)
  RET

main.add[<unlinkable>.Num]:
  MOVQ	BX, CX
  MOVQ	AX, BX
  LEAQ	main..dict.add[main.Num](SB), AX
  CALL	main.add[go.shape.int](SB)
  RET

Things to note:

The compiler stenciled the code:

main.add[go.shape.int]
main.add[go.shape.float64]

Those contain the actual function logic. Observe how the code is specialized to the respective data types. It uses the ADDSD instruction for float64, and a sneaky ‘addition’ instruction for int (to avoid ADD BX, CX + MOV CX AX).

But it also generated the wrappers:

main.add[int]
main.add[float64]
main.add[Num]

Those load the appropriate dictionaries from static data and then defer to the shape-based stenciled code.

Also, in main.main, it seems like the compiler is inlining the wrappers. The actual code is not inlined because I used the go:noinline pragma. (Otherwise everything would’ve been constant-folded.)

Examples # Generic algorithms #

type PrimaryKeyed[T any] interface {
  GetPrimaryKey() T
}

func SliceByKeys[T PrimaryKeyed[K], K comparable](elems []T) map[K]T {
  m := make(map[K]T, len(elems))
  for _, elem := range elems {
    m[elem.GetPrimaryKey()] = elem
  }
  return m
}

// ===

type PaymentProfile struct {
  ID uuid.UUID
  // etc
}
func (p PaymentProfile) GetPrimaryKey() uuid.UUID { return p.ID }

var paymentProfiles []PaymentProfile = ...

// map[uuid.UUID]PaymentProfile
profilesByUUID := SliceByKeys(paymentProfiles)

// note that the inference here is pretty wild:
// known:
//   []PaymentProfile = []T -- from function argument
//   T `sat` PrimaryKeyed[K] -- from constraint on `T`
//      ==> T `impl` interface{GetPrimaryKey() K}  -- from definition of constraint
// infers:
//   unify([]PaymentProfile, []T)
//     ==> T=PaymentProfile
//   impl(PaymentProfile, interface{GetPrimaryKey() K})
//     ==> unify(GetPrimaryKey() uuid.UUID, GetPrimaryKey() K)
//     ==> uuid.UUID=K
SliceByKeys[[]PaymentProfile, uuid.UUID](paymentProfiles) // effective call

func SliceGroupBy[T any, K comparable](elems []T, key func(T) K) map[K][]T {
  m := make(map[K][]T)
  for _, elem := range elems {
    k := key(elem)
    m[k] = append(m[k], elem)
  }
  return m
}

func SlicePartition[T any](elems []T, predicate func(T) bool) struct{True, False []T} {
  g := SliceGroupBy(elems, predicate) // infers K=bool; and it is comparable
  return struct{True, False []T}{g[true], g[false]}
}

// ===

g := SlicePartition(paymentProfiles, func(pp PaymentProfile) { return pp.IsBanned })
g.True  // banned profiles
g.False // non-banned profiles

Generic utilities #

type PaymentProfile struct {
  UUID uuid.UUID
  // ...
}

func (p *PaymentProfile) MarshalLogObject(enc zapcore.ObjectEncoder) error {
  enc.AddString("payment_profile_uuid", p.UUID.String())
  // ...
}

type ListPaymentProfilesResponse struct {
  PaymentProfiles []PaymentProfile
  // ...
}

func (r *ListPaymentProfilesResponse) MarshalLogObject(enc zapcore.ObjectEncoder) error {
  enc.AddArray("payment_profiles", zapcore.ArrayMarshalerFunc(func(arrEnc zapcore.ArrayEncoder) error {
    for _, pp := range r.PaymentProfiles {
      arrEnc.AppendObject(&pp) // need to take address here because
                               // MarshalLogObject is implemented on *PaymentProfile
    }
  })
  // ...
}

// or sometimes we create `type PaymentProfiles []PaymentProfile`
// and implement `zapcore.ArrayMarshaler` on it

// either way, it is a lot of duplication

// ===

func MarshalArrayOfPtrObject[T ~[]E, E any, PE interface {
	*E
	zapcore.ObjectMarshaler
}](enc zapcore.ObjectEncoder, key string, arr T) error {
	enc.AddArray(key, zapcore.ArrayMarshalerFunc(func(arrEnc zapcore.ArrayEncoder) error {
		for _, elem := range arr {
			var ptr PE = &elem // cool
			arrEnc.AppendObject(ptr)
		}
	}))
}

// ===

func (r *ListPaymentProfilesResponse) MarshalLogObject(enc zapcore.ObjectEncoder) error {
  MarshalArrayOfPtrObject(enc, "payment_profiles", r.PaymentProfiles)
  // ...
}

Crazy stuff #

The example below is inspired by Haskell’s type equality module.

// Given
type Slice[T any] []T

// We could write
func SliceJoin(s Slice[string], sep string) string {
  // join strings
}

// But it is not a method, so it cannot be chained:
Slice[string]{...}.Filter(...).Join(...) // ???

// Could we write this?
func (s Slice[string]) Join(sep string) { ... }
// NO we can't
// Although this compiles, `string` above is actually a type parameter
// and it is not the concrete builtin type `string`.
// Remember, all of these mean the same:
func (s Slice[T]) Join(sep string) { ... }
func (s Slice[U]) Join(sep string) { ... }
func (s Slice[Hello]) Join(sep string) { ... }
func (s Slice[hello]) Join(sep string) { ... }
func (s Slice[_]) Join(sep string) { ... } // special case, when we don't care

// So are we stuck with top-level functions?
// Nope.

// Enter type witnesses 😈

// Note: this makes a LOT more sense if you understand "Propositions as Types"

// ===

// A 'witness' that T=U
type TyEq[T, U any] interface {
  Cast(T) U // if T=U, then I should be able to cast `T` to `U`
}

// TyEqImpl[T] implements TyEq[T, T] which is a trivial case, of course
type TyEqImpl[T any] struct{}
func (_ TyEqImpl[T]) Cast(t T) { return t }

// `Refl` stands for reflexivity, i.e. for every type T, then T=T
func Refl[T any]() TyEq[T, T] { return TyEqImpl[T]{} }

// ===

// We can only join a Slice[T] if T=string
// So we demand from the caller a PROOF that T=string, in the form of TyEq[T, string]
func (s Slice[T]) Join(ty TyEq[T, string], sep string) string {
  x := ""
  for _, elem := range s {
    x += ty.Cast(elem) + sep // and we use it to 'cast' elem:T to a string
  }
  return x
}

// ===

// Here, the call to Join() is made on a Slice[string]
// So we are being asked for a TyEq[string, string], which is trivial by reflexivity
Slice[string]{...}.Filter(...).Join(Refl(), ", ")

// In a way, we have worked around Go's limitation on generic methods 😇

The above trick only works for type equality. It would also be nice to encode “T implements A” as a value:

// A witness that type T implements interface A
type Implements[T any, A any] interface{
  Cast(T) A
}

type ImplemenentsImpl[T A, A any] struct{} // DOES NOT COMPILE

The language does not support using a type parameter (A) as a constraint (T A). I created a GitHub issue to propose enabling this, but it didn't go anywhere unfortunately.

Closing thoughts #

I recommend learning Haskell
- Haskell code is extremely generic
- Its “generics” (parametric polymorphism) also follow a unification-based inference
- It has helped me shape my generic reasoning and design
In general, only use generics for ‘library-level’ code. Business code should seldom use generics.

https://garciat.com/posts/go-generics/

Extensions

Understanding Go interfaces

May 8, 2024

I explore the implementation details of Go interfaces.

Show full content

Introduction #

We usually write very simple code (as we should) where a single type implements a single interface, and we export the interface but hide the implementation. Classic. We learned that in OOP 101.

type Controller interface {
  // public methods
}

func New(...) Controller { return &controller{ ... } }

type controller struct { ... } // concrete type

// public methods implementations
func (c *controller) Method1() { ... }
// maybe some private methods
func (c *controller) private1() { ... }

// git commit. arc diff. arc land. go home!

But Golang is not an OO language. In fact, its interfaces are outright bizarre to someone coming from e.g. Java. Why? Because we never declare that we want to implement an interface. It ‘just works’. We can fabricate anonymous interfaces from thin air and, as long as the methods match, cast any type into it; even types that are owned by different packages and are separately compiled.

// defined somewhere inside the standard library; already compiled in a shared object
package io
type File struct { ... }
func Open() *File { ... }
func (f *File) Write() { ... }
func (f *File) Flush() { ... }
func (f *File) Close() { ... }

// our own code
package main
import "io"
func main() {
  // we can come up with an anonymous interface with some methods
  var what interface { Flush(); Close() }
  what = Open() // casting an `*io.File` to it just works 🤷‍♂️
  what.Flush()
  what.Close()
}

Of course, we may have read some tutorials (or even the language spec) so we are aware of Go’s semantics. It may be bizarre, but it is explained by the rules of the language.

But… have you ever wondered HOW that works? After all, this isn’t JavaScript; Go is compiled to machine code ahead of time. What kind of magic is the compiler pulling off to get this to run?

Well, let’s find out.

First, we will review a few bits and pieces of Golang that are relevant to our exploration. I will also introduce what I call Go-- (Go minus minus), which will aid us in understanding compilation without reading assembly code.

Then, we will dive deep into Go’s runtime and compiler transformations relevant to types and interfaces. At each step, we will decompile Go into Go-- and try to understand what’s going on.

Review # Types #

This is a quick overview of the kinds of types available in Go.

// === Basic types (primitives, slices, maps, structs, channels, pointers)
var a int
var b float32
var c []int
var d map[string]string
var e struct { x, y int }
var f chan int
var g *int
// i.e. any type that can be instantiated

// === Basic Interfaces
var a interface { M() }
var b interface { M(); N(int) }
var c1 interface {}
var c2 any // same as above
// (let's ignore general interfaces for now)

// === Defined types
type A int
type B []int
type C struct { x, y int }
type D interface { M() }
type E *int
// different from _type alias_
type F = int
// because only defined types can be method receivers (with exceptions)
func (a A) Hello() {} // OK
func (b B) Hello() {} // OK
func (c C) Hello() {} // OK
func (d D) Hello() {} // NO: D is an interface type
func (e E) Hello() {} // NO: E is a pointer type (weird, to me)
func (f F) Hello() {} // NO: F is an alias

Implementing interfaces #

Interfaces are structurally-typed (vs. nominal in e.g. Java)

Given a non-interface type X and an interface A; if X has methods of A, then X implements A. Without any explicitly-declared intent to implement A!

package main

type A interface {
  M()
  N()
}

type X struct{}
func (x X) M() {}
func (x X) N() {}

var a A = X{} // OK!

var b interface{ M(); N() } = X{} // Also OK!

Method receivers #

package main

type X struct{}
func (x X) M() {}  // value receiver
func (x *X) N() {} // pointer receiver

func exampleX(x X) {
  x.M()
  x.N() // OK: compiler takes address of `x` for me (allocates if `x` escapes in `N`)
}
func examplePtrX(x *X) {
  x.M() // OK: compiler makes a copy of `x` in the stack for me
  x.N()
}

The above roughly compiles to:

func `main.X.M`(x X) {} // receiver becomes the first parameter
func `main.(*X).N`(x *X) {}

func `main.exampleX`(x X) {
  `main.X.M`(x)
  `main.(*X).N`(&x) // if escape analysis knows that this pointer can escape
                    // then the parameter `x` is copied to a new heap-allocated X
                    // and from then on, `x` refers to the value in the heap
                    // and forgets about the value received in the stack argument.
                    // example here (but that's a different talk!)
}

func `main.examplePtrX`(x *X) {
  var stackX X = *x
  `main.X.M`(stackX)
  `main.(*X).N`(x)
}

The above code is our first Go-- snippet. Note how the function names were replaced by their linker symbol names. (This will be relevant later on.)

☝️ A linker symbol represents a named address into a compiled binary (executable or shared object). A symbol may either point to a segment of compiled machine code (referred to as ‘text’); or a global variable; or a static read-only resource.

The Go compiler will prefix symbols with package paths/names, as well as unique prefixes like “type:”. The goal is for symbols not to clash with each other.

PS: Go-- is an example of “lowering”, a compilation technique that consists of rewriting more complex semantic constructs in terms of simpler ones. Throughout this document, we perform lowering on the specific constructs that we care about understanding.

Let’s Go # Runtime type information (RTTI) #

The compiler stores a runtime representation of types in the compiled binary.

This information is used for reflection and other runtime capabilities (like dynamic casting).

The structure of this information is defined in internal/abi/type.go.

ABI stands for “application binary interface”. It is similar to an API, but specially concerned with memory access compatibility across Go binaries (through memory layout, register use, etc.).

package abi

// Type is the runtime representation of a Go type.
type Type struct {
  Size_       uintptr
  // ... etc ... a lot of information about the type
}

type StructType struct {
  Type
  PkgPath Name
  Fields  []StructField
}

type InterfaceType struct {
  Type
  PkgPath Name
  Methods []Imethod
}

// and so on ... for ArrayType, ChanType, MapType, FuncType, etc.

Compiling a type #

All types (primitive types, user-defined types, and even anonymous types) must be compiled in terms of their runtime type information as described in the previous section. This is necessary because all values may be reflected on.

Given this source code:

package main

type X struct { x, y int }
func (x X) M(int) string { return "hi" }

It (roughly) compiles to the following code. Here, I use “const” variables to represent the RTTI metadata that is embedded into the Go compiled binary. (In reality, it happens in a very different way.)

import "internal/abi"
import "runtime/symtab"

// * this is not the exact type/data; for illustration purposes only
const `type:main.X` abi.StructType = {
    Size_: 8,
    // ...
    Fields: // info about fields x and y
    Methods: [1]abi.Method{ ... }, // actually lives in 'StructUncommon'
}

// Methods become plain functions; receiver is the first parameter
func `main.X.M`(x X) {}

The takeaway here is that the compiler generates a lot of metadata about types and embeds it into the final compiled binary. As mentioned earlier, this information is necessary for reflection and other runtime capabilities.

Interface value representation #

A value of an interface type is represented at runtime as a data pointer + a type information ‘header’. The compiler calls this an “iface”.

An “iface” value is not allocated; it is passed around by-value as a struct/tuple/pair of those two components. This stands in contrast to languages like Java or even C++ that must allocate the object header along with the object data. Go does not pay this cost upfront; and when it is needed, it only occupies stack space (or registers). In other words, we only pay for polymorphism if we’re using it.

Defined in internal/abi/iface.go:

package abi

import "unsafe"

type ITab struct {
  Inter *InterfaceType // the type of the interface
  Type  *Type          // the type of the underlying, concrete value
  Hash  uint32         // copy of Type.Hash. Used for type switches.
  Fun   [1]uintptr     // array of pointers to the method implementations
                       // * safe to cast to [len(Inter.Methods)]uintpr
}

// Note: not actually defined here (seems to be redefined in multiple places: 1, 2)
type IFace struct {
  Tab  *ITab
  Data unsafe.pointer
}

// EmptyInterface describes the layout of a "interface{}" or a "any."
type EmptyInterface struct {
  Type *Type
  Data unsafe.Pointer
}

Compiling a static cast to interface #

A static cast is a cast between two compile-time-known types. Syntactically, a type cast occurs when:

Explicitly
- cast expression
  - A(x)
Implicitly
- variable assignment
  - var a A = x
- argument passing
  - (func (A) {...})(x)
- return values
  - func () A { return x }

Given the source code:

package main

type X struct {}
func (x X) M() {}

type A interface { M() }

func example(x X) A {
  return x
}

It (roughly) compiles to:

import "abi"
import "runtime" // func newobject(typ *abi.Type) unsafe.Pointer
import "unsafe"

const `type:main.X` abi.StructType = ...

const `type:main.A` abi.InterfaceType = ...

const `go:itab.main.X,main.A` abi.ITab = {
  Inter: &`type:main.A`,
  Type:  &`type:main.X`,
  Fun: [1]uintptr{
    &`main.(*X).M()`, // uses the pointer receiver wrapper
  },
}

func `main.X.M`(x X) {}

// a wrapper is generated
func `main.(*X).M()`(ptr *X) {
  var x X = *ptr // copy to stack
  `main.X.M`(x)
}

func `main.example`(x X) abi.IFace {
  // ITab methods always need a pointer receiver; plus `&x` is escaping the function
  var heapX unsafe.Pointer = runtime.newobject(&`type:main.X`)
  // and copy the expected contents into it
  *(*X)(heapX) = x
  // above may also be done by one of the runtime.convXXX functions
  return abi.IFace{
    Tab: &`go:itab.main.X,main.A`,
    Data: heapX,
  }
}

(Bonus) Compiling a cast to empty interface #

Given the source code:

package main

func example(x int) any {
  return x
}

It (roughly) compiles to:

import "abi"
import "runtime" // func convT64(val uint64) unsafe.Pointer

func `main.example`(x int64) abi.EmptyInterface {
  return abi.EmptyInterface{
    Type: &`type:int`,
    Data: runtime.convT64(x),
  }
}

Compiling a dynamic cast to interface #

Given the source code:

package main

type A interface { M() }

func example(x any) A {
  return x.(A)
}

It (roughly) compiles to:

import "abi"
import "runtime" // func typeAssert(s *abi.TypeAssert, t *abi.Type) *abi.ITab

const `type:main.A` abi.InterfaceType = ...

const `main.typeAssert.0` abi.TypeAssert = {
  Inter:   &`type:main.A`,
  CanFail: false, // because not doing `a, ok := a.(A)`
}

func `main.example`(x abi.EmptyInterface) abi.IFace {
  return abi.IFace{
    Tab:  runtime.typeAssert(&`main.typeAssert.0`, x.Type), // panics if bad cast
    Data: x.Data,
  }
}

The above “runtime.typeAssert”, after some runtime validations, will eventually call runtime.itabInit. Its job is to do the actual work of verifying that the concrete type’s methods implement the interface’s methods.

(I don’t expect you to read the code below; I pasted it in for you to get a ‘feel’ of the complexity behind.)

// itabInit fills in the m.Fun array with all the code pointers for
// the m.Inter/m.Type pair. If the type does not implement the interface,
// it sets m.Fun[0] to 0 and returns the name of an interface function that is missing.
// If !firstTime, itabInit will not write anything to m.Fun (see issue 65962).
// It is ok to call this multiple times on the same m, even concurrently
// (although it will only be called once with firstTime==true).
func itabInit(m *itab, firstTime bool) string {
  inter := m.Inter
  typ := m.Type
  x := typ.Uncommon()

  // both inter and typ have method sorted by name,
  // and interface names are unique,
  // so can iterate over both in lock step;
  // the loop is O(ni+nt) not O(ni*nt).
  ni := len(inter.Methods)
  nt := int(x.Mcount)
  xmhdr := (*[1 << 16]abi.Method)(add(unsafe.Pointer(x), uintptr(x.Moff)))[:nt:nt]
  j := 0
  methods := (*[1 << 16]unsafe.Pointer)(unsafe.Pointer(&m.Fun[0]))[:ni:ni]
  var fun0 unsafe.Pointer
imethods:
  for k := 0; k < ni; k++ {
    i := &inter.Methods[k]
    itype := toRType(&inter.Type).typeOff(i.Typ)
    name := toRType(&inter.Type).nameOff(i.Name)
    iname := name.Name()
    ipkg := pkgPath(name)
    if ipkg == "" {
      ipkg = inter.PkgPath.Name()
    }
    for ; j < nt; j++ {
      t := &xmhdr[j]
      rtyp := toRType(typ)
      tname := rtyp.nameOff(t.Name)
      if rtyp.typeOff(t.Mtyp) == itype && tname.Name() == iname {
        pkgPath := pkgPath(tname)
        if pkgPath == "" {
          pkgPath = rtyp.nameOff(x.PkgPath).Name()
        }
        if tname.IsExported() || pkgPath == ipkg {
          ifn := rtyp.textOff(t.Ifn)
          if k == 0 {
            fun0 = ifn // we'll set m.Fun[0] at the end
          } else if firstTime {
            methods[k] = ifn
          }
          continue imethods
        }
      }
    }
    // didn't find method
    // Leaves m.Fun[0] set to 0.
    return iname
  }
  if firstTime {
    m.Fun[0] = uintptr(fun0)
  }
  return ""
}

Compiling a call to interface method #

Given the source code:

package main

type A interface { M(int, int) int }

func example(a A) int {
  return a.M(10, 20) // the compiler knows that A is an interface type,
                     // so the A.M() call is an interface method call
}

It (roughly) compiles to:

import "abi"
import "unsafe"

const `type:main.A` abi.InterfaceType = ...

func `main.example`(a abi.IFace) {
  // `a.Tab.Fun[0]` contains the address of the code we want to jump to
  // first, cast it to a function pointer with the expected signature
  // note that the first argument is a pointer to the receiver
  f := (*func(unsafe.Pointer, int, int) int)(unsafe.Pointer(a.Tab.Fun[0]))
  // the receiver is stored in the `abi.IFace.Data` field
  f(i.Data, 10, 20) // indirect call (costly)
}

Note that the above does not actually compile due to how Golang handles function pointers. For a full running example, check out this snippet.

Resources #

Clement Rey's Go Internals: Chapter 2: Interfaces
- This was the kickstarter for my investigation
Go language spec
Go compiler source
Compiler Explorer

https://garciat.com/posts/go-interfaces/

Extensions

Programming with Units

Mar 5, 2021

I describe a code meta-pattern that I have developed and applied in the last year.

Show full content

Abstract #

In this post I describe a code 'meta-pattern' that has helped me segregate responsibilities and keep tests neat.

Note: there is more I'd like to say about this topic, so consider this a draft.

Structurally #

A Unit is a class that follows this structure:

Its instance fields (if any) are private and final. These are called dependencies.
It has a trivial constructor which receives the expected dependencies and sets them to the corresponding fields.
It has a single public instance method.
It does not reference any stateful classes directly (i.e. If a stateful class is referenced, then it must be declared as a dependency).

Example:

@RequiredArgsConstructor // trivial constructor via Lombok
public class NameForTheClass {

  private final SomeDependencyA a;
  private final SomeDependencyB b;

  public ReturnType methodName([Arg arg, ...]) {
    [...]
  }

  // as many private methods as desired
}

Semantically #

Units and their dependency requirements form a DAG.
- This graph is resolved at application startup by a Dependency Injection library.
- Thus, all Units are singletons.
Units are stateless.
Units with only stateless dependencies (aka pure Units) are referentially transparent.
A Unit's public method describes its interface. (Obviously.)

Testing #

All of a Unit's dependencies are mocked.
Tests exercise individual code paths, stubbing the necessary method calls on the mocked dependencies.

Example JUnit 5 tests with Mockito:

@RunWith(MockitoJUnitRunner.StrictStubs.class)
public class NameForTheClassTests {

  @Mock private SomeDependencyA a;
  @Mock private SomeDependencyB b;
  @InjectMocks private NameForTheClass subject;

  @Test
  public void methodName_pathA() {
    when(a.someMethod(x, y)).thenReturn(z);

    val result = subject.methodName(a, b, c);

    val expected = [...];
    assertThat(expect).isEqualTo(result);
  }
}

Intuition: Correct by Construction #

A cluster of pure Units can be programmed in a 'correct by construction' fashion:

Assume the received dependencies are correct.
- And they are, if tested appropriately.
Implement logic using the dependencies.
- Test the logic appropriately, mocking the dependencies.
The resulting Unit is now correct.
Note: 'Correct by construction' is not a strict claim -- just an analogy.

Prescription: Programming Paradigm #

My recommended programming paradigm is Functional Programming.

This means that:

Business logic is exclusive to Units.
Data types are immutable POJOs with trivial constructors and no behavior.
- Exception: Base Types (TODO).
The input and output types of a Unit must be immutable.

https://garciat.com/posts/programming-with-units/

Extensions

My Programming Principles in 2020

Dec 20, 2020

This is rough brain dump of several principles/ideas that I applied in 2020 while working at Uber.

Show full content

Abstract #

This is rough brain dump of several principles/ideas that I applied in 2020 while working at Uber.

The fact that the code runs and does what you want means very little #

There are several other aspects of code that matter critically in a professional environment.

Correctness composes #

It's easier to prove that a small thing is correct.
- So write small classes; glue them together.
  - A small class is easy to test, change, reason about, delete, refactor.
  - The hard part is picking What it does and its Name, of course!
- If the glue is correct, then everything is correct.
To prove that something is correct:
- Try all inputs
  - If there are too many combinations (N*M inputs)
  - Then use Types to constrain the domains! (smaller N * smaller M)
  - Or if you split the responsibility (if possible) you get N+M
- Exercise all code paths
  - If there are too many combinations (N*M paths)
  - Then extract the inner part to a class. Now you have N+M.

Let tests tell you how complex your code is #

If tests for a single class are:
- Abundant (many test cases)
- Require big preambles/setups
  - Like loading big, complex JSON fixtures
- Hard to read
- ...
Then the code being tested is too complex!
Note: do not try to refactor the tests!
- They are only a side-effect of the logic being tested!

Try to not couple tests to logic that is not the subject of the test # Some test inputs shall remain fully abstract # There are kinds of responsibilities #

For example:
- Orchestrating other classes
- Bridging domains
- Making decisions
- Bridging layers
- …
It is useful to understand which kind of responsibility you’re writing.
- And make sure you’re doing only one!

There are kinds of data #

For example:
- Boundary types (aka DTOs)
- …
Again, it is useful to understand which kind of data type you’re writing.
- And make sure you’re doing only one!

Arrows shall flow in a single direction #

Arrows include:
- Control flow
- Data flow
- Class relationships
- Dependencies
- Layers
- …
The ideal shape is a line.
- But sometimes a shallow tree may be better than a very long line.
  - One problem with a long line is the burden of passing and accumulating context.
    - Either as data or just cognitively.
The next best shape is a tree.
- Remember: sibling tree branches never meet at any point (except for their parent).
- Do note: I have a strong suspicion that shallow trees are easier to deal with than deep ones.
If your arrows form a graph, you’re in trouble.
- Think very very hard and try to make it a tree.
- At least try to make it a DAG.
- If all else fails, aim to reduce the size of your loops.

Immutability changes everything #

Mutation introduces a boatload of problems:
- ...
That said, if you must, mutation is only allowed within the scope of a single method.
- And only on data that was created within that scope.
- Imagine constructing a list or some other structure.

Data or logic; not both #

Only place logic in business classes.
Business classes do not have data fields.
- They only have dependencies on other business classes.
There is one exception:
- Base types (will try to think of a better name)
  - (Explanation)
- But this comes at a cost: these are hard to mock and usually shouldn’t be mocked.
  - Hard to mock because of
    - A) coupling to constructor / static factory
    - B) repeated/complex interactions
      - Think how much fun you’d have mocking a Stream object!

Rich data reifies meaning, but is interpretation-free #

This one is pretty abstract, but there is something to it. I need to think harder on how to articulate it.
Some thoughts:
- Computation flow/semantics can be reified into data
  - Think eDSL + interpreter

Plain data = pure interface #

See: Scott encoding
The moment you deviate from:
- Immutable
- Trivial constructor
- Structural equality
Then you are in a different territory.
- Not bad.
- Just be aware.

Think in terms of types and transformations # Added meaning = changed type #

For example:
- If you validated that a String is a valid currency code,
- Then you should wrap that String in a CurrencyCode type.
- Otherwise, that String is still a String.
  - And any code that touches it must hold an implicit assumption (which implies accidental coupling) that the String has been validated at some point.
    - Haha! And sometimes that assumption becomes a hope!

Organize code in domain packages #

What is the point of putting all Controllers in a controller folder when they’re all functionally unrelated to each other? Really.

‘Code reuse’ considered harmful #

Writing libraries is extremely hard.
- And even then, experts constantly fail.
- I’m sure you’ve come across more than a few hideous libraries and frameworks.
And the cost of failure is, by definition, high.
- The code is being used in many places. So it will be wrong in many places.
Another point:
- Things that are conceptually different should be different.
  - TODO make this a principle?
- ...

‘Everything is an X’ considered harmful #

Everything is a Controller. Everything is a Service. Everything is an Object.
If you like, sure.
But what distinguishes controller A from controller B?
And how do you make that distinction evident to others?

‘Mappers’ considered harmful #

‘Mapping’ means nothing.
- Any function is a mapping. Literally.
Because the name means nothing and imposes no restrictions
- Then the code will inevitably do anything
- This is a ‘Pit of Despair’
Important: if they are not declared dependencies (e.g. in Spring) they will be retested and add more complexity to the overall tests.
- See: "Unmockable + reused = double-testing & test breakage"
Instead consider:
- If the input is some structured object and the output is a String
  - Call this a Format or Codec
- If the input is a String and the output is a structured object
  - Call this a Parser
- If the input is a Thrift/Proto object and the output is a structured object
  - Call this a Parser
  - Yes, that’s right.
    - Because the code is taking barely-structured data (structs and strings, basically) into well-typed, well-structured data in your internal model
  - Bonus: parsers compose (shameless plug).
- And more
  - The point is to consider what are the layers or domains being bridged.

Java Builders considered harmful #

Automatically-generated Builders (eg via Lombok) have issues:
- It is possible to forget setting a field, so it will be null and NullAway wont pick it up
  - Source of bugs
  - Pit of Despair
- If you use @NonNull, then you just turned a compiler-detectable error into a runtime error
  - Read the point below
  - Also: "Static > Dynamic"
Instead, use straight up constructors or factory methods
- Then whenever you add a new field to your data type, all callers will break. And that is exactly what you want. The compiler will not let you move on unless you fix this.
  - Builders will silently compile and fail at runtime.

null considered harmful # ‘Automappers’ considered harmful # ‘Anti-boilerplate’ sometimes considered harmful # Mind your layers # Mind your public API surface # Unmockable + reused = double-testing & test breakage # If error handling is not required, it will be skipped #

Yet another ‘Pit of Despair’.
Consider using ADTs
- Where success and failure cases are represented by the same type
- Then the observer must pattern match on the ADT to be able to proceed
  - And it must handle all the cases

Algebraic Data Types are not great; they’re necessary #

What are ADTs? Shameless plug.
- Think of them as “enums with data inside”.
A very useful feature is that an ADT disjunction (case A, or case B) can be used not only to model decisions but also to separate the decision maker from the decision observer.
- The decision maker returns the decision as one of the cases of an ADT.
- The decision observer uses pattern matching to exercise/interpret the decision.
- This helps with:
  - Separation of concerns
  - Sticking to a single responsibility
  - Avoiding deep class dependencies
    - Because instead of calling the next step in the flow as a decision, you return the decision as data and let your caller do that.

(Tip) Use ADTs in your public API to remain open to extension # Build ‘Pits of Success’ #

If you are designing an API that others will use (e.g. Thrift)
- Then you want the correct thing to be the easiest and simplest thing to do
If it is easy to make a mistake (misinterpret, misuse, misunderstand)
- Then the wrong thing will inevitably happen!
- aka Murphy’s Law
In contrast, a ‘Pit of Despair/Doom’ is an API that is so hard to use that misuse is the default.
Some tips:
- Leverage types!
  - Make incorrect inputs or states impossible to represent!
- If all else fails, use documentation.
Credits to Scott Hanselman for this idea.

Incorrect code should not compile #

Contrast these signatures:
- sadMethod(String, String, String)
- happyMethod(Currency, Locale, UserName)
I think it is clear which one evokes the ‘Pit of Success’ concept.
Again, if it is possible to mix up the arguments, then they will be mixed.
- Good luck writing your postmortem and refunding your users!
That is just one example of leveraging types so that the compiler makes it impossible to do the wrong thing.
There are many more cases, examples, and techniques.
- To learn many of them: learn Haskell!

One public method #

If a business class has more than one public method, then it probably has more than one responsibility.
- And we want to avoid that!

Static > Dynamic #

It is easier to understand static relationships.
It is harder to understand relationships that depend on time or the state of the system.
Relationships are:
- Caller-callee relationships
- Dependencies
- Control flow
- ...

Avoid inheritance #

I have seen it misused more frequently than not.
- And if it does work, it will probably not be compatible with the rest of these principles, because I simply avoid thinking in terms of inheritance.
Many authors advise against it.
It is one of the (if not the) strongest coupling relationships in the toolbox.
- And that is very risky.

Be wary of dynamic business-level higher-order inputs & outputs #

Dependencies are static higher-order inputs
- They are easier to reason about, mock

(Proto idea) Structural equality only #

https://garciat.com/posts/programming-principles-2020/

Extensions

https://garciat.com/posts.rss

Posts