Jake Bailey — GeistHaus

Public Speaking

May 11, 2026

public-speaking

Here’s all of my public speaking stuff, or at least what’s out there on the internet.

2026

Keynote - TS 7: How We Got There

TSKaigi 2026, May 22, 2026

e18e & friends E003 - Jake Bailey

e18e, April 28, 2026

Recording

2025

What’s Coming in TypeScript 6/7

typescript.fm, November 11, 2025

Recording

Why and How We Ported TypeScript to Go

SquiggleConf 2025, September 18, 2025

Abstract

In March 2025, we surprised everyone by announcing TypeScript’s port to Go. This is a certified Big Deal™, given the scale, complexity, and importance of the TypeScript toolchain.

From the beginning, TypeScript has been written in TypeScript; like most languages, we’re self-hosting. We work on our own toolchain and fix our own bugs. But as time went on, we faced challenges with the compiler’s performance, largely inherent to the implementation language itself. We squeezed out every ounce of performance we could, but we needed to scale further. Through experimentation and testing, we decided to port TypeScript to Go, achieving a 10x faster TypeScript.

In this talk, we’ll go over the why and the how. Why Go turned out to be the perfect language for the port, why it was sometimes hard to do (but also sometimes easy), how we actually were able to port 150k lines of code and 90k tests, and how this will affect you!

Recording • Slides • Source code

Porting the TypeScript Compiler to Go for a 10x Speedup

GopherCon 2025, August 27, 2025

Abstract

From the beginning, the TypeScript compiler has been self-hosted, evolving alongside a growing ecosystem of millions of developers. As time went on, we faced challenges with the compiler’s performance, largely inherent to the implementation language itself. Through experimentation and testing, we found Go to be an excellent language for our specific needs; a perfect porting language. In this talk, we will explore the process of porting the 150,000+ line TypeScript compiler and its 90,000+ tests to Go, the challenges we faced, lessons we learned, all leading to an overall 10x performance improvement over our previous implementation.

Recording • Slides • Source code

TypeScript with Jake Bailey

Software Engineering Daily, July 15, 2025

Recording

2023

Migrating TypeScript to Modules: The Fine Details

TypeScript Congress, September 21, 2023

Abstract

In TypeScript 5.0, the TypeScript toolchain migrated to modules. In this talk, we’ll get deep in the weeds, discussing what “modules” even are (and how we somehow weren’t using them), the specifics of the migration itself, how we managed to make the switch “mid-flight” on an actively-developed project, how the migration went, and what’s next.

Recording • Slides • Source code

https://jakebailey.dev/public-speaking/

Projects

May 11, 2026

projects

Here’s a listing of a few of my side projects.

hereby

hereby is a simple task runner, kinda like gulp or make, but much smaller (~500 lines). I created it during the conversion of TypeScript to modules so I could better represent the dependency graph of all of our build steps, as well as eliminate a huge swath of devDependencies.

Go ahead and use it, if you dare; the only user I plan on actually supporting is TypeScript itself, though some daring projects appear to have switched to it.

every-ts

every-ts is a utility that can build and bisect any version / commit of TypeScript. It’s useful for finding which PR broke (or fixed) something, without figuring out how to build TypeScript.

pprof-it

pprof-it is a wrapper for pprof, allowing for quick and easy profiling of Node programs that can be loaded into the pprof tooling. If I’m profiling something that can be run at the CLI, I’m using this.

esbuild-playground

esbuild-playground is “yet another” playground for esbuild. Like the TypeScript playground, it supports links and auto-compiles as you type. It’s very basic right now, but whenever I get some free time (hah) I’ll expand it.

pyright-action

pyright-action is a GitHub Action for pyright (a type checker for Python), allowing for fast execution through caching, plus PR comments for errors.

https://jakebailey.dev/projects/

How much faster?

May 11, 2026

A small calculator: enter an old time and a new time, and it reports the performance improvement (percent faster), the speed multiplier, and the time saved (percent less time), along with a link you can copy to share the result.

Inspired by Paul Irish's original how-much-faster page.
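The calculator derives three figures from the two times. As a rough sketch of that arithmetic in Go (the speedup type and compare function are hypothetical names for illustration, and this only covers the case where the new time is lower):

```go
package main

import "fmt"

// speedup holds the three figures reported for an improvement.
// The type and field names are hypothetical, chosen for this sketch.
type speedup struct {
	PercentFaster float64 // (old - new) / new * 100
	Multiplier    float64 // old / new
	PercentSaved  float64 // (old - new) / old * 100
}

// compare computes the figures for a pair of timings where lower is better.
// It returns false when either input is not a positive number.
func compare(oldTime, newTime float64) (speedup, bool) {
	if oldTime <= 0 || newTime <= 0 {
		return speedup{}, false
	}
	return speedup{
		PercentFaster: (oldTime - newTime) / newTime * 100,
		Multiplier:    oldTime / newTime,
		PercentSaved:  (oldTime - newTime) / oldTime * 100,
	}, true
}

func main() {
	if s, ok := compare(10, 1); ok {
		fmt.Printf("%.2f%% faster, %.2fx faster, %.2f%% less time\n",
			s.PercentFaster, s.Multiplier, s.PercentSaved)
		// prints "900.00% faster, 10.00x faster, 90.00% less time"
	}
}
```

Note that "10x faster" and "90% less time" describe the same change; the percent-faster figure is measured against the new time, the time saved against the old one.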

https://jakebailey.dev/how-much-faster/

Detecting dubious shadowing in Go

Apr 6, 2025

The most common porting bug in the TypeScript Go port

If you hadn’t already heard, we’re porting the TypeScript compiler to Go. This is a certified Big Deal, no small feat. Language choice is one of those contentious things, and I’m not going to go into great detail about it here, but one factor in the language choice is that the ported code is very, very close to the original TypeScript code.

But obviously Go isn’t TypeScript. There are whole classes of bugs that can happen in Go that won’t happen in TypeScript (and vice versa). If you aren’t careful, a direct translation could behave differently, or you could add all new bugs.

There was one specific kind of bug that came up the most as more and more code was ported: unintentional shadowing.

Enter the Shadow Realm

Can you spot the bug?

func (c *Checker) getUnresolvedSymbolForEntityName(name *ast.Node) *ast.Symbol {
    // ...
    result := c.unresolvedSymbols[path]
    if result == nil {
        result := c.newSymbol(ast.SymbolFlagsTypeAlias, text)
        c.unresolvedSymbols[path] = result
        result.Parent = parentSymbol
        c.declaredTypeLinks.Get(result).declaredType = c.unresolvedType
    }
    return result
}

The intent here was to return the new symbol, but Go’s := operator declares a new variable scoped to the if statement’s block, so the outer result is never assigned and the function returns nil whenever the symbol wasn’t already in the map. := should have been =. One character, very hard to spot.
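To see the effect in isolation, here is a minimal, self-contained sketch of the same shape. The cache and symbol types and the getBuggy/getFixed methods are hypothetical stand-ins, not code from the port.

```go
package main

import "fmt"

// symbol and cache are hypothetical stand-ins for the checker's types.
type symbol struct{ name string }

type cache struct{ symbols map[string]*symbol }

// getBuggy mirrors the ported code: := inside the if block declares a
// second, inner result, so the outer result stays nil.
func (c *cache) getBuggy(path string) *symbol {
	result := c.symbols[path]
	if result == nil {
		result := &symbol{name: path} // shadows the outer result
		c.symbols[path] = result
	}
	return result // the outer result, still nil on a cache miss
}

// getFixed assigns to the outer result with = instead.
func (c *cache) getFixed(path string) *symbol {
	result := c.symbols[path]
	if result == nil {
		result = &symbol{name: path}
		c.symbols[path] = result
	}
	return result
}

func main() {
	buggy := &cache{symbols: map[string]*symbol{}}
	fmt.Println(buggy.getBuggy("a") == nil) // true: the miss returns nil
	fmt.Println(buggy.getBuggy("a") == nil) // false: now it is cached

	fixed := &cache{symbols: map[string]*symbol{}}
	fmt.Println(fixed.getFixed("a") == nil) // false
}
```

The buggy version compiles cleanly because the inner variable is used (it is stored in the map), so the compiler’s unused-variable check never fires.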

Seasoned Go devs are probably screaming right now. “NoOoOoOo why aren’t you using an early return here??” And they’re right! A more idiomatic translation would be:

func (c *Checker) getUnresolvedSymbolForEntityName(name *ast.Node) *ast.Symbol {
    // ...
    if result := c.unresolvedSymbols[path]; result != nil {
        return result
    }
    result := c.newSymbol(ast.SymbolFlagsTypeAlias, text)
    c.unresolvedSymbols[path] = result
    result.Parent = parentSymbol
    c.declaredTypeLinks.Get(result).declaredType = c.unresolvedType
    return result
}

But remember, this is a port. We’re not really trying to change the style of the code all of the time. In fact, the code may have been partially autogenerated and then copy/pasted. Or just split-screened with the original code and typed out (i.e., error-prone).

If you look at the original TypeScript code, you’ll see what we’re trying to emulate:

function getUnresolvedSymbolForEntityName(
    name: EntityNameOrEntityNameExpression,
) {
    // ...
    let result = unresolvedSymbols.get(path);
    if (!result) {
        unresolvedSymbols.set(
            path,
            result = createSymbol(SymbolFlags.TypeAlias, text),
        );
        result.parent = parentSymbol;
        result.links.declaredType = unresolvedType;
    }
    return result;
}

Should this have been an early return? In my opinion? Yes, absolutely.[1] Regardless of the language. But I didn’t write this code; it just is what it is.

This bug kept happening over and over during the port; multiple people on the team complained about it. It’s not hard to see why; it becomes muscle memory to type :=, and then it’s just one character away from = so not too easy to notice (yourself, or in review). The above example is one of the easier ones to see visually, but we have other examples too. For example, this kind of code is all over our type relations code:

switch {
// ...
case source.flags&TypeFlagsIndexedAccess != 0:
    result = r.isRelatedTo(source.objectType, target.objectType)
    if result != TernaryFalse {
        result &= r.isRelatedTo(source.indexType, target.indexType)
        if result != TernaryFalse {
            return result
        }
    }
// ...
}

You can imagine maybe that assignment should have been a :=.[2] Or maybe something else is wrong? Hard to say.

A little go/analysis goes a long way

How do we avoid these problems?

As a compiler dev, I only have one solution to every problem: static analysis.

Go has a great static analysis framework called go/analysis, which is itself built upon Go’s built-in AST and type checking packages (go/ast, go/types). With it, we can create our own analyzers to run over our code and find errors. For convenience, those analyzers can then be compiled into golangci-lint to be run with the rest of the linters.

An Analyzer looks like this:

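var myAnalyzer = &analysis.Analyzer{
    Name: "myAnalyzer",
    Doc:  "reports nodes that are doing a bad thing",
    Run: func(pass *analysis.Pass) (any, error) {
        // ...
        pass.Reportf(node.Pos(), "%s is doing a bad thing!", node.Name)
        // ...
        return nil, nil // no result for other passes to consume
    },
}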

You declare an Analyzer, and its Run function is given all of the information about the code being analyzed. From there it can report errors (even fixes and related positions, LSP-style) and even return analysis results to be consumed by other passes.

The Go team has actually already created a shadow analyzer that looks for mistakenly shadowed variables. How does it work? Consider the following code:

func f() int {
    value := 1
    println(value)
    if condition {
        value := 2
        // ...
        println(value)
    }
    return value
}

This code probably has a shadowing bug. The shadow pass determines this by checking every variable to see if there’s another variable it could shadow (same name in a parent scope, with the same type), then checks to see if that potentially shadowed variable is used “after” the inner declaration, where “after” is a position check. This avoids false positives like when we don’t use the outer value later:

func f() int {
    value := 1
    println(value)
    if condition {
        value := 2
        // ...
        println(value)
    }
    // no use of "value" here!
    // ...
    return someOtherValue
}

This method is pretty good and does catch many bugs, but it’s not perfect. Take for example:

func f() int {
    value := 1
    if condition {
        value := 2
        // ...
        println(value)
        return value + 1234
    }
    return value
}

Is this a shadowing bug? I’d say it isn’t. Nothing we do in the inner scope is observable in the outer scope since we’re returning early. In different words, no use of the outer scope’s value is reachable from the shadowing in the inner scope.
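To make the position-based idea concrete, here is a rough sketch using only go/parser and go/types from the standard library. It is not the real shadow analyzer; positionShadows and the three function names in src are made up for illustration, and the check is simplified to "same name, same type, outer variable used later in the file".

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

// src holds the three shapes discussed above (function names are hypothetical).
const src = `package p

var condition bool

func usedLater() int {
	value := 1
	if condition {
		value := 2
		println(value)
	}
	return value
}

func notUsedLater() int {
	value := 1
	println(value)
	if condition {
		value := 2
		println(value)
	}
	return 0
}

func earlyReturn() int {
	value := 1
	if condition {
		value := 2
		println(value)
		return value + 1234
	}
	return value
}
`

// positionShadows returns the lines of declarations that shadow a same-typed
// outer variable which is still used at a later source position.
func positionShadows(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs: make(map[*ast.Ident]types.Object),
		Uses: make(map[*ast.Ident]types.Object),
	}
	conf := types.Config{}
	if _, err := conf.Check("p", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}

	// Record the last source position at which each object is used.
	lastUse := make(map[types.Object]token.Pos)
	for id, obj := range info.Uses {
		if id.Pos() > lastUse[obj] {
			lastUse[obj] = id.Pos()
		}
	}

	var lines []int
	for id, obj := range info.Defs {
		inner, ok := obj.(*types.Var)
		if !ok || inner.Parent() == nil || inner.Parent().Parent() == nil {
			continue
		}
		// Look for a declaration of the same name, starting one scope up.
		_, found := inner.Parent().Parent().LookupParent(inner.Name(), inner.Pos())
		outer, ok := found.(*types.Var)
		if !ok {
			continue
		}
		if types.Identical(inner.Type(), outer.Type()) && lastUse[outer] > id.Pos() {
			lines = append(lines, fset.Position(id.Pos()).Line)
		}
	}
	sort.Ints(lines)
	return lines, nil
}

func main() {
	lines, err := positionShadows(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines) // [8 27]
}
```

Line 8 is the genuine suspect in usedLater. Line 27 is the early-return case: a purely positional check flags it even though nothing in the inner scope can reach the final return.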

Going with the (control) flow

Yes, I said it, the magic word, reachable. The thing we want to be checking is whether any assignment to the inner declaration could reach a use of the outer declaration. This is all just a dataflow analysis question in disguise.[3]

The existing shadow pass approximates reachability using source positions. This is fast and easy for sure; just look up the scope chain, find what you might shadow, then check if any use is “after” the shadowing.

But this had enough false positives that I wasn’t comfortable enabling it.[4] I figured there must be a way to use a proper control-flow graph (CFG) to figure this out.

Fortunately, control-flow graphs are a well established concept and the Go tooling has ways to get them. The predominant way to do this is to use the go/ssa package, which builds a static single assignment (SSA) representation of the code. I opted not to use it, however. The SSA representation is pretty fine-grained with its own quirks and (in my opinion) better suited for more complicated analyses.[5]

Instead, I used go/cfg, which builds simple control flow graphs out of the AST.

Asking for this info is pretty straightforward; just declare that your analyzer needs it, and then grab its result.

var shadowAnalyzer = &analysis.Analyzer{
    // ...
    Requires: []*analysis.Analyzer{ctrlflow.Analyzer},
    Run: func(pass *analysis.Pass) (any, error) {
        // ...
        cfgs := pass.ResultOf[ctrlflow.Analyzer].(*ctrlflow.CFGs)
        cfg := cfgs.FuncDecl(node)
        // ...
        return nil, nil
    },
}

What does the CFG look like for our previous example? For reference, the code was:

func f() int {
    value := 1
    if condition {
        value := 2
        // ...
        println(value)
        return value + 1234
    }
    return value
}

This produces a CFG that looks like:

(Diagram of the CFG omitted; the same graph in textual form:)

.0: # Body@L5
        value := 1
        condition
        succs: 1 2

.1: # IfThen@L7
        value := 2
        println(value)
        return value + 1234

.2: # IfDone@L7
        return value

This is exactly what we need! If we start at the inner declaration of value, we can see that it doesn’t flow into any use of the outer declaration, so this code is safe.

Let’s try something a little more complicated:

func f() int {
    value := 1
    for i := 0; i < N; i++ {
        value := 2
        // ...
        if i%2 == 0 {
            println("continue!")
            continue
        }
        println(value)
        return value + 1234
    }
    return value
}

Is there a shadowing bug here? Let’s consult the CFG:

(Diagram of the CFG omitted; the same graph in textual form:)

.0: # Body@L5
        value := 1
        i := 0
        succs: 3

.1: # ForBody@L7
        value := 2
        i%2 == 0
        succs: 5 6

.2: # ForDone@L7
        return value

.3: # ForLoop@L7
        i < N
        succs: 1 2

.4: # ForPost@L7
        i++
        succs: 3

.5: # IfThen@L10
        println("continue!")
        succs: 4

.6: # IfDone@L10
        println(value)
        return value + 1234

If we follow the control flow from the block containing the inner declaration, we can see that there is a path to return value in the outer scope. If := were meant to be =, the behavior would change; the function could return 2 instead of 1. So, we say this is a potential shadowing bug, and the inner variable would be best renamed.
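The reachability question itself is a plain graph search. Here is a minimal sketch over a hand-written model of the two CFGs above. The block type and shadowReachesOuterUse function are hypothetical simplifications (the real go/cfg blocks hold AST nodes, and this sketch only works at block granularity), but the successor lists match the textual CFGs shown.

```go
package main

import "fmt"

// block is a hypothetical, simplified stand-in for a CFG basic block.
type block struct {
	declaresInner bool  // contains the shadowing declaration
	usesOuter     bool  // contains a use of the outer variable
	succs         []int // indices of successor blocks
}

// shadowReachesOuterUse reports whether control can flow from a block that
// declares the inner variable to a block that uses the outer one.
func shadowReachesOuterUse(blocks []block) bool {
	seen := make([]bool, len(blocks))
	var queue []int
	for _, b := range blocks {
		if b.declaresInner {
			queue = append(queue, b.succs...)
		}
	}
	// Breadth-first search over everything reachable from the declaration.
	for len(queue) > 0 {
		i := queue[0]
		queue = queue[1:]
		if seen[i] {
			continue
		}
		seen[i] = true
		if blocks[i].usesOuter {
			return true
		}
		queue = append(queue, blocks[i].succs...)
	}
	return false
}

// earlyReturn models the first CFG: the IfThen block returns, so it has
// no successors.
var earlyReturn = []block{
	0: {succs: []int{1, 2}},  // Body
	1: {declaresInner: true}, // IfThen
	2: {usesOuter: true},     // IfDone: return value
}

// loopWithContinue models the second CFG.
var loopWithContinue = []block{
	0: {succs: []int{3}},                         // Body
	1: {declaresInner: true, succs: []int{5, 6}}, // ForBody
	2: {usesOuter: true},                         // ForDone: return value
	3: {succs: []int{1, 2}},                      // ForLoop
	4: {succs: []int{3}},                         // ForPost
	5: {succs: []int{4}},                         // IfThen: continue
	6: {},                                        // IfDone: returns
}

func main() {
	fmt.Println(shadowReachesOuterUse(earlyReturn))      // false
	fmt.Println(shadowReachesOuterUse(loopWithContinue)) // true
}
```

In the loop case the search goes ForBody, IfThen, ForPost, ForLoop, and then ForDone, which is exactly the path through continue described above.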

This CFG-based approach also works nicely for other constructs as well; we can follow the inner declaration all the way through if statements, loops, switches, everything; no special casing required.

The only interesting case is function literals. Is there a shadowing bug in this code?

func f() int {
    value := 1

    callIt(func() {
        value := 2
        println(value)
    })

    return value
}

I’d say “maybe”, since swapping := to = could change the behavior of the code. The CFG wouldn’t exactly show this, since the two functions have entirely different CFGs. How we choose to relate them is pure choice! It’s actually a pretty similar problem to the infamous TypeScript issue 9998; do we consider this code to have executed immediately? Never? Potentially at any time? I chose to sidestep the problem and simply disallow shadowing across function boundaries, which thankfully didn’t produce enough false positives to feel bad about.

Enabling the analyzer

With all of this in place, I created a lint rule that we could add to our customlint plugin for golangci-lint, which you can see in typescript-go PR #365. Surprisingly, this didn’t negatively affect lint time. Most of the time, there isn’t shadowing, ast/inspector avoids the need to check a lot of the AST[6], and there are some simple checks that we can do on top of that.

The PR only fixed one bug, which is pretty good, but less than I was hoping for; it turns out that people on the team had already been wasting time debugging to find these issues, so they were all pretty much fixed already. Thankfully, the rule seems to be very reliable, and no new shadowing bugs have appeared.

Anyway, hopefully that was an interesting look at one source of potential bugs in Go (especially for ported code), and how static analysis can help. There are of course other sources of bugs in Go, and I’m planning to write about how I was able to catch those too. Stay tuned for that!

I wouldn’t have used an assignment expression either. Expressions shouldn’t assign things! You cannot convince me otherwise. My brain just isn’t wired for looking for side effects in the middle of something else. Sorry (not sorry). ↩︎
Or maybe you can imagine this code should have been written with early returns too. I digress. ↩︎
There’s probably some proper way to describe this in terms of “dominators”, “dominance frontiers”, a “dominator tree”, something like that. But if I do the math, it’s been almost 10 years since I took a compiler optimization course, and I can’t say I understood the terminology back then either. So let’s just stick with “reachability”. ↩︎
I think the Go team would agree; the docs say: “[this analyzer] generates too many false positives and is not yet enabled by default”. ↩︎
I found that it modified the structure of the code enough to make it difficult to work with for this particular use, doing things like constant propagation and simplification. This is definitely great for other analyzers, but it seemed a little challenging to use in my case. It’s a shame; go/analysis allows passes to share results, so using go/ssa would have effectively been “free” via one of the other lint rules we run. Thankfully, go/cfg is plenty fast and it doesn’t seem to have much of an impact on our lint time. ↩︎
This deserves its own post, honestly. In short, the ast/inspector package (along with the inspector pass) amortizes the cost of AST walks looking for specific node kinds by walking all ASTs once, keeping track of which nodes are present in 64-bit bitmasks, which works since Go has fewer than 64 node types. Then you can repeatedly ask for certain nodes, and “is this one of the nodes I need” is just a bitwise AND. I briefly thought “oh man, can we do this in TypeScript?”, but then I realized that our AST has 349 node kinds, way more than could fit in a single integer. Drat. It turns out that simple languages are in fact faster to analyze (shocker). ↩︎

https://jakebailey.dev/posts/go-shadowing/

DefinitelyTyped is a monorepo!

Oct 17, 2023

Yes, it is! But for real this time!

Previously, on “Is DT a monorepo?”

In a previous post, I talked about the layout of DefinitelyTyped and how it was indeed a monorepo, albeit a funky one. In short, packages were laid out (more or less) like this:

types/
  gensync/
    tsconfig.json
    index.d.ts
  node/
    tsconfig.json
    index.d.ts
    package.json
  react/
    tsconfig.json
    index.d.ts
    package.json

And so on. Each tsconfig.json file contained bits like:

{
    "compilerOptions": {
        "baseUrl": "../",
        "typeRoots": [
            "../"
        ],
        "types": []
    }
}

This config means that when a types package looks for itself or another package, it can map to other directories in types; no package.json is needed. At publish time, we detected dependencies and added them. If a package needed an external dependency, then the package would need a package.json file with that dependency declared.

This provided a monorepo-like feel without any symlinking, but with many downsides, including:

Long npm install times when external dependencies are needed, especially when testing the entire repo. The tooling just looped over every folder with a package.json and ran npm install.
Completely unrealistic module resolution (no node16 / nodenext, no export maps, etc.) thanks to the use of baseUrl, typeRoots, and paths. Not even typesVersions works.

I also talked about what we could do to remedy the situation, which boils down to “what if we were just a monorepo like everyone else uses in the JS ecosystem and let a package manager handle things”?

Making fetch happen

Obviously, all of that was a good 6 months ago. There were some unresolved blockers that made me put the project on the backburner. What changed?

Recently, Andrew merged fetch support into @types/node. Yay!

But, you might have noticed that only @types/node@20 got this feature. Surprise! It’s the second bullet point from above. DefinitelyTyped’s fake module resolution broke resolution inside undici-types, the package @types/node depends on to provide fetch types (without depending on undici itself, which is vendored into Node). The effect of this is that we could only add fetch to the latest types for Node, not to any older versions. In fact, if @types/node@22 were needed, we’d have to drop it from @types/node@20! Boo.

This problem shuffled the whole “DefinitelyTyped monorepo” thing straight to the top of our interest list.

And so with that, I’m happy to say that after a few weeks of effort from Nathan, Andrew, and myself, we’re actually doing it! DefinitelyTyped is becoming a monorepo!

Hello, pnpm

If you’ve read the previous posts, you won’t be surprised to find that we’re using pnpm to do this. All modern package managers have some sort of monorepo support these days, but DefinitelyTyped’s unique situation limits what we can use. Specifically, DefinitelyTyped contains multiple versions of the same package. For example, we currently have @types/react v15-v18, @types/node v16-v20, and so on. Both npm and yarn exit early when they see two workspace packages with the same name. Understandable![1] But, we need to do it somehow.

With pnpm, this “just works”. Internally within pnpm, workspace packages are identified by their paths, so there’s no conflict. Then, when pnpm goes to resolve packages, it only cares about the name and version. This actually means we get something better than just “it doesn’t fail”; it can actually resolve to these workspace packages based on their versions! It behaves just as though the packages were provided by the npm registry. So long, paths.

There’s a bunch more goodness pnpm provides, but for now, let’s just look at the new layout.

The new layout

Anyone who’s worked in a monorepo will not be surprised by the new layout. Now, we have:

types/
  gensync/
    tsconfig.json
    index.d.ts
    package.json # new!
  node/
    tsconfig.json
    index.d.ts
    package.json
  react/
    tsconfig.json
    index.d.ts
    package.json

Every @types package now requires a package.json, even if it doesn’t have any external dependencies. Let’s take a look at what’s inside. Here’s the new bits of package.json for @types/jsdom:

{
    "private": true,
    "name": "@types/jsdom",
    "version": "21.1.9999",
    "projects": [
        "https://github.com/jsdom/jsdom"
    ],
    "minimumTypeScriptVersion": "4.5",
    "dependencies": {
        "@types/node": "*",
        "@types/tough-cookie": "*",
        "parse5": "^7.0.0"
    },
    "devDependencies": {
        "@types/jsdom": "workspace:."
    },
    "owners": [
        { "name": "Leonard Thieu", "githubUsername": "leonard-thieu" },
        { "name": "Johan Palmfjord", "githubUsername": "palmfjord" },
        { "name": "ExE Boss", "githubUsername": "ExE-Boss" }
    ]
}

That’s a lot of stuff. Much of this is information that was previously a part of the index.d.ts “header”, i.e. something like:

// Type definitions for jsdom 21.1
// Project: https://github.com/jsdom/jsdom
// Definitions by: Leonard Thieu <https://github.com/leonard-thieu>
//                 Johan Palmfjord <https://github.com/palmfjord>
//                 ExE Boss <https://github.com/ExE-Boss>
// Definitions: https://github.com/DefinitelyTyped/DefinitelyTyped
// Minimum TypeScript Version: 4.5

In the new layout, we’re going to be using a package manager, so we need to declare a name and version to get packages to link up. That’s the first bit of the header. At that point, we may as well just move everything into JSON and be done with it. Additionally, this means that tools wanting to grab info about a DefinitelyTyped package don’t need to parse the header text; it’s all in package.json.
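To make the mapping from header to JSON concrete, here is a minimal sketch of how the old header could be converted into the new fields. The function name `headerToManifest` and its regular expressions are hypothetical and only handle the simple header shape shown above; this is not the actual DefinitelyTyped tooling.

```typescript
// Hypothetical sketch: convert an old-style index.d.ts header into the
// package.json fields described above. Not the real DefinitelyTyped tooling.
interface Owner {
    name: string;
    githubUsername: string;
}

interface Manifest {
    private: true;
    name: string;
    version: string;
    projects: string[];
    minimumTypeScriptVersion?: string;
    owners: Owner[];
}

function headerToManifest(dirName: string, header: string): Manifest {
    const manifest: Manifest = {
        private: true,
        name: `@types/${dirName}`, // previously implied by the directory name
        version: "",
        projects: [],
        owners: [],
    };

    for (const rawLine of header.split("\n")) {
        // Strip the leading "//" and any whitespace after it.
        const line = rawLine.replace(/^\/\/\s*/, "").trim();
        let m: RegExpMatchArray | null;

        if ((m = line.match(/^Type definitions for .* (\d+\.\d+)$/))) {
            // The header only carries major.minor; 9999 is the placeholder patch.
            manifest.version = `${m[1]}.9999`;
        } else if ((m = line.match(/^Project: (\S+)$/))) {
            manifest.projects.push(m[1]);
        } else if ((m = line.match(/^Minimum TypeScript Version: (\S+)$/))) {
            manifest.minimumTypeScriptVersion = m[1];
        } else if ((m = line.match(/^(?:Definitions by: )?(.+) <https:\/\/github\.com\/([^>\/]+)>$/))) {
            // Owners were written as profile URLs; keep only the username.
            manifest.owners.push({ name: m[1], githubUsername: m[2] });
        }
    }
    return manifest;
}
```

Feeding it the jsdom header above (with `"jsdom"` as the directory name) should yield the name, version, projects, minimumTypeScriptVersion, and owners fields from the example package.json; dependencies are not part of the header, so they are out of scope for this sketch.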

Let’s break down the fields.

private

{
    "private": true
}

This is always set to true, telling pnpm to not attempt to publish this package to the registry. The DefinitelyTyped publisher handles publishing. Packages that had a package.json previously already had this set, so this is nothing new.

name

{
    "name": "@types/jsdom"
}

This is new! Previously this was declared only via the directory name, but now we’re going to be using pnpm to handle things, so we need to specify this.

version

{
    "version": "21.1.9999"
}

This one’s funky; it’s almost what we put in the header, but with a patch version of 9999. When DT packages are published (automatically on a schedule), the patch version is generated; it’s just whatever the previous version was, plus one. So the patch version that’s actually in the repo never matters.

At development time, we’re making use of the fact that pnpm can resolve to local versions. Normally, we could set prefer-workspace-packages, which would force pnpm to always link to the local workspace package. But we actually have a few packages which intentionally point to old versions of @types packages, so we can’t use prefer-workspace-packages. Without it, the local package has to win on version alone: if we were to do the much nicer thing of using 0 as our patch version, the version from the registry would always be chosen instead. So we just pick an arbitrarily high patch version, such that it will always be newer than what’s in the registry, hence 9999. 9999 publishes ought to be enough for anyone, right?
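The effect of the patch number can be illustrated with a small sketch. It assumes, as the text above implies, that the newest matching version wins regardless of whether it comes from the workspace or the registry. `pickHighest`, the candidate lists, and the registry version numbers are all hypothetical; real pnpm resolution involves far more than this.

```typescript
// Hypothetical sketch of "highest version wins" across workspace and registry.
type Source = "workspace" | "registry";

interface Candidate {
    version: string; // plain "major.minor.patch", no prerelease tags
    source: Source;
}

// Compare two plain x.y.z versions numerically, component by component.
function compareVersions(a: string, b: string): number {
    const pa = a.split(".").map(Number);
    const pb = b.split(".").map(Number);
    for (let i = 0; i < 3; i++) {
        if (pa[i] !== pb[i]) return pa[i] - pb[i];
    }
    return 0;
}

// Pick the newest candidate within the requested major.minor line.
function pickHighest(candidates: Candidate[], majorMinor: string): Candidate | undefined {
    return candidates
        .filter((c) => c.version.startsWith(majorMinor + "."))
        .sort((x, y) => compareVersions(y.version, x.version))[0];
}

// Made-up registry versions for illustration.
const registry: Candidate[] = [
    { version: "21.1.5", source: "registry" },
    { version: "21.1.6", source: "registry" },
];

const withZeroPatch = pickHighest(
    [...registry, { version: "21.1.0", source: "workspace" }],
    "21.1",
);
const with9999Patch = pickHighest(
    [...registry, { version: "21.1.9999", source: "workspace" }],
    "21.1",
);

console.log(withZeroPatch?.source); // registry
console.log(with9999Patch?.source); // workspace
```

With a patch of 0, any published patch release outranks the local copy; with 9999, the local copy stays on top until the registry somehow reaches that number.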

projects

{
    "projects": [
        "https://github.com/jsdom/jsdom"
    ]
}

This is an array of helpful links to info about a project. Usually it contains a GitHub link, but can sometimes contain more.

minimumTypeScriptVersion

{
    "minimumTypeScriptVersion": "4.5"
}

This defines the minimum supported version of TypeScript for a package.

dependencies

{
    "dependencies": {
        "@types/node": "*",
        "@types/tough-cookie": "*",
        "parse5": "^7.0.0"
    }
}

This isn’t new, but it is bigger! Dependencies on @types packages are now explicit. No longer can every package access every other package; pnpm won’t link them. This removes the complexity of the infrastructure; we don’t need to parse the code or rely on heuristics to figure out what packages depend on what.

Additionally, this makes pnpm fully aware of how the packages interrelate, meaning that we can use fun features like --filter (more on that later).

devDependencies

{
    "devDependencies": {
        "@types/jsdom": "workspace:."
    }
}

This is new. For the most part, this will contain just one thing; a self-dependency. Without the baseUrl / typeRoots / paths combo, a package can’t find itself anymore, but that’s the API that we’re wanting to test. pnpm doesn’t yet support creating self-links, so we do it ourselves using a workspace:. specifier.[2]

This list can also contain packages that are needed for testing. This is technically an improvement over the previous setup, which didn’t allow devDependencies at all. But, it’s generally better to not have any testing dependencies anyhow.

owners

{
    "owners": [
        { "name": "Leonard Thieu", "githubUsername": "leonard-thieu" },
        { "name": "Johan Palmfjord", "githubUsername": "palmfjord" },
        { "name": "ExE Boss", "githubUsername": "ExE-Boss" }
    ]
}

This is a list of the users that “own” the package. They get pings when people send PRs to packages and can approve them. This used to be in the header as URLs (as that’s the syntax needed for the contributors array in package.json), but our tooling only wants usernames and loads of people incorrectly typed their GitHub profile URLs. For owners that aren’t directly on GitHub, url can still be passed (though not shown above).

nonNpm, nonNpmDescription

{
    "nonNpm": true,
    "nonNpmDescription": "Google Maps JavaScript API"
}

My example didn’t have these two, but some of the packages in DefinitelyTyped describe things that aren’t npm packages at all. For example, @types/google.maps describes the Google Maps API (a global), and had a header like:

// Type definitions for non-npm package Google Maps JavaScript API 3.54

This info is used to inform various checks and is carried into the published package. In the new layout, this information is represented in JSON.[3]

Installing dependencies

Let’s start by doing the naive thing and just run pnpm install in the root of the repo.

$ pnpm install
Scope: all 9114 workspace projects
...
Done in 3m 35.4s

Wow, that’s a lot of install. But it’s a major improvement over the previous layout, where installing the entire repo (with 10x fewer package.json files) took some 30 minutes.

The good news is that those working on DT don’t actually need to install the entire repo. pnpm supports filtering. Let’s say I’m working on @types/node, and want to be able to test it and any packages it depends on. I can run:

$ pnpm install -w --filter '...@types/node...'
Scope: 2722 of 9114 workspace projects
...
Done in 1m 2.5s

That’s a good bit better! Since we have explicit dependencies, pnpm can actually figure out what packages are needed for @types/node and install those, but also figure out which packages depend on @types/node and install those too. The -w tells pnpm to also install the workspace root, which is needed to get the DefinitelyTyped tooling, linters, dprint, etc.
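As a rough illustration of what such a selection means, the sketch below walks a dependency graph in both directions: once along the declared edges to find dependencies, and once along the reversed edges to find dependents. The graph shape, the function name `selectWithNeighbors`, and the sample packages are hypothetical; pnpm’s real filtering is more involved (for one, a real install of the dependents would also need their own dependencies, which this sketch leaves out).

```typescript
// Hypothetical sketch: select a package plus everything it depends on and
// everything that depends on it, given explicit dependency edges.
type Graph = Record<string, string[]>; // package name -> names it depends on

// Collect every node reachable from `start` by following `edges`.
function reachable(edges: Graph, start: string): Set<string> {
    const seen = new Set<string>([start]);
    const stack = [start];
    while (stack.length > 0) {
        const current = stack.pop()!;
        for (const next of edges[current] ?? []) {
            if (!seen.has(next)) {
                seen.add(next);
                stack.push(next);
            }
        }
    }
    return seen;
}

// Flip every edge so that "depends on" becomes "is depended on by".
function reverse(graph: Graph): Graph {
    const reversed: Graph = {};
    for (const pkg of Object.keys(graph)) {
        if (!reversed[pkg]) reversed[pkg] = [];
        for (const dep of graph[pkg]) {
            if (!reversed[dep]) reversed[dep] = [];
            reversed[dep].push(pkg);
        }
    }
    return reversed;
}

function selectWithNeighbors(graph: Graph, target: string): string[] {
    const selected = new Set<string>();
    reachable(graph, target).forEach((name) => selected.add(name)); // dependencies
    reachable(reverse(graph), target).forEach((name) => selected.add(name)); // dependents
    return Array.from(selected).sort();
}

// Made-up slice of the workspace.
const graph: Graph = {
    "@types/node": ["undici-types"],
    "@types/jsdom": ["@types/node", "@types/tough-cookie", "parse5"],
    "@types/tough-cookie": [],
    "@types/lodash": [],
};

console.log(selectWithNeighbors(graph, "@types/node"));
// → ["@types/jsdom", "@types/node", "undici-types"]
```

None of this works without explicit edges: under the old layout, the equivalent graph would have had to be inferred by parsing the code.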

What about a package that isn’t so hefty?

$ pnpm install -w --filter '...@types/lodash...'
Scope: 372 of 9114 workspace projects
...
Done in 7.2s

Now we’re talking. Most packages won’t need to do a huge install, so long as people read the docs (🙃) to know how to avoid the big install.

From this point on, the workflow is the same as DefinitelyTyped was before.

Filtering in CI

There’s one other cool trick that we can use in CI; not only can we filter by package name, but we can also filter by what changed since a specific git ref. In a PR build, we can use:

$ pnpm install -w --filter '...[origin/master]'

And only get what we need.

Other misc improvements

There’s also a grab bag of other improvements that come with this change. In no particular order:

Having to redo a bunch of the DefinitelyTyped tooling has led to improvements like dtslint-runner no longer bailing early on certain kinds of errors. Many more things are collected for reporting at the end such that doing one thing wrong doesn’t hide the problem until a second run.
As a part of making everything work in the new monorepo, we manually fixed a few hundred packages. These packages were silently broken (or at least, weird) in various ways. For example, multiple packages imported the events library. This could mean @types/node, but it could also mean @types/events. In practice, it resolved to the latter, but then sometimes, the tooling would say the package depended on neither (probably due to a bug in the implicit dependency resolution). Now, each package actually has to say which they need. There are other weird things besides just this; invalid references directives, packages depending on the wrong versions of things, etc.
Having a complete working package.json for every package means that one can theoretically just npm pack and get a working tarball. This is likely to become useful for tools like Are The Types Wrong, although the publisher still does a bunch of stuff (notably, deciding which files actually get included in each package, which is still mostly implicit).
It turns out that a load of react-based types packages are broken at the moment due to @types/react using typesVersions. Since typesVersions is in package.json, but baseUrl / typeRoots / paths skip package.json resolution, packages that depend on @types/react always get the types meant for TS 5.1 and above. Oops. With actual node_modules linking, this isn’t a problem and things work as intended. Another reason to speed this along.

It’s not all rainbows and sunshine

Everything I’ve described so far has been an improvement over the previous layout. But there are some warts left to figure out.

Using shared-workspace-lockfile=false

The astute reader may have noticed that the performance of pnpm install seems way slower than expected, especially given the numbers I achieved in the previous post about making pnpm faster.

The reason for this slowdown is our use of shared-workspace-lockfile=false. This is a lesser-used option which instructs pnpm to instead handle each workspace package individually. Each package gets its own dependency graph, and calculating that 9,000-some times is a lot slower than doing it once.

Why enable it, then? It’s a bit hard to fully explain. I filed an issue for it upstream, but the gist is that since pnpm is stricter about how it handles dependencies (they aren’t just all hoisted to the top), the combo of so many packages forgetting to declare dependencies on @types packages along with some packages (outside DT) explicitly depending on @types/node@17 (why??) causes the “fallback” @types/node to point to that awkward v17 version.

I haven’t quite figured out what the best solution is; it’s possible that pnpm could gain a yarn-like hoisting-limit system to avoid this problem, or always resolve “fallback” dependencies to the latest version.

I had previously skipped working on this project due to this problem, but the upsides of the migration (especially with fetch getting thrown in the mix) tipped the scales. Although shared-workspace-lockfile=false is quite a bit slower, it’s still an improvement when installing the entire repo, and filtering provides a very straightforward way to reduce the cost of package installs.

Just to show what the difference is, here’s the install without this setting.

$ pnpm install
Scope: all 9114 workspace projects
...
Done in 1m 5.8s

$ pnpm install -w --filter '...@types/node...'
Scope: 2722 of 9114 workspace projects
...
Done in 30.2s

$ pnpm install -w --filter '...@types/lodash...'
Scope: 372 of 9114 workspace projects
...
Done in 7.9s

Not super helpful for a small number of packages, but quite a bit faster if you ever need to work with the whole thing.

git clean is broken on Windows

pnpm uses symlinks under the hood. On POSIX-ish platforms like Linux and macOS, this is all good; symlinks work for any user and behave as expected. On Windows, however, the story is different. For a very long time, the only way to get “real” symlinks was to gain elevated permissions. Without being an admin, the best you could hope for was a “junction”. I’m not sure I could explain the ins and outs of junctions other than to say that they’re like a symlink, but only for directories, and they act kinda weird sometimes. But, they do the job.

There’s a gotcha; if you run git clean, git treats junctions as directories! This is normally not a big deal, but every one of our packages has a self link, which makes the symlinks recursive. And since git doesn’t treat junctions like symlinks, git clean will just keep recursing infinitely until it hits the max path length. It may eventually finish, but not without loads of errors and complaining.

There are two ways forward:

Make git treat junctions as symlinks. I sent a PR for this earlier this year, but it hasn’t yet been accepted. I personally think this is the correct solution; if git clean had been implemented in shell scripts (like much of git), it would have treated junctions as symlinks and just worked. But git clean is written in C, so instead of going through git-bash, it goes through the shims which translate the POSIX-y file system accesses into the Windows API, and those shims disagree with git-bash.[4]
Make pnpm use real symlinks. You’re probably confused; didn’t I just say that you needed elevated privileges to do that? Normally, yes, but if you enable Developer Mode, any user is able to make symlinks! And, I fully suspect that most people developing on Windows have this enabled. You even need it enabled to enable WSL. This is probably a good idea whether or not git changes; real symlinks don’t have the same warts as junctions. I’ll disclaim that I haven’t actually proposed this change upstream. There are some gotchas in that it may be awkward to enable this automatically (what happens if you end up with a node_modules with both junctions and symlinks?), but I think it should be straightforward to detect.

For now, DefinitelyTyped has included a script Windows users can run to clean up node_modules; pnpm run clean-node-modules will find and delete all node_modules directories within the repo. Good enough for now.

Removing a package won’t expose breaking changes in newly-typed packages

This one’s subtle. Imagine we have a package @types/foo. Another package (in the repo or even external) depends on @types/foo. But, foo has just gained types, which means that it’s time to remove @types/foo from DefinitelyTyped.

In the old layout, we’d delete the directory and add it to notNeededPackages. When the PR that does this is merged, the publisher will publish one final version of @types/foo that contains only a package.json with a dependency on the real foo.

But, when you’re actually working on the PR that does this, the shim package hasn’t yet been published! If another package within DefinitelyTyped depends on @types/foo it will stop pointing to the one in the repo (it’s been deleted). In the old layout, things would stop compiling until dependencies are updated. But in the new layout, pnpm will just resolve to the latest version of @types/foo in the registry, which will be exactly the same code that is being deleted. This means that the PR will definitely pass CI, when it may actually fail later if the real upstream foo package has types which differ enough to break things.

There’s not really a great solution to this other than to ban external dependencies on @types packages that aren’t contained in the repo; that handles some of the situations but not all. (If you have any clever ideas, let me know.)

Removing a package isn’t reflected in pnpm git filter

In addition to the above, when we delete a package, pnpm’s behavior for --filter '...[origin/master]' doesn’t pick up on the removed package. In an ideal world, it’d see the package removed, and then figure out which local packages are affected by the removal. But, it doesn’t do that, either simply because the package is gone (so there’s no more edges to check), or due to the previous section (where the package is still there, just not in the repo). The workaround is to instead use pnpm ls --depth -1 --filter '...@types/removed...' to get some sort of list of what may need to be tested. In CI, if a package.json is removed from the repo, we also don’t use pnpm install --filter '...[origin/master]', resorting to a complete install.

Future work

After this is merged, we’re still not done! There is some exciting stuff that gets unlocked:

Since there’s no more header, there’s nothing special about index.d.ts anymore. Since we’re trying to get DT as close to the upstream packages as possible, it may actually be wrong to have an index.d.ts if the package doesn’t contain index.js. We should be able to remove the requirement that packages have an index.d.ts.
Right now, there’s no way to really verify that types work properly for people using nodenext resolution (though, some analysis has shown that DefinitelyTyped does better in this regard than those publishing types themselves). Now that everything works via node_modules, we could finally enable those options in tsconfig.json and verify that things work. Maybe even require them.
We could offload even more of dtslint-runner onto pnpm scripts.
We could move tests out into their own packages, forcing them to use the public APIs of the packages they’re testing. Or even, have multiple tsconfigs to make sure things work with various settings.

Anyway…

I hope this info dump was interesting; I’m really excited to see this change happen.

Big thanks to everyone involved with this big migration, including Nathan and Andrew from the TypeScript team, Zoltan of pnpm fame, and everyone who spent time finding more ways to speed pnpm and semver up for our ridiculous use case.

Honestly, this is a little unsatisfying. If you think about it, all package managers already need to be able to handle multiple versions of the same package when they’re sourced from the npm registry. Theoretically, they could support multiple versions of workspace packages, but alas, no. ↩︎
Technically, link:. would also work, but for “reasons”, workspace:. has better performance. ↩︎
There’s still some clarity needed about what these fields are supposed to represent. There are quite a few packages (~200) that don’t have this field set but aren’t on npm either. We’ll get it sorted; my hope is that this field becomes defined specifically as “this package is not on npm, do not look at npm for it, but if you do find an npm package with this name, then that may be a problem so CI should fail until we triage the problem”. ↩︎
It’s absolutely possible that my take is the wrong one here; I know Go recently changed things to treat these special reparse points as some sort of “irregular” file. Honestly, I have no clue. ↩︎

https://jakebailey.dev/posts/pnpm-dt-3/

Speeding up pnpm

Mar 26, 2023

DefinitelyTyped contains over 8000 packages. What could go wrong?

Background

For more background, see the previous post about DefinitelyTyped.

TL;DR: DefinitelyTyped is huge; installing it in its entirety involves processing over 9,000 packages. And that’s slow! Or is it?

Taking a profile

Many people may not know this, but I’ve actually written more Go than I have TypeScript.[1] As such, when I have a performance problem, the tool I like to use is pprof.

More commonly, this tool is used when profiling Go, C, or C++ code. And I like this tool! Lucky for me, there is a library which lets you use it with Node.[2] The API is pretty straightforward; you can start and stop both CPU and heap profiles, and write them to disk.

Unfortunately, that’s a little annoying, because effectively 100% of the time, I’m profiling a CLI application or someone else’s project where I don’t really want to inject the code. It does include some code to let you do node --require=pprof myScript.js, but there’s no way to configure its behavior.

So a few years ago, I made a little wrapper, pprof-it, which makes things much easier to use. You can check the README for more details, but in short, to get a pprof profile you just run:

$ pprof-it /path/to/script.js

pprof-it will start profiling both CPU and heap allocation immediately at startup then dump profiles to the current directory on exit. These files can then be loaded into pprof (or one of the many other tools which support the format, like flamegraph.com or speedscope).

So, let’s take a profile of pnpm install on one of my work-in-progress “DT as a monorepo” branches. (Forgive the roundabout way of running things; some of my fixes are already released, so I need to do a little movie magic.)

$ npx --package=pnpm@7.30.0 -c 'pprof-it $(which pnpm) install'

This actually OOMs on my laptop (I have yet to determine why), but on my desktop, I get this:

pprof-it: Starting profilers (heap, time)
    # a very long pause...
Scope: all 9031 workspace projects
    # a very very long warning about cycles (I need to file an issue for this!)
Lockfile is up to date, resolution step is skipped
Already up to date
    # another long pause
Done in 1m 39.7s
pprof-it: Stopping profilers
pprof-it: Writing heap profile to pprof-heap-286252.pb.gz
pprof-it: Writing time profile to pprof-time-286252.pb.gz

Great, now let’s run pprof:

$ pprof -http=: pprof-time-286252.pb.gz

Automatically, pprof starts up my browser and puts me right into the graph view. Outside of Node profiles, this view is very useful, but Node profiles have an unfortunate problem which leads to all anonymous (i.e. arrow) functions being counted as one node named “(anonymous)”.[3] So, let’s flip into the flame view.

A pprof profile of the original test case; two large blocks. The overall execution takes about 100 seconds.

Already, I’m excited; this is every profiler’s dream. Two very obvious chunks of work attributed to real names I can search for. Roughly 50 seconds are spent in createPkgGraph and another 32 seconds in getRootPackagesToLink. I should note that at this point in my adventure, I know absolutely nothing about how pnpm works; I haven’t even checked out the repo. But, now I know exactly where to look! (If pnpm had been minified, I’d be in a much worse position.)

Working through the code

From the get-go I can see that there’s a lot of time spent in resolve. One thing I hadn’t mentioned was how I set up this huge monorepo; my initial version of the monorepo transition used version specifiers like workspace:../node to directly map packages to each other, avoiding the need for us to specify names/versions in every package.json (they’re already auto-generated by the DT publisher). Without even looking at the code, I (correctly) guessed that these paths were involved in the slowdown and filed an issue.

It turns out that this path mapping is actually a negative for other reasons as well, so I just rewrote my transform to use versions instead of paths. After switching to this new version, the profile looks like this:

A pprof profile of the “no paths” test case, two large blocks, first one smaller than before. The overall execution takes about 65 seconds.

Alright, that’s better already, down from ~100 seconds to 64 seconds. We’ll come back to resolve later.

createPkgGraph

The first block is the first “very long pause” (which happens even in the “new” version of the repo), so let’s start there. Searching the pnpm codebase, I find the offending function. It looks something like this (cut down for brevity):

function createPkgGraph(pkgs: Array<Package>) {
    const pkgMap = createPkgMap(pkgs);
    return mapValues((pkg) => ({
        dependencies: createNode(pkg),
        package: pkg,
    }), pkgMap);

    function createNode(pkg: Package): string[] {
        const dependencies = {
            ...pkg.manifest.devDependencies,
            ...pkg.manifest.optionalDependencies,
            ...pkg.manifest.dependencies,
        };

        return Object.entries(dependencies)
            .map(([depName, rawSpec]) => {
                const isWorkspaceSpec = rawSpec.startsWith("workspace:");
                const spec = npa.resolve(depName, rawSpec, pkg.dir);

                if (spec.type === "directory") {
                    const matchedPkg = Object.values(pkgMap).find((pkg) =>
                        path.relative(pkg.dir, spec.fetchSpec) === ""
                    );
                    return matchedPkg?.dir;
                }

                const pkgs = Object.values(pkgMap).filter((pkg) =>
                    pkg.manifest.name === depName
                );

                if (pkgs.length === 0) return "";

                const versions = pkgs.filter(({ manifest }) => manifest.version)
                    .map((pkg) => pkg.manifest.version) as string[];

                if (isWorkspaceSpec && versions.length === 0) {
                    const matchedPkg = pkgs.find((pkg) =>
                        pkg.manifest.name === depName
                    );
                    return matchedPkg!.dir;
                }

                if (versions.includes(rawSpec)) {
                    const matchedPkg = pkgs.find((pkg) =>
                        pkg.manifest.name === depName
                        && pkg.manifest.version === rawSpec
                    );
                    return matchedPkg!.dir;
                }

                // ...
            })
            .filter(Boolean);
    }
}

Alright, so we can sort of see what might be going on here. First off, we have pkgMap. By attaching to the code and looking at the variable, we find that it’s an object which consists of all 9,000+ packages. So doing anything with that is going to take a while.

At the top level, we’re already looping over every entry in the object via ramda’s mapValues. But, if we look inside createNode, we can see that it is also looping over all of pkgMap by calling Object.values(pkgMap)! This is quadratic; for each of the 9,000+ packages, we scan all 9,000+ entries again (and we do so for every dependency). We could fix this by building a mapping up front and looking entries up in it instead. For example, one of the loops is just looking for all of the entries in pkgMap where pkg.manifest.name is some value. We could precalculate this mapping, producing an object of type Record<string, Package[]>.
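A minimal sketch of that precalculated mapping, assuming a cut-down Package shape (only dir and the manifest's name and version; the real pnpm type has more fields). The helper name indexByName is made up for illustration and is not from the pnpm codebase:

```typescript
// Simplified stand-in for pnpm's package type; only the fields used here.
interface Package {
    dir: string;
    manifest: { name?: string; version?: string; };
}

// Hypothetical helper: group packages by manifest name in a single pass.
function indexByName(pkgs: Package[]): Record<string, Package[]> {
    // A prototype-less object, so names like "constructor" can't collide
    // with inherited properties.
    const byName: Record<string, Package[]> = Object.create(null);
    for (const pkg of pkgs) {
        const name = pkg.manifest.name;
        if (name === undefined) continue;
        (byName[name] ??= []).push(pkg);
    }
    return byName;
}

const pkgs: Package[] = [
    { dir: "/types/node", manifest: { name: "@types/node", version: "18.15.0" } },
    { dir: "/types/node/v16", manifest: { name: "@types/node", version: "16.18.0" } },
    { dir: "/types/react", manifest: { name: "@types/react", version: "18.0.0" } },
];

// Built once, outside of createNode.
const byName = indexByName(pkgs);

// Stands in for the Object.values(pkgMap).filter(...) scan by name.
const matches = byName["@types/node"] ?? [];
```

Each lookup is then a single property access rather than a pass over every package, at the cost of one extra pass up front to build the index.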

The other loop is more complicated; this is where resolve comes in. We can see that we’re searching not for a specific name but for a specific set of packages whose paths match the one we specified (that workspace:../node from earlier). This one is tricky, but it’s possible that we could precalculate some table here too, depending on how sensitive this code is to path.resolve’s platform-specific semantics.

Speaking of precalculating… We just said that pkgMap was huge. But, for every call to createNode, we call Object.values(pkgMap)! The profile doesn’t explicitly state so, but this is really, really expensive. The good news is that pkgMap is never modified. This means that we could calculate this big array once and then reuse it, for example:

function createPkgGraph(pkgs: Array<Package>) {
    const pkgMap = createPkgMap(pkgs);
    const pkgMapValues = Object.values(pkgMap); // <-- NEW!
    return mapValues((pkg) => ({
        dependencies: createNode(pkg),
        package: pkg,
    }), pkgMap);

    function createNode(pkg: Package): string[] {
        // ...

        return Object.entries(dependencies)
            .map(([depName, rawSpec]) => {
                // ...

                if (spec.type === "directory") {
                    const matchedPkg = pkgMapValues.find((pkg) =>
                        path.relative(pkg.dir, spec.fetchSpec) === ""
                    );
                    return matchedPkg?.dir;
                }

                const pkgs = pkgMapValues.filter((pkg) =>
                    pkg.manifest.name === depName
                );

                // ...
            })
            .filter(Boolean);
    }
}

This turns out to save the bulk of the time. Yay!

Algorithmically, the code is still quadratic, but it’s a lot faster, and this kind of change is very safe, safe enough to be backported. I sent this one as a quick PR, and it’s now out in v7.30.4.

The fix to the quadratic-ness is going to be a different, more complicated change I plan to send later.

UPDATE: Later is now the past! All of the quadratic-ness has been fixed as of:

getRootPackagesToLink

Let’s look at the second big chunk. Cut down for brevity again, we have:

async function getRootPackagesToLink(
    lockfile: Lockfile,
    opts: {/* some options */},
) {
    const importerManifestsByImporterId = {};
    for (const { id, manifest } of opts.projects) {
        importerManifestsByImporterId[id] = manifest;
    }

    const projectSnapshot = lockfile.importers[opts.importerId];
    const allDeps = {
        ...projectSnapshot.devDependencies,
        ...projectSnapshot.dependencies,
        ...projectSnapshot.optionalDependencies,
    };

    return (await Promise.all(
        Object.entries(allDeps)
            .map(async ([alias, ref]) => {
                // ...

                return {
                    // a bunch of props
                };
            }),
    ))
        .filter(Boolean) as LinkedDirectDep[];
}

Again, the profile is not being very specific. It’s just saying that a lot of time is being spent in getRootPackagesToLink. Thankfully, there’s not much code actually inside this function. It can only be the calculation of importerManifestsByImporterId, or the spread to produce allDeps.

I debugged this to try and get the size of these elements. getRootPackagesToLink is called for every package in the repo, and allDeps is small. So that’s not likely to be it.

The importerManifestsByImporterId loop, on the other hand, is suspicious. I just said that getRootPackagesToLink is called once per package in the repo. But, opts.projects is a big list of all packages in the repo! We’re quadratic again!

This is better than before, in theory; there are lookups inside the .map call below, but they’re efficient because they don’t loop over opts.projects (as opposed to createNode from earlier, which does do the linear lookup). But, getRootPackagesToLink is recreating this mapping every single time it’s called!
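To make the cost concrete, here’s a toy sketch (not pnpm code; the Project shape and function name are made up for illustration) that counts the loop iterations spent building the id-to-manifest mapping when it’s rebuilt on every call versus built once:

```typescript
// Simplified stand-in for a project entry; not pnpm's real type.
interface Project {
    id: string;
    manifest: { name: string; };
}

let iterations = 0;

// Builds the id -> manifest mapping, counting every loop iteration.
function buildManifestMap(projects: Project[]): Record<string, Project["manifest"]> {
    const map: Record<string, Project["manifest"]> = {};
    for (const { id, manifest } of projects) {
        iterations++;
        map[id] = manifest;
    }
    return map;
}

const projects: Project[] = Array.from({ length: 100 }, (_, i) => ({
    id: `types/pkg-${i}`,
    manifest: { name: `@types/pkg-${i}` },
}));

// Rebuilt per call: one full pass over `projects` for each project.
iterations = 0;
projects.forEach(() => buildManifestMap(projects));
const rebuiltEveryCall = iterations; // 100 * 100 = 10,000

// Built once: a single pass, then reused for every project.
iterations = 0;
const shared = buildManifestMap(projects);
projects.forEach((project) => void shared[project.id]);
const builtOnce = iterations; // 100
```

Scale that from 100 projects to 9,000 and the rebuilt-every-call version does 81,000,000 iterations where the built-once version does 9,000.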

If we scroll down a little bit, we can find its sole caller:

const projectsToLink = Object.fromEntries(
    await Promise.all(
        projects.map(async ({ rootDir, id, modulesDir }) => [id, {
            dir: rootDir,
            modulesDir,
            dependencies: await getRootPackagesToLink(filteredLockfile, {
                // ...
                projects,
                // ...
            }),
        }]),
    ),
);

There’s that “for each package” thing again. Thankfully, we can again see that projects is not changing between calls. So, we can instead calculate this mapping once and pass it in to getRootPackagesToLink, again without changing much logic.

const importerManifestsByImporterId = {} as { [id: string]: ProjectManifest; };
for (const { id, manifest } of projects) {
    importerManifestsByImporterId[id] = manifest;
}

const projectsToLink = Object.fromEntries(
    await Promise.all(
        projects.map(async ({ rootDir, id, modulesDir }) => [id, {
            dir: rootDir,
            modulesDir,
            dependencies: await getRootPackagesToLink(filteredLockfile, {
                // ...
                importerManifestsByImporterId,
                // ...
            }),
        }]),
    ),
);

Now drop the code to produce the mapping from getRootPackagesToLink and we’re done.

I sent this over as a PR too, and it is also available in v7.30.4.

The “final” result (for now)

Now that we have these two fixes in, let’s re-profile pnpm install for the newer version:

$ npx --package=pnpm@7.30.4 -c 'pprof-it $(which pnpm) install'
# ...
Done in 13.6s

Immediately, the difference is evident. There’s no longer a huge delay before I get the cycle warning. The whole thing now takes 13.6 seconds. That’s a huge improvement! It’s outlandishly good to be processing 9,000+ packages in such a short time.

What about the profile, though?

A pprof profile of the finalized code, with the two blocks (mostly) gone, and a lot of little stuff now showing. The overall execution takes about 13 seconds.

Much different. We can see that the huge obvious blocks are gone, leaving us with a bunch of small stuff (if two obvious chunks were “the dream”, a bunch of small stuff is “the nightmare”). We can see that createPkgGraph is still the most obvious chunk, which makes sense given that we didn’t fix its quadratic behavior. But, if we fix that, that’ll be a few more seconds saved! And, we can profile it again, and maybe we can look into sequenceGraph or getAllProjects, the next big chunks.

Recapping

To recap, we:

Ran pnpm on a huge monorepo, and found it to be suspiciously slow, visibly hanging at times.
Ran pprof-it to take a look under the hood.
Found a couple of big candidates for optimization.
Stared at some code.
Got lucky, addressing both problems by simply shifting some code around.
Made pnpm 4x faster! (For this super ridiculous test case, anyway.)

I hope this was informative. Profiling is an excellent trick to have in your toolbox. Sometimes, you’ll be unlucky and it won’t show you much. But, when you do find something, it’s worth having spent a few minutes trying it out.

In case you’re curious what else we (me and the TypeScript team) have been able to find, check out these PRs and issues:

A performance boost from avoiding the calculation of all properties of unions / intersections where all we wanted to know is if any type matches a condition.
A performance boost by discovering that a computation was not being cached.
A performance regression I (unwittingly) introduced in TypeScript’s string template literals when used with intersections, with two PRs (#53406 and #53413) attempting to address it.
A performance boost in TypeScript 5.0, where I identified that we weren’t reusing our “printers” as much as we could have, saving a few percent (and even more in some projects).
An older PR where pprof had pointed out that a lot of time during a build of a TypeScript project was being spent normalizing paths, even if the platform was UNIX-like and the paths were already using the correct slashes.
A PR I sent back when I was working on Pylance/pyright, where 50% of GC time was spent concatenating strings.

Well, this used to be true, but might not be anymore. Definitely not if you git blame the TypeScript repo and forget to use .git-blame-ignore-revs! Thanks, modules. ↩︎
Okay, this is a fork of the original released by Google, but that one hasn’t been updated in years, and DataDog’s fork includes prebuilt binaries. ↩︎
This is something I’ve been meaning to dig into, but it turns out to be a problem that also happens to the more typical .cpuprofile files Node performance nerds may already be familiar with, so I just haven’t prioritized looking into it. ↩︎

https://jakebailey.dev/posts/pnpm-dt-2/

What is DefinitelyTyped, and is it a monorepo?

Mar 26, 2023

Yes, it is! Kinda.

This post is a brief(-ish) overview of the current state of DefinitelyTyped and its (potential) future. If you’re looking for deep history, you should definitely check out John Reilly’s “Definitely Typed: The Movie”, which tells the story of DefinitelyTyped from the start in 2012.

What is “DefinitelyTyped”?

Generally speaking, there are two categories of packages on npm:

Packages authored in TypeScript.
Packages authored in JavaScript.

Whenever you install a package authored in TypeScript, you’ll also get its types.1 This means that when you import it in your own project, you’ll get the types that the authors wrote in their code. This is the easy path; the hard work is done!

But what if the package wasn’t written in TypeScript? In this situation, it may be the case that the author hand-wrote types for their package, but most of the time, you’ll have to install types separately. It’s likely you’ve installed a package like @types/node or @types/react.

Packages published under the @types scope come from “DefinitelyTyped”, aka “DT”. DT is huge, comprising 8,000+ packages, 6,000+ package owners, and 17,000+ unique contributors since its inception in 2012. Operating at this scale is hard, but the infrastructure is powerful enough to automate most PRs and automatically publish these packages every half hour.

How is DT laid out?

In the DT repo, there’s a directory named “types”, and that directory has all 8,000+ packages. With so many packages, you’d expect this to be one of those newfangled “monorepos” everyone’s been talking about. And it is! Well, kinda.

It turns out that even though there are over 8,000 packages in the repo, there are only about 1,200 package.json files. What gives? How does anything work?

Let’s look at a file that every package does have: tsconfig.json. Here’s the tsconfig for @types/minimist:

{
    "compilerOptions": {
        "module": "commonjs",
        "lib": [
            "es6"
        ],
        "noImplicitAny": true,
        "noImplicitThis": true,
        "strictNullChecks": true,
        "strictFunctionTypes": true,
        "baseUrl": "../",
        "typeRoots": [
            "../"
        ],
        "types": [],
        "noEmit": true,
        "forceConsistentCasingInFileNames": true
    },
    "files": [
        "index.d.ts",
        "minimist-tests.ts"
    ]
}

Pretty standard stuff, but this is the critical subset:

{
    "compilerOptions": {
        "baseUrl": "../",
        "typeRoots": [
            "../"
        ],
        "types": []
    }
}

What does this do?

baseUrl defines a path where TypeScript is allowed to perform absolute lookups. So if this package were to write import _ from "lodash", TypeScript will look for that in the types directory.
typeRoots tells TypeScript to consider the types directory to be the @types directory that would typically be in node_modules; now, it can find @types/lodash as /types/lodash!
types configures which @types packages are automatically included in the compilation. This can be convenient for typical projects since installing @types/node will declare all of Node’s packages and ambient types. But on DT, this is a bad idea, because we’d pull every @types package in. Setting this to the empty array stops this and allows us to manually pull things in with /// <reference types="...">.
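As a rough illustration of that last point, here’s what a declaration file for a made-up DT package might look like when it needs Node’s types. The package name and the exported function are hypothetical; only the reference directive itself is the real syntax:

```typescript
// types/some-package/index.d.ts (hypothetical package, for illustration only)

// With "types": [] nothing is included automatically, so Node's ambient
// declarations are pulled in explicitly.
/// <reference types="node" />

// Buffer is only in scope because of the directive above.
export declare function readConfig(path: string): Buffer;
```

Thanks to typeRoots, that directive resolves to the types/node directory in the repo rather than to anything in node_modules.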

The combined result is that DT works like a monorepo already, just without the involvement of a package manager (for the most part). If a package depends on another DT package, the publisher detects that import and automatically adds a dependency to the final package when publishing to npm.

And so, DT is a monorepo, but, it also isn’t, at least not in the way that people have come to know most monorepos in the JS world.

Of course, there are exceptions to every rule. A small fraction (~15%) of DT packages do have package.json files. This is because some packages depend on the types of packages not in DefinitelyTyped. This makes sense; a lot of packages are now written in TypeScript directly, and so publish their types directly, without involving DT. If a package typed on DT depends on a package that already has types, then the DT types will likely need types from that dependency as well.

What’s the problem?

It turns out that we’ve recently felt the need to change the status quo, for at least two reasons.

Firstly, since each package with a package.json needs its own external dependencies, we need to run npm install. But, we’re not a monorepo! This turns into over 30 minutes of just looping over every folder with a package.json and running npm install. Recently (as of writing), we’ve had issues with the install step randomly timing out. It’s really frustrating for the TypeScript team as we test all of DefinitelyTyped on most type checking changes, just to make sure we don’t break anyone (or, only break things in desirable ways).

Secondly, you may remember that the tsconfig.json from earlier set "module": "commonjs". This is the only valid configuration on DT and it has worked for a very long time. But as more and more packages start using features like ESM and export maps, DT needs to be able to support those features. And it does! Mostly. The "module": "commonjs" lie can be worked around for the most part, but DT should really be set to "moduleResolution": "node16" and then actually test that the packages and their dependencies and dependants actually work in that more modern mode.

A solution to both of these problems is to turn DefinitelyTyped into a monorepo more like what other major projects are doing, meaning:

Add a package.json to every DT package.
Explicitly declare all dependencies, even those within the repo.
Let a package manager or monorepo tool link the projects in node_modules.
Install everything at once.
Drop baseUrl and typeRoots out of every tsconfig.json.

This (theoretically) gets us a much faster install time, as well as getting us a final state on disk that matches what downstream users see, enabling packages to start making use of "moduleResolution": "node16".

What next?

This is a cool idea in theory, but to make it real, we have to make some choices. Specifically, the tooling. There are some unique restrictions which make this choice complicated:

The tool has to be able to handle the 8,000+ DT packages and their external dependencies.
The tool shouldn’t hoist anything, unless it’s safe to do so. We don’t want to accidentally resolve anything.
The tool must be able to handle multiple versions of packages in types (e.g. @types/react in types/react, @types/react@v17 in types/react/v17, and so on).
The tool should be fast. Right now, if you work on one package, you may not even need to install anything. If you do install a package, you’re only going to pay for the cost of installing that one DT package’s deps. If we have to get the whole monorepo, that experience hopefully shouldn’t suffer.
The tool shouldn’t try and do anything else. We just want package linking, not a build system. There’s nothing to build!

This set of requirements really narrows it down; at the time of writing, the only package manager which meets these requirements is pnpm. The other choices either ban packages of duplicated names, are generally not configurable enough, or take too long to install (though no option is likely slower than the 30 minute CI install). I’m not super surprised; pnpm is the default package manager of the rushstack tooling and there are some pretty ridiculously sized monorepos using it.

Even so, pnpm’s great performance still felt a little slow. I noticed that on install it’d hang and then start printing text, implying some performance problem. Not shocking; the number of packages it finally resolves to is over 9,000, and I’d think any tool would chug with that much work to do.

But, there’s good news! By profiling pnpm install, I discovered that the performance holes are mostly just cases of “accidentally quadratic” code, and therefore can be addressed.

And that’s the actual thing I wanted to write about before I got carried away. For details on that, check out the next post in this series.

👋 package manager maintainers

There’s no doubt in my mind that this post will eventually make its way to the maintainers of the other package managers and monorepo tools. Understand, I really truly do not mean anything negative in the above. I use all of your tools and they’re all great! My focus on pnpm above is due to the fact that I’m able to make immediate progress with it and that it also lets me demo some cool profiling techniques I’ve been meaning to share for a while. I have no idea how DT will actually end up, we’re just hurting now and I’m finding this fun to play around with.

If the author set "declaration": true and published them, anyway. ↩︎

https://jakebailey.dev/posts/pnpm-dt-1/