n8henrie.com

Technology, medicine, science, superstition... and having fun. Brought to you by Nathan Henrie.

Nix: Debugging MacOS / Darwin Sandbox Issues
Tags: debugging, Mac OSX, MacOS, nix, tech

Bottom Line: Filtering the macos logs for sandbox violations can help flesh out the root cause of sandbox-related crashes when building with nix.

I recently started getting some unexpected build failures when trying to rebuild my nix-darwin system:

$ nix build -v --print-build-logs github:nixos/nixpkgs/2bdc7039afa38f4330de69360a817e11f7e2f2c5#mpv-unwrapped
...
mpv> buildPhase completed in 38 seconds
mpv> Running phase: installPhase
mpv> mesonInstallPhase flags: ''
mpv> Installing mpv.1 to /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man/share/man/man1
mpv> Installing libmpv.2.dylib to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib
mpv> Installing mpv to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/include/mpv/client.h to /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/include/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/include/mpv/render.h to /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/include/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/include/mpv/render_gl.h to /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/include/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/include/mpv/stream_cb.h to /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/include/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/build/meson-private/mpv.pc to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib/pkgconfig
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv.conf to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/input.conf to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mplayer-input.conf to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/restore-old-bindings.conf to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/restore-osc-bindings.conf to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv.bash-completion to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/bash-completion/completions
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/_mpv.zsh to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/zsh/site-functions
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv.fish to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/fish/vendor_completions.d
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv.metainfo.xml to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/metainfo
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/encoding-profiles.conf to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/etc/mpv
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv-icon-8bit-16x16.png to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/icons/hicolor/16x16/apps
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv-icon-8bit-32x32.png to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/icons/hicolor/32x32/apps
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv-icon-8bit-64x64.png to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/icons/hicolor/64x64/apps
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv-icon-8bit-128x128.png to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/icons/hicolor/128x128/apps
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv-gradient.svg to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/icons/hicolor/scalable/apps
mpv> Installing /nix/var/nix/builds/nix-58338-1051359316/source/etc/mpv-symbolic.svg to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/icons/hicolor/symbolic/apps
mpv> Installing symlink pointing to libmpv.2.dylib to /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib/libmpv.dylib
mpv> /nix/var/nix/builds/nix-58338-1051359316/source/TOOLS /nix/var/nix/builds/nix-58338-1051359316/source/build
mpv> /nix/var/nix/builds/nix-58338-1051359316/source/build
mpv> Running phase: fixupPhase
mpv> Moving /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc to /nix/store/lzr5jj41d40378vwvpix0lw5mp8vf8lz-mpv-0.40.0-doc/share/doc
mpv> Moving /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib/pkgconfig to /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/lib/pkgconfig
mpv> Patching '/nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/lib/pkgconfig/mpv.pc' includedir to output /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev
mpv> checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0...
mpv> patching script interpreter paths in /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0
mpv> /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/umpv: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/xcjk9ill54kjk8mzgq6yydnx9015lidg-python3-3.13.9/bin/python3"
mpv> /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv_identify.sh: interpreter directive changed from "#!/bin/sh" to "/nix/store/19zw2r9dl44wk3j5ncwsk743zr9fc584-bash-interactive-5.3p3/bin/sh"
mpv> stripping (with command strip and flags -S) in  /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/Applications
mpv> checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev...
mpv> patching script interpreter paths in /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev
mpv> stripping (with command strip and flags -S) in  /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/lib
mpv> checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/lzr5jj41d40378vwvpix0lw5mp8vf8lz-mpv-0.40.0-doc...
mpv> patching script interpreter paths in /nix/store/lzr5jj41d40378vwvpix0lw5mp8vf8lz-mpv-0.40.0-doc
mpv> checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man...
mpv> gzipping man pages under /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man/share/man/
mpv> patching script interpreter paths in /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man
mpv> Running phase: installCheckPhase
mpv> Executing versionCheckPhase
mpv> Did not find version 0.40.0 in the output of the command /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv --help
mpv>
mpv> Did not find version 0.40.0 in the output of the command /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv --version
mpv>
error: Cannot build '/nix/store/y75gpq7cpspdlj0pyz7vz2dza8p8vrfb-mpv-0.40.0.drv'.
       Reason: builder failed with exit code 2.
       Output paths:
         /nix/store/0k1fix2idrk4jwvq8m1l8sx6sqqrqc6v-mpv-0.40.0-dev
         /nix/store/261j9ny9h3vjd7654wc388j9jibhi9xv-mpv-0.40.0-man
         /nix/store/j4k3k512157c73falxnqshmqa35whxg6-mpv-0.40.0-doc
         /nix/store/jndgs2x8g080032gma77ikcbw4vrp2bx-mpv-0.40.0
       Last 25 log lines:
       > /nix/var/nix/builds/nix-58338-1051359316/source/TOOLS /nix/var/nix/builds/nix-58338-1051359316/source/build
       > /nix/var/nix/builds/nix-58338-1051359316/source/build
       > Running phase: fixupPhase
       > Moving /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/share/doc to /nix/store/lzr5jj41d40378vwvpix0lw5mp8vf8lz-mpv-0.40.0-doc/share/doc
       > Moving /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib/pkgconfig to /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/lib/pkgconfig
       > Patching '/nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/lib/pkgconfig/mpv.pc' includedir to output /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev
       > checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0...
       > patching script interpreter paths in /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0
       > /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/umpv: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/xcjk9ill54kjk8mzgq6yydnx9015lidg-python3-3.13.9/bin/python3"
       > /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv_identify.sh: interpreter directive changed from "#!/bin/sh" to "/nix/store/19zw2r9dl44wk3j5ncwsk743zr9fc584-bash-interactive-5.3p3/bin/sh"
       > stripping (with command strip and flags -S) in  /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/lib /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/Applications
       > checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev...
       > patching script interpreter paths in /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev
       > stripping (with command strip and flags -S) in  /nix/store/xbv4gki5sms5zcx592dnb8n8sirylpqv-mpv-0.40.0-dev/lib
       > checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/lzr5jj41d40378vwvpix0lw5mp8vf8lz-mpv-0.40.0-doc...
       > patching script interpreter paths in /nix/store/lzr5jj41d40378vwvpix0lw5mp8vf8lz-mpv-0.40.0-doc
       > checking for references to /nix/var/nix/builds/nix-58338-1051359316/ in /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man...
       > gzipping man pages under /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man/share/man/
       > patching script interpreter paths in /nix/store/fpvji1dkf6ciql0icyc4fa7rab8k5zvb-mpv-0.40.0-man
       > Running phase: installCheckPhase
       > Executing versionCheckPhase
       > Did not find version 0.40.0 in the output of the command /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv --help
       >
       > Did not find version 0.40.0 in the output of the command /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv --version
       >
       For full logs, run:
         nix log /nix/store/y75gpq7cpspdlj0pyz7vz2dza8p8vrfb-mpv-0.40.0.drv

It was especially unusual because the build phase itself seemed to succeed; only the version check (which had been recently added by this commit) appeared to be failing:

> Did not find version 0.40.0 in the output of the command /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv --version

(NB: if one has already successfully built a package, the checkPhase doesn’t re-run, so it may be necessary to add the --rebuild flag to reproduce this error.)

The versionCheckHook essentially just runs your binary with --help and then --version and looks for the specified package version in the stdout or stderr of either of those. Oddly, using that exact path and command seemed to work fine outside of the build environment, and indeed shows 0.40.0 in the output:

$ /nix/store/6jdaccxf7ic245vvqslcd28imkwikzn8-mpv-0.40.0/bin/mpv --version
mpv v0.40.0 Copyright © 2000-2025 mpv/MPlayer/mplayer2 projects
 built on Jan  1 1980 00:00:00
libplacebo version: v7.351.0
FFmpeg version: 8.0
FFmpeg library versions:
   libavcodec      62.11.100
   libavdevice     62.1.100
   libavfilter     11.4.100
   libavformat     62.3.100
   libavutil       60.8.100
   libswresample   6.1.100
   libswscale      9.1.100
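
For intuition, the check described above boils down to something like the following. This is only a simplified sketch and not the actual nixpkgs hook: the function name version_check and the "Found version" message are made up for illustration, and the real hook does more (for example, it clears the environment before running the binary).

```shell
# Simplified sketch of a version check; NOT the real versionCheckHook.
# version_check is a hypothetical helper name.
version_check() {
  local cmd="$1" version="$2" flag output
  for flag in --help --version; do
    # Capture stdout and stderr together. The exit status is ignored, so a
    # binary that crashes simply produces empty output.
    output="$("$cmd" "$flag" 2>&1 || true)"
    case "$output" in
      *"$version"*)
        echo "Found version $version in the output of $cmd $flag"
        return 0
        ;;
    esac
    echo "Did not find version $version in the output of the command $cmd $flag"
  done
  return 1
}
```

Calling version_check /path/to/bin/mpv 0.40.0 against a binary that aborts before printing anything would report "Did not find version" for both flags, which matches the shape of the failure in the build log.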

Digging a little deeper, I added set -x to the preVersionCheck step in the package to get some debugging output. It seemed to be getting no output at all:

+++++ env --chdir=/ --argv0=mpv --ignore-environment /nix/store/0d1f4drwi4ikcmgjwn9asral2a4cbf3m-mpv-0.40.0/bin/mpv --version
+++++ true
++++ versionOutput=

Changing the preVersionCheck to this:

preVersionCheck = ''
  set -x

  $out/bin/mpv --version
'';

revealed mpv to be crashing completely:

mpv> +++++ /nix/store/grzyi5fn7wv5d5v0hc8fbhh3r5zrmzjm-mpv-0.40.0/bin/mpv --version
mpv> /nix/store/dnjd7b7v5vyd8g152ziivp2jaz56bb5l-stdenv-darwin/setup: line 288: 96459 Abort trap: 6

I initially suspected this might be a codesigning issue, since these often cause headaches on macos machines. However, I eventually tried building with --option sandbox relaxed and then --option sandbox false and found that disabling the sandbox completely let the version check pass, confirming this to be fundamentally a sandbox issue.

I have some basic familiarity with the macos sandbox thanks to this issue. In the context of nix builds, the sandbox allows one to disallow access to anything outside the nix build environment (including network access, filesystem access, etc.). This helps ensure that nix builds aren’t “polluted” by a user’s specific environment and helps ensure that builds are reproducible in other contexts.

Additionally, I remember having learned that macos used to have an option to trace the sandbox execution and help users figure out what was failing, but that this functionality no longer exists. I also remembered that nix spits out its sandbox configuration if one builds with debug mode; using the nix command that means --debug. Using nix-build I had thought one could see the sandbox profile with -vvvv (4 vs) or maybe NIX_DEBUG=4, but I have to admit it’s not working for me right now (please comment if you know how to get nix-build to spit out a sandbox profile).

Using the --debug flag to see this output (and --rebuild in my case, since I had previously built successfully):

$ nix build --rebuild --debug .#mpv-unwrapped 2>&1 | tee mpv-debug.log
...
sandbox setup: Generated sandbox profile:
sandbox setup: (version 1)
sandbox setup: (deny default (with no-log))
sandbox setup:
sandbox setup:
sandbox setup: (define TMPDIR (param "_GLOBAL_TMP_DIR"))
sandbox setup:
sandbox setup: (deny default)
sandbox setup:
sandbox setup: ; Disallow creating setuid/setgid binaries, since that
sandbox setup: ; would allow breaking build user isolation.
sandbox setup: (deny file-write-setugid)
...

As we can see, each line seems to be prefixed with sandbox setup: , and Generated sandbox profile: lets us know where the interesting stuff starts. A little awk should help us strip this out (I “cached” the output to mpv-debug.log so I could tinker without having to wait for a rebuild each time):

$ awk < mpv-debug.log \
  -v header='Generated sandbox profile:' \
  -v leader='sandbox setup: ' \
  '
    flag && $0 ~ "^" leader { sub(leader, ""); print }
    $0 ~ header { flag=1 }
  ' |
  tee mpv.sb
$ head mpv.sb
(version 1)
(deny default (with no-log))
(define TMPDIR (param "_GLOBAL_TMP_DIR"))
(deny default)
; Disallow creating setuid/setgid binaries, since that
; would allow breaking build user isolation.
(deny file-write-setugid)
; Allow forking.
(allow process-fork)
; Allow reading system information like #CPUs, etc.
$ tail mpv.sb
        (literal "/nix/store")
  (literal "/nix/var")
    (literal "/nix/var/nix")
        (literal "/nix/var/nix/builds")
 (literal "/private")
    (literal "/private/var")
        (literal "/usr")
        (literal "/usr/lib")
    (literal "/usr/lib/system")
)

Now that we have a sandbox profile, we can run a command in the context of this profile using sandbox-exec -f mpv.sb. Unfortunately, the command fails with a fairly unhelpful message:

$ sandbox-exec -f mpv.sb /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/bin/mpv --version
sandbox-exec: invalid data type of path filter; expected pattern, got boolean

I eventually sorted out that there are some required parameters that must be passed in, including _GLOBAL_TMP_DIR, _NIX_BUILD_TOP, and _ALLOW_LOCAL_NETWORKING:

$ rg '\bparam\b' mpv.sb
3:(define TMPDIR (param "_GLOBAL_TMP_DIR"))
29:       (subpath (param "_NIX_BUILD_TOP")))
38:(if (param "_ALLOW_LOCAL_NETWORKING")

These can be specified via -D param=val, and I think _ALLOW_LOCAL_NETWORKING can be left empty (which should evaluate to #f / false). Wrapping this up in a script for convenience:

#!/usr/bin/env bash
# sandbox.sh

set -Eeuf -o pipefail
set -x

main() {
  sandbox-exec \
    -D _GLOBAL_TMP_DIR="$(mktemp -d)" \
    -D _NIX_BUILD_TOP="$(mktemp -d)" \
    -f mpv.sb \
    /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/Applications/mpv.app/Contents/MacOS/mpv --version
}
main "$@"

Running this script gives us the same Abort trap: 6 we saw earlier:

$ ./sandbox.sh
+ main
++ mktemp -d
++ mktemp -d
+ sandbox-exec -D _GLOBAL_TMP_DIR=/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.8LoLZufaEe -D _NIX_BUILD_TOP=/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.YUT5GARrUD -f mpv.sb /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/Applications/mpv.app/Contents/MacOS/mpv --version
./sandbox.sh: line 6: 95518 Abort trap: 6              sandbox-exec -D _GLOBAL_TMP_DIR="$(mktemp -d)" -D _NIX_BUILD_TOP="$(mktemp -d)" -f mpv.sb /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/Applications/mpv.app/Contents/MacOS/mpv --version

Progress!

(NB: In some cases I imagine it may be helpful to run nix build with the --keep-failed flag, search the output for the build directory, then specify this as _NIX_BUILD_TOP.)

Now that we have an example sandbox profile and can reproduce the error, it’s time to sort out what is missing in the sandbox to allow successful execution of mpv --version. One can always use Console.app to get an idea, but I find that the command-line equivalent, log stream (or log show --last 1h for reviewing historical logs), fits my workflow better.

Unfortunately, the log is very busy. Naively piping its output to rg helped me find a promising line to help me set up a filter:

$ log stream --info --debug | rg sandbox
...
2025-12-21 09:58:24.425558-0700 0x1093f84  Debug       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] begin container id: 17384080, type: thread container
...

The example in log help predicates was really helpful in setting up a filter to match this output:

$ log help predicates
...
 log fields corresponding to a log line:
       2024-05-30 08:40:15.980893-0400 0x26166c   Default     0x0                  90092  0    log: (libxpc.dylib)   [com.apple.xpc:connection]
     [0x6000007081e0] activating connection: mach=true listener=false peer=false name=com.apple.logd.admin

       where

       '2024-05-30 08:40:15.980893-0400' == date
       '0x26166c' == threadIdentifier
       'Default' == logType
       '90092' == processIdentifier
       'log' == process
       'libxpc.dylib' == sender
       'com.apple.xpc' == subsystem
       'connection' == category
       'activating connection[...]' == composedMessage

Using that as an example, I found this to work fairly well:

$ log stream --info --debug --predicate '(process == "sandboxd") && (subsystem == "com.apple.sandbox.reporting") && (category == "violation")'

Running this in one window and then executing my sandbox.sh script in another, I get a lot of output from the violations:

Failed to symbolicate: NULL symbolicator
2025-12-21 10:01:59.712321-0700 0x1095684  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96060) deny(1) mach-lookup com.apple.CoreServices.coreservicesd
Process:         mpv [96060]
Path:            /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/Applications/mpv.app/Contents/MacOS/mpv
Load Address:    0
Identifier:      io.mpv
Version:         ??? (0.40.0)
Code Type:       unknown (Native)
Parent Process:  bash [96057]
Responsible:     /Applications/Nix Apps/Alacritty.app/Contents/MacOS/alacritty
User ID:         501

Date/Time:       2025-12-21 10:01:59.712 MST
OS Version:      macOS 26.2 (25C56)
Release Type:    User
Report Version:  8

MetaData: {"operation":"mach-lookup","responsible-process-sdk":918528,"sandbox_checker":"launchd","signing-id":"mpv","mach_namespace":1,"build":"macOS 26.2 (25C56)","target":"com.apple.CoreServices.coreservicesd","profile-in-collection":false,"hardware":"J614c","uid":501,"platform-binary":false,"translated":false,"primary-filter":"global-name","policy-description":"Sandbox","parent-process-name":"bash","platform_binary":"no","primary-filter-value":"com.apple.CoreServices.coreservicesd","responsible-process-path":"\/Applications\/Nix Apps\/Alacritty.app\/Contents\/MacOS\/alacritty","flags":5,"action":"deny","checker":"launchd","process_path":["nix","store","iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0","Applications","mpv.app","Contents","MacOS","mpv"],"binary-in-trust-cache":false,"apple-internal":false,"process-path":"\/nix\/store\/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0\/Applications\/mpv.app\/Contents\/MacOS\/mpv","normalized_target":["com.apple.CoreServices.coreservicesd"],"profile-flags":0,"global-name":"com.apple.CoreServices.coreservicesd","responsible-process-signing-id":"alacritty","platform-policy":false,"pid":96060,"checker-pid":1,"summary":"deny(1) mach-lookup com.apple.CoreServices.coreservicesd","errno":1,"parent-process-pid":96057,"process":"mpv","release-type":"User"}

Failed to symbolicate: NULL symbolicator
2025-12-21 10:01:59.712538-0700 0x1095684  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96060) deny(1) mach-lookup com.apple.DiskArbitration.diskarbitrationd
Process:         mpv [96060]
Path:            /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/Applications/mpv.app/Contents/MacOS/mpv
Load Address:    0
Identifier:      io.mpv
Version:         ??? (0.40.0)
Code Type:       unknown (Native)
Parent Process:  bash [96057]
Responsible:     /Applications/Nix Apps/Alacritty.app/Contents/MacOS/alacritty
User ID:         501

Date/Time:       2025-12-21 10:01:59.712 MST
OS Version:      macOS 26.2 (25C56)
Release Type:    User
Report Version:  8

MetaData: {"operation":"mach-lookup","responsible-process-sdk":918528,"sandbox_checker":"launchd","signing-id":"mpv","mach_namespace":1,"build":"macOS 26.2 (25C56)","target":"com.apple.DiskArbitration.diskarbitrationd","profile-in-collection":false,"hardware":"J614c","uid":501,"platform-binary":false,"translated":false,"primary-filter":"global-name","policy-description":"Sandbox","parent-process-name":"bash","platform_binary":"no","primary-filter-value":"com.apple.DiskArbitration.diskarbitrationd","responsible-process-path":"\/Applications\/Nix Apps\/Alacritty.app\/Contents\/MacOS\/alacritty","flags":5,"action":"deny","checker":"launchd","process_path":["nix","store","iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0","Applications","mpv.app","Contents","MacOS","mpv"],"binary-in-trust-cache":false,"apple-internal":false,"process-path":"\/nix\/store\/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0\/Applications\/mpv.app\/Contents\/MacOS\/mpv","normalized_target":["com.apple.DiskArbitration.diskarbitrationd"],"profile-flags":0,"global-name":"com.apple.DiskArbitration.diskarbitrationd","responsible-process-signing-id":"alacritty","platform-policy":false,"pid":96060,"checker-pid":1,"summary":"deny(1) mach-lookup com.apple.DiskArbitration.diskarbitrationd","errno":1,"parent-process-pid":96057,"process":"mpv","release-type":"User"}

Failed to symbolicate: NULL symbolicator

NB: I found several times that log stream would have no output the second or third time I ran sandbox.sh; I suspect there is some filtering for “duplicate lines” happening. Waiting a minute or two and running again (with no changes) worked for me, YMMV.

Now we can use something like rg (or grep would be fine) to further filter results; lines containing deny(1) look particularly promising:

$ log stream --info --debug --predicate '(process == "sandboxd") && (subsystem == "com.apple.sandbox.reporting") && (category == "violation")' |
  rg 'sandboxd:.*violation.*deny\(1\)'

2025-12-21 10:13:59.160033-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.logd
2025-12-21 10:13:59.187779-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.system.notification_center
2025-12-21 10:13:59.188426-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.pasteboard.1
2025-12-21 10:13:59.188789-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.distributed_notifications@Uv3
2025-12-21 10:13:59.189289-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.tccd.system
2025-12-21 10:13:59.189640-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.windowserver.active
2025-12-21 10:13:59.189994-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.CoreServices.coreservicesd
2025-12-21 10:13:59.190377-0700 0x1099e49  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96654) deny(1) mach-lookup com.apple.DiskArbitration.diskarbitrationd

Now we can see pretty specifically what is being denied: a number of mach-lookups (not an out-of-sandbox file-read like I had suspected). Lacking a sharper tool, at this point I searched our sandbox profile for a mach-lookup line, of which there was only one match:

$ rg mach-lookup mpv.sb
21:(allow mach-lookup (global-name "com.apple.system.opendirectoryd.libinfo"))

Because order generally matters in these firewall-ish matters, I thought it would be best to modify the profile (mpv.sb) just after this line. I matched this format and added the corresponding lines from our logged violations:

(allow mach-lookup (global-name "com.apple.logd"))
(allow mach-lookup (global-name "com.apple.system.notification_center"))
(allow mach-lookup (global-name "com.apple.pasteboard.1"))
(allow mach-lookup (global-name "com.apple.distributed_notifications@Uv3"))
(allow mach-lookup (global-name "com.apple.tccd.system"))
(allow mach-lookup (global-name "com.apple.windowserver.active"))
(allow mach-lookup (global-name "com.apple.CoreServices.coreservicesd"))
(allow mach-lookup (global-name "com.apple.DiskArbitration.diskarbitrationd"))
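
Since the mapping from a logged violation to an allow rule is mechanical, it could also be scripted. The following is a minimal sketch, assuming the log lines end with deny(1) mach-lookup followed by the service name, as in the output above; violations_to_allows is a hypothetical helper name, and other operations (file reads, for example) would need their own patterns.

```shell
# Hypothetical helper: read log lines on stdin and print one candidate allow
# rule per distinct denied mach-lookup target, in order of first appearance.
violations_to_allows() {
  # The service name is restricted to characters seen in the log above so
  # that the JSON "MetaData:" lines (which repeat the summary) do not match.
  sed -n 's/.*deny(1) mach-lookup \([A-Za-z0-9._@-]*\)[[:space:]]*$/(allow mach-lookup (global-name "\1"))/p' |
    awk '!seen[$0]++'
}
```

For example, piping a saved copy of the filtered log through violations_to_allows should print lines in the same format as the rules above, ready to paste into mpv.sb for testing.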

Running sandbox.sh again at this point showed two new violations:

2025-12-21 10:19:07.327764-0700 0x1099c8f  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96921) deny(1) mach-lookup com.apple.coreservices.launchservicesd
2025-12-21 10:19:07.328418-0700 0x1099c8f  Error       0x0                  601    0    sandboxd: [com.apple.sandbox.reporting:violation] Sandbox: mpv(96921) deny(1) mach-lookup com.apple.CARenderServer

I added allows for these as well:

(allow mach-lookup (global-name "com.apple.coreservices.launchservicesd"))
(allow mach-lookup (global-name "com.apple.CARenderServer"))

Lo and behold, sandbox.sh now works!

$ ./sandbox.sh
+ main
++ mktemp -d
++ mktemp -d
+ sandbox-exec -D _GLOBAL_TMP_DIR=/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.ky3oo6TxVE -D _NIX_BUILD_TOP=/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.9YOoyXB3cd -f mpv.sb /nix/store/iq7mr3dxkq09cg851dzhnlkvb34wcf4k-mpv-0.40.0/Applications/mpv.app/Contents/MacOS/mpv --version
mpv v0.40.0 Copyright © 2000-2025 mpv/MPlayer/mplayer2 projects
 built on Jan  1 1980 00:00:00
libplacebo version: v7.351.0
FFmpeg version: 8.0
FFmpeg library versions:
   libavcodec      62.11.100
   libavdevice     62.1.100
   libavfilter     11.4.100
   libavformat     62.3.100
   libavutil       60.8.100
   libswresample   6.1.100
   libswscale      9.1.100

I wasn’t sure if all of these allows were really required, so next I manually deleted each of them from the sandbox profile, eventually finding that the only one required is:

(allow mach-lookup (global-name "com.apple.coreservices.launchservicesd"))
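
I did this elimination by hand, but the same idea can be sketched as a loop: remove one candidate rule at a time from a copy of the profile and see whether the command still works. Everything here is an assumption for illustration: find_required_rules is a hypothetical helper, the candidates file is expected to hold one rule per line exactly as it appears in the profile, and the check command is expected to accept the path of the trial profile as its last argument.

```shell
# Hypothetical helper. Usage: find_required_rules PROFILE CANDIDATES CHECK...
# For each rule in CANDIDATES, write a copy of PROFILE without that rule and
# run CHECK with the copy's path appended. A rule is reported as required
# when CHECK fails without it.
find_required_rules() {
  local profile="$1" candidates="$2" trial rule
  shift 2
  trial="$(mktemp)"
  while IFS= read -r rule; do
    [ -n "$rule" ] || continue
    # -x: match whole lines only, -F: treat the rule as a fixed string.
    grep -vxF -- "$rule" "$profile" > "$trial" || true
    # stdin is redirected so the check cannot consume the candidates list.
    if ! "$@" "$trial" </dev/null >/dev/null 2>&1; then
      printf 'required: %s\n' "$rule"
    fi
  done < "$candidates"
  rm -f "$trial"
}
```

Here the check would be a small wrapper along the lines of sandbox.sh that runs sandbox-exec with -f pointing at the trial profile. Note that this only tests rules one at a time with all the others still present, so two rules that can stand in for each other would both be reported as not required.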

After a little rummaging around, I found that one can modify the sandbox environment for a package like so:

sandboxProfile = lib.optionalString stdenv.hostPlatform.isDarwin ''
  (allow mach-lookup (global-name "com.apple.coreservices.launchservicesd"))
'';

This then allows a successful build if one specifies --option sandbox relaxed. I wrapped this up in a PR which was merged. Phew!

https://n8henrie.com/2025/12/nix-debugging-macos-darwin-sandbox-issues/
How to Set a Conditional Breakpoint When Debugging Rust with LLDB
Tags: rust, debugging, tech

Bottom Line: A little Python can help LLDB break on String contents in Rust.

A few months ago I was trying to debug some Rust code that was misbehaving. The bug was in a function that was run hundreds of thousands of times, but I had made a logic error somewhere that only seemed to manifest in a very small fraction of the cases. I had gathered a few examples that were reliably misbehaving, so I wanted to set a breakpoint in this function, but have it only break depending on the contents of a specific variable (a String in this case).

I do a fair amount of work on my Macbook, so I try to use LLDB as my debugger, since it runs on both MacOS and Linux. Unfortunately for me, it seems far more popular to debug Rust with a Linux + GDB setup, and it’s relatively difficult to find LLDB-specific content. Thankfully I have the option to switch to a Linux + GDB workstation when needed, but I prefer to sort things out with LLDB when possible. This post describes my journey sorting out the process of establishing a conditional breakpoint for Rust code, using LLDB.

As a very brief introduction, I can compile this code in debug mode with an unpretentious cargo build:

fn main() {
    let var = String::from("asdf");
    println!("{:?}", var);
}

rustup provides a rust-lldb helper script, often found at ~/.cargo/bin/rust-lldb, and invoking lldb this way sets up some pretty printers and helpers to make the Rust experience more ergonomic. I recommend using it. To debug this executable (in this case creatively named foo), I’ll run rust-lldb target/debug/foo. Once LLDB is set up, I can print the value of var by:

  1. setting a breakpoint at the function named foo::main
  2. run to run the executable until the breakpoint is hit, at which point var will not yet be defined
  3. n to step to the “next” line (after which var is defined)
  4. p to print var

Here’s how this looks:

$ rust-lldb target/debug/foo
...skipping some lldb output...
(lldb) break set --name foo::main
Breakpoint 1: where = foo`foo::main::hf5da6d1ecd5155fc + 20 at main.rs:2:15, address = 0x0000000100000ef8
(lldb) run
Process 63389 launched: '/path/to/foo/target/debug/foo' (arm64)
Process 63389 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000ef8 foo`foo::main::hf5da6d1ecd5155fc at main.rs:2:15
   1    fn main() {
-> 2        let var = String::from("asdf");
                      ^
   3        println!("{:?}", var);
   4    }
(lldb) n
Process 63389 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100000f10 foo`foo::main::hf5da6d1ecd5155fc at main.rs:3:5
   1    fn main() {
   2        let var = String::from("asdf");
-> 3        println!("{:?}", var);
            ^
   4    }
(lldb) p var
(alloc::string::String) "asdf" {
  [0] = 'a'
  [1] = 's'
  [2] = 'd'
  [3] = 'f'
}

LLDB supports conditional breakpoints by adding -c 'my condition' to the break set command. (It can also break on a specific line in a file, as opposed to in a function, via -f path/to/file.rs -l line_number.) Unfortunately, attempting to break on a condition like var.contains("asdf") fails:

(lldb) b -f foo/src/main.rs -l 3 -c 'var.contains("asdf")'
(lldb) run
...
error: stopped due to an error evaluating condition of breakpoint 4.1: "var.contains("asdf")"
Couldn't parse conditional expression:
error: <user expression 3>:1:5: no member named 'contains' in 'alloc::string::String'
    1 | var.contains("asdf")
      | ~~~ ^

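With some help from Claude I eventually found a way to cast into a type that worked, but it’s not very ergonomic. It also depends on the internal field layout of String, and strstr expects a NUL-terminated C string, which a Rust String’s buffer is not, so the search can run past the end of the data: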

(lldb) b -f foo/src/main.rs -l 3 -c '(char*)strstr((char*)var.vec.buf.inner.ptr.pointer.pointer, "asdf")'
(lldb) run
Process 7605 launched: '/path/to/foo/target/debug/foo' (arm64)
Process 7605 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = breakpoint 19.1
    frame #0: 0x0000000100000d8c foo`foo::main::h5c931762e824ef5f at main.rs:3:5
   1    fn main() {
   2        let var = String::from("asdf");
-> 3        println!("{var}");
            ^
   4    }

By far the easiest approach that I came up with was just modifying the source code to do the searching, and then adding a fitting breakpoint. For example, if one is able to modify the source and is searching for the string NEEDLE in variable haystack, just add:

if haystack.contains("NEEDLE") {
    dbg!(&haystack); // <- breakpoint here
}

Done.

However, my interest was piqued, so I kept digging for other ways to accomplish this with LLDB. I asked for suggestions on the Rust user forum. Nobody chimed in to point out an obvious easy way, but one user was kind enough to suggest that I look into creating custom functions.

I found the official documentation to be interesting and fairly approachable. These pages were particularly helpful:

For context, I’m using:

$ lldb --version
lldb version 21.1.2

I decided to use this Rust code as my test case, ensuring that I can search within a nested structure:

#[derive(Debug)]
struct Parent {
    child: Child,
}

#[derive(Debug)]
struct Child {
    data: String,
}

fn main() {
    let parent = Parent {
        child: Child {
            data: String::from("hello"),
        },
    };

    println!("{:?}", parent.child.data);
}

LLDB supports a couple of scripting language options for conditional breakpoints. I know Python best, so that was an easy choice in my case. The conditional breakpoint script I eventually got to work is:

def break_if_contains(frame, bp_loc, extra_args, internal_dict):
    """Break if a Rust string contains a substring

    NB: Returning `False` means "do *not* break". Anything else (including
        `return None`, empty return, or no return value specified) means
        *do* break

    Usage:
        (lldb) command script import /path/to/filename.py
        (lldb) # set a breakpoint however you like
        (lldb) break set -f foo/src/main.rs -l 13
        (lldb) break command add --python-function filename.break_if_contains \
            -k haystack -v UNQUOTED_VAR_NAME \
            -k needle -v UNQUOTED_STRING \
            BREAKPOINT_NUMBER
        (lldb) run

    Example:
        (lldb) break command add -F strcompare.break_if_contains \
            -k haystack -v self.description \
            -k needle -v Hello \
            1
    """
    needle = str(extra_args.GetValueForKey("needle"))
    haystack = str(extra_args.GetValueForKey("haystack"))

    parts = haystack.split(".")
    current = frame.FindVariable(parts[0])
    if not current.IsValid():
        return False

    for part in parts[1:]:
        current = current.GetChildMemberWithName(part)
        if not current.IsValid():
            return False

    summary = current.summary.strip('"')
    return needle in summary

Hopefully I’ve made the setup and usage fairly clear in the docstring. Passing variables to the function is a little odd; the most obvious way I could find is by passing key-value pairs, specified by -k and -v respectively. The values are then accessible via extra_args.GetValueForKey() as shown.
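Since the callback only touches a handful of methods (FindVariable, GetChildMemberWithName, IsValid, and the summary attribute), the traversal can be exercised outside of LLDB with stand-in objects. Below is a minimal sketch of that idea. FakeValue, FakeFrame, resolve_summary, and contains are hypothetical names for illustration and are not part of LLDB's API. The extra check for a missing summary is also an assumption on my part, not something the function above does.

```python
class FakeValue:
    """Hypothetical stand-in for lldb.SBValue (only what the callback uses)."""

    def __init__(self, summary=None, children=None, valid=True):
        self.summary = summary
        self._children = children or {}
        self._valid = valid

    def IsValid(self):
        return self._valid

    def GetChildMemberWithName(self, name):
        return self._children.get(name, FakeValue(valid=False))


class FakeFrame:
    """Hypothetical stand-in for lldb.SBFrame."""

    def __init__(self, variables):
        self._variables = variables

    def FindVariable(self, name):
        return self._variables.get(name, FakeValue(valid=False))


def resolve_summary(frame, dotted_path):
    """Walk `a.b.c` starting from a frame variable; return its summary or None."""
    first, *rest = dotted_path.split(".")
    current = frame.FindVariable(first)
    for part in rest:
        if not current.IsValid():
            return None
        current = current.GetChildMemberWithName(part)
    if not current.IsValid() or current.summary is None:
        return None
    # The summary shown earlier is wrapped in double quotes, so strip them
    return current.summary.strip('"')


def contains(frame, haystack, needle):
    """True only if the haystack resolves to a summary containing needle."""
    summary = resolve_summary(frame, haystack)
    return summary is not None and needle in summary


# Toy data mirroring the Parent / Child example
data = FakeValue(summary='"hello"')
parent = FakeValue(children={"child": FakeValue(children={"data": data})})
frame = FakeFrame({"parent": parent})

print(contains(frame, "parent.child.data", "ell"))    # True
print(contains(frame, "parent.child.data", "xyz"))    # False
print(contains(frame, "parent.missing.data", "ell"))  # False
```

Splitting the lookup from the breakpoint callback like this keeps the LLDB-specific part down to a few lines, which can make iterating on the logic a little faster.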

Writing and debugging this function was made easier by a few strategies:

  • one can use breakpoint() in the Python script to drop into PDB (from LLDB) at runtime, just like a normal Python script
  • lldb (or rust-lldb) accepts a -o / --one-line flag that will automatically run an LLDB command at launch time
    • -o 'another command' can be repeated to run multiple commands in series
  • the same Python file that contains the break_if_contains function can also define a __lldb_init_module(debugger, internal_dict) function that can then automatically run LLDB code at command script import time via debugger.HandleCommand("lldb command here")
    • for my purposes, using -o 'some lldb command' from bash vs debugger.HandleCommand("some lldb command") within the body of __lldb_init_module() seems to be different routes to accomplish the same task

For example, I used a bash script that I would run after each iteration of the Python function to launch LLDB and import the script:

#!/usr/bin/env bash

cargo build
rust-lldb target/debug/foo \
    -o 'command script import /path/to/strcompare.py'

In the imported Python script, I included this function, which automatically sets a breakpoint, modifies the breakpoint to conditionally break based on the result of the Python function, then tells LLDB to proceed with running the binary:

def __lldb_init_module(debugger, internal_dict):
    debugger.HandleCommand("break set -f foo/src/main.rs -l 13")
    debugger.HandleCommand(
        " ".join(
            [
                "breakpoint command add --python-function strcompare.break_if_contains",
                "-k haystack -v parent.child.data",
                # "-k haystack -v parent",
                "-k needle -v hello",
            ]
        )
    )
    debugger.HandleCommand("run")

As you can see, this can simplify commenting out parts of the script (in this case I was ensuring that it works for simple variables like String::from("foo") as well as layered structs). You may also notice that specifying the number of the breakpoint to modify is optional; if omitted, it defaults to the most recently added breakpoint. I could have done this in the bash script, but the complexities of line continuation in bash made Python a little more comfortable for this task.
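If one did want to drive everything from outside of LLDB, building the repeated -o arguments in Python avoids the bash quoting and line-continuation issues. This is only a rough sketch. lldb_argv is a hypothetical helper name, and the paths are the same placeholders used above.

```python
import shlex


def lldb_argv(binary, commands, lldb="rust-lldb"):
    """Build an argv list that passes each LLDB command via its own -o flag."""
    argv = [lldb, binary]
    for command in commands:
        argv += ["-o", command]
    return argv


argv = lldb_argv(
    "target/debug/foo",
    [
        "command script import /path/to/strcompare.py",
        "break set -f foo/src/main.rs -l 13",
        "run",
    ],
)

# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(argv))
# To actually launch it, something like: subprocess.run(argv, check=True)
```

Because the commands are an ordinary Python list, commenting one out while experimenting is as easy as it is inside __lldb_init_module.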

Finally, because it feels a little clumsy to have to first set a breakpoint and then separately modify it with the conditional script, here is an example of how to write an LLDB “custom command” that can accept all required arguments, set up the breakpoint, modify it, and run, all in one fell swoop:

def run_break(debugger, command, exe_ctx, result, internal_dict):
    """Helper to run `break_if_contains`

    All arguments are required:
        `-f`: file for breakpoint
        `-l`: line for breakpoint
        `-n`: needle
        `-h`: haystack

    example:
        (lldb) run_break -f foo/src/main.rs -l 17 -n hello -h parent.child.data
    """
    args = command.split()
    args.reverse()
    while True:
        try:
            arg = args.pop()
        except IndexError:
            break
        match arg:
            case "-f":
                file = args.pop()
            case "-l":
                line = args.pop()
            case "-n":
                needle = args.pop()
            case "-h":
                haystack = args.pop()
    debugger.HandleCommand(f"break set -f {file} -l {line}")
    debugger.HandleCommand(
        " ".join(
            [
                "breakpoint command add --python-function strcompare.break_if_contains",
                f"-k haystack -v {haystack}",
                f"-k needle -v {needle}",
            ]
        )
    )
    debugger.HandleCommand("run")


def __lldb_init_module(debugger, internal_dict):
    debugger.HandleCommand(
        "command script add -f strcompare.run_break run_break"
    )

Clearly this could be cleaned up with argparse and maybe shlex. However, it accomplishes my goal of a conditional breakpoint based on String contents in Rust! At this point, the process is as simple as this:
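For the curious, here is roughly what that argparse and shlex cleanup might look like. It is a sketch rather than something I have run inside LLDB, and parse_run_break_args and _Parser are hypothetical names. Two details matter: -h has to be reclaimed from argparse's built-in help flag, and argparse's default error handling calls sys.exit(), which is unfriendly inside an embedded interpreter.

```python
import argparse
import shlex


class _Parser(argparse.ArgumentParser):
    def error(self, message):
        # argparse normally prints usage and calls sys.exit() here;
        # raise instead so the caller can report the problem
        raise ValueError(message)


def parse_run_break_args(command):
    """Parse the raw argument string that LLDB hands to `run_break`."""
    # add_help=False frees up `-h`, which argparse otherwise uses for --help
    parser = _Parser(prog="run_break", add_help=False)
    parser.add_argument("-f", dest="file", required=True)
    parser.add_argument("-l", dest="line", required=True, type=int)
    parser.add_argument("-n", dest="needle", required=True)
    parser.add_argument("-h", dest="haystack", required=True)
    # shlex.split honors quotes, so a needle may contain spaces
    return parser.parse_args(shlex.split(command))


args = parse_run_break_args(
    "-f foo/src/main.rs -l 17 -n 'hello world' -h parent.child.data"
)
print(args.file, args.line, args.needle, args.haystack)
```

A missing or malformed argument now raises ValueError instead of a NameError further down. Note that a needle containing spaces would still need to be quoted again when it is interpolated into the HandleCommand string.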

$ rust-lldb target/debug/foo
(lldb) command script import /path/to/strcompare.py
(lldb) run_break -f foo/src/main.rs -l 17 -n hello -h parent.child.data
Breakpoint 1: 2 locations.
Process 64567 launched: '/path/to/foo/target/debug/foo' (arm64)
Process 64567 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f74 foo`foo::main::hf5da6d1ecd5155fc at main.rs:18:5
   15           },
   16       };
   17
-> 18       println!("{:?}", parent.child.data);
            ^
   19   }

As one last convenience, if one wants all of this to be automatically loaded at LLDB startup, one can create a file at ~/.lldbinit with the following contents:

command script import /path/to/strcompare.py

After that, one should be able to invoke rust-lldb and subsequently the run_break helper function is ready to rock!

Here is my final (proof of concept) code:

"""strcompare.py

Adds Rust string comparison breakpoint to lldb

Usage:

```
$ rust-lldb target/debug/my_binary
(lldb) command script import /path/to/strcompare.py
(lldb) break set -f project_name/src/main.rs -l 42
(lldb) break command add -F strcompare.break_if_contains -k haystack -v self.description -k needle -v hello 1
(lldb) run
```

Alternatively, using the `run_break` helper:

```
$ rust-lldb target/debug/my_binary
(lldb) command script import /path/to/strcompare.py
(lldb) run_break -f project_name/src/main.rs -l 42 -n hello -h self.description
```

Further reading:

<https://n8henrie.com/2025/12/how-to-set-a-conditional-breakpoint-when-debugging-rust-with-lldb/>
<https://lldb.llvm.org/use/tutorials/writing-custom-commands.html>
<https://lldb.llvm.org/use/tutorials/breakpoint-triggered-scripts.html>
"""


def break_if_contains(frame, bp_loc, extra_args, internal_dict):
    """Break if a Rust string contains a substring

    Usage:
        (lldb) break command add -F filename.break_if_contains \
            -k haystack -v UNQUOTED_VAR_NAME \
            -k needle -v UNQUOTED_STRING \
            BREAKPOINT_NUMBER

    Example:
        (lldb) break command add -F strcompare.break_if_contains \
            -k haystack -v self.description \
            -k needle -v Hello \
            1
    """
    needle = str(extra_args.GetValueForKey("needle"))
    haystack = str(extra_args.GetValueForKey("haystack"))

    parts = haystack.split(".")
    current = frame.FindVariable(parts[0])
    if not current.IsValid():
        return False

    for part in parts[1:]:
        current = current.GetChildMemberWithName(part)
        if not current.IsValid():
            return False

    summary = current.summary.strip('"')
    return needle in summary


def run_break(debugger, command, exe_ctx, result, internal_dict):
    """Helper to run `break_if_contains`

    All arguments are required:
        `-f`: file for breakpoint
        `-l`: line for breakpoint
        `-n`: needle
        `-h`: haystack

    example:
        (lldb) run_break -f foo/src/main.rs -l 17 -n hello -h parent.child.data
    """
    args = command.split()
    args.reverse()
    while True:
        try:
            arg = args.pop()
        except IndexError:
            break
        match arg:
            case "-f":
                file = args.pop()
            case "-l":
                line = args.pop()
            case "-n":
                needle = args.pop()
            case "-h":
                haystack = args.pop()
    debugger.HandleCommand(f"break set -f {file} -l {line}")
    debugger.HandleCommand(
        " ".join(
            [
                "breakpoint command add --python-function strcompare.break_if_contains",
                f"-k haystack -v {haystack}",
                f"-k needle -v {needle}",
            ]
        )
    )
    debugger.HandleCommand("run")


def __lldb_init_module(debugger, internal_dict):
    debugger.HandleCommand(
        "command script add -f strcompare.run_break run_break"
    )
https://n8henrie.com/2025/12/how-to-set-a-conditional-breakpoint-when-debugging-rust-with-lldb/
Nix: Why Does My System Depend on $PKG?
MacOS, nix, tech

Bottom Line: nix why-depends --derivation /run/current-system and nix derivation show --recursive /run/current-system are handy.

NixOS and nix-darwin users – especially those following unstable – will be familiar with the situation of nixos-rebuild or darwin-rebuild failure due to breakage in a dependency. (Said user will hopefully also be grateful that this failure prevents them from switching into a system with a broken dependency!) Once this situation is encountered, there are a few reasonable approaches, including but not limited to:

  • Do nothing. Patiently wait with your existing system, grateful that it has been prevented from switching into a broken state. Hope that the issue is eventually fixed, or fix it yourself.
    • In this case, I strongly recommend searching the nixpkgs issues and either subscribing to the existing issue, or opening a new one if you’re sure it hasn’t been reported
  • If the package isn’t particularly critical, remove the broken package from your system closure so your system upgrade can complete.
  • Use an overlay to pin the broken package to a working version so your system upgrade can complete while keeping a working (but possibly outdated) version of the package.

To some degree, all of these options rely on knowing what package is broken, which can be surprisingly hard to determine. Sometimes it is obvious based on the error message. Other times, the broken package may be a transitive dependency of your system’s direct dependency; while this is spelled out in the same error message, it can be tough to pick out sometimes.

As an example, my aarch64-darwin system is currently unable to build. Here is what I see:

$ darwin-rebuild build --flake .
building the system configuration...
error: builder for '/nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv' failed with exit code 1;
       last 25 log lines:
       > current directory: /nix/store/plibr3n8ym9d4n3v9yzlr72wsslfqfrs-ruby3.3-nokogiri-1.16.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.0/ext/nokogiri
       > make DESTDIR\= sitearchdir\=./.gem.20251015-88848-9l6020 sitelibdir\=./.gem.20251015-88848-9l6020
       > compiling gumbo.c
       > In file included from gumbo.c:30:
       > In file included from ./nokogiri.h:77:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby.h:38:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/ruby.h:28:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/arithmetic.h:24:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/arithmetic/char.h:29:
       > /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/core/rstring.h:398:24: warning: default initialization of an object of type 'struct RString' with const member leaves the object uninitialized [-Wdefault-const-init-field-unsafe]
       >   398 |         struct RString retval;
       >       |                        ^
       > /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/core/rbasic.h:88:17: note: member 'klass' declared 'const' here
       >    88 |     const VALUE klass;
       >       |                 ^
       > gumbo.c:32:10: fatal error: 'nokogiri_gumbo.h' file not found
       >    32 | #include "nokogiri_gumbo.h"
       >       |          ^~~~~~~~~~~~~~~~~~
       > 1 warning and 1 error generated.
       > make: *** [Makefile:248: gumbo.o] Error 1
       >
       > make failed, exit code 2
       >
       > Gem files will remain installed in /nix/store/plibr3n8ym9d4n3v9yzlr72wsslfqfrs-ruby3.3-nokogiri-1.16.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.0 for inspection.
       > Results logged to /nix/store/plibr3n8ym9d4n3v9yzlr72wsslfqfrs-ruby3.3-nokogiri-1.16.0/lib/ruby/gems/3.3.0/extensions/arm64-darwin-24/3.3.0/nokogiri-1.16.0/gem_make.out
       For full logs, run:
         nix log /nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv
error: 1 dependencies of derivation '/nix/store/a4539sgfjbl0xqz3jx4slch2ba9d6rwg-ronn-gems.drv' failed to build
error: 1 dependencies of derivation '/nix/store/97882fyjn32cc6biar60x75fpp371566-ronn-0.10.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gsnwdgafz7czy1ca96b2sgr095k47iq2-actionlint-1.7.7.drv' failed to build
error: 1 dependencies of derivation '/nix/store/7nvsky84fws4lc1nhkr38c2b9yyr7s9d-home-manager-applications.drv' failed to build
error: 1 dependencies of derivation '/nix/store/vlzwa0j6ljj38457whfi9ll4ay8m4krf-home-manager-fonts.drv' failed to build
error: 1 dependencies of derivation '/nix/store/m60anx6pcl18gb6wdaqpd7nxms4yizjg-home-manager-path.drv' failed to build
error (ignored): error: cannot unlink "/private/tmp/nix-build-whisper-0.6.0.drv-4/source/target/aarch64-apple-darwin/release/deps": Directory not empty
error: 1 dependencies of derivation '/nix/store/60z2q88r583368c5lgbwgn2lp9y31pwf-system-applications.drv' failed to build
error: 1 dependencies of derivation '/nix/store/3l5p89klbglv23xk3mlnjj6lrmh3kakx-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/ps2pykvd8866hysx82dwgqb9pkjdh33z-darwin-system-25.11.9a9ab01.drv' failed to build

Based on this, I can tell that nokogiri seems to have failed:

For full logs, run:
nix log /nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv

Based on the hierarchy of the error message, it looks like ronn-gems, ronn, actionlint, and some home-manager paths are also failing (likely as a result). However, I’m not entirely sure exactly how I am depending on nokogiri.
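When the error output is long, it can help to pull just the derivation names out of the "failed to build" lines. Here is a small sketch of that. The regular expression is my own guess at the line format based on the output above, and failed_chain is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   error: 1 dependencies of derivation '/nix/store/<hash>-<name>.drv' failed to build
DRV_RE = re.compile(
    r"derivation '/nix/store/[0-9a-z]+-(?P<name>[^']+)\.drv' failed to build"
)


def failed_chain(log_text):
    """Return derivation names from the 'failed to build' lines, in log order."""
    return [m.group("name") for m in DRV_RE.finditer(log_text)]


# A few lines copied from the darwin-rebuild output above
log = """\
error: 1 dependencies of derivation '/nix/store/a4539sgfjbl0xqz3jx4slch2ba9d6rwg-ronn-gems.drv' failed to build
error: 1 dependencies of derivation '/nix/store/97882fyjn32cc6biar60x75fpp371566-ronn-0.10.1.drv' failed to build
error (ignored): error: cannot unlink "/private/tmp/nix-build-whisper-0.6.0.drv-4/source/target/aarch64-apple-darwin/release/deps": Directory not empty
error: 1 dependencies of derivation '/nix/store/gsnwdgafz7czy1ca96b2sgr095k47iq2-actionlint-1.7.7.drv' failed to build
"""

for name in failed_chain(log):
    print(name)
# ronn-gems
# ronn-0.10.1
# actionlint-1.7.7
```

The unrelated "error (ignored)" lines are skipped, which leaves the innermost failure first and the top-level system last.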

Up until recently, my approach in these situations was to just grep (or rg rather) to see if any of these are explicitly included in my configuration:

$ rg -t nix nokogiri ~/git/nixos
$

However, I have recently dug more into nix why-depends, and I’m pleased to report that it seems helpful here – but not necessarily by itself.

$ nix why-depends --help
nix why-depends [option...] package dependency

Examples

  · Show one path through the dependency graph leading from Hello to Glibc:

      │ # nix why-depends nixpkgs#hello nixpkgs#glibc

My first thought was that I should use my system (.#darwinConfigurations.natepro.config.system.build.toplevel) as the “package” and nokogiri as the “dependency”, but it seems necessary for the “package” to be successfully built for why-depends to work; attempting to use it in this way just gives us back the same build error:

$ nix why-depends ~/git/nixos#darwinConfigurations.natepro.config.system.build.toplevel nokogiri
error: builder for '/nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv' failed with exit code 1;
       last 25 log lines:
       > current directory: /nix/store/plibr3n8ym9d4n3v9yzlr72wsslfqfrs-ruby3.3-nokogiri-1.16.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.0/ext/nokogiri
       > make DESTDIR\= sitearchdir\=./.gem.20251017-95545-55qfgi sitelibdir\=./.gem.20251017-95545-55qfgi
       > compiling gumbo.c
       > In file included from gumbo.c:30:
       > In file included from ./nokogiri.h:77:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby.h:38:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/ruby.h:28:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/arithmetic.h:24:
       > In file included from /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/arithmetic/char.h:29:
       > /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/core/rstring.h:398:24: warning: default initialization of an object of type 'struct RString' with const member leaves the object uninitialized [-Wdefault-const-init-field-unsafe]
       >   398 |         struct RString retval;
       >       |                        ^
       > /nix/store/5kmiw2wqmhk6y0ij77z6xpw3jddbqmii-ruby-3.3.9/include/ruby-3.3.0/ruby/internal/core/rbasic.h:88:17: note: member 'klass' declared 'const' here
       >    88 |     const VALUE klass;
       >       |                 ^
       > gumbo.c:32:10: fatal error: 'nokogiri_gumbo.h' file not found
       >    32 | #include "nokogiri_gumbo.h"
       >       |          ^~~~~~~~~~~~~~~~~~
       > 1 warning and 1 error generated.
       > make: *** [Makefile:248: gumbo.o] Error 1
       >
       > make failed, exit code 2
       >
       > Gem files will remain installed in /nix/store/plibr3n8ym9d4n3v9yzlr72wsslfqfrs-ruby3.3-nokogiri-1.16.0/lib/ruby/gems/3.3.0/gems/nokogiri-1.16.0 for inspection.
       > Results logged to /nix/store/plibr3n8ym9d4n3v9yzlr72wsslfqfrs-ruby3.3-nokogiri-1.16.0/lib/ruby/gems/3.3.0/extensions/arm64-darwin-24/3.3.0/nokogiri-1.16.0/gem_make.out
       For full logs, run:
         nix log /nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv
error: 1 dependencies of derivation '/nix/store/a4539sgfjbl0xqz3jx4slch2ba9d6rwg-ronn-gems.drv' failed to build
error: 1 dependencies of derivation '/nix/store/97882fyjn32cc6biar60x75fpp371566-ronn-0.10.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gsnwdgafz7czy1ca96b2sgr095k47iq2-actionlint-1.7.7.drv' failed to build
error (ignored): error: cannot unlink "/private/tmp/nix-build-time-1.9.drv-3/time-1.9": Directory not empty
error: 1 dependencies of derivation '/nix/store/baczh3ilpnzwii15r3hd1vzhnmgscdcy-home-manager-applications.drv' failed to build
error (ignored): error: cannot unlink "/private/tmp/nix-build-ffmpeg-7.1.1.drv-10": Directory not empty
error: 1 dependencies of derivation '/nix/store/1n84x1s39rmp27c2m4wg7i5rih9xq1xs-home-manager-fonts.drv' failed to build
error: 1 dependencies of derivation '/nix/store/isnjghv45qr10hdwakv86m0993phhb55-home-manager-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/n2am0d06r4bbn637v64f566zkvyki1k9-system-applications.drv' failed to build
error: 1 dependencies of derivation '/nix/store/pkkf0bsgnxfg0x1vwcgpqr676cph79gc-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/9wv921vl3aacjxkzm6s9ip247nhwlww1-darwin-system-25.11.9a9ab01.drv' failed to build

It seems that a reasonable workaround might be to use the current system, to see how it depends on nokogiri. Thankfully the symlink at /run/current-system provides a handy shortcut!

$ readlink /run/current-system
/nix/store/735wnqlyhfx5h4l5p22cpg06463b84gq-darwin-system-25.11.9a9ab01

We then run into our next issue:

$ nix why-depends /run/current-system nokogiri
error: cannot find flake 'flake:nokogiri' in the flake registries

The “dependency” cannot be a bare package name; it looks like it needs to be something nix can resolve, such as a flake reference or a store path. We saw ruby3.3-nokogiri above, so the package is likely rubyPackages.nokogiri or rubyPackages_3_3.nokogiri, but neither of these seems to work either, nor does using the path of the failing derivation:

$ nix why-depends /run/current-system nixpkgs#rubyPackages.nokogiri
these 9 paths will be fetched (0.72 MiB download, 4.95 MiB unpacked):
  /nix/store/2h6w1ka2q5ksnwfgw064240r3vmir32p-find-xml-catalogs-hook
  /nix/store/x06zagc5aasqd4nvjfzg6zv917v7zkvn-find-xml-catalogs-hook
  /nix/store/jf18994msibcw4z59mh2lgw2hmfavlg4-libxml2-2.14.5-dev
  /nix/store/3lq98gys6gf97z6cjpklnf9mc4s4wmyb-libxslt-1.1.43
  /nix/store/p35v975i3kx3clh4afgi7lcjnb0b5wpy-libxslt-1.1.43-bin
  /nix/store/xld92i9ps44vrdj7dcdzk4bhzaihcb6i-libxslt-1.1.43-dev
  /nix/store/6ixfg46lnzmssfj71h5rf11aspvirmc0-ruby3.3-mini_portile2-2.8.8
  /nix/store/znhn5z9z0c1i780jw54ag9rnpm92z1ln-ruby3.3-nokogiri-1.18.7
  /nix/store/bgr5q188w30w1sbc58wglnf1f8hrrmls-ruby3.3-racc-1.8.1
'/nix/store/735wnqlyhfx5h4l5p22cpg06463b84gq-darwin-system-25.11.9a9ab01' does not depend on 'flake:nixpkgs#rubyPackages.nokogiri'
$ nix why-depends /run/current-system nixpkgs#rubyPackages_3_3.nokogiri
these 9 paths will be fetched (0.72 MiB download, 4.95 MiB unpacked):
  /nix/store/2h6w1ka2q5ksnwfgw064240r3vmir32p-find-xml-catalogs-hook
  /nix/store/x06zagc5aasqd4nvjfzg6zv917v7zkvn-find-xml-catalogs-hook
  /nix/store/jf18994msibcw4z59mh2lgw2hmfavlg4-libxml2-2.14.5-dev
  /nix/store/3lq98gys6gf97z6cjpklnf9mc4s4wmyb-libxslt-1.1.43
  /nix/store/p35v975i3kx3clh4afgi7lcjnb0b5wpy-libxslt-1.1.43-bin
  /nix/store/xld92i9ps44vrdj7dcdzk4bhzaihcb6i-libxslt-1.1.43-dev
  /nix/store/6ixfg46lnzmssfj71h5rf11aspvirmc0-ruby3.3-mini_portile2-2.8.8
  /nix/store/znhn5z9z0c1i780jw54ag9rnpm92z1ln-ruby3.3-nokogiri-1.18.7
  /nix/store/bgr5q188w30w1sbc58wglnf1f8hrrmls-ruby3.3-racc-1.8.1
'/nix/store/735wnqlyhfx5h4l5p22cpg06463b84gq-darwin-system-25.11.9a9ab01' does not depend on 'flake:nixpkgs#rubyPackages_3_3.nokogiri'
$ nix why-depends /run/current-system /nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv
'/nix/store/735wnqlyhfx5h4l5p22cpg06463b84gq-darwin-system-25.11.9a9ab01' does not depend on '/nix/store/kyiywvx9mnq90f466nwwj876zbzqmc1m-ruby3.3-nokogiri-1.16.0.drv'

Rethinking, this makes some sense, as the current system obviously doesn’t depend on the failing package; its build succeeded! So if we’re trying to trace the path from the failing dependency to the failing build, we would need to ask how the current, successfully built system depended on the successful build of its nokogiri (assuming that this relationship hasn’t changed, which is a load-bearing assumption).

So how do we figure out what nokogiri path the current system depended on? Unfortunately just searching /nix/store for ‘.nokogiri.’ will give us dozens (or more) of results from prior system generations and other packages, so that’s out. Luckily we can look a little deeper into our current system dependencies with nix derivation show /run/current-system.

Unfortunately, we are again stymied:

$ nix derivation show /run/current-system | rg nokogiri
$

No results. I think this occurs because the current system doesn’t have a direct dependency on nokogiri; as noted above, we haven’t explicitly listed it anywhere. So perhaps it is a transitive dependency (i.e. a dependency of a dependency). Thankfully there is a --recursive flag! This will take longer to run, but should hopefully get us to our next step:

$ nix derivation show --recursive /run/current-system | rg -c nokogiri
23

Based on its help page, we see that this tool outputs JSON that maps each store path to its derivation, where the derivation is in the format described in the nix manual.

$ nix derivation show --help | rg JSON
This command prints on standard output a JSON representation of the store
nix derivation show outputs a JSON map of store paths to derivations in the

Looking through this format, we could almost certainly “reimplement” nix why-depends by searching the inputDrvs, but this is slightly more effort than just finding the store paths:

$ nix derivation show --recursive /run/current-system | jq -r 'keys | map(select(match("nokogiri")))[]'
/nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv
/nix/store/xa2icl9j5jxwx0v2fdxv3qazyv1h43m4-nokogiri-1.16.0.gem.drv

then putting these into nix why-depends:

$ nix why-depends /run/current-system /nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv
'/nix/store/h4s1jcgfygr4mraz3lvy41iwrgzjsimr-darwin-system-25.11.9a9ab01' does not depend on '/nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv'

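Huh, another speedbump. We’ve already verified that the current system does at least transitively depend on this derivation. As far as I can tell, the default mode of nix why-depends follows the references of the built outputs, and a .drv file is not among those. Reading through nix why-depends --help, we see that it accepts a --derivation flag, and this seems to finally get us our answer: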

$ nix why-depends --derivation /run/current-system /nix/store/xa2icl9j5jxwx0v2fdxv3qazyv1h43m4-nokogiri-1.16.0.gem.drv
/nix/store/2qhzzfbv4zdks9rh1j1nq405m4lifbr3-darwin-system-25.11.9a9ab01.drv
└───/nix/store/1067s8x77wnbp4x59aa9vgrdpjazjrpk-system-applications.drv
    └───/nix/store/s5yav52c6b6d8ffl1pcla41y9njqzklf-actionlint-1.7.7.drv
        └───/nix/store/iwnbim2kfpsx0njiczrjdbwkzppqgzpf-ronn-0.10.1.drv
            └───/nix/store/bjkr33d0dsr4am83n6ypjfazkm58my8s-ronn-gems.drv
                └───/nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv
                    └───/nix/store/xa2icl9j5jxwx0v2fdxv3qazyv1h43m4-nokogiri-1.16.0.gem.drv

Anticlimactically, careful inspection reveals this to be essentially a cleaned-up version of the output we got from darwin-rebuild in the first place. (NB: some of the store paths will have changed during the writing of this article, as I rebuilt my system before finalizing the post.)
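For completeness, the "reimplement nix why-depends by searching the inputDrvs" idea mentioned earlier could look something like the sketch below. It assumes the JSON is a top-level map keyed by .drv path, where each value has an inputDrvs map keyed by the paths it depends on; the exact shape may vary between nix versions, so treat that as an assumption. The why_depends function and the shortened store paths are hypothetical.

```python
from collections import deque


def why_depends(derivations, root, target):
    """Breadth-first search for one shortest path root -> target via inputDrvs."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Assumed shape: {"<drv path>": {"inputDrvs": {"<drv path>": ...}}}
        for dep in derivations.get(path[-1], {}).get("inputDrvs", {}):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


# Toy data with made-up, shortened paths; real input would come from
# json.load() on the output of `nix derivation show --recursive ...`
derivations = {
    "darwin-system.drv": {"inputDrvs": {"system-applications.drv": {}, "etc.drv": {}}},
    "system-applications.drv": {"inputDrvs": {"actionlint.drv": {}}},
    "etc.drv": {"inputDrvs": {}},
    "actionlint.drv": {"inputDrvs": {"ronn.drv": {}}},
    "ronn.drv": {"inputDrvs": {"ronn-gems.drv": {}}},
    "ronn-gems.drv": {"inputDrvs": {"nokogiri.drv": {}}},
    "nokogiri.drv": {"inputDrvs": {}},
}

path = why_depends(derivations, "darwin-system.drv", "nokogiri.drv")
for depth, drv in enumerate(path):
    print("    " * depth + drv)
```

Breadth-first search returns one shortest chain, similar in spirit to the single path shown above. Enumerating every path, as --all does, would take a different traversal.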

Thankfully, with some help from the cleaner output, we can see fairly easily that the system depends on system-applications, which depends on actionlint. Sure enough, searching our flake, we find it is directly referenced:

$ rg -t nix -C3 actionlint
modules/default-packages.nix
25-      ]
26-      ++ [
27-        (pass.withExtensions (exts: [ exts.pass-otp ]))
28:        actionlint
29-        alacritty
30-        alejandra
31-        bacon

Unsurprisingly, commenting out this line allows our system to build.

As a side-note, there is also an --all flag that shows other paths to the same dependency; its output looks more similar to the original output from darwin-rebuild:

$ nix why-depends --derivation --all /run/current-system /nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv
/nix/store/2qhzzfbv4zdks9rh1j1nq405m4lifbr3-darwin-system-25.11.9a9ab01.drv
├───/nix/store/1067s8x77wnbp4x59aa9vgrdpjazjrpk-system-applications.drv
│   └───/nix/store/s5yav52c6b6d8ffl1pcla41y9njqzklf-actionlint-1.7.7.drv
│       ├───/nix/store/iwnbim2kfpsx0njiczrjdbwkzppqgzpf-ronn-0.10.1.drv
│       │   └───/nix/store/bjkr33d0dsr4am83n6ypjfazkm58my8s-ronn-gems.drv
│       │       ├───/nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv
│       │       └───/nix/store/77xqfn4qs4d4k9fabvjq72sj33asm26y-ruby3.3-ronn-ng-0.10.1.drv
│       │           └───/nix/store/fy1akqdymr80p6ypjwvdrpp6l9fm1j68-ruby3.3-nokogiri-1.16.0.drv
│       └───/nix/store/92525bicsii166yk56g3vm4grbwpjh5h-actionlint-1.7.7-go-modules.drv
│           └───/nix/store/iwnbim2kfpsx0njiczrjdbwkzppqgzpf-ronn-0.10.1.drv
├───/nix/store/dwh1c68h8xijmfbykyn9ggkpjshw17fl-system-path.drv
│   └───/nix/store/s5yav52c6b6d8ffl1pcla41y9njqzklf-actionlint-1.7.7.drv
├───/nix/store/qvw37qgfkpfldbv0vsihfijsrxb8xgda-etc.drv
│   ├───/nix/store/dwh1c68h8xijmfbykyn9ggkpjshw17fl-system-path.drv
│   └───/nix/store/440cn56dyhlc0ghf281gb4r6dbzcbfc8-user-environment.drv
│       └───/nix/store/v1b1lvcc9is3dpxjl2qr3dxks8f9kw24-home-manager-path.drv
│           └───/nix/store/s5yav52c6b6d8ffl1pcla41y9njqzklf-actionlint-1.7.7.drv
└───/nix/store/88fh7jn807sfhjjzk0f67s48f443yl25-activation-n8henrie.drv
    └───/nix/store/r0adycland7ffzpz80l35y7x2iw9ald3-home-manager-generation.drv
        ├───/nix/store/v1b1lvcc9is3dpxjl2qr3dxks8f9kw24-home-manager-path.drv
        ├───/nix/store/c749wf5q4d9c6cd78wypd9am1kl92kby-home-manager-files.drv
        │   ├───/nix/store/nkhwrgp52yhambmzafqsbrlwimcm6zsz-home-manager-applications.drv
        │   │   └───/nix/store/s5yav52c6b6d8ffl1pcla41y9njqzklf-actionlint-1.7.7.drv
        │   ├───/nix/store/d8p8kh2f9kz3ikirxggicf1iirdllp33-hm_fontconfigconf.d10hmfonts.conf.drv
        │   │   └───/nix/store/v1b1lvcc9is3dpxjl2qr3dxks8f9kw24-home-manager-path.drv
        │   └───/nix/store/xzp24ns6lk93vd3spdkfqb5dmgncvb45-hm_LibraryFonts.homemanagerfontsversion.drv
        │       └───/nix/store/x42ccmdhrw0smyln136vcwkw1kr8qx42-home-manager-fonts.drv
        │           └───/nix/store/s5yav52c6b6d8ffl1pcla41y9njqzklf-actionlint-1.7.7.drv
        └───/nix/store/na4sxb3szzxhw89pm9j8rywmfj4fhdc9-activation-script.drv
            ├───/nix/store/x42ccmdhrw0smyln136vcwkw1kr8qx42-home-manager-fonts.drv
            └───/nix/store/xzp24ns6lk93vd3spdkfqb5dmgncvb45-hm_LibraryFonts.homemanagerfontsversion.drv
https://n8henrie.com/2025/10/nix-why-does-my-system-depend-on-pkg/
Compiling Rust for the ESP32 with Nix
arduino, electronics, nix, rust, tech

Bottom Line: Nix’s tooling for Rust compilation can target the ESP32.

Preface

This is a fairly long, meandering post about using nix to compile a no_std Rust project for the ESP32C3. I was able to get things working eventually, and I try to recreate the process that took me there, including several mistakes along the way. Many of the error messages that I encountered seemed quite obscure and had non-obvious (to me, at least) fixes. Worst of all was that I found relatively few directly relevant or helpful blog posts in spite of fairly diligent searches through DuckDuckGo, Google, Stack Overflow, GitHub, and the NixOS Discourse. I decided to write this post in this style – including the error messages I encountered and what I did to resolve or work around them – specifically hoping to provide future searchers with something more helpful than what I found, should they run across similar errors. For anyone that just wants to “flip to the solutions at the end of the textbook,” feel free to scroll to the bottom, where I’ve included the final nix config; you’ll obviously still need to clone the esp-rs/no_std-training repo to get the relevant Rust code.



I occasionally like to tinker with electronics, like toy projects on an arduino, or sometimes building for even cheaper targets like the ESP01 or an ATMEGA328P directly.

I’ve traditionally used the Arduino IDE and/or PlatformIO to get the job done, and since I hardly know any C, I’ve also experimented with micropython (whose support for the ESP8266 is particularly welcome).

More recently, as I continue learning about Rust, one of the features that particularly appeals to me is the support for compiling for “bare-metal” no_std targets, including my beloved ATMEGA328p (--target avr-unknown-gnu-atmega328). Perhaps an even more exciting target is the ESP32C3, for which an incredible amount of (ongoing) work is making this a wifi-enabled no_std Rust-compatible chip: https://github.com/esp-rs/esp-wifi

Because I’m only an occasional tinkerer with these types of projects, one issue that has bitten me more than once is when updates to the tooling and ecosystem make it so that once-working code no longer works when I come back to it after a hiatus. While many of these projects can run for years or decades once flashed to a device, I often find that if I return to update or modify a project months or years later, so much of the tooling has changed that I can’t get the project to compile (even with no changes to my code), or perhaps the tooling to flash the binary has changed or become outdated. While it’s great that arduino, platformio, esptool, ampy, etc. are continuing to evolve and improve, it is certainly frustrating when things have changed so much that existing projects no longer work.

The Rust tooling is already pretty solid at protecting against this; for example, one can include a rust-toolchain.toml file along with a project and pin a specific version of the Rust compiler (e.g. nightly-2020-07-10), and even specify included components and targets: https://rust-lang.github.io/rustup/overrides.html

I think this would probably suffice for making it highly likely that one could return to a Rust-based microcontroller project years later and still be able to produce a usable binary. However, this is the type of problem for which nix really shines – it can help guarantee that all of the dependencies for a project are reproducible down to first principles and even leverages a binary cache that can help ensure that tools are available for use in nix projects even if their original sources are taken offline. If one knows beforehand that it may be many years before they return to a project, it’s even possible to vendor archives of all of these dependencies, guarding against the hypothetical possibility that the nightly-2020-07-10 version of Rust is taken down and no longer available for download (see also: nix nar, nix bundle, nix-copy-closure).

For the purposes of this post, I found that – with some effort – I was able to use the nix tooling to compile a no_std project for the ESP32C3 that successfully connects to wifi. I think the best place to start is by putting nix aside for a moment to focus on the Rust code.

I started by dusting off my ESP32C3 and referring to the esp-rs/esp-wifi repo. I had toyed with it a year or two ago, but the esp-rs team has put a lot of work into it since then, so I wanted to see how well the updates worked. I was able to get the code in examples-esp32c3/examples/dhcp.rs to work, but as of the time of writing the instructions are set up for this to be run as an example (cargo run --example dhcp --release --features "embedded-svc,wifi") from the root of the repo, and I found it fairly difficult to make modifications to this code for a standalone project, in part due to the inter-dependencies within the workspace.

Luckily, while poking around, I found a fairly new repo at github.com/esp-rs/no_std-training that seemed to be just the ticket – in no_std-training/intro/http-client, I found an example project including a Cargo.toml, rust-toolchain.toml, and sample code in a subdirectory at examples/http-client.rs that seems like a great start. At the time of writing src/main.rs seemed incomplete and was not working – this project appears to be a work in progress.

On my M1 Mac, I found that I was able to compile this code with no difficulty:

$ git clone git@github.com:esp-rs/no_std-training.git
$ cd no_std-training
$ git checkout 88bc692d81dfcf9491c80dc7c9e8601b702e465a
$ cd intro/http-client
$ cat examples/http-client.rs > src/main.rs
$ rustup target add riscv32imc-unknown-none-elf
$ export SSID=foo PASSWORD=bar
$ cargo build
$ file target/riscv32imc-unknown-none-elf/debug/http-client
target/riscv32imc-unknown-none-elf/debug/http-client: ELF 32-bit LSB executable, UCB RISC-V, RVC, soft-float ABI, version 1 (SYSV), statically linked, with debug_info, not stripped

NB: the esp-rs team strongly recommends building in --release mode, and cautions that the code may fail to run if compiled in debug mode (the default) like I’ve done above; I’m just using debug mode to check my work while writing this post because it’s faster to compile.

With that working, I set about putting dependencies into nix to hopefully help keep it working. One of the first steps to help this process is to pin any git dependencies in Cargo.toml, to make sure we’re always pulling down the same version.

Thankfully, reviewing Cargo.toml shows only a single git dependency, on esp-wifi itself, which we can pin to a recent and known working commit by adding a rev to the esp-wifi line:

esp-wifi = { git = "https://github.com/esp-rs/esp-wifi/", features = ["esp32c3", "wifi-logs", "wifi"], rev = "e7140fd35852dadcd1df7592dc149e876256348f" }

I usually start adding nix to my projects using a flake template, which I’ve made available at github.com/n8henrie/flake-templates and can be used like so:

$ nix flake init -t github:n8henrie/flake-templates#trivial

This includes a function named systemClosure that helps reduce some boilerplate to expose outputs for multiple systems. (Most people use flake-utils for this; there’s no specific reason that I don’t.)

Next, I add an input for oxalica/rust-overlay, which is an overlay that – among other things – makes it easier to leverage an existing rust-toolchain.toml file in order to specify the desired versions of the Rust tools. I pinned its input to match my nixpkgs version:

inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/release-23.05";
    rust-overlay = {
        url = "github:oxalica/rust-overlay";
        inputs.nixpkgs.follows = "nixpkgs";
    };
};

Next, I did the easy part, by making a dev shell that includes the version of cargo specified by rust-toolchain.toml, by adding the following (I have omitted some context for the sake of brevity; the full final file is at the bottom of the post):

let
    pkgs = import nixpkgs {
        inherit system;
        overlays = [(import rust-overlay)];
    };
    toolchain = (
        pkgs.rust-bin.fromRustupToolchainFile ./rust-toolchain.toml
    );
in
    devShells.${system}.default = pkgs.mkShell {
        buildInputs = [
            pkgs.cargo-espflash
            toolchain
        ];
    };

This allows me to:

$ git add flake.nix
$ nix develop
$ # show that cargo is being provided by nix:
$ type -p cargo
/nix/store/5sdglskvfpv67kw2hcp8pnkvk7w5d4rl-rust-default-1.72.0-nightly-2023-06-25/bin/cargo
$ # cargo has the expected version:
$ cargo --version
cargo 1.72.0-nightly (03bc66b55 2023-06-23)
$ cargo build
$ # nix's cargo compiles the project without errors:
$ file target/riscv32imc-unknown-none-elf/debug/http-client
target/riscv32imc-unknown-none-elf/debug/http-client: ELF 32-bit LSB executable, UCB RISC-V, RVC, soft-float ABI, version 1 (SYSV), statically linked, with debug_info, not stripped
$ # show that the espflash utility is also available
$ type -p cargo-espflash
/nix/store/yf5d1k5mdqxghpb89qfqglcxqs4ksx0n-cargo-espflash-1.7.0/bin/cargo-espflash

Hint: if nix gives you error: getting status of... default.nix': No such file or directory, when there clearly is a default.nix, it probably means that you’re working in a git repo (which we are) but haven’t added that file; try git add default.nix (or whatever the file is) and run the nix command again.

Cool, it worked!

This is probably good enough for most intents and purposes, as it should provide a reproducible Rust / cargo toolchain (and the espflash utility used to flash the code onto the esp32). One simply has to nix develop and they should be dropped into a shell environment with all of the required tools, and that environment should be reproducible in the future.

However, I’ve seen that nix also includes tooling for building a Rust package directly with the likes of buildRustPackage. Recommended reading:

I wanted to explore this approach as well, and this is where things got a little hairy.

To start, I added a default.nix with the following contents:

{
  lib,
  rustPlatform,
  name,
}: (rustPlatform.buildRustPackage
  {
    inherit name;
    src = lib.cleanSource ./.;
  })

and I added the following to my flake.nix:

packages.${system}.default = pkgs.callPackage ./. {
    inherit ((builtins.fromTOML (builtins.readFile ./Cargo.toml)).package) name;
};

For anyone less familiar with nix, this pulls the name attribute from Cargo.toml and passes it to default.nix using the callPackage pattern. pkgs.callPackage is not required in this case but is a handy pattern in general because nix automatically resolves input dependencies that are available attributes of pkgs (in this case rustPlatform) but also allows for passing in dependencies manually. This allows me to pass in name (which is not an attribute of pkgs), or I could also override rustPlatform if desired. When one has dozens of inputs it can be particularly handy, as one can override a single one of them while letting the remainder be resolved automatically to their defaults. Also, default.nix – as its name suggests – is picked up automatically by callPackage ./., but I could have named it foo.nix and used callPackage ./foo.nix.

Let’s see where this gets us:

$ nix build
error: getting status of '/nix/store/s9af3f3j2lz0sa9l3n6d2lsxhngyqq96-source/intro/http-client/default.nix': No such file or directory
$ # whups, see my hint above
$ git add default.nix
$ nix build
error: cargoSha256, cargoHash, cargoVendorDir, or cargoLock must be set

Ok, so nix wants me to point it to a Cargo.lock file so it can ensure that all of the Rust dependencies are reproducible. Thankfully we should still have one hanging around from the cargo build --target=... step above. (If not you’ll need to re-run that step.) Add the following to default.nix:

cargoLock.lockFile = ./Cargo.lock;

One might also need to add Cargo.lock to git at this point, but in this case it’s already being tracked. Sometimes it is .gitignored in which case one might choose to git add -f Cargo.lock.

Next error:

$ nix build
error: No hash was found while vendoring the git dependency esp-wifi-0.1.0. You can add
       a hash through the `outputHashes` argument of `importCargoLock`:

       outputHashes = {
         "esp-wifi-0.1.0" = "<hash>";
       };

       If you use `buildRustPackage`, you can add this attribute to the `cargoLock`
       attribute set.

Ok, so let’s change the cargoLock part to the following, knowing that we’ll get an error about an invalid hash (the error message will tell us the correct value to fill in):

cargoLock = {
    lockFile = ./Cargo.lock;
    outputHashes = {
        "esp-wifi-0.1.0" = "";
    };
};

$ nix build
error: hash mismatch in fixed-output derivation '/nix/store/c0icjxbnwfhbw2w0pk5vd4dcw9p6irpr-esp-wifi-b54310e.drv':
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-IUkX3inbeeRZk9q/mdg56h+qft+0/TVpOM4rCKNOwz8=

Ok, let’s fill that in:

cargoLock = {
    lockFile = ./Cargo.lock;
    outputHashes = {
        "esp-wifi-0.1.0" = "sha256-IUkX3inbeeRZk9q/mdg56h+qft+0/TVpOM4rCKNOwz8=";
    };
};
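
Copying the hash out of the error by hand works fine, but the step can also be scripted. Here is a minimal sketch, assuming the mismatch error keeps the "got:" layout shown above; got_hash is a hypothetical helper name, not something provided by nix:

```shell
# Hypothetical helper: read `nix build` error text on stdin and print the
# hash that follows "got:". Assumes the message layout shown above.
got_hash() {
    sed -n 's|^[[:space:]]*got:[[:space:]]*\(sha256-[A-Za-z0-9+/=]*\).*|\1|p'
}

# Demo using the two hash lines from the error above:
printf '%s\n' \
    '         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=' \
    '            got:    sha256-IUkX3inbeeRZk9q/mdg56h+qft+0/TVpOM4rCKNOwz8=' |
    got_hash
```

The demo prints only the sha256-IUkX... value. In practice one might run nix build 2>&1 | got_hash and paste the result into outputHashes.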

This time we get a different error:

$ nix build
      > error: "/nix/store/8sindl6wnv2s5z1zwvq0rkffacicx80d-rustc-1.69.0/lib/rustlib/src/rust/Cargo.lock" does not exist, unable to build with the standard library, try:
       >         rustup component add rust-src

Now this one left me scratching my head for a little while, because I knew that the esp-rs team had conveniently put the rust-src dependency in our rust-toolchain.toml for us:

$ cat rust-toolchain.toml
[toolchain]
channel = "nightly-2023-06-25"
components = ["rust-src"]
targets = ["riscv32imc-unknown-none-elf"]

Eventually I realized that the version numbers didn’t add up: note the rustc-1.69.0 here as opposed to rust-default-1.72.0-nightly above. So clearly one issue is that the toolchain from the oxalica overlay is not being used. Which makes sense, because we’re using nix’s default rustPlatform.
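
This kind of mismatch is easier to spot when the expected channel is printed next to the compiler path from the error. Here is a minimal sketch, assuming rust-toolchain.toml has the simple layout shown above; toolchain_channel is a hypothetical helper name:

```shell
# Hypothetical helper: print the channel from a rust-toolchain.toml read on
# stdin. Assumes a plain `channel = "..."` line as in the file above.
toolchain_channel() {
    sed -n 's/^[[:space:]]*channel[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Demo using the first lines of the file shown above:
printf '%s\n' \
    '[toolchain]' \
    'channel = "nightly-2023-06-25"' \
    'components = ["rust-src"]' |
    toolchain_channel
```

The demo prints nightly-2023-06-25. With a real project, toolchain_channel < rust-toolchain.toml does the same. A store path containing rustc-1.69.0 is then an obvious sign that a different toolchain is in play.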

After reading the nix + Rust links above a few more times, I noticed this section on building Rust nightly with buildRustPackage, which refers to the makeRustPlatform function and thankfully uses the oxalica overlay in its example! Taking from there, I added an additional variable to flake.nix:

rustPlatform = pkgs.makeRustPlatform {
    rustc = toolchain;
    cargo = toolchain;
};

and, lower in the same file, I used this to pass it as the rustPlatform input to default.nix:

packages.${system}.default = pkgs.callPackage ./. {
    inherit ((builtins.fromTOML (builtins.readFile ./Cargo.toml)).package) name;
    inherit rustPlatform;
};

Now, I got a new error:

$ nix build
error: no matching package named `addr2line` found

Here, I eventually came across this related post in the NixOS Discourse that has a suggested workaround. Essentially, certain packages that are required by the rust-std feature need to be downloaded (at build time), which cargo usually takes care of. However, the “purity” of nix builds disallows network access*, so this step fails. Instead, one needs to manually specify these dependencies in Cargo.toml, and apparently the dev-dependencies is the proper section for this (perhaps because they are required to build the build tooling, not to build the crate itself – let me know if this is way off base).

* At least outside of explicit downloads with tools like pkgs.fetchurl, which also require a hash to verify that the resulting download’s contents are exactly correct.

One way to add these to Cargo.toml is via cargo add, which should result in two new lines at the bottom:

$ cargo add --dev addr2line
$ tail -2 Cargo.toml
[dev-dependencies]
addr2line = "0.21.0"

Re-running nix build at this point gave me a slightly different error:

> error: failed to select a version for the requirement `addr2line = "^0.19.0"` (locked to 0.19.0)
> candidate versions found which didn't match: 0.21.0

I eventually sorted out that I needed to pin that exact version by editing Cargo.toml, adding an = just before the version number:

[dev-dependencies]
addr2line = "=0.19.0"
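
The pinned line can be derived mechanically from the error text. Here is a minimal sketch, assuming cargo keeps the "(locked to ...)" wording shown above; pin_line is a hypothetical helper name, not a cargo feature:

```shell
# Hypothetical helper: read cargo's error text on stdin and print the
# Cargo.toml line that pins the crate to the version cargo reports as locked.
# Assumes the "(locked to ...)" wording shown in the error above.
pin_line() {
    sed -n 's/.*requirement `\([^ ]*\) = "[^"]*"` (locked to \([^)]*\)).*/\1 = "=\2"/p'
}

# Demo using the addr2line error from above:
printf '%s\n' \
    '> error: failed to select a version for the requirement `addr2line = "^0.19.0"` (locked to 0.19.0)' |
    pin_line
```

The demo prints addr2line = "=0.19.0". Errors without a "(locked to ...)" part produce no output, so those still need to be read by hand.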

Interestingly, upon re-running nix build, I got the exact same error:

> error: failed to select a version for the requirement `addr2line = "^0.19.0"` (locked to 0.19.0)
> candidate versions found which didn't match: 0.21.0

I eventually realized that the change I made to Cargo.toml wasn’t reflected in Cargo.lock; for that, I needed to run cargo update. After a cargo update and another attempt at building, I see an error also discussed in that thread:

$ cargo update && nix build
...
> error: no matching package named `compiler_builtins` found

Here we’ll repeat the same procedure:

  1. cargo add --dev compiler_builtins
  2. cargo update && nix build
  3. If there is an error about the version, pin it by modifying the respective line in Cargo.toml from compiler_builtins = "some_version_number" to compiler_builtins = "=other_version_number" (don’t forget the extra =), where other_version_number is taken from (locked to ...) in the error message.
  4. cargo update && nix build again, evaluate for new error message
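
Since the loop above only makes progress when Cargo.lock actually changes, it can help to check which version the lockfile records for a crate before rebuilding. Here is a minimal sketch, assuming the usual Cargo.lock layout in which a name line is followed by its version line; lock_version is a hypothetical helper name:

```shell
# Hypothetical helper: print the version(s) recorded for crate $1 in a
# Cargo.lock read on stdin. Assumes each `name = "..."` line is followed by
# the matching `version = "..."` line.
lock_version() {
    awk -v crate="$1" '
        $1 == "name" && $3 == ("\"" crate "\"") { found = 1; next }
        found && $1 == "version" { gsub(/"/, "", $3); print $3; found = 0 }
    '
}

# Demo using a single hand-written lockfile entry:
printf '%s\n' \
    '[[package]]' \
    'name = "addr2line"' \
    'version = "0.19.0"' |
    lock_version addr2line
```

The demo prints 0.19.0. Against a real project this would be lock_version addr2line < Cargo.lock. If the printed version does not match the pin in Cargo.toml, cargo update has not been run yet.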

I then repeated this process a fair number of times and eventually made it to a dependency that wouldn’t work:

$ cargo update && nix build
    Updating crates.io index
    Updating git repository `https://github.com/esp-rs/esp-wifi/`
error: failed to select a version for the requirement `hermit-abi = "=0.3.0"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.2.6, ...
location searched: crates.io index
required by package `http-client v0.1.0 (/Users/n8henrie/git/no_std-training/intro/http-client)`
perhaps a crate was updated and forgotten to be re-vendored?

I eventually navigated to https://crates.io/crates/hermit-abi/versions and found that the 0.3.0 version we need has been yanked. Ugh.

I tried looking at the documentation for patching dependencies, but I couldn’t find an obvious way to override the version of an intermediate dependency. Eventually I gave up and changed the version of the toolchain in rust-toolchain.toml (I found that nightly-2023-08-23 worked). Unfortunately, this also means that I had to delete all those dev-dependencies and start again, since these are additional dependencies required to build Rust’s build tools (I think).

Many rounds of cargo update && nix build later, I came across a new error:

   > LLVM ERROR: Global variable '_start_rust' has an invalid section specifier '.init.rust': mach-o section specifier requires a segment and section separated by a comma.
       > error: could not compile `esp-riscv-rt` (lib)
       > warning: build failed, waiting for other jobs to finish...
       > LLVM ERROR: Global variable '__EXTERNAL_INTERRUPTS' has an invalid section specifier '.trap.rodata': mach-o section specifier requires a segment and section separated by a comma.

At this point, I figured that the error was related to the fact that I wasn’t cross-compiling at all, something I had noticed in the build logs earlier in the process:

++ env CC_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/cc CXX_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/c++ CC_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/cc CXX_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/c++ cargo build -j 8 --target aarch64-apple-darwin --frozen --release

Here’s the same command split into separate lines for readability:

++ env \
  CC_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/cc \
  CXX_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/c++ \
  CC_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/cc \
  CXX_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/c++ \
  cargo build \
  -j 8 \
  --target aarch64-apple-darwin \
  --frozen \
  --release

If you look carefully at that long incantation, you’ll see --target aarch64-apple-darwin. When building with cargo, we were able to lean on ./.cargo/config.toml, conveniently provided by the esp-rs team, which sets a default build target. Nix apparently doesn’t take that into account and is building for the host system architecture.
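
Rather than scanning the whole log by eye, the target can be pulled out with a small filter. Here is a minimal sketch, assuming the log passes the flag in the space-separated form seen above; cargo_targets is a hypothetical helper name:

```shell
# Hypothetical helper: list the distinct values passed to cargo's --target
# flag in a build log read on stdin. Only the space-separated form
# (--target foo) is handled, since that is what the log above uses.
cargo_targets() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "--target") print $(i + 1) }' | sort -u
}

# Demo using the tail of the cargo invocation from the log above:
echo 'cargo build -j 8 --target aarch64-apple-darwin --frozen --release' | cargo_targets
```

The demo prints aarch64-apple-darwin. Piping the output of nix log for the failing derivation through cargo_targets gives the same answer for a real build.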

It seems that the nix way to cross-compile Rust for other architectures is not by setting cargo’s --target directly (although it seems this was previously the case). Instead, one is expected to use the usual nix cross-compilation strategy of setting a crossSystem with the desired config. Here is the example from that link:

import <nixpkgs> {
  crossSystem = (import <nixpkgs/lib>).systems.examples.armhf-embedded // {
    rustc.config = "thumbv7em-none-eabi";
  };
}

I thought this seemed easy enough and set about trying to figure out the right combination. Cargo specifies the target as riscv32imc-unknown-none-elf, so one can search the available nix-provided examples by looking at lib/systems/examples.nix, or by using the following command to search for examples containing riscv:

$ nix eval --json \
        --apply builtins.attrNames \
        nixpkgs#lib.systems.examples |
    jq -r .[] |
    grep -i riscv
riscv32
riscv32-embedded
riscv64
riscv64-embedded

riscv32-embedded sounds pretty promising, right? Let’s change flake.nix to use this cross system for rustPlatform:

rustPlatform = let
    pkgsCross = import nixpkgs {
        inherit system;
        crossSystem =
            lib.systems.examples.riscv32-embedded
            // {
                rustc.config = "riscv32imc-unknown-none-elf";
            };
    };
in
    pkgsCross.makeRustPlatform
    {
        rustc = toolchain;
        cargo = toolchain;
    };

This gets us a new error:

$ cargo update && nix build
    Updating crates.io index
    Updating git repository `https://github.com/esp-rs/esp-wifi/`
error: builder for '/nix/store/szli9axz1hgswa0b9k3327pl506hmhi6-http-client-riscv32-none-elf.drv' failed with exit code 101;
       last 10 log lines:
       > error[E0432]: unresolved import `core::sync::atomic::AtomicUsize`
       >   --> /private/tmp/nix-build-http-client-riscv32-none-elf.drv-0/cargo-vendor-dir/atomic-waker-1.1.2/src/lib.rs:27:5
       >    |
       > 27 | use core::sync::atomic::AtomicUsize;
       >    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no `AtomicUsize` in `sync::atomic`
       >
       >    Compiling managed v0.8.0
       > For more information about this error, try `rustc --explain E0432`.
       > error: could not compile `atomic-waker` (lib) due to previous error
       > warning: build failed, waiting for other jobs to finish...
       For full logs, run 'nix log /nix/store/szli9axz1hgswa0b9k3327pl506hmhi6-http-client-riscv32-none-elf.drv'.

At this point I did a lot of reading about nix cross-compiling, including some excellent comments and a few examples by Oxalica, but there were few results for this exact error. This thread is relevant and has some notes from one of the main esp-rs developers (@MabezDev on GitHub), but seemed to be about compiling std, and this is a no_std project. Taking a second look at the log output (again split into separate lines for readability):

++ env \
    CC_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/cc \
    CXX_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/c++ \
    CC_riscv32imc-unknown-none-elf=/nix/store/lasmnmwpszbyv8xambkxyhyvwi3164w2-riscv32-none-elf-stage-final-gcc-wrapper-12.2.0/bin/riscv32-none-elf-cc \
    CXX_riscv32imc-unknown-none-elf=/nix/store/lasmnmwpszbyv8xambkxyhyvwi3164w2-riscv32-none-elf-stage-final-gcc-wrapper-12.2.0/bin/riscv32-none-elf-c++ \
    cargo build \
    -j 8 \
    --target riscv32imc-unknown-none-elf \
    --frozen \
    --release

It looks like the target is being set correctly, but in true nix cross compilation fashion it looked like it might also be using a cross-compiled version of the compiler (based on the CC_* variables). That seems unnecessary, since we’ve already proven that an aarch64-darwin compiled toolchain can do the heavy lifting of cross compilation, we’re just trying to set the desired --target.

I browsed nixpkgs until I found where it seems to be setting --target in the cargo call, which sets it to rustTargetPlatformSpec. This, in turn, is being set to rust.toRustTargetSpec stdenv.hostPlatform here. toRustTargetSpec is defined here as the following:

toRustTarget = platform: let
    inherit (platform.parsed) cpu kernel abi;
    cpu_ = platform.rustc.platform.arch or {
      "armv7a" = "armv7";
      "armv7l" = "armv7";
      "armv6l" = "arm";
      "armv5tel" = "armv5te";
      "riscv64" = "riscv64gc";
    }.${cpu.name} or cpu.name;
    vendor_ = toTargetVendor platform;
  in platform.rustc.config
    or "${cpu_}-${vendor_}-${kernel.name}${lib.optionalString (abi.name != "unknown") "-${abi.name}"}";

toRustTargetSpec = platform:
    if platform ? rustc.platform
    then builtins.toFile (toRustTarget platform + ".json") (builtins.toJSON platform.rustc.platform)
    else toRustTarget platform;

So for the case at hand, I read this as:

  1. Does pkgs.stdenv.hostPlatform have a rustc.platform attribute? No (otherwise would make a .json target from the platform).
  2. Therefore, use toRustTarget pkgs.stdenv.hostPlatform.
  3. Continuing in toRustTarget, does pkgs.stdenv.hostPlatform have a rustc.config attribute? Yes.
  4. Therefore, use rustc.config (otherwise would construct a string from cpu, vendor, abi, etc.).
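
The decision order above can be mimicked in a few lines of shell. This is only a loose port for illustration. The arguments are hypothetical inputs supplied by hand, not values read from nixpkgs, and the rustc.platform JSON branch and the cpu renaming table are left out:

```shell
# A loose shell port of the toRustTarget fallback, for illustration only.
# Arguments (all hypothetical inputs, not read from nixpkgs):
#   $1 rustc.config ("" when unset)   $2 cpu   $3 vendor   $4 kernel   $5 abi
rust_target() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"                       # rustc.config wins when set
    elif [ "$5" = unknown ]; then
        printf '%s-%s-%s\n' "$2" "$3" "$4"       # abi is omitted when unknown
    else
        printf '%s-%s-%s-%s\n' "$2" "$3" "$4" "$5"
    fi
}

# With rustc.config set, it is used as-is:
rust_target 'riscv32imc-unknown-none-elf' aarch64 apple darwin unknown
# Without it, the target is assembled from the platform's parts:
rust_target '' aarch64 apple darwin unknown
```

Assuming the host platform parses to aarch64, apple, darwin and an unknown abi, the second call prints aarch64-apple-darwin, which matches the --target seen in the earlier log. The first call prints riscv32imc-unknown-none-elf.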

So it looks like rustc.config may be all that’s required to set the --target. Let’s try the following:

rustPlatform = let
    pkgsCross = import nixpkgs {
        inherit system;
        rustc.config = "riscv32imc-unknown-none-elf";
    };
in
    pkgsCross.makeRustPlatform
    {
        rustc = toolchain;
        cargo = toolchain;
    };

$ cargo update && nix build
    Updating crates.io index
    Updating git repository `https://github.com/esp-rs/esp-wifi/`
error: builder for '/nix/store/9w5m6wb7di7br2ar3wy5a9kcrc6dizj3-http-client.drv' failed with exit code 101;
       last 10 log lines:
       >    Compiling enumset v1.1.2
       >    Compiling managed v0.8.0
       >    Compiling atomic-waker v1.1.2
       >    Compiling bitflags v1.3.2
       >    Compiling no-std-net v0.5.0
       > LLVM ERROR: Global variable '_start_rust' has an invalid section specifier '.init.rust': mach-o section specifier requires a segment and section separated by a comma.
       > error: could not compile `esp-riscv-rt` (lib)
       > warning: build failed, waiting for other jobs to finish...
       > LLVM ERROR: Global variable '__EXTERNAL_INTERRUPTS' has an invalid section specifier '.trap.rodata': mach-o section specifier requires a segment and section separated by a comma.
       > error: could not compile `esp32c3` (lib)
       For full logs, run 'nix log /nix/store/9w5m6wb7di7br2ar3wy5a9kcrc6dizj3-http-client.drv'.

Well, now we’re back to an error we’ve seen before, when we were compiling for the wrong architecture. Sure enough, glancing through the log, we’re back to --target aarch64-apple-darwin – a step in the wrong direction. Let’s put the crossSystem back:

rustPlatform = let
    pkgsCross = import nixpkgs {
        inherit system;
        crossSystem = {
            inherit system;
            rustc.config = "riscv32imc-unknown-none-elf";
        };
    };
in
    pkgsCross.makeRustPlatform
    {
        rustc = toolchain;
        cargo = toolchain;
    };

This gets us to our next error. Progress!

$ cargo update && nix build
    Updating crates.io index
    Updating git repository `https://github.com/esp-rs/esp-wifi/`
error: builder for '/nix/store/2fp9fkha1qjnand2xwrrair8jg86ml65-http-client-aarch64-apple-darwin.drv' failed with exit code 101;
       last 10 log lines:
       > error: environment variable `PASSWORD` not defined at compile time
       >   --> src/main.rs:26:24
       >    |
       > 26 | const PASSWORD: &str = env!("PASSWORD");
       >    |                        ^^^^^^^^^^^^^^^^
       >    |
       >    = help: use `std::env::var("PASSWORD")` to read the variable at run time
       >    = note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
       >
       > error: could not compile `http-client` (bin "http-client") due to 2 previous errors
       For full logs, run 'nix log /nix/store/2fp9fkha1qjnand2xwrrair8jg86ml65-http-client-aarch64-apple-darwin.drv'.

Looking through the build logs, the cargo build seems to be doing what we had hoped; I see an aarch64-darwin toolchain and a riscv32 target:

++ env \
    CC_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/cc \
    CXX_aarch64-apple-darwin=/nix/store/p72lcp92djj8xpdjm27rjrrxznjjgvyi-clang-wrapper-11.1.0/bin/c++ \
    CC_riscv32imc-unknown-none-elf=/nix/store/py4adxsy9vzdgb7qlqv570wdc9rsayhf-aarch64-apple-darwin-clang-wrapper-11.1.0/bin/aarch64-apple-darwin-cc \
    CXX_riscv32imc-unknown-none-elf=/nix/store/py4adxsy9vzdgb7qlqv570wdc9rsayhf-aarch64-apple-darwin-clang-wrapper-11.1.0/bin/aarch64-apple-darwin-c++ \
    cargo build \
    -j 8 \
    --target riscv32imc-unknown-none-elf \
    --frozen \
    --release

The new error is one I actually understand (for once): the esp-rs authors have the project configured to read the wifi credentials from the build environment at compile time with the env! macro. When compiling with cargo, we can just export these in the build environment, but nix build intentionally cleans impurities (like the build environment), so it won’t be able to see these by default. I don’t know of any way to configure the runtime environment on the esp32, so I don’t think we can use the compiler’s suggestion (using std::env::var). Instead, we know that nix will generally pass along values that are set in a mkDerivation call as environment variables, so we’ll just try setting some dummy values in default.nix, to see if that allows the build to proceed:

SSID = "foo";
PASSWORD = "bar";

NB: Like basically everything else in nix, these will get built into a derivation in /nix/store that is world readable. Passwords and other secrets in nix are an entire topic of their own. For the moment, just know that this route of setting the wifi credentials will make them discoverable by anyone with read access to your device. I believe this would still be the case if using builtins.getEnv + --impure instead of building it into the derivation.
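
For the plain cargo workflow, where the credentials come from the shell rather than from default.nix, a small guard can fail early with a clearer message than the env! compile error. Here is a minimal sketch; check_wifi_env is a hypothetical helper name, and the only variables it knows about are the two that the example code reads:

```shell
# Hypothetical guard: succeed only if both variables read by env! in the
# example code are set to non-empty values in the current shell.
check_wifi_env() {
    missing=''
    [ -n "${SSID-}" ] || missing="$missing SSID"
    [ -n "${PASSWORD-}" ] || missing="$missing PASSWORD"
    if [ -n "$missing" ]; then
        echo "not set:$missing" >&2
        return 1
    fi
}

# Demo in a subshell so the dummy values do not leak into the session:
( SSID=foo; PASSWORD=bar; check_wifi_env && echo 'credentials present' )
```

One possible use is check_wifi_env && cargo build --release, which skips the build entirely when either variable is missing.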

That was a pretty easy fix, and successfully leads us to our next error:

$ cargo update && nix build
    Updating crates.io index
    Updating git repository `https://github.com/esp-rs/esp-wifi/`
error: builder for '/nix/store/p2gp7hl5xnddn3w8snn6dfpbzrj9dyfd-http-client-aarch64-apple-darwin.drv' failed with exit code 101;
       last 10 log lines:
       >   = note: second definition in `core` loaded from /nix/store/cjc6j5r11wqmdkp6f5mcbrzb938rg9dw-rust-std-1.74.0-nightly-2023-08-23-riscv32imc-unknown-none-elf/lib/rustlib/riscv32imc-unknown-none-elf/lib/libcore-68e03c5be2ffebdc.rlib
       >
       > error[E0152]: duplicate lang item in crate `core` (which `alloc` depends on): `CStr`.
       >   |
       >   = note: the lang item is first defined in crate `core` (which `twox_hash` depends on)
       >   = note: first definition in `core` loaded from /private/tmp/nix-build-http-client-aarch64-apple-darwin.drv-0/source/target/riscv32imc-unknown-none-elf/release/deps/libcore-dc12a78182d2c0a4.rmeta
       >   = note: second definition in `core` loaded from /nix/store/cjc6j5r11wqmdkp6f5mcbrzb938rg9dw-rust-std-1.74.0-nightly-2023-08-23-riscv32imc-unknown-none-elf/lib/rustlib/riscv32imc-unknown-none-elf/lib/libcore-68e03c5be2ffebdc.rlib
       >
       > For more information about this error, try `rustc --explain E0152`.
       > error: could not compile `twox-hash` (lib) due to 121 previous errors
       For full logs, run 'nix log /nix/store/p2gp7hl5xnddn3w8snn6dfpbzrj9dyfd-http-client-aarch64-apple-darwin.drv'.

duplicate lang item in crate `core` – what’s that all about? I found a few GitHub issues and SO posts, but they didn’t give me much insight (or hope).

Thankfully, I eventually found this SO post which linked to this comment, talking about how cargo test for embedded targets perhaps didn’t make much sense (yet). By default, nix generally tries to test everything it can prior to saying that “everything compiled fine”, so it would make sense that perhaps it was running cargo test and having trouble there. Sure enough, digging deeper through the log:

++ cargo test -j 8 --release --target riscv32imc-unknown-none-elf --frozen -- --test-threads=8
   Compiling stable_deref_trait v1.2.0
   Compiling thiserror-core v1.0.38
   Compiling crc32fast v1.3.2
   Compiling thiserror-core-impl v1.0.38
   Compiling static_assertions v1.1.0
   Compiling adler v1.0.2
   Compiling memchr v2.5.0
   Compiling cpp_demangle v0.4.3
error[E0463]: can't find crate for `std`
  |
  = note: the `riscv32imc-unknown-none-elf` target may not support the standard library
  = note: `std` is required by `stable_deref_trait` because it does not declare `#![no_std]`
  = help: consider building the standard library from source with `cargo build -Zbuild-std`

What happens if we just disable the tests, by adding doCheck = false; to default.nix?

$ cargo update && nix build
    Updating crates.io index
    Updating git repository `https://github.com/esp-rs/esp-wifi/`
$ echo $?
0
$ file result/bin/http-client
result/bin/http-client: ELF 32-bit LSB executable, UCB RISC-V, RVC, soft-float ABI, version 1 (SYSV), statically linked, with debug_info, not stripped

Holy cow, a successful build. But does it work?

Running espflash flash seems to connect and tell us which serial port to use, but needs us to specify the firmware file:

$ nix develop --command espflash flash
New version of espflash is available: v2.0.1

Serial port: /dev/tty.usbserial-1110
Connecting...

Chip type:         ESP32-C3 (revision 3)
Crystal frequency: 40MHz
Flash size:        4MB
Features:          WiFi
MAC address:       84:f7:03:39:f1:cc
Error:
  × No such file or directory (os error 2)

Adding the file and specifying --monitor seems to work, and gives us some output that confirms it’s running!

$ nix develop --command espflash --monitor ./result/bin/http-client
New version of espflash is available: v2.0.1

Serial port: /dev/tty.usbserial-1110
Connecting...

Chip type:         ESP32-C3 (revision 3)
Crystal frequency: 40MHz
Flash size:        4MB
Features:          WiFi
MAC address:       84:f7:03:39:f1:cc
App/part. size:    516368/4128768 bytes, 12.51%
[00:00:01] ########################################      12/12      segment 0x0
[00:00:00] ########################################       1/1       segment 0x8000
[00:00:31] ########################################     269/269     segment 0x10000
Flashing has completed!
Commands:
    CTRL+R    Reset chip
    CTRL+C    Exit

ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x1 (POWERON),boot:0xc (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcd6100,len:0x172c
load:0x403ce000,len:0x928
0x403ce000 - .L17
    at ??:??
load:0x403d0000,len:0x2ce0
0x403d0000 - .L17
    at ??:??
entry 0x403ce000
0x403ce000 - .L17
    at ??:??
I (30) boot: ESP-IDF v4.4-dev-2825-gb63ec47238 2nd stage bootloader
I (30) boot: compile time 12:10:40
I (30) boot: chip revision: 3
I (33) boot_comm: chip revision: 3, min. bootloader chip revision: 0
I (41) boot.esp32c3: SPI Speed      : 80MHz
I (45) boot.esp32c3: SPI Mode       : DIO
I (50) boot.esp32c3: SPI Flash Size : 4MB
I (55) boot: Enabling RNG early entropy source...
I (60) boot: Partition Table:
I (64) boot: ## Label            Usage          Type ST Offset   Length
I (71) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (78) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (86) boot:  2 factory          factory app      00 00 00010000 003f0000
I (93) boot: End of partition table
I (98) boot_comm: chip revision: 3, min. application chip revision: 0
I (105) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=125f8h ( 75256) map
I (125) esp_image: segment 1: paddr=00022620 vaddr=3fc84588 size=01214h (  4628) load
I (126) esp_image: segment 2: paddr=0002383c vaddr=3fc9d958 size=00168h (   360) load
I (130) esp_image: segment 3: paddr=000239ac vaddr=40380000 size=04584h ( 17796) load
I (142) esp_image: segment 4: paddr=00027f38 vaddr=00000000 size=080e0h ( 32992)
I (152) esp_image: segment 5: paddr=00030020 vaddr=42000020 size=5e0c0h (385216) map
I (215) boot: Loaded app from partition at offset 0x10000
I (215) boot: Disabling RNG early entropy source...
Wi-Fi set_configuration returned Ok(())
Is wifi started: Ok(true)
Start Wifi Scan
AccessPointInfo { ssid: "REDACTED", bssid: [...], channel: 6, secondary_channel: None, signal_strength: -43, protocols: EnumSet(), auth_method: WPAWPA2Personal }
AccessPointInfo { ssid: "REDACTED2", bssid: [...], channel: 6, secondary_channel: None, signal_strength: -85, protocols: EnumSet(), auth_method: None }
Ok(EnumSet(Client | AccessPoint))
Wi-Fi connect: Ok(())
Wait to get connected
Disconnected

Finally, we can add one more convenience to our flake.nix by moving our definition of name up a layer and defining a default app that does the flashing:

apps.${system}.default = let
    flash = pkgs.writeShellApplication {
        name = "flash-${name}";
        runtimeInputs = [pkgs.cargo-espflash];
        text = ''
            espflash --monitor ${self.packages.${system}.default}/bin/${name}
        '';
    };
in {
    type = "app";
    program = "${flash}/bin/flash-${name}";
};

With this in place, a simple nix run builds and flashes! (For the below, I’ve put proper values into the SSID and PASSWORD.)

$ nix run
New version of espflash is available: v2.0.1

Serial port: /dev/tty.usbserial-1110
Connecting...

Chip type:         ESP32-C3 (revision 3)
Crystal frequency: 40MHz
Flash size:        4MB
Features:          WiFi
MAC address:       84:f7:03:39:f1:cc
App/part. size:    516448/4128768 bytes, 12.51%
[00:00:01] ########################################      12/12      segment 0x0
[00:00:00] ########################################       1/1       segment 0x8000
[00:00:32] ########################################     269/269     segment 0x10000
Flashing has completed!
Commands:
    CTRL+R    Reset chip
    CTRL+C    Exit

ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x1 (POWERON),boot:0xc (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcd6100,len:0x172c
load:0x403ce000,len:0x928
0x403ce000 - .L17
    at ??:??
load:0x403d0000,len:0x2ce0
0x403d0000 - .L17
    at ??:??
entry 0x403ce000
0x403ce000 - .L17
    at ??:??
I (30) boot: ESP-IDF v4.4-dev-2825-gb63ec47238 2nd stage bootloader
I (30) boot: compile time 12:10:40
I (30) boot: chip revision: 3
I (33) boot_comm: chip revision: 3, min. bootloader chip revision: 0
I (41) boot.esp32c3: SPI Speed      : 80MHz
I (45) boot.esp32c3: SPI Mode       : DIO
I (50) boot.esp32c3: SPI Flash Size : 4MB
I (55) boot: Enabling RNG early entropy source...
I (60) boot: Partition Table:
I (64) boot: ## Label            Usage          Type ST Offset   Length
I (71) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (78) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (86) boot:  2 factory          factory app      00 00 00010000 003f0000
I (93) boot: End of partition table
I (98) boot_comm: chip revision: 3, min. application chip revision: 0
I (105) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=125f8h ( 75256) map
I (125) esp_image: segment 1: paddr=00022620 vaddr=3fc84588 size=01214h (  4628) load
I (126) esp_image: segment 2: paddr=0002383c vaddr=3fc9d958 size=00168h (   360) load
I (130) esp_image: segment 3: paddr=000239ac vaddr=40380000 size=04584h ( 17796) load
I (142) esp_image: segment 4: paddr=00027f38 vaddr=00000000 size=080e0h ( 32992)
I (152) esp_image: segment 5: paddr=00030020 vaddr=42000020 size=5e11ch (385308) map
I (215) boot: Loaded app from partition at offset 0x10000
I (215) boot: Disabling RNG early entropy source...
Wi-Fi set_configuration returned Ok(())
Is wifi started: Ok(true)
Start Wifi Scan
AccessPointInfo { ssid: "REDACTED1", bssid: [...], channel: 6, secondary_channel: None, signal_strength: -39, protocols: EnumSet(), auth_method: WPA2Personal }
AccessPointInfo { ssid: "REDACTED2", bssid: [...], channel: 6, secondary_channel: None, signal_strength: -39, protocols: EnumSet(), auth_method: WPAWPA2Personal }
AccessPointInfo { ssid: "REDACTED3", bssid: [...], channel: 11, secondary_channel: None, signal_strength: -81, protocols: EnumSet(), auth_method: WPA2Personal }
Ok(EnumSet(Client | AccessPoint))
Wi-Fi connect: Ok(())
Wait to get connected
Ok(true)
Wait to get an ip address
got ip Ok(IpInfo { ip: 192.168.1.123, subnet: Subnet { gateway: 192.168.1.4, mask: Mask(24) }, dns: Some(192.168.1.4), secondary_dns: None })
Start busy loop on main
Making HTTP request
HTTP/1.0 200 OK
X-Cloud-Trace-Context: b3a2f08c40d782146364b65262968b33
Server: Google Frontend
Content-Length: 335
Date: Tue, 26 Sep 2023 16:49:18 GMT
Expires: Tue, 26 Sep 2023 16:59:18 GMT
Cache-Control: public, max-age=600
ETag: "uJJDjQ"
Content-Type: text/html
Age: 0
<!DOCTYPE html>
<html>
<head>
    <title>Nothing here</title>
</head>
<body>
<pre>
    __________________________
    < Hello fellow Rustaceans! >
     --------------------------
            \
             \
                _~^~^~_
            \) /  o o  \ (/
              '_   -   _'
              / '-----' \
</pre>
</body>
</html>

Phew, well that was a lot of work, but with any luck it’s work we should only have to do once, and going forward the same project should – theoretically, if done from the same architecture – continue to compile and continue to flash, no matter how much time passes before returning to tinker.

As I’m sure is obvious, I’m no expert in Rust, embedded systems, electronics, or nix, so if you have suggestions for improvement, I’d love to hear about them in the comments section.

I’m not going to bother making a GitHub repo for these, since they require pinning specific versions of so many dependencies (which will likely be outdated or unrelated to your specific project), but below you can reference the final version of the relevant files. That’s all for now!

rust-toolchain.toml:

[toolchain]
channel = "nightly-2023-08-23"
components = ["rust-src"]
targets = ["riscv32imc-unknown-none-elf"]

Cargo.toml:

[package]
name = "http-client"
version = "0.1.0"
authors = ["Sergio Gasquez <sergio.gasquez@gmail.com>"]
edition = "2021"
license = "MIT OR Apache-2.0"
# TODO: Explain
resolver = "2"

# TODO: Explain
[profile.release]
# Explicitly disable LTO which the Xtensa codegen backend has issues
lto = "off"
opt-level = 3
[profile.dev]
lto = "off"

[dependencies]
hal             = { package = "esp32c3-hal", version = "0.12.0" }
esp-backtrace   = { version = "0.8.0", features = ["esp32c3", "panic-handler", "exception-handler", "print-uart"] }
esp-println     = { version = "0.6.0", features = ["esp32c3", "log"] }
esp-wifi        = { git = "https://github.com/esp-rs/esp-wifi/", features = ["esp32c3", "wifi-logs", "wifi"], rev = "e7140fd35852dadcd1df7592dc149e876256348f" }
smoltcp = { version = "0.10.0", default-features=false, features = ["proto-igmp", "proto-ipv4", "socket-tcp", "socket-icmp", "socket-udp", "medium-ethernet", "proto-dhcpv4", "socket-raw", "socket-dhcpv4"] }
embedded-svc = { version = "0.25.0", default-features = false, features = [] }
embedded-io = "0.4.0"
heapless = { version = "0.7.14", default-features = false }

[dev-dependencies]
compiler_builtins = "=0.1.100"
addr2line = "0.21.0"
allocator-api2 = "=0.2.15"
dlmalloc = "0.2.4"
fortanix-sgx-abi = "0.5.0"
getopts = "0.2.21"
hermit-abi = "=0.3.2"
libc = "=0.2.147"
miniz_oxide = "0.7.1"
object = "=0.32.0"
rustc-demangle = "0.1.23"
wasi = "0.11.0"
cc = "=1.0.79"
memchr = "=2.5.0"
unicode-width = "=0.1.10"

flake.nix:

{
  description = "Flake to accompany https://n8henrie.com/2023/09/compiling-rust-for-the-esp32-with-nix/";
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/release-23.05";
    rust-overlay = {
      url = "github:oxalica/rust-overlay";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = {
    self,
    nixpkgs,
    rust-overlay,
  }: let
    inherit (nixpkgs) lib;
    systems = ["aarch64-darwin" "x86_64-linux" "aarch64-linux"];
    systemClosure = attrs:
      builtins.foldl' (acc: system:
        lib.recursiveUpdate acc (attrs system)) {}
      systems;
  in
    systemClosure (
      system: let
        inherit ((builtins.fromTOML (builtins.readFile ./Cargo.toml)).package) name;
        pkgs = import nixpkgs {
          inherit system;
          overlays = [(import rust-overlay)];
        };
        toolchain = (
          pkgs.rust-bin.fromRustupToolchainFile ./rust-toolchain.toml
        );
        rustPlatform = let
          pkgsCross = import nixpkgs {
            inherit system;
            crossSystem = {
              inherit system;
              rustc.config = "riscv32imc-unknown-none-elf";
            };
          };
        in
          pkgsCross.makeRustPlatform
          {
            rustc = toolchain;
            cargo = toolchain;
          };
      in {
        packages.${system}.default = pkgs.callPackage ./. {
          inherit name rustPlatform;
        };

        devShells.${system}.default = pkgs.mkShell {
          buildInputs = [
            pkgs.cargo-espflash
            toolchain
          ];
        };

        apps.${system}.default = let
          flash = pkgs.writeShellApplication {
            name = "flash-${name}";
            runtimeInputs = [pkgs.cargo-espflash];
            text = ''
              espflash --monitor ${self.packages.${system}.default}/bin/${name}
            '';
          };
        in {
          type = "app";
          program = "${flash}/bin/flash-${name}";
        };
      }
    );
}

default.nix:

{
  lib,
  rustPlatform,
  name,
}: (rustPlatform.buildRustPackage
  {
    inherit name;
    src = lib.cleanSource ./.;
    cargoLock = {
      lockFile = ./Cargo.lock;
      outputHashes = {
        "esp-wifi-0.1.0" = "sha256-IUkX3inbeeRZk9q/mdg56h+qft+0/TVpOM4rCKNOwz8=";
      };
    };
    SSID = "foo";
    PASSWORD = "bar";
    doCheck = false;
  })

Currently working versions of the flake inputs:

$ nix flake metadata
Resolved URL:  git+file:///Users/n8henrie/git/no_std-training?dir=intro%2fhttp-client
Locked URL:    git+file:///Users/n8henrie/git/no_std-training?dir=intro%2fhttp-client
Description:   Flake to accompany https://n8henrie.com/2023/09/compiling-rust-for-the-esp32-with-nix/
Path:          /nix/store/dr1pc7kzsal5ndzwgj0lgypkr7fyvsiy-source
Last modified: 2023-09-18 00:51:10
Inputs:
├───nixpkgs: github:nixos/nixpkgs/43257a0d289e9f3fd5e3ad0dd022e911d9781a37
└───rust-overlay: github:oxalica/rust-overlay/23224b680af0b27b320adec2a0dae4eef29350e6
    ├───flake-utils: github:numtide/flake-utils/cfacdce06f30d2b68473a46042957675eebb3401
    │   └───systems: github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e
    └───nixpkgs follows input 'nixpkgs'

Finally, as noted above, I used esp-rs/no_std-training at commit 88bc692d81dfcf9491c80dc7c9e8601b702e465a. If at some point this repo (or esp-wifi) is taken down, I’ve made forks available at e.g. github.com/n8henrie/esp-wifi.

https://n8henrie.com/2023/09/compiling-rust-for-the-esp32-with-nix/
Cross-Compile Rust for x86 Linux from M1 Mac with Nix
nix, rust, linux, MacOS, tech

Bottom Line: Nix makes cross-compiling Rust fairly straightforward.

I have been tinkering with using nix to build Rust projects over the last couple of weeks and decided to try my hand at cross-compiling Rust for x86_64-linux from my M1 Mac (aarch64-darwin) via nix. Currently, several of my machines are running various flavors of NixOS (several aarch64-linux Raspberry Pis, a few x86_64-linux machines, an aarch64-linux Asahi-turned-NixOS machine, my MBP with nix-darwin), but it’s still really important for me to be able to compile for regular non-NixOS Linux machines.

Via rustup, rust does a great job providing toolchains to facilitate cross-compiling: simply rustup target add x86_64-unknown-linux-gnu. Unfortunately it doesn’t provide linkers, so even after you’ve added the toolchain, if you try to compile for linux, it’s not going to work:

$ cargo new linux-cross-example && cd linux-cross-example
$ rustup target add x86_64-unknown-linux-gnu
info: component 'rust-std' for target 'x86_64-unknown-linux-gnu' is up to date
$ cargo build --target=x86_64-unknown-linux-gnu
...
  = note: clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
          ld: unknown option: --as-needed
...

There are a number of workarounds, including downloading linkers via homebrew, various GitHub projects, using docker, or – my favorite – using zig to do the work for you (although this doesn’t seem to be working currently for musl targets, issue).

With nix, the current best practice seems to be having nix (as opposed to cargo / rust) do the heavy lifting of cross-compilation. Continuing in the linux-cross-example directory created above, I created a basic flake.nix, including these inputs:

inputs = {
    nixpkgs.url = "nixpkgs/nixos-unstable";
    rust-overlay.url = "github:oxalica/rust-overlay";
};

Perhaps the key feature of nix is ensuring reproducibility, so to that end, if readers are not having luck following this post, it may be necessary to pin the inputs to these specific revisions:

$ nix flake metadata
warning: Git tree '/Users/n8henrie/Desktop/linux-cross' is dirty
Resolved URL:  git+file:///Users/n8henrie/Desktop/linux-cross
Locked URL:    git+file:///Users/n8henrie/Desktop/linux-cross
Description:   Example of cross-compiling Rust on aarch64-darwin for x86_64-linux
Path:          /nix/store/d67wnc6v391x4gq5a24wzxbxxxfbvx07-source
Last modified: 1969-12-31 17:00:00
Inputs:
├───nixpkgs: github:nixos/nixpkgs/3c15feef7770eb5500a4b8792623e2d6f598c9c1
└───rust-overlay: github:oxalica/rust-overlay/a8b4bb4cbb744baaabc3e69099f352f99164e2c1
    ├───flake-utils: github:numtide/flake-utils/cfacdce06f30d2b68473a46042957675eebb3401
    │   └───systems: github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e
    └───nixpkgs: github:NixOS/nixpkgs/96ba1c52e54e74c3197f4d43026b3f3d92e83ff9

For our first trick, we’ll try to compile a hello world program for x86_64-unknown-linux-gnu, probably better known as “your run-of-the-mill standard Linux system.” Here is the rest of my flake.nix:

{
  inputs = {
    nixpkgs.url = "nixpkgs/nixos-unstable";
    rust-overlay.url = "github:oxalica/rust-overlay";
  };

  outputs = {
    self,
    nixpkgs,
    rust-overlay,
  }: let
    system = "aarch64-darwin";
    overlays = [(import rust-overlay)];
    pkgs = import nixpkgs {
      inherit overlays system;
      crossSystem = {
        config = "x86_64-unknown-linux-gnu";
        rustc.config = "x86_64-unknown-linux-gnu";
      };
    };
  in {
    packages.${system} = {
      default = self.outputs.packages.${system}.x86_64-linux-example;
      x86_64-linux-example = pkgs.callPackage ./. {};
    };
  };
}

To go along with the above, we’ll use this very simple default.nix (which is the file that will be called by “default” via pkgs.callPackage ./., or if one were to import a directory as in import ./.):

{rustPlatform}:
rustPlatform.buildRustPackage {
  name = "rust-cross-test";
  src = ./.;
  cargoLock.lockFile = ./Cargo.lock;
}

So currently our working directory looks like this:

$ tree .
.
├── Cargo.toml
├── default.nix
├── flake.nix
└── src
    └── main.rs

2 directories, 4 files

Amazingly, all that it takes for a successful build from here is to first run cargo update (or cargo build) to generate Cargo.lock, and then run nix build!

$ cargo update
$ nix build
warning: Git tree '/Users/n8henrie/Desktop/linux-cross' is dirty
warning: creating lock file '/Users/n8henrie/Desktop/linux-cross/flake.lock'
warning: Git tree '/Users/n8henrie/Desktop/linux-cross' is dirty
$ echo $?
0

We can see that the resulting file seems to have the expected architecture:

$ file result/bin/linux-cross-example
result/bin/linux-cross-example: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/2abz7cq1p8c1pg38prm2gpja67bzr9gq-glibc-x86_64-unknown-linux-gnu-2.37-8/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped

I used scp to copy the binary from ./result/bin/linux-cross-example to my Arch Linux machine. Unfortunately, upon trying to run it, I got a surprising error:

$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo
$ ./linux-cross-example
-bash: ./linux-cross-example: cannot execute: required file not found

Huh.

$ ldd ./linux-cross-example
        linux-vdso.so.1 (0x00007ffc363a4000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f5dc90ed000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f5dc8e00000)
        /nix/store/2abz7cq1p8c1pg38prm2gpja67bzr9gq-glibc-x86_64-unknown-linux-gnu-2.37-8/lib/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f5dc91a6000)

Huh, so it seems to be looking for ld-linux-x86-64.so.2 in /nix/store but isn’t finding it (because it’s not there).

Skipping back up a few lines, we actually already saw this path in the output from the file command, run locally on MacOS: dynamically linked, interpreter /nix/store/2abz7c...
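That check can be scripted so it runs before copying the binary anywhere. This is a minimal sketch that relies on the wording of the file output shown earlier (which may differ between versions of file); both helper names are made up for illustration.

```shell
# Hypothetical helper: pull the interpreter path out of a line of `file` output.
# Prints nothing when no "interpreter" field is present (e.g. static binaries).
interpreter_of() {
    printf '%s\n' "$1" | sed -n 's/.*interpreter \([^,]*\),.*/\1/p'
}

# Hypothetical helper: succeed if the interpreter lives in the nix store,
# which means a machine without that store path will not be able to find it.
needs_nix_store() {
    case "$(interpreter_of "$1")" in
        /nix/store/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would look something like needs_nix_store "$(file result/bin/linux-cross-example)" && echo "interpreter points into /nix/store".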

After a bit of investigative work, it seems that rust binaries are mostly statically linked by default, but do need to find a few libraries like glibc, which are dynamically linked. Nix is creating this binary in such a way that it is trying to find nix’s copy of this required file, but the nix version doesn’t exist on my Arch machine. Apparently most Linux machines put it in /lib64/ or perhaps /usr/lib64/; on my Arch machine, it looks like /lib64/ should work (which is a symlink to /usr/lib/):

$ stat /lib64/ld-linux-x86-64.so.2
  File: /lib64/ld-linux-x86-64.so.2
  Size: 216192    	Blocks: 424        IO Block: 4096   regular file
Device: 0,25	Inode: 17427431    Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-05 14:40:37.920798992 -0600
Modify: 2023-08-17 09:05:37.000000000 -0600
Change: 2023-08-22 14:10:02.729886634 -0600
 Birth: 2023-08-22 14:10:02.729886634 -0600

Thankfully, nix provides a tool called patchelf that can patch the binary to look in a non-default location for this required file. We’ll add it to default.nix:

{rustPlatform}:
rustPlatform.buildRustPackage {
  name = "rust-cross-test";
  src = ./.;
  cargoLock.lockFile = ./Cargo.lock;
  postBuild = ''
    patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 target/x86_64-unknown-linux-gnu/release/linux-cross-example
  '';
}

We’ll once again run nix build, use scp to copy the binary, and…

$ ./linux-cross-example
Hello, world!

Sweet, it works! We cross-compiled Rust from our M1 Mac to x86_64-linux with just a few lines of nix code!

For our next challenge, let’s see if we can build a fully static x86_64-unknown-linux-gnu binary! We’ll modify default.nix by removing the patchelf code (since this will be fully static and not require the --set-interpreter business) and adding a few lines of code from the same StackOverflow thread from above:

{
  rustPlatform,
  glibc,
}:
rustPlatform.buildRustPackage {
  name = "rust-cross-test";
  src = ./.;
  cargoLock.lockFile = ./Cargo.lock;
  buildInputs = [glibc.static];
  RUSTFLAGS = ["-C" "target-feature=+crt-static"];
}

$ nix build
$ file result/bin/linux-cross-example
result/bin/linux-cross-example: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), static-pie linked, for GNU/Linux 3.10.0, not stripped

It certainly looks like a static binary. Sure enough, it runs like a champ on our Linux machine!

$ ldd linux-cross-example
        statically linked
$ ./linux-cross-example
Hello, world!
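If you want to assert this in a script rather than eyeball it, a small check on the file output is enough. This sketch matches the two phrasings that appear in this post ("statically linked" and "static-pie linked"); other versions of file may word it differently, and is_static is a made-up helper name.

```shell
# Hypothetical helper: succeed if a line of `file` output describes a
# statically linked executable ("statically linked" or "static-pie linked").
is_static() {
    case "$1" in
        *"statically linked"*|*"static-pie linked"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_static "$(file result/bin/linux-cross-example)" || echo "still dynamically linked" could guard the scp step.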

For our last trick, we’ll try to compile a fully static musl build, which should run on basically any x86_64-linux machine. For this, we can revert our default.nix back to the very simple way it started:

{rustPlatform}:
rustPlatform.buildRustPackage {
  name = "rust-cross-test";
  src = ./.;
  cargoLock.lockFile = ./Cargo.lock;
}

And simply change flake.nix to reflect the musl target triple, setting isStatic = true;:

{
  inputs = {
    nixpkgs.url = "nixpkgs/nixos-unstable";
    rust-overlay.url = "github:oxalica/rust-overlay";
  };

  outputs = {
    self,
    nixpkgs,
    rust-overlay,
  }: let
    system = "aarch64-darwin";
    overlays = [(import rust-overlay)];
    pkgs = import nixpkgs {
      inherit overlays system;
      crossSystem = {
        config = "x86_64-unknown-linux-musl";
        rustc.config = "x86_64-unknown-linux-musl";
        isStatic = true;
      };
    };
  in {
    packages.${system} = {
      default = self.outputs.packages.${system}.x86_64-linux-musl-example;
      x86_64-linux-musl-example = pkgs.callPackage ./. {};
    };
  };
}

$ nix build
$ file result/bin/linux-cross-example
result/bin/linux-cross-example: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, not stripped

On the Arch machine:

$ ldd linux-cross-example
        statically linked
$ ./linux-cross-example
Hello, world!

Cool! It took a little bit of reading and tinkering to sort this out, but in the end it’s a remarkably simple setup requiring very few lines of code (at least for this hello world project). As a side note, I didn’t have any luck statically compiling for x86_64-unknown-linux-gnu with the isStatic setting; for me this results in an unsupported system error.

Putting everything together, with a little bit of refactoring, and adding a bonus config for aarch64-unknown-linux-musl (which runs without issue on an aarch64-linux Raspberry Pi):

{
  description = "Example of cross-compiling Rust on aarch64-darwin for x86_64-linux";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
    rust-overlay.url = "github:oxalica/rust-overlay";
  };

  outputs = {
    self,
    nixpkgs,
    rust-overlay,
  }: let
    system = "aarch64-darwin";
    overlays = [(import rust-overlay)];
    makePkgs = config:
      import nixpkgs {
        inherit overlays system;
        crossSystem = {
          inherit config;
          rustc = {inherit config;};
          isStatic = builtins.elem config [
            "aarch64-unknown-linux-musl"
            "x86_64-unknown-linux-musl"
          ];
        };
      };
  in {
    packages.${system} = {
      default = self.outputs.packages.${system}.x86_64-linux-gnu-example;
      x86_64-linux-gnu-example = (makePkgs "x86_64-unknown-linux-gnu").callPackage ./. {};
      x86_64-linux-gnu-static-example = (makePkgs "x86_64-unknown-linux-gnu").callPackage ./. {buildGNUStatic = true;};
      x86_64-linux-musl-example = (makePkgs "x86_64-unknown-linux-musl").callPackage ./. {};
      aarch64-linux-musl-example = (makePkgs "aarch64-unknown-linux-musl").callPackage ./. {};
    };
  };
}

{
  rustPlatform,
  glibc,
  targetPlatform,
  lib,
  buildGNUStatic ? false,
}:
rustPlatform.buildRustPackage ({
    name = "rust-cross-test";
    src = ./.;
    cargoLock.lockFile = ./Cargo.lock;
  }
  // (
    if buildGNUStatic
    then {
      buildInputs = [glibc.static];
      RUSTFLAGS = ["-C" "target-feature=+crt-static"];
    }
    else
      lib.optionalAttrs (targetPlatform.config == "x86_64-unknown-linux-gnu") {
        postBuild = ''
          patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 target/x86_64-unknown-linux-gnu/release/linux-cross-example
        '';
      }
  ))

From here, one should be able to nix build .#x86_64-linux-musl-example and be off to the races! And thanks to the power of nix, with any luck, and if you pin your inputs to the versions listed towards the beginning of this post, you should theoretically be able to rely on a successful build today, tomorrow, and maybe months, years, or – who knows – maybe even a decade from now!
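To build every target in one go, a short loop over the package names from the flake above works. This is only a sketch: the NIX variable is my own addition so the commands can be printed (NIX=echo) instead of run, and the result-<name> symlink naming is an arbitrary choice.

```shell
# Build each package defined in the flake above, giving every target its
# own result symlink so they do not overwrite one another.
# Set NIX=echo to print the commands instead of running them.
build_all() {
    for t in \
        x86_64-linux-gnu-example \
        x86_64-linux-gnu-static-example \
        x86_64-linux-musl-example \
        aarch64-linux-musl-example
    do
        ${NIX:-nix} build ".#$t" --out-link "result-$t" || return 1
    done
}
```

Running NIX=echo build_all first is a cheap way to confirm the attribute names before kicking off four cross builds.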

https://n8henrie.com/2023/09/crosscompile-rust-for-x86-linux-from-m1-mac-with-nix/
Hacking on nixpkgs with nix develop
linux, Mac OSX, MacOS, nix, tech

Bottom Line: Failing to add MPS support to nixpkgs.python310Packages.torch.

I’ve been using nix / nixpkgs / NixOS on and off for a year or two now. I’ve been genuinely impressed at how open and welcome the community is to contributions, so I’d like to start taking advantage of that by contributing back.

I’m writing this post as I go, in the spirit of this (highly recommended) series: https://ianthehenry.com/posts/how-to-learn-nix/.

The issue I’d like to work on is adding MPS support to the pytorch derivation: https://github.com/NixOS/nixpkgs/issues/243868; NB: the torch-bin derivation already has it.

(SPOILER: I was not successful, but I did learn a few things along the way.)

I start by cloning nixpkgs, and for the sake of reproducibility for readers in years to come (one of the big advantages of nix!) we’ll check out a specific commit (though one would normally stay on master in order to contribute a PR).

$ git clone https://github.com/NixOS/nixpkgs.git
$ cd nixpkgs
$ git checkout 7d053c812bb59bbb15293f9bb6087748e7c21b1a

First we’ll make sure we can build and run the current derivation (some context for this command):

$ nix shell "$(
    nix eval --raw --apply '
        py: (py.withPackages (pp: [ pp.torch ])).drvPath
    ' .#python310
)"
$ type -p python
/nix/store/1sgzgqbfyj8sn7rjzhvrzy1nj38cwfi1-python3-3.10.12-env/bin/python
$ python -c 'import torch; print(torch.__version__, torch.backends.mps.is_available())'
2.0.1 False
$ exit
$

Cool.

The nix build process is documented in a few places, to get an idea of what’s going on above:

While inside an interactive nix-shell, if you wanted to run all phases in the order they would be run in an actual build, you can invoke genericBuild yourself.

So basically, it looks like the process is:

  • run nix-shell -A foo <- this sources $stdenv/setup, which defines a bunch of phases
  • run genericBuild <- this was defined in the step above

Let’s try it:

$ # show that e.g. buildPhase and stdenv are undefined:
$ declare -p stdenv
bash: declare: stdenv: not found
$ declare -f buildPhase
$
$ nix-shell -A python3Packages.torch
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
$
$ # we can see that stdenv and buildPhase are now defined:
[nix-shell:~/git/nixpkgs]$ declare -p stdenv
declare -x stdenv="/nix/store/lplcqh67ldaj5f4pg4js2sgf860nn4iz-stdenv-darwin"
[nix-shell:~/git/nixpkgs]$ declare -f buildPhase
buildPhase ()
{
    runHook preBuild;
    if [[ -z "${makeFlags-}" && -z "${makefile:-}" && ! ( -e Makefile || -e makefile || -e GNUmakefile ) ]]; then
        echo "no Makefile or custom buildPhase, doing nothing";
    else
        foundMakefile=1;
        local flagsArray=(${enableParallelBuilding:+-j${NIX_BUILD_CORES}} SHELL=$SHELL);
        _accumFlagsArray makeFlags makeFlagsArray buildFlags buildFlagsArray;
        echoCmd 'build flags' "${flagsArray[@]}";
        make ${makefile:+-f $makefile} "${flagsArray[@]}";
        unset flagsArray;
    fi;
    runHook postBuild
}
[nix-shell:~/git/nixpkgs]$
[nix-shell:~/git/nixpkgs]$ # we can see where buildPhase is defined:
[nix-shell:~/git/nixpkgs]$ grep '^buildPhase' $stdenv/setup
buildPhase() {

Now we’ll try with the nix develop environment. You can add the --debug flag for a lot of extra information on what it’s doing.

$ # show that stdenv and buildPhase are undefined:
$ declare -p stdenv
bash: declare: stdenv: not found
$ declare -f buildPhase
$
$ nix develop .#python310Packages.pytorch
$ # it looks like the same environment is available:
$ declare -p stdenv
declare -x stdenv="/nix/store/lplcqh67ldaj5f4pg4js2sgf860nn4iz-stdenv-darwin"
$ declare -f buildPhase
buildPhase ()
{
    runHook preBuild;
    if [[ -z "${makeFlags-}" && -z "${makefile:-}" && ! ( -e Makefile || -e makefile || -e GNUmakefile ) ]]; then
        echo "no Makefile or custom buildPhase, doing nothing";
    else
        foundMakefile=1;
        local flagsArray=(${enableParallelBuilding:+-j${NIX_BUILD_CORES}} SHELL=$SHELL);
        _accumFlagsArray makeFlags makeFlagsArray buildFlags buildFlagsArray;
        echoCmd 'build flags' "${flagsArray[@]}";
        make ${makefile:+-f $makefile} "${flagsArray[@]}";
        unset flagsArray;
    fi;
    runHook postBuild
}

If we look at the definition of genericBuild in $stdenv/setup, we can see the list of phases that the manual keeps talking about:

if [ -z "${phases[*]:-}" ]; then
    phases="${prePhases[*]:-} unpackPhase patchPhase ${preConfigurePhases[*]:-} \
        configurePhase ${preBuildPhases[*]:-} buildPhase checkPhase \
        ${preInstallPhases[*]:-} installPhase ${preFixupPhases[*]:-} fixupPhase installCheckPhase \
        ${preDistPhases[*]:-} distPhase ${postPhases[*]:-}";
fi

In this case phases doesn’t seem to be defined…

$ declare | grep '^phases'
$

… so I assume it’s using the default phases listed above.
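
To make the default list a bit more concrete, here is a minimal sketch of how such a phase string can be walked. The phase bodies and myPreBuild are hypothetical stand-ins, the real ${name[*]:-} expansions are simplified to plain ${name:-}, and each phase is called directly as a function; this is an illustration, not the actual genericBuild.

```shell
# Hypothetical stand-ins for the real phase functions defined in $stdenv/setup.
unpackPhase()    { echo "unpackPhase"; }
patchPhase()     { echo "patchPhase"; }
configurePhase() { echo "configurePhase"; }
buildPhase()     { echo "buildPhase"; }
myPreBuild()     { echo "myPreBuild"; }

# Optional hook lists; unset or empty ones simply vanish from the final list.
preBuildPhases="myPreBuild"

# Shortened version of the default list (the real one uses ${name[*]:-}).
phases="${prePhases:-} unpackPhase patchPhase ${preConfigurePhases:-} \
    configurePhase ${preBuildPhases:-} buildPhase"

run_phases() {
    # Deliberately unquoted: word splitting turns the string into phase names.
    for curPhase in $phases; do
        "$curPhase"
    done
}

run_phases
```

The unset hook lists (prePhases, preConfigurePhases) contribute nothing, while myPreBuild is slotted in just before buildPhase.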

Let’s see what happens if we manually follow those phases:

$ ${prePhases[*]}
$ unpackPhase
unpacking source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source
source root is source
setting SOURCE_DATE_EPOCH to timestamp 315644400 of file source/version.txt
$ patchPhase
$ ${preConfigurePhases[*]}
$ configurePhase
no configure script, doing nothing
$ ${preBuildPhases[*]}
$ buildPhase
/nix/store/bq1q4gk52gsx4fg4pf07f2kxqgazkcls-python3-3.10.12/bin/python3.10: can't open file '/Users/n8henrie/git/nixpkgs/setup.py': [Errno 2] No such file or directory
CMake Error: The source directory "/Users/n8henrie/git/nixpkgs/build" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
no Makefile or custom buildPhase, doing nothing
bash: pushd: dist: No such file or directory
Bad wheel filename 'torch-2.0.1*.whl'
sed: can't read unpacked/torch-2.0.1/torch-2.0.1.dist-info/METADATA: No such file or directory
Traceback (most recent call last):
  File "/nix/store/bq1q4gk52gsx4fg4pf07f2kxqgazkcls-python3-3.10.12/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/nix/store/bq1q4gk52gsx4fg4pf07f2kxqgazkcls-python3-3.10.12/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/__main__.py", line 23, in <module>
    sys.exit(main())
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/__main__.py", line 19, in main
    sys.exit(wheel.cli.main())
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/cli/__init__.py", line 91, in main
    args.func(args)
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/cli/__init__.py", line 25, in pack_f
    pack(args.directory, args.dest_dir, args.build_number)
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/cli/pack.py", line 25, in pack
    for fn in os.listdir(directory)
FileNotFoundError: [Errno 2] No such file or directory: 'unpacked/torch-2.0.1'
bash: popd: directory stack empty

Huh, a bunch of errors there.

unpackPhase seems to create a directory ./source. My guess is that we’re supposed to cd here at some point. Sure enough, looking at the end of genericBuild, we see:

if [ "$curPhase" = unpackPhase ]; then
    [ -z "${sourceRoot}" ] || chmod +x "${sourceRoot}";
    cd "${sourceRoot:-.}";
fi;
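
That cd is easy to miss when running phases by hand. A small sketch of its effect, assuming a fake unpackPhase and buildPhase (both hypothetical stand-ins that only create or look for a setup.py):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the real unpackPhase: create ./source and record it.
unpackPhase() {
    mkdir -p source
    touch source/setup.py
    sourceRoot=source
}

# Hypothetical stand-in for a build that needs setup.py in the current directory.
buildPhase() {
    if [ -e setup.py ]; then echo "build ok"; else echo "setup.py not found"; fi
}

unpackPhase
before=$(buildPhase)    # still in the parent directory
cd "${sourceRoot:-.}"   # the step genericBuild performs after unpackPhase
after=$(buildPhase)

echo "before cd: $before"
echo "after cd: $after"
```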

So let’s try again:

$ declare -p sourceRoot
declare -- sourceRoot="source"
$ cd $sourceRoot
$ buildPhase

After which we see lots of build output, ending in a few errors:

-- Build files have been written to: /Users/n8henrie/git/nixpkgs/source/build
make[1]: Entering directory '/Users/n8henrie/git/nixpkgs/source/build'
make[1]: *** No targets specified and no makefile found.  Stop.
make[1]: Leaving directory '/Users/n8henrie/git/nixpkgs/source/build'
make: *** [Makefile:6: all] Error 2
bash: pushd: dist: No such file or directory
Bad wheel filename 'torch-2.0.1*.whl'
sed: can't read unpacked/torch-2.0.1/torch-2.0.1.dist-info/METADATA: No such file or directory
Traceback (most recent call last):
  File "/nix/store/bq1q4gk52gsx4fg4pf07f2kxqgazkcls-python3-3.10.12/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/nix/store/bq1q4gk52gsx4fg4pf07f2kxqgazkcls-python3-3.10.12/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/__main__.py", line 23, in <module>
    sys.exit(main())
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/__main__.py", line 19, in main
    sys.exit(wheel.cli.main())
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/cli/__init__.py", line 91, in main
    args.func(args)
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/cli/__init__.py", line 25, in pack_f
    pack(args.directory, args.dest_dir, args.build_number)
  File "/nix/store/gdh5vg1j8b4qmri26hzl520asq9j3h8a-python3.10-wheel-0.38.4/lib/python3.10/site-packages/wheel/cli/pack.py", line 25, in pack
    for fn in os.listdir(directory)
FileNotFoundError: [Errno 2] No such file or directory: 'unpacked/torch-2.0.1'
bash: popd: directory stack empty
$

Huh. Let’s dig deeper.

$ declare -p buildPhase
declare -- buildPhase="setuptoolsBuildPhase"
$ declare -f setuptoolsBuildPhase
setuptoolsBuildPhase ()
{
    echo "Executing setuptoolsBuildPhase";
    local args;
    runHook preBuild;
    cp -f /nix/store/fscd8f71wmpwphcmi5mx8qnif2402x9m-run_setup.py nix_run_setup;
    args="";
    if [ -n "$setupPyGlobalFlags" ]; then
        args+="$setupPyGlobalFlags";
    fi;
    if [ -n "$enableParallelBuilding" ]; then
        setupPyBuildFlags+=" --parallel $NIX_BUILD_CORES";
    fi;
    if [ -n "$setupPyBuildFlags" ]; then
        args+=" build_ext $setupPyBuildFlags";
    fi;
    eval "/nix/store/bq1q4gk52gsx4fg4pf07f2kxqgazkcls-python3-3.10.12/bin/python3.10 nix_run_setup $args bdist_wheel";
    runHook postBuild;
    echo "Finished executing setuptoolsBuildPhase"
}
$ cd $sourceRoot
$ setuptoolsBuildPhase
$ # this fails with the same error, obviously

Bummer. Digging into this a bit, it looks like nix develop may not be quite ready for primetime, with some differences in behavior compared to nix-shell.

Further, reading in this issue, it looks like sometimes a particular phase may be a variable and other times a function, sometimes with one overriding the other. This means that running buildPhase and eval "$buildPhase" may produce totally different results, which is why the manual suggests running these interactively as eval "${buildPhase:-buildPhase}".

I had to run an example to wrap my head around this:

$ # define a variable `foo`
$ foo='echo bar'
$ # define a function `foo`
$ foo() { echo baz; }
$ # `type` calls `foo` a function -- is this just because it was defined last?
$ type -t foo
function
$ # `eval` seems to run the variable, as if we had run `$ $foo`
$ eval "${foo:-foo}"
bar
$ # plain `foo` runs the function
$ foo
baz
$ foo='echo bar'
$ type -t foo
function
$ asdf() { echo asdfasdf; }
$ eval "${asdf:-asdf}"
asdfasdf
$ declare | grep '^foo'
foo='echo bar'
foo ()

I’m a little embarrassed that it wasn’t immediately obvious to me, but clearly eval "${foo:-foo}" means “run the variable if it’s defined, otherwise run the function,” so the variable should take priority if present.
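
The same idiom can be wrapped in a small helper. This is only a sketch; run_phase is a made-up name, not something defined by stdenv, and it assumes the phase name passed in is trusted (it goes through eval):

```shell
# Run a phase by name the way the manual's idiom does:
# use the variable's contents if set and non-empty, otherwise call the function.
run_phase() {
    local body
    eval "body=\${$1:-$1}"   # the variable's value, or the bare name as a fallback
    eval "$body"
}

foo='echo bar'              # variable named foo
foo() { echo baz; }         # function named foo
asdf() { echo asdfasdf; }   # function only, no variable

run_phase foo    # the variable wins
run_phase asdf   # no variable, so the function runs
```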

Going back to our nix-shell, we can see which functions seem to be overridden:

$ declare | grep -o '^\w[^= ]\+' | sort | uniq -d
buildPhase
checkPhase
installCheckPhase
installPhase

So it looks like buildPhase is the first phase that we need to inspect with echo "$buildPhase" in this case; the preceding phases are plain functions, so e.g. type configurePhase should work for them. Unfortunately, this doesn’t solve our issue, as manually running each phase as eval "${thePhase:-thePhase}" also fails.
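
The declare | grep pipeline works because declare with no arguments prints variables as name=value and functions as name (), so a name defined both ways shows up twice. A more explicit sketch of the same check follows; it is bash-specific, overridden_phases is a hypothetical helper name, and the phase definitions are made up to mimic the environment above:

```shell
# List names that are defined as BOTH a shell variable and a shell function.
# Bash-specific: relies on `declare -p` (variables) and `declare -F` (functions).
overridden_phases() {
    local name
    for name in "$@"; do
        if declare -p "$name" >/dev/null 2>&1 && declare -F "$name" >/dev/null; then
            echo "$name"
        fi
    done
}

# Hypothetical setup mimicking the nix-shell environment.
unpackPhase() { :; }
buildPhase()  { :; }
buildPhase="setuptoolsBuildPhase"

overridden_phases unpackPhase patchPhase buildPhase
```

Only buildPhase is reported: unpackPhase is a function without a variable, and patchPhase is not defined at all in this toy setup.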

Looking closer at the errors, there are several errors about protobuf paths:

source/third_party/protobuf/src/google/protobuf/implicit_weak_message.cc:31:10: fatal error: 'google/protobuf/implicit_weak_message.h' file not found
#include <google/protobuf/implicit_weak_message.h>

The file exists:

$ find . -name implicit_weak_message.h
./third_party/protobuf/src/google/protobuf/implicit_weak_message.h

Interestingly, it looks like nix-shell also fails if I run from this bash --norc environment:

$ nix-shell --pure \
    -I nixpkgs=https://github.com/nixos/nixpkgs/archive/nixpkgs-unstable.tar.gz \
    -A python310Packages.pytorch \
    '<nixpkgs>' \
    --command 'bash --norc'

I’m not sure what to make of this, other than that there may be some impurities in this derivation that are requiring paths or environment variables outside the nix store?

Let’s try cleaning the nix develop environment’s PATH of all non-/nix elements:

$ nix develop ~/git/nixpkgs#python310Packages.pytorch
$ # prints path delimited by linebreak instead of :
$ printf "${PATH//:/'\n'}\n"
$ export PATH=$(awk -v RS=: -v ORS=: '$0 ~ /^\/nix\/.*/' <<<"$PATH")
$ # verify it worked
$ printf "${PATH//:/'\n'}\n"
$ genericBuild
$ ls ../outputs/dist/
torch-2.0.1-cp310-cp310-macosx_11_0_arm64.whl

Huh, so getting rid of the non-nix paths allowed it to succeed. Let’s try with the --ignore-environment flag:

$ nix develop -i ~/git/nixpkgs#python310Packages.pytorch
$ export PATH=$(awk -v RS=: -v ORS=: '$0 ~ /^\/nix\/.*/' <<<"$PATH")
$ genericBuild
$ ls ../outputs/dist/
torch-2.0.1-cp310-cp310-macosx_11_0_arm64.whl

Sure enough that also works. I don’t know why building fails with bash --norc.
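
One caveat with the awk one-liner: because ORS is :, its output ends with a trailing colon, and an empty PATH entry is generally treated as the current directory. A sketch of an alternative filter in plain shell (filter_nix_path is a hypothetical name):

```shell
filter_nix_path() {
    # Keep only entries under /nix/ from a colon-separated list ($1).
    local rest="$1:" entry result=""
    while [ -n "$rest" ]; do
        entry=${rest%%:*}   # text before the first colon
        rest=${rest#*:}     # everything after it
        case $entry in
            /nix/*) result="${result:+$result:}$entry" ;;
        esac
    done
    printf '%s\n' "$result"
}

filter_nix_path "/usr/bin:/nix/store/aaa-foo/bin:/bin:/nix/var/nix/profiles/default/bin"
```

Something like export PATH=$(filter_nix_path "$PATH") would then stand in for the awk version; whether that changes the outcome of the build is untested here.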

I think that removing the non-/nix paths allows it to succeed because, with them present, the build finds xcrun at /usr/bin/xcrun, which points it to the system MacOS SDK:

$ xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk

where it finds the MetalPerformanceShaders framework and tries to enable MPS (one can see USE_MPS is enabled in the build logs). However, this framework (which seems to live within nixpkgs at nixpkgs#darwin.apple_sdk.frameworks) isn’t passed into the derivation, so it fails with ld: framework not found MetalPerformanceShaders.

Let’s try patching nixpkgs to pass in this framework:

diff --git a/pkgs/development/python-modules/torch/default.nix b/pkgs/development/python-modules/torch/default.nix
index 912628bf9497..eb452442201d 100644
--- a/pkgs/development/python-modules/torch/default.nix
+++ b/pkgs/development/python-modules/torch/default.nix
@@ -10,7 +10,7 @@

   # Build inputs
   numactl,
-  Accelerate, CoreServices, libobjc,
+  Accelerate, CoreServices, MetalPerformanceShaders, libobjc,

   # Propagated build inputs
   filelock,
@@ -27,6 +27,9 @@
   # this is also what official pytorch build does
   mklDnnSupport ? !(stdenv.isDarwin && stdenv.isAarch64),

+  # Use MPS on M1 machines
+  mpsSupport ? (stdenv.isDarwin && stdenv.isAarch64),
+
   # virtual pkg that consistently instantiates blas across nixpkgs
   # See https://github.com/NixOS/nixpkgs/pull/83888
   blas,
@@ -294,7 +297,8 @@ in buildPythonPackage rec {
     ++ lib.optionals rocmSupport [ openmp ]
     ++ lib.optionals (cudaSupport || rocmSupport) [ magma ]
     ++ lib.optionals stdenv.isLinux [ numactl ]
-    ++ lib.optionals stdenv.isDarwin [ Accelerate CoreServices libobjc ];
+    ++ lib.optionals stdenv.isDarwin [ Accelerate CoreServices libobjc ]
+    ++ lib.optionals mpsSupport [ MetalPerformanceShaders ];

   propagatedBuildInputs = [
     cffi
diff --git a/pkgs/top-level/python-packages.nix b/pkgs/top-level/python-packages.nix
index 088b79d86c37..1b91ef99a445 100644
--- a/pkgs/top-level/python-packages.nix
+++ b/pkgs/top-level/python-packages.nix
@@ -12681,7 +12681,7 @@ self: super: with self; {
       if pkgs.config.cudaSupport
       then pkgs.magma-cuda-static
       else pkgs.magma;
-    inherit (pkgs.darwin.apple_sdk.frameworks) Accelerate CoreServices;
+    inherit (pkgs.darwin.apple_sdk.frameworks) Accelerate CoreServices MetalPerformanceShaders;
     inherit (pkgs.darwin) libobjc;
     inherit (pkgs.llvmPackages_rocm) openmp;
   };

With this patch, the build succeeds, but MPS is still not present:

$ nix develop -i ~/git/nixpkgs#python310Packages.pytorch
$ export PATH=$(awk -v RS=: -v ORS=: '$0 ~ /^\/nix\/.*/' <<<"$PATH")
$ genericBuild
$ # ... build succeeds ...
$ ls ../outputs/dist
torch-2.0.1-cp310-cp310-macosx_11_0_arm64.whl

It would save a lot of time to figure out how to run the phases independently so that I can avoid needing to e.g. download / unpack the source code repeatedly (which requires replaying any modifications repeatedly). Reading nix develop --help, it seemed like I should be able to take advantage of built-in flags to run the appropriate phase, for example starting with nix develop --unpack ~/git/nixpkgs#python310Packages.torch. I struggled with this for a day or two because I kept trying to run this from a clean working directory (first running cd $(mktemp -d)) and ran into weird errors that referenced the code’s root directory (at ~/git/nixpkgs in this case). Reading the nix develop manpage didn’t help much.

I eventually figured out a few gotchas that – to me – were nonobvious:

  • that these commands should be run from the flake’s root directory, which in this case is the nixpkgs repo, which I’ve cloned to ~/git/nixpkgs
  • running unpackPhase twice in a row fails because ./source (and maybe ./outputs) already exists
  • running with a specified phase does not drop you into a nix environment for subsequent commands (such as manually running genericBuild)
    • nix develop ~/git/nixpkgs#python310Packages.torch followed by echo $IN_NIX_SHELL prints impure
    • nix develop --configure ~/git/nixpkgs#python310Packages.torch followed by echo $IN_NIX_SHELL doesn’t print anything
    • the same is true for e.g. nix develop ~/git/nixpkgs#python310Packages.torch --command bash -c 'echo $IN_NIX_SHELL', which prints impure while the command runs; running echo $IN_NIX_SHELL again afterwards prints nothing, which shows that you are not left in that environment after the nix develop command completes
  • running nix develop -i ~/git/nixpkgs#python3Packages.pytorch --unpack from any directory creates source at ~/git/nixpkgs/source, not in the current directory
    • this really confused me, as the command leaves you in your current directory, so unless you’re in the root directory, ./source never appears in your current working directory, but running the command twice fails with an error suggesting that it exists

Demonstrating the last point, and how behavior differs between entering the develop environment and manually running unpackPhase vs using flags like --unpack:

$ # Go to a clean temporary directory
$ cd $(mktemp -d)
$ ls -ld ./source ~/git/nixpkgs/source
ls: cannot access './source': No such file or directory
ls: cannot access '/Users/n8henrie/git/nixpkgs/source': No such file or directory
$ nix develop -i ~/git/nixpkgs#python3Packages.pytorch
$ ls -ld ./source ~/git/nixpkgs/source
ls: cannot access './source': No such file or directory
ls: cannot access '/Users/n8henrie/git/nixpkgs/source': No such file or directory
$ eval "${unpackPhase:-unpackPhase}"
$ # source is unpacked to PWD, not to ~/git/nixpkgs
$ ls -ld ./source ~/git/nixpkgs/source
ls: cannot access '/Users/n8henrie/git/nixpkgs/source': No such file or directory
drwxr-xr-x 80 n8henrie staff 2560 Jan  2  1980 ./source

Contrast that with:

$ # Go to a clean temporary directory
$ cd $(mktemp -d)
$ ls -ld ./source ~/git/nixpkgs/source
ls: cannot access './source': No such file or directory
ls: cannot access '/Users/n8henrie/git/nixpkgs/source': No such file or directory
$ nix develop -i ~/git/nixpkgs#python3Packages.pytorch --unpack
$ ls -ld ./source ~/git/nixpkgs/source
ls: cannot access './source': No such file or directory
drwxr-xr-x 80 n8henrie staff 2560 Jan  2  1980 /Users/n8henrie/git/nixpkgs/source

Keep in mind, this means that with the former approach, one can change to a clean workdir and repeat the exact same steps without error, but with the second approach, you’ll get an error if you don’t first remove ~/git/nixpkgs/source:

$ cd "$(mktemp -d)"
$ ls
$ nix develop -i ~/git/nixpkgs#python3Packages.pytorch --unpack
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
unpacking source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source
Cannot copy /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source to source: destination already exists!
Did you specify two "srcs" with the same "name"?
do not know how to unpack source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source

Note that the error above doesn’t mention anything about ~/git/nixpkgs, which threw me off.
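
Since the error doesn’t say where the conflict is, a pre-flight check can help. A minimal sketch, assuming (as observed above) that --unpack writes to the flake root’s source directory; check_unpack_target is a hypothetical helper, and the scratch directory only stands in for ~/git/nixpkgs:

```shell
check_unpack_target() {
    # $1: flake root, $2: unpack directory name (defaults to "source")
    local target="$1/${2:-source}"
    if [ -e "$target" ]; then
        echo "exists: $target"
        return 1
    fi
    echo "clear: $target"
}

# Demo with a scratch directory standing in for the flake root.
fake_root=$(mktemp -d)
check_unpack_target "$fake_root"          # nothing there yet
mkdir "$fake_root/source"
check_unpack_target "$fake_root" || true  # now reports the leftover directory
```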

Other notes worth mentioning:

  • the phases are not already defined within the context of --command (with or without -i), so one needs to source $stdenv/setup to get much done
  • --command runs within the context of the current working directory (not the flake root)
$ cd "$(mktemp -d)"
$ ls
$ nix develop ~/git/nixpkgs#python3Packages.pytorch \
    --command bash -c 'eval "${unpackPhase:-unpackPhase}"'
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
bash: line 1: unpackPhase: command not found
$ nix develop ~/git/nixpkgs#python3Packages.pytorch --command bash -c '
    source $stdenv/setup
    eval "${unpackPhase:-unpackPhase}"
'
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
unpacking source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source
source root is source
setting SOURCE_DATE_EPOCH to timestamp 315619200 of file source/version.txt
$ ls
source
  • commands like --configure don’t seem to respect overridden phases (such as a custom configurePhase) and run their phase within the context of the flake’s root directory. For example, I manually added to the torch derivation: configurePhase = ''echo "I am in configure!"; ls -l'';. With that change:
$ cd "$(mktemp -d)"
$ nix develop -i ~/git/nixpkgs#python310Packages.torch --configure
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
no configure script, doing nothing
$ # Add a fake executable named `./configure`:
$ touch ~/git/nixpkgs/configure
$ chmod +x !$
$ nix develop -i ~/git/nixpkgs#python310Packages.torch --configure
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
configure flags: --prefix=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/out --bindir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/out/bin --sbindir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/out/sbin --includedir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/dev/include --oldincludedir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/dev/include --mandir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/out/share/man --infodir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/out/share/info --docdir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/out/share/doc/python3.10-torch --libdir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/lib/lib --libexecdir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/lib/libexec --localedir=/private/var/folders/kb/tw_lp_xd2_bbv0hqk4m0bvt80000gn/T/tmp.aZbb96cCot/outputs/lib/share/locale
$
$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        eval "${configurePhase:-configurePhase}"
    '
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
I am in configure!
total 0

So I’m not exactly sure how nix develop ... --configure is supposed to work, since it seems to look for ./configure at ~/git/nixpkgs/configure, while it seems that --unpack is going to put it at ~/git/nixpkgs/source/configure, and unpackPhase puts it at $PWD/source/configure.

Perhaps the most reliable approach is just to run the following:

$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        eval "${unpackPhase:-unpackPhase}"
        cd $sourceRoot
        eval "${patchPhase:-patchPhase}"
        eval "${configurePhase:-configurePhase}"
    '
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
unpacking source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source
source root is source
setting SOURCE_DATE_EPOCH to timestamp 315619200 of file source/version.txt
I am in configure!
total 612
-rw-r--r--   1 n8henrie staff  4103 Jan  2  1980 BUCK.oss
-rw-r--r--   1 n8henrie staff 64723 Jan  2  1980 BUILD.bazel
$ # ... lots of other files, truncated...

Ok, so I think we’re back in business, realizing that we should probably either work from the rootdir of the repo in question or manually cd there beforehand. Note that we can’t necessarily cd $sourceRoot in the context of our --command, because in this case sourceRoot is defined somewhere during unpackPhase, so if we’re not running that phase, sourceRoot is undefined.
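
One way to cope with sourceRoot being unset is a fallback. A sketch, assuming the unpacked tree sits at ./source when unpackPhase hasn’t run in this shell; enter_source_root is a hypothetical helper:

```shell
enter_source_root() {
    # Prefer sourceRoot (set by unpackPhase); otherwise use the caller's guess.
    local dir="${sourceRoot:-${1:-source}}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    cd "$dir"
}

# Demo in a scratch directory with a fake, already-unpacked ./source.
scratch=$(mktemp -d)
mkdir "$scratch/source"
cd "$scratch"
unset sourceRoot
enter_source_root && basename "$PWD"
```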

Let’s start again, from the beginning:

$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        eval "${unpackPhase:-unpackPhase}"
        cd $sourceRoot
        eval "${patchPhase:-patchPhase}"
    '
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
unpacking source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source

Cool, so far so good.

$ cd source
$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        eval "${configurePhase:-configurePhase}"
    '
Checking if build backend supports build_editable ... done
Preparing editable metadata (pyproject.toml) ... |

This seems to hang here indefinitely. SMH. Adding export sourceRoot=. doesn’t help, nor does re-running patchPhase. Adding eval "${unpackPhase:-unpackPhase}" works, which is a pain since that’s what I’m trying to avoid running over and over again.

For whatever reason, export sourceRoot=source works when running from the parent directory:

$ ls -ld ./source
drwxr-xr-x 82 n8henrie staff 2624 Aug 18 11:03 ./source
$ nix develop ~/git/nixpkgs#python310Packages.torch     --command bash -c '
        source $stdenv/setup
        export sourceRoot=source
        eval "${patchPhase:-patchPhase}"
        eval "${configurePhase:-configurePhase}"
    '
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
no configure script, doing nothing

Phew, finally on to the buildPhase!

$ nix develop ~/git/nixpkgs#python310Packages.torch     --command bash -c '
        source $stdenv/setup
        export sourceRoot=source
        eval "${buildPhase:-buildPhase}"
    '
...
FileNotFoundError: [Errno 2] No such file or directory: 'setup.py'
...

Huh, so this expects to already be in source. Let’s try again:

$ cd source
$ nix develop ~/git/nixpkgs#python310Packages.torch --command bash -c '
    source $stdenv/setup
    export sourceRoot=source
    eval "${buildPhase:-buildPhase}"
'
Preparing editable metadata (pyproject.toml) ... -

This hangs indefinitely at this step again. I am so frustrated. Leaving sourceRoot undefined or export sourceRoot=. hangs at the same place. Going through the PATH fixes from above and using -i make no difference.

However, if I both export sourceRoot and cd there, we make progress (meaning I probably should go back and rerun patchPhase and configurePhase, since they would have run in the wrong directory):

$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        export sourceRoot=source
        cd $sourceRoot
        eval "${patchPhase:-patchPhase}"
        eval "${configurePhase:-configurePhase}"
    '
Executing setuptoolsShellHook
Finished executing setuptoolsShellHook
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
no configure script, doing nothing
$
$
$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        export sourceRoot=source
        cd $sourceRoot
        eval "${buildPhase:-buildPhase}"
    '
... tries to build...
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>
... and many other similar errors ..

Oh man, not this again. Running without -i makes no difference.

Maybe the PATH workaround (note the extra ' escaping)?

$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c $'
        export PATH=$(awk -v RS=: -v ORS=: \'$0 ~ /^\/nix\/.*/\' <<<"$PATH")
        source $stdenv/setup
        export sourceRoot=source
        cd $sourceRoot
        eval "${buildPhase:-buildPhase}"
    '
...
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>
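
On the quoting in that command: $'...' is bash’s ANSI-C quoting, where \' stands for a literal single quote (and sequences like \n are interpreted too). A small sketch comparing it with the more portable '\'' spelling, using a made-up awk command purely as sample text:

```shell
# ANSI-C quoting (bash): \' inside $'...' is a literal single quote.
ansi=$'awk \'{ print $1 }\''

# Portable spelling: close the quote, add an escaped quote, reopen.
classic='awk '\''{ print $1 }'\'''

# Both variables hold the same string.
printf '%s\n' "$ansi"
printf '%s\n' "$classic"
```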

What is going on here?!

$ rm -rf ./source ./outputs
$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c $'
        export PATH=$(awk -v RS=: -v ORS=: \'$0 ~ /^\/nix\/.*/\' <<<"$PATH")
        source $stdenv/setup
        eval "${unpackPhase:-unpackPhase}"
        cd $sourceRoot
        eval "${patchPhase:-patchPhase}"
        eval "${configurePhase:-configurePhase}"
        eval "${buildPhase:-buildPhase}"
    '
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>

Huh. So still stuck at the same error from way up above (which was like a week ago now). Running interactively worked before, so I guess we’ll do that.

$ nix develop -i ~/git/nixpkgs#python310Packages.torch
$ export PATH=$(awk -v RS=: -v ORS=: '$0 ~ /^\/nix\/.*/' <<<"$PATH")
$ source $stdenv/setup
$ export sourceRoot=source
$ cd source
$ eval "${buildPhase:-buildPhase}"
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>

How about without -i?

$ nix develop ~/git/nixpkgs#python310Packages.torch
$ export PATH=$(awk -v RS=: -v ORS=: '$0 ~ /^\/nix\/.*/' <<<"$PATH")
$ source $stdenv/setup
$ export sourceRoot=source
$ cd source
$ eval "${buildPhase:-buildPhase}"
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>

How about manually walking through everything, no PATH manipulation, and no -i?

$ cd "$(mktemp -d)"
$ nix develop ~/git/nixpkgs#python310Packages.torch
$ source $stdenv/setup
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Sourcing pip-install-hook
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
$ eval "${unpackPhase:-unpackPhase}"
unpacking source archive /nix/store/dxqxfw4r00s0v033w7yam3bkblynrad7-source
source root is source
setting SOURCE_DATE_EPOCH to timestamp 315619200 of file source/version.txt
$ cd $sourceRoot
$ eval "${patchPhase:-patchPhase}"
$ eval "${configurePhase:-configurePhase}"
no configure script, doing nothing
$ eval "${buildPhase:-buildPhase}"
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>

Trying the same approach in ~/git/nixpkgs gives the same file not found error. We showed above that genericBuild runs without this error. What is genericBuild doing that I’m missing?

$ echo $genericBuild
$ # empty output -- it is not defined as a variable, just a function
$ type genericBuild
genericBuild is a function
genericBuild ()
{
    export GZIP_NO_TIMESTAMPS=1;
    if [ -f "${buildCommandPath:-}" ]; then
        source "$buildCommandPath";
        return;
    fi;
    if [ -n "${buildCommand:-}" ]; then
        eval "$buildCommand";
        return;
    fi;
    if [ -z "${phases[*]:-}" ]; then
        phases="${prePhases[*]:-} unpackPhase patchPhase ${preConfigurePhases[*]:-}             configurePhase ${preBuildPhases[*]:-} buildPhase checkPhase             ${preInstallPhases[*]:-} installPhase ${preFixupPhases[*]:-} fixupPhase installCheckPhase             ${preDistPhases[*]:-} distPhase ${postPhases[*]:-}";
    fi;
    for curPhase in ${phases[*]};
    do
        if [[ "$curPhase" = unpackPhase && -n "${dontUnpack:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = patchPhase && -n "${dontPatch:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = configurePhase && -n "${dontConfigure:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = buildPhase && -n "${dontBuild:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = checkPhase && -z "${doCheck:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = installPhase && -n "${dontInstall:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = fixupPhase && -n "${dontFixup:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = installCheckPhase && -z "${doInstallCheck:-}" ]]; then
            continue;
        fi;
        if [[ "$curPhase" = distPhase && -z "${doDist:-}" ]]; then
            continue;
        fi;
        if [[ -n $NIX_LOG_FD ]]; then
            echo "@nix { \"action\": \"setPhase\", \"phase\": \"$curPhase\" }" >&"$NIX_LOG_FD";
        fi;
        showPhaseHeader "$curPhase";
        dumpVars;
        local startTime=$(date +"%s");
        eval "${!curPhase:-$curPhase}";
        local endTime=$(date +"%s");
        showPhaseFooter "$curPhase" "$startTime" "$endTime";
        if [ "$curPhase" = unpackPhase ]; then
            [ -z "${sourceRoot}" ] || chmod +x "${sourceRoot}";
            cd "${sourceRoot:-.}";
        fi;
    done
}

It looks like the buildCommand stuff can be ignored, and phases are not overridden:

$ declare | grep -e '^buildCommand' -e '^phases'

Going through the default phases, it looks like we might be missing a few!

$ declare | grep '^preConfigurePhase'
preConfigurePhases=' updateAutotoolsGnuConfigScriptsPhase'

Let’s see if we can find all the defined phases:

$ type genericBuild | awk -F= '/phases=/ { print $2 }' | awk -v RS=' +' '{ gsub(/[^[:alnum:]]/, ""); print}' | sort -u
buildPhase
checkPhase
configurePhase
distPhase
fixupPhase
installCheckPhase
installPhase
patchPhase
postPhases
preBuildPhases
preConfigurePhases
preDistPhases
preFixupPhases
preInstallPhases
prePhases
unpackPhase
$ phasenames=($(!!))
$ echo "${#phasenames[@]}"
16
$ echo "${phasenames[0]}"
buildPhase
$ for pn in "${phasenames[@]}"; do declare | grep -o "^${pn}"; done | sort -u
buildPhase
checkPhase
configurePhase
distPhase
fixupPhase
installCheckPhase
installPhase
patchPhase
preConfigurePhases
preDistPhases
preFixupPhases
unpackPhase

Wow, it looks like there are a lot of phases that are defined, but which we haven’t been using. So perhaps instead of trying to run all of these manually, if our goal is to avoid unpacking the source every time, we should take advantage of the dontUnpack variable, which is checked prior to running that phase.

$ nix develop .#python310Packages.torch
$ # source already exists:
$ ls -ld ./source
drwxr-xr-x 82 n8henrie staff 2624 Aug 18 12:23 ./source
$ dontUnpack=1
$ genericBuild
FileNotFoundError: [Errno 2] No such file or directory: 'setup.py'

Ah, I keep forgetting about sourceRoot being defined in unpackPhase.

$ nix develop .#python310Packages.torch
$ cd source
$ dontUnpack=1
$ genericBuild
$ # building proceeds

Hooray!
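
Why this works can be sketched with a toy version of the loop inside genericBuild. This is an illustration only: run_phases and the two stub phases are made-up names, not part of stdenv, while the skip guard and the eval line are copied from the function printed earlier.

```bash
# Stub phases standing in for the real stdenv functions (hypothetical).
unpackPhase() { echo "unpacking"; }
buildPhase() { echo "default build"; }

# run_phases: toy re-creation of the genericBuild loop (hypothetical name).
run_phases() {
    local curPhase
    for curPhase in unpackPhase buildPhase; do
        # Same guard as genericBuild: skip unpacking when dontUnpack is non-empty
        if [[ "$curPhase" = unpackPhase && -n "${dontUnpack:-}" ]]; then
            continue
        fi
        # Indirect expansion: if a *variable* named after the phase is set,
        # eval its contents; otherwise fall back to the *function* of that name
        eval "${!curPhase:-$curPhase}"
    done
}
```

Setting dontUnpack=1 before calling run_phases skips the first stub, and setting a variable such as buildPhase='echo custom build' replaces the stub function's output, which mirrors how a derivation can override a phase with a string.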

However, in the output we can see sdk version: 13.3 – which means it’s using my local system’s SDK, since that SDK version doesn’t exist in nixpkgs yet:

$ nix eval --json --apply 'builtins.attrNames' nixpkgs#darwin |
    jq -r '.[] | select(contains("sdk"))'
apple_sdk
apple_sdk_10_12
apple_sdk_11_0

Running with -i gives me the same result, so I probably need to clean my path. Maybe I can try again with the --norc approach, now that the phases issue has been figured out?

$ nix develop -i .#python310Packages.torch --command bash --norc
$ source $stdenv/setup
$ cd source
$ dontUnpack=1
$ genericBuild
fatal error: 'google/protobuf/any.h' file not found
#include <google/protobuf/any.h>

Nope, I guess not.

$ nix develop -i .#python310Packages.torch
$ export PATH=$(awk -v RS=: -v ORS=: '$0 ~ /^\/nix\/.*/' <<<"$PATH")
$ cd source
$ dontUnpack=1
$ genericBuild
...
-- MPS: unable to get MacOS sdk version
...

Huh, looking through the nix develop source, I found a new command: nix print-dev-env. Looks handy, seems like it should let you redirect the build environment to a file which you can then source in a regular shell.

Well, having exhausted all the troubleshooting steps I could come up with, I started a thread on the NixOS discourse, and so far nobody has pointed out an obvious mistake in my approach.

For the moment, this seems to work for a rebuild without repeating the “get the source code” step.

$ nix develop -i ~/git/nixpkgs#python310Packages.torch \
    --command bash -c '
        source $stdenv/setup
        dontUnpack=1
        export sourceRoot=source
        cd $sourceRoot
        genericBuild
    '

Unfortunately, this is roughly where our adventure ends. With the below patch, I override xcrun with nixpkgs’ xcbuild.xcrun, which should give it paths to the nixpkgs-based frameworks that I have included in the build inputs, such as MetalPerformanceShaders and MetalPerformanceShadersGraph. Unfortunately, it complains that the SDK is not new enough (requires MacOS SDK >=12.3, nixpkgs currently only provides 11).

diff --git a/pkgs/development/python-modules/torch/default.nix b/pkgs/development/python-modules/torch/default.nix
index 912628bf9497..e022974f5687 100644
--- a/pkgs/development/python-modules/torch/default.nix
+++ b/pkgs/development/python-modules/torch/default.nix
@@ -10,7 +10,7 @@

   # Build inputs
   numactl,
-  Accelerate, CoreServices, libobjc,
+  Accelerate, CoreServices, MetalPerformanceShaders, MetalPerformanceShadersGraph, libobjc, xcbuild,

   # Propagated build inputs
   filelock,
@@ -27,6 +27,9 @@
   # this is also what official pytorch build does
   mklDnnSupport ? !(stdenv.isDarwin && stdenv.isAarch64),

+  # Use MPS on M1 machines
+  mpsSupport ? (stdenv.isDarwin && stdenv.isAarch64),
+
   # virtual pkg that consistently instantiates blas across nixpkgs
   # See https://github.com/NixOS/nixpkgs/pull/83888
   blas,
@@ -190,6 +193,9 @@ in buildPythonPackage rec {
     substituteInPlace third_party/pocketfft/pocketfft_hdronly.h --replace '#if __cplusplus >= 201703L
     inline void *aligned_alloc(size_t align, size_t size)' '#if __cplusplus >= 201703L && 0
     inline void *aligned_alloc(size_t align, size_t size)'
+  '' + lib.optionalString mpsSupport ''
+    substituteInPlace CMakeLists.txt \
+      --replace 'xcrun' "${xcbuild.xcrun}/bin/xcrun"
   '';

   preConfigure = lib.optionalString cudaSupport ''
@@ -294,7 +300,8 @@ in buildPythonPackage rec {
     ++ lib.optionals rocmSupport [ openmp ]
     ++ lib.optionals (cudaSupport || rocmSupport) [ magma ]
     ++ lib.optionals stdenv.isLinux [ numactl ]
-    ++ lib.optionals stdenv.isDarwin [ Accelerate CoreServices libobjc ];
+    ++ lib.optionals stdenv.isDarwin [ Accelerate CoreServices libobjc ]
+    ++ lib.optionals mpsSupport [ MetalPerformanceShaders MetalPerformanceShadersGraph ];

   propagatedBuildInputs = [
     cffi

At this point, I made one more desperate attempt to force it to try to use the nixpkgs MPS libraries, changing only the following from the above diff:

+  '' + lib.optionalString mpsSupport ''
+    substituteInPlace CMakeLists.txt \
+      --replace 'bash -c "xcrun ' 'bash -c "${xcbuild.xcrun}/bin/xcrun ' \
+      --replace '"MPS_FOUND" OFF)' '"MPS_FOUND" ON)'

Re-running everything with this patch shows that it seems to work (it tries to use MPS):

$ rg -i '^--.*( use_| )mps' build.log
29:-- sdk version: 11.0, mps supported: OFF
30:-- MPSGraph framework not found
572:--   USE_MPS               : ON
619:-- sdk version: 11.0, mps supported: OFF
620:-- MPSGraph framework not found
912:--   USE_MPS               : ON

but the build ultimately fails with an enormous number of not-unexpected errors about MPS not having the same API it is expecting, in keeping with it being brought in from a far too old SDK.
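
The SDK gate described above (12.3 or newer required, 11.0 available) boils down to a dotted-version comparison. The sketch below is not PyTorch's actual CMake logic; version_at_least is a hypothetical helper that only illustrates why 11.0 fails the check while my local 13.3 would pass.

```bash
# version_at_least HAVE WANT: hypothetical helper.
# Succeeds when dotted version HAVE is greater than or equal to WANT,
# comparing up to three numeric components (missing ones count as 0).
version_at_least() {
    local -a have=() want=()
    IFS=. read -ra have <<<"$1"
    IFS=. read -ra want <<<"$2"
    local i h w
    for i in 0 1 2; do
        h=${have[i]:-0}
        w=${want[i]:-0}
        # 10# forces base 10 so a component like "08" is not read as octal
        if (( 10#$h > 10#$w )); then return 0; fi
        if (( 10#$h < 10#$w )); then return 1; fi
    done
    return 0
}
```

For example, version_at_least 11.0 12.3 fails, while version_at_least 13.3 12.3 succeeds.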

Here’s where I leave things for now, a little discouraged, but I have learned a fair amount about the nix build process in the meantime. Hope this is educational to someone out there!

https://n8henrie.com/2023/08/hacking-on-nixpkgs-with-nix-develop/
Write a Firefox Extension in Python
firefox javascript python tech

Bottom Line: One can write a Firefox extension in (mostly) Python via Pyodide.

Disclaimer: I’m not a huge fan of JavaScript, and I don’t use it much, so I am likely not following best practices. I’ve also never written a Firefox extension, so the below is pretty bare-bones, but hopefully enough to get you off the ground.

With the recent amazing advancements in Python and wasm, brought to us in large part by way of Pyodide (repo) and PyScript (repo), I thought it would be interesting to try to build a Firefox extension in Python.

I found a very helpful Medium article and corresponding GitHub repo for building a Chrome extension in Python, which provided some examples and a framework. I wanted to do things a little differently (for no good reason) – specifically, I didn’t want to rely on an html-based pop-up page, which that project uses to load all the JavaScript files.

I struggled to get PyScript to work in the way I wanted, but I was eventually able to get Pyodide to help me create an extension that contains its own Python wasm runtime (and therefore doesn’t need to load it from their web-hosted version and should be a little snappier to load in some cases).

To try out the toy extension:

  1. clone the example repo, which is at https://github.com/n8henrie/python_firefox_ext.git
  2. inspect (for safety) setup.sh (MacOS / probably Linux) or setup.ps1 (Windows), then run whichever matches your platform; this will download the necessary files from Pyodide so you can embed them in your extension.
    • You can also consider changing the script to download the debug version during development
  3. open Firefox to about:debugging
  4. click the link for This Firefox
  5. click Load Temporary Add-on...
  6. Select the manifest.json from the cloned repo

In short, I found that you can import pyodide.js in your manifest.json using a local path. That defines a function loadPyodide, which can accept an object with an indexURL argument. manifest.json then loads a local JavaScript file, hello.js, which calls loadPyodide with indexURL set to a local path to the rest of the necessary files.

From here, loading and running some Python is a little janky, but seems to work – I just read the contents of hello.py and pass it (as a string) to pyodide.runPython. One reason I wanted to structure things this way is it allows me to use my usual Python workflow to write / edit / lint / format the Python code.

In hello.py, I demonstrate very basic functionality for both a content_script extension, which can modify the content one sees, as well as a background extension, which has access to inspect, open, and close tabs (among many other things). To demonstrate the content_script functionality, the extension sets a red border around the currently open webpage. In manifest.json, I restrict the extension to only run this content script on n8henrie.com, so if you open a page to my site you should see a red border.

For the background script functionality, I print out a list of currently open tabs into the devtools console; to view this, click the Inspect button in Firefox’s about:debugging tab, then go to the Console tab. I also print out the current webpage’s URL, and open a new page to this blog post (which should get a red border).

This was a fun project, if a little frustrating to sort out (given my unfamiliarity with JavaScript). If you have any recommendations or other example projects, I’d love to hear about them in the comments below!

https://n8henrie.com/2023/06/write-a-firefox-extension-in-python/
Persistent VMs on pfSense with vm-bhyve
pfsense freebsd linux nix tech

Bottom Line: Configure your VMs to start and run automatically on pfSense with vm-bhyve.

20251104 Update: Changed /boot/loader.conf to /boot/loader.conf.local thanks to @gmipf, based on https://docs.netgate.com/pfsense/en/latest/config/advanced-tunables.html

If you’re trying to get a Linux VM running under bhyve on pfSense, I strongly recommend that you start with my first post on the topic. Once you have things running interactively, it’s time to try to get the VM to start and run automatically.

vm-bhyve is designed to help make the process a little easier, and it’s available in the default pfSense repo, which is highly convenient.

First off, it’s important to realize that pfSense apparently does not use the standard FreeBSD rc init system; I would have saved a lot of time had I realized this earlier, as it means that the default FreeBSD instructions on this topic, which advise adding a number of settings to /etc/rc.conf, won’t work.

Further, at boot time, you can automatically run scripts in /usr/local/etc/rc.d/, but they must end in .sh and be executable.
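
Those two requirements can be expressed as a quick check. This is a sketch of the rule as described here, not pfSense's actual boot code, and would_run_at_boot is a hypothetical helper name.

```bash
# would_run_at_boot: hypothetical helper.
# Succeeds only if the given path ends in .sh and is executable
# (the -x test follows symlinks, so a symlink to an executable counts).
would_run_at_boot() {
    case "$1" in
        *.sh) [ -x "$1" ] ;;
        *) return 1 ;;
    esac
}
```

Something like would_run_at_boot /usr/local/etc/rc.d/vm.sh && echo OK can confirm a script qualifies before rebooting.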

With these two facts in mind, the rest wasn’t too difficult. As a reminder, I’m using bash as my shell, and all commands below are being run as root (I’m using $ instead of # in codeblocks below for better markdown syntax highlighting.)

You’ll probably want to keep vm-bhyve’s GitHub page open to reference its documentation.

  1. Install vm-bhyve with pkg install vm-bhyve
  2. Using the web interface, configure pfSense to load the necessary kernel modules on boot by adding the following to /boot/loader.conf.local (following the official FreeBSD instructions, though I didn’t need nmdm_load):
    vmm_load="YES"
    if_bridge_load="YES"
    if_tap_load="YES"
    
  3. Configure pfSense to bring up your TAP interface on boot:
    • System -> Advanced -> System Tunables (/system_advanced_sysctl.php)
    • + New
    • Tunable: net.link.tap.up_on_open
    • Value: 1
    • Description: Open TAP on boot for vm-bhyve
  4. This is probably a good time to reboot, which should load / activate the above settings and make sure they are working
  5. Symlink vm-bhyve’s rc script to something pfSense will run:
    $ ln -s /usr/local/etc/rc.d/vm /usr/local/etc/rc.d/vm.sh
    
  6. Edit /etc/rc.conf.local and add vm-bhyve’s config. The contents of mine are the following lines, yours may vary:
    vm_enable="YES"
    vm_dir="zfs:pfSense/vm"
    vm_list="nixos0"
    vm_delay="5"
    
  7. You may need to remove the remnants of the zfs virtual drive created in the prior post:
    $ zfs destroy pfSense/vm/nixos0
    
  8. Create and populate vm-bhyve’s directory structure: $ vm init
  9. Copy the UEFI file we downloaded in the last post to a place that vm-bhyve will look for it:
    $ cp BHYVE_UEFI.fd /pfSense/vm/.config/
    
  10. Next you need to configure the VM
    1. Start by looking through a few of the samples:
    2. Next, I downloaded and then edited the default config:
       $ curl 'https://raw.githubusercontent.com/churchers/vm-bhyve/master/sample-templates/default.conf'  > /pfSense/vm/.templates/nixos.conf
       $ vim /pfSense/vm/.templates/nixos.conf
      

    My config ultimately ended up looking like this:

     loader="uefi-custom"
     cpu=1
     memory=1024M
     network0_type="virtio-net"
     network0_switch="public"
     disk0_type="virtio-blk"
     disk0_dev="sparse-zvol"
     disk0_name="nixos0"
     disk0_size="16G"
     graphics="no"
    
  11. I had trouble with the vm-bhyve console until I configured it to run in tmux; if you’re not a tmux user maybe skip this: $ vm set console="tmux"
  12. Create a “manually” managed switch (since we’ve configured it in pfSense in the prior post, and pfSense will manage it)
    $ vm switch create -t manual -b bridge0 public
    
  13. Create a VM named nixos0 based on your customized nixos template:
    $ vm create -t nixos nixos0
    
  14. Tell vm-bhyve to download the installer ISO:
    $ vm iso https://releases.nixos.org/nixos/22.11/nixos-22.11.2979.47c00341629/nixos-minimal-22.11.2979.47c00341629-x86_64-linux.iso
    
  15. Install in the foreground using your currently active terminal session. Note that immediately after running this command I had to hold down the down arrow key and keep tapping it for 15 seconds or so, during which the entire SSH session seemed to be frozen. Afterwards, it comes up with the option to go into Accessibility and redirect its output to the serial console, just like in the last post.
    $ vm install -f nixos0 nixos-minimal-22.11.2979.47c00341629-x86_64-linux.iso
    
  16. At this point, you should be able to get a shell, sudo su to elevate, systemctl start sshd, passwd to set a root password, ip addr to get your IP address, and you’re off to the races! (Don’t forget to add boot.kernelParams = [ "console=ttyS0" ]; if using NixOS, and you’ll need SSH access configured so you can access the machine once it’s booting in the background)
  17. Once you’ve completed your installation, see if your VM comes up (make sure to give it a minute):
    $ vm poweroff nixos0
    $ vm start -f nixos0
    
  18. If that works, try SSH access. If that also works, try rebooting pfSense – your VM should automatically start in the background a few seconds after bootup is complete, at which point you should be able to connect via SSH (check your pfSense logs for the IP address, for which you might want to add a DHCP reservation at this point).
  19. If you’ve made it to this point, everything seems to be working. Congratulations! All that’s left is to make a snapshot of your working VM, which I suppose you should be able to zfs send to another machine, or to which you can roll back if something goes wrong in the future:
$ vm stop nixos0
$ vm snapshot nixos0@booting
$ vm info nixos0
------------------------
Virtual Machine: nixos0
------------------------
  state: stopped
  datastore: default
  loader: uefi-custom
  uuid: 0b49b710-e2ce-11ed-b922-00e0672a504a
  uefi: default
  cpu: 1
  memory: 1024M

  network-interface
    number: 0
    emulation: virtio-net
    virtual-switch: public
    fixed-mac-address: 58:9c:fc:03:dc:eb
    fixed-device: -

  virtual-disk
    number: 0
    device-type: sparse-zvol
    emulation: virtio-blk
    options: -
    system-path: /dev/zvol/pfSense/vm/nixos0/nixos0
    bytes-size: 17179869184 (16.000G)
    bytes-used: 1650454528 (1.537G)

  snapshots
    pfSense/vm/nixos0@booting	0	Mon Apr 24 15:25 2023
    pfSense/vm/nixos0/nixos0@booting	0	Mon Apr 24 15:25 2023

I hope you’ve found this useful – I still have a lot to learn, so if you see any major missteps or recommendations for improvement please let me know in the comments.

https://n8henrie.com/2023/03/persistent-vms-on-pfsense-with-vmbhyve/
Running NixOS and Ubuntu VMs on pfSense via bhyve
pfsense freebsd linux nix tech

Bottom Line: With some effort, I got VMs running on my pfSense router/firewall.

Preface

Being a hobbyist without much experience in networking, this project took me a fair amount of effort over a period of weeks. It is mostly proof-of-concept, and I think there are good reasons not to use your firewall to host virtual machines. I’m sure this could impair performance or even compromise the security and integrity of your network. If you decide to give this a shot, caveat emptor! If you have recommendations for improving the process, please let me know in the comments.

NB: I’m booting the VM via UEFI and using ZFS for storage, so you may need to make adjustments if this is incompatible with your setup.

Spoiler alert: You might want to scroll down and read the Fixing DNS part first, since it requires changing a setting that requires a reboot, and you basically have to start from scratch after a reboot.

Introduction

I started using pfSense firewalls a year or so ago, and I’ve been overall very happy with them. I put one on a pre-built device from Amazon that ran a couple hundred dollars, and after warming up to the configuration and a few power-user options, I bought a used Lenovo M93P for $80 US, designed and printed a custom bracket for a few SSDs, and installed pfSense on a mirrored ZFS root there as well. My internet speeds went from ~300 mbps (I always forget whether that’s supposed to be capitalized or not) with the commercial router I’d been using to a full 1 gbps with the cheaper used hardware setup, which was fantastic! I also have unbound doing local DNS resolution for performance and privacy, pfblocker-ng for network-wide adblocking and improved security, tailscale and wireguard, automatic config backups, bandwidthd, iperf… lots of great stuff.

My only beef with pfSense is that I don’t know FreeBSD as well as I do Linux, so when I want to do something simple like set up a little python service, I’m kind of lost. Because pfSense is somewhat locked down (for security purposes), it’s harder than plain FreeBSD to install freely available FreeBSD packages.

After a year or so of stable performance and no major issues, and having heard good things about the bhyve hypervisor, I thought I would try my hand at installing a Linux VM, which would hopefully let me use my Linux knowledge while still getting the benefits of the pfSense host.

This article will mostly be me trying to adapt the instructions from https://people.freebsd.org/~blackend/doc/handbook/virtualization-host-bhyve.html, which are for FreeBSD but not necessarily pfSense, and troubleshooting issues I found along the way.

Preparation

First of all, there are a few general notes and debugging steps I used along the way that might have saved me a lot of time and effort had I adhered to them from the beginning:

  • If you see it below, 192.168.0.2 is my pfSense router’s LAN address. It’s running a DHCP server and local DNS resolution via unbound.
  • pfSense’s firewall filters and rules do not like it when you change things from the CLI. When it seems like something isn’t working that was just working a minute ago, especially after a reboot, go back to the interfaces page at /interfaces_assign.php, click each interface in question as if to edit its configuration, change nothing, then click Save. Then do the same for the bridge at /interfaces_bridge.php. Then do the same for each relevant firewall rule at /firewall_rules.php. Then go to Status -> Filter Reload (/status_filter_reload.php) and reload the firewall. Several times this process got things working; I think it helps things re-sync after you change things from the command line.
  • Anywhere possible, use tmux sessions to which you can reconnect (tmux attach), since you may end up interrupting your SSH connection repeatedly with all the firewall flushes and rule changes.
    • pfSense: pkg install tmux
    • Ubuntu server: pre-installed
    • NixOS: nix-shell -p tmux
      • You’ll first need to get an IP address and possibly need to specify an alternative DNS server by adding e.g. nameserver 1.1.1.1 to /etc/resolv.conf
  • In Ubuntu, don’t forget to disable and flush the firewall when troubleshooting:
    • ufw disable; iptables -F
  • When in doubt, use tcpdump (preinstalled on pfSense and Ubuntu, nix-shell -p tcpdump) on the host and guest to determine if packets are being sent and received as expected. A few useful flags:
    • -i enp0s2: only look at interface enp0s2
    • -XXvv: greatly increase verbosity and show the text content of the packet
    • host 192.168.0.2 and udp: only packets that involve 192.168.0.2 and are udp
    • src 192.168.0.2 and udp: only packets that are from 192.168.0.2 and are udp
    • ether host 00:a0:98:c9:2a:33: filter by mac address

Without further delay, starting in the pfSense CLI:

  1. Follow the FreeBSD.org instructions to ensure your CPU is compatible and the proper BIOS settings are enabled. My pre-built device was ready to rock, but my Lenovo device did not have the appropriate BIOS settings. If the below awk script prints OK you should be set.
    $ awk < /var/run/dmesg.boot '
     /Features2.*POPCNT/ { popcnt=1 }
     /VT-x.*EPT.*UG/ { vtx=1 }
     /VT-x.*UG.*EPT/ { vtx=1 }
     popcnt && vtx { print "OK"; exit }
     '
    
  2. Ensure bhyve is installed: bhyve --help
  3. Follow the freebsd.org instructions to:
    1. Load the kernel module: kldload vmm
    2. Create a TAP device for your VM: ifconfig tap0 create
    3. Enable the TAP device: sysctl net.link.tap.up_on_open=1
    4. Stop here and skip to Creating a FreeBSD Guest. Specifically, do not create the bridge or do any bridge steps from the CLI.
    5. Create a dataset for VMs and a 16gb zvol inside that for this VM’s storage disk, named nixos0 in this case:
      $ zfs create pfSense/vm
      $ zfs create -V16G -o volmode=dev pfSense/vm/nixos0
      
    6. Because – for the moment – these have to be repeated (once) after every reboot, I saved these in a script named prepare.sh:
     #!/usr/bin/env bash
    
     main() {
         kldload vmm
         ifconfig tap0 create
         sysctl net.link.tap.up_on_open=1
     }
     main "$@"
    
  4. Download the ISO image for your distro of choice:
  5. Preparing for UEFI booting was a little tricky, because pfSense doesn’t include edk2-bhyve in its repos. We need this to get a copy of BHYVE_UEFI.fd, which is required for UEFI booting. This was the inspiration for my recent post on installing FreeBSD packages on pfSense; please refer there for the install_from_freebsd function that you’ll need below.
    1. install_from_freebsd edk2-bhyve
    2. Copy the file to a safe place: cp /usr/local/share/uefi-firmware/BHYVE_UEFI.fd .
      • FWIW, mine has the sha256 7f93ab9fbd196c61b4a9e7040e94647b30d23acae14c2157fb015b223a9c8d5d
    3. You can now remove edk2-bhyve, that’s all we needed: pkg remove edk2-bhyve
  6. Using a minimally modified command from the FreeBSD instructions, start the installer image in a VM. Because I ran this many times, I saved it in a script named run.sh. You may need to alter the paths to your installer ISO, to the .fd file, etc.
#!/usr/bin/env bash

bhyve -A -H -P -D \
    -c 2 \
    -m 1024M \
    -s 0:0,hostbridge \
    -s 1:0,lpc \
    -s 2:0,virtio-net,tap0 \
    -s 3:0,ahci-cd,./nixos-minimal-22.11.1705.b83e7f5a04a-x86_64-linux.iso \
    -s 4:0,virtio-blk,/dev/zvol/pfSense/vm/nixos0 \
    -l com1,stdio \
    -l bootrom,./BHYVE_UEFI.fd \
    nixos0
    # Easily copy and paste these above to switch distros
    # -s 3:0,ahci-cd,./nixos-minimal-22.11.1705.b83e7f5a04a-x86_64-linux.iso \
    # -s 3:0,ahci-cd,./ubuntu-22.04.1-live-server-amd64.iso \
  7. ./run.sh and you should see the installer image start booting.
  8. I was unable to complete the boot process for either image initially and had to take an extra step or two to enable serial output:
    • NixOS:
      1. Hit an uninteresting key a few times (like the down arrow)
      2. When able, arrow down to HiDPI, Quirks and Accessibility
      3. From this submenu, choose Serial console=ttyS0,115200n8
      4. Continue the boot process
    • Ubuntu has some weird keybindings, so be careful not to mistype:
      1. Arrow to Try or Install Ubuntu Server
      2. Hit the letter e
      3. Arrow down to the line with linux
      4. Hit ctrl-e to jump to the end of the line (after ---)
      5. Add console=ttyS0
      6. Hit ctrl-x to boot
      7. If you mess up, hit F2 and try again
      8. Once the boot process is complete and you see Continue in rich mode, hit F2 to get a shell
  9. Run ip addr and note that you probably don’t have an IP address.
  10. As an aside, if you need to reboot the VM, I had to run bhyvectl --destroy --vm=nixos0 from pfSense prior to being able to boot the VM a second time.
Networking

Next we want to give this VM access to the LAN; run the following steps from the pfSense web interface. I’ll try to list both the link name as well as the (/url.php) for these, since navigating the nested menus can be tough.

For much of this, I was following this helpful thread on the Netgate forum.

Assign tap0 to an interface
  1. Interfaces -> Assignments (/interfaces_assign.php)
  2. Available network ports: -> tap0 -> Add
  3. Click the new interface to edit its configuration (/interfaces.php?if=opt1, mine was automatically named OPT1)
  4. Check the box to Enable interface
  5. Change description to TAP0
  6. Leave remaining defaults, Save, Apply Changes
Create a bridge with LAN and TAP0
  1. Interfaces -> Assignments -> Bridges (/interfaces_bridge.php)
  2. Add
  3. Select both LAN and TAP0
  4. Save
Create an “allow all” firewall rule
  1. Firewall -> Rules -> TAP0 (/firewall_rules.php?if=opt1, not sure why the URL doesn’t update with the new name)
  2. Add
    1. Interface -> TAP0
    2. Address Family -> IPv4+IPv6
    3. Protocol -> Any
    4. Save and Apply
Test DHCP
  1. Return to your VM and see if you can get an IP address:
    • Ubuntu: dhclient -v enp0s2
    • NixOS: sudo systemctl restart dhcpcd
  2. Hopefully ip addr now shows an IP address on your LAN!
  3. From here, I found it much easier to SSH directly to the VM guest
    • Ubuntu
      1. set a password for root with passwd
      2. enable SSH password authentication for root by changing PermitRootLogin to yes in /etc/ssh/sshd_config
      3. systemctl restart ssh to pick up the new settings
      4. From your main workstation ssh root@your_guest_ip_address
    • NixOS
      1. set a password for root: sudo passwd root
      2. From your main workstation ssh root@your_guest_ip_address
Fixing DNS

At this point, I found that I could:

  • get an IP address via DHCP (in the LAN subnet)
  • ping both internal and external hosts by IP address, including the host pfsense machine at 192.168.0.2
  • send and receive TCP and UDP data with netcat to both internal and external hosts
  • resolve DNS using an external DNS resolver (e.g. host n8henrie.com 1.1.1.1)

However, for some bizarre reason, I couldn’t use my local DNS from the pfSense host:

$ host -4 n8henrie.com 192.168.0.2
;; connection timed out; no servers could be reached

I didn’t see any relevant blocked packets in /var/log/filter.log (or in the GUI), and the weirdest part was that I could see the responses – including the properly resolved IP address:

  • First, in the VM guest, start requesting DNS resolution for n8henrie.com every second with a 1 second timeout: watch host -W1 -4 n8henrie.com 192.168.0.2
  • CLI (from pfSense): tcpdump -i tap0 src vm_ip_address and udp, note requests to resolve n8henrie.com
  • GUI: Firewall -> pfblockerng -> Reports -> DNS Reply (/pfblockerng/pfblockerng_alerts.php?view=reply), note properly resolved requests to n8henrie.com

Even more strange was that I could see the DNS reply in the VM as well:

  1. Open 2 panes in tmux
  2. Pane 1: watch host -W1 -4 n8henrie.com 192.168.0.2
  3. Pane 2:
    root@ubuntu-server:/# tcpdump -vv -i enp0s2 host 192.168.0.2 and udp
    11:46:49.569313 IP (tos 0x0, ttl 64, id 34767, offset 0, flags [none], proto UDP (17), length 58)
     192.168.0.202.44462 > 192.168.0.2.domain: [udp sum ok] 59526+ A? n8henrie.com. (30)
    11:46:49.576273 IP (tos 0x0, ttl 64, id 30262, offset 0, flags [none], proto UDP (17), length 90)
     192.168.0.2.domain > 192.168.0.202.44462: [bad udp cksum 0x8274 -> 0x871c!] 59526 q: A? n8henrie.com. 2/0/0 n8henrie.com. A 104.21.37.209, n8henrie.com. A 172.67.213.115 (62)
    

I got stuck here for over a week and just could not figure out why DNS resolution was fine from a remote DNS server but not my VM host, with the same behavior in both NixOS and Ubuntu. I tried asking on r/PFSENSE, StackExchange, and the Netgate forums (the last of which I eventually deleted with zero responses in a week or so).

Finally, this morning I took a closer look at the tcpdump output from the guest, increasing verbosity with -XXvv and comparing it to the response for an identical request on one of my other machines (which was working fine with the same DNS server). I noticed a lot of bad udp cksum messages in the VM, whereas the other machine showed udp sum ok throughout.
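
To make that comparison less of an eyeball exercise, the checksum status can be tallied from the tcpdump output. The following is only a minimal sketch, not something from the original troubleshooting session: count_udp_cksum is a made-up helper name, and it simply counts lines containing the two strings that tcpdump -vv printed above.

```shell
# Hypothetical helper: tally UDP checksum status in `tcpdump -vv` output
# read from stdin. It only matches the literal strings "bad udp cksum"
# and "udp sum ok" that appear in the captures shown above.
count_udp_cksum() {
    awk '
        /bad udp cksum/ { bad++ }
        /udp sum ok/    { ok++ }
        END { printf "ok=%d bad=%d\n", ok, bad }
    '
}
```

On the guest, something like tcpdump -vv -l -c 20 -i enp0s2 host 192.168.0.2 and udp | count_udp_cksum should then print a single summary line after 20 packets; a machine without the problem would be expected to report bad=0.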

With a bit of searching, I eventually came across this SO thread, which led me to this article from wireshark.org, and finally I came across https://docs.netgate.com/pfsense/en/latest/virtualization/virtio.html:

With the current state of VirtIO network drivers in FreeBSD, it is necessary to disable hardware checksum offload to reach systems (at least other VM guests, possibly others) protected by pfSense software directly from the VM host.

Sure enough, after going to System -> Advanced -> Networking (/system_advanced_network.php), checking the box to disable Hardware Checksum Offloading, rebooting, and going through the above steps again, I was delighted to see:

[root@nixos:~]# host -4 n8henrie.com 192.168.0.2
Using domain server:
Name: 192.168.0.2
Address: 192.168.0.2#53
Aliases:

n8henrie.com has address 104.21.37.209
n8henrie.com has address 172.67.213.115
n8henrie.com has IPv6 address 2606:4700:3037::6815:25d1
n8henrie.com has IPv6 address 2606:4700:3037::ac43:d573

Phew!

From here, you should be able to follow your choice of installation guides, such as https://nixos.wiki/wiki/NixOS_Installation_Guide, to install your VM into the zvol you created earlier. Don’t forget to enable serial output (boot.kernelParams = [ "console=ttyS0" ];) in your configuration prior to nixos-install. After going through the install process, remove the line referencing the installer ISO from run.sh, run the bhyvectl destroy step, then run run.sh again, and you should boot into your installed system.

In a future post I’ll go over using vm-bhyve for a friendlier interface as well as some settings that will persist the VM and configuration across reboots; as is, you’ll have to start from scratch (more or less) after a reboot.

In the meantime, you almost certainly want to go back and tighten up some security settings:

  • turn off SSH password authentication for root
  • rethink your life choices because you’re running a VM on your firewall
  • pick a stronger root password
  • add some additional firewall rules
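
For the first item on an Ubuntu guest, one option is to switch PermitRootLogin from yes to prohibit-password, which still allows key-based root logins but refuses passwords. The helper below is a rough sketch under that assumption; set_permit_root_login is a hypothetical name, and it only rewrites a PermitRootLogin line that already exists (commented out or not) in the file it is given.

```shell
# Hypothetical helper: set PermitRootLogin in an sshd_config-style file.
# Arguments: $1 = config path, $2 = new value (default: prohibit-password).
# It rewrites existing PermitRootLogin lines only; it does not add one.
set_permit_root_login() {
    conf=$1
    value=${2:-prohibit-password}
    tmp=$(mktemp) || return 1
    # Match an active or commented-out directive at the start of a line.
    if sed -E "s/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin ${value}/" \
        "$conf" > "$tmp"; then
        # Write back through the original file to keep its permissions.
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}
```

Usage on the guest would look like set_permit_root_login /etc/ssh/sshd_config followed by systemctl restart ssh. On NixOS the equivalent change belongs in the system configuration rather than in an edited file.
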
https://n8henrie.com/2023/03/running-nixos-and-ubuntu-vms-on-pfsense-via-bhyve/
Quickly add FreeBSD Packages to pfSense
techterminalpfsensefreebsdbashtech

Bottom Line: I wrote a bash function to add packages directly from FreeBSD to pfSense.

Disclaimer: Modifying your firewall is a security risk. Please don’t use the information on this page unless you know what you are doing and are willing to accept the consequences for yourself and anyone on your network.

I’ve been happily using pfSense for a year or so now. I am getting more comfortable with Linux over time but know very little about FreeBSD, on which pfSense is based.

As I explore pfSense, I occasionally want to add a package from the main FreeBSD repos. Netgate provides instructions on how to add the FreeBSD repos at https://docs.netgate.com/pfsense/en/latest/recipes/freebsd-pkg-repo.html; essentially you set FreeBSD: { enabled: yes } in /usr/local/etc/pkg/repos/FreeBSD.conf and /usr/local/etc/pkg/repos/pfSense.conf.

However, changing this messes with your whole pkg database (ask how I know) and they have a very visible warning that this is generally not a good idea.

They also list another way to install a specific package:

$ pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/tshark-3.2.6.txz

This looks much better to me, but it’s pretty difficult (or was for me anyway) to figure out the exact path to a specific package. Attempting to browse https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All gives me a 403 Forbidden error, and the directory above that just includes some compressed archives that aren’t really helpful in a web browser.

Thankfully, the base pfSense install includes a few basic utilities like curl and jq that let me piece together the function below:

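install_from_freebsd() {
    # Name of the package to install, e.g. tshark
    local pkgname=$1
    local base_url='https://pkg.freebsd.org/FreeBSD:12:amd64/latest'
    local path
    # Look up the package's path in the repo's package index
    # (bsdtar detects the archive's xz compression on its own)
    path=$(
        curl -s "${base_url}/packagesite.txz" |
            tar -xzf- --to-stdout packagesite.yaml |
            jq -r --arg pkgname "${pkgname}" \
                'select(.name == $pkgname) | .path'
    )
    # Bail out if the index has no package by that name
    [ -n "${path}" ] || { echo "package not found: ${pkgname}" >&2; return 1; }
    pkg add "${base_url}/${path}"
}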

Once you’ve entered bash and defined / sourced the function, it works like a charm:

$ bash
$ install_from_freebsd tshark

Once I verified it was working, I went ahead and put it in ~/.bashrc so it would be available automatically in bash.
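
One caveat is that base_url hard-codes FreeBSD:12:amd64, which will go stale as the underlying FreeBSD version changes. A possible tweak, sketched here as an assumption rather than something tested on pfSense, is to ask pkg for its configured ABI and build the URL from that. freebsd_base_url is a hypothetical name, and it assumes the string reported by pkg config ABI matches the directory name used on pkg.freebsd.org.

```shell
# Hypothetical helper: build the repo base URL from pkg's configured ABI
# instead of hard-coding FreeBSD:12:amd64. Assumes the ABI string matches
# the directory layout on pkg.freebsd.org.
freebsd_base_url() {
    abi=$(pkg config ABI) || return 1
    printf 'https://pkg.freebsd.org/%s/latest\n' "${abi}"
}
```

Inside install_from_freebsd, base_url could then be set with base_url=$(freebsd_base_url). It is worth running freebsd_base_url by hand first to confirm that the printed URL actually exists.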

https://n8henrie.com/2023/01/quickly-add-freebsd-packages-to-pfsense/