gonsoloblog — GeistHaus

VW ID Kuller

gonsoloblog Jul 25, 2025

This is what a VW bus should look like today!


http://gonsoloblog.wordpress.com/?p=165


An Update on the Gonzales Renderer

gonsoloblog May 15, 2023


Moana (1920×800 pixels, 64spp) rendered with Gonzales on an AMD Threadripper 1920x workstation (12 cores/24 threads) with 64GB RAM and 80GB swap space in 78 minutes.

TLDR: Render Moana 20 times faster!

I just released v0.1.0 of the Gonzales renderer. After 450 commits it’s time to write a little bit about the progress in the last two and a half years. The main points are:

Thanks to Embree (and other optimizations) rendering is 20 times faster now. Instead of taking 26 hours as in 2021, Gonzales now finishes in 78 minutes. The detailed timing is:
- Reading: 16m41s
- Building accelerator: 7m54s
- Rendering: 54m19s
- Total: 78m
All scenes from Bitterli’s resources can be rendered now.
A power-based light sampler. This is especially important for the spaceship as there are lots of tiny lights in the screen in the cockpit.
A Debian package for ptex.
Reworked dependencies on Embree, OpenImageIO and Ptex relying on existing Debian packages.
Lots of new materials like CoatedDiffuse, Conductor, Dielectric and Hair to be able to render scenes for PBRTv4.
Volume integration (e.g. the Volumetric Caustic scene from Bitterli).
OpenImageIO for most of texturing, caching and image writing.
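
To illustrate the power-based light sampler mentioned in the list above, here is a minimal sketch. It is not the code from Gonzales; the `Light` and `PowerLightSampler` types are hypothetical and only show the general idea: each light is chosen with a probability proportional to its emitted power, so a strong light is picked often and a tiny one rarely.

```swift
// Hypothetical minimal light type; the real light interface in Gonzales is not shown in the post.
struct Light {
    let name: String
    let power: Double
}

// Chooses a light with probability proportional to its power.
struct PowerLightSampler {
    let lights: [Light]
    let cdf: [Double]  // running sum of normalized powers; the last entry is 1

    init(lights: [Light]) {
        precondition(!lights.isEmpty, "need at least one light")
        let total = lights.reduce(0.0) { $0 + $1.power }
        precondition(total > 0, "total power must be positive")
        var running = 0.0
        var cdf: [Double] = []
        for light in lights {
            running += light.power / total
            cdf.append(running)
        }
        cdf[cdf.count - 1] = 1.0  // guard against rounding error
        self.lights = lights
        self.cdf = cdf
    }

    // `u` is a uniform random number in [0, 1).
    // Returns the chosen light and the probability with which it was chosen.
    func sample(u: Double) -> (light: Light, probability: Double) {
        let index = cdf.firstIndex { u < $0 } ?? (lights.count - 1)
        let previous = index == 0 ? 0.0 : cdf[index - 1]
        return (lights[index], cdf[index] - previous)
    }
}

let sampler = PowerLightSampler(lights: [
    Light(name: "button", power: 1),
    Light(name: "display", power: 1),
    Light(name: "sun", power: 98),
])
let choice = sampler.sample(u: 0.5)
print(choice.light.name, choice.probability)
```

In a path tracer the light's contribution would be divided by the returned probability so that the estimate stays unbiased.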

There are still lots of things to do of course:

Ray differentials.
Parallel parsing (shouldn’t be too difficult using Swift’s structured concurrency for PBRT’s Import statement).
Getting all PBRTv4 scenes to work (most of them should work, the rest should be relatively easy).
Bump and displacement mapping.
Memory usage goes up when using Embree; I have yet to investigate this. Also, the mapping of materials etc. for Embree is done in a very quick-and-dirty way.
A denoiser.
Better sampling (low-discrepancy, blue noise, progressive multi-jittered).
GPU rendering; this will probably be next on my plate.

May 15th, 2023
Andreas

http://gonsoloblog.wordpress.com/?p=155


Optix in Software

gonsoloblog Mar 3, 2021


There is an online course on accelerated ray tracing with Optix (https://github.com/ingowald/optix7course). I decided to go through it the other way around: implement Optix in software. There is some glue code implementing the API, and the ray tracing kernel comes from my own renderer (https://github.com/gonsolo/gonzales). It takes advantage of the fact that Clang can parse the CUDA kernels as pure C; these are then linked on the fly into a dynamic shared library and dlopen’ed.
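
The last step, looking up a function by name at runtime, can be sketched in Swift with the POSIX calls `dlopen` and `dlsym`. This is only an illustration and not code from the fork: instead of a freshly linked kernel library it opens the running program itself (by passing `nil`) and looks up the C function `getpid`, which is assumed to be available because the C library is always loaded.

```swift
import Foundation

// C signature we expect behind the symbol name.
typealias GetPidFunction = @convention(c) () -> Int32

// Passing nil opens the running program itself, so this sketch needs no extra library file.
// A real loader would pass the path of the shared library that was just linked.
guard let handle = dlopen(nil, RTLD_NOW) else {
    fatalError("dlopen failed")
}
guard let symbol = dlsym(handle, "getpid") else {
    fatalError("dlsym failed")
}

// Reinterpret the raw symbol address as a callable C function pointer.
let lookedUpGetPid = unsafeBitCast(symbol, to: GetPidFunction.self)
let pidViaDlsym = lookedUpGetPid()
print("pid via dlsym:", pidViaDlsym)
dlclose(handle)
```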

Of course it is slow as hell, but it was fun doing it. My fork of the repository is at https://github.com/gonsolo/optix7course. But be warned: it’s not the cleanest code in the universe. 8)

http://gonsoloblog.wordpress.com/?p=149


Rendering Moana with Swift

gonsoloblog Jan 14, 2021


Moana (2048×858 pixels, 64 spp) rendered with Gonzales on a Google Cloud Instance with 8 vCPUs and 64GB of memory in roughly 26 hours. Memory usage is about 60GB. (Denoised with OpenImageDenoise.)

TLDR: Render Disney’s Moana scene in less than 10,000 lines of Swift code.

After Walt Disney Animation Studios released the scene description of the island in Moana, several efforts were started to render it with engines besides Disney’s Hyperion. I am aware of the following render engines:

Hyperion (naturally)
Renderman
PBRT
Embree/OSPRay
A hobby renderer from Joe Schutte (using Embree)
Moana on RTX (using Optix)
GPU-Motunui (using Optix)

Here I present another one, the Gonzales renderer, written by me. It is heavily inspired by PBRT and written in Swift (with a few lines of C++ to call OpenEXR and Ptex). It is optimized only far enough to render the scene in a reasonable amount of time on a free Google Cloud instance (8 vCPUs, 64GB RAM). As far as I know, this is the only renderer able to render Moana that is not written in C/C++. I wrote it with vi and command-line Swift on Ubuntu Linux and with Xcode on macOS, so it should be relatively painless to get it compiled on these platforms.

Why Swift?

I was always uncomfortable with header files and the preprocessor in C and C++. From my point of view something (a variable, a function, …) should be declared and defined once, not twice. Also, the textual inclusion of header files brings many problems with it, like having to add implementation details to header files (templates come to mind) or slow compilation caused by the repeated inclusion of headers and its combinatorial explosion. When I started, C++ modules were not available, so I evaluated Python (too slow), Go (too much like C) and some others, but in the end only Rust and Swift were serious contenders. I finally chose Swift because of readability (I just don’t like “fn main” or “impl trait”). Also, it being written by the implementors of LLVM and Clang gave me confidence that it would a) not be abandoned in the future and b) meet my performance goals. In short, I wanted a compiled language, no pointers, modules, concepts, ranges, readable templates, and I wanted it now. Also, compilers were invented to make the life of programmers easier by making programs more readable, and sometimes looking at template-based code makes me think we are going backwards in time. I like my stuff readable.

Random notes

Parsing went through a few incarnations. First it was a simple String(file.availableData, encoding: .utf8), but the scene is simply too big to fit in memory that way. Data was not used for similar reasons. Scanner from Foundation was also dropped at some point. In the end I settled on an InputStream read into a 64kB UnsafeMutablePointer<UInt8> buffer.
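
A minimal sketch of that last approach, assuming nothing about the actual parser in Gonzales: the hypothetical function `readInChunks` streams a file through one fixed 64 kB buffer and hands each chunk to a closure, so memory use stays constant regardless of file size.

```swift
import Foundation

// Reads a file through a fixed 64 kB buffer and calls `handle` for every chunk.
// Returns the total number of bytes read, or nil if opening or reading fails.
func readInChunks(path: String, handle: (UnsafeBufferPointer<UInt8>) -> Void) -> Int? {
    guard let stream = InputStream(fileAtPath: path) else { return nil }
    stream.open()
    defer { stream.close() }

    let bufferSize = 64 * 1024
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
    defer { buffer.deallocate() }

    var total = 0
    while true {
        let count = stream.read(buffer, maxLength: bufferSize)
        if count < 0 { return nil }  // read error
        if count == 0 { break }      // end of file
        handle(UnsafeBufferPointer(start: buffer, count: count))
        total += count
    }
    return total
}

// Usage: count the newlines in a temporary file without loading it as a whole.
let chunkTestPath = FileManager.default.temporaryDirectory
    .appendingPathComponent("chunk_test_\(ProcessInfo.processInfo.processIdentifier).txt").path
let created = FileManager.default.createFile(
    atPath: chunkTestPath, contents: Data(repeating: 10, count: 200_000))
var newlineCount = 0
let totalBytes = readInChunks(path: chunkTestPath) { chunk in
    for byte in chunk where byte == 10 { newlineCount += 1 }
}
try? FileManager.default.removeItem(atPath: chunkTestPath)
```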

The Array dead end: in short, don’t ever use Array in a hot path. That is to say, do not ever create one there. This should have been clear from the beginning since it is heap allocated, but the lesson was learned quickly because it always turned up at the top of an analysis done with perf. For fixed-size arrays this can be overcome with tuples or Swift’s internal FixedArray. Even if the Array is only read, subscript getters tend to show up at the top of perf runs.
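
The tuple workaround can be sketched as follows. Both types are hypothetical and exist only for comparison: the first keeps its three components in an Array, whose elements live in a separately allocated buffer, while the second stores them inline in a tuple.

```swift
// Array storage: the struct holds a reference to a buffer allocated elsewhere.
struct PointWithArray {
    var components: [Double]
    init(x: Double, y: Double, z: Double) { components = [x, y, z] }
    subscript(index: Int) -> Double { components[index] }
}

// Tuple storage: the three Doubles live inside the struct itself.
struct PointWithTuple {
    var components: (Double, Double, Double)
    init(x: Double, y: Double, z: Double) { components = (x, y, z) }
    subscript(index: Int) -> Double {
        switch index {
        case 0: return components.0
        case 1: return components.1
        case 2: return components.2
        default: preconditionFailure("index out of range")
        }
    }
}

let point = PointWithTuple(x: 1, y: 2, z: 3)
print(point[1])
// Size of the inline value versus size of the Array handle (not of its buffer).
print(MemoryLayout<PointWithTuple>.size, MemoryLayout<PointWithArray>.size)
```

The price of the tuple version is the hand-written subscript; in return, creating a value needs no allocation.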

In general, I found it quite practical to develop on Linux and macOS in parallel since the available tools for checking performance and memory nicely complement each other. I mainly used the following tools:

Perf: This Linux kernel tool gives valuable information about where time is spent. Just fire it up, look at the function showing up at the top and wonder where time is wasted. Hint: it is usually not where you think it is. In my case it was always swift_retain or swift_release, which tells you over and over again not to allocate objects on the heap.
Valgrind Memcheck: This shows where the memory has gone. For example, an analysis with this tool is the reason why the acceleration structure is separated from the acceleration structure builder; the memory spent on building a bounding hierarchy was simply never released. It is nice to have no pointers in Swift, no malloc or new, or even shared pointers, but it is still necessary to think about how memory is used.
Xcode profiling: I mostly used Time Profiler, Leaks and Allocations, which give you roughly the same information as Perf and Valgrind but from a different viewpoint. Sometimes it is very helpful to look at the same thing from two different views. Which reminds me of the old times when we used to feed our software to three different compilers (Visual Studio, GCC and the one from IRIX, what was its name again? MIPSpro?).

Talking about memory, while Swift makes it very easy to write readable and compact code, you still have to think about low-level operations like memory allocations and the like. I frequently switched between structs and classes just to see how memory and performance are affected. The nice thing about not having pointers, new and shared_pointers is that I was able most of the time to just switch between the two without changing anything else in the system.

One tool I didn’t use extensively but which gives nice images is FlameGraph. One thing that can be seen though is that most time is spent in intersection testing for bounding hierarchies and triangles. Things like protocol witness checking do not use much time.

About protocol-based programming: Grepping through today’s Gonzales shows 23 protocols, 57 structs, 47 final classes and 2 non-final classes. Inheritance is almost never used. The two remaining non-final classes are TrowbridgeReitzDistribution and Texture; I’m not happy about either of them and am thinking about redesigning them in the future. All in all, protocol-based programming turns out to result in nice code. For example, I used to have a Primitive class like PBRT but soon changed it to a protocol inheriting from protocols like Boundable, Intersectable, Emitting (gone now) and others. Now it is gone too; the BoundingHierarchyBuild just depends on a Boundable existential type and returns a hierarchy of Intersectables that is used by BoundingHierarchy. All primitives are now stored as an array of existential types consisting of a composition of the protocols Boundable and Intersectable (var primitives = [Boundable & Intersectable]()).
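
A small sketch of such a protocol composition. The protocol names and the `primitives` declaration come from the text above; everything else (the method signatures, `Ray`, `Bounds`, `Sphere`) is a simplified assumption and not the actual Gonzales interface.

```swift
typealias Vector = SIMD3<Double>

struct Ray { var origin: Vector; var direction: Vector }  // direction assumed normalized
struct Bounds { var low: Vector; var high: Vector }

protocol Boundable { func worldBound() -> Bounds }
protocol Intersectable { func intersect(ray: Ray) -> Double? }  // ray parameter t, or nil

struct Sphere: Boundable, Intersectable {
    var center: Vector
    var radius: Double

    func worldBound() -> Bounds {
        Bounds(low: center - radius, high: center + radius)
    }

    // Nearest hit only; for brevity this ignores rays starting inside the sphere.
    func intersect(ray: Ray) -> Double? {
        let toOrigin = ray.origin - center
        let b = (toOrigin * ray.direction).sum()              // dot product
        let c = (toOrigin * toOrigin).sum() - radius * radius
        let discriminant = b * b - c
        if discriminant < 0 { return nil }
        let t = -b - discriminant.squareRoot()
        return t >= 0 ? t : nil
    }
}

// Any type conforming to both protocols can be stored, no common base class needed.
var primitives = [Boundable & Intersectable]()
primitives.append(Sphere(center: Vector(0, 0, 5), radius: 1))

let ray = Ray(origin: Vector(0, 0, 0), direction: Vector(0, 0, 1))
let hits = primitives.compactMap { $0.intersect(ray: ray) }
print(hits)
```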

The primitives in a BoundingHierarchy, on the other hand, are stored as [AnyObject & Intersectable]. There are two reasons for this: 1. Only intersection is needed. 2. AnyObject forces the stored objects to be reference types (classes), which saves memory. The layout of a protocol type that can hold both structs and classes (OpaqueExistentialContainer) uses 40 bytes since Swift tries to store structs inline, whereas class-only protocols (ClassExistentialContainer) use only 16 bytes as only a pointer has to be stored. This can be seen in Swift’s documentation or verified in the source. I emphasize that this is not merely an academic discussion: I came across it because it showed up at the top of a memcheck run.
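
The two sizes can be checked with `MemoryLayout`. This is a sketch with an empty stand-in protocol; the type alias names are made up for illustration, and the byte counts in the comments assume a 64-bit platform.

```swift
protocol Intersectable {}

// May hold structs or classes: three words of inline storage, a type pointer, a witness table.
typealias OpaqueIntersectable = Intersectable
// May hold classes only: an object pointer and a witness table.
typealias ClassIntersectable = AnyObject & Intersectable

let wordSize = MemoryLayout<Int>.size
let opaqueSize = MemoryLayout<OpaqueIntersectable>.size  // 5 words, 40 bytes on 64-bit
let classSize = MemoryLayout<ClassIntersectable>.size    // 2 words, 16 bytes on 64-bit
print(opaqueSize, classSize)
```

With millions of primitives in an array, the difference of 24 bytes per element adds up quickly.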

One of the reasons you can render Moana in less than 10,000 lines is the ability to write compact code in Swift. One extreme example is parameter lists. In PBRT you can attach arbitrary parameters to objects, which results in around 1000 lines of code in paramset.[h|cpp]. In Swift you can achieve the same in about three lines:

    protocol Parameter {}
    extension Array: Parameter {}
    typealias ParameterDictionary = Dictionary<String, Parameter>

Actually, I’m cheating a little bit here but you get the point. (Also, I think this has changed in PBRT-v4.)
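
To show how such a dictionary might be used, here is a small sketch. The three declarations are the ones from the post; the lookup helper `findOneDouble` and the parameter names are hypothetical.

```swift
protocol Parameter {}
extension Array: Parameter {}
typealias ParameterDictionary = Dictionary<String, Parameter>

// Hypothetical helper, not from the post: returns the first Double stored under `name`,
// or `fallback` if the name is missing or holds values of another type.
func findOneDouble(_ parameters: ParameterDictionary, _ name: String, otherwise fallback: Double) -> Double {
    guard let values = parameters[name] as? [Double], let first = values.first else {
        return fallback
    }
    return first
}

var parameters = ParameterDictionary()
parameters["radius"] = [2.5] as [Double]
parameters["filename"] = ["island.ptx"] as [String]

print(findOneDouble(parameters, "radius", otherwise: 1.0))
print(findOneDouble(parameters, "filename", otherwise: 1.0))
```

Type safety moves from compile time to the `as?` cast at the point of lookup, which is part of why this is so short.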

About interfacing C++ for Ptex and OpenEXR support: Interoperability with C++ is on the way for Swift but wasn’t available when I started/as of now. Since I’m using OpenEXR and Ptex only for reading textures and writing images I resorted to extern "C". One modulemap and a few lines of C++ code later (100 for Ptex, 82 for OpenEXR) I had support for reading and writing OpenEXR images and Ptex textures.

I am releasing the code now as I am able to render Moana on a Google Compute Engine instance with 8 vCPUs and 64GB memory, which is free for three months, so please download the code, get an account and fire it up. That said, there is a lot to do as I optimized it only far enough to get one image rendered. The following is a big todo list roughly sorted from easily implemented to big projects which I might or might not tackle in the future.

TODO

Ray differentials for direct rays. This should be relatively easy; have a look at how PBRT-v3 does it, implement differential generation in the camera, pump it through the system and use it in the call to Ptex. There it is handled automatically.
Better hierarchies: I only implemented the simplest bounding hierarchy, which is nice since it is only 177 lines of code, but it also results in suboptimal rendering times. SAH-optimized hierarchies should be much better in this regard. They also should not be too difficult to implement since I closely followed PBRT’s implementation.
Faster parsing: Integrate Ingo Wald’s fast pbrt parser, which parses Moana in seconds instead of half an hour. Or even better: Write a parser for the pbf format in Swift.
Faster hierarchy generation: This is somewhat slow. Can something be done about it?
An idea about faster parsing, hierarchy generation and scene formats: LLVM has three different bitcode formats; in-memory, machine readable (binary) and human readable and it can losslessly convert between the three. Can we have the same? Like PBRT (human readable), PBF or USD (machine readable) and BHF (binary hierarchy format) where bounding hierarchies are already generated and can simply be mapped to memory.
Beginner tasks: I only tried to get Moana to render but it should be fairly easy to enhance Gonzales to be able to render other scenes by adding features or fixing bugs. There are lots of scenes to try. Also there are many exporters for PBRT which should work for Gonzales too.
Bump mapping: Should be fairly easy.
Displacement mapping: Not so easy.
Memory: Lots of memory is used for pixel samples as the image is only written when rendering is finished. Change that to write tiles as they are rendered and discard samples early. This interferes with pixel filtering but since we are denoising anyway maybe this is not needed anymore?
Smaller Transforms: As of now, Transforms store two 4×4 matrices: the transformation and its inverse. This is a little wasteful since you can always compute one from the other, but inversion is slow. After carefully thinking through when each transform is needed, it should be possible to get rid of one. Right now both are used when intersecting a triangle, but is it possible to store triangles (and other objects like curves) in world space to get rid of the transformation of the ray into object space and, similarly, the transformation to world space for surface interactions? And how does this interact with transformed primitives and object instances?
Denoising: I am using OpenImageDenoise for the time being but of course an integrated denoiser in Swift would be nice to have. Also, the beauty, albedo and normal image are written separately, this should be rearchitected.
USD: Write a parser for Pixar’s Universal Scene Description.
Better sampling: Implement discrepancy-based sampling or correlated multi-jittered sampling.
Beyond path tracing: Look at PxrUnified and implement Guided Path Tracing (I had a look at it but it looks… confusing) and Manifold Next Event Estimation. I think I saw an implementation somewhere but I forgot. (And if only Weta followed Disney’s lead and published the Gandalf head from that paper, sigh!)
Subsurface scattering. Already in PBRT.
Faster rendering: Embree has a path tracer. Look at it hard and try to make Gonzales faster.
GPU rendering: This should be a big one. PBRT-v4 obviously does this, as do some of the renderers mentioned above. It should be very well possible to follow them and use Optix to render on a graphics card, but I would much prefer a solution not involving closed source. Which would mean that you have to implement your own Optix. But looking at how CPUs and GPUs are evolving, it might be possible in the distant future to use the same (Swift) code on both of them; you can have instances with 448 CPUs in the cloud and the latest GPUs have a few thousand micro-CPUs, so they look more and more the same. I wonder whether it will be necessary to program for AVX in the future, as it seems less needed when you can just throw more cores at the problem. At the same time memory is getting more and more NUMA-like, so having your data next to the ALU is getting more important. Maybe one day we will have render nodes in the cloud, each responsible for one part of the scene, each node partitioning the scene geometrically and sending only portions to the CPUs. Then the returned intersections could simply be sorted by the t value of the ray, which reminds me of sort-first/middle/last architectures like Chromium.

That’s it for now. I would be extremely happy to receive comments on what could be done better or implemented more elegantly, bug reports or even pull requests. Also thanks to Matt Pharr and PBRT, the most valuable resource in the known universe (at least when it involves rendering).

January 14th, 2021.

Andreas

http://gonsoloblog.wordpress.com/?p=111


A Dual Bicycle Network for Berlin

gonsoloblog Jul 31, 2016


Now that the Volksentscheid Fahrrad (the Berlin bicycle referendum) has hit like a bomb (140,000 signatures) and the CDU will probably be voted out in the autumn (wahlrecht.de), it is time to think about how to expand the cycling infrastructure in Berlin. The referendum has also identified 10 points that are particularly important.

I am not the biggest fan of main roads myself, so I cannot fully identify with point 2 of the goals above, which is why I thought about how it could work (even) better. Whenever I head for a new destination in Berlin, I almost always consult the bicycle route planner BBBike with the options “Nebenstraßen bevorzugen” (prefer side streets) and “Grüne Wege bevorzugen” (prefer green routes). This way I have very often discovered beautiful side streets and squares that I did not know yet. I do not mind the lost time, and it is small anyway (less than a minute on a 20-minute ride).

A small anecdote concerns the “hole” between FT am Friedrichshain and Greifswalder Straße, on the way from my home in Friedrichshain to playing football in Mauerpark. If you ride into the dead end on the left behind the cinema, you can pass through a hole in the wall into the rearmost courtyard of Greifswalder Straße. And BBBike knew this hole. In general, the data and routing of BBBike are simply fabulously good.

To give the Senate and the district mayors a data basis for deciding where best to create bicycle streets, I wrote a small hack in BBBike. Starting from 1,000 random routes across the whole city of Berlin, I was able to identify how frequently each path is used. The starting point was the thought that BBBike always leads me along the same paths. The 1,000 routes yielded a file with 165,000 route segments, and from that a file with 12,000 intersections that were traversed more than once (more precisely, between 2 and 144 times). Of those I selected the first 5,000 and visualized them. The frequency of these intersections ranges from 7 to 144 traversals.
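
The counting step can be sketched as follows. This is not the code of the hack itself; the route representation (a list of intersection names) and the function `intersectionFrequencies` are assumptions made for illustration.

```swift
// Hypothetical representation: a route is the ordered list of intersections it passes.
typealias Route = [String]

// Counts how often each intersection is traversed over all routes, drops those
// traversed only once, and returns the `limit` most frequent ones.
func intersectionFrequencies(routes: [Route], keep limit: Int) -> [(intersection: String, count: Int)] {
    var counts: [String: Int] = [:]
    for route in routes {
        for intersection in route {
            counts[intersection, default: 0] += 1
        }
    }
    let ranked = counts
        .filter { $0.value > 1 }
        // Most frequent first; ties are broken by name so the order is stable.
        .sorted { $0.value != $1.value ? $0.value > $1.value : $0.key < $1.key }
    return ranked.prefix(limit).map { (intersection: $0.key, count: $0.value) }
}

let routes: [Route] = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["C", "E"],
]
let frequent = intersectionFrequencies(routes: routes, keep: 5)
for entry in frequent {
    print(entry.intersection, entry.count)
}
```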

You can clearly see that BBBike always picks the same paths. If only these paths are upgraded for cyclists, much is already gained. A pleasant side effect is that drivers can use the main roads almost entirely without cyclists. Ideally, a driver only has to pass through one or two side streets at the beginning and end of the trip to reach the main roads. Conversely, cyclists can ride undisturbed on the side streets, along the canal or in the park.

These are the advantages of a “dual network” on which drivers and cyclists can travel largely independently of each other. At the same time, the traffic lights on the side streets can be removed and priority-to-the-right introduced. One example would be the traffic light at the corner of Reichenberger Straße and Glogauer Straße. As you can also see, a main cycling route runs along Fraenkelufer, which in my opinion has the worst cycle path and footpath in Berlin. The website of the Bürgerbegehren Fraenkelufer, a citizens’ initiative that has unfortunately come out against upgrading the embankment, shows this impressively in its lead photo: bumpy cobblestone road, cars parked perpendicular to the curb, no cycle path, and no footpath worthy of the name. And that on one of Berlin’s main cycling routes!

The source code (hacked together in one afternoon) can be found on GitHub, but I do not promise that you will understand it if you did not write it yourself.

DualesNetz2

Appendix: The dual network as a PDF.

http://gonsoloblog.wordpress.com/?p=2
