Random ASCII – tech blog of Bruce Dawson

Forecast for randomascii: programming, tech topics, with a chance of unicycling

Reflections on My Tech Career – Part 2

This is the second and final part of the story of how my career as a software developer unfolded (part 1 is here). In this half I work at four different companies in the Seattle area, make my mark, and then retire.

Cavedog/Humongous Entertainment – 1997 to 2002

In 1997 my wife and I got annoyed with being so far from family (we were in Wisconsin, they were mostly in British Columbia) and we wanted to move to the west coast. This time I did a proper job search, talking to at least four companies in Vancouver and greater Seattle. I mostly interview pretty well and that is an important skill. A good job hunt requires a well written resume that reflects some actual achievements, the ability to interview well, and ideally some contacts. This time, as with ASDG and as with every future job, contacts were a critical part of getting the job.

Total Annihilation box art

I got hired at Cavedog Entertainment with an unclear mandate or title based significantly on the recommendation of Chris Taylor who I had worked with back in the Distinctive Software days. I showed up as they were shipping Total Annihilation (TA) and they were having some stability problems. I wasn’t sure what I was doing but I eventually contributed by adding a crash reporting system that also included the ability for the game to ingest the crash reports to set up the stack and registers inside itself, like a bizarre crash-dump loading system. I do like that my crash system was purely text based – I could reconstruct crash states from text-only emails.

If a crash dump was passed to TA on the command line then TA would parse it, load the stack and registers from the text file into itself, then hit a breakpoint. After single-stepping past the breakpoint and a ret instruction the debugger would then display the state of the crashed program – call stack and all.

I also created an allocator that would keep individual memory allocations on different pages in order to expose out-of-bounds and use-after-free bugs. I “invented” these debugging concepts, with the quotation marks because I’m not sure when Windows integrated crash dumps, Dr. Watson, and pageheap – the same concepts but productized much better.

Some of the crashes ended up being caused by a buggy sound driver that was corrupting FP registers and by faulty memory on one of our test machines – sometimes it really is somebody else’s fault.

A combination of stress-testing TA locally and analyzing remote crashes in the debugger eventually helped me bring the crash rate down and it was at this job that I started making a career primarily out of fixing code rather than writing it. When programs were crashing, misbehaving, or running slowly I would often be the one asked to investigate and fix the problems, and I was good at this. I gradually lost the ability to create significant new features in consumer products.

Putt Putt Saves the Zoo

My next career mistake was sticking around too long at Cavedog/Humongous Entertainment, as the companies sank into bankruptcy. Companies aren’t loyal to employees but they try to encourage employees to be loyal to them, to reduce attrition and depress compensation. It took me a long time to learn this cynical lesson, and I was too loyal to this sinking ship.

When leaving Cavedog/Humongous Entertainment I interviewed at Valve and Microsoft (using personal contacts for the Microsoft job in particular). I withdrew my Valve application before finding out if I’d get an offer and I do sometimes wonder what would have happened if I’d gone there nine years earlier than I actually did. I think I would have missed out on some valuable learning at Microsoft, but I might have become fabulously wealthy. There’s no way to know.

Microsoft – 2002 to 2011

Original Xbox

At Microsoft I started in the Xbox group. My job was to help game developers create better games for their console. This involved writing samples and whitepapers, giving talks, visiting developers, and reviewing code, crashes, and performance analysis. I got to contribute to many dozens of games, honing my skills at optimizing and debugging.

Giving these talks for Microsoft plus teaching at DigiPen took me from being pathologically afraid of public speaking to absolutely loving it.

One of my most memorable contributions was Halo 2. I didn’t contribute much to the game. Maybe only one commit. And that was a one-line fix. But that one-line change was enough to make the game run about 7% faster!

Annotated Xbox 360 processor

When the Xbox 360 was being created I voraciously devoured the CPU manual. I wasn’t assigned the job of CPU expert, I just became the CPU expert and shared my distilled knowledge with my coworkers and other game developers. I gave talks and created CPU pipeline animations to help developers understand this bizarrely finicky CPU.

Reading the CPU documentation multiple times eventually gave me a powerful intuition for how the undocumented details of the CPU must work, and this is what allowed me to find a CPU Design Bug in the Xbox 360 CPU. I discovered – with just weeks to spare – that the xdcbt instruction was so dangerous that having it anywhere in an executable could cause memory corruption, even if it was never executed. You can’t make this stuff up.

The Xbox 360 project was in trouble – it nearly didn’t get finished in time – and the leads were willing to accept any help that was offered. This is how I ended up working on some fascinating aspects of the project:

  • I realized that 4-KB memory pages were dramatically harming performance. That realization led the kernel developers to use 64-KB pages for all code, and I then modified the Windows-based heap (basically orphaned code that nobody understood) to use 64-KB pages
  • I noticed that the CRT math functions (sin, cos, exp, log, etc.) were poorly written for our quirky CPU. I rewrote these functions to make them about five times faster. The faster versions gave bit-identical results, except in the cases where I fixed some correctness bugs. Apparently the original developers were slightly confused about how denormals work
  • I created a hacky tool that would single-step the CPU for an entire game frame and record the instructions executed. I then wrote a tool that would replay these execution traces and look for patterns that would trigger CPU pipeline flushes or other slowdowns. Load-hit-stores, reads from uncacheable memory or 4-KB pages, floating-point comparison branches, and much more
  • I also created a tool (Pgo-Lite) that would use these execution traces to rearrange code for better i-cache efficiency. This simple step made most games run about 7% faster in the Xbox 360’s two-way i-cache (make functions that call each other adjacent – that’s it)
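
As a rough illustration of the last item (this is not the actual Pgo-Lite tool), the idea of placing functions that call each other next to each other can be sketched in a few lines. The trace format (a list of caller/callee pairs), the function names, and the greedy chain-merging heuristic are all assumptions made for this example; the real tool worked from instruction-level execution traces.

```python
from collections import Counter


def order_functions(call_trace):
    """Return a function layout that keeps hot caller/callee pairs adjacent.

    call_trace: iterable of (caller, callee) name pairs. This format is a
    hypothetical stand-in for a real execution trace.
    Heuristic: visit call edges hottest first and glue two chains together
    when the caller is the tail of one chain and the callee heads another.
    """
    edge_counts = Counter(call_trace)

    # Every function starts out in a chain of its own.
    chains = {}
    for caller, callee in edge_counts:
        for name in (caller, callee):
            chains.setdefault(name, [name])

    for (caller, callee), _count in edge_counts.most_common():
        head, tail = chains[caller], chains[callee]
        if head is tail:
            continue  # already in the same chain (or a recursive call)
        if head[-1] == caller and tail[0] == callee:
            head.extend(tail)
            for name in tail:
                chains[name] = head

    # Emit each surviving chain once, in first-seen order.
    layout, emitted = [], set()
    for chain in chains.values():
        if id(chain) not in emitted:
            emitted.add(id(chain))
            layout.extend(chain)
    return layout
```

Feeding it a trace where `update` calls `physics` most often and `main` calls `update` next most often yields a layout with `main`, `update`, and `physics` in a row. A linker order file would then be generated from that list.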

Xbox 360 CPU die - you can see that some cores are farther from the L2 cache than others

Presumably the IBM employees who had designed the Xbox 360 CPU understood how it worked better than anybody else in the world, but they didn’t necessarily understand the implications of how it worked. It turned out that prefetch instructions which missed in the TLB cache were discarded, which is what made 4-KB memory pages uselessly slow. And it turned out that speculative execution of extended-prefetch instructions was the same as real execution (the CPU design bug). I was in the right place – the intersection of CPU internals knowledge and in-the-trenches experience – to realize these two critical implications.

I was sometimes amazed that I was left alone to pursue these madcap projects but at that time at Xbox it was results that mattered. Not only did they let me ship all of these diverse projects, they gave me an excellent review (a coveted 5.0 rating) and let me transfer to London for a year.

It is worth pointing out, however, that while I got a nice bonus and more stocks than normal after that year it was actually not spectacular compensation for helping save a multi-billion dollar project. Compensation at Microsoft is determined more than anything by level and at level 64 (or whatever I was) the biggest bonuses they will give you don’t match “meets expectations” for a level or two above. Just as when playing video games, leveling up is everything. I got a couple of promotions on the back of my Xbox 360 work but in hindsight I realize that I was falling behind.

Aside: one day while unicycling to my job at Microsoft (I was training) I saw a limo pulling in. I rode over to see who was in it and it was Don Mattrick and some other people from my Distinctive Software days. We chatted and I rode away thinking about how little we had all changed. They still loved expensive cars and the trappings of power, and I loved odd commute methods. They have more money. I like to think that I have more fun.

I eventually got bored of the Xbox 360 and moved to the Windows group to focus on performance investigations. It was here that I first learned about Event Tracing for Windows (ETW). ETW allows a wealth of performance information to be recorded on consumer Windows machines.

Windows Performance Analyzer - ETW visualizer

I saw the value in recording performance traces on customer machines when they were encountering issues, and I learned the dark art of how to analyze these traces. This was the most useful skill that I had learned in years. The majority of my blog posts – including the ones that cemented my reputation – would not exist if I hadn’t learned these tools. Suddenly I had the magical ability to understand performance issues that only ever occurred on remote machines. I took this skill to subsequent jobs and it – plus debugging obscure crashes – became the main way that I contributed.

When working with ETW I became acutely aware that while a profiler can tell you how long different parts of a program take, that information is useless without knowing how long these different parts should take. Although the cycle-accurate estimates of the 68000 were long gone it is often still possible to guess when an algorithm has gone quadratic, or when a function is either “too” expensive or called “too” often. Knowing this for your own code is vital. Guessing this for code I’d never even seen the source code for was often my secret sauce.
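
One simple way to check a suspicion that something has "gone quadratic" is to double the input size and see how the cost scales. The sketch below is only an illustration: the two workloads are hypothetical, and it counts operations instead of measuring wall-clock time so that the result is deterministic.

```python
import math


def growth_exponent(cost_fn, n_small=1000, n_large=2000):
    """Estimate k in cost ~ n**k from two measurements of cost_fn."""
    c_small = cost_fn(n_small)
    c_large = cost_fn(n_large)
    return math.log(c_large / c_small) / math.log(n_large / n_small)


def dedupe_with_list(n):
    """Hypothetical workload: de-duplicate n distinct items using a list.

    Returns the number of element comparisons as a stand-in for time.
    """
    seen, comparisons = [], 0
    for item in range(n):
        comparisons += len(seen)  # 'item in seen' scans the whole list on a miss
        seen.append(item)
    return comparisons


def dedupe_with_set(n):
    """Same job with a set: roughly one hash probe per item."""
    seen, probes = set(), 0
    for item in range(n):
        probes += 1
        seen.add(item)
    return probes
```

Here `growth_exponent(dedupe_with_list)` comes out close to 2 and `growth_exponent(dedupe_with_set)` close to 1. With real code the cost function would wrap a timer such as `time.perf_counter()`, and the exponent would be noisier.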

But, the excessive structure at Microsoft became frustrating. I created a one-line fix that would sometimes make Internet Explorer’s frame rate increase by 10x or more, but landing this trivial change required enduring interminable bogus security reviews. Meanwhile the code analysis team kept filing bugs and wasting developer time without acknowledging that the “bugs” were actually highly speculative. I was chafing at the amount of process and I wanted more freedom.

Teaching at DigiPen

While working at Microsoft I started teaching at DigiPen. I taught CS230 which was officially a 2D graphics course – students wrote a raycaster and did some other basic CPU-rendering projects. In fact a lot of the course was me giving opinionated rants about how to write good C/C++ code, and challenging the students to write clean code that was also efficient. When necessary I would say “am I contradicting your previous instructor? Well here’s the advantages of my method” – confident that they would see the value in what I was doing.

One semester a year paid for most of a family vacation, but most importantly it honed my public speaking skills. I went from being beet red when speaking for two minutes in front of my coworkers to loving giving a three-hour lecture. This skill served me well in my day job and turned into a love of public speaking for fun, which led to this talk.

Valve – 2011 to 2014

My next job was working at Valve Software (based on personal recommendations from several Microsoft people who had moved there). In many ways this job was perfect for me. With no management structure I had the freedom to work on what I saw as valuable. And, Valve’s games were loaded with low-hanging fruit. There was no shortage of quadratic algorithms, obsolete throttling, logic errors, crashes, and memory leaks. I fixed bugs at a higher rate than I had ever managed before, and these were important bugs that were costing money and wasting the time of both developers and players.

One logic bug that VC++’s /analyze found led to one of the L4D2 designers saying “so that’s why we couldn’t tune the difficulty of that one level…” – it was years too late to fix it

I used Visual C++’s /analyze feature to find thousands of code correctness bugs, often in rarely executed code. In one memorable case I found an sprintf statement that was guaranteed to crash if executed. I created a CL to fix this bug on Friday, planning to submit it on Monday. On Sunday there was a power outage at Valve’s data-center servers. While the servers were being brought up many rarely executed code paths were hit, including this one. Steam crashed every time they tried to restart it. Steam remained down for several hours while the bug was investigated and fixed and Steam was recompiled and deployed. If I had found this bug just a few days earlier then Steam would have restarted after this power outage without difficulty. If you couldn’t play games that Sunday morning in 2014 now you know why.

I also fixed or created a lot of processes while at Valve. I worked on the build system and set up source indexing and symbol servers. I even figured out how to do the equivalent of symbol servers on Linux, and wrote some Linux debugger extensions for source indexing. Being able to load a crash dump on Linux and have symbols and source magically appear (just like on Windows!) was pretty amazing.

I also ran into some serious dysfunction at Valve. I filed an HR complaint about bullying by one of my coworkers (name withheld). I found that there were already three outstanding complaints against this person – including one from the director of HR. Years later all the complainants have left and the “brilliant jerk” remains.

On the other hand, the yearly company trips to Hawaii were definitely a lovely bonus.

One of my favourite investigations was when I was asked to look at memory consumption on CounterStrike servers. I found an unused global variable that was consuming 50 MB of memory per server process, and I found map ID mismatches that were wasting 20 MB every time a new game started. Those two fixes saved a huge amount of memory and greatly increased the number of server processes that could run per machine. Success.

But while I was on the live servers doing investigations I decided to look into some server processes that were consuming a lot of CPU time. They were consuming all of one core. I fired up perf and found that they were spinning in an eleven instruction loop. Forever. At first I assumed I was misunderstanding the data. I was new to perf and me being wrong was the most likely explanation for what seemed like impossible behaviour.

Synthesized recreation of server CPU usage over three months

It turns out that there was a pathfinding bug. Every now and then the pathfinding algorithm would generate a set of nodes that created a loop and the game would traverse it forever. Player connections would time out and the server would keep spinning – wasting CPU time for no reason. Over time more and more server processes would hit this trap and the percentage of server CPU time going to this one loop would increase. The only reason this had never been noticed is that every month the server machines would be rebooted. The CPU usage graphs were hilarious with the daily usage variations overlaid on the inexorable climb caused by this bug.
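
A path whose nodes loop back on themselves can be detected before anything tries to walk it. The sketch below uses Floyd's tortoise-and-hare technique on a hypothetical successor mapping; the game's actual data structures and the actual fix are not described here, so this only illustrates the general idea.

```python
def path_has_cycle(next_node, start):
    """Return True if following next_node from start never terminates.

    next_node: dict mapping each node to its successor; a missing key or a
    value of None marks the end of the path. (Hypothetical representation.)
    The 'fast' cursor moves two steps for every step of 'slow'; they can
    only meet again if the path loops.
    """
    slow = fast = start
    while fast is not None:
        fast = next_node.get(fast)
        if fast is None:
            return False
        fast = next_node.get(fast)
        slow = next_node.get(slow)
        if fast is not None and fast == slow:
            return True
    return False
```

For example, `path_has_cycle({"a": "b", "b": "c", "c": "b"}, "a")` is true, while a path that ends at `"c"` is not flagged. A cruder alternative with the same effect is a hard cap on the number of nodes a traversal may visit.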

After fixing this series of bugs and thereby increasing CounterStrike server capacity by more than anyone had dreamed possible, I got another mediocre review. I’d been doing the best work of my career and Valve didn’t care. You can’t make a company value you so my only option was to leave.

Blogging – 2009 to present

While at Valve I started finding interesting issues that I felt compelled to share – and was allowed to share – so I started publishing more frequently on my blog. It started out being some pretty dry explanations of VC++’s /analyze, but I started branching out. A whole series on floating-point, advice on 64-bit porting, symbol servers, Linux, etc.

My first investigative reporting post was this one from 2011, and dozens more of this type followed. These were the ones that got the most readers, and got me noticed. But more importantly they were the ones that I most enjoyed writing. I found great joy in leading my readers through the twists and turns of solving an elaborate mystery, trying to explain the intricate details of quadratic algorithms, lock contention, and zombie processes.

It was also satisfying when one of my posts reached a large audience. It was extremely satisfying whenever one crossed 100,000 views, and some of them received many more views in Russian and other translations.

This blog was incredibly valuable during the job search that landed me at Google. And it was also surprisingly valuable within Google because often my manager and my peers would know about what I was doing because they read it on my blog. It was a brilliant form of self promotion.

But these aren’t the reasons I wrote the blog, and given how long it took to reap these benefits (my initial audience was basically zero) it would have been challenging to maintain motivation for these reasons. Instead I wrote because I wanted to. I had knowledge and opinions that I wanted to share with the world – that I needed to share with the world – and it felt good to get this out of my system.

I was reminded of this motivation when talking to my father. He loves music but he loves it in an esoteric way. When he tries describing it to “normal” people their eyes often glaze over, and so he has an unresolved need to share his passion. My work also has the problem that if I try explaining it at a dinner party in any detail then I will not be invited back. The blog gives me an audience of the small set of people around the world who do think my work is interesting, so that when I am at a social gathering I can talk about more interesting topics like long-distance unicycling.

Google – 2014 to 2024

This time I did my best job search ever. I talked to about twenty coworkers and ex-coworkers. I asked them for details about their compensation and most of them shared. For the first time I learned how much money was potentially available, and I wanted some of it. I hadn’t been paid badly in the past, but it was clear to me that there was the potential to be paid better. All else being equal, I wanted that.

I ended up talking to about ten companies (having ex-coworkers at the companies submit my resume) and doing formal interview loops with Microsoft, Facebook, Amazon, and Google. I interviewed well and got offers from all four. My blog had had some “hits” at this point and I’m sure my increased visibility from this was helpful. Companies hate hiring an unknown quantity and my blog proved some of my abilities.

I also remember one particular interview, at Google, with Steve Yegge. Steve asked me about the various steps a compiler goes through when consuming a source file. One of those steps was lexing, and I didn’t know that, ‘cause I missed compiler class due to failing out of university. But while I didn’t know that about compilers there was a lot that I did know, and I managed to change the subject to link-time-code-generation. I spent a good chunk of the interview explaining how this works and its non-obvious benefits and I suspect that I got a glowing review based on this lucky save.

I made sure that all of the companies knew that I was talking to the others, to ensure that they would give their best offers. Then I negotiated for a little bit more, because why not? I also used the rhetorical device of “this offer looks great to me, but my spouse is really concerned about the cost of sending our kids to university so…” – it doesn’t hurt.

I agonized over the choice for a while until I realized that Google offered the shortest commute, had an on-site gym and free food, the best compensation, the most promising job opportunities, and an engineering culture that I admired. Done.

Google lived up to their promise. I was given the freedom to focus my attention where it seemed worthwhile, which included chasing bugs that I discovered, along with those sent to me by my peers. I found serious bugs in Windows, Visual Studio, Chrome, and a host of third-party software. Working at Google/Chrome scale was an amazing experience – it exposed issues that would normally never be seen. Just a few of the hundreds of issues I had the pleasure of uncovering or fixing include:

I stayed at Google for ten years, continuing my pattern of not shipping any features. Google was a great fit for me because they let me be self directed (unlike my later years at Microsoft) and they appreciated my work (unlike Valve).

If you stay too long at one company then your compensation may stagnate, because companies are happy to pay you below market rates if you do nothing to prevent this. In my case my sign-on stock grant had become quite valuable over my first four years as Google stock rocketed upwards and when those monthly grant vests stopped after four years my compensation dropped significantly. Even though I had received refresher grants every year the economic reality was that that sign-on grant was bigger, and granted at a lower stock price, and stock grants had become the majority of my compensation. I was still being paid well, but I knew that I could get a bump back up by moving somewhere else and getting a new sign-on grant.

I didn’t want to leave Google, and they didn’t want to give me more stock. Unless they had to. So the advice I was given was to interview elsewhere, get an offer, give it to my manager and see what happened. You can disapprove if you want, but this is the game that you must play if you want to maximize earnings. I interviewed at <company X> that I did not want to work at, got an offer, and Google played their part in this charade by making a counter offer which I happily accepted. I could have done this game again when the second large stock grant finished vesting, but I lacked the stomach for it and I was almost done at that point anyway.

Retirement – 2024 to ???

After my wife died in 2023 I reassessed my life and decided that I wasn’t enjoying work enough anymore. I wasn’t finding amazing problems to solve as frequently. I had cut back my hours to improve my work-life balance but this meant that meetings – which continued at the same rate – became a larger percentage of my work time. I was working from home in a new city (we moved back to Vancouver in 2022) and I wanted to focus my time on hobbies and making friends, rather than working alone. I talked to my financial adviser and they advised me that I could stop if I wanted to. I took a three-month leave of absence as a trial run, loved it, and quit on my ten-year anniversary. It’s been over a year now and, to the surprise of me from five years earlier, I don’t miss work at all.

In hindsight I got out at an excellent time. I have no enthusiasm for AI and I avoided having to learn it, or compete with it. I’m loving the freedom to play tennis whenever I want and take vacations without counting the days.

Ruminations

As I realized that I preferred fixing code rather than writing code I realized that this meant that it was critical that I work at a large company. With the big teams at Microsoft and Google there were always enough new and interesting issues being created to keep me busy. That plus the huge number of customers and the importance of improving the build pipeline meant that I could always add value. If I could make Chrome build five percent faster, or make Chrome use a few percent fewer CPU cycles, or crash slightly less frequently then I could justify my existence (and sometimes the change was much more than a few percent) by improving the experience for large numbers of coworkers and enormous numbers of users. In a smaller company I would have needed to buckle down and write some features – and I’d mostly forgotten how to do that. The correct company size is a very personal choice but for me “large” was the only option.

My desire to focus on large companies meant that a startup was never going to happen, and that’s fine. The theoretical riches of startups almost never pan out – the expected value is really not worth it.

At my first couple of programming jobs I worked long hours due to pressure from management. This was probably okay because it helped me master my craft a bit faster. As I progressed through my career I mostly pulled back on the hours worked. I don’t know exactly how much I was working at the end, and it did depend on how interesting a puzzle I was solving, but somewhere around 40 hours a week I think. The more I got paid, the less I worked. Or vice-versa.

The point of that, I guess, is that you should be sure that you don’t sacrifice vacations, family time, and hobbies to your job. Taking the time to have friends and a diverse range of activities is crucial.

Skills learned

Software development requires constant reinvention. New languages are a given, but so are new techniques, so plan to stay open to keeping up as long as you are working. Some skills that I learned over the years include:

  • C, C++, Python, assembly languages (6502, 68000, PowerPC, x86, x64, ARM)
    In addition to learning these languages I learned, at Google, that with enough test coverage, code reviewing, and pattern matching it is possible to fix bugs in languages that you do not know at all
  • While I never learned parsing, lexing, or other compiler implementation skills I did learn how to “think like a C compiler” which turns out to be extremely useful when understanding how to make a compiler generate efficient code. A lot of this comes down to realizing what optimizations the compiler can do, and which (perhaps due to aliasing) it cannot.
  • Version control, various types
  • Windbg and Visual Studio arcana
  • ETW, perf, various other profiling tools
  • Various Xbox and Xbox 360 profiling tools
  • So much more

Lessons learned:

  • Interview practice matters. If you’re not naturally good at interviews you will be at a disadvantage but you can try to mitigate that
  • Talk to other developers about compensation. You can read compensation guides all day but it’s more useful to talk to developers at your level and above and ask them how they are being paid
  • Don’t stay too long at any one job. Or, at least, don’t go too long without interviewing
  • Keep learning. Always. New languages, tools, techniques, everything.
  • Have fun, both in work and outside of work. Don’t save all your enjoyment for retirement
brucedawson
http://randomascii.wordpress.com/?p=4211
Reflections on My Tech Career – Part 1

I’ve been lucky enough to have had a successful career as a software developer. Spanning six companies and thirty-seven years I’ve had the opportunity to work on Elastic Reality, Xbox, Windows, Steam, Internet Explorer, dozens of games, and Chrome, and create a blog whose investigative reporting and ETW tutorial articles have been read over five million times.

I was recently explaining to a friend how I got my start in this industry and I decided that I should write it down, if only to clarify in my mind how it all happened. I don’t necessarily recommend my career path, but there may be other people out there for whom it is the right way to go. Or, more likely, perhaps there will be a few ideas in here that could inspire somebody’s career path or help them avoid mistakes.

Here goes.

This is the first part of the story of how my career as a software developer unfolded (part 2 is here). In this half I learn my craft, work at my first couple of tech jobs, and travel.

Learning to Program – 1983 to present

The most important initial detail is that I am a university dropout. During second-year university I stopped going to school. I lacked a support network of friends and was probably depressed and lonely and the more classes I skipped the harder it got to resume attending. I was also busying myself with disassembling the Apple ][ ROMs and other experimental programming projects. At the end of the year UBC politely told me that I was not permitted to continue attending. Oops. My parents were horrified.

Luckily through some convenient nepotism (thanks Lance!) I got an unskilled union desk job in a time when living in Vancouver was affordable. I made friends at work and got out of my lonely and depressed stage. The job paid well enough that I didn’t need to work a full 40 hours so I had lots of spare time.

I bought an Amiga computer and spent all my free time reading the Amiga, 68000, and 68881 reference manuals, C language tutorials, and whatever other sources of information there were in the pre-internet mid-80s. My dream was to become a video-game programmer. I spent hours on my Amiga and also continued programming on my 6502-based Apple ][ and at some point around 1986 I interviewed for a job at a local game-programming company called Distinctive Software.

They turned me down. I didn’t have enough experience or education.

At this time I discovered the Mandelbrot set. This is an infinitely detailed mathematical object that can be fascinating and beautiful to explore, but doing so was very compute intensive, especially on circa-1986 hardware. It was particularly slow if you used software floating-point to do the calculations, which all of the explorer programs were doing. I immediately knew that by discarding the flexibility of a floating-point exponent and using fixed-point numbers (exactly three bits to the left of the binary point) and by coding in assembly language I could make things run at least an order of magnitude faster. Note that this project and my subsequent one were both done in collaboration with a friend but I’ve omitted any other mention of him to simplify the story – sorry Steve!
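
To make the fixed-point idea concrete, here is a minimal sketch of the Mandelbrot iteration using integers with 13 fraction bits, which leaves three bits (including the sign) to the left of the binary point in a 16-bit word. It is an illustration of the technique, not a translation of the original program: the names are made up for this example, and Python integers never overflow, so the wrap-around behaviour of real 16-bit registers is not modelled.

```python
FRAC_BITS = 13            # 16-bit word = 3 integer bits (incl. sign) + 13 fraction bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
FOUR_SQUARED_SCALE = 4 << (2 * FRAC_BITS)  # 4.0 at the scale of a raw product


def to_fixed(x):
    """Convert a float to the 3.13 fixed-point format used in this sketch."""
    return int(round(x * ONE))


def mandelbrot_iterations_fixed(cr, ci, max_iter=32):
    """Iterate z = z*z + c with fixed-point integers.

    cr, ci: real and imaginary parts of c, already converted with to_fixed().
    Returns the iteration at which |z|^2 first exceeds 4, or max_iter.
    """
    zr = zi = 0
    for i in range(max_iter):
        zr2 = zr * zr     # raw products carry 2 * FRAC_BITS fraction bits
        zi2 = zi * zi
        if zr2 + zi2 > FOUR_SQUARED_SCALE:
            return i
        # Shift products back down to FRAC_BITS before adding c.
        zi = ((2 * zr * zi) >> FRAC_BITS) + ci
        zr = ((zr2 - zi2) >> FRAC_BITS) + cr
    return max_iter
```

The only operations in the loop are integer multiplies, shifts, adds, and a compare, which is what makes this approach so much cheaper than software floating-point on a processor without an FPU.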

Estimating

At this point I leveraged a skill that would serve me well throughout my career. I wrote up what the basic assembly-language calculation would look like, precisely estimated how many clock cycles it would take for that loop to execute (easy in those in-order processing days), estimated how many loops would need to execute to render an initial image, and on the back of an envelope calculated that the first image should take about fifteen seconds. That was enough faster than the many minutes for other programs that I knew that I had a viable product before I typed in the first line of code. Here is the critical inner loop for calculating the Mandelbrot set to 16 bits of precision, with cycle timings as comments:

loop16
         LSL.L   #4,zi                   ;4
         SWAP    zi                      ;4
         ADD.W   ci,zi                   ;2
         SUB.L   isquared,zr             ;2
         LSL.L   #3,zr                   ;4
         SWAP    zr                      ;4
         ADD.W   cr,zr                   ;2
entrypoint16
         MOVE.W  zi,isquared             ;2
         MULS    isquared,isquared       ;28
         MULS    zr,zi                   ;28
         MULS    zr,zr                   ;28
         MOVE.L  isquared,temp           ;2
         ADD.L   zr,temp                 ;2
         CMP.L   four,temp               ;2
;;;;    DBHI    counter,loop16          ;6/10/6
; Technically should be DBHI, thus keeping 2.0 in the loop,
; but +2.0^2 doesn’t quite fit.
         DBCC    counter,loop16          ;6/10/6
;;;;    —————                 ———-
;;;;    13 instructions                 116 cycles, 84 from three multiplies

My original cycle timings would have been for the 68000 processor but all the source-code copies I can find have updated timings that appear to be for the 68020. The multiplies, in particular, would have been 38+2*n cycles (where ‘n’ depends on the bit patterns) and the total time would have been somewhere around 200 clock cycles. 68000 instruction timings can be found here.
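For readers who don’t read 68000 assembly, the arithmetic in that loop can be sketched in C++. This is a rough equivalent, not the MandFXP source: it assumes 16-bit inputs with 13 fractional bits (three bits, sign included, left of the binary point), it keeps z in wide integers so nothing wraps the way 16-bit registers would, and the names `ToFixed` and `MandelIterations` are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// 16-bit fixed point: 3 bits (sign included) left of the binary point and
// 13 fractional bits, so representable values span [-4.0, 4.0).
constexpr int kFracBits = 13;

// Converts a double in [-4.0, 4.0) to the 3.13 format (illustrative helper).
constexpr int16_t ToFixed(double v) {
  return static_cast<int16_t>(v * (1 << kFracBits));
}

// Iterates z = z*z + c and returns how many iterations completed before
// |z|^2 reached 4.0, or max_iters if the point never escaped.
int MandelIterations(int16_t cr, int16_t ci, int max_iters) {
  assert(max_iters >= 0);
  int64_t zr = 0, zi = 0;  // 64-bit so intermediate values cannot overflow
  for (int i = 0; i < max_iters; ++i) {
    // The three multiplies; each product carries 26 fractional bits.
    const int64_t rr = zr * zr;
    const int64_t ii = zi * zi;
    const int64_t ri = zr * zi;
    // Escape test at 26 fractional bits, where 4.0 is 4 << 26.
    if (rr + ii >= (int64_t{4} << (2 * kFracBits))) return i;
    // zi = 2*zr*zi + ci: shifting by one bit less performs the doubling.
    // (Right-shifting a negative value is arithmetic on mainstream
    // compilers and guaranteed to be so since C++20.)
    zi = (ri >> (kFracBits - 1)) + ci;
    // zr = zr^2 - zi^2 + cr.
    zr = ((rr - ii) >> kFracBits) + cr;
  }
  return max_iters;
}
```

For example, `MandelIterations(ToFixed(1.0), 0, 100)` returns 2: z goes 0, 1, 2, and |z|² reaches 4.0 on the third check. The shift amounts mirror the `LSL.L #4` and `LSL.L #3` followed by `SWAP` in the assembly, where the extra bit on the imaginary part is the multiplication by two.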

When I got an initial version of my program running and the first image took over a minute to render I trusted my estimates and knew that there must be something wrong with my code. I quickly found that the WritePixel() calls were consuming the extra 45 seconds so I learned how to batch those up and I got the predicted performance. This served as a powerful lesson in the value of knowing how fast something could theoretically run. Only then can you know when no further speedups are possible.
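The envelope arithmetic itself is short enough to write down. In this sketch only the figure of roughly 200 cycles per iteration comes from the text above. The 7.16 MHz clock is the stock NTSC Amiga 68000 speed, and the 320×200 image with an average of 8 iterations per pixel is a made-up workload chosen for illustration, not the numbers from my original envelope.

```cpp
#include <cassert>

// Predicted run time for a loop-dominated workload on an in-order CPU.
double EstimateSeconds(double pixels, double avg_iterations_per_pixel,
                       double cycles_per_iteration, double clock_hz) {
  assert(clock_hz > 0.0);
  const double total_cycles =
      pixels * avg_iterations_per_pixel * cycles_per_iteration;
  return total_cycles / clock_hz;
}

// Hypothetical workload: 320x200 pixels, 8 iterations per pixel on average,
// 200 cycles per iteration, 7.16 MHz clock.
double ExampleEstimate() {
  return EstimateSeconds(320.0 * 200.0, 8.0, 200.0, 7.16e6);
}
```

With these assumed inputs the estimate lands a little over 14 seconds. A measured time several times larger than that points at overhead outside the inner loop, which is the kind of gap that made the WritePixel() cost stand out.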

As an aside, Arthur C. Clarke briefly became obsessed with fractals so I have a letter from him asking for a copy of MandFXP. He then mentioned it in the sources and acknowledgments section of his odd Fractal/Titanic cross-over book, which led to a few extra sales.

Here’s the mention from Clarke’s The Ghost From The Grand Banks. Don’t try contacting me at that address. I gave up that PO box 30 years ago.

Clarke’s letter included his phone number so I called him once on a dare at 2 am (mid-afternoon in Sri Lanka) and we briefly spoke, talking about fractals and 2001: A Space Odyssey.

I also wrote high-precision math routines to allow zooming arbitrarily deeply (more or less) into the Mandelbrot set.

Aside: the 68000 version of my high-precision code could accumulate a 16×16 multiply result every 80 clock cycles. In 2011 an x64 processor could accumulate a 64×64 multiply result every 3 clock cycles. That’s 16x as much work per block, running 26.67x as fast per clock, running at 446x as high a clock speed, for a throughput increase of over 190,000 times. Plus more speedup from multiple cores, so call it a million to one. Ain’t progress grand?
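The multiplication behind that figure, written out as a small sketch. The function names are just for illustration; the inputs are the numbers quoted above.

```cpp
#include <cassert>

// Combined speedup = (work per multiply) x (multiplies per clock)
//                    x (clocks per second).
double ThroughputRatio(double work_ratio, double old_cycles_per_op,
                       double new_cycles_per_op, double clock_ratio) {
  assert(new_cycles_per_op > 0.0);
  return work_ratio * (old_cycles_per_op / new_cycles_per_op) * clock_ratio;
}

// 64x64 versus 16x16 bits is (64*64)/(16*16) = 16x the work per multiply,
// 80 cycles versus 3 cycles per multiply, and a 446x faster clock.
double AmigaToX64Ratio() {
  return ThroughputRatio((64.0 * 64.0) / (16.0 * 16.0), 80.0, 3.0, 446.0);
}
```

`AmigaToX64Ratio()` comes out at a bit over 190,000, matching the single-core figure before any multi-core multiplier is applied.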

Once I had a Minimum Viable Product called MandFXP (Mandelbrot FiXed Point) I took it to the local Amiga User’s Group meeting to demonstrate it. The reaction told me that I had a hit on my hands, and I did indeed turn the MVP into a shareware product that I sold many copies of. But that wasn’t the most important outcome.

A game programmer from Distinctive Software (hi Mike!) was at the meeting. He saw my demonstration and recognised that there was some raw talent behind this. I was now a much safer bet than when I had interviewed a year or so earlier and a job offer was quickly made. This was a good thing because at this point in my life I was unemployed.

CygnusEd

Around the same time I had written a text editor, CygnusEd, for Amiga programmers. This text editor had blazing fast text rendering, smooth scrolling, and some user-interface improvements that I felt were important. I’d been so confident in its success that I’d quit my job to focus full-time on promoting it, and it turned out I was a terrible promoter. Although CygnusEd eventually developed a cult following and strong sales, in 1987 it was a commercial flop, so when this job offer showed up it was particularly fortuitous.

But first I had a choice to make. In addition to being a software developer I was an amateur juggler and unicyclist. I harboured dreams of turning pro and I had applied to go to the circus school in Montreal. Around the same time that I got the job offer I also got a circus-school acceptance letter.

I spoke no French, I had no savings, and (in hindsight) I was a strictly mediocre juggler while I was a fairly promising programmer. I chose the game-programming offer, and thank God. I would have been a failed circus performer, so programming was definitely the correct choice.

Distinctive Software – 1987 to 1990

At Distinctive Software I worked on Test Drive, Test Drive II, and Grand Prix Cycles. The Amiga was powerful enough (compared to the PC, Apple ][ and Commodore 64 which also ran these games) that I could code mostly in C and still end up with games that were more colourful and ran faster than the other platforms. I even played around with the Amiga’s custom Copper chip to program the video display to do beautiful 60 fps transitions between menu screens while using almost no CPU time – the closest I ever came to programming a classic game console.

While working at Distinctive Software I continued to develop CygnusEd. I used it at work, as did many of my coworkers, and it slowly attracted more attention. Eventually it caught the attention of ASDG, a publisher of Amiga software. They worked with me to create a higher-quality manual and got CygnusEd in stores. They raised the price and I started getting royalty cheques. I wasn’t making a huge amount of money at Distinctive Software (about the same as my union job) and sometimes the monthly royalty cheques from CygnusEd were larger than my monthly salary.

Travel – 1990 to 1992

I’d been intending for a long time to try my hand at travelling the world, inspired by my ex-girlfriend (hi Heather!). My goal was to backpack around the world for two years. In early 1990 I suddenly realised that I’d saved up enough money to make this dream a reality. The only problem was that I’d just met a woman who I kinda sorta liked, and I didn’t like the idea of choosing between this trip and her. And so it was that about four months after we had met I asked Helen to quit her job, sell her car, give up her apartment, and put all of her stuff in storage. Totally normal stuff. Amazingly enough she agreed and by the end of the year we were in Sydney, Australia.

ASDG continued sending me royalty cheques while we were traveling. This was quite frustrating for them because it meant that they were funding my continued travels while what they really wanted was for me to come home and release a new version of CygnusEd.

We traveled for two years. I mention this because it’s important to remember that a successful career isn’t all there is to life. I’m sure that dropping out of the job market for two years cost us a fortune but it also gave us amazing memories which sustained us for decades. We saw natural and human wonders that fed our souls and after two years of traveling – as planned – we came home to resume normal life (including getting married and having two kids).

ASDG/Elastic Reality – 1993 to 1997

When we returned to Vancouver we could have gotten our old jobs back but that didn’t seem particularly adventurous so at this point I did the world’s worst job search. I contacted the only other software company that I knew – ASDG – and asked for a job. They said yes, they helped me with the immigration paperwork, and we moved to Madison, Wisconsin. I call this the world’s worst job search because it apparently never occurred to us to look for other companies and do a proper series of interviews. I should have talked to Microsoft, for instance, but I did not. Maybe I wasn’t ready, maybe things would have gone worse, but it was pretty sloppy to not even try.

At ASDG I learned Windows, C++, and version control – until then I had no experience with any of these things. Yes, in the late 80s it was possible to ship five different software products without using version control. It was a crazy time.

During this time I also created another fractal program, because why not? Actually two others. The first was called Mand2000 and it was for the Amiga. I wanted to integrate one of the key ideas from CygnusEd – smooth transitions between states – into fractal exploration. Roughly speaking I felt it was important that when you zoomed in on the image it should animate the zooming of the image and then recalculate. This was a radical idea at the time, and very difficult to make run fast enough on the bit-plane graphics architecture of the Amiga – the blitter chip couldn’t scale graphics. I ended up writing C/C++ code that on-the-fly generated machine code for each magnification level. Fun stuff. I actually roughed out the idea while traveling. Apparently I sent a nerdy friend (hi David!) a letter that included a description of the idea, some hand-written 68000 assembly language, and (of course) some timing estimates. Always be coding.

I also did progressive rendering with 8×8 blocks on the first pass to give a low-res overview of the fractal images far faster than the total rendering time. Again, a fairly unusual idea at the time.

The other fractal program was called Fractal eXtreme and it was my first solo project for Windows. It had all the same bells and whistles of the previous versions, plus the ability to create “zoom movies” (key frames that a custom player could interpolate, with speed adjustable after rendering) to make it more efficient than ever before to create long fractal movies. My two most popular YouTube videos are both Fractal eXtreme zoom movies. I just watched the 4K version and it still blows my mind that the ridiculous complexity of this video comes from an algorithm that can be expressed in two lines of code.

Amazingly enough Fractal eXtreme (with updates for multi-core and 64-bit) still sells about a dozen copies a year, almost thirty years later.

While still at ASDG I also created a new version of CygnusEd. In total these four independently created programs (MandFXP, Mand2000, Fractal eXtreme, and CygnusEd) probably brought in about $200,000, spread out over decades. This was particularly vital money when I wasn’t making much, but the experience I gained by hacking away on these home projects, and the reputation, were even more valuable.

While at ASDG the main product that I worked on was a morphing program called Elastic Reality. This product was so important to ASDG that they eventually renamed the company for it, and then sold the company to AVID. It was used in many movies and TV shows and won a technical Academy Award. The Windows version (not including the underlying morphing engine that was shared between platforms) was created by two of us and, though I didn’t know it at the time, it was basically the last time that I wrote significant amounts of new code to implement a consumer product. From then on my career moved towards developer tools and fixing of code rather than writing it.

For most of my time at ASDG/Elastic Reality I wasn’t making much more than I’d made at Distinctive Software, which had been about the same as at my unskilled union job. The fractal and CygnusEd income helped, but I definitely wasn’t making the Silicon-Valley stock-option money that I’d heard rumours about. This is because employers are happy to pay you as little as possible if you don’t force their hand with competitive interviewing. Well, that and the fact that I was still an inexperienced software developer.

This post is already quite long so I’m going to do the rest of my career as part 2.

Hacker news discussion is here.

Blue sky discussion is here.

brucedawson
http://randomascii.wordpress.com/?p=4184
Finding a VS Code Memory Leak

In 2021 I found a huge memory leak in VS Code, totalling around 64 GB when I first saw it, but with no actual limit on how high it could go. I found this leak despite two obstacles that should have made the discovery impossible:

  1. The memory leak didn’t show up in Task Manager – there was no process whose memory consumption was increasing.
  2. I had never used VS Code. In fact, I have still never used it.

So how did this work? How did I find an invisible memory leak in a tool that I have never used?

This was during lockdown and my whole team was working from home. In order to maintain connection between teammates and in order to continue transferring knowledge from senior developers to junior developers we were doing regular pair-programming sessions. I was watching a coworker use VS Code for… I don’t remember what… and I noticed something strange.

So many of my blog posts start this way. “This doesn’t look right”, or “huh – that’s weird”, or some variation on that theme. In this case I noticed that the process IDs on her system had seven digits.

That was it. And as soon as I saw that I knew that there was a process-handle leak on her system and I was pretty sure that I would find it. Honestly, the rest of this story is pretty boring because it was so easy.

You see, Windows process IDs are just numbers. For obscure technical reasons they are always multiples of four. When a process goes away its ID is eligible for reuse immediately. Even if there is a delay before the process ID (PID) is reused there is no reason for the highest PID to be much more than four times the maximum number of processes that were running at one time. If we assume a system with 2,000 processes running (according to pslist my system currently has 261) then PIDs should be four decimal digits. Five decimal digits would be peculiar. But seven decimal digits? That implies at least a quarter-million processes. The PIDs I was seeing on her system were mostly around four million, which implies a million processes. Nope. I do not believe that there were that many processes.
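That reasoning is just division by four and counting digits. A minimal sketch, assuming PIDs are handed out densely as multiples of four as described above (the function names are illustrative only):

```cpp
#include <cassert>
#include <cstdint>

// A PID near N implies roughly N/4 PID slots in use or retained at once.
uint64_t ImpliedPidSlots(uint64_t highest_pid) { return highest_pid / 4; }

// Highest PID expected if at most max_processes ever existed at one time.
uint64_t ExpectedHighestPid(uint64_t max_processes) {
  assert(max_processes <= UINT64_MAX / 4);
  return max_processes * 4;
}

// Number of decimal digits needed to print a PID.
int DecimalDigits(uint64_t value) {
  int digits = 1;
  while (value >= 10) {
    value /= 10;
    ++digits;
  }
  return digits;
}
```

A generous 2,000 processes gives `ExpectedHighestPid(2000)` of 8,000, which is four digits. The smallest seven-digit PID, 1,000,000, gives `ImpliedPidSlots` of 250,000, and a PID around 4,000,000 implies a million slots.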

It turns out that “when a process goes away its ID is eligible for reuse” is not quite right. If somebody still has a handle to that process then its PID will be retained by the OS. Forever. So it was quite obvious what was happening. Somebody was getting a handle to processes and then wasn’t closing them. It was a handle leak.

The first time I dealt with a process handle leak it was a complicated investigation as I learned the necessary techniques. That time I only realized that it was a handle leak through pure luck. Since then I’ve shipped tools to find process-handle and thread handle leaks, and have documented the techniques to investigate handle leaks of all kinds. Therefore this time I just followed my own recipe. Task Manager showed me which process was leaking handles:

And an ETW trace gave me a call stack for the leaking code within the hour (this image stolen from the github issue):

The bug was pretty straightforward. A call to OpenProcess was made, and there was no corresponding call to CloseHandle. And because of this a boundless amount of memory – roughly 64 KiB for each missing CloseHandle call – was leaked. A tiny mistake, with consequences that could easily consume all of the memory on a high-end machine.

This is the buggy code (yay open source!):

void GetProcessMemoryUsage(ProcessInfo process_info[1024], uint32_t* process_count) {
  DWORD pid = process_info[*process_count].pid;
  HANDLE hProcess;
  PROCESS_MEMORY_COUNTERS pmc;
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, pid);
  if (hProcess == NULL) {
    return;
  }
  if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc))) {
    process_info[*process_count].memory = (DWORD)pmc.WorkingSetSize;
  }
}

And this is the code with the fix – the CloseHandle(hProcess) call just before the closing brace was added to fix the leak:

void GetProcessMemoryUsage(ProcessInfo& process_info) {
  DWORD pid = process_info.pid;
  HANDLE hProcess;
  PROCESS_MEMORY_COUNTERS pmc;
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, pid);
  if (hProcess == NULL) {
    return;
  }
  if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc))) {
    process_info.memory = (DWORD)pmc.WorkingSetSize;
  }
  CloseHandle(hProcess);
}

That’s it. One missing line of code is all that it takes to waste tens of GB of memory.
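One common C++ defence against this class of bug is to tie the close call to scope exit so that it cannot be forgotten on any return path. The sketch below illustrates that idea and is not the fix that VS Code shipped. `ScopedResource`, `FakeOpen`, `FakeClose`, and `QueryMemory` are hypothetical names, with the two fake functions standing in for OpenProcess and CloseHandle so that the example compiles on any platform.

```cpp
#include <cassert>
#include <utility>

// Owns a handle and runs the closer exactly once when the scope ends.
template <typename Handle, typename Closer>
class ScopedResource {
 public:
  ScopedResource(Handle handle, Closer closer)
      : handle_(handle), closer_(std::move(closer)) {}
  ~ScopedResource() { closer_(handle_); }
  ScopedResource(const ScopedResource&) = delete;
  ScopedResource& operator=(const ScopedResource&) = delete;
  Handle get() const { return handle_; }

 private:
  Handle handle_;
  Closer closer_;
};

// Hypothetical stand-ins for OpenProcess/CloseHandle that count open handles.
int g_open_handles = 0;
int FakeOpen() {
  ++g_open_handles;
  return 42;
}
void FakeClose(int /*handle*/) {
  assert(g_open_handles > 0);
  --g_open_handles;
}

// Mirrors the shape of GetProcessMemoryUsage: open, query, maybe bail early.
bool QueryMemory(bool query_succeeds) {
  const int handle = FakeOpen();
  ScopedResource<int, void (*)(int)> guard(handle, &FakeClose);
  if (!query_succeeds) {
    return false;  // the guard still closes the handle here
  }
  return true;  // and here
}
```

After any number of calls to `QueryMemory`, on either path, `g_open_handles` is back to zero. On Windows the same wrapper shape would hold a HANDLE and call CloseHandle, created only after the NULL check that the original function already has.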

The bug was found back when I still used Twitter so I reported my findings there (broken link, cached copy found in the wayback machine) and somebody else then filed a github issue based on my report. I stopped using twitter a couple of years later and then my account got banned (due to not being used?) and then deleted, so now that bug report along with everything else I ever posted is gone. That’s pretty sad actually. Yet another reason for me to dislike the owner of Twitter.

The bug was fixed within a few days of the report. Maybe The Great Software Quality Collapse hadn’t quite started then. Or maybe I got lucky.

Anyway, if you don’t want me posting embarrassing stories about your software on my blog or on bsky then be sure to leave the Handles column open in Task Manager and pay attention if you ever see it getting too high in a process that you are responsible for.

Sometimes I think it would be nice to have limits on resources in order to more automatically find mistakes like this. If processes were automatically crashed (with crash dumps) whenever memory or handles exceeded some limit then bugs like this would be found during testing. The limits could be set higher for software that needs it, but 10,000 handles and 4 GiB RAM would be more than enough for most software when operating correctly. The tradeoff would be more crashes in the short term but fewer leaks in the long term. I doubt it will ever happen, but if this mode existed as a per-machine opt-in then I would enable it.

brucedawson
http://randomascii.wordpress.com/?p=4154
Acronis True Image Costs Performance When Not Used

Over two years ago I installed Acronis True Image for Crucial in order to migrate my data to a new SSD I had just purchased. It worked. I then left True Image installed “just in case”, and what harm could that possibly cause?

Well, funny you should ask.

I recently noticed that whenever I plugged or unplugged my external monitor Explorer.exe would consume a lot of CPU time – dozens of seconds of it. It was enough CPU time to make my computer noticeably sluggish until things calmed down which could take 15+ seconds. “That’s odd” is how most of my investigative reporting starts so I grabbed an ETW trace and drilled in. It didn’t take long to find the culprit.

Aside: I have worked with Acronis to help them understand this issue and they have provided a mitigation and have said that they plan to address the problem in the next release of their software. See “Workarounds and fixes” for details.

In the trace Explorer.exe was using 44 s of CPU time over a 16 s time period (from 7.0 s to 23.0 s in the trace) which is way too much:

image

I opened up CPU Usage (Sampled) to investigate. The CPU usage was distributed across dozens of unnamed threads so I hid the Thread ID column and the Thread Name column in order to group all the threads together and drilled down:

image

I quickly found that windows.storage.dll!CFSFolder::_GetOverlayInfo was consuming a large chunk of the time (20,191 of the 42,299 samples), and most of that was in a call to an unknown function in tishell64_26_0_39450.dll. I temporarily ignored the question of who owned that DLL while I first tried to understand what it was doing.

If you want to follow along you can download the trace I’m looking at and load it into Microsoft’s Windows Performance Analyzer (WPA).

The CPU Usage (Sampled) data works by interrupting all running CPUs 1,000 times a second (by default) and grabbing call stacks. This makes it a powerful tool for understanding where CPU time is being spent. You can read more about how to use this information in Xperf for Excess CPU Consumption.

The 20,191 samples with CFSFolder::_GetOverlayInfo on the stack suggest that approximately 20 s of CPU time was consumed inside that function and its descendants (on that call stack). Approximately 6.6 s of that is in Process32NextW (and its descendants) and approximately 3.1 s in CreateToolhelp32Snapshot (and its descendants). I don’t have symbols or source for the tishell64 DLL but I know what those two Windows functions do so I’ll start with those.
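The conversion from sample counts to CPU time is a single division. A small sketch, assuming the default 1 kHz sampling rate and that each sample stands for about one millisecond of CPU time (function names are illustrative):

```cpp
#include <cassert>

// Approximate CPU seconds represented by a number of profiler samples.
double SamplesToCpuSeconds(double samples, double samples_per_second = 1000.0) {
  assert(samples_per_second > 0.0);
  return samples / samples_per_second;
}

// Fraction of all samples that had a given function on the stack.
double ShareOfSamples(double subset, double total) {
  assert(total > 0.0);
  return subset / total;
}
```

`SamplesToCpuSeconds(20191)` is about 20.2 s, and `ShareOfSamples(20191, 42299)` is a little under half of the samples in Explorer.exe.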

CreateToolhelp32Snapshot grabs a snapshot of system data that could include a list of processes, threads, modules, heaps, etc. Process32NextW is one of the functions used to iterate through the snapshot and its presence tells us that TH32CS_SNAPPROCESS was specified. So, the tishell64 DLL is grabbing a list of running processes and iterating through that list.

The CPU Usage (Sampled) data gives you an approximation of how much time is spent in different call stacks but it cannot differentiate between a small number of expensive function calls and a large number of cheap calls. That is, I couldn’t tell whether CreateToolhelp32Snapshot and Process32NextW were expensive, being called too frequently, or a bit of both.

I decided to investigate this by attaching the Visual Studio debugger to Explorer.exe and setting a breakpoint on kernel32.dll!CreateToolhelp32Snapshot. I set this as a conditional breakpoint that would only halt after being hit one billion times because I didn’t actually want Explorer.exe to halt in the debugger – I just wanted Visual Studio to count how many times the breakpoint was hit. The breakpoint settings looked like this:

image

Debugging Explorer.exe made me nervous because if Visual Studio’s debugger tried to invoke some Explorer.exe functionality while Explorer was halted at a breakpoint then I could end up with a deadlock. But, it worked! I had to tell Visual Studio not to stop on some sort of COM exception that Explorer.exe was throwing, but after that things went smoothly. Visual Studio doesn’t update the hit count while the debuggee is running so after plugging or unplugging my external monitor I would use Debug -> Break All to temporarily break into Explorer.exe to see the count.

Results varied but with my setup (three Explorer windows open) I would see anywhere from 1,200 to 3,000 hits on the CreateToolhelp32Snapshot breakpoint:

image

With no Explorer windows open I would still see 44 hits on the breakpoint, so 44 calls to CreateToolhelp32Snapshot. Now, without symbols or source code for the tishell64 DLL I can’t say what is going on but I will say that I don’t understand why a shell extension would need to get a list of running processes even a single time. That sort of functionality is useful for debugging and development tools but it seems unusual – downright strange in fact – for it to be called in this context.

Calling CreateToolhelp32Snapshot once is strange. Calling it up to 3,000 times because a monitor is plugged or unplugged is the problem.

How many times CreateToolhelp32Snapshot is called seems to depend on how many Explorer windows are open (I use three) and perhaps on how many icons are visible (my Downloads folder shows many) and then the cost of CreateToolhelp32Snapshot presumably depends on how many processes are running. With my system under its normal load of processes I saw this path consuming up to 32 CPU seconds in explorer.exe when I unplugged my external monitor.

The total cost from the tishell64 DLL is greater than this, however. I noticed the tishell64 DLL on some other call stacks so I used WPA’s View Callers By Module feature to group all samples with the tishell64 DLL present on the stack:

image

This showed that the actual cost from the tishell64 DLL was somewhere around 26 CPU seconds in my initial trace. In one torture-test trace the total cost from the tishell64 DLL was more than 60 CPU seconds! That is an enormous amount of CPU time for just unplugging my external monitor.

Who dunnit?

Now it’s time to find out who owns the tishell64 DLL, although the title of this blog post is a bit of a spoiler.

In WPA’s Graph Explorer I expanded System Activity and then Images and then double clicked on Lifetime By Process, Image, which gives me this view:

image

There are 370 (!!!) DLLs loaded into Explorer.exe so I dragged the File Version column to the left of the Image Name column, sorted by that column, and then expanded explorer.exe and the blank and <Unknown> file versions to get this view:

image

That’s 11 DLLs that are lacking file version information. This strikes me as very sloppy – who would ship all these DLLs with this important information missing? I added the Image Path column and now we can see where all of these DLLs live:

image

In particular the tishell64 DLL is located in “C:\Program Files (x86)\Acronis\TrueImageHome” and we can therefore assume that it is published by Acronis as part of Acronis True Image, and running sigcheck on it verifies this.

Why is Process32NextW expensive?

The ETW profiling data that showed me that Process32NextW() is consuming lots of CPU time also lets me see exactly where it is spent. It’s mostly in mapping and unmapping sections, and some page faults (probably from this). Maybe it could be faster, but optimizing it is almost certainly the wrong place to spend resources. It just shouldn’t be called this frequently, and if it was called a thousand times less frequently (very practical) then its performance wouldn’t matter.
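To make “called a thousand times less frequently” concrete, here is one way it could be done: cache the process list and reuse it until it is older than some limit. This is a sketch of a general technique, not a description of how Acronis addressed the issue. `CachedProcessList` is a hypothetical class, and the enumerator is passed in as a callback so the example does not depend on the Windows toolhelp API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Reuses one expensive enumeration for every caller within max_age.
class CachedProcessList {
 public:
  using Clock = std::chrono::steady_clock;
  using Enumerator = std::function<std::vector<std::string>()>;

  CachedProcessList(Enumerator enumerate, Clock::duration max_age)
      : enumerate_(std::move(enumerate)), max_age_(max_age) {
    assert(enumerate_);
  }

  // Returns the cached list, refreshing it only if it is missing or stale.
  const std::vector<std::string>& Get(Clock::time_point now = Clock::now()) {
    if (!valid_ || now - fetched_at_ >= max_age_) {
      names_ = enumerate_();
      fetched_at_ = now;
      valid_ = true;
    }
    return names_;
  }

 private:
  Enumerator enumerate_;
  Clock::duration max_age_;
  std::vector<std::string> names_;
  Clock::time_point fetched_at_{};
  bool valid_ = false;
};
```

With a one-second `max_age`, 3,000 overlay queries that arrive in the same burst trigger a single enumeration instead of 3,000. The tradeoff is that callers may see a list that is up to `max_age` out of date, which seems harmless for drawing icon overlays.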

In other words, I don’t care why it is expensive.

Workarounds and fixes

I was able to reach out to Acronis and talk to one of their representatives about this issue. It was a slow process (time zones!) but they shared symbols for the tishell64 DLL to help us understand what was going on. Now that they are aware of the process-enumeration issues they plan to address the problem in the next release.

Until then the process-enumeration code can be disabled by deleting the following registry key:

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\
Explorer\ShellIconOverlayIdentifiers\     AcronisDrive

Note that the key name has five spaces at the start of it.

The other tishell64 costs seem to have already been fixed in the latest versions of Acronis True Image. However if you are using the version from Crucial then all of the performance issues are still there. I tested with the most recent version on Crucial’s website and all of the issues are present. I have reached out to Crucial. Their first response: “We have not encountered any customers experiencing issues with the Acronis free version available on the Crucial website.” After a bit more pushing I got “Our team is looking into this matter, and we will provide any relevant updates as soon as possible.” – here’s hoping.

Personally I have mitigated these performance issues in the simplest and most effective way – I uninstalled Acronis True Image. If you are running the version distributed by Crucial or some other potentially out-of-date version then I recommend this.

Missing metadata

I ran sysinternals’ sigcheck on the 11 DLLs with no version listed and… 10 of them have the Publisher listed as Microsoft, and one as Acronis International GmbH.

These files are also missing Product Name, Company Name, and Product Version in the ETW fields and much of this information is also missing from the sigcheck output. My tests were on Windows 10 but Windows 11 still shows 8 Microsoft DLLs in Explorer.exe that don’t have File Version filled out for ETW to record. There really should be automated checks to make sure that appropriate metadata is added to the bits that Microsoft ships. I’ve reported this before but it’s not clear that there has been any progress.

Acronis also needs to fix this missing metadata, but really, that is the least of their problems. What Acronis needs to do is to either iterate through the list of running processes orders of magnitude less frequently or, better yet, not at all.

Conclusion

Acronis True Image is iterating through a list of running processes many times – sometimes thousands of times – whenever my monitor is plugged or unplugged. It probably does this same wasteful iteration in other situations as well. This iterating wastes dozens of seconds of CPU time, wasting battery life and making my computer sluggish while this is happening. That’s the bug.

The version of Acronis True Image that is distributed by Crucial was last updated July 20, 2020 (as of September 16, 2025). It’s five years old. I have attempted to contact Crucial about this to get them to update their software but I don’t have high hopes. I guess I’ll just have to keep uninstalling it as soon as I’ve used it.

Don’t look behind the curtain

Attaching a debugger to Explorer.exe feels like looking into a filthy basement that is being used for packaging of medical supplies. It seems like Explorer should be clean and tidy because otherwise we risk having our computers be unstable but the reality is a busy stream of C++ and COM exceptions and warnings about unsupported interfaces and invalid window handles. The exceptions may be working-as-intended but I am dubious about the other errors. Here is some typical debug spew that I saw when I had the debugger connected:

Exception thrown at 0x00007FFF89C4B699 in explorer.exe: Microsoft C++ exception: Platform::COMException ^ at memory location 0x0000000002CED670.
Exception thrown at 0x00007FFF89C4B699 in explorer.exe: Microsoft C++ exception: [rethrow] at memory location 0x0000000000000000.
Exception thrown at 0x00007FFF89C4B699 in explorer.exe: Microsoft C++ exception: [rethrow] at memory location 0x0000000000000000.00007FFF53D62633: (caller: 00007FF7BC39E620) ReturnHr(159) tid(b138) 80070578 Invalid window handle.
     Msg:[Platform::Exception^: Invalid window handle.
pcshell\shell\applicationframe\frame\lib\titlebar.cpp(5549)\ApplicationFrame.dll!00007FFF533D74AE: (caller: 00007FFF533EEA09) ReturnHr(10) tid(4ae4) 80070490 Element not found.
shell\lib\gitreglist.cpp(53)\twinui.dll!00007FFF534B6AEA: (caller: 00007FFF534FD9B8) ReturnHr(133) tid(678) 80004002 No such interface supported

I also noticed in the File I/O graph that Explorer.exe creates (opens) “C:\Program Files (x86)\Acronis\TrueImageHome\ti_managers_proxy.dll” 4,663 times during one of its busy times, representing 79% of its Create calls. I’m not sure what’s going on here and I’m not sure it actually “matters” but this seems wasteful. I am guessing that this is being done by the Acronis code but I didn’t actually check.

Discussion

Hacker news discussion is here

brucedawson
http://randomascii.wordpress.com/?p=4143
Google Maps Doesn’t Know How Street Addresses Work
Bugs, Rants, cartography, Google, Google Maps, maps

(or actually they do, but they don’t use this knowledge effectively)

Update, April 26, 2025: the address fix for W 6th Ave is live, mostly. Going forward I wish that Google Maps would make it harder to get bad data into maps, I wish they would respond to feedback faster, and I wish they would make their estimates for when changes go live more accurate (two weeks versus 24 hours).

I was driving around Vernon, BC a few weeks ago and I asked Google Maps for directions to 3207 30th Ave. It confidently told me where to go but luckily my passenger noticed that it was actually directing me to 3207 34th Ave, four blocks north. Well that’s odd.

A few days later my cousin asked me (as the ex-Google still-nerd member of the family) if I could help with a Google Maps issue. The problem was that the address 138 W 6th Ave in Vancouver was being mapped at a location 2.4 km (that’s 1.5 miles or 12 furlongs) away from the actual location.

I could visualize the absurdity of where it maps the W 6th Ave address by asking Google Maps for directions between 136 W 6th Ave and 138 W 6th Ave. These addresses are adjacent in real life, but Google Maps gave me this:

image

That’s a long walk to get to the building next door.

There’s another fun way to visualize this bug. Search for “Clark & Page Casting Studios” in Google Maps. Then copy its address, shown in Google Maps, to the clipboard and ask for directions to Clark & Page Casting Studios from its address. This should be a zero-meter walk, but of course it isn’t. Instead it is, no surprise, a 2.4 km walk from Clark & Page Casting Studios to its address. Fun!

Or this silliness. If you navigate from “138 W 6th Ave Unit 1B” to “138 W 6th Ave #2b” then it is, you guessed it, a 2.4 km walk.

This error was pointed out to me because apparently aspiring actors kept going to the wrong place and being late for their auditions. These mistakes have real-world consequences.

There are more

Finding one error is curious, but two suggests a pattern. I started browsing Google Maps looking for addresses that seemed out of place. I quickly found three more.

1951 W 19th Ave in Vancouver is mapped at a 2.1 km walk from where its address should logically be. It should be in the 1900 block of W 19th Ave but is instead placed ten blocks away by Google Maps:

image

1355 W 17th Ave, North Vancouver is a particularly odd case because it is mapped in the wrong city (Vancouver instead of North Vancouver) and in the wrong block (the 900 block instead of the 1300 block), although on the right street (W 17th Ave). As it turns out W 17th Ave doesn’t actually exist in North Vancouver. What is going on?

Typos? Street View?

The answer might be typos. 138 W 6th Ave is being mapped at the location where I would expect to find 1038 W 16th Ave located – a pair of single-digit errors. This requires that somebody/something made two errors when entering the address for 1038 W 16th Ave. The problem with this explanation is that 1038 W 16th Ave doesn’t exist – I cycled over there to check and the addresses go straight from 1020 to 1040.

3207 30th Ave in Vernon got a 30 changed to a 34. Maybe that was a typo?

1951 W 19th Ave is mapped where I would expect to find 951 W 19th Ave. This is another single-digit error. This one is less harmful because (again, I cycled over to check) there is no 1951 W 19th Ave, and 1951 and 951 W 19th Ave both map to roughly the same place. If you ask for directions from 951 to 1951 W 19th Ave (which should be ten blocks) you get these 0.0 km directions:

image

1355 W 17th Ave, North Vancouver is harder to explain. It was mapped adjacent to 979 W 17th Ave, Vancouver. This error severely stretches the definition of “typo” since nothing but the street name is correct (Vancouver and North Vancouver are different cities, separated by Vancouver Harbour).

I also noticed an anomaly in 5 Montcalm St, Vancouver. This address is in the 1300 block of Montcalm so the address makes no sense. I visited this location as well and the building address is actually 1131 W 16th Ave (the house is on a corner) and there is a five on one of the doors on the Montcalm side. Further creeping around the house revealed that there are five units inside the house – the five is a unit number, not a street number! Now I started wondering if a person or AI had seen the five on the door on Montcalm St and assumed that it was an address.

PXL_20250424_173954222

Internals guesswork

The fact that Google Maps can have these errors – that apparently the mapped location of addresses need have no relationship to the layout of the city’s streets – makes it clear that Google Maps has no concept of how street addresses work. There are many rules for how most addresses work in Vancouver but Google Maps appears to have no knowledge of these rules.

It appears that there is an address database somewhere – created by Google Maps, or the cities in BC, or perhaps from Street View data. Somehow that database seems to allow addresses to be mapped to parcels of land and when the address of a parcel of land is entered (by a human being or an AI bot) the database software happily accepts any address and maps it to the parcel, with no sanity checks to make sure it makes sense. Possibly sanity checks that are needed include:

  • Is the parcel in the geographical bounds of the city name entered?
  • Is the parcel in the vicinity of the road name entered?
  • Is the parcel in the correct hundred block for the road name entered?

These checks would detect all five of the errors that I found.

The hundred-block check only makes sense in some cities. In others it might be better to just do a comparison with nearby numbers, or perhaps skip that check completely. And there are enough weird addresses in the world that these checks probably just have to be a suggestion rather than a hard blocker.
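As a rough illustration of the hundred-block check, here is a minimal Python sketch. The function name, the input (a list of house numbers already mapped to nearby parcels on the same street), and the tolerance are all hypothetical; nothing here reflects how Google Maps actually stores or validates addresses. It returns a warning instead of rejecting the entry, in keeping with the idea that these checks should be a suggestion rather than a hard blocker.

```python
def hundred_block_warning(house_number, neighbour_numbers, tolerance_blocks=1):
    """Return a warning string if house_number looks out of place, else None.

    neighbour_numbers: house numbers already mapped to nearby parcels on the
    same street (hypothetical input; the real data model is unknown).
    tolerance_blocks: how many hundred-blocks of slack to allow.
    """
    if not neighbour_numbers:
        return None  # nothing to compare against, so stay silent

    block = house_number // 100
    neighbour_blocks = sorted({n // 100 for n in neighbour_numbers})
    lo, hi = neighbour_blocks[0], neighbour_blocks[-1]

    if lo - tolerance_blocks <= block <= hi + tolerance_blocks:
        return None
    return (f"{house_number} is in the {block * 100} block, but nearby parcels "
            f"are in the {lo * 100} to {hi * 100} blocks")


# 1951 W 19th Ave was mapped where 951 W 19th Ave should be.
# The neighbouring numbers below are made up for illustration.
print(hundred_block_warning(1951, [935, 951, 967]))
# A plausible address produces no warning.
print(hundred_block_warning(951, [935, 967]))
```

A batch job could run a check like this over every entry and queue the warnings for human review.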

Since there are apparently a lot of these bad addresses in the wild (my ability to find five errors in two cities this quickly suggests there must be many thousands) it seems that somebody needs to run a batch process over the database to find these errors – me scrolling through the map really doesn’t scale well.

While it seems clear that Google Maps uses an address database to map arbitrary addresses to parcels of land, it is also capable of guessing where an address would be if that address existed. That is, if I ask it to map the non-existent addresses 1953, 1955, 1957, 1959, and 1961 on W 19th Ave it places the address balloon in plausible locations, interpolating between 1947 and 1981 (the surrounding “real” addresses). This suggests that Google Maps has the knowledge and heuristics needed to correctly place 138 W 6th Ave, but this knowledge is then overridden by a database that contains errors. Fun!

Something new?

I talked to the business at 138 W 6th Ave and they said that these problems are new – starting around mid-March. I don’t remember noticing this type of error before so it does seem like Google Maps might have just ingested a batch of bad data.

Attempted fixes

When I encountered the first two errors I confidently said that I’d use the Google Maps feedback tool to get the errors fixed. I’ve had good luck in the past with this. But this time my luck ran out.

I dutifully submitted feedback for “Wrong pin location or address”:

image

And I got an email the next day saying that my edit was accepted:

image

But it’s been 14 days and the address still maps incorrectly.

I had better luck with my edit to 3207 30th Ave that was accepted the same day. That fix actually went live sometime between April 17th and April 23rd. That is still nowhere near the promised 24-hour latency, but at least it showed up eventually. Maybe the 138 W 6th Ave edit will still go live?

Not all errors are equal

The first two errors that I found – 3207 30th Ave in Vernon and 138 W 6th Ave in Vancouver – are problematic because those addresses are real and Google Maps plots them incorrectly. This leads to people going to the wrong place.

The other errors are less important because they are non-existent addresses that are plotted in nonsensical places. This is mostly harmless.

Anybody else seeing this?

If you have noticed any similar anomalies then please share them in the comments.

If you work on Google Maps please reach out to me if you have any information that you can share. I’ve tried reaching out through some ex-coworker friends, but no luck so far.

Discussion

https://bsky.app/profile/randomascii.bsky.social/post/3lnlwmoayks2s

https://news.ycombinator.com/item?id=43788832

https://www.reddit.com/r/GoogleMaps/comments/1k77440/google_maps_doesnt_know_how_street_addresses_work/

brucedawson
http://randomascii.wordpress.com/?p=4110
What this blog is about
Fun, Investigative Reporting, Programming, Quadratic, summary

I’ve recently told a few people that I write, that I have a blog, and then I try to describe what I write about. I’m kinda proud of some of the stuff that I’ve covered here on randomascii over the years but I struggle when trying to summarize it to a non-technical audience. So here goes:

I’ve got a few human interest stories such as sharing my grief in 2024, sharing my loss of my right ear in 2016, and sharing the fun of commute-challenge 2017 (2017 video here) and commute-challenge 2018 (2018 video here), plus the time I spoke about the commute challenge at Ignite Seattle. I’ve also got some quirky posts such as how to make a warm weather snowman, how to unicycle faster, and the time I unicycled a really long way. And I’ve got a summary of my entire career (part one and part two).

But what about the investigative reporting, the revealing of hidden stability or performance problems (usually in Windows) that I’m particularly proud of discovering? What about the bug fixes that have happened because of my work? How can I share (brag?) about those to my non-technical friends?

I decided to try writing one-paragraph summaries of some of the stories that I’m most proud of, that made an impact, and which can just possibly be understood by non-experts. Here goes.

Everybody on Windows has some desktop icons so this first issue is potentially relevant to all Windows users:

Tweet asking why explorer keeps hanging on a fast computer

In 2021 I saw a post on twitter (back when I still used twitter) from a software developer who had a powerful computer that would frequently hang for 10 seconds or much longer. It sounded interesting so I investigated. It turns out that Windows had a bug where it would spend a lot of time rearranging icons on the desktop. If you doubled the number of icons it would spend four times as much time. This is called a quadratic algorithm and it meant that Windows Explorer collapsed under its own weight if you had just a few hundred icons, and the person reporting the problem had about a thousand – actually images that they had dropped there. The hilarious thing was that this would happen even if you had your desktop configured to not show desktop icons! This was reported to Microsoft and they have fixed it in Windows 11. My investigation means that most users are now safe from this problem. Here’s the full writeup that explains how I identified what the problem was – note that I was able to scientifically analyze the problem despite the fact that it was happening on a different computer on a different continent.

This next problem was some rare gmail hangs I noticed, and the fix ended up saving hundreds of MB of memory for all gmail users on Windows:

image

I had a very powerful computer (24-core CPU) that was mostly idle and yet I found that Chrome/gmail would frequently hang for several seconds at a time. I eventually traced this to a performance bug in Windows that Chrome and gmail were tickling, which caused problems when our IT department ran a scan. I made some changes to Chrome so that it wouldn’t tickle the bug and the problem went away. Microsoft also fixed their performance bug, but since my tweak also saved hundreds of MB of memory we kept it. And once again Chrome users were protected from weird performance issues. Here is the full writeup for “24-core CPU and I can’t type an email” which ended up being read over 125,000 times.

The next two problems are particularly esoteric. One was a bug deep inside Windows that many developers had hit but that I was the first to correctly diagnose, and the other was a Windows performance problem that was affecting Chrome developers – that blog post is one of my most popular:

image

For many years Chrome’s build system on Windows – the thing that turns source-code into a version of Chrome that you can run – would fail about 3% of the time. This is not how computers are supposed to work. Similar failures were happening at other companies but nobody was able to understand the problem well enough to do anything. Through some combination of persistence and luck and good intuition I realized that the crashes were caused by a disk-caching bug deep inside the Windows kernel. I worked with a friend at Microsoft to gather more information and he was able to find and fix the exact problem, making high-performance Windows computers around the world more reliable. Here is the full writeup for “Compiler bug? Linker bug? Windows Kernel Bug!”

Left block is process creation, devil horns to the right are process destruction

Finally, my first “big hit” was an article I wrote when I noticed that when I was using my extremely powerful computer to build Chrome I often couldn’t even move my mouse, despite the fact that the machine was barely 50% busy. My machine was made useless, and I knew that this wasn’t how things were supposed to work. It turns out that destruction of processes that load gdi32.dll causes heavy contention on the same lock that is needed to update the mouse position. I know that sounds like gobbledygook but I don’t know how to get rid of any more of the jargon. The good news is that we were able to work around the issue by being very careful not to load gdi32.dll into the many processes we create when building Chrome, and this resolves the issue. Microsoft also slightly reduced their overhead. Here is the full writeup for “24-core CPU and I can’t move my mouse” which has been read almost 300,000 times, making it my second most popular blog post ever.

And in the number one spot…

My number one post, at over 400,000 readers and still read 20,000 times a year, is a recipe book for different ways to compare numbers on a computer to see if they are “close”. Doing this well is surprisingly tricky.

brucedawson
http://randomascii.wordpress.com/?p=4095
Find me on bsky
Uncategorized

I used to really enjoy the other microblogging site but it became too much of a democracy-destroying disinformation hell site so I haven’t been there in a long time.

I’ve moved to bsky – I’m https://bsky.app/profile/randomascii.bsky.social. Follow me there for ranting on base 2 versus base 10, transportation and housing, and tech failures. I also seem to post occasional vacation photos and astronomy blurbs, and anything else that seems amusing.

I hope to see some of you there, and I hope to see fewer of you on that other place.

Want thoughts on why leaving X/Twitter is a good idea? Bill Gates has some ideas, or you could just look at the number of organizations that are abandoning the site and think about whether they might have a point.

brucedawson
http://randomascii.wordpress.com/?p=4091
Life, death, and retirement
Uncategorized, death, grief, life, retirement

I haven’t been blogging much lately, and it turns out there is a very good reason.

My last technical blog post was October 1st of last year. After I hit publish on that one I went to get ready for bed and found my wife lying on the bathroom floor in excruciating pain.

I took her to the hospital. She was diagnosed with pancreatitis which is a truly horrible disease. Her hospital stay was an insane rollercoaster and she ultimately died nine weeks later.

Heartbeat

So yeah. It’s been a shit year.

After she died there were a lot of bureaucratic tasks to be done – at least doubled because of our dual US/Canada citizenships. Google gave me a month of leave and I took a month of vacation and I came back… not at all better. Grief is a long process, complicated by the fact that I’d only fairly recently moved back to Vancouver – I had practically zero close friends nearby.

As I tried to piece together a new life I found that work – especially remote work, isolated in my home office – was not a useful part of the healing process.

I had already dropped down to 80% time – just 32 hours a week – but even that felt like a burden. Work kept interfering with getting outside, spending time with people, and exercising. I decided to take a three-month leave of absence, both to focus more time on healing and to see if I would miss work at all. I did not miss it. Not one bit.

Four years ago I was having fun at work, solving interesting bugs, writing tutorials, and generally living my best life. I published eleven blog posts that year. Even Covid didn’t slow me down. Back then the idea of stopping would have seemed crazy. But my best work has usually been random discoveries – improbable bugs that were often polite enough to manifest on my machine before they affected anybody else – and there was always the risk that these random discoveries would peter out… and they seemed to be doing that. Maybe all of the bugs in Windows have been fixed now, or maybe I’m looking in the wrong places, but even before the shit hit the fan a year ago I was already not finding as many exciting things to do.

If you then layer on grief it’s not surprising that my motivation dropped. It’s not surprising that I started to resent every meeting that reduced my flexibility for playing tennis or other more fun and social activities.

image

And so, after getting a thumbs up from my financial adviser, I decided to quit. To retire as soon as I got back from my leave. I gave notice on the 10th anniversary of starting at Google and my last working day is October 4th, 2024. I am taking my work/life balance and turning the dial all the way to “life”.

Maybe this will be my last blog post ever. Or maybe I’ll investigate and write up a few issues that I have notes on. We’ll see.

And maybe my readers can help. I won’t be finding crazy bugs at work anymore, but I’ve had a few blog posts that were triggered by somebody reaching out with a problem.

I can’t promise that I will investigate any particular thing, but if you have a performance or stability or floating-point problem that seems like it might pique my interest, and if you are motivated enough to either help me reproduce it or to send me traces, then, well, who knows?

And, if you work for a company that has Windows performance problems and you want some private consulting, well, I can still be motivated by money, so reach out.

I did not expect this to reach the front page of Hacker News, but apparently it did.

brucedawson
http://randomascii.wordpress.com/?p=4082
Localization Failure: Temperature is Hard
Math, metric, Rants, Celsius, Fahrenheit, localization

The Guardian is one of my favorite news sources. I’m a subscriber (support news organizations!) and I read it daily. But it is not immune to errors, as this headline shows:

Record heat: Malawi swelters with temperatures nearly 68F above average

68 °F above average is a lot. For a tropical country it is not credible for temperatures to be that much warmer than average because the average is too high to give enough headroom. So what gives?

Reading the article I found this:

parts of Malawi saw a maximum temperature of 43C (109F), compared with an average of nearly 25C (77F)

As I expected the actual temperature increase was 32 °F, not 68 °F. So what’s up with that headline? Here’s a hint: this is what the headline might say if you set your location to somewhere other than the United States:

Malawi swelters in record heat with temperatures nearly 20C above average

Now “nearly 20C” is an odd way of saying “18 °C”, but I guess they really like round numbers, and that’s not the problem. The problem is that somebody – the localization team? an algorithm? – decided that 20 °C was equivalent to 68 °F. And they’re not wrong. And yet they are.

When converting from a temperature in Celsius to one in Fahrenheit you have to multiply by 1.8 (because each degree Celsius covers a range 1.8 times as large as a degree Fahrenheit) and you have to add 32 °F (because the freezing point in Fahrenheit is 32, compared to 0 in Celsius). However if you are converting a temperature difference you just multiply by 1.8.

That is, if the temperature goes up by 1 °C then it has gone up by 1.8 °F. If it goes up by 10 °C then it has gone up by 18 °F. If it goes up by 20 °C then it has gone up by 36 °F. Adding 32 °F in this context is just wrong.

This is just another version of the fallacy involved when somebody says that it is “twice as hot” when the temperature goes from 5 °C to 10 °C – note that this is equivalent to going from 278 K to 283 K, or 41 °F to 50 °F, so clearly not “twice as hot” in any meaningful way.

In short, translating 20 °C requires examining the context and there are at least three possible translations:

  • “The temperature is 20 °C” translates to “The temperature is 68 °F”
  • “It’s 20 °C warmer than yesterday” translates to “It’s 36 °F warmer than yesterday”
  • “The temperature is minus 20 °C” translates to “The temperature is minus 4 °F”

So 20 °C is either 68 °F, 36 °F, or (minus) 4 °F.
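The distinction between converting a reading and converting a difference can be sketched in a few lines of Python. The function names are made up for this illustration; the arithmetic is just the multiply-by-1.8 and add-32 rules described above, written as 9/5 to keep the results exact.

```python
def celsius_to_fahrenheit(temp_c):
    """Convert an absolute temperature reading: scale, then shift by 32."""
    return temp_c * 9 / 5 + 32


def celsius_delta_to_fahrenheit(delta_c):
    """Convert a temperature difference: scale only, no 32-degree offset."""
    return delta_c * 9 / 5


print(celsius_to_fahrenheit(20))        # a reading of 20 °C
print(celsius_delta_to_fahrenheit(20))  # a rise of 20 °C
print(celsius_to_fahrenheit(-20))       # a reading of minus 20 °C
```

This prints 68.0, 36.0, and -4.0, matching the three translations listed above. A localization pipeline would need to know which of the two functions applies before touching the number.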

Reported here:

https://twitter.com/BruceDawson0xB/status/1714406661904007624

Hacker news discussion here.

brucedawson
http://randomascii.wordpress.com/?p=4050
32 MiB Working Sets on a 64 GiB machine
Computers and Internet, Investigative Reporting, memory, Performance, Programming, uiforetw, xperf, priority, working set

Memory is a relatively scarce resource on many consumer computers, so a feature to limit how much memory a process uses seems like a good idea, and Microsoft did indeed implement such a feature. However:

  • They didn’t document this (!)
  • Their implementation doesn’t actually save memory
  • The implementation can have a prohibitively high CPU cost

This feature works by limiting the working set of a process – the amount of memory mapped into the address-space of the process – to 32 MiB. Before reading any further take a moment to guess what the maximum slowdown might be from this feature. That is, if a process repeatedly touched more than 32 MiB of memory – let’s say 64 MiB of memory – then how much longer could these memory operations take compared to if the working set was not limited? Take a moment and write down your guess. The answer is later in this post.

This exploration started when a Chrome user tweeted at me that they kept seeing Chrome’s setup.exe hogging the CPU. Investigating weird Chrome performance problems is literally my job so we started chatting. Eventually they used UIforETW’s circular-buffer recording mode (leave tracing running, save the buffers when the problem happens) to capture an ETW trace. They filed a Chromium bug and shared the trace and I took a look.

The trace did indeed show lots of CPU time being spent in setup.exe (the sampling rate is 1 kHz so each sample represents approximately 1 ms of CPU time), but there was nothing obviously out of order:

WPA CPU Usage (Sampled) screenshot showing setup.exe spending its time applying a patch

That is, at first glance there was nothing obviously out of order. However, as soon as I drilled down into the hottest call stack I saw something peculiar:

WPA CPU Usage (Sampled) screenshot showing setup.exe spending its time applying a patch, but mostly in KiPageFault

A few hundred samples spent in KiPageFault seemed maybe plausible, but more than 20,000 samples is definitely weird.

KiPageFault is triggered whenever a process touches memory that is not currently in the working set of the process. The memory faulted in might be a zeroed page (first use of an allocated page), a page from the standby list (pages in memory that contain data), a compressed page, or a page that is backed by a file (a memory mapped file or the page file). Whatever the source, this function adjusts the page tables to make the page visible inside the process, and then restarts the faulting instruction.

Since KiPageFault is showing up on multiple call stacks (memory can get paged in from almost anywhere, after all) I needed to use a butterfly view to find out the total cost, and get some hints as to why so much time was being spent there. So, I right-clicked on KiPageFault and selected View Callees, By Function. This showed me two very interesting details:

WPA CPU Usage (Sampled) screenshot showing setup.exe spending 99% of its time in KiPageFault

The first detail is that of the 46,912 CPU samples taken from this process fully 46,444 of them (99%!) were inside KiPageFault. That is remarkable. In a steady-state process (not allocating excessively) on a system with sufficient memory (this system had 64 GiB of RAM and roughly 47 GiB of that was available) the number of page faults should be close to zero, and this was a long way from that.

The other detail is that most of the time inside of KiPageFault was spent in MiTrimWorkingSet. This makes sense. But at the same time it is, actually, pretty weird. It looks like every time a page is faulted in to the process the system immediately trims the working set, presumably removing another page from the working set. Doing this is expensive, and increases the odds of future page faults. So, it makes sense in that it explains why the process is spending so much time in KiPageFault, but it is weird because I don’t know why Windows would be doing this.
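To see why trimming on every fault is so costly, here is a toy Python model. It is not how Windows works internally: the least-recently-used eviction policy, the function name, and the page counts are assumptions chosen for illustration. It walks repeatedly over a set of pages while holding the working set to a fixed cap, evicting one page for every page faulted in once the cap is reached.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed bytes per page


def count_faults(total_pages, cap_pages, passes):
    """Count simulated page faults for repeated sequential walks.

    Assumes least-recently-used eviction, which is a guess; the real
    trimming policy is not described in the post.
    """
    working_set = OrderedDict()  # page number -> None, oldest first
    faults = 0
    for _ in range(passes):
        for page in range(total_pages):
            if page in working_set:
                working_set.move_to_end(page)  # touched, now most recent
                continue
            faults += 1
            working_set[page] = None
            if len(working_set) > cap_pages:
                working_set.popitem(last=False)  # trim the oldest page
    return faults


pages_64_mib = 64 * 1024 * 1024 // PAGE_SIZE
pages_32_mib = 32 * 1024 * 1024 // PAGE_SIZE

uncapped = count_faults(pages_64_mib, pages_64_mib, passes=4)
capped = count_faults(pages_64_mib, pages_32_mib, passes=4)
print(f"uncapped: {uncapped} faults, capped at 32 MiB: {capped} faults")
```

Under these assumptions the uncapped run faults only during the first pass, while the capped run faults on every touch of every pass, because each page is evicted before the walk comes back to it.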

WPA Total Commit table showing setup.exe with 47.418 MiB of commit

ETW traces contain a wealth of information so I looked at the “Total Commit” table and found that setup.exe only had 47.418 MiB of commit. This measures the total amount of allocated memory in this process, plus a few other types of memory such as stack, and modified global variables. 47.418 MiB is a pretty tiny amount and should take less than 10 ms to fault in (see Hidden Costs of Memory Allocation for details), and there were no new allocations during the trace, so the KiPageFault overhead was definitely excessive.

WPA Virtual Memory Snapshots table showing the working set varying but always staying around 32 MiB

I then looked in the “Virtual Memory Snapshots” table at the Working Set column. This column contains working-set information sampled occasionally – 19 times during the 48 seconds I looked at. These samples showed the working set varying between 31.922 MiB and 32.004 MiB. That is, the sampled working set went as low as 80 KiB below 32 MiB, and as high as 4 KiB above 32 MiB. That is a very tight range.

Procrastination

I thought that SetProcessWorkingSetSize might be involved in triggering this behavior, and a coworker suggested SetPriorityClass with PROCESS_MODE_BACKGROUND_BEGIN could be a factor, so I thought about doing some experimentation with these functions. But the issue was reported on Windows 11, and I assumed that some odd-ball configuration must be triggering this edge-case behavior, so I didn’t think my tests would be fruitful and I did nothing for three weeks.

I finally got back to the bug and decided to start by doing the simplest possible test. I wrote code that allocated 64 MiB of RAM, touched all of it, then used EmptyWorkingSet, SetProcessWorkingSetSize, and SetPriorityClass with PROCESS_MODE_BACKGROUND_BEGIN, then touched the memory again. I used some Sleep(5000) calls and Task Manager to monitor the working set. I was not expecting the simplest possible test to reveal the problem.

My tests showed that EmptyWorkingSet and SetProcessWorkingSetSize both emptied the working set almost to nothing, but the working set “refilled” when the memory was touched again. So, the documentation for these functions (as crazy and archaic as it sounds) seems to be mostly accurate. And, unless they were called extremely frequently these functions could not cause the problem.

On the other hand, my tests showed that SetPriorityClass with PROCESS_MODE_BACKGROUND_BEGIN caused the working set to be trimmed to 32 MiB, and kept it there when I touched all the memory again. That is, while touching 64 MiB of memory would normally fault those pages in and push the working set to 64 MiB or higher, instead the working set stayed capped.

Whoa. That’s crazy. It wasn’t supposed to be that simple. I refined the test code more but it’s still fairly simple. In its final form the code allocates 64 MiB of memory and then repeatedly walks over that memory (writing once to each page) to see how many times it can walk over the memory in a second. Then it does the same thing with the process set to background mode. The difference is dramatic:

Screenshot of command-prompt output from BackgroundBegin.exe showing normal mode scanning memory ~4400 times per second, while background mode does it 6-17 times

The performance of scanning the memory in the normal mode is quite consistent, taking about 0.2 ms per scan. Scanning in background mode normally takes about 250 times as long per scan (two hundred and fifty times as long!!!). Sometimes the background-mode scanning goes dramatically slower – up to about 800 times as long per scan, 160 ms for 64 MiB.

This dramatic increase in CPU time is not a great way to reduce the impact of background processes.
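The source of BackgroundBegin.exe is not reproduced here, so the following is only a rough Python sketch of the measurement it describes: allocate 64 MiB, write once to each page, and count how many full walks complete in a fixed time. The function names and the 4 KiB page size are assumptions, and interpreter overhead means the absolute numbers will not match the screenshot. On Windows the sketch also toggles background mode through ctypes, passing the SetPriorityClass values PROCESS_MODE_BACKGROUND_BEGIN (0x00100000) and PROCESS_MODE_BACKGROUND_END (0x00200000); elsewhere that step is skipped.

```python
import sys
import time

PAGE_SIZE = 4096                  # assumed page size
BUFFER_SIZE = 64 * 1024 * 1024    # 64 MiB, as in the post


def scans_per_second(buffer, duration=0.25):
    """Walk the buffer, writing one byte per page, and return the scan rate."""
    scans = 0
    start = time.perf_counter()
    while True:
        for offset in range(0, len(buffer), PAGE_SIZE):
            buffer[offset] = 1    # one write per page touches every page
        scans += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            return scans / elapsed


def set_background_mode(enabled):
    """Enter or leave background mode; does nothing outside Windows."""
    if sys.platform != "win32":
        return False
    import ctypes
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetPriorityClass.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetPriorityClass.restype = wintypes.BOOL
    mode = 0x00100000 if enabled else 0x00200000  # BACKGROUND_BEGIN / _END
    return bool(kernel32.SetPriorityClass(kernel32.GetCurrentProcess(), mode))


buffer = bytearray(BUFFER_SIZE)
print(f"normal mode:     {scans_per_second(buffer):.1f} scans/s")
if set_background_mode(True):
    print(f"background mode: {scans_per_second(buffer):.1f} scans/s")
    set_background_mode(False)
```

Comparing the two printed rates on a Windows machine should show whether the working-set cap is in effect; on other platforms only the normal-mode line is printed.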

Limiting the Working Set Doesn’t Save Memory!

Okay, so PROCESS_MODE_BACKGROUND_BEGIN makes some operations take more than 250 times as long to run, but at least it saves memory. Right? Right?

Well, no. Not really. Not in any situation I can imagine.

Trimming the working set of a process doesn’t actually save memory. It just moves the memory from the working set of the process to the standby list. Then, if the system is under memory pressure the pages in the standby list are eligible to be compressed, or discarded (if unmodified and backed by a file), or written to the page file. But “eligible” is doing a lot of heavy lifting in that sentence. The OS doesn’t immediately do anything with the page, generally speaking. And, if the system has gobs of free and available memory then it may never do anything with the page, making the trimming pointless. The memory isn’t “saved”, it’s just moved from one list to another. It’s the digital equivalent of paper shuffling.

Another reason this trimming is pointless is that the system already has a (much more efficient) mechanism for managing working sets. Every second the system process wakes up and runs KeBalanceSetManager. Among other things this function calls MiProcessWorkingSets which calls MiTrimOrAgeWorkingSet:

Screenshot of WPA's CPU Usage (Sampled) graph showing the system process running KeBalanceSetManager

All I know about this system is the names of the functions and the frequency of its operation, but I feel pretty confident in speculating about roughly what it’s doing, and it seems like a strictly better solution to the problem. Here’s why MiTrimOrAgeWorkingSet is better than PROCESS_MODE_BACKGROUND_BEGIN:

  • Trimming the working set once per second is far more efficient (uses less CPU time) than trimming it after every page fault, and it greatly reduces the odds of trimming a page just before it is needed
  • Trimming the working set once per second is just as memory efficient as trimming after every page fault because trimming doesn’t immediately save memory anyway
  • Trimming the working set every second can more easily respond to changes in memory pressure, doing nothing when there is lots of free memory, and then aggressively trimming rarely-touched pages from idle processes when conditions change.
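
The points in the list above can be illustrated with a toy model. This is a sketch under loud assumptions, not a description of how Windows actually works: it assumes 4 KiB pages, assumes that a capped working set evicts its oldest page to the standby list on every fault (FIFO), and treats the uncapped case as a stand-in for a periodic trimmer that does nothing while memory is plentiful. The `simulate` function is hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # assumed page size
MIB = 1024 * 1024


def simulate(buffer_bytes, cap_bytes=None, scans=3):
    """Return (page faults per scan, pages still in physical memory).

    Assumption: when the working set exceeds its cap, the oldest page is
    moved to the standby list (FIFO). The real policy may differ.
    """
    pages = buffer_bytes // PAGE_SIZE
    cap = None if cap_bytes is None else cap_bytes // PAGE_SIZE
    working_set = OrderedDict()   # pages currently in the working set
    standby = set()               # trimmed pages, still in physical memory
    faults_per_scan = []
    for _ in range(scans):
        faults = 0
        for page in range(pages):
            if page in working_set:
                continue              # already mapped, no fault
            faults += 1               # fault: bring the page (back) in
            standby.discard(page)
            working_set[page] = True
            if cap is not None and len(working_set) > cap:
                victim, _unused = working_set.popitem(last=False)
                standby.add(victim)   # trimmed, but not freed
        faults_per_scan.append(faults)
    return faults_per_scan, len(working_set) + len(standby)


print("capped at 32 MiB:", simulate(64 * MIB, cap_bytes=32 * MIB))
print("no cap:          ", simulate(64 * MIB))
```

In this model the capped run faults on every one of the 16384 pages on every scan, while the uncapped run faults only during the first scan. In both cases all 16384 pages stay in physical memory, split between the working set and the standby list, so the cap costs a lot of page faults without freeing anything.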

Resolution

As far as Chrome is concerned the solution to this problem was simple – don’t call this function, therefore don’t use PROCESS_MODE_BACKGROUND_BEGIN, and therefore don’t put Chrome’s setup process into this mode. We still run in low-priority mode, but not the problematic “background” mode.

But this flag remains, waiting to snare some future developer. The easiest thing that Microsoft could do would be to change the documentation to acknowledge this behavior. I have in mind a large, red, bold-faced label saying “if your process uses more than 32 MiB of memory then PROCESS_MODE_BACKGROUND_BEGIN may make your program run 250 times slower and it won’t really save memory so maybe use THREAD_MODE_BACKGROUND_BEGIN instead.” But fixing the documentation would not be as valuable as fixing the background mode. I have trouble imagining any scenario where capping the working set to less than 0.5% of the memory on a low-end laptop would be better than the working-set trimming implemented in the system process, so removing this functionality seems like a pure win.

And fixing the background mode would avoid the need for the ugly large, red, bold-faced warning label.

Ironically the impetus for using PROCESS_MODE_BACKGROUND_BEGIN in Chrome was a 2012 Chrome bug (predating my time on the team, and I’ve been there a while) complaining that the updater was using too much CPU time.

This recent issue was reported on Windows 11, but I found a Mozilla bug discussing this flag that linked to a Stack Overflow answer from 2015 that pointed out that PROCESS_MODE_BACKGROUND_BEGIN limited the working set to 32 MiB on Windows 7. This issue has been known for eight years, on many versions of Windows, and it still hasn’t been corrected or even documented. I hope that changes now (update, December 2024, no sign of change).

Addendums

To clarify, it is the working-set that is trimmed to 32 MiB, not the private working set. So, the 32 MiB number includes code as well as data, for what it’s worth.

Also, after posting this I was playing around and found that when I reset the process with PROCESS_MODE_BACKGROUND_END this causes the working set to be trimmed. That’s harmless, but weird. Why would taking the process out of background mode cause the working set to be trimmed as if the process had called EmptyWorkingSet?

A Twitter user posted a bit of history and a tool (untested!) to list working-set state for processes on the system.
