Scali's OpenBlog™

Scali Nov 8, 2025

I recently found this new blogpost by VileR: A Lost IBM PC/AT Model? Analyzing a Newfound Old BIOS.

It’s a very interesting look into the mess that is IBM’s PC division in the early 80s. And I had various thoughts while reading it. I thought it would go a bit too far to put everything into a comment there, as it may be too long and too far off-topic in some ways, so I decided to write it down here.

Now, as I was reading it, the first thing that jumped into my mind was the IBM 5162, aka the XT-286. At least from a functional standpoint, it is actually an AT, despite the name. The main difference with the actual 5170 PC/AT is that it uses the XT casing, so it has a different form-factor. But as far as the BIOS goes, the XT-286 would be very similar to an AT, as the hardware is nearly identical. However, the XT-286 motherboard does have the capacity for 640k, where the AT does not.

The XT-286 was introduced in September 1986, so this BIOS clearly predates it. So could it be that this elusive ‘Skyrocket’ AT variation is actually an early iteration of the motherboard that eventually ended up in the XT-286? That’s something I was wondering.

VileR does not go into the XT-286 aspect, but the Usenet thread by ex-IBM employee Tony Ingenoso that he refers to, has an interesting reply:

The last ATs came with the XT286-mainboard, running the 80286 at 6 MHz, being
physically much smaller than the original AT “toilet seat sized” board – and
having 128K soldered plus 2 x 256K 30-pin SIMMs removeable.

This confirms that although the AT has a different form-factor than the XT, an XT-286 motherboard can fit inside an AT case. The thread also makes clear that ‘Skyrocket’ is certainly NOT the XT-286, as the 640k motherboard of Skyrocket was the larger AT-size, and would not physically have fit inside an XT case.

It is however interesting that this post claims that some ATs actually DID come with a 640k board, and were actually populated with 640k as well, as they were technically just XT-286es.

So perhaps the XT-286 board is a later iteration of the Skyrocket prototype. And perhaps due to advances in technology, they could make the board smaller, and figured that they might as well make it fit the XT form factor. Which may or may not have been because they still had these cases in stock, or just because they were cheaper to build than AT cases. Who knows.

In fact, perhaps the whole reason why the AT case was introduced was because they couldn’t get an AT board to fit inside a PC/XT case with 1984-era technology. Not only did the AT have more components than the XT (CMOS timer chip, second PIC, second DMA controller), but it also had a 16-bit bus, requiring physically larger ISA slots and more traces to be routed through the board. So they had more chips to put on the board, and less room to put them. And even with the larger case, they only designed it to fit up to 512k.

If you compare it to the XT-286 board, you see that a lot of room is saved by using newer technology: SIMMs. They are a very compact way to install large amounts of memory on a board. But they were first introduced in 1983, perhaps too late to be incorporated into the design of the original AT. And based on the information we have on Skyrocket, that one did not use SIMMs yet either.

The other obvious space saver on the XT-286 board is that only 6 of the 8 slots are full 16-bit slots. The other two slots are 8-bit only, and there are ROM chips located where the 16-bit portion of the ISA slot would be. On the 5170, there are also only 6 16-bit slots, but in theory there is room for 8 16-bit slots. On two of the slots, the 16-bit portion is just not installed on the board, although it appears to be routed, and soldering holes are present.

For some reason IBM never used the other obvious option for saving space: integration. Given the volume of machines that IBM produced, and the fact that IBM had the means to produce chips in-house, I don’t understand why they didn’t just invest into integrating some of the standard chips into a larger circuit. Tandy did that with its Tandy 1000, which was also introduced in 1984. And IBM did make a small attempt at integration with the PCjr, where they put most of the enhanced CGA circuitry into a single chip: the VGA (which stands for Video Gate Array, not for Video Graphics Array, as we know VGA from the standard introduced in 1987 with the PS/2 line of computers).

512k ought to be enough for anyone

Now, the existence of the XT-286 and the fact that the XT-286 board can be used in an AT case raises some questions:

Why was the XT-286 board only used with 6 MHz processors, but not with 8 MHz ones (if the information is correct)?
Why did IBM limit the AT to 512k for its entire lifetime?

It would seem that IBM was very close to solving both issues. That is, the XT-286 can most probably run fine at 8 MHz if you just install the correct CPU and crystal, and make sure there’s no code blocking this in the BIOS. I can understand that IBM wanted to market the XT-286 as a budget machine, and have the 8 MHz option for the AT only. But reading this claim that late AT’s were actually built with the XT-286 board makes me think ‘hmmm, so close…’

Also, IBM never seemed to market 640k ATs, even if they are claimed to exist with an XT-286 board (in which case they would apparently be late 6 MHz models). I wonder if anyone has actually seen any of these AT/XT-286 machines in the wild, and if they were indeed 6 MHz only, and what memory configurations they had.

But, now to a more philosophical question: what is 512k?

And then I find that my perspective is skewed. I entered the PC world in 1988, by which time the IBM PC line had already been discontinued, and 640k was the de-facto standard for any cheap PC clone. So 640k was my starting point. I have never used a PC with less than 640k until I finally got a PCjr a few years ago.

What I do remember however, from my early years of the PC, is that 640k was often a requirement for software. And given how DOS worked, and how more than 640k was made possible on later machines, I am intimately familiar with having to free up more than 600k of conventional memory just to get software working, even though my PC had 4 MB or more in total.

I think in part it was also convenience: the market was saturated with machines that had 640k in the late 80s, so why bother trying to make your software fit in less? We were guilty of this ourselves, with 8088 MPH: a few parts require over 512k of memory, so practically they only run on a 640k machine. Could we have reduced the memory footprint? Sure, with a bit of effort we could probably get everything to run in 256k or less. Was there any need? No. People expect an IBM PC to have 640k.

However, in 1988, when we got our first PC, I remember being completely impressed by the specs. 640k was a lot. We came from a C64, so this was 10 times the memory! Likewise, its 9.54 MHz speed was about 10 times the clock speed of the C64. Even compared to an Amiga 500, at 7 MHz and 512k, the specs were good.

But the AT is not from 1988. It’s from 1984. And if we look at that other machine that IBM released in 1984, the aforementioned PCjr, then we see that this was sold with either 64k or 128k, and then it even had shared video memory. The original 5150 PC never had more than 256k on board. And although the PC/XT, introduced in 1983, could take 640k on board, it was never sold with more than 512k populated, if the information on Wikipedia is correct.

So perhaps not having 640k was not as big of a deal back then as we perceive it to be today. That is, today an AT is rather useless with just 512k, because a lot of software will require more. Besides, it’s not that easy to find a good 16-bit memory expansion card for an AT. 8-bit cards for PC and XT are far more common.

But at the time, perhaps IBM could get away with it just fine. The situation of 640k being a requirement didn’t arise until after the AT had already been discontinued in 1987. It’s just a bit weird that they introduce a machine that supports 640k on board in 1983, to then introduce a faster, more advanced machine the next year, which is limited to 512k on board. It’s just one of the many quirks in the history of PC design that makes little sense. But that’s PC for you.

http://scalibq.wordpress.com/?p=5884

On rendering and solved problems

Scali Jul 21, 2025

Here’s just a quick brainfart I had after watching a video on 90s rendering technology, part of which concentrated on how John Carmack moved from Commander Keen to Wolfenstein 3D, DOOM and Quake.

An interesting point there is that each of these games uses a very different approach to rendering. And a related point is that after Quake, virtually all types of games would start to adopt 3D rendering. Even platform and puzzle games, which we would have considered to be very much a 2D affair until then. I suppose you could say that 3D rendering was a ‘solved problem’ after Quake, especially when 3D accelerators came around as well. With 3D hardware, APIs and middleware now readily available, getting stable 3D on screen at interactive speeds was not a problem anymore. The solutions were already available.

And more recently, we also got raytracing support in our APIs and 3D hardware, and now raytracing is starting to become a ‘solved problem’ as well, and is being adopted by various mainstream games.

So that got me thinking: these paradigm shifts in rendering, how many can we discern? And how does the hardware of the era tie into it? As in general, it is the hardware that dictated what kind of rendering was possible. It would be rare to have a breakthrough in rendering on hardware that had already been available on the market for some years, but had not been used in a specific way yet.

I could also mention an overview of 3D hardware and Direct3D API generations that I made earlier, which may be interesting background information.

I suppose that if we want to start at the earliest point, we would be talking about ‘video games’, not necessarily ‘computer games’. That is, some of the very first games that you could play on a television screen, did not actually use a computer. And there were some other oddballs.

Possibly the earliest installation that can be considered a ‘video game’ was Tennis For Two, from 1958. It uses an oscilloscope as a display, and the display signal is generated with an ‘analog computer’. That is not a digital device as we know computers today, and it is not programmable as we know it today either. You would ‘program’ it by using patch wires to connect different analog operators together.

One of the first ever digital video games is Spacewar! from 1962, which ran on a DEC PDP-1 computer. So that could classify as a ‘computer game’ as well, as it was implemented on a computer, with software. However, these two early examples would use an oscilloscope-like display. Computer games for consumers took a slightly different route. They would target commodity display technology, which would be standard TV-sets, or at least technology derived from standard TV-sets, so CRT-based devices that ran at standard broadcast resolutions (such as NTSC or PAL), and would draw the screen one scanline at a time, left-to-right, top-to-bottom.

That would start in the 1960s with pioneers such as Ralph Baer, and his work eventually turned into the Magnavox Odyssey in 1972, the first video console for use at home. It used very simple digital building blocks like diodes and transistors. Yes, it was digital, but it did not use an actual processor, and was not programmable. So not a ‘computer’ as such.

At around the same time, video game arcade machines took off. Nolan Bushnell made the arcade game Computer Space in 1971, which was more or less a clone of Spacewar!, but implemented in custom-designed hardware using TTL logic, rather than an existing computer running software. This made it much more compact and affordable than the PDP-1 or similar machine that would otherwise be required to power such a game. Bushnell’s next arcade game was Pong (the first under the Atari brand), based on the built-in ping-pong game in the Odyssey.

Where it gets interesting for us is the move from discrete TTL logic to programmable logic, using microprocessors. An early video game such as Pong didn’t really draw much on the screen. It would draw some horizontal and vertical lines, and the main moving objects were the ball and the paddles.

Racing the beam

I think we have arrived at our first rendering paradigm. The way these early video games drew onto the screen is known as ‘racing the beam’. The ‘beam’ refers to the cathode ray in a CRT-based display. This was a purely analog system, where the video signal would drive the ray directly. So in its simplest form, like a black-and-white image, generating a video signal would consist of turning the beam on to draw a white line horizontally, and turning it off for black (aside from generating the required horizontal and vertical sync signals, but most hardware uses hardwired circuitry for that, so in most cases it is not the concern of the programmer, and as such it is beyond the scope of this article).

If you look at an early video game console, such as the Atari VCS (aka 2600) from 1977, you see that it draws its entire screen ‘on the fly’ by racing the beam. Its graphics hardware is just a small step up from a Pong game that draws a ball and two player paddles: it has special logic for drawing ‘ball’ and ‘missile’ graphics. Aside from that, it can draw ‘player sprites’ and a background. These are basically single horizontal lines, and the programmer has to update them on-the-fly to draw shapes that consist of anything other than the same horizontal pattern repeated vertically.

It was crude and blocky, but could be quite colourful, and in the hands of a master, you could create quite impressive graphics that way.
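To make the idea a bit more concrete, here is a minimal sketch in Python. It is a toy model and not actual Atari VCS code: the `LineRegister` class, the 8-pixel width and the `race_the_beam` function are all made up for illustration. The 'hardware' only ever remembers the pattern for one scanline, so the program has to rewrite it before every line is scanned out.

```python
# Toy model of 'racing the beam'. All names are hypothetical; real hardware
# such as the Atari VCS has different registers and strict cycle timing.

class LineRegister:
    """Holds a single 8-pixel pattern, like a one-line sprite register."""

    def __init__(self):
        self.pattern = 0  # 8 bits, 1 = pixel on

    def scan_out(self):
        """Return what the beam would draw for the current scanline."""
        return "".join("#" if self.pattern & (0x80 >> x) else "." for x in range(8))


def race_the_beam(shape_rows, total_lines):
    """Rewrite the register just before every scanline is drawn."""
    reg = LineRegister()
    screen = []
    for line in range(total_lines):
        # The program has to get this write in on time for every single line.
        reg.pattern = shape_rows[line] if line < len(shape_rows) else 0
        screen.append(reg.scan_out())
    return screen


# A small diamond, one byte per scanline.
DIAMOND = [0x18, 0x3C, 0x7E, 0x3C, 0x18]
```

If the program skipped the per-line update, the register would keep its old value and the same horizontal pattern would simply repeat down the screen.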

Character-based displays

The reason why you would draw the screen that way is because it doesn’t require a full framebuffer in memory. You could draw images at the full framerate of your screen, and move objects around on screen, giving you very fast and smooth action in games. The downside was that it was difficult to display text this way. Also, the display wasn’t persistent: you had to run the drawing code at every frame.

So, we have the second early rendering paradigm here. For displaying text, it made sense to have a framebuffer, so that the text could remain in memory, and be drawn by hardware automatically. However, there were two requirements fighting each other here: for sharp and crisp text, you want the highest possible resolution. But the higher the resolution, the more pixels you need to process per second. So that means more and faster memory. When you want to keep the hardware affordable, you want to save money on memory.

So early hardware would not draw out text pixel-by-pixel into a framebuffer. Instead, the framebuffer would contain character values (usually 8 bits per character, and then often some more bits for simple attributes such as bold/underline and/or colour). The actual bitmaps for the characters, the font, would be stored in ROM. The video hardware would then do an indirect lookup into the framebuffer, and then fetch the pixels for the character from the ROM. So for say a 40×25 display with 8-bit characters and 8-bit attribute, you’d only need 2000 bytes of memory for the framebuffer, regardless of what resolution the actual font is.
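The arithmetic and the indirect lookup can be sketched in a few lines of Python. This is only an illustration: the three-glyph `FONT_ROM` and the function names are assumptions, not the layout of any particular machine.

```python
# Sketch of a character-based display. The tiny 'ROM' below is made up.

GLYPH_H = 8  # rows per character cell


def text_buffer_bytes(cols, rows, bytes_per_cell=2):
    """Memory for character code + attribute byte per cell."""
    return cols * rows * bytes_per_cell


def bitmap_bytes(cols, rows, glyph_w=8, glyph_h=8, bits_per_pixel=1):
    """Memory for the same screen stored pixel-by-pixel."""
    return cols * glyph_w * rows * glyph_h * bits_per_pixel // 8


# Hypothetical font ROM: character code -> 8 rows of 8 pixels (1 bit each).
FONT_ROM = {
    0: [0x00] * 8,                                        # blank
    1: [0xFF] * 8,                                        # solid block
    2: [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00],  # 'A'
}


def scanline_pixels(char_codes, y):
    """Indirect lookup for one scanline.

    char_codes holds the character codes of the text row that scanline y
    falls in; the pixel data itself comes from the font ROM.
    """
    row_in_glyph = y % GLYPH_H
    return [FONT_ROM[code][row_in_glyph] for code in char_codes]
```

For the 40×25 example, `text_buffer_bytes(40, 25)` gives 2000 bytes, where a monochrome bitmap of the same 320×200 pixels would already take 8000 bytes.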

In its most basic form, your character ROM contained some simple shapes that were useful for creating rudimentary graphics.

Later machines would store the font in RAM and allow you to modify it. The so-called ‘redefinable character set’. This allowed for pseudo-graphics modes. For example, you may have a 40×25 text mode, and 8×8 characters. That would give you a virtual 320×200 resolution. However, your character set is only 256 characters large. So that means that you can only have an area of for example 16×16 characters that you could uniquely assign pixels to, which would give you a resolution of 128×128.

However, you can re-use the same character multiple times. So by cleverly re-using patterns, you can create the illusion of having a full-screen bitmap image. In games, you would usually construct a world from 2d tiles anyway, and re-using tiles is common.
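As a rough sketch of how that re-use works, the Python below cuts an image into character-sized cells and stores every unique cell only once. The function names and the string-based image format are assumptions for the sake of the example; a real tool would work on packed bytes.

```python
import math


def max_unique_square(chars=256, cell=8):
    """Side length in pixels of the largest square with a unique character per cell."""
    return math.isqrt(chars) * cell


def build_charset(bitmap, cell=8, max_chars=256):
    """Split a bitmap (list of equal-length strings) into cell x cell tiles.

    Each unique tile is stored once; the screen map holds character codes.
    Returns (charset, screen_map).
    """
    charset = []    # unique tiles, index = character code
    index_of = {}   # tile -> character code
    screen_map = []
    for top in range(0, len(bitmap), cell):
        map_row = []
        for left in range(0, len(bitmap[0]), cell):
            tile = tuple(row[left:left + cell] for row in bitmap[top:top + cell])
            if tile not in index_of:
                if len(charset) >= max_chars:
                    raise ValueError("image needs more unique characters than available")
                index_of[tile] = len(charset)
                charset.append(tile)
            map_row.append(index_of[tile])
        screen_map.append(map_row)
    return charset, screen_map
```

With 256 characters of 8×8 pixels, `max_unique_square()` returns 128, matching the 128×128 figure above. An image with lots of repetition can cover far more of the screen with the same 256 characters.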

One problem here is that you can only move the screen around by increments of 1 character, so with an 8×8 font size, that is 8 pixels at a time. Which is why various graphics chips offer ‘hardware scrolling’. This allows the graphics hardware to modify the timing, so it can offset the entire screen in pixel-accurate increments. That will allow you to move the screen from 1 to 7 pixels horizontally or vertically. And for the 8th pixel you would reset the scrolling to 0, and move the characters on screen. That way you can endlessly smoothly scroll your screen with pixel-accurate increments in every direction: horizontally, vertically, and when combined, diagonally.
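In code, the trick boils down to splitting the scroll position into a coarse part (whole characters) and a fine part (0 to 7 pixels). The following Python sketch assumes a single horizontal scroll value and made-up function names; the direction and naming of the real scroll registers differ per chip.

```python
CELL = 8  # character width in pixels


def split_scroll(scroll_px):
    """Split an absolute scroll position into (coarse characters, fine pixels 0..7)."""
    return scroll_px // CELL, scroll_px % CELL


def visible_columns(tile_row, scroll_px, screen_cols):
    """Return the characters to put on screen and the fine-scroll value.

    One extra column is fetched, because a non-zero fine offset exposes part
    of the next character at the edge of the screen. The row wraps around.
    """
    coarse, fine = split_scroll(scroll_px)
    cols = [tile_row[(coarse + i) % len(tile_row)] for i in range(screen_cols + 1)]
    return cols, fine
```

Scrolling from position 7 to position 8 is the moment where the fine value wraps back to 0 and the characters themselves move by one position.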

Racing-the-beam and character-based hybrids

If you want to move shapes/sprites around on screen, then character-based graphics tend to be a bit clumsy. You have to update the characters on screen, and possibly also update the character map, to change the position of a sprite. And when the sprite is moving over a background, you have to restore the background in order to erase the sprite at the old position. And you are drawing into a framebuffer that is directly visible on screen. So your changes show up as soon as the cathode ray reaches the part where you are drawing.

This makes it rather difficult to make sprites move quickly and flicker-free, so it is not well-suited for fast and smooth action games, like racing the beam is.

So we find that various graphics hardware would combine the two techniques. For example, the Commodore 64 combines character-based graphics (with redefinable characterset) with sprite hardware. The VIC-II chip in the C64 can overlay a maximum of 8 sprites on top of a character-based background. What’s more, it can perform collision detection between sprites, and it also supports sprites moving ‘behind’ the character-based graphics.

These sprites are literally ‘overlays’: as the video chip outputs the video signal, it will normally output the character-based graphics, but the sprites are ‘overlaid’ on the output. That is, they are never actually drawn into the framebuffer. The video chip will determine the order of the background and the sprites, and output the pixels of the topmost sprite to the signal, or the background when there are no sprites (or transparent pixels in the sprite).
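A simplified model of that per-pixel decision, written in Python, could look like this. It is a sketch under assumptions: colours are plain integers, background colour 0 counts as 'empty', and the priority rules are far simpler than what a real VIC-II does.

```python
TRANSPARENT = None


def compose_scanline(background, sprites):
    """Combine one scanline of background with sprite overlays.

    background: list of colours for this scanline.
    sprites: list of (x, pixels, behind_background), topmost sprite first.
    The background list itself is never modified: sprites only exist in
    the output signal.
    """
    out = []
    for x, bg in enumerate(background):
        colour = bg
        for sx, pixels, behind in sprites:
            i = x - sx
            if 0 <= i < len(pixels) and pixels[i] is not TRANSPARENT:
                # A sprite 'behind' the background only shows where the
                # background is empty (colour 0 in this sketch).
                if not behind or bg == 0:
                    colour = pixels[i]
                    break
        out.append(colour)
    return out
```

Because nothing is written into the framebuffer, there is also nothing to restore when the sprite moves: you just change its x-position for the next frame.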

This gives you the best of both worlds: you can have quite detailed background graphics, scroll them around on screen, and use sprites to move objects around at full framerate. That makes basic 2d graphics for all sorts of games a solved problem.

Bitmap modes

Some of these early machines would also have a so-called ‘bitmap’ mode. That is, a mode where every pixel can be addressed in video memory directly, rather than indirectly via either characters or sprites. As said before, one reason for character-based graphics is that they were more efficient in terms of memory storage and bandwidth requirements. The C64 does actually have a bitmap mode, but it takes up 9k of memory, which takes quite a chunk out of your total 64k. It is also slower to manipulate individual pixels. So this mode was not suited for most use cases, and did not see much use.
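To illustrate why touching individual pixels is relatively expensive, here is a small Python sketch of plotting a single pixel. It assumes the C64's 320×200 single-colour bitmap layout as I understand it, where the bytes are ordered per 8×8 cell rather than per scanline; the `set_pixel` name is made up.

```python
def set_pixel(bitmap, x, y):
    """Set one pixel in a 320x200, 1 bit-per-pixel, cell-ordered bitmap.

    Assumed layout: each 8x8 cell occupies 8 consecutive bytes, cells run
    left to right, 40 cells (320 bytes) per character row.
    """
    offset = (y // 8) * 320 + (x // 8) * 8 + (y % 8)
    # Read-modify-write: a whole byte is touched to change a single bit.
    bitmap[offset] |= 0x80 >> (x % 8)
    return offset
```

Every pixel needs an address calculation, a bit mask and a read-modify-write on a whole byte, where a character-based screen changes 64 pixels by writing a single character code.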

The slow performance of bitmap modes remained a problem for quite a while. The Commodore Amiga can be seen as a turning point here. It did not have a specific character-based mode anymore. It only had bitmap modes. The Amiga came with 256k as a minimum, later with 512k, so the memory requirements for a bitmap mode weren’t such an issue anymore.

The use of a hardware blitter (bitblock transfer) allowed you to move memory around as fast as the physical memory allowed. This was fast enough to copy characters on screen, pixel-by-pixel, so a dedicated character mode was no longer required. Which is a good thing, as by this time graphical user interfaces (GUIs) were becoming popular, and in that case you generally want to use more than just text anyway. You want to display icons, windows, widgets and whatnot as well, so a mode that can only display text would be of limited use.

The Amiga still had 8 hardware sprites, so technically it was similar to the C64 in being a ‘hybrid’ between a framebuffer (with hardware scrolling) and sprite overlays. However, the sprites of the Amiga were quite limited in terms of colours and resolution, so they were not used much, aside from displaying the mouse cursor.

Instead, the blitter was often used to implement sprites. You could draw your sprites into the framebuffer with the blitter, and erase them again afterwards. The blitter was fast enough that the Amiga was capable of drawing many large, colourful sprites at full framerate this way, giving you the same perfectly smooth results as hardware sprites would.
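The draw-and-erase cycle can be sketched as follows in Python. This only models the logic of a masked copy on a list-of-lists framebuffer; the function names are assumptions, and a real blitter works on words in bitplanes rather than on individual pixel values.

```python
def blit_masked(dest, sprite, mask, x, y):
    """Draw sprite into dest wherever mask is 1.

    Returns a copy of the background that was covered, so that the sprite
    can be erased again later.
    """
    saved = [row[x:x + len(sprite[0])] for row in dest[y:y + len(sprite)]]
    for j, (srow, mrow) in enumerate(zip(sprite, mask)):
        for i, (s, m) in enumerate(zip(srow, mrow)):
            if m:
                dest[y + j][x + i] = s
    return saved


def restore(dest, saved, x, y):
    """Erase a sprite by copying the saved background back."""
    for j, row in enumerate(saved):
        dest[y + j][x:x + len(row)] = row
```

Per frame you would restore the old background, move the sprite, and blit it at the new position, which is exactly the extra work that hardware sprite overlays avoid.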

Mind you, you would usually still be drawing directly to the visible framebuffer, in which case you would still have to ‘race the beam’ in some way: you’d have to make sure the blitter has finished drawing the pixels before the scanline is scanned out onto the CRT. Else you will get flicker or tearing artifacts on screen. So you would still need to take the screen timing into consideration, and you also need to make sure you draw left-to-right and top-to-bottom.

However, it still wasn’t perfect. The blitter could only perform a few standard tasks, like masked copies, line drawing and filling. For anything more advanced, you’d need to draw the pixels with the CPU, and that still wasn’t fast enough to update a full screen at the full framerate.

This point wasn’t quite reached until the PC platform became a viable gaming platform in the early 90s, with the introduction of fast CPUs and fast VGA cards. These machines would finally allow you to rewrite every pixel at every frame. That is, as long as you stuck to the lowest resolution, which was 320×200 at the time, in 256 colours. But at least under those conditions, the bitmap mode was now a solved problem.

In a way this was an interesting and counter-intuitive development: we started out with special hardware to accelerate certain tasks. And in the end we did away with all of that, and just drew everything into memory with the CPU.

At around this time it also became viable to use double-buffering to avoid flickering. The system would be fast enough for you to draw an entire frame into memory, and then copy it to the visible framebuffer faster than the cathode ray could draw it, so you wouldn’t see any flicker or tearing.

More advanced hardware would allow you to have two or more buffers in video memory and ‘flip pages’: you could just toggle the hardware to display another buffer, rather than copying the buffer around.
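A minimal Python sketch of page flipping is shown below. The `PageFlipper` class is hypothetical: on real hardware the flip is a change of the display start address, normally timed to happen during the vertical blank.

```python
class PageFlipper:
    """Two full-screen buffers: one being displayed, one being drawn into."""

    def __init__(self, width, height):
        self.pages = [[[0] * width for _ in range(height)] for _ in range(2)]
        self.visible = 0  # index of the page the display hardware scans out

    @property
    def front(self):
        """The page that is currently on screen."""
        return self.pages[self.visible]

    @property
    def back(self):
        """The hidden page that is safe to draw into."""
        return self.pages[1 - self.visible]

    def flip(self):
        """Swap the roles of the two pages; no pixels are copied."""
        self.visible = 1 - self.visible
```

All drawing goes to `back`, so a half-finished frame is never visible, and `flip()` costs the same regardless of resolution.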

From 2D to 3D

The next step in the evolution of rendering was to go from 2D graphics to 3D graphics. Relatively early on, somewhere in the mid-80s, the consensus became that polygon rendering would be the best way to render interactive 3D scenes going forward. Although the hardware was not quite up to the task yet, various 3D games arrived in the 80s, even on simple machines such as the C64. The Amiga and the PCs of the time would give you somewhat playable 3D games, but they were still quite sluggish compared to the silky-smooth 2D games we were used to.

Then in the early 90s, the PC platform really came into its own. There was a lot of competition between different clone builders, driving prices down and performance up. Because of some quirks in the PC hardware, a small detour was made from polygon-rendered 3D games. It was found that EGA and VGA graphics were particularly adept at rendering vertical columns. You could build walls out of textured columns that were scaled to suit the perspective, to get a 3-dimensional look. This led to games such as Wolfenstein 3D and DOOM (more on that here).
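The core of the column approach is just a perspective divide per screen column. The Python sketch below is an illustration under assumptions, not the code of either game: distances are measured in 'wall heights', are assumed to be at least 1, and the texture is scaled with simple nearest-neighbour sampling.

```python
def column_height(distance, screen_h=200):
    """Perspective: the on-screen height is inversely proportional to distance."""
    return min(screen_h, int(screen_h / distance))


def draw_column(texture_column, distance, screen_h=200):
    """Return one vertical screen column; None marks ceiling or floor.

    Assumes distance >= 1. A real renderer would clip columns that are
    taller than the screen instead of capping their height.
    """
    h = column_height(distance, screen_h)
    top = (screen_h - h) // 2
    column = [None] * screen_h
    for y in range(h):
        # Nearest-neighbour scaling of the texture column to h pixels.
        column[top + y] = texture_column[y * len(texture_column) // h]
    return column
```

A wall is then drawn by repeating this for every screen column, each with its own distance and its own column of the wall texture.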

But then came the successor to DOOM, which was Quake. Which brings us back to the introduction: Quake rendered all its 3D with textured polygons. This has remained the dominant rendering paradigm to this day.

Quake was not the first game to do this, and not the only game. But Quake arrived at a point where textured 3D polygons were basically a ‘solved problem’. The Quake renderer itself is very nice from a technical point of view. It had very good overall rendering accuracy: polygons and textures were rendered with subpixel accuracy, without the warping, shaking, seams and whatnot that we were accustomed to from early 3D games (and pretty much everything on the PlayStation 1 for example). And it combined this accuracy with excellent performance on the hardware available at the time, so it was suitable for smooth and agile action games.

From software to hardware

At around the same time that Quake came out, the first 3D accelerator cards for consumers also arrived on the market. I mentioned my overview of the hardware evolution earlier. But this is more about rendering paradigms than about the hardware itself.

And on that, I would like to mention that the move from software to hardware rendering was a case of all-or-nothing. Hardware acceleration only made sense if you rendered everything in hardware, for various reasons. From a performance standpoint, mixing hardware and software rendering wouldn’t make sense: 3D acceleration allowed you to use higher resolutions, more colours and more detailed geometry and textures than possible when using only the CPU.

For example, on 1996-era hardware, Quake could run in software in 320×200 resolution with 256 colours, but not much more. When using a fast 3D accelerator, you could run the game smoothly in 640×480 or even 800×600, with 16-bit or even 24-bit truecolour. Those resolutions were well out of reach for a software renderer. So even trying to render just part of the screen with the CPU would quickly become a bottleneck.
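Some quick arithmetic shows how large the gap is. The Python below only counts the bytes of the final image per frame; it ignores overdraw, texture reads and z-buffer traffic, so if anything it understates the difference.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Bytes needed to write every pixel of one frame exactly once."""
    return width * height * bytes_per_pixel


software = frame_bytes(320, 200, 1)    # 256 colours
accel_16 = frame_bytes(640, 480, 2)    # 16-bit colour
accel_24 = frame_bytes(800, 600, 3)    # 24-bit truecolour

ratio_16 = accel_16 / software
ratio_24 = accel_24 / software
```

That works out to 64000 bytes per frame for the software case, against 614400 and 1440000 bytes for the accelerated modes: 9.6 and 22.5 times as much data per frame.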

Aside from that, most hardware and APIs would not even let you access the framebuffer (or z-buffer for that matter) with the CPU to begin with, or only at the cost of a performance penalty, and only under specific conditions.

What this means is that in practice, you had to ‘make do’ with whatever rendering operations the hardware allowed. So from here on in, the hardware started to dictate the rendering paradigm, to some degree.

Especially in the early days, when hardware was still very limited, there was very little flexibility in how you could render your games. Early hardware would basically just allow you to put one texture on a polygon, and interpolate some light gradients from per-vertex values (usually a separate diffuse and specular value, which were then combined with the texture in a hardwired calculation).

Then came multitexture hardware, so you could put more than one texture on your polygons at the same time, and combine them in various ways. This would allow various tricks, such as using precalculated lightmaps, using image-based lighting/environment mapping, and simple ‘fake’ embossing/bumpmapping effects.
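In rough terms, the per-pixel calculations of these two generations look like the Python sketch below. The exact formulas varied per chip and per API, so treat `single_texture` and `lightmapped` as hypothetical, simplified examples of a hardwired combination, using colours as RGB tuples in the range 0 to 1.

```python
def clamp(v):
    """Keep a colour component in the displayable 0..1 range."""
    return max(0.0, min(1.0, v))


def single_texture(texel, diffuse, specular):
    """One texture, modulated by interpolated diffuse light, plus specular."""
    return tuple(clamp(t * d + s) for t, d, s in zip(texel, diffuse, specular))


def lightmapped(texel, lightmap_texel):
    """Two textures: the base texture multiplied by a precalculated lightmap."""
    return tuple(clamp(t * l) for t, l in zip(texel, lightmap_texel))
```

The point is that the programmer could only pick from a small menu of such combinations, where a software renderer could compute anything it wanted per pixel.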

Then came programmable shaders, which slowly but surely brought back the flexibility in your rendering algorithms.

From rasterization to raytracing

The most recent development is hardware-accelerated raytracing. This is an interesting thing, as raytracing is one of the earliest rendering algorithms in the history of computer graphics. Turner Whitted developed the classic recursive raytracing algorithm back in 1979. It’s a very simple and elegant algorithm. And I think most people who started with software rendering, will have written a simple raytracer at some point, probably at a relatively early stage in their career.

In a nutshell, Whitted raytracing simply exploits the fact that you can trace light ‘in reverse’. That is, we see objects because light bounces off them and reaches our eyes/camera. So if you reverse that, you construct rays from your eye/camera through your viewing plane, and then find intersections with objects. At each intersection point on an object’s surface, you can construct rays from that point to every lightsource in the scene. If such a ray reaches the lightsource without intersecting any other object, the light can be seen from that point, and you can calculate how much light arrives from the distance to the lightsource and the angle. The problem with classic Whitted raytracing is that it is slow. You need a lot of calculations per ray to generate a pixel, and you need a lot of rays and pixels to generate a reasonable image.
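For those who have never written one, here is a minimal sketch in Python of the steps just described: one primary ray, one shadow ray to a single point light, and a brightness based on angle and distance. It only handles spheres and leaves out the recursive reflection and refraction rays of the full Whitted algorithm; all function names are my own.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


def hit_sphere(origin, direction, centre, radius):
    """Distance along a normalized ray to the nearest hit in front, or None."""
    oc = sub(origin, centre)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-6 else None


def nearest_hit(origin, direction, spheres):
    """Return (distance, centre) of the closest sphere hit, or None."""
    best = None
    for centre, radius in spheres:
        t = hit_sphere(origin, direction, centre, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, centre)
    return best


def shade(origin, direction, spheres, light):
    """Brightness seen along one ray from the eye/camera."""
    hit = nearest_hit(origin, direction, spheres)
    if hit is None:
        return 0.0
    t, centre = hit
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, centre))
    to_light = sub(light, point)
    dist = math.sqrt(dot(to_light, to_light))
    ldir = scale(to_light, 1.0 / dist)
    # Shadow ray: any object between the point and the light blocks it.
    blocker = nearest_hit(add(point, scale(normal, 1e-4)), ldir, spheres)
    if blocker is not None and blocker[0] < dist:
        return 0.0
    # Angle (Lambert) and inverse-square distance falloff.
    return max(0.0, dot(normal, ldir)) / (dist * dist)
```

Even this stripped-down version tests every ray against every object twice per pixel, which gives an idea of why the classic algorithm does not scale to realtime without help.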

There have been various efforts in trying to make raytracing more efficient. There were attempts at realtime raytracing in software (the demogroup Federation Against Nature did a few very nice ones back in the early 2000s). There were also attempts at building acceleration hardware for raytracing.

However, these attempts tried to generate a screen entirely with raytracing. Which works, but is not very efficient. The so-called ‘first bounce’, the first hit of a ray from the eye/camera to an object can be approximated just fine by rasterizing polygons. And with perspective-correct interpolation, you can generate accurate rays for further raytracing on every pixel of the polygon’s surface. A hybrid renderer constructed in this way is much faster.

This was known long before hardware acceleration arrived. Many offline renderers such as Pixar’s Renderman or 3DSMax or such, have used this approach for decades, to find a good balance between image quality and render time. But 3D hardware has basically taken the same route: there is now hardware available that can perform basic raytracing operations. And in most cases they are used for adding raytraced effects (such as reflection/refraction) to scenes that are rendered with traditional polygon rasterization and per-pixel shading.

That seems to be the common rendering paradigm today, and we will likely stick with it for years to come.

There’s always a pet peeve

Now that the ‘introduction’ is over, perhaps one of my pet peeves now makes more sense. It’s common these days to do ‘retro’ games and demos: graphics that look like they were from a different era, but made today, usually for modern hardware, just for the nostalgic vibe.

The thing is though, as I hope I have been able to explain, graphics looked a certain way in a certain era, because of the capabilities and limitations of the machines of the time. Graphics effects were generally a result of very specific tricks on very specific hardware.

Now as I said, around the time of Quake, software rendering was a ‘solved problem’: you could render everything with the CPU, and get perfectly smooth scrolling, sprites and whatnot. However, those sorts of tricks were traditionally done on hardware that had specific capabilities for doing so, and getting it right would often involve time-critical coding, racing-the-beam and whatnot.

That is where my own work differs from stuff that just goes for the ‘nostalgic vibe’: I try to do effects on actual hardware with the actual tricks of that period (or extending known tricks, or coming up with previously unseen tricks). Yes, you can do ‘raster bars’ on a modern machine, you can just draw them in a backbuffer and flip at full framerate. But what’s the point? Rasterbars were ‘invented’ because the hardware wasn’t fast enough to draw entire screens, and by manipulating the palette at the right time, you could draw colourful patterns on the screen. That’s why rasterbars are cool. If you just draw them in a framebuffer, it’s not actually an effect.
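To show what ‘manipulating the palette at the right time’ means, here is a toy Python model of a rasterbar. It is a simulation with made-up names, so it cannot capture the actual hard part, which is hitting the timing on real hardware. What it does show is that no framebuffer is written at all: only a single colour register changes from one scanline to the next.

```python
def scan_out_with_raster_bar(height, bar_top, bar_colours, background=0):
    """Simulate scanout where only the background colour register changes.

    Returns the colour shown on each scanline. On real hardware the write
    to the register has to be timed to land between two scanlines.
    """
    lines = []
    for y in range(height):
        colour_register = background
        if bar_top <= y < bar_top + len(bar_colours):
            colour_register = bar_colours[y - bar_top]  # timed register write
        lines.append(colour_register)
    return lines
```

Moving the bar is a matter of changing `bar_top` every frame, which costs next to nothing, and that is precisely what made the effect attractive on slow machines.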

http://scalibq.wordpress.com/?p=5821


Another adventure in downgrading, part 5: Linux

Scali Apr 16, 2025

I have taken on yet another project with the same codebase as discussed in the previous instalments about downgrading. And that is a task I had been meaning to tackle for some time: porting the code to Linux (or Android, …


Now how is that a downgrade, you may ask? Well, that has to do with the nature of this project. Namely, now that we have .NET Core, we can, under the right circumstances (as briefly touched upon earlier), run the same C#/.NET code on a variety of platforms, including Linux. In that sense I have to ‘downgrade’ the code to meet these circumstances: you have to write the code against the lowest-common-denominator within the .NET Core target platforms.

Now, right off the bat I would like to mention that Linux is a rather peculiar target in .NET Core, as it does not have a platform-specific target at all, unlike most others (Windows, Android, iOS, macOS, tvOS etc). That means you’re basically limited to console-only applications, unless you use third-party libraries that allow you to create a GUI under Linux, such as Avalonia. Microsoft’s own cross-platform GUI environments, Xamarin and MAUI, do not support Linux.

Having said that, the first order of business then is to get as many of the projects in the solution to a platform-agnostic target, rather than Windows-specific as some of them are now. Some of them already are platform-agnostic. Some of them can be rewritten to be platform-agnostic. Some can be split up into a platform-agnostic and a platform-specific part, and some have to be replaced or dropped altogether.

This also means that we could use cross-platform technologies (such as OpenGL, OpenAL and VLC), which are available on both Windows and Linux. So there’s a fine line between making a version of the codebase that can run on Linux, and one that can run on Windows using cross-platform technologies, rather than the Windows-specific technologies we currently use. In some cases it may be useful to test under Windows, and compare the Windows-specific implementation against the cross-platform implementation that will eventually run on Linux. This means you don’t have to have a working platform-agnostic codebase right away: you can test the code from a Windows-environment.

Having said that, the goal is not to replace the current code with platform-agnostic code. Rather, the goal is to keep the current Windows-specific code, and add platform-agnostic or platform-specific code for other platforms, in a modular way. The codebase is already designed to allow you to select between various APIs, as we’ve seen. This is the result of earlier changes to the codebase, where new APIs were added, but support for the existing ones was maintained. The reason is that in specific cases, one API might be better than the other, so being able to select the API to use with some simple configuration is a good fallback mechanism.

For this reason we have Media Foundation, even though we also have VLC, which is probably the most compatible library out there for playing audio and video. While it can play more formats than Media Foundation can, the latter can play all the common formats, and has the advantage that it integrates very efficiently with Direct3D 11. This allows you to decode video directly into NV12 textures, without having to copy pixels over with the CPU. Especially for 4k content at high framerates, this makes a lot of difference, as otherwise the memory bandwidth becomes a bottleneck.
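To get a feel for the numbers, here is a back-of-the-envelope calculation in Python. It assumes that '4k' means 3840x2160 and that 'high framerate' means 60 fps; both figures are my assumptions for the example. NV12 stores a full-resolution luma plane plus a half-resolution plane of interleaved chroma pairs, so 12 bits per pixel on average. Row pitch padding and caches are ignored, and a CPU copy is modelled simply as one extra read plus one extra write of every frame.

```python
def nv12_frame_bytes(width, height):
    """Size of one NV12 frame, ignoring any row pitch padding."""
    luma = width * height                      # one byte per pixel
    chroma = (width // 2) * (height // 2) * 2  # one U,V pair per 2x2 block
    return luma + chroma


def bytes_per_second(frame_bytes, fps):
    """Raw throughput needed to produce every frame once."""
    return frame_bytes * fps


if __name__ == "__main__":
    frame = nv12_frame_bytes(3840, 2160)
    decode_only = bytes_per_second(frame, 60)
    # Simplified model: a CPU copy reads and writes each frame once more.
    cpu_copy_extra = 2 * decode_only
    print(f"frame size:       {frame / 1e6:.1f} MB")
    print(f"decode only:      {decode_only / 1e6:.0f} MB/s")
    print(f"extra for a copy: {cpu_copy_extra / 1e6:.0f} MB/s")
```

Under these assumptions a single stream already moves roughly 0.75 GB per second, and a CPU-side copy adds about twice that on top, before any colour conversion to a wider format.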

So given this example, it is likely that although a codebase based on VLC and OpenGL will work fine on Windows, it is not likely to perform as well as Media Foundation in combination with Direct3D 11. VLC currently has the status of being a fallback path only, for media that isn’t supported by Media Foundation.
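The 'preferred API first, fallback second' idea can be sketched as follows. This is Python rather than the C# of the actual codebase, and everything in it is hypothetical: the Backend class, the open_media function and especially the extension lists, which are invented for the example and say nothing about what Media Foundation or VLC really support.

```python
class UnsupportedMedia(Exception):
    """Raised when a backend cannot play the given file."""


class Backend:
    """Hypothetical stand-in for a media playback implementation."""

    def __init__(self, name, extensions):
        self.name = name
        self.extensions = set(extensions)

    def open(self, path):
        extension = path.rsplit(".", 1)[-1].lower()
        if extension not in self.extensions:
            raise UnsupportedMedia(f"{self.name} cannot play {path}")
        return f"{self.name} playing {path}"


def open_media(path, backends):
    """Try each backend in order of preference and use the first that works."""
    for backend in backends:
        try:
            return backend.open(path)
        except UnsupportedMedia:
            continue
    raise UnsupportedMedia(f"no backend can play {path}")


# Invented format lists, purely for illustration.
PREFERRED_ORDER = [
    Backend("MediaFoundation", {"mp4", "wmv"}),
    Backend("VLC", {"mp4", "wmv", "mkv", "ogv"}),
]

if __name__ == "__main__":
    print(open_media("clip.mp4", PREFERRED_ORDER))
    print(open_media("clip.mkv", PREFERRED_ORDER))
```

The order of the list is the configuration: on Linux the same list would simply not contain the Windows-specific entry.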

If we go back to the earlier attempts, I had an overview of the technologies that worked in the best possible case on Windows:

Direct3D 9
Direct3D 11
DirectShow
Media Foundation
WIC
CefSharp
VLC
NAudio

From that list we can scrap Direct3D 9, 11, DirectShow, Media Foundation and WIC immediately, because they are only available on Windows.

Direct3D will have to be replaced with OpenGL or Vulkan, the two major graphics APIs supported by Linux, where OpenGL seems to be the best choice for low-end devices for now. There is no direct support for OpenGL in .NET Core, but we can use OpenTK (Open Toolkit) to provide bindings for OpenGL and various other APIs, much like how SharpDX does the same job for DirectX and related APIs. And there is Silk.NET, which does more or less the same as OpenTK, and even includes DirectX support as well.

DirectShow and Media Foundation are two of the ways to play audio and video media, with VLC as the third. If we can get VLC working under Linux, we can just use that, else we need to look further for an alternative media library on Linux.

WIC is used for loading images (JPEG, PNG, GIF, WEBP and other formats). Here we’ll also need to find an alternative library for Linux. In the past, I used FreeImage, but the .NET wrapper appears to have been abandoned years ago. The OpenTK examples use StbImageSharp. That might be an interesting option, as it is a source port of stb_image.h to C#, rather than a wrapper around some native library. This means it does not rely on any third-party binaries, which also means no platform-specific binaries required, so that’s good for platform-agnostic code. It does not support as wide a range of formats as FreeImage does, but it supports the common JPG, PNG and GIF formats, which is good enough for a start. WEBP would be nice to have though.

This leaves us with CefSharp, VLC and NAudio to investigate further.

CefSharp

We can be quick about CefSharp: the project has a Windows-only focus, and there are no other platforms supported currently. CEF itself does support Linux of course, so it is not completely hopeless. There may be alternatives available already, or else perhaps we can get at least the offscreen version working under Linux, as it does not require WinForms or WPF. But for our first attempt, let’s just shelve web view functionality for the moment, as before, and deal with it at a later time.

VLC

VLC itself is a multi-platform application. By extension, libvlc and the .NET wrapper LibVLCSharp are as well. But that doesn’t mean that all code using VLC is platform-neutral. The first thing is that for Windows you normally also use a NuGet package that contains all the binaries, so they are bundled with your app nicely. Similar NuGet packages exist for various other platforms as well, but not for Linux. The reason for this is that there are too many variations of Linux to meaningfully maintain NuGet packages for them. However, if you just install VLC on your machine, and the development package for VLC via your distribution’s package manager (libvlc-dev), then all the binary libraries should at least be on your machine, and the application should be able to find them, so you can get LibVLCSharp to work on Linux.

The second problem is the code that you write yourself. I have of course targeted Windows and Direct3D only until now, so my VLC code was interacting directly with Direct3D and was using various Windows-specific functions, datatypes and whatnot. So I will have to strip this from my VLC code, and abstract it in a way that I can use the same basic VLC code with both Direct3D and OpenGL, and on both Windows and Linux.

NAudio

NAudio originally started as a Windows-only library in its 1.x-version. Which is not strange, given that .NET itself was originally a Windows-only affair anyway. However, with the 2.x version, NAudio has been refactored from a single package into several subpackages (as mentioned before). Some of these are Windows-specific, such as the code that outputs to the various audio APIs supported by Windows (WASAPI, WinMM, DirectSound and ASIO). But other functionality, such as the audio resampling, mixing, compressor effect and whatnot, might be platform-independent.

So much like with VLC this will probably be a case of splitting off the Windows-specific parts from the platform-independent parts, and then I should be able to connect the output of the whole mixing chain to a Linux-specific library to output it to an actual sound device. OpenTK also offers bindings for the OpenAL API, so perhaps that is a good option to use.

But there’s more

So far we’ve only covered the obvious technologies that won’t be available on Linux. Another technology that’s perhaps slightly less obvious, especially since this is a fullscreen Direct3D-rendered application, is WinForms. Obviously even for a fullscreen Direct3D application, you still need to create a window of some sorts, and you still need to handle certain basic window events and such to get things working.

Now, that part is well and good, and for OpenGL there are .NET libraries such as OpenTK Desktop which offer alternative windowing environments (based on GLFW in this case). But in practice the WinForms-specific stuff will have spread through the codebase further than just the window alone. Various WinForms-specific objects and datatypes may have been passed-on as-is through various parts of the application, such as sending mouse or keyboard events to your own custom Direct3D-rendered controls. So the codebase will need to be refactored to remove these Windows-dependencies.

One major dependency I have already found is fonts. Although the codebase already uses FreeType to convert fonts into bitmaps and then store them in textures for Direct3D to render, there are various routines around the FreeType-code that use WinForms-specific font routines to get various metadata from the fonts. This too will have to be rewritten.

Not entirely from scratch

There apparently has been an attempt at porting the codebase to Android before I joined the company. In various places I have found #if ANDROID directives. There also was already an implementation of some OpenGL code, based on an earlier version of OpenTK. There was even a collection of shaders in GLSL format.

Now I have never actually tried to run any of this code, so I have no idea how mature it is, and what works and what does not (when I started on the code, it also had Direct3D 10 code in there, but that code had clearly never actually run, because there were some fundamental things missing or implemented incorrectly).

I went back to the first version of the code that I started from originally, to make sure that none of my refactorings or other cleanup and updates of the codebase had incapacitated any of the OpenGL code (before I eventually removed it). But I found that someone already beat me to it: in one of the refactoring commits of my predecessor, they removed the OpenGL flag from an enum used in the configuration settings. This means you couldn’t actually select the OpenGL path in the code at all. So for that reason alone, it could never have worked. Another issue that I found was that some of the assemblies had been upgraded to .NET 4.6.2, but the OpenGL assembly was still .NET 4.0. This meant that if you tried to build it, it tried to link against some newer code, and failed to build. So while it may have worked in some shape or form at some point in time, it was definitely not in a working state when I started on it.

Having said that, it does implement various interfaces of the rendering engine via OpenGL, and there is the set of shaders, so this might be a good starting point to try and get the code working.

Not your grandma’s .NET

The old codebase was written against .NET 4.6.2, in the time when you would use Mono for Linux/Android. This is a contrast with today’s .NET Core-based environment. Namely, Mono did actually aim for full compatibility with the Windows-environment, and offered a WinForms implementation and pretty much the entire API that was available on Windows, unlike .NET Core. So the model was similar to Java, where you have a ‘write once, run anywhere’ approach.

This also meant that there were no specific target platforms as there are in .NET Core now. So instead, you’d use #if statements to conditionally turn certain platform-specific code on or off depending on your build configuration. In .NET Core this isn’t as simple anymore, as you need to set a specific target framework for your application. It is possible to specify multiple target frameworks, but that requires conditional configurations in your project file, which can be far from trivial. At this point I’m not quite sure whether to use this functionality, or to just use separate projects for each target.

So, while I may be able to get the old OpenGL code working under Windows, or even Linux, via Mono, that won’t make the code work in a modern .NET Core-based environment, such as .NET 8, which I am targeting with Windows now, and which is what I would also want to use on Linux/Android.

Mono itself is not entirely abandoned, but is now maintained by WineHQ, and has not seen a lot of development lately. Things have happened in the meantime. The original people behind the Mono project went on to form the Xamarin company, which developed a .NET-environment that could target iOS and Android. Xamarin was then acquired by Microsoft. And more recently, Xamarin was superseded by .NET MAUI, and Xamarin went end-of-life on May 1, 2024.

And while I could use .NET MAUI to target Android (and Windows), there is currently no Linux support, as said earlier, so I need a third-party UI solution for that. Also, OpenTK officially supports Windows and Linux, but not Android. That is, their older versions used to run on Android, but the current version may or may not run, the status is untested. So we will just have to see how it goes. As already mentioned, there is also the possible alternative of Silk.NET to look at. This also gives us bindings for OpenGL and OpenAL for a wide range of platforms. And it even supports DirectX and other Windows-specific APIs. So in the future I may want to migrate from the now-outdated and unsupported SharpDX to Silk.NET as well.

Not to mention that Windows and Linux are desktop OSes, which normally use the regular version of OpenGL, where Android is mobile/embedded and will use the slightly incompatible OpenGL ES variation of the API. There will probably be more on that later.

So there are a lot of small details to take care of, and various subtargets hidden beneath the surface of a thin veil of platform-independency.

But bottom-line, once I have a working implementation of OpenGL code and shaders for Windows/Linux via OpenTK, it should not be that difficult to translate it to OpenGL ES, either with OpenTK or with some alternative OpenGL wrapper.

And once the code works on Linux on x86, I can also add support for other CPUs, mainly ARM. By using AnyCPU where possible, and making CPU-specific parts compile for ARM as well, I should be able to get the code to run on Raspberry Pi and similar low-cost Linux-based devices. If I want to target Android, I will need to be able to support ARM anyway, as ARM is by far the most common architecture for Android devices.

Another possibility is to convert the code to WebGL without too much effort, as this is also derived from OpenGL ES. For C/C++ code, there is already Emscripten, which can convert compiled code to WebAssembly, including automatic translation of OpenGL to WebGL. .NET also supports using Emscripten and compiling to WebAssembly. There is no official support yet for OpenGL, but there have been experiments with Silk.NET which appear to have been successful.

Yet another interesting cross-platform technology that can be useful, is WebGPU. It is the successor to WebGL, and the interesting thing is that it is designed to not only be used from inside a browser, with JavaScript. It also has bindings for C/C++ and Rust, for use on desktop platforms. However, the system requirements are still quite high at this time (requiring DirectX 12/Vulkan hardware), and it is not very mature yet, so I will probably leave it for some other time.

So all in all, there are a lot of avenues to explore, in trying to get this code to run on a wide variety of configurations with as little effort as possible. This was once primarily the domain of C/C++. Java made a valiant attempt, but ultimately fell short because of lacking performance and lack of support for interfacing with native code and hardware. Now C# seems to be at the point where it allows you to write very portable code, so let’s see how this goes.

This has become quite a long introduction, so I will leave it at that for now, and we’ll get hands-on in the next instalment.

http://scalibq.wordpress.com/?p=5739


Windows 11 24H2: not for everyone

Scali Oct 25, 2024


The 24H2 edition of Windows 11 was officially released this month. I have Windows 11 running on a variety of systems, only a few of which are actually officially supported. The other machines do not meet the system requirements in one way or another (mainly TPM, SecureBoot, UEFI and CPU support). Since the first edition of Windows 11 (released in October 2021), it was commonly known that Windows 11 supported far more hardware in practice than what the system requirements suggested, and it wasn’t too hard to find information on how to get around the checks in the installer, and get a Windows 11 installation running on unsupported hardware. Rule of thumb was: if the machine can run Windows 10, then it can run Windows 11.

The 24H2 edition changes this somewhat. A quick overview:

TPM: No changes. While Windows 11 officially requires TPM 2.0, it still works on machines with a lower version TPM, or no TPM at all.
SecureBoot: No changes. Like with TPM, it still works without SecureBoot if you just skip the checks during installation.
CPU: A breaking change here, the kernel now requires the POPCNT instruction.
Legacy BIOS/MBR environment: Another breaking change: BOOTMGR will not boot from a legacy BIOS and MBR partition table.

Let us look into those last two in a bit more detail here.

CPU

Windows 11 has had quite strict CPU requirements since its release. For both Intel and AMD processors, you’d have to have a reasonably recent CPU, not older than 2017. In practice however, Windows 11 did not appear to actually use any recent instructions, so if you just skipped the CPU check during installation, it would work fine on older CPUs. The oldest I tried it on was a Core2 Duo E6600 from 2006. Even some of the later Pentium 4 models (you need x64 support) can run it.

With the 24H2 edition, that has changed somewhat. The kernel is now using the POPCNT instruction. This was introduced around 2007/2008. AMD introduced the POPCNT instruction in what they call the ABM extension, first found on the Barcelona (aka K10) generation. On Intel CPUs, it arrived together with the SSE4.2 extension (although it has its own CPUID feature flag), first introduced on Nehalem, the first Core i7 generation.

So while this requirement is still much lower than the official requirements (1st generation Core i3/i5/i7 vs 8th generation Core i3/i5/i7/i9), it is stricter than the earlier versions of Windows 11, and it effectively locks out CPUs such as the Core2 Duo from running this newer edition.

There’s no real workaround for this. So if you want to upgrade your Windows 11 installation on an unsupported CPU, make sure to run a tool such as CPU-Z to check whether or not you have support for SSE4.2/ABM, before trying to upgrade. If your CPU doesn’t support it, you should just stick with 23H2.
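If you happen to have a Linux environment on the machine (a live USB stick, for example), the same check can be done without CPU-Z. The following is a minimal sketch in Python that looks for the popcnt flag in /proc/cpuinfo, which is the name the Linux kernel uses for this feature on x86. The function names are my own, and the script does not work on Windows itself, where /proc/cpuinfo does not exist.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_popcnt(cpuinfo_text):
    """True if the CPU advertises the POPCNT instruction."""
    return "popcnt" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        print("No /proc/cpuinfo here; use a tool such as CPU-Z instead.")
    else:
        if has_popcnt(text):
            print("POPCNT is supported.")
        else:
            print("POPCNT is not supported; better stick with 23H2.")
```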

Legacy BIOS/MBR

The second change is a bit more tricky. If you’ve installed Windows 11 on an unsupported system earlier, you will likely have used a tool called Rufus. It can create a bootable USB stick from a downloaded Windows ISO, and it can also make it bootable for non-UEFI systems (and it will allow you to bypass various checks). Systems with legacy BIOS can only boot from devices with an MBR partition table, not with the newer GPT layout that is supported by UEFI.

Now, the problem is that the 24H2 edition comes with a new BOOTMGR, which simply doesn’t work with legacy BIOS/MBR anymore. You can create a USB stick with Rufus as before, but it won’t boot on a machine with legacy BIOS. So a clean install from USB stick is not possible this way.

A workaround is detailed in this video:

In short: you create a bootable USB stick from a 23H2 ISO instead. This will give you a USB stick with the old BOOTMGR, which is still compatible with legacy BIOS/MBR. Then you replace the main Windows installation file (install.wim) with the one from the 24H2 ISO. So the Windows version it will actually install will still be 24H2.

That should take care of clean installs. A related problem is with upgrading an existing installation. The previous tricks of setting various registry keys and replacing appraiserres.dll with an older version to get around the checks are not working anymore with 24H2. However, an even simpler workaround has been found:

Mount your 24H2 ISO image as a virtual DVD drive. Then open a command prompt. Go to the 24H2 drive and type the command:

setup /product server

And there you go. It will run the installer, but it is tricked into installing ‘Windows Server’ instead of Windows 11. This apparently bypasses all checks. Since you are effectively performing an upgrade of Windows 11 instead of a clean install, it doesn’t matter that it thinks it is installing ‘Windows Server’. You may see it on screen here and there, but the upgrade will go as planned, and once it is complete, you have Windows 11 24H2 as expected.

I did not run this from a modified 23H2 image as described above, but from the regular 24H2 image. As a result, I got the new, incompatible BOOTMGR installed, and the upgrade process only got as far as the first reboot, at which point I got an IO error from the Windows Boot Manager immediately.

Fixing this was a case of “easy when you know how”. Namely, I couldn’t boot into any of my Windows installations anymore. So the only thing I could do was boot from a DVD or USB stick. My first attempts at restoring the bootloader from a recovery command prompt all failed. It seemed as if nothing happened. And as it turns out, it didn’t. Apparently there are version checks built into the tools that allow you to restore your Windows Boot Manager (bootrec.exe, bcdboot.exe and such). They won’t actually overwrite a BOOTMGR file if one is already there, and it is newer than the version you’re trying to install.
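The behaviour I ran into can be modelled with a few lines of Python. To be clear, this is only an illustration of the rule as I observed it, not the actual logic of bootrec.exe or bcdboot.exe. The function names and the version strings in the example are made up.

```python
def parse_version(text):
    """Turn a dotted version string into a tuple that compares numerically."""
    return tuple(int(part) for part in text.split("."))


def should_overwrite(existing, candidate):
    """Model of the observed rule: never replace a newer file with an older one.

    existing is None when there is no BOOTMGR on the target disk yet.
    """
    if existing is None:
        return True
    return parse_version(candidate) >= parse_version(existing)


if __name__ == "__main__":
    # Illustrative version numbers only.
    print(should_overwrite("10.0.26100.1", "10.0.22631.1"))  # newer file on disk
    print(should_overwrite(None, "10.0.22631.1"))            # no file on disk
```

In this model the first call refuses to write, which matches what I saw: the restore silently does nothing as long as the newer BOOTMGR is still on the disk. With no file present, the older version goes in without complaint.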

So the trick is that BOOTMGR is located in the root of your C: drive, and it has the hidden, system and readonly attributes set. So you don’t see it, and you can’t accidentally delete it.

So first you remove the attributes:

attrib -s -h -r C:\BOOTMGR

Then you copy the BOOTMGR file from your recovery media. Note that this is not on the x: disk where you are dropped when you get to the recovery console; the actual media is mounted under another drive letter. You can use the diskpart tool to list all volumes and their drive letters to see where it is mounted. In my case that was drive h:

copy h:\BOOTMGR C:\

Then you can restore the attributes:

attrib +s +h +r C:\BOOTMGR

Now you can reboot your system, and it should boot again. The upgrade should also continue where it left off.

Caveat

The problem with running 24H2 on a system with legacy BIOS is that anytime an update wants to restore the BOOTMGR to the 24H2 version, it will make your system unbootable again. You have to manually copy back the BOOTMGR file from the 23H2 version as described above. Because of this, I wouldn’t really recommend running Windows 11 24H2 on legacy BIOS systems any longer. It’s easier to just stick with 23H2.

Update: I haven’t verified this myself, but I read a thread discussing problems with BOOTMGR on an MBR system, and their claim is that the issue is specifically when you have both MBR and GPT partitions in your system. Then the new BOOTMGR will fail to load. That is my situation indeed: I have a 1 TB MBR disk that contains DOS and various other legacy OSes. This is also my first drive, so this is where I store my BOOTMGR. Windows 11 was actually installed on a 4 TB disk formatted in GPT.

http://scalibq.wordpress.com/?p=5707


Troy Grady appreciation post

Scali Aug 21, 2024


Time for a quick guitar-related post. I’ve been playing guitar for some 30 years now, and having grown up in the 80s, I of course grew up in the era of hair metal and shred guitar. Virtually every pop song had a guitar solo, and often a very virtuosic one, even when the song wasn’t necessarily metal or even rock. So that’s the kind of guitar playing I wanted to learn. Van Halen, Joe Satriani, Steve Vai, Gary Moore, Yngwie Malmsteen etc.

So I was trying to push for technical and fast guitar playing from early on. In my early years, I mainly tried to play like Eddie van Halen and Joe Satriani, because they used a lot of legato playing, which was an easy way to play fast and complex phrases (along with various other tricks to spice up the playing). I found that it was much harder to pick every note at high speeds.

I did eventually get reasonably good at picking as well, but there were always things that I couldn’t quite nail, so I’d have to cheat by adding some legato here and there. And although there was quite a bit of instructional material available even in those days, nobody ever seemed to break down exactly how to do it.

Anyway, many years went by, and I developed my own style within my own limitations (I picked up things from various guitarists here and there, but what mostly stuck with me was ‘speed picking’ by Frank Gambale. I took some of his ideas and developed an economy picking style for when I wanted to pick every note, but I certainly never mastered his entire speed picking method), and I no longer really tried to learn other guitarists’ stuff note-for-note, so the problems and limitations had basically gone to the background and become irrelevant.

And then a few years ago, I stumbled upon a series of YouTube videos called “Cracking the code”, made by one Troy Grady.

This completely blew my mind! Not only did this guy apparently grow up in about the same era, listening to pretty much the same guitarists, and using the same instructional material as I had… But he was taking it apart in a way that I had never seen before. He actually was… cracking the code!

The insights I got from watching Troy Grady’s videos were two-fold: on the one hand he showed and explained some parts of picking technique that somehow more or less eluded me. And not just me apparently, because for various things he had to invent his own terms, such as ‘pickslanting’ and ‘string hopping’ etc. Various guitarists were using these techniques that they probably weren’t even fully aware of themselves. Which would explain why nobody explained these details in their instructional material.

On the other hand, he breaks down the technique of various highly skilled guitarists, and shows that many of them basically do the same as I do: they ‘cheat’. They may not pick every note either in every phrase, but use legato here and there to overcome technical difficulties.

And many of them developed their style and phrasing around their own technical abilities and limitations. They may play ascending scales in a different way than descending scales. And they may not use straight runs but instead rely on certain patterns to move around on the fretboard, to sidestep certain technical issues.

And I say ‘more or less eluded me’, because in retrospect it’s quite a ‘duh’-moment… I was already doing pickslanting. When I started practicing sweep picked arpeggios, it was obvious that I had to slant my pick in one direction on the downstroke, and in another direction on the upstroke, to avoid the pick getting caught in the strings.

I knew that, but it had never occurred to me to also apply that approach to more conventional picking. As a result, I had developed a technique where I was good at playing ascending scales and licks, using economy picking. But I couldn’t ‘reverse’. So I just avoided straight descending scales and licks. Instead I’d use various patterns that were arranged in a way that I could get around the problem that I couldn’t get problem-free string changes with a straight descending scale.

I had two guitarists who I saw as great examples of good picking technique, as their playing was centered around picking every note, and they had a very fast and clean style. These guitarists were Michael Angelo Batio and Vinnie Moore. But Troy Grady made me come to the realization that these guitarists were actually very distinct. Batio is a guitarist who is extremely good at playing straight scales both up and down (and as Troy Grady will tell you, even among virtuoso players that is very rare). Moore on the other hand plays mostly pattern-based stuff instead of scales.

Another thing that Troy Grady did was to show me that you were never too old to learn new techniques. Sure, I’d been playing for about 30 years, and for most of that time I hadn’t even thought about my technique much. It was just etched into my subconscious, in my muscle memory, and it was versatile enough for me to do what I wanted to do. But, why stop there?

So I started looking at my own playing, my own technique, my own weaknesses. And one of those was those descending scales. I’ve been trying to add the ‘reverse’ to my technique: switch the pickslanting mode when I descend, so I can make the economy picking work both ways. And it’s gotten quite a bit better already.

I will also try doing pure alternate picking. The whole reason why I started economy picking was because pure alternate picking was more difficult for me with string changes. But Troy Grady’s analysis showed me what the problem is and how to get around it: I got in the ‘trapped zone’ because I didn’t always have the correct ‘escape motion’.

Because what Troy Grady also showed me: whereas many guitarists will develop their own style, partly because their technique steers them into that direction… It is also possible, if you understand the techniques at a deep enough level, to master all of them. Because Troy Grady can demonstrate every one of them (okay, perhaps not the full Frank Gambale speed picking, that just seems inhuman). He can show pure alternate picking, economy/sweep picking, straight scale runs, arpeggios, and various types of patterns.

So I want to express my appreciation for Troy Grady and his excellent work. In all my years of playing I have never found any instructional material that is this insightful, helpful and inspiring. I can only imagine what impact this will have on future generations of guitarists.

http://scalibq.wordpress.com/?p=5687


How to build your own CEF with MP4 support

Scali Aug 16, 2024


For the uninitiated, CEF stands for the Chromium Embedded Framework. Which is, as the name suggests, a framework that allows you to run Chromium embedded inside your own applications. Chromium being the open source browser which forms the basis for Google Chrome, the most popular browser in today’s world. Aside from that, various other browsers are also powered by Chromium these days, such as Microsoft Edge and Opera.

So you probably understand why one may want to run an embedded browser in their own application. Various popular applications do so, including for example Valve’s Steam platform.

But why would you want to build your own? The binaries are freely available, right? Well, yes they are. Thing is, they are built with a specific set of build flags. If you want or need to use different flags, then you may have to roll your own.

Now, MP4 support is one such case. By default, CEF is compiled with several of the audio/video codecs disabled, because of licensing issues. So the full Chrome browser supports more audio/video formats than Chromium and CEF do. On YouTube for example, there are various different codecs in use. With the standard build of CEF you can play some of the videos, but not all of them. Especially live streams and 50/60 fps content will often not work, because of unsupported codecs.

This issue is on the radar with the CEF project, and there are plans to reorganize the code so it can build with support for licensed codecs only when using (hardware accelerated) OS libraries, such as Media Foundation on Windows. In that case, the license is already taken care of by the OS, and any application can freely use it.

The snag at this point is that although hardware acceleration is done via these licensed OS libraries, there is still a chance of a software fallback, in which case you run different code, for which a license would have to be procured.

However, if you look at the licensing terms, the short version is that a license is free for the first 100,000 installations. So in practice, people who use CEF on a smaller scale can use the proprietary code for free, and won’t have to wait for the CEF code to be reorganized to build without any proprietary software codecs (be sure to check this for your specific situation with some legal advisors).

The advantage of supporting MP4/H264/AAC in CEF is that you can play not only more videos, but also live streams, and content with (Widevine) DRM. You still won’t be able to play ALL content that Chrome can, but it will be much closer than before. You will have support for a lot more YouTube content (I believe that at the time of writing, all videos will work, but this might change in the future), and you should be able to play content from all major streaming services.

Two interesting sites to check for codec and DRM support are these:

HTML5Test shows you a very detailed list of all features that your browser does and does not support.

The DRM Secure Stream Test at Bitmovin.com shows you an actual video with a supported codec and supported DRM if your browser is capable.

Now that we have the introduction out of the way, let’s look at building CEF. I will be focusing on a Windows build, but it shouldn’t be that difficult to translate the steps to other target OSes.

The basic CEF resources

There is a Quick Start guide at the CEF Bitbucket. We will use that as a starting point: https://bitbucket.org/chromiumembedded/cef/wiki/MasterBuildQuickStart.md

Follow the basic steps there of creating a few folders, setting up Python and the automate-git.py script, downloading and extracting the depot_tools and creating the update.bat file in the correct subfolder. Note their mention of the Windows SDK. They require that you install “Debugging Tools for Windows”. This cannot be installed via the regular Visual Studio installer. You can however go to your installed apps, and find the appropriate “Windows Software Development Kit” installation there, and click “Modify”.

This will then open the installer for the Windows SDK where you can change your installation and enable the “Debugging Tools for Windows”.

Then run the update_depot_tools.bat script, and stop there for now. Because this is where we start customizing the scripts.

Customizing the build

We need to add a few flags to the update.bat script, so that it will build with the extra codecs. The first line will need to be like this:

set GN_DEFINES=is_component_build=false ffmpeg_branding=Chrome proprietary_codecs=true is_official_build=true use_thin_lto=false chrome_pgo_phase=false

Now you can run the update.bat script. Then we get to the create.bat script from the Quick Start guide. We do the same here: the first line has to be modified with the same extra flags as we used above for update.bat.
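Since update.bat and create.bat both need exactly the same first line, it can help to generate that line from one place. The following is only a minimal sketch: the helper name gn_defines_line and the dictionary are hypothetical, and the flag values are simply copied from the line above.

```python
# Hypothetical helper: keep the GN flags in one place, so that update.bat and
# create.bat get an identical first line. The values are copied from the text.
GN_FLAGS = {
    "is_component_build": "false",
    "ffmpeg_branding": "Chrome",
    "proprietary_codecs": "true",
    "is_official_build": "true",
    "use_thin_lto": "false",
    "chrome_pgo_phase": "false",
}


def gn_defines_line(flags):
    """Return the 'set GN_DEFINES=...' line for a Windows batch file."""
    joined = " ".join(f"{name}={value}" for name, value in flags.items())
    return f"set GN_DEFINES={joined}"


print(gn_defines_line(GN_FLAGS))
```

Paste the printed line at the top of both batch files, so the two cannot drift apart when you change a flag later.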

Now you can create a custom build with the additional codecs enabled. Just follow the remaining steps of the Quick Start guide.

CefSharp

If you want to use a custom build of CEF with CefSharp, then there are some extra considerations. CefSharp provides its own binary NuGet packages, so you don’t rely directly on the CEF binaries that are publicly available. If you want to know how exactly these are built, you can check out the cef-binary repository of the CefSharp project.

We don’t actually need to completely rebuild the NuGet packages however. We can take a simpler route: we can build our custom CEF in a way that the files are drop-in replacements for CefSharp. So all you need to do is build your application with CefSharp as you normally do, and then replace a few files to enable the additional codecs.

The catch here is that CefSharp is rather picky about the binaries it uses, so you have to match the version of CEF that the NuGet package was built against exactly.

That means we have to understand how versioning in CEF is done. The CEF project has a page with an overview on this. In short, each major release of CEF corresponds to a specific branch, which is named by a 4-digit number.

The version number also includes a part that looks like this: +gHHHHHHH. The ‘HHHHHHH’ part is the hash of the actual commit.

These two numbers are what we need. So for example, say we use CefSharp version 125.0.21. We can go to the Releases section of their Github, to find a list of all the released versions. For 125.0.21, we find the following version: v125.0.21+gc8b1a8c+chromium-125.0.6422.142

So we now look up the branch for release 125 at the CEF project, and we find that this is branch ‘6422’. And we see that the version contains ‘+gc8b1a8c’, so we know that we need the commit with hash ‘c8b1a8c’.
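To make the anatomy of such a version string concrete, here is a small sketch that pulls it apart. The function name and the regular expression are my own. In this example the branch number happens to match the third field of the Chromium version, and the sketch leans on that. Treat it as an assumption and check it against the branch overview of the CEF project.

```python
import re

# Matches strings such as "v125.0.21+gc8b1a8c+chromium-125.0.6422.142".
VERSION_RE = re.compile(
    r"^v?(?P<cef>\d+\.\d+\.\d+)"
    r"\+g(?P<commit>[0-9a-f]+)"
    r"\+chromium-(?P<chromium>\d+\.\d+\.\d+\.\d+)$"
)


def parse_cef_version(tag):
    """Split a CEF/CefSharp release tag into its parts (hypothetical helper)."""
    match = VERSION_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised CEF version string: {tag!r}")
    chromium = match.group("chromium")
    return {
        "cef": match.group("cef"),
        "commit": match.group("commit"),
        "chromium": chromium,
        # Assumption: the third field of the Chromium version is the CEF branch.
        "branch_guess": chromium.split(".")[2],
    }


info = parse_cef_version("v125.0.21+gc8b1a8c+chromium-125.0.6422.142")
print(info["commit"], info["branch_guess"])
```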

We now go back to update.bat. We have to add the following flags on the last line (the one with python3 calling automate-git.py):

--branch=6422 --checkout=c8b1a8c --force-clean --force-clean-deps

If you now run update.bat, it will check out the specific branch and commit that the CefSharp NuGet package was built from. This will ensure that the files will be accepted by CefSharp.

After you have built this version, all you have to do is take the files libcef.dll and chrome_elf.dll from your build, and copy these over the ones in your application folder. Now your application should still work with CefSharp, but the additional codecs will be enabled.
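If you do this more than once, the copy step is easy to script. A minimal sketch follows. The function name, the folder arguments and the idea of keeping a backup of the original files are all choices of this sketch, not something CefSharp prescribes.

```python
import shutil
from pathlib import Path

# The two files named in the text.
REPLACED_FILES = ("libcef.dll", "chrome_elf.dll")


def replace_cef_binaries(build_dir, app_dir, backup_suffix=".orig"):
    """Copy the custom-built DLLs over the ones in the application folder.

    Existing files are first saved under a backup name (a choice of this
    sketch). Returns the list of files that were written.
    """
    build_dir, app_dir = Path(build_dir), Path(app_dir)
    missing = [name for name in REPLACED_FILES if not (build_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from build output: {missing}")
    written = []
    for name in REPLACED_FILES:
        target = app_dir / name
        if target.exists():
            shutil.copy2(target, target.with_name(name + backup_suffix))
        shutil.copy2(build_dir / name, target)
        written.append(target)
    return written
```

Call it with the folder that holds your custom build output and the output folder of your application. Both paths depend entirely on your own setup.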

http://scalibq.wordpress.com/?p=5600

Extensions

CEF developers…

Scali Aug 6, 2024

As you may know, I have been using CefSharp as a browser component in some of the software I’m working on for some years now. CefSharp is a .NET wrapper for the Chromium Embedded Framework (CEF), which is a variation … Continue reading →


As it happens, a few years ago, around release 76 of CEF, there was an interface available to access the Direct3D11-texture that CEF renders to, as a shared resource. This was a very helpful interface for Direct3D11 applications, such as mine, because it gave you a very efficient way to render the browser output directly to screen, and apply shaders to them. Which is exactly what my code does.

Before this interface, web views were always quite a bottleneck, as the regular CEF interface would just give you a rectangle of pixels in system memory, which you had to copy to a texture yourself, via the CPU. Especially on low-end devices which don’t have a lot of bandwidth, such as the ones we are targeting, this gave a huge performance hit.

So I was very happy when this accelerated D3D interface arrived, as it made even the low-end devices render web views quite acceptably. Sadly though, this didn’t last long. After just a few releases, the interface was broken, for reasons not entirely clear to me, and the slow CPU/system memory interface was the only one available.

But, when release 124 of CEF came along, lo and behold, there was a NEW interface. Since I use CefSharp rather than CEF, and the regular CefSharp team doesn’t have experience with D3D11, my first step was to create a workable wrapper for the new interface and make a pull request. This was quickly adopted, and a new official release of CefSharp followed soon, so we were ready for primetime. Or were we?

I was as unsure about the rationale of the new interface as I was about abandoning the previous one. Okay, the older interface used the old shared handle system, and now they used the updated shared handles of D3D11.1. An advantage is that they support more texture formats and such. So far so good.

However, the old interface would just pass you a shared texture, and you could cache the handle, create a shader resource view on the shared texture, and keep using it until the handle changed, if ever. In practice it rarely did, so basically there was almost zero overhead for using the shared texture.

The new interface expects you to open the shared handle, copy the contents of the shared resource to your own texture, and synchronize explicitly, and then release the shared texture. That is, you have to make sure your copy has completed before you return from the callback, because you cannot assume that the shared texture will be valid for any longer than that. Yes, remember when we discussed what’s new in Direct3D 12? How the programmer had to manually synchronize operations to avoid race conditions? In Direct3D 11 this is taken care of automatically. Except, that is only within the scope of your own Direct3D instance. When you are sharing resources between two or more instances, the built-in synchronization on resources and such does not work, because each instance has its own command buffers, state machine etc. So you have to do it manually in this specific case.

That is a very inefficient way of going about it. Namely, there is considerable overhead in opening a shared texture, making a copy, and then synchronizing. You’re not actually exploiting the fact that the texture is shared between two devices. You’re just using it as a very inefficient temporary storage.

But, computers are fast these days, so let’s ignore the theoretical inefficiencies for now. Does it actually work? Well… as it turns out, it behaves in some unusual ways. Once you get the hang of how and where exactly you should copy the texture and synchronize, you will get your browser contents on the screen.

However, as I found, the performance is very haphazard. When I don’t use vsync, it runs very smoothly. Playing a video at 30 fps or better is no problem. But when I enable vsync, somehow the CEF framerate goes way down, and the performance becomes jerky. A video that should be playing at 30 fps will drop to 15-20 fps, depending on the circumstances, and the framerate jumps up and down rather than staying constant. That is strange, as vsync means that the CPU and GPU have fewer frames to render, and will be idle, waiting for vsync to occur. CEF uses its own D3D instance, which means CEF will get both more CPU and GPU time, in theory. So why would the framerate be WORSE? (The whole issue reminded me a bit of the OpenGL bug I encountered in NVIDIA drivers many moons ago)

It gets even worse when your application switches to fullscreen mode. Which is even more strange, as fullscreen mode should be more efficient, so it generally requires even less CPU and GPU time than a windowed application. And even stranger than that: I found that when I run at a higher resolution, CEF appears to run more smoothly than at a lower resolution. So the pattern here appears to be that the performance of CEF is inversely proportional to the performance of the host D3D application.

In order to rule out issues in CefSharp/C#/.NET, I decided to turn to a native implementation. Back when I wanted to implement support for the old accelerated interface, I could find only one example that used this interface, a project by the name of cef-mixer. I decided to see if they had updated the code for the new interface. They hadn’t. So I figured I’d do a quick-and-dirty update of the code myself.

Now, doing the full copy-and-synchronize thing is a bit cumbersome, especially within that existing framework. So I decided to go for the quick-and-dirty approach first: just open the shared texture and create a shader view directly on it, rather than making a copy. Then the rest of cef-mixer can work as it did before, and will visualize the textures, so we can at least see what is happening. Now, we know that sometimes we may be using the shared textures longer than we should, so CEF will be re-using them, and we can expect flicker. But in practice that doesn’t happen much, and the real point is to monitor the rate at which the OnAcceleratedPaint()-callback is being called… Is the framerate tanking or not? And it seems that it does, even if you don’t wait for the copy-operation, so that may not be the root cause here.
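Monitoring the callback rate does not need anything fancy. The sketch below is not the code from cef-mixer or from my application. It is a minimal, hypothetical rate monitor that you feed one timestamp per callback, with an arbitrary threshold of 100 ms for what counts as a stall.

```python
class CallbackRateMonitor:
    """Track the intervals between successive callback invocations."""

    def __init__(self, stall_threshold=0.1):
        self.stall_threshold = stall_threshold  # seconds; arbitrary choice
        self.last = None
        self.intervals = []

    def tick(self, now):
        """Record one callback that happened at time `now` (in seconds)."""
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def average_fps(self):
        if not self.intervals:
            return 0.0
        return len(self.intervals) / sum(self.intervals)

    def stalls(self):
        """Intervals that were at least as long as the stall threshold."""
        return [dt for dt in self.intervals if dt >= self.stall_threshold]


# Synthetic timestamps: roughly 30 fps, with one long gap in the middle.
monitor = CallbackRateMonitor()
for timestamp in (0.0, 0.033, 0.066, 0.316, 0.349):
    monitor.tick(timestamp)
print(monitor.average_fps(), monitor.stalls())
```

In a real application you would call tick() with something like time.perf_counter() (or the native equivalent) at the top of the paint callback, and log the stalls.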

Anyway, after spending way too much time on this, trying to rewrite the code in any possible variation that I could think of, to try and fix this issue, I started to think that there was no way that you could get this to work as it was intended. It seems like there may be some kind of timing issue in CEF itself. So I decided to open an issue on the CEF Github. Since my own code is proprietary commercial code, which cannot be shared for the purposes of reproducing this bug, I decided that I would share my quick-and-dirty cef-mixer instead. It was good enough to get the point across, I thought. And the developers should be smart enough to understand that although it doesn’t strictly follow conventions for how you should use the shared textures, that is not a relevant detail in this specific case.

Boy was that a letdown… Firstly, they try to school me on how this should have worked by-the-book, even though I specifically said I knew it was a quick-and-dirty way to reproduce it, and not representative of the actual code in my application.

Then they try to mince words… I suggested that using a single D3D instance for both the host application and CEF may give better performance than having two separate D3D instances. Media Foundation also has an interface that allows you to pass in an existing D3D interface. That would be helpful for CEF. But no… someone by the name of reitowo responded:

CEF doesn’t create D3D in OSR. It passes whatever chromium gives to you, and you just can’t pass a D3D instance to chromium.

Really? You’re trying to make a distinction between Chromium and CEF here? While CEF is technically a wrapper around the Chromium code which adds the API to host it in your own applications? It should be obvious that I consider Chromium to be part of a CEF instance. And yea, you can’t pass a D3D instance to Chromium, genius, that’s why I’m suggesting that an interface be made to make that possible. An interface that obviously has to be exposed by CEF, even if the underlying Chromium code also requires some changes (just as Chromium required changes to get OnAcceleratedPaint() working in CEF, but these changes were done by the people who were working on that CEF interface, so it’s not a relevant distinction).

At this point, it was already clear that this reitowo character had decided that I was some kind of newbie asking for help, and that he was going to show how much smarter and knowledgeable he was. Or so he thought…

We get gems like this:

I’m just wondering why you can conclude this is a CEF bug despite all rendering code are implemented by yourself.

Well, I’m not the one doing the OnAcceleratedPaint() callback, am I? That’s where the slowdown is coming from. I have no control over that. Obviously not ALL rendering code is implemented by me. I only consume the shared texture. If the callback doesn’t offer me one in a reliable fashion, I’m dead in the water. And that was the point of cef-mixer: it doesn’t actually copy the texture, it doesn’t do any flushing or query or anything else on my side that could possibly slow it down, and STILL the issue remains.

Interestingly enough, he then started bugging me for code to reproduce the issue. I already supplied that in the form of cef-mixer. Apparently he couldn’t be bothered to actually build and run that code. Instead, he started experimenting with his own code, but that did not render in an asynchronous manner. It would just render frames directly from OnAcceleratedPaint(). Yes, I had verified that already, that works. But that’s not useful for an application such as the broadcasting software we develop, where the output has to be a solid 50 or 60 fps.

Then another person by the name of KubaGluszkiewicz enters. He actually makes some more sense by pointing out that in an asynchronous situation, you must use an ID3D11Query event to make sure that the copy operation has completed (sadly he also expresses his assumption that I wouldn’t understand asynchronous situations and GPUs). He advises against using Flush(), as that will cause jerkiness.

Well, since there was no example code (which they should REALLY release, as this isn’t trivial to use even for an experienced D3D developer such as myself), it was useful that he confirmed that this was the way to implement it. That was what I was doing. In fact, my solution was even more elaborate, as I used two textures on my end, so I could double-buffer. That means that I would render one texture in the main loop, and I could copy to another texture from the OnAcceleratedPaint() callback without issue. I would only have to synchronize the swap.
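The double-buffering idea itself can be shown without any Direct3D at all. The following is a CPU-side analogy only: the class and method names are hypothetical, the 'frames' are plain Python objects instead of textures, and it assumes the copy has already completed by the time write_back() is called.

```python
import threading


class DoubleBuffer:
    """Two slots: the render loop reads the front, the callback fills the back."""

    def __init__(self):
        self._slots = [None, None]
        self._front = 0
        self._fresh = False  # True when the back slot holds an unseen frame
        self._lock = threading.Lock()

    def write_back(self, frame):
        """Called from the paint callback, after its copy has completed."""
        with self._lock:
            self._slots[1 - self._front] = frame
            self._fresh = True

    def acquire_front(self):
        """Called from the render loop; swaps only if a new frame is ready."""
        with self._lock:
            if self._fresh:
                self._front = 1 - self._front
                self._fresh = False
            return self._slots[self._front]


buffers = DoubleBuffer()
buffers.write_back("frame 1")
print(buffers.acquire_front())
```

The only point where the two threads have to agree is the swap, which is why a short lock around the index flip is enough in this model. With real GPU resources you would also have to consider frames that are still in flight on the GPU, which this sketch ignores.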

Right, so I said that was what I’m doing, but it doesn’t work, and I showed a quick video of what it looked like, showing that especially in fullscreen it is very bad, and when in a non-vsynced window, it runs smoothly, even when switching between modes on-the-fly in the same running instance of the app. So the exact same code/state/etc. From here on it became comically insane.

First reitowo chips in again:

If you mean your fork, it doesn’t even copy, so chromium writes next frame to it and makes it jerky. If you don’t Flush the context and let the callback return, the async rendering would also happen after chromium reuse that resource.

Again, the best way works is

open handle with your device

get texture out of it

copy to your texture or just render it

flush your context

and then return from the callback

Uhh, wait a second, the other guy said just above that you shouldn’t flush your context, but instead use a query. And if reitowo would actually RTFM, he’d see that this is correct, namely:

Most applications don’t need to call this method.

…

Because Flush operates asynchronously, it can return either before or after the GPU finishes executing the queued graphics commands. However, the graphics commands eventually always complete. You can call the ID3D11Device::CreateQuery method with the D3D11_QUERY_EVENT value to create an event query; you can then use that event query in a call to the ID3D11DeviceContext::GetData method to determine when the GPU is finished processing the graphics commands.

So the documentation of Flush() specifically says that Flush() does NOT wait for the commands to complete, it merely pushes the buffer out to the GPU. Which means Flush() is NOT a reliable way to wait for a copy to complete. The documentation specifically points to a Query event for doing that, as KubaGluszkiewicz also said. Seems like reitowo is the one who doesn’t really know what he’s doing. Yet he is so confident in his abilities.
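The difference between the two is easy to model. What follows is a toy command queue, not Direct3D: FakeGpu and EventQuery are made-up names, and the 'GPU' only executes a command when step() is called. It merely illustrates the semantics quoted above: flushing hands the work over and returns, while an event query tells you when the work before it has actually completed.

```python
from collections import deque


class FakeGpu:
    """Toy command queue: commands only run when step() is called."""

    def __init__(self):
        self._recorded = []        # commands not yet handed to the "GPU"
        self._submitted = deque()  # handed over, but not yet executed
        self.executed = []

    def record(self, command):
        self._recorded.append(command)

    def flush(self):
        """Hand the recorded commands to the GPU and return immediately."""
        self._submitted.extend(self._recorded)
        self._recorded.clear()

    def step(self):
        """Simulate the GPU executing one submitted command."""
        if self._submitted:
            self.executed.append(self._submitted.popleft())


class EventQuery:
    """Signals once every command recorded before it has been executed."""

    def __init__(self, gpu):
        self._gpu = gpu
        self._marker = object()
        gpu.record(self._marker)

    def is_done(self):
        return self._marker in self._gpu.executed


gpu = FakeGpu()
gpu.record("copy shared texture")
query = EventQuery(gpu)
gpu.flush()
done_right_after_flush = query.is_done()  # the copy has not run yet

while not query.is_done():  # this is the actual wait
    gpu.step()
print(done_right_after_flush, gpu.executed[0])
```

In the model, returning from the callback right after flush() would hand the shared texture back while the copy is still pending. Only the loop on the query guarantees that the copy has been executed.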

So, now you’d expect KubaGluszkiewicz to point out that reitowo was giving out incorrect information, which was in direct conflict with what he said earlier, right? No, instead he decided to have a go at me together with reitowo. And he started posting the same kind of insane things:

My last advice and take it seriously. I watched your issue on YT. It’s not a problem with CEF/Chromium it’s 100% problem in your code. Learn, figure out, improve and fix it.

Right, you can watch a video, and without inspecting one line of code, you can conclude that my code is the issue. And of course you assume that CEF/Chromium has no problems whatsoever. Nope, it doesn’t work that way. All of this smells quite a lot like Dunning-Kruger. And they even try to frame it as me asking for help with my code. Well no, I was reporting a bug. I haven’t seen anyone prove that CEF can in fact work flawlessly in this situation. All I see is two people communicating in fallacies.

Anyway, at this point I decided to just give up on CEF. These people seem to be too self-involved to even consider that there may actually be a problem with their code, and that a person reporting a bug may actually know what they are doing. Not a very constructive way to deal with possible issues.

In the meantime I continued working on a sample project using CefSharp. It is now available here. I had been using this simple application as a testbed for various custom tweaks in CefSharp. Now it can be used as a reference of how Direct3D11 can be used together with CefSharp and the new accelerated interface. The issue still isn’t solved. The code is out there. I don’t expect the CEF people to bother looking at it, because they’re convinced their code is perfect anyway. But to the best of my knowledge, I am doing things correctly as KubaGluszkiewicz describes: use a Query event to wait for the copy to complete before returning from OnAcceleratedPaint(), and not using the shared texture outside that scope.

I don’t do anything other than just copying and waiting in OnAcceleratedPaint(). How can we explain that the same code behaves differently depending on whether you wait for vsync or not, or whether you are in fullscreen or not? None of that code is related to OnAcceleratedPaint() or CEF in general.

Either there’s some magic incantation that I have somehow overlooked, or the problem is actually in CEF. Perhaps someone else can figure it out. I just can’t explain how my code works perfectly without the vsync. If it was a synchronization issue that caused the slowdown, then disabling vsync shouldn’t make it run faster and better, right? If my code renders more frames per second, then effectively the CPU and GPU are busy for longer, and the texture is in use for more of the time, so there is less room where the texture is not in use to make the swap, and there’s less CPU and GPU time to perform the copy in parallel. So common sense would be that the synchronization would be a bigger problem without vsync.

As an aside, I thought I’d search my old blog posts to see if and when I had written about Direct3D 11 before. And apparently my first post on Direct3D 11 was from 2009, shortly after its preview release in the DX SDK. And a few posts later, I present an early working test application in Direct3D9, 10 and 11. So I guess I’ve been working with Direct3D 11 for 15 years by now. I suppose not many people have been using it much longer, because you’d have to have had access to pre-release versions.

Update: I may have found a possible ‘magic incantation’: if you run in vsync mode, calling WaitOnVerticalBlank() on the swapchain’s containing output just before Present() in the main renderloop, the jerkiness is reduced.
Which would indicate that Present() does things differently when waiting for vsync than when you use the WaitOnVerticalBlank() (which only sleeps the CPU). It appears to stall the copy/query operation going on in another thread in OnAcceleratedPaint(). Which would indicate that Present() sleeps both the CPU and the GPU.
Question remains: why? This issue does not present itself with other async rendering, such as with MediaFoundation or VLC decoding video with the GPU in a separate thread. And when not using the accelerated interface, CEF still renders with D3D11 internally as far as I know. It only offers the pixels in a system memory buffer. So that still begs the question: why and how does the OnAcceleratedPaint()-interface affect this? And why did it not affect the earlier interface of CEF release 76?

Note also that none of the suggestions from the CEF developers said anything about this. Their example snippets and suggestions just indicated vanilla calls to Present(), and made no reference to any possible issues. Note also that using the WaitOnVerticalBlank() call just before a Present() call is a very unusual way to handle a renderloop. WaitOnVerticalBlank() is not meant for this purpose. When doing this, there is the risk that between returning from WaitOnVerticalBlank() and calling Present(), you actually ‘miss’ the vblank in the Present() call, so you may not reach the full refresh rate of the screen at all times. So this code is certainly not ‘by the book’, and I wouldn’t recommend it in any case. But until we figure out the real problem, this is the best we can do, it would seem.

Update 2: Another person by the name of kin4stat has entered the conversation, and was greeted with the same pleasant reception that I got. He says he got the same issues that I reported, and that there are also issues reported in the OBS-fork of CEF. Let’s see where this is going.

Update 3: The smoking gun has been located, with help of the people of the OBS-fork. They managed to pinpoint the issue to some oddly specific 250 ms delays. If I understand correctly, it has to do with the detection of animations inside Chromium, where it sometimes used the wrong region, so it wrongly concluded that nothing was animated, and therefore no new samples were taken. This appears to be the code that Chromium normally uses for casting, so the casting functionality was affected by this as well. And CEF basically ‘piggybacks’ their accelerated paint code onto this casting implementation, which resulted in not only casting, but also the CEF accelerated paint being affected by the lack of updates. Which would explain what I was seeing: the OnAcceleratedPaint() was simply not called often enough. At any rate, the problem appears to be fixed in the OBS fork, and has gone upstream to Chromium, so it should find its way into upcoming releases of CEF and CefSharp as well.

http://scalibq.wordpress.com/?p=5616

Extensions

.NET Core: the small print

Scali Mar 10, 2024

Some time ago, I wrote an article about upgrading code from .NET Framework to .NET Core. While this may give you a decent overview to get your code up and running in .NET Core, the devil is, as always, in … Continue reading →


In this case, getting your code up and running in .NET Core is one thing. But in the case of user interfaces, more specifically when they are using WinForms, there’s a difference between having them up and running, and having them look and behave correctly.

Namely, when you design forms with the Designer in Visual Studio, a lot of code is generated automatically, and you may not have given it any second thought. But one thing that is relevant, is that the Designer will generate components that have the AutoScaleMode property set to Font.

What does that mean exactly? Well, it may seem a bit unusual to scale things by a ‘font’, but the idea is that your controls will not have an absolute size, but will scale depending on the size of the system’s fonts. Which makes sense when you mostly want to display readable text, and are not worried about absolute size.

So, what this means is that it is likely that when you’ve designed a number of forms/controls with the Visual Studio Designer, that they will have font-relative scaling. So far so good, but what’s the problem then?

Well, the question is: WHAT font is being used for scaling? That is the Font property of the control. Which you probably, just like the AutoScaleMode property, rarely set manually. In which case, it will default to the DefaultFont.

Okay… but then why is it that if you use Font-scaling and you are using the DefaultFont, that the scaling is different depending on whether you compile the same code for .NET Framework or .NET Core?

Well, there’s our small print: For .NET Core 3 there was a breaking change: the default font was changed from Microsoft Sans Serif 8 to Segoe UI 9. That explains why the default behaviour differs between .NET Framework and .NET Core 3.1 or higher. The scaling is based off a different font. The differences are generally subtle, so you may not even notice in most screens. But sometimes things will look very wrong.
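To get a feeling for the size of the effect, here is a bit of arithmetic. The numbers are assumptions on my part, not taken from the breaking change notice: the Designer typically records an AutoScaleDimensions of 6x13 for Microsoft Sans Serif at 96 DPI, and 7x15 for Segoe UI 9. Font-relative scaling then boils down to the ratio between the two. The rounding is a simplification of what WinForms does internally.

```python
from fractions import Fraction

# Assumed Designer values at 96 DPI (not taken from the article):
DESIGN_DIMENSIONS = (6, 13)   # Microsoft Sans Serif, the old default font
RUNTIME_DIMENSIONS = (7, 15)  # Segoe UI 9, the new default font


def scale_factors(design, runtime):
    """Horizontal and vertical scale factor as exact fractions."""
    return (Fraction(runtime[0], design[0]), Fraction(runtime[1], design[1]))


def scale_size(size, factors):
    """Scale a (width, height) pair; rounding here is a simplification."""
    return (round(size[0] * factors[0]), round(size[1] * factors[1]))


factors = scale_factors(DESIGN_DIMENSIONS, RUNTIME_DIMENSIONS)
print(factors, scale_size((300, 200), factors))
```

Under these assumptions a control designed at 300x200 ends up noticeably larger, and the horizontal and vertical factors are not the same. That goes some way to explaining why layouts can look subtly off rather than just bigger.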

However, we’re in luck. Because .NET 6 introduced another change, which can help us here: A new Application.SetDefaultFont() function was added, which allows you to set the default font application-wide at startup. If you set it to Microsoft Sans Serif 8 at startup, the font-relative scaling in your application will behave the same as it did in .NET Framework. That means you won’t have to manually set the font for every control.

However, the tricky thing here is that this behaviour was changed in .NET Core 3, a very early version, from the days when .NET Core was not yet considered ready for mainstream/desktop usage. As I wrote last time, that wasn’t until .NET 5. But as we now see, it may be useful to read through the entire changelog of .NET Core whenever we run into an issue, even the early versions, as some breaking changes may have been done early on, and they may affect us now.

http://scalibq.wordpress.com/?p=5584

Extensions

Lemmings, or how clever tricks make platforms more different than they seem

Scali Feb 18, 2024

The other day I read this in-depth article on font usage in early DOS games by VileR. Since some of the fonts were apparently stored not as 1-bit bitmaps, but as multiple bits per pixel, I was wondering when the … Continue reading →


I randomly thought of Lemmings as a game that I recall using a very nice and detailed font. But that turned out to be quite the can of worms, so I thought I’d write a quick summary of what we uncovered.

Now Lemmings was a game originally developed on the Amiga, and then ported to many different platforms. I mostly played the Amiga game back in the day, although I did also have a copy of the PC version. I had a vague recollection that although the Amiga version did look somewhat better, the PC version used basically the same font as the Amiga version.

Let’s compare the Amiga and PC version of Lemmings. And while we’re at it, let’s throw in the Atari ST version as well. More specifically, let’s concentrate on the VGA version of Lemmings for PC. Then we have three machines that have roughly similar video capabilities. All three machines have a video mode of 320×200, and support a palette that can be user-defined by RGB values. The Atari ST supports 3 bits per component (512 colours in total), the Amiga supports 4 bits per component (4096 colours in total), and VGA supports 6 bits per component (262144 colours in total).
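The palette sizes follow directly from the number of bits per colour component, with three components (R, G and B) per colour. A quick check:

```python
def palette_size(bits_per_component, components=3):
    """Number of distinct colours for the given bits per R/G/B component."""
    return 2 ** (bits_per_component * components)


for name, bits in (("Atari ST", 3), ("Amiga", 4), ("VGA", 6)):
    print(f"{name}: {palette_size(bits)} colours")
```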

The Atari ST supports 16 colours at once, the Amiga supports 32 colours at once (or 64 colours in the special ‘Extra HalfBrite’ mode), and VGA supports 256 colours at once. So at first glance, all three machines appear to have similar capabilities, with the Atari ST being the most limited, and VGA being the most capable. But now let’s look at how the game looks on these three systems.

First, the original on the Amiga:

Then the Atari ST:

Okay, looks very similar at first glance, although there is something I couldn’t quite put my finger on. But let’s look at the VGA version first:

Hum, wait a second… When I look for screenshots on the internet, I also find some that look like this:

Are there different versions of Lemmings for PC? Well, yes and no, as it turns out. When you start the game, there is a menu that asks what machine type you have:

The first screenshot is from the game in “For PC compatibles” mode, the second screenshot is “For High Performance PCs”. So let’s call the first ‘lo’ mode, and the second ‘hi’ mode.

Okay, so let’s inspect things closer here. At first glance, the main level view appears to be the same on all three systems. That would imply that only 16 colours are used on all systems, otherwise the Atari ST would not be able to keep up visually with the others.

The only difference that stands out is that the Amiga and Atari ST have a blue-ish background colour, where the VGA versions are black. It’s not entirely clear why that is. Also, the blue background is used only for the level on Amiga, where the background is black for the text and icons. On the Atari ST, the background for the text is blue, and it only switches to black for the icons.

But then we get to the part that kicked this off in the first place: the font. On the Amiga we see a very detailed font, using various shades of green. On the Atari ST, we see a font that looks the same, at first glance (more on that later). On the ‘lo’ VGA version, we see a font with the same basic shape, but it appears to only have two shades: one green and one white.

The ‘hi’ VGA version however, looks different. For some reason, the font is not as high. Instead of the font filling out the entire area between the level view and the icon bar, there are 4 black scanlines between the level and the font. The icons are the same size and in the same position on screen, so effectively the font is scaled down a bit. It is only 11 pixels high, where the others are 15 pixels high. The font has more shades of green here: 4 in total. That is still fewer than on the Amiga (I count 7 shades there) and the Atari ST (5 shades).

Okay, so there is something going on here. But what exactly? Well, we are being tricked! The game runs in a 16-colour mode on all three systems. However, if you inspect the screenshots closely, you will see that there are actually more than 16 colours on screen. As already mentioned, the font itself uses various shades of green. You don’t see that many shades of green in the level. That implies that the palette is changed between the level and the font.
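To make the trick concrete, here is a small simulation in Python of a 16-colour screen whose palette is swapped at a certain scanline. This is only a sketch of the principle, not how any of the Lemmings versions are actually coded: the palettes, the screen width and the split at line 160 are all made-up values.

```python
from typing import Dict, List, Set, Tuple

RGB = Tuple[int, int, int]

def colours_on_screen(frame: List[List[int]],
                      palette_at_line: Dict[int, List[RGB]]) -> Set[RGB]:
    """Resolve palette indices to RGB, switching palette at the given scanlines."""
    seen: Set[RGB] = set()
    palette = palette_at_line[0]          # a palette must be set for line 0
    for y, row in enumerate(frame):
        if y in palette_at_line:          # the mid-frame palette change
            palette = palette_at_line[y]
        for index in row:
            seen.add(palette[index])
    return seen

# Hypothetical palettes: 16 reddish entries for the level, 16 greens for the font.
level_palette = [(i * 4, 0, 0) for i in range(16)]
font_palette = [(0, i * 4, 0) for i in range(16)]

# 200 scanlines, each one using all 16 palette indices.
frame = [list(range(16)) for _ in range(200)]

visible = colours_on_screen(frame, {0: level_palette, 160: font_palette})
print(len(visible), "distinct colours on a 16-colour screen")
```

Every pixel is still a 4-bit index, but because the same index means something different above and below the split, the total number of colours on screen goes beyond 16.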

This explains why the PC version has a ‘lo’ and a ‘hi’ version: Because VGA is not synchronized to the system clock, it is not trivial to change the palette at a given place on screen. While it is possible (see also my 1991 Donut), it will require some clever timer interrupts and recalibrating per-frame to avoid drift. So that explains why they chose to only do this on high performance PCs. On a slow PC, it would slow down the game too much. It also explains why there are 4 black scanlines between the level and the font. Firstly, because of all the different PCs out there, it is very hard to predict exactly how long the palette change takes. So you’ll want a bit of margin to avoid visible artifacts. Secondly, various VGA implementations won’t allow the RAMDAC to read the palette registers while the CPU is updating them. This can lead to black output or artifacts similar to CGA snow. But if all pixels are black, you won’t notice.

So apparently the ‘hi’ version does perform a palette change, where the ‘lo’ version does not. That means the ‘lo’ version can only use colours that are already in the level palette for its font. It also explains why the icons don’t have the brownish colours of the other three versions: the icons also have to make do with whatever is in the palette.

But getting back to the ‘hi’ version… Its icons still don’t look as good as the Amiga and Atari ST versions. We can derive why this is: we do not see any black scanlines between the font and icons. So we know that the ‘hi’ version does not perform a second palette change between font and icons. The Amiga and Atari ST versions do, however. On the PC, this wouldn’t have been practical. They would have had to sacrifice another few black scanlines, and the CPU requirements would have gone up even further. So apparently this was the compromise. That means that a single 16-colour palette is shared between the font and the icons.

Speaking of which, during the in-between screens, the VGA version also changes palette:

The top part shows the level in 16 colours. Then there are a few black scanlines, where the palette is changed to the brown earth colours and the blue shades for the font.

Mind you, that is still a simplification of how it looks on the Amiga:

Apparently the Amiga version changes the palette at every line of text. The PC is once again limited to changing the palette once, in an area with a few black scanlines. In this case, both the ‘lo’ and ‘hi’ versions appear to do the same. Performance was not an issue with a static info screen, apparently.

The Amiga uses 640×200 resolution here. The PC instead uses 640×350. That explains why the PC version has a somewhat strange aspect ratio for the level overview.

But getting back to the font and icons in-game. They do look a bit more detailed on the Amiga than on the Atari ST. And it’s not just the colours, it seems. So what is going on here? Well, possibly the most obvious place to spot it is the level overview in the bottom-right corner. Yes, on the Amiga it has twice the horizontal resolution of the other platforms. Apparently the Amiga is running in 640×200 resolution here, rather than 320×200.

That explains why the icons look slightly different as well. They are a more detailed high-resolution version than the other platforms. And if we look closer at the font, we see that this is the high-resolution font that is also used in the other screen.

The Atari ST cannot do this, because it does not have a 640×200 mode that is capable of 16 colours. And for VGA, as already said, it’s not possible to accurately perform operations at a given screen position. So if you can’t accurately change palettes, you certainly can’t accurately change display resolution.

So there we have it, three systems with very similar graphics capabilities on paper, yet we find that there are 4 different ways in which the game Lemmings is actually rendered. Clever developers pushing the limits of each specific system.

I suppose the biggest unanswered question is: why does the VGA version have this limitation? Worst-case, you have 3 palettes of 16 colours on screen, which is 48 colours. In mode 13h, you can have 256 colours, so no palette changes would be required. Instead the developers appear to have chosen to use the same 16-colour mode for both EGA and VGA, and only improve the palette for the VGA version. This may be because they use EGA functionality for scrolling and storing sprites offscreen. In mode 13h you wouldn’t have that. You’d have to perform scrolling by copying data around in memory. That may have been too slow. And perhaps they weren’t familiar with mode X. Or perhaps they tried mode X, but found that it was too limiting, so they stuck with EGA mode 0Dh anyway. Or perhaps they figured they’d need separate content for a mode X mode, which would require too much extra diskspace. Who knows.
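Some back-of-the-envelope numbers for that trade-off, as a Python sketch. I am assuming a standard VGA with 256 KB of video memory here, and the variable names are just for illustration:

```python
WIDTH, HEIGHT = 320, 200
VGA_MEMORY = 256 * 1024    # bytes, assuming a standard 256 KB VGA
PLANES = 4

# Worst case from the text: three 16-colour palettes on screen at once.
colours_needed = 3 * 16
fits_in_mode_13h = colours_needed <= 256

# Planar 16-colour mode 0Dh: each plane holds 1 bit per pixel.
bytes_per_plane_page = WIDTH * HEIGHT // 8          # one screen, per plane
plane_size = VGA_MEMORY // PLANES                   # memory available per plane
pages_planar = plane_size // bytes_per_plane_page   # screens' worth of memory

# Mode 13h: 1 byte per pixel, addressed linearly through a 64 KB window.
mode_13h_frame = WIDTH * HEIGHT
spare_13h = 64 * 1024 - mode_13h_frame              # what is left in the window

print(f"{colours_needed} colours needed, fits in 256: {fits_in_mode_13h}")
print(f"mode 0Dh: {bytes_per_plane_page} bytes per plane per screen, "
      f"{pages_planar} screens of video memory")
print(f"mode 13h: {mode_13h_frame} bytes per screen, {spare_13h} bytes spare")
```

So in the planar mode there is room for several screens' worth of data to scroll through or to keep sprites in, while a single mode 13h screen all but fills the addressable window.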

http://scalibq.wordpress.com/?p=5548

Some results from the modified XDC movie player for 8088/CGA

Scali Jan 2, 2024

For my modifications to the XDC player, I targeted my Philips 3105 XT clone. It has a turbo mode of 8 MHz, and an ATi Small Wonder CGA clone. The harddisk is a 32MB Disk-on-Module connected to an XT-IDE card.

8 MHz you say? Yes, I was somewhat surprised by that myself. Most turbo XT machines will derive their clockspeed from the NTSC base clock of 14.31818 MHz. So common speeds are 4.77 MHz, 7.16 MHz and 9.54 MHz.
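The arithmetic behind those common speeds can be checked quickly in Python. The ratios of 1/3, 1/2 and 2/3 are my reading of the quoted speeds, not something taken from a datasheet:

```python
BASE_MHZ = 14.31818   # NTSC-derived base clock

# (numerator, denominator) ratios of the base clock
ratios = [(1, 3), (1, 2), (2, 3)]
turbo_speeds = [BASE_MHZ * n / d for n, d in ratios]

for (n, d), mhz in zip(ratios, turbo_speeds):
    print(f"{n}/{d} of the base clock: {mhz:.3f} MHz")

# 8 MHz is not one of these simple ratios of the base clock
ratio_8mhz = 8.0 / BASE_MHZ
print(f"8 MHz would be {ratio_8mhz:.4f} of the base clock")
```

The last value rounds up to 9.55 MHz strictly speaking, but it is commonly quoted truncated, as 9.54 MHz.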

So I checked with TOPBench and Landmark, and they both confirmed that this CPU is actually running at 8 MHz:

I wrote a simple tool that can halve the sample rate of the audio in a video file, and also preprocess the audio to PWM or Tandy formats, so there is no translation required at runtime, reducing CPU overhead.
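To give an idea of what such preprocessing can look like, here is a rough Python sketch. It is not the code of the actual tool: the averaging of sample pairs, the scaling to timer counts and the 11025 Hz rate are assumptions, and the function names are made up.

```python
PIT_HZ = 1193182   # input clock of the PC timer chip (14.31818 MHz / 12)

def halve_sample_rate(samples: bytes) -> bytes:
    """Halve the sample rate by averaging adjacent pairs of unsigned 8-bit samples."""
    pairs = len(samples) // 2
    return bytes((samples[2 * i] + samples[2 * i + 1]) // 2 for i in range(pairs))

def to_pwm_counts(samples: bytes, sample_rate: int) -> bytes:
    """Scale unsigned 8-bit samples to pulse widths in timer ticks (1..max_count).

    Assumes max_count fits in a byte, which holds for rates of about 4.7 kHz and up.
    """
    max_count = PIT_HZ // sample_rate   # timer ticks available per sample
    return bytes(1 + s * (max_count - 1) // 255 for s in samples)

# Hypothetical input: a few unsigned 8-bit samples at twice the target rate
source = bytes([0, 100, 200, 250, 255, 255])
halved = halve_sample_rate(source)
pwm = to_pwm_counts(halved, 11025)
print(list(halved), list(pwm))
```

The point is that the player can then feed the stored values straight to the timer, without doing any scaling per sample at runtime.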

The first attempt wasn’t too successful… The machine wasn’t quite fast enough. Or, the code wasn’t quite fast enough, depending on how you look at it. This resulted in a few buffer underruns, causing the PC speaker to glitch, as it would play garbage data at this point (it also does this at the start of each video, as there is no data buffered yet). So I performed some optimization in assembly, and some other minor performance tweaks, and after a few tries, I came up with a version that can *almost* play the 8088 Domination content with 11 kHz PC speaker audio:

As you can see, there is still one place in the video where there’s an underrun, but it quickly solves itself, and it continues playing (with audio and video in sync of course). I suppose that is close enough for me. If I were to optimize the code further, I don’t think I could do it with the inline assembler in Pascal. Besides, the performance is highly dependent on all the hardware. With just a slightly faster video card or a slightly faster HDD controller, this machine could probably play it without a glitch. And the more common 9.54 MHz turbo XTs should also have no problem with it. Nor would a 6 MHz AT.

Oh, and yes, the Bad Apple part is missing. That’s because my HDD is only 32 MB, so I can’t fit all videos on the disk at the same time.

If you want to try this modified version of 8088 Domination for yourself, you can download it here.

http://scalibq.wordpress.com/?p=5523
