Show full content
Recently we (BUas colleague Peter van Dranen and me) released a 3D racing game for the MSX2 retro platform: A 1986 Z80-based home computer standard, supported by all major manufacturers at the time, including Philips, Sony, JVC, Toshiba and Panasonic.
MSX2 is normally not a platform on which one would render 3D graphics. This was the time of the ZX-Spectrum and the Commodore 64, and graphics were tile-based, with additionally (at least on C64 and MSX) hardware sprite support.

That being said, some 3D games did exist. For example, David Braben’s Elite runs well on the platform, if the player is willing to accept a frame rate of about 5 fps, as well as wireframe 3D objects.
Also worth mentioning is Konami’s Penguin Adventure, which used some clever tricks to produce quite convincing 3D.

Still, it seems rather far-fetched to even attempt an action game with full-screen polygon graphics. And in fact it is! However, learning from Konami, and taking advantage of some specific features of the platform, we can produce something that convinces the player that this appears to be precisely what we did.
In this article we describe the renderer of DELTA (a.k.a. “Dodge! Engage! Launch! Target! Assault!”). We start with a description of the MSX2 platform specifics relevant to the technique. We then show that 3D images of reasonable complexity can be encoded as tiles and sprites. And finally, we show how this data can be compressed to better fit available ROM options, while allowing for fast run-time decompression.
Platform Specifics: VDPThe MSX2 platform provides graphics via its VDP (Video Display Processor), specifically: the V9938. The chip provides several display modes, including a full-colour 256×192 mode and a 16-color 512×192 mode. The modes that offer most freedom are of course also those that require most bandwidth, which makes them often impractical for real-time graphics.
Promising are the tile-based modes, which use 8×8 character tiles to fill the screen. Each tile uses a byte per line of 8 pixels to define its ‘pattern’, as well as a byte per line to define the two colors used in that pattern. This way a tile requires only 16 bytes of storage, while allowing the use of 16 colours, as long as no more than two colours per line of 8 pixels are used. Tiles often don’t change from frame to frame, which further reduces required bandwidth. Instead, the tile based screen modes use a ‘name table’ of 768 bytes which index the 256 unique tiles.
The screenshot of Gradius 2 shows this approach in action. Colourful tiles are used (and re-used) to build a detailed display. Scrolling happens with 8 pixels at a time, and is done by modifying the name table alone.
Gradius 2 also shows another important feature of the MSX2 VDP: Hardware sprites. The player ship, bullets and smaller enemies are rendered using 8×8 or 16×16 images. A maximum number of 32 of these can be displayed at any time, on top of the tile display. There is a limitation however: Each screen line can display a maximum of 8 sprites at a time. These sprites will turn out to be very useful for solving the ‘two colours per 8 pixels’ limitation of the tiles.
Platform Specifics: MemoryApart from the VDP there is another important (and rather unique) aspect of the MSX2 hardware standard: Memory. Where other systems at the time were limited to 64 kilobytes or 128 kilobytes, MSX2 went above and beyond.
The 64KB limitation is a result of the choice of processor: The Z80 (like other common CPUs at the time) can only use 16 bits for RAM addresses, which limits RAM to 65536 bytes. MSX2 however solves this with various mechanisms (slots, memory mappers, megaroms), effectively doing away with any and all limits on RAM and ROM capacity. Back in 1988 this allowed Konami to use 4 Megabit (512KB) for Metal Gear 2 Solid Snake; however in 2025 the community is producing Flash cartridges of up to 16MB. On top of that, all that memory can be accessed at the same speed: The CPU will never know the difference between the original 64KB and the expanded 16MB. This opens up entirely new ways of doing graphics.
NumbersBefore we dive into the specifics of the DELTA renderer, let’s list some numbers that affect what can and cannot be done.
CPU: Zilog Z80A, running at 3.58Mhz. It needs four cycles per instruction, at least. At 20 fps, the very best it can do is 179,000 instructions. In practice, it will do far less. The CPU can (at best) copy bytes at a rate of 1 per 5.5 cycles, so ~651KB per second or 32.5KB per frame at 20fps.
VDP: Data transfer to the VDP is slower. The details are complex, but in general, the VDP cannot accept data faster than 1 byte per 15 cycles, or ~12KB per frame at 20fps.
Bandwidth: The bandwidth-friendly tile-based mode ‘GRAPHIC4’ (see below) displays 32×24=768 tiles. The name table stores a byte for each of these, so changing each tile index requires writing 768 bytes. If we also want to update the tiles themselves, we need to write 16 bytes per tile for patterns and colours, so 4KB in total. And finally, if we want to update the patterns for 32 sprites of 16×16 pixels, we need to write 1KB of data. The colours for those sprites sum up to 512 bytes, and their position and index data is another 128 bytes. It sounds like a lot, but when we sum everything, it turns out that we can update everything that affects screen appearance by writing 6,528 bytes, i.e. well within the bandwidth limit of the VDP.
Data: For a racing game we need a track of some length. At 20fps, 800 frames yield a 40 second race; two laps make this 1 minute and 20 seconds, which seems about right. At 6,528 bytes per frame, this would be a little more than 5.2MB.
The DELTA RendererWith the above in mind, our goal now becomes to convert a 256×192 16-color bitmap to (no more than) 256 tiles, plus a 32×24 name table indexing those tiles, and a set of sprites. A ‘nice to have’ would be some form of compression, so a 16MB cartridge can store at least 4 (but ideally more) tracks.

The question is of course, will 256 unique tiles be sufficient for all frames of the track, and how many sprites will we need to resolve issues where 8 pixels on a tile need more than 2 colours?
Here is a frame from the game (on the left) with the (not yet repaired) tiles on the right:

The tiles used in this frame are:

This is well within budget; in fact 31 tiles remain unused for this frame, some of which are claimed for displaying score and other game data. It turns out that this is the case for all frames of the animation: As long as the detail is dosed carefully, it is quite possible to make a pleasing 3D environment using just 236 or so tiles.
The sprite set needed to resolve the remaining issues is modest as well:

Note how e.g. the yellow sprites above the car resolve the problem where a single tile must display blue for the water, dark-grey for the track and yellow for the speed lane symbols.
Some frames are more complex. For example, this frame from the intro animation:

Here, the subtle shading on the letters M and S causes substantial issues for the tile conversion. A lot of sprites are needed to remedy this:

So many in fact that we run the risk of exceeding the 8 sprites per line limit. This limit is strict on MSX2: Even parts of the sprite that do not contain any pixels contribute to the count, and exceeding the limit makes sprites invisible, exposing the tile issues we where trying to hide.
Luckily, most ‘remedial sprites’ do not use the full 16×16 square available to them. By shifting the set pixels up or down it is often possible to find a sprite configuration where no line contains more than 8 sprites.
During gameplay this challenge is even greater however, and this is where DELTA occasionally fails. The car sprite needs multiple colours, and therefore multiple sprite layers. It is also larger than 16×16 for many angles, and therefore the car is rendered using no less than 8 sprites. These do not occupy the same lines on-screen, but still four sprites do – and that means the scenery has only four slots to work with, at least on the screen lines occupied by the player car. One consequence of this is that DELTA is strictly a single-player game, without other cars to combat the player.
GRAPHIC4In the next section we will discuss how the prerendered data can be compressed. Before we get to that, we look at the used VDP mode in some more detail.
Mirror ModeGRAPHIC4 on the MSX2 VDP refers to a 16-color tiled video mode, where each of the 16 colours can be chosen from a 9-bit palette. On top of the tiles, up to 32 single-color sprites can be displayed, with a maximum of 8 occupying any screen line.
Tile indices are stored in a 768 byte (i.e., 32×24) name table. By default, GRAPHIC4 uses three tile sets: One for the top 1/3rd of the screen, one for tile lines 9 – 15 and the final one for tile lines 16 – 23, effectively allowing a unique tile on every location, if desired. Mirror mode allows these three sets to be identical, so that a tile used on position (0,0) can also be used on tile position (0,8). DELTA uses this mode to reduce tile data from 3x 4KB to 1x 4KB.
VRAM LayoutThe MSX2 VDP lets the programmer select where each of the data tables is located in VRAM. The tables are:
- Name table: Stores tile indices. 768 bytes, by default located at 0x1800.
- Tile pattern data: 256 x 8 = 2048 bytes, by default located at 0x0000.
- Tile colour data: 256 x 8 = 2048 bytes, by default located at 0x2000.
- Sprite pattern data: 32 x 32 = 1024 bytes, by default located at 0x3800.
- Sprite colour data: 32 x 16 = 512 bytes, by default located at 0x1C00.
- Sprite attribute data: 32 x 4 = 128 bytes, by default located at 0x1E00.
DELTA renders at 20fps and thus needs 3 NTSC frames to create the next image. This means that double buffering is required. In the case of DELTA, this means: double-buffering all tables. We use the following layout:
Switching between the two pages is very fast and can be done during the vertical retrace to avoid tearing.
Note that in the above scheme, only 64KB is used, which simplifies setting VRAM addresses somewhat. Moreover, the name table and sprite data for both pages resides within the first 16KB, further reducing the effort of specifying VRAM addresses. Only the tile patterns and colours touch addresses above 0x4000; these are however always written in large batches, amortizing the VRAM address setup cost over many writes.
CompressionIn the previous sections we discussed the specifics of the MSX2 video processor. Respecting the limitations of the system, but ignoring ROM size for the moment, we conclude the hardware meets our initial requirements:
- We can convert 16-color images containing reasonable polygon detail to less than 256 tiles.
- The number of sprites needed to remedy tile colour issues is reasonable.
- Bandwidth is within boundaries.
However, even when we do not use all tiles for all frames, the raw data for 800 animation frames approaches 5MB, which is unwieldy. So, a new question arises: Can this data be compressed, and can the poor Z80 decompress it fast enough to make this worthwhile?
We approach this question in three steps:
- Reducing per-frame sprite data traffic
- Optimal run-length encoding and decoding of tile colour data
- Run-length encoding of tile pattern data
- Adaptive run-length encoding of the name table data
- Data placement in megarom segments
Each step is detailed in the following paragraphs.
1. Per-frame Sprite Data TrafficAs can be seen in Figure 6 and 8, most remedial sprites are small, often just a few pixels. It thus seems a waste to write full sprite pattern data each frame. Instead, DELTA keeps track of occupied lines in sprites. In subsequent frames, these lines are either overwritten with new data, or cleared.
A few examples: A sprite that stores two lines of data that becomes unused in the next frame will have these two lines cleared using two single-byte writes. A sprite that stores three lines while in the next frame that same sprite stores only two lines will have those two lines overwritten (for patterns and colours), and the third line cleared (just the pattern).
At the expensive of some non-trivial administration, this approach greatly reduces bandwidth for sprite patterns. A further reduction is possible if we do not overwrite colour data if this did not change, but it turns out that this is too rare to justify the cost of checking this.
Table 1 shows the sprite traffic of several frames of track 1.
As can be seen in Figure 5, tiles for a particular frame often use only two colours. We thus split all tiles in two groups: One group for 2-colour tiles, and one group for tiles that require more colours. The 2-colour tiles are then sorted in such a way that large continues spans of the same colour values emerge. Figure 5 clearly shows this. Run-length encoding of this data becomes very effective, reducing tile colour data to a fraction of its original size.
The run-length scheme used in DELTA is simple and translates to efficient Z80 assembler for decoding directly to VRAM. The compressor distinguishes two types of span:
- ‘Uniform spans’, storing three or more identical values, with a maximum length of 128 pixels
- ‘Raw spans’, storing arbitrary values, also with a maximum length of 128 pixels.
A span starts with a single byte that stores the span type in 1 bit and the length in the remaining 7 bits. To speed up decompression, a raw span is assumed to be followed by a uniform span.
The core of the Z80 decompression code is shown below. Timings include the M1 wait on MSX, as specified on Grauw’s website.
... setup VRAM access
ld a, (hl) // 7 - get first command
loop:
rra // 5T - lowest bit contains span type (1)
ld b, a // 5T - b = length of raw span
jp c, dospan // 11T - span is uniform?
inc hl // 7T - advance input stream pointer
otir // 23T - emit 'b' bytes reading from [hl]
ld b, (hl) // 7T - b = length of uniform span minus two
dospan:
inc hl // 7T - advance input stream pointer
ld a, (hl) // 7T - a = value for uniform span
out (0x98), a // 12T - write first pixel of span (2)
inc hl // 7T - advance input stream pointer
out (0x98), a // 12T - write second pixel of span (2)
nop // 5T - VDP timing (3)
uniform_loop:
out (0x98), a // 12T - write remaining span pixels
djnz uniform_loop // 14T - while (--b) goto uniform_loop
xor (hl) // 8T - obtain next command (4)
jp nz loop // 11T - if not zero: next pair of spans (5)
The run-length decompression code is pivotal for the performance of the renderer. It has thus been very carefully crafted. Some details:
- The span type is stored in the lowest bit of each span header. This way, it can be shifted into the carry flag using the rra instruction.
- A uniform span is at least three pixels. The first two are drawn unconditionally, outside the loop.
- The nop is needed to avoid timing issues for the VDP, as the out instruction before it takes less than 15 cycles.
- After the first two spans, span headers are xor’ed with the value used in the previous uniform span. This way, a single xor (hl) sets the zero flag on end-of-stream while obtaining the next command in all other cases.
- The final span must be a uniform span (of at least three bytes), even if the input data does not need this. For this reason, the target buffer must have room for this dummy span. In DELTA, line 23 in the name table starts with three spaces for this reason.
And finally: Instead of the final jp nz it is more efficient to unroll the code (several times), as not taking a jump is faster on the Z80. DELTA unrolls the code three times for this purpose.
Table 2 shows details on the compression of tile colour data. Simple run-length encoding is able to decimate the data rate here.
3. Compressing Tile PatternsAt first sight, tile pattern data appears to be mostly random and thus unsuitable for basic run-length encoding. It turns out however that this is worthwhile.
DELTA further improves run-length encoding of tile pattern data using random reordering. This is based on the observation that the order of tiles within a slice of equally coloured tiles is irrelevant; swapping two tiles (along with their indices in the name table) yields the same image. We obtain a near-optimal ordering using a random process, in which we repeatedly swap tiles and assess the rate of compression, keeping advantageous mutations.
Table 3 shows the result of tile pattern compression. Although compression is not nearly as efficient as in the case of colour data, size is still almost halved in many cases.
Each frame uses 32×23 tile indices in the name table (the last line of tiles displays score and such). Like the tile colours, this data is run-length compressible, although the gains are often modest.
We found that for some complex frames maintaining 20fps was difficult. Therefore, name table compression is made optional: It is skipped when the tile count exceeds a certain threshold, which is a good indicator for frame complexity. Copying raw data is always faster than decompression; by making it adaptive we sacrifice compression efficiency for rendering performance. If needed, this could be extended to other data, e.g. the tile pattern data.
Table 4 shows the efficiency of name table compression. Compared to colour and pattern data, name table compression shows somewhat poor compression.
After compressing all data for a frame, the end result is somewhere between 800 bytes and 2 kilobytes, with an average size of 1695 bytes for track 1 frames. This data has to be stored in one of the 16KB segments of the final megarom, and for efficiency reasons, it should not span a segment boundary. At 1695 bytes per frame, this would leave (on average) a gap of 848 bytes at the end of each segment, which is a 5% waste of space.
Here we again use the stochastic optimization approach. After generating all track frames, we reorder them in such a way that each megarom segment is as full as possible. For the 802 frames of track 1, this reduces wasted space in the segments to just a few bytes.
Future WorkAs stated in the previous paragraph, compression of track data yields an average size of 1695 bytes per frame, which is a solid improvement over storing raw data, while still allowing run-time decompression. There is however room for improvement.
To improve rendering speed and compression efficiency, we should reuse data more often. The tile colour table is 2KB for 256 characters; yet from frame to frame tile colours will be similar. We may be able to avoid writing a substantial part of that 2KB if we carefully order our tiles. This would however require a strict sequential decompression of frames, potentially using occasional keyframes.
Similarly, we could simply leave all (or most) tile colours untouched, by predefining groups of tiles with specific colour pairs.
If memory capacity increases even further, it may be advantageous eventually to discard compression altogether. Clever reuse of frame-to-frame coherence then becomes the only useful scheme.
And finally, we did not explore the use of a dynamic palette. Currently a track can use 16 colours, of which 3 are shared with the player car and another one (white) with the score display. The remaining 12 colours are rarely in view at the same time, which should allow for more freedom in this regard.
Concluding RemarksThe main technical contribution in DELTA is, we believe, the use of GRAPHIC4 mode combined with sprites to display pre-rendered polygon graphics. The available bandwidth and sprite count allows for a level of detail that is sufficient for a 256×192 display, as evidenced by the game.
Additionally we demonstrated that basic run-length encoding is effective for all data streams: Although tile colour data benefits most, tile pattern data and name table data still benefit substantially, while decompression remains simple enough to be used at 20fps on the Z80 CPU.
Finally, we indicated several avenues for further improving the technique. This could either allow for higher frame rates (25 fps seems feasible), more complex scenery, a larger display (using ‘overscan’ for 26.5 lines of tiles), or more complex game play and/or music.
The technique does impose strong limitations on gameplay. When properly mitigated, we believe the proposed technique provides a visual style that is rather unique for the platform.
Playing DELTADELTA has been submitted to the MSXDev25 competition and can be played for free via File-Hunter.com.
Contact us in case of questions / remarks: bikker.j@gmail.com.





























































