M68k LLVM — GeistHaus

What’s New in M68k LLVM (May 2023)

Min-Yih "Min" Hsu May 25, 2023 Updated May 25, 2023

It has been a minute[1] since the last update on our Open Collective and Patreon campaigns, so I thought it would be a good idea to have a slightly more formal write-up on the progress we made in the past year.

Atomic Instructions

Atomic instructions are commonly seen in modern architectures to perform indivisible operations. Historically, however, atomic instructions have never really been a thing for m68k, since processors in this family are predominantly single-core[2], which is the model we primarily focus on in this project. That said, as a backend we still need to lower the atomic instructions passed down from earlier stages of the compilation pipeline. Otherwise, LLVM will simply bail out with a crash.

For atomic load and store, the story is a lot simpler: due to the aforementioned single-core nature, lowering them to normal MOV instructions should be sufficient, which is what D136525 did. In the same patch, the author, Sheng, also dealt with something trickier: atomic compare-exchange (cmpxchg) and its friends, like atomic fetch-and-add (or add-and-fetch). Despite being single-core, the processor can still run multi-tasking systems, so we need to make sure an atomic cmpxchg is immune to system routines like interrupts and/or context switching. To this end, 68020 and later processors are equipped with the CAS instruction, which can be used as the substrate for fetch-and-X instructions, in addition to implementing cmpxchg. For older processors, we expanded these instructions into lock-free library calls (i.e. __sync_val_compare_and_swap and __sync_fetch_*). In addition, this patch also lowered atomic read-modify-write (RMW) and any atomic operations larger than 32 bits into libatomic library calls, which are not lock-free[3]. Last but not least, 85b37d0 added the lowering for atomic swap operations.
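To illustrate how a compare-and-swap primitive can serve as the substrate for fetch-and-X operations, here is a minimal C++ sketch using std::atomic. The helper name fetchAddViaCas is made up for this example; it is not the code the backend emits, only the general retry-loop idea.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Fetch-and-add built on top of compare-exchange: keep retrying until nobody
// modified the value between our read and our CAS.
// Returns the value observed *before* the addition.
// (fetchAddViaCas is a hypothetical helper name, used for illustration only.)
inline std::uint32_t fetchAddViaCas(std::atomic<std::uint32_t> &target,
                                    std::uint32_t delta) {
  std::uint32_t expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes `expected` with the current
  // value, so the loop simply tries again with fresh data.
  while (!target.compare_exchange_weak(expected, expected + delta,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}

// Usage: a single-threaded sanity check of the helper above.
inline void demoFetchAdd() {
  std::atomic<std::uint32_t> counter{40};
  std::uint32_t old = fetchAddViaCas(counter, 2);
  assert(old == 40 && counter.load() == 42);
}
```

The same loop shape works for any fetch-and-X operation: only the expression that computes the new value changes.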

D146996 was dealing with a similar puzzle: atomic fence. As mentioned before, we don’t need to worry about the order of memory operations in an in-order single-core processor, like most members of the 68k family. Thus, this patch only needs to prevent compiler optimizations from reordering instructions across an atomic fence. I believe there is definitely a more sophisticated solution, like adding dependencies (e.g. SelectionDAG chains) between instructions placed before and after the fence…but, well, I was lazy so I literally copied what m68k GCC did: lower an atomic fence into an inline assembly memory barrier, a.k.a. asm __volatile__ ("":::"memory") (more precisely, an inline assembly instruction in LLVM’s MachineIR).

That said, if we want to deal with the potentially out-of-order 68060 processors in the future, we might need to lower any fence into a NOP, which has the semantics of synchronizing the pipeline.
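For readers unfamiliar with compiler-only barriers, the sketch below shows the idiom at the source level, assuming a GCC- or Clang-compatible compiler. The function names (compilerBarrier, publish, and so on) are invented for this example, and the publish scenario is just one plausible use.

```cpp
#include <atomic>
#include <cassert>

// A compiler-only barrier in the GCC/Clang inline assembly dialect: it emits
// no machine instruction, but the "memory" clobber forbids the compiler from
// moving memory accesses across it.
inline void compilerBarrier() { asm volatile("" ::: "memory"); }

// The portable C++ spelling of a compiler-only fence.
inline void portableCompilerBarrier() {
  std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Hypothetical producer: write the payload, then raise the flag. The barrier
// keeps the compiler from sinking the payload store below the flag store.
inline void publish(int &payload, volatile bool &ready, int value) {
  payload = value;
  compilerBarrier();
  ready = true;
}

// Usage: after publish() returns, both stores are visible to this thread.
inline void demoPublish() {
  int payload = 0;
  volatile bool ready = false;
  publish(payload, ready, 42);
  assert(ready && payload == 42);
}
```

Neither barrier constrains the hardware, which is exactly why this approach is only adequate for in-order, single-core processors.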

Floating Point Support

Similar to atomic instructions, another thing people might find surprising is the lack of (builtin) floating point instructions in most 68k family processors. Just like the original x87, m68k employed co-processors for floating point (FP) operations, called 68881 and 68882 (starting with the 68040, the 68881/2 are integrated into the main processor). Luckily, compared to x87, m68k’s FP instructions are much more straightforward. Notably, they use nearly identical addressing modes as their integer counterparts (except using floating point data registers, of course). The list of FP instructions can be found here.

D147479 and D147481 laid down 68k’s FP foundation, like new register classes and compiler driver flags, in the LLVM and Clang components, respectively; D147480 and D148255 added definitions and MC support (AsmParser/Printer and disassembler) for an extremely limited number of data and arithmetic instructions. Currently no codegen support has been added for these instructions, which is definitely one of our future plans. Aside from codegen, an easier task might be adding preliminary inline assembly support for floating point constraints and escaped characters, as described in this issue.

Aggregate-Type Return Values

One of the patches authored by our new contributor, Ian (Welcome!🎉), was D148856, which added support for lowering aggregate-type return values like structs or arrays.

How a function returns aggregate values is heavily ABI-dependent. While many modern architectures leverage registers to return small structs (and use memory for larger ones), none of the 68k ABIs specify such an optimization. Therefore, we simply return aggregate values by storing them to caller-allocated[4] memory, whose pointer is passed via an implicitly inserted function argument. So C++ code like this:

struct Hello {
  unsigned f1, f2;
  float f3;
};

Hello foo(unsigned v) {
  Hello obj{v, v, 8.7f};
  return obj;
}

will be translated to the following LLVM IR code when targeting m68k:

define void @_Z3fooj(ptr sret(%struct.Hello) %agg.result, i32 %v) {
entry:
  store i32 %v, ptr %agg.result
  %f2 = getelementptr inbounds %struct.Hello, ptr %agg.result, i32 0, i32 1
  store i32 %v, ptr %f2
  %f3 = getelementptr inbounds %struct.Hello, ptr %agg.result, i32 0, i32 2
  store float 0x4021666660000000, ptr %f3
  ret void
}

in which %agg.result is the implicitly inserted argument used to return our aggregate value.
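Expressed back in C++, the lowering above is roughly what you would get by writing the hidden pointer argument by hand. The names fooLowered, callFoo, and result are made up for this sketch; the compiler does all of this implicitly.

```cpp
#include <cassert>

struct Hello {
  unsigned f1, f2;
  float f3;
};

// Hand-written equivalent of the sret lowering: the caller owns the storage
// and passes its address as a hidden first argument (called `result` here
// purely for illustration).
inline void fooLowered(Hello *result, unsigned v) {
  result->f1 = v;
  result->f2 = v;
  result->f3 = 8.7f;
}

// What a call site such as `Hello h = foo(v);` conceptually turns into.
inline Hello callFoo(unsigned v) {
  Hello slot;            // caller-allocated memory
  fooLowered(&slot, v);  // its address is passed as the implicit argument
  assert(slot.f1 == v && slot.f2 == v);  // the callee wrote through the pointer
  return slot;
}
```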

Inline Assembly: Memory Constraints

We added support for most of the inline assembly constraints, both target-independent and target-dependent ones, about 2 years ago – but memory constraints (e.g. m) had always been absent.

Wait no more! D143529 just added support for the m, Q, and U constraints. The m constraint accounts for generic memory operands; Q covers any addressing mode that involves an address register as the base; U is similar to Q, but limited to those using constant offsets.

There were two challenges in this task. First, there is more than one possible addressing mode to which operands with any of those memory constraints can be lowered. We need to use the instruction selector to select an addressing mode for the memory operand in question. Unfortunately, in our backend, the selection logic for finding the optimal addressing mode is not easily accessible from outside the instruction selector, so we ended up trying every possible addressing mode, one after another, following an order I arbitrarily picked🤪.

Second, AsmPrinter (the last Pass in the codegen pipeline) is required to print the selected memory operand into the inline assembly string without the help of MC[5]. Since m68k’s own MC component has the exact same printing logic for memory operands, we would have ended up with duplicate code in two places (AsmPrinter and m68k’s MC). To avoid that, D143528 factored out the printing logic so that both components can share it.

Miscs

Improvements on register spilling

D133636 fixed the instructions emitted for register spilling.

TRAP instruction and its friends

D147102 added MC support[6] for TRAP, TRAPV, BKPT, and ILLEGAL. These instructions are crucial for making (Linux) system calls. This patch also added a special immediate operand class to verify the odd-sized 3- and 4-bit immediate values used in some of these instructions.

Some exception handling supports

058f744 specified the registers for exception handling: d0 for exception pointer and d1 for selector, as suggested by GCC.

Better -stop-before/after flags

Started by Nick in D140364 and followed by 3204740, most of the machine Passes in 68k’s codegen pipeline are now registered with sane names. So that it’s easier for backend developers to stop/(re)start from a specific point in the codegen pipeline with the -stop-before/after=<pass name> flags.

// End Report

[1] Last update was March 2022, my apologies for the delay.
[2] Related: 68060 actually has superscalar support.
[3] LLVM has an amazing page documenting codegen for atomic operations, including explanations of lock-free vs. libatomic library calls. Highly recommended.
[4] SysV ABI actually didn’t specify who (caller vs. callee) should allocate the memory, but LLVM’s codegen infrastructure designates it to the caller by default. Plus, it makes more sense.
[5] The reason being that, despite being rare, an integrated assembler is not required for a target. Thus, we need to make sure those inline assembly memory operands are still properly printed in the absence of an integrated assembler.
[6] It’s almost exclusively used in inline assembly so no codegen support is needed.

https://m680x0.github.io/blog/2023/05/may-updates

The tale of `-mrtd` in GCC and Clang

Min-Yih "Min" Hsu May 6, 2023 Updated May 6, 2023

Recently I’ve been working on an issue about supporting the RTD instruction in the m68k LLVM backend (well, it turned out to be not directly related to the instruction itself, but we will talk about that shortly). The gist of that issue goes like this: the backend bailed out when it tried to lower a certain kind of return statement to RTD, if we’re targeting 68020 or later[1].

RTD is a variant of the return instruction that, besides returning, pops some bytes off the stack by adding its operand to the stack pointer. It can be used to implement a special kind of calling convention in which the callee has to clean up the space allocated for arguments passed on the stack. You don’t have to use RTD to implement said calling convention, though, as long as you pop the arguments upon returning.
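The difference between the two return styles is easiest to see by tracking the stack pointer alone. Below is a toy C++ model, assuming a stack that grows toward lower addresses and 4-byte arguments; the Machine struct and all function names are invented for this illustration and model nothing but SP arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the stack pointer only; the stack grows toward lower addresses.
struct Machine {
  std::uint32_t sp;
};

// Push one longword (4 bytes).
inline void push32(Machine &m) {
  assert(m.sp >= 4 && "stack overflow in the toy model");
  m.sp -= 4;
}

// Plain return: pop the return address only.
inline void rts(Machine &m) { m.sp += 4; }

// RTD-style return: pop the return address, then add the (sign-extended)
// 16-bit displacement to SP, which discards the stacked arguments.
inline void rtd(Machine &m, std::int16_t displacement) {
  m.sp += 4u + static_cast<std::uint32_t>(displacement);
}

// Calls a two-argument function and reports SP right after the callee
// returns. `calleePops` selects the RTD-style convention.
inline std::uint32_t spAfterCall(std::uint32_t initialSp, bool calleePops) {
  Machine m{initialSp};
  push32(m);  // argument 2
  push32(m);  // argument 1
  push32(m);  // return address, pushed by the call instruction
  if (calleePops)
    rtd(m, 8);  // callee cleans up both arguments
  else
    rts(m);     // caller still has 8 bytes of arguments to pop
  return m.sp;
}
```

With the callee-pops convention the caller gets its stack pointer back unchanged; with a plain return it still has to adjust SP by the size of the arguments.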

In m68k GCC, this calling convention is not enabled by default unless the -mrtd flag is present. Since this project is aiming to be compatible with its GCC counterpart (and the fact that this calling convention was not commonly used even in the good ol’ days), we want to implement the same behavior.

Cool, I guess we need to add -mrtd to Clang. Given that “rtd” is such an odd name and sounds really 68k-specific, there is no way it’s already there…

“So, I have been digging into this now and what I found is that -mno-rtd is actually already handled by Clang:

def mrtd: Flag<["-"], "mrtd">, Group<m_Group>;

and

def mno_rtd: Flag<["-"], "mno-rtd">, Group<m_Group>;

Also documented here: https://clang.llvm.org/docs/ClangCommandLineReference.html”

– Quoted (and slightly edited) from one of the comments by Adrian.

So -mrtd is already there? That’s a little weird…

It seems like X86 is the only user of that flag. Specifically, only the 32-bit i386 uses it, to enable the stdcall calling convention, a CC that also requires callee functions to pop incoming arguments off the stack. stdcall is primarily adopted by the Win32 API.

Aside from stdcall’s similarity with our special CC mentioned earlier, it’s not quite obvious why the flag is named after something unrelated to i386.

The rest of this post serves to answer a simple question: why do i386 Clang and GCC use the name “(m)rtd”? It’s important to note that this article is NOT meant to criticize or compare any naming choice or convention made in the past between different compiler backends and implementations; it is merely a historical study.

Why is it called “(m)rtd”?

The first assumption I came up with was that maybe there is an instruction called “rtd” in i386, specifically the old 80386 processor. Given how iron-fisted Intel is about maintaining backward compatibility, it’s nearly impossible that any instruction has been removed from the ISA since 80386. Therefore, looking up a relatively modern 32-bit x86 ISA manual should suffice.

Unfortunately, a simple search will tell you that i386 only has RET and IRET (return from interrupt). RET does have a variant that takes an immediate-value operand, acting just like the 68k RTD we mentioned earlier. But, well, it’s still called “ret” rather than “rtd”.

Now, maybe there are some clues in the patch that introduced this flag to Clang or even GCC – time to dig into the past.

Dragon archaeology

Let’s start from the Clang/LLVM side. The -mrtd flag was added to Clang by 65b88cd in 2011. Unfortunately, there wasn’t any commit message or code comment attached to shed some light on the choice of flag name. But luckily Clang, as a compiler driver, is supposed to be compatible with GCC. So one can safely assume that this flag originated from GCC.

Digging into GCC’s source code, at first glance 6ac4959 added -mrtd to i386 GCC in 2005, residing in the file config/i386/i386.opt. But if we look closer, that patch was merely transferring flag declarations to the newer generator-based approach with *.opt files. The original definition of the -mrtd flag in config/i386/i386.h can actually be traced all the way back to the initial version of the i386 backend! Specifically, c98f874, authored on Feb 9th 1992.

Gnu archaeology

Here was my assumption:

The name “-mrtd” in i386 GCC was reused or copied from its m68k counterpart.

So I looked into the first commit that introduced -mrtd to m68k GCC, which was 3d339ad. But the timestamp showed that it was authored on Feb 18th 1992, 9 days after the first file in the initial version of i386 GCC!

Is it possible that m68k GCC’s -mrtd was actually copied from i386 GCC?

“LookMaNoVCS_FINALv9.4FINALFINALrev87.psd”

It turns out GCC not only has used several different VCSs (Version Control Systems) in the past, it was not even managed with a VCS at the very beginning. According to GCC’s History, the first (beta) release was put…on an FTP server located at MIT.

So the Feb 9th 1992 date we just mentioned was merely the time the i386 backend was checked into GCC’s VCS from a plain source tree. Same for the Feb 18th 1992 date of its m68k counterpart. In other words, it’s highly likely that the code for the i386 and m68k backends was already there before any VCS adoption. The best way to answer this is to grab GCC’s pre-VCS era source code. Unfortunately, while the FTP server that originally hosted GCC is still there, I can no longer find that particular copy of the source code. We can only make some educated guesses now.

There are three pieces of clues I found useful here:

(1) -mrtd was also used for one of the obsolete (and ancient) architectures called Gmicro. A comment about -mrtd in the Gmicro backend said: “…On the m68k this is an RTD option, so I use the same name for the Gmicro. The option name may be changed in the future.”
(2) The initial versions of config/i386/i386.h and config/m68k/m68k.h shared a nearly identical comment line related to -mrtd handling (i386 line vs. m68k line). The only difference between them is the supported processor name (i.e. “80386” vs. “68010”).
(3) From the announcement of the first GCC beta release made by RMS (circa March 1987), it’s highly likely that m68k and VAX were the only two supported targets.

Item (1) suggests that reusing a flag name, despite it having little to do with the respective instruction name (in Gmicro the corresponding instruction is called EXITN, not RTD), was a thing, and might even have been a common practice; item (2) is likely a trace of boilerplate copy-n-paste of not just the comment but also the code, as well as the flag. Finally, item (3) further affirms that if both (1) and (2) hold, it’s likely that -mrtd was reused or copied from m68k to i386 GCC rather than the other way around.

Conclusion

Though there isn’t any direct evidence[2] showing that -mrtd was borrowed from m68k GCC by i386 GCC (and eventually rippled to Clang), from the artifacts I presented it’s very likely the case.

But in any case, should patches D149864 and D149867 be accepted, m68k Clang/LLVM will finally have the ability to recognize -mrtd, nearly 40 years after its debut in GCC.

A small victory for the m68k LLVM community nonetheless!

[1] The 68020 predicate is actually wrong: RTD is already available in 68010.
[2] Alternatively, I could directly ask some of the early GCC contributors. Unfortunately, I’m not sure whether their emails are still reachable or, even worse, whether they are still with us.

https://m680x0.github.io/blog/2023/05/the-tale-of-mrtd

Encoding Variable-Length Instructions in LLVM

Min-Yih "Min" Hsu Feb 16, 2022 Updated Feb 16, 2022

One of the most important jobs for an assembler is encoding assembly instructions – potentially in a textual form – into their corresponding binary code, namely the instruction encoding. LLVM has provided a nice framework to spare toolchain developers from crafting this process manually, by generating such an encoder, also called a code emitter, from high-level target instruction descriptions, that is, the instruction info written in TableGen. This feature greatly improves productivity and makes the target-specific codebase more readable.

However, this framework came with an assumption that is unfavorable to our M68k target: all target instructions must have the same size. M68k, on the other hand, has been an ISA with variable-length instructions, ranging from 2 bytes to 22 bytes, since day one.

In this blog post, I’m going to talk about how M68k LLVM overcomes this issue by augmenting the existing code emitter generator with more powerful TableGen syntax.

First, let’s talk about some background and history behind M68k’s instruction encoding scheme.

The original code emitter generator (CodeEmitterGen)

The core idea of the original code emitter generator is to generate the instruction encoder’s C++ code from (target-specific) TableGen instruction definitions via a special TableGen backend, namely the CodeEmitterGen.

For instance, assuming we have a hypothetical instruction, MyInst, that has the following encoding scheme:

  15       12  11         8   7             0
| 1  0  1  1 | dst register | 8-bit immediate |

In this scheme, bits 15 to 12 are fixed values. Bits 11 to 8 and 7 to 0, on the other hand, contain the encoded value of the destination register and an 8-bit immediate, respectively. So for instance, if the destination register is r8, which is encoded as 0b1001 in our hypothetical architecture, and the immediate is 0b01010111, then this MyInst will eventually be encoded as

1011_1001_01010111

The key is that an instruction usually consists of fixed values (in this case 0b1011 on bits 15 to 12) and dynamic parts (the dst register and the 8-bit immediate) that vary among different instances. The latter are usually the instruction operands.

The TableGen instruction definition also accounts for this fact. Below you can see the definition for MyInst:

class MyInst : Instruction {
    let OutOperandList = (outs GR64:$dst);
    let InOperandList  = (ins GR64, i8imm:$imm);

    bits<16> Inst;

    bits<4> dst;
    bits<8> imm;
    let Inst{15-12} = 0b1011;
    let Inst{11-8} = dst;
    let Inst{7-0} = imm;
}

We can ignore the OutOperandList and InOperandList fields for now. The Inst field is where we put the instruction encoding scheme.

For the fixed values part, we simply assign the bits to the corresponding bit segment:

let Inst{15-12} = 0b1011;

For the dynamic parts, it is twofold: First, we declare a new field (i.e. bits<4> dst and bits<8> imm) for each instruction operand. Then, these uninitialized fields are assigned to their corresponding bit segments right away:

let Inst{11-8} = dst;
let Inst{7-0} = imm;

The CodeEmitterGen TableGen backend will then interpret these descriptions and generate an instruction encoder.
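To make the mapping concrete, here is a hand-written C++ sketch of roughly what an encoder for MyInst boils down to. The function name encodeMyInst is made up for this illustration, and the code CodeEmitterGen actually generates is structured differently; only the bit placement follows the definition above.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written equivalent of an encoder for the hypothetical MyInst:
// OR the fixed bits together with each operand shifted into its slot.
// (encodeMyInst is an illustrative name, not a generated function.)
inline std::uint16_t encodeMyInst(std::uint8_t dst, std::uint8_t imm) {
  assert(dst < 16 && "dst must fit in 4 bits");
  std::uint16_t inst = 0;
  inst |= static_cast<std::uint16_t>(0xB) << 12;  // Inst{15-12} = 0b1011
  inst |= static_cast<std::uint16_t>(dst) << 8;   // Inst{11-8}  = dst
  inst |= imm;                                    // Inst{7-0}   = imm
  return inst;
}
```

Feeding it the earlier example (register encoding 0b1001 and immediate 0b01010111) reproduces the bit pattern 1011_1001_01010111.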

Alright, so this is the status quo of how the majority of LLVM targets (which have fixed-length instructions) describe their instruction encoding schemes. Let’s try to adopt this framework in our M68k target.

Right off the bat, we need to solve the problem that the Inst field has a fixed size. The most intuitive solution is, of course, using the maximum instruction size of the ISA, 22 bytes in M68k’s case. But since CodeEmitterGen was designed with fixed-length instructions in mind, every encoded instruction emitted from the encoder would then be 22 bytes long regardless of its instruction type, which is not what we want.

Luckily, there is a post-processing callback we can define to massage the size of encoded instructions:

// MxInst is the base class for all the M68k instructions.
class MxInst : Instruction {
    let PostEncoderMethod = "adjustInstSize";
    ...
}

By specifying the name of the post-processing function in the PostEncoderMethod field, the generated encoder will use that function to give the “raw” encoded instruction – in this case the 22-byte long instruction – a pass before presenting the final result. In this case, the adjustInstSize function might look like this:

APInt M68kMCCodeEmitter::adjustInstSize(const MCInst &MI, const APInt &RawValue,
                                        const MCSubtargetInfo &STI) {
    unsigned RealSizeInBits = 22 * 8;
    switch (MI.getOpcode()) {
        case M68k::ADD16ri:
        case M68k::SUB16ri:
        ...
            RealSizeInBits = 32;
            break;
        ...
    }

    return RawValue.truncOrSelf(RealSizeInBits);
}

Basically, we copy the original encoded instruction and truncate it to its real size, deduced from its opcode. Though this also means we need to enumerate all the opcodes in this function, it’s not a big deal. Actually, later on we will see that the problem we just discussed is the least tricky one.

Next on our issue list, we have M68k instructions whose operands might vary in their sizes. Take the move.w instruction below as an example:

move.w %d0, (87,%a1)

This moves data from %d0 to the destination, (87,%a1), and its encoding has a layout like this:

 31                               16 15                                0
|  immediate in destination operand |          Base encoding            |

Alternatively, we can use an immediate value as the source operand:

move.w #94, (87,%a1)

In this case, the instruction size increases to 48 bits, with the following layout:

 47                               32 31                        16 15                                0
|  immediate in destination operand | source operand (immediate) |          Base encoding            |

The problem here is that a single instruction type can have dramatically different operands in terms of their sizes and bit positions (immediate in the destination operand was at 31 ~ 16 in the first case, but moved to 47 ~ 32 in the second one). But in the framework provided by CodeEmitterGen, we can only assign an operand’s placeholder field into a fixed position (e.g. let Inst{11-8} = dst we saw earlier).
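The shifting bit position can be reproduced with a small C++ sketch that lays an instruction out as a sequence of 16-bit words, following the two diagrams above. The names Layout and layoutMoveW are hypothetical, and baseWord is just a placeholder rather than the real MOVE opcode word.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of laying out one instruction as a sequence of 16-bit words.
struct Layout {
  std::vector<std::uint16_t> words;  // words[0] holds bits 15..0
  unsigned dstDispBit = 0;  // lowest bit of the destination displacement
};

// `baseWord` is a placeholder value, not the real MOVE opcode word.
// `srcImm` is only used when `srcIsImm` is true.
inline Layout layoutMoveW(std::uint16_t baseWord, bool srcIsImm,
                          std::uint16_t srcImm, std::uint16_t dstDisp) {
  Layout l;
  l.words.push_back(baseWord);
  if (srcIsImm)
    l.words.push_back(srcImm);  // an immediate source takes an extra word
  // The displacement goes wherever the next free word happens to be.
  l.dstDispBit = 16 * static_cast<unsigned>(l.words.size());
  assert(l.dstDispBit == 16 || l.dstDispBit == 32);  // the two cases above
  l.words.push_back(dstDisp);
  return l;
}
```

For a register source the displacement 87 starts at bit 16; with the immediate source 94 in front of it, the very same operand starts at bit 32, which a fixed `Inst{hi-lo}` assignment cannot express.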

A potential solution would be creating an opcode for each of these instances. So in our previous example, move.w %d0, (87,%a1) and move.w #94, (87,%a1) would have different opcodes. This is not a bad solution, to be honest. In fact, most of the LLVM targets use this very approach when encountering similar cases.

The real problem is, it doesn’t scale in M68k.

In M68k, there are at least 12 different types of operands, or what you can call addressing modes. And each M68k instruction, unfortunately, supports nearly all of them. That means, for all 190+ different instruction types, we would need to compose 12 times 190, or more than 2000, different instruction definitions in our TableGen file!

You may ask: “Wasn’t TableGen invented to eliminate such repetition?”. Well, TableGen does have amazing features to factor out common patterns…except for those we’ve seen here. More specifically, the fact that an operand’s placeholder field can only be assigned to a fixed position, which we discussed previously, really prevents us from factoring out anything useful.

We could, of course, change the TableGen language and allow non-literal values for indices in bits slicing (e.g. let Inst{8-n} = ... where n is a template variable). But that seems to be much more difficult, if not impossible.

Last but not least, we have complicated M68k instruction operands that are not even located at contiguous bit positions! In the previous issue we saw this instruction layout for move.w %d0, (87,%a1):

 31                               16 15                                0
|  immediate in destination operand |          Base encoding            |

The reason I said “immediate in destination operand” rather than “destination operand” was because bit 31 ~ 16 was just part of the destination operand, namely the 87 part in (87,%a1). For the other part, base register %a1, it is located at bit 2 ~ 0.

However, CodeEmitterGen’s TableGen framework doesn’t provide any mechanism to assign such sub-operands into Inst. A potential workaround would be assigning the operand’s placeholder to every sub-operand position. For example:

let Inst{31-16} = dst;
let Inst{2-0} = dst;

But then we need a way to assign the correct encoded value to the right bit position.

LLVM calls into a callback function, which can be customized on a per-operand basis, whenever it needs to encode an operand. It has the following function signature:

uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MCO,
                           SmallVectorImpl<MCFixup> &Fixups,
                           const MCSubtargetInfo &STI);

or the following when instruction size exceeds 64 bits:

void getMachineOpValue(const MCInst &MI, unsigned OpIdx, APInt &Result,
                       SmallVectorImpl<MCFixup> &Fixups,
                       const MCSubtargetInfo &STI);

In the previous snippet, since we assign dst to both sub-operands’ positions, the same callback function will be invoked twice on those positions – with the same arguments! (Note that both MCO and OpIdx provide an operand rather than a sub-operand) In other words, a callback has no way to tell what sub-operand it is encoding now.
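The ambiguity can be demonstrated with a deliberately simplified C++ mock. Everything here (MemOperand, getOpValue, encodeNaively, and the register field value 1 used in the usage note) is an assumption made up for illustration and does not mirror LLVM’s real classes; the point is only that a callback keyed on the whole operand returns the same answer for both slots.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for an operand like (87,%a1): one logical operand made
// of two sub-operands.
struct MemOperand {
  std::uint16_t disp;    // the displacement, e.g. 87
  std::uint8_t baseReg;  // register field value (assumed, for illustration)
};

// Mimics the shape of the per-operand callback: it receives the operand, but
// nothing that says which bit range is currently being filled.
inline std::uint64_t getOpValue(const MemOperand &op) {
  assert(op.baseReg < 8 && "register field is 3 bits wide");
  return op.disp;  // forced to pick one sub-operand for every call site
}

// Both slots are filled from the same callback, as in the workaround above.
inline std::uint32_t encodeNaively(const MemOperand &dst) {
  std::uint32_t inst = 0;
  // Inst{31-16} = dst
  inst |= static_cast<std::uint32_t>(getOpValue(dst) & 0xFFFF) << 16;
  // Inst{2-0} = dst
  inst |= static_cast<std::uint32_t>(getOpValue(dst) & 0x7);
  return inst;
}
```

With disp = 87 and baseReg = 1, the upper half is correct, but bits 2 ~ 0 end up holding the low bits of 87 instead of the register number.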

In summary, in the original CodeEmitterGen, we observe the following shortcomings that are unfavorable to variable-length instructions:

Bit width of the Inst field is fixed. Though we can declare the field with maximum instruction size in the ISA, it requires extra code to adjust the final instruction size.
Operand encoding can only be placed at fixed bit positions. However, the size of an operand in a variable-length instruction might vary.
In the situation where a single logical operand consists of multiple sub-operands, the current syntax cannot reference a sub-operand, which means we can only reference the entire logical operand at places where we should actually put sub-operands. This makes the TableGen code less readable and puts more burden on the operand encoding functions (because they don’t know which sub-operand to encode).

M68k’s old instruction encoder

We’ve spent some time exploring some options of adopting the original CodeEmitterGen (and why it didn’t work out). I think it’s also worth it to take a look at M68k LLVM’s previous instruction encoder: CodeBeadsGen.

CodeBeadsGen is a TableGen backend created by Artyom Honcharov. On the face of it, CodeBeadsGen simply converts a bits type TableGen field, Beads, in each instruction definition into a uint8_t array. In M68k, we used this stream of bits as the vehicle for instruction encoding fragments that are concatenated one after another, hence the name code beads. Each encoding fragment can represent either a fixed bits value or a placeholder for an operand, similar to the original CodeEmitterGen except that fragments are organized as a sequence.

For example, class MxBead4Bits<bits<4> value> is a fragment representing a 4-bit wide fixed value passing from the template argument value; class MxBeadDReg<bits<3> op_idx> is a fragment for data register, where op_idx is the index of the corresponding operand; similarly, class MxBead8Imm<bits<3> op_idx> is a fragment for 8-bit immediate, and op_idx is also the index of the corresponding operand.

In addition to fragments, a utility class, MxEncoding, helps us convert a list of fragments into raw bits. Here is how they work together:

let Beads = MxEncoding<MxBead8Imm<2>,
                       MxBeadDReg<1>,
                       MxBead4Bits<0b0100>>.Value;

The Beads above represents the following encoding layout:

  15    12  11                      8   7                         0
| 0 1 0 0 | data register (operand 1) | 8-bit immediate (operand 2) |

Note that each fragment still requires further interpretation (by the M68kMCCodeEmitter class) to emit the real encoded value.
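The position-independent flavor of this scheme can be sketched in a few lines of C++: each fragment only carries a width and a value, and the running offset decides where it lands. Fragment and concatFragments are hypothetical names, the widths are taken from the layout diagram above, and the sketch skips the interpretation step that M68kMCCodeEmitter performs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One already-resolved encoding fragment (hypothetical type).
struct Fragment {
  unsigned width;       // number of bits this fragment occupies
  std::uint64_t value;  // value to place in those bits
};

// Concatenate fragments in order, starting from bit 0. Nobody has to spell
// out absolute bit positions; the running offset takes care of them.
inline std::uint64_t concatFragments(const std::vector<Fragment> &frags) {
  std::uint64_t inst = 0;
  unsigned offset = 0;
  for (const Fragment &f : frags) {
    assert(f.width > 0 && f.width < 64 && offset + f.width <= 64);
    const std::uint64_t mask = (std::uint64_t{1} << f.width) - 1;
    inst |= (f.value & mask) << offset;
    offset += f.width;
  }
  return inst;
}
```

Passing an 8-bit immediate, a 4-bit register field, and the fixed bits 0b0100, in that order, yields the 16-bit layout shown in the diagram.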

The reason I mention this legacy framework here is because there are two things I want to highlight:

Fragments are position-independent. As developers, we only need to put fragments in the correct order. The exact bit positions will be handled by the framework.
We can reuse fragments, or even compose a new one by ourselves. For example, we can create a new fragment for the (87,%a1) addressing mode we saw earlier by putting two fragments, MxBead16Imm and MxBeadAReg (for address register), together. Such aggregate fragment can also be reused multiple times in other parts of the codebase.

Some core designs in the new CodeEmitterGen, which we will cover shortly, are actually heavily influenced by these two concepts.

So why did we drop this CodeBeadsGen-based instruction encoder? There are primarily two reasons:

An operand fragment (e.g. MxBeadDReg) uses an operand index rather than a more ergonomic mnemonic to designate an operand. What’s worse, the index we’re talking about here is actually not the logical operand index, but the MCOperand index. A complex logical operand like (87,%a1) actually consists of multiple MCOperand-s, and it’s pretty tricky to figure out the index of any one of them, because you need to account for the number of MCOperand-s before this logical operand.
The fact that all fragments are converted into raw bits before being consumed by M68kMCCodeEmitter forced us to funnel lots of metadata and annotations into those bits. But then in M68kMCCodeEmitter, we needed to spend even more energy decoding that metadata! This eventually created a lot of obscure TableGen and M68kMCCodeEmitter code and logic, which is hard to maintain.

Introducing the new VarLenCodeEmitter

Finally, let’s talk about VarLenCodeEmitter – a new extension to CodeEmitterGen that adds support for encoding variable-length instructions.

Following up on our previous hypothetical instruction, MyInst, let’s take a look at one of its more advanced siblings: MyVarInst.

class MyMemOperand<dag sub_ops> : Operand<iPTR> {
    let MIOperandInfo = sub_ops;
}

class MyVarInst<MyMemOperand memory_op> : Instruction {
    let OutOperandList = (outs GR64:$dst);
    let InOperandList  = (ins memory_op:$src);
}

OutOperandList and InOperandList are the fields for an instruction’s output and input operands, respectively. Here, we have a slightly more complex input operand type, MyMemOperand, which might contain more than one sub-operand. For example, MemOp16 and MemOp32 below both consist of two sub-operands.

def MemOp16 : MyMemOperand<(ops GR64:$reg, i16imm:$offset)>;
def MemOp32 : MyMemOperand<(ops GR64:$reg, i32imm:$offset)>;

Notice that both memory_op (in InOperandList) and sub-operands within MyMemOperand are tagged with names like $src, $reg, or $offset.

Now, we know MyVarInst has the following instruction encoding:

15             8                                   0
----------------------------------------------------
|   10110111   |  Sub-operand 0 in source operand  |
----------------------------------------------------
X                                                 16
----------------------------------------------------
|         Sub-operand 1 in source operand          |
----------------------------------------------------
                X + 4                          X + 1
                ------------------------------------
                |       Destination register       |
                ------------------------------------

We put many X-s in the diagram above because the size of a sub-operand might vary. Similar to the original CodeEmitterGen, we also store the encoding info in the Inst field, except that its type is dag rather than bits. Here is an example:

class MyVarInst<MyMemOperand memory_op> : Instruction {
    let OutOperandList = (outs GR64:$dst);
    let InOperandList  = (ins memory_op:$src);

    dag Inst = (ascend
        (descend /*Fixed bits*/0b10110111,
                 /*Sub-operand 0 in source operand*/(operand "$src.reg", 8)),
        // Sub-operand 1 in source operand
        (operand "$src.offset", 16),
        // Destination register
        (operand "$dst", 4)
    );
}

The idea is to use special dag operators and TableGen values (e.g. fixed bits 0b10110111) to build a sequence that reflects the encoding format. For instance, the following snippet represents the encoding from bit 15 ~ 0:

(descend /*Fixed bits*/0b10110111,
         /*Sub-operand 0 in source operand*/(operand "$src.reg", 8))

And for this snippet

// Sub-operand 1 in source operand
(operand "$src.offset", 16)

It represents the encoding from bit X ~ 16.

Instead of specifying the exact bit positions where each part is going, we’re merely assembling these parts with the correct relative ordering.

Here are the descriptions of each dag directive we just saw:

(ascend [value1, value2, ...]): DAG arguments (i.e. value1, value2, …) are concatenated one after another to form a sequence. They are organized from least-significant bit (LSB) to most-significant bit (MSB). That is, value1 will sit at lower bits and value2 will sit at higher bits. Each argument can be a bits<N> or another dag.
(descend [value1, value2, ...]): Similar to ascend, but the arguments are organized from MSB to LSB.
(operand <operand reference string>, <size in bits>): Represents a placeholder for an instruction operand. The first argument is a string that references the name tagged on an operand, for example the $dst or $src we saw earlier. If there are sub-operands, we can use $<operand name>.<sub-operand name> to reference a sub-operand. The second argument specifies the size of the encoded operand.
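
The semantics of these directives can be modeled in a few lines of Python. This is a rough sketch rather than how VarLenCodeEmitter is implemented: a fragment is modeled as a (value, width) pair, the helper names are invented for this example, and the operand values are arbitrary numbers standing in for whatever the real encoder would produce.

```python
def bits(value, width):
    """A fixed-width fragment: (value, width), truncated to `width` bits."""
    return (value & ((1 << width) - 1), width)

def ascend(*parts):
    """Concatenate fragments from LSB to MSB: the first one sits lowest."""
    value, width = 0, 0
    for part_value, part_width in parts:
        value |= part_value << width
        width += part_width
    return (value, width)

def descend(*parts):
    """Concatenate fragments from MSB to LSB: the first one sits highest."""
    return ascend(*reversed(parts))

# Arbitrary example values for $src.reg, $src.offset and $dst.
reg, offset, dst = 0x05, 0x1234, 0x3

# Mirrors the Inst dag of MyVarInst shown earlier.
inst = ascend(
    descend(bits(0b10110111, 8), bits(reg, 8)),  # bits 15 ~ 0
    bits(offset, 16),                            # bits 31 ~ 16
    bits(dst, 4),                                # bits 35 ~ 32
)

print(hex(inst[0]), inst[1])  # 0x31234b705 36
```

Note that no bit position appears in the definition of inst itself; the positions in the comments fall out of the fragment widths and their relative order.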

Using ascend and descend together helps developers transcribe instruction format descriptions from the architecture manual, since many manuals write MSB to LSB from left to right. But when the encoded bits need to be split into multiple “rows” (e.g. one row per word), the top row sits at the lower bits while the bottom row sits at the higher bits. For example:

[Figure: stack instruction description]

Here, “BASE DISPLACEMENT” sits at bits 31 ~ 16 and “OUTER DISPLACEMENT” sits at bits 63 ~ 32. In this case, the following syntax is easier to understand:

let Inst = (ascend
  (descend ...), // bit 15 ~ 0
  (descend ...), // bit 31 ~ 16
  (descend ...)  // bit 63 ~ 32
);

Now let’s pause for a second and look into the example snippet we just went through:

let InOperandList  = (ins memory_op:$src);

dag Inst = (ascend
    (descend /*Fixed bits*/0b10110111,
              /*Sub-operand 0 in source operand*/(operand "$src.reg", 8)),
    // Sub-operand 1 in source operand
    (operand "$src.offset", 16),
    // Destination register
    (operand "$dst", 4)
);

In the above, sub-operands 0 and 1 in the source operand are explicitly referenced by (operand "$src.reg", 8) and (operand "$src.offset", 16), respectively.

But what if…

We have a source operand (i.e. memory_op) whose offset sub-operand has a size different than 16 bits? A good example is the MemOp32 we created earlier.
Instead of “reg” and “offset”, memory_op has different names for its sub-operands.

To address these questions, let’s make our code more flexible. First, let’s change the class definition of MyMemOperand:

class MyMemOperand<dag sub_ops> : Operand<iPTR> {
    let MIOperandInfo = sub_ops;
    dag Base;
    dag Extension;
}

After finishing that, we change the content of Inst in MyVarInst:

dag Inst = (ascend
    (descend /*Fixed bits*/0b10110111,
              /*Sub-operand 0 in source operand*/memory_op.Base),
    // Sub-operand 1 in source operand
    memory_op.Extension,
    // Destination register
    (operand "$dst", 4)
);

Now let’s look at the operand. For MemOp16, instead of creating a TableGen record from MyMemOperand, we create a class first:

class MemOp16<string op_name> : MyMemOperand<(ops GR64:$reg, i16imm:$offset)> {
    let Base = (operand "$"#op_name#".reg", 8);
    let Extension = (operand "$"#op_name#".offset", 16);
}

The "$"#op_name#".reg" syntax concatenates three strings – “$”, op_name, and “.reg” – together.

Similarly…

class MemOp32<string op_name> : MyMemOperand<(ops GR64:$reg, i32imm:$offset)> {
    let Base = (operand "$"#op_name#".reg", 8);
    let Extension = (operand "$"#op_name#".offset", 32);
}

Finally, we instantiate new MyVarInst instances like this:

def FOO16 : MyVarInst<MemOp16<"src">>;
def FOO32 : MyVarInst<MemOp32<"src">>;

What we just did was factor out the parts that are related to operands. This not only makes MyVarInst more generic, but also turns the new MemOp16 and MemOp32 classes into composable and reusable modules that we can use in other instruction definitions:

class My2ndVarInst<MyMemOperand memory_op> : Instruction {
    ...
    let InOperandList  = (ins memory_op:$opnd);
    ...
}

def BAR16 : My2ndVarInst<MemOp16<"opnd">>;

We don’t need to repeat the operand encoding in every instruction definition anymore.
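
To see what the refactoring buys us, the following Python sketch computes where each fragment of MyVarInst lands for the two operand flavors. It is a simplified model under stated assumptions: MemOperand, MEM_OP16, MEM_OP32 and my_var_inst_layout are hypothetical stand-ins for the TableGen classes above, and only fragment widths are tracked, not values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemOperand:
    """Stand-in for a MyMemOperand subclass: the widths of its fragments."""
    base_width: int       # width of the Base fragment (the register)
    extension_width: int  # width of the Extension fragment (the offset)

MEM_OP16 = MemOperand(base_width=8, extension_width=16)
MEM_OP32 = MemOperand(base_width=8, extension_width=32)

def my_var_inst_layout(mem_op):
    """Return [(name, msb, lsb)] for MyVarInst built on top of `mem_op`."""
    # Fragments listed from LSB to MSB. Inside the first (descend ...), the
    # fixed bits come first, so the Base fragment occupies the lower bits.
    fragments = [
        ("base", mem_op.base_width),
        ("fixed", 8),
        ("extension", mem_op.extension_width),
        ("dst", 4),
    ]
    layout, lsb = [], 0
    for name, width in fragments:
        layout.append((name, lsb + width - 1, lsb))
        lsb += width
    return layout

for name, mem_op in (("FOO16", MEM_OP16), ("FOO32", MEM_OP32)):
    print(name, my_var_inst_layout(mem_op))
```

The instruction-level function never mentions 16 or 32; swapping the operand object is enough to move the destination register from bits 35 ~ 32 to bits 51 ~ 48.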

To sum up the advantages of using VarLenCodeEmitter:

Developers can express instruction encodings in a more ergonomic way and avoid details like exact bit positions, which are a problem for variable-length instructions. This makes instruction definitions much easier to read and maintain.
It’s easier to create composable components, which facilitates code reuse.
VarLenCodeEmitter shares many traits and most of the interfaces with the original CodeEmitterGen. For example, both of them generate <Target>MCCodeEmitter::getBinaryCodeForInstr as the primary encoder (function body is of course different but function signature is the same).
It’s a generic framework that is not limited to variable-length instructions – you can use it on normal fixed-length instructions too!

Adopting VarLenCodeEmitter

To adopt VarLenCodeEmitter, use the TableGen syntax introduced earlier. No new flag is needed, since it shares the same llvm-tblgen flag as the original CodeEmitterGen:

# In your target's CMakeLists.txt
tablegen(LLVM <Target>GenMCCodeEmitter.inc   -gen-emitter)

Namely, after you write a dag-type Inst field (rather than bits) in your instruction definitions, VarLenCodeEmitter will automatically take over.

Thank you for reading!

Appendix

There are two other useful dag constructions in VarLenCodeEmitter:

(slice <operand reference string>, <starting or ending bit>, <starting or ending bit>): Similar to operand, but instead of referencing the entire operand, it references part of the operand encoding, bounded by the second and third dag arguments.
(operand ..., (encoder <encoder function name>)) and (slice ..., (encoder <encoder function name>)): The encoder is an extension to both operand and slice. It specifies a custom C++ encoder function for this specific operand encoding, rather than the default getMachineOpValue. This is similar to EncoderMethod in an Operand TableGen record, but EncoderMethod applies to every operand in the current target whereas encoder here only affects the enclosing operand or slice.
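
As a rough mental model of these two constructions, here is a short Python sketch. It is an illustration under assumptions, not the generated code: slice_bits and operand_bits are invented helper names, the bounds are treated as inclusive and accepted in either order (following the description above), and halve is a made-up custom encoder.

```python
def slice_bits(value, hi, lo):
    """Return (bits hi..lo of `value`, width), with both ends inclusive."""
    if hi < lo:
        hi, lo = lo, hi  # accept the two bounds in either order
    width = hi - lo + 1
    return ((value >> lo) & ((1 << width) - 1), width)

def operand_bits(value, width, encoder=None):
    """Encode an operand, optionally through a custom encoder function."""
    if encoder is not None:
        value = encoder(value)  # only this operand is affected
    return (value & ((1 << width) - 1), width)

def halve(value):
    """Hypothetical custom encoder: store a byte offset as a word offset."""
    return value >> 1

offset = 0x1234
high = slice_bits(offset, 15, 8)  # upper byte of the 16-bit offset
low = slice_bits(offset, 7, 0)    # lower byte of the 16-bit offset
print(high, low, operand_bits(0x10, 8, encoder=halve))
```

A slice is handy when one operand is scattered across several places in the encoding, while the per-use encoder keeps a special case local to the instruction that needs it.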

NOTE This article is adapted from my previous write up on Gist.

https://m680x0.github.io/blog/2022/02/varlen-encoder

Welcome to M68k LLVM!

Min-Yih "Min" Hsu Jan 20, 2021 Updated Jan 20, 2021

Welcome folks!

This website is a place for the development of M68k support in LLVM, including code generation, the assembler, Clang support, and other target-specific LLVM components.

The M68k LLVM target and this website are both under heavy development. Code contributions are very welcome; please follow the standard LLVM contribution guidance if you’re interested in working on the LLVM part. For this website, feel free to drop a PR to make some changes.

Alternatively, you can support us with donations. Please check out this page for available options.

Thank you for your interest in and support of this project :-)

https://m680x0.github.io/blog/2021/01/welcome
