20140406 - Practical Assembly


Market Evolution
The market has evolved such that x86-64 is the only practical consumer platform that supports an attached GPU with medium to high performance. Personally I'd rather work on ARM-64 as it is a much better ISA, but market forces have made that impossible. The first sign was that AMD (an x86 company) secured all the console wins. Then when Microsoft decided to limit Windows on ARM to Metro only, that effectively killed ARM-64 as an interesting platform for desktop graphics.

One positive side effect of this is that for those who build graphics which require 100W and up, there is only one ISA, x86-64, which is, by design, fully backwards compatible. Thus for those who are brave, ISA portability is really no longer an excuse not to leverage assembly.

Where Traditional Assembly Goes Wrong
The core problem with traditional assembly usage is the complete lack of practical front-end tools for the language. A second problem is that developers attempt to program assembly in a C style, using the platform ABI, resulting in a massive mess of constantly pushing and popping everything to and from a stack around function calls.

There is a much better way to look at x86-64 as a platform for solving problems. First, while x86-64 only has 16 integer registers, this is somewhat mitigated by the ISA's built-in ability to take one operand from a memory location. This provides enough effective working space to factor register allocation from inside calls to outside calls. This is a powerful simplification tool, as many algorithms can then effectively have a programmer-defined global register allocation. Different subsets of the program running at different times simply alias the meaning of various registers. Likewise the stack is no longer used for data (it is only used to interoperate with dynamic link libraries), and a pure data-flow-based design is much easier to realize. Assembly, with a good front-end tool, becomes easier to write than C or C++.

What is a Good Front-End Tool?
My choice has always been something similar to the Forth language. A language front-end should be powerful enough to use code to generate code. For instance, a language front-end should be able to load a file, transform that file as an executable would, and produce an inline data structure when compiling. In contrast, think about how limited the "#include" functionality is in the C pre-processor.

I have been leveraging Forth-like interpreters for ages now. For instance, my prior FarrarFocus website was written in a server-side Forth-like language which was interpreted by PHP and which generated client-side JavaScript. One of the advantages of using a Forth-style front-end is that the assembler itself can be defined in the language. Also, in contrast to the hundreds of MB of install required to compile "hello world" in gcc or clang, a simple complete Forth-based system could easily be under 16K.

Here is what I've found to be useful over the years as front-end language syntax,

\comment is anything between slashes\
0deafb175- \hex number starting with 0-9, with postfix sign\
builtin \evaluate a built-in symbol\
.symbol \push the value of a symbol on the data stack\
:symbol \pop the top of data stack into symbol\
{ symbol \define a symbol as some code\ }
`symbol \evaluate a defined symbol\
'symbol \push the address of a symbol on the data stack\
"string" \push the address of a string on the data stack\


Note the use of a character prefix to define the usage of each symbol. When editing, I use simple colored syntax highlighting to make code meaning easy to understand visually. It is important to note the data stack is a construct of the interpreter (not code generation). The language works with postfix notation: "a + b" is written "a b +". Now for some examples of parts of writing an x86-64 assembler, starting with the syntax of the assembly,

\somewhere setup the aliased global register allocation\
0 :dst \the symbol "dst" is register 0 or RAX\
1 :address \the symbol "address" is register 1 or RCX\
\then in code use symbols instead of register names or numbers\
.dst .address 10 `@MOV8, \mov rax,[rcx+16]\


The "@MOV8," is used for the 64-bit form of the MOV instruction in the form where the source operand is a load (the @ means load) such as [reg] or [reg+imm8] or [reg+imm32]. The comma in this case "," means write to the output file. The language is really flexible. The above code could be also written as follows,

0 :RAX 1 :RCX
.RAX :dst .RCX :address
.dst .address 10 `@MOV8, \mov rax,[rcx+16]\


Here are the bits of an assembler (written in the front-end language) which could implement the "@MOV8" symbol. I'm writing these in reverse order,

{ @MOV8, 8b `@OP28, } \MOV is opcode 8b then call common @OP2 for 64-bit\

\FOLLOWING CODE REUSED FOR ALL @OP2 FORMS\
\pull args from stack then write opcode\
{ @OP28, :$op :$imm :$rm :$r `REX8, .$op ,1 `@MODrm, }
\generate 64-bit REX prefix\
{ REX8, .$r 1 >> 4 & .$rm 3 >> + 48 + ,1 }
\generate MODrm byte and offset in [r] [r+imm8] [r+imm32] cases\
{ @MODrm, .$imm .@MODrm_0, if0 `#1? .@MODrm_4, if!0 40 :$mod `MODrm, .$imm ,1 }
{ @MODrm_0, 0 :$mod `MODrm, drp; }
{ @MODrm_4, 80 :$mod `MODrm, .$imm ,4 drp; }
\write out MODrm byte\
{ MODrm, .$r 7 & 8 * .$rm 7 & + .$mod + ,1 }
\check if immediate is 8-bit\
{ #1? .$imm dup s1>2 sub }
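
To make the byte-level result of these words concrete, here is a minimal Python sketch of the same encoding logic. It is not the front-end language itself; the function names (rex8, modrm, mov8_load) are hypothetical stand-ins for the REX8, MODrm, and @MOV8 words above. Like the words above, it assumes the base register is not one of the special cases (RSP/R12, which need a SIB byte, or RBP/R13 with no offset).

```python
def rex8(r, rm):
    # 0x48 sets REX.W (64-bit operand); bit 2 extends r, bit 0 extends rm.
    return 0x48 + ((r >> 1) & 4) + (rm >> 3)

def modrm(r, rm, mod):
    # mod is passed pre-shifted (0x00, 0x40, 0x80), as in the words above.
    return ((r & 7) * 8) + (rm & 7) + mod

def fits_in_int8(imm):
    # Same question "#1?" answers: does imm survive 8-bit sign extension?
    return -128 <= imm <= 127

def mov8_load(r, rm, imm):
    """Encode the 64-bit load: mov r,[rm+imm]."""
    out = bytearray([rex8(r, rm), 0x8B])       # REX prefix, then opcode 8b
    if imm == 0:
        out.append(modrm(r, rm, 0x00))         # [rm], no offset bytes
    elif fits_in_int8(imm):
        out.append(modrm(r, rm, 0x40))         # [rm+imm8]
        out += (imm & 0xFF).to_bytes(1, "little")
    else:
        out.append(modrm(r, rm, 0x80))         # [rm+imm32]
        out += (imm & 0xFFFFFFFF).to_bytes(4, "little")
    return bytes(out)

# The earlier example: .dst .address 10 `@MOV8,  (dst = RAX, address = RCX)
print(mov8_load(0, 1, 0x10).hex())
```

The three branches correspond to @MODrm_0, the fall-through 8-bit case, and @MODrm_4 respectively.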


Typically the selection of builtin ops for the front-end would match what the programmer wanted to make it easy to write code. The builtin ops in this case work as follows,

2 ,1 \append 2 as one byte to the memory block of the executable\
2 ,4 \append 2 as one 32-bit word to the memory block of the executable\
.a .b >> \pop a and b from stack as 64-bit integers, push a>>b (shift right)\
.a .b & \... a&b (and)\
.a .b + \... a+b (add)\
{ bob \some defined symbol\ }
.a .bob if0 \if a==0 then call bob\
.a .bob if!0 \if a!=0 then call bob\
drp; \drop top value of return stack, later returns to caller's caller\
s1>2 \sign extend the top of data stack from byte to 2 byte\
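
As a rough illustration of how small such an interpreter can be, here is a Python sketch covering only the numbers, the "." and ":" prefixes, and a few of the builtins listed above. The class name Interp and its fields are hypothetical, little-endian output is assumed (as x86-64 expects), and defined symbols, if0/if!0, and drp; are left out.

```python
import re

class Interp:
    def __init__(self):
        self.stack = []         # interpreter data stack (not generated code)
        self.symbols = {}       # symbol name -> integer value
        self.out = bytearray()  # memory block of the executable being built

    def run(self, source):
        source = re.sub(r"\\[^\\]*\\", " ", source)  # strip \comments\
        for tok in source.split():
            self.step(tok)
        return self

    def step(self, tok):
        s = self.stack
        if tok[0] in "0123456789":           # hex number, optional postfix sign
            value = int(tok.rstrip("-"), 16)
            s.append(-value if tok.endswith("-") else value)
        elif tok[0] == "." and len(tok) > 1:  # push the value of a symbol
            s.append(self.symbols[tok[1:]])
        elif tok[0] == ":" and len(tok) > 1:  # pop top of stack into a symbol
            self.symbols[tok[1:]] = s.pop()
        elif tok in (">>", "&", "+"):
            b = s.pop()
            a = s.pop()
            s.append(a >> b if tok == ">>" else a & b if tok == "&" else a + b)
        elif tok in (",1", ",4"):             # append 1 or 4 bytes to the output
            n = int(tok[1])
            mask = (1 << (8 * n)) - 1
            self.out += (s.pop() & mask).to_bytes(n, "little")
        else:
            raise ValueError("unknown token: " + tok)

# The body of REX8, run with $r = 8 (R8) and $rm = 1 (RCX).
it = Interp().run("8 :$r 1 :$rm .$r 1 >> 4 & .$rm 3 >> + 48 + ,1")
print(it.out.hex())
```

Because every token is classified by its first character, the whole dispatch is a single if/elif chain with no separate parser.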


Writing code for this kind of system becomes more of an exercise in building up the words (or symbols) of a language in which the programmer would like to solve a problem, and then writing sentences and paragraphs of code in that language. For instance, here is an example implementation of a simple data stack,

0 :stk \stack top pointer in RAX\
{ Stk+ .stk 8 `#ADD8, }
{ Stk- .stk 8- `#ADD8, }
{ @Stk0 \reg\ .stk 0 `@MOV8, }
{ @Stk1 \reg\ .stk 8- `@MOV8, }
{ !Stk \reg\ .stk 0 `!MOV8, }


And a simple program to implement something which takes two 64-bit values off the stack, adds them, and pushes the result (all using the above interface),

1 :stk \later decided data stack pointer is in RCX\
.tempA `@Stk0 .tempB `@Stk1 .tempA .tempB `ADD8, `Stk- .tempA `!Stk
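
To show what this line would expand to, here is a small Python sketch that prints the instructions as text rather than bytes. Several things are assumptions for illustration only: tempA and tempB are never bound above, so they are placed in RDX and RBX here; ADD8, is taken to mean "add first operand, second operand"; and the helper names are hypothetical.

```python
REG_NAMES = ["rax", "rcx", "rdx", "rbx"]  # x86-64 register numbers 0 to 3

def mem(reg, offset):
    # Format a memory operand such as [rcx] or [rcx-8].
    name = REG_NAMES[reg]
    return f"[{name}]" if offset == 0 else f"[{name}{offset:+d}]"

def add_top_two(sym):
    """Expand the example line using the current register aliases in sym."""
    r = lambda key: REG_NAMES[sym[key]]
    stk = sym["stk"]
    return [
        f"mov {r('tempA')},{mem(stk, 0)}",   # .tempA `@Stk0
        f"mov {r('tempB')},{mem(stk, -8)}",  # .tempB `@Stk1
        f"add {r('tempA')},{r('tempB')}",    # .tempA .tempB `ADD8,
        f"add {REG_NAMES[stk]},-8",          # `Stk-
        f"mov {mem(stk, 0)},{r('tempA')}",   # .tempA `!Stk
    ]

# stk rebound to register 1 (RCX), as in the example above.
for line in add_top_two({"stk": 1, "tempA": 2, "tempB": 3}):
    print(line)
```

Changing the single "stk" binding changes every instruction that touches the stack pointer, which is the point of aliasing registers through symbols.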


This is only starting to scratch the surface of how such a front-end can be used to generate code...