20130808 - Runtime Recompile Reloaded


A previous post described a technique which used shared objects (Linux) or dynamic link libraries (Windows). That technique required copying the global data on reload so that fast RIP relative addressing could be used on x86-64. Here is something better, something much closer to the metal.

Bypassing DLLs and SOs
This new technique is simple. Instead of using a DLL/SO, just generate a raw binary and load the binary to a fixed memory address in the address space of the application. Set up an uninitialized global data section at another fixed memory address. The binary can return to the loader, which can then load a new binary and call back in, keeping the existing data. Fast instant edit/compile/test loop!

The x86-64 ISA pays an extra instruction byte for 32-bit absolute addressing (the SIB byte). RIP (instruction) relative addressing is one byte shorter and still has a +/- 2GB range when using a 32-bit immediate offset. Given that DLLs/SOs are loaded at a variable offset at runtime, code in a DLL/SO which reads globals from a fixed memory segment would incur that extra byte tax on each such instruction. Avoiding DLLs/SOs and going direct to raw binaries (loaded at fixed addresses) ensures that globals can be fetched with the shorter RIP relative addressing.
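
To make the one byte difference concrete, here is a small sketch showing the two encodings of a 32-bit load into eax. The array and helper names are made up for illustration, and the target address 0x40000000 is just an example value.

```c
#include <stddef.h>

// mov eax, [rip+disp32]: opcode 8B, ModRM 05, then a 32-bit displacement.
static const unsigned char mov_rip_relative[] = {
  0x8B, 0x05, 0x00, 0x00, 0x00, 0x00
};

// mov eax, [abs32]: opcode 8B, ModRM 04, SIB 25, then a 32-bit address.
// The SIB byte (0x25) is the extra byte; the address here is 0x40000000
// stored little-endian.
static const unsigned char mov_absolute[] = {
  0x8B, 0x04, 0x25, 0x00, 0x00, 0x00, 0x40
};

// Hypothetical helper: bytes saved per global access with RIP relative.
static int bytes_saved_per_access(void) {
  return (int)(sizeof mov_absolute - sizeof mov_rip_relative);
}
```

Disassembling either byte sequence with objdump is an easy way to confirm the encodings on a given toolchain.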

Another side effect of the fixed address is that pointers to code and to data in the fixed global data section safely fit in 32 bits. Pointers returned by the OS and external libs can still require 64 bits.
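
As a minimal sketch of what 32-bit pointers could look like in practice, assuming everything pointed to lives below 4GB (as the fixed segments at 0x40000000 do). The names Ptr32, ptr32_from, ptr32_to, and Node32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical 32-bit pointer type; valid only for addresses below 4GB,
// such as anything inside the fixed segment at 0x40000000.
typedef uint32_t Ptr32;

// Narrow a real pointer to 32 bits, checking that nothing is lost.
static Ptr32 ptr32_from(const void* p) {
  uintptr_t v = (uintptr_t)p;
  assert(v <= UINT32_MAX);
  return (Ptr32)v;
}

// Widen back to a real pointer.
static void* ptr32_to(Ptr32 p) { return (void*)(uintptr_t)p; }

// Example node which links with a 4-byte pointer instead of an 8-byte one.
typedef struct { Ptr32 next; uint32_t value; } Node32;
```

The assert in ptr32_from is there to catch a pointer from the OS or an external library slipping into a 32-bit slot by mistake.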

The Basics
This leverages gcc features (gcc is easy to get on Windows via a MinGW install). First, an example loader. The examples here are for Linux because they are easier to understand. Start with a mini shell application which loads a binary. Use mmap() or VirtualAlloc() to map memory at a fixed address with READ, WRITE, and EXECUTE permissions. On Linux, 0x40000000 works well. On Linux x86-64, mappings which are 2MB aligned and a multiple of 2MB in size can get auto promoted to large pages (when transparent huge pages are enabled), which reduces system overhead (less TLB pressure, less page table memory, etc). This example follows that practice. Only the first load is covered here. Loading a new binary again can easily be done when the StartF entry function returns.

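#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

typedef void (*StartF)(void);

int main(void) {
  // FIRST MAP MEMORY AT FIXED ADDRESS (FD IS -1 FOR MAP_ANONYMOUS)
  int* adr = (int*)mmap((void*)0x40000000,2*1024*1024,
    PROT_READ|PROT_WRITE|PROT_EXEC,
    MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,-1,0);
  if(adr == MAP_FAILED) { perror("mmap"); return 1; }

  // PERSISTENT FIXED DATA SEGMENT STARTS AT 1MB IN
  int* adr2 = adr + (0x100000/4);
  adr2[0]=1234;
  printf("%d\n",adr2[0]);

  // LOAD THE BINARY FROM A FILE (AT MOST THE 1MB BELOW THE DATA SEGMENT)
  int h=open("Binary.LinuxI64",O_RDONLY);
  if(h < 0) { perror("open"); return 1; }
  ssize_t n=read(h,(void*)adr,0x100000);
  close(h);
  if(n <= 0) { fprintf(stderr,"read failed or binary is empty\n"); return 1; }

  // CALL THE ENTRY FUNCTION
  ((StartF)adr)();
  printf("%d\n",adr2[0]);
  return 0;
}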
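
The reload step which is not covered above could be sketched roughly as follows. This is only one possible shape: load_binary and run_reloading are hypothetical names, and the sketch assumes the binary returns from its entry function whenever it wants the loader to pick up a freshly compiled file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

typedef void (*StartF)(void);

// Hypothetical helper: copy up to max bytes of the file at path into dst.
// Returns the number of bytes loaded, or -1 on failure.
static long load_binary(const char* path, void* dst, size_t max) {
  int h = open(path, O_RDONLY);
  if (h < 0) return -1;
  size_t total = 0;
  while (total < max) {
    ssize_t n = read(h, (char*)dst + total, max - total);
    if (n < 0) { close(h); return -1; }
    if (n == 0) break;  // end of file
    total += (size_t)n;
  }
  close(h);
  return (long)total;
}

// Hypothetical reload loop: each time the entry function returns, load the
// newest binary over the code region and call back in. Only the first
// code_max bytes are overwritten, so data beyond them survives the reload.
static void run_reloading(const char* path, void* code, size_t code_max) {
  while (load_binary(path, code, code_max) > 0) {
    ((StartF)code)();
  }
}
```

With the layout used in this post, the call would be something like run_reloading("Binary.LinuxI64", (void*)0x40000000, 0x100000) after the mmap(). The loop ends only when the file is missing or empty, so a real loader would likely want an explicit way for the binary to request exit.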


Generating the Binary from C Code
As in the previous post (and not shown in this example), the loader would pass a pointer to dlsym() or GetProcAddress() to the binary so that the binary can fetch pointers to symbols in libraries like GL or X11 which get linked to the loader. The binary itself gets linked to nothing. Here is an example program, where the fixed memory segment is called ".ram".

// THIS MUST BE THE FIRST LINE OF THE FILE
__asm__ ("jmp start\n");
// DEFINE LAYOUT OF GLOBAL MEMORY
typedef struct { unsigned int a; } RamT;
// DECLARE UNINITIALIZED GLOBALS IN A USER DEFINED SECTION
static RamT ram __attribute__((section(".ram")));
// ENTRY POINT OF PROGRAM
void start(void) { ram.a=5678; }
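
The symbol hand-off mentioned earlier (and left out of the example program) might look something like the sketch below. Everything here is an assumption made for illustration: the GetSymF and StartWithSymsF signatures, the host_add stand-in for a library function, and the lookup in loader_get_sym, which takes the place of a real dlsym() or GetProcAddress() call so the sketch needs no external libraries.

```c
#include <stddef.h>
#include <string.h>

// Assumed signatures (not from the original loader): a resolver shaped like
// a one-argument dlsym() wrapper, and an entry point which receives it.
typedef void* (*GetSymF)(const char* name);
typedef void (*StartWithSymsF)(GetSymF getSym);

// Stand-in for a function living in a library linked to the loader.
static int host_add(int a, int b) { return a + b; }

// Loader side: a real loader would call dlsym() or GetProcAddress() here.
static void* loader_get_sym(const char* name) {
  if (strcmp(name, "host_add") == 0) return (void*)host_add;
  return NULL;
}

// Binary side: fetch function pointers once on entry, then call through them.
static int (*p_add)(int, int);
static int last_result;
static void start_with_syms(GetSymF getSym) {
  p_add = (int (*)(int, int))getSym("host_add");
  if (p_add) last_result = p_add(2, 3);
}
```

In the real setup the two sides live in separate files: the loader would cast the load address to StartWithSymsF and pass its resolver, and the binary would only ever see the function pointer it was handed.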


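Unlike standard C, the binary carries no zero-initialization for the "ram" structure. On first load it contains whatever the loader left in that memory (a fresh anonymous mmap() is zero-filled by the kernel, but anything the loader wrote there, such as the 1234 above, stays), so the loader must zero the section manually if the program depends on it.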
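
A loader that wants C-like zeroed globals on the first load, without wiping state on later reloads, could do something along these lines. The helper name prepare_ram and the ram_ready flag are hypothetical.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical loader-side helper: zero the fixed data segment before the
// first load only, so later reloads keep whatever the program stored there.
static int ram_ready = 0;
static void prepare_ram(void* ram, size_t size) {
  if (!ram_ready) {
    memset(ram, 0, size);
    ram_ready = 1;
  }
}
```

The loader would call this on the data segment (0x40100000 in this post's layout) right before each call into the entry function.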

Compiling is done directly to an object file. For example,

gcc MyProgram.c -o MyProgram.o -c -std=gnu99 -O3 -fomit-frame-pointer -msse -msse2 -msse3 -march=nocona -ffast-math -mno-ieee-fp -mfpmath=sse -fno-exceptions -fno-asynchronous-unwind-tables -fno-zero-initialized-in-bss -nostdlib

Inspecting the sections of this object file can be done via,

objdump -h MyProgram.o

The magic happens at link time. Leverage GNU ld to link to a raw binary using a linker script,

ld -TLink.LinuxI64 MyProgram.o

Here is the Link.LinuxI64 linker script which works for this example,

OUTPUT(Binary.LinuxI64)
OUTPUT_FORMAT(binary)
OUTPUT_ARCH(i386:x86-64)
SECTIONS
{
. = 0x40000000;
.text : { *(.text); *(.data); *(.rodata*); *(.bss); }
. = 0x40100000;
.ram : { *(.ram); }
/DISCARD/ : { *(.note*); *(.comment); }
}


Inspecting the disassembly of Binary.LinuxI64 can be done via,

objdump -D -b binary -mi386:x86-64 Binary.LinuxI64