20140420 - Minimal x86-64 ELF Header for Dynamic Loading


It is possible to get the Linux x86-64 ELF overhead down to 495 bytes while still including enough information to import one symbol, dlsym(), which is the only symbol required to manually load whatever else is needed at runtime. This is an important step in reducing the work required of those building custom languages. Below is a readelf -a dump of a test binary.

ReadELF
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1f0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  INTERP         0x00000000000001c4 0x00000000000001c4 0x00000000000001c4
                 0x000000000000001a 0x000000000000001a  RWE    1
      [Requesting program interpreter: /lib/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000361 0x0000000000000361  RWE    200000
  DYNAMIC        0x00000000000000e8 0x00000000000000e8 0x00000000000000e8
                 0x0000000000000080 0x0000000000000080  RWE    8

Dynamic section at offset 0xe8 contains 8 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000004 (HASH)               0x1b0
 0x0000000000000005 (STRTAB)             0x1dd
 0x0000000000000006 (SYMTAB)             0x168
 0x0000000000000007 (RELA)               0x198
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x0000000000000000 (NULL)               0x0

There are no relocations in this file.

There are no unwind sections in this file.

Histogram for bucket list length (total of 1 buckets):
 Length  Number     % of total  Coverage
      0  0          (  0.0%)
      1  1          (100.0%)    100.0%

No version information found in this file.

Details
Everything is aliased into one read/write/execute binary blob, with all the ELF-related data at the beginning of the blob. The ordering of the ELF-related data is {ELF header, program headers, dynamic section, symbol table, relocation table, hash table, interpreter string, dynamic string table}.
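As a sanity check on that ordering, here is a small Python sketch that recomputes the file offsets from the standard ELF64 structure sizes and compares them with the readelf dump above. The exact byte contents of the dynamic string table are my assumption (the dump only gives its start address), and the helper name layout() is made up for illustration.

```python
# Recompute where each piece of the blob starts, in file order.
INTERP = b"/lib/ld-linux-x86-64.so.2\0"
# Assumed string table contents: empty string, library name, symbol name.
STRTAB = b"\0libdl.so.2\0dlsym\0"

PIECES = [
    ("ELF header", 64),                       # sizeof(Elf64_Ehdr)
    ("program headers", 3 * 56),              # 3 x sizeof(Elf64_Phdr)
    ("dynamic section", 8 * 16),              # 8 x sizeof(Elf64_Dyn)
    ("symbol table", 2 * 24),                 # 2 x sizeof(Elf64_Sym)
    ("relocation table", 1 * 24),             # 1 x sizeof(Elf64_Rela)
    ("hash table", 5 * 4),                    # 5 x 32-bit words
    ("interpreter string", len(INTERP) - 1),  # its NUL is shared with STRTAB
    ("dynamic string table", len(STRTAB)),
]

def layout(pieces):
    """Return ({name: offset}, total size) for pieces packed back to back."""
    offsets, pos = {}, 0
    for name, size in pieces:
        offsets[name] = pos
        pos += size
    return offsets, pos

offsets, total = layout(PIECES)
for name, off in offsets.items():
    print(f"{off:#06x}  {name}")
print("total overhead:", total, "bytes")
```

Under those assumptions the offsets line up with the DYNAMIC, SYMTAB, RELA, HASH, INTERP, and STRTAB addresses in the dump, and the total matches the 495 byte figure.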

There is no need for section headers as they are redundant and not read by the dynamic loader. There is no PHDR program header. This also uses the SYSV style hash instead of the GNU style hash (same as the --hash-style=sysv option for ld). The hash table can be a simple array of 32-bit words {1,2,1,0,0}, which reads as {nbucket=1, nchain=2, bucket[0]=1, chain[0]=0, chain[1]=0}. So there is just one bucket, and 2 symbols in the file (the undefined symbol at index 0 and dlsym at index 1).
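To show why those five words are enough, here is a rough Python sketch of a SYSV-style hash lookup. sysv_hash() follows the standard ELF hash function from the System V ABI. lookup() and the symbol_names list are simplifications of mine that stand in for the real symbol table and string table.

```python
def sysv_hash(name: bytes) -> int:
    """Standard ELF (System V ABI) symbol hash, kept to 32 bits."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h

# {nbucket, nchain, bucket[0], chain[0], chain[1]}
HASH_TABLE = [1, 2, 1, 0, 0]

# Stand-in for symtab + strtab: index 0 is the mandatory undefined symbol.
symbol_names = [b"", b"dlsym"]

def lookup(table, names, name: bytes) -> int:
    """Return the symbol index for name, or 0 (STN_UNDEF) if absent."""
    nbucket, nchain = table[0], table[1]
    buckets = table[2:2 + nbucket]
    chains = table[2 + nbucket:2 + nbucket + nchain]
    i = buckets[sysv_hash(name) % nbucket]
    while i != 0:
        if names[i] == name:
            return i
        i = chains[i]  # chain[i] == 0 terminates the walk
    return 0

print(lookup(HASH_TABLE, symbol_names, b"dlsym"))
print(lookup(HASH_TABLE, symbol_names, b"dlopen"))
```

With a single bucket every name hashes to bucket 0, so the hash value itself never matters here. The walk starts at symbol 1 and stops as soon as chain[1] yields 0.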

This uses "/lib/ld-linux-x86-64.so.2" as the interpreter string. Note that when messing around with ld and simple assembly programs, ld can easily emit the ELF standard path /lib/ld64.so.1 instead, which does not work on Linux because of a missing symlink, so the --dynamic-linker=/lib/ld-linux-x86-64.so.2 option is required.

This packs the interpreter string followed by the dynamic string table, with the dynamic string table starting at the null terminator of the interpreter string. This one-byte overlap supplies the empty first string that the string table requires.
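The overlap can be illustrated with a few lines of Python. The string table contents and their order are my assumption, since only the start address (0x1dd) is visible in the dump, and cstr() is a made-up helper that reads a NUL-terminated string.

```python
INTERP_OFF = 0x1C4  # file offset of the interpreter string, from the dump

interp = b"/lib/ld-linux-x86-64.so.2\0"
strtab = b"\0libdl.so.2\0dlsym\0"  # assumed contents and order

# Drop the interpreter's terminator: the string table's leading NUL replaces it.
packed = interp[:-1] + strtab
strtab_off = INTERP_OFF + len(interp) - 1

def cstr(buf: bytes, off: int) -> bytes:
    """Read the NUL-terminated string starting at off."""
    return buf[off:buf.index(b"\0", off)]

rel = strtab_off - INTERP_OFF  # string table start, relative to packed
print(hex(strtab_off))         # start of the dynamic string table
print(cstr(packed, 0))         # interpreter path, still properly terminated
print(cstr(packed, rel + 0))   # string table index 0: the empty string
print(cstr(packed, rel + 1))   # library name
print(cstr(packed, rel + 12))  # symbol name
```

The same byte is both the last byte of the INTERP segment (FileSiz 0x1a includes it) and index 0 of the string table.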

This uses PF_X+PF_W+PF_R for p_flags in all the program headers, and loads at virtual address 0 instead of 0x400000 so that file offset = virtual address.
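For reference, the three program headers from the dump could be generated with something like the following Python sketch. The field order is the standard Elf64_Phdr layout and the values are copied from the readelf output above. The helper name phdr() is mine.

```python
import struct

PT_LOAD, PT_DYNAMIC, PT_INTERP = 1, 2, 3
PF_X, PF_W, PF_R = 1, 2, 4
RWX = PF_X | PF_W | PF_R  # 7: every segment is read/write/execute

def phdr(p_type: int, offset: int, size: int, align: int) -> bytes:
    """Pack one Elf64_Phdr with offset == vaddr == paddr and filesz == memsz."""
    return struct.pack(
        "<IIQQQQQQ",
        p_type, RWX,             # p_type, p_flags
        offset, offset, offset,  # p_offset, p_vaddr, p_paddr
        size, size,              # p_filesz, p_memsz
        align,                   # p_align
    )

phdrs = (
    phdr(PT_INTERP, 0x1C4, 0x1A, 1)
    + phdr(PT_LOAD, 0x0, 0x361, 0x200000)
    + phdr(PT_DYNAMIC, 0xE8, 0x80, 8)
)
print(len(phdrs), "bytes of program headers")
```

Because the blob loads at address 0, a single offset value fills p_offset, p_vaddr, and p_paddr, which is what makes the helper so short.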

This does not use DT_BIND_NOW (or the equivalent DF_BIND_NOW bit in DT_FLAGS) in the dynamic section. That seems to work, at least in this case, when the symbol entry for dlsym uses STT_OBJECT instead of STT_FUNC.

readelf fails to print the relocation info, and objdump simply cannot process the binary. The binary's only symbol, dlsym, is set up with STV_DEFAULT, STB_WEAK, and STT_OBJECT. The single relocation entry is an R_X86_64_JUMP_SLOT, which is set up to write a 64-bit address into the binary blob, and that address is used directly. There is no real PLT or GOT. Binding is set up to happen at load time, so the 64-bit address of dlsym is simply used directly after load.
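A sketch of how the symbol and relocation entries might be encoded, in Python. The constants are the standard ELF and x86-64 psABI values. Two things are assumptions of mine: DLSYM_NAME_OFF depends on the string table layout, and DLSYM_SLOT is a hypothetical address for the 64-bit slot, since the post does not say where it lives in the blob.

```python
import struct

STB_WEAK, STT_OBJECT, STV_DEFAULT = 2, 1, 0
SHN_UNDEF = 0
R_X86_64_JUMP_SLOT = 7

def elf64_sym(name_off, bind, typ, visibility, shndx=SHN_UNDEF, value=0, size=0):
    """Pack one Elf64_Sym; st_info holds the binding in the high nibble."""
    st_info = (bind << 4) | (typ & 0xF)
    return struct.pack("<IBBHQQ", name_off, st_info, visibility, shndx, value, size)

def elf64_rela(offset, sym_index, rel_type, addend=0):
    """Pack one Elf64_Rela; r_info is (symbol index << 32) | relocation type."""
    r_info = (sym_index << 32) | rel_type
    return struct.pack("<QQq", offset, r_info, addend)

DLSYM_NAME_OFF = 12  # assumes the string table is "\0libdl.so.2\0dlsym\0"
DLSYM_SLOT = 0x1F8   # hypothetical location of the pointer the loader fills in

symtab = (
    elf64_sym(0, 0, 0, 0)  # index 0: the mandatory all-zero undefined symbol
    + elf64_sym(DLSYM_NAME_OFF, STB_WEAK, STT_OBJECT, STV_DEFAULT)
)
rela = elf64_rela(DLSYM_SLOT, 1, R_X86_64_JUMP_SLOT)
print(len(symtab), len(rela))
```

After loading, the program would read the 8 bytes at DLSYM_SLOT and call through them. That slot is the whole "GOT".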

Note that other relocation types simply don't work, and I'm not sure why. Using "LD_DEBUG=all ./a.out" to debug shows the following when things work:

relocation processing: ./a.out (lazy)
3306:     symbol=dlsym;  lookup in file=./a.out [0]
3306:     symbol=dlsym;  lookup in file=/lib/libdl.so.2 [0]
3306:     binding file ./a.out [0] to /lib/libdl.so.2 [0]: normal symbol `dlsym'

When things fail, it instead shows something like "binding file ./a.out [0] to ./a.out [0]" (binding to itself). From what I can remember, attempting to switch the relocation to R_X86_64_64 or R_X86_64_GLOB_DAT always hit the fail case.

Fail
My new favorite measure of fail in system engineering is the ratio of the garbage required to do something vs the minimal amount of stuff actually required. In this case binaries could be as simple as {a 64-bit pointer to dlsym() at address 0 that gets filled in at run time, program entry at address 8, and then the rest of the program}. By this metric, with 495 bytes of overhead against the 8 bytes actually needed, ELF has a fail ratio of roughly 62x.