Decoding ELF: How Linux Actually Loads Your Binary

You type ./a.out. The kernel reads four magic bytes, parses a header, maps a few memory regions, jumps to an entry point, and your program is running. "Executable" is not magic — it's a file format with very specific structure.

That format is ELF, the Executable and Linkable Format.

The Four Bytes

Every ELF file starts with \x7fELF. Run this on any binary:

xxd /bin/ls | head -1
# 00000000: 7f45 4c46 0201 0103 0000 0000 0000 0000  .ELF............

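The kernel checks these four bytes during execve(). It offers the file to each registered binary-format handler in turn: the ELF handler claims files that start with this magic, and the script handler claims files that start with a shebang (#!). If no handler accepts the file, execve() fails with ENOEXEC.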
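
The magic check itself is small enough to sketch in C. This is a user-space illustration, not the kernel's code; has_elf_magic is a hypothetical helper name, while ELFMAG and SELFMAG are the constants from <elf.h>.

```c
#include <elf.h>      /* ELFMAG ("\177ELF") and SELFMAG (4) */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if buf starts with the four-byte ELF magic.
 * has_elf_magic is a hypothetical name used for illustration. */
static bool has_elf_magic(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}
```

Feeding it the first bytes of /bin/ls should return true; feeding it the first bytes of a shell script should return false.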

Two Views of the Same File

ELF was designed to serve both the linker and the loader, so it has two parallel tables:

  • Section headers — the linker's view: .text, .data, .bss, .rodata, .symtab, .rela.dyn, etc. Used to combine and relocate.
  • Program headers — the loader's view: LOAD, DYNAMIC, INTERP, GNU_STACK, NOTE. Tells the kernel what to map.

The loader doesn't care about sections. It only reads program headers.

ELF File
├─ ELF header  (e_entry, e_phoff, e_shoff)
├─ Program headers  → used by execve()
├─ Sections
│  ├─ .text
│  ├─ .rodata
│  ├─ .data
│  └─ .bss
└─ Section headers  → used by ld, gdb, objdump
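
The header fields in the diagram can be read with the Elf64_Ehdr struct from <elf.h>. The sketch below assumes a 64-bit ELF whose byte order matches the host (little-endian on x86-64); read_ehdr64 is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Copies the ELF64 header out of buf after basic validation.
 * Returns 0 on success, -1 if buf is too short, lacks the magic,
 * or is not a 64-bit ELF. Hypothetical helper for illustration. */
static int read_ehdr64(const void *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);   /* copy avoids unaligned access */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}
```

After a successful call, out->e_entry, out->e_phoff and out->e_shoff hold the entry point and the file offsets of the two header tables.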

Inspecting Headers

readelf -h /bin/ls
# Type:        DYN (Position-Independent Executable)
# Entry point address: 0x6ab0

readelf -l /bin/ls
# LOAD  0x000000  R E   0x18000
# LOAD  0x018000  RW    0x06000
# INTERP /lib64/ld-linux-x86-64.so.2

readelf -l prints the program headers — exactly what the kernel will read.
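
Walking the program header table by hand looks roughly like this. It is a minimal sketch that assumes the whole file is already in memory as a 64-bit, host-endian ELF with a valid magic; count_phdrs is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Counts program headers of the given p_type (e.g. PT_LOAD) in an
 * in-memory ELF64 image. Returns -1 if the table does not fit in len.
 * Hypothetical helper; the magic is assumed to be checked already. */
static int count_phdrs(const unsigned char *img, size_t len, Elf64_Word type)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);

    /* The table is e_phnum entries of e_phentsize bytes at e_phoff. */
    if (eh.e_phentsize < sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        (size_t)eh.e_phnum * eh.e_phentsize > len - eh.e_phoff)
        return -1;

    int n = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, img + eh.e_phoff + i * eh.e_phentsize, sizeof ph);
        if (ph.p_type == type)
            n++;
    }
    return n;
}
```

For the readelf -l output above, count_phdrs(img, len, PT_LOAD) would count one entry per LOAD line.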

What execve() Does

A simplified pseudocode of the kernel side:

execve("/bin/ls")
  read ELF header (64 bytes for ELF64)
  validate magic, class, machine
  read program headers
  for each PT_LOAD:
      mmap(file, offset, vaddr, size, prot)
  if PT_INTERP exists:
      load /lib64/ld-linux-x86-64.so.2 the same way
      jump to dynamic linker's entry point
  else:
      jump to e_entry

No bytes are copied into RAM eagerly. mmap sets up the mapping; pages fault in lazily when first accessed. That's why a 50 MB binary can start about as quickly as a small one: you pay only for the pages you actually touch, not for reading the whole file up front.
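
Two small pieces of the mmap step can be sketched in user-space C: turning a segment's p_flags into mmap protection bits, and rounding an address down to a page boundary, since mappings are made in whole pages. Both helper names are hypothetical; the PF_* and PROT_* constants come from <elf.h> and <sys/mman.h>.

```c
#include <elf.h>
#include <stdint.h>
#include <sys/mman.h>

/* Translates ELF segment permission bits (p_flags) into mmap() prot bits.
 * Hypothetical helper for illustration. */
static int prot_from_pflags(uint32_t p_flags)
{
    int prot = PROT_NONE;
    if (p_flags & PF_R) prot |= PROT_READ;
    if (p_flags & PF_W) prot |= PROT_WRITE;
    if (p_flags & PF_X) prot |= PROT_EXEC;
    return prot;
}

/* Rounds addr down to the start of its page.
 * page_size must be a power of two (4096 is common on x86-64). */
static uint64_t page_floor(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}
```

A segment printed by readelf as "R E" has PF_R | PF_X set, which maps to PROT_READ | PROT_EXEC.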

Static vs Dynamic

Statically linked: no PT_INTERP segment. The kernel jumps straight to e_entry, which by convention is _start. Your binary contains every function it calls.

Dynamically linked: a PT_INTERP segment names the dynamic linker (typically ld-linux.so). The kernel maps it alongside your binary and starts it instead of your code; the linker then loads libc.so.6, libm.so.6, etc., resolves symbols, and only then jumps to your code.

The dynamic linker is itself an ELF, and a clever one: it has no PT_INTERP of its own and is designed to relocate itself and bootstrap before any libraries are available.
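
Telling the two cases apart comes down to looking for a PT_INTERP entry. The sketch below returns the interpreter path stored in the file, or NULL when there is none. It assumes a 64-bit, host-endian ELF image held fully in memory; find_interp is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the PT_INTERP path inside img, or NULL if the
 * image has no PT_INTERP or is malformed. Hypothetical helper. */
static const char *find_interp(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, img, sizeof eh);
    if (eh.e_phentsize < sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        (size_t)eh.e_phnum * eh.e_phentsize > len - eh.e_phoff)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, img + eh.e_phoff + i * eh.e_phentsize, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        /* The path must lie inside the file and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            ph.p_filesz > len - ph.p_offset ||
            img[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)img + ph.p_offset;
    }
    return NULL;
}
```

On a typical dynamically linked x86-64 binary this would point at a string such as /lib64/ld-linux-x86-64.so.2; on a statically linked one it would return NULL.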

The .bss Trick

Uninitialized globals don't exist in the file:

int big_buffer[1024 * 1024];   // 4 MB of zeros

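This takes ~zero bytes on disk. The program header records a memory size (p_memsz) larger than the file size (p_filesz); the kernel backs the difference with anonymous, zero-filled pages that are allocated on first access. Your file stays small; your runtime gets the full 4 MB (assuming a 4-byte int).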

Compare:

int also_big[1024 * 1024] = { 1 };   // initialized

This goes in .data and is in the file — 4MB on disk.
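
The size of that zero-filled region can be read straight off a program header: it is the gap between the segment's size in memory and its size in the file. A minimal sketch, with zero_fill_bytes as a hypothetical helper name:

```c
#include <elf.h>
#include <stdint.h>

/* Bytes of a PT_LOAD segment that exist only in memory. Everything past
 * p_filesz up to p_memsz is zero-filled at load time, which is where
 * .bss ends up. Hypothetical helper for illustration. */
static uint64_t zero_fill_bytes(const Elf64_Phdr *ph)
{
    if (ph->p_type != PT_LOAD || ph->p_memsz < ph->p_filesz)
        return 0;
    return ph->p_memsz - ph->p_filesz;
}
```

For a binary containing big_buffer, the writable LOAD segment should report at least 4 MB here, while the file itself stays small.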

PIE and ASLR

Most modern Linux distributions configure their compilers to build Position-Independent Executables by default. The ELF type is DYN, not EXEC. The base address is randomized at load time by ASLR, so each run typically sees /bin/ls at a different virtual address.

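This works because code on x86-64 reaches its own functions and data with RIP-relative addressing, so the instructions themselves need no patching. The few absolute pointers that remain (for example in initialized data and the GOT) are fixed up at startup through relative relocations.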
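
The relationship between addresses in the file and addresses at run time can be sketched as follows. Both helper names are hypothetical, and load_base stands for whatever base the kernel happened to pick for this run.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* ET_DYN images (PIE executables and shared objects) can be placed at
 * any base; ET_EXEC images must sit at their link-time addresses.
 * Hypothetical helper for illustration. */
static bool loads_at_any_base(const Elf64_Ehdr *eh)
{
    return eh->e_type == ET_DYN;
}

/* Runtime address of a link-time virtual address, given the base chosen
 * for this run. For ET_EXEC the base is effectively 0. */
static uint64_t runtime_addr(uint64_t load_base, uint64_t vaddr)
{
    return load_base + vaddr;
}
```

With the entry point 0x6ab0 shown earlier and an example base of 0x555555554000, runtime_addr gives 0x55555555aab0; a different run would use a different base.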

Common Inspection Commands

Command            What it shows
file ./a.out       Quick summary (ELF, dynamic, stripped)
readelf -h         ELF header
readelf -l         Program headers (loader view)
readelf -S         Section headers (linker view)
readelf -d         Dynamic section (runtime deps, RPATH)
nm / objdump -t    Symbol table
ldd ./a.out        Resolved shared library paths
objdump -D         Full disassembly
strip ./a.out      Remove symbols and debug info

A Single-Page Mental Model

  execve("./hello")
     → kernel reads ELF header
     → maps PT_LOAD segments into the process
     → if PT_INTERP, maps ld.so as well
     → jumps to entry (ld.so or _start)
  ld.so
     → maps libc, resolves symbols
     → jumps to _start in your binary
  _start → __libc_start_main → main

That's the whole pipeline. Everything else is detail.

Takeaways

  • ELF is a file format, not an abstraction — four magic bytes and a header.
  • The loader only reads program headers; sections are for the linker.
  • mmap-and-fault is what makes binary startup fast.
  • Dynamic linking is a bootstrapping exercise: ld.so is an ELF that loads ELFs.
  • readelf is your microscope. Use it.