Memory Barriers Demystified: Acquire, Release, and Why They Matter

When you write x = 1; y = 2; in C, you assume the two stores happen in that order. They may not. Both the compiler and the CPU are free to reorder them whenever doing so is faster and "observably" equivalent for a single thread.

For multi-threaded code, observably equivalent is a lie. Memory barriers are how you tell the compiler and CPU: "no reordering past this point."

Two Reorderings, One Problem

The compiler reorders at compile time. The optimizer rearranges instructions for register allocation and instruction scheduling, and it may merge or eliminate loads and stores outright.

The CPU reorders at runtime. Out-of-order execution dispatches independent instructions in parallel, and store buffers make writes visible to other cores some time after the instruction retires.

Both are invisible from a single thread — the result looks sequential to the thread doing the work. From another core, all bets are off.
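
To make the split concrete, here is a minimal sketch using the GCC/Clang fence builtins. The names payload, flag, publish_compiler_only, and publish_with_fence are made up for illustration. __atomic_signal_fence constrains only the compiler, while __atomic_thread_fence constrains the compiler and also emits whatever barrier instruction the target needs.

```c
#include <assert.h>

static int payload;   /* hypothetical data being published */
static int flag;      /* hypothetical "payload is ready" flag */

/* Compiler-only barrier: the optimizer may not move memory accesses
 * across it, but no fence instruction is emitted, so a weakly ordered
 * CPU can still make the two stores visible out of order. */
void publish_compiler_only(int value) {
    payload = value;
    __atomic_signal_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
}

/* Compiler and CPU barrier: the release fence orders the payload store
 * before the flag store on every target (typically no instruction on
 * x86-64, a dmb on ARM). */
void publish_with_fence(int value) {
    payload = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
}

/* Single-threaded smoke test: both variants look identical from the
 * thread doing the work, which is exactly why the bug is easy to miss. */
void smoke_test(void) {
    publish_compiler_only(7);
    assert(payload == 7 && __atomic_load_n(&flag, __ATOMIC_RELAXED) == 1);
    __atomic_store_n(&flag, 0, __ATOMIC_RELAXED);
    publish_with_fence(9);
    assert(payload == 9 && __atomic_load_n(&flag, __ATOMIC_RELAXED) == 1);
}
```

The compiler-only version is enough for a signal handler running on the same thread. For a reader on another core, only the second version is safe.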

The Classic Bug

c
int data;
int ready = 0;

// Producer
data = 42;
ready = 1;

// Consumer
while (!ready) ;
printf("%d\n", data);  // could print 0 — not 42

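Without barriers, the consumer can observe the producer's two stores in reverse order: it sees ready == 1 but data == 0. Worse, because ready is a plain int, the compiler is allowed to read it once and hoist the load out of the loop, so the consumer may spin forever.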

The Memory Order Vocabulary

C11 defines six memory orders, and the GCC/Clang __atomic builtins used here take the same six. The two that matter most:

c
__atomic_store_n(&ready, 1, __ATOMIC_RELEASE);   // producer
while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) ;  // consumer

Release on a store: nothing before it (in program order) can be reordered after it. "Everything I did is now visible."

Acquire on a load: nothing after it can be reordered before it. "Everything that was published is now visible to me."

Pair them and the producer's data = 42 is guaranteed visible by the time the consumer sees ready == 1.

Producer                 Consumer
data = 42                load(ready, ACQUIRE) → 1
store(ready, RELEASE)    use data — guaranteed 42
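
For a complete version of this handoff, here is a sketch that swaps the builtins for the equivalent C11 <stdatomic.h> calls and runs the two sides on POSIX threads. The function names producer, consumer, and run_handoff are illustrative, and pthreads is an assumption about the platform.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data;          /* plain payload, written before publication */
static atomic_int ready;  /* publication flag */

static void *producer(void *arg) {
    (void)arg;
    data = 42;                                               /* ordinary store */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* publish */
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;             /* spin until the flag is published */
    *out = data;      /* ordered after the acquire load that saw 1 */
    return NULL;
}

/* Runs one handoff and returns the value the consumer read. */
int run_handoff(void) {
    pthread_t p, c;
    int seen = 0;
    int rc;

    data = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Calling run_handoff() from main should return 42 every time. Downgrading either side to memory_order_relaxed removes that guarantee, even if the failure never shows up on an x86 test machine.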

The Six Orders, Briefly

Order     Use case
RELAXED   Counters, statistics. Atomicity only; no ordering guarantees.
CONSUME   Rare; subtle; compilers usually treat it as ACQUIRE.
ACQUIRE   Loads that observe what a RELEASE store published.
RELEASE   Stores that publish everything written before them.
ACQ_REL   Read-modify-write that does both (CAS, fetch_add).
SEQ_CST   Default in <stdatomic.h>; one total order over all SEQ_CST operations. Slowest, safest.
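
As an illustration of the RELAXED row, here is a sketch of a hit counter shared by several threads. The names hits, worker, and count_hits are hypothetical, and pthreads is assumed. RELAXED still makes each increment atomic, so no updates are lost; it just promises nothing about how the counter is ordered against other memory.

```c
#include <assert.h>
#include <pthread.h>

#define HITS_PER_THREAD 100000
#define MAX_THREADS 16

static long hits;  /* touched only through the __atomic builtins */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < HITS_PER_THREAD; i++)
        __atomic_fetch_add(&hits, 1, __ATOMIC_RELAXED);  /* atomic, unordered */
    return NULL;
}

/* Starts nthreads workers and returns the final counter value. */
long count_hits(int nthreads) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    __atomic_store_n(&hits, 0, __ATOMIC_RELAXED);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);

    /* pthread_join already orders this read after every worker's writes. */
    return __atomic_load_n(&hits, __ATOMIC_RELAXED);
}
```

The final read can be RELAXED because pthread_join supplies the synchronization. The counter itself never needs to publish anything else.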

What the CPU Actually Does

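On x86-64, regular loads and stores are already strongly ordered: the only reordering the hardware allows is a later load passing an earlier store to a different address (store-load reordering). So ACQUIRE loads and RELEASE stores compile to plain mov instructions and cost nothing extra at the CPU level, although they still constrain the compiler. A SEQ_CST store has to close that store-load gap, so it becomes an xchg (an implicitly locked instruction) or a mov followed by mfence.

ARM and POWER are genuinely weakly ordered. On ARMv8 (AArch64), an ACQUIRE load becomes ldar and a RELEASE store becomes stlr. SEQ_CST loads and stores typically use the same two instructions, which are ordered with respect to each other, while a standalone SEQ_CST fence becomes a full dmb ish. ARMv7 and POWER have no such instructions, so ordering is built from explicit barriers (dmb on ARMv7; lwsync and sync on POWER), and costs scale with strength.

x86-64:   acquire/release      → plain mov (zero-cost)
          seq_cst store        → xchg, or mov + mfence (~10ns)

ARMv8:    acquire load         → ldar
          release store        → stlr
          seq_cst load/store   → ldar / stlr
          seq_cst fence        → dmb ish (~3-15ns)

This is why "just use SEQ_CST everywhere" can be a real performance hit on weakly ordered hardware: every SEQ_CST fence, and on ARMv7 and POWER every SEQ_CST access, pays for a full barrier.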
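
One way to check the mapping on your own toolchain is to compile a few one-line wrappers and read the assembly. The sketch below uses hypothetical names (slot, store_release, and so on), and the instructions in the comments are what GCC and Clang typically emit at -O2, not a guarantee.

```c
#include <assert.h>

static int slot;  /* hypothetical shared variable */

/* x86-64: plain mov.  AArch64: stlr. */
void store_release(int v) { __atomic_store_n(&slot, v, __ATOMIC_RELEASE); }

/* x86-64: xchg, or mov + mfence.  AArch64: stlr. */
void store_seq_cst(int v) { __atomic_store_n(&slot, v, __ATOMIC_SEQ_CST); }

/* x86-64: plain mov.  AArch64: ldar (ldapr if the RCpc extension is enabled). */
int load_acquire(void) { return __atomic_load_n(&slot, __ATOMIC_ACQUIRE); }

/* x86-64: plain mov.  AArch64: ldar. */
int load_seq_cst(void) { return __atomic_load_n(&slot, __ATOMIC_SEQ_CST); }

/* x86-64: mfence or a locked instruction.  AArch64: dmb ish. */
void fence_seq_cst(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }

/* Single-threaded smoke test: every variant stores and loads the same
 * values; only the generated instructions differ. */
void smoke_test(void) {
    store_release(1);
    assert(load_acquire() == 1);
    store_seq_cst(2);
    fence_seq_cst();
    assert(load_seq_cst() == 2);
}
```

Compiling this with something like gcc -O2 -S for each target shows the instructions your compiler actually picks, which is more trustworthy than any table.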

A Working Spinlock

c
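// Plain int: the __atomic builtins supply the atomicity, and every access goes through them.
typedef struct { int locked; } spinlock_t;

// Spin-wait hint: pause on x86, a no-op elsewhere (AArch64 could use a yield instruction).
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() __builtin_ia32_pause()
#else
#define cpu_relax() ((void)0)
#endif

void lock(spinlock_t *s) {
    // Try to take the lock; ACQUIRE keeps the critical section after this point.
    while (__atomic_exchange_n(&s->locked, 1, __ATOMIC_ACQUIRE)) {
        // Lock is held: spin on a cheap read-only load until it looks free.
        while (__atomic_load_n(&s->locked, __ATOMIC_RELAXED))
            cpu_relax();   // SMT-friendly busy-wait
    }
}

void unlock(spinlock_t *s) {
    // RELEASE publishes every write made inside the critical section.
    __atomic_store_n(&s->locked, 0, __ATOMIC_RELEASE);
}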

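The ACQUIRE on the exchange keeps every access in the critical section from being reordered before the lock is taken. The RELEASE on the unlock publishes everything before it to the next thread that acquires the lock. The inner RELAXED load avoids an expensive RMW while spinning: waiting cores read a shared cache line instead of fighting for exclusive ownership of it.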
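
To see the lock doing its job, here is a sketch that protects a plain, non-atomic counter with it. The spinlock is restated with a plain int field and without the pause hint so the snippet stands alone on any target. The names counter, bump, and run_locked_counter are illustrative, and pthreads is assumed.

```c
#include <assert.h>
#include <pthread.h>

typedef struct { int locked; } spinlock_t;

static void lock(spinlock_t *s) {
    while (__atomic_exchange_n(&s->locked, 1, __ATOMIC_ACQUIRE))
        while (__atomic_load_n(&s->locked, __ATOMIC_RELAXED))
            ;  /* spin; pause hint omitted for portability */
}

static void unlock(spinlock_t *s) {
    __atomic_store_n(&s->locked, 0, __ATOMIC_RELEASE);
}

#define INCREMENTS 50000
#define MAX_THREADS 8

static spinlock_t counter_lock;
static long counter;  /* plain variable, protected by counter_lock */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        lock(&counter_lock);
        counter++;              /* safe: only one thread is in here at a time */
        unlock(&counter_lock);
    }
    return NULL;
}

/* Starts nthreads incrementing threads and returns the final count. */
long run_locked_counter(int nthreads) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, bump, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Because each unlock is a RELEASE and each successful lock is an ACQUIRE on the same variable, every increment sees the previous one, and the total comes out to nthreads times INCREMENTS.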

Common Mistakes

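Volatile is not a memory barrier. It stops the compiler from optimizing the access away or reordering it against other volatile accesses, but it does not make the access atomic, the compiler may still move ordinary loads and stores around it, and the CPU is still free to reorder. Use atomics.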

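Mixing atomic and non-atomic accesses to the same variable is a data race, and therefore undefined behavior, whenever the accesses can run concurrently and at least one is a write. Either always atomic or never.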

Forgetting one side of the pair. Acquire alone or release alone doesn't establish synchronization. You need both, on the same atomic variable, and the acquire load has to read the value the release store wrote.

Takeaways

  • The compiler and CPU both reorder. Single-threaded code never notices.
  • Acquire/release are the cheap, correct way to publish data between threads.
  • SEQ_CST is the default for a reason — it's safe. But on ARM/POWER it's not free.
  • Volatile is not enough. Atomics are.
  • Pair every release with an acquire, or you've published nothing.