Memory Barriers Demystified: Acquire, Release, and Why They Matter
When you write x = 1; y = 2; in C, you assume those happen in order. They may not. Both the compiler and the CPU are free to reorder them whenever that is faster and the result is "observably" equivalent for the thread that issued them.
For multi-threaded code, observably equivalent is a lie. Memory barriers are how you tell the compiler and CPU: "no reordering past this point."
Two Reorderings, One Problem
The compiler reorders at compile time. The optimizer rearranges instructions for register allocation and pipelining.
The CPU reorders at runtime. Out-of-order execution dispatches independent instructions in parallel, and store buffers make a write visible to other cores some time after the instruction retires.
Both are invisible from a single thread — the result looks sequential to the thread doing the work. From another core, all bets are off.
The Classic Bug
    int data;
    int ready = 0;

    // Producer
    data = 42;
    ready = 1;

    // Consumer
    while (!ready)
        ;
    printf("%d\n", data); // could print 0, not 42
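Without barriers, the producer's two stores can become visible to the consumer in the opposite order, so the consumer can see ready == 1 while data is still 0. Because ready is a plain int, this is also a data race: the compiler may hoist the load out of the loop and spin forever.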
The Memory Order Vocabulary
C11 defines six memory orders, and GCC and Clang expose the same set through the __atomic builtins used here. The two that matter most:
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);        // producer
    while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) ;  // consumer
Release on a store: nothing before it (in program order) can be reordered after it. "Everything I did is now visible."
Acquire on a load: nothing after it can be reordered before it. "Everything that was published is now visible to me."
Pair them and the producer's data = 42 is guaranteed visible by the time the consumer sees ready == 1.
    Producer                        Consumer
    data = 42
    store(ready, RELEASE)    →      load(ready, ACQUIRE) → 1
                                    use data (guaranteed 42)
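The pairing above can be sketched as a small pthreads program. This is a minimal illustration, not a definitive implementation: the helper name publish_demo and the thread functions are hypothetical, and it assumes GCC or Clang (for the __atomic builtins) and a build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int data;   /* plain payload, written before the flag is published */
static int ready;  /* flag, only ever accessed through __atomic builtins */

static void *producer(void *arg) {
    (void)arg;
    data = 42;                                      /* ordinary store */
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);  /* publish: data cannot sink below this */
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        ;                                           /* spin until the flag is published */
    *out = data;                                    /* cannot be hoisted above the acquire load */
    return NULL;
}

/* Hypothetical helper: runs one producer/consumer round and
 * returns the value the consumer read from data. */
int publish_demo(void) {
    pthread_t p, c;
    int seen = -1;
    int rc;

    data = 0;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Calling publish_demo() from main should always return 42: the release store synchronizes with the acquire load that reads 1, so the write to data happens before the consumer's read.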
The Six Orders, Briefly
| Order | Use case |
|---|---|
| RELAXED | Counters, statistics. Atomicity only, no ordering guarantees. |
| CONSUME | Rare and subtle; compilers usually treat it as ACQUIRE. |
| ACQUIRE | Loads that observe a release store and join its happens-before relation. |
| RELEASE | Stores that publish happens-before relations. |
| ACQ_REL | Read-modify-write operations that do both (CAS, fetch_add). |
| SEQ_CST | Default for C11's non-explicit operations; one total order over all SEQ_CST operations. Slowest, safest. |
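Two rows of the table are easy to try out: a RELAXED statistics counter and an ACQ_REL compare-and-swap. The sketch below is only an illustration; the names count_hits and try_claim, and the thread and iteration counts, are hypothetical choices, and it assumes GCC or Clang with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { THREADS = 4, ITERS = 100000 };

static long hits;  /* statistics counter: needs atomicity, not ordering */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        __atomic_fetch_add(&hits, 1, __ATOMIC_RELAXED);  /* no increment is lost */
    return NULL;
}

/* Hypothetical helper: bumps the counter from THREADS threads and
 * returns the final total. */
long count_hits(void) {
    pthread_t t[THREADS];
    __atomic_store_n(&hits, 0, __ATOMIC_RELAXED);
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return __atomic_load_n(&hits, __ATOMIC_RELAXED);
}

/* Hypothetical helper: claims *slot for id if it is still 0.
 * Success uses ACQ_REL (acquire earlier owners' writes, release ours);
 * failure only loads, so ACQUIRE is enough there. Returns 1 on success. */
int try_claim(int *slot, int id) {
    int expected = 0;
    return __atomic_compare_exchange_n(slot, &expected, id, 0 /* strong */,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

RELAXED is enough for the counter because the total is only read after the threads are joined, and joining already orders their writes before the read.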
What the CPU Actually Does
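On x86-64, regular loads and stores are already strongly ordered: the only reordering the hardware allows is a later load passing an earlier store to a different address (StoreLoad). So ACQUIRE loads and RELEASE stores compile to plain mov instructions and only constrain the compiler. A SEQ_CST load is also a plain mov, but a SEQ_CST store needs a full barrier, which compilers emit as xchg (implicitly locked) or as mov followed by mfence.
On ARM and POWER, the CPU is genuinely weak. On ARMv8 (AArch64), an ACQUIRE load becomes ldar and a RELEASE store becomes stlr. SEQ_CST loads and stores use the same ldar/stlr pair, and only a standalone SEQ_CST fence becomes a full dmb ish. ARMv7 and POWER have no such instructions and use explicit barriers instead (dmb ish on ARMv7; lwsync and sync on POWER), so costs scale with strength.
    x86-64: acquire/release     → plain mov (no extra instruction)
            seq_cst store       → xchg, or mov + mfence (~10ns)
    ARMv8:  acquire load        → ldar
            release store       → stlr
            seq_cst load/store  → ldar / stlr
            seq_cst fence       → dmb ish (~3-15ns)
The timings are rough figures and vary by microarchitecture. This is why "just use SEQ_CST everywhere" is a real performance hit: every SEQ_CST store on x86-64 pays for a full barrier, and on ARMv7 and POWER both SEQ_CST loads and stores do.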
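What SEQ_CST buys for that cost is ordering between a store and a later load of a different variable, which acquire/release does not provide. The classic store-buffering test below is a sketch of that guarantee. The function name count_both_zero and the trial count are hypothetical, and it assumes GCC or Clang with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int x, y;    /* shared flags, accessed only through __atomic builtins */
static int r1, r2;  /* per-thread results, read only after pthread_join */

static void *thread_a(void *arg) {
    (void)arg;
    __atomic_store_n(&x, 1, __ATOMIC_SEQ_CST);
    r1 = __atomic_load_n(&y, __ATOMIC_SEQ_CST);
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    __atomic_store_n(&y, 1, __ATOMIC_SEQ_CST);
    r2 = __atomic_load_n(&x, __ATOMIC_SEQ_CST);
    return NULL;
}

/* Hypothetical helper: runs the test `trials` times and counts how often
 * both threads read 0. With SEQ_CST all four operations sit in one total
 * order, so whichever store comes first is seen by the other thread's load
 * and the count stays 0. */
int count_both_zero(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        int rc;
        __atomic_store_n(&x, 0, __ATOMIC_RELAXED);
        __atomic_store_n(&y, 0, __ATOMIC_RELAXED);
        rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

If the stores were weakened to RELEASE and the loads to ACQUIRE, the r1 == 0 and r2 == 0 outcome would be permitted, even on x86-64, because each load may pass the earlier store sitting in the store buffer.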
A Working Spinlock
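    // Plain int: the __atomic builtins supply the atomicity
    // (Clang rejects them on _Atomic objects).
    typedef struct { int locked; } spinlock_t;

    void lock(spinlock_t *s) {
        while (__atomic_exchange_n(&s->locked, 1, __ATOMIC_ACQUIRE)) {
            // Lock was held: spin on a cheap load until it looks free, then retry.
            while (__atomic_load_n(&s->locked, __ATOMIC_RELAXED))
                __builtin_ia32_pause(); // x86-only hint; SMT-friendly busy-wait
        }
    }

    void unlock(spinlock_t *s) {
        __atomic_store_n(&s->locked, 0, __ATOMIC_RELEASE);
    }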
The ACQUIRE on the exchange keeps everything inside the critical section from being reordered before the lock is taken. The RELEASE on the unlock publishes everything before it to the next thread that acquires the lock. The inner RELAXED load spins on a read instead of issuing an expensive RMW on every iteration.
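To see the lock doing its job, here is a sketch that protects a plain, non-atomic counter with it. The helper names cpu_relax and locked_count and the thread and iteration counts are hypothetical. The pause hint is wrapped in a preprocessor check so the example also builds on non-x86 targets, and it assumes GCC or Clang with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct { int locked; } spinlock_t;

/* Hypothetical portability shim: pause on x86, nothing elsewhere. */
static void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

static void lock(spinlock_t *s) {
    while (__atomic_exchange_n(&s->locked, 1, __ATOMIC_ACQUIRE)) {
        while (__atomic_load_n(&s->locked, __ATOMIC_RELAXED))
            cpu_relax();  /* wait on a read, retry the exchange once it looks free */
    }
}

static void unlock(spinlock_t *s) {
    __atomic_store_n(&s->locked, 0, __ATOMIC_RELEASE);
}

enum { THREADS = 4, ITERS = 50000 };

static spinlock_t guard;
static long total;  /* plain variable: only touched while holding guard */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        lock(&guard);
        total++;          /* critical section */
        unlock(&guard);
    }
    return NULL;
}

/* Hypothetical helper: increments total from THREADS threads under
 * the spinlock and returns the final value. */
long locked_count(void) {
    pthread_t t[THREADS];
    total = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return total;
}
```

Each unlock's RELEASE store synchronizes with the next successful ACQUIRE exchange, so every increment of total is visible to the next lock holder and none is lost.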
Common Mistakes
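Volatile is not a memory barrier. It stops the compiler from optimizing the access away, but it does not make the access atomic, and the compiler can still move non-volatile accesses around it. The CPU is free to reorder as well. Use atomics.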
Mixing atomic and non-atomic on the same variable is undefined behavior. Either always atomic or never.
Forgetting one side of the pair. Acquire alone or release alone doesn't establish synchronization — you need both, on matching operations.
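One way to avoid the last two mistakes is to use the standard <stdatomic.h> spelling: declaring the shared variable _Atomic makes every access atomic by type, and the _explicit functions put the memory order in plain sight on both sides. The sketch below publishes a pointer to a filled-in struct. The names struct msg, slot, and handoff_demo are hypothetical, and it assumes a C11 compiler with <stdatomic.h> and a build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct msg { int id; int value; };

static struct msg box;               /* payload, filled in before publication */
static _Atomic(struct msg *) slot;   /* every access to slot is atomic by type */

static void *sender(void *arg) {
    (void)arg;
    box.id = 7;
    box.value = 35;
    /* Release side of the pair: the writes to box come before this store. */
    atomic_store_explicit(&slot, &box, memory_order_release);
    return NULL;
}

static void *receiver(void *arg) {
    int *sum = arg;
    struct msg *m;
    /* Acquire side of the pair: matches the release store above. */
    while ((m = atomic_load_explicit(&slot, memory_order_acquire)) == NULL)
        ;
    *sum = m->id + m->value;
    return NULL;
}

/* Hypothetical helper: hands one message from sender to receiver and
 * returns id + value as the receiver saw them. */
int handoff_demo(void) {
    pthread_t s, r;
    int sum = -1;
    int rc;

    atomic_store_explicit(&slot, NULL, memory_order_relaxed);
    rc = pthread_create(&r, NULL, receiver, &sum);
    assert(rc == 0);
    rc = pthread_create(&s, NULL, sender, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(s, NULL);
    pthread_join(r, NULL);
    return sum;
}
```

Dropping either side to memory_order_relaxed would still compile, but nothing would then order the writes to box before the receiver's reads.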
Takeaways
- The compiler and CPU both reorder. Single-threaded code never notices.
- Acquire/release are the cheap, correct way to publish data between threads.
- SEQ_CST is the default for a reason: it's safe. But on ARM/POWER it's not free.
- Volatile is not enough. Atomics are.
- Pair every release with an acquire, or you've published nothing.