Memory Ordering

When a processor writes to a memory location, the value is cached to improve performance. Similarly, the processor attempts to satisfy read requests from the cache rather than from main memory. Furthermore, processors begin to fetch values from memory before they are requested by the application. This can happen as part of speculative execution or as a side effect of how cache lines are filled.

CPU caches can be partitioned into banks that can be accessed in parallel. This means that memory operations can be completed out of order. To ensure that memory operations are completed in order, most processors provide memory-barrier instructions. A full memory barrier ensures that all memory read and write operations that appear before the barrier instruction are committed to memory before any read or write operations that appear after it. A read memory barrier orders only the memory read operations, and a write memory barrier orders only the memory write operations. These instructions also ensure that the compiler disables any optimizations that could reorder memory operations across the barriers.
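The effect of a full barrier can be sketched in portable C++. This is only an illustration, not how any particular system API is implemented: it assumes std::atomic_thread_fence with std::memory_order_seq_cst as a stand-in for a full memory barrier, and the names barrier_payload, barrier_ready, publish, consume, and run_full_barrier_demo are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state for this sketch.
static int barrier_payload = 0;                  // ordinary, non-atomic data
static std::atomic<bool> barrier_ready{false};   // flag that publishes the data

// Writer: store the data, issue a full barrier, then set the flag.
void publish(int value) {
    barrier_payload = value;
    std::atomic_thread_fence(std::memory_order_seq_cst);  // assumed full-barrier stand-in
    barrier_ready.store(true, std::memory_order_relaxed);
}

// Reader: wait for the flag, issue a full barrier, then read the data.
int consume() {
    while (!barrier_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);  // assumed full-barrier stand-in
    return barrier_payload;
}

// Runs one writer and one reader; returns the value the reader observed.
int run_full_barrier_demo(int value) {
    barrier_ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = consume(); });
    std::thread writer([value] { publish(value); });
    writer.join();
    reader.join();
    return seen;
}
```

Without the two fences, the relaxed flag operations alone would not order the accesses to barrier_payload, and the reader could legally observe a stale value.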

Processors can support instructions for memory barriers with acquire, release, and fence semantics. These semantics describe the order in which results of an operation become available. With acquire semantics, the results of the operation are available before the results of any operation that appears after it in code. With release semantics, the results of the operation are available after the results of any operation that appears before it in code. Fence semantics combine acquire and release semantics. The results of an operation with fence semantics are available before those of any operation that appears after it in code and after those of any operation that appears before it.
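Acquire and release semantics are often used as a pair. The following is a minimal sketch that assumes the C++ std::memory_order_release and std::memory_order_acquire orderings as analogs of the semantics described above. The names message, message_ready, producer, consumer, and run_acquire_release_demo are hypothetical.

```cpp
#include <atomic>
#include <string>
#include <thread>

// Hypothetical shared state for this sketch.
static std::string message;                      // ordinary, non-atomic data
static std::atomic<bool> message_ready{false};   // flag with acquire/release ordering

void producer() {
    message = "hello";
    // Release: the write to message above becomes visible no later than this store.
    message_ready.store(true, std::memory_order_release);
}

std::string consumer() {
    // Acquire: the read of message below cannot take effect before this load.
    while (!message_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return message;
}

// Runs one producer and one consumer; returns what the consumer read.
std::string run_acquire_release_demo() {
    message_ready.store(false, std::memory_order_relaxed);
    std::string seen;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

The release store pairs with the acquire load: once the consumer sees the flag set, it is also guaranteed to see the completed write to message.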

On x86 and x64 processors that support SSE2, the instructions are mfence (memory fence), lfence (load fence), and sfence (store fence). On ARM processors, the instructions are dmb and dsb. For more information, see the documentation for the processor.
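From C++, the x86 instructions are usually reached through the compiler intrinsics _mm_mfence, _mm_lfence, and _mm_sfence. The sketch below wraps them behind hypothetical helpers (full_barrier, load_barrier, store_barrier). The DEMO_HAVE_SSE2_FENCES macro and the std::atomic_thread_fence fallback for other targets are assumptions of this example, and the fallback is only an approximate substitute for the read and write barriers.

```cpp
#include <atomic>

// Detect SSE2 on common compilers (GCC/Clang define __SSE2__; MSVC defines _M_X64 or _M_IX86_FP).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <xmmintrin.h>   // _mm_sfence (SSE)
#include <emmintrin.h>   // _mm_mfence, _mm_lfence (SSE2)
#define DEMO_HAVE_SSE2_FENCES 1
#else
#define DEMO_HAVE_SSE2_FENCES 0
#endif

// Full barrier: mfence on x86/x64, a sequentially consistent fence elsewhere.
inline void full_barrier() {
#if DEMO_HAVE_SSE2_FENCES
    _mm_mfence();
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Read barrier: lfence on x86/x64, an acquire fence as an approximation elsewhere.
inline void load_barrier() {
#if DEMO_HAVE_SSE2_FENCES
    _mm_lfence();
#else
    std::atomic_thread_fence(std::memory_order_acquire);
#endif
}

// Write barrier: sfence on x86/x64, a release fence as an approximation elsewhere.
inline void store_barrier() {
#if DEMO_HAVE_SSE2_FENCES
    _mm_sfence();
#else
    std::atomic_thread_fence(std::memory_order_release);
#endif
}

// Reports which implementation the wrappers above were compiled with.
inline bool uses_sse2_fences() {
    return DEMO_HAVE_SSE2_FENCES == 1;
}
```

On ARM targets the fallback path is taken, and compilers typically lower these fences to dmb instructions.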

The following synchronization functions use the appropriate barriers to ensure memory ordering:

Functions that enter or leave critical sections
Functions that acquire or release SRW locks
One-time initialization begin and completion
EnterSynchronizationBarrier function
Functions that signal synchronization objects
Wait functions
Interlocked functions (except functions with NoFence suffix, or intrinsics with _nf suffix)
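The distinction between fenced and NoFence interlocked operations can be illustrated with a portable analog. This sketch does not call the Interlocked functions themselves. It assumes std::atomic::fetch_add with std::memory_order_seq_cst as a rough counterpart of a fenced interlocked increment, and with std::memory_order_relaxed as a rough counterpart of a NoFence variant. The helper count_with is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each increment a shared counter
// `iterations` times using the given memory order, and the total is returned.
long count_with(std::memory_order order, int threads, int iterations) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, order, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1, order);  // atomic under every memory order
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

Both orderings produce the same total, because the increment itself is atomic in either case. What the relaxed form gives up is ordering: it cannot be relied on to publish other data to another thread, which is why the NoFence functions are excluded from the list above.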