C++ In Practice - Synchronization

The Hardware Reality: Concurrent Memory Mutation

C++ concurrency primitives do not protect regions of code. They protect shared data: a mutex guards the memory associated with it by convention, not the instructions between lock() and unlock().

When two threads execute count++ on the same memory address, the hardware performs a Read-Modify-Write cycle. Without synchronization, each core reads the value (possibly from its own L1 cache), increments it locally, and overwrites the other's result. This is a data race, which is undefined behavior in C++. It results in silent, catastrophic state corruption.

Physical Memory (RAM): [ Address 0x1A4 : Value = 5 ]

CPU Core 1 (Thread A)                        CPU Core 2 (Thread B)
---------------------                        ---------------------
1. READ 0x1A4 into Register (Gets 5)         
                                             1. READ 0x1A4 into Register (Gets 5)
2. INCREMENT Register (Now 6)
                                             2. INCREMENT Register (Now 6)
3. WRITE Register to 0x1A4                   
                                             3. WRITE Register to 0x1A4

Final RAM State: Value is 6. (Expected 7.)
One increment is silently lost.
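A minimal sketch of the cure (names illustrative): serializing the Read-Modify-Write cycle with a mutex so no increment is lost.

```cpp
#include <mutex>
#include <thread>

// Two threads perform 100,000 increments each. The lock_guard serializes
// the Read-Modify-Write cycle, so the final value is exactly 200,000.
long locked_increment_demo() {
    long counter = 0;
    std::mutex mtx;
    auto worker = [&] {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lock(mtx);
            ++counter; // READ / INCREMENT / WRITE, now indivisible to peers
        }
    };
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    return counter;
}
```

Removing the lock_guard reproduces the lost-update schedule above: the program still runs, but the total typically lands below 200,000.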

The Mutex: Atomic Test-and-Set

A std::mutex is a userspace wrapper around an Operating System lock mechanism (e.g., Linux futex or Windows SRWLock).

Under the hood, a mutex consists of two components:

  1. An atomic integer residing in userspace memory (0 = Unlocked, 1 = Locked).
  2. An OS-managed Wait Queue residing in kernel space.

std::mutex Memory Layout
+-----------------------------------+
| Atomic Lock State : 0             |  <-- Userspace
+-----------------------------------+
| Kernel Wait Queue : [ ]           |  <-- Kernel Space
+-----------------------------------+

Lock Acquisition Mechanics: When mtx.lock() executes, the CPU issues a hardware-level atomic instruction (Compare-And-Swap).

  • Path A (Uncontended): If the state is 0, the CPU atomically flips it to 1. Zero kernel transitions occur. Execution continues in nanoseconds.
  • Path B (Contended): If the state is already 1, the atomic instruction fails. The thread issues a system call, trapping into the kernel; the OS suspends the thread and places its identifier into the Kernel Wait Queue.
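Path A can be sketched as a toy spinlock built on the same Compare-And-Swap. This is illustrative only: it busy-retries on failure, where a real std::mutex would take Path B into the kernel.

```cpp
#include <atomic>
#include <thread>

// Illustrative only: implements Path A (userspace CAS) and busy-retries.
// A real std::mutex takes Path B (kernel wait) on CAS failure.
class SpinLock {
    std::atomic<int> state{0}; // 0 = Unlocked, 1 = Locked
public:
    void lock() {
        int expected = 0;
        // Atomically flip 0 -> 1; failure means another thread holds it.
        while (!state.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire)) {
            expected = 0; // CAS wrote back the observed value; reset, retry
        }
    }
    void unlock() { state.store(0, std::memory_order_release); }
};

// Two threads hammer the lock; no increments are lost.
int spinlock_demo() {
    SpinLock sl;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 50000; ++i) {
            sl.lock();
            ++counter;
            sl.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```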

RAII Wrappers: Exception-Safe Ownership

Manual mtx.unlock() is a defect. If the function throws an exception, the stack unwinds past the unlock call and it never executes. The mutex remains permanently locked, halting all dependent threads.

RAII wrappers bind the lock release to object destruction. Whenever the wrapper goes out of scope (normal exit or exception unwinding), the language guarantees its destructor, and therefore .unlock(), executes.

std::lock_guard

A strict, zero-overhead lexical scope lock.

Stack Frame: lock_guard Object
+-----------------------+
| Mutex Pointer (8B)    | ---> Points to std::mutex
+-----------------------+

Under the hood: It contains only a pointer (or reference) to the mutex. In optimized release builds the wrapper is typically inlined away entirely, leaving just the raw .lock() and .unlock() calls at the block boundaries. It carries no extra state and cannot be copied, moved, or released early.
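A minimal sketch of the exception-safety guarantee (the function, globals, and exception here are illustrative):

```cpp
#include <mutex>
#include <stdexcept>

std::mutex registry_mtx;   // illustrative shared state
int registry_value = 0;

void update_registry(int v) {
    std::lock_guard<std::mutex> lock(registry_mtx); // .lock() injected here
    if (v < 0) throw std::invalid_argument("negative telemetry");
    registry_value = v;
}   // .unlock() runs here, on normal exit AND during stack unwinding
```

Even when update_registry throws, the destructor releases registry_mtx, so the next caller can still acquire it.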

std::unique_lock

A stateful, movable lock manager.

Stack Frame: unique_lock Object
+-----------------------+
| Mutex Pointer (8B)    | ---> Points to std::mutex
| owns_lock Bool (1B)   | ---> Tracks internal state
| Padding (7B)          | ---> Alignment overhead
+-----------------------+

Under the hood: On a typical 64-bit ABI it occupies 16 bytes instead of 8 (the owns_lock bool plus alignment padding), and its destructor must branch on that state. Because the lock can be released manually (.unlock()), deferred (std::defer_lock), or transferred (std::move), the destructor evaluates if (owns_lock) mtx->unlock();. Use it only when the lock state must change during the object's lifetime.
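A sketch exercising exactly the stateful behaviors listed above: deferred locking, manual unlock, and ownership transfer.

```cpp
#include <mutex>

// Demonstrates the three capabilities lock_guard lacks: deferred
// acquisition, early release, and ownership transfer via move.
bool unique_lock_demo() {
    std::mutex mtx;

    std::unique_lock<std::mutex> lock(mtx, std::defer_lock);
    bool deferred = !lock.owns_lock();   // constructed without locking

    lock.lock();                         // acquire manually
    bool held = lock.owns_lock();

    lock.unlock();                       // release before scope end
    std::unique_lock<std::mutex> moved = std::move(lock); // transfer state

    return deferred && held && !moved.owns_lock();
}   // destructors check owns_lock; no double-unlock occurs
```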

The Condition Variable: Kernel-Level Sleep Coordination

Busy-waiting (while(!ready) {}) monopolizes a CPU core, burning hardware cycles to evaluate a boolean flag.

A std::condition_variable provides a mechanism to explicitly yield the CPU until a specific memory state changes. It is an OS-level notification queue.

Condition Variable Component Architecture
+---------------------------------------+
| OS Sleep Queue : [ Thread A, Thread C]|
+---------------------------------------+

The Wait Mechanic (cv.wait):

  1. The thread evaluating the shared state holds a std::unique_lock.
  2. The state is not ready. The thread executes wait().
  3. The implementation guarantees an indivisible operation: the mutex is unlocked and the thread is simultaneously descheduled onto the CV's Sleep Queue, so no notification can slip in between.

The Notify Mechanic (cv.notify_one / cv.notify_all):

  1. A mutating thread alters the shared data state.
  2. The mutating thread executes notify_one(), which issues a wake system call if any thread is waiting.
  3. The OS inspects the CV's Sleep Queue. It selects one thread (or all threads for notify_all), moves them out of the Sleep Queue, and places them into the OS Scheduler's Ready Queue.
  4. When the awakened thread is granted a CPU core, its first mandatory operation is to re-acquire the associated mutex. It then re-evaluates the condition predicate.
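The two mechanics above, reduced to a minimal runnable sketch (the flag and payload names are illustrative):

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// One consumer sleeps until one producer flips the flag.
int handoff_demo() {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;
    int observed = 0;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return ready; }); // atomic unlock-and-sleep
        observed = 42;                        // runs only after notify
    });

    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true;                         // 1. mutate shared state
    }
    cv.notify_one();                          // 2-4. wake the sleeper

    consumer.join();
    return observed;
}
```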

Mutex RAII Wrappers: Architectural Distinction

Manual memory management and manual lock management (mtx.lock() and mtx.unlock()) share the same critical flaw: exception safety. If a function throws or returns early between lock and unlock, the lock is never released, and every thread that later requests the mutex blocks forever.

The C++11 standard library applies Resource Acquisition Is Initialization (RAII) to mutexes; use the wrappers instead of manual lock()/unlock().

std::lock_guard

Under the hood: A strict, zero-overhead lexical scope lock.

  • Data Structure: Typically stores a single reference/pointer to the mutex. After inlining, the optimizer can usually eliminate the object entirely.
  • Mechanics: Executes .lock() in the constructor and .unlock() in the destructor.
  • Constraints: It cannot be copied, moved, manually unlocked, or manually relocked. It represents absolute ownership for the duration of the C++ block scope.

std::unique_lock

Under the hood: A stateful, movable lock manager.

  • Data Structure: Stores a pointer to the mutex and a boolean flag (owns_lock) tracking ownership state.
  • Mechanics: Because it tracks its own state, it evaluates the boolean flag during destruction to determine if .unlock() must be called. This introduces a minor branching overhead compared to lock_guard.
  • Capabilities: Supports deferred locking (std::defer_lock), time-constrained locking (try_lock_for), manual .unlock(), and transfer of lock ownership between threads or scopes via move semantics (std::move).
  • The std::condition_variable Mandate: std::condition_variable::wait() requires a mechanism to atomically release a lock, put the thread to sleep, and re-acquire the lock upon waking. std::lock_guard cannot release its lock. std::unique_lock is strictly required.

Real-World Architecture: The Bounded Telemetry Queue

Problem: A producer thread fed by a hardware interrupt reads high-frequency telemetry data from an analog-to-digital converter. A background thread (Consumer) processes this data.

Constraint: Polling a shared buffer with while(true) { check_buffer(); } burns 100% of a CPU core (busy-waiting). The Consumer must yield its CPU time slice to the OS completely when the buffer is empty, and wake promptly when new data arrives.

Solution: A thread-safe bounded queue built from std::mutex, std::unique_lock, and std::condition_variable.

ASCII Execution State

Producer Thread (Interrupt Handler)           Consumer Thread (Processor)
===================================           ===========================
1. Acquire std::unique_lock                   1. Acquire std::unique_lock
2. Push Data to Queue                         2. Queue is Empty.
3. lock.unlock() (manual release)             3. Execute cv.wait(lock, predicate)
4. Execute cv.notify_one()                       -> Atomically releases mutex.
                                                 -> Thread is put to Sleep by OS.

[Time Passes - CPU Yielded]

5. Acquire std::unique_lock                   [OS Thread Scheduler Wakes Consumer]
6. Push Data to Queue                         4. Wake triggered by cv.notify_one().
7. lock.unlock()                              5. Re-acquire mutex (Blocks if Producer holds it).
8. Execute cv.notify_one()                    6. Evaluate predicate (Queue not empty? True).
                                              7. wait() returns; Consumer holds the lock.
                                              8. Pop Data.
                                              9. Release unique_lock (Scope ends).

Implementation

#include <condition_variable>
#include <cstddef>   // size_t
#include <mutex>
#include <queue>
#include <utility>   // std::move

template <typename T>
class BoundedTelemetryQueue {
    std::queue<T> buffer;
    const size_t max_capacity;

    std::mutex mtx;
    std::condition_variable cv_data_available;
    std::condition_variable cv_space_available;

public:
    explicit BoundedTelemetryQueue(size_t capacity) : max_capacity(capacity) {}

    // Producer execution path
    void push(T data) {
        // 1. Utilize unique_lock because we must wait on a condition variable.
        // We cannot push if the queue has reached its maximum hardware bounds.
        std::unique_lock<std::mutex> lock(mtx);
        
        // 2. Wait until space is available.
        cv_space_available.wait(lock, [this]() { 
            return buffer.size() < max_capacity; 
        });

        // 3. Mutate shared state.
        buffer.push(std::move(data));

        // 4. Manual unlock before notification.
        // A common optimization: if we notified while still holding the lock,
        // the woken thread could immediately block again on this same mutex.
        lock.unlock();

        // 5. Signal the waiting consumer thread.
        cv_data_available.notify_one();
    }

    // Consumer execution path
    T pop() {
        // 1. Utilize unique_lock for the wait operation.
        std::unique_lock<std::mutex> lock(mtx);

        // 2. Wait until data is available.
        cv_data_available.wait(lock, [this]() { 
            return !buffer.empty(); 
        });

        // 3. Extract data.
        T data = std::move(buffer.front());
        buffer.pop();

        // 4. Manual unlock.
        lock.unlock();

        // 5. Signal waiting producer threads that space has been freed.
        cv_space_available.notify_one();

        return data;
    }
};

Architectural Pitfalls

1. Spurious Wakeups

The Flaw: POSIX and Windows OS thread schedulers explicitly document that a thread blocked on a condition variable may wake up without any thread calling notify_one() or notify_all(). This is a hardware/OS-level artifact. The Fix: A cv.wait() must never be executed without a condition predicate.

  • Unsafe: cv.wait(lock);
  • Safe: cv.wait(lock, []{ return !queue.empty(); });

The lambda overload internally executes while (!predicate()) { wait(lock); }, guaranteeing the thread immediately returns to sleep if a spurious wakeup occurs while the logical condition is not met.

2. The "Pessimization" of Notification Under Lock

The Flaw: Executing cv.notify_one() while still holding the std::unique_lock forces the OS scheduler to wake the waiting thread. That thread transitions to a ready state, immediately attempts to acquire the mutex required to exit the wait() block, finds it still held by the notifying thread, and instantly goes back to sleep. The Fix: Call lock.unlock() strictly before cv.notify_one(). This ensures the waking thread can acquire the mutex immediately.

3. Critical Section Bloat

The Flaw: Wrapping extensive logic inside a lock_guard.

void process() {
    std::lock_guard<std::mutex> lock(mtx);
    auto data = queue.pop();
    write_to_database(data); // Blocking I/O operation (Milliseconds)
} // Lock held for milliseconds. System throughput collapses.

The Fix: Keep the critical section strictly limited to the memory mutation of the shared data structure.

void process() {
    Data data;
    {
        std::lock_guard<std::mutex> lock(mtx);
        data = queue.pop();
    } // Lock released in nanoseconds.
    write_to_database(data); 
}

OS-Level Mechanics and The Atomic Guarantee

std::condition_variable is a synchronization primitive that causes a thread to suspend execution (block) until notified by another thread that a specific shared state has been mutated.

It is a user-space wrapper over OS-level wait queues.

  • Linux (POSIX): Maps to pthread_cond_t, backed by the futex (Fast Userspace Mutex) system call.
  • Windows: Maps to CONDITION_VARIABLE, utilizing SleepConditionVariableCS or SleepConditionVariableSRW.

The Lost Wakeup Vulnerability

A condition variable cannot exist independently. It must be paired with a std::mutex (specifically via std::unique_lock). This pairing is dictated by the "Lost Wakeup" race condition.

If a thread evaluates a condition (if (data_ready == false)), and the OS preempts the thread right before it executes a hypothetical wait() instruction, another thread can acquire the CPU, set data_ready = true, and send a notify(). When the first thread resumes, it executes wait(). The notification was missed. The thread sleeps indefinitely.

The Fix: wait() performs an atomic unlock-and-sleep. std::condition_variable::wait() takes the std::unique_lock as an argument, and the implementation guarantees that releasing the mutex and placing the thread on the sleep queue occur as a single, indivisible step with respect to notifications: no notify can fire between the two.

ASCII Execution State: The Wait Lifecycle

Thread A (Consumer)                            Thread B (Producer)
===================                            ===================
1. unique_lock<mutex> lock(mtx);
   (Mutex Acquired)

2. cv.wait(lock, []{ return state; });
   |-- Evaluates predicate: false.
   |-- Atomically:
       a) Unlocks mtx.
       b) Thread A enters Sleep State.
                                               3. lock_guard<mutex> lock(mtx);
                                                  (Mutex Acquired)
                                               
                                               4. state = true;
                                                  (Shared data mutated)
                                               
                                               5. mtx implicitly unlocked.
                                                  (Scope exit)
                                               
                                               6. cv.notify_one();
                                                  (Signals OS scheduler)

   |-- OS wakes Thread A.
   |-- Thread A blocks until it can 
       re-acquire mtx. 
   |-- (Mutex Re-acquired).
   |-- Evaluates predicate: true.
   |-- wait() returns.

7. Executes critical section.
8. mtx implicitly unlocked (Scope exit).

Spurious Wakeups and Predicate Enforcement

A spurious wakeup occurs when a thread blocked on a condition variable awakens without any thread executing notify_one() or notify_all().

One architectural cause: on POSIX systems, a blocked futex wait can be interrupted (for example by a UNIX signal such as SIGALRM) or can lose an internal race in the futex protocol. Rather than verify the logical condition in the kernel, the implementation simply lets wait() return early.

Implementation: You must pass a predicate (a lambda returning bool) to the wait() function.

// INCORRECT: Vulnerable to spurious wakeups.
cv.wait(lock); 

// CORRECT: The standard library compiles this into a while loop.
cv.wait(lock, [this]{ return data_ready; });

// Under the hood, the standard library executes:
// while (!predicate()) {
//     wait(lock);
// }

If a spurious wakeup occurs, the loop re-evaluates data_ready. Since it remains false, the thread re-executes the OS-level wait, absorbing the hardware anomaly transparently.

Real-World Architecture: Sensor Batch Aggregator with Timeout

Safety-critical systems cannot wait indefinitely. If a hardware sensor dies, a pure wait() deadlocks the processing pipeline. std::condition_variable::wait_for introduces deterministic time bounding.

Problem: A processor must aggregate 100 high-frequency sensor readings before executing a Kalman filter. If the sensor malfunctions or the data rate drops, the filter must run on whatever data is available after 50 milliseconds to maintain control loop determinism.

#include <mutex>
#include <condition_variable>
#include <vector>
#include <chrono>

struct SensorData { float value; };

class BatchProcessor {
    std::mutex mtx;
    std::condition_variable cv;
    std::vector<SensorData> batch;
    
    const size_t target_batch_size = 100;
    const std::chrono::milliseconds timeout{50};
    bool terminate_flag = false;

public:
    // Executed by hardware interrupt or polling thread
    void push_reading(SensorData data) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            batch.push_back(data);
        }
        // Notify the processor that the batch size might be met.
        // Unlocked before notification to prevent waking thread from immediately blocking.
        cv.notify_one(); 
    }

    // Executed by dedicated processing thread
    void processing_loop() {
        while (true) {
            std::vector<SensorData> local_batch;
            {
                std::unique_lock<std::mutex> lock(mtx);
                
                // wait_for returns boolean 'false' if the timeout occurred, 
                // and 'true' if the predicate was satisfied.
                bool predicate_met = cv.wait_for(lock, timeout, [this] {
                    return batch.size() >= target_batch_size || terminate_flag;
                });

                if (terminate_flag) return;

                if (!predicate_met) {
                    // Timeout occurred. Proceed with partial data.
                    if (batch.empty()) continue; 
                }

                // Rapid swap to minimize lock contention duration.
                // Moves the internal pointer; O(1) complexity.
                local_batch.swap(batch); 
            }

            // Execute expensive math outside the critical section
            execute_kalman_filter(local_batch);
        }
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            terminate_flag = true;
        }
        cv.notify_all(); // Wake the processing loop if it is waiting.
    }

private:
    void execute_kalman_filter(const std::vector<SensorData>& data) {
        // Implementation omitted.
    }
};

notify_one() vs notify_all(): The Thundering Herd

  • notify_one(): Unblocks exactly one waiting thread. Used when the state change satisfies the requirement for a single consumer (e.g., adding one item to a job queue).
  • notify_all(): Unblocks all waiting threads. Used when the state change alters a global condition that affects all threads simultaneously.

Executing notify_all() when only one thread can actually proceed (e.g., waking 10 worker threads when only 1 job was added to a queue) creates a "Thundering Herd". All 10 threads wake, context-switch back in, and contend for the exact same std::mutex. One acquires it; the other 9 block again, thrashing CPU cache lines and wasting scheduler quantums.
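A sketch that makes the herd observable (names illustrative): four workers wait, one job arrives, notify_all() wakes everyone, yet the predicate re-check inside wait() sends the losers straight back to sleep, so exactly one job is processed.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Wakes all workers for a single job; the predicate loop inside wait()
// puts the losers back to sleep, so exactly one consumes it.
int thundering_herd_demo() {
    std::mutex mtx;
    std::condition_variable cv;
    int jobs = 0;
    bool done = false;
    int processed = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&] {
            std::unique_lock<std::mutex> lock(mtx);
            while (!done) {
                cv.wait(lock, [&] { return jobs > 0 || done; });
                if (jobs > 0) { --jobs; ++processed; }
            }
        });
    }

    { std::lock_guard<std::mutex> lock(mtx); jobs = 1; }
    cv.notify_all(); // the herd: 4 wake, 1 proceeds, 3 re-sleep

    { std::lock_guard<std::mutex> lock(mtx); done = true; }
    cv.notify_all(); // shut the workers down
    for (auto& t : workers) t.join();
    return processed;
}
```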


The Readers-Writers Synchronization Architecture

The Readers-Writers problem dictates synchronization for a shared resource where concurrent read operations are safe, but write operations must be strictly exclusive.

Standard State Matrix

Current State \ Incoming Request | Read Lock         | Write Lock
---------------------------------|-------------------|-------------------
Unlocked                         | Grant (Shared)    | Grant (Exclusive)
Read Locked (>= 1 readers)       | Grant (Shared)    | Block
Write Locked (1 writer)          | Block             | Block
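The matrix can be probed mechanically with std::shared_mutex. A sketch: the helper names are ours, each probe runs on its own thread (acquiring a shared_mutex twice from one thread is undefined behavior), and try_lock is formally allowed to fail spuriously, so a false result is a strong hint rather than proof.

```cpp
#include <shared_mutex>
#include <thread>

// Ask "would this request be granted right now?" from a separate thread.
bool probe_write(std::shared_mutex& rw) {
    bool granted = false;
    std::thread([&] { if ((granted = rw.try_lock())) rw.unlock(); }).join();
    return granted;
}

bool probe_read(std::shared_mutex& rw) {
    bool granted = false;
    std::thread([&] {
        if ((granted = rw.try_lock_shared())) rw.unlock_shared();
    }).join();
    return granted;
}
```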

Real-World Application: High-Frequency Telemetry Cache

Problem: A medical device or avionics system maintains a centralized telemetry registry. 50 diagnostic threads poll this registry at 1000 Hz. One hardware-interrupt handler thread updates the registry every 10 ms (100 Hz).

Failure State: A standard std::mutex forces the 50 diagnostic threads to serialize. This induces massive thread preemption, context-switching overhead, and cache staleness: the diagnostic threads spend their CPU quantums waiting on a lock rather than evaluating data.

Architectural Solution: std::shared_mutex (C++17) or std::shared_timed_mutex (C++14).

#include <shared_mutex>
#include <mutex>

// Minimal stand-in so the example is self-contained.
struct TelemetryData { float values[8]; };

class TelemetryRegistry {
    // mutable: read() is const yet must still lock the mutex.
    mutable std::shared_mutex rw_lock;
    TelemetryData data;

public:
    // Executed by 50 threads at 1000 Hz
    TelemetryData read() const {
        std::shared_lock<std::shared_mutex> lock(rw_lock); // Concurrent access
        return data; 
    }

    // Executed by 1 thread at 100 Hz
    void write(const TelemetryData& new_data) {
        std::unique_lock<std::shared_mutex> lock(rw_lock); // Exclusive access
        data = new_data;
    }
};

Under the Hood: Operating System Primitives

The C++ standard library does not implement lock mechanisms directly. std::shared_mutex acts as a thin wrapper over the target operating system's native synchronization primitives.

Linux (POSIX) Implementation

  • Underlying Primitive: pthread_rwlock_t.
  • Mechanics: Implemented within glibc utilizing atomic operations and futex (Fast Userspace Mutex) syscalls. A futex utilizes an integer in user-space memory monitored by the kernel. Lock acquisition attempts execute atomic Compare-And-Swap (CAS) instructions in userspace. If uncontended, zero kernel transitions occur. If the CAS fails (contention), the thread invokes the futex syscall to sleep, yielding the CPU.
  • The Default Policy: POSIX specifies that if a writer is waiting, whether incoming readers are granted access or forced to block is implementation-defined. Historically, default Linux implementations favor readers, leading to Writer Starvation.
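When writer starvation matters on Linux, glibc exposes a non-portable knob at the pthread layer. A sketch; this attribute is glibc-specific (note the _np suffix) and is not reachable through std::shared_mutex:

```cpp
#include <pthread.h>

// glibc extension: request that a waiting writer blocks new readers,
// overriding the reader-preferring default described above.
int make_writer_preferring_rwlock(pthread_rwlock_t* lock) {
    pthread_rwlockattr_t attr;
    int rc = pthread_rwlockattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_rwlockattr_setkind_np(
        &attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    if (rc == 0) rc = pthread_rwlock_init(lock, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rc;
}
```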

Windows Implementation

  • Underlying Primitive: SRWLock (Slim Reader/Writer Lock).
  • Mechanics: Exclusively manages thread synchronization in user-mode using atomic instructions on a single pointer-sized value. It falls back to kernel-mode wait states only under sustained contention.
  • The Default Policy: SRWLock explicitly does not guarantee fairness or FIFO ordering. It optimizes for raw throughput. Writer starvation is a documented vulnerability if the read load is continuous and aggressive.

The Writer Starvation Vulnerability

If a continuous stream of readers acquires the shared lock, the internal atomic reference count never drops to zero. The writer thread remains blocked indefinitely.

Execution State (Starvation):

Time T0: Reader 1 acquires Shared Lock.
Time T1: Writer 1 requests Exclusive Lock -> Blocked (Active readers = 1).
Time T2: Reader 2 requests Shared Lock -> Granted (Active readers = 2).
Time T3: Reader 1 releases Shared Lock -> (Active readers = 1).
Time T4: Reader 3 requests Shared Lock -> Granted (Active readers = 2).
Result : Writer 1 is infinitely starved.

Architectural Solution: Strict Writer Priority

The C++ standard library lacks an interface to enforce writer priority on std::shared_mutex. In safety-critical systems where state mutations (writes) must be deterministic and bounded, relying on OS-defined scheduling fairness is a defect.

You must construct a custom synchronization primitive utilizing a standard std::mutex, std::condition_variable, and active state counters to mathematically enforce writer precedence.

ASCII State Tracking Logic

State Variables:
active_readers = 0
active_writers = 0
waiting_writers = 0

T0: R1 requests read -> active_readers=1. Granted.
T1: W1 requests write -> active_readers>0. waiting_writers=1. W1 Blocks.
T2: R2 requests read -> waiting_writers>0. R2 Blocks (Writer Priority Enforced).
T3: R1 releases read -> active_readers=0. W1 wakes.
T4: W1 acquires write -> waiting_writers=0, active_writers=1. W1 executes.
T5: W1 releases write -> active_writers=0. Wakes all blocked readers (R2).
T6: R2 acquires read -> active_readers=1. Granted.

Writer-Priority Implementation (C++11)

#include <mutex>
#include <condition_variable>

class WriterPriorityRWLock {
    std::mutex mtx;
    std::condition_variable read_cv;
    std::condition_variable write_cv;

    int active_readers = 0;
    int waiting_writers = 0;
    bool active_writer = false;

public:
    void lock_shared() {
        std::unique_lock<std::mutex> lock(mtx);
        // Readers strictly block if a writer is active OR waiting.
        read_cv.wait(lock, [this]() { 
            return !active_writer && waiting_writers == 0; 
        });
        active_readers++;
    }

    void unlock_shared() {
        std::unique_lock<std::mutex> lock(mtx);
        active_readers--;
        // Only wake writers when the last reader exits.
        if (active_readers == 0 && waiting_writers > 0) {
            write_cv.notify_one(); 
        }
    }

    void lock() {
        std::unique_lock<std::mutex> lock(mtx);
        waiting_writers++;
        // Writers strictly block if any readers or another writer are active.
        write_cv.wait(lock, [this]() { 
            return active_readers == 0 && !active_writer; 
        });
        waiting_writers--;
        active_writer = true;
    }

    void unlock() {
        std::unique_lock<std::mutex> lock(mtx);
        active_writer = false;
        
        // Priority delegation: Wake pending writers first.
        if (waiting_writers > 0) {
            write_cv.notify_one();
        } else {
            read_cv.notify_all();
        }
    }
};

Implementation Mechanics

  1. The Mutex: std::mutex mtx protects the internal counting variables, not the resource itself. It is held only during the brief moments of evaluating or modifying the access state.
  2. Predicate Evaluation: read_cv.wait() takes a lambda predicate to prevent spurious wakeups. The evaluation !active_writer && waiting_writers == 0 is the mathematical barrier preventing reader acquisition if a writer is queued.
  3. Thundering Herd Mitigation: notify_one() is strictly used for write_cv. Awakening multiple blocked writers when only one can acquire the logical lock induces useless context switching. notify_all() is used for read_cv because multiple blocked readers can acquire the lock simultaneously once writers yield.
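Because the class exposes the standard lock()/unlock() and lock_shared()/unlock_shared() interface, the RAII wrappers from earlier sections drive it directly. A sketch; to compile standalone it repeats a condensed copy of the class above, and the thread counts are illustrative:

```cpp
#include <condition_variable>
#include <mutex>
#include <shared_mutex>   // std::shared_lock
#include <thread>
#include <vector>

// Condensed copy of WriterPriorityRWLock (same logic as above).
class WriterPriorityRWLock {
    std::mutex mtx;
    std::condition_variable read_cv, write_cv;
    int active_readers = 0, waiting_writers = 0;
    bool active_writer = false;
public:
    void lock_shared() {
        std::unique_lock<std::mutex> l(mtx);
        read_cv.wait(l, [this] { return !active_writer && waiting_writers == 0; });
        ++active_readers;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> l(mtx);
        if (--active_readers == 0 && waiting_writers > 0) write_cv.notify_one();
    }
    void lock() {
        std::unique_lock<std::mutex> l(mtx);
        ++waiting_writers;
        write_cv.wait(l, [this] { return active_readers == 0 && !active_writer; });
        --waiting_writers;
        active_writer = true;
    }
    void unlock() {
        std::lock_guard<std::mutex> l(mtx);
        active_writer = false;
        if (waiting_writers > 0) write_cv.notify_one();
        else read_cv.notify_all();
    }
};

// Mixed readers and writers driven through the standard RAII adapters.
int rw_demo() {
    WriterPriorityRWLock rw;
    int value = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&] {
            std::unique_lock<WriterPriorityRWLock> w(rw); // calls lock()
            ++value;
        });
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&] {
            std::shared_lock<WriterPriorityRWLock> r(rw); // calls lock_shared()
            (void)value; // concurrent read under shared ownership
        });
    for (auto& t : threads) t.join();
    return value; // 4: every write ran exclusively
}
```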
