Hardware-Based Solutions for Process Synchronization

When it comes to keeping processes in sync, you can either handle it through software algorithms or let the hardware do the heavy lifting. This lesson is about the hardware side of things.

The core idea behind hardware-based synchronization is atomic instructions. These are special machine-level operations that, once started, cannot be interrupted or paused midway. No other process or core can sneak in while one of these instructions is running. That guarantee of indivisibility is what makes them useful for synchronization in the first place.

What Does Atomic Mean?

An atomic operation is one that completes in a single, uninterruptible step from the perspective of all other threads and processors. During its execution, the CPU places a write lock on the relevant memory cache line using the MESI protocol. Any other core trying to touch that same memory location gets blocked until the instruction finishes and the lock is released.

Test and Set

Test and Set is about as simple as it gets for hardware synchronization. It reads the current value of a memory location and immediately sets it to 1, all in one uninterruptible step. Whatever the original value was gets returned so the caller knows whether the lock was free (0) or already taken (1).

Here is the assembly-level representation:

text

TEST_AND_SET:
    LOAD R1, M;     // Load the value of memory location M into register R1
    MOV R2, #1;     // Set R2 to 1 (indicating lock is acquired)
    STORE R2, M;    // Store the value of R2 back to memory location M
    RETURN R1;      // Return the original value of M

And this is the equivalent logic expressed in C++:

cpp

bool test_and_set(bool *lock) {
    bool old = *lock;   // Read the current value
    *lock = true;       // Set it to true (locked)
    return old;         // Return the old value
}

If the returned value is false, the lock was free and the caller has now acquired it. If the returned value is true, someone else already holds the lock and the caller must try again.

The critical detail is that reading the old value and writing the new value happen as a single indivisible operation at the hardware level. There is no window between the read and the write where another core could interfere.

Compare and Swap

Compare and Swap (CAS) takes a slightly more cautious approach than Test and Set. Instead of blindly setting a value, it first checks whether the current value matches what you expect. If it does, it swaps in the new value. If it does not, it leaves things as they are and just returns the current value so the caller knows something changed.

Here is the assembly:

text

COMPARE_AND_SWAP:
    LOAD R1, M;     // Load the value of memory location M into R1
    CMP R1, R2;     // Compare R1 with the expected value in R2
    BEQ SWAP;       // If equal, branch to SWAP
    RETURN R1;      // Otherwise, return the original value of M
SWAP:
    STORE R3, M;    // Store the new value in R3 to memory location M
    RETURN R1;      // Return the original value of M

And the C++ version:

cpp

int compare_and_swap(int *ptr, int expected, int new_value) {
    int old = *ptr;
    if (old == expected) {
        *ptr = new_value;    // Swap only if current == expected
    }
    return old;              // Always return the original value
}

CAS is the foundation of most modern lock-free data structures. Languages like Java expose it directly through the AtomicInteger class, and C++ provides it via std::atomic. It is considered more flexible than Test and Set because it can be used for more than just locks.

Test and Set vs Compare and Swap

Feature	Test and Set	Compare and Swap
Operation	Reads the old value and unconditionally sets it to 1	Reads the old value and conditionally swaps only if it matches the expected value
Parameters	Takes 1 parameter (the lock address)	Takes 3 parameters (address, expected value, new value)
Flexibility	Limited to binary lock/unlock scenarios	Can implement counters, queues, and lock-free algorithms
Modern Usage	Mostly used in simple spinlocks	Foundation of lock-free programming in Java, C++, and Go

Spinlocks

Spinlocks are built on top of these atomic instructions. The way they work is straightforward: if a thread tries to acquire a lock that is already taken, it just keeps looping and checking until the lock becomes free. It does not sleep or yield. It just sits there spinning.

Here is a basic spinlock implementation using Test and Set:

cpp

class Spinlock {
    volatile bool lock = false;
public:
    void acquire() {
        while (test_and_set(&lock)) {
            // Busy wait: keep spinning until lock is free
        }
    }
    void release() {
        lock = false;
    }
};

The acquire() method keeps calling test_and_set() in a loop until it gets back a false, which means the lock was free and has now been taken. The release() method simply sets the lock back to false so the next thread in line can grab it.

The Spinlock Trade-Off

Spinlocks are very fast when the wait time is extremely short (nanoseconds), because they avoid the overhead of putting a thread to sleep and waking it up. However, they waste CPU cycles when a thread ends up spinning for a long time, since the CPU is burning energy doing absolutely nothing useful. For this reason, spinlocks work best when locks are held for very brief periods, typically in kernel-level code.

The Underlying Principle

All three of these mechanisms, Test and Set, Compare and Swap, and Spinlocks, rely on the same underlying principle: using hardware guarantees to make certain operations indivisible. That indivisibility is what allows processes running on different cores to coordinate without corrupting shared data.

Without these hardware primitives, every software-based synchronization algorithm (Peterson's, Dekker's, Bakery) would need to make much stronger assumptions about memory ordering and instruction interleaving, making them fragile and often impractical on modern multi-core processors.

Hardware Sync Check

Question 1 of 1

Test your understanding of atomic hardware instructions.

A thread calls test_and_set(&lock) and receives a return value of true. What does this mean?

Hardware-Based Solutions for Process Synchronization
Test and Set
Compare and Swap
Test and Set vs Compare and Swap
Spinlocks
The Underlying Principle