C++ classes for Windows User Mode Synchronization

The Windows API provides a set of synchronization mechanisms that can only be used to synchronize operations within a single process. These are called user-mode synchronization mechanisms.

The Harlinn.Common.Core library comes with C++ classes for working with the user-mode synchronization mechanisms that are implemented by the Windows C API.

The user-mode synchronization mechanisms can avoid an expensive round-trip to kernel mode when an object is not locked, or when the lock is released within a few thousand CPU cycles, which is the case most of the time. A thread that starts a real wait operation, giving up the rest of its time-slice, must still enter kernel mode, since only the kernel can schedule a thread for execution.

CriticalSection

The CriticalSection class wraps a CRITICAL_SECTION struct, adding no additional data members. The class is neither copy constructible, copy assignable, move constructible, nor move assignable.

When a thread tries to acquire a lock on a critical section that is already locked, the thread first spins, repeatedly trying to acquire the lock without giving up its current time-slice. If the lock cannot be acquired before the spin count is exhausted, the thread goes to sleep, waiting for the critical section to be released.
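The spin-then-sleep strategy can be sketched portably in standard C++. This is not how CRITICAL_SECTION is implemented internally; it is a minimal sketch, assuming a std::mutex as the blocking fallback, and the SpinThenBlockMutex class, its spinCount parameter and the CountWithSpinLock helper are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of "spin first, then block"; not the Windows implementation.
class SpinThenBlockMutex
{
    std::mutex mutex_;
    unsigned spinCount_;
public:
    explicit SpinThenBlockMutex( unsigned spinCount = 4000 ) : spinCount_( spinCount ) {}
    SpinThenBlockMutex( const SpinThenBlockMutex& ) = delete;
    SpinThenBlockMutex& operator=( const SpinThenBlockMutex& ) = delete;

    void lock( )
    {
        // Spin phase: keep retrying without giving up the time-slice
        for ( unsigned i = 0; i < spinCount_; ++i )
        {
            if ( mutex_.try_lock( ) )
            {
                return;
            }
        }
        // Spin budget exhausted: block until the owner releases the lock
        mutex_.lock( );
    }
    bool try_lock( ) { return mutex_.try_lock( ); }
    void unlock( ) { mutex_.unlock( ); }
};

// Increments a shared counter from several threads and returns the final value.
std::size_t CountWithSpinLock( unsigned threadCount, unsigned incrementsPerThread )
{
    SpinThenBlockMutex mutex;
    std::size_t counter = 0;
    std::vector<std::thread> threads;
    for ( unsigned t = 0; t < threadCount; ++t )
    {
        threads.emplace_back( [&]( )
        {
            for ( unsigned i = 0; i < incrementsPerThread; ++i )
            {
                std::lock_guard<SpinThenBlockMutex> guard( mutex );
                ++counter;
            }
        } );
    }
    for ( auto& thread : threads )
    {
        thread.join( );
    }
    return counter;
}
```

Spinning only pays off when locks are held briefly; when the owner holds the lock for a long time, the spinning thread just burns CPU before it blocks anyway.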

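CriticalSection implements a synchronization mechanism similar to a mutex object, but it can only be used to synchronize the threads of a single process. In the following example, the main thread holds the critical section while it starts 100 threads, so none of them can increment the counter until the main thread calls Leave( ):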

size_t counter = 0;
CriticalSection criticalSection;
// Hold the lock while the threads start, so none of them can
// increment the counter before the main thread calls Leave( )
criticalSection.Enter( );
ThreadGroup threadGroup;
for ( int i = 0; i < 100; ++i )
{
    threadGroup.Add( [i, &criticalSection, &counter]( )
    {
        auto id = i + 1;
        for ( int j = 0; j < 10; ++j )
        {
            PrintLn( "T{}: waiting", id );
            // CriticalSection provides lock( ) and unlock( ), so it works with std::unique_lock
            std::unique_lock lock( criticalSection );
            PrintLn( "T{}: acquired CriticalSection", id );
            ++counter;
            PrintLn( "T{}: value {}", id, counter );
        } // the lock is released at the end of each iteration
    } );
}
criticalSection.Leave( );
PrintLn( "Main thread waiting on background threads" );
threadGroup.join( );
PrintLn( "Final value {}", counter );

Constructors

CriticalSection has a single constructor:

explicit CriticalSection( UInt32 spinCount = DefaultSpinCount, bool noDebugInfo = true );

The spinCount argument specifies how many times a thread spins, trying to acquire the lock, before it goes to sleep waiting for the critical section to be released.

The noDebugInfo argument specifies that the OS should not create debug information for the critical section.

The constructor uses InitializeCriticalSectionEx to initialize the CRITICAL_SECTION structure, and if the noDebugInfo argument is true, then the constructor will pass CRITICAL_SECTION_NO_DEBUG_INFO to the InitializeCriticalSectionEx function.

Starting with Windows Vista and Windows Server 2008, Microsoft changed the way InitializeCriticalSection works.

As far as I understand it, InitializeCriticalSection, InitializeCriticalSectionAndSpinCount, and InitializeCriticalSectionEx without CRITICAL_SECTION_NO_DEBUG_INFO now allocate some memory for debug information in the process address space, and this memory is not released by DeleteCriticalSection. This causes the process to leak a tiny amount of memory for each critical section that is deleted.

`TryEnter` and `try_lock`

The TryEnter function tries to acquire a lock on the critical section without blocking. If successful, the calling thread takes ownership of the critical section and must call Leave() or unlock() to release the lock. The try_lock() function just calls TryEnter().
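Because try_lock(), lock() and unlock() forward to TryEnter(), Enter() and Leave(), the class meets the standard Lockable requirements, so std::unique_lock with std::try_to_lock can be used for a non-blocking attempt. The sketch below is hedged: CriticalSectionLike is a hypothetical stand-in with the same member names, built on std::recursive_mutex so that it compiles on any platform, and TryUpdate is a hypothetical helper.

```cpp
#include <mutex>

// Hypothetical stand-in with the same member names as CriticalSection.
// std::recursive_mutex is used because a CRITICAL_SECTION may be re-entered by its owner.
class CriticalSectionLike
{
    std::recursive_mutex mutex_;
public:
    bool TryEnter( ) { return mutex_.try_lock( ); }
    void Enter( ) { mutex_.lock( ); }
    void Leave( ) { mutex_.unlock( ); }

    // Standard names forward to the Win32-style names
    bool try_lock( ) { return TryEnter( ); }
    void lock( ) { Enter( ); }
    void unlock( ) { Leave( ); }
};

// Tries to update value without blocking; returns whether the lock was acquired.
bool TryUpdate( CriticalSectionLike& cs, int& value )
{
    std::unique_lock<CriticalSectionLike> lock( cs, std::try_to_lock );
    if ( !lock.owns_lock( ) )
    {
        return false; // another thread owns the lock; do something else instead
    }
    ++value;
    return true; // unlock( ), and therefore Leave( ), runs when lock goes out of scope
}
```

std::unique_lock guarantees that every successful try_lock() is paired with exactly one unlock(), even if the protected code throws.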

`Enter` and `lock`

The Enter() function returns when the thread has acquired ownership of the lock on the critical section. The lock() function just calls Enter().

`Leave` and `unlock`

The Leave() function releases ownership of the lock on the critical section. Leave() must be called for each successful call to TryEnter() or Enter().
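A critical section is recursive: the owning thread can enter it again without blocking, and it stays owned until every Enter() has been matched by a Leave(). The portable sketch below uses std::recursive_mutex in place of CriticalSection to show this balancing with RAII guards; accountMutex, balance, Deposit, DepositTwice and LockIsFreeAfterBalancedReleases are hypothetical names.

```cpp
#include <mutex>
#include <thread>

// std::recursive_mutex stands in for CriticalSection: the owner may enter again
// without blocking, and the mutex is free only after a matching release for each entry.
std::recursive_mutex accountMutex;
int balance = 0;

void Deposit( int amount )
{
    std::lock_guard<std::recursive_mutex> guard( accountMutex ); // nested entry
    balance += amount;
} // matching release for the nested entry

void DepositTwice( int amount )
{
    std::lock_guard<std::recursive_mutex> guard( accountMutex ); // first entry
    Deposit( amount ); // re-enters while already owned
    Deposit( amount );
} // final release: the mutex is now free for other threads

// Returns true if another thread can acquire the mutex after DepositTwice( ) returns.
bool LockIsFreeAfterBalancedReleases( )
{
    DepositTwice( 10 );
    bool acquired = false;
    std::thread other( [&acquired]( )
    {
        acquired = accountMutex.try_lock( );
        if ( acquired )
        {
            accountMutex.unlock( );
        }
    } );
    other.join( );
    return acquired;
}
```

If one of the releases were skipped, the mutex would remain owned by the first thread and every other thread would block on it indefinitely.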

SlimReaderWriterLock

The SlimReaderWriterLock class wraps an SRWLOCK struct, adding no additional data members. The class is neither copy constructible, copy assignable, move constructible, nor move assignable. The default constructor initializes the SRWLOCK.

SlimReaderWriterLock allows multiple threads to read a protected resource concurrently, while ensuring that a thread writing to the resource has exclusive access, blocking both other writers and readers.
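Since SlimReaderWriterLock exposes lock()/unlock() and lock_shared()/unlock_shared(), it should work with std::unique_lock for writers and std::shared_lock for readers. The following is a minimal portable sketch of that reader/writer pattern, using std::shared_mutex as a stand-in for SlimReaderWriterLock; the Settings class is hypothetical.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

// std::shared_mutex stands in for SlimReaderWriterLock; both expose
// lock/unlock and lock_shared/unlock_shared, so the same std:: lock helpers apply.
class Settings
{
    mutable std::shared_mutex mutex_;
    std::map<std::string, std::string> values_;
public:
    // Readers take the lock in shared mode and may run concurrently
    std::optional<std::string> Get( const std::string& key ) const
    {
        std::shared_lock<std::shared_mutex> lock( mutex_ );
        auto it = values_.find( key );
        if ( it == values_.end( ) )
        {
            return std::nullopt;
        }
        return it->second;
    }
    // Writers take the lock in exclusive mode, blocking readers and other writers
    void Set( const std::string& key, std::string value )
    {
        std::unique_lock<std::shared_mutex> lock( mutex_ );
        values_[key] = std::move( value );
    }
};
```

A reader/writer lock is a good fit when reads greatly outnumber writes; with frequent writes, a plain exclusive lock is often just as fast.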

Constructors

SlimReaderWriterLock has a single constructor:

SlimReaderWriterLock( ) noexcept;

The constructor calls InitializeSRWLock(…) to initialize the SRWLOCK structure for the object.

`AcquireExclusive` and `lock`

Acquires the slim reader/writer lock in exclusive mode.

`AcquireShared` and `lock_shared`

Acquires the slim reader/writer lock in shared mode.

`TryAcquireExclusive` and `try_lock`

Tries to acquire the slim reader/writer lock in exclusive mode.

`TryAcquireShared` and `try_lock_shared`

Tries to acquire the slim reader/writer lock in shared mode.

`ReleaseExclusive` and `unlock`

Releases a lock that was acquired in exclusive mode.

`ReleaseShared` and `unlock_shared`

Releases a lock that was acquired in shared mode.
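While one thread holds the lock in exclusive mode, an attempt to acquire it in shared mode from another thread fails without blocking. The attempt has to come from a different thread: slim reader/writer locks cannot be acquired recursively, and the same restriction applies to std::shared_mutex. This portable sketch uses std::shared_mutex in place of SlimReaderWriterLock, with comments mapping each call to the corresponding member listed above; ReaderSucceedsDuringWrite is a hypothetical helper.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>

// Returns whether a reader on another thread managed to take a shared lock
// while this thread held the lock in exclusive mode.
bool ReaderSucceedsDuringWrite( )
{
    std::shared_mutex mutex; // stands in for SlimReaderWriterLock
    std::unique_lock<std::shared_mutex> writer( mutex ); // AcquireExclusive / lock
    bool readerGotLock = true;
    std::thread reader( [&]( )
    {
        // TryAcquireShared / try_lock_shared: returns immediately
        std::shared_lock<std::shared_mutex> lock( mutex, std::try_to_lock );
        readerGotLock = lock.owns_lock( );
        // if acquired, ReleaseShared / unlock_shared runs when lock goes out of scope
    } );
    reader.join( );
    writer.unlock( ); // ReleaseExclusive / unlock
    return readerGotLock;
}
```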

ConditionVariable

The ConditionVariable class wraps a CONDITION_VARIABLE structure, adding no additional data members. The class is neither copy constructible, copy assignable, move constructible, nor move assignable. The default constructor initializes the CONDITION_VARIABLE.

Condition variables let a thread wait for notification of a change to a resource protected by a CriticalSection or a SlimReaderWriterLock.

To demonstrate, here is a simple multi-producer, multi-consumer queue:

#include <array>
#include <mutex>
// CriticalSection and ConditionVariable come from Harlinn.Common.Core

class SimpleQueue
{
    ConditionVariable queueEmpty_;      // consumers wait here while the queue is empty
    ConditionVariable queueFull_;       // producers wait here while the queue is full
    CriticalSection criticalSection_;   // protects all the members below
    size_t queueSize_ = 0;
    size_t startOffset_ = 0;
    bool closed_ = false;
public:
    static constexpr size_t MaxQueueSize = 50;
    using Container = std::array<size_t, MaxQueueSize>;
private:
    Container container_;               // circular buffer holding the items
public:
    SimpleQueue( ) = default;

    void Close( )
    {
        {
            std::unique_lock lock( criticalSection_ );
            closed_ = true;
        }
        // Wake every waiting producer and consumer so they can observe closed_
        queueEmpty_.WakeAll( );
        queueFull_.WakeAll( );
    }

    bool Push( size_t item )
    {
        {
            std::unique_lock lock( criticalSection_ );
            // Sleep( ) releases criticalSection_ while waiting
            // and re-acquires it before returning
            while ( queueSize_ == MaxQueueSize && closed_ == false )
            {
                queueFull_.Sleep( criticalSection_ );
            }
            if ( closed_ )
            {
                return false;
            }
            auto containerOffset = ( startOffset_ + queueSize_ ) % MaxQueueSize;
            container_[containerOffset] = item;
            queueSize_++;
        }
        queueEmpty_.Wake( );   // wake one waiting consumer
        return true;
    }
private:
    // Removes the oldest item; the caller must hold criticalSection_
    size_t PopValue( )
    {
        auto result = container_[startOffset_];
        queueSize_--;
        startOffset_++;
        if ( startOffset_ == MaxQueueSize )
        {
            startOffset_ = 0;
        }
        return result;
    }
public:
    bool Pop( size_t& item )
    {
        bool result = false;
        bool closed = false;
        {
            std::unique_lock lock( criticalSection_ );
            while ( queueSize_ == 0 && closed_ == false )
            {
                queueEmpty_.Sleep( criticalSection_ );
            }
            if ( queueSize_ )
            {
                item = PopValue( );
                result = true;
            }
            // Copy closed_ while holding the lock; reading it after
            // the lock is released would race with Close( )
            closed = closed_;
        }
        if ( result && closed == false )
        {
            queueFull_.Wake( );   // wake one waiting producer
        }
        return result;
    }
};

Items can be pushed onto the queue as long as it is not closed, and items can be popped as long as the queue is not closed or still contains items.

The Push(…) function acquires the critical section, and if the queue is full, it waits on the queueFull_ condition variable by calling Sleep(criticalSection_). Sleep(…) releases the critical section and starts the wait as a single atomic operation, and re-acquires the critical section before it returns. While the producer waits, other threads can acquire the critical section and remove elements from the queue. The wait is done in a loop because a woken thread cannot assume that the condition it waited for holds: condition variables are subject to spurious wakeups, and another thread may have filled the queue again before the woken thread re-acquired the critical section. After releasing the critical section, Push(…) calls queueEmpty_.Wake( ) to wake a single thread waiting for items to appear in the queue.

The logic behind Pop(…) is the same, except that it waits while the queue is empty, and after removing an item it wakes a waiting producer, since there is now room for another item.

This queue is quite simple to use:

SimpleQueue queue;
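// Explicitly zero-initialized: before C++20, a default-constructed
// std::atomic holds an indeterminate value
std::atomic<size_t> generated{ 0 };
std::atomic<size_t> consumed{ 0 };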
ThreadGroup producerThreadGroup;
ThreadGroup consumerThreadGroup;

for ( int i = 0; i < 4; ++i )
{
    producerThreadGroup.Add( [i, &queue, &generated]( )
    {
        while ( queue.Push( 1 ) )
        {
            ++generated;
        }
        printf( "Producer %d done.\n", i + 1 );
    } );
}
for ( int i = 0; i < 4; ++i )
{
    consumerThreadGroup.Add( [i, &queue, &consumed]( )
    {
        size_t value = 0;
        while ( queue.Pop( value ) )
        {
            ++consumed;
        }
        printf( "Consumer %d done.\n", i + 1 );
    } );
}

puts( "Main thread going to sleep" );
CurrentThread::Sleep( TimeSpan::FromSeconds( 2 ) );
puts( "Main thread closing the queue" );
queue.Close( );
puts( "Main thread waiting on producer threads" );
producerThreadGroup.join( );
puts( "Main thread waiting on consumer threads" );
consumerThreadGroup.join( );
size_t sent = generated;
size_t received = consumed;
printf( "Result: produced %zu values and consumed %zu values\n", sent, received );

Constructors

ConditionVariable has a single constructor:

ConditionVariable( ) noexcept;

The constructor calls InitializeConditionVariable(…) to initialize the CONDITION_VARIABLE structure for the object.

Wake

The Wake() function wakes a single thread waiting on the condition variable.

WakeAll

The WakeAll() function wakes all threads waiting on the condition variable.

Sleep

The Sleep function has the following overloads:

bool Sleep( const CriticalSection& criticalSection,
            DWORD timeoutInMillis = INFINITE ) const;
bool Sleep( const CriticalSection& criticalSection,
            const TimeSpan& timeout ) const;
bool Sleep( const SlimReaderWriterLock& slimReaderWriterLock,
            DWORD timeoutInMillis = INFINITE, bool sharedMode = false ) const;
bool Sleep( const SlimReaderWriterLock& slimReaderWriterLock,
            const TimeSpan& timeout, bool sharedMode = false ) const;

The first and second overloads sleep on the condition variable and release the specified critical section as an atomic operation. The critical section is re-acquired before the function returns.

The third and fourth overloads sleep on the condition variable and release the specified SlimReaderWriterLock as an atomic operation. If sharedMode is true, the lock must be held in shared mode when calling the function; otherwise, it must be held in exclusive mode. The lock is re-acquired in the same mode before the function returns.
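Waiting while holding a reader/writer lock in shared mode has a direct portable counterpart: std::condition_variable_any can wait on a std::shared_lock, which releases and re-acquires a std::shared_mutex in shared mode. The sketch below is only an analogy for Sleep(slimReaderWriterLock, INFINITE, true) under that assumption; the Gate struct and WaitForGateAsReader function are hypothetical.

```cpp
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

struct Gate
{
    std::shared_mutex mutex;             // stands in for SlimReaderWriterLock
    std::condition_variable_any changed; // stands in for ConditionVariable
    bool open = false;
};

// Returns true once a reader, waiting in shared mode, has observed the gate opening.
bool WaitForGateAsReader( )
{
    Gate gate;
    bool readerSawOpen = false;
    std::thread reader( [&]( )
    {
        std::shared_lock<std::shared_mutex> lock( gate.mutex ); // shared mode
        // The loop guards against spurious wakeups
        while ( !gate.open )
        {
            gate.changed.wait( lock ); // releases the shared lock while waiting
        }
        readerSawOpen = true; // the shared lock is held again here
    } );
    {
        std::unique_lock<std::shared_mutex> writer( gate.mutex ); // exclusive mode
        gate.open = true;
    }
    gate.changed.notify_all( ); // comparable to WakeAll( )
    reader.join( );
    return readerSawOpen;
}
```

The writer changes the flag only under the exclusive lock, so the reader cannot miss the change between checking gate.open and starting to wait.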

The timeout argument specifies the interval after which the function returns, even if the condition variable has not been woken. The function returns false if the timeout elapsed, and true otherwise.
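A true return value only means that the wait ended before the timeout, not that the awaited condition holds, so a timed wait is normally combined with a deadline and a loop that rechecks the condition. The sketch below illustrates this with the standard library: SleepFor is a hypothetical helper that mirrors the documented return convention of Sleep(…, timeout) using std::condition_variable::wait_for, and WaitUntilReady is likewise hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper mirroring the documented contract of Sleep( lock, timeout ):
// returns false if the timeout elapsed, true if the wait ended otherwise.
bool SleepFor( std::condition_variable& cv, std::unique_lock<std::mutex>& lock,
               std::chrono::milliseconds timeout )
{
    return cv.wait_for( lock, timeout ) == std::cv_status::no_timeout;
}

// Waits until ready is true or the deadline passes; returns the final state of ready.
bool WaitUntilReady( std::condition_variable& cv, std::mutex& mutex,
                     const bool& ready, std::chrono::milliseconds timeout )
{
    auto deadline = std::chrono::steady_clock::now( ) + timeout;
    std::unique_lock<std::mutex> lock( mutex );
    while ( !ready )
    {
        // Recompute the remaining time so spurious wakeups do not extend the wait
        auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - std::chrono::steady_clock::now( ) );
        if ( remaining.count( ) <= 0 || !SleepFor( cv, lock, remaining ) )
        {
            return ready; // timed out: report whatever the condition is now
        }
    }
    return true;
}
```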

SynchronizationBarrier

A synchronization barrier makes a group of threads wait at a particular point in their execution until every thread in the group has arrived; when the last thread arrives, they all continue their execution.

The SynchronizationBarrier class wraps a SYNCHRONIZATION_BARRIER struct, adding no additional data members. The class is neither copy constructible, copy assignable, move constructible, nor move assignable. The constructor initializes the SYNCHRONIZATION_BARRIER structure.

Constructors

SynchronizationBarrier has a single constructor:

explicit SynchronizationBarrier( UInt32 totalThreads, Int32 spinCount = -1 );

The totalThreads argument specifies the number of threads that must enter the barrier before all the threads are allowed to continue.

The spinCount argument specifies the number of times threads will spin while waiting for other threads to arrive at the barrier. If this parameter is -1, the thread spins 2000 times. When the thread exceeds spinCount, the thread blocks unless it called Enter(…) with SynchronizationBarrierFlags::SpinOnly.

Enter

The Enter(…) function causes the calling thread to wait until the required number of threads have entered the barrier.
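The behavior of Enter(…) can be illustrated with a small portable barrier built from a mutex and a condition variable. This is a sketch, not the Windows implementation: SimpleBarrier has no spin phase, and it and the SumAfterBarrier helper are hypothetical names. A generation counter lets the same barrier separate consecutive phases.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class SimpleBarrier
{
    std::mutex mutex_;
    std::condition_variable allArrived_;
    const std::size_t totalThreads_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
public:
    explicit SimpleBarrier( std::size_t totalThreads ) : totalThreads_( totalThreads ) {}

    // Blocks until totalThreads threads have entered; returns true for the last one.
    bool Enter( )
    {
        std::unique_lock<std::mutex> lock( mutex_ );
        auto generation = generation_;
        if ( ++waiting_ == totalThreads_ )
        {
            waiting_ = 0;
            ++generation_; // start the next phase and release everybody
            allArrived_.notify_all( );
            return true;
        }
        allArrived_.wait( lock, [&] { return generation != generation_; } );
        return false;
    }
};

// Phase 1: each thread writes its own slot. Phase 2: each thread sums all slots.
std::vector<int> SumAfterBarrier( std::size_t threadCount )
{
    SimpleBarrier barrier( threadCount );
    std::vector<int> slots( threadCount, 0 );
    std::vector<int> sums( threadCount, 0 );
    std::vector<std::thread> threads;
    for ( std::size_t t = 0; t < threadCount; ++t )
    {
        threads.emplace_back( [&, t]( )
        {
            slots[t] = static_cast<int>( t ) + 1;
            barrier.Enter( ); // no thread reads slots until every thread has written
            for ( int value : slots )
            {
                sums[t] += value;
            }
        } );
    }
    for ( auto& thread : threads )
    {
        thread.join( );
    }
    return sums;
}
```

Without the barrier, a fast thread could sum the slots before slower threads had written theirs.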

TimerQueue

TimerQueue is a wrapper around the Windows timer queue. It makes it easy to specify the callback using anything that is invocable, and an arbitrary number of arguments can be passed to the callback in the same way arguments are passed to std::thread.

size_t counter = 0;
EventWaitHandle event( true );

TimerQueue timerQueue;
auto timer = timerQueue.CreateTimer( 100, 100, TimerQueueTimerFlags::Default, 
                                    [&counter,&event]( )
{
    counter++;
    if ( counter == 5 )
    {
        event.Signal( );
    }
} );

event.Wait( );
timer.Close( );
timerQueue.Close( );
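To show how "any invocable plus arguments, as with std::thread" can be implemented, here is a portable sketch of a periodic timer. It is not how TimerQueue is built (it uses a dedicated std::thread rather than the Windows timer queue), and PeriodicTimer and CountTicks are hypothetical names. The arguments are decay-copied into a tuple and applied to the callable on every tick; as with std::thread, std::ref passes an argument by reference.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <tuple>
#include <utility>

class PeriodicTimer
{
    std::mutex mutex_;
    std::condition_variable stopRequested_;
    bool stopped_ = false;
    std::thread worker_;
public:
    template<typename F, typename... Args>
    PeriodicTimer( std::chrono::milliseconds period, F&& function, Args&&... args )
    {
        // Decay-copy the callable and its arguments, like std::thread does
        std::function<void( )> callback =
            [f = std::forward<F>( function ),
             arguments = std::make_tuple( std::forward<Args>( args )... )]( ) mutable
            {
                std::apply( f, arguments );
            };
        worker_ = std::thread( [this, period, callback = std::move( callback )]( )
        {
            std::unique_lock<std::mutex> lock( mutex_ );
            // wait_for returns true when stopped_ is set, false when the period elapses
            while ( !stopRequested_.wait_for( lock, period, [this] { return stopped_; } ) )
            {
                lock.unlock( );
                callback( ); // run the callback without holding the timer's lock
                lock.lock( );
            }
        } );
    }
    PeriodicTimer( const PeriodicTimer& ) = delete;
    PeriodicTimer& operator=( const PeriodicTimer& ) = delete;

    void Close( )
    {
        {
            std::lock_guard<std::mutex> lock( mutex_ );
            stopped_ = true;
        }
        stopRequested_.notify_all( );
        if ( worker_.joinable( ) )
        {
            worker_.join( );
        }
    }
    ~PeriodicTimer( ) { Close( ); }
};

// Waits until the timer has ticked at least target times, then returns the count.
int CountTicks( int target )
{
    std::mutex mutex;
    std::condition_variable reached;
    int counter = 0;
    PeriodicTimer timer( std::chrono::milliseconds( 10 ),
        [&mutex, &reached, target]( int& value )
        {
            std::lock_guard<std::mutex> lock( mutex );
            if ( ++value == target )
            {
                reached.notify_one( );
            }
        },
        std::ref( counter ) );
    {
        std::unique_lock<std::mutex> lock( mutex );
        reached.wait( lock, [&] { return counter >= target; } );
    }
    timer.Close( );
    std::lock_guard<std::mutex> lock( mutex );
    return counter;
}
```

This sketch measures each period from the end of the previous callback, so the ticks drift if callbacks are slow; a timer queue schedules ticks independently of how long the callbacks take.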
