C++ classes for Windows User Mode Synchronization
The Windows API provides a set of synchronization mechanisms that can only be used to synchronize operations within a single process. These are called user-mode synchronization mechanisms.
The Harlinn.Common.Core library comes with C++ classes for working with the user-mode synchronization mechanisms that are implemented by the Windows C API.
The user-mode synchronization mechanisms can avoid an expensive round-trip to kernel mode when the object is not locked, or when the lock is released within a few thousand CPU cycles, which is most of the time. A thread that starts a real wait operation and gives up the rest of its time-slice must still enter kernel mode, since only the kernel can schedule a thread for execution.
CriticalSection
The CriticalSection class wraps a CRITICAL_SECTION struct, adding no additional data members. The class is not copy constructible, copy assignable, move constructible, or move assignable.
When a thread tries to acquire a lock on a critical section that is already locked, the thread first spins, retrying the acquisition without giving up its current time-slice. If the lock cannot be acquired before the spin count is exhausted, the thread goes to sleep and waits for the critical section to be released.
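The spin-then-sleep behaviour can be sketched in portable C++. This is only an illustration of the idea, not how CRITICAL_SECTION is implemented: SpinThenBlockLock and CountWithSpinThenBlockLock are hypothetical names, std::mutex stands in for the operating system primitive, and the default spin count of 4000 is an arbitrary value chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration only: not the CRITICAL_SECTION implementation.
class SpinThenBlockLock
{
    std::mutex mutex_;
    std::uint32_t spinCount_;
public:
    // 4000 is an arbitrary value picked for this sketch.
    explicit SpinThenBlockLock( std::uint32_t spinCount = 4000 )
        : spinCount_( spinCount )
    {
    }

    void lock( )
    {
        // Phase 1: spin, retrying without giving up the time-slice.
        for ( std::uint32_t i = 0; i < spinCount_; ++i )
        {
            if ( mutex_.try_lock( ) )
            {
                return;
            }
        }
        // Phase 2: the spin budget is used up, so block until the lock is free.
        mutex_.lock( );
    }

    bool try_lock( ) { return mutex_.try_lock( ); }
    void unlock( ) { mutex_.unlock( ); }
};

// Runs threadCount threads that each increment a shared counter
// `iterations` times under the lock, and returns the final count.
std::size_t CountWithSpinThenBlockLock( int threadCount, int iterations )
{
    assert( threadCount > 0 && iterations > 0 );
    SpinThenBlockLock lock;
    std::size_t counter = 0;
    std::vector<std::thread> threads;
    for ( int t = 0; t < threadCount; ++t )
    {
        threads.emplace_back( [&lock, &counter, iterations]( )
        {
            for ( int i = 0; i < iterations; ++i )
            {
                std::lock_guard<SpinThenBlockLock> guard( lock );
                ++counter;
            }
        } );
    }
    for ( auto& thread : threads )
    {
        thread.join( );
    }
    return counter;
}
```

Calling CountWithSpinThenBlockLock( 4, 1000 ) runs four threads and returns the number of increments performed under the lock. When the protected region is this short, most acquisitions should succeed during the spin phase.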
CriticalSection implements a synchronization mechanism that is similar to a mutex object, but it can only be used to synchronize the threads of a single process.
size_t counter = 0;
CriticalSection criticalSection;
criticalSection.Enter( );
ThreadGroup threadGroup;
for ( int i = 0; i < 100; ++i )
{
    threadGroup.Add( [i, &criticalSection, &counter]( )
    {
        auto id = i + 1;
        for ( int i = 0; i < 10; ++i )
        {
            PrintLn( "T{}: waiting", id );
            std::unique_lock lock( criticalSection );
            PrintLn( "T{}: acquired CriticalSection", id );
            ++counter;
            PrintLn( "T{}: value {}", id, counter );
        }
    } );
}
criticalSection.Leave( );
PrintLn( "Main thread waiting on background threads" );
threadGroup.join( );
PrintLn( "Final value {}", counter );
Constructors
CriticalSection has a single constructor:
explicit CriticalSection( UInt32 spinCount = DefaultSpinCount, bool noDebugInfo = true );
The spinCount argument specifies the spin count for the critical section.
The noDebugInfo argument specifies that the OS should not create debug information for the critical section.
The constructor uses InitializeCriticalSectionEx to initialize the CRITICAL_SECTION structure, and if the noDebugInfo argument is true, the constructor passes CRITICAL_SECTION_NO_DEBUG_INFO to the InitializeCriticalSectionEx function.
As of Windows Vista and Windows Server 2008, Microsoft changed the way InitializeCriticalSection works.
As far as I understand it, InitializeCriticalSection, InitializeCriticalSectionAndSpinCount, and InitializeCriticalSectionEx without the CRITICAL_SECTION_NO_DEBUG_INFO flag now allocate some memory for debug information in the process address space that is not released by DeleteCriticalSection. This causes the process to leak a tiny amount of memory for each critical section that is deleted.
TryEnter and try_lock
The TryEnter() function tries to acquire a lock on the critical section without blocking. If successful, the calling thread takes ownership of the critical section and must call Leave() or unlock() to release the lock. The try_lock() function just calls TryEnter().
Enter and lock
The Enter() function returns when the thread has acquired ownership of the lock on the critical section. The lock() function just calls Enter().
Leave and unlock
The Leave() function releases ownership of the lock on the critical section. Leave() must be called once for each successful call to TryEnter() or Enter(). The unlock() function just calls Leave().
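A Windows critical section can be entered again by the thread that already owns it, which is why every successful Enter() or TryEnter() needs its own Leave(). The sketch below shows that pairing rule without depending on the library: DemoCriticalSection is a hypothetical stand-in built on std::recursive_mutex that only mirrors the member names described above, and OtherThreadCanAcquire and DemonstrateBalancedLeave are helper names invented for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in that mirrors the member names described above.
class DemoCriticalSection
{
    std::recursive_mutex mutex_;
public:
    void Enter( ) { mutex_.lock( ); }
    bool TryEnter( ) { return mutex_.try_lock( ); }
    void Leave( ) { mutex_.unlock( ); }
    // Standard names, so std::lock_guard and std::unique_lock can be used.
    void lock( ) { Enter( ); }
    bool try_lock( ) { return TryEnter( ); }
    void unlock( ) { Leave( ); }
};

// Returns true if another thread could acquire the lock without blocking.
bool OtherThreadCanAcquire( DemoCriticalSection& criticalSection )
{
    bool acquired = false;
    std::thread probe( [&criticalSection, &acquired]( )
    {
        // Calls try_lock(); the destructor calls unlock() if it succeeded.
        std::unique_lock<DemoCriticalSection> lock( criticalSection, std::try_to_lock );
        acquired = lock.owns_lock( );
    } );
    probe.join( );
    return acquired;
}

struct BalanceResult
{
    bool whileHeldTwice;
    bool afterOneLeave;
    bool afterTwoLeaves;
};

// Acquires the lock twice on the calling thread, then releases it step by step,
// probing from a second thread after each step.
BalanceResult DemonstrateBalancedLeave( )
{
    DemoCriticalSection criticalSection;
    BalanceResult result{};
    criticalSection.Enter( );
    bool reentered = criticalSection.TryEnter( ); // same thread: succeeds
    assert( reentered );
    (void)reentered;
    result.whileHeldTwice = OtherThreadCanAcquire( criticalSection );
    criticalSection.Leave( ); // still owned once
    result.afterOneLeave = OtherThreadCanAcquire( criticalSection );
    criticalSection.Leave( ); // now fully released
    result.afterTwoLeaves = OtherThreadCanAcquire( criticalSection );
    return result;
}
```

The probe thread only gets the lock after the second Leave(). Using std::unique_lock or std::lock_guard instead of manual calls keeps the calls paired automatically.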
SlimReaderWriterLock
The SlimReaderWriterLock class wraps an SRWLOCK struct, adding no additional data members. The class is not copy constructible, copy assignable, move constructible, or move assignable. The default constructor initializes the SRWLOCK.
SlimReaderWriterLock is used to allow concurrent read access by multiple threads, while ensuring that a thread that writes to the protected resource has exclusive access, blocking other writers and readers.
Constructors
SlimReaderWriterLock has a single constructor:
SlimReaderWriterLock( ) noexcept;
The constructor calls InitializeSRWLock(…) to initialize the SRWLOCK structure for the object.
AcquireExclusive and lock
Acquires the slim reader/writer lock in exclusive mode.
AcquireShared and lock_shared
Acquires the slim reader/writer lock in shared mode.
TryAcquireExclusive and try_lock
Tries to acquire the slim reader/writer lock in exclusive mode.
TryAcquireShared and try_lock_shared
Tries to acquire the slim reader/writer lock in shared mode.
ReleaseExclusive and unlock
Releases a lock that was acquired in exclusive mode.
ReleaseShared and unlock_shared
Releases a lock that was acquired in shared mode.
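The lower-case member names listed above follow the standard library's naming for shared mutexes, so a type exposing them can be driven by std::unique_lock for exclusive access and std::shared_lock for shared access. The sketch below shows that usage pattern with std::shared_mutex standing in for SlimReaderWriterLock, so that it compiles without the library; SharedCounter and RunReadersAndWriters are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical example type; std::shared_mutex stands in for SlimReaderWriterLock.
class SharedCounter
{
    mutable std::shared_mutex lock_;
    std::size_t value_ = 0;
public:
    void Increment( )
    {
        // Exclusive mode: uses lock() / unlock().
        std::unique_lock<std::shared_mutex> exclusive( lock_ );
        ++value_;
    }
    std::size_t Read( ) const
    {
        // Shared mode: uses lock_shared() / unlock_shared().
        std::shared_lock<std::shared_mutex> shared( lock_ );
        return value_;
    }
};

// Starts the given number of writer and reader threads and returns the final value.
std::size_t RunReadersAndWriters( int writers, int readers, int iterations )
{
    assert( writers > 0 && readers >= 0 && iterations > 0 );
    SharedCounter counter;
    std::vector<std::thread> threads;
    for ( int w = 0; w < writers; ++w )
    {
        threads.emplace_back( [&counter, iterations]( )
        {
            for ( int i = 0; i < iterations; ++i )
            {
                counter.Increment( );
            }
        } );
    }
    for ( int r = 0; r < readers; ++r )
    {
        threads.emplace_back( [&counter, iterations]( )
        {
            std::size_t last = 0;
            for ( int i = 0; i < iterations; ++i )
            {
                // Readers may run concurrently with each other, never with a writer.
                std::size_t current = counter.Read( );
                assert( current >= last );
                last = current;
            }
        } );
    }
    for ( auto& thread : threads )
    {
        thread.join( );
    }
    return counter.Read( );
}
```

RunReadersAndWriters( 2, 4, 1000 ) returns the total number of increments made by the two writers, while the four readers observe values that never decrease.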
ConditionVariable
The ConditionVariable class wraps a CONDITION_VARIABLE structure, adding no additional data members. The class is not copy constructible, copy assignable, move constructible, or move assignable. The default constructor initializes the CONDITION_VARIABLE.
Condition variables are designed to let us wait for a change notification on a resource protected by a CriticalSection or SlimReaderWriterLock.
To demonstrate, here is a simple multi-producer, multi-consumer queue:
class SimpleQueue
{
    ConditionVariable queueEmpty_;
    ConditionVariable queueFull_;
    CriticalSection criticalSection_;
    size_t lastItemProduced_ = 0;
    size_t queueSize_ = 0;
    size_t startOffset_ = 0;
    bool closed_ = false;
public:
    static constexpr size_t MaxQueueSize = 50;
    using Container = std::array<size_t, MaxQueueSize>;
private:
    Container container_;
public:
    SimpleQueue( )
    {
    }
    void Close( )
    {
        {
            std::unique_lock lock( criticalSection_ );
            closed_ = true;
        }
        // Wake every waiting consumer and producer so they can observe closed_
        queueEmpty_.WakeAll( );
        queueFull_.WakeAll( );
    }
    bool Push( size_t item )
    {
        {
            std::unique_lock lock( criticalSection_ );
            // Wait while the queue is full, re-checking after every wake-up
            while ( queueSize_ == MaxQueueSize && closed_ == false )
            {
                queueFull_.Sleep( criticalSection_ );
            }
            if ( closed_ )
            {
                return false;
            }
            auto containerOffset = ( startOffset_ + queueSize_ ) % MaxQueueSize;
            container_[containerOffset] = item;
            queueSize_++;
        }
        queueEmpty_.Wake( );
        return true;
    }
private:
    // Must be called with criticalSection_ held
    size_t PopValue( )
    {
        auto result = container_[startOffset_];
        queueSize_--;
        startOffset_++;
        if ( startOffset_ == MaxQueueSize )
        {
            startOffset_ = 0;
        }
        return result;
    }
public:
    bool Pop( size_t& item )
    {
        bool result = false;
        {
            std::unique_lock lock( criticalSection_ );
            if ( closed_ == false )
            {
                // Wait while the queue is empty, re-checking after every wake-up
                while ( queueSize_ == 0 && closed_ == false )
                {
                    queueEmpty_.Sleep( criticalSection_ );
                }
            }
            if( queueSize_ )
            {
                item = PopValue( );
                result = true;
            }
        }
        // closed_ is not read here, since the lock is no longer held;
        // waking a producer after Close( ) is harmless
        if ( result )
        {
            queueFull_.Wake( );
        }
        return result;
    }
};
Items can be pushed on the queue if it is not closed, and items can be popped as long as the queue is not closed or there are items in the queue.
The Push(…) function takes an exclusive lock on the queue, and if the queue is full, it waits on the ConditionVariable queueFull_ by calling Sleep(criticalSection_).
Sleep(…) releases the specified critical section and starts the wait as a single atomic operation. The critical section is re-acquired before the call to Sleep(…) completes. This allows other threads to acquire the critical section and remove elements from the queue while the producer is waiting.
Before returning, the Push(…) function calls queueEmpty_.Wake( ) to release a single thread waiting for items to appear in the queue.
The logic behind Pop(…) is the same, except that this function waits if the queue is empty, and notifies waiting producers that there is room for more items after an item is removed from the queue.
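The wait pattern used by Push(…) and Pop(…) is: take the lock, re-check the condition in a loop, sleep while it does not hold, then notify the other side after releasing the lock. Here is the same pattern reduced to a single-slot mailbox in standard C++. std::mutex and std::condition_variable stand in for CriticalSection and ConditionVariable so the sketch compiles on its own; Mailbox and SumThroughMailbox are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical single-slot mailbox using the same wait pattern as the queue above.
class Mailbox
{
    std::mutex mutex_;
    std::condition_variable slotFilled_;
    std::condition_variable slotEmptied_;
    int value_ = 0;
    bool full_ = false;
public:
    void Put( int value )
    {
        {
            std::unique_lock<std::mutex> lock( mutex_ );
            // wait() releases the mutex while sleeping and re-acquires it
            // before returning; the loop guards against spurious wake-ups.
            while ( full_ )
            {
                slotEmptied_.wait( lock );
            }
            value_ = value;
            full_ = true;
        }
        slotFilled_.notify_one( );
    }

    int Take( )
    {
        int result = 0;
        {
            std::unique_lock<std::mutex> lock( mutex_ );
            while ( !full_ )
            {
                slotFilled_.wait( lock );
            }
            result = value_;
            full_ = false;
        }
        slotEmptied_.notify_one( );
        return result;
    }
};

// Sends 1..count through the mailbox and returns the sum seen by the consumer.
long long SumThroughMailbox( int count )
{
    assert( count >= 0 );
    Mailbox mailbox;
    long long sum = 0;
    std::thread consumer( [&mailbox, &sum, count]( )
    {
        for ( int i = 0; i < count; ++i )
        {
            sum += mailbox.Take( );
        }
    } );
    for ( int i = 1; i <= count; ++i )
    {
        mailbox.Put( i );
    }
    consumer.join( );
    return sum;
}
```

The condition is always re-checked in a while loop rather than with a single if, because a waiting thread can wake up without the condition being true.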
This queue is quite simple to use:
SimpleQueue queue;
// Initialize explicitly: a default-constructed std::atomic is uninitialized before C++20
std::atomic<size_t> generated{ 0 };
std::atomic<size_t> consumed{ 0 };
ThreadGroup producerThreadGroup;
ThreadGroup consumerThreadGroup;
for ( int i = 0; i < 4; ++i )
{
    producerThreadGroup.Add( [i, &queue, &generated]( )
    {
        while ( queue.Push( 1 ) )
        {
            ++generated;
        }
        printf( "Producer %d done.\n", i + 1 );
    } );
}
for ( int i = 0; i < 4; ++i )
{
    consumerThreadGroup.Add( [i, &queue, &consumed]( )
    {
        size_t value = 0;
        while ( queue.Pop( value ) )
        {
            ++consumed;
        }
        printf( "Consumer %d done.\n", i + 1 );
    } );
}
puts( "Main thread going to sleep" );
CurrentThread::Sleep( TimeSpan::FromSeconds( 2 ) );
puts( "Main thread closing the queue" );
queue.Close( );
puts( "Main thread waiting on producer threads" );
producerThreadGroup.join( );
puts( "Main thread waiting on consumer threads" );
consumerThreadGroup.join( );
size_t sent = generated;
size_t received = consumed;
printf( "Result: produced %zu values and consumed %zu values\n", sent, received );
Constructors
ConditionVariable has a single constructor:
ConditionVariable( ) noexcept;
The constructor calls InitializeConditionVariable(…) to initialize the CONDITION_VARIABLE structure for the object.
Wake
The Wake() function wakes a single thread waiting on the condition variable.
WakeAll
The WakeAll() function wakes all threads waiting on the condition variable.
Sleep
The Sleep function has the following overloads:
bool Sleep( const CriticalSection& criticalSection,
            DWORD timeoutInMillis = INFINITE ) const;
bool Sleep( const CriticalSection& criticalSection,
            const TimeSpan& timeout ) const;
bool Sleep( const SlimReaderWriterLock& slimReaderWriterLock,
            DWORD timeoutInMillis = INFINITE, bool sharedMode = false ) const;
bool Sleep( const SlimReaderWriterLock& slimReaderWriterLock,
            const TimeSpan& timeout, bool sharedMode = false ) const;
The first and second overloads sleep on the condition variable and release the specified critical section as an atomic operation.
The third and fourth overloads sleep on the condition variable and release the specified SlimReaderWriterLock as an atomic operation. If sharedMode is true, the lock must be held in shared mode when calling the function; otherwise the lock must be held in exclusive mode.
The timeout argument specifies the interval after which the function returns, regardless of the outcome of the Sleep. The function returns false if a timeout occurred, true otherwise.
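A timed wait is typically used so that a thread can give up or do other work when no notification arrives. As a rough sketch of the same return convention (true when the condition was met, false on timeout), here is a portable version using std::condition_variable::wait_for. WaitForFlag is a hypothetical function written for this illustration and does not use the library's Sleep overloads.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Waits up to `timeout` for a flag to be set by another thread.
// Returns true if the flag was set in time and false if the wait timed out.
// When `signal` is false no thread ever sets the flag, so the wait times out.
bool WaitForFlag( bool signal, std::chrono::milliseconds timeout )
{
    assert( timeout.count( ) >= 0 );
    std::mutex mutex;
    std::condition_variable condition;
    bool ready = false;

    std::thread signaller;
    if ( signal )
    {
        signaller = std::thread( [&mutex, &condition, &ready]( )
        {
            {
                std::lock_guard<std::mutex> lock( mutex );
                ready = true;
            }
            condition.notify_one( );
        } );
    }

    bool result = false;
    {
        std::unique_lock<std::mutex> lock( mutex );
        // The predicate overload re-checks `ready` after every wake-up and
        // returns its final value, so a timeout with ready == false yields false.
        result = condition.wait_for( lock, timeout, [&ready]( ) { return ready; } );
    }

    if ( signaller.joinable( ) )
    {
        signaller.join( );
    }
    return result;
}
```

With signal set to true and a generous timeout the function returns true; with signal set to false it returns false once the timeout has elapsed.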
SynchronizationBarrier
A synchronization barrier lets a group of threads wait for each other at a common point of execution: each thread blocks when it reaches the barrier, and when the last thread arrives, they all continue their execution.
The SynchronizationBarrier class wraps a SYNCHRONIZATION_BARRIER struct, adding no additional data members. The class is not copy constructible, copy assignable, move constructible, or move assignable. The constructor initializes the SYNCHRONIZATION_BARRIER structure.
Constructors
SynchronizationBarrier has a single constructor:
explicit SynchronizationBarrier( UInt32 totalThreads, Int32 spinCount = -1 );
The totalThreads argument specifies the number of threads that must enter the barrier before all the threads are allowed to continue.
The spinCount argument specifies the number of times threads will spin while waiting for other threads to arrive at the barrier. If this parameter is -1, the thread spins 2000 times. When the thread exceeds spinCount, the thread blocks unless it called Enter(…) with SynchronizationBarrierFlags::SpinOnly.
Enter
The Enter(…) function causes the calling thread to wait until the required number of threads have entered the barrier.
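To make the barrier semantics concrete, here is a minimal blocking-only barrier written with a mutex and a condition variable. It is a sketch of the concept, not of SYNCHRONIZATION_BARRIER: SimpleBarrier and AllArriveBeforeAnyContinue are hypothetical names, there is no spin phase, and returning true to the last arriving thread is modelled on the Win32 EnterSynchronizationBarrier convention (whether the library's Enter(…) exposes that value is not stated above).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, blocking-only barrier (no spin phase).
class SimpleBarrier
{
    std::mutex mutex_;
    std::condition_variable allArrived_;
    const std::size_t totalThreads_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
public:
    explicit SimpleBarrier( std::size_t totalThreads )
        : totalThreads_( totalThreads )
    {
        assert( totalThreads > 0 );
    }

    // Blocks until totalThreads_ threads have called Enter().
    // Returns true for the last thread to arrive and false for the others.
    bool Enter( )
    {
        std::unique_lock<std::mutex> lock( mutex_ );
        const std::size_t generation = generation_;
        if ( ++waiting_ == totalThreads_ )
        {
            // Last arrival: reset for reuse and release everyone.
            waiting_ = 0;
            ++generation_;
            lock.unlock( );
            allArrived_.notify_all( );
            return true;
        }
        // The generation counter guards against spurious wake-ups.
        while ( generation == generation_ )
        {
            allArrived_.wait( lock );
        }
        return false;
    }
};

// Each thread registers its arrival, enters the barrier, and then checks
// that every thread had arrived before it was allowed to continue.
bool AllArriveBeforeAnyContinue( std::size_t threadCount )
{
    SimpleBarrier barrier( threadCount );
    std::atomic<std::size_t> arrived{ 0 };
    std::atomic<std::size_t> lastArrivals{ 0 };
    std::atomic<bool> ok{ true };
    std::vector<std::thread> threads;
    for ( std::size_t t = 0; t < threadCount; ++t )
    {
        threads.emplace_back( [&]( )
        {
            ++arrived;
            if ( barrier.Enter( ) )
            {
                ++lastArrivals;
            }
            if ( arrived.load( ) != threadCount )
            {
                ok = false;
            }
        } );
    }
    for ( auto& thread : threads )
    {
        thread.join( );
    }
    return ok.load( ) && lastArrivals.load( ) == 1;
}
```

AllArriveBeforeAnyContinue( 8 ) returns true when no thread passed the barrier early and exactly one thread was reported as the last arrival.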
TimerQueue
TimerQueue is a wrapper around the Windows timer queue. The implementation makes it easy to specify the callback using anything that is invocable, and an arbitrary number of arguments can be passed to the callable in the same way that arguments are passed to std::thread.
size_t counter = 0;
EventWaitHandle event( true );
TimerQueue timerQueue;
auto timer = timerQueue.CreateTimer( 100, 100, TimerQueueTimerFlags::Default,
    [&counter, &event]( )
    {
        counter++;
        if ( counter == 5 )
        {
            event.Signal( );
        }
    } );
event.Wait( );
timer.Close( );
timerQueue.Close( );
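Accepting "anything invocable plus arguments" usually means copying the callable and its arguments into a stored object that can be invoked later without parameters. The sketch below shows one way this could be done in standard C++17. It is not taken from the TimerQueue implementation: MakeCallback and SumOfTicks are hypothetical names, and the stored arguments are passed as lvalues on each call because a periodic timer may invoke its callback more than once.

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper: packages a callable and decay-copied arguments
// into a parameterless callback that can be invoked repeatedly.
template<typename Function, typename... Args>
std::function<void( )> MakeCallback( Function&& function, Args&&... args )
{
    // std::make_tuple copies the arguments and unwraps std::ref, so
    // references must be requested explicitly, as with std::thread.
    return [fn = std::decay_t<Function>( std::forward<Function>( function ) ),
            stored = std::make_tuple( std::forward<Args>( args )... )]( ) mutable
    {
        std::apply( fn, stored );
    };
}

// Simulates `ticks` timer expirations that each add 5 to a total.
int SumOfTicks( int ticks )
{
    assert( ticks >= 0 );
    int total = 0;
    auto callback = MakeCallback(
        []( int& sum, int step ) { sum += step; },
        std::ref( total ), 5 );
    for ( int i = 0; i < ticks; ++i )
    {
        callback( ); // what a timer would do on each expiration
    }
    return total;
}
```

SumOfTicks( 5 ) invokes the stored callback five times. Passing total through std::ref makes the callback update the caller's variable, while the step value 5 is copied into the callback.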