* [RFT PATCH 0/2] win32: optimize emulation of condition variables
From: Paolo Bonzini @ 2010-06-07 13:38 UTC
  To: git; +Cc: j.sixt

I recently looked at git's condvar implementation for use in another
project, and found a couple of simple optimization opportunities.
We can drop the waiters_lock, and we can make broadcast asynchronous
if it is waking up only one thread.

I made two simple tests:

- 1 thread sending a broadcast every 10 msec, 10 threads calling
  pthread_cond_wait every 100 msec.  Timings are (average of three runs):

  before     2.094 us/wait      4.015 us/broadcast
  after      2.064 us/wait      2.883 us/broadcast

  i.e. most broadcasts and waits hit the fast path, and the few that
  don't likely avoid the rendezvous after the patch.  In this case the
  waiters_lock always hits its own fast path.  The speedup is
  mostly in broadcast, and comes mostly from the second patch.

- 1 thread sending a broadcast every 100 msec, 10 threads calling
  pthread_cond_wait every 10 msec.  Timings are:

  before     17.59 us/wait      192.2 us/broadcast
  after      8.959 us/wait      141.1 us/broadcast

  i.e. most broadcasts hit the slow path, and there will also be high
  contention on waiters_lock after the broadcast.  In this case the
  speedup comes from avoiding locks in the first patch.


I have tested these patches quite thoroughly outside git, but not as
part of it, so help with that would be appreciated.
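
For reference, here is a minimal sketch of the kind of harness the
first test describes (a reconstruction for illustration, not the actual
test code; helper names, the include path and the shutdown logic are
guesses, and the timing instrumentation around the marked calls is
omitted):

#include <windows.h>
#include "compat/win32/pthread.h"

#define WAITERS 10

static CRITICAL_SECTION mutex;
static pthread_cond_t cond;
static volatile LONG stop;

static DWORD WINAPI waiter(LPVOID arg)
{
	(void) arg;
	while (!stop) {
		EnterCriticalSection(&mutex);
		if (!stop)
			pthread_cond_wait(&cond, &mutex);	/* timed call */
		LeaveCriticalSection(&mutex);
		Sleep(100);	/* one wait every 100 msec */
	}
	return 0;
}

int main(void)
{
	HANDLE threads[WAITERS];
	int i;

	InitializeCriticalSection(&mutex);
	pthread_cond_init(&cond, NULL);
	for (i = 0; i < WAITERS; i++)
		threads[i] = CreateThread(NULL, 0, waiter, NULL, 0, NULL);

	for (i = 0; i < 1000; i++) {
		Sleep(10);	/* one broadcast every 10 msec */
		EnterCriticalSection(&mutex);
		pthread_cond_broadcast(&cond);	/* timed call */
		LeaveCriticalSection(&mutex);
	}

	/* shut down: wake any sleeping waiters, then join the threads */
	InterlockedExchange(&stop, 1);
	EnterCriticalSection(&mutex);
	pthread_cond_broadcast(&cond);
	LeaveCriticalSection(&mutex);
	WaitForMultipleObjects(WAITERS, threads, TRUE, INFINITE);
	return 0;
}

The second test is the same with the two sleep periods swapped.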

Thanks,

Paolo

Paolo Bonzini (2):
  win32: optimize condition variable implementation
  win32: optimize pthread_cond_broadcast

 compat/win32/pthread.c |   71 +++++++++++++++++++++++++----------------------
 compat/win32/pthread.h |    1 -
 2 files changed, 38 insertions(+), 34 deletions(-)


* [RFT PATCH 1/2] win32: optimize condition variable implementation
From: Paolo Bonzini @ 2010-06-07 13:38 UTC
  To: git; +Cc: j.sixt

The fact that the condition variable mutex must be held
(in this implementation) at the time of pthread_cond_signal
and pthread_cond_broadcast means that most of the time the
waiters_lock is useless.  There is exactly one place where
the mutex is not held, and an atomic decrement suffices
there.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
---
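        A small usage example of the calling convention this change
        relies on (illustration only, not code from the patch; the
        names are made up): signal and broadcast must run with the
        external mutex held, so cond->waiters cannot change
        concurrently.

	static CRITICAL_SECTION mutex;		/* the "external mutex" */
	static pthread_cond_t cond;
	static int ready;

	static void make_ready(void)
	{
		EnterCriticalSection(&mutex);
		ready = 1;			/* make the condition true... */
		pthread_cond_signal(&cond);	/* ...and signal while the mutex
						 * is held, as required by this
						 * implementation */
		LeaveCriticalSection(&mutex);
	}
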
 compat/win32/pthread.c |   54 +++++++++++++++++++++++++----------------------
 compat/win32/pthread.h |    1 -
 2 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index 010e875..1a38981 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -61,7 +61,6 @@ int pthread_cond_init(pthread_cond_t *cond, const void *unused)
 {
 	cond->waiters = 0;
 	cond->was_broadcast = 0;
-	InitializeCriticalSection(&cond->waiters_lock);
 
 	cond->sema = CreateSemaphore(NULL, 0, LONG_MAX, NULL);
 	if (!cond->sema)
@@ -81,17 +80,17 @@ int pthread_cond_destroy(pthread_cond_t *cond)
 {
 	CloseHandle(cond->sema);
 	CloseHandle(cond->continue_broadcast);
-	DeleteCriticalSection(&cond->waiters_lock);
 	return 0;
 }
 
 int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
 {
-	int last_waiter;
+	int num_waiters;
 
-	EnterCriticalSection(&cond->waiters_lock);
+	/*
+	 * This access is protected under the mutex.
+	 */
 	cond->waiters++;
-	LeaveCriticalSection(&cond->waiters_lock);
 
 	/*
 	 * Unlock external mutex and wait for signal.
@@ -105,17 +104,17 @@ int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
 	WaitForSingleObject(cond->sema, INFINITE);
 
 	/*
-	 * Decrease waiters count. If we are the last waiter, then we must
+	 * Decrease waiters count.  The mutex prevents concurrent increments,
+	 * so doing this decrement atomically is enough.
+	 */
+	num_waiters = InterlockedDecrement(&cond->waiters);
+
+	/* If we are the last waiter, then we must
 	 * notify the broadcasting thread that it can continue.
 	 * But if we continued due to cond_signal, we do not have to do that
 	 * because the signaling thread knows that only one waiter continued.
 	 */
-	EnterCriticalSection(&cond->waiters_lock);
-	cond->waiters--;
-	last_waiter = cond->was_broadcast && cond->waiters == 0;
-	LeaveCriticalSection(&cond->waiters_lock);
-
-	if (last_waiter) {
+	if (num_waiters == 0 && cond->was_broadcast) {
 		/*
 		 * cond_broadcast was issued while mutex was held. This means
 		 * that all other waiters have continued, but are contending
@@ -145,16 +144,17 @@ int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
  */
 int pthread_cond_signal(pthread_cond_t *cond)
 {
-	int have_waiters;
-
-	EnterCriticalSection(&cond->waiters_lock);
-	have_waiters = cond->waiters > 0;
-	LeaveCriticalSection(&cond->waiters_lock);
-
 	/*
-	 * Signal only when there are waiters
+	 * Signal only when there are waiters.  cond->waiters is
+	 * incremented by pthread_cond_wait under the external lock,
+	 * so we are safe about that.
+	 *
+	 * Waiting threads decrement it outside the external lock, but
+	 * only if another thread is executing pthread_cond_signal or
+	 * pthread_cond_broadcast---which means it also cannot be
+	 * decremented concurrently with this particular access.
 	 */
-	if (have_waiters)
+	if (cond->waiters > 0)
 		return ReleaseSemaphore(cond->sema, 1, NULL) ?
 			0 : err_win_to_posix(GetLastError());
 	else
@@ -168,12 +168,18 @@ int pthread_cond_signal(pthread_cond_t *cond)
  */
 int pthread_cond_broadcast(pthread_cond_t *cond)
 {
-	EnterCriticalSection(&cond->waiters_lock);
+	/*
+	 * As in pthread_cond_signal, access to cond->waiters and
+	 * cond->was_broadcast is locked via the external mutex.
+	 */
 
 	if ((cond->was_broadcast = cond->waiters > 0)) {
+		BOOLEAN result;
 		/* wake up all waiters */
-		ReleaseSemaphore(cond->sema, cond->waiters, NULL);
-		LeaveCriticalSection(&cond->waiters_lock);
+		result = ReleaseSemaphore(cond->sema, cond->waiters, NULL);
+		if (!result)
+			return err_win_to_posix(GetLastError());
+
 		/*
 		 * At this point all waiters continue. Each one takes its
 		 * slice of the semaphor. Now it's our turn to wait: Since
@@ -189,8 +195,6 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
 		 * without cond->waiters_lock held.
 		 */
 		cond->was_broadcast = 0;
-	} else {
-		LeaveCriticalSection(&cond->waiters_lock);
 	}
 	return 0;
 }
diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..f38c556 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -40,7 +40,6 @@ typedef int pthread_mutexattr_t;
 typedef struct {
 	LONG waiters;
 	int was_broadcast;
-	CRITICAL_SECTION waiters_lock;
 	HANDLE sema;
 	HANDLE continue_broadcast;
 } pthread_cond_t;
-- 
1.7.0.1


* [RFT PATCH 2/2] win32: optimize pthread_cond_broadcast
From: Paolo Bonzini @ 2010-06-07 13:38 UTC
  To: git; +Cc: j.sixt

If there is a single waiting thread, pthread_cond_signal is the
same as pthread_cond_broadcast and no extra synchronization is
necessary.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
---
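        For illustration, this is what broadcast effectively reduces to
        in the single-waiter case after this patch (a sketch, not code
        from the patch):

	static int broadcast_single_waiter(pthread_cond_t *cond)
	{
		/* cond->waiters == 1, so was_broadcast becomes 0 */
		cond->was_broadcast = 0;

		/* wake the one waiter, exactly like pthread_cond_signal */
		if (!ReleaseSemaphore(cond->sema, 1, NULL))
			return err_win_to_posix(GetLastError());

		/*
		 * No WaitForSingleObject(cond->continue_broadcast): the one
		 * waiter sees was_broadcast == 0 and never calls SetEvent().
		 */
		return 0;
	}
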
 compat/win32/pthread.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index 1a38981..d46a51c 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -172,9 +172,10 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
 	 * As in pthread_cond_signal, access to cond->waiters and
 	 * cond->was_broadcast is locked via the external mutex.
 	 */
-
-	if ((cond->was_broadcast = cond->waiters > 0)) {
+	if (cond->waiters > 0) {
 		BOOLEAN result;
+		cond->was_broadcast = cond->waiters > 1;
+
 		/* wake up all waiters */
 		result = ReleaseSemaphore(cond->sema, cond->waiters, NULL);
 		if (!result)
@@ -187,14 +188,14 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
 		 * yet. For this reason, we can be sure that no thread gets
 		 * a chance to eat *more* than one slice. OTOH, it means
 		 * that the last waiter must send us a wake-up.
+		 *
+		 * As an optimization, when there was exactly one waiter
+		 * broadcast is the same as signal and we can skip this step.
 		 */
-		WaitForSingleObject(cond->continue_broadcast, INFINITE);
-		/*
-		 * Since the external mutex is held, no thread can enter
-		 * cond_wait, and, hence, it is safe to reset this flag
-		 * without cond->waiters_lock held.
-		 */
-		cond->was_broadcast = 0;
+		if (cond->was_broadcast) {
+			WaitForSingleObject(cond->continue_broadcast, INFINITE);
+			cond->was_broadcast = 0;
+		}
 	}
 	return 0;
 }
-- 
1.7.0.1


* Re: [RFT PATCH 1/2] win32: optimize condition variable implementation
From: Johannes Sixt @ 2010-06-08 16:16 UTC
  To: Paolo Bonzini; +Cc: git

On 07.06.2010 15:38, Paolo Bonzini wrote:
>   int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
>   {
> -	int last_waiter;
> +	int num_waiters;
>
> -	EnterCriticalSection(&cond->waiters_lock);
> +	/*
> +	 * This access is protected under the mutex.
> +	 */
>   	cond->waiters++;
> -	LeaveCriticalSection(&cond->waiters_lock);
>
>   	/*
>   	 * Unlock external mutex and wait for signal.
> @@ -105,17 +104,17 @@ int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
>   	WaitForSingleObject(cond->sema, INFINITE);
>
>   	/*
> -	 * Decrease waiters count. If we are the last waiter, then we must
> +	 * Decrease waiters count.  The mutex prevents concurrent increments,
> +	 * so doing this decrement atomically is enough.
> +	 */
> +	num_waiters = InterlockedDecrement(&cond->waiters);
> +
> +	/* If we are the last waiter, then we must
>   	 * notify the broadcasting thread that it can continue.
>   	 * But if we continued due to cond_signal, we do not have to do that
>   	 * because the signaling thread knows that only one waiter continued.
>   	 */
> -	EnterCriticalSection(&cond->waiters_lock);
> -	cond->waiters--;
> -	last_waiter = cond->was_broadcast && cond->waiters == 0;
> -	LeaveCriticalSection(&cond->waiters_lock);
> -
> -	if (last_waiter) {
> +	if (num_waiters == 0 && cond->was_broadcast) {
>   		/*
>   		 * cond_broadcast was issued while mutex was held. This means
>   		 * that all other waiters have continued, but are contending

This is not correct. While it is not possible that two threads increment 
waiters at the same time due to the external mutex, it is still possible 
that one thread increments, and a different one decrements. You lost all 
provisions to avoid that.

Furthermore, waiters_lock not only protects waiters, but also the combined 
state of waiters and was_broadcast. You break this protection. See also here:

> @@ -168,12 +168,18 @@ int pthread_cond_signal(pthread_cond_t *cond)
>    */
>   int pthread_cond_broadcast(pthread_cond_t *cond)
>   {
> -	EnterCriticalSection(&cond->waiters_lock);
> +	/*
> +	 * As in pthread_cond_signal, access to cond->waiters and
> +	 * cond->was_broadcast is locked via the external mutex.
> +	 */
>
>   	if ((cond->was_broadcast = cond->waiters > 0)) {
> +		BOOLEAN result;
>   		/* wake up all waiters */
> -		ReleaseSemaphore(cond->sema, cond->waiters, NULL);
> -		LeaveCriticalSection(&cond->waiters_lock);
> +		result = ReleaseSemaphore(cond->sema, cond->waiters, NULL);
> +		if (!result)
> +			return err_win_to_posix(GetLastError());
> +
>   		/*
>   		 * At this point all waiters continue. Each one takes its
>   		 * slice of the semaphor. Now it's our turn to wait: Since

-- Hannes


* Re: [RFT PATCH 1/2] win32: optimize condition variable implementation
From: Paolo Bonzini @ 2010-06-08 16:27 UTC
  To: Johannes Sixt; +Cc: git

On 06/08/2010 06:16 PM, Johannes Sixt wrote:
> This is not correct. While it is not possible that two threads increment
> waiters at the same time due to the external mutex, it is still possible
> that one thread increments, and a different one decrements. You lost all
> provisions to avoid that.

Actually, the patch just relies more broadly on the preexisting 
assumptions of the code:

/*
  * IMPORTANT: This implementation requires that pthread_cond_signal
  * is called while the mutex is held that is used in the corresponding
  * pthread_cond_wait calls!
  */

/*
  * IMPORTANT: This implementation requires that pthread_cond_broadcast
  * is called while the mutex is held that is used in the corresponding
  * pthread_cond_wait calls!
  */

The decrement is indeed done without holding the mutex, but at that 
point the external mutex is held by the thread executing 
pthread_cond_signal/pthread_cond_broadcast, so that section of the code 
is still protected against concurrent increments.

> Furthermore, waiters_lock not only protects waiters, but also the
> combined state of waiters and was_broadcast.

Concurrent pthread_cond_broadcast calls are protected by the external 
mutex, and was_broadcast is similarly protected against concurrent 
increments of waiters.

Furthermore, access to was_broadcast is serialized between 
pthread_cond_wait and pthread_cond_broadcast through the semaphore and 
the event.  was_broadcast may change from 0 to 1 while pthread_cond_wait 
is not holding the external mutex, but then pthread_cond_wait is 
sleeping on the semaphore or will go to sleep very soon.  And it can 
change from 1 to 0 only after pthread_cond_wait has signaled the event, 
which means pthread_cond_wait will be waiting to reacquire the external 
mutex.

Paolo


* Re: [RFT PATCH 2/2] win32: optimize pthread_cond_broadcast
From: Johannes Sixt @ 2010-06-08 16:30 UTC
  To: Paolo Bonzini; +Cc: git

On 07.06.2010 15:38, Paolo Bonzini wrote:
> @@ -172,9 +172,10 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
>   	 * As in pthread_cond_signal, access to cond->waiters and
>   	 * cond->was_broadcast is locked via the external mutex.
>   	 */
> -
> -	if ((cond->was_broadcast = cond->waiters > 0)) {
> +	if (cond->waiters > 0) {
>   		BOOLEAN result;
> +		cond->was_broadcast = cond->waiters > 1;
> +

It is possible that you set was_broadcast to 1 here, while another 
thread still sees was_broadcast == 0 in cond_wait. As a consequence, 
this thread calls WaitForSingleObject() and waits for an event that 
never arrives, because the other thread does not call SetEvent(). But 
this is more a problem of your first patch than of this one, so you had 
better fix the first one before you go further into this one.

That said, as long as this series buys performance only at the expense of 
clarity, I'm rather opposed to it because we do not call cond_wait and 
cond_broadcast in time-critical paths.

-- Hannes


* Re: [RFT PATCH 2/2] win32: optimize pthread_cond_broadcast
From: Paolo Bonzini @ 2010-06-08 16:37 UTC
  To: Johannes Sixt; +Cc: git

On 06/08/2010 06:30 PM, Johannes Sixt wrote:
> On 07.06.2010 15:38, Paolo Bonzini wrote:
>> @@ -172,9 +172,10 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
>> * As in pthread_cond_signal, access to cond->waiters and
>> * cond->was_broadcast is locked via the external mutex.
>> */
>> -
>> - if ((cond->was_broadcast = cond->waiters > 0)) {
>> + if (cond->waiters > 0) {
>> BOOLEAN result;
>> + cond->was_broadcast = cond->waiters > 1;
>> +
>
> It is possible that you set was_broadcast to 1 here, while another
> thread still sees was_broadcast == 0 in cond_wait.

That still cannot happen, because pthread_cond_wait will be blocked on 
the semaphore until the ReleaseSemaphore.  The only race that exists is 
between broadcast/signal's ReleaseSemaphore and wait's 
WaitForSingleObject.  This is benign, and existed before my patch.  But 
in all cases the code before ReleaseSemaphore is serialized with 
respect to the code after wait's WaitForSingleObject.

> That said, as long as this series buys performance only at the expense
> of clarity, I'm rather opposed to it because we do not call cond_wait
> and cond_broadcast in time-critical paths.

Yes, it is indeed less clear.  I tried to compensate with comments, but 
apparently that was not enough.  As I said, I did this patch for 
another project where condvars are used in time-critical paths; if you 
do not want to keep it, that's not a problem.

Paolo


* Re: [RFT PATCH 2/2] win32: optimize pthread_cond_broadcast
From: Johannes Sixt @ 2010-06-08 18:46 UTC
  To: Paolo Bonzini; +Cc: git

On 08.06.2010 18:37, Paolo Bonzini wrote:
> On 06/08/2010 06:30 PM, Johannes Sixt wrote:
>> On 07.06.2010 15:38, Paolo Bonzini wrote:
>>> @@ -172,9 +172,10 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
>>> * As in pthread_cond_signal, access to cond->waiters and
>>> * cond->was_broadcast is locked via the external mutex.
>>> */
>>> -
>>> - if ((cond->was_broadcast = cond->waiters > 0)) {
>>> + if (cond->waiters > 0) {
>>> BOOLEAN result;
>>> + cond->was_broadcast = cond->waiters > 1;
>>> +
>>
>> It is possible that you set was_broadcast to 1 here, while another
>> thread still sees was_broadcast == 0 in cond_wait.
>
> That still cannot happen, because pthread_cond_wait will be locked on
> the semaphore until the ReleaseSemaphore. The only race that exists is
> between broadcast/signal's ReleaseSemaphore and wait's
> WaitForSingleObject. This is benign, and exists before my patch. But in
> all cases the code before ReleaseSemaphore is serialized WRT to the code
> after wait's WaitForSingleObject.

I think I've stared at the code long enough now to see that you are 
right.  All the counterexamples I could make up to disprove you failed 
to do so :-)

-- Hannes


* [PATCH 3/2] fix race in win32 pthread_cond_signal causing spurious wakeups
From: Paolo Bonzini @ 2010-06-13 10:16 UTC
  To: git; +Cc: j.sixt

This patch fixes a bug in the win32 condvar implementation.  The bug
existed originally in pthread_cond_signal before my other recent patches;
however, my patches extended the bug to pthread_cond_broadcast because
they made it behave exactly like pthread_cond_signal when there is only
one waiter.

The bug causes spurious wakeups in pthread_cond_wait.  These are explicitly
allowed by POSIX, but it's better to prevent them in the first place.
It occurs if pthread_cond_signal is called two times with only one waiter,
and the waiter is not scheduled between the two calls.  In this case, the
second call will find cond->waiters == 1 and ReleaseSemaphore will
make the semaphore's count positive, thus causing a spurious wakeup on
the next pthread_cond_wait.

The solution is to decrease cond->waiters in pthread_cond_signal.  This
maintains the invariant that _before_ the external mutex is unlocked
cond->waiters matches the waiters count of the semaphore.  This
invariant holds for all three functions.

Broadcasting does not have the problem and uses the same algorithm as
before.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
---
        I numbered this patch 3/2 because it's on top of the other two,
        but it can be backported to master pretty easily.
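
        To illustrate the race (a sketch, not part of the patch): one
        waiter is asleep on cond->sema, and the producer signals twice
        before the waiter is scheduled.

	static void signal_twice(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
	{
		EnterCriticalSection(mutex);
		pthread_cond_signal(cond);	/* waiters 1 -> 0,
						 * semaphore count 0 -> 1 */
		pthread_cond_signal(cond);	/* waiters == 0: a no-op now.
						 * Previously this found
						 * waiters == 1, the count went
						 * 1 -> 2, and the extra slice
						 * woke the next
						 * pthread_cond_wait spuriously */
		LeaveCriticalSection(mutex);
	}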

 compat/win32/pthread.c |   71 +++++++++++++++++++++++++++++++++-----------------
 1 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index d46a51c..9aaac89 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -103,32 +103,27 @@ int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
 	WaitForSingleObject(cond->sema, INFINITE);
 
 	/*
-	 * Decrease waiters count.  The mutex prevents concurrent increments,
-	 * so doing this decrement atomically is enough.
-	 */
-	num_waiters = InterlockedDecrement(&cond->waiters);
-
-	/* If we are the last waiter, then we must
-	 * notify the broadcasting thread that it can continue.
-	 * But if we continued due to cond_signal, we do not have to do that
-	 * because the signaling thread knows that only one waiter continued.
+	 * If the condvar was broadcast, then waiters cooperate to notify
+	 * the broadcasting thread that they have woken, so that it can
+	 * continue.  For cond_signal we do not have to do that because the
+	 * signaling thread knows that only one waiter continued.  Also,
+	 * cond_signal will decrement cond->waiters itself, to ensure it is
+	 * always a faithful reproduction of the semaphore's state.
 	 */
-	if (num_waiters == 0 && cond->was_broadcast) {
+	if (cond->was_broadcast) {
 		/*
-		 * cond_broadcast was issued while mutex was held. This means
-		 * that all other waiters have continued, but are contending
-		 * for the mutex at the end of this function because the
-		 * broadcasting thread did not leave cond_broadcast, yet.
-		 * (This is so that it can be sure that each waiter has
-		 * consumed exactly one slice of the semaphor.)
-		 * The last waiter must tell the broadcasting thread that it
-		 * can go on.
-		 */
-		SetEvent(cond->continue_broadcast);
-		/*
-		 * Now we go on to contend with all other waiters for
-		 * the mutex. Auf in den Kampf!
+		 * Decrease waiters count.  The mutex is not taken, so we have
+		 * to do this atomically.
+		 *
+		 * cond_broadcast was issued while mutex was held, so all
+		 * waiters contend for the mutex at the end of this function
+		 * until the broadcasting thread relinquishes it.  To ensure
+		 * each waiter consumes exactly one slice of the semaphore,
+		 * the broadcasting thread stops until it is told by the last
+		 * waiter that it can go on.
 		 */
+		if (InterlockedDecrement(&cond->waiters) == 0)
+			SetEvent(cond->continue_broadcast);
 	}
 	/* lock external mutex again */
 	EnterCriticalSection(mutex);
@@ -150,14 +144,15 @@ int pthread_cond_signal(pthread_cond_t *cond)
 	 * so we are safe about that.
 	 *
 	 * Waiting threads decrement it outside the external lock, but
-	 * only if another thread is executing pthread_cond_signal or
-	 * pthread_cond_broadcast---which means it also cannot be
-	 * decremented concurrently with this particular access.
+	 * only if another thread is executing pthread_cond_broadcast.
+	 * So, it also cannot be decremented concurrently with this
+	 * particular access.
 	 */
-	if (cond->waiters > 0)
+	if (cond->waiters > 0) {
+		cond->waiters--;
 		return ReleaseSemaphore(cond->sema, 1, NULL) ?
 			0 : err_win_to_posix(GetLastError());
-	else
+	} else
 		return 0;
 }
 
@@ -174,7 +169,16 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
 	 */
 	if (cond->waiters > 0) {
 		BOOLEAN result;
-		cond->was_broadcast = cond->waiters > 1;
+		LONG num_waiters = cond->waiters;
+		/*
+		 * As an optimization, when there was exactly one waiter
+		 * broadcast is the same as signal, so we use the asynchronous
+		 * algorithm that signal uses.
+		 */
+		if (num_waiters == 1)
+			cond->waiters = 0;
+		else
+			cond->was_broadcast = 1;
 
 		/* wake up all waiters */
-		result = ReleaseSemaphore(cond->sema, cond->waiters, NULL);
+		result = ReleaseSemaphore(cond->sema, num_waiters, NULL);
@@ -183,14 +187,11 @@ int pthread_cond_broadcast(pthread_cond_t *cond)
 
 		/*
 		 * At this point all waiters continue. Each one takes its
-		 * slice of the semaphor. Now it's our turn to wait: Since
+		 * slice of the semaphore. Now it's our turn to wait: Since
 		 * the external mutex is held, no thread can leave cond_wait,
 		 * yet. For this reason, we can be sure that no thread gets
 		 * a chance to eat *more* than one slice. OTOH, it means
 		 * that the last waiter must send us a wake-up.
-		 *
-		 * As an optimization, when there was exactly one waiter
-		 * broadcast is the same as signal and we can skip this step.
 		 */
 		if (cond->was_broadcast) {
 			WaitForSingleObject(cond->continue_broadcast, INFINITE);
-- 
1.7.0.1

