All of lore.kernel.org
* [PATCH 0/2 v2] locking/qrwlock: Fix interrupt handling problem
@ 2015-06-09 15:19 Waiman Long
  2015-06-09 15:19 ` [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code Waiman Long
  2015-06-09 15:19 ` [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING Waiman Long
  0 siblings, 2 replies; 10+ messages in thread
From: Waiman Long @ 2015-06-09 15:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Scott J Norton, Douglas Hatch, Waiman Long

v1->v2:
 - Add microbenchmark data for the second patch

This patch series contains two qrwlock patches. The first is a resend
of the patch that I sent a few weeks ago. The second optimizes the
writer slowpath.

Waiman Long (2):
  locking/qrwlock: Fix bug in interrupt handling code
  locking/qrwlock: Don't contend with readers when setting _QW_WAITING

 include/asm-generic/qrwlock.h |    4 +-
 kernel/locking/qrwlock.c      |   42 +++++++++++++++++++++++++++++++---------
 2 files changed, 34 insertions(+), 12 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code
  2015-06-09 15:19 [PATCH 0/2 v2] locking/qrwlock: Fix interrupt handling problem Waiman Long
@ 2015-06-09 15:19 ` Waiman Long
  2015-06-11 14:21   ` Will Deacon
  2015-06-09 15:19 ` [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING Waiman Long
  1 sibling, 1 reply; 10+ messages in thread
From: Waiman Long @ 2015-06-09 15:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Scott J Norton, Douglas Hatch, Waiman Long

The qrwlock is fair in process context, but it becomes unfair in
interrupt context in order to support use cases like the tasklist_lock.
However, the unfair code in the interrupt context has a problem that
may cause a deadlock.

The fast path increments the reader count. In interrupt context, a
reader in the slowpath will spin until the writer releases the lock.
However, if other readers hold the lock and the writer is merely in
the waiting state, the writer will never get the write lock because
that interrupt-context reader has already incremented the count. This
causes a deadlock.

This patch fixes the problem by checking the state of the
reader/writer count retrieved in the fast path. If the writer is in
the waiting state, the reader will get the lock immediately and
return. Otherwise, it will spin until the writer releases the lock,
as before.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/asm-generic/qrwlock.h |    4 ++--
 kernel/locking/qrwlock.c      |   14 ++++++++------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 6383d54..865d021 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -36,7 +36,7 @@
 /*
  * External function declarations
  */
-extern void queue_read_lock_slowpath(struct qrwlock *lock);
+extern void queue_read_lock_slowpath(struct qrwlock *lock, u32 cnts);
 extern void queue_write_lock_slowpath(struct qrwlock *lock);
 
 /**
@@ -105,7 +105,7 @@ static inline void queue_read_lock(struct qrwlock *lock)
 		return;
 
 	/* The slowpath will decrement the reader count, if necessary. */
-	queue_read_lock_slowpath(lock);
+	queue_read_lock_slowpath(lock, cnts);
 }
 
 /**
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 00c12bb..d7d7557 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -43,22 +43,24 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
  * queue_read_lock_slowpath - acquire read lock of a queue rwlock
  * @lock: Pointer to queue rwlock structure
  */
-void queue_read_lock_slowpath(struct qrwlock *lock)
+void queue_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 {
-	u32 cnts;
-
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
 	if (unlikely(in_interrupt())) {
 		/*
-		 * Readers in interrupt context will spin until the lock is
-		 * available without waiting in the queue.
+		 * Readers in interrupt context will get the lock immediately
+		 * if the writer is just waiting (not holding the lock yet)
+		 * or they will spin until the lock is available without
+		 * waiting in the queue.
 		 */
-		cnts = smp_load_acquire((u32 *)&lock->cnts);
+		if ((cnts & _QW_WMASK) != _QW_LOCKED)
+			return;
 		rspin_until_writer_unlock(lock, cnts);
 		return;
 	}
+
 	atomic_sub(_QR_BIAS, &lock->cnts);
 
 	/*
-- 
1.7.1



* [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  2015-06-09 15:19 [PATCH 0/2 v2] locking/qrwlock: Fix interrupt handling problem Waiman Long
  2015-06-09 15:19 ` [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code Waiman Long
@ 2015-06-09 15:19 ` Waiman Long
  2015-06-10  7:35   ` Ingo Molnar
  2015-06-19 17:59   ` [tip:locking/core] locking/qrwlock: Don't " tip-bot for Waiman Long
  1 sibling, 2 replies; 10+ messages in thread
From: Waiman Long @ 2015-06-09 15:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Scott J Norton, Douglas Hatch, Waiman Long

The current cmpxchg() loop that sets the _QW_WAITING flag for writers
in queue_write_lock_slowpath() contends with incoming readers, possibly
causing wasteful extra cmpxchg() operations. This patch changes the
code to do a byte cmpxchg() to eliminate contention with new readers.

A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:

With R:W ratio = 5:1

	Threads	   w/o patch	with patch	% change
	-------	   ---------	----------	--------
	   2	     990 	    895		  -9.6%
	   3	    2136 	   1912		 -10.5%
	   4	    3166	   2830		 -10.6%
	   5	    3953	   3629		  -8.2%
	   6	    4628	   4405		  -4.8%
	   7	    5344	   5197		  -2.8%
	   8	    6065	   6004		  -1.0%
	   9	    6826	   6811		  -0.2%
	  10	    7599	   7599		   0.0%
	  15	    9757	   9766		  +0.1%
	  20	   13767	  13817		  +0.4%

With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/qrwlock.c |   28 ++++++++++++++++++++++++----
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index d7d7557..559198a 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -22,6 +22,26 @@
 #include <linux/hardirq.h>
 #include <asm/qrwlock.h>
 
+/*
+ * This internal data structure is used for optimizing access to some of
+ * the subfields within the atomic_t cnts.
+ */
+struct __qrwlock {
+	union {
+		atomic_t cnts;
+		struct {
+#ifdef __LITTLE_ENDIAN
+			u8 wmode;	/* Writer mode   */
+			u8 rcnts[3];	/* Reader counts */
+#else
+			u8 rcnts[3];	/* Reader counts */
+			u8 wmode;	/* Writer mode   */
+#endif
+		};
+	};
+	arch_spinlock_t	lock;
+};
+
 /**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock  : Pointer to queue rwlock structure
@@ -109,10 +129,10 @@ void queue_write_lock_slowpath(struct qrwlock *lock)
 	 * or wait for a previous writer to go away.
 	 */
 	for (;;) {
-		cnts = atomic_read(&lock->cnts);
-		if (!(cnts & _QW_WMASK) &&
-		    (atomic_cmpxchg(&lock->cnts, cnts,
-				    cnts | _QW_WAITING) == cnts))
+		struct __qrwlock *l = (struct __qrwlock *)lock;
+
+		if (!READ_ONCE(l->wmode) &&
+		   (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
 		cpu_relax_lowlatency();
-- 
1.7.1



* Re: [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  2015-06-09 15:19 ` [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING Waiman Long
@ 2015-06-10  7:35   ` Ingo Molnar
  2015-06-10 16:28     ` Waiman Long
  2015-06-19 17:59   ` [tip:locking/core] locking/qrwlock: Don't " tip-bot for Waiman Long
  1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2015-06-10  7:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arch,
	linux-kernel, Scott J Norton, Douglas Hatch


* Waiman Long <Waiman.Long@hp.com> wrote:

> The current cmpxchg() loop that sets the _QW_WAITING flag for writers
> in queue_write_lock_slowpath() contends with incoming readers, possibly
> causing wasteful extra cmpxchg() operations. This patch changes the
> code to do a byte cmpxchg() to eliminate contention with new readers.
> 
> A multithreaded microbenchmark running a 5M read_lock/write_lock loop
> on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
> with the qspinlock patch has the following execution times (in ms)
> with and without the patch:
> 
> With R:W ratio = 5:1
> 
> 	Threads	   w/o patch	with patch	% change
> 	-------	   ---------	----------	--------
> 	   2	     990 	    895		  -9.6%
> 	   3	    2136 	   1912		 -10.5%
> 	   4	    3166	   2830		 -10.6%
> 	   5	    3953	   3629		  -8.2%
> 	   6	    4628	   4405		  -4.8%
> 	   7	    5344	   5197		  -2.8%
> 	   8	    6065	   6004		  -1.0%
> 	   9	    6826	   6811		  -0.2%
> 	  10	    7599	   7599		   0.0%
> 	  15	    9757	   9766		  +0.1%
> 	  20	   13767	  13817		  +0.4%
> 
> With a small number of contending threads, this patch can improve
> locking performance by up to 10%. With more contending threads,
> however, the gain diminishes.

Mind posting the microbenchmark?

Thanks,

	Ingo


* Re: [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  2015-06-10  7:35   ` Ingo Molnar
@ 2015-06-10 16:28     ` Waiman Long
  2015-06-12  8:45       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2015-06-10 16:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arch,
	linux-kernel, Scott J Norton, Douglas Hatch

[-- Attachment #1: Type: text/plain, Size: 1445 bytes --]

On 06/10/2015 03:35 AM, Ingo Molnar wrote:
> * Waiman Long <Waiman.Long@hp.com> wrote:
>
>> The current cmpxchg() loop that sets the _QW_WAITING flag for writers
>> in queue_write_lock_slowpath() contends with incoming readers, possibly
>> causing wasteful extra cmpxchg() operations. This patch changes the
>> code to do a byte cmpxchg() to eliminate contention with new readers.
>>
>> A multithreaded microbenchmark running a 5M read_lock/write_lock loop
>> on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
>> with the qspinlock patch has the following execution times (in ms)
>> with and without the patch:
>>
>> With R:W ratio = 5:1
>>
>> 	Threads	   w/o patch	with patch	% change
>> 	-------	   ---------	----------	--------
>> 	   2	     990 	    895		  -9.6%
>> 	   3	    2136 	   1912		 -10.5%
>> 	   4	    3166	   2830		 -10.6%
>> 	   5	    3953	   3629		  -8.2%
>> 	   6	    4628	   4405		  -4.8%
>> 	   7	    5344	   5197		  -2.8%
>> 	   8	    6065	   6004		  -1.0%
>> 	   9	    6826	   6811		  -0.2%
>> 	  10	    7599	   7599		   0.0%
>> 	  15	    9757	   9766		  +0.1%
>> 	  20	   13767	  13817		  +0.4%
>>
>> With a small number of contending threads, this patch can improve
>> locking performance by up to 10%. With more contending threads,
>> however, the gain diminishes.
> Mind posting the microbenchmark?
>
> Thanks,
>
> 	Ingo

I have attached the tool that I used for testing.

Cheers,
Longman

[-- Attachment #2: locktest.tar.gz --]
[-- Type: application/x-gzip, Size: 6919 bytes --]


* Re: [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code
  2015-06-09 15:19 ` [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code Waiman Long
@ 2015-06-11 14:21   ` Will Deacon
  2015-06-13  3:16     ` Waiman Long
  0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2015-06-11 14:21 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arch,
	linux-kernel, Scott J Norton, Douglas Hatch

Hi Waiman,

On Tue, Jun 09, 2015 at 04:19:12PM +0100, Waiman Long wrote:
> The qrwlock is fair in process context, but it becomes unfair in
> interrupt context in order to support use cases like the
> tasklist_lock. However, the unfair code in the interrupt context has
> a problem that may cause a deadlock.
> 
> The fast path increments the reader count. In interrupt context, a
> reader in the slowpath will spin until the writer releases the lock.
> However, if other readers hold the lock and the writer is merely in
> the waiting state, the writer will never get the write lock because
> that interrupt-context reader has already incremented the count. This
> causes a deadlock.

I'm probably just being thick here, but I'm struggling to understand the
deadlock case.

If a reader enters the slowpath in interrupt context, we spin while
(cnts & _QW_WMASK) == _QW_LOCKED. Consequently, if there is a writer in
the waiting state, that won't hold up the reader and so forward progress
is ensured. When the reader unlocks, the reader count is decremented and
the writer can take the lock.

The only problematic case I can think of is if you had a steady stream of
readers in interrupt context, but that doesn't seem likely (and I don't
think this patch deals with that anyway).

What am I missing?

Will


* Re: [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  2015-06-10 16:28     ` Waiman Long
@ 2015-06-12  8:45       ` Ingo Molnar
  2015-06-12 22:58         ` Waiman Long
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2015-06-12  8:45 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arch,
	linux-kernel, Scott J Norton, Douglas Hatch


* Waiman Long <waiman.long@hp.com> wrote:

> > Mind posting the microbenchmark?
> 
> I have attached the tool that I used for testing.

Thanks, that's interesting!

Btw., we could also do something like this in user-space, in
tools/perf/bench/ - we have no 'perf bench locking' subcommand yet.

We already build and measure simple x86 kernel methods there such as memset() and 
memcpy():

 triton:~/tip> perf bench mem memcpy -r all
 # Running 'mem/memcpy' benchmark:

 Routine default (Default memcpy() provided by glibc)
 # Copying 1MB Bytes ...

       1.385195 GB/Sec
       4.982462 GB/Sec (with prefault)

 Routine x86-64-unrolled (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
 # Copying 1MB Bytes ...

       1.627604 GB/Sec
       5.336407 GB/Sec (with prefault)

 Routine x86-64-movsq (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
 # Copying 1MB Bytes ...

       2.132233 GB/Sec
       4.264465 GB/Sec (with prefault)

 Routine x86-64-movsb (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
 # Copying 1MB Bytes ...

       1.490935 GB/Sec
       7.128193 GB/Sec (with prefault)

Locking primitives would certainly be more complex to build in user-space
- but we could shuffle things around in kernel headers as well to make it
easier to test in user-space.

That's how we can build lockdep in user-space for example, see tools/lib/lockdep.

Just a thought.

Thanks,

	Ingo


* Re: [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  2015-06-12  8:45       ` Ingo Molnar
@ 2015-06-12 22:58         ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2015-06-12 22:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arch,
	linux-kernel, Scott J Norton, Douglas Hatch

On 06/12/2015 04:45 AM, Ingo Molnar wrote:
> * Waiman Long <waiman.long@hp.com> wrote:
>
>>> Mind posting the microbenchmark?
>> I have attached the tool that I used for testing.
> Thanks, that's interesting!
>
> Btw., we could also do something like this in user-space, in
> tools/perf/bench/ - we have no 'perf bench locking' subcommand yet.
>
> We already build and measure simple x86 kernel methods there such as memset() and
> memcpy():
>
>   triton:~/tip>  perf bench mem memcpy -r all
>   # Running 'mem/memcpy' benchmark:
>
>   Routine default (Default memcpy() provided by glibc)
>   # Copying 1MB Bytes ...
>
>         1.385195 GB/Sec
>         4.982462 GB/Sec (with prefault)
>
>   Routine x86-64-unrolled (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
>   # Copying 1MB Bytes ...
>
>         1.627604 GB/Sec
>         5.336407 GB/Sec (with prefault)
>
>   Routine x86-64-movsq (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
>   # Copying 1MB Bytes ...
>
>         2.132233 GB/Sec
>         4.264465 GB/Sec (with prefault)
>
>   Routine x86-64-movsb (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
>   # Copying 1MB Bytes ...
>
>         1.490935 GB/Sec
>         7.128193 GB/Sec (with prefault)
>
> Locking primitives would certainly be more complex to build in
> user-space - but we could shuffle things around in kernel headers as
> well to make it easier to test in user-space.
>
> That's how we can build lockdep in user-space for example, see tools/lib/lockdep.
>
> Just a thought.
>
> Thanks,
>
> 	Ingo

I guess we could build user-space versions of spinlocks and rwlocks, but
we can't do that for sleeping locks like the mutex and rwsem. Preemption
in user space will also affect how those locking tests behave. Anyway, I
will give some thought to how to do that in perf bench when I have time.

Cheers,
Longman


* Re: [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code
  2015-06-11 14:21   ` Will Deacon
@ 2015-06-13  3:16     ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2015-06-13  3:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arch,
	linux-kernel, Scott J Norton, Douglas Hatch

On 06/11/2015 10:21 AM, Will Deacon wrote:
> Hi Waiman,
>
> On Tue, Jun 09, 2015 at 04:19:12PM +0100, Waiman Long wrote:
>> The qrwlock is fair in process context, but it becomes unfair in
>> interrupt context in order to support use cases like the
>> tasklist_lock. However, the unfair code in the interrupt context has
>> a problem that may cause a deadlock.
>>
>> The fast path increments the reader count. In interrupt context, a
>> reader in the slowpath will spin until the writer releases the lock.
>> However, if other readers hold the lock and the writer is merely in
>> the waiting state, the writer will never get the write lock because
>> that interrupt-context reader has already incremented the count. This
>> causes a deadlock.
> I'm probably just being thick here, but I'm struggling to understand the
> deadlock case.
>
> If a reader enters the slowpath in interrupt context, we spin while
> (cnts & _QW_WMASK) == _QW_LOCKED. Consequently, if there is a writer in
> the waiting state, that won't hold up the reader and so forward progress
> is ensured. When the reader unlocks, the reader count is decremented and
> the writer can take the lock.
>
> The only problematic case I can think of is if you had a steady stream of
> readers in interrupt context, but that doesn't seem likely (and I don't
> think this patch deals with that anyway).
>
> What am I missing?
>
> Will

You are right. It was my mistake; I misread my own code. I should add a
comment to clarify that. I will send out a revised one next week.

Cheers,
Longman


* [tip:locking/core] locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  2015-06-09 15:19 ` [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING Waiman Long
  2015-06-10  7:35   ` Ingo Molnar
@ 2015-06-19 17:59   ` tip-bot for Waiman Long
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Waiman Long @ 2015-06-19 17:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: doug.hatch, hpa, arnd, mingo, linux-kernel, paulmck, peterz,
	Waiman.Long, bp, tglx, akpm, scott.norton, torvalds

Commit-ID:  405963b6a57c60040bc1dad2597f7f4b897954d1
Gitweb:     http://git.kernel.org/tip/405963b6a57c60040bc1dad2597f7f4b897954d1
Author:     Waiman Long <Waiman.Long@hp.com>
AuthorDate: Tue, 9 Jun 2015 11:19:13 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Jun 2015 09:45:38 +0200

locking/qrwlock: Don't contend with readers when setting _QW_WAITING

The current cmpxchg() loop that sets the _QW_WAITING flag for writers
in queue_write_lock_slowpath() contends with incoming readers, possibly
causing wasteful extra cmpxchg() operations. This patch changes the
code to do a byte cmpxchg() to eliminate contention with new readers.

A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket, 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:

With R:W ratio = 5:1

	Threads	   w/o patch	with patch	% change
	-------	   ---------	----------	--------
	   2	     990	    895		  -9.6%
	   3	    2136	   1912		 -10.5%
	   4	    3166	   2830		 -10.6%
	   5	    3953	   3629		  -8.2%
	   6	    4628	   4405		  -4.8%
	   7	    5344	   5197		  -2.8%
	   8	    6065	   6004		  -1.0%
	   9	    6826	   6811		  -0.2%
	  10	    7599	   7599		   0.0%
	  15	    9757	   9766		  +0.1%
	  20	   13767	  13817		  +0.4%

With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/qrwlock.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 00c12bb..6c5da483 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -22,6 +22,26 @@
 #include <linux/hardirq.h>
 #include <asm/qrwlock.h>
 
+/*
+ * This internal data structure is used for optimizing access to some of
+ * the subfields within the atomic_t cnts.
+ */
+struct __qrwlock {
+	union {
+		atomic_t cnts;
+		struct {
+#ifdef __LITTLE_ENDIAN
+			u8 wmode;	/* Writer mode   */
+			u8 rcnts[3];	/* Reader counts */
+#else
+			u8 rcnts[3];	/* Reader counts */
+			u8 wmode;	/* Writer mode   */
+#endif
+		};
+	};
+	arch_spinlock_t	lock;
+};
+
 /**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock  : Pointer to queue rwlock structure
@@ -107,10 +127,10 @@ void queue_write_lock_slowpath(struct qrwlock *lock)
 	 * or wait for a previous writer to go away.
 	 */
 	for (;;) {
-		cnts = atomic_read(&lock->cnts);
-		if (!(cnts & _QW_WMASK) &&
-		    (atomic_cmpxchg(&lock->cnts, cnts,
-				    cnts | _QW_WAITING) == cnts))
+		struct __qrwlock *l = (struct __qrwlock *)lock;
+
+		if (!READ_ONCE(l->wmode) &&
+		   (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
 		cpu_relax_lowlatency();
--


end of thread, other threads:[~2015-06-19 18:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-09 15:19 [PATCH 0/2 v2] locking/qrwlock: Fix interrupt handling problem Waiman Long
2015-06-09 15:19 ` [PATCH v2 1/2] locking/qrwlock: Fix bug in interrupt handling code Waiman Long
2015-06-11 14:21   ` Will Deacon
2015-06-13  3:16     ` Waiman Long
2015-06-09 15:19 ` [PATCH v2 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING Waiman Long
2015-06-10  7:35   ` Ingo Molnar
2015-06-10 16:28     ` Waiman Long
2015-06-12  8:45       ` Ingo Molnar
2015-06-12 22:58         ` Waiman Long
2015-06-19 17:59   ` [tip:locking/core] locking/qrwlock: Don't " tip-bot for Waiman Long
