* [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
@ 2018-07-13 18:30 Waiman Long
  2018-07-18 15:15 ` Waiman Long
  2018-07-18 16:16 ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Waiman Long @ 2018-07-13 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Mark Ray, Joe Mario, Scott Norton, Waiman Long

It was discovered that a constant stream of readers might keep the
count negative most of the time after an initial trigger by a writer,
even if no writer was present afterward. As a result, most of the
readers would have to go through the slowpath, reducing their
performance.

To prevent that from happening, an additional check is added to detect
the special case where the reader entering the slowpath would be the
only one in the wait queue and no writer is present. When that happens,
it can just take the lock and return immediately without further action.
Other incoming readers then won't see a waiter present and won't be
forced into the slowpath.

The additional code is in the slowpath and so should not have an impact
on rwsem performance. However, in the special case listed above, it may
greatly improve performance.
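For reference, readers only end up in the slowpath because of the sign
check in the reader fastpath. A rough sketch of the generic xadd
fastpath (based on asm-generic/rwsem.h; the exact code varies by
architecture):

static inline void __down_read(struct rw_semaphore *sem)
{
	/*
	 * Each reader adds RWSEM_ACTIVE_READ_BIAS (+1) to the count.
	 * If the result is not positive, a writer or the waiting bias
	 * is present and the reader must take the slowpath.
	 */
	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
		rwsem_down_read_failed(sem);
}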

The issue was found at a customer site where an application pounded
heavily on the pread64 syscall on an XFS filesystem. The application
was run on a recent 4-socket box with a lot of CPUs. The customer saw
significant spinlock contention in the rwsem_down_read_failed() call.
With this patch applied, the system CPU usage went from 85% to 57%,
and the spinlock contention in the pread64 syscalls was gone.

v2: Add customer testing results and remove wording that may cause
    confusion.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 3064c50..bf0570e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	waiter.type = RWSEM_WAITING_FOR_READ;
 
 	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
+		/*
+		 * In the unlikely event that the task is the only one in
+		 * the wait queue and a writer isn't present, it can have
+		 * the lock and return immediately without going through
+		 * the remaining slowpath code.
+		 */
+		if (unlikely(atomic_long_read(&sem->count) >= 0)) {
+			raw_spin_unlock_irq(&sem->wait_lock);
+			return sem;
+		}
 		adjustment += RWSEM_WAITING_BIAS;
+	}
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-- 
1.8.3.1



* Re: [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
  2018-07-13 18:30 [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer Waiman Long
@ 2018-07-18 15:15 ` Waiman Long
  2018-07-18 16:16 ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Waiman Long @ 2018-07-18 15:15 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Mark Ray, Joe Mario, Scott Norton

On 07/13/2018 02:30 PM, Waiman Long wrote:
> It was discovered that a constant stream of readers might keep the
> count negative most of the time after an initial trigger by a writer,
> even if no writer was present afterward. As a result, most of the
> readers would have to go through the slowpath, reducing their
> performance.
>
> To prevent that from happening, an additional check is added to detect
> the special case where the reader entering the slowpath would be the
> only one in the wait queue and no writer is present. When that happens,
> it can just take the lock and return immediately without further action.
> Other incoming readers then won't see a waiter present and won't be
> forced into the slowpath.
>
> The additional code is in the slowpath and so should not have an impact
> on rwsem performance. However, in the special case listed above, it may
> greatly improve performance.
>
> The issue was found at a customer site where an application pounded
> heavily on the pread64 syscall on an XFS filesystem. The application
> was run on a recent 4-socket box with a lot of CPUs. The customer saw
> significant spinlock contention in the rwsem_down_read_failed() call.
> With this patch applied, the system CPU usage went from 85% to 57%,
> and the spinlock contention in the pread64 syscalls was gone.
>
> v2: Add customer testing results and remove wording that may cause
>     confusion.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/locking/rwsem-xadd.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 3064c50..bf0570e 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>  	waiter.type = RWSEM_WAITING_FOR_READ;
>  
>  	raw_spin_lock_irq(&sem->wait_lock);
> -	if (list_empty(&sem->wait_list))
> +	if (list_empty(&sem->wait_list)) {
> +		/*
> +		 * In the unlikely event that the task is the only one in
> +		 * the wait queue and a writer isn't present, it can have
> +		 * the lock and return immediately without going through
> +		 * the remaining slowpath code.
> +		 */
> +		if (unlikely(atomic_long_read(&sem->count) >= 0)) {
> +			raw_spin_unlock_irq(&sem->wait_lock);
> +			return sem;
> +		}
>  		adjustment += RWSEM_WAITING_BIAS;
> +	}
>  	list_add_tail(&waiter.list, &sem->wait_list);
>  
>  	/* we're now waiting on the lock, but no longer actively locking */

Ping!

Any comment on this one?

Cheers,
Longman



* Re: [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
  2018-07-13 18:30 [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer Waiman Long
  2018-07-18 15:15 ` Waiman Long
@ 2018-07-18 16:16 ` Peter Zijlstra
  2018-07-18 16:40   ` Waiman Long
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2018-07-18 16:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Mark Ray, Joe Mario,
	Scott Norton

On Fri, Jul 13, 2018 at 02:30:53PM -0400, Waiman Long wrote:
> It was discovered that a constant stream of readers might keep the
> count negative most of the time after an initial trigger by a writer,
> even if no writer was present afterward. As a result, most of the
> readers would have to go through the slowpath, reducing their
> performance.

I'm slightly confused, what happens to trigger this?

(also, I'm forever confused by the whole BIAS scheme)

> To prevent that from happening, an additional check is added to detect
> the special case where the reader entering the slowpath would be the
> only one in the wait queue and no writer is present. When that happens,
> it can just take the lock and return immediately without further action.
> Other incoming readers then won't see a waiter present and won't be
> forced into the slowpath.
> 
> The additional code is in the slowpath and so should not have an impact
> on rwsem performance. However, in the special case listed above, it may
> greatly improve performance.


> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 3064c50..bf0570e 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,

your diff function thingy is busted, this is in fact
__rwsem_down_read_failed_common you're patching.

>  	waiter.type = RWSEM_WAITING_FOR_READ;
>  
>  	raw_spin_lock_irq(&sem->wait_lock);
> -	if (list_empty(&sem->wait_list))
> +	if (list_empty(&sem->wait_list)) {
> +		/*
> +		 * In the unlikely event that the task is the only one in
> +		 * the wait queue and a writer isn't present, it can have
> +		 * the lock and return immediately without going through
> +		 * the remaining slowpath code.
> +		 */
> +		if (unlikely(atomic_long_read(&sem->count) >= 0)) {
> +			raw_spin_unlock_irq(&sem->wait_lock);
> +			return sem;
> +		}
>  		adjustment += RWSEM_WAITING_BIAS;
> +	}
>  	list_add_tail(&waiter.list, &sem->wait_list);

So without this, we would add ourselves to the list and then immediately
call __rwsem_mark_wake() on ourselves and fall through the wait-loop, or
what?


* Re: [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
  2018-07-18 16:16 ` Peter Zijlstra
@ 2018-07-18 16:40   ` Waiman Long
  2018-07-23  4:04     ` Davidlohr Bueso
  0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2018-07-18 16:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Mark Ray, Joe Mario,
	Scott Norton

On 07/18/2018 12:16 PM, Peter Zijlstra wrote:
> On Fri, Jul 13, 2018 at 02:30:53PM -0400, Waiman Long wrote:
>> It was discovered that a constant stream of readers might keep the
>> count negative most of the time after an initial trigger by a writer,
>> even if no writer was present afterward. As a result, most of the
>> readers would have to go through the slowpath, reducing their
>> performance.
> I'm slightly confused, what happens to trigger this?
>
> (also, I'm forever confused by the whole BIAS scheme)

As I said before, an occasional writer among a stream of incoming
readers can cause this problem.

>
>> To prevent that from happening, an additional check is added to detect
>> the special case where the reader entering the slowpath would be the
>> only one in the wait queue and no writer is present. When that happens,
>> it can just take the lock and return immediately without further action.
>> Other incoming readers then won't see a waiter present and won't be
>> forced into the slowpath.

The last sentence above is the key. It is what the other incoming
readers observe that causes the vicious cycle. See below.

>>
>> The additional code is in the slowpath and so should not have an impact
>> on rwsem performance. However, in the special case listed above, it may
>> greatly improve performance.
>
>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> index 3064c50..bf0570e 100644
>> --- a/kernel/locking/rwsem-xadd.c
>> +++ b/kernel/locking/rwsem-xadd.c
>> @@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
> your diff function thingy is busted, this is in fact
> __rwsem_down_read_failed_common you're patching.

Yes, I am patching __rwsem_down_read_failed_common().
>
>>  	waiter.type = RWSEM_WAITING_FOR_READ;
>>  
>>  	raw_spin_lock_irq(&sem->wait_lock);
>> -	if (list_empty(&sem->wait_list))
>> +	if (list_empty(&sem->wait_list)) {
>> +		/*
>> +		 * In the unlikely event that the task is the only one in
>> +		 * the wait queue and a writer isn't present, it can have
>> +		 * the lock and return immediately without going through
>> +		 * the remaining slowpath code.
>> +		 */
>> +		if (unlikely(atomic_long_read(&sem->count) >= 0)) {
>> +			raw_spin_unlock_irq(&sem->wait_lock);
>> +			return sem;
>> +		}
>>  		adjustment += RWSEM_WAITING_BIAS;
>> +	}
>>  	list_add_tail(&waiter.list, &sem->wait_list);
> So without this, we would add ourselves to the list and then immediately
> call __rwsem_mark_wake() on ourselves and fall through the wait-loop, ow
> what?

The key here is that we don't want other incoming readers to observe
that there are waiters in the wait queue, and hence have to go into the
slowpath, until the single waiter in the queue is sure that it will
probably need to go to sleep because there is a writer.

With a constant stream of incoming readers, a major portion of them
will observe a negative count and be serialized into the slowpath.
There are certainly other readers that do not observe the negative
count in the window between one reader clearing the count in the unlock
path and a waiter setting the count negative again. Those readers can
go ahead and do the read in parallel. But it is the serialized readers
that cause the performance loss and the spinlock contention seen in the
perf output.

It is the constant stream of incoming readers that sustains the spinlock
queue and the repeated clearing and negative setting of the count.
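
Roughly, the cycle can be replayed with just the count arithmetic (a
stand-alone user-space sketch assuming the 64-bit xadd bias values;
illustration only, not kernel code):

#include <stdio.h>

/* assumed 64-bit xadd bias values */
#define RWSEM_ACTIVE_BIAS	1L
#define RWSEM_ACTIVE_MASK	0xffffffffL
#define RWSEM_WAITING_BIAS	(-RWSEM_ACTIVE_MASK - 1)
#define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS

static long count;

static void show(const char *what)
{
	printf("%-40s count=%-12ld -> incoming readers take the %s\n",
	       what, count, count >= 0 ? "fastpath" : "slowpath");
}

int main(void)
{
	show("idle rwsem");

	count += RWSEM_ACTIVE_READ_BIAS;	/* reader A, fastpath */
	show("reader A increments count");

	/* first queued waiter adds the waiting bias (simplified; in the
	 * kernel this is folded together with dropping its read bias) */
	count += RWSEM_WAITING_BIAS;
	show("waiter sets RWSEM_WAITING_BIAS");

	count += RWSEM_ACTIVE_READ_BIAS;	/* reader B arrives */
	show("reader B increments count");

	count -= RWSEM_WAITING_BIAS;		/* wakeup clears the bias */
	show("wakeup path clears waiting bias");
	return 0;
}

The count only looks non-negative to the fastpath in the short window
after the waiting bias has been cleared and before the next queued
reader sets it again, so a steady reader stream rarely hits that window.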

I can reword the commit log and comment with a v3 patch if you think
that is helpful.

Cheers,
Longman





* Re: [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
  2018-07-18 16:40   ` Waiman Long
@ 2018-07-23  4:04     ` Davidlohr Bueso
  2018-07-23 13:40       ` Waiman Long
  0 siblings, 1 reply; 6+ messages in thread
From: Davidlohr Bueso @ 2018-07-23  4:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Mark Ray,
	Joe Mario, Scott Norton

On Wed, 18 Jul 2018, Waiman Long wrote:

>The key here is that we don't want other incoming readers to observe
>that there are waiters in the wait queue, and hence have to go into the
>slowpath, until the single waiter in the queue is sure that it will
>probably need to go to sleep because there is a writer.
>
>With a constant stream of incoming readers, a major portion of them
>will observe a negative count and be serialized into the slowpath.
>There are certainly other readers that do not observe the negative
>count in the window between one reader clearing the count in the unlock
>path and a waiter setting the count negative again. Those readers can
>go ahead and do the read in parallel. But it is the serialized readers
>that cause the performance loss and the spinlock contention seen in the
>perf output.

This makes sense and seems feasible in that the optimization is done with
the wait_lock held.

>
>It is the constant stream of incoming readers that sustains the spinlock
>queue and the repeated clearing and negative setting of the count.

This would not affect optimistic spinners that haven't yet arrived at the
waitqueue phase because the lock is anonymously owned, so they won't spin
in the first place, right?

Thanks,
Davidlohr


* Re: [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
  2018-07-23  4:04     ` Davidlohr Bueso
@ 2018-07-23 13:40       ` Waiman Long
  0 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2018-07-23 13:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Mark Ray,
	Joe Mario, Scott Norton

On 07/23/2018 12:04 AM, Davidlohr Bueso wrote:
> On Wed, 18 Jul 2018, Waiman Long wrote:
>
>> The key here is that we don't want other incoming readers to observe
>> that there are waiters in the wait queue, and hence have to go into the
>> slowpath, until the single waiter in the queue is sure that it will
>> probably need to go to sleep because there is a writer.
>>
>> With a constant stream of incoming readers, a major portion of them
>> will observe a negative count and be serialized into the slowpath.
>> There are certainly other readers that do not observe the negative
>> count in the window between one reader clearing the count in the unlock
>> path and a waiter setting the count negative again. Those readers can
>> go ahead and do the read in parallel. But it is the serialized readers
>> that cause the performance loss and the spinlock contention seen in the
>> perf output.
>
> This makes sense and seems feasible in that the optimization is done with
> the wait_lock held.
>
>>
>> It is the constant stream of incoming readers that sustains the spinlock
>> queue and the repeated clearing and negative setting of the count.
>
> This would not affect optimistic spinners that haven't yet arrived at the
> waitqueue phase because the lock is anonymously owned, so they won't spin
> in the first place, right?

The reader fastpath would have incremented the active count before
entering the slowpath. The spinning writer, seeing a non-zero active
count, will not attempt to steal the lock until the reader decrements
the count and sets the waiting bias in one atomic op. Nothing will
happen before that.
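
For reference, the relevant step is the following (abridged sketch of
__rwsem_down_read_failed_common(), not the complete function). The
reader's fastpath bias and the waiting bias are folded into a single
adjustment that is applied in one atomic_long_add_return(), so a
spinning writer never sees a window with zero active count but no
waiting bias:

	long adjustment = -RWSEM_ACTIVE_READ_BIAS;	/* drop the fastpath increment */

	raw_spin_lock_irq(&sem->wait_lock);
	if (list_empty(&sem->wait_list))
		adjustment += RWSEM_WAITING_BIAS;	/* first waiter sets the bias */
	list_add_tail(&waiter.list, &sem->wait_list);

	/* we're now waiting on the lock, but no longer actively locking */
	count = atomic_long_add_return(adjustment, &sem->count);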

-Longman


