From: Waiman Long <waiman.long@hpe.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>, <linux-kernel@vger.kernel.org>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>,
	Will Deacon <will.deacon@arm.com>
Subject: Re: [PATCH] locking/qrwlock: Allow multiple spinning readers
Date: Thu, 31 Mar 2016 18:12:38 -0400
Message-ID: <56FDA0D6.4090904@hpe.com>
In-Reply-To: <20160329202050.GN3408@twins.programming.kicks-ass.net>

On 03/29/2016 04:20 PM, Peter Zijlstra wrote:
> On Sat, Mar 19, 2016 at 11:21:19PM -0400, Waiman Long wrote:
>> In qrwlock, the reader that is spinning on the lock will need to notify
>> the next reader in the queue when the lock is free. That introduces a
>> reader-to-reader latency that is not present in the original rwlock.
> How did you find this 'problem'?

I am constantly on the lookout for twists that can make the code run
faster. That change turned out to be good for reader performance, so I
sent it out to solicit feedback.

>> That is the price for reducing lock cacheline contention. It also
>> reduces the performance benefit of qrwlock on reader heavy workloads.
>>
>> However, if we allow a limited number of readers to spin on the
>> lock simultaneously, we can eliminate some of the reader-to-reader
>> latencies at the expense of a bit more cacheline contention and
>> probably more power consumption.
> So the embedded people might not like that much.

It could be. It is always a compromise.

>> This patch changes the reader slowpath to allow multiple readers to
>> spin on the lock. The maximum number of concurrent readers allowed
>> is currently set to 4 to limit the amount of additional cacheline
>> contention while improving reader performance on most workloads. If
>> a writer comes to the queue head, however, it will stop additional
>> readers from coming out.
>>
>> Using a multi-threaded locking microbenchmark on a 4-socket 40-core
>> Haswell-EX system, the locking throughput of 4.5-rc6 kernel with or
>> without the patch were as follows:
> Do you have an actual real world benchmark where this makes a
> difference?

Not yet. I will look out for a real-world workload.

>>   /**
>>    * queued_read_lock_slowpath - acquire read lock of a queue rwlock
>>    * @lock: Pointer to queue rwlock structure
>>    * @cnts: Current qrwlock lock value
>>    */
>>   void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>   {
>> +	bool locked = true;
>> +
>>   	/*
>>   	 * Readers come here when they cannot get the lock without waiting
>>   	 */
>> @@ -78,7 +71,10 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>   		 * semantics) until the lock is available without waiting in
>>   		 * the queue.
>>   		 */
>> +		while ((cnts & _QW_WMASK) == _QW_LOCKED) {
>> +			cpu_relax_lowlatency();
>> +			cnts = atomic_read_acquire(&lock->cnts);
>> +		}
>>   		return;
>>   	}
>>   	atomic_sub(_QR_BIAS, &lock->cnts);
>> @@ -92,14 +88,31 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>   	 * The ACQUIRE semantics of the following spinning code ensure
>>   	 * that accesses can't leak upwards out of our subsequent critical
>>   	 * section in the case that the lock is currently held for write.
>> +	 *
>> +	 * The reader increments the reader count & wait until the writer
>> +	 * releases the lock.
>>   	 */
>>   	cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>> +	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
>> +		if (locked && ((cnts >> _QR_SHIFT) < MAX_SPINNING_READERS)) {
>> +			/*
>> +			 * Unlock the wait queue so that more readers can
>> +			 * come forward and wait for the writer to exit
>> +			 * as long as no more than MAX_SPINNING_READERS
>> +			 * readers are present.
>> +			 */
>> +			arch_spin_unlock(&lock->wait_lock);
>> +			locked = false;
> Only 1 more can come forward with this logic. How can you ever get to 4?

Yes, each reader in the unlock path will release one in the queue. If
the next one is also a reader, it will release one more, and so on until
the reader count reaches 4, at which point the process stops.

>
> Also, what says the next in queue is a reader?

I did say in the changelog that the queue head could be a writer. In 
that case, the process will stop and the writer will wait until all the 
readers are gone.
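
To make the chain of releases concrete, here is a minimal user-space
sketch of the behaviour as described in the two replies above. It is a
toy model, not the kernel code: the function name spinning_readers() and
the 'R'/'W' string encoding of the wait queue are hypothetical, and it
assumes a reader counts itself as spinning before deciding whether to
release the next waiter. The exact boundary in the patch depends on code
that is not quoted here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SPINNING_READERS 4

/*
 * Hypothetical model: 'queue' lists the waiters in FIFO order, 'R' for a
 * reader and 'W' for a writer, all blocked behind a writer that currently
 * holds the lock. Returns how many readers end up spinning on the lock
 * word at the same time.
 */
static int spinning_readers(const char *queue)
{
	int spinning = 0;
	size_t i, n = strlen(queue);

	for (i = 0; i < n; i++) {
		if (queue[i] == 'W')
			break;	/* a writer at the queue head stops the chain */
		spinning++;	/* this reader now spins on the lock word */
		if (spinning >= MAX_SPINNING_READERS)
			break;	/* cap reached: the next waiter is not released */
		/* otherwise the queue is released and the next waiter comes forward */
	}
	return spinning;
}
```

Under these assumptions, spinning_readers("RRRRRR") returns 4 (the cap
stops the chain), spinning_readers("RRWRR") returns 2 (the writer stops
it), and spinning_readers("WRRR") returns 0.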

Cheers,
Longman
