From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, x86@kernel.org,
Davidlohr Bueso <dave@stgolabs.net>,
Linus Torvalds <torvalds@linux-foundation.org>,
Tim Chen <tim.c.chen@linux.intel.com>
Subject: Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
Date: Wed, 10 Apr 2019 22:25:16 -0400
Message-ID: <bc06ff44-4dd0-f49c-c938-5f6c514e9596@redhat.com>
In-Reply-To: <20190410184429.GX4038@hirez.programming.kicks-ass.net>
On 04/10/2019 02:44 PM, Peter Zijlstra wrote:
> On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
>> Because of writer lock stealing, it is possible that a constant
>> stream of incoming writers will cause a waiting writer or reader to
>> wait indefinitely leading to lock starvation.
>>
>> The mutex code has a lock handoff mechanism to prevent lock starvation.
>> This patch implements a similar lock handoff mechanism to disable
>> lock stealing and force lock handoff to the first waiter in the queue
>> after at least a 5ms waiting period. The waiting period keeps the
>> handoff from discouraging lock stealing so much that performance suffers.
> I would say the handoff is not at all similar to the mutex code. It is
> in fact radically different.
>
I mean they are similar in concept. Of course, the implementations are
quite different.
>> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>>          adjustment = RWSEM_READER_BIAS;
>>          oldcount = atomic_long_fetch_add(adjustment, &sem->count);
>>          if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
>> +                /*
>> +                 * Initiate handoff to reader, if applicable.
>> +                 */
>> +                if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>> +                    time_after(jiffies, waiter->timeout)) {
>> +                        adjustment -= RWSEM_FLAG_HANDOFF;
>> +                        lockevent_inc(rwsem_rlock_handoff);
>> +                }
>> +
>>                  atomic_long_sub(adjustment, &sem->count);
>>                  return;
>>          }
> That confuses the heck out of me...
>
> The above seems to rely on __rwsem_mark_wake() to be fully serialized
> (and it is, by ->wait_lock, but that isn't spelled out anywhere) such
> that we don't get double increment of FLAG_HANDOFF.
>
> So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like:
>
>   CPU0                                    CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>                                           oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>   if (!(oldcount & HANDOFF))
>     adjustment -= HANDOFF;
>
>                                           if (!(oldcount & HANDOFF))
>                                             adjustment -= HANDOFF;
>   atomic_long_sub(adjustment)
>                                           atomic_long_sub(adjustment)
>
>
> *whoops* double negative decrement of HANDOFF (aka double increment).
Yes, __rwsem_mark_wake() is always called with wait_lock held. I can add
a lockdep_assert() statement to clarify this point.
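
To make the serialization requirement concrete, here is a minimal userspace
sketch that replays the two orderings on a single thread with C11 atomics.
It is a model, not the patch code: the MODEL_* bit values and the helper
names are assumptions made for illustration, and the time_after() check is
left out so that only the count arithmetic is visible.

```c
#include <stdatomic.h>

/* Illustrative bit layout; assumed for this sketch, not taken from the patch. */
#define MODEL_WRITER_LOCKED (1L << 0)
#define MODEL_FLAG_HANDOFF  (1L << 2)
#define MODEL_READER_BIAS   (1L << 8)

/* Per-CPU locals of one pass through the reader wakeup path. */
struct model_cpu {
	long adjustment;
	long oldcount;
};

/* Step 1: optimistically add a reader bias and remember the old count. */
static void step_fetch_add(struct model_cpu *c, atomic_long *count)
{
	c->adjustment = MODEL_READER_BIAS;
	c->oldcount = atomic_fetch_add(count, c->adjustment);
}

/* Step 2: if HANDOFF looked clear, arrange for the undo to set it. */
static void step_check_handoff(struct model_cpu *c)
{
	if (!(c->oldcount & MODEL_FLAG_HANDOFF))
		c->adjustment -= MODEL_FLAG_HANDOFF;
}

/* Step 3: undo the reader bias (and possibly add HANDOFF). */
static void step_sub(struct model_cpu *c, atomic_long *count)
{
	atomic_fetch_sub(count, c->adjustment);
}

/* Both passes run back to back, as they would under wait_lock. */
static long model_serialized(void)
{
	atomic_long count;
	struct model_cpu cpu0, cpu1;

	atomic_init(&count, MODEL_WRITER_LOCKED);
	step_fetch_add(&cpu0, &count);
	step_check_handoff(&cpu0);
	step_sub(&cpu0, &count);
	step_fetch_add(&cpu1, &count);
	step_check_handoff(&cpu1);
	step_sub(&cpu1, &count);
	return atomic_load(&count);
}

/* The interleaving from the diagram above, replayed deterministically. */
static long model_interleaved(void)
{
	atomic_long count;
	struct model_cpu cpu0, cpu1;

	atomic_init(&count, MODEL_WRITER_LOCKED);
	step_fetch_add(&cpu0, &count);
	step_fetch_add(&cpu1, &count);
	step_check_handoff(&cpu0);
	step_check_handoff(&cpu1);
	step_sub(&cpu0, &count);
	step_sub(&cpu1, &count);
	return atomic_load(&count);
}
```

In this model the serialized order leaves the count at
MODEL_WRITER_LOCKED | MODEL_FLAG_HANDOFF, while the interleaved order adds
HANDOFF twice, so the handoff bit ends up clear and the carry lands in the
next higher bit. Holding wait_lock across the whole sequence is what rules
out the second ordering.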
>
> However there is another site that fiddles with the HANDOFF bit, namely
> __rwsem_down_write_failed_common(), and that does:
>
> + atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
>
> _OUTSIDE_ of ->wait_lock, which would yield:
>
>   CPU0                                    CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>                                           atomic_long_or(HANDOFF)
>
>   if (!(oldcount & HANDOFF))
>     adjustment -= HANDOFF;
>
>   atomic_long_sub(adjustment)
>
> *whoops*, incremented HANDOFF on HANDOFF.
>
>
> And there's not a comment in sight that would elucidate if this is
> possible or not.
>
A writer can only set the handoff bit if it is the first waiter in the
queue. If it is the first waiter, a racing __rwsem_mark_wake() will see
that the first waiter is a writer and so won't go into the reader path.
I know I sometimes don't spell out all the conditions that may look
obvious to me but not to others. I will elaborate more in comments.
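
As a rough illustration of that argument, the two sites that touch the
handoff bit can be modelled as predicates on the head of the wait queue.
This is only a sketch of the invariant described above; the model_* types
and function names are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_waiter_type {
	MODEL_WAITING_FOR_WRITE,
	MODEL_WAITING_FOR_READ,
};

struct model_waiter {
	enum model_waiter_type type;
};

/* Writer slow path: may set the handoff bit only as head of the queue. */
static bool model_writer_may_set_handoff(const struct model_waiter *first,
					 const struct model_waiter *self)
{
	return self->type == MODEL_WAITING_FOR_WRITE && first == self;
}

/* Wakeup path: reaches the reader branch only when the head is a reader. */
static bool model_wake_takes_reader_path(const struct model_waiter *first)
{
	return first != NULL && first->type == MODEL_WAITING_FOR_READ;
}

/* True if both sites could modify the handoff bit for the same queue head. */
static bool model_sites_can_collide(const struct model_waiter *first,
				    const struct model_waiter *self)
{
	return model_writer_may_set_handoff(first, self) &&
	       model_wake_takes_reader_path(first);
}
```

Because the first predicate requires the head to be a writer and the second
requires it to be a reader, model_sites_can_collide() is false for any
queue head, which is the property the promised comments would need to state.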
> Also:
>
> +                        atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
> +                        first++;
> +
> +                        /*
> +                         * Make sure the handoff bit is seen by
> +                         * others before proceeding.
> +                         */
> +                        smp_mb__after_atomic();
>
> That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
> visible'. There needs to be order between two memops on both sides.
>
I added that mostly for safety. I will take some time to rethink whether
it is really necessary.
Cheers,
Longman