All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
	Kan Liang <kan.liang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
	Christopher Lameter <cl@linux.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
Date: Sun, 27 Aug 2017 22:17:55 -0700	[thread overview]
Message-ID: <CA+55aFy0WnCeR-WaBQFtsvES1zJpR8BHRRL3aTrwQrUQbFq0fQ@mail.gmail.com> (raw)
In-Reply-To: <20170828112959.05622961@roar.ozlabs.ibm.com>

On Sun, Aug 27, 2017 at 6:29 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> BTW. since you are looking at this stuff, one other small problem I remember
> with exclusive waiters is that losing to a concurrent locker puts them to
> the back of the queue. I think that could be fixed with some small change to
> the wait loops (first add to tail, then retries add to head). Thoughts?

No, not that way.

First off, it's oddly complicated, but more importantly, the real
unfairness you lose to is not other things on the wait queue, but to
other lockers that aren't on the wait-queue at all, but instead just
come in and do a "test-and-set" without ever even going through the
slow path.

So instead of playing queuing games, you'd need to just change the
unlock sequence. Right now we basically do:

 - clear lock bit and atomically test if contended (and we play games
with bit numbering to do that atomic test efficiently)

 - if contended, wake things up

and you'd change the logic to be

 - if contended, don't clear the lock bit at all, just transfer the
lock ownership directly to the waiters by walking the wait list

 - clear the lock bit only once there are no more wait entries (either
because there were no waiters at all, or because all the entries were
just waiting for the lock to be released)

which is certainly doable with a couple of small extensions to the
page wait key data structure.

But most of my clever schemes the last few days were abject failures,
and honestly, it's late in the rc.

In fact, this late in the game I probably wouldn't even have committed
the small cleanups I did if it wasn't for the fact that thinking of
the whole WQ_FLAG_EXCLUSIVE bit made me find the bug.

So the cleanups were actually what got me to look at the problem in
the first place, and then I went "I'm going to commit the cleanup, and
then I can think about the bug I just found".

I'm just happy that the fix seems to be trivial. I was afraid I'd have
to do something nastier (like have the EINTR case send another
explicit wakeup to make up for the lost one, or some ugly hack like
that).

It was only when I started looking at the history of that code, and I
saw the old bit_lock code, and I went "Hmm. That has the _same_ bug -
oh wait, no it doesn't!" that I realized that there was that simple
fix.

You weren't cc'd on the earlier part of the discussion, you only got
added when I realized what the history and simple fix was.

               Linus

WARNING: multiple messages have this Message-ID (diff)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
	Kan Liang <kan.liang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
	Christopher Lameter <cl@linux.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
Date: Sun, 27 Aug 2017 22:17:55 -0700	[thread overview]
Message-ID: <CA+55aFy0WnCeR-WaBQFtsvES1zJpR8BHRRL3aTrwQrUQbFq0fQ@mail.gmail.com> (raw)
In-Reply-To: <20170828112959.05622961@roar.ozlabs.ibm.com>

On Sun, Aug 27, 2017 at 6:29 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> BTW. since you are looking at this stuff, one other small problem I remember
> with exclusive waiters is that losing to a concurrent locker puts them to
> the back of the queue. I think that could be fixed with some small change to
> the wait loops (first add to tail, then retries add to head). Thoughts?

No, not that way.

First off, it's oddly complicated, but more importantly, the real
unfairness you lose to is not other things on the wait queue, but to
other lockers that aren't on the wait-queue at all, but instead just
come in and do a "test-and-set" without ever even going through the
slow path.

So instead of playing queuing games, you'd need to just change the
unlock sequence. Right now we basically do:

 - clear lock bit and atomically test if contended (and we play games
with bit numbering to do that atomic test efficiently)

 - if contended, wake things up

and you'd change the logic to be

 - if contended, don't clear the lock bit at all, just transfer the
lock ownership directly to the waiters by walking the wait list

 - clear the lock bit only once there are no more wait entries (either
because there were no waiters at all, or because all the entries were
just waiting for the lock to be released)

which is certainly doable with a couple of small extensions to the
page wait key data structure.

But most of my clever schemes the last few days were abject failures,
and honestly, it's late in the rc.

In fact, this late in the game I probably wouldn't even have committed
the small cleanups I did if it wasn't for the fact that thinking of
the whole WQ_FLAG_EXCLUSIVE bit made me find the bug.

So the cleanups were actually what got me to look at the problem in
the first place, and then I went "I'm going to commit the cleanup, and
then I can think about the bug I just found".

I'm just happy that the fix seems to be trivial. I was afraid I'd have
to do something nastier (like have the EINTR case send another
explicit wakeup to make up for the lost one, or some ugly hack like
that).

It was only when I started looking at the history of that code, and I
saw the old bit_lock code, and I went "Hmm. That has the _same_ bug -
oh wait, no it doesn't!" that I realized that there was that simple
fix.

You weren't cc'd on the earlier part of the discussion, you only got
added when I realized what the history and simple fix was.

               Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-08-28  5:17 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-25 16:13 [PATCH 1/2 v2] sched/wait: Break up long wake list walk Tim Chen
2017-08-25 16:13 ` Tim Chen
2017-08-25 16:13 ` [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit Tim Chen
2017-08-25 16:13   ` Tim Chen
2017-08-25 19:58   ` Linus Torvalds
2017-08-25 19:58     ` Linus Torvalds
2017-08-25 22:19     ` Tim Chen
2017-08-25 22:19       ` Tim Chen
2017-08-25 22:51       ` Linus Torvalds
2017-08-25 23:03         ` Linus Torvalds
2017-08-26  0:31           ` Linus Torvalds
2017-08-26  0:31             ` Linus Torvalds
2017-08-26  2:54             ` Linus Torvalds
2017-08-26  2:54               ` Linus Torvalds
2017-08-26 18:15               ` Linus Torvalds
2017-08-27 21:40                 ` Linus Torvalds
2017-08-27 21:40                   ` Linus Torvalds
2017-08-27 21:42                   ` Linus Torvalds
2017-08-27 21:42                     ` Linus Torvalds
2017-08-27 23:12                   ` Linus Torvalds
2017-08-27 23:12                     ` Linus Torvalds
2017-08-28  1:16                     ` Nicholas Piggin
2017-08-28  1:16                       ` Nicholas Piggin
2017-08-28  1:29                       ` Nicholas Piggin
2017-08-28  1:29                         ` Nicholas Piggin
2017-08-28  5:17                         ` Linus Torvalds [this message]
2017-08-28  5:17                           ` Linus Torvalds
2017-08-28  7:18                           ` Nicholas Piggin
2017-08-28  7:18                             ` Nicholas Piggin
2017-08-28 14:51                 ` Liang, Kan
2017-08-28 16:48                   ` Linus Torvalds
2017-08-28 20:01                     ` Tim Chen
2017-08-28 20:01                       ` Tim Chen
2017-08-29 12:57                     ` Liang, Kan
2017-08-29 16:01                       ` Linus Torvalds
2017-08-29 16:01                         ` Linus Torvalds
2017-08-29 16:13                         ` Tim Chen
2017-08-29 16:13                           ` Tim Chen
2017-08-29 16:24                           ` Linus Torvalds
2017-08-29 16:24                             ` Linus Torvalds
2017-08-29 16:57                             ` Tim Chen
2017-08-29 16:57                               ` Tim Chen
2017-09-14  2:12                             ` Tim Chen
2017-09-14  2:12                               ` Tim Chen
2017-09-14  2:27                               ` Linus Torvalds
2017-09-14  2:27                                 ` Linus Torvalds
2017-09-14 16:50                                 ` Tim Chen
2017-09-14 16:50                                   ` Tim Chen
2017-09-14 17:00                                   ` Linus Torvalds
2017-09-14 17:00                                     ` Linus Torvalds
2017-09-14 16:39                               ` Christopher Lameter
2017-08-29 16:17                     ` Tim Chen
2017-08-29 16:17                       ` Tim Chen
2017-08-29 16:22                       ` Linus Torvalds
2017-08-29 16:22                         ` Linus Torvalds
2017-08-25 17:46 ` [PATCH 1/2 v2] sched/wait: Break up long wake list walk Christopher Lameter
2017-08-25 17:46   ` Christopher Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFy0WnCeR-WaBQFtsvES1zJpR8BHRRL3aTrwQrUQbFq0fQ@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dave@stgolabs.net \
    --cc=ebiederm@xmission.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=kan.liang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@elte.hu \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.