archive mirror
 help / color / mirror / Atom feed
From: Konstantin Khlebnikov <>
To: Linus Torvalds <>,
	David Sterba <>,
	David Howells <>,
	Eric Biggers <>,
	Al Viro <>,
	linux-fsdevel <>,
	Linux Kernel Mailing List <>
Cc: Peter Zijlstra <>,
	Ingo Molnar <>,
	Vincent Guittot <>
Subject: Re: [PATCH 0/2] pipe: Fixes [ver #2]
Date: Thu, 12 Dec 2019 13:18:02 +0300	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 08/12/2019 01.47, Linus Torvalds wrote:
> On Fri, Dec 6, 2019 at 7:50 PM Linus Torvalds
> <> wrote:
>> The "make goes slow" problem bisects down to b667b8673443 ("pipe:
>> Advance tail pointer inside of wait spinlock in pipe_read()").
> I'm not entirely sure that ends up being 100% true. It did bisect to
> that, but the behavior wasn't entirely stable. There definitely is
> some nasty timing trigger.
> But I did finally figure out what seems to have been going on with at
> least the biggest part of the build performance regression. It's seems
> to be a nasty interaction with the scheduler and the GNU make
> jobserver, and in particular the pipe wakeups really _really_ do seem
> to want to be synchronous both for the readers and the writers.
> When a writer wakes up a reader, we want the reader to react quickly
> and vice versa. The most obvious case was for the GNU make jobserver,
> where sub-makes would do a single-byte write to the jobserver pipe,
> and we want to wake up the reader *immediatly*, because the reader is
> actually a lot more important than the writer. The reader is what gets
> the next job going, the writer just got done with the last one.
> And when a reader empties a full pipe, it's because the writer is
> generating data, and you want to just get the writer going again asap.
> Anyway, I've spent way too much time looking at this and wondering
> about odd performance patterns. It seems to be mostly back up to
> normal.
> I say "mostly", because I still see times of "not as many concurrent
> compiles going as I'd expect". It might be a kbuild problem, it might
> be an issue with GNU make (I've seen problems with the make jobserver
> wanting many more tokens than expected before and the kernel makefiles
> - it migth be about deep subdirectories etc), and it might be some
> remaining pipe issue. But my allmodconfig builds aren't _enormously_
> slower than they used to be.
> But there's definitely some unhappy interaction with the jobserver. I
> have 16 threads (8 cores with HT), and I generally use "make -j32" to
> keep them busy because the jobserver isn't great. The pipe rework made
> even that 2x slop not work all that well. Something held on to tokens
> too long, and there was definitely some interaction with the pipe
> wakeup code. Using "-j64" hid the problem, but it was a problem.
> It might be the new scheduler balancing changes that are interacting
> with the pipe thing. I'm adding PeterZ, Ingo and Vincent to the cc,
> because I hadn't realized just how important the sync wakeup seems to
> be for pipe performance even at a big level.
> I've pushed out my pipe changes. I really didn't want to do that kind
> of stuff at the end of the merge window, but I spent a lot more time
> than I wanted looking at this code, because I was getting to the point
> where the alternative was to just revert it all.
> DavidH, give these a look:
b>    85190d15f4ea pipe: don't use 'pipe_wait() for basic pipe IO
>    a28c8b9db8a1 pipe: remove 'waiting_writers' merging logic
>    f467a6a66419 pipe: fix and clarify pipe read wakeup logic
>    1b6b26ae7053 pipe: fix and clarify pipe write wakeup logic
>    ad910e36da4c pipe: fix poll/select race introduced by the pipe rework

commit f467a6a66419 pipe: fix and clarify pipe read wakeup logic
killed "wake writer when buffer becomes half empty" part added by
commit cefa80ced57a ("pipe: Increase the writer-wakeup threshold to reduce context-switch count").

I suppose that was unintentional. Jobserver juggles with few bytes and
should never reach half/full buffer thresholds.

Also reader should wake writer with sync wakeup only if buffer is empty.
Otherwise sync wakeup adds couple unneeded context switches.

> the top two of which are purely "I'm fed up looking at this code, this
> needs to go" kind of changes.
> In particular, that last change is because I think the GNU jobserver
> problem is partly a thundering herd issue: when a job token becomes
> free (ie somebody does a one-byte write to an empty jobserver pipe),
> it wakes up *everybody* who is waiting for a token. One of them will
> get it, and the others will go to sleep again. And then it repeats all
> over. I didn't fix it, but it _could_ be fixed with exclusive waits
> for readers/writers, but that means more smarts than pipe_wait() can
> do. And because the jobserver isn't great at keeping everybody happy,
> I'm using a much bigger "make -jX" value than the number of CPU's I
> have, which makes the herd bigger. And I suspect none of this helps
> the scheduler pick the _right_ process to run, which just makes
> scheduling an even bigger problem.
>              Linus

  parent reply	other threads:[~2019-12-12 10:18 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05 22:30 [PATCH 0/2] pipe: Fixes [ver #2] David Howells
2019-12-05 22:30 ` [PATCH 1/2] pipe: Remove assertion from pipe_poll() " David Howells
2019-12-05 22:30 ` [PATCH 2/2] pipe: Fix missing mask update after pipe_wait() " David Howells
2019-12-05 23:58   ` Linus Torvalds
2019-12-06 13:56 ` [PATCH 0/2] pipe: Fixes " David Sterba
2019-12-06 17:09   ` Linus Torvalds
2019-12-06 17:42     ` Linus Torvalds
2019-12-06 18:59       ` Linus Torvalds
2019-12-07 21:31         ` Akemi Yagi
2019-12-08 16:45           ` Akemi Yagi
2019-12-08 18:04             ` Linus Torvalds
2019-12-09  3:07               ` Linus Torvalds
2019-12-06 20:28   ` Linus Torvalds
2019-12-06 21:04     ` Linus Torvalds
2019-12-07  3:50       ` Linus Torvalds
2019-12-07  4:01         ` Linus Torvalds
2019-12-07 22:47         ` Linus Torvalds
2019-12-09  9:53           ` Vincent Guittot
2019-12-09 17:48             ` Linus Torvalds
2019-12-09 17:57               ` Akemi Yagi
2019-12-09 18:18                 ` Linus Torvalds
2019-12-09 18:24                   ` Linus Torvalds
2019-12-18 20:59                   ` Josh Triplett
2019-12-10  2:58               ` DJ Delorie
2019-12-10 14:38               ` Vincent Guittot
2019-12-10 17:39                 ` Linus Torvalds
2019-12-11 18:09               ` DJ Delorie
2019-12-11 18:59                 ` Linus Torvalds
2019-12-12 10:18           ` Konstantin Khlebnikov [this message]
2019-12-18 22:51             ` Linus Torvalds
2019-12-19  0:03               ` Josh Triplett
2019-12-19  0:14                 ` Josh Triplett
2019-12-19  0:51                   ` Linus Torvalds
2019-12-19  0:54                     ` Linus Torvalds
2019-12-19  7:56                   ` David Howells
2019-12-19 16:35                     ` Linus Torvalds
2019-12-11 20:55         ` David Howells
2019-12-12  1:28           ` Linus Torvalds
2019-12-12  7:34           ` David Howells
2019-12-09 14:55       ` David Sterba
2019-12-06 21:26   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).