archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <>
To: David Sterba <>,
	David Howells <>,
	Eric Biggers <>,
	Al Viro <>,
	linux-fsdevel <>,
	Linux Kernel Mailing List <>
Cc: Peter Zijlstra <>,
	Ingo Molnar <>,
	Vincent Guittot <>
Subject: Re: [PATCH 0/2] pipe: Fixes [ver #2]
Date: Sat, 7 Dec 2019 14:47:52 -0800	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Fri, Dec 6, 2019 at 7:50 PM Linus Torvalds
<> wrote:
> The "make goes slow" problem bisects down to b667b8673443 ("pipe:
> Advance tail pointer inside of wait spinlock in pipe_read()").

I'm not entirely sure that ends up being 100% true. It did bisect to
that, but the behavior wasn't entirely stable. There definitely is
some nasty timing trigger.

But I did finally figure out what seems to have been going on with at
least the biggest part of the build performance regression. It's seems
to be a nasty interaction with the scheduler and the GNU make
jobserver, and in particular the pipe wakeups really _really_ do seem
to want to be synchronous both for the readers and the writers.

When a writer wakes up a reader, we want the reader to react quickly
and vice versa. The most obvious case was for the GNU make jobserver,
where sub-makes would do a single-byte write to the jobserver pipe,
and we want to wake up the reader *immediatly*, because the reader is
actually a lot more important than the writer. The reader is what gets
the next job going, the writer just got done with the last one.

And when a reader empties a full pipe, it's because the writer is
generating data, and you want to just get the writer going again asap.

Anyway, I've spent way too much time looking at this and wondering
about odd performance patterns. It seems to be mostly back up to

I say "mostly", because I still see times of "not as many concurrent
compiles going as I'd expect". It might be a kbuild problem, it might
be an issue with GNU make (I've seen problems with the make jobserver
wanting many more tokens than expected before and the kernel makefiles
- it migth be about deep subdirectories etc), and it might be some
remaining pipe issue. But my allmodconfig builds aren't _enormously_
slower than they used to be.

But there's definitely some unhappy interaction with the jobserver. I
have 16 threads (8 cores with HT), and I generally use "make -j32" to
keep them busy because the jobserver isn't great. The pipe rework made
even that 2x slop not work all that well. Something held on to tokens
too long, and there was definitely some interaction with the pipe
wakeup code. Using "-j64" hid the problem, but it was a problem.

It might be the new scheduler balancing changes that are interacting
with the pipe thing. I'm adding PeterZ, Ingo and Vincent to the cc,
because I hadn't realized just how important the sync wakeup seems to
be for pipe performance even at a big level.

I've pushed out my pipe changes. I really didn't want to do that kind
of stuff at the end of the merge window, but I spent a lot more time
than I wanted looking at this code, because I was getting to the point
where the alternative was to just revert it all.

DavidH, give these a look:

  85190d15f4ea pipe: don't use 'pipe_wait() for basic pipe IO
  a28c8b9db8a1 pipe: remove 'waiting_writers' merging logic
  f467a6a66419 pipe: fix and clarify pipe read wakeup logic
  1b6b26ae7053 pipe: fix and clarify pipe write wakeup logic
  ad910e36da4c pipe: fix poll/select race introduced by the pipe rework

the top two of which are purely "I'm fed up looking at this code, this
needs to go" kind of changes.

In particular, that last change is because I think the GNU jobserver
problem is partly a thundering herd issue: when a job token becomes
free (ie somebody does a one-byte write to an empty jobserver pipe),
it wakes up *everybody* who is waiting for a token. One of them will
get it, and the others will go to sleep again. And then it repeats all
over. I didn't fix it, but it _could_ be fixed with exclusive waits
for readers/writers, but that means more smarts than pipe_wait() can
do. And because the jobserver isn't great at keeping everybody happy,
I'm using a much bigger "make -jX" value than the number of CPU's I
have, which makes the herd bigger. And I suspect none of this helps
the scheduler pick the _right_ process to run, which just makes
scheduling an even bigger problem.


  parent reply	other threads:[~2019-12-07 22:48 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05 22:30 [PATCH 0/2] pipe: Fixes [ver #2] David Howells
2019-12-05 22:30 ` [PATCH 1/2] pipe: Remove assertion from pipe_poll() " David Howells
2019-12-05 22:30 ` [PATCH 2/2] pipe: Fix missing mask update after pipe_wait() " David Howells
2019-12-05 23:58   ` Linus Torvalds
2019-12-06 13:56 ` [PATCH 0/2] pipe: Fixes " David Sterba
2019-12-06 17:09   ` Linus Torvalds
2019-12-06 17:42     ` Linus Torvalds
2019-12-06 18:59       ` Linus Torvalds
2019-12-07 21:31         ` Akemi Yagi
2019-12-08 16:45           ` Akemi Yagi
2019-12-08 18:04             ` Linus Torvalds
2019-12-09  3:07               ` Linus Torvalds
2019-12-06 20:28   ` Linus Torvalds
2019-12-06 21:04     ` Linus Torvalds
2019-12-07  3:50       ` Linus Torvalds
2019-12-07  4:01         ` Linus Torvalds
2019-12-07 22:47         ` Linus Torvalds [this message]
2019-12-09  9:53           ` Vincent Guittot
2019-12-09 17:48             ` Linus Torvalds
2019-12-09 17:57               ` Akemi Yagi
2019-12-09 18:18                 ` Linus Torvalds
2019-12-09 18:24                   ` Linus Torvalds
2019-12-18 20:59                   ` Josh Triplett
2019-12-10  2:58               ` DJ Delorie
2019-12-10 14:38               ` Vincent Guittot
2019-12-10 17:39                 ` Linus Torvalds
2019-12-11 18:09               ` DJ Delorie
2019-12-11 18:59                 ` Linus Torvalds
2019-12-12 10:18           ` Konstantin Khlebnikov
2019-12-18 22:51             ` Linus Torvalds
2019-12-19  0:03               ` Josh Triplett
2019-12-19  0:14                 ` Josh Triplett
2019-12-19  0:51                   ` Linus Torvalds
2019-12-19  0:54                     ` Linus Torvalds
2019-12-19  7:56                   ` David Howells
2019-12-19 16:35                     ` Linus Torvalds
2019-12-11 20:55         ` David Howells
2019-12-12  1:28           ` Linus Torvalds
2019-12-12  7:34           ` David Howells
2019-12-09 14:55       ` David Sterba
2019-12-06 21:26   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).