[ Added DJ to the participants, since he seems to be the Fedora make
maintainer - DJ, any chance that this absolutely horrid 'make' bug can
be fixed in older versions too, not just rawhide? The bugfix is two
and a half years old by now, and the bug looks real and very serious ]

On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> Which version of make should I use to reproduce the problem ?

So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.
I'm assuming it's fairly plain 4.2.1, but I didn't try to look into
the source rpm or anything like that.

The working one for me was just the top of -git from

    https://git.savannah.gnu.org/git/make.git

which is 4.2.92 right now.

The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking
read with pselect to avoid hangs") as per Akemi. That is indeed after
4.2.1, and it looks real.

Before that commit the buggy jobserver code basically does

 (1) use pselect() to wait for readable and see child deaths atomically
 (2) use blocking read to get the token

and while (1) is atomic, if the child death happens between the two,
it goes into the blocking read and has SIGCHLD blocked, so it will try
to read the token from the token pipe, but it will never react to the
child death - and the child death is what is going to _release_ a
token.
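
For illustration, the fixed sequence (pselect followed by a
non-blocking read) can be sketched in C roughly as below. This is a
minimal sketch and not the actual make source: the function name
try_acquire_token(), its return convention and the timeout parameter
are all made up for the example. It assumes the caller keeps SIGCHLD
blocked, passes a wait mask with SIGCHLD unblocked, and has already
set O_NONBLOCK on the read end of the token pipe.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT GNU make's code.
 *
 * Returns 1 if a token was read, 0 if the caller should go and reap
 * children (or simply retry), and -1 on a real error.
 *
 * Assumptions: SIGCHLD is blocked in the caller, 'wait_mask' is the
 * signal mask to use while waiting (SIGCHLD unblocked), and 'fd' is
 * the read end of the token pipe with O_NONBLOCK set.
 */
static int try_acquire_token(int fd, const sigset_t *wait_mask,
                             const struct timespec *timeout)
{
    fd_set readfds;
    char token;
    ssize_t n;
    int r;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Step (1): atomically swap the signal mask and wait for the pipe. */
    r = pselect(fd + 1, &readfds, NULL, NULL, timeout, wait_mask);
    if (r < 0)
        return errno == EINTR ? 0 : -1;  /* EINTR: likely a child death */
    if (r == 0)
        return 0;                        /* timed out, nothing readable */

    /*
     * Step (2): non-blocking read. If somebody else took the token in
     * the meantime, this fails with EAGAIN instead of sleeping in
     * read() with SIGCHLD blocked, which is the hang described above.
     */
    n = read(fd, &token, 1);
    if (n == 1)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;
    return -1;
}
```

A caller would loop on this: on a 0 return it reaps any dead children
(which is what puts their tokens back in the pipe) and then tries
again, so it never sits in read() unable to see a child exit.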

So what seems to happen is that when the right timing triggers, you
end up with a lot of sub-makes waiting for a token, but they are also
all supposed to _release_ a token. So you don't have enough tokens to
go around. In the worst case, _everybody_ who has a token is also not
releasing it, and then you end up triggering the timeout code (after
one second), which really slows things to a crawl.

And by a crawl I mean that worst-case you really end up with just one
job per second per sub-make. It will take _hours_ to compile the
kernel at that speed, when it normally finishes in 15 minutes on my
machine even when I do a from-scratch allmodconfig build.

It does seem to be a major bug in the jobserver code. In particular
with the trial fair and exclusive wakeup patch that I sent out in the
other thread, it seems to be _reliably_ much worse and triggers 100%
of the time for me.

It's possible that my trial patch is buggy, but everything else looks
fine, and with a fixed make the trial patch works for me.

I'll include the trial patch here too. I think I cc'd you on the other
thread as well, but hey..

Anyway, it looks like the sync wakeup thing is more of a "get timing
right by luck" thing than anything else. Possibly it actually causes
the reverse order of reader wakeups more often (ie the most _recent_
reader is most likely to get woken up synchronously) and that may be
what really ends up masking the jobserver problem, since apparently
doing wakeups in the fair and proper order makes things much worse..

What a horrible pain that pipe rework ended up being. But I think
we're in better shape now than we used to be. It just had very
unfortunate timing issues and several real bugs.

But sadly, there's no way I can push that fair pipe wakeup thing as
long as this horribly buggy version of make is widespread.

                 Linus