[ Added DJ to the participants, since he seems to be the Fedora make maintainer - DJ, any chance that this absolutely horrid 'make' bug can be fixed in older versions too, not just rawhide? The bugfix is two and a half years old by now, and the bug looks real and very serious ]

On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot wrote:
>
> Which version of make should I use to reproduce the problem ?

So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.

I'm assuming it's fairly plain 4.2.1, but I didn't try to look into the source rpm or anything like that.

The working one for me was just the top of -git from

    https://git.savannah.gnu.org/git/make.git

which is 4.2.92 right now.

The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking read with pselect to avoid hangs") as per Akemi. That is indeed after 4.2.1, and it looks real.

Before that commit the buggy jobserver code basically does

 (1) use pselect() to wait for readable and see child deaths atomically

 (2) use blocking read to get the token

and while (1) is atomic, if the child death happens between the two, it goes into the blocking read and has SIGCHLD blocked, so it will try to read the token from the token pipe, but it will never react to the child death - and the child death is what is going to _release_ a token.

So what seems to happen is that when the right timing triggers, you end up with a lot of sub-makes waiting for a token, but they are also all supposed to _release_ a token. So you don't have enough tokens to go around.

In the worst case, _everybody_ who has a token is also not releasing it, and then you end up triggering the timeout code (after one second), which will make things really go into a crawl.

And by a crawl I mean that worst-case you really end up with just one job per second per sub-make. It will take _hours_ to compile the kernel at that speed, when it normally finishes in 15 minutes on my machine even when I do a from-scratch allmodconfig build.

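To make the window between (1) and (2) concrete, here is a minimal sketch of the two-step token acquire with the read made non-blocking. It is an illustration only, not make's actual C code: it is written in Python, `select.select()` stands in for pselect() (so the atomic SIGCHLD handling is not modeled at all), and the names `try_acquire_token` and `between_steps` are made up for the example.

```python
import os
import select


def try_acquire_token(read_fd, timeout, between_steps=None):
    """Try to take one jobserver-style token from a non-blocking pipe.

    Returns the token byte, or None if no token could be taken right now.
    `between_steps` is a hypothetical hook used only to simulate the race.
    """
    # Step (1): wait until the pipe looks readable (or the timeout expires).
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return None

    # Anything can happen in this window, e.g. another reader takes the token.
    if between_steps is not None:
        between_steps()

    # Step (2): non-blocking read. If the token is gone, the read fails with
    # EAGAIN (BlockingIOError) and we return to the caller instead of hanging.
    try:
        token = os.read(read_fd, 1)
    except BlockingIOError:
        return None
    return token or None  # b"" means every writer has closed the pipe


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)  # the read end must be non-blocking
os.write(write_fd, b"+")         # put a single token into the pipe

# Simulate a competing reader taking the token between steps (1) and (2).
result = try_acquire_token(
    read_fd, 0, between_steps=lambda: os.read(read_fd, 1)
)
print(result)
```

With a blocking read, that last call would never return once the token has been taken by somebody else: the caller sits in read() and cannot react to anything, including the child exit that would free up a token. With the non-blocking read it gets None back and can go around the loop again.
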
It does seem to be a major bug in the jobserver code.

In particular with the trial fair and exclusive wakeup patch that I sent out in the other thread, it seems to be _reliably_ much worse and triggers 100% of the time for me. It's possible that my trial patch is buggy, but everything else looks fine, and with a fixed make the trial patch works for me.

I'll include the trial patch here too, I think I cc'd you on the other thread too, but hey..

Anyway, it looks like the sync wakeup thing is more of a "get timing right by luck" thing than anything else. Possibly it actually causes the reverse order of reader wakeups more often (ie the most _recent_ reader is most likely to get woken up synchronously) and that may be what really ends up masking the jobserver problem, since apparently doing wakeups in the fair and proper order makes things much worse..

What a horrible pain that pipe rework ended up being. But I think we're in better shape now than we used to be, it just had very unfortunate timing issues and several real bugs.

But sadly, there's no way I can push that fair pipe wakeup thing as long as this horribly buggy version of make is widespread.

                Linus