On Thu, Dec 27, 2018 at 1:09 PM Sargun Dhillon wrote:
>
> This appears to be broken since October on 4.18.5. We've only noticed
> it recently with a workload which does ridiculously parallel compiles
> in cgroups that are rapidly churned.

Yeah, that's probably unusual enough that people will have missed it.

Because it really looks like the bug has been there since 4.13, unless
I'm mis-reading things. Other things have changed there since, so
maybe I am.

> It's also an awkward bug to catch, because none of the lockup
> detectors, were catching it in our environment. The only reason we
> caught it was that it was blocking other cores, and those other cores
> were missing IPIs, resulting in catastrophic failure.

My gut feel is that we just need to revert that commit. It doesn't
revert cleanly, but it doesn't look hard to do manually.

Something like the attached?

But we do need Tejun and PeterZ to take a look, since there might be
something subtle going on. Everybody is probably still on
well-deserved vacations, so it might be a while.

But testing the attached patch is probably a good idea regardless.

                Linus