* [RT] hit recently-fixed PREEMPT_RT CFS-bandwidth timer locking issue in the wild
@ 2019-07-26 19:36 Chris Friesen
From: Chris Friesen @ 2019-07-26 19:36 UTC (permalink / raw)
To: rt-users, linux-kernel
Hi all,
I thought people might be interested to hear that we recently hit the
bug fixed by git commit c0ad4aa4d8 on multiple lab systems running the
RHEL 7 "kernel-rt" kernel. (But I think other versions are at risk as
well.)
Interestingly, when the bug hit, the system just hung completely. Nothing
was emitted on netconsole or serial console, neither the hung task timer
nor the NMI watchdog triggered, CONFIG_DEBUG_SPINLOCK didn't output
anything, and magic sysrq didn't work on the serial console. As you can
imagine, this was a bit frustrating. I was finally able to cause a panic
by sending an NMI from the BMC, which allowed kdump to store the core
file so I could get stack traces.
Given how annoying it was to debug, I'd recommend backporting this fix
as far back as it applies. HRTIMER_MODE_SOFT was introduced in mainline
in 4.16, but at least in the RHEL 7 kernel-rt package (and I think in the
vanilla PREEMPT_RT patches as well) hrtimers run in softirq context by
default, so the fix might apply to all supported PREEMPT_RT versions.
Chris