linux-rt-users.vger.kernel.org archive mirror
* [RT] hit recently-fixed PREEMPT_RT CFS-bandwidth timer locking issue in the wild
@ 2019-07-26 19:36 Chris Friesen
From: Chris Friesen @ 2019-07-26 19:36 UTC (permalink / raw)
  To: rt-users, linux-kernel

Hi all,

I thought people might be interested to hear that we recently hit the 
bug fixed by git commit c0ad4aa4d8 on multiple lab systems running the 
RHEL 7 "kernel-rt" kernel.  (But I think other versions are at risk as 
well.)

Interestingly, when the bug hit, the system just hung completely. Nothing 
was emitted on netconsole or serial console, neither the hung task timer 
nor the NMI watchdog triggered, CONFIG_DEBUG_SPINLOCK didn't output 
anything, and magic sysrq didn't work on the serial console.  As you can 
imagine this was a bit frustrating.  I was finally able to cause a panic 
by sending an NMI from the BMC and that allowed kdump to store the core 
file so I could get stack traces.
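
For anyone who wants to check ahead of time whether a BMC-injected NMI 
would turn into a panic and whether a kdump kernel is armed to catch it, 
here is a minimal sketch.  It assumes the standard sysctl files under 
/proc/sys/kernel and the /sys/kernel/kexec_crash_loaded flag are present; 
the function names are made up for illustration, and which sysctl 
actually matters for a given BMC and platform is not something this 
script can tell you.

```python
#!/usr/bin/env python3
"""Report NMI-panic sysctls and whether a kdump crash kernel is loaded."""
from pathlib import Path

# Standard kernel sysctls that can turn an NMI into a panic when set to 1.
NMI_SYSCTLS = ("unknown_nmi_panic", "panic_on_unrecovered_nmi", "panic_on_io_nmi")


def read_flag(path):
    """Return the integer stored in a one-line kernel file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def nmi_dump_readiness(proc_kernel="/proc/sys/kernel",
                       crash_loaded="/sys/kernel/kexec_crash_loaded"):
    """Collect the NMI-panic sysctls and the crash-kernel-loaded flag.

    The paths are parameters (an assumption for testability) so the
    function can be pointed at a copy of the files.
    """
    flags = {name: read_flag(Path(proc_kernel) / name) for name in NMI_SYSCTLS}
    return {
        "sysctls": flags,
        "any_nmi_panic_sysctl_set": any(v == 1 for v in flags.values()),
        "crash_kernel_loaded": read_flag(crash_loaded) == 1,
    }


if __name__ == "__main__":
    state = nmi_dump_readiness()
    for name, value in state["sysctls"].items():
        print(f"kernel.{name} = {value}")
    print("at least one NMI-panic sysctl set:", state["any_nmi_panic_sysctl_set"])
    print("kdump crash kernel loaded:", state["crash_kernel_loaded"])
```

A value of None means the file was missing or unreadable on that 
kernel.  If no NMI-panic sysctl is set or no crash kernel is loaded, an 
injected NMI may not produce a usable core file.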

Given how annoying it was to debug, I'd recommend backporting this fix 
as far back as it applies.  HRTIMER_MODE_SOFT was introduced in mainline 
in 4.16, but at least in the RHEL 7 kernel-rt package (and I think in the 
vanilla PREEMPT_RT patches as well) hrtimers are run by default in 
softirq context and so the fix might apply to all supported PREEMPT_RT 
versions.

Chris
