From: Florian Bezdeka <florian.bezdeka@siemens.com>
To: Jan Kiszka <jan.kiszka@web.de>, xenomai@lists.linux.dev
Cc: Philippe Gerum <rpm@xenomai.org>
Subject: Re: RFC: System hang / deadlock on Linux 6.1
Date: Thu, 30 Mar 2023 18:13:49 +0200
Message-ID: <de2ff0a777bc4d42a4d788c16061b7ed245b7616.camel@siemens.com>
In-Reply-To: <8cdb7db3-cf99-9fa7-be7e-d0104b74831b@siemens.com>

On Tue, 2023-03-28 at 18:01 +0200, Florian Bezdeka wrote:
> On 28.03.23 01:01, Jan Kiszka wrote:
> > On 27.03.23 19:30, Florian Bezdeka wrote:
> > > Hi all,
> > > 
> > > I'm currently investigating an issue reported by an internal customer. When
> > > trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
> > > environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
> > > observed.
> > > 
> > > I was not able to reproduce the locking issue myself, but I'm able to "stall"
> > > the system by putting a lot of load on it (stress-ng). After 10-20 minutes
> > > there is no progress anymore.
> > > 
> > > The locking issue reported by the customer:
> > > 
> > > [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
> > > [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
> > > [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
> > > [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
> > > [    5.063075] IRQ stage: Xenomai
> > > [    5.063077] Call Trace:
> > > [    5.063141]  <TASK>
> > > [    5.063146]  dump_stack_lvl+0x71/0xa0
> > > [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
> > > [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
> > > [    5.063166]  lapic_oob_handler+0x41/0xf0
> > > [    5.063172]  do_oob_irq+0x25a/0x3e0
> > > [    5.063179]  handle_oob_irq+0x4e/0xd0
> > > [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
> > > [    5.063213]  arch_handle_irq+0x5d/0x1e0
> > > [    5.063218]  arch_pipeline_entry+0xa1/0x110
> > > [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > ...
> > > 
> > > After reading a lot of code I realized that the so-called paravirtualized
> > > spinlocks are being used when running under VB (VirtualBox):
> > > 
> > > [    0.019574] kvm-guest: PV spinlocks enabled
> > > 
> > > vs. QEMU (with -enable-kvm):
> > > 
> > > [    0.255790] kvm-guest: PV spinlocks disabled, no host support
> > > 
> > > The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
> > > cmdline) the problem disappears.
> > > 
> > > The bad news: As Linux alone (and Dovetail without the Xenomai patches) runs
> > > fine, even with all the stress applied, I'm quite sure that we have a (maybe
> > > longstanding) locking bug.
> > > 
> > > RFC: I'm now testing the patch below, which has been running fine for some
> > > hours now. Please let me know if all of this makes sense. I might have
> > > overlooked something.
> > > 
> > > If I'm not mistaken the following can happen on one CPU:
> > > 
> > > // Example: taken from tick.c, proxy_set_next_ktime()
> > > xnlock_get_irqsave(&nklock, flags);
> > > // root domain stalled, but hard IRQs are still enabled
> > 
> > OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
> > 
> > > 
> > > 	// PROXY TICK IRQ FIRES
> > > 	// taken from intr.c, xnintr_core_clock_handler()
> > > 	xnlock_get(&nklock);	// we already own the lock
> > 
> > If this code runs under the assumption that hard IRQs and OOB are stalled
> > while they are not, we indeed have a problem. Please check where that
> > may have gone wrong.
> 
> The warnings I added to check the context have not triggered so far, but
> the system still enters the broken state. Some further findings:
> 
> - nklock internally uses an arch_spinlock_t for locking
> - nklock is used by the proxy timer tick IRQ, so very "often"
> - arch_spinlock_t is very rarely used directly in Linux; I found only a
>   couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
> - arch_spinlock_t might use the PV-optimized spinlock operations
> - Most other spinlocks (especially in Dovetail) are based on
>   raw_spinlock
> - raw_spinlock does not use PV-optimized operations
> 
> That might be the reason why Linux alone runs fine.
> 
> I'm now asking myself if the PV spinlock implementation provided by
> VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
> that might be a KVM bug as well...
> 
> Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
> never come back...

Short update:

It seems the slow path matters, especially when there is noticeable load on
the host itself. The vCPU is "parked" and it simply takes too long until it
is scheduled back onto a CPU. In most cases I then run into an AHCI timeout;
the rootfs becomes inaccessible and the system is "dead".
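
To illustrate why a parked waiter is at the mercy of a timely kick, here is a
tiny user-space analogue. It is only a sketch based on my assumption that the
PV slow path boils down to "halt the vCPU until the lock holder kicks it"; the
futex calls and all toy names below (futex_wait_on, futex_wake_one, waiter)
are made up for the example and merely stand in for the hypervisor halting and
resuming a vCPU. This is not the kernel code:

/*
 * Toy analogue of a parked PV spinlock waiter being kicked late.
 * Not kernel code; the futexes only stand in for the hypervisor
 * halting and resuming a vCPU. Build with: gcc -pthread
 */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

static atomic_int lock_word;	/* 0 = free, 1 = held */
static time_t released_at;

static void futex_wait_on(atomic_int *addr, int val)
{
	syscall(SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *addr)
{
	syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
	int i;

	(void)arg;
	/* brief spinning phase before giving up ... */
	for (i = 0; i < 1000; i++)
		if (!atomic_exchange(&lock_word, 1))
			goto locked;
	/* ... then park until the holder kicks us ("vCPU halted") */
	while (atomic_exchange(&lock_word, 1))
		futex_wait_on(&lock_word, 1);
locked:
	printf("waiter got the lock %ld s after it was released\n",
	       (long)(time(NULL) - released_at));
	atomic_store(&lock_word, 0);
	return NULL;
}

int main(void)
{
	pthread_t t;

	atomic_store(&lock_word, 1);	/* the "holder" has the lock */
	pthread_create(&t, NULL, waiter, NULL);

	sleep(1);			/* critical section */
	atomic_store(&lock_word, 0);	/* release ... */
	released_at = time(NULL);
	sleep(2);			/* ... but the "kick" arrives late,
					   e.g. the host is too loaded to
					   schedule the woken vCPU in time */
	futex_wake_one(&lock_word);

	pthread_join(t, NULL);
	return 0;
}

The waiter reports roughly the 2 seconds by which the wake-up was delayed,
which is exactly the kind of stall that can push AHCI over its command timeout.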

I'm still wondering how the "lockdep" error above can actually happen. That
report came from a different host, so it may be a different problem.

> 
> Florian
> 
> > 
> > Jan
> > 
> > > 	xnclock_tick(&nkclock);
> > > 	xnlock_put(&nklock);	// we unconditionally release the lock
> > > 	// EOI
> > > 
> > > // back in proxy_set_next_ktime(), but nklock released!
> > > // Other CPU might already own the lock
> > > sched = xnsched_current();
> > > ret = xntimer_start(&sched->htimer, delta, XN_INFINITE, XN_RELATIVE);
> > > xnlock_put_irqrestore(&nklock, flags);
> > > 
> > > 
> > > To avoid the unconditional lock release I switched to
> > > xnlock_{get,put}_irqsave() in xnintr_core_clock_handler. I think it's
> > > correct. Additionally, stalling the root domain should not be an issue as
> > > hard IRQs are already disabled.
> > > 
> > > diff --git a/kernel/cobalt/dovetail/intr.c b/kernel/cobalt/dovetail/intr.c
> > > index a9459b7a8..ce69dd602 100644
> > > --- a/kernel/cobalt/dovetail/intr.c
> > > +++ b/kernel/cobalt/dovetail/intr.c
> > > @@ -22,10 +22,11 @@ void xnintr_host_tick(struct xnsched *sched) /* hard irqs off */
> > >  void xnintr_core_clock_handler(void)
> > >  {
> > >         struct xnsched *sched;
> > > +       unsigned long flags;
> > > 
> > > -       xnlock_get(&nklock);
> > > +       xnlock_get_irqsave(&nklock, flags);
> > >         xnclock_tick(&nkclock);
> > > -       xnlock_put(&nklock);
> > > +       xnlock_put_irqrestore(&nklock, flags);
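
To make the interleaving above easier to follow, here is a minimal
single-threaded user-space model of it. This is only a sketch; the toy_* names
are invented and the real xnlock code is more involved. It just captures what
was described above: the plain put() releases unconditionally, so a nested
get()/put() pair inside the "tick handler" drops the lock the outer section
still thinks it holds, while the irqsave/irqrestore pair records the nesting in
the flags so that only the outermost restore really unlocks:

/*
 * Toy model of the scenario above; a sketch only, not the real Xenomai
 * code. IRQs and CPUs are not modelled; the single thread simply plays
 * both the outer caller and the nested tick handler.
 */
#include <stdio.h>

#define UNOWNED		(-1)
#define ALREADY_OWNED	0x2	/* recursion bit carried in the flags */

struct toy_lock { int owner; };

static struct toy_lock toy_nklock = { .owner = UNOWNED };
static const int this_cpu = 0;	/* single CPU in this model */

static int toy_get(struct toy_lock *l)
{
	if (l->owner == this_cpu)
		return 1;	/* already ours: nothing acquired */
	l->owner = this_cpu;	/* (the real code spins here) */
	return 0;
}

static void toy_put(struct toy_lock *l)
{
	l->owner = UNOWNED;	/* unconditional release */
}

static unsigned long toy_get_irqsave(struct toy_lock *l)
{
	/* the real code also stalls OOB/hard IRQs here */
	return toy_get(l) ? ALREADY_OWNED : 0;
}

static void toy_put_irqrestore(struct toy_lock *l, unsigned long flags)
{
	if (!(flags & ALREADY_OWNED))
		toy_put(l);	/* only the outermost owner unlocks */
}

static void tick_handler_buggy(void)
{
	toy_get(&toy_nklock);
	toy_put(&toy_nklock);	/* oops: drops the outer lock, too */
}

static void tick_handler_fixed(void)
{
	unsigned long flags = toy_get_irqsave(&toy_nklock);
	toy_put_irqrestore(&toy_nklock, flags);
}

int main(void)
{
	unsigned long flags = toy_get_irqsave(&toy_nklock);

	tick_handler_buggy();
	printf("after buggy handler: owner=%d (should be %d)\n",
	       toy_nklock.owner, this_cpu);

	toy_nklock.owner = this_cpu;	/* re-take for the second run */
	tick_handler_fixed();
	printf("after fixed handler: owner=%d (should be %d)\n",
	       toy_nklock.owner, this_cpu);

	toy_put_irqrestore(&toy_nklock, flags);
	return 0;
}

Running it prints owner=-1 after the buggy handler and owner=0 after the fixed
one, i.e. only the irqsave/irqrestore variant leaves the outer critical section
intact.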
> > > 
> > > Please let me know what you think!
> > > 
> > > Best regards,
> > > Florian
> > > 
> > 
> 
> 

