From: Florian Bezdeka <florian.bezdeka@siemens.com>
To: xenomai@lists.linux.dev
Cc: Jan Kiszka <jan.kiszka@siemens.com>, Philippe Gerum <rpm@xenomai.org>
Subject: RFC: System hang / deadlock on Linux 6.1
Date: Mon, 27 Mar 2023 19:30:03 +0200
Message-ID: <1550a773-6461-5006-3686-d5f2f7e78ee4@siemens.com>

Hi all,

I'm currently investigating an issue reported by an internal customer. When
trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
observed.

I was not able to reproduce the locking issue myself, but I can "stall" the
system by putting it under heavy load with stress-ng. After 10-20 minutes
there is no progress anymore.
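
For illustration, the kind of load I mean is something like the following (the
exact stress-ng parameters are not important, any sustained mixed load seems
to do):

# illustrative only; adjust workers and duration to the size of the VM
stress-ng --cpu 4 --vm 2 --io 2 --timeout 30m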

The locking issue reported by the customer:

[    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
[    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
[    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
[    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
[    5.063075] IRQ stage: Xenomai
[    5.063077] Call Trace:
[    5.063141]  <TASK>
[    5.063146]  dump_stack_lvl+0x71/0xa0
[    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
[    5.063158]  xnintr_core_clock_handler+0xa4/0x140
[    5.063166]  lapic_oob_handler+0x41/0xf0
[    5.063172]  do_oob_irq+0x25a/0x3e0
[    5.063179]  handle_oob_irq+0x4e/0xd0
[    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
[    5.063213]  arch_handle_irq+0x5d/0x1e0
[    5.063218]  arch_pipeline_entry+0xa1/0x110
[    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
...

After reading a lot of code I realized that the so-called paravirtualized
spinlocks are being used when running under VB (VirtualBox):

[    0.019574] kvm-guest: PV spinlocks enabled

vs. Qemu (with -enable-kvm):

[    0.255790] kvm-guest: PV spinlocks disabled, no host support

The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel 
cmdline) the problem disappears.
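
For reference, testing the workaround boils down to either of the following.
The GRUB snippet is just an example, the details are distro-specific:

# build time: leave paravirtualized spinlock support out of the kernel
# CONFIG_PARAVIRT_SPINLOCKS is not set

# or at boot time, e.g. via /etc/default/grub on a GRUB-based system:
GRUB_CMDLINE_LINUX="... nopvspin"
# then regenerate the bootloader config, e.g. with update-grub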

The bad news: As Linux alone (and Dovetail without the Xenomai patch) runs
fine, even with all the stress applied, I'm quite sure that we have a (maybe
longstanding) locking bug.

RFC: I'm now testing the patch below, which has been running fine for some
hours now. Please let me know if all of this makes sense. I might have
overlooked something.

If I'm not mistaken, the following can happen on one CPU:

// Example: taken from tick.c, proxy_set_next_ktime()
xnlock_get_irqsave(&nklock, flags);
// root domain stalled, but hard IRQs are still enabled

	// PROXY TICK IRQ FIRES
	// taken from intr.c, xnintr_core_clock_handler()
	xnlock_get(&nklock);	// we already own the lock
	xnclock_tick(&nkclock); 
	xnlock_put(&nklock);	// we unconditionally release the lock
	// EOI

// back in proxy_set_next_ktime(), but nklock released!
// Other CPU might already own the lock
sched = xnsched_current();
ret = xntimer_start(&sched->htimer, delta, XN_INFINITE, XN_RELATIVE);
xnlock_put_irqrestore(&nklock, flags);


To avoid the unconditional lock release I switched to xnlock_{get,put}_irqsave()
in xnintr_core_clock_handler(). I think it's correct. Additionally, stalling the
root domain should not be an issue, as hard IRQs are already disabled in that
context.
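
To illustrate why the irqsave variants help, here is a much simplified sketch
of the nesting handling as I understand it. The names (fake_xnlock, NESTED_BIT)
are made up for illustration only, this is not the actual lock.h implementation:

/* Simplified illustration only -- not the real nklock code. */
struct fake_xnlock {
	int owner_cpu;			/* -1 when unlocked */
};

#define NESTED_BIT	(1UL << 31)	/* illustrative marker in the saved flags */

static int fake_xnlock_get(struct fake_xnlock *lock, int cpu)
{
	if (lock->owner_cpu == cpu)
		return 1;		/* recursion detected: nothing acquired */
	/* ... spin until the lock becomes free ... */
	lock->owner_cpu = cpu;
	return 0;
}

static void fake_xnlock_put(struct fake_xnlock *lock)
{
	lock->owner_cpu = -1;		/* unconditional release -> the bug above */
}

static unsigned long fake_xnlock_get_irqsave(struct fake_xnlock *lock, int cpu)
{
	unsigned long flags = 0;	/* imagine hard_local_irq_save() here */

	if (fake_xnlock_get(lock, cpu))
		flags |= NESTED_BIT;	/* remember: this level did NOT take the lock */
	return flags;
}

static void fake_xnlock_put_irqrestore(struct fake_xnlock *lock,
				       unsigned long flags)
{
	if (!(flags & NESTED_BIT))
		fake_xnlock_put(lock);	/* release only if this level acquired it */
	/* imagine hard_local_irq_restore(flags & ~NESTED_BIT) here */
}

With that, the tick handler nested inside proxy_set_next_ktime() keeps its
hands off the lock on the way out, which is exactly what the plain
xnlock_put() fails to do.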

diff --git a/kernel/cobalt/dovetail/intr.c b/kernel/cobalt/dovetail/intr.c
index a9459b7a8..ce69dd602 100644
--- a/kernel/cobalt/dovetail/intr.c
+++ b/kernel/cobalt/dovetail/intr.c
@@ -22,10 +22,11 @@ void xnintr_host_tick(struct xnsched *sched) /* hard irqs off */
 void xnintr_core_clock_handler(void)
 {
        struct xnsched *sched;
+       unsigned long flags;
 
-       xnlock_get(&nklock);
+       xnlock_get_irqsave(&nklock, flags);
        xnclock_tick(&nkclock);
-       xnlock_put(&nklock);
+       xnlock_put_irqrestore(&nklock, flags);
 
Please let me know what you think!

Best regards,
Florian

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


Thread overview: 7+ messages
2023-03-27 17:30 Florian Bezdeka [this message]
2023-03-27 23:01 ` RFC: System hang / deadlock on Linux 6.1 Jan Kiszka
2023-03-28 16:01   ` Florian Bezdeka
2023-03-30 16:13     ` Florian Bezdeka
2023-04-03  5:31       ` Jan Kiszka
2023-04-03  8:52         ` Florian Bezdeka
2023-04-18 12:16           ` Florian Bezdeka
