* RFC: System hang / deadlock on Linux 6.1
From: Florian Bezdeka @ 2023-03-27 17:30 UTC
  To: xenomai; +Cc: Jan Kiszka, Philippe Gerum

Hi all,

I'm currently investigating an issue reported by an internal customer. When 
trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
environment (VirtualBox 7.0.6), a complete system hang / deadlock can be 
observed.

I was not able to reproduce the locking issue myself, but I'm able to "stall"
the system by putting a lot of load on the system (stress-ng). After 10-20 
minutes there is no progress anymore.

The locking issue reported by the customer:

[    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
[    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
[    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
[    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
[    5.063075] IRQ stage: Xenomai
[    5.063077] Call Trace:
[    5.063141]  <TASK>
[    5.063146]  dump_stack_lvl+0x71/0xa0
[    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
[    5.063158]  xnintr_core_clock_handler+0xa4/0x140
[    5.063166]  lapic_oob_handler+0x41/0xf0
[    5.063172]  do_oob_irq+0x25a/0x3e0
[    5.063179]  handle_oob_irq+0x4e/0xd0
[    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
[    5.063213]  arch_handle_irq+0x5d/0x1e0
[    5.063218]  arch_pipeline_entry+0xa1/0x110
[    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
...

After reading a lot of code, I realized that the so-called paravirtualized 
spinlocks are being used when running under VB (VirtualBox):

[    0.019574] kvm-guest: PV spinlocks enabled

vs. Qemu (with -enable-kvm):
[    0.255790] kvm-guest: PV spinlocks disabled, no host support

The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel 
cmdline) the problem disappears.

The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
even with all the stress applied, I'm quite sure that we have a (maybe 
longstanding) locking bug.

RFC: I'm now testing the patch below, which has been running fine for some
hours now. Please let me know if all of this makes sense. I might have 
overlooked something.

If I'm not mistaken the following can happen on one CPU:

// Example: taken from tick.c, proxy_set_next_ktime()
xnlock_get_irqsave(&nklock, flags);
// root domain stalled, but hard IRQs are still enabled

	// PROXY TICK IRQ FIRES
	// taken from intr.c, xnintr_core_clock_handler()
	xnlock_get(&nklock);	// we already own the lock
	xnclock_tick(&nkclock); 
	xnlock_put(&nklock);	// we unconditionally release the lock
	// EOI

// back in proxy_set_next_ktime(), but nklock released!
// Other CPU might already own the lock
sched = xnsched_current();
ret = xntimer_start(&sched->htimer, delta, XN_INFINITE, XN_RELATIVE);
xnlock_put_irqrestore(&nklock, flags);
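
The nested xnlock_get() can only pass silently because xnlock is
nesting-aware on acquisition but not on release. A minimal model of that
asymmetry (an illustrative sketch only, not the actual Xenomai
implementation; all names are made up):

struct xnlock_model {
        int owner;              /* CPU id, or -1 when unlocked */
};

void model_get(struct xnlock_model *l, int cpu)
{
        int expected = -1;

        if (__atomic_load_n(&l->owner, __ATOMIC_RELAXED) == cpu)
                return;         /* nested acquisition: silent no-op */

        while (!__atomic_compare_exchange_n(&l->owner, &expected, cpu, 0,
                                            __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
                expected = -1;  /* spin until the lock is free */
}

void model_put(struct xnlock_model *l)
{
        /* unconditional: releases even if model_get() was a nested no-op */
        __atomic_store_n(&l->owner, -1, __ATOMIC_RELEASE);
}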


To avoid the unconditional lock release, I switched to xnlock_{get,put}_irqsave() in
xnintr_core_clock_handler. I think it's correct. Additionally, stalling the 
root domain should not be an issue, as hard IRQs are already disabled.

diff --git a/kernel/cobalt/dovetail/intr.c b/kernel/cobalt/dovetail/intr.c
index a9459b7a8..ce69dd602 100644
--- a/kernel/cobalt/dovetail/intr.c
+++ b/kernel/cobalt/dovetail/intr.c
@@ -22,10 +22,11 @@ void xnintr_host_tick(struct xnsched *sched) /* hard irqs off */
 void xnintr_core_clock_handler(void)
 {
        struct xnsched *sched;
+       unsigned long flags;
 
-       xnlock_get(&nklock);
+       xnlock_get_irqsave(&nklock, flags);
        xnclock_tick(&nkclock);
-       xnlock_put(&nklock);
+       xnlock_put_irqrestore(&nklock, flags);
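
For clarity, the handler after this change would read as follows
(reconstructed from the hunk above; the rest of the function is unchanged
and was not part of the posted diff):

void xnintr_core_clock_handler(void)
{
        struct xnsched *sched;
        unsigned long flags;

        xnlock_get_irqsave(&nklock, flags);
        xnclock_tick(&nkclock);
        xnlock_put_irqrestore(&nklock, flags);

        /* ... remainder of the function unchanged ... */
}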
 
Please let me know what you think!

Best regards,
Florian

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: RFC: System hang / deadlock on Linux 6.1
From: Jan Kiszka @ 2023-03-27 23:01 UTC
  To: Florian Bezdeka, xenomai; +Cc: Philippe Gerum

On 27.03.23 19:30, Florian Bezdeka wrote:
> Hi all,
>
> I'm currently investigating an issue reported by an internal customer. When
> trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
> environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
> observed.
>
> I was not able to reproduce the locking issue myself, but I'm able to "stall"
> the system by putting a lot of load on the system (stress-ng). After 10-20
> minutes there is no progress anymore.
>
> The locking issue reported by the customer:
>
> [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
> [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
> [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
> [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
> [    5.063075] IRQ stage: Xenomai
> [    5.063077] Call Trace:
> [    5.063141]  <TASK>
> [    5.063146]  dump_stack_lvl+0x71/0xa0
> [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
> [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
> [    5.063166]  lapic_oob_handler+0x41/0xf0
> [    5.063172]  do_oob_irq+0x25a/0x3e0
> [    5.063179]  handle_oob_irq+0x4e/0xd0
> [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
> [    5.063213]  arch_handle_irq+0x5d/0x1e0
> [    5.063218]  arch_pipeline_entry+0xa1/0x110
> [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> ...
>
> After reading a lot of code, I realized that the so-called paravirtualized
> spinlocks are being used when running under VB (VirtualBox):
>
> [    0.019574] kvm-guest: PV spinlocks enabled
>
> vs. Qemu (with -enable-kvm):
> [    0.255790] kvm-guest: PV spinlocks disabled, no host support
>
> The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
> cmdline) the problem disappears.
>
> The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
> even with all the stress applied, I'm quite sure that we have a (maybe
> longstanding) locking bug.
>
> RFC: I'm now testing the patch below, which has been running fine for some
> hours now. Please let me know if all of this makes sense. I might have
> overlooked something.
>
> If I'm not mistaken the following can happen on one CPU:
>
> // Example: taken from tick.c, proxy_set_next_ktime()
> xnlock_get_irqsave(&nklock, flags);
> // root domain stalled, but hard IRQs are still enabled

OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).

>
> 	// PROXY TICK IRQ FIRES
> 	// taken from intr.c, xnintr_core_clock_handler()
> 	xnlock_get(&nklock);	// we already own the lock

If this code runs under the assumption that hard IRQs and OOB are stalled
while they are not, we indeed have a problem. Please check where that
may have gone wrong.
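
E.g. with a check along these lines at handler entry (just a sketch;
hard_irqs_disabled() is the Dovetail helper for the physical IRQ state,
and the placement is only illustrative):

void xnintr_core_clock_handler(void)
{
        /* assumption under test: hard IRQs are off when we get here */
        WARN_ON_ONCE(!hard_irqs_disabled());

        /* ... existing body ... */
}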

Jan

> 	xnclock_tick(&nkclock);
> 	xnlock_put(&nklock);	// we unconditionally release the lock
> 	// EOI
>
> // back in proxy_set_next_ktime(), but nklock released!
> // Other CPU might already own the lock
> sched = xnsched_current();
> ret = xntimer_start(&sched->htimer, delta, XN_INFINITE, XN_RELATIVE);
> xnlock_put_irqrestore(&nklock, flags);
>
>
> To avoid the unconditional lock release, I switched to xnlock_{get,put}_irqsave() in
> xnintr_core_clock_handler. I think it's correct. Additionally, stalling the
> root domain should not be an issue, as hard IRQs are already disabled.
>
> diff --git a/kernel/cobalt/dovetail/intr.c b/kernel/cobalt/dovetail/intr.c
> index a9459b7a8..ce69dd602 100644
> --- a/kernel/cobalt/dovetail/intr.c
> +++ b/kernel/cobalt/dovetail/intr.c
> @@ -22,10 +22,11 @@ void xnintr_host_tick(struct xnsched *sched) /* hard irqs off */
>  void xnintr_core_clock_handler(void)
>  {
>         struct xnsched *sched;
> +       unsigned long flags;
>
> -       xnlock_get(&nklock);
> +       xnlock_get_irqsave(&nklock, flags);
>         xnclock_tick(&nkclock);
> -       xnlock_put(&nklock);
> +       xnlock_put_irqrestore(&nklock, flags);
>
> Please let me know what you think!
>
> Best regards,
> Florian
>



* Re: RFC: System hang / deadlock on Linux 6.1
From: Florian Bezdeka @ 2023-03-28 16:01 UTC
  To: Jan Kiszka, xenomai; +Cc: Philippe Gerum

On 28.03.23 01:01, Jan Kiszka wrote:
> On 27.03.23 19:30, Florian Bezdeka wrote:
>> Hi all,
>>
>> I'm currently investigating an issue reported by an internal customer. When
>> trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
>> environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
>> observed.
>>
>> I was not able to reproduce the locking issue myself, but I'm able to "stall"
>> the system by putting a lot of load on the system (stress-ng). After 10-20
>> minutes there is no progress anymore.
>>
>> The locking issue reported by the customer:
>>
>> [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
>> [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
>> [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
>> [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
>> [    5.063075] IRQ stage: Xenomai
>> [    5.063077] Call Trace:
>> [    5.063141]  <TASK>
>> [    5.063146]  dump_stack_lvl+0x71/0xa0
>> [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
>> [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
>> [    5.063166]  lapic_oob_handler+0x41/0xf0
>> [    5.063172]  do_oob_irq+0x25a/0x3e0
>> [    5.063179]  handle_oob_irq+0x4e/0xd0
>> [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
>> [    5.063213]  arch_handle_irq+0x5d/0x1e0
>> [    5.063218]  arch_pipeline_entry+0xa1/0x110
>> [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
>> ...
>>
>> After reading a lot of code, I realized that the so-called paravirtualized
>> spinlocks are being used when running under VB (VirtualBox):
>>
>> [    0.019574] kvm-guest: PV spinlocks enabled
>>
>> vs. Qemu (with -enable-kvm):
>> [    0.255790] kvm-guest: PV spinlocks disabled, no host support
>>
>> The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
>> cmdline) the problem disappears.
>>
>> The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
>> even with all the stress applied, I'm quite sure that we have a (maybe
>> longstanding) locking bug.
>>
>> RFC: I'm now testing the patch below, which has been running fine for some
>> hours now. Please let me know if all of this makes sense. I might have
>> overlooked something.
>>
>> If I'm not mistaken the following can happen on one CPU:
>>
>> // Example: taken from tick.c, proxy_set_next_ktime()
>> xnlock_get_irqsave(&nklock, flags);
>> // root domain stalled, but hard IRQs are still enabled
> 
> OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
> 
>>
>> 	// PROXY TICK IRQ FIRES
>> 	// taken from intr.c, xnintr_core_clock_handler()
>> 	xnlock_get(&nklock);	// we already own the lock
> 
> If this code runs under the assumption that hard IRQs and OOB are stalled
> while they are not, we indeed have a problem. Please check where that
> may have gone wrong.

The warnings I added to check the context have never triggered so far,
but the system still enters the broken state. Some further findings:

- nklock internally uses an arch_spinlock_t for locking
- nklock is used by the proxy timer tick IRQ, so very "often"
- arch_spinlock_t is very rarely used directly in Linux; I found only a
  couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
- arch_spinlock_t might use the PV-optimized spinlock operations
- Most other spinlocks (especially in Dovetail) are based on
  raw_spinlock
- raw_spinlock does not use PV-optimized operations

That might be the reason why Linux alone runs fine.
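
For reference, the dispatch in question looks roughly like this (condensed
from asm-generic/qspinlock.h and arch/x86/kernel/kvm.c of 6.1, abbreviated
and paraphrased, so not verbatim kernel code):

/* Fast path: an uncontended cmpxchg, no PV involvement at all. */
static __always_inline void queued_spin_lock(struct qspinlock *lock)
{
        int val = 0;

        if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
                return;

        /* Contended: with CONFIG_PARAVIRT_SPINLOCKS this dispatches
         * through pv_ops.lock.queued_spin_lock_slowpath on x86. */
        queued_spin_lock_slowpath(lock, val);
}

/* KVM guest slow-path wait helper (abbreviated): the waiter parks the
 * vCPU via HLT and relies on the lock holder to kick it by hypercall. */
static void kvm_wait(u8 *ptr, u8 val)
{
        if (READ_ONCE(*ptr) == val)
                halt();         /* the vCPU gives up the physical CPU here */
}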

I'm now asking myself if the PV spinlock implementation provided by
VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
that might be a KVM bug as well...

Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
never come back...

Florian

> 
> Jan
> 
>> 	xnclock_tick(&nkclock);
>> 	xnlock_put(&nklock);	// we unconditionally release the lock
>> 	// EOI
>>
>> // back in proxy_set_next_ktime(), but nklock released!
>> // Other CPU might already own the lock
>> sched = xnsched_current();
>> ret = xntimer_start(&sched->htimer, delta, XN_INFINITE, XN_RELATIVE);
>> xnlock_put_irqrestore(&nklock, flags);
>>
>>
>> To avoid the unconditional lock release, I switched to xnlock_{get,put}_irqsave() in
>> xnintr_core_clock_handler. I think it's correct. Additionally, stalling the
>> root domain should not be an issue, as hard IRQs are already disabled.
>>
>> diff --git a/kernel/cobalt/dovetail/intr.c b/kernel/cobalt/dovetail/intr.c
>> index a9459b7a8..ce69dd602 100644
>> --- a/kernel/cobalt/dovetail/intr.c
>> +++ b/kernel/cobalt/dovetail/intr.c
>> @@ -22,10 +22,11 @@ void xnintr_host_tick(struct xnsched *sched) /* hard irqs off */
>>  void xnintr_core_clock_handler(void)
>>  {
>>         struct xnsched *sched;
>> +       unsigned long flags;
>>
>> -       xnlock_get(&nklock);
>> +       xnlock_get_irqsave(&nklock, flags);
>>         xnclock_tick(&nkclock);
>> -       xnlock_put(&nklock);
>> +       xnlock_put_irqrestore(&nklock, flags);
>>
>> Please let me know what you think!
>>
>> Best regards,
>> Florian
>>
> 



* Re: RFC: System hang / deadlock on Linux 6.1
From: Florian Bezdeka @ 2023-03-30 16:13 UTC
  To: Jan Kiszka, xenomai; +Cc: Philippe Gerum

On Tue, 2023-03-28 at 18:01 +0200, Florian Bezdeka wrote:
> On 28.03.23 01:01, Jan Kiszka wrote:
> > On 27.03.23 19:30, Florian Bezdeka wrote:
> > > Hi all,
> > > 
> > > I'm currently investigating an issue reported by an internal customer. When
> > > trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
> > > environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
> > > observed.
> > > 
> > > I was not able to reproduce the locking issue myself, but I'm able to "stall"
> > > the system by putting a lot of load on the system (stress-ng). After 10-20
> > > minutes there is no progress anymore.
> > > 
> > > The locking issue reported by the customer:
> > > 
> > > [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
> > > [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
> > > [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
> > > [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
> > > [    5.063075] IRQ stage: Xenomai
> > > [    5.063077] Call Trace:
> > > [    5.063141]  <TASK>
> > > [    5.063146]  dump_stack_lvl+0x71/0xa0
> > > [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
> > > [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
> > > [    5.063166]  lapic_oob_handler+0x41/0xf0
> > > [    5.063172]  do_oob_irq+0x25a/0x3e0
> > > [    5.063179]  handle_oob_irq+0x4e/0xd0
> > > [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
> > > [    5.063213]  arch_handle_irq+0x5d/0x1e0
> > > [    5.063218]  arch_pipeline_entry+0xa1/0x110
> > > [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > ...
> > > 
> > > After reading a lot of code, I realized that the so-called paravirtualized
> > > spinlocks are being used when running under VB (VirtualBox):
> > > 
> > > [    0.019574] kvm-guest: PV spinlocks enabled
> > > 
> > > vs. Qemu (with -enable-kvm):
> > > [    0.255790] kvm-guest: PV spinlocks disabled, no host support
> > > 
> > > The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
> > > cmdline) the problem disappears.
> > > 
> > > The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
> > > even with all the stress applied, I'm quite sure that we have a (maybe
> > > longstanding) locking bug.
> > > 
> > > RFC: I'm now testing the patch below, which has been running fine for some
> > > hours now. Please let me know if all of this makes sense. I might have
> > > overlooked something.
> > > 
> > > If I'm not mistaken the following can happen on one CPU:
> > > 
> > > // Example: taken from tick.c, proxy_set_next_ktime()
> > > xnlock_get_irqsave(&nklock, flags);
> > > // root domain stalled, but hard IRQs are still enabled
> > 
> > OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
> > 
> > > 
> > > 	// PROXY TICK IRQ FIRES
> > > 	// taken from intr.c, xnintr_core_clock_handler()
> > > 	xnlock_get(&nklock);	// we already own the lock
> > 
> > If this code runs under the assumption that hard IRQs and OOB are stalled
> > while they are not, we indeed have a problem. Please check where that
> > may have gone wrong.
> 
> The warnings I added to check the context have never triggered so far,
> but the system still enters the broken state. Some further findings:
> 
> - nklock internally uses an arch_spinlock_t for locking
> - nklock is used by the proxy timer tick IRQ, so very "often"
> - arch_spinlock_t is very rarely used directly in Linux; I found only a
>   couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
> - arch_spinlock_t might use the PV-optimized spinlock operations
> - Most other spinlocks (especially in Dovetail) are based on
>   raw_spinlock
> - raw_spinlock does not use PV-optimized operations
> 
> That might be the reason why Linux alone runs fine.
> 
> I'm now asking myself if the PV spinlock implementation provided by
> VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
> that might be a KVM bug as well...
> 
> Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
> never come back...

Short update:

Seems the slowpath matters, especially when there is noticeable load on
the host itself. The vCPU is "parked" and it simply takes too long
until it is back on the CPU. In most cases I'm running into an AHCI
timeout; the rootfs becomes inaccessible and the system is "dead".

Still wondering how the "lockdep" error above can actually happen. This
was on a different host, so maybe a different problem.

> 
> Florian
> 
> > 
> > Jan
> > 
> > > 	xnclock_tick(&nkclock);
> > > 	xnlock_put(&nklock);	// we unconditionally release the lock
> > > 	// EOI
> > > 
> > > // back in proxy_set_next_ktime(), but nklock released!
> > > // Other CPU might already own the lock
> > > sched = xnsched_current();
> > > ret = xntimer_start(&sched->htimer, delta, XN_INFINITE, XN_RELATIVE);
> > > xnlock_put_irqrestore(&nklock, flags);
> > > 
> > > 
> > > To avoid the unconditional lock release, I switched to xnlock_{get,put}_irqsave() in
> > > xnintr_core_clock_handler. I think it's correct. Additionally, stalling the
> > > root domain should not be an issue, as hard IRQs are already disabled.
> > > 
> > > diff --git a/kernel/cobalt/dovetail/intr.c b/kernel/cobalt/dovetail/intr.c
> > > index a9459b7a8..ce69dd602 100644
> > > --- a/kernel/cobalt/dovetail/intr.c
> > > +++ b/kernel/cobalt/dovetail/intr.c
> > > @@ -22,10 +22,11 @@ void xnintr_host_tick(struct xnsched *sched) /* hard irqs off */
> > >  void xnintr_core_clock_handler(void)
> > >  {
> > >         struct xnsched *sched;
> > > +       unsigned long flags;
> > > 
> > > -       xnlock_get(&nklock);
> > > +       xnlock_get_irqsave(&nklock, flags);
> > >         xnclock_tick(&nkclock);
> > > -       xnlock_put(&nklock);
> > > +       xnlock_put_irqrestore(&nklock, flags);
> > > 
> > > Please let me know what you think!
> > > 
> > > Best regards,
> > > Florian
> > > 
> > 
> 
> 



* Re: RFC: System hang / deadlock on Linux 6.1
From: Jan Kiszka @ 2023-04-03  5:31 UTC
  To: Florian Bezdeka, Jan Kiszka, xenomai; +Cc: Philippe Gerum

On 30.03.23 18:13, Florian Bezdeka wrote:
> On Tue, 2023-03-28 at 18:01 +0200, Florian Bezdeka wrote:
>> On 28.03.23 01:01, Jan Kiszka wrote:
>>> On 27.03.23 19:30, Florian Bezdeka wrote:
>>>> Hi all,
>>>>
>>>> I'm currently investigating an issue reported by an internal customer. When
>>>> trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
>>>> environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
>>>> observed.
>>>>
>>>> I was not able to reproduce the locking issue myself, but I'm able to "stall"
>>>> the system by putting a lot of load on the system (stress-ng). After 10-20
>>>> minutes there is no progress anymore.
>>>>
>>>> The locking issue reported by the customer:
>>>>
>>>> [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
>>>> [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
>>>> [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
>>>> [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
>>>> [    5.063075] IRQ stage: Xenomai
>>>> [    5.063077] Call Trace:
>>>> [    5.063141]  <TASK>
>>>> [    5.063146]  dump_stack_lvl+0x71/0xa0
>>>> [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
>>>> [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
>>>> [    5.063166]  lapic_oob_handler+0x41/0xf0
>>>> [    5.063172]  do_oob_irq+0x25a/0x3e0
>>>> [    5.063179]  handle_oob_irq+0x4e/0xd0
>>>> [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
>>>> [    5.063213]  arch_handle_irq+0x5d/0x1e0
>>>> [    5.063218]  arch_pipeline_entry+0xa1/0x110
>>>> [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>> ...
>>>>
>>>> After reading a lot of code, I realized that the so-called paravirtualized
>>>> spinlocks are being used when running under VB (VirtualBox):
>>>>
>>>> [    0.019574] kvm-guest: PV spinlocks enabled
>>>>
>>>> vs. Qemu (with -enable-kvm):
>>>> [    0.255790] kvm-guest: PV spinlocks disabled, no host support
>>>>
>>>> The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
>>>> cmdline) the problem disappears.
>>>>
>>>> The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
>>>> even with all the stress applied, I'm quite sure that we have a (maybe
>>>> longstanding) locking bug.
>>>>
>>>> RFC: I'm now testing the patch below, which has been running fine for some
>>>> hours now. Please let me know if all of this makes sense. I might have
>>>> overlooked something.
>>>>
>>>> If I'm not mistaken the following can happen on one CPU:
>>>>
>>>> // Example: taken from tick.c, proxy_set_next_ktime()
>>>> xnlock_get_irqsave(&nklock, flags);
>>>> // root domain stalled, but hard IRQs are still enabled
>>>
>>> OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
>>>
>>>>
>>>> 	// PROXY TICK IRQ FIRES
>>>> 	// taken from intr.c, xnintr_core_clock_handler()
>>>> 	xnlock_get(&nklock);	// we already own the lock
>>>
>>> If this code runs under the assumption that hard IRQs and OOB are stalled
>>> while they are not, we indeed have a problem. Please check where that
>>> may have gone wrong.
>>
>> The warnings I added to check the context have never triggered so far,
>> but the system still enters the broken state. Some further findings:
>>
>> - nklock internally uses an arch_spinlock_t for locking
>> - nklock is used by the proxy timer tick IRQ, so very "often"
>> - arch_spinlock_t is very rarely used directly in Linux; I found only a
>>   couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
>> - arch_spinlock_t might use the PV-optimized spinlock operations
>> - Most other spinlocks (especially in Dovetail) are based on
>>   raw_spinlock
>> - raw_spinlock does not use PV-optimized operations
>>
>> That might be the reason why Linux alone runs fine.
>>
>> I'm now asking myself if the PV spinlock implementation provided by
>> VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
>> that might be a KVM bug as well...
>>
>> Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
>> never come back...
> 
> Short update:
> 
> Seems the slowpath matters, especially when there is noticeable load on
> the host itself. The vCPU is "parked" and it simply takes too long
> until it is back on the CPU. In most cases I'm running into an AHCI
> timeout; the rootfs becomes inaccessible and the system is "dead".
> 
> Still wondering how the "lockdep" error above can actually happen. This
> was on a different host, so maybe a different problem.
> 

It might be that the PV code is not playing according to Dovetail rules,
e.g. not always using virtualized irq-enable/disable or not using hard
disabling when actually needed. Tracing the call path, along with the
virtual and physical irq states, may help. And this may be easier to
debug over KVM with pv-spinlocks enabled.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: RFC: System hang / deadlock on Linux 6.1
From: Florian Bezdeka @ 2023-04-03  8:52 UTC
  To: Jan Kiszka, Jan Kiszka, xenomai; +Cc: Philippe Gerum

On Mon, 2023-04-03 at 07:31 +0200, Jan Kiszka wrote:
> On 30.03.23 18:13, Florian Bezdeka wrote:
> > On Tue, 2023-03-28 at 18:01 +0200, Florian Bezdeka wrote:
> > > On 28.03.23 01:01, Jan Kiszka wrote:
> > > > On 27.03.23 19:30, Florian Bezdeka wrote:
> > > > > Hi all,
> > > > > 
> > > > > I'm currently investigating an issue reported by an internal customer. When
> > > > > trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
> > > > > environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
> > > > > observed.
> > > > > 
> > > > > I was not able to reproduce the locking issue myself, but I'm able to "stall"
> > > > > the system by putting a lot of load on the system (stress-ng). After 10-20
> > > > > minutes there is no progress anymore.
> > > > > 
> > > > > The locking issue reported by the customer:
> > > > > 
> > > > > [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
> > > > > [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
> > > > > [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
> > > > > [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
> > > > > [    5.063075] IRQ stage: Xenomai
> > > > > [    5.063077] Call Trace:
> > > > > [    5.063141]  <TASK>
> > > > > [    5.063146]  dump_stack_lvl+0x71/0xa0
> > > > > [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
> > > > > [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
> > > > > [    5.063166]  lapic_oob_handler+0x41/0xf0
> > > > > [    5.063172]  do_oob_irq+0x25a/0x3e0
> > > > > [    5.063179]  handle_oob_irq+0x4e/0xd0
> > > > > [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
> > > > > [    5.063213]  arch_handle_irq+0x5d/0x1e0
> > > > > [    5.063218]  arch_pipeline_entry+0xa1/0x110
> > > > > [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > > > ...
> > > > > 
> > > > > After reading a lot of code, I realized that the so-called paravirtualized
> > > > > spinlocks are being used when running under VB (VirtualBox):
> > > > > 
> > > > > [    0.019574] kvm-guest: PV spinlocks enabled
> > > > > 
> > > > > vs. Qemu (with -enable-kvm):
> > > > > [    0.255790] kvm-guest: PV spinlocks disabled, no host support
> > > > > 
> > > > > The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
> > > > > cmdline) the problem disappears.
> > > > > 
> > > > > The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
> > > > > even with all the stress applied, I'm quite sure that we have a (maybe
> > > > > longstanding) locking bug.
> > > > > 
> > > > > RFC: I'm now testing the patch below, which has been running fine for some
> > > > > hours now. Please let me know if all of this makes sense. I might have
> > > > > overlooked something.
> > > > > 
> > > > > If I'm not mistaken the following can happen on one CPU:
> > > > > 
> > > > > // Example: taken from tick.c, proxy_set_next_ktime()
> > > > > xnlock_get_irqsave(&nklock, flags);
> > > > > // root domain stalled, but hard IRQs are still enabled
> > > > 
> > > > OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
> > > > 
> > > > > 
> > > > > 	// PROXY TICK IRQ FIRES
> > > > > 	// taken from intr.c, xnintr_core_clock_handler()
> > > > > 	xnlock_get(&nklock);	// we already own the lock
> > > > 
> > > > If this code runs under the assumption that hard IRQs and OOB are stalled
> > > > while they are not, we indeed have a problem. Please check where that
> > > > may have gone wrong.
> > > 
> > > The warnings I added to check the context have never triggered so far,
> > > but the system still enters the broken state. Some further findings:
> > > 
> > > - nklock internally uses an arch_spinlock_t for locking
> > > - nklock is used by the proxy timer tick IRQ, so very "often"
> > > - arch_spinlock_t is very rarely used directly in Linux; I found only a
> > >   couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
> > > - arch_spinlock_t might use the PV-optimized spinlock operations
> > > - Most other spinlocks (especially in Dovetail) are based on
> > >   raw_spinlock
> > > - raw_spinlock does not use PV-optimized operations
> > > 
> > > That might be the reason why Linux alone runs fine.
> > > 
> > > I'm now asking myself if the PV spinlock implementation provided by
> > > VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
> > > that might be a KVM bug as well...
> > > 
> > > Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
> > > never come back...
> > 
> > Short update:
> > 
> > Seems the slowpath matters, especially when there is noticeable load on
> > the host itself. The vCPU is "parked" and it simply takes too long
> > until it is back on the CPU. In most cases I'm running into an AHCI
> > timeout; the rootfs becomes inaccessible and the system is "dead".
> > 
> > Still wondering how the "lockdep" error above can actually happen. This
> > was on a different host, so maybe a different problem.
> > 
> 
> It might be that the PV code is not playing according to Dovetail rules,
> e.g. not always using virtualized irq-enable/disable or not using hard
> disabling when actually needed. Tracing the call path, along with the
> virtual and physical irq states, may help. And this may be easier to
> debug over KVM with pv-spinlocks enabled.

Some more test results over the weekend:


Kernel                         PV spin            test result
Linux 6.1.15                   enabled            OK (5h test)
Linux 6.1.15-dovetail          enabled            OK (5h test)
Linux 6.1.15-dovetail + xeno   enabled            FAIL after ~20m
Linux 6.1.15-dovetail + xeno   disabled           FAIL after ~4h

Booting the same image with KVM on the same host gives me
[    0.284674] kvm-guest: PV spinlocks disabled, no host support
so I'm not able to debug that under KVM.

Disabling PV spinlocks with "nopvspin" does not fully solve this
problem, but it seems to minimize the "race window".

I have some RCU stall warnings with some call graphs from the failed
runs above. I will try to analyze them. They seem to be identical.
Let's see.

> 
> Jan
> 



* Re: RFC: System hang / deadlock on Linux 6.1
From: Florian Bezdeka @ 2023-04-18 12:16 UTC
  To: Jan Kiszka, Jan Kiszka, xenomai; +Cc: Philippe Gerum

On Mon, 2023-04-03 at 10:52 +0200, Florian Bezdeka wrote:
> On Mon, 2023-04-03 at 07:31 +0200, Jan Kiszka wrote:
> > On 30.03.23 18:13, Florian Bezdeka wrote:
> > > On Tue, 2023-03-28 at 18:01 +0200, Florian Bezdeka wrote:
> > > > On 28.03.23 01:01, Jan Kiszka wrote:
> > > > > On 27.03.23 19:30, Florian Bezdeka wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I'm currently investigating an issue reported by an internal customer. When
> > > > > > trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
> > > > > > environment (VirtualBox 7.0.6), a complete system hang / deadlock can be
> > > > > > observed.
> > > > > > 
> > > > > > I was not able to reproduce the locking issue myself, but I'm able to "stall"
> > > > > > the system by putting a lot of load on the system (stress-ng). After 10-20
> > > > > > minutes there is no progress anymore.
> > > > > > 
> > > > > > The locking issue reported by the customer:
> > > > > > 
> > > > > > [    5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
> > > > > > [    5.063059]           last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
> > > > > > [    5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
> > > > > > [    5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
> > > > > > [    5.063075] IRQ stage: Xenomai
> > > > > > [    5.063077] Call Trace:
> > > > > > [    5.063141]  <TASK>
> > > > > > [    5.063146]  dump_stack_lvl+0x71/0xa0
> > > > > > [    5.063153]  xnlock_dbg_release.cold+0x21/0x2c
> > > > > > [    5.063158]  xnintr_core_clock_handler+0xa4/0x140
> > > > > > [    5.063166]  lapic_oob_handler+0x41/0xf0
> > > > > > [    5.063172]  do_oob_irq+0x25a/0x3e0
> > > > > > [    5.063179]  handle_oob_irq+0x4e/0xd0
> > > > > > [    5.063182]  generic_pipeline_irq_desc+0xb0/0x160
> > > > > > [    5.063213]  arch_handle_irq+0x5d/0x1e0
> > > > > > [    5.063218]  arch_pipeline_entry+0xa1/0x110
> > > > > > [    5.063222]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > > > > ...
> > > > > > 
> > > > > > After reading a lot of code, I realized that the so-called paravirtualized
> > > > > > spinlocks are being used when running under VB (VirtualBox):
> > > > > > 
> > > > > > [    0.019574] kvm-guest: PV spinlocks enabled
> > > > > > 
> > > > > > vs. Qemu (with -enable-kvm):
> > > > > > [    0.255790] kvm-guest: PV spinlocks disabled, no host support
> > > > > > 
> > > > > > The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
> > > > > > cmdline) the problem disappears.
> > > > > > 
> > > > > > The bad news: As Linux alone (and dovetail without Xenomai patch) runs fine,
> > > > > > even with all the stress applied, I'm quite sure that we have a (maybe
> > > > > > longstanding) locking bug.
> > > > > > 
> > > > > > RFC: I'm now testing the patch below, which has been running fine for some
> > > > > > hours now. Please let me know if all of this makes sense. I might have
> > > > > > overlooked something.
> > > > > > 
> > > > > > If I'm not mistaken the following can happen on one CPU:
> > > > > > 
> > > > > > // Example: taken from tick.c, proxy_set_next_ktime()
> > > > > > xnlock_get_irqsave(&nklock, flags);
> > > > > > // root domain stalled, but hard IRQs are still enabled
> > > > > 
> > > > > OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
> > > > > 
> > > > > > 
> > > > > > 	// PROXY TICK IRQ FIRES
> > > > > > 	// taken from intr.c, xnintr_core_clock_handler()
> > > > > > 	xnlock_get(&nklock);	// we already own the lock
> > > > > 
> > > > > If this code runs under the assumption that hard IRQs and OOB are stalled
> > > > > while they are not, we indeed have a problem. Please check where that
> > > > > may have gone wrong.
> > > > 
> > > > The warnings I added to check the context have never triggered so far,
> > > > but the system still enters the broken state. Some further findings:
> > > > 
> > > > - nklock internally uses an arch_spinlock_t for locking
> > > > - nklock is used by the proxy timer tick IRQ, so very "often"
> > > > - arch_spinlock_t is very rarely used directly in Linux; I found only a
> > > >   couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
> > > > - arch_spinlock_t might use the PV-optimized spinlock operations
> > > > - Most other spinlocks (especially in Dovetail) are based on
> > > >   raw_spinlock
> > > > - raw_spinlock does not use PV-optimized operations
> > > > 
> > > > That might be the reason why Linux alone runs fine.
> > > > 
> > > > I'm now asking myself if the PV spinlock implementation provided by
> > > > VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
> > > > that might be a KVM bug as well...
> > > > 
> > > > Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
> > > > never come back...
> > > 
> > > Short update:
> > > 
> > > Seems the slowpath matters, especially when there is noticeable load on
> > > the host itself. The vCPU is "parked" and it simply takes too long
> > > until it is back on the CPU. In most cases I'm running into an AHCI
> > > timeout; the rootfs becomes inaccessible and the system is "dead".
> > > 
> > > Still wondering how the "lockdep" error above can actually happen. This
> > > was on a different host, so maybe a different problem.
> > > 
> > 
> > It might be that the PV code is not playing according to Dovetail rules,
> > e.g. not always using virtualized irq-enable/disable or not using hard
> > disabling when actually needed. Tracing the call path, along with the
> > virtual and physical irq states, may help. And this may be easier to
> > debug over KVM with pv-spinlocks enabled.
> 
> Some more test results over the weekend:
> 
> 
> Kernel                         PV spin            test result
> Linux 6.1.15                   enabled            OK (5h test)
> Linux 6.1.15-dovetail          enabled            OK (5h test)
> Linux 6.1.15-dovetail + xeno   enabled            FAIL after ~20m
> Linux 6.1.15-dovetail + xeno   disabled           FAIL after ~4h
> 
> Booting the same image with KVM on the same host gives me
> [    0.284674] kvm-guest: PV spinlocks disabled, no host support
> so I'm not able to debug that under KVM.
> 
> Disabling PV spinlocks with "nopvspin" does not fully solve this
> problem, but it seems to minimize the "race window".
> 
> I have some RCU stall warnings with some call graphs from the failed
> runs above. I will try to analyze them. They seem to be identical.
> Let's see.

TL;DR: It's not a Xenomai issue. The problem disappeared after a KVM bug
was identified [1]. In addition, the guest (systemd) no longer detects
the hypervisor as KVM but as "oracle" (the VirtualBox vendor), which is
even more correct.

I saw that after some time the highest bit of the ISR of the vCPU was
set and never cleared. With that I easily found [1], which was
available as an update for my host system. After updating my host,
something else must have changed in addition: the guest no longer
detects the hypervisor as KVM but as "oracle". I have not yet spent the
time to identify which patch/commit introduced this behavior.

The system has been running fine for > 24h now. No further
investigations will be made.

[1] is part of Linux >= 6.1.16.

[1] https://github.com/torvalds/linux/commit/97a71c444a147ae41c7d0ab5b3d855d7f762f3ed

Florian

> 
> > 
> > Jan
> > 
> 
> 


