From: Florian Bezdeka <florian.bezdeka@siemens.com>
To: Jan Kiszka <jan.kiszka@siemens.com>,
Jan Kiszka <jan.kiszka@web.de>,
xenomai@lists.linux.dev
Cc: Philippe Gerum <rpm@xenomai.org>
Subject: Re: RFC: System hang / deadlock on Linux 6.1
Date: Tue, 18 Apr 2023 14:16:30 +0200
Message-ID: <8bc373bcf9b5a52eba1b45f1d8ce2e33efb7dd13.camel@siemens.com>
In-Reply-To: <d86fab4595a09ff4e78b9e49ab59f96e20a95a99.camel@siemens.com>
On Mon, 2023-04-03 at 10:52 +0200, Florian Bezdeka wrote:
> On Mon, 2023-04-03 at 07:31 +0200, Jan Kiszka wrote:
> > On 30.03.23 18:13, Florian Bezdeka wrote:
> > > On Tue, 2023-03-28 at 18:01 +0200, Florian Bezdeka wrote:
> > > > On 28.03.23 01:01, Jan Kiszka wrote:
> > > > > On 27.03.23 19:30, Florian Bezdeka wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I'm currently investigating an issue reported by an internal customer. When
> > > > > > trying to run Xenomai (next branch) on top of Dovetail (6.1.15) in a virtual
> > > > > > environment (VirtualBox 7.0.6) a complete system hang / deadlock can be
> > > > > > observed.
> > > > > >
> > > > > > I was not able to reproduce the locking issue myself, but I can "stall"
> > > > > > the system by putting a lot of load on it (stress-ng). After 10-20
> > > > > > minutes there is no progress anymore.
> > > > > >
> > > > > > The locking issue reported by the customer:
> > > > > >
> > > > > > [ 5.063059] [Xenomai] lock (____ptrval____) already unlocked on CPU #3
> > > > > > [ 5.063059] last owner = kernel/xenomai/pipeline/intr.c:26 (xnintr_core_clock_handler(), CPU #0)
> > > > > > [ 5.063072] CPU: 3 PID: 130 Comm: systemd-udevd Not tainted 6.1.15-xenomai-1 #1
> > > > > > [ 5.063075] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VM 12/01/2006
> > > > > > [ 5.063075] IRQ stage: Xenomai
> > > > > > [ 5.063077] Call Trace:
> > > > > > [ 5.063141] <TASK>
> > > > > > [ 5.063146] dump_stack_lvl+0x71/0xa0
> > > > > > [ 5.063153] xnlock_dbg_release.cold+0x21/0x2c
> > > > > > [ 5.063158] xnintr_core_clock_handler+0xa4/0x140
> > > > > > [ 5.063166] lapic_oob_handler+0x41/0xf0
> > > > > > [ 5.063172] do_oob_irq+0x25a/0x3e0
> > > > > > [ 5.063179] handle_oob_irq+0x4e/0xd0
> > > > > > [ 5.063182] generic_pipeline_irq_desc+0xb0/0x160
> > > > > > [ 5.063213] arch_handle_irq+0x5d/0x1e0
> > > > > > [ 5.063218] arch_pipeline_entry+0xa1/0x110
> > > > > > [ 5.063222] asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > > > > ...
> > > > > >
> > > > > > After reading a lot of code, I realized that the so-called paravirtualized
> > > > > > spinlocks are being used when running under VB (VirtualBox):
> > > > > >
> > > > > > [ 0.019574] kvm-guest: PV spinlocks enabled
> > > > > >
> > > > > > vs. QEMU (with -enable-kvm):
> > > > > >
> > > > > > [ 0.255790] kvm-guest: PV spinlocks disabled, no host support
> > > > > >
> > > > > > The good news: With CONFIG_PARAVIRT_SPINLOCKS=n (or "nopvspin" on the kernel
> > > > > > cmdline) the problem disappears.
> > > > > >
> > > > > > The bad news: As Linux alone (and Dovetail without the Xenomai patch) runs fine,
> > > > > > even with all the stress applied, I'm quite sure that we have a (maybe
> > > > > > longstanding) locking bug.
> > > > > >
> > > > > > RFC: I'm now testing the patch below, which has already been running fine
> > > > > > for some hours now. Please let me know if all of this makes sense. I might have
> > > > > > overlooked something.
> > > > > >
> > > > > > If I'm not mistaken the following can happen on one CPU:
> > > > > >
> > > > > > // Example: taken from tick.c, proxy_set_next_ktime()
> > > > > > xnlock_get_irqsave(&nklock, flags);
> > > > > > // root domain stalled, but hard IRQs are still enabled
> > > > >
> > > > > OOB + hard IRQs stalled (xnlock_get_irqsave -> splhigh -> oob_irq_save).
> > > > >
> > > > > >
> > > > > > // PROXY TICK IRQ FIRES
> > > > > > // taken from intr.c, xnintr_core_clock_handler()
> > > > > > xnlock_get(&nklock); // we already own the lock
> > > > >
> > > > > If this code runs under the assumption that hard IRQs and OOB are stalled
> > > > > while they are not, we indeed have a problem. Please check where that
> > > > > may have gone wrong.
> > > >
> > > > The warnings I added to check the context have never triggered so far,
> > > > but the system still enters the broken state. Some further findings:
> > > >
> > > > - nklock internally uses an arch_spinlock_t for locking
> > > > - nklock is used by the proxy timer tick IRQ, so very "often"
> > > > - arch_spinlock_t is very rarely used directly in Linux; I found only a
> > > >   couple of usages inside x86 code (dumpstack.c, hpet.c, tsc_sync.c)
> > > > - arch_spinlock_t might use the PV-optimized spinlock operations
> > > > - Most other spinlocks (especially in Dovetail) are based on
> > > >   raw_spinlock
> > > > - raw_spinlock does not use PV-optimized operations
> > > >
> > > > That might be the reason why Linux alone runs fine.
> > > >
> > > > I'm now asking myself if the PV spinlock implementation provided by
> > > > VirtualBox has a bug... As VirtualBox is to some degree based on KVM,
> > > > that might be a KVM bug as well...
> > > >
> > > > Let me try to force the slow path for arch_spinlock_t. Maybe my vCPUs
> > > > never come back...
> > >
> > > Short update:
> > >
> > > It seems the slowpath matters, especially when there is noticeable load on
> > > the host itself. The vCPU is "parked" and it simply takes too long
> > > until it gets back on the CPU. In most cases I'm running into an AHCI
> > > timeout, the rootfs becomes inaccessible and the system is "dead".
> > >
> > > I'm still wondering how the "lockdep" error above can actually happen. This
> > > was on a different host, so it may be a different problem.
> > >
> >
> > It might be that the PV code is not playing according to dovetail rules,
> > e.g. not always using virtualized irq-enable/disable or not using hard
> > disabling when actually needed. Tracing the call path, along with the
> > virtual and physical irq states, may help. And this may be easier to
> > debug over KVM with PV spinlocks enabled.
>
> Some more test results over the weekend:
>
>
> Kernel                        PV spin   Test result
> Linux 6.1.15                  enabled   OK (5h test)
> Linux 6.1.15-dovetail         enabled   OK (5h test)
> Linux 6.1.15-dovetail + xeno  enabled   FAIL after ~20m
> Linux 6.1.15-dovetail + xeno  disabled  FAIL after ~4h
>
> Booting the same image with KVM on the same host gives me
> [ 0.284674] kvm-guest: PV spinlocks disabled, no host support
> so I'm not able to debug that under KVM.
>
> Disabling PV spinlocks with "nopvspin" does not fully solve this
> problem, but it seems to minimize the "race window".
>
> I have some RCU stall warnings with some call graphs from the failed
> runs above. I will try to analyze them. They seem to be identical.
> Let's see.
TL;DR: It's not a Xenomai issue. The problem disappeared after a KVM bug was
identified [1]. In addition, the guest (systemd) no longer detects KVM as the
hypervisor but as "oracle" (the VirtualBox vendor), which is even more correct.
I saw that after some time the highest bit of the vCPU's ISR was set and
never cleared. With that observation I quickly found [1], which was already
available as an update for my host system.
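For reference (assuming "ISR" refers to the local APIC In-Service Register):
the ISR is a 256-bit bitmap spread over eight 32-bit registers, one bit per
interrupt vector. The following is only a minimal user-space sketch of my own,
not code from KVM or Xenomai, showing how the highest in-service vector can be
derived from such a register dump; the sample value just mimics the stuck
topmost bit I observed:

#include <stdint.h>
#include <stdio.h>

/*
 * Return the highest vector whose ISR bit is set, or -1 if none.
 * isr[0] covers vectors 0-31, ..., isr[7] covers vectors 224-255.
 */
static int highest_isr_vector(const uint32_t isr[8])
{
	int i;

	for (i = 7; i >= 0; i--)
		if (isr[i])
			return i * 32 + 31 - __builtin_clz(isr[i]);

	return -1;
}

int main(void)
{
	uint32_t isr[8] = { 0 };

	isr[7] = 1u << 31;	/* vector 255 stuck in service */
	printf("highest in-service vector: %d\n", highest_isr_vector(isr));
	return 0;
}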
After updating my host, something else must have changed in addition: the
guest no longer detects KVM as the hypervisor but "oracle". I have not yet
spent the time to identify which patch/commit introduced this behavior.
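Side note on how the guest tells "kvm" and "oracle" apart: systemd-detect-virt
looks, among other sources such as DMI, at the CPUID hypervisor leaf
0x40000000, which returns a 12-byte vendor signature in EBX/ECX/EDX (KVM
announces itself as "KVMKVMKVM\0\0\0"). Below is a minimal user-space sketch
of that CPUID check, my own illustration rather than the systemd code:

#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	char sig[13] = { 0 };

	/* CPUID.1:ECX bit 31 is the "running under a hypervisor" flag. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 31))) {
		puts("no hypervisor detected");
		return 1;
	}

	/* Leaf 0x40000000: 12-byte vendor signature in EBX, ECX, EDX. */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);

	printf("hypervisor signature: \"%s\"\n", sig);
	return 0;
}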
The system has been running fine for > 24h now. No further investigations
will be made.
[1] is part of Linux >= 6.1.16.
[1] https://github.com/torvalds/linux/commit/97a71c444a147ae41c7d0ab5b3d855d7f762f3ed
Florian
>
> >
> > Jan
> >
>
>
Thread overview: 7+ messages
2023-03-27 17:30 RFC: System hang / deadlock on Linux 6.1 Florian Bezdeka
2023-03-27 23:01 ` Jan Kiszka
2023-03-28 16:01 ` Florian Bezdeka
2023-03-30 16:13 ` Florian Bezdeka
2023-04-03 5:31 ` Jan Kiszka
2023-04-03 8:52 ` Florian Bezdeka
2023-04-18 12:16 ` Florian Bezdeka [this message]