xen/evtchn: Dom0 boot hangs using preempt_rt kernel 5.10

From: Luca Fancellu @ 2021-03-17 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, jgrall

Hi all,

We've been encountering an issue when using kernel 5.10 with preempt_rt support for Dom0: during Dom0 boot, the kernel hits a BUG_ON(!irqs_disabled()) in the function evtchn_fifo_unmask(), defined in drivers/xen/events/events_fifo.c.

This is the call stack:

[   17.817018] ------------[ cut here ]------------
[   17.817021] kernel BUG at drivers/xen/events/events_fifo.c:258!
[   18.817079] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
[   18.817081] Modules linked in: bridge stp llc ipv6
[   18.817086] CPU: 3 PID: 558 Comm: xenstored Not tainted 5.10.16-rt25-yocto-preempt-rt #1
[   18.817089] Hardware name: Arm Neoverse N1 System Development Platform (DT)
[   18.817090] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[   18.817092] pc : evtchn_fifo_unmask+0xd4/0xe0
[   18.817099] lr : xen_irq_lateeoi_locked+0xec/0x200
[   18.817102] sp : ffff8000123f3cc0
[   18.817102] x29: ffff8000123f3cc0 x28: ffff0000427b1d80
[   18.817104] x27: 0000000000000000 x26: 0000000000000000
[   18.817106] x25: 0000000000000001 x24: 0000000000000001
[   18.817107] x23: ffff0000412fc900 x22: 0000000000000004
[   18.817109] x21: 0000000000000000 x20: ffff000042e06990
[   18.817110] x19: ffff0000427b1d80 x18: 0000000000000010
[   18.817112] x17: 0000000000000000 x16: 0000000000000000
[   18.817113] x15: 0000000000000002 x14: 0000000000000001
[   18.817114] x13: 000000000001a7e8 x12: 0000000000000040
[   18.817116] x11: ffff000040400248 x10: ffff00004040024a
[   18.817117] x9 : ffff800011be5200 x8 : ffff000040400270
[   18.817119] x7 : 0000000000000000 x6 : 0000000000000003
[   18.817120] x5 : 0000000000000000 x4 : ffff000040400308
[   18.817121] x3 : ffff0000408a400c x2 : 0000000000000000
[   18.817122] x1 : 0000000000000000 x0 : ffff0000408a4000
[   18.817124] Call trace:
[   18.817125]  evtchn_fifo_unmask+0xd4/0xe0
[   18.817127]  xen_irq_lateeoi_locked+0xec/0x200
[   18.817129]  xen_irq_lateeoi+0x48/0x64
[   18.817131]  evtchn_write+0x124/0x15c
[   18.817134]  vfs_write+0xf0/0x2cc
[   18.817137]  ksys_write+0xe0/0x100
[   18.817139]  __arm64_sys_write+0x20/0x30
[   18.817142]  el0_svc_common.constprop.0+0x78/0x1a0
[   18.817145]  do_el0_svc+0x24/0x90
[   18.817147]  el0_svc+0x14/0x20
[   18.817151]  el0_sync_handler+0x1a4/0x1b0
[   18.817153]  el0_sync+0x174/0x180
[   18.817156] Code: 52800120 b90023e6 97e6d104 17fffff0 (d4210000)
[   18.817158] ---[ end trace 0000000000000002 ]---

The last kernel we tested was 5.4. Our analysis points to the introduction of the lateeoi framework (xen/events: add a new "late EOI" evtchn framework), in conjunction with the preempt_rt patches (which keep IRQs enabled between spinlock_t/rwlock_t _irqsave/_irqrestore operations), as the root cause.
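
To make the suspected interaction concrete, here is a minimal userspace sketch. It is only a toy model, not kernel code: every model_* name and the lock_flavor enum are made up for illustration, and it assumes the lateeoi path takes an _irqsave lock around the unmask call, as the _locked suffix in the trace suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct model_cpu {
    bool irqs_disabled;   /* models the CPU's hard-IRQ mask state */
};

enum lock_flavor {
    LOCK_NON_RT,  /* spinlock_t without preempt_rt: _irqsave masks IRQs */
    LOCK_RT,      /* spinlock_t with preempt_rt: IRQs are kept enabled */
};

/* Models spin_lock_irqsave(): returns the previous IRQ state as "flags". */
static bool model_lock_irqsave(struct model_cpu *cpu, enum lock_flavor flavor)
{
    bool flags = cpu->irqs_disabled;

    if (flavor == LOCK_NON_RT)
        cpu->irqs_disabled = true;
    /* LOCK_RT: IRQ state is left untouched (assumption taken from the mail). */
    return flags;
}

/* Models spin_unlock_irqrestore(): puts the saved IRQ state back. */
static void model_unlock_irqrestore(struct model_cpu *cpu, bool flags)
{
    cpu->irqs_disabled = flags;
}

/* True if a BUG_ON(!irqs_disabled()) check would fire at this point. */
static bool model_unmask_would_bug(const struct model_cpu *cpu)
{
    return !cpu->irqs_disabled;
}

/*
 * Models the write() path from the call trace: entered from process
 * context with IRQs enabled, lock taken, unmask check evaluated, lock
 * released.
 */
static bool model_lateeoi_path_bugs(enum lock_flavor flavor)
{
    struct model_cpu cpu = { .irqs_disabled = false };
    bool flags = model_lock_irqsave(&cpu, flavor);
    bool bug = model_unmask_would_bug(&cpu);

    model_unlock_irqrestore(&cpu, flags);
    assert(!cpu.irqs_disabled);  /* IRQ state is back to what it was on entry */
    return bug;
}
```

In this model, model_lateeoi_path_bugs(LOCK_NON_RT) reports no BUG while model_lateeoi_path_bugs(LOCK_RT) reports one, which matches what we see: the same path is fine on a non-RT kernel and trips the check with preempt_rt.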

Many modifications have since been made to the mask/unmask operations, notably a large one from Juergen Gross (xen/events: don't unmask an event channel when an eoi is pending). Given that, is the BUG_ON(...) still needed?

With the mentioned commit, every call to a mask/unmask operation is protected by a spinlock, so I would appreciate feedback from anyone with more experience than me in this part of the code.

Thank you,

Luca
