* xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10
@ 2020-12-14 21:25 Julien Grall
  2020-12-15  7:27 ` Jürgen Groß
  2021-02-04 12:30 ` Jürgen Groß
  0 siblings, 2 replies; 6+ messages in thread
From: Julien Grall @ 2020-12-14 21:25 UTC (permalink / raw)
  To: aams, Juergen Gross; +Cc: linux-kernel, xen-devel, foersleo

Hi Juergen,

When testing Linux 5.10 dom0, I could reliably hit the following warning 
when using the 2-level (2L) event channel ABI:

[  589.591737] Interrupt for port 34, but apparently not enabled; 
per-user 00000000a86a4c1b
[  589.593259] WARNING: CPU: 0 PID: 1111 at 
/home/ANT.AMAZON.COM/jgrall/works/oss/linux/drivers/xen/evtchn.c:170 
evtchn_interrupt+0xeb/0x100
[  589.595514] Modules linked in:
[  589.596145] CPU: 0 PID: 1111 Comm: qemu-system-i38 Tainted: G 
W         5.10.0+ #180
[  589.597708] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[  589.599782] RIP: e030:evtchn_interrupt+0xeb/0x100
[  589.600698] Code: 48 8d bb d8 01 00 00 ba 01 00 00 00 be 1d 00 00 00 
e8 d9 10 ca ff eb b2 8b 75 20 48 89 da 48 c7 c7 a8 31 3d 82 e8 65 29 a0 
ff <0f> 0b e9 42 ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f
[  589.604087] RSP: e02b:ffffc90040003e70 EFLAGS: 00010086
[  589.605102] RAX: 0000000000000000 RBX: ffff888102091800 RCX: 
0000000000000027
[  589.606445] RDX: 0000000000000000 RSI: ffff88817fe19150 RDI: 
ffff88817fe19158
[  589.607790] RBP: ffff88810f5ab980 R08: 0000000000000001 R09: 
0000000000328980
[  589.609134] R10: 0000000000000000 R11: ffffc90040003c70 R12: 
ffff888107fd3c00
[  589.610484] R13: ffffc90040003ed4 R14: 0000000000000000 R15: 
ffff88810f5ffd80
[  589.611828] FS:  00007f960c4b8ac0(0000) GS:ffff88817fe00000(0000) 
knlGS:0000000000000000
[  589.613348] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  589.614525] CR2: 00007f17ee72e000 CR3: 000000010f5b6000 CR4: 
0000000000050660
[  589.615874] Call Trace:
[  589.616402]  <IRQ>
[  589.616855]  __handle_irq_event_percpu+0x4e/0x2c0
[  589.617784]  handle_irq_event_percpu+0x30/0x80
[  589.618660]  handle_irq_event+0x3a/0x60
[  589.619428]  handle_edge_irq+0x9b/0x1f0
[  589.620209]  generic_handle_irq+0x4f/0x60
[  589.621008]  evtchn_2l_handle_events+0x160/0x280
[  589.621913]  __xen_evtchn_do_upcall+0x66/0xb0
[  589.622767]  __xen_pv_evtchn_do_upcall+0x11/0x20
[  589.623665]  asm_call_irq_on_stack+0x12/0x20
[  589.624511]  </IRQ>
[  589.624978]  xen_pv_evtchn_do_upcall+0x77/0xf0
[  589.625848]  exc_xen_hypervisor_callback+0x8/0x10

This can be reproduced by creating/destroying a guest in a loop, 
although I have struggled to reproduce it on vanilla Xen.

After several hours of debugging, I think I have found the root cause.

While we only expect the unmask to happen when the event channel is 
EOIed, there is an unmask happening as part of handle_edge_irq() because 
the interrupt was seen as pending by another vCPU (IRQS_PENDING is set).

It turns out that the event channel is set for multiple vCPUs in 
cpu_evtchn_mask. This is happening because the affinity is not cleared 
when freeing an event channel.

The implementation of evtchn_2l_handle_events() will look for all the 
active interrupts for the current vCPU and later on clear the pending 
bit (via the ack() callback). IOW, I believe, this is not an atomic 
operation.

Even if Xen will notify the event to a single vCPU, evtchn_pending_sel 
may still be set on the other vCPU (thanks to a different event 
channel). Therefore, there is a chance that two vCPUs will try to handle 
the same interrupt.

The IRQ handler handle_edge_irq() is able to deal with that and will 
mask/unmask the interrupt. This will mess up the lateeoi logic 
(although I managed to reproduce it once without XSA-332).

My initial idea to fix the problem was to switch the affinity from CPU X 
to CPU0 when the event channel is freed.

However, I am not sure this is enough because I haven't found anything 
yet preventing a race between evtchn_2l_handle_events() and 
evtchn_2l_bind_vcpu().

So maybe we want to introduce refcounting (if there is nothing 
provided by the IRQ framework) and only unmask when the counter drops to 0.

Any opinions?

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10
  2020-12-14 21:25 xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10 Julien Grall
@ 2020-12-15  7:27 ` Jürgen Groß
  2020-12-15 10:20   ` Jürgen Groß
  2021-02-04 12:30 ` Jürgen Groß
  1 sibling, 1 reply; 6+ messages in thread
From: Jürgen Groß @ 2020-12-15  7:27 UTC (permalink / raw)
  To: Julien Grall, aams; +Cc: linux-kernel, xen-devel, foersleo



On 14.12.20 22:25, Julien Grall wrote:
> Hi Juergen,
> 
> When testing Linux 5.10 dom0, I could reliably hit the following warning 
> when using the 2-level (2L) event channel ABI:
> 
> [...]
> 
> This can be reproduced by creating/destroying a guest in a loop, 
> although I have struggled to reproduce it on vanilla Xen.
> 
> After several hours of debugging, I think I have found the root cause.
> 
> While we only expect the unmask to happen when the event channel is 
> EOIed, there is an unmask happening as part of handle_edge_irq() because 
> the interrupt was seen as pending by another vCPU (IRQS_PENDING is set).
> 
> It turns out that the event channel is set for multiple vCPUs in 
> cpu_evtchn_mask. This is happening because the affinity is not cleared 
> when freeing an event channel.
> 
> The implementation of evtchn_2l_handle_events() will look for all the 
> active interrupts for the current vCPU and later on clear the pending 
> bit (via the ack() callback). IOW, I believe, this is not an atomic 
> operation.
> 
> Even if Xen will notify the event to a single vCPU, evtchn_pending_sel 
> may still be set on the other vCPU (thanks to a different event 
> channel). Therefore, there is a chance that two vCPUs will try to handle 
> the same interrupt.
> 
> The IRQ handler handle_edge_irq() is able to deal with that and will 
> mask/unmask the interrupt. This will mess up the lateeoi logic 
> (although I managed to reproduce it once without XSA-332).

Thanks for the analysis!

> My initial idea to fix the problem was to switch the affinity from CPU X 
> to CPU0 when the event channel is freed.
> 
> However, I am not sure this is enough because I haven't found anything 
> yet preventing a race between evtchn_2l_handle_events() and 
> evtchn_2l_bind_vcpu().
> 
> So maybe we want to introduce refcounting (if there is nothing 
> provided by the IRQ framework) and only unmask when the counter drops to 0.
> 
> Any opinions?

I think we don't need a refcount, but just the internal states "masked"
and "eoi_pending", and unmask only if both are false. "masked" will be
set when the event is being masked. When delivering a lateeoi irq,
"eoi_pending" will be set and "masked" reset. "masked" will be reset
when a normal unmask is happening. And "eoi_pending" will be reset
when a lateeoi is signaled. Any reset of "masked" or "eoi_pending"
will check the other flag and do an unmask if both are false.

I'll write a patch.


Juergen



* Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10
  2020-12-15  7:27 ` Jürgen Groß
@ 2020-12-15 10:20   ` Jürgen Groß
  2020-12-18 13:35     ` Julien Grall
  2020-12-23 12:11     ` Julien Grall
  0 siblings, 2 replies; 6+ messages in thread
From: Jürgen Groß @ 2020-12-15 10:20 UTC (permalink / raw)
  To: Julien Grall, aams; +Cc: linux-kernel, xen-devel, foersleo



On 15.12.20 08:27, Jürgen Groß wrote:
> On 14.12.20 22:25, Julien Grall wrote:
>> Hi Juergen,
>>
>> When testing Linux 5.10 dom0, I could reliably hit the following 
>> warning when using the 2-level (2L) event channel ABI:
>>
>> [...]
>>
>> This can be reproduced by creating/destroying a guest in a loop, 
>> although I have struggled to reproduce it on vanilla Xen.
>>
>> After several hours of debugging, I think I have found the root cause.
>>
>> While we only expect the unmask to happen when the event channel is 
>> EOIed, there is an unmask happening as part of handle_edge_irq() 
>> because the interrupt was seen as pending by another vCPU 
>> (IRQS_PENDING is set).
>>
>> It turns out that the event channel is set for multiple vCPUs in 
>> cpu_evtchn_mask. This is happening because the affinity is not cleared 
>> when freeing an event channel.
>>
>> The implementation of evtchn_2l_handle_events() will look for all the 
>> active interrupts for the current vCPU and later on clear the pending 
>> bit (via the ack() callback). IOW, I believe, this is not an atomic 
>> operation.
>>
>> Even if Xen will notify the event to a single vCPU, evtchn_pending_sel 
>> may still be set on the other vCPU (thanks to a different event 
>> channel). Therefore, there is a chance that two vCPUs will try to 
>> handle the same interrupt.
>>
>> The IRQ handler handle_edge_irq() is able to deal with that and will 
>> mask/unmask the interrupt. This will mess up the lateeoi logic 
>> (although I managed to reproduce it once without XSA-332).
> 
> Thanks for the analysis!
> 
>> My initial idea to fix the problem was to switch the affinity from CPU 
>> X to CPU0 when the event channel is freed.
>>
>> However, I am not sure this is enough because I haven't found anything 
>> yet preventing a race between evtchn_2l_handle_events() and 
>> evtchn_2l_bind_vcpu().
>>
>> So maybe we want to introduce refcounting (if there is nothing 
>> provided by the IRQ framework) and only unmask when the counter drops 
>> to 0.
>>
>> Any opinions?
> 
> I think we don't need a refcount, but just the internal states "masked"
> and "eoi_pending" and unmask only if both are false. "masked" will be
> set when the event is being masked. When delivering a lateeoi irq
> "eoi_pending" will be set and "masked "reset. "masked" will be reset
> when a normal unmask is happening. And "eoi_pending" will be reset
> when a lateeoi is signaled. Any reset of "masked" and "eoi_pending"
> will check the other flag and do an unmask if both are false.
> 
> I'll write a patch.

Julien, could you please test the attached (only build tested) patch?


Juergen

[-- Attachment #1.1.2: 0001-xen-events-don-t-unmask-an-event-channel-when-an-eoi.patch --]
[-- Type: text/x-patch, Size: 4649 bytes --]

From 2ce5786fd6f29ec09ad653e30e089042ea62b309 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 15 Dec 2020 10:37:11 +0100
Subject: [PATCH] xen/events: don't unmask an event channel when an eoi is
 pending

An event channel should be kept masked when an eoi is pending for it.
When being migrated to another cpu it might be unmasked, though.

In order to avoid this, keep two different flags for each event channel
to be able to distinguish "normal" masking/unmasking from eoi related
masking/unmasking. The event channel should only be able to generate
an interrupt if both flags are cleared.

Cc: stable@vger.kernel.org
Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_base.c | 64 +++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 6038c4c35db5..b024200f1677 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -96,7 +96,9 @@ struct irq_info {
 	struct list_head eoi_list;
 	short refcnt;
 	short spurious_cnt;
-	enum xen_irq_type type; /* type */
+	short type;		/* type: IRQT_* */
+	bool masked;		/* Is event explicitly masked? */
+	bool eoi_pending;	/* Is EOI pending? */
 	unsigned irq;
 	evtchn_port_t evtchn;   /* event channel */
 	unsigned short cpu;     /* cpu bound */
@@ -272,6 +274,8 @@ static int xen_irq_info_common_setup(struct irq_info *info,
 	info->irq = irq;
 	info->evtchn = evtchn;
 	info->cpu = cpu;
+	info->masked = true;
+	info->eoi_pending = false;
 
 	ret = set_evtchn_to_irq(evtchn, irq);
 	if (ret < 0)
@@ -545,7 +549,10 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
 	}
 
 	info->eoi_time = 0;
-	unmask_evtchn(evtchn);
+	info->eoi_pending = false;
+
+	if (!info->masked)
+		unmask_evtchn(evtchn);
 }
 
 static void xen_irq_lateeoi_worker(struct work_struct *work)
@@ -801,7 +808,11 @@ static unsigned int __startup_pirq(unsigned int irq)
 		goto err;
 
 out:
-	unmask_evtchn(evtchn);
+	info->masked = false;
+
+	if (!info->eoi_pending)
+		unmask_evtchn(evtchn);
+
 	eoi_pirq(irq_get_irq_data(irq));
 
 	return 0;
@@ -828,6 +839,7 @@ static void shutdown_pirq(struct irq_data *data)
 	if (!VALID_EVTCHN(evtchn))
 		return;
 
+	info->masked = true;
 	mask_evtchn(evtchn);
 	xen_evtchn_close(evtchn);
 	xen_irq_info_cleanup(info);
@@ -1713,18 +1725,26 @@ EXPORT_SYMBOL_GPL(xen_set_affinity_evtchn);
 
 static void enable_dynirq(struct irq_data *data)
 {
-	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn))
-		unmask_evtchn(evtchn);
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = false;
+
+		if (!info->eoi_pending)
+			unmask_evtchn(evtchn);
+	}
 }
 
 static void disable_dynirq(struct irq_data *data)
 {
-	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn))
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = true;
 		mask_evtchn(evtchn);
+	}
 }
 
 static void ack_dynirq(struct irq_data *data)
@@ -1754,6 +1774,30 @@ static void mask_ack_dynirq(struct irq_data *data)
 	ack_dynirq(data);
 }
 
+static void lateeoi_ack_dynirq(struct irq_data *data)
+{
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = false;
+		info->eoi_pending = true;
+		mask_evtchn(evtchn);
+	}
+}
+
+static void lateeoi_mask_ack_dynirq(struct irq_data *data)
+{
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = true;
+		info->eoi_pending = true;
+		mask_evtchn(evtchn);
+	}
+}
+
 static int retrigger_dynirq(struct irq_data *data)
 {
 	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
@@ -1973,8 +2017,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = {
 	.irq_mask		= disable_dynirq,
 	.irq_unmask		= enable_dynirq,
 
-	.irq_ack		= mask_ack_dynirq,
-	.irq_mask_ack		= mask_ack_dynirq,
+	.irq_ack		= lateeoi_ack_dynirq,
+	.irq_mask_ack		= lateeoi_mask_ack_dynirq,
 
 	.irq_set_affinity	= set_affinity_irq,
 	.irq_retrigger		= retrigger_dynirq,
-- 
2.26.2




* Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10
  2020-12-15 10:20   ` Jürgen Groß
@ 2020-12-18 13:35     ` Julien Grall
  2020-12-23 12:11     ` Julien Grall
  1 sibling, 0 replies; 6+ messages in thread
From: Julien Grall @ 2020-12-18 13:35 UTC (permalink / raw)
  To: Jürgen Groß, aams; +Cc: linux-kernel, xen-devel, foersleo

Hi Juergen,

On 15/12/2020 10:20, Jürgen Groß wrote:
> On 15.12.20 08:27, Jürgen Groß wrote:
>> On 14.12.20 22:25, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> When testing Linux 5.10 dom0, I could reliably hit the following 
>>> warning when using the 2-level (2L) event channel ABI:
>>>
>>> [...]
>>>
>>> This can be reproduced by creating/destroying a guest in a loop, 
>>> although I have struggled to reproduce it on vanilla Xen.
>>>
>>> After several hours of debugging, I think I have found the root cause.
>>>
>>> While we only expect the unmask to happen when the event channel is 
>>> EOIed, there is an unmask happening as part of handle_edge_irq() 
>>> because the interrupt was seen as pending by another vCPU 
>>> (IRQS_PENDING is set).
>>>
>>> It turns out that the event channel is set for multiple vCPUs in 
>>> cpu_evtchn_mask. This is happening because the affinity is not 
>>> cleared when freeing an event channel.
>>>
>>> The implementation of evtchn_2l_handle_events() will look for all the 
>>> active interrupts for the current vCPU and later on clear the pending 
>>> bit (via the ack() callback). IOW, I believe, this is not an atomic 
>>> operation.
>>>
>>> Even if Xen will notify the event to a single vCPU, 
>>> evtchn_pending_sel may still be set on the other vCPU (thanks to a 
>>> different event channel). Therefore, there is a chance that two vCPUs 
>>> will try to handle the same interrupt.
>>>
>>> The IRQ handler handle_edge_irq() is able to deal with that and will 
>>> mask/unmask the interrupt. This will mess up the lateeoi logic 
>>> (although I managed to reproduce it once without XSA-332).
>>
>> Thanks for the analysis!
>>
>>> My initial idea to fix the problem was to switch the affinity from 
>>> CPU X to CPU0 when the event channel is freed.
>>>
>>> However, I am not sure this is enough because I haven't found 
>>> anything yet preventing a race between evtchn_2l_handle_events() and 
>>> evtchn_2l_bind_vcpu().
>>>
>>> So maybe we want to introduce refcounting (if there is nothing 
>>> provided by the IRQ framework) and only unmask when the counter drops 
>>> to 0.
>>>
>>> Any opinions?
>>
>> I think we don't need a refcount, but just the internal states "masked"
>> and "eoi_pending" and unmask only if both are false. "masked" will be
>> set when the event is being masked. When delivering a lateeoi irq
>> "eoi_pending" will be set and "masked "reset. "masked" will be reset
>> when a normal unmask is happening. And "eoi_pending" will be reset
>> when a lateeoi is signaled. Any reset of "masked" and "eoi_pending"
>> will check the other flag and do an unmask if both are false.
>>
>> I'll write a patch.
> 
> Julien, could you please test the attached (only build tested) patch?

Thank you for writing the patch. I will aim to give it a spin next week.

Cheers,

-- 
Julien Grall


* Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10
  2020-12-15 10:20   ` Jürgen Groß
  2020-12-18 13:35     ` Julien Grall
@ 2020-12-23 12:11     ` Julien Grall
  1 sibling, 0 replies; 6+ messages in thread
From: Julien Grall @ 2020-12-23 12:11 UTC (permalink / raw)
  To: Jürgen Groß, aams; +Cc: linux-kernel, xen-devel, foersleo

Hi Juergen,

On 15/12/2020 10:20, Jürgen Groß wrote:
> On 15.12.20 08:27, Jürgen Groß wrote:
>> On 14.12.20 22:25, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> When testing Linux 5.10 dom0, I could reliably hit the following 
>>> warning when using the 2-level (2L) event channel ABI:
>>>
>>> [...]
>>>
>>> This can be reproduced by creating/destroying a guest in a loop, 
>>> although I have struggled to reproduce it on vanilla Xen.
>>>
>>> After several hours of debugging, I think I have found the root cause.
>>>
>>> While we only expect the unmask to happen when the event channel is 
>>> EOIed, there is an unmask happening as part of handle_edge_irq() 
>>> because the interrupt was seen as pending by another vCPU 
>>> (IRQS_PENDING is set).
>>>
>>> It turns out that the event channel is set for multiple vCPUs in 
>>> cpu_evtchn_mask. This is happening because the affinity is not 
>>> cleared when freeing an event channel.
>>>
>>> The implementation of evtchn_2l_handle_events() will look for all the 
>>> active interrupts for the current vCPU and later on clear the pending 
>>> bit (via the ack() callback). IOW, I believe, this is not an atomic 
>>> operation.
>>>
>>> Even if Xen will notify the event to a single vCPU, 
>>> evtchn_pending_sel may still be set on the other vCPU (thanks to a 
>>> different event channel). Therefore, there is a chance that two vCPUs 
>>> will try to handle the same interrupt.
>>>
>>> The IRQ handler handle_edge_irq() is able to deal with that and will 
>>> mask/unmask the interrupt. This will mess up the lateeoi logic 
>>> (although I managed to reproduce it once without XSA-332).
>>
>> Thanks for the analysis!
>>
>>> My initial idea to fix the problem was to switch the affinity from 
>>> CPU X to CPU0 when the event channel is freed.
>>>
>>> However, I am not sure this is enough because I haven't found 
>>> anything yet preventing a race between evtchn_2l_handle_events() and 
>>> evtchn_2l_bind_vcpu().
>>>
>>> So maybe we want to introduce refcounting (if there is nothing 
>>> provided by the IRQ framework) and only unmask when the counter drops 
>>> to 0.
>>>
>>> Any opinions?
>>
>> I think we don't need a refcount, but just the internal states "masked"
>> and "eoi_pending" and unmask only if both are false. "masked" will be
>> set when the event is being masked. When delivering a lateeoi irq
>> "eoi_pending" will be set and "masked "reset. "masked" will be reset
>> when a normal unmask is happening. And "eoi_pending" will be reset
>> when a lateeoi is signaled. Any reset of "masked" and "eoi_pending"
>> will check the other flag and do an unmask if both are false.
>>
>> I'll write a patch.
> 
> Julien, could you please test the attached (only build tested) patch?

I can boot dom0 and a guest. However, if I destroy the guest and create 
a new one, I get hundreds of WARN()s similar to the one I originally 
reported and the guest doesn't boot.

The same issue can now be reproduced on vanilla Xen 4.15 and Linux 5.10 
(no changes except your patch). I haven't looked at the code, but it 
looks to me like the interrupt state is getting de-synchronized when 
re-used.

I also got the below splat once, so I am not entirely sure if this is 
related:

[   86.134903] ------------[ cut here ]------------
[   86.135950] WARNING: CPU: 0 PID: 904 at linux/kernel/softirq.c:175 
__local_bh_enable_ip+0x9a/0xd0
[   86.138232] Modules linked in:
[   86.138937] CPU: 0 PID: 904 Comm: xenstored Not tainted 5.10.0+ #183
[   86.140162] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   86.142221] RIP: e030:__local_bh_enable_ip+0x9a/0xd0
[   86.143204] Code: 00 ff ff 00 74 3e 65 ff 0d 63 fa f2 7e e8 0e 15 13 
00 e8 59 55 f4 ff 66 90 48 83 c4 08 5b c3 65 8b 05 46 0a f3 7e 85 c0 75 
99 <0f> 0b eb 95 48 89 3c 24 e8 93
[   86.146587] RSP: e02b:ffffc90040003ad0 EFLAGS: 00010246
[   86.147632] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 
ffffe8ffffc05000
[   86.148986] RDX: 00000000000000f7 RSI: 0000000000000200 RDI: 
ffffffff81a4e9c3
[   86.150354] RBP: ffff8881003124b0 R08: 0000000000000001 R09: 
0000000000000000
[   86.151722] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffff88810d150600
[   86.153078] R13: ffff88810d2b3f8e R14: ffff8881030b0000 R15: 
ffffc90040003c10
[   86.154422] FS:  00007f39a12f5240(0000) GS:ffff88817fe00000(0000) 
knlGS:0000000000000000
[   86.155958] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[   86.157063] CR2: 00007f0754002858 CR3: 000000010e13a000 CR4: 
0000000000050660
[   86.158409] Call Trace:
[   86.158959]  <IRQ>
[   86.159425]  ? __local_bh_disable_ip+0x4b/0x60
[   86.160329]  ipt_do_table+0x36f/0x660
[   86.161079]  ? lock_acquire+0x252/0x3a0
[   86.161846]  ? ip_local_deliver+0x70/0x200
[   86.162664]  nf_hook_slow+0x43/0xb0
[   86.163398]  ip_local_deliver+0x15b/0x200
[   86.164200]  ? ip_protocol_deliver_rcu+0x270/0x270
[   86.165146]  ip_rcv+0x13a/0x210
[   86.165794]  ? __lock_acquire+0x2e2/0x1a30
[   86.166610]  ? lock_is_held_type+0xe9/0x110
[   86.167477]  __netif_receive_skb_core+0x414/0xf60
[   86.168468]  ? find_held_lock+0x2d/0x90
[   86.169279]  ? __netif_receive_skb_list_core+0x134/0x2d0
[   86.170353]  __netif_receive_skb_list_core+0x134/0x2d0
[   86.171373]  netif_receive_skb_list_internal+0x1ef/0x3c0
[   86.172402]  ? e1000_clean_rx_irq+0x338/0x3d0
[   86.173285]  gro_normal_list.part.149+0x19/0x40
[   86.174171]  napi_complete_done+0xf3/0x1a0
[   86.175013]  e1000e_poll+0xc9/0x2b0
[   86.175738]  net_rx_action+0x176/0x4e0
[   86.176500]  __do_softirq+0xd4/0x432
[   86.177230]  irq_exit_rcu+0xbc/0xc0
[   86.177946]  asm_call_irq_on_stack+0xf/0x20
[   86.178780]  </IRQ>
[   86.179267]  xen_pv_evtchn_do_upcall+0x77/0xf0
[   86.180169]  exc_xen_hypervisor_callback+0x8/0x10
[   86.181098] RIP: e030:xen_hypercall_domctl+0xa/0x20
[   86.182055] Code: 51 41 53 b8 23 00 00 00 0f 05 41 5b 59 c3 cc cc cc 
cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 24 00 00 00 0f 
05 <41> 5b 59 c3 cc cc cc cc cc cc
[   86.185490] RSP: e02b:ffffc900407dbe18 EFLAGS: 00000282
[   86.186563] RAX: 0000000000000000 RBX: ffff888104f7b000 RCX: 
ffffffff8100248a
[   86.187973] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
deadbeefdeadf00d
[   86.189381] RBP: ffffffffffffffff R08: 0000000000000000 R09: 
0000000000000000
[   86.190753] R10: 0000000000000000 R11: 0000000000000282 R12: 
0000000000305000
[   86.192113] R13: 00007ffdc8d6a5a0 R14: 0000000000000008 R15: 
0000000000000000
[   86.193473]  ? xen_hypercall_domctl+0xa/0x20
[   86.194324]  ? privcmd_ioctl+0x179/0xa80
[   86.195131]  ? common_file_perm+0x84/0x2c0
[   86.195956]  ? __x64_sys_ioctl+0x8e/0xd0
[   86.196744]  ? lockdep_hardirqs_on+0x4d/0xf0
[   86.197598]  ? do_syscall_64+0x33/0x80
[   86.198347]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   86.199402] irq event stamp: 39806070
[   86.200168] hardirqs last  enabled at (39806078): 
[<ffffffff8116c714>] console_unlock+0x4b4/0x5b0
[   86.201832] hardirqs last disabled at (39806085): 
[<ffffffff8116c670>] console_unlock+0x410/0x5b0
[   86.203508] softirqs last  enabled at (39685022): 
[<ffffffff81e0030f>] __do_softirq+0x30f/0x432
[   86.205140] softirqs last disabled at (39805575): 
[<ffffffff810e857c>] irq_exit_rcu+0xbc/0xc0
[   86.206739] ---[ end trace 178144c74d23e738 ]---

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10
  2020-12-14 21:25 xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10 Julien Grall
  2020-12-15  7:27 ` Jürgen Groß
@ 2021-02-04 12:30 ` Jürgen Groß
  1 sibling, 0 replies; 6+ messages in thread
From: Jürgen Groß @ 2021-02-04 12:30 UTC (permalink / raw)
  To: Julien Grall, aams; +Cc: linux-kernel, xen-devel, foersleo


[-- Attachment #1.1.1: Type: text/plain, Size: 4476 bytes --]

On 14.12.20 22:25, Julien Grall wrote:
> Hi Juergen,
> 
> When testing Linux 5.10 dom0, I could reliably hit the following warning 
> when using the 2L event ABI:
> 
> [  589.591737] Interrupt for port 34, but apparently not enabled; 
> per-user 00000000a86a4c1b
> [  589.593259] WARNING: CPU: 0 PID: 1111 at 
> /home/ANT.AMAZON.COM/jgrall/works/oss/linux/drivers/xen/evtchn.c:170 
> evtchn_interrupt+0xeb/0x100
> [  589.595514] Modules linked in:
> [  589.596145] CPU: 0 PID: 1111 Comm: qemu-system-i38 Tainted: G 
> W         5.10.0+ #180
> [  589.597708] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> [  589.599782] RIP: e030:evtchn_interrupt+0xeb/0x100
> [  589.600698] Code: 48 8d bb d8 01 00 00 ba 01 00 00 00 be 1d 00 00 00 
> e8 d9 10 ca ff eb b2 8b 75 20 48 89 da 48 c7 c7 a8 31 3d 82 e8 65 29 a0 
> ff <0f> 0b e9 42 ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f
> [  589.604087] RSP: e02b:ffffc90040003e70 EFLAGS: 00010086
> [  589.605102] RAX: 0000000000000000 RBX: ffff888102091800 RCX: 
> 0000000000000027
> [  589.606445] RDX: 0000000000000000 RSI: ffff88817fe19150 RDI: 
> ffff88817fe19158
> [  589.607790] RBP: ffff88810f5ab980 R08: 0000000000000001 R09: 
> 0000000000328980
> [  589.609134] R10: 0000000000000000 R11: ffffc90040003c70 R12: 
> ffff888107fd3c00
> [  589.610484] R13: ffffc90040003ed4 R14: 0000000000000000 R15: 
> ffff88810f5ffd80
> [  589.611828] FS:  00007f960c4b8ac0(0000) GS:ffff88817fe00000(0000) 
> knlGS:0000000000000000
> [  589.613348] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  589.614525] CR2: 00007f17ee72e000 CR3: 000000010f5b6000 CR4: 
> 0000000000050660
> [  589.615874] Call Trace:
> [  589.616402]  <IRQ>
> [  589.616855]  __handle_irq_event_percpu+0x4e/0x2c0
> [  589.617784]  handle_irq_event_percpu+0x30/0x80
> [  589.618660]  handle_irq_event+0x3a/0x60
> [  589.619428]  handle_edge_irq+0x9b/0x1f0
> [  589.620209]  generic_handle_irq+0x4f/0x60
> [  589.621008]  evtchn_2l_handle_events+0x160/0x280
> [  589.621913]  __xen_evtchn_do_upcall+0x66/0xb0
> [  589.622767]  __xen_pv_evtchn_do_upcall+0x11/0x20
> [  589.623665]  asm_call_irq_on_stack+0x12/0x20
> [  589.624511]  </IRQ>
> [  589.624978]  xen_pv_evtchn_do_upcall+0x77/0xf0
> [  589.625848]  exc_xen_hypervisor_callback+0x8/0x10
> 
> This can be reproduced when creating/destroying guest in a loop. 
> Although, I have struggled to reproduce it on a vanilla Xen.
> 
> After several hours of debugging, I think I have found the root cause.
> 
> While we only expect the unmask to happen when the event channel is 
> EOIed, there is an unmask happening as part of handle_edge_irq() because 
> the interrupt was seen as pending by another vCPU (IRQS_PENDING is set).
> 
> It turns out that the event channel is set for multiple vCPUs in 
> cpu_evtchn_mask. This is happening because the affinity is not cleared 
> when freeing an event channel.
> 
> The implementation of evtchn_2l_handle_events() will look for all the 
> active interrupts for the current vCPU and later on clear the pending 
> bit (via the ack() callback). IOW, I believe, this is not an atomic 
> operation.
> 
> Even if Xen will notify the event to a single vCPU, evtchn_pending_sel 
> may still be set on the other vCPU (thanks to a different event 
> channel). Therefore, there is a chance that two vCPUs will try to handle 
> the same interrupt.
> 
> The IRQ handler handle_edge_irq() is able to deal with that and will 
> mask/unmask the interrupt. This will mess up the lateeoi logic 
> (although, I managed to reproduce it once without XSA-332).
> 
> My initial idea to fix the problem was to switch the affinity from CPU X 
> to CPU0 when the event channel is freed.
> 
> However, I am not sure this is enough because I haven't found anything 
> yet preventing a race between evtchn_2l_handle_events() and 
> evtchn_2l_bind_vcpu().
> 
> So maybe we want to introduce refcounting (if there is nothing 
> provided by the IRQ framework) and only unmask when the counter drops to 0.
> 
> Any opinions?

With the two attached patches testing on my side survived more than 2
hours of constant guest reboots and destroy/create loops. Without the
patches the WARN()s came up after less than one minute.

Can you please give them a try?


Juergen


[-- Attachment #1.1.2: 0001-xen-events-reset-affinity-of-2-level-event-initially.patch --]
[-- Type: text/x-patch, Size: 2506 bytes --]

From 908940c92fb916146a7ce24bc41a18125967c54a Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Wed, 3 Feb 2021 16:24:42 +0100
Subject: [PATCH 1/2] xen/events: reset affinity of 2-level event initially

When creating a new event channel with 2-level events the affinity
needs to be reset initially in order to avoid using an old affinity
from earlier usage of the event channel port.

The same applies to the affinity when onlining a vcpu: all old
affinity settings for this vcpu must be reset. As percpu events get
initialized before the percpu event channel hook is called,
resetting of the affinities happens after offlining a vcpu (this is
working, as initial percpu memory is zeroed out).

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_2l.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index da87f3a1e351..23217940144a 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -47,6 +47,16 @@ static unsigned evtchn_2l_max_channels(void)
 	return EVTCHN_2L_NR_CHANNELS;
 }
 
+static int evtchn_2l_setup(evtchn_port_t evtchn)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu)
+		clear_bit(evtchn, BM(per_cpu(cpu_evtchn_mask, cpu)));
+
+	return 0;
+}
+
 static void evtchn_2l_bind_to_cpu(evtchn_port_t evtchn, unsigned int cpu,
 				  unsigned int old_cpu)
 {
@@ -355,9 +365,18 @@ static void evtchn_2l_resume(void)
 				EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
 }
 
+static int evtchn_2l_percpu_deinit(unsigned int cpu)
+{
+	memset(per_cpu(cpu_evtchn_mask, cpu), 0, sizeof(xen_ulong_t) *
+			EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
+
+	return 0;
+}
+
 static const struct evtchn_ops evtchn_ops_2l = {
 	.max_channels      = evtchn_2l_max_channels,
 	.nr_channels       = evtchn_2l_max_channels,
+	.setup             = evtchn_2l_setup,
 	.bind_to_cpu       = evtchn_2l_bind_to_cpu,
 	.clear_pending     = evtchn_2l_clear_pending,
 	.set_pending       = evtchn_2l_set_pending,
@@ -367,6 +386,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
 	.unmask            = evtchn_2l_unmask,
 	.handle_events     = evtchn_2l_handle_events,
 	.resume	           = evtchn_2l_resume,
+	.percpu_deinit     = evtchn_2l_percpu_deinit,
 };
 
 void __init xen_evtchn_2l_init(void)
-- 
2.26.2


[-- Attachment #1.1.3: 0002-xen-events-don-t-unmask-an-event-channel-when-an-eoi.patch --]
[-- Type: text/x-patch, Size: 4649 bytes --]

From 1f59b7827f734f2d86cff149ea8d297944e136d1 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 15 Dec 2020 10:37:11 +0100
Subject: [PATCH 2/2] xen/events: don't unmask an event channel when an eoi is
 pending

An event channel should be kept masked when an eoi is pending for it.
When being migrated to another cpu it might be unmasked, though.

In order to avoid this keep two different flags for each event channel
to be able to distinguish "normal" masking/unmasking from eoi related
masking/unmasking. The event channel should only be able to generate
an interrupt if both flags are cleared.

Cc: stable@vger.kernel.org
Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_base.c | 63 +++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index e850f79351cb..6a836d131e73 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -97,7 +97,9 @@ struct irq_info {
 	short refcnt;
 	u8 spurious_cnt;
 	u8 is_accounted;
-	enum xen_irq_type type; /* type */
+	short type;		/* type: IRQT_* */
+	bool masked;		/* Is event explicitly masked? */
+	bool eoi_pending;	/* Is EOI pending? */
 	unsigned irq;
 	evtchn_port_t evtchn;   /* event channel */
 	unsigned short cpu;     /* cpu bound */
@@ -302,6 +304,8 @@ static int xen_irq_info_common_setup(struct irq_info *info,
 	info->irq = irq;
 	info->evtchn = evtchn;
 	info->cpu = cpu;
+	info->masked = true;
+	info->eoi_pending = false;
 
 	ret = set_evtchn_to_irq(evtchn, irq);
 	if (ret < 0)
@@ -585,7 +589,10 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
 	}
 
 	info->eoi_time = 0;
-	unmask_evtchn(evtchn);
+	info->eoi_pending = false;
+
+	if (!info->masked)
+		unmask_evtchn(evtchn);
 }
 
 static void xen_irq_lateeoi_worker(struct work_struct *work)
@@ -830,7 +837,11 @@ static unsigned int __startup_pirq(unsigned int irq)
 		goto err;
 
 out:
-	unmask_evtchn(evtchn);
+	info->masked = false;
+
+	if (!info->eoi_pending)
+		unmask_evtchn(evtchn);
+
 	eoi_pirq(irq_get_irq_data(irq));
 
 	return 0;
@@ -857,6 +868,7 @@ static void shutdown_pirq(struct irq_data *data)
 	if (!VALID_EVTCHN(evtchn))
 		return;
 
+	info->masked = true;
 	mask_evtchn(evtchn);
 	xen_evtchn_close(evtchn);
 	xen_irq_info_cleanup(info);
@@ -1768,18 +1780,26 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
 
 static void enable_dynirq(struct irq_data *data)
 {
-	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn))
-		unmask_evtchn(evtchn);
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = false;
+
+		if (!info->eoi_pending)
+			unmask_evtchn(evtchn);
+	}
 }
 
 static void disable_dynirq(struct irq_data *data)
 {
-	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn))
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = true;
 		mask_evtchn(evtchn);
+	}
 }
 
 static void ack_dynirq(struct irq_data *data)
@@ -1798,6 +1818,29 @@ static void mask_ack_dynirq(struct irq_data *data)
 	ack_dynirq(data);
 }
 
+static void lateeoi_ack_dynirq(struct irq_data *data)
+{
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+	if (VALID_EVTCHN(evtchn)) {
+		info->eoi_pending = true;
+		mask_evtchn(evtchn);
+	}
+}
+
+static void lateeoi_mask_ack_dynirq(struct irq_data *data)
+{
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = true;
+		info->eoi_pending = true;
+		mask_evtchn(evtchn);
+	}
+}
+
 static int retrigger_dynirq(struct irq_data *data)
 {
 	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
@@ -2023,8 +2066,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = {
 	.irq_mask		= disable_dynirq,
 	.irq_unmask		= enable_dynirq,
 
-	.irq_ack		= mask_ack_dynirq,
-	.irq_mask_ack		= mask_ack_dynirq,
+	.irq_ack		= lateeoi_ack_dynirq,
+	.irq_mask_ack		= lateeoi_mask_ack_dynirq,
 
 	.irq_set_affinity	= set_affinity_irq,
 	.irq_retrigger		= retrigger_dynirq,
-- 
2.26.2


[-- Attachment #1.1.4: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-02-04 12:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-14 21:25 xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10 Julien Grall
2020-12-15  7:27 ` Jürgen Groß
2020-12-15 10:20   ` Jürgen Groß
2020-12-18 13:35     ` Julien Grall
2020-12-23 12:11     ` Julien Grall
2021-02-04 12:30 ` Jürgen Groß
