All of lore.kernel.org
* [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
@ 2021-02-06 10:49 Juergen Gross
  2021-02-06 10:49 ` [PATCH 1/7] xen/events: reset affinity of 2-level event initially Juergen Gross
                   ` (7 more replies)
  0 siblings, 8 replies; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski

The first three patches are fixes for XSA-332. They avoid WARN splats
and a performance issue with interdomain events.

Patches 4 and 5 extend event handling in order to add per pv-device
statistics to sysfs and the ability to control the spurious event
delay per backend device.

Patches 6 and 7 are minor fixes I had lying around.

Juergen Gross (7):
  xen/events: reset affinity of 2-level event initially
  xen/events: don't unmask an event channel when an eoi is pending
  xen/events: fix lateeoi irq acknowledgment
  xen/events: link interdomain events to associated xenbus device
  xen/events: add per-xenbus device event statistics and settings
  xen/evtchn: use smp barriers for user event ring
  xen/evtchn: read producer index only once

 drivers/block/xen-blkback/xenbus.c  |   2 +-
 drivers/net/xen-netback/interface.c |  16 ++--
 drivers/xen/events/events_2l.c      |  20 +++++
 drivers/xen/events/events_base.c    | 133 ++++++++++++++++++++++------
 drivers/xen/evtchn.c                |   6 +-
 drivers/xen/pvcalls-back.c          |   4 +-
 drivers/xen/xen-pciback/xenbus.c    |   2 +-
 drivers/xen/xen-scsiback.c          |   2 +-
 drivers/xen/xenbus/xenbus_probe.c   |  66 ++++++++++++++
 include/xen/events.h                |   7 +-
 include/xen/xenbus.h                |   7 ++
 11 files changed, 217 insertions(+), 48 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 1/7] xen/events: reset affinity of 2-level event initially
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-06 11:20   ` Julien Grall
  2021-02-06 10:49 ` [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending Juergen Gross
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, stable, Julien Grall

When creating a new event channel with 2-level events the affinity
needs to be reset initially in order to avoid using an old affinity
from earlier usage of the event channel port.

The same applies to the affinity when onlining a vcpu: all old
affinity settings for this vcpu must be reset. As percpu events get
initialized before the percpu event channel hook is called,
resetting of the affinities happens after offlining a vcpu (this
works, as the initial percpu memory is zeroed out).

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_2l.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index da87f3a1e351..23217940144a 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -47,6 +47,16 @@ static unsigned evtchn_2l_max_channels(void)
 	return EVTCHN_2L_NR_CHANNELS;
 }
 
+static int evtchn_2l_setup(evtchn_port_t evtchn)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu)
+		clear_bit(evtchn, BM(per_cpu(cpu_evtchn_mask, cpu)));
+
+	return 0;
+}
+
 static void evtchn_2l_bind_to_cpu(evtchn_port_t evtchn, unsigned int cpu,
 				  unsigned int old_cpu)
 {
@@ -355,9 +365,18 @@ static void evtchn_2l_resume(void)
 				EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
 }
 
+static int evtchn_2l_percpu_deinit(unsigned int cpu)
+{
+	memset(per_cpu(cpu_evtchn_mask, cpu), 0, sizeof(xen_ulong_t) *
+			EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
+
+	return 0;
+}
+
 static const struct evtchn_ops evtchn_ops_2l = {
 	.max_channels      = evtchn_2l_max_channels,
 	.nr_channels       = evtchn_2l_max_channels,
+	.setup             = evtchn_2l_setup,
 	.bind_to_cpu       = evtchn_2l_bind_to_cpu,
 	.clear_pending     = evtchn_2l_clear_pending,
 	.set_pending       = evtchn_2l_set_pending,
@@ -367,6 +386,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
 	.unmask            = evtchn_2l_unmask,
 	.handle_events     = evtchn_2l_handle_events,
 	.resume	           = evtchn_2l_resume,
+	.percpu_deinit     = evtchn_2l_percpu_deinit,
 };
 
 void __init xen_evtchn_2l_init(void)
-- 
2.26.2



* [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
  2021-02-06 10:49 ` [PATCH 1/7] xen/events: reset affinity of 2-level event initially Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-08 10:06   ` Jan Beulich
  2021-02-08 10:15   ` Ross Lagerwall
  2021-02-06 10:49 ` [PATCH 3/7] xen/events: fix lateeoi irq acknowledgment Juergen Gross
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, stable, Julien Grall

An event channel should be kept masked while an eoi is pending for it.
When it is being migrated to another cpu it might be unmasked, though.

To avoid this, keep two different flags for each event channel in
order to distinguish "normal" masking/unmasking from eoi-related
masking/unmasking. The event channel should only be able to generate
an interrupt if both flags are cleared.

Cc: stable@vger.kernel.org
Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_base.c | 63 +++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index e850f79351cb..6a836d131e73 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -97,7 +97,9 @@ struct irq_info {
 	short refcnt;
 	u8 spurious_cnt;
 	u8 is_accounted;
-	enum xen_irq_type type; /* type */
+	short type;		/* type: IRQT_* */
+	bool masked;		/* Is event explicitly masked? */
+	bool eoi_pending;	/* Is EOI pending? */
 	unsigned irq;
 	evtchn_port_t evtchn;   /* event channel */
 	unsigned short cpu;     /* cpu bound */
@@ -302,6 +304,8 @@ static int xen_irq_info_common_setup(struct irq_info *info,
 	info->irq = irq;
 	info->evtchn = evtchn;
 	info->cpu = cpu;
+	info->masked = true;
+	info->eoi_pending = false;
 
 	ret = set_evtchn_to_irq(evtchn, irq);
 	if (ret < 0)
@@ -585,7 +589,10 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
 	}
 
 	info->eoi_time = 0;
-	unmask_evtchn(evtchn);
+	info->eoi_pending = false;
+
+	if (!info->masked)
+		unmask_evtchn(evtchn);
 }
 
 static void xen_irq_lateeoi_worker(struct work_struct *work)
@@ -830,7 +837,11 @@ static unsigned int __startup_pirq(unsigned int irq)
 		goto err;
 
 out:
-	unmask_evtchn(evtchn);
+	info->masked = false;
+
+	if (!info->eoi_pending)
+		unmask_evtchn(evtchn);
+
 	eoi_pirq(irq_get_irq_data(irq));
 
 	return 0;
@@ -857,6 +868,7 @@ static void shutdown_pirq(struct irq_data *data)
 	if (!VALID_EVTCHN(evtchn))
 		return;
 
+	info->masked = true;
 	mask_evtchn(evtchn);
 	xen_evtchn_close(evtchn);
 	xen_irq_info_cleanup(info);
@@ -1768,18 +1780,26 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
 
 static void enable_dynirq(struct irq_data *data)
 {
-	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn))
-		unmask_evtchn(evtchn);
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = false;
+
+		if (!info->eoi_pending)
+			unmask_evtchn(evtchn);
+	}
 }
 
 static void disable_dynirq(struct irq_data *data)
 {
-	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn))
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = true;
 		mask_evtchn(evtchn);
+	}
 }
 
 static void ack_dynirq(struct irq_data *data)
@@ -1798,6 +1818,29 @@ static void mask_ack_dynirq(struct irq_data *data)
 	ack_dynirq(data);
 }
 
+static void lateeoi_ack_dynirq(struct irq_data *data)
+{
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+	if (VALID_EVTCHN(evtchn)) {
+		info->eoi_pending = true;
+		mask_evtchn(evtchn);
+	}
+}
+
+static void lateeoi_mask_ack_dynirq(struct irq_data *data)
+{
+	struct irq_info *info = info_for_irq(data->irq);
+	evtchn_port_t evtchn = info ? info->evtchn : 0;
+
+	if (VALID_EVTCHN(evtchn)) {
+		info->masked = true;
+		info->eoi_pending = true;
+		mask_evtchn(evtchn);
+	}
+}
+
 static int retrigger_dynirq(struct irq_data *data)
 {
 	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
@@ -2023,8 +2066,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = {
 	.irq_mask		= disable_dynirq,
 	.irq_unmask		= enable_dynirq,
 
-	.irq_ack		= mask_ack_dynirq,
-	.irq_mask_ack		= mask_ack_dynirq,
+	.irq_ack		= lateeoi_ack_dynirq,
+	.irq_mask_ack		= lateeoi_mask_ack_dynirq,
 
 	.irq_set_affinity	= set_affinity_irq,
 	.irq_retrigger		= retrigger_dynirq,
-- 
2.26.2



* [PATCH 3/7] xen/events: fix lateeoi irq acknowledgment
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
  2021-02-06 10:49 ` [PATCH 1/7] xen/events: reset affinity of 2-level event initially Juergen Gross
  2021-02-06 10:49 ` [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-06 10:49 ` [PATCH 4/7] xen/events: link interdomain events to associated xenbus device Juergen Gross
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, stable

When an irq has been accepted as the result of receiving an event, the
related event should be cleared. The lateeoi model is missing that,
resulting in a continuous stream of events being signalled.

Fixes: 54c9de89895e0a ("xen/events: add a new late EOI evtchn framework")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_base.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 6a836d131e73..7b26ef817f8b 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1826,6 +1826,7 @@ static void lateeoi_ack_dynirq(struct irq_data *data)
 	if (VALID_EVTCHN(evtchn)) {
 		info->eoi_pending = true;
 		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
 	}
 }
 
@@ -1838,6 +1839,7 @@ static void lateeoi_mask_ack_dynirq(struct irq_data *data)
 		info->masked = true;
 		info->eoi_pending = true;
 		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
 	}
 }
 
-- 
2.26.2



* [PATCH 4/7] xen/events: link interdomain events to associated xenbus device
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
                   ` (2 preceding siblings ...)
  2021-02-06 10:49 ` [PATCH 3/7] xen/events: fix lateeoi irq acknowledgment Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-08 23:26   ` Boris Ostrovsky
  2021-02-09 13:55   ` Wei Liu
  2021-02-06 10:49 ` [PATCH 5/7] xen/events: add per-xenbus device event statistics and settings Juergen Gross
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-block, linux-kernel, netdev, linux-scsi
  Cc: Juergen Gross, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski, Boris Ostrovsky, Stefano Stabellini

In order to support per-device event channel settings (e.g. lateeoi
spurious event thresholds), add a xenbus device pointer to
struct irq_info and modify the related event channel binding
interfaces to take a pointer to the xenbus device as a parameter
instead of the domain id of the other side.

While at it, remove the stale prototype of bind_evtchn_to_irq_lateeoi().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkback/xenbus.c  |  2 +-
 drivers/net/xen-netback/interface.c | 16 +++++------
 drivers/xen/events/events_base.c    | 41 +++++++++++++++++------------
 drivers/xen/pvcalls-back.c          |  4 +--
 drivers/xen/xen-pciback/xenbus.c    |  2 +-
 drivers/xen/xen-scsiback.c          |  2 +-
 include/xen/events.h                |  7 ++---
 7 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 9860d4842f36..c2aaf690352c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -245,7 +245,7 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 	if (req_prod - rsp_prod > size)
 		goto fail;
 
-	err = bind_interdomain_evtchn_to_irqhandler_lateeoi(blkif->domid,
+	err = bind_interdomain_evtchn_to_irqhandler_lateeoi(blkif->be->dev,
 			evtchn, xen_blkif_be_int, 0, "blkif-backend", ring);
 	if (err < 0)
 		goto fail;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index acb786d8b1d8..494b4330a4ea 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -628,13 +628,13 @@ int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t ring_ref,
 			unsigned int evtchn)
 {
 	struct net_device *dev = vif->dev;
+	struct xenbus_device *xendev = xenvif_to_xenbus_device(vif);
 	void *addr;
 	struct xen_netif_ctrl_sring *shared;
 	RING_IDX rsp_prod, req_prod;
 	int err;
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
-				     &ring_ref, 1, &addr);
+	err = xenbus_map_ring_valloc(xendev, &ring_ref, 1, &addr);
 	if (err)
 		goto err;
 
@@ -648,7 +648,7 @@ int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t ring_ref,
 	if (req_prod - rsp_prod > RING_SIZE(&vif->ctrl))
 		goto err_unmap;
 
-	err = bind_interdomain_evtchn_to_irq_lateeoi(vif->domid, evtchn);
+	err = bind_interdomain_evtchn_to_irq_lateeoi(xendev, evtchn);
 	if (err < 0)
 		goto err_unmap;
 
@@ -671,8 +671,7 @@ int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t ring_ref,
 	vif->ctrl_irq = 0;
 
 err_unmap:
-	xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-				vif->ctrl.sring);
+	xenbus_unmap_ring_vfree(xendev, vif->ctrl.sring);
 	vif->ctrl.sring = NULL;
 
 err:
@@ -717,6 +716,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,
 			unsigned int tx_evtchn,
 			unsigned int rx_evtchn)
 {
+	struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);
 	struct task_struct *task;
 	int err;
 
@@ -753,7 +753,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,
 	if (tx_evtchn == rx_evtchn) {
 		/* feature-split-event-channels == 0 */
 		err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
-			queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
+			dev, tx_evtchn, xenvif_interrupt, 0,
 			queue->name, queue);
 		if (err < 0)
 			goto err;
@@ -764,7 +764,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,
 		snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
 			 "%s-tx", queue->name);
 		err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
-			queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
+			dev, tx_evtchn, xenvif_tx_interrupt, 0,
 			queue->tx_irq_name, queue);
 		if (err < 0)
 			goto err;
@@ -774,7 +774,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,
 		snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
 			 "%s-rx", queue->name);
 		err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
-			queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
+			dev, rx_evtchn, xenvif_rx_interrupt, 0,
 			queue->rx_irq_name, queue);
 		if (err < 0)
 			goto err;
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 7b26ef817f8b..8c620c11e32a 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -63,6 +63,7 @@
 #include <xen/interface/physdev.h>
 #include <xen/interface/sched.h>
 #include <xen/interface/vcpu.h>
+#include <xen/xenbus.h>
 #include <asm/hw_irq.h>
 
 #include "events_internal.h"
@@ -117,6 +118,7 @@ struct irq_info {
 			unsigned char flags;
 			uint16_t domid;
 		} pirq;
+		struct xenbus_device *interdomain;
 	} u;
 };
 
@@ -317,11 +319,16 @@ static int xen_irq_info_common_setup(struct irq_info *info,
 }
 
 static int xen_irq_info_evtchn_setup(unsigned irq,
-				     evtchn_port_t evtchn)
+				     evtchn_port_t evtchn,
+				     struct xenbus_device *dev)
 {
 	struct irq_info *info = info_for_irq(irq);
+	int ret;
 
-	return xen_irq_info_common_setup(info, irq, IRQT_EVTCHN, evtchn, 0);
+	ret = xen_irq_info_common_setup(info, irq, IRQT_EVTCHN, evtchn, 0);
+	info->u.interdomain = dev;
+
+	return ret;
 }
 
 static int xen_irq_info_ipi_setup(unsigned cpu,
@@ -1128,7 +1135,8 @@ int xen_pirq_from_irq(unsigned irq)
 }
 EXPORT_SYMBOL_GPL(xen_pirq_from_irq);
 
-static int bind_evtchn_to_irq_chip(evtchn_port_t evtchn, struct irq_chip *chip)
+static int bind_evtchn_to_irq_chip(evtchn_port_t evtchn, struct irq_chip *chip,
+				   struct xenbus_device *dev)
 {
 	int irq;
 	int ret;
@@ -1148,7 +1156,7 @@ static int bind_evtchn_to_irq_chip(evtchn_port_t evtchn, struct irq_chip *chip)
 		irq_set_chip_and_handler_name(irq, chip,
 					      handle_edge_irq, "event");
 
-		ret = xen_irq_info_evtchn_setup(irq, evtchn);
+		ret = xen_irq_info_evtchn_setup(irq, evtchn, dev);
 		if (ret < 0) {
 			__unbind_from_irq(irq);
 			irq = ret;
@@ -1175,7 +1183,7 @@ static int bind_evtchn_to_irq_chip(evtchn_port_t evtchn, struct irq_chip *chip)
 
 int bind_evtchn_to_irq(evtchn_port_t evtchn)
 {
-	return bind_evtchn_to_irq_chip(evtchn, &xen_dynamic_chip);
+	return bind_evtchn_to_irq_chip(evtchn, &xen_dynamic_chip, NULL);
 }
 EXPORT_SYMBOL_GPL(bind_evtchn_to_irq);
 
@@ -1224,27 +1232,27 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
 	return irq;
 }
 
-static int bind_interdomain_evtchn_to_irq_chip(unsigned int remote_domain,
+static int bind_interdomain_evtchn_to_irq_chip(struct xenbus_device *dev,
 					       evtchn_port_t remote_port,
 					       struct irq_chip *chip)
 {
 	struct evtchn_bind_interdomain bind_interdomain;
 	int err;
 
-	bind_interdomain.remote_dom  = remote_domain;
+	bind_interdomain.remote_dom  = dev->otherend_id;
 	bind_interdomain.remote_port = remote_port;
 
 	err = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain,
 					  &bind_interdomain);
 
 	return err ? : bind_evtchn_to_irq_chip(bind_interdomain.local_port,
-					       chip);
+					       chip, dev);
 }
 
-int bind_interdomain_evtchn_to_irq_lateeoi(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irq_lateeoi(struct xenbus_device *dev,
 					   evtchn_port_t remote_port)
 {
-	return bind_interdomain_evtchn_to_irq_chip(remote_domain, remote_port,
+	return bind_interdomain_evtchn_to_irq_chip(dev, remote_port,
 						   &xen_lateeoi_chip);
 }
 EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irq_lateeoi);
@@ -1357,7 +1365,7 @@ static int bind_evtchn_to_irqhandler_chip(evtchn_port_t evtchn,
 {
 	int irq, retval;
 
-	irq = bind_evtchn_to_irq_chip(evtchn, chip);
+	irq = bind_evtchn_to_irq_chip(evtchn, chip, NULL);
 	if (irq < 0)
 		return irq;
 	retval = request_irq(irq, handler, irqflags, devname, dev_id);
@@ -1392,14 +1400,13 @@ int bind_evtchn_to_irqhandler_lateeoi(evtchn_port_t evtchn,
 EXPORT_SYMBOL_GPL(bind_evtchn_to_irqhandler_lateeoi);
 
 static int bind_interdomain_evtchn_to_irqhandler_chip(
-		unsigned int remote_domain, evtchn_port_t remote_port,
+		struct xenbus_device *dev, evtchn_port_t remote_port,
 		irq_handler_t handler, unsigned long irqflags,
 		const char *devname, void *dev_id, struct irq_chip *chip)
 {
 	int irq, retval;
 
-	irq = bind_interdomain_evtchn_to_irq_chip(remote_domain, remote_port,
-						  chip);
+	irq = bind_interdomain_evtchn_to_irq_chip(dev, remote_port, chip);
 	if (irq < 0)
 		return irq;
 
@@ -1412,14 +1419,14 @@ static int bind_interdomain_evtchn_to_irqhandler_chip(
 	return irq;
 }
 
-int bind_interdomain_evtchn_to_irqhandler_lateeoi(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irqhandler_lateeoi(struct xenbus_device *dev,
 						  evtchn_port_t remote_port,
 						  irq_handler_t handler,
 						  unsigned long irqflags,
 						  const char *devname,
 						  void *dev_id)
 {
-	return bind_interdomain_evtchn_to_irqhandler_chip(remote_domain,
+	return bind_interdomain_evtchn_to_irqhandler_chip(dev,
 				remote_port, handler, irqflags, devname,
 				dev_id, &xen_lateeoi_chip);
 }
@@ -1691,7 +1698,7 @@ void rebind_evtchn_irq(evtchn_port_t evtchn, int irq)
 	   so there should be a proper type */
 	BUG_ON(info->type == IRQT_UNBOUND);
 
-	(void)xen_irq_info_evtchn_setup(irq, evtchn);
+	(void)xen_irq_info_evtchn_setup(irq, evtchn, NULL);
 
 	mutex_unlock(&irq_mapping_update_lock);
 
diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index a7d293fa8d14..b47fd8435061 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -348,7 +348,7 @@ static struct sock_mapping *pvcalls_new_active_socket(
 	map->bytes = page;
 
 	ret = bind_interdomain_evtchn_to_irqhandler_lateeoi(
-			fedata->dev->otherend_id, evtchn,
+			fedata->dev, evtchn,
 			pvcalls_back_conn_event, 0, "pvcalls-backend", map);
 	if (ret < 0)
 		goto out;
@@ -948,7 +948,7 @@ static int backend_connect(struct xenbus_device *dev)
 		goto error;
 	}
 
-	err = bind_interdomain_evtchn_to_irq_lateeoi(dev->otherend_id, evtchn);
+	err = bind_interdomain_evtchn_to_irq_lateeoi(dev, evtchn);
 	if (err < 0)
 		goto error;
 	fedata->irq = err;
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index e7c692cfb2cf..5188f02e75fb 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -124,7 +124,7 @@ static int xen_pcibk_do_attach(struct xen_pcibk_device *pdev, int gnt_ref,
 	pdev->sh_info = vaddr;
 
 	err = bind_interdomain_evtchn_to_irqhandler_lateeoi(
-		pdev->xdev->otherend_id, remote_evtchn, xen_pcibk_handle_event,
+		pdev->xdev, remote_evtchn, xen_pcibk_handle_event,
 		0, DRV_NAME, pdev);
 	if (err < 0) {
 		xenbus_dev_fatal(pdev->xdev, err,
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index 862162dca33c..8b59897b2df9 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -799,7 +799,7 @@ static int scsiback_init_sring(struct vscsibk_info *info, grant_ref_t ring_ref,
 	sring = (struct vscsiif_sring *)area;
 	BACK_RING_INIT(&info->ring, sring, PAGE_SIZE);
 
-	err = bind_interdomain_evtchn_to_irq_lateeoi(info->domid, evtchn);
+	err = bind_interdomain_evtchn_to_irq_lateeoi(info->dev, evtchn);
 	if (err < 0)
 		goto unmap_page;
 
diff --git a/include/xen/events.h b/include/xen/events.h
index 8ec418e30c7f..c204262d9fc2 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -12,10 +12,11 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/events.h>
 
+struct xenbus_device;
+
 unsigned xen_evtchn_nr_channels(void);
 
 int bind_evtchn_to_irq(evtchn_port_t evtchn);
-int bind_evtchn_to_irq_lateeoi(evtchn_port_t evtchn);
 int bind_evtchn_to_irqhandler(evtchn_port_t evtchn,
 			      irq_handler_t handler,
 			      unsigned long irqflags, const char *devname,
@@ -35,9 +36,9 @@ int bind_ipi_to_irqhandler(enum ipi_vector ipi,
 			   unsigned long irqflags,
 			   const char *devname,
 			   void *dev_id);
-int bind_interdomain_evtchn_to_irq_lateeoi(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irq_lateeoi(struct xenbus_device *dev,
 					   evtchn_port_t remote_port);
-int bind_interdomain_evtchn_to_irqhandler_lateeoi(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irqhandler_lateeoi(struct xenbus_device *dev,
 						  evtchn_port_t remote_port,
 						  irq_handler_t handler,
 						  unsigned long irqflags,
-- 
2.26.2



* [PATCH 5/7] xen/events: add per-xenbus device event statistics and settings
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
                   ` (3 preceding siblings ...)
  2021-02-06 10:49 ` [PATCH 4/7] xen/events: link interdomain events to associated xenbus device Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-08 23:35   ` Boris Ostrovsky
  2021-02-06 10:49 ` [PATCH 6/7] xen/evtchn: use smp barriers for user event ring Juergen Gross
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini

Add sysfs nodes for each xenbus device showing event statistics (number
of events and spurious events, number of associated event channels)
and for setting a spurious event threshold in case a frontend is
sending too many events without being deliberately rogue.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/events/events_base.c  | 27 ++++++++++++-
 drivers/xen/xenbus/xenbus_probe.c | 66 +++++++++++++++++++++++++++++++
 include/xen/xenbus.h              |  7 ++++
 3 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 8c620c11e32a..d0c57c5664c0 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -327,6 +327,8 @@ static int xen_irq_info_evtchn_setup(unsigned irq,
 
 	ret = xen_irq_info_common_setup(info, irq, IRQT_EVTCHN, evtchn, 0);
 	info->u.interdomain = dev;
+	if (dev)
+		atomic_inc(&dev->event_channels);
 
 	return ret;
 }
@@ -572,18 +574,28 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
 		return;
 
 	if (spurious) {
+		struct xenbus_device *dev = info->u.interdomain;
+		unsigned int threshold = 1;
+
+		if (dev && dev->spurious_threshold)
+			threshold = dev->spurious_threshold;
+
 		if ((1 << info->spurious_cnt) < (HZ << 2)) {
 			if (info->spurious_cnt != 0xFF)
 				info->spurious_cnt++;
 		}
-		if (info->spurious_cnt > 1) {
-			delay = 1 << (info->spurious_cnt - 2);
+		if (info->spurious_cnt > threshold) {
+			delay = 1 << (info->spurious_cnt - 1 - threshold);
 			if (delay > HZ)
 				delay = HZ;
 			if (!info->eoi_time)
 				info->eoi_cpu = smp_processor_id();
 			info->eoi_time = get_jiffies_64() + delay;
+			if (dev)
+				atomic_add(delay, &dev->jiffies_eoi_delayed);
 		}
+		if (dev)
+			atomic_inc(&dev->spurious_events);
 	} else {
 		info->spurious_cnt = 0;
 	}
@@ -920,6 +932,7 @@ static void __unbind_from_irq(unsigned int irq)
 
 	if (VALID_EVTCHN(evtchn)) {
 		unsigned int cpu = cpu_from_irq(irq);
+		struct xenbus_device *dev;
 
 		xen_evtchn_close(evtchn);
 
@@ -930,6 +943,11 @@ static void __unbind_from_irq(unsigned int irq)
 		case IRQT_IPI:
 			per_cpu(ipi_to_irq, cpu)[ipi_from_irq(irq)] = -1;
 			break;
+		case IRQT_EVTCHN:
+			dev = info->u.interdomain;
+			if (dev)
+				atomic_dec(&dev->event_channels);
+			break;
 		default:
 			break;
 		}
@@ -1593,6 +1611,7 @@ void handle_irq_for_port(evtchn_port_t port, struct evtchn_loop_ctrl *ctrl)
 {
 	int irq;
 	struct irq_info *info;
+	struct xenbus_device *dev;
 
 	irq = get_evtchn_to_irq(port);
 	if (irq == -1)
@@ -1622,6 +1641,10 @@ void handle_irq_for_port(evtchn_port_t port, struct evtchn_loop_ctrl *ctrl)
 
 	info = info_for_irq(irq);
 
+	dev = (info->type == IRQT_EVTCHN) ? info->u.interdomain : NULL;
+	if (dev)
+		atomic_inc(&dev->events);
+
 	if (ctrl->defer_eoi) {
 		info->eoi_cpu = smp_processor_id();
 		info->irq_epoch = __this_cpu_read(irq_epoch);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 18ffd0551b54..9494ecad3c92 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -206,6 +206,65 @@ void xenbus_otherend_changed(struct xenbus_watch *watch,
 }
 EXPORT_SYMBOL_GPL(xenbus_otherend_changed);
 
+#define XENBUS_SHOW_STAT(name)						\
+static ssize_t show_##name(struct device *_dev,				\
+			   struct device_attribute *attr,		\
+			   char *buf)					\
+{									\
+	struct xenbus_device *dev = to_xenbus_device(_dev);		\
+									\
+	return sprintf(buf, "%d\n", atomic_read(&dev->name));		\
+}									\
+static DEVICE_ATTR(name, 0444, show_##name, NULL)
+
+XENBUS_SHOW_STAT(event_channels);
+XENBUS_SHOW_STAT(events);
+XENBUS_SHOW_STAT(spurious_events);
+XENBUS_SHOW_STAT(jiffies_eoi_delayed);
+
+static ssize_t show_spurious_threshold(struct device *_dev,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct xenbus_device *dev = to_xenbus_device(_dev);
+
+	return sprintf(buf, "%d\n", dev->spurious_threshold);
+}
+
+static ssize_t set_spurious_threshold(struct device *_dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct xenbus_device *dev = to_xenbus_device(_dev);
+	unsigned int val;
+	ssize_t ret;
+
+	ret = kstrtouint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	dev->spurious_threshold = val;
+
+	return count;
+}
+
+static DEVICE_ATTR(spurious_threshold, 0644, show_spurious_threshold,
+		   set_spurious_threshold);
+
+static struct attribute *xenbus_attrs[] = {
+	&dev_attr_event_channels.attr,
+	&dev_attr_events.attr,
+	&dev_attr_spurious_events.attr,
+	&dev_attr_jiffies_eoi_delayed.attr,
+	&dev_attr_spurious_threshold.attr,
+	NULL
+};
+
+static const struct attribute_group xenbus_group = {
+	.name = "xenbus",
+	.attrs = xenbus_attrs,
+};
+
 int xenbus_dev_probe(struct device *_dev)
 {
 	struct xenbus_device *dev = to_xenbus_device(_dev);
@@ -253,6 +312,11 @@ int xenbus_dev_probe(struct device *_dev)
 		return err;
 	}
 
+	dev->spurious_threshold = 1;
+	if (sysfs_create_group(&dev->dev.kobj, &xenbus_group))
+		dev_warn(&dev->dev, "sysfs_create_group on %s failed.\n",
+			 dev->nodename);
+
 	return 0;
 fail_put:
 	module_put(drv->driver.owner);
@@ -269,6 +333,8 @@ int xenbus_dev_remove(struct device *_dev)
 
 	DPRINTK("%s", dev->nodename);
 
+	sysfs_remove_group(&dev->dev.kobj, &xenbus_group);
+
 	free_otherend_watch(dev);
 
 	if (drv->remove) {
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 2c43b0ef1e4d..13ee375a1f05 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -88,6 +88,13 @@ struct xenbus_device {
 	struct completion down;
 	struct work_struct work;
 	struct semaphore reclaim_sem;
+
+	/* Event channel based statistics and settings. */
+	atomic_t event_channels;
+	atomic_t events;
+	atomic_t spurious_events;
+	atomic_t jiffies_eoi_delayed;
+	unsigned int spurious_threshold;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.26.2



* [PATCH 6/7] xen/evtchn: use smp barriers for user event ring
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
                   ` (4 preceding siblings ...)
  2021-02-06 10:49 ` [PATCH 5/7] xen/events: add per-xenbus device event statistics and settings Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-08  9:38   ` Jan Beulich
  2021-02-08  9:44   ` Andrew Cooper
  2021-02-06 10:49 ` [PATCH 7/7] xen/evtchn: read producer index only once Juergen Gross
  2021-02-06 18:46 ` [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Julien Grall
  7 siblings, 2 replies; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, Andrew Cooper

The ring buffer for user events is used in the local system only, so
smp barriers are fine for ensuring consistency.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/evtchn.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index a7a85719a8c8..421382c73d88 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -173,7 +173,7 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
 
 	if ((u->ring_prod - u->ring_cons) < u->ring_size) {
 		*evtchn_ring_entry(u, u->ring_prod) = evtchn->port;
-		wmb(); /* Ensure ring contents visible */
+		smp_wmb(); /* Ensure ring contents visible */
 		if (u->ring_cons == u->ring_prod++) {
 			wake_up_interruptible(&u->evtchn_wait);
 			kill_fasync(&u->evtchn_async_queue,
@@ -245,7 +245,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
 	}
 
 	rc = -EFAULT;
-	rmb(); /* Ensure that we see the port before we copy it. */
+	smp_rmb(); /* Ensure that we see the port before we copy it. */
 	if (copy_to_user(buf, evtchn_ring_entry(u, c), bytes1) ||
 	    ((bytes2 != 0) &&
 	     copy_to_user(&buf[bytes1], &u->ring[0], bytes2)))
-- 
2.26.2



* [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
                   ` (5 preceding siblings ...)
  2021-02-06 10:49 ` [PATCH 6/7] xen/evtch: use smp barriers for user event ring Juergen Gross
@ 2021-02-06 10:49 ` Juergen Gross
  2021-02-08  9:48   ` Jan Beulich
  2021-02-08 11:40   ` Julien Grall
  2021-02-06 18:46 ` [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Julien Grall
  7 siblings, 2 replies; 53+ messages in thread
From: Juergen Gross @ 2021-02-06 10:49 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini

In evtchn_read() use READ_ONCE() for reading the producer index in
order to avoid the compiler generating multiple accesses.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/evtchn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 421382c73d88..f6b199b597bf 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
 			goto unlock_out;
 
 		c = u->ring_cons;
-		p = u->ring_prod;
+		p = READ_ONCE(u->ring_prod);
 		if (c != p)
 			break;
 
-- 
2.26.2



* Re: [PATCH 1/7] xen/events: reset affinity of 2-level event initially
  2021-02-06 10:49 ` [PATCH 1/7] xen/events: reset affinity of 2-level event initially Juergen Gross
@ 2021-02-06 11:20   ` Julien Grall
  2021-02-06 12:09     ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-06 11:20 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, stable

Hi Juergen,

On 06/02/2021 10:49, Juergen Gross wrote:
> When creating a new event channel with 2-level events the affinity
> needs to be reset initially in order to avoid using an old affinity
> from earlier usage of the event channel port.
> 
> The same applies to the affinity when onlining a vcpu: all old
> affinity settings for this vcpu must be reset. As percpu events get
> initialized before the percpu event channel hook is called,
> resetting of the affinities happens after offlining a vcpu (this is
> working, as initial percpu memory is zeroed out).
> 
> Cc: stable@vger.kernel.org
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   drivers/xen/events/events_2l.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
> index da87f3a1e351..23217940144a 100644
> --- a/drivers/xen/events/events_2l.c
> +++ b/drivers/xen/events/events_2l.c
> @@ -47,6 +47,16 @@ static unsigned evtchn_2l_max_channels(void)
>   	return EVTCHN_2L_NR_CHANNELS;
>   }
>   
> +static int evtchn_2l_setup(evtchn_port_t evtchn)
> +{
> +	unsigned int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		clear_bit(evtchn, BM(per_cpu(cpu_evtchn_mask, cpu)));

The bit corresponding to the event channel can only be set on a single 
CPU. Could we avoid the loop and instead clear the bit while closing the 
port?

Cheers,

-- 
Julien Grall


* Re: [PATCH 1/7] xen/events: reset affinity of 2-level event initially
  2021-02-06 11:20   ` Julien Grall
@ 2021-02-06 12:09     ` Jürgen Groß
  2021-02-06 12:19       ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-06 12:09 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, stable



On 06.02.21 12:20, Julien Grall wrote:
> Hi Juergen,
> 
> On 06/02/2021 10:49, Juergen Gross wrote:
>> When creating a new event channel with 2-level events the affinity
>> needs to be reset initially in order to avoid using an old affinity
>> from earlier usage of the event channel port.
>>
>> The same applies to the affinity when onlining a vcpu: all old
>> affinity settings for this vcpu must be reset. As percpu events get
>> initialized before the percpu event channel hook is called,
>> resetting of the affinities happens after offlining a vcpu (this is
>> working, as initial percpu memory is zeroed out).
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: Julien Grall <julien@xen.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   drivers/xen/events/events_2l.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/xen/events/events_2l.c 
>> b/drivers/xen/events/events_2l.c
>> index da87f3a1e351..23217940144a 100644
>> --- a/drivers/xen/events/events_2l.c
>> +++ b/drivers/xen/events/events_2l.c
>> @@ -47,6 +47,16 @@ static unsigned evtchn_2l_max_channels(void)
>>       return EVTCHN_2L_NR_CHANNELS;
>>   }
>> +static int evtchn_2l_setup(evtchn_port_t evtchn)
>> +{
>> +    unsigned int cpu;
>> +
>> +    for_each_online_cpu(cpu)
>> +        clear_bit(evtchn, BM(per_cpu(cpu_evtchn_mask, cpu)));
> 
> The bit corresponding to the event channel can only be set on a single 
> CPU. Could we avoid the loop and instead clear the bit while closing the 
> port?

This would need another callback.


Juergen



* Re: [PATCH 1/7] xen/events: reset affinity of 2-level event initially
  2021-02-06 12:09     ` Jürgen Groß
@ 2021-02-06 12:19       ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2021-02-06 12:19 UTC (permalink / raw)
  To: Jürgen Groß, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, stable



On 06/02/2021 12:09, Jürgen Groß wrote:
> On 06.02.21 12:20, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 06/02/2021 10:49, Juergen Gross wrote:
>>> When creating a new event channel with 2-level events the affinity
>>> needs to be reset initially in order to avoid using an old affinity
>>> from earlier usage of the event channel port.
>>>
>>> The same applies to the affinity when onlining a vcpu: all old
>>> affinity settings for this vcpu must be reset. As percpu events get
>>> initialized before the percpu event channel hook is called,
>>> resetting of the affinities happens after offlining a vcpu (this is
>>> working, as initial percpu memory is zeroed out).
>>>
>>> Cc: stable@vger.kernel.org
>>> Reported-by: Julien Grall <julien@xen.org>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   drivers/xen/events/events_2l.c | 20 ++++++++++++++++++++
>>>   1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/xen/events/events_2l.c 
>>> b/drivers/xen/events/events_2l.c
>>> index da87f3a1e351..23217940144a 100644
>>> --- a/drivers/xen/events/events_2l.c
>>> +++ b/drivers/xen/events/events_2l.c
>>> @@ -47,6 +47,16 @@ static unsigned evtchn_2l_max_channels(void)
>>>       return EVTCHN_2L_NR_CHANNELS;
>>>   }
>>> +static int evtchn_2l_setup(evtchn_port_t evtchn)
>>> +{
>>> +    unsigned int cpu;
>>> +
>>> +    for_each_online_cpu(cpu)
>>> +        clear_bit(evtchn, BM(per_cpu(cpu_evtchn_mask, cpu)));
>>
>> The bit corresponding to the event channel can only be set on a single 
>> CPU. Could we avoid the loop and instead clear the bit while closing 
>> the port?
> 
> This would need another callback.

Right, this seems to be better than walking over all the CPUs every time 
just for cleaning one bit.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
                   ` (6 preceding siblings ...)
  2021-02-06 10:49 ` [PATCH 7/7] xen/evtchn: read producer index only once Juergen Gross
@ 2021-02-06 18:46 ` Julien Grall
  2021-02-07 12:58   ` Jürgen Groß
  7 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-06 18:46 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski

Hi Juergen,

On 06/02/2021 10:49, Juergen Gross wrote:
> The first three patches are fixes for XSA-332. They avoid WARN splats
> and a performance issue with interdomain events.

Thanks for helping to figure out the problem. Unfortunately, I still see 
reliably the WARN splat with the latest Linux master (1e0d27fce010) + 
your first 3 patches.

I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
events ABI.

After some debugging, I think I have an idea of what went wrong. The 
problem happens when the event is initially bound from vCPU0 to a 
different vCPU.

 From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
event to prevent it being delivered on an unexpected vCPU. However, I 
believe the following can happen:

vCPU0				| vCPU1
				|
				| Call xen_rebind_evtchn_to_cpu()
receive event X			|
				| mask event X
				| bind to vCPU1
<vCPU descheduled>		| unmask event X
				|
				| receive event X
				|
				| handle_edge_irq(X)
handle_edge_irq(X)		|  -> handle_irq_event()
				|   -> set IRQD_IN_PROGRESS
  -> set IRQS_PENDING		|
				|   -> evtchn_interrupt()
				|   -> clear IRQD_IN_PROGRESS
				|  -> IRQS_PENDING is set
				|  -> handle_irq_event()
				|   -> evtchn_interrupt()
				|     -> WARN()
				|

All the lateeoi handlers expect a ONESHOT semantic and 
evtchn_interrupt() doesn't tolerate any deviation.

I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
lateeoi irq acknowledgment") because the interrupt was disabled 
previously. Therefore we wouldn't do another iteration in handle_edge_irq().

Aside from the handlers, I think it may impact the defer EOI mitigation 
because in theory if a 3rd vCPU is joining the party (let say vCPU A 
migrate the event from vCPU B to vCPU C). So info->{eoi_cpu, irq_epoch, 
eoi_time} could possibly get mangled?

For a fix, we may want to consider to hold evtchn_rwlock with the write 
permission. Although, I am not 100% sure this is going to prevent 
everything.

Does my write-up make sense to you?

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-06 18:46 ` [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Julien Grall
@ 2021-02-07 12:58   ` Jürgen Groß
  2021-02-08  9:11     ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-07 12:58 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 06.02.21 19:46, Julien Grall wrote:
> Hi Juergen,
> 
> On 06/02/2021 10:49, Juergen Gross wrote:
>> The first three patches are fixes for XSA-332. They avoid WARN splats
>> and a performance issue with interdomain events.
> 
> Thanks for helping to figure out the problem. Unfortunately, I still see 
> reliably the WARN splat with the latest Linux master (1e0d27fce010) + 
> your first 3 patches.
> 
> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
> events ABI.
> 
> After some debugging, I think I have an idea of what went wrong. The 
> problem happens when the event is initially bound from vCPU0 to a 
> different vCPU.
> 
>  From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
> event to prevent it being delivered on an unexpected vCPU. However, I 
> believe the following can happen:
> 
> vCPU0                    | vCPU1
>                          |
>                          | Call xen_rebind_evtchn_to_cpu()
> receive event X          |
>                          | mask event X
>                          | bind to vCPU1
> <vCPU descheduled>       | unmask event X
>                          |
>                          | receive event X
>                          |
>                          | handle_edge_irq(X)
> handle_edge_irq(X)       |  -> handle_irq_event()
>                          |   -> set IRQD_IN_PROGRESS
>   -> set IRQS_PENDING    |
>                          |   -> evtchn_interrupt()
>                          |   -> clear IRQD_IN_PROGRESS
>                          |  -> IRQS_PENDING is set
>                          |  -> handle_irq_event()
>                          |   -> evtchn_interrupt()
>                          |     -> WARN()
>                          |
> 
> All the lateeoi handlers expect a ONESHOT semantic and 
> evtchn_interrupt() doesn't tolerate any deviation.
> 
> I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
> lateeoi irq acknowledgment") because the interrupt was disabled 
> previously. Therefore we wouldn't do another iteration in 
> handle_edge_irq().

I think you picked the wrong commit for blaming, as this is just
the last patch of the three patches you were testing.

> Aside from the handlers, I think it may impact the defer EOI mitigation 
> because in theory if a 3rd vCPU is joining the party (let say vCPU A 
> migrate the event from vCPU B to vCPU C). So info->{eoi_cpu, irq_epoch, 
> eoi_time} could possibly get mangled?
> 
> For a fix, we may want to consider to hold evtchn_rwlock with the write 
> permission. Although, I am not 100% sure this is going to prevent 
> everything.

It will make things worse, as it would violate the locking hierarchy
(xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).

On a first glance I think we'll need a 3rd masking state ("temporarily
masked") in the second patch in order to avoid a race with lateeoi.

In order to avoid the race you outlined above we need an "event is being
handled" indicator checked via test_and_set() semantics in
handle_irq_for_port() and reset only when calling clear_evtchn().

> Does my write-up make sense to you?

Yes. What about my reply? ;-)


Juergen


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-07 12:58   ` Jürgen Groß
@ 2021-02-08  9:11     ` Julien Grall
  2021-02-08  9:41       ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-08  9:11 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski

Hi Juergen,

On 07/02/2021 12:58, Jürgen Groß wrote:
> On 06.02.21 19:46, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 06/02/2021 10:49, Juergen Gross wrote:
>>> The first three patches are fixes for XSA-332. They avoid WARN splats
>>> and a performance issue with interdomain events.
>>
>> Thanks for helping to figure out the problem. Unfortunately, I still 
>> see reliably the WARN splat with the latest Linux master 
>> (1e0d27fce010) + your first 3 patches.
>>
>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
>> events ABI.
>>
>> After some debugging, I think I have an idea of what went wrong. The 
>> problem happens when the event is initially bound from vCPU0 to a 
>> different vCPU.
>>
>>  From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
>> event to prevent it being delivered on an unexpected vCPU. However, I 
>> believe the following can happen:
>>
>> vCPU0                    | vCPU1
>>                          |
>>                          | Call xen_rebind_evtchn_to_cpu()
>> receive event X          |
>>                          | mask event X
>>                          | bind to vCPU1
>> <vCPU descheduled>       | unmask event X
>>                          |
>>                          | receive event X
>>                          |
>>                          | handle_edge_irq(X)
>> handle_edge_irq(X)       |  -> handle_irq_event()
>>                          |   -> set IRQD_IN_PROGRESS
>>   -> set IRQS_PENDING    |
>>                          |   -> evtchn_interrupt()
>>                          |   -> clear IRQD_IN_PROGRESS
>>                          |  -> IRQS_PENDING is set
>>                          |  -> handle_irq_event()
>>                          |   -> evtchn_interrupt()
>>                          |     -> WARN()
>>                          |
>>
>> All the lateeoi handlers expect a ONESHOT semantic and 
>> evtchn_interrupt() doesn't tolerate any deviation.
>>
>> I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
>> lateeoi irq acknowledgment") because the interrupt was disabled 
>> previously. Therefore we wouldn't do another iteration in 
>> handle_edge_irq().
> 
> I think you picked the wrong commit for blaming, as this is just
> the last patch of the three patches you were testing.

I actually found the right commit for blaming but I copied the 
information from the wrong shell :/. The bug was introduced by:

c44b849cee8c ("xen/events: switch user event channels to lateeoi model")

> 
>>> Aside from the handlers, I think it may impact the defer EOI mitigation 
>> because in theory if a 3rd vCPU is joining the party (let say vCPU A 
>> migrate the event from vCPU B to vCPU C). So info->{eoi_cpu, 
>> irq_epoch, eoi_time} could possibly get mangled?
>>
>> For a fix, we may want to consider to hold evtchn_rwlock with the 
>> write permission. Although, I am not 100% sure this is going to 
>> prevent everything.
> 
> It will make things worse, as it would violate the locking hierarchy
> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).

Ah, right.

> 
> On a first glance I think we'll need a 3rd masking state ("temporarily
> masked") in the second patch in order to avoid a race with lateeoi.
> 
> In order to avoid the race you outlined above we need an "event is being
> handled" indicator checked via test_and_set() semantics in
> handle_irq_for_port() and reset only when calling clear_evtchn().

It feels like we are trying to workaround the IRQ flow we are using 
(i.e. handle_edge_irq()).

This reminds me the thread we had before discovering XSA-332 (see [1]). 
Back then, it was suggested to switch back to handle_fasteoi_irq().

Cheers,

[1] 
https://lore.kernel.org/xen-devel/alpine.DEB.2.21.2004271552430.29217@sstabellini-ThinkPad-T480s/

-- 
Julien Grall


* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-06 10:49 ` [PATCH 6/7] xen/evtch: use smp barriers for user event ring Juergen Gross
@ 2021-02-08  9:38   ` Jan Beulich
  2021-02-08  9:41     ` Jürgen Groß
  2021-02-08  9:44   ` Andrew Cooper
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08  9:38 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Boris Ostrovsky, Stefano Stabellini, Andrew Cooper, linux-kernel,
	xen-devel

On 06.02.2021 11:49, Juergen Gross wrote:
> The ring buffer for user events is used in the local system only, so
> smp barriers are fine for ensuring consistency.
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Albeit I think "local system" is at least ambiguous (physical
machine? VM?). How about something like "is local to the given
kernel instance"?

Jan


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08  9:11     ` Julien Grall
@ 2021-02-08  9:41       ` Jürgen Groß
  2021-02-08  9:54         ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08  9:41 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08.02.21 10:11, Julien Grall wrote:
> Hi Juergen,
> 
> On 07/02/2021 12:58, Jürgen Groß wrote:
>> On 06.02.21 19:46, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>> The first three patches are fixes for XSA-332. They avoid WARN splats
>>>> and a performance issue with interdomain events.
>>>
>>> Thanks for helping to figure out the problem. Unfortunately, I still 
>>> see reliably the WARN splat with the latest Linux master 
>>> (1e0d27fce010) + your first 3 patches.
>>>
>>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
>>> events ABI.
>>>
>>> After some debugging, I think I have an idea of what went wrong. The 
>>> problem happens when the event is initially bound from vCPU0 to a 
>>> different vCPU.
>>>
>>>  From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
>>> event to prevent it being delivered on an unexpected vCPU. However, I 
>>> believe the following can happen:
>>>
>>> vCPU0                    | vCPU1
>>>                          |
>>>                          | Call xen_rebind_evtchn_to_cpu()
>>> receive event X          |
>>>                          | mask event X
>>>                          | bind to vCPU1
>>> <vCPU descheduled>       | unmask event X
>>>                          |
>>>                          | receive event X
>>>                          |
>>>                          | handle_edge_irq(X)
>>> handle_edge_irq(X)       |  -> handle_irq_event()
>>>                          |   -> set IRQD_IN_PROGRESS
>>>   -> set IRQS_PENDING    |
>>>                          |   -> evtchn_interrupt()
>>>                          |   -> clear IRQD_IN_PROGRESS
>>>                          |  -> IRQS_PENDING is set
>>>                          |  -> handle_irq_event()
>>>                          |   -> evtchn_interrupt()
>>>                          |     -> WARN()
>>>                          |
>>>
>>> All the lateeoi handlers expect a ONESHOT semantic and 
>>> evtchn_interrupt() doesn't tolerate any deviation.
>>>
>>> I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
>>> lateeoi irq acknowledgment") because the interrupt was disabled 
>>> previously. Therefore we wouldn't do another iteration in 
>>> handle_edge_irq().
>>
>> I think you picked the wrong commit for blaming, as this is just
>> the last patch of the three patches you were testing.
> 
> I actually found the right commit for blaming but I copied the 
> information from the wrong shell :/. The bug was introduced by:
> 
> c44b849cee8c ("xen/events: switch user event channels to lateeoi model")
> 
>>
>>> Aside from the handlers, I think it may impact the defer EOI mitigation 
>>> because in theory if a 3rd vCPU is joining the party (let say vCPU A 
>>> migrate the event from vCPU B to vCPU C). So info->{eoi_cpu, 
>>> irq_epoch, eoi_time} could possibly get mangled?
>>>
>>> For a fix, we may want to consider to hold evtchn_rwlock with the 
>>> write permission. Although, I am not 100% sure this is going to 
>>> prevent everything.
>>
>> It will make things worse, as it would violate the locking hierarchy
>> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).
> 
> Ah, right.
> 
>>
>> On a first glance I think we'll need a 3rd masking state ("temporarily
>> masked") in the second patch in order to avoid a race with lateeoi.
>>
>> In order to avoid the race you outlined above we need an "event is being
>> handled" indicator checked via test_and_set() semantics in
>> handle_irq_for_port() and reset only when calling clear_evtchn().
> 
> It feels like we are trying to workaround the IRQ flow we are using 
> (i.e. handle_edge_irq()).

I'm not really sure this is the main problem here. According to your
analysis the main problem is occurring when handling the event, not when
handling the IRQ: the event is being received on two vcpus.

Our problem isn't due to the IRQ still being pending, but due to it being
raised again, which should happen for a one shot IRQ the same way.

But maybe I'm misunderstanding your idea.


Juergen


* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08  9:38   ` Jan Beulich
@ 2021-02-08  9:41     ` Jürgen Groß
  0 siblings, 0 replies; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08  9:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, Andrew Cooper, linux-kernel,
	xen-devel



On 08.02.21 10:38, Jan Beulich wrote:
> On 06.02.2021 11:49, Juergen Gross wrote:
>> The ring buffer for user events is used in the local system only, so
>> smp barriers are fine for ensuring consistency.
>>
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Albeit I think "local system" is at least ambiguous (physical
> machine? VM?). How about something like "is local to the given
> kernel instance"?

Yes.


Juergen



* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-06 10:49 ` [PATCH 6/7] xen/evtch: use smp barriers for user event ring Juergen Gross
  2021-02-08  9:38   ` Jan Beulich
@ 2021-02-08  9:44   ` Andrew Cooper
  2021-02-08  9:50     ` Jan Beulich
  1 sibling, 1 reply; 53+ messages in thread
From: Andrew Cooper @ 2021-02-08  9:44 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini

On 06/02/2021 10:49, Juergen Gross wrote:
> The ring buffer for user events is used in the local system only, so
> smp barriers are fine for ensuring consistency.
>
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

These need to be virt_* to not break in UP builds (on non-x86).

~Andrew


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-06 10:49 ` [PATCH 7/7] xen/evtchn: read producer index only once Juergen Gross
@ 2021-02-08  9:48   ` Jan Beulich
  2021-02-08 10:41     ` Jürgen Groß
  2021-02-08 11:40   ` Julien Grall
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08  9:48 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel

On 06.02.2021 11:49, Juergen Gross wrote:
> In evtchn_read() use READ_ONCE() for reading the producer index in
> order to avoid the compiler generating multiple accesses.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  drivers/xen/evtchn.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 421382c73d88..f6b199b597bf 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>  			goto unlock_out;
>  
>  		c = u->ring_cons;
> -		p = u->ring_prod;
> +		p = READ_ONCE(u->ring_prod);
>  		if (c != p)
>  			break;

Why only here and not also in

		rc = wait_event_interruptible(u->evtchn_wait,
					      u->ring_cons != u->ring_prod);

or in evtchn_poll()? I understand it's not needed when
ring_prod_lock is held, but that's not the case in the two
afaics named places. Plus isn't the same then true for
ring_cons and ring_cons_mutex, i.e. aren't the two named
places plus evtchn_interrupt() also in need of READ_ONCE()
for ring_cons?

Jan


* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08  9:44   ` Andrew Cooper
@ 2021-02-08  9:50     ` Jan Beulich
  2021-02-08 10:23       ` Andrew Cooper
  0 siblings, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08  9:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Boris Ostrovsky, Stefano Stabellini, Juergen Gross, linux-kernel,
	xen-devel

On 08.02.2021 10:44, Andrew Cooper wrote:
> On 06/02/2021 10:49, Juergen Gross wrote:
>> The ring buffer for user events is used in the local system only, so
>> smp barriers are fine for ensuring consistency.
>>
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> These need to be virt_* to not break in UP builds (on non-x86).

Initially I thought so, too, but isn't the sole vCPU of such a
VM getting re-scheduled to a different pCPU in the hypervisor
an implied barrier anyway?

Jan


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08  9:41       ` Jürgen Groß
@ 2021-02-08  9:54         ` Julien Grall
  2021-02-08 10:22           ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-08  9:54 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08/02/2021 09:41, Jürgen Groß wrote:
> On 08.02.21 10:11, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 07/02/2021 12:58, Jürgen Groß wrote:
>>> On 06.02.21 19:46, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>>>> The first three patches are fixes for XSA-332. They avoid WARN splats
>>>>> and a performance issue with interdomain events.
>>>>
>>>> Thanks for helping to figure out the problem. Unfortunately, I still 
>>>> reliably see the WARN splat with the latest Linux master 
>>>> (1e0d27fce010) + your first 3 patches.
>>>>
>>>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
>>>> events ABI.
>>>>
>>>> After some debugging, I think I have an idea what went wrong. The 
>>>> problem happens when the event is initially bound from vCPU0 to a 
>>>> different vCPU.
>>>>
>>>>  From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
>>>> event to prevent it being delivered on an unexpected vCPU. However, 
>>>> I believe the following can happen:
>>>>
>>>> vCPU0                     | vCPU1
>>>>                           |
>>>>                           | Call xen_rebind_evtchn_to_cpu()
>>>> receive event X           |
>>>>                           | mask event X
>>>>                           | bind to vCPU1
>>>> <vCPU descheduled>        | unmask event X
>>>>                           |
>>>>                           | receive event X
>>>>                           |
>>>>                           | handle_edge_irq(X)
>>>> handle_edge_irq(X)        |  -> handle_irq_event()
>>>>                           |   -> set IRQD_IN_PROGRESS
>>>>  -> set IRQS_PENDING      |
>>>>                           |   -> evtchn_interrupt()
>>>>                           |   -> clear IRQD_IN_PROGRESS
>>>>                           |  -> IRQS_PENDING is set
>>>>                           |  -> handle_irq_event()
>>>>                           |   -> evtchn_interrupt()
>>>>                           |     -> WARN()
>>>>                           |
>>>>
>>>> All the lateeoi handlers expect a ONESHOT semantic and 
>>>> evtchn_interrupt() doesn't tolerate any deviation.
>>>>
>>>> I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
>>>> lateeoi irq acknowledgment") because the interrupt was disabled 
>>>> previously. Therefore we wouldn't do another iteration in 
>>>> handle_edge_irq().
>>>
>>> I think you picked the wrong commit for blaming, as this is just
>>> the last patch of the three patches you were testing.
>>
>> I actually found the right commit for blaming but I copied the 
>> information from the wrong shell :/. The bug was introduced by:
>>
>> c44b849cee8c ("xen/events: switch user event channels to lateeoi model")
>>
>>>
>>>> Aside from the handlers, I think it may impact the defer EOI mitigation 
>>>> because in theory a 3rd vCPU could join the party (let's say vCPU A 
>>>> migrates the event from vCPU B to vCPU C). So info->{eoi_cpu, 
>>>> irq_epoch, eoi_time} could possibly get mangled?
>>>>
>>>> For a fix, we may want to consider holding evtchn_rwlock with 
>>>> write permission, although I am not 100% sure this is going to 
>>>> prevent everything.
>>>
>>> It will make things worse, as it would violate the locking hierarchy
>>> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).
>>
>> Ah, right.
>>
>>>
>>> On a first glance I think we'll need a 3rd masking state ("temporarily
>>> masked") in the second patch in order to avoid a race with lateeoi.
>>>
>>> In order to avoid the race you outlined above we need an "event is being
>>> handled" indicator checked via test_and_set() semantics in
>>> handle_irq_for_port() and reset only when calling clear_evtchn().
>>
>> It feels like we are trying to work around the IRQ flow we are using 
>> (i.e. handle_edge_irq()).
> 
> I'm not really sure this is the main problem here. According to your
> analysis the main problem is occurring when handling the event, not when
> handling the IRQ: the event is being received on two vcpus.

I don't think we can easily divide the two because we rely on the IRQ 
framework to handle the lifecycle of the event. So...

> 
> Our problem isn't due to the IRQ still being pending, but due to it
> being raised again, which should happen the same way for a one-shot IRQ.

... I don't really see how the difference matters here. The idea is to 
re-use what already exists rather than trying to re-invent the wheel 
with an extra lock (or whatever else we can come up with).

Cheers,

-- 
Julien Grall


* Re: [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending
  2021-02-06 10:49 ` [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending Juergen Gross
@ 2021-02-08 10:06   ` Jan Beulich
  2021-02-08 10:21     ` Jürgen Groß
  2021-02-08 10:15   ` Ross Lagerwall
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08 10:06 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Boris Ostrovsky, Stefano Stabellini, stable, Julien Grall,
	xen-devel, linux-kernel

On 06.02.2021 11:49, Juergen Gross wrote:
> @@ -1798,6 +1818,29 @@ static void mask_ack_dynirq(struct irq_data *data)
>  	ack_dynirq(data);
>  }
>  
> +static void lateeoi_ack_dynirq(struct irq_data *data)
> +{
> +	struct irq_info *info = info_for_irq(data->irq);
> +	evtchn_port_t evtchn = info ? info->evtchn : 0;
> +
> +	if (VALID_EVTCHN(evtchn)) {
> +		info->eoi_pending = true;
> +		mask_evtchn(evtchn);
> +	}
> +}
> +
> +static void lateeoi_mask_ack_dynirq(struct irq_data *data)
> +{
> +	struct irq_info *info = info_for_irq(data->irq);
> +	evtchn_port_t evtchn = info ? info->evtchn : 0;
> +
> +	if (VALID_EVTCHN(evtchn)) {
> +		info->masked = true;
> +		info->eoi_pending = true;
> +		mask_evtchn(evtchn);
> +	}
> +}
> +
>  static int retrigger_dynirq(struct irq_data *data)
>  {
>  	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
> @@ -2023,8 +2066,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = {
>  	.irq_mask		= disable_dynirq,
>  	.irq_unmask		= enable_dynirq,
>  
> -	.irq_ack		= mask_ack_dynirq,
> -	.irq_mask_ack		= mask_ack_dynirq,
> +	.irq_ack		= lateeoi_ack_dynirq,
> +	.irq_mask_ack		= lateeoi_mask_ack_dynirq,
>  
>  	.irq_set_affinity	= set_affinity_irq,
>  	.irq_retrigger		= retrigger_dynirq,
> 

Unlike the prior handler the two new ones don't call ack_dynirq()
anymore, and the description doesn't give a hint towards this
difference. As a consequence, clear_evtchn() also doesn't get
called anymore - patch 3 adds the calls, but claims an older
commit to have been at fault. _If_ ack_dynirq() indeed isn't to
be called here, shouldn't the clear_evtchn() calls get added
right here?

Jan


* Re: [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending
  2021-02-06 10:49 ` [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending Juergen Gross
  2021-02-08 10:06   ` Jan Beulich
@ 2021-02-08 10:15   ` Ross Lagerwall
  1 sibling, 0 replies; 53+ messages in thread
From: Ross Lagerwall @ 2021-02-08 10:15 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, stable, Julien Grall

On 2021-02-06 10:49, Juergen Gross wrote:
> An event channel should be kept masked when an eoi is pending for it.
> When being migrated to another cpu it might be unmasked, though.
> 
> In order to avoid this keep two different flags for each event channel
> to be able to distinguish "normal" masking/unmasking from eoi related
> masking/unmasking. The event channel should only be able to generate
> an interrupt if both flags are cleared.
> 
> Cc: stable@vger.kernel.org
> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>
...> +static void lateeoi_ack_dynirq(struct irq_data *data)
> +{
> +	struct irq_info *info = info_for_irq(data->irq);
> +	evtchn_port_t evtchn = info ? info->evtchn : 0;
> +
> +	if (VALID_EVTCHN(evtchn)) {
> +		info->eoi_pending = true;
> +		mask_evtchn(evtchn);
> +	}
> +}

Doesn't this (and the one below) need a call to clear_evtchn() to
actually ack the pending event? Otherwise I can't see what clears
the pending bit.

I tested out this patch but processes using the userspace evtchn device did
not work very well without the clear_evtchn() call.

Ross

> +
> +static void lateeoi_mask_ack_dynirq(struct irq_data *data)
> +{
> +	struct irq_info *info = info_for_irq(data->irq);
> +	evtchn_port_t evtchn = info ? info->evtchn : 0;
> +
> +	if (VALID_EVTCHN(evtchn)) {
> +		info->masked = true;
> +		info->eoi_pending = true;
> +		mask_evtchn(evtchn);
> +	}
> +}
> +


* Re: [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending
  2021-02-08 10:06   ` Jan Beulich
@ 2021-02-08 10:21     ` Jürgen Groß
  0 siblings, 0 replies; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 10:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, stable, Julien Grall,
	xen-devel, linux-kernel



On 08.02.21 11:06, Jan Beulich wrote:
> On 06.02.2021 11:49, Juergen Gross wrote:
>> @@ -1798,6 +1818,29 @@ static void mask_ack_dynirq(struct irq_data *data)
>>   	ack_dynirq(data);
>>   }
>>   
>> +static void lateeoi_ack_dynirq(struct irq_data *data)
>> +{
>> +	struct irq_info *info = info_for_irq(data->irq);
>> +	evtchn_port_t evtchn = info ? info->evtchn : 0;
>> +
>> +	if (VALID_EVTCHN(evtchn)) {
>> +		info->eoi_pending = true;
>> +		mask_evtchn(evtchn);
>> +	}
>> +}
>> +
>> +static void lateeoi_mask_ack_dynirq(struct irq_data *data)
>> +{
>> +	struct irq_info *info = info_for_irq(data->irq);
>> +	evtchn_port_t evtchn = info ? info->evtchn : 0;
>> +
>> +	if (VALID_EVTCHN(evtchn)) {
>> +		info->masked = true;
>> +		info->eoi_pending = true;
>> +		mask_evtchn(evtchn);
>> +	}
>> +}
>> +
>>   static int retrigger_dynirq(struct irq_data *data)
>>   {
>>   	evtchn_port_t evtchn = evtchn_from_irq(data->irq);
>> @@ -2023,8 +2066,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = {
>>   	.irq_mask		= disable_dynirq,
>>   	.irq_unmask		= enable_dynirq,
>>   
>> -	.irq_ack		= mask_ack_dynirq,
>> -	.irq_mask_ack		= mask_ack_dynirq,
>> +	.irq_ack		= lateeoi_ack_dynirq,
>> +	.irq_mask_ack		= lateeoi_mask_ack_dynirq,
>>   
>>   	.irq_set_affinity	= set_affinity_irq,
>>   	.irq_retrigger		= retrigger_dynirq,
>>
> 
> Unlike the prior handler the two new ones don't call ack_dynirq()
> anymore, and the description doesn't give a hint towards this
> difference. As a consequence, clear_evtchn() also doesn't get
> called anymore - patch 3 adds the calls, but claims an older
> commit to have been at fault. _If_ ack_dynirq() indeed isn't to
> be called here, shouldn't the clear_evtchn() calls get added
> right here?

There was clearly too much time between writing this patch and looking
at its performance impact. :-(

Somehow I managed to overlook that I just introduced the bug here. This
OTOH explains why there are not tons of complaints with the current
implementation. :-)

Will merge patch 3 into this one.


Juergen



* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08  9:54         ` Julien Grall
@ 2021-02-08 10:22           ` Jürgen Groß
  2021-02-08 10:40             ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 10:22 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08.02.21 10:54, Julien Grall wrote:
> 
> 
> On 08/02/2021 09:41, Jürgen Groß wrote:
>> On 08.02.21 10:11, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 07/02/2021 12:58, Jürgen Groß wrote:
>>>> On 06.02.21 19:46, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>>>> The first three patches are fixes for XSA-332. They avoid WARN splats
>>>>>> and a performance issue with interdomain events.
>>>>>
>>>>> Thanks for helping to figure out the problem. Unfortunately, I 
>>>>> still reliably see the WARN splat with the latest Linux master 
>>>>> (1e0d27fce010) + your first 3 patches.
>>>>>
>>>>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
>>>>> events ABI.
>>>>>
>>>>> After some debugging, I think I have an idea what went wrong. The 
>>>>> problem happens when the event is initially bound from vCPU0 to a 
>>>>> different vCPU.
>>>>>
>>>>>  From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
>>>>> event to prevent it being delivered on an unexpected vCPU. However, 
>>>>> I believe the following can happen:
>>>>>
>>>>> vCPU0                     | vCPU1
>>>>>                           |
>>>>>                           | Call xen_rebind_evtchn_to_cpu()
>>>>> receive event X           |
>>>>>                           | mask event X
>>>>>                           | bind to vCPU1
>>>>> <vCPU descheduled>        | unmask event X
>>>>>                           |
>>>>>                           | receive event X
>>>>>                           |
>>>>>                           | handle_edge_irq(X)
>>>>> handle_edge_irq(X)        |  -> handle_irq_event()
>>>>>                           |   -> set IRQD_IN_PROGRESS
>>>>>  -> set IRQS_PENDING      |
>>>>>                           |   -> evtchn_interrupt()
>>>>>                           |   -> clear IRQD_IN_PROGRESS
>>>>>                           |  -> IRQS_PENDING is set
>>>>>                           |  -> handle_irq_event()
>>>>>                           |   -> evtchn_interrupt()
>>>>>                           |     -> WARN()
>>>>>                           |
>>>>>
>>>>> All the lateeoi handlers expect a ONESHOT semantic and 
>>>>> evtchn_interrupt() doesn't tolerate any deviation.
>>>>>
>>>>> I think the problem was introduced by 7f874a0447a9 ("xen/events: 
>>>>> fix lateeoi irq acknowledgment") because the interrupt was disabled 
>>>>> previously. Therefore we wouldn't do another iteration in 
>>>>> handle_edge_irq().
>>>>
>>>> I think you picked the wrong commit for blaming, as this is just
>>>> the last patch of the three patches you were testing.
>>>
>>> I actually found the right commit for blaming but I copied the 
>>> information from the wrong shell :/. The bug was introduced by:
>>>
>>> c44b849cee8c ("xen/events: switch user event channels to lateeoi model")
>>>
>>>>
>>>>> Aside from the handlers, I think it may impact the defer EOI mitigation 
>>>>> because in theory a 3rd vCPU could join the party (let's say vCPU 
>>>>> A migrates the event from vCPU B to vCPU C). So info->{eoi_cpu, 
>>>>> irq_epoch, eoi_time} could possibly get mangled?
>>>>>
>>>>> For a fix, we may want to consider holding evtchn_rwlock with 
>>>>> write permission, although I am not 100% sure this is going to 
>>>>> prevent everything.
>>>>
>>>> It will make things worse, as it would violate the locking hierarchy
>>>> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).
>>>
>>> Ah, right.
>>>
>>>>
>>>> On a first glance I think we'll need a 3rd masking state ("temporarily
>>>> masked") in the second patch in order to avoid a race with lateeoi.
>>>>
>>>> In order to avoid the race you outlined above we need an "event is 
>>>> being
>>>> handled" indicator checked via test_and_set() semantics in
>>>> handle_irq_for_port() and reset only when calling clear_evtchn().
>>>
>>> It feels like we are trying to work around the IRQ flow we are using 
>>> (i.e. handle_edge_irq()).
>>
>> I'm not really sure this is the main problem here. According to your
>> analysis the main problem is occurring when handling the event, not when
>> handling the IRQ: the event is being received on two vcpus.
> 
> I don't think we can easily divide the two because we rely on the IRQ 
> framework to handle the lifecycle of the event. So...
> 
>>
>> Our problem isn't due to the IRQ still being pending, but due to it
>> being raised again, which should happen the same way for a one-shot IRQ.
> 
> ... I don't really see how the difference matters here. The idea is to 
> re-use what already exists rather than trying to re-invent the wheel 
> with an extra lock (or whatever else we can come up with).

The difference is that the race is occurring _before_ any IRQ is
involved. So I don't see how modification of IRQ handling would help.


Juergen



* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08  9:50     ` Jan Beulich
@ 2021-02-08 10:23       ` Andrew Cooper
  2021-02-08 10:25         ` Jürgen Groß
  2021-02-08 10:36         ` Jan Beulich
  0 siblings, 2 replies; 53+ messages in thread
From: Andrew Cooper @ 2021-02-08 10:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, Juergen Gross, linux-kernel,
	xen-devel

On 08/02/2021 09:50, Jan Beulich wrote:
> On 08.02.2021 10:44, Andrew Cooper wrote:
>> On 06/02/2021 10:49, Juergen Gross wrote:
>>> The ring buffer for user events is used in the local system only, so
>>> smp barriers are fine for ensuring consistency.
>>>
>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> These need to be virt_* to not break in UP builds (on non-x86).
> Initially I thought so, too, but isn't the sole vCPU of such a
> VM getting re-scheduled to a different pCPU in the hypervisor
> an implied barrier anyway?

Yes, but that isn't relevant to why UP builds break.

smp_*() degrade to compiler barriers in UP builds, and while that's
mostly fine for x86 read/write, it's not fine for ARM barriers.

virt_*() exist specifically to be smp_*() which don't degrade to broken
in UP builds.

~Andrew


* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08 10:23       ` Andrew Cooper
@ 2021-02-08 10:25         ` Jürgen Groß
  2021-02-08 10:31           ` Andrew Cooper
  2021-02-08 10:36         ` Jan Beulich
  1 sibling, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 10:25 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel



On 08.02.21 11:23, Andrew Cooper wrote:
> On 08/02/2021 09:50, Jan Beulich wrote:
>> On 08.02.2021 10:44, Andrew Cooper wrote:
>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>> The ring buffer for user events is used in the local system only, so
>>>> smp barriers are fine for ensuring consistency.
>>>>
>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> These need to be virt_* to not break in UP builds (on non-x86).
>> Initially I thought so, too, but isn't the sole vCPU of such a
>> VM getting re-scheduled to a different pCPU in the hypervisor
>> an implied barrier anyway?
> 
> Yes, but that isn't relevant to why UP builds break.
> 
> smp_*() degrade to compiler barriers in UP builds, and while that's
> mostly fine for x86 read/write, it's not fine for ARM barriers.
> 
> virt_*() exist specifically to be smp_*() which don't degrade to broken
> in UP builds.

But the barrier is really only necessary to serialize accesses within
the guest against each other. There is no party outside the guest involved.

If you were right, this would mean that all UP guests on Arm are
broken.


Juergen



* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08 10:25         ` Jürgen Groß
@ 2021-02-08 10:31           ` Andrew Cooper
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Cooper @ 2021-02-08 10:31 UTC (permalink / raw)
  To: Jürgen Groß, Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel

On 08/02/2021 10:25, Jürgen Groß wrote:
> On 08.02.21 11:23, Andrew Cooper wrote:
>> On 08/02/2021 09:50, Jan Beulich wrote:
>>> On 08.02.2021 10:44, Andrew Cooper wrote:
>>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>>> The ring buffer for user events is used in the local system only, so
>>>>> smp barriers are fine for ensuring consistency.
>>>>>
>>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> These need to be virt_* to not break in UP builds (on non-x86).
>>> Initially I thought so, too, but isn't the sole vCPU of such a
>>> VM getting re-scheduled to a different pCPU in the hypervisor
>>> an implied barrier anyway?
>>
>> Yes, but that isn't relevant to why UP builds break.
>>
>> smp_*() degrade to compiler barriers in UP builds, and while that's
>> mostly fine for x86 read/write, it's not fine for ARM barriers.
>>
>> virt_*() exist specifically to be smp_*() which don't degrade to broken
>> in UP builds.
>
> But the barrier is really only necessary to serialize accesses within
> the guest against each other. There is no party outside the guest involved.
>
> If you were right, this would mean that all UP guests on Arm are
> broken.

Oh - right.  This is a ring between the interrupt handler and a task. 
Not a ring between the guest and something else.

In which case smp_*() are correct.  Sorry for the noise.

~Andrew


* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08 10:23       ` Andrew Cooper
  2021-02-08 10:25         ` Jürgen Groß
@ 2021-02-08 10:36         ` Jan Beulich
  2021-02-08 10:45           ` Andrew Cooper
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08 10:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Boris Ostrovsky, Stefano Stabellini, Juergen Gross, linux-kernel,
	xen-devel

On 08.02.2021 11:23, Andrew Cooper wrote:
> On 08/02/2021 09:50, Jan Beulich wrote:
>> On 08.02.2021 10:44, Andrew Cooper wrote:
>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>> The ring buffer for user events is used in the local system only, so
>>>> smp barriers are fine for ensuring consistency.
>>>>
>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> These need to be virt_* to not break in UP builds (on non-x86).
>> Initially I thought so, too, but isn't the sole vCPU of such a
>> VM getting re-scheduled to a different pCPU in the hypervisor
>> an implied barrier anyway?
> 
> Yes, but that isn't relevant to why UP builds break.
> 
> smp_*() degrade to compiler barriers in UP builds, and while that's
> mostly fine for x86 read/write, it's not fine for ARM barriers.

Hmm, I may not know enough of Arm's memory model - are you saying
Arm CPUs aren't even self-coherent, i.e. later operations (e.g.
the consuming of ring contents) won't observe earlier ones (the
updating of ring contents) when only a single physical CPU is
involved in all of this? (I did mention the hypervisor level
context switch simply because that's the only way multiple CPUs
can get involved.)

Jan


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 10:22           ` Jürgen Groß
@ 2021-02-08 10:40             ` Julien Grall
  2021-02-08 12:14               ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-08 10:40 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski

Hi Juergen,

On 08/02/2021 10:22, Jürgen Groß wrote:
> On 08.02.21 10:54, Julien Grall wrote:
>> ... I don't really see how the difference matters here. The idea is to 
>> re-use what already exists rather than trying to re-invent the 
>> wheel with an extra lock (or whatever else we can come up with).
> 
> The difference is that the race is occurring _before_ any IRQ is
> involved. So I don't see how modification of IRQ handling would help.

Roughly, our current IRQ handling flow (handle_edge_irq()) looks like:

if ( irq in progress )
{
   set IRQS_PENDING
   return;
}

do
{
   clear IRQS_PENDING
   handle_irq()
} while (IRQS_PENDING is set)

IRQ handling flow like handle_fasteoi_irq() looks like:

if ( irq in progress )
   return;

handle_irq()

The latter flow would catch "spurious" interrupts and ignore them, so it 
would nicely handle the race when changing the event affinity.

Cheers,

-- 
Julien Grall


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08  9:48   ` Jan Beulich
@ 2021-02-08 10:41     ` Jürgen Groß
  2021-02-08 10:51       ` Jan Beulich
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 10:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel



On 08.02.21 10:48, Jan Beulich wrote:
> On 06.02.2021 11:49, Juergen Gross wrote:
>> In evtchn_read() use READ_ONCE() for reading the producer index in
>> order to avoid the compiler generating multiple accesses.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   drivers/xen/evtchn.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>> index 421382c73d88..f6b199b597bf 100644
>> --- a/drivers/xen/evtchn.c
>> +++ b/drivers/xen/evtchn.c
>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>   			goto unlock_out;
>>   
>>   		c = u->ring_cons;
>> -		p = u->ring_prod;
>> +		p = READ_ONCE(u->ring_prod);
>>   		if (c != p)
>>   			break;
> 
> Why only here and not also in
> 
> 		rc = wait_event_interruptible(u->evtchn_wait,
> 					      u->ring_cons != u->ring_prod);
> 
> or in evtchn_poll()? I understand it's not needed when
> ring_prod_lock is held, but that's not the case in the two
> afaics named places. Plus isn't the same then true for
> ring_cons and ring_cons_mutex, i.e. aren't the two named
> places plus evtchn_interrupt() also in need of READ_ONCE()
> for ring_cons?

The problem solved here is that the further processing uses "p" multiple
times. "p" must not be silently replaced with u->ring_prod by the
compiler, so I should probably reword the commit message to say:

... in order to not allow the compiler to refetch p.


Juergen



* Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring
  2021-02-08 10:36         ` Jan Beulich
@ 2021-02-08 10:45           ` Andrew Cooper
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Cooper @ 2021-02-08 10:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, Juergen Gross, linux-kernel,
	xen-devel

On 08/02/2021 10:36, Jan Beulich wrote:
> On 08.02.2021 11:23, Andrew Cooper wrote:
>> On 08/02/2021 09:50, Jan Beulich wrote:
>>> On 08.02.2021 10:44, Andrew Cooper wrote:
>>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>>> The ring buffer for user events is used in the local system only, so
>>>>> smp barriers are fine for ensuring consistency.
>>>>>
>>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> These need to be virt_* to not break in UP builds (on non-x86).
>>> Initially I thought so, too, but isn't the sole vCPU of such a
>>> VM getting re-scheduled to a different pCPU in the hypervisor
>>> an implied barrier anyway?
>> Yes, but that isn't relevant to why UP builds break.
>>
>> smp_*() degrade to compiler barriers in UP builds, and while that's
>> mostly fine for x86 read/write, it's not fine for ARM barriers.
> Hmm, I may not know enough of Arm's memory model - are you saying
> Arm CPUs aren't even self-coherent, i.e. later operations (e.g.
> the consuming of ring contents) won't observe earlier ones (the
> updating of ring contents) when only a single physical CPU is
> involved in all of this? (I did mention the hypervisor level
> context switch simply because that's the only way multiple CPUs
> can get involved.)

In this case, no - see my later reply.  I'd mistaken the two ends of
this ring.  As they're both inside the same guest, it's fine.

For cases such as the xenstore/console ring, the semantics required
really are SMP, even if the guest is built UP.  These cases really will
break when smp_rmb() etc degrade to just a compiler barrier on
architectures with weaker semantics than x86.

~Andrew


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 10:41     ` Jürgen Groß
@ 2021-02-08 10:51       ` Jan Beulich
  2021-02-08 10:59         ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08 10:51 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel

On 08.02.2021 11:41, Jürgen Groß wrote:
> On 08.02.21 10:48, Jan Beulich wrote:
>> On 06.02.2021 11:49, Juergen Gross wrote:
>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>> order to avoid the compiler generating multiple accesses.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   drivers/xen/evtchn.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>> index 421382c73d88..f6b199b597bf 100644
>>> --- a/drivers/xen/evtchn.c
>>> +++ b/drivers/xen/evtchn.c
>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>>   			goto unlock_out;
>>>   
>>>   		c = u->ring_cons;
>>> -		p = u->ring_prod;
>>> +		p = READ_ONCE(u->ring_prod);
>>>   		if (c != p)
>>>   			break;
>>
>> Why only here and not also in
>>
>> 		rc = wait_event_interruptible(u->evtchn_wait,
>> 					      u->ring_cons != u->ring_prod);
>>
>> or in evtchn_poll()? I understand it's not needed when
>> ring_prod_lock is held, but that's not the case in the two
>> afaics named places. Plus isn't the same then true for
>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>> places plus evtchn_interrupt() also in need of READ_ONCE()
>> for ring_cons?
> 
> The problem solved here is the further processing using "p" multiple
> times. p must not be silently replaced with u->ring_prod by the
> compiler, so I probably should reword the commit message to say:
> 
> ... in order to not allow the compiler to refetch p.

I still wouldn't understand the change (and the lack of
further changes) then: The first further use of p is
outside the loop, alongside one of c. IOW why would c
then not need treating the same as p?

I also still don't see the difference between latching a
value into a local variable vs a "freestanding" access -
neither are guaranteed to result in exactly one memory
access afaict.

And of course there's also our beloved topic of access
tearing here: READ_ONCE() also excludes that (at least as
per its intentions aiui).

Jan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 10:51       ` Jan Beulich
@ 2021-02-08 10:59         ` Jürgen Groß
  2021-02-08 11:50           ` Julien Grall
  2021-02-08 11:54           ` Jan Beulich
  0 siblings, 2 replies; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 10:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel



On 08.02.21 11:51, Jan Beulich wrote:
> On 08.02.2021 11:41, Jürgen Groß wrote:
>> On 08.02.21 10:48, Jan Beulich wrote:
>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>> order to avoid the compiler generating multiple accesses.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>    drivers/xen/evtchn.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>> index 421382c73d88..f6b199b597bf 100644
>>>> --- a/drivers/xen/evtchn.c
>>>> +++ b/drivers/xen/evtchn.c
>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>>>    			goto unlock_out;
>>>>    
>>>>    		c = u->ring_cons;
>>>> -		p = u->ring_prod;
>>>> +		p = READ_ONCE(u->ring_prod);
>>>>    		if (c != p)
>>>>    			break;
>>>
>>> Why only here and not also in
>>>
>>> 		rc = wait_event_interruptible(u->evtchn_wait,
>>> 					      u->ring_cons != u->ring_prod);
>>>
>>> or in evtchn_poll()? I understand it's not needed when
>>> ring_prod_lock is held, but that's not the case in the two
>>> afaics named places. Plus isn't the same then true for
>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>> for ring_cons?
>>
>> The problem solved here is the further processing using "p" multiple
>> times. p must not be silently replaced with u->ring_prod by the
>> compiler, so I probably should reword the commit message to say:
>>
>> ... in order to not allow the compiler to refetch p.
> 
> I still wouldn't understand the change (and the lack of
> further changes) then: The first further use of p is
> outside the loop, alongside one of c. IOW why would c
> then not need treating the same as p?

Its value wouldn't change, as ring_cons is being modified only at
the bottom of this function, and nowhere else (apart from the reset
case, but this can't run concurrently due to ring_cons_mutex).

> I also still don't see the difference between latching a
> value into a local variable vs a "freestanding" access -
> neither are guaranteed to result in exactly one memory
> access afaict.

READ_ONCE() is using a pointer to volatile, so any refetching by
the compiler would be a bug.

> And of course there's also our beloved topic of access
> tearing here: READ_ONCE() also excludes that (at least as
> per its intentions aiui).

Yes, but I don't see an urgent need to fix that, as there would
be thousands of accesses in the kernel needing a fix. A compiler
tearing a naturally aligned access into multiple memory accesses
would be rejected as buggy from the kernel community IMO.


Juergen



* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-06 10:49 ` [PATCH 7/7] xen/evtchn: read producer index only once Juergen Gross
  2021-02-08  9:48   ` Jan Beulich
@ 2021-02-08 11:40   ` Julien Grall
  2021-02-08 11:48     ` Jürgen Groß
  1 sibling, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-08 11:40 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini



On 06/02/2021 10:49, Juergen Gross wrote:
> In evtchn_read() use READ_ONCE() for reading the producer index in
> order to avoid the compiler generating multiple accesses.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   drivers/xen/evtchn.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 421382c73d88..f6b199b597bf 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>   			goto unlock_out;
>   
>   		c = u->ring_cons;
> -		p = u->ring_prod;
> +		p = READ_ONCE(u->ring_prod);
For consistency, don't you also need the write side in 
evtchn_interrupt() to use WRITE_ONCE()?

>   		if (c != p)
>   			break;
>   
> 

-- 
Julien Grall


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 11:40   ` Julien Grall
@ 2021-02-08 11:48     ` Jürgen Groß
  2021-02-08 12:03       ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 11:48 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel; +Cc: Boris Ostrovsky, Stefano Stabellini



On 08.02.21 12:40, Julien Grall wrote:
> 
> 
> On 06/02/2021 10:49, Juergen Gross wrote:
>> In evtchn_read() use READ_ONCE() for reading the producer index in
>> order to avoid the compiler generating multiple accesses.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   drivers/xen/evtchn.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>> index 421382c73d88..f6b199b597bf 100644
>> --- a/drivers/xen/evtchn.c
>> +++ b/drivers/xen/evtchn.c
>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char 
>> __user *buf,
>>               goto unlock_out;
>>           c = u->ring_cons;
>> -        p = u->ring_prod;
>> +        p = READ_ONCE(u->ring_prod);
> For consistency, don't you also need the write side in 
> evtchn_interrupt() to use WRITE_ONCE()?

Only in case I'd consider the compiler needing multiple memory
accesses for doing the update (see my reply to Jan's comment on this
patch).


Juergen



* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 10:59         ` Jürgen Groß
@ 2021-02-08 11:50           ` Julien Grall
  2021-02-08 11:54           ` Jan Beulich
  1 sibling, 0 replies; 53+ messages in thread
From: Julien Grall @ 2021-02-08 11:50 UTC (permalink / raw)
  To: Jürgen Groß, Jan Beulich
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel



On 08/02/2021 10:59, Jürgen Groß wrote:
> On 08.02.21 11:51, Jan Beulich wrote:
> Yes, but I don't see an urgent need to fix that, as there would
> be thousands of accesses in the kernel needing a fix. A compiler
> tearing a naturally aligned access into multiple memory accesses
> would be rejected as buggy from the kernel community IMO.

I would not be so sure. From lwn [1]:

"In the Linux kernel, tearing of plain C-language loads has been 
observed even given properly aligned and machine-word-sized loads.)"

And for store tearing:

"Note that this tearing can happen even on properly aligned and 
machine-word-sized accesses, and in this particular case, even for 
volatile stores. Some might argue that this behavior constitutes a bug 
in the compiler, but either way it illustrates the perceived value of 
store tearing from a compiler-writer viewpoint. [...] But for properly 
aligned machine-sized stores, WRITE_ONCE() will prevent store tearing."

Cheers,

[1] https://lwn.net/Articles/793253/#Load%20Tearing

> 
> 
> Juergen

-- 
Julien Grall


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 10:59         ` Jürgen Groß
  2021-02-08 11:50           ` Julien Grall
@ 2021-02-08 11:54           ` Jan Beulich
  2021-02-08 12:15             ` Jürgen Groß
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08 11:54 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel

On 08.02.2021 11:59, Jürgen Groß wrote:
> On 08.02.21 11:51, Jan Beulich wrote:
>> On 08.02.2021 11:41, Jürgen Groß wrote:
>>> On 08.02.21 10:48, Jan Beulich wrote:
>>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>>> order to avoid the compiler generating multiple accesses.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>> ---
>>>>>    drivers/xen/evtchn.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>>> index 421382c73d88..f6b199b597bf 100644
>>>>> --- a/drivers/xen/evtchn.c
>>>>> +++ b/drivers/xen/evtchn.c
>>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>>>>    			goto unlock_out;
>>>>>    
>>>>>    		c = u->ring_cons;
>>>>> -		p = u->ring_prod;
>>>>> +		p = READ_ONCE(u->ring_prod);
>>>>>    		if (c != p)
>>>>>    			break;
>>>>
>>>> Why only here and not also in
>>>>
>>>> 		rc = wait_event_interruptible(u->evtchn_wait,
>>>> 					      u->ring_cons != u->ring_prod);
>>>>
>>>> or in evtchn_poll()? I understand it's not needed when
>>>> ring_prod_lock is held, but that's not the case in the two
>>>> afaics named places. Plus isn't the same then true for
>>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>>> for ring_cons?
>>>
>>> The problem solved here is the further processing using "p" multiple
>>> times. p must not be silently replaced with u->ring_prod by the
>>> compiler, so I probably should reword the commit message to say:
>>>
>>> ... in order to not allow the compiler to refetch p.
>>
>> I still wouldn't understand the change (and the lack of
>> further changes) then: The first further use of p is
>> outside the loop, alongside one of c. IOW why would c
>> then not need treating the same as p?
> 
> Its value wouldn't change, as ring_cons is being modified only at
> the bottom of this function, and nowhere else (apart from the reset
> case, but this can't run concurrently due to ring_cons_mutex).
> 
>> I also still don't see the difference between latching a
>> value into a local variable vs a "freestanding" access -
>> neither are guaranteed to result in exactly one memory
>> access afaict.
> 
> READ_ONCE() is using a pointer to volatile, so any refetching by
> the compiler would be a bug.

Of course, but this wasn't my point. I was contrasting

		c = u->ring_cons;
		p = u->ring_prod;

which you change with

		rc = wait_event_interruptible(u->evtchn_wait,
					      u->ring_cons != u->ring_prod);

which you leave alone.

Jan


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 11:48     ` Jürgen Groß
@ 2021-02-08 12:03       ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2021-02-08 12:03 UTC (permalink / raw)
  To: Jürgen Groß, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini

Hi Juergen,

On 08/02/2021 11:48, Jürgen Groß wrote:
> On 08.02.21 12:40, Julien Grall wrote:
>>
>>
>> On 06/02/2021 10:49, Juergen Gross wrote:
>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>> order to avoid the compiler generating multiple accesses.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   drivers/xen/evtchn.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>> index 421382c73d88..f6b199b597bf 100644
>>> --- a/drivers/xen/evtchn.c
>>> +++ b/drivers/xen/evtchn.c
>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, 
>>> char __user *buf,
>>>               goto unlock_out;
>>>           c = u->ring_cons;
>>> -        p = u->ring_prod;
>>> +        p = READ_ONCE(u->ring_prod);
>> For consistency, don't you also need the write side in 
>> evtchn_interrupt() to use WRITE_ONCE()?
> 
> Only in case I'd consider the compiler needing multiple memory
> accesses for doing the update (see my reply to Jan's comment on this
> patch).

Right, I have just answered there :). AFAICT, without using 
WRITE_ONCE()/READ_ONCE() there is no guarantee that load/store tearing 
will not happen.

We can continue the conversation there.

Cheers,

> 
> Juergen

-- 
Julien Grall


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 10:40             ` Julien Grall
@ 2021-02-08 12:14               ` Jürgen Groß
  2021-02-08 12:16                 ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 12:14 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08.02.21 11:40, Julien Grall wrote:
> Hi Juergen,
> 
> On 08/02/2021 10:22, Jürgen Groß wrote:
>> On 08.02.21 10:54, Julien Grall wrote:
>>> ... I don't really see how the difference matter here. The idea is to 
>>> re-use what's already existing rather than trying to re-invent the 
>>> wheel with an extra lock (or whatever we can come up).
>>
>> The difference is that the race is occurring _before_ any IRQ is
>> involved. So I don't see how modification of IRQ handling would help.
> 
> Roughly our current IRQ handling flow (handle_eoi_irq()) looks like:
> 
> if ( irq in progress )
> {
>    set IRQS_PENDING
>    return;
> }
> 
> do
> {
>    clear IRQS_PENDING
>    handle_irq()
> } while (IRQS_PENDING is set)
> 
> IRQ handling flow like handle_fasteoi_irq() looks like:
> 
> if ( irq in progress )
>    return;
> 
> handle_irq()
> 
> The latter flow would catch "spurious" interrupt and ignore them. So it 
> would handle nicely the race when changing the event affinity.

Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
issued, thus having the same problem again? And I think we want to keep
the lateeoi behavior in order to be able to control event storms.


Juergen



* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 11:54           ` Jan Beulich
@ 2021-02-08 12:15             ` Jürgen Groß
  2021-02-08 12:23               ` Jan Beulich
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 12:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel



On 08.02.21 12:54, Jan Beulich wrote:
> On 08.02.2021 11:59, Jürgen Groß wrote:
>> On 08.02.21 11:51, Jan Beulich wrote:
>>> On 08.02.2021 11:41, Jürgen Groß wrote:
>>>> On 08.02.21 10:48, Jan Beulich wrote:
>>>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>>>> order to avoid the compiler generating multiple accesses.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>> ---
>>>>>>     drivers/xen/evtchn.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>>>> index 421382c73d88..f6b199b597bf 100644
>>>>>> --- a/drivers/xen/evtchn.c
>>>>>> +++ b/drivers/xen/evtchn.c
>>>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>>>>>     			goto unlock_out;
>>>>>>     
>>>>>>     		c = u->ring_cons;
>>>>>> -		p = u->ring_prod;
>>>>>> +		p = READ_ONCE(u->ring_prod);
>>>>>>     		if (c != p)
>>>>>>     			break;
>>>>>
>>>>> Why only here and not also in
>>>>>
>>>>> 		rc = wait_event_interruptible(u->evtchn_wait,
>>>>> 					      u->ring_cons != u->ring_prod);
>>>>>
>>>>> or in evtchn_poll()? I understand it's not needed when
>>>>> ring_prod_lock is held, but that's not the case in the two
>>>>> afaics named places. Plus isn't the same then true for
>>>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>>>> for ring_cons?
>>>>
>>>> The problem solved here is the further processing using "p" multiple
>>>> times. p must not be silently replaced with u->ring_prod by the
>>>> compiler, so I probably should reword the commit message to say:
>>>>
>>>> ... in order to not allow the compiler to refetch p.
>>>
>>> I still wouldn't understand the change (and the lack of
>>> further changes) then: The first further use of p is
>>> outside the loop, alongside one of c. IOW why would c
>>> then not need treating the same as p?
>>
>> Its value wouldn't change, as ring_cons is being modified only at
>> the bottom of this function, and nowhere else (apart from the reset
>> case, but this can't run concurrently due to ring_cons_mutex).
>>
>>> I also still don't see the difference between latching a
>>> value into a local variable vs a "freestanding" access -
>>> neither are guaranteed to result in exactly one memory
>>> access afaict.
>>
>> READ_ONCE() is using a pointer to volatile, so any refetching by
>> the compiler would be a bug.
> 
> Of course, but this wasn't my point. I was contrasting
> 
> 		c = u->ring_cons;
> 		p = u->ring_prod;
> 
> which you change with
> 
> 		rc = wait_event_interruptible(u->evtchn_wait,
> 					      u->ring_cons != u->ring_prod);
> 
> which you leave alone.

Can you point out which problem might arise from that?


Juergen



* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 12:14               ` Jürgen Groß
@ 2021-02-08 12:16                 ` Julien Grall
  2021-02-08 12:31                   ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-08 12:16 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08/02/2021 12:14, Jürgen Groß wrote:
> On 08.02.21 11:40, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>> On 08.02.21 10:54, Julien Grall wrote:
>>>> ... I don't really see how the difference matter here. The idea is 
>>>> to re-use what's already existing rather than trying to re-invent 
>>>> the wheel with an extra lock (or whatever we can come up).
>>>
>>> The difference is that the race is occurring _before_ any IRQ is
>>> involved. So I don't see how modification of IRQ handling would help.
>>
>> Roughly our current IRQ handling flow (handle_eoi_irq()) looks like:
>>
>> if ( irq in progress )
>> {
>>    set IRQS_PENDING
>>    return;
>> }
>>
>> do
>> {
>>    clear IRQS_PENDING
>>    handle_irq()
>> } while (IRQS_PENDING is set)
>>
>> IRQ handling flow like handle_fasteoi_irq() looks like:
>>
>> if ( irq in progress )
>>    return;
>>
>> handle_irq()
>>
>> The latter flow would catch "spurious" interrupt and ignore them. So 
>> it would handle nicely the race when changing the event affinity.
> 
> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
> issued, thus having the same problem again? 

Sorry I can't parse this.

> And I think we want to keep
> the lateeoi behavior in order to be able to control event storms.

I didn't (yet) suggest to remove lateeoi. I only suggest to use a 
different workflow to handle the race with vCPU affinity.

Cheers,

-- 
Julien Grall


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 12:15             ` Jürgen Groß
@ 2021-02-08 12:23               ` Jan Beulich
  2021-02-08 12:26                 ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2021-02-08 12:23 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel

On 08.02.2021 13:15, Jürgen Groß wrote:
> On 08.02.21 12:54, Jan Beulich wrote:
>> On 08.02.2021 11:59, Jürgen Groß wrote:
>>> On 08.02.21 11:51, Jan Beulich wrote:
>>>> On 08.02.2021 11:41, Jürgen Groß wrote:
>>>>> On 08.02.21 10:48, Jan Beulich wrote:
>>>>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>>>>> order to avoid the compiler generating multiple accesses.
>>>>>>>
>>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>>> ---
>>>>>>>     drivers/xen/evtchn.c | 2 +-
>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>>>>> index 421382c73d88..f6b199b597bf 100644
>>>>>>> --- a/drivers/xen/evtchn.c
>>>>>>> +++ b/drivers/xen/evtchn.c
>>>>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>>>>>>     			goto unlock_out;
>>>>>>>     
>>>>>>>     		c = u->ring_cons;
>>>>>>> -		p = u->ring_prod;
>>>>>>> +		p = READ_ONCE(u->ring_prod);
>>>>>>>     		if (c != p)
>>>>>>>     			break;
>>>>>>
>>>>>> Why only here and not also in
>>>>>>
>>>>>> 		rc = wait_event_interruptible(u->evtchn_wait,
>>>>>> 					      u->ring_cons != u->ring_prod);
>>>>>>
>>>>>> or in evtchn_poll()? I understand it's not needed when
>>>>>> ring_prod_lock is held, but that's not the case in the two
>>>>>> afaics named places. Plus isn't the same then true for
>>>>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>>>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>>>>> for ring_cons?
>>>>>
>>>>> The problem solved here is the further processing using "p" multiple
>>>>> times. p must not be silently replaced with u->ring_prod by the
>>>>> compiler, so I probably should reword the commit message to say:
>>>>>
>>>>> ... in order to not allow the compiler to refetch p.
>>>>
>>>> I still wouldn't understand the change (and the lack of
>>>> further changes) then: The first further use of p is
>>>> outside the loop, alongside one of c. IOW why would c
>>>> then not need treating the same as p?
>>>
>>> Its value wouldn't change, as ring_cons is being modified only at
>>> the bottom of this function, and nowhere else (apart from the reset
>>> case, but this can't run concurrently due to ring_cons_mutex).
>>>
>>>> I also still don't see the difference between latching a
>>>> value into a local variable vs a "freestanding" access -
>>>> neither are guaranteed to result in exactly one memory
>>>> access afaict.
>>>
>>> READ_ONCE() is using a pointer to volatile, so any refetching by
>>> the compiler would be a bug.
>>
>> Of course, but this wasn't my point. I was contrasting
>>
>> 		c = u->ring_cons;
>> 		p = u->ring_prod;
>>
>> which you change with
>>
>> 		rc = wait_event_interruptible(u->evtchn_wait,
>> 					      u->ring_cons != u->ring_prod);
>>
>> which you leave alone.
> 
> Can you point out which problem might arise from that?

Not any particular active one. Yet enhancing some accesses
but not others seems to me like a recipe for new problems
down the road.

Jan


* Re: [PATCH 7/7] xen/evtchn: read producer index only once
  2021-02-08 12:23               ` Jan Beulich
@ 2021-02-08 12:26                 ` Jürgen Groß
  0 siblings, 0 replies; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 12:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Boris Ostrovsky, Stefano Stabellini, linux-kernel, xen-devel



On 08.02.21 13:23, Jan Beulich wrote:
> On 08.02.2021 13:15, Jürgen Groß wrote:
>> On 08.02.21 12:54, Jan Beulich wrote:
>>> On 08.02.2021 11:59, Jürgen Groß wrote:
>>>> On 08.02.21 11:51, Jan Beulich wrote:
>>>>> On 08.02.2021 11:41, Jürgen Groß wrote:
>>>>>> On 08.02.21 10:48, Jan Beulich wrote:
>>>>>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>>>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>>>>>> order to avoid the compiler generating multiple accesses.
>>>>>>>>
>>>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>>>> ---
>>>>>>>>      drivers/xen/evtchn.c | 2 +-
>>>>>>>>      1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>>>>>> index 421382c73d88..f6b199b597bf 100644
>>>>>>>> --- a/drivers/xen/evtchn.c
>>>>>>>> +++ b/drivers/xen/evtchn.c
>>>>>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user *buf,
>>>>>>>>      			goto unlock_out;
>>>>>>>>      
>>>>>>>>      		c = u->ring_cons;
>>>>>>>> -		p = u->ring_prod;
>>>>>>>> +		p = READ_ONCE(u->ring_prod);
>>>>>>>>      		if (c != p)
>>>>>>>>      			break;
>>>>>>>
>>>>>>> Why only here and not also in
>>>>>>>
>>>>>>> 		rc = wait_event_interruptible(u->evtchn_wait,
>>>>>>> 					      u->ring_cons != u->ring_prod);
>>>>>>>
>>>>>>> or in evtchn_poll()? I understand it's not needed when
>>>>>>> ring_prod_lock is held, but that's not the case in the two
>>>>>>> afaics named places. Plus isn't the same then true for
>>>>>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>>>>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>>>>>> for ring_cons?
>>>>>>
>>>>>> The problem solved here is the further processing using "p" multiple
>>>>>> times. p must not be silently replaced with u->ring_prod by the
>>>>>> compiler, so I probably should reword the commit message to say:
>>>>>>
>>>>>> ... in order to not allow the compiler to refetch p.
>>>>>
>>>>> I still wouldn't understand the change (and the lack of
>>>>> further changes) then: The first further use of p is
>>>>> outside the loop, alongside one of c. IOW why would c
>>>>> then not need treating the same as p?
>>>>
>>>> Its value wouldn't change, as ring_cons is being modified only at
>>>> the bottom of this function, and nowhere else (apart from the reset
>>>> case, but this can't run concurrently due to ring_cons_mutex).
>>>>
>>>>> I also still don't see the difference between latching a
>>>>> value into a local variable vs a "freestanding" access -
>>>>> neither are guaranteed to result in exactly one memory
>>>>> access afaict.
>>>>
>>>> READ_ONCE() is using a pointer to volatile, so any refetching by
>>>> the compiler would be a bug.
>>>
>>> Of course, but this wasn't my point. I was contrasting
>>>
>>> 		c = u->ring_cons;
>>> 		p = u->ring_prod;
>>>
>>> which you change with
>>>
>>> 		rc = wait_event_interruptible(u->evtchn_wait,
>>> 					      u->ring_cons != u->ring_prod);
>>>
>>> which you leave alone.
>>
>> Can you point out which problem might arise from that?
> 
> Not any particular active one. Yet enhancing some accesses
> but not others seems to me like a recipe for new problems
> down the road.

I already reasoned that the usage of READ_ONCE() is due to storing the
value in a local variable which needs to be kept constant during the
following processing (no refetches by the compiler). This reasoning
very clearly doesn't apply to the other accesses.


Juergen



* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 12:16                 ` Julien Grall
@ 2021-02-08 12:31                   ` Jürgen Groß
  2021-02-08 13:09                     ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 12:31 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08.02.21 13:16, Julien Grall wrote:
> 
> 
> On 08/02/2021 12:14, Jürgen Groß wrote:
>> On 08.02.21 11:40, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>> ... I don't really see how the difference matter here. The idea is 
>>>>> to re-use what's already existing rather than trying to re-invent 
>>>>> the wheel with an extra lock (or whatever we can come up).
>>>>
>>>> The difference is that the race is occurring _before_ any IRQ is
>>>> involved. So I don't see how modification of IRQ handling would help.
>>>
>>> Roughly our current IRQ handling flow (handle_eoi_irq()) looks like:
>>>
>>> if ( irq in progress )
>>> {
>>>    set IRQS_PENDING
>>>    return;
>>> }
>>>
>>> do
>>> {
>>>    clear IRQS_PENDING
>>>    handle_irq()
>>> } while (IRQS_PENDING is set)
>>>
>>> IRQ handling flow like handle_fasteoi_irq() looks like:
>>>
>>> if ( irq in progress )
>>>    return;
>>>
>>> handle_irq()
>>>
>>> The latter flow would catch "spurious" interrupt and ignore them. So 
>>> it would handle nicely the race when changing the event affinity.
>>
>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>> issued, thus having the same problem again? 
> 
> Sorry I can't parse this.

handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
this condition being reset again in order to be able to process another
IRQ? I believe this will be the case before our "lateeoi" handling is
becoming active (more precise: when our IRQ handler is returning to
handle_fasteoi_irq()), resulting in the possibility of the same race we
are experiencing now.


Juergen



* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 12:31                   ` Jürgen Groß
@ 2021-02-08 13:09                     ` Julien Grall
  2021-02-08 13:58                       ` Jürgen Groß
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2021-02-08 13:09 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski

Hi Juergen,

On 08/02/2021 12:31, Jürgen Groß wrote:
> On 08.02.21 13:16, Julien Grall wrote:
>>
>>
>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>> On 08.02.21 11:40, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>> ... I don't really see how the difference matter here. The idea is 
>>>>>> to re-use what's already existing rather than trying to re-invent 
>>>>>> the wheel with an extra lock (or whatever we can come up).
>>>>>
>>>>> The difference is that the race is occurring _before_ any IRQ is
>>>>> involved. So I don't see how modification of IRQ handling would help.
>>>>
>>>> Roughly our current IRQ handling flow (handle_edge_irq()) looks like:
>>>>
>>>> if ( irq in progress )
>>>> {
>>>>    set IRQS_PENDING
>>>>    return;
>>>> }
>>>>
>>>> do
>>>> {
>>>>    clear IRQS_PENDING
>>>>    handle_irq()
>>>> } while (IRQS_PENDING is set)
>>>>
>>>> IRQ handling flow like handle_fasteoi_irq() looks like:
>>>>
>>>> if ( irq in progress )
>>>>    return;
>>>>
>>>> handle_irq()
>>>>
>>>> The latter flow would catch "spurious" interrupt and ignore them. So 
>>>> it would handle nicely the race when changing the event affinity.
>>>
>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>> issued, thus having the same problem again? 
>>
>> Sorry I can't parse this.
> 
> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
> this condition being reset again in order to be able to process another
> IRQ?
It is reset after the handler has been called. See handle_irq_event().

> I believe this will be the case before our "lateeoi" handling is
> becoming active (more precise: when our IRQ handler is returning to
> handle_fasteoi_irq()), resulting in the possibility of the same race we
> are experiencing now.

I am a bit confused what you mean by "lateeoi" handling is becoming 
active. Can you clarify?

Note that there are other IRQ flows existing. We should have a look at 
them before trying to fix things ourselves.

Although, the other issue I can see so far is handle_irq_for_port() will 
update info->{eoi_cpu, irq_epoch, eoi_time} without any locking. But it 
is not clear this is what you mean by "becoming active".

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 13:09                     ` Julien Grall
@ 2021-02-08 13:58                       ` Jürgen Groß
  2021-02-08 14:20                         ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 13:58 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08.02.21 14:09, Julien Grall wrote:
> Hi Juergen,
> 
> On 08/02/2021 12:31, Jürgen Groß wrote:
>> On 08.02.21 13:16, Julien Grall wrote:
>>>
>>>
>>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>>> On 08.02.21 11:40, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>>> ... I don't really see how the difference matter here. The idea 
>>>>>>> is to re-use what's already existing rather than trying to 
>>>>>>> re-invent the wheel with an extra lock (or whatever we can come up).
>>>>>>
>>>>>> The difference is that the race is occurring _before_ any IRQ is
>>>>>> involved. So I don't see how modification of IRQ handling would help.
>>>>>
>>>>> Roughly our current IRQ handling flow (handle_edge_irq()) looks like:
>>>>>
>>>>> if ( irq in progress )
>>>>> {
>>>>>    set IRQS_PENDING
>>>>>    return;
>>>>> }
>>>>>
>>>>> do
>>>>> {
>>>>>    clear IRQS_PENDING
>>>>>    handle_irq()
>>>>> } while (IRQS_PENDING is set)
>>>>>
>>>>> IRQ handling flow like handle_fasteoi_irq() looks like:
>>>>>
>>>>> if ( irq in progress )
>>>>>    return;
>>>>>
>>>>> handle_irq()
>>>>>
>>>>> The latter flow would catch "spurious" interrupt and ignore them. 
>>>>> So it would handle nicely the race when changing the event affinity.
>>>>
>>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>>> issued, thus having the same problem again? 
>>>
>>> Sorry I can't parse this.
>>
>> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
>> this condition being reset again in order to be able to process another
>> IRQ?
> It is reset after the handler has been called. See handle_irq_event().

Right. And for us this is too early, as we want the next IRQ being
handled only after we have called xen_irq_lateeoi().

> 
>> I believe this will be the case before our "lateeoi" handling is
>> becoming active (more precise: when our IRQ handler is returning to
>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>> are experiencing now.
> 
> I am a bit confused what you mean by "lateeoi" handling is becoming 
> active. Can you clarify?

See above: the next call of the handler should be allowed only after
xen_irq_lateeoi() for the IRQ has been called.

If the handler is being called earlier we have the race resulting
in the WARN() splats.

> Note that are are other IRQ flows existing. We should have a look at 
> them before trying to fix thing ourself.

Fine with me, but it either needs to fit all use cases (interdomain,
IPI, real interrupts) or we need to have a per-type IRQ flow.

I think we should fix the issue locally first, then we can start to do
a thorough rework planning. It's not as if the needed changes with the 
current flow would be so huge, and I'd really like to have a solution
rather sooner than later. Changing the IRQ flow might have other side
effects which need to be excluded by thorough testing.

> Although, the other issue I can see so far is handle_irq_for_port() will 
> update info->{eoi_cpu, irq_epoch, eoi_time} without any locking. But it 
> is not clear this is what you mean by "becoming active".

As long as a single event can't be handled on multiple cpus at the same
time, there is no locking needed.


Juergen



* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 13:58                       ` Jürgen Groß
@ 2021-02-08 14:20                         ` Julien Grall
  2021-02-08 14:35                           ` Julien Grall
  2021-02-08 14:50                           ` Jürgen Groß
  0 siblings, 2 replies; 53+ messages in thread
From: Julien Grall @ 2021-02-08 14:20 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski

Hi Juergen,

On 08/02/2021 13:58, Jürgen Groß wrote:
> On 08.02.21 14:09, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 08/02/2021 12:31, Jürgen Groß wrote:
>>> On 08.02.21 13:16, Julien Grall wrote:
>>>>
>>>>
>>>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>>>> On 08.02.21 11:40, Julien Grall wrote:
>>>>>> Hi Juergen,
>>>>>>
>>>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>>>> ... I don't really see how the difference matter here. The idea 
>>>>>>>> is to re-use what's already existing rather than trying to 
>>>>>>>> re-invent the wheel with an extra lock (or whatever we can come 
>>>>>>>> up).
>>>>>>>
>>>>>>> The difference is that the race is occurring _before_ any IRQ is
>>>>>>> involved. So I don't see how modification of IRQ handling would 
>>>>>>> help.
>>>>>>
>>>>>> Roughly our current IRQ handling flow (handle_edge_irq()) looks like:
>>>>>>
>>>>>> if ( irq in progress )
>>>>>> {
>>>>>>    set IRQS_PENDING
>>>>>>    return;
>>>>>> }
>>>>>>
>>>>>> do
>>>>>> {
>>>>>>    clear IRQS_PENDING
>>>>>>    handle_irq()
>>>>>> } while (IRQS_PENDING is set)
>>>>>>
>>>>>> IRQ handling flow like handle_fasteoi_irq() looks like:
>>>>>>
>>>>>> if ( irq in progress )
>>>>>>    return;
>>>>>>
>>>>>> handle_irq()
>>>>>>
>>>>>> The latter flow would catch "spurious" interrupt and ignore them. 
>>>>>> So it would handle nicely the race when changing the event affinity.
>>>>>
>>>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>>>> issued, thus having the same problem again? 
>>>>
>>>> Sorry I can't parse this.
>>>
>>> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
>>> this condition being reset again in order to be able to process another
>>> IRQ?
>> It is reset after the handler has been called. See handle_irq_event().
> 
> Right. And for us this is too early, as we want the next IRQ being
> handled only after we have called xen_irq_lateeoi().

It is not really the next IRQ here. It is more a spurious IRQ because we 
don't clear & mask the event right away. Instead, it is done later in 
the handling.

> 
>>
>>> I believe this will be the case before our "lateeoi" handling is
>>> becoming active (more precise: when our IRQ handler is returning to
>>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>>> are experiencing now.
>>
>> I am a bit confused what you mean by "lateeoi" handling is becoming 
>> active. Can you clarify?
> 
> See above: the next call of the handler should be allowed only after
> xen_irq_lateeoi() for the IRQ has been called.
> 
> If the handler is being called earlier we have the race resulting
> in the WARN() splats.

I feel it is dislike to understand race with just words. Can you provide 
a scenario (similar to the one I originally provided) with two vCPUs and 
show how this can happen?

> 
>> Note that are are other IRQ flows existing. We should have a look at 
>> them before trying to fix thing ourself.
> 
> Fine with me, but it either needs to fit all use cases (interdomain,
> IPI, real interrupts) or we need to have a per-type IRQ flow.

AFAICT, we already use different flows based on the use case. Before 
2011, we used to use the fasteoi one but this was changed by the 
following commit:


commit 7e186bdd0098b34c69fb8067c67340ae610ea499
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Fri May 6 12:27:50 2011 +0100

     xen: do not clear and mask evtchns in __xen_evtchn_do_upcall

     Change the irq handler of evtchns and pirqs that don't need EOI (pirqs
     that correspond to physical edge interrupts) to handle_edge_irq.

     Use handle_fasteoi_irq for pirqs that need eoi (they generally
     correspond to level triggered irqs), no risk in loosing interrupts
     because we have to EOI the irq anyway.

     This change has the following benefits:

     - it uses the very same handlers that Linux would use on native for the
     same irqs (handle_edge_irq for edge irqs and msis, and
     handle_fasteoi_irq for everything else);

     - it uses these handlers in the same way native code would use them: it
     let Linux mask\unmask and ack the irq when Linux want to mask\unmask
     and ack the irq;

     - it fixes a problem occurring when a driver calls disable_irq() in its
     handler: the old code was unconditionally unmasking the evtchn even if
     the irq is disabled when irq_eoi was called.

     See Documentation/DocBook/genericirq.tmpl for more informations.

     Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
     [v1: Fixed space/tab issues]
     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>


> 
> I think we should fix the issue locally first, then we can start to do
> a thorough rework planning. Its not as if the needed changes with the
> current flow would be so huge, and I'd really like to have a solution
> rather sooner than later. Changing the IRQ flow might have other side
> effects which need to be excluded by thorough testing.
I agree that we need a solution ASAP. But I am a bit worried about:
   1) Adding another lock in that event handling path.
   2) Adding more complexity to the event handling (it is already fairly 
difficult to reason about the locking/races)

Let see what the local fix look like.

>> Although, the other issue I can see so far is handle_irq_for_port() 
>> will update info->{eoi_cpu, irq_epoch, eoi_time} without any locking. 
>> But it is not clear this is what you mean by "becoming active".
> 
> As long as a single event can't be handled on multiple cpus at the same
> time, there is no locking needed.

Well, it can happen in the current code (see my original scenario). If 
your idea fixes it then fine.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 14:20                         ` Julien Grall
@ 2021-02-08 14:35                           ` Julien Grall
  2021-02-08 14:50                           ` Jürgen Groß
  1 sibling, 0 replies; 53+ messages in thread
From: Julien Grall @ 2021-02-08 14:35 UTC (permalink / raw)
  To: Jürgen Groß,
	xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08/02/2021 14:20, Julien Grall wrote:
>>>> I believe this will be the case before our "lateeoi" handling is
>>>> becoming active (more precise: when our IRQ handler is returning to
>>>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>>>> are experiencing now.
>>>
>>> I am a bit confused what you mean by "lateeoi" handling is becoming 
>>> active. Can you clarify?
>>
>> See above: the next call of the handler should be allowed only after
>> xen_irq_lateeoi() for the IRQ has been called.
>>
>> If the handler is being called earlier we have the race resulting
>> in the WARN() splats.
> 
> I feel it is dislike to understand race with just words. Can you provide

Sorry I meant difficult rather than dislike.

Cheers,

-- 
Julien Grall


* Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  2021-02-08 14:20                         ` Julien Grall
  2021-02-08 14:35                           ` Julien Grall
@ 2021-02-08 14:50                           ` Jürgen Groß
  1 sibling, 0 replies; 53+ messages in thread
From: Jürgen Groß @ 2021-02-08 14:50 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel, linux-block, netdev, linux-scsi
  Cc: Boris Ostrovsky, Stefano Stabellini, stable,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski



On 08.02.21 15:20, Julien Grall wrote:
> Hi Juergen,
> 
> On 08/02/2021 13:58, Jürgen Groß wrote:
>> On 08.02.21 14:09, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 08/02/2021 12:31, Jürgen Groß wrote:
>>>> On 08.02.21 13:16, Julien Grall wrote:
>>>>>
>>>>>
>>>>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>>>>> On 08.02.21 11:40, Julien Grall wrote:
>>>>>>> Hi Juergen,
>>>>>>>
>>>>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>>>>> ... I don't really see how the difference matter here. The idea 
>>>>>>>>> is to re-use what's already existing rather than trying to 
>>>>>>>>> re-invent the wheel with an extra lock (or whatever we can come 
>>>>>>>>> up).
>>>>>>>>
>>>>>>>> The difference is that the race is occurring _before_ any IRQ is
>>>>>>>> involved. So I don't see how modification of IRQ handling would 
>>>>>>>> help.
>>>>>>>
>>>>>>> Roughly our current IRQ handling flow (handle_edge_irq()) looks like:
>>>>>>>
>>>>>>> if ( irq in progress )
>>>>>>> {
>>>>>>>    set IRQS_PENDING
>>>>>>>    return;
>>>>>>> }
>>>>>>>
>>>>>>> do
>>>>>>> {
>>>>>>>    clear IRQS_PENDING
>>>>>>>    handle_irq()
>>>>>>> } while (IRQS_PENDING is set)
>>>>>>>
>>>>>>> IRQ handling flow like handle_fasteoi_irq() looks like:
>>>>>>>
>>>>>>> if ( irq in progress )
>>>>>>>    return;
>>>>>>>
>>>>>>> handle_irq()
>>>>>>>
>>>>>>> The latter flow would catch "spurious" interrupt and ignore them. 
>>>>>>> So it would handle nicely the race when changing the event affinity.
>>>>>>
>>>>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>>>>> issued, thus having the same problem again? 
>>>>>
>>>>> Sorry I can't parse this.
>>>>
>>>> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
>>>> this condition being reset again in order to be able to process another
>>>> IRQ?
>>> It is reset after the handler has been called. See handle_irq_event().
>>
>> Right. And for us this is too early, as we want the next IRQ being
>> handled only after we have called xen_irq_lateeoi().
> 
> It is not really the next IRQ here. It is more a spurious IRQ because we 
> don't clear & mask the event right away. Instead, it is done later in 
> the handling.
> 
>>
>>>
>>>> I believe this will be the case before our "lateeoi" handling is
>>>> becoming active (more precise: when our IRQ handler is returning to
>>>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>>>> are experiencing now.
>>>
>>> I am a bit confused what you mean by "lateeoi" handling is becoming 
>>> active. Can you clarify?
>>
>> See above: the next call of the handler should be allowed only after
>> xen_irq_lateeoi() for the IRQ has been called.
>>
>> If the handler is being called earlier we have the race resulting
>> in the WARN() splats.
> 
> I feel it is dislike to understand race with just words. Can you provide 
> a scenario (similar to the one I originally provided) with two vCPUs and 
> show how this can happen?

vCPU0                | vCPU1
                      |
                      | Call xen_rebind_evtchn_to_cpu()
receive event X      |
                      | mask event X
                      | bind to vCPU1
<vCPU descheduled>   | unmask event X
                      |
                      | receive event X
                      |
                      | handle_fasteoi_irq(X)
                      |  -> handle_irq_event()
                      |   -> set IRQD_IN_PROGRESS
                      |   -> evtchn_interrupt()
                      |      -> evtchn->enabled = false
                      |   -> clear IRQD_IN_PROGRESS
handle_fasteoi_irq(X)|
-> evtchn_interrupt()|
    -> WARN()         |
                      | xen_irq_lateeoi(X)

> 
>>
>>> Note that are are other IRQ flows existing. We should have a look at 
>>> them before trying to fix thing ourself.
>>
>> Fine with me, but it either needs to fit all use cases (interdomain,
>> IPI, real interrupts) or we need to have a per-type IRQ flow.
> 
> AFAICT, we already used different flow based on the use cases. Before 
> 2011, we used to use the fasteoi one but this was changed by the 
> following commit:

Yes, I know that.

>>
>> I think we should fix the issue locally first, then we can start to do
>> a thorough rework planning. Its not as if the needed changes with the
>> current flow would be so huge, and I'd really like to have a solution
>> rather sooner than later. Changing the IRQ flow might have other side
>> effects which need to be excluded by thorough testing.
> I agree that we need a solution ASAP. But I am a bit worried about:
>    1) Adding another lock in that event handling path.

Regarding complexity: it is very simple (just around masking/unmasking
of the event channel). Contention is very unlikely.

>    2) Adding more complexity to the event handling (it is already fairly 
> difficult to reason about the locking/races)
> 
> Let see what the local fix look like.

Yes.

> 
>>> Although, the other issue I can see so far is handle_irq_for_port() 
>>> will update info->{eoi_cpu, irq_epoch, eoi_time} without any locking. 
>>> But it is not clear this is what you mean by "becoming active".
>>
>> As long as a single event can't be handled on multiple cpus at the same
>> time, there is no locking needed.
> 
> Well, it can happen in the current code (see my original scenario). If 
> your idea fixes it then fine.

I hope so.


Juergen



* Re: [PATCH 4/7] xen/events: link interdomain events to associated xenbus device
  2021-02-06 10:49 ` [PATCH 4/7] xen/events: link interdomain events to associated xenbus device Juergen Gross
@ 2021-02-08 23:26   ` Boris Ostrovsky
  2021-02-09 13:55   ` Wei Liu
  1 sibling, 0 replies; 53+ messages in thread
From: Boris Ostrovsky @ 2021-02-08 23:26 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-block, linux-kernel, netdev, linux-scsi
  Cc: Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski, Stefano Stabellini


On 2/6/21 5:49 AM, Juergen Gross wrote:
> In order to support the possibility of per-device event channel
> settings (e.g. lateeoi spurious event thresholds) add a xenbus device
> pointer to struct irq_info() and modify the related event channel
> binding interfaces to take the pointer to the xenbus device as a
> parameter instead of the domain id of the other side.
>
> While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi().
>
> Signed-off-by: Juergen Gross <jgross@suse.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



* Re: [PATCH 5/7] xen/events: add per-xenbus device event statistics and settings
  2021-02-06 10:49 ` [PATCH 5/7] xen/events: add per-xenbus device event statistics and settings Juergen Gross
@ 2021-02-08 23:35   ` Boris Ostrovsky
  0 siblings, 0 replies; 53+ messages in thread
From: Boris Ostrovsky @ 2021-02-08 23:35 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel; +Cc: Stefano Stabellini


On 2/6/21 5:49 AM, Juergen Gross wrote:
> Add sysfs nodes for each xenbus device showing event statistics (number
> of events and spurious events, number of associated event channels)
> and for setting a spurious event threshold in case a frontend is
> sending too many events without being rogue on purpose.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  drivers/xen/events/events_base.c  | 27 ++++++++++++-
>  drivers/xen/xenbus/xenbus_probe.c | 66 +++++++++++++++++++++++++++++++
>  include/xen/xenbus.h              |  7 ++++
>  3 files changed, 98 insertions(+), 2 deletions(-)


This needs Documentation/ABI update.


-boris



* Re: [PATCH 4/7] xen/events: link interdomain events to associated xenbus device
  2021-02-06 10:49 ` [PATCH 4/7] xen/events: link interdomain events to associated xenbus device Juergen Gross
  2021-02-08 23:26   ` Boris Ostrovsky
@ 2021-02-09 13:55   ` Wei Liu
  1 sibling, 0 replies; 53+ messages in thread
From: Wei Liu @ 2021-02-09 13:55 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-block, linux-kernel, netdev, linux-scsi,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Wei Liu, Paul Durrant, David S. Miller,
	Jakub Kicinski, Boris Ostrovsky, Stefano Stabellini

On Sat, Feb 06, 2021 at 11:49:29AM +0100, Juergen Gross wrote:
> In order to support the possibility of per-device event channel
> settings (e.g. lateeoi spurious event thresholds) add a xenbus device
> pointer to struct irq_info() and modify the related event channel
> binding interfaces to take the pointer to the xenbus device as a
> parameter instead of the domain id of the other side.
> 
> While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi().
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Wei Liu <wei.liu@kernel.org>


end of thread, other threads:[~2021-02-09 13:57 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-06 10:49 [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Juergen Gross
2021-02-06 10:49 ` [PATCH 1/7] xen/events: reset affinity of 2-level event initially Juergen Gross
2021-02-06 11:20   ` Julien Grall
2021-02-06 12:09     ` Jürgen Groß
2021-02-06 12:19       ` Julien Grall
2021-02-06 10:49 ` [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending Juergen Gross
2021-02-08 10:06   ` Jan Beulich
2021-02-08 10:21     ` Jürgen Groß
2021-02-08 10:15   ` Ross Lagerwall
2021-02-06 10:49 ` [PATCH 3/7] xen/events: fix lateeoi irq acknowledgment Juergen Gross
2021-02-06 10:49 ` [PATCH 4/7] xen/events: link interdomain events to associated xenbus device Juergen Gross
2021-02-08 23:26   ` Boris Ostrovsky
2021-02-09 13:55   ` Wei Liu
2021-02-06 10:49 ` [PATCH 5/7] xen/events: add per-xenbus device event statistics and settings Juergen Gross
2021-02-08 23:35   ` Boris Ostrovsky
2021-02-06 10:49 ` [PATCH 6/7] xen/evtch: use smp barriers for user event ring Juergen Gross
2021-02-08  9:38   ` Jan Beulich
2021-02-08  9:41     ` Jürgen Groß
2021-02-08  9:44   ` Andrew Cooper
2021-02-08  9:50     ` Jan Beulich
2021-02-08 10:23       ` Andrew Cooper
2021-02-08 10:25         ` Jürgen Groß
2021-02-08 10:31           ` Andrew Cooper
2021-02-08 10:36         ` Jan Beulich
2021-02-08 10:45           ` Andrew Cooper
2021-02-06 10:49 ` [PATCH 7/7] xen/evtchn: read producer index only once Juergen Gross
2021-02-08  9:48   ` Jan Beulich
2021-02-08 10:41     ` Jürgen Groß
2021-02-08 10:51       ` Jan Beulich
2021-02-08 10:59         ` Jürgen Groß
2021-02-08 11:50           ` Julien Grall
2021-02-08 11:54           ` Jan Beulich
2021-02-08 12:15             ` Jürgen Groß
2021-02-08 12:23               ` Jan Beulich
2021-02-08 12:26                 ` Jürgen Groß
2021-02-08 11:40   ` Julien Grall
2021-02-08 11:48     ` Jürgen Groß
2021-02-08 12:03       ` Julien Grall
2021-02-06 18:46 ` [PATCH 0/7] xen/events: bug fixes and some diagnostic aids Julien Grall
2021-02-07 12:58   ` Jürgen Groß
2021-02-08  9:11     ` Julien Grall
2021-02-08  9:41       ` Jürgen Groß
2021-02-08  9:54         ` Julien Grall
2021-02-08 10:22           ` Jürgen Groß
2021-02-08 10:40             ` Julien Grall
2021-02-08 12:14               ` Jürgen Groß
2021-02-08 12:16                 ` Julien Grall
2021-02-08 12:31                   ` Jürgen Groß
2021-02-08 13:09                     ` Julien Grall
2021-02-08 13:58                       ` Jürgen Groß
2021-02-08 14:20                         ` Julien Grall
2021-02-08 14:35                           ` Julien Grall
2021-02-08 14:50                           ` Jürgen Groß

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.